Lessons I learned from SpringerLink: what I would do if I were making a website from scratch again

01 July 2015

SpringerLink is the biggest project I’ve been involved with. It was completely different to every other project in terms of scale and complexity. It was my first real introduction to agile software development and writing software that was actually used by lots of people.

I learnt a lot of random lessons from working on a system developed by hundreds of people, and I thought it would be an interesting thought experiment to work out what I would do if I were to start again.

So in no particular order…

Agile

We were at our fastest when we were developing simple, well understood features that could be done in a week. Even if it’s hard it’s worth the effort to try and trim the fat off a feature. Usually features that take > 2 weeks don’t go well.
Done is when it’s live. If you have a big pile of stuff that’s no longer being worked on but is sitting on a branch or not released, then that is a huge problem that gets worse with every commit. The reason is to do with feedback loops. If you find a problem in a new feature, it becomes more and more difficult for the team to fix the longer the feature has been left alone.
We like to see stories going from left to right (i.e. analysis, in dev, QA, live, etc.) but it is not the end of the world if stories move backwards. In fact, that usually means you’ve learnt something. At times I have seen situations where stories have been “not ready” for a very long time. Some upfront analysis is important to cut waste, but there comes a point where you just need to try it and learn rather than speculate.
Continuous delivery (i.e. green build = live) is a really nice thing. It’s a shame that releases are still a “thing” for us. The moment software starts going live on successful builds is the moment people take more ownership of code and you get feedback.
Remember that agile development is meant to be iterative. Get basic skeletal features to live and then improve them. Don’t turn it into a waterfall project delivering in increments.
UX research should not be a recipe for paralysis. If you don’t know, take a punt and then test. If you are in a state where you think changing UX would be too hard so you need to do a lot of upfront research then maybe something else is wrong. One of the reasons why Agile came about is because upfront analysis is almost never as good as build, measure, learn. This does apply to UX.
Tech tasks are important, but can expand to fill the volume you give them. Make the developers feel excited and interested in the project as a whole so they can strike the right balance in their own minds.
“YAGNI” is a useful stick but don’t use it too often. My rule of thumb in general is do the minimum to get the minimum functionality and at that point you can make a more informed choice as to how you gild the lily.
Actually measure build time and don’t let it get awful. Report it, every week.
Refactor. If the business doesn’t let you, just fucking quit. Also refactoring shouldn’t be a thing. What I mean by that is developers should be empowered to write good software and not have to report every little detail.
Don’t have stats on a new feature? Why not? How do you know a feature is successful?
Why aren’t you deleting features?
NFRs (non-functional requirements) are important and should be considered for every story. It doesn’t need to take up a lot of time. Too often we have gone live with something without the logging and metrics to help support the change.
Retrospectives are really useful but can sometimes get bogged down in a small issue, which ends up being blown out of proportion. They need strong facilitation from someone who has good context on the group. Not everything that goes awry needs a process change. It’s like dropping a footballer every time he misses a pass.
On the other hand, especially in a stable team, retrospectives can become too “safe”, not tackling real incumbent issues that maybe the team takes for granted. You should always be asking yourself: how can we go faster/better? A good facilitator can really help with this, usually by structuring the retro in a different way to try and make the team think about things differently. Have a look through Game Storming for ideas.
Meetings are important and can be very productive but they need good facilitation and most importantly a focused goal. I have been in too many meetings that have drifted and dived into irrelevant tangents.
Everyone wants a culture of developers “just fucking do it”, but that requires trust in devs and less micromanagement of their time. This contradicts “process” sometimes and is a hard balance to get right.
If your PM insists on story points, don’t make it an issue, but don’t report them upwards, as that just causes games, which is an inefficiency.
That being said, everyone ultimately knows estimates are bollocks. Just refine your process and measure how long real features take to deliver and that gives you your cadence for future planning.
Why are you estimating work that won’t happen for > 6 months? It’s a waste of time for everyone.
Why are you talking in any level of detail about work that won’t happen for 6 months?
Document decisions somehow. 2 years down the line you will struggle to justify your decisions but you probably were doing the most rational thing for the given situation/knowledge.
Stakeholders need to feel involved, informed and consulted. But it is the product team’s job to make the detailed decisions.
Process change is sometimes important, but you need to stick with it for longer than a month to understand if it works. Constant process change just pisses everyone off and slows everything down.
“Process” is often synonymous with micromanagement (in my brief experience), which most developers hate. It almost always boils down to trying to paper over cracks in communication. With an engaged and enthusiastic team who understand the value of all roles and the vision of the product, you will get all this for free.
It’s so much harder to refactor code 3 months later, even if the code around it doesn’t change, because people and context change. Don’t pay lip service to this; it has cost us lots of time and effort. Just give space between stories to tidy up properly and it will save lots of money.
Related: “big refactors” are very dangerous things, especially when new people come in and want to change everything to suit the paradigms they are used to. How do you measure the success of this? If you can’t answer, that’s your warning, and that goes for literally everything in software. Pick your measurements carefully too. If you speed up the build but make the code hard to follow, that is not a success.
Measuring success is hard but it doesn’t have to necessarily be a number so long as you’re honest with yourselves and have a good idea of what success is. We have done work which we know has helped in the long run but you can’t possibly put a number to it. The trouble happens when there isn’t a consensus as to what success looks like, which hurts motivation and is wasteful.
Even though you’re developing software incrementally there should be an overall plan in terms of technical direction. In addition you must include everyone or people will absolutely feel left out and will resent that.
Performance can become a second class citizen which gets delayed until the end of a milestone. Try and incorporate performance metrics sooner rather than later.
As a developer, working closely with your BAs and QAs is invaluable. Don’t become one of those devs that just mindlessly picks up stories without talking to people. Remember it is everyone’s responsibility to deliver features end to end. You can help clarify the goal of a story with the BA and QA. You can help make the story easy to test. You can help the BA use technology to measure the value of the story. It shouldn’t be a chore. You’re wrong to think it is and you’re wrong to think it takes too much time.
Put less diplomatically: ALL FUCKING TALK TO EACH OTHER ABOUT THE WORK. SERIOUSLY. WHY WOULD A STORY BE PLAYED WHEN NO ONE HAS SAT TOGETHER AND TALKED ABOUT IT AS A GROUP? WHYYYY?

People

The right amount and kind of pressure is good. Too much is demoralising and destructive. None at all is also demoralising, and will lead to drift. Set near-term customer-facing objectives that are just challenging enough.
Motivation is hard but visibility of value being delivered really helps.
Don’t be a bastard. People (including yourself) take shortcuts for legit reasons a lot of the time. Most especially everyone (you) makes mistakes or is less experienced with what’s going on. Everything is fixable, eventually. Only be a bastard to people who won’t listen.
A noisy, jolly standup is not necessarily indicative of a productive team. Neither is a quiet one. Don’t judge a team on such superficial things. What does their product owner think?
In the same way, one team works with paradigm/language/framework X. You use Y. Why do you care? Are you all delivering good software? That’s all that matters.
You’re probably doing better than you think as it’s easy to focus on the negatives. Make sure you take the time to celebrate the good stuff the team has done, especially in an organisation with lots of teams.

Technical

Read The Twelve-Factor App. It helps you write software that works beyond your desktop.
Be really strict with statefulness. Things like feature toggles and properties should either require a server restart or be managed at a per-request level. This makes testing simpler and the code much easier to write. If you see tests which are setting and unsetting things, then your server has state. Stop it.
Just use environment variables for config. You don’t need any more complexity than that most of the time. Managing configuration outside the app then becomes language and platform agnostic. Your software is then more likely to be compatible with most PaaS solutions.
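A minimal sketch of what that can look like in Python. The setting names (`PORT`, `CONTENT_API_URL`, `DEBUG`) and the `Config` class are made up for illustration and are not from the real system. Everything is read once at startup, so changing a value means restarting the server, which also keeps the statefulness rule above honest.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Immutable settings object (hypothetical fields, for illustration)."""
    port: int
    content_api_url: str
    debug: bool


def load_config(env=os.environ) -> Config:
    """Read every setting from the environment once, at startup."""
    return Config(
        port=int(env.get("PORT", "8080")),
        # No default here: a missing URL raises KeyError at startup
        # instead of failing later in the middle of a request.
        content_api_url=env["CONTENT_API_URL"],
        debug=env.get("DEBUG", "false").lower() == "true",
    )
```

Passing the environment in as a plain mapping means a test can hand over a dict rather than setting and unsetting real environment variables.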
No logic in templates, seriously. This has caused us no end of pain.
Don’t use some complicated build tool to do basic things bash can do. Such as… making a zip file.
If your template language doesn’t look like HTML, don’t use it. It might seem fancy and, darn, closing those tags is annoying, but having to mentally parse template files years down the line is really painful.
Controllers should all look the same. They have some dependencies injected in. Each endpoint parses a request, calls a service and then calls a template with a view model. Anything else is bullshit and is a barrier to understanding how particular requests in your system are processed.
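Roughly the shape I mean, sketched in Python rather than whatever framework you happen to use. `ArticleController`, `ArticleViewModel`, the `find` method on the service and the `render` callable are all hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ArticleViewModel:
    """Everything the template needs, and nothing else."""
    title: str
    abstract: str


class ArticleController:
    def __init__(self, article_service, render: Callable[[str, object], str]):
        # Dependencies are injected; the controller builds nothing itself.
        self._articles = article_service
        self._render = render

    def show(self, request: dict) -> str:
        # 1. Parse the request.
        article_id = request["params"]["id"]
        # 2. Call a service.
        article = self._articles.find(article_id)
        # 3. Hand a view model to a template.
        view_model = ArticleViewModel(article["title"], article["abstract"])
        return self._render("article.html", view_model)
```

If every endpoint follows those three steps, you can find how any request is handled without reading the whole codebase.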
Beware of The Onion. Code should read in a declarative way so you understand on a broad level what is happening. This doesn’t mean don’t do encapsulation. Encapsulation means hide how something happens. If it’s hard to understand what is going on then you just need to fix it. The Onion is when you have to dart between a lot of files just to understand what is happening in a system and eventually you have a stack overflow in your brain where you can’t handle the call chain anymore.
Logging/events. Don’t have static imports; services should delegate to a “listener”. Logging/metrics is important and should be treated as something that needs to be tested. It seems odd at first but really it isn’t much effort at all and then you can be confident of finding information at 3am when everything blows up.
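A small sketch of the listener idea in Python. The names (`AccessService`, `LoggingListener`, `RecordingListener`, the `access_checked` event) are assumptions for illustration. The service only knows it has something with an `on_event` method, so production can log while a test records and asserts.

```python
import logging


class LoggingListener:
    """Production listener: forwards events to the standard logging module."""

    def __init__(self, logger_name="access"):
        self._log = logging.getLogger(logger_name)

    def on_event(self, name, **details):
        self._log.info("%s %s", name, details)


class RecordingListener:
    """Test listener: remembers events so a test can assert on them."""

    def __init__(self):
        self.events = []

    def on_event(self, name, **details):
        self.events.append((name, details))


class AccessService:
    def __init__(self, entitlements, listener):
        self._entitlements = entitlements  # user_id -> set of document ids
        self._listener = listener

    def has_access(self, user_id, document_id):
        allowed = document_id in self._entitlements.get(user_id, set())
        # The service reports what happened; it does not decide how it is logged.
        self._listener.on_event(
            "access_checked", user_id=user_id, document_id=document_id, allowed=allowed
        )
        return allowed
```

A test then builds `AccessService` with a `RecordingListener` and checks `listener.events`, which is how you end up confident the 3am information will actually be there.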
Trying to encapsulate loads of behaviours in libraries is almost always a bad idea. Saying “doing HTTP and getting circuit breaking, metrics, exceptions yada yada should be standard” is nonsense. Even if it is standard, some day it won’t fit into what we need to do. Composition FTW.
Try not to couple different concerns into one piece of data. In our case, having “has Access” coupled with the content meant we could not cache the content.
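To illustrate the difference, here is a hedged Python sketch, not our actual code. `load_content`, `ENTITLEMENTS` and `page_for` are invented names. Content depends only on the document, so it can be cached; access depends on the user, so it is worked out per request and kept out of the cached value.

```python
from dataclasses import dataclass
from functools import lru_cache

# Hypothetical entitlement data: user_id -> documents they may read.
ENTITLEMENTS = {"alice": {"doc-1"}}


@dataclass(frozen=True)
class Content:
    document_id: str
    title: str


@lru_cache(maxsize=1024)
def load_content(document_id: str) -> Content:
    """Stand-in for an expensive fetch; the result is the same for every user."""
    return Content(document_id, f"Title of {document_id}")


def has_access(user_id: str, document_id: str) -> bool:
    """Per-user question, answered separately from the content."""
    return document_id in ENTITLEMENTS.get(user_id, set())


def page_for(user_id: str, document_id: str):
    # Cached content plus a fresh access decision, combined only at the edge.
    return load_content(document_id), has_access(user_id, document_id)
```

Two different users asking for the same document cause a single fetch. Had the access flag been a field on `Content`, the cache key would have needed the user in it too.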
Follow general good OSS principles. It should be clear how to build the software from a README.
If you use a whacky language like Scala where it’s not very opinionated then a CONTRIBUTING.txt is a useful way to get to a consensus as to how your code should look.
Really think about how you work and what inefficiencies there are and build tooling to overcome them. It was silly how long it took us to have a tool to copy a document from live on to the local box.
If you’re using a statically typed language, refuse to use “escape-hatches”. In Scala doing things like “asInstanceOf” is a real code smell and is asking for a runtime error to occur.
If you have a language with a half-decent type system, use it! Code re-use and maintainability can be significantly improved by reflecting everything in the type system. The moment you don’t have this (exceptions, nulls, etc.) is the moment you have to employ "defensive" (paranoid) programming.
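A rough illustration in Python, with the caveat that Python’s type hints are only enforced by an external checker such as mypy, so this is weaker than what a compiled language gives you. `Found`, `NotFound` and `find_document` are hypothetical names. The point is that “might not exist” is in the return type rather than hidden in a null or an exception.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Found:
    title: str


@dataclass(frozen=True)
class NotFound:
    document_id: str


# The signature tells the caller that both outcomes are possible.
LookupResult = Union[Found, NotFound]


def find_document(documents: dict, document_id: str) -> LookupResult:
    title = documents.get(document_id)
    if title is None:
        return NotFound(document_id)
    return Found(title)


def describe(result: LookupResult) -> str:
    # The caller handles each case explicitly; no None checks sprinkled around.
    if isinstance(result, Found):
        return f"Found: {result.title}"
    return f"No document with id {result.document_id}"
```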
Feature toggles are really useful for degrading the system gracefully in live and also can help testing. In the past we have suffered from having to set specific access rights to test features that have little to do with access rights. If only we had an “all access” feature those tests would be quicker to run and simpler to write.
Making feature toggles a request level concern means you can do things like set a cookie to test a feature in live.
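A sketch of how that might look, in Python. The toggle names and the `toggle_` cookie prefix are assumptions for illustration. The toggles are worked out fresh from each request’s `Cookie` header, so nothing on the server changes when someone flips one.

```python
from http.cookies import SimpleCookie

# Hypothetical toggles and their defaults for anyone without a cookie.
DEFAULT_TOGGLES = {"new_search": False, "all_access": False}


def toggles_for(cookie_header: str) -> dict:
    """Return the toggles for one request, given its raw Cookie header."""
    toggles = dict(DEFAULT_TOGGLES)  # copy, so the defaults are never mutated
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    for name in toggles:
        morsel = cookie.get(f"toggle_{name}")
        if morsel is not None:
            toggles[name] = morsel.value == "on"
    return toggles
```

Setting `toggle_new_search=on` in your browser turns the feature on for you alone, and a test can do the same by sending one header instead of reconfiguring the server.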
As a functional programmer I really love purity, and you can still apply these principles in imperative languages and also in services (i.e. APIs). Idempotency is a really powerful thing which makes error handling and general understanding much better. You can always be confident that doing a PUT request more than once won’t have some crazy side-effect. Strive to make the endpoints of your APIs as idempotent as possible.
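To make the difference concrete, a toy Python sketch. `NoteStore` and its methods are invented for illustration and stand in for whatever sits behind the API. The PUT-style method says what the state should be; the POST-style method says “do this again”.

```python
class NoteStore:
    """In-memory stand-in for the storage behind an API (hypothetical)."""

    def __init__(self):
        self.notes = {}     # (user_id, document_id) -> text
        self.comments = []  # append-only list

    def put_note(self, user_id, document_id, text):
        # Idempotent: repeating the call leaves the same state behind.
        self.notes[(user_id, document_id)] = text

    def post_comment(self, user_id, document_id, text):
        # Not idempotent: every call adds another entry.
        self.comments.append((user_id, document_id, text))
```

If a client times out and retries `put_note`, nothing bad happens. If it retries `post_comment`, you get a duplicate and now need de-duplication logic somewhere.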

Tests

Make sure everyone is onboard with what kinds of tests you do and what they’re responsible for. For instance:
- Unit tests should be really fast and should not be reliant on anything external (file system, network, etc). Start your project by running these in parallel and then that puts a good restriction as to what you can do. Trying to make them parallelisable a year later is often already a lost cause.
- Integration tests should be testing how your system communicates with another and that’s all. Resist the temptation to start re-testing the downstream service, which complicates your mocks. They should only test “does the system do the correct request and can it parse the responses it might get”. Separate out everything else.
- Functional tests are expensive in terms of time and maintenance and therefore should be kept to a minimum. Try and cover key scenarios and the rest should be tested lower down in unit and integration tests. Do they need to test specific rendering of things? Should your functional test fail if the markup changes from a <strong> to an <em>?
Functional tests don’t have to be done through a browser. Why not make your page have a JSON equivalent view that is easier to test? That will also prove you have decoupled your data from your view (i.e. a View Model).
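Something along these lines, sketched in Python. `SearchResultsViewModel` and the two render functions are made-up names. Both views are produced from the same view model, so a functional test can assert on the JSON and never care whether a title is wrapped in `<strong>` or `<em>`.

```python
import html
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class SearchResultsViewModel:
    query: str
    total: int
    titles: List[str]


def render_html(vm: SearchResultsViewModel) -> str:
    """The view humans see; markup is free to change."""
    items = "".join(f"<li><strong>{html.escape(t)}</strong></li>" for t in vm.titles)
    return f"<h1>{vm.total} results for {html.escape(vm.query)}</h1><ul>{items}</ul>"


def render_json(vm: SearchResultsViewModel) -> str:
    """The same data, in a form that is trivial to assert on."""
    return json.dumps(asdict(vm))
```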
Just use fake servers rather than a generic stub framework. The lack of flexibility with fake servers is usually a good thing. You should not have logic in your fakes.
Run your consumer driven contracts against both your fakes and the real downstream service. That way you can be sure your integration tests reflect reality. Perhaps you should use something awesome like mocking-jay.
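A hand-rolled sketch of the idea using only the Python standard library. This is not how mocking-jay works, and `FakeContentApi`, the `/documents/doc-1` path and the response fields are assumptions for illustration. The fake only serves canned responses, and the contract check takes a base URL so the same function can be pointed at the fake or at the real service.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class FakeContentApi(BaseHTTPRequestHandler):
    """Canned responses only: no logic beyond looking up the path."""

    RESPONSES = {"/documents/doc-1": {"id": "doc-1", "title": "A title"}}

    def do_GET(self):
        body = self.RESPONSES.get(self.path)
        self.send_response(200 if body else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(body or {"error": "not found"}).encode())

    def log_message(self, *args):
        pass  # keep test output quiet


def start_fake():
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), FakeContentApi)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"


def check_document_contract(base_url):
    """What we, the consumer, rely on. Run it against the fake AND the real thing."""
    with urlopen(f"{base_url}/documents/doc-1") as response:
        assert response.status == 200
        document = json.loads(response.read())
    assert isinstance(document["id"], str)
    assert isinstance(document["title"], str)
    return document
```

If the contract passes against the fake but fails against the real downstream service, the fake has drifted and your integration tests are no longer telling you the truth.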
Visualise your test pyramid and try and make it look like the one described by Martin Fowler.
If lots of tests unrelated to your change fail then you have some nasty coupling going on there.