Practical ways to shift QA left

12 August 2023

“Inspection to improve quality is too late, ineffective, costly. Quality comes not from inspection, but from the improvement of the production process.”

― W. Edwards Deming, Out of the Crisis

Many prescribe shifting QA left. It sounds clever, but it is more an aspiration than actionable advice. This post describes concrete things quality assurance engineers can do to help increase quality and reduce waste within their teams.

The QA role in a continuous delivery and deployment world

The State of DevOps Report asserts that profitable, high-performing teams deploy software tens of times per day. The days of QAs signing off work before it is deployed are gone for elite teams.

Even with feature flags to decouple feature release and software deployment, the code is still being deployed, and it will affect the system's quality, even if it hasn't gone through the JIRA theatre of moving into the done column.

Nonetheless, quality is still essential! The report also asserts that high-performing teams have a low defect rate; throwing things into production and hoping for the best is insufficient. High-performing teams don't sink lots of time fixing bugs because the quality of their releases is excellent.

It's common for teams that are only releasing bi-weekly to have automated tests to give confidence that the system does what it's supposed to, which is good. However, as Deming said, this is still inspection after the work is done.

It's an improvement; at least it's automated, but these tests are rarely run in prod and say little about the system's quality, just that it behaves in specific ways in certain conditions in a non-production environment.

We only really appreciate the quality of a system when it is in production and used by real people. Beyond trivial systems, what you'll discover can be very unpredictable and interesting. This is when you really start to understand the quality of your system and what needs improving.

The QA role must move away from inspection of features, which can be largely automated, and instead focus more on understanding what the system is doing in production.

Why is quality important?

Sounds obvious, but often teams (or management) make decisions to trade off quality for speed. This is a false economy. Speed and quality are correlated.

Quality is what helps teams continue to deliver, especially over time.

“The Accelerate authors have data that shows significant correlations with much more important things.

For example, organizations made up of high-performing teams, based on this model, make more money than orgs that don’t. Here is data that says that there is a correlation between a development approach and the commercial outcome for the company that practices it.

It also goes on to dispel a commonly held belief that “you can have either speed or quality but not both.” This is simply not true. Speed and quality are clearly correlated in the data from this research.”

― Dave Farley, Modern Software Engineering

It is too familiar a story for teams to struggle with bugs, support requests and other issues around quality, reducing their capacity to evolve their system further. Many teams will be in this situation, frustrated that they don't have enough time to further iterate on their system to respond to customer demands.

Agile, DevOps and the like all have their roots in lean manufacturing. Lean prescribes a zero-defect culture and reduces waste by increasing quality. By reducing waste, we increase our capacity to do valuable work for the customer. This allowed Toyota and others to outcompete their American rivals (who deferred quality checks to the end of the assembly line), building cars more quickly and cheaply.

The cost and impact of an issue increase the further right in the process it is found, and the longer it stays in production. Shift-left means a team will catch problems sooner, reduce waste, and increase capacity.

Teams should aim to have zero support requests landing on their plate. Too often, teams accept a normalisation of deviance and become a boiling frog, losing their ability to iterate and improve the product they work on.

A quick waste rant

As I mentioned in The Ghost of Henry Ford is ruining your development team, many managers think about lean in fairly short-sighted ways. They will focus on waste only through the lens of:

Only do work that generates "value".

This can take the form of:

Analysing stories very deeply, so no wasted effort is made on features that might not be valuable.
Deprioritising any tasks around technical debt, toil, etc

This does not work. Analysis paralysis is a form of waste in itself, and not letting teams improve the quality of their system also generates huge waste.

Moving on...

Shifting QA left, and improvement of the process

The ultimate in shift-left is QAs influencing the formulation of user stories, where they are armed with a deep knowledge of how the system works, what problems commonly occur, its dependencies and so on.

The QAs are equal partners in prioritisation discussions, raising issues that affect the system's quality, produce waste and should be addressed.

QAs are often too focused on the "outside" of the system (the features), but if they get involved in the innards by looking at logging and metrics, pair-programming and so on, they can better understand how the system works. This will help them on the analytical side of the work, allowing them to make sharper quality recommendations such as "We should write a test for this scenario" and ask questions such as "What should happen if this fails?".

This helps QA distinguish their role from:

business analysts, whose job is to develop a deep understanding of the domain, but not necessarily how the system works on a technical level
product owners who understand the customers, and how the system can help them

...when analysing work right at the start of the value chain.

This will further help improve quality and increase capacity by reducing the chance of waste earlier in the pipeline.

Things to do

Metrics

Metrics give you an overall view of how your system performs in response to requests. When done well, they help you understand how the system works better and can give you warning signs of incoming problems.

This is the one gap I've noticed with many QA teams I've worked with; they are very focused on how a feature behaves but not how it performs. Slow performance is annoying for users and tends to result in more errors because of timeouts and resource contention. Beyond performance, error rates when calling APIs help you understand areas of your system where you need to add more resiliency.
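As a rough illustration of the two numbers mentioned above, here is a minimal sketch that turns raw request records into a p95 latency and a server-error rate per endpoint. The `Request` shape, the field names and the choice of the nearest-rank percentile are all my assumptions; in practice your metrics platform will almost certainly compute these for you.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Request:
    endpoint: str       # hypothetical route name
    duration_ms: float  # time taken to serve the request
    status: int         # HTTP status code


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarise(requests: list[Request]) -> dict[str, dict[str, float]]:
    """Return p95 latency and server-error (5xx) rate for each endpoint."""
    by_endpoint: dict[str, list[Request]] = defaultdict(list)
    for request in requests:
        by_endpoint[request.endpoint].append(request)

    summary = {}
    for endpoint, items in by_endpoint.items():
        errors = sum(1 for r in items if r.status >= 500)
        summary[endpoint] = {
            "p95_ms": percentile([r.duration_ms for r in items], 95),
            "error_rate": errors / len(items),
        }
    return summary
```

Tracking these per endpoint, rather than as one global average, is what tends to reveal the specific areas that need more resiliency.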

Logs / Tracing

Metrics give you an overview of the system, whereas logs & tracing show specific events. Analysing them can help you identify concrete areas where the system hasn't behaved as expected, common failure patterns, etc.

The QA can then work with the developers to prioritise work to fix systemic issues that could otherwise get ignored and end up pushing up support costs.
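One way to spot common failure patterns is to collapse the variable parts of each error message so that similar errors group together. This is a minimal sketch, assuming your logs can be exported as plain message strings; the regular expression and the `<n>` placeholder are my own choices, not a standard.

```python
import re
from collections import Counter

# Replace variable parts (long hex ids, numbers) so similar messages share one signature.
_VARIABLE_PARTS = re.compile(r"\b[0-9a-f]{8,}\b|\d+")


def signature(message: str) -> str:
    """Normalise a log message into a pattern, e.g. 'Order 42 not found' -> 'Order <n> not found'."""
    return _VARIABLE_PARTS.sub("<n>", message)


def top_error_patterns(messages: list[str], limit: int = 5) -> list[tuple[str, int]]:
    """Most frequent error patterns, most common first."""
    return Counter(signature(m) for m in messages).most_common(limit)
```

A ranked list like this gives the team something concrete to prioritise, rather than a vague sense that "the logs are noisy".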

The team should pursue a “zero-defect culture”, where the logs are silent unless there is an actionable error. Too often, systems will log errors that are common failures that cannot be acted on, and they end up creating noise, masking actual quality issues.

When you release code and it causes more errors, you won't notice if you already have hundreds or thousands logged daily, but your users might.
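The arithmetic behind this is worth spelling out. The sketch below asks whether a batch of new errors would stand out against the existing daily count; the 10% threshold is an arbitrary assumption for illustration, not a recommendation.

```python
def new_errors_visible(baseline_per_day: int, added_per_day: int, threshold: float = 0.10) -> bool:
    """True if the added errors raise the daily count by more than `threshold` (relative)."""
    if baseline_per_day == 0:
        # With silent logs, any new error stands out.
        return added_per_day > 0
    return added_per_day / baseline_per_day > threshold
```

With 2,000 errors already logged per day, a release that adds 50 is a 2.5% change and easily lost in the noise. With silent logs, the same 50 errors are impossible to miss.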

Tests, pipelines

Flaky tests are a clear quality issue. Engineers will often lazily re-run pipelines, but that is ignoring the systemic issues. Flaky tests can be signals that your system isn't as resilient as you think. QAs can help diagnose these problems to support the developers in fixing them.
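To make flakiness visible rather than anecdotal, you can mine the pipeline history. This is a minimal sketch, assuming you can export one record per test execution; the `RunResult` shape is hypothetical, and "passed and failed on the same commit" is just one workable definition of flaky.

```python
from collections import defaultdict
from typing import NamedTuple


class RunResult(NamedTuple):
    commit: str   # revision the pipeline ran against
    test: str     # test name
    passed: bool


def flaky_tests(runs: list[RunResult]) -> dict[str, float]:
    """Map each flaky test to the share of its runs that failed.

    A test counts as flaky when the same commit produced both a pass and a fail.
    """
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [failures, runs]
    for run in runs:
        outcomes[(run.test, run.commit)].add(run.passed)
        totals[run.test][0] += 0 if run.passed else 1
        totals[run.test][1] += 1

    flaky = {test for (test, _), seen in outcomes.items() if seen == {True, False}}
    return {test: totals[test][0] / totals[test][1] for test in sorted(flaky)}
```

A test that fails consistently on a commit is a real failure and is deliberately excluded; only the inconsistent ones are reported.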

QAs should be very aware of the testing strategy of the system and offer statistics and guidance when the suite is not fit for purpose. Maybe the 20-minute functional test suite had some value a few years ago, but now that you know the system is stable, some of those tests could be removed in favour of unit tests.

The speed and reliability of the pipeline are critical; slow and unreliable pipelines are prevalent forms of waste. They also have a heavy influence on your lead time.

QAs and developers should feel ownership of making the pipeline as high quality as possible. It is akin to the factory worker in a lean setting, keeping the production line as efficient as possible to reduce waste when manufacturing.
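The kind of statistic a QA could bring to that conversation might look like this: slow tests that have never failed over a long history. It is a sketch only; the `SuiteEntry` shape and both thresholds are assumptions, and a test that never fails is a prompt for discussion, not proof that it is worthless.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteEntry:
    name: str
    mean_seconds: float  # average duration of the test
    runs: int            # how many times it has run in the period
    failures: int        # how many of those runs failed


def review_candidates(
    stats: list[SuiteEntry], min_runs: int = 200, min_seconds: float = 30.0
) -> list[SuiteEntry]:
    """Slow tests that have never failed over many runs, slowest first."""
    picked = [
        s for s in stats
        if s.runs >= min_runs and s.failures == 0 and s.mean_seconds >= min_seconds
    ]
    return sorted(picked, key=lambda s: s.mean_seconds, reverse=True)
```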

Automating other quality checks

Some tests are written as part of the development process: feature-level tests, which again tend to show only that the feature works in given conditions, and lower-level tests such as unit tests. However, there are numerous other kinds of tests that QAs can introduce to the system to help ensure quality.

With their expert knowledge of the system's constraints, how it works, and what it needs to do, they can tighten the feedback loop on the system's quality.

Other tests/checks QAs can introduce:

Performance (both backend and frontend)
Accessibility
Real user monitoring (RUM)
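As a taste of what an automated accessibility check can look like, here is a deliberately tiny sketch that flags images with no alt attribute, using only Python's standard library. Real accessibility tooling covers far more than this; the function name and the single rule are my own choices for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class MissingAltChecker(HTMLParser):
    """Collects the src of every <img> that has no alt attribute at all."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        attributes = dict(attrs)
        # An empty alt="" is valid for decorative images, so only a missing attribute is flagged.
        if tag == "img" and "alt" not in attributes:
            self.missing.append(attributes.get("src") or "<unknown>")


def images_missing_alt(html: str) -> list[str]:
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.missing
```

Run against rendered pages in the pipeline, a check like this fails fast instead of waiting for a manual audit.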

Trends with bugs, support

QAs should be involved with every support ticket and work with the rest of the team to determine what changes need to be made so that the same kind of request never arrives on the development team's radar again.

Visibility on common support issues, bugs and so on will help you identify the quality hot spots in your system, to help you prioritise proper fixes to reduce the waste surrounding these activities.
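A simple Pareto-style count is often enough to find the hot spots. This sketch assumes each ticket has already been tagged with a category; the category names and the 80% coverage default are hypothetical.

```python
from collections import Counter


def hot_spots(ticket_categories: list[str], coverage: float = 0.8) -> list[tuple[str, int]]:
    """Smallest set of most frequent categories that covers `coverage` of all tickets."""
    counts = Counter(ticket_categories).most_common()
    needed = coverage * len(ticket_categories)
    selected: list[tuple[str, int]] = []
    covered = 0
    for category, count in counts:
        if covered >= needed:
            break
        selected.append((category, count))
        covered += count
    return selected
```

If two categories account for most of the tickets, that is where a proper fix will reduce the most waste.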

Support tickets are a result of a quality failure. Of course, it has to be expected mistakes will be made; no system is perfect, and we'll never have perfect knowledge, but that shouldn't be used as an excuse not to strive for perfection.

Support issues often form before a line of code is written:

Insufficient
- domain knowledge
- customer knowledge
Not understanding the "real" problem to solve
Not thinking enough about non-happy path scenarios.

If everyone is on the ball and shifting left, though, these can be entirely avoided. Issues can also arise while the software is being written and released:

Insufficient test automation around unexpected areas
Incorrect test strategy (e.g. loads of unit tests, no integration tests - meme)
Bad modelling allowing invalid domain states
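To illustrate the last point, consider an order modelled with a `shipped` boolean and an optional tracking number: nothing stops the code creating a shipped order with no tracking number. The sketch below uses a hypothetical order domain (all names are mine) to show one way of modelling each state as its own type so the invalid combination cannot be constructed.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PendingOrder:
    order_id: str


@dataclass(frozen=True)
class ShippedOrder:
    order_id: str
    tracking_number: str  # required field, so "shipped without tracking" cannot be built

    def __post_init__(self) -> None:
        if not self.tracking_number.strip():
            raise ValueError("a shipped order needs a tracking number")


Order = Union[PendingOrder, ShippedOrder]


def ship(order: PendingOrder, tracking_number: str) -> ShippedOrder:
    """Transition from pending to shipped; only a pending order can be shipped."""
    return ShippedOrder(order.order_id, tracking_number)


def describe(order: Order) -> str:
    if isinstance(order, ShippedOrder):
        return f"{order.order_id} shipped, tracking {order.tracking_number}"
    return f"{order.order_id} awaiting shipment"
```

A whole class of "how did the data get into this state?" support tickets disappears when the model itself rules the state out.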

Conclusion

This can be a touchy subject for some QAs as it can be perceived as making their role redundant. But I have seen first-hand how good QAs, who understand their role as "helping to ensure quality" rather than "manual tester", can embrace this change and make their role not only more useful to the team, but also more fulfilling for themselves.

The post's title frames this as a QA issue, but quality is a team-wide issue in practice. Teams that write rubbish, ill-thought-out software and throw it over the wall to a QA department are doomed to create bad products with lots of waste.

If you are a team concerned with its capacity to do meaningful work, you need to be concerned with quality.

Poor quality introduces waste, which is a drag on your capacity. Productivity killers include bugs, support requests, noisy error logs, and slow and flaky pipelines. The further "right" these issues are, the more significant their impact and the costlier they are to fix.

Whilst I have written some specific things you can do, to truly be effective at this, you need to wear a lean hat, practise Kaizen and think about quality across the whole value chain of your work, which will always be context-specific.

If something feels harder to do than it should, or you are frustrated at how you feel like you are working hard but not having the impact you'd hoped for, listen to that signal, and look for waste.