How to run experiments properly

Innovation comes from trial and error. Scientific breakthroughs are often the result of countless experiments, and successful inventions are born out of repeated attempts – just like Edison's light bulb.

The same principle applies to digital products. Luckily, in software we have the true luxury of being able to run experiments in production and learn from them without a huge cost overhead. Compare this with the construction business, where a building can be built only once, then tweaked just a little, and the learnings can be applied only to the next project.

Yet in startups and small companies, experiments are surprisingly uncommon – while in some large companies, the whole established product can be an automated A/B-test machine. Let me share some points to keep in mind while running experiments to ensure a successful outcome. Even if you are not running experiments now, I hope you will see why they can be useful and what to pay attention to.

1. Have a clear hypothesis

This is the most important tip. Don’t even start unless you have a clear understanding of what the proposed change should bring, and ideally an action plan for after the experiment concludes.

Good example:

If we reduce the number of steps in the onboarding flow, more people will finish it and start using our service. If this turns out to be true, we will roll out the shorter flow.

Bad example:

If we redirect some of our app users to our website, more people will use our service.

Why bad: it’s not clear whether the extra usage comes from the redirect itself – maybe it’s users who have to continue a critical task on the web anyway. There is also no clear action plan: should we wind down the web version, prioritize the app, redirect only on certain pages, etc.?
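To make this concrete, here is a minimal sketch of how a hypothesis could be written down before an experiment starts. The field names and values are purely illustrative, not a prescribed format:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """What we change, what we expect to happen, and what we do in either case."""
    change: str               # the proposed change
    expected_effect: str      # the metric we expect to move, and in which direction
    action_if_confirmed: str  # what we do if the hypothesis holds
    action_if_refuted: str    # what we do if it does not

# Illustrative values based on the good example above.
onboarding = Hypothesis(
    change="Reduce the number of steps in the onboarding flow",
    expected_effect="Onboarding completion rate goes up",
    action_if_confirmed="Roll out the shorter flow to everyone",
    action_if_refuted="Keep the current flow and investigate where users drop off",
)
```

Writing this down before the experiment starts forces the team to agree on the action plan up front, rather than debating it after the results arrive.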

2. Adjust for the stage of the product

If your product is fairly established, you can run smaller experiments, each designed to move a specific metric. Then you iterate and gradually improve the whole thing. There is almost certainly no point in making drastic changes that throw individual metrics into chaos.

If your product is still looking for product-market fit, then basically every change can be big enough to steer the whole thing in a new direction, especially if there are not many users yet. That means many decisions should be driven by your product vision and intuition, although you should still make informed decisions and measure the outcomes. And again, see the first point above – always have a clear hypothesis and an action plan.

3. Running a controlled experiment is better than not running one at all

Often some people on the team are skeptical about a proposed change. For example, a new payment method can bring a lot of new purchases, but perhaps not enough to justify the flat fee you’d need to pay for the integration. Running an experiment on a small share of users and proving the idea in a real-life scenario is much more valuable than making projections, especially if the internal conversation seems stuck and there is no clear path forward. If the implementation cost is not massive, the best move here is probably to get buy-in from leadership and be clear about the hypothesis and the action plan for each outcome.
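As an illustration of the payment-method example, here is a rough break-even sketch. All the numbers are made up – the point is that a small experiment replaces the projected lift with a measured one:

```python
# Rough break-even check for a new payment method (all numbers are made up).
monthly_flat_fee = 2000.0     # what the provider charges for the integration
margin_per_purchase = 5.0     # profit contributed by one extra purchase

# Projection: how many extra purchases per month would justify the fee?
break_even = monthly_flat_fee / margin_per_purchase
print(f"Need at least {break_even:.0f} extra purchases per month")

# The experiment replaces the guess: expose the method to, say, 5% of users,
# measure the extra purchases in that slice, then extrapolate.
extra_in_test = 30            # measured in the 5% test group
test_share = 0.05
projected_total = extra_in_test / test_share
print("Worth it" if projected_total >= break_even else "Not worth it yet")
```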

4. Have the right setup and tools

Always make sure that:

  • Test and control groups get a consistent experience – i.e. a user assigned to the test group always sees the test experience for the lifetime of the experiment (see the sketch after this list).

  • Results are statistically significant. You can use online calculators to verify that you are seeing a real effect, not just random noise (the sketch below also includes a simple significance check). Keep in mind that the bigger the metric move you expect, the fewer participants you need to detect it – and vice versa.

  • Metrics are correctly calculated – meaning that you can reliably measure the outcome for the test vs control groups.
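Here is a minimal sketch of the first two points, assuming users have stable IDs. Deterministic hashing keeps each user in the same group for the lifetime of the experiment, and a hand-rolled two-proportion z-test (written out to avoid external dependencies) gives a basic significance check – a real setup would more likely use a feature-flag platform and a proper stats library:

```python
import hashlib
import math

def assign_group(user_id: str, experiment: str, test_share: float = 0.5) -> str:
    """Deterministic bucketing: the same user always lands in the same group."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 16**8  # map the hash to [0, 1)
    return "test" if bucket < test_share else "control"

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(assign_group("user-42", "short-onboarding"))   # stable across calls
# Made-up results: 520/4000 conversions in test vs 450/4000 in control.
print(two_proportion_p_value(520, 4000, 450, 4000))  # ~0.017, below 0.05
```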

5. Be aware of other experiments that can affect your results

Some experiments can be affected by external events – seasonality, an operating system update, etc. Others can produce different outcomes because of other tests running simultaneously. Try to avoid this by making the experiments either smaller or isolated from each other.

Example:

You introduce a new login screen and also add a Sign In with Google button.

Probably the best way here is to split users into four independent groups (two-by-two: old/new login screen and with/without the Google button) and analyze accordingly.
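A minimal sketch of such a two-by-two split, hashing each factor with its own salt so the two assignments are effectively independent (the names are illustrative):

```python
import hashlib

def flag_on(user_id: str, factor: str, share: float = 0.5) -> bool:
    """Stable, per-factor coin flip: the same user always gets the same answer."""
    digest = hashlib.sha256(f"{factor}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 16**8 < share

def assign_cell(user_id: str) -> tuple[str, str]:
    """Place a user into one of the four independent cells."""
    screen = "new-screen" if flag_on(user_id, "login-screen") else "old-screen"
    button = "with-google" if flag_on(user_id, "google-button") else "without-google"
    return screen, button

print(assign_cell("user-42"))  # e.g. ('new-screen', 'with-google')
```

Because each factor uses a different salt, each of the four cells gets roughly a quarter of the users, and you can measure the effect of each change – and their interaction – separately.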

6. Even negative results are good learning

Sometimes people are afraid that their experiments will lead to worse results. I’d say this is still positive learning: if the experiment was run in a controlled environment with a clear hypothesis, you didn’t affect real users much, and most importantly you now know that one of your ideas won’t work out, so you can safely shelve it until better times. Treat that as the lucky scenario compared to shipping the same change because someone strongly believed in it and never ran an experiment.


If all these points are taken care of, your experiments should hopefully provide useful learnings that make your product better. Please share any interesting experiments you’ve run and what eye-opening insights they’ve brought.