A bad experiment program is worse than none. None leaves you uncertain. A bad one makes you confident and wrong, and you ship the wrong thing fast.
The first source of lies is calling a test before it's done. You watch the numbers, they look good on day two, you ship. But early results swing wildly on small samples, and the temptation to stop on a good swing is enormous. Set the sample size and duration before you start, and don't make the call until you hit both.
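To make "set the sample size and duration before you start" concrete, here is a minimal sketch using the standard two-proportion approximation. The function names, the 5% significance level, the 80% power, and the example traffic figures are all assumptions for illustration, not numbers from any real test.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per arm for a two-sided test on a conversion rate.

    baseline: current conversion rate (e.g. 0.05 for 5%)
    relative_lift: smallest relative improvement worth detecting (e.g. 0.10 for +10%)
    alpha, power: assumed defaults; pick your own before the test starts
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # critical value for the desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def days_to_run(per_arm, daily_visitors):
    """Days needed to fill both arms, assuming traffic is split evenly between them."""
    return ceil(2 * per_arm / daily_visitors)


# Hypothetical inputs: 5% baseline, +10% relative lift, 5,000 visitors a day.
n = sample_size_per_arm(baseline=0.05, relative_lift=0.10)
print(n, "visitors per arm,", days_to_run(n, daily_visitors=5000), "days")
```

With these assumed inputs the answer is roughly 31,000 visitors per arm, which is nowhere near what you have on day two. Write the numbers down before launch so the stopping point is not up for negotiation later.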
The second is measuring the wrong thing. A test can move a surface metric while the metric that pays the bills sits flat or drops. Decide upfront which number actually decides the call, and hold to it even when a friendlier number looks better.
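One way to hold yourself to the deciding number is to write the rule down before launch and let code apply it. This is only a sketch: the `Plan` class, the `decide` function, and the metric names are hypothetical, and the significance flags are assumed to come from whatever analysis you already run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    primary_metric: str  # the one number that decides the call
    min_lift: float      # smallest relative lift worth shipping


def decide(plan: Plan, results: dict[str, tuple[float, bool]]) -> str:
    """results maps metric name -> (observed relative lift, is_significant).

    Only the pre-declared primary metric is consulted; every other metric
    in results is ignored for the decision, however good it looks.
    """
    lift, significant = results[plan.primary_metric]
    if significant and lift >= plan.min_lift:
        return "ship"
    return "do not ship"


# Hypothetical test: clicks jump, revenue does not move.
plan = Plan(primary_metric="revenue_per_visitor", min_lift=0.02)
results = {
    "click_through_rate": (0.12, True),
    "revenue_per_visitor": (0.001, False),
}
print(decide(plan, results))  # prints "do not ship"
```

The plan is frozen on purpose: once the test is running, the friendlier number has no way to become the deciding one.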
The third is ignoring the tests that fail. A program that only remembers its wins learns nothing. The failures are where the real information is, because each one tells you that something you believed was wrong.
At Atlassian I institutionalized the statistical discipline behind this so the conclusions held up. Boring rigor beats clever analysis. The goal isn't to win every test. It's to actually know which ones you won.