*PNAS*^{1} that indicates your data are not so convincing.

Johnson began having second thoughts about the P-value while he was working as a statistician for clinical trials a few years ago. "Just looking at how hypothesis testing was being done, and how many drugs were passing through the 0.05 filter, it became apparent to me that there was a problem," Johnson recalled.

"I don't think the 0.05 value was ever rigorously tested or derived," Johnson said. He said the value arose in the 1920s or 1930s, when Ronald Fisher, a statistics pioneer, was doing some classical testing and arbitrarily decided he would regard a finding with a P-value of 0.05 or less as significant. Fisher then wrote *Statistical Methods for Research Workers*, in which he proposed the 0.05 value. Johnson said, "Biologists have since made the mistake of interpreting the P-value as the probability that the null hypothesis is true."

If the P-value were based on another kind of hypothesis test, known as the Bayesian hypothesis test, then the number would represent the probability that the null hypothesis is true, Johnson said, but the P-value scientists generally use is based on the classical hypothesis test. So Johnson modified the Bayesian procedures so that their rejection regions match those of the classical test, which allowed him to compare the two directly.

Johnson's analysis shows that with a P-value of 0.05 there's a 17-25% chance that the null hypothesis is true. He says that probability is too high to justify rejecting a null hypothesis. Johnson recommends that a P-value of 0.005 be used for significant results, which implies a 2-4% chance that the null hypothesis is true. For findings to be declared highly significant, a P-value of 0.001 would imply less than a 1% chance that the null hypothesis is true.
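
To make the link between a P-value threshold and the chance that the null hypothesis is true more concrete, here is a minimal Python sketch. It rests on assumptions that are not stated in this article: a one-sided z-test, equal prior odds for the null and alternative hypotheses, and a Bayes factor of exp(z²/2) at the rejection boundary, which is the form usually quoted for this kind of matched Bayesian test. The function name `null_posterior_from_p` and its parameters are hypothetical, and the sketch illustrates the idea rather than reproducing Johnson's full analysis.

```python
import math
from statistics import NormalDist


def null_posterior_from_p(alpha: float, prior_null: float = 0.5) -> float:
    """Approximate P(null is true) for a result sitting exactly at threshold alpha.

    Assumptions (not from the article): one-sided z-test, and a Bayes factor
    in favor of the alternative equal to exp(z^2 / 2) at the boundary.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if not 0.0 < prior_null < 1.0:
        raise ValueError("prior_null must be strictly between 0 and 1")

    # Critical z value whose upper-tail probability equals alpha.
    z = NormalDist().inv_cdf(1.0 - alpha)

    # Assumed Bayes factor (alternative versus null) at that critical value.
    bayes_factor = math.exp(z * z / 2.0)

    # Combine with the prior odds, then convert odds back to a probability.
    prior_odds_alt = (1.0 - prior_null) / prior_null
    posterior_odds_alt = bayes_factor * prior_odds_alt
    return 1.0 / (1.0 + posterior_odds_alt)


if __name__ == "__main__":
    for alpha in (0.05, 0.005, 0.001):
        print(f"P-value {alpha}: P(null) ~ {null_posterior_from_p(alpha):.3f}")
```

Under these assumptions the three thresholds give probabilities that fall inside the ranges quoted above: between 17% and 25% for 0.05, between 2% and 4% for 0.005, and below 1% for 0.001. Changing `prior_null` shows how strongly the answer depends on the prior odds.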

Johnson says the P-value problem is a primary reason for retractions and irreproducibility. "I think the situation is pretty serious and it's gone undetected so long because most of the experiments conducted in the biological sciences are never replicated," he said. "Journal editors should require a 0.005 P-value for publication and consumers of these statistics should realize that if a finding is based on a P-value of 0.05 that it is probably a false finding."

Johnson's research is supported by National Cancer Institute Award R01 CA158113.