In statistical hypothesis testing, we use a p-value (probability value) to decide whether the sample provides strong evidence against the null hypothesis. The p-value is a numerical measure of the statistical significance of a hypothesis test. It tells us how likely it is that we would obtain data at least as extreme as our sample (e.g., 10 heads) if the null hypothesis were true (e.g., a fair coin). By convention, if the p-value is less than 5% (p < 0.05), we conclude that the null hypothesis can be rejected (e.g., the coin is not fair). In other words, when p < 0.05 we say that the results are statistically significant, meaning the data provide strong evidence against the null hypothesis.
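
To make the coin example concrete, the following is a minimal sketch of an exact two-sided binomial test using only the Python standard library. The text does not say how many flips produced the 10 heads, so the figure of 10 flips is an assumption made purely for illustration, and the function name `binomial_two_sided_p` is hypothetical. The two-sided rule used here (summing every outcome that is no more probable than the observed one) is one common convention, not the only one.

```python
from math import comb


def binomial_two_sided_p(heads: int, flips: int, p_null: float = 0.5) -> float:
    """Exact two-sided p-value for `heads` successes in `flips` trials.

    Sums the probability, under the null hypothesis, of every outcome
    that is at most as probable as the observed one.
    """
    # Probability of each possible number of heads under the null hypothesis.
    probs = [
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(flips + 1)
    ]
    observed = probs[heads]
    # Small relative tolerance so floating-point ties are counted as equal.
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Assumed scenario: 10 heads in 10 flips of a supposedly fair coin.
p_value = binomial_two_sided_p(heads=10, flips=10)
print(f"p = {p_value:.6f}, significant at 0.05: {p_value < 0.05}")
```

Under that assumption only the two most extreme outcomes (0 heads and 10 heads) count, so the p-value is 2/1024, well below the 0.05 threshold. An even split such as 5 heads in 10 flips gives a p-value of 1, which is no evidence against a fair coin.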

Exploratory data analysis should be interpreted carefully. When testing multiple models at once, there is a high chance of finding at least one of them to be significant, but this can be due to a type I error. It is important to always adjust the significance level when testing multiple models, for example with a Bonferroni correction. Also, one should not follow up an exploratory analysis with a confirmatory analysis in the same dataset. An exploratory analysis is used to find ideas for a theory, but not to test that theory as well. When a model is found through exploration of a dataset, following up that analysis with a confirmatory analysis in the same dataset could simply mean that the results of the confirmatory analysis are due to the same type I error that produced the exploratory model in the first place. The confirmatory analysis will therefore not be more informative than the original exploratory analysis. [37]
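
The multiple-testing problem described above can be illustrated with a short sketch. It assumes the tests are independent, which is a simplification and will not hold for many real sets of models. The function names and the p-values are hypothetical and invented for illustration only. The first function gives the chance of at least one false positive across `m` tests; the second applies the Bonferroni correction by comparing each p-value against `alpha / m`.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one type I error across m independent tests,
    each run at significance level alpha, when every null is true."""
    return 1 - (1 - alpha) ** m


def bonferroni_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Bonferroni correction: reject a null hypothesis only if its
    p-value falls below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# With 20 independent tests at alpha = 0.05, a false positive is likely.
print(f"FWER for 20 tests: {family_wise_error_rate(0.05, 20):.3f}")

# Hypothetical p-values from four exploratory models (illustrative only).
p_values = [0.001, 0.02, 0.04, 0.3]
print(bonferroni_reject(p_values))
```

With four tests the corrected threshold is 0.05 / 4 = 0.0125, so only the first hypothetical p-value stays significant, even though three of the four fall below the uncorrected 0.05 level. Bonferroni is a conservative choice; it controls the family-wise error rate without requiring independence, at the cost of reduced power.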