Lab Team: Audrey Stuart, Liza Tugangui, Blake Slattengren
Background
In this lab, we used inferential statistics to look for patterns among international data sets from the UNEP Environmental Data Explorer. Inferential statistics uses sample data to test whether anything can be concluded about the larger populations the samples were drawn from. Our group wanted to test whether there were any correlations among three variables: infant mortality rate, urbanization (the percentage of citizens living in urban areas), and total area of forest cover within a country. From all the countries that provided data, we drew a random sample of 100. From this sample, we hoped to find correlations and draw conclusions about the countries of the world as a whole.
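The sampling step can be sketched in a few lines of Python. This is only an illustration: the list of country names, its length of 180, and the seed are placeholders we made up, not the actual list of countries that reported data to UNEP.

```python
import random

def draw_sample(countries, k=100, seed=0):
    """Return k countries chosen uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return rng.sample(countries, k)

# Placeholder names standing in for the countries that provided data.
all_countries = [f"Country {i}" for i in range(1, 181)]
sample = draw_sample(all_countries)
```

Sampling without replacement means no country appears twice, so each of the 100 observations comes from a different country.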
Procedure
Because the UNEP Environmental Data Explorer provides the data, we did not have to gather our own for this lab. We first explored the available data sets and picked variables that we thought would be correlated and would link to bigger ideas. Starting with the rate of infant mortality, we used an independent-samples t-test to compare the means of the top and bottom quintiles of our sample of countries. We then tested whether infant mortality rates were correlated with the percentage of citizens living in urban areas. We repeated this procedure for every pair of variables, both comparing means between quintiles of countries and testing for correlations between the data sets.
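The quintile comparison can be sketched as follows. Several things here are assumptions rather than a record of what we ran: the countries are ranked by one variable (urbanization) and the means of a second (infant mortality) are compared, the unequal-variance (Welch) form of the independent-samples t-test is used, and the data are synthetic values generated only so the example runs.

```python
import math
import random
import statistics

def quintile_groups(records, key_index):
    """Sort records by one field and return the bottom and top fifths."""
    ordered = sorted(records, key=lambda rec: rec[key_index])
    size = len(ordered) // 5
    return ordered[:size], ordered[-size:]

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    var_a = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    var_b = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df

# Synthetic (urban percent, infant mortality) pairs, NOT the UNEP data.
rng = random.Random(1)
records = [(10 + 0.8 * i, 100 - 0.9 * i + rng.uniform(-10, 10)) for i in range(100)]

bottom, top = quintile_groups(records, key_index=0)
mort_bottom = [mortality for _, mortality in bottom]
mort_top = [mortality for _, mortality in top]
t_stat, df = welch_t(mort_top, mort_bottom)
```

With 100 countries, each quintile holds 20. A negative t statistic here would mean the most urbanized quintile has the lower mean infant mortality. The p-value then comes from comparing the statistic to a t distribution with the computed degrees of freedom.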
Results
Among the three variables tested, we found no significant correlation between urbanization and forest cover or between infant mortality and forest cover. We conclude this because the p-values for both tests were greater than the established α of 0.05, so we fail to reject the null hypothesis that the two variables are uncorrelated. The one strong correlation we did find was a negative one between urbanization and infant mortality. The r-value was -0.628, so r² ≈ 0.39: about 39% of the variation in infant mortality can be explained by a linear relationship with the percent of the population living in urban areas.
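To show how an r-value relates to explained variation and to a significance test, here is a minimal sketch using only the standard library. The helper names are our own, and the sample size of 100 passed to the test statistic is an assumption: it holds only if all 100 sampled countries had data for both variables.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

r_reported = -0.628             # urbanization vs. infant mortality
r_squared = r_reported ** 2     # share of variation explained by a linear fit
t_stat = r_to_t(r_reported, 100)  # n = 100 is assumed, see above
```

For 98 degrees of freedom, the two-tailed critical value at α = 0.05 is roughly 1.98, so a t statistic whose absolute value is well above that corresponds to a p-value below 0.05.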
Discussion
The results of the correlation tests make sense for the variables tested. The negative correlation between urbanization and infant mortality matches our hypothesis that more urbanized countries would have better access to health care, and thus lower infant mortality rates. Total area of forest cover, by contrast, automatically favors larger countries and is a very different type of variable from the other two, which are a rate and a percentage. This mismatch between an absolute quantity and two relative measures may be the reason we did not see a correlation. We therefore fail to reject the null hypotheses that infant mortality and urbanization are unrelated to forest cover.
Other questions for further study remain. For example, is there a relationship between urbanization and the percentage of a country covered by forest? This would be interesting to test, as it seems more likely to show a relationship than our test of total forest area. Similarly, percentage of forest cover vs. infant mortality could yield a stronger result. Additionally, for the correlation we did discover, it would be useful to test other variables, such as availability and quality of health care, in relation to infant mortality to gain a better understanding of what could cause this observed correlation.
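The percentage-based follow-up tests proposed above would start by converting total forest area into a share of each country's land area. A minimal sketch, assuming both areas are given in the same unit; the function name and the two example countries are hypothetical, not UNEP figures:

```python
def forest_cover_percent(forest_area_km2, land_area_km2):
    """Express forest area as a percentage of a country's land area."""
    if land_area_km2 <= 0:
        raise ValueError("land area must be positive")
    return 100.0 * forest_area_km2 / land_area_km2

# Hypothetical figures chosen only to illustrate the size effect.
large_country = forest_cover_percent(500_000, 5_000_000)
small_country = forest_cover_percent(30_000, 50_000)
```

In this made-up example the large country has far more forest in total but a much smaller share of its land forested, which is the size effect that total area hides.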