For those following my blog on my Data Analysis and Interpretation Specialization by Wesleyan University through Coursera, this is the final course and the Capstone project. Unlike previous courses, I will move away from urbanization data and try to tackle one of the problems provided by the course’s industry partner.
This is my introduction.
Below is our third assignment – the preliminary results.
Only the results for Burundi, Ethiopia, and Liberia will be reported, as the other countries demonstrated no change or only very slight change in the Ensure Environmental Sustainability Index.
The following table shows the descriptive statistics for the Ensure Environmental Sustainability Index for each of the selected countries, starting from the lowest GDP per capita group to the highest.
The standard deviations are much greater for the lowest GDP per capita group than for the others. In three countries (Seychelles, Canada, and Ireland), no change in the value of the index was observed. It would appear that countries that reach a certain GDP per capita have achieved a mean Ensure Environmental Sustainability Index value above 0.9 and demonstrate little change afterwards.
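As a rough illustration of how descriptive statistics like these can be computed, here is a minimal Python sketch. The country labels and index values are made up for the example and are not the capstone data; the `describe` helper is my own hypothetical name.

```python
from statistics import mean, stdev

# Hypothetical index values per country (NOT the capstone data).
index_by_country = {
    "low_gdp_country": [0.42, 0.47, 0.55, 0.61, 0.70],
    "high_gdp_country": [0.95, 0.95, 0.95, 0.95, 0.95],
}


def describe(values):
    """Return basic descriptive statistics for one country's index values."""
    return {
        "n": len(values),
        "mean": mean(values),
        "std": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "changed": min(values) != max(values),  # False if the index never moved
    }


summary = {country: describe(values) for country, values in index_by_country.items()}

for country, stats in summary.items():
    print(
        f"{country}: n={stats['n']}, mean={stats['mean']:.3f}, "
        f"std={stats['std']:.3f}, min={stats['min']:.2f}, "
        f"max={stats['max']:.2f}, changed={stats['changed']}"
    )
```

A country whose index never changes ends up with a standard deviation of zero, which is why such countries give a regression nothing to explain.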
The following graphs show the Ensure Environmental Sustainability Index for Burundi, Ethiopia, and Liberia:
Bivariate and Lasso Regression Analysis:
Lasso regression was performed on each country’s Ensure Environmental Sustainability Index and its predictors. Because the index values for Seychelles, Canada, and Ireland did not change, no correlations were observed for those countries.
Each country demonstrated a different set of predictors that correlated with the Ensure Environmental Sustainability Index. However, in the low GDP per capita group, all three countries showed very strong correlations between fertility rate and the index (as demonstrated by the following graphs). In each of these countries, the fertility rate predictor had a coefficient on twice the order of magnitude of those of the other predictors.
The following table shows the correlation coefficients for the fertility rate predictor along with the mean squared errors for both the training and test data sets.
In all three countries, as the fertility rate fell, the Ensure Environmental Sustainability Index value rose. This main predictor accounts for more than 90% of the variance observed in the index. However, the mean squared errors differed between the training and test data sets, which suggests that the predictive accuracy of the model decreased when it was applied to the test data set.
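To make the workflow concrete, here is a minimal sketch of a lasso fit with a training/test comparison, using scikit-learn. Everything in it is an assumption for illustration: the data are synthetic, `other_a` and `other_b` are hypothetical stand-ins for the remaining predictors, and the fixed penalty `alpha=0.005` and the 70/30 split are arbitrary choices rather than the settings used for the results above.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the capstone data): 40 yearly observations.
rng = np.random.default_rng(0)
n_years = 40
fertility_rate = np.linspace(7.0, 4.5, n_years) + rng.normal(0, 0.1, n_years)
other_a = rng.normal(0, 1, n_years)  # hypothetical unrelated predictor
other_b = rng.normal(0, 1, n_years)  # hypothetical unrelated predictor
# By construction, the index rises as the fertility rate falls.
index = 1.2 - 0.1 * fertility_rate + rng.normal(0, 0.01, n_years)

predictor_names = ["fertility_rate", "other_a", "other_b"]
X = np.column_stack([fertility_rate, other_a, other_b])

# Hold out 30% of the observations as a test set (assumed split).
X_train, X_test, y_train, y_test = train_test_split(
    X, index, test_size=0.3, random_state=0
)

# Standardize predictors using training-set statistics only, so that the
# lasso penalty treats every predictor on the same scale.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Fit the lasso; alpha is an assumed penalty strength, not a tuned value.
model = Lasso(alpha=0.005).fit(X_train_std, y_train)

coefficients = dict(zip(predictor_names, model.coef_))
train_mse = mean_squared_error(y_train, model.predict(X_train_std))
test_mse = mean_squared_error(y_test, model.predict(X_test_std))
train_r2 = model.score(X_train_std, y_train)  # share of variance explained
test_r2 = model.score(X_test_std, y_test)

for name, coef in coefficients.items():
    print(f"{name}: {coef:.4f}")
print(f"training MSE: {train_mse:.6f}, R-squared: {train_r2:.3f}")
print(f"test MSE:     {test_mse:.6f}, R-squared: {test_r2:.3f}")
```

In a setup like this, a negative fertility rate coefficient corresponds to the index rising as fertility falls, and a test MSE noticeably above the training MSE is the usual sign that the model generalizes less well than the training fit suggests.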