All posts tagged: World Economy

Data Analysis and Interpretation Capstone

So, this is the end. It took six months, but today I completed and was certified for the Data Analysis and Interpretation Specialization by Wesleyan University through Coursera. When I started in October 2015, I had no idea how to write code in Python, let alone produce graphs and run statistical analyses. It has been a fun experience learning to code in Python and learning the different kinds of statistical methods. Ironically, I learned these after I left graduate school. One would think that these are methods courses you would take in school. For the Capstone Project, I do wish the data were more complete and covered a longer period of time. It is difficult to run analysis on data that only goes back as far as 1972 and, in many cases, is missing records for many years in between. The results can be quite misleading, as they pointed to fertility rate as being highly correlated with environmental sustainability. However, fertility rate is, in many cases, contingent on many different factors that are both quantitative …

Capstone Project: Methods

For those following my blog on my Data Analysis and Interpretation Specialization by Wesleyan University through Coursera, this is the final course and the Capstone project. Unlike previous courses, I will move away from urbanization data and try to tackle one of the problems provided by the course’s industry partner. This is my introduction. Below is our second assignment – the data management and analysis methods. Methods. Sample: Out of the 211 World Bank recognized sovereignties, eight (N=8) were chosen for this study. Countries that have the Ensure Environmental Sustainability goal were selected: three countries with the lowest GDP per capita (Burundi, Ethiopia, Liberia), three countries with the highest GDP per capita (Canada, Ireland, United States), and two from the median (Estonia, Seychelles). In addition to identifying associations between variables and the four sustainability indicators, this selection was also used to investigate how variable relationships differ in countries with varying degrees of economic development. Each country, depending on available data, has between 26 and 43 indicators for analysis with 36 years of data from 1972 to …
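The sampling rule described above (three lowest, three highest, and two from the middle of the GDP per capita ranking) can be sketched roughly as follows. This is only an illustration under assumptions: the function name `pick_sample`, the way the "median" pair is located, and the placeholder country names and values are all hypothetical and are not World Bank figures or the code used for the project.

```python
def pick_sample(gdp_per_capita, n_tail=3, n_mid=2):
    """Return (lowest, middle, highest) names ranked by GDP per capita.

    gdp_per_capita: dict mapping a country name to its GDP per capita.
    The middle group is taken as the n_mid entries centred in the ranking,
    which is one possible reading of "two from the median".
    """
    ranked = sorted(gdp_per_capita, key=gdp_per_capita.get)
    lowest = ranked[:n_tail]
    highest = ranked[-n_tail:]
    mid_start = (len(ranked) - n_mid) // 2
    middle = ranked[mid_start:mid_start + n_mid]
    return lowest, middle, highest


# Placeholder names and values for illustration only (not real data).
toy = {f"country_{i:02d}": 100.0 * i for i in range(1, 12)}
lowest, middle, highest = pick_sample(toy)
sample = lowest + middle + highest  # eight names in total, matching N=8
```

With the real data, the dictionary would hold only the countries that report the Ensure Environmental Sustainability goal, so the ranking is taken within that subset.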

Capstone: Variables Associated With Environmental Sustainability – A United Nations Millennium Development Goal

For those following my blog since the start of my Data Analysis and Interpretation Specialization by Wesleyan University through Coursera, this is the final course and the Capstone project. Unlike previous courses, I will move away from urbanization data and try to tackle one of the problems provided by the course’s industry partner. Below is our first assignment – the introduction to my final report. Variables Associated with Environmental Sustainability: Using data provided by the World Bank, through DrivenData, this study looks to identify factors associated with the Environmental Sustainability Indicator defined as a United Nations Millennium Development Goal (MDG). Preliminary explanatory variables are Gross National Income, Forest Area, CO2 Emissions, Employment, Foreign Direct Investments, Household Final Consumption Expenditure, Adult Literacy Rate, Urban Population, Investments in Energy, and Energy Use. This mix of economic and social factors will be examined for associations with the UN-MDG indicator of environmental sustainability. After the associated variables are identified, they will be used to create a model to predict data for the years 2008 and 2012. As a social/urban scientist interested …

Random Forests – Machine Learning

Machine Learning Data Analysis: This is the second lesson of the fourth course of my Data Analysis and Interpretation Specialization by Wesleyan University through Coursera. If you have been following along with my work, you will know that I am interested in the relationship between urbanization and economic development and am posing the general question of whether urbanization drives economic growth. For this assignment, the goal is to create a random forest that identifies the varying importance of my explanatory variables: Urban Population, Urban Population Growth, GDP Growth, Population Growth, Employment Rate, and Energy Use per Capita in 2007. For my response variable, I created a categorical variable from GDP per Capita 2007. I separated the data into two levels: 0 (low), where GDP per Capita 2007 is lower than 10000, and 1 (high), where GDP per Capita 2007 is higher than 10000. Just as in the last assignment, when my test sample is set at 40%, the result is 58 test samples and 85 training samples out of 143 total, with …
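As a rough check on the numbers above, here is a minimal sketch of the 0/1 recoding and of the split-size arithmetic. It is not the code from the assignment: the helper names `recode_gdp` and `split_sizes` are hypothetical, coding a value of exactly 10000 as low is an assumption (the post does not say), and rounding the test share up is an assumption chosen because it reproduces the 58/85 split.

```python
import math


def recode_gdp(gdp_per_capita, threshold=10000):
    """Return 1 (high) if GDP per capita is above the threshold, else 0 (low).

    Assumption: a value exactly equal to the threshold is coded as low.
    """
    return 1 if gdp_per_capita > threshold else 0


def split_sizes(n_samples, test_fraction):
    """Return (n_train, n_test), rounding the test share up (an assumption)."""
    n_test = math.ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test


n_train, n_test = split_sizes(143, 0.4)  # 40% of 143 is 57.2, rounded up to 58
```

The recoded 0/1 values would then serve as the response variable for the random forest, with the six explanatory variables as predictors.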

Decision Trees – Machine Learning

Machine Learning Data Analysis: This is the start of the fourth course of my Data Analysis and Interpretation Specialization by Wesleyan University through Coursera. If you have been following along with my work, you will know that I am interested in the relationship between urbanization and economic development and am posing the general question of whether urbanization drives economic growth. Now that I have started working, I do not have as much time. For this course, I decided to focus solely on Python, instead of both Python and SAS as in the past. I am not abandoning SAS, but I will probably take the time to learn SAS after this course ends. For this assignment, the goal is to create a decision tree that correctly classifies samples according to a binary, categorical response variable. For my response variable, I created a categorical variable from GDP per Capita 2007. I separated the data into two levels, where GDP per Capita 2007 lower than 10000 is coded 0 (low) and where GDP per Capita 2007 is …

Logistic Regression on Economic Development

Last lesson of Regression Modelling in Practice… If you have been following along with my work, you will know that I am interested in the relationship between urbanization and economic development and am posing the general question of whether urbanization drives economic growth. Through the past two courses, Data Analysis Tools and Data Management and Visualization, I looked at the correlation between urbanization and economic development and established that there was a correlation between urban population and GDP per capita. For this last assignment in the course Regression Modelling in Practice, I am again examining GDP per Capita as the response variable. I am using the new data set I created in the last assignment from Gapminder, which, as I explained, holds a more complete set of data when I use the year 2007 instead of 2010. As a logistic regression is performed on a categorical response variable with two levels and multiple explanatory variables, I had to bin GDP per Capita into two categories and recode them: 0 = Countries with a GDP per Capita less than …

Employment and Urbanization

Continuing with Regression Modelling in Practice… If you have been following along with my work, you will know that I am interested in the relationship between urbanization and economic development and am posing the general question of whether urbanization drives economic growth. Through the past two courses, Data Analysis Tools and Data Management and Visualization, I looked at the correlation between urbanization and economic development and established that there was a correlation between urban population and GDP per capita. For this assignment, I decided to look at another measure of economic development – employment rate. However, because data for 2010 is unavailable for some of the new variables I wanted to include, I decided to use data from the year 2007. It is the most recent year for which I have the most data across all my variables. For each of the variables, I downloaded data directly from Gapminder, extracted the relevant information for 2007, and compiled it into a new CSV file. I define my response variable as Employment Rate in 2007. Now that my data …