All posts tagged: SAS

Logistic Regression on Economic Development

Last lesson of Regression Modelling in Practice… If you have been following along with my work, you will know that I am interested in the relationship between urbanization and economic development and am posing the general question of whether urbanization drives economic growth. Through the past two courses, Data Analysis Tools and Data Management and Visualization, I looked at the correlation between urbanization and economic development and established that there was a correlation between urban population and GDP per capita. For this last assignment in the course Regression Modelling in Practice, I am again examining GDP per capita as the response variable. I am using the new data set I created from Gapminder in the last assignment, which, as I explained, is more complete for the year 2007 than for 2010. Because a logistic regression models a categorical response variable with two levels (here with multiple explanatory variables), I had to bin GDP per capita into two categories and recode them: 0 = Countries with a GDP per Capita less than …
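
A minimal sketch of the recoding step described above, assuming pandas is available. The threshold, the column name `GDP2007`, and the toy values are all hypothetical; the excerpt does not show the cut-off the post actually used.

```python
import pandas as pd

# Hypothetical cut-off, for illustration only; the post's real threshold
# is not shown in this excerpt.
THRESHOLD = 10000


def bin_gdp(gdp, threshold=THRESHOLD):
    """Recode GDP per capita as 0 (below threshold) or 1 (at or above)."""
    return (gdp >= threshold).astype(int)


# Toy values, not taken from the Gapminder data.
toy = pd.DataFrame({"GDP2007": [850.0, 4200.0, 10000.0, 36500.0]})
toy["GDPBin"] = bin_gdp(toy["GDP2007"])
print(toy)
```

The resulting 0/1 column is the kind of two-level response a logistic regression expects.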

Employment and Urbanization

Continuing with Regression Modelling in Practice… If you have been following along with my work, you will know that I am interested in the relationship between urbanization and economic development and am posing the general question of whether urbanization drives economic growth. Through the past two courses, Data Analysis Tools and Data Management and Visualization, I looked at the correlation between urbanization and economic development and established that there was a correlation between urban population and GDP per capita. For this assignment, I decided to look at another measure of economic development: employment rate. However, because data for 2010 are unavailable for some of the new variables I wanted to include, I decided to use data from the year 2007. It is the most recent year for which I have the most complete data across all my variables. For each of the variables, I downloaded data directly from Gapminder, extracted the relevant information for 2007, and compiled a new CSV file. I define my response variable as Employment Rate in 2007. Now that my data …

Basic Regression on Urban Population Growth and GDP per Capita

Continuing with Regression Modelling in Practice… If you have been following along with my work, you will know that I am interested in the relationship between urbanization and economic development and am posing the general question of whether urbanization drives economic growth. Through the past two courses, Data Analysis Tools and Data Management and Visualization, I established that there was a correlation between urban population and GDP per capita. For this assignment, my primary explanatory variable is urban population growth rate and my response variable is GDP per capita; both figures are from 2010. This is my code in Python:

    import pandas
    import numpy
    import seaborn
    import matplotlib.pyplot as plt
    import statsmodels.formula.api as smf
    import statsmodels.stats.multicomp as multi

    gapminder = pandas.read_csv('Data1.csv', low_memory=False)

    # Treat zeros as missing values
    gapminder['GDP2010'] = gapminder['GDP2010'].replace(0, numpy.nan)
    gapminder['GDPGrowth2010'] = gapminder['GDPGrowth2010'].replace(0, numpy.nan)
    gapminder['UrbanPop2010'] = gapminder['UrbanPop2010'].replace(0, numpy.nan)
    gapminder['UrbanPopGrowth2010'] = gapminder['UrbanPopGrowth2010'].replace(0, numpy.nan)

    # Keep only the variables of interest and drop incomplete rows
    gapminder = gapminder[['Country', 'UrbanPop2010', 'UrbanPopGrowth2010', 'GDP2010', 'GDPGrowth2010']]
    gapminder = gapminder.dropna()

    PopDes = gapminder['UrbanPopGrowth2010'].describe()
    print(PopDes)

    # .copy() avoids pandas' SettingWithCopyWarning on the assignment below
    RegData = gapminder[['Country', 'UrbanPopGrowth2010', 'GDP2010']].copy()
    # Center the explanatory variable on its mean
    RegData['UrbanPopGrowth2010'] = RegData['UrbanPopGrowth2010'] - RegData['UrbanPopGrowth2010'].mean()
    print(RegData.describe())

    UrbanReg = smf.ols(formula='GDP2010 ~ UrbanPopGrowth2010', data=RegData).fit()
    print(UrbanReg.summary())

    seaborn.regplot(x='UrbanPopGrowth2010', y='GDP2010', fit_reg=True, data=RegData)
    plt.xlabel('Urban Population Growth …

The Moderating Variable

Last Lesson in Data Analysis Tools… If you have not read my previous posts, I am currently enrolled in a Data Analysis Specialization with Wesleyan University through Coursera. With data from Gapminder, I am exploring a broad and basic question: does urbanization drive economic growth? For those of you interested in reading my literature review to gain a background on this project, please visit this page. This is the last lesson in the Data Analysis Tools course. After analyzing for correlations between variables, this assignment focuses on moderating variables. A moderating variable is one that influences the strength or direction of the association between the explanatory and response variables. Last time, I established that there were correlations between the amount of urbanization, as measured by percentage of total population in cities with over 1 million people, urban population growth, and GDP per capita. Additionally, I found that there was a correlation between total populations in cities and urban population growth. I suspect that one of these two variables might be a moderating variable. I first looked at total …
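
One common way to check for moderation is to compute the correlation separately within each level of the suspected moderator and compare the results. The sketch below assumes pandas and uses made-up column names (`moderator`, `x`, `y`) and toy values; it is not the post's own analysis.

```python
import pandas as pd

# Toy data, for illustration only: the x-y association is positive in the
# "low" group and negative in the "high" group.
toy = pd.DataFrame({
    "moderator": ["low"] * 4 + ["high"] * 4,
    "x": [1, 2, 3, 4, 1, 2, 3, 4],
    "y": [2, 4, 6, 8, 8, 6, 4, 2],
})


def corr_by_group(df, group_col, x_col, y_col):
    """Return the Pearson correlation of x and y within each group level."""
    return {
        level: sub[x_col].corr(sub[y_col])
        for level, sub in df.groupby(group_col)
    }


by_group = corr_by_group(toy, "moderator", "x", "y")
overall = toy["x"].corr(toy["y"])
print("overall:", overall)
print("by group:", by_group)
```

In this toy case the pooled correlation is close to zero even though each subgroup shows a strong association, which is the pattern a moderating variable can produce.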

Correlations! Urbanization and Economic Development in Rich and Poor Countries

Continuing with Data Analysis Tools… If you have not read my previous posts, I am currently enrolled in a Data Analysis Specialization with Wesleyan University through Coursera. With data from Gapminder, I am exploring a broad and basic question: does urbanization drive economic growth? For those of you interested in reading my literature review to gain a background on this project, please visit this page. Finally! Quantitative to quantitative variable analysis! This is the lesson I have been waiting for. With my interest in urbanization and economic development, the data I pulled from Gapminder are all quantitative. As I previously mentioned, I do not like categorizing quantitative data because I believe it introduces too much subjectivity. Unless the data is qualitative to begin with, it makes little sense to categorize data. Compared to the other types of correlation tests, Pearson’s Correlation was relatively easy to perform in both Python and SAS. I looked at the relationships between urbanization rate, as measured by both urban population growth rate and percentage of population in large cities with over 1 …
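
For readers who have not run one, a Pearson correlation in Python can be sketched as follows, assuming SciPy is installed. The variable names and values are toy placeholders, not the Gapminder figures used in the post.

```python
from scipy import stats

# Toy values for illustration only; not taken from the Gapminder data.
urban_growth = [1.0, 2.0, 3.0, 4.0, 5.0]
gdp_per_capita = [2.0, 4.0, 5.0, 4.0, 5.0]

# pearsonr returns the correlation coefficient and a two-sided p-value.
r, p_value = stats.pearsonr(urban_growth, gdp_per_capita)
print("r =", round(r, 3), "p =", round(p_value, 3))

# r squared is the share of variance in one variable associated with the other.
print("r squared =", round(r ** 2, 3))
```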

Chi-Square Testing…*Warning: It’s Painful*

Continuing with Data Analysis Tools… If you have not read my previous posts, I am currently enrolled in a Data Analysis Specialization with Wesleyan University through Coursera. With data from Gapminder, I am exploring a broad and basic question: does urbanization drive economic growth? For those of you interested in reading my literature review to gain a background on this project, please visit this page. For this assignment, I had to run Chi-Square tests on my variables. As always, both my Python and SAS code are posted. Since all my data are quantitative, I first had to categorize them. Since I found a relationship between the absolute measure of urbanization (population in cities with over 1 million people) and GDP growth rate, I decided to categorize GDP growth rate. Additionally, I wanted to see if there is a relationship between urbanization and the absolute measure of GDP (GDP per capita). To categorize GDP per capita, I used cut-offs of 5000, 10000, and 100000 to produce three distinct ranks whereby a country is poor if its GDP per …
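
The categorize-then-test workflow described above might look roughly like this, assuming pandas and SciPy. The cut-offs of 5000, 10000, and 100000 come from the post; the lower bin edge of 0, the labels "middle" and "rich", the `Urban` column, and all data values are assumptions made for illustration.

```python
import pandas as pd
from scipy import stats

# Toy data, for illustration only; not the Gapminder values.
toy = pd.DataFrame({
    "GDP": [1200, 3000, 4500, 6000, 8000, 9500, 20000, 35000, 50000],
    "Urban": ["low", "low", "high", "low", "high", "high",
              "high", "high", "low"],
})

# Bin GDP per capita into three ranks. The upper edges follow the post;
# the lower edge and the last two labels are hypothetical.
toy["GDPRank"] = pd.cut(
    toy["GDP"],
    bins=[0, 5000, 10000, 100000],
    labels=["poor", "middle", "rich"],
)

# Contingency table of counts, then the chi-square test of independence.
table = pd.crosstab(toy["GDPRank"], toy["Urban"])
chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(table)
print("chi2 =", chi2, "p =", p_value, "dof =", dof)
```

With so few observations the expected counts are far too small for a meaningful test; the sketch only shows the mechanics.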

A Revelation…Through ANOVA

Now that I have finished the first course in the Data Analysis and Interpretation Specialization, I am starting the second, Data Analysis Tools. If you have not read my previous posts, I am currently enrolled in a Data Analysis Specialization with Wesleyan University through Coursera. With data from Gapminder, I am exploring a broad and basic question: does urbanization drive economic growth? For those of you interested in reading my literature review to gain a background on this project, please visit this page. Since I presented my SAS work in the previous course, I will be presenting my Python work for this course. I actually enjoy working with Python; it seems to have more flexibility, and I am more used to its language, having had some experience with R during graduate school. As always, I am also including the other set of code for reference, so please see my SAS code at the very bottom of the post. There will be two parts to this presentation. The first part will be a discussion of …