Correlations! Urbanization and Economic Development in Rich and Poor Countries

Continuing with Data Analysis Tools

If you have not read my previous posts, I am currently enrolled in a Data Analysis Specialization with Wesleyan University through Coursera. With data from Gapminder, I am exploring a broad and basic question: does urbanization drive economic growth? For those of you interested in reading my literature review to gain a background on this project, please visit this page.

Finally! Quantitative to quantitative variable analysis! This is the lesson I have been waiting for. With my interest in urbanization and economic development, the data I pulled from Gapminder are all quantitative. As I previously mentioned, I do not like categorizing quantitative data because I believe it introduces too much subjectivity. Unless the data is qualitative to begin with, it makes little sense to categorize data.

Compared to the ANOVA and Chi-Square tests from the earlier lessons, Pearson’s Correlation was relatively easy to perform in both Python and SAS. I looked at the relationships between urbanization, as measured by both urban population growth rate and the percentage of the population living in large cities of over 1 million people, and economic development, as measured by GDP per capita and GDP growth rate.
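For anyone curious what the test actually computes, here is a minimal sketch of Pearson's r worked out by hand in plain Python. The numbers are made up for illustration and are not Gapminder values, and the function name pearson_r and the variable names are my own. The r value that scipy.stats.pearsonr reports is the same quantity.

```python
import math

def pearson_r(x, y):
    """Pearson's r: co-variation of x and y, scaled by the spread of each."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each observation from its variable's mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Made-up values for illustration only (not Gapminder data)
share_in_large_cities = [10.0, 20.0, 30.0, 40.0, 50.0]
gdp_per_capita = [1.0, 3.0, 2.0, 5.0, 4.0]

r = pearson_r(share_in_large_cities, gdp_per_capita)
print(round(r, 3))
```

An r close to +1 or -1 means the points fall close to a straight line, while an r near 0 means there is little linear relationship.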

Of the four relationships, only two had significant correlations: Urban Population in Large Cities with GDP per Capita, and Urban Population Growth Rate with GDP per Capita. Neither correlation is strong, with r = 0.485 and r = -0.377 respectively, as demonstrated by the graphs below:


Here are the p-values for each relationship:

Urban Population in Large Cities and GDP per Capita: p = 5.43e-07

Urban Population in Large Cities and GDP % Growth: p = 0.495

Urban Population Growth Rate and GDP per Capita: p = 0.000154

Urban Population Growth Rate and GDP % Growth: p = 0.301
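As a quick check, the sketch below compares the four p-values reported in this post against the usual 0.05 cutoff. The dictionary keys and the name ALPHA are just labels I picked for illustration.

```python
# Significance threshold used in this post
ALPHA = 0.05

# p-values reported above for each pair of variables
p_values = {
    "Large cities vs GDP per capita": 5.43e-07,
    "Large cities vs GDP % growth": 0.495,
    "Urban growth rate vs GDP per capita": 0.000154,
    "Urban growth rate vs GDP % growth": 0.301,
}

# A relationship counts as significant when its p-value is below ALPHA
significant = {name: p < ALPHA for name, p in p_values.items()}

for name, is_significant in significant.items():
    verdict = "significant" if is_significant else "not significant"
    print(f"{name}: p = {p_values[name]:.3g} ({verdict})")
```

Only the two relationships involving GDP per capita fall below the cutoff.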

With a positive r value and a p-value below 0.05, the first result means that countries with a larger share of their population in large urban centers tend to have higher overall wealth in terms of GDP per Capita. Urban Population Growth, on the other hand, is negatively correlated with GDP per Capita, meaning that poorer countries tend to have higher urban population growth.
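One way to put the strength of the two significant correlations in perspective is to square r, which gives the share of the variation in one variable that a straight-line relationship with the other accounts for. This is a small sketch using the r values reported in this post; the dictionary keys are just labels I chose.

```python
# r values reported in this post for the two significant relationships
correlations = {
    "Large cities vs GDP per capita": 0.485,
    "Urban growth rate vs GDP per capita": -0.377,
}

# Squaring r gives the coefficient of determination (r^2)
r_squared = {name: r ** 2 for name, r in correlations.items()}

for name, value in r_squared.items():
    print(f"{name}: r^2 = {value:.3f}")
```

That works out to roughly 0.235 and 0.142, so each urbanization measure accounts for less than a quarter of the variation in GDP per capita.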

This evidence is consistent with the ANOVA and Chi-Square tests, which demonstrated some relationship between urbanization rate and GDP per capita and showed that the most significant differences occur between rich and poor countries. However, the relationships do not seem to be strong. Lastly, there did not appear to be any relationship between urbanization and GDP growth rate, which can have implications for the economic policies of countries. A lot of the political rhetoric centers around the benefits, especially economic, of urbanization. However, the data suggest otherwise.

The significant relationships demonstrated here may just be an observation of the present circumstances. Rich countries tend to have more people in large cities, while poor countries tend to have higher growth rates in cities. Yet, the data do not show any cause and effect. Rich countries may not be rich because they have more people in large cities, and poor countries may not be poor because they have higher growth rates. Here, the data support the claim that it is not what you have but how you use what you have. This might be a bold claim, but I believe social policy matters much more in creating better living conditions than economic policies.

This is my code in Python (the graph covers only Urban Population in Large Cities and GDP per Capita):

import pandas
import numpy
import seaborn
import scipy.stats
import matplotlib.pyplot as plt

gapminder = pandas.read_csv('Data1.csv', low_memory=False)

# make sure every variable is stored as a number
gapminder['GDP2010'] = pandas.to_numeric(gapminder['GDP2010'])
gapminder['GDPGrowth2010'] = pandas.to_numeric(gapminder['GDPGrowth2010'])
gapminder['UrbanPop2010'] = pandas.to_numeric(gapminder['UrbanPop2010'])
gapminder['UrbanPopGrowth2010'] = pandas.to_numeric(gapminder['UrbanPopGrowth2010'])
gapminder['UrbanAgg2007'] = pandas.to_numeric(gapminder['UrbanAgg2007'])

# replace 0 with NaN values, and dropna() removes rows with any NaN values
gapminder['GDP2010'] = gapminder['GDP2010'].replace(0, numpy.nan)
gapminder['GDPGrowth2010'] = gapminder['GDPGrowth2010'].replace(0, numpy.nan)
gapminder['UrbanPop2010'] = gapminder['UrbanPop2010'].replace(0, numpy.nan)
gapminder['UrbanPopGrowth2010'] = gapminder['UrbanPopGrowth2010'].replace(0, numpy.nan)
gapminder['UrbanAgg2007'] = gapminder['UrbanAgg2007'].replace(0, numpy.nan)

gapminder = gapminder.dropna(subset=['GDP2010', 'GDPGrowth2010', 'UrbanPop2010', 'UrbanPopGrowth2010', 'UrbanAgg2007'])
gapminder = gapminder[['Country', 'GDP2010', 'GDPGrowth2010', 'UrbanPop2010', 'UrbanPopGrowth2010', 'UrbanAgg2007']]

print(gapminder)

# descriptive statistics for each variable
GDPDes = gapminder['GDP2010'].describe()
print(GDPDes)

GDPGrowthDes = gapminder['GDPGrowth2010'].describe()
print(GDPGrowthDes)

UrbanPopDes = gapminder['UrbanPop2010'].describe()
print(UrbanPopDes)

UrbanGrowthDes = gapminder['UrbanPopGrowth2010'].describe()
print(UrbanGrowthDes)

UrbanAgg = gapminder['UrbanAgg2007'].describe()
print(UrbanAgg)

# scatterplot with a fitted regression line
seaborn.regplot(x='UrbanAgg2007', y='GDP2010', fit_reg=True, data=gapminder)
plt.xlabel('Urban Population in Large Cities')
plt.ylabel('GDP Per Capita 2010')
plt.title('Urbanization and Economic Growth')
plt.show()

# Pearson correlations: each call returns the r value and the p-value
print('Urban Population in Large Cities and GDP per Capita')
print(scipy.stats.pearsonr(gapminder['UrbanAgg2007'], gapminder['GDP2010']))

print('Urban Population in Large Cities and GDP % Growth')
print(scipy.stats.pearsonr(gapminder['UrbanAgg2007'], gapminder['GDPGrowth2010']))

print('Urban Population Growth Rate and GDP per Capita')
print(scipy.stats.pearsonr(gapminder['UrbanPopGrowth2010'], gapminder['GDP2010']))

print('Urban Population Growth Rate and GDP % Growth')
print(scipy.stats.pearsonr(gapminder['UrbanPopGrowth2010'], gapminder['GDPGrowth2010']))

This is my code in SAS:

FILENAME REFFILE "/home/wfhsu.taiwan0/my_courses/Data1.xlsx" TERMSTR=CR;

LIBNAME mydata "/saswork/SAS_work2EC30000E95E_odaws04-prod-us/SAS_work9F7B0000E95E_odaws04-prod-us " access=readonly;

DATA new; set Gapminder2010;

/* LABEL takes the form variable="descriptive label" */
LABEL GDP2010="GDP per Capita 2010";
LABEL GDPGrowth2010="GDP Growth 2010";
LABEL UrbanPop2010="Urban Population 2010";
LABEL UrbanPopGrowth2010="Urban Pop Growth";
LABEL UrbanAgg2007="Pop in Large Cities";

/* Treat 0 as missing; this assumes the variables were imported as numeric */
IF GDP2010=0 THEN GDP2010=.;
IF GDPGrowth2010=0 THEN GDPGrowth2010=.;
IF UrbanPop2010=0 THEN UrbanPop2010=.;
IF UrbanPopGrowth2010=0 THEN UrbanPopGrowth2010=.;
IF UrbanAgg2007=0 THEN UrbanAgg2007=.;

/* Keep only rows with no missing values, matching dropna() in the Python code */
IF GDP2010 ne .;
IF GDPGrowth2010 ne .;
IF UrbanPop2010 ne .;
IF UrbanPopGrowth2010 ne .;
IF UrbanAgg2007 ne .;
RUN;

PROC SORT; by country; RUN;
PROC PRINT; VAR country GDP2010 GDPGrowth2010 UrbanPop2010 UrbanPopGrowth2010 UrbanAgg2007; RUN;

PROC GPLOT; PLOT GDP2010*UrbanAgg2007; RUN;
PROC CORR; VAR UrbanAgg2007 UrbanPopGrowth2010 GDPGrowth2010 GDP2010; RUN;
