Frequency Tables…The World is Mainly Poor…

I have always been hungry for knowledge and ambitious about my goals. Having the chance to finally learn some programming, I decided to learn both SAS and Python. Luckily, this isn’t my first exposure. While at ASU, I learned the basics of R programming, and that has really helped me understand how to write syntax in both SAS and Python.

For my second assignment for the Data Management and Visualization course with Wesleyan University through Coursera, I am required to post the program I wrote and the frequency tables that it produced.

Just as a reminder, I am working with Gapminder data on urbanization and economic growth. For the present analysis, I am using GDP per Capita, GDP Growth, and Urban Population as a percentage of total population, all from 2010, along with Urban Population in agglomerations of over 1 million people from 2007.

This is my code in SAS:

FILENAME REFFILE "/home/wfhsu.taiwan0/my_courses/Data1.xlsx" TERMSTR=CR;
PROC IMPORT DATAFILE=REFFILE
DBMS=XLSX
OUT=Gapminder2010;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=Gapminder2010; RUN;
/*import the data from the uploaded Excel file and list its contents*/

LIBNAME mydata "/saswork/SAS_work2EC30000E95E_odaws04-prod-us/SAS_work9F7B0000E95E_odaws04-prod-us" access=readonly;
/*LIBNAME describes where the file is located = look in which folder for data*/
DATA new; set Gapminder2010 ;
/*DATA names a new data set for the program to use and which data set in the folder to use as raw data*/
LABEL 'GDP per Capita 2010'n="GDP Per Capita" 'GDP Growth 2010'n="GDP Growth" 'Urban Population 2010'n="Urban Population" 'Urban Pop in Agglomerations over'n="Urban Pop in Large Agglomerations";
PROC SORT; by country;
PROC FREQ; TABLES 'GDP per Capita 2010'n 'GDP Growth 2010'n 'Urban Population 2010'n 'Urban Pop in Agglomerations over'n;
PROC UNIVARIATE;
ods select Histogram;
var 'GDP per Capita 2010'n 'GDP Growth 2010'n 'Urban Population 2010'n 'Urban Pop in Agglomerations over'n;
histogram 'GDP per Capita 2010'n 'GDP Growth 2010'n 'Urban Population 2010'n 'Urban Pop in Agglomerations over'n;
RUN;

This is my code in Python:

import pandas
import numpy

gapminder = pandas.read_csv('Data1.csv', low_memory=False)

#number of observations (rows) and number of variables (columns)
print(len(gapminder))
print(len(gapminder.columns))

print(len(gapminder.index))

#ensure column values are numeric
gapminder['GDP per Capita 2010'] = pandas.to_numeric(gapminder['GDP per Capita 2010'])
gapminder['GDP Growth 2010'] = pandas.to_numeric(gapminder['GDP Growth 2010'])
gapminder['Urban Population 2010'] = pandas.to_numeric(gapminder['Urban Population 2010'])
gapminder['Urban Pop in Agglomerations over 1 Million People 2007'] = pandas.to_numeric(gapminder['Urban Pop in Agglomerations over 1 Million People 2007'])

#counts and percentages for each variable
print("GDP per Capita")
g1 = gapminder["GDP per Capita 2010"].value_counts(sort=False)
print(g1)

print("Percentages of GDP per Capita")
p1 = gapminder["GDP per Capita 2010"].value_counts(sort=False, normalize=True)
print(p1)

print("GDP Growth")
g1 = gapminder["GDP Growth 2010"].value_counts(sort=False)
print(g1)

print("Percentages of GDP Growth")
p1 = gapminder["GDP Growth 2010"].value_counts(sort=False, normalize=True)
print(p1)

print("Urban Population")
g1 = gapminder["Urban Population 2010"].value_counts(sort=False)
print(g1)

print("Percentages of Urban Population")
p1 = gapminder["Urban Population 2010"].value_counts(sort=False, normalize=True)
print(p1)

print("Urban Population in Large Agglomerations")
g1 = gapminder["Urban Pop in Agglomerations over 1 Million People 2007"].value_counts(sort=False)
print(g1)

print("Percentages of Urban Pop in Large Agglomerations")
p1 = gapminder["Urban Pop in Agglomerations over 1 Million People 2007"].value_counts(sort=False, normalize=True)
print(p1)
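
The Python program repeats the same two calls for every variable, so it could probably be condensed into a loop. Here is a minimal sketch of that idea, not the code submitted for the assignment. The small data frame and its values are made up to stand in for Data1.csv, and errors="coerce" and the isna() count are additions for illustration. By default value_counts leaves missing values out, so the sketch counts them separately, much like the Missing Value line under each SAS table.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Data1.csv; these values are made up for illustration.
gapminder = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "GDP per Capita 2010": [1200.5, np.nan, 30500.0, 1200.5],
    "GDP Growth 2010": [2.1, 3.4, np.nan, np.nan],
})

columns = ["GDP per Capita 2010", "GDP Growth 2010"]

freq = {}     # counts per value, keyed by column name
pct = {}      # proportions per value (missing values excluded)
missing = {}  # number of missing values per column

for col in columns:
    # Assumption: entries that cannot be parsed become NaN instead of raising an error.
    gapminder[col] = pd.to_numeric(gapminder[col], errors="coerce")
    freq[col] = gapminder[col].value_counts(sort=False)
    pct[col] = gapminder[col].value_counts(sort=False, normalize=True)
    missing[col] = int(gapminder[col].isna().sum())

    print(col)
    print(freq[col])
    print("Percentages of", col)
    print(pct[col])
    print("Missing Value =", missing[col])
```

Adding the other two variables would only mean adding their column names to the list.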

Since both programs produce the same frequency tables, I will present the ones from SAS because the format is easier to read. In addition, I added some code to the SAS program to produce histograms showing the distribution of the values.

Frequency Tables:

GDPCapita2010

Missing Value = 93

GDPGrowth

Missing Value = 96

Missing Value = 65

Missing Value = 167

distributiongraph

As I am still not sure about sampling, I decided to include the data for every single country. Since GDP per Capita, GDP Growth, and Urban Population are continuous measurements that are specific and unique to each country, all the values show up exactly once: every frequency is 1 and the frequency percentage is equal across the board. This is why I found the histograms helpful. Looking at the histograms, it is fairly apparent that the majority of the world’s countries are clustered around the low end for GDP per Capita in 2010. The urban population variable, however, looks closer to a normal distribution. In terms of GDP growth, most countries display positive economic growth, clustering around 0 to 3%. Finally, most countries do have large urban agglomerations of over 1 million people, but the percentage of their population living in such agglomerations is fairly low, with the distribution clustering around 0 to 20%.
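
One way to get more readable frequency tables from values like these would be to group them into ranges before counting. Here is a minimal sketch using pandas.cut. The GDP figures, cut points, and group labels are all hypothetical, chosen only for illustration; they are not taken from the Gapminder data.

```python
import numpy as np
import pandas as pd

# Hypothetical GDP per capita values (made up), one per country, with one missing.
gdp = pd.Series(
    [450.0, 980.0, 1500.0, 3200.0, 7800.0, 12500.0, 41000.0, np.nan],
    name="GDP per Capita 2010",
)

# Hypothetical cut points and labels; any sensible grouping could be used instead.
bin_edges = [0, 1000, 5000, 20000, np.inf]
bin_labels = ["under 1,000", "1,000 to 4,999", "5,000 to 19,999", "20,000 and over"]

# right=False makes each range include its lower edge and exclude its upper edge.
gdp_group = pd.cut(gdp, bins=bin_edges, labels=bin_labels, right=False)

# Frequency table on the groups; the missing value is left out by default.
counts = gdp_group.value_counts(sort=False)
percents = gdp_group.value_counts(sort=False, normalize=True) * 100

print(counts)
print(percents.round(1))
```

With groups like these, the frequencies are no longer all 1, so the table itself starts to show where countries cluster.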

To get a better idea of the effect of urbanization on economic growth, if there is one, I will need more longitudinal data. As it currently stands, the data from 2010 and 2007 is not informative enough to answer my research question. However, the data suggests that much of the world lives in economic conditions that we are not familiar with…

How is that for a conclusion?
