In Speaking of Data: Gapminder

This is the start of the third course, Regression Modeling in Practice, in the Data Analysis and Interpretation Specialization offered by Wesleyan University through Coursera.

The first assignment is to describe the data I have been working with: what the sample is, how the data were collected, and how I managed the data.

If you have been following along with my work, you will know that I am interested in the relationship between urbanization and economic development. The general question I am posing is whether urbanization drives economic growth.

My sample consists of countries, territories, and other political entities such as disputed territories, dependent territories, and semi-autonomous city-states like Hong Kong. According to Gapminder, from which I downloaded my data, the list consists of 193 UN nations, 51 other entities, 4 French overseas territories, 10 former states, and 2 ad-hoc areas, for a total of 260 (N=260). However, because not every entity has data for the indicators I am using, the number of entities in my work is reduced to 164 (N=164). For the variable Percent of Total Population in Urban Agglomerations over 1 Million People, the available data reduces my sample further, to only 97 (N=97). The sample size is therefore either 164 or 97, depending on the variables I am examining. In my data set, every entity falls under the category Country.

The observational data was collected by the World Bank from the national statistical agencies of the various national and political entities. Some records go as far back as 1960; however, not every entity has records that far back, and some lack data entirely. Furthermore, urban areas are defined by each government agency. This introduces some inherent variability and uncertainty into my data, because the definition of urban can differ from one entity to the next. Gapminder was founded in 2005 to make time series data on every nation in the world available through its web service and to demonstrate major global development trends. It collected the World Bank data and compiled it into downloadable content on its website.

Lastly, the variables I am using are GDP per capita, GDP % Growth, Urban Population expressed as a percentage of the total population, Urban Population % Growth, and Urban Population in Agglomerations over 1 million people. Instead of using data from every available year, stretching back to 1960, I decided to use 2010 data for each country and political entity, and I eliminated observations that had no data for 2010. The exception is Urban Population in Agglomerations over 1 million, for which the most recent year of data is 2007. I used 2010 for the other variables because it is relatively close to 2007 and still recent enough to be considered relevant. As my question is framed in terms of urbanization driving economic growth, my explanatory variables are Urban Population % Growth and Urban Population in Agglomerations over 1 million. My response variables are GDP per capita and GDP % Growth.
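
To make the data management step concrete, here is a minimal Python sketch of how dropping entities with missing values produces two different sample sizes. This is not my actual code or data: the entity names, column names, and values are made up for illustration, and the helper `complete_cases` is a hypothetical name.

```python
# Toy rows with made-up values; entity and column names are hypothetical,
# not the real Gapminder indicator names.
rows = [
    {"entity": "A", "gdp_per_capita": 1200.0, "gdp_growth": 3.1,
     "urban_growth": 2.4, "urban_agglom": 18.0},
    {"entity": "B", "gdp_per_capita": 800.0, "gdp_growth": 5.0,
     "urban_growth": 3.9, "urban_agglom": None},   # no agglomeration data
    {"entity": "C", "gdp_per_capita": None, "gdp_growth": None,
     "urban_growth": 1.0, "urban_agglom": 25.0},   # no GDP data
]


def complete_cases(rows, columns):
    """Keep only rows that have a value for every listed column."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


# Variables needed for most analyses, and the same set plus agglomerations.
core = ["gdp_per_capita", "gdp_growth", "urban_growth"]
with_agglom = core + ["urban_agglom"]

n_core = len(complete_cases(rows, core))
n_agglom = len(complete_cases(rows, with_agglom))
print("N for core variables:", n_core)
print("N when agglomerations are included:", n_agglom)
```

The same logic, applied to the real data, is what takes the sample from 260 entities down to 164, and then to 97 when the agglomeration variable is included.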

As this project goes along, I am hoping to eventually use data from years prior to 2010 in order to develop more solid analysis through a longitudinal study.
