# All posts filed under: Data Analysis

## Analyzing Density Bonus Developments in the City of Los Angeles

On February 22, 2016, I started the GIS Specialization with UC Davis through Coursera. As those of you following along may have noticed, I have now started the final course of the specialization: Geospatial Analysis Project. As with other Coursera specializations, it ends with a Capstone project that is the culmination of the previous courses. For this project, I have to propose, design, analyze, and present a geospatial analysis project from start to finish. This week's assignment is my project proposal, which follows (if any of you have suggestions on data sources and/or analysis, please feel free to comment).

What is Density Bonus?

Density Bonus is a program through which a developer can apply to build a project with a higher unit density than the current land use zoning allows, as calculated from unit floor area and floor area ratio (FAR). In exchange for the higher density, the developer must set aside a certain number of units as affordable, which is done by restricting rent levels or sale prices to targeted income levels …
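
To make the arithmetic concrete, here is a minimal Python sketch of how a base unit count could follow from lot area, FAR, and unit floor area, with a bonus and an affordable set-aside layered on top. The function names, the bonus and set-aside percentages, the choice to compute the set-aside on the base units, and every rounding rule are hypothetical placeholders, not the actual Los Angeles or California formulas.

```python
import math


def base_units(lot_area_sqft: float, far: float, unit_floor_area_sqft: float) -> int:
    """Units permitted by zoning: buildable floor area divided by the floor area of one unit."""
    # FAR = total building floor area / lot area, so buildable floor area = lot area * FAR.
    buildable_floor_area = lot_area_sqft * far
    # Assumption: partial units are dropped (rounded down).
    return math.floor(buildable_floor_area / unit_floor_area_sqft)


def density_bonus(base: int, bonus_pct: float, set_aside_pct: float) -> tuple[int, int]:
    """Return (total units, affordable units) for hypothetical bonus and set-aside rates."""
    # Assumption: bonus units are rounded down.
    total_units = base + math.floor(base * bonus_pct)
    # Assumption: the affordable share is computed on the base units and rounded up.
    affordable_units = math.ceil(base * set_aside_pct)
    return total_units, affordable_units


# Hypothetical example: a 10,000 sq ft lot at FAR 3.0 with 1,000 sq ft units.
base = base_units(10_000, 3.0, 1_000)
print(base)                             # 30
print(density_bonus(base, 0.35, 0.11))  # (40, 4)
```

Swapping in the real bonus schedule and rounding rules would only change `density_bonus`; the FAR step stays the same.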

## Data Analysis and Interpretation Capstone

So, this is the end. It took six months, but today I completed and was certified for the Data Analysis and Interpretation Specialization by Wesleyan University through Coursera. When I started in October 2015, I had no idea how to write code in Python, let alone produce graphs and run statistical analyses. It has been a fun experience learning to write Python and learning different kinds of statistical methods. Ironically, I learned these after I left graduate school, even though one would expect them to be standard methods courses there. For the Capstone Project, I do wish the data had been more complete and had covered a longer period. It is difficult to run analyses on data that only goes back to 1972 and, in many cases, is missing records for many of the years in between. The results can be quite misleading: they pointed to fertility rate as being highly correlated with environmental sustainability. However, fertility rate is in many cases contingent on many different factors that are both quantitative …

## Capstone Project: Results

For those following my blog on my Data Analysis and Interpretation Specialization by Wesleyan University through Coursera, this is the final course and the Capstone project. Unlike previous courses, I will move away from urbanization data and try to tackle one of the problems provided by the course’s industry partner. My introduction is in an earlier post. Below is our third assignment: the preliminary results.

Results

Only the results for Burundi, Ethiopia, and Liberia will be reported, as the other countries showed no change or very slight change in the Ensure Environmental Sustainability Index.

Descriptive Statistics: The following table shows the descriptive statistics for the Ensure Environmental Sustainability Index for each of the selected countries, from the lowest GDP per capita group to the highest. The standard deviations are much greater for the lowest GDP per capita group than for the others. In three countries (Seychelles, Canada, and Ireland), no change in the value of the index was observed. It would appear that countries that reach a certain GDP per capita will have achieved a mean Ensure …
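
The per-country descriptive statistics and the "no change" check can be sketched with Python's standard `statistics` module. The function names, the use of `None` for missing years, and the sample values are hypothetical; a standard deviation of zero simply means the index never changed over the observed years.

```python
from statistics import mean, stdev


def describe(index_by_country):
    """Count, mean, and sample standard deviation of the index for each country.

    Missing years are None and are skipped; each country needs at least one value.
    """
    summary = {}
    for country, values in index_by_country.items():
        observed = [v for v in values if v is not None]
        summary[country] = {
            "n": len(observed),
            "mean": mean(observed),
            # stdev needs at least two values; a single observation has no spread.
            "std": stdev(observed) if len(observed) > 1 else 0.0,
        }
    return summary


def unchanged(summary):
    """Countries whose index never varied (standard deviation of zero)."""
    return [country for country, stats in summary.items() if stats["std"] == 0.0]


# Hypothetical values, not the World Bank data.
table = describe({
    "Country A": [10.0, None, 20.0, 30.0],
    "Country B": [50.0, 50.0, None, 50.0],
})
print(unchanged(table))  # ['Country B']
```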

## Capstone Project: Methods

For those following my blog on my Data Analysis and Interpretation Specialization by Wesleyan University through Coursera, this is the final course and the Capstone project. Unlike previous courses, I will move away from urbanization data and try to tackle one of the problems provided by the course’s industry partner. My introduction is in an earlier post. Below is our second assignment: the data management and analysis methods.

Methods

Sample: Out of the 211 sovereignties recognized by the World Bank, 8 (N=8) were chosen for this study. Countries that have the Ensure Environmental Sustainability goal were selected: the three with the lowest GDP per capita (Burundi, Ethiopia, Liberia), the three with the highest GDP per capita (Canada, Ireland, United States), and two from the median (Estonia, Seychelles). In addition to identifying associations between variables and the four sustainability indicators, this selection was also used to investigate how relationships between variables differ across countries with varying degrees of economic development. Depending on available data, each country has between 26 and 43 indicators for analysis, with 36 years of data from 1972 to …
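
The selection rule (keep only countries with the goal, then take the extremes and the middle of the GDP-per-capita ranking) fits in a short Python function. The function name and arguments are hypothetical, and how the two "median" countries are picked is an assumption, since the post does not spell out that step.

```python
def select_sample(gdp_per_capita, eligible, n_extreme=3, n_middle=2):
    """Pick the lowest, middle, and highest GDP-per-capita countries among the eligible ones.

    gdp_per_capita maps country name -> GDP per capita; eligible is the set of
    countries that have the Ensure Environmental Sustainability goal.
    """
    ranked = sorted((c for c in gdp_per_capita if c in eligible), key=gdp_per_capita.get)
    if len(ranked) < 2 * n_extreme + n_middle:
        raise ValueError("not enough eligible countries for non-overlapping groups")
    lowest = ranked[:n_extreme]
    highest = ranked[-n_extreme:]
    # Assumption: the "median" group is the n_middle countries centred on the middle rank.
    start = len(ranked) // 2 - n_middle // 2
    middle = ranked[start:start + n_middle]
    return lowest, middle, highest


# Hypothetical countries A..J ranked 1..10 by GDP per capita; A lacks the goal.
gdp = {c: float(i) for i, c in enumerate("ABCDEFGHIJ", start=1)}
lowest, middle, highest = select_sample(gdp, eligible=set("BCDEFGHIJ"))
```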

## Capstone: Variables Associated With Environmental Sustainability – A United Nations Millennium Development Goal

For those following my blog since the start of my Data Analysis and Interpretation Specialization by Wesleyan University through Coursera, this is the final course and the Capstone project. Unlike previous courses, I will move away from urbanization data and try to tackle one of the problems provided by the course’s industry partner. Below is our first assignment: the introduction to my final report.

Variables Associated with Environmental Sustainability

Using data provided by the World Bank through DrivenData, this study seeks to identify factors associated with the Environmental Sustainability Indicator defined as a United Nations Millennium Development Goal (MDG). Preliminary explanatory variables are Gross National Income, Forest Area, CO2 Emissions, Employment, Foreign Direct Investments, Household Final Consumption Expenditure, Adult Literacy Rate, Urban Population, Investments in Energy, and Energy Use. This mix of economic and social factors will be examined for associations with the UN MDG indicator of environmental sustainability. After the associated variables are identified, they will be used to build a model to predict data for the years 2008 and 2012. As a social/urban scientist interested …

## Short Post: New Project

I recently undertook a new project – taking my career forward in the direction I have been aiming for. Care to guess what kind of project I am doing? I am keeping it under wraps for now, but today I made a great deal of progress! Can’t wait to get to the final product. Here’s a sneak preview:

## WLM Financial Marketing and Branding

From September 2015 to January 2016, I worked as a Marketing Coordinator/Analyst for the real estate broker WLM Financial. Based in Inglewood, CA, the company focused on providing first-time home buyers with the financial advice and loans needed to purchase their homes. Using my knowledge of GIS and demographics, I identified the locations of their target markets and proposed ten cities in the Los Angeles Metropolitan Area into which they could expand their marketing operations. On the brokerage side of the business, I looked at home sales data, mortgage data, and property prices to identify other states in which WLM Financial could apply for broker licenses. After getting to know their operations, targets, and goals better, I created a marketing plan and a branding plan for the company. For brand building, I used their current website and Facebook page as focal points and set goals to be reached by July 2016 and July 2017. I created a social media schedule for them to post select content and to generate more reach and views to …

## k-Means Cluster Analysis – Machine Learning

Machine Learning Data Analysis

This is the last lesson of the fourth course of my Data Analysis and Interpretation Specialization by Wesleyan University through Coursera. If you have been following along with my work, you will know that I am interested in the relationship between urbanization and economic development and am posing the general question of whether urbanization drives economic growth. For this assignment, the goal is to run a k-Means Cluster Analysis on my variables: Urban Population, Urban Population Growth, GDP Growth, Population Growth, Employment Rate, and Energy Use per Capita in 2007. GDP per Capita in 2007 is used as the validation variable. I am trying to identify whether there are clusters of characteristics associated with certain values of GDP per Capita, based on national data from 2007. As before, the data is split into 70% training data and 30% test data; however, the k-means cluster analysis is run only on the training data set. The Elbow Curve Graph shows that 2, 3, or 4 clusters could be interpreted, though it is …
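
An elbow curve comes from running k-means for increasing k and recording the within-cluster sum of squares (WCSS) each time. Below is a minimal, dependency-free sketch of Lloyd's algorithm that does this. It uses a deterministic farthest-point initialization rather than the k-means++ start that scikit-learn's `KMeans` uses by default (its `inertia_` attribute holds the same WCSS quantity), it assumes the variables have already been standardized so no single variable dominates the distances, and the function names and toy points are hypothetical.

```python
Point = tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _farthest_point_init(points: list[Point], k: int) -> list[Point]:
    """Deterministic start: first point, then repeatedly the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_sq_dist(p, c) for c in centers)))
    return centers


def kmeans_wcss(points: list[Point], k: int, max_iter: int = 100) -> float:
    """Run Lloyd's k-means and return the within-cluster sum of squares (WCSS)."""
    centers = _farthest_point_init(points, k)
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members (empty clusters stay put).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return sum(_sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def elbow(points: list[Point], max_k: int) -> dict[int, float]:
    """WCSS for k = 1..max_k; the 'elbow' is where adding clusters stops helping much."""
    return {k: kmeans_wcss(points, k) for k in range(1, max_k + 1)}


# Hypothetical 2-D toy data with three well-separated groups (not the country data).
points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1), (0, 10), (1, 10), (0, 11)]
curve = elbow(points, 4)
```

On data like this the WCSS falls steeply up to k = 3 and only slightly after it, which is the bend an elbow plot looks for; with real indicators the bend is often less clear-cut, as the post notes.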

## Lasso Regression – Machine Learning

Machine Learning Data Analysis

This is the third lesson of the fourth course of my Data Analysis and Interpretation Specialization by Wesleyan University through Coursera. If you have been following along with my work, you will know that I am interested in the relationship between urbanization and economic development and am posing the general question of whether urbanization drives economic growth. For this assignment, the goal is to run a Lasso Regression that identifies the impact of each of my explanatory variables: Urban Population, Urban Population Growth, GDP Growth, Population Growth, Employment Rate, and Energy Use per Capita in 2007. Because it is a linear regression model, I can use a quantitative response variable: unlike in the previous lesson, I can use GDP per Capita 2007 as is, without converting it into a categorical variable. This time, the training data set is 70% and the test data set is 30% of the original data, which means there are 100 observations in my training data set vs. 43 in my test data set.

`pred_train.shape = (100, 6)` …
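
To show what the Lasso does to the coefficients, here is a small pure-Python coordinate-descent sketch of the objective (1/2n)·‖y − Xb‖² + α‖b‖₁, the same form scikit-learn documents for its `Lasso` estimator. It assumes centered, standardized predictors and a centered response (so no intercept is fit), the function names are hypothetical, and it illustrates the mechanics rather than reproducing the assignment's code.

```python
def soft_threshold(rho: float, alpha: float) -> float:
    """Shrink rho toward zero by alpha, returning exactly 0.0 inside [-alpha, alpha]."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_cd(X: list[list[float]], y: list[float], alpha: float,
             max_iter: int = 1000, tol: float = 1e-10) -> list[float]:
    """Coordinate descent for (1 / (2n)) * ||y - Xb||^2 + alpha * ||b||_1.

    Assumes every column of X is centered and y is centered, so no intercept is fit.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # z_j = (1/n) * sum_i x_ij^2, the curvature of the loss along coordinate j.
    z = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual (fit without b_j).
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            new_bj = soft_threshold(rho, alpha) / z[j]
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b


# Hypothetical toy data: two uncorrelated, centered predictors; only the first drives y.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [3, -3, 3, -3]
coefs = lasso_cd(X, y, alpha=0.5)
```

The soft-threshold step is what makes the Lasso a variable selector: here a moderate `alpha` shrinks the first coefficient from 3 to 2.5 and sets the second to exactly zero, and a large enough `alpha` zeroes both.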

## Random Forests – Machine Learning

Machine Learning Data Analysis

This is the second lesson of the fourth course of my Data Analysis and Interpretation Specialization by Wesleyan University through Coursera. If you have been following along with my work, you will know that I am interested in the relationship between urbanization and economic development and am posing the general question of whether urbanization drives economic growth. For this assignment, the goal is to create a random forest that identifies the relative importance of my explanatory variables: Urban Population, Urban Population Growth, GDP Growth, Population Growth, Employment Rate, and Energy Use per Capita in 2007. For my response variable, I created a categorical variable from GDP per Capita 2007 with two levels: values below 10000 are coded 0 (low) and values above 10000 are coded 1 (high). Just as in the last assignment, when my test sample is set at 40%, the result is 58 test samples and 85 training samples out of 143 total, with …
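
The response coding and the split counts quoted here (and the 100/43 split in the Lasso post) can be checked with a few lines of Python. Treating a value of exactly 10000 as low is an assumption, because the post only defines values below and above the threshold. The split helper mirrors how scikit-learn's `train_test_split` sizes a fractional test set (it rounds the test count up), and the function names are hypothetical.

```python
import math


def code_gdp(gdp_per_capita: float, threshold: float = 10_000) -> int:
    """1 = high, 0 = low. Assumption: a value exactly at the threshold counts as low."""
    return 1 if gdp_per_capita > threshold else 0


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """(train, test) counts when the fractional test count is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


print([code_gdp(g) for g in (850.0, 9_999.0, 10_000.0, 42_000.0)])  # [0, 0, 0, 1]
print(split_sizes(143, 0.4))  # (85, 58)
print(split_sizes(143, 0.3))  # (100, 43)
```

The forest itself would then be fit on the training rows, for example with scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute reports the relative importance of each explanatory variable.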