Singapore Resident Population Data By Zone in JSON Format
#TL;DR.
Download json data files below
#The Story and Codes
So recently I undertook a small geospatial assignment where I needed to find out the population data of Singapore by planning zones. So I looked up in http://data.gov.sg/ and did some searching.
What I have found is
- There is one dataset called “Subzone Census2010” and another called “Region Census2010”, but I just can’t download them correctly. Every time I click on the SHP file icon, I did download a zip file, but the file is always only 22 bytes in size, so clearly there is something wrong with it and I couldn’t use it directly. And you could try yourself by following links: subzone census data and region census data.
- There is no total population distribution data, but only data about resident population, which I believed consists of only Permanent Resident and Citizen based on this report from National Population And Talent Division. So as an international student, I am not considered as a resident yet. :P
- The dataset mentioned above is called Resident Population by Planning Area/ Subzone, Age Group and Sex (2014), and could be downloaded from the link. I understand that this is only around 70% of the population, but this dataset is really awesome. By saying awesome I mean this dataset is very updated (lastly updated at 26-SEP-2014) and extremely detailed, which breaks population into both gender groups and age groups by very small intervals (5 years).
However, this awesome dataset is in XLS format and I couldn’t do much with it directly, not mentioning using this inside my web application.
So I decided to do some clean up with the data and transformed it into something usable. And of course, using Ruby, since it’s my favorite until now!
#Step 1
I exported the excel file into CSV by using excel 2010 on my mac. And I pick the 2014 CSV file since that’s one relevant to me, then I name the CSV file to exported_data_2014.csv
.
#Step 2
I did a bit search and found this gem called smarter_csv, and it works like a charm. So the code is
#Step 3
And the above code produces results like following
While, this is not exactly what I want, since the key has too many dashes inside(not elegant), and the value I expect is purely number, not string with separator ,
inside.
So I did a simple looping and convert them into the format I want by following codes
So with this snippet, I have converted the number string into actual number by removing the ,
symbol and also map the key names into the format I would like to use.
#Step 4
So now I need to slice the whole data into three sub-groups, namely “total”, “female,” and “male”.
After examining the data for a while, I came with a simple workaround which helps me achieve the goal.
Where sp1
simply means stop point 1, sp2
means stop point 2 and ep
just means end point.
#Step 5
After this I realize that the total
attribute is still in string format and I haven’t converted it yet, but I couldn’t directly convert it before my slicing, since I am actually using the attribute as index points.
So I did another hack on this.
As simple as this, I did a hack and convert all total
into actual numbers.
#Step 6
Now let’s just write those array into files!
#Step 7
All done. Sit back and have some fun with the data.