gets serious

Deals have been signed, commitments made, and Big Data Week is just around the corner, so perhaps it’s time for another review of the official Open Data portal of the Malaysian Federal Government.

Unlike the previous review, where not much change was found, this time there’s been a lot of activity.

New assets

In total there are a whopping 44 new data assets (comparing the MYGovDataSet scrape of 25th October 2014 with a new scrape on 7th April 2015). That’s 44 on top of the existing 121, an increase of 36%.
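For anyone wanting to repeat this kind of comparison, here is a minimal Python sketch of diffing two scrapes by dataset ID. It assumes each scrape has already been reduced to a collection of integer IDs; the function name `compare_scrapes` and the toy IDs are hypothetical, not taken from the MYGovDataSet scrape itself.

```python
# A minimal sketch of comparing two portal scrapes by dataset ID.
# Assumption: each scrape is reduced to a collection of integer dataset IDs.
# The function name and the sample IDs are hypothetical.


def compare_scrapes(old_ids, new_ids):
    """Return (added, removed, growth) between an older and a newer scrape.

    growth is the number of added IDs as a fraction of the older total,
    the same way "44 on top of the existing 121" works out to about 36%.
    """
    old_ids, new_ids = set(old_ids), set(new_ids)
    added = sorted(new_ids - old_ids)    # present now, absent before
    removed = sorted(old_ids - new_ids)  # present before, absent now
    growth = len(added) / len(old_ids) if old_ids else float("inf")
    return added, removed, growth


if __name__ == "__main__":
    # Toy IDs for illustration only.
    october = range(1, 11)
    april = [1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12]
    added, removed, growth = compare_scrapes(october, april)
    print(added, removed, f"{growth:.0%}")

    # The post's own figures: 44 new assets on top of 121 existing ones.
    print(f"{44 / 121:.0%}")
```

Growth here is measured against the older scrape, and removed IDs fall out of the same set difference.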

Picking one of the new entries at random, let’s look at data set 133 from the Ministry of Transport: a state-by-state breakdown of annual traffic accident counts from 2003 to 2013. It is provided as an Excel file, which, while not ideal, is a huge step up from PDF. A glance at a couple of the Ministry of Transport’s other Excel files suggests that they are at least formatted consistently.

Lost assets

My scrape also indicates that since October 2014, 4 datasets have been removed from the site: IDs 114 to 117. Unfortunately I did not save what those assets were, and obviously they are inaccessible now (I haven’t tried looking at the Internet Archive or any other caching service).

There could be any number of reasons why these data sets were apparently removed. Perhaps they were simply moved to new IDs, or perhaps they were erroneous entries to begin with. Hopefully it is something benign like that, rather than a data provider having second thoughts and requesting removal.

Unlike in previous scrapes, there are now some holes in the ID space (e.g. after dataset 141, the next available dataset is 144; IDs 142 and 143 seem to be unused), which suggests there’s been a change in how the site is administered. It could simply reflect how upload errors are handled: create a new entry and delete the old one, instead of fixing the old one.
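Holes like this are easy to spot mechanically. The sketch below assumes IDs are integers that would normally be handed out sequentially, and lists every unused range between the smallest and largest ID seen; `find_id_gaps` and the sample IDs are illustrative only.

```python
# A minimal sketch of spotting holes in a portal's dataset ID space.
# Assumption: IDs are integers that would normally be allocated sequentially.
# The function name and sample IDs are hypothetical.


def find_id_gaps(ids):
    """Return (start, end) ranges of unused IDs between the min and max ID."""
    ordered = sorted(set(ids))
    gaps = []
    # Walk consecutive pairs; any jump larger than 1 is a hole.
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > 1:
            gaps.append((prev + 1, curr - 1))
    return gaps


if __name__ == "__main__":
    # Mirrors the example in the text: 141 is followed directly by 144.
    ids = [139, 140, 141, 144, 145]
    print(find_id_gaps(ids))
```

A gap on its own can’t tell you whether an ID was deleted or never used; telling those apart needs an earlier scrape to compare against, and IDs missing from the very end of the range won’t show up as a gap at all.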


Here’s something that may be more a criticism of the functioning of the Malaysian government than of its open data practice per se. Where would you expect to find data on vehicle licensing? If you guessed the Ministry of Transport, you’d be wrong.

Here we have the Prime Minister’s Department providing data that they call “Jumlah Lesen Terkumpul”, which translates to “Number of Licenses Collected”. Their description of the data set is the completely unhelpful “Jumlah Lesen Terkumpul Mengikut Kelas Lesen dan Negeri (31 Disember 2014)” which translates to “Number of licenses collected according to license class and state (31 December 2014)”. What is a “collected” license? And what sort of license are we talking about anyway?

Looking at the Excel file, we can guess that this is about licenses for public transport, perhaps a count of license revocations. It isn’t clear what it means. A bit of metadata and a better description would certainly help.

Trying out the portal’s search functionality, I found that the term ‘lesen’ (‘license’) does turn up this asset, but ‘pengangkutan’ (‘transport’), ‘teksi’ (‘taxi’) and ‘bas’ (‘bus’) turn up nothing. Search evidently covers only the page descriptions, which makes a good description critical.
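To illustrate why the description matters so much, here is a toy model of a search that indexes only the page descriptions. This is my assumption about how the portal behaves, inferred from the search results above, not its actual implementation; the function name and the dataset ID are placeholders.

```python
# A toy model of a search index that covers only page descriptions,
# not the contents of the attached files. This is an assumption inferred
# from observed search results, not the portal's real code.
import re


def search_descriptions(datasets, term):
    """Return IDs of datasets whose description contains `term` as a whole word."""
    term = term.lower()
    return [
        ds_id
        for ds_id, desc in datasets.items()
        # Tokenise on word characters so 'bas' wouldn't match inside e.g. 'bebas'.
        if term in re.findall(r"\w+", desc.lower())
    ]


if __name__ == "__main__":
    # The description quoted above; the ID 1 is a hypothetical placeholder.
    datasets = {
        1: "Jumlah Lesen Terkumpul Mengikut Kelas Lesen dan Negeri (31 Disember 2014)",
    }
    for term in ["lesen", "pengangkutan", "teksi", "bas"]:
        print(term, search_descriptions(datasets, term))
```

Any term that appears only inside the attached file, and not in the description, is invisible to an index like this.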

Machines, start your reading!

I’ve saved the best for last: when manually inspecting some of the newly provided data sets, something interesting stood out. I kept seeing Excel and CSV files. In previous scrapes there weren’t even any CSV files, and PDFs used to account for over 80% of the assets available. Now the story is very different.

[Figure: Counts of asset types on the portal]

PDFs now account for only 38% of the data sets available. Combined, Excel and CSV files make up 43% of the mix, which is a huge boon for machine readability. Notably, Excel files are not an open standard, but at least there are libraries for working with the format, and they are certainly easier to work with than PDFs (through manual intervention if need be).
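Producing numbers like these from a scrape is mostly a matter of tallying file extensions. Here is a minimal sketch, assuming the scrape yields a list of asset download URLs; the function name and the example.com URLs are made up for illustration and are not the portal’s real data.

```python
# A minimal sketch of tallying asset types from a scrape by file extension.
# Assumption: the scrape yields a list of asset download URLs.
# The function name and sample URLs are hypothetical.
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse


def asset_type_shares(urls):
    """Return {extension: fraction of all assets} for a list of asset URLs."""
    exts = Counter(
        # Take the suffix of the URL path, normalise case, drop the dot.
        PurePosixPath(urlparse(u).path).suffix.lower().lstrip(".") or "unknown"
        for u in urls
    )
    total = sum(exts.values())
    return {ext: count / total for ext, count in exts.items()}


if __name__ == "__main__":
    sample = [
        "http://example.com/files/a.pdf",
        "http://example.com/files/b.xlsx",
        "http://example.com/files/c.xls",
        "http://example.com/files/d.csv",
        "http://example.com/files/e.PDF",
    ]
    shares = asset_type_shares(sample)
    # Excel (old and new formats) plus CSV, as grouped in the text.
    machine_readable = sum(shares.get(e, 0) for e in ("xls", "xlsx", "csv"))
    print({k: f"{v:.0%}" for k, v in sorted(shares.items())})
    print(f"Excel + CSV: {machine_readable:.0%}")
```

Counting by extension is a heuristic: if a portal serves files from URLs without an extension, you would need to look at something like the HTTP Content-Type header instead.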

It’s clear that the portal has gotten some love recently, and here’s hoping it keeps going.


3 Responses to gets serious

  1. I was at an MDEC event recently, and fully agree that the initiative is really taking off. I think the folks over there are really putting effort in, and while it’s not there yet, it’s moving in the right direction and at a good pace.

    Also it helps that guys like Sinar Project are helping move things along. Everything else in Malaysia might be falling apart, but at least this is an area I’m optimistic about.

    • Tirath says:

      Thanks for the comment Keith, I can certainly agree with your sentiment! Like you, I’ve been fortunate to have witnessed the behind-the-scenes efforts of some key individuals within government (including technically-private-sector-government-agencies like MDeC) pushing the Open Data agenda for a number of years. To me, it has served as an important reminder that individuals can be quite awesome even if their organizations are… quite shit.

      Yes, some cool people have been pushing Open Data in Malaysia for the last couple of years, and their efforts are bearing fruit. But now that “Big Data Analytics” is recognized as a main stage national imperative, things can start to become a lot more political, as other people start to jump in. Too many chefs spoil the broth, and that’s even before allowing for chefs whose primary motivation may not exactly be tasty broth… if you know what I mean 😉

      There’s no reason yet to be pessimistic. Hopefully the Open Data movement in Malaysia will continue to grow positively. But it would be great if we, as “outsiders”, can somehow encourage things to stay on track. I think the most important way we can do that is to try and discover use cases for all the data being published, thus empowering those who pushed for their release, by validating their efforts.

      Sadly I’ve yet to do anything real myself… need more hours in a day…

  2. Pingback: Big Data Week 2015 @ Kuala Lumpur: Unleashing The Power of Big Data, Malaysia’s Journey to Become a Hub For Big Data in ASEAN | Martin Ho Jin Cong
