An initial perspective on domestic mass electronic surveillance in Malaysia

The Snowden disclosures brought into public consciousness the issue of domestic mass surveillance. This has triggered debate throughout the developed world, less so in the developing world.

Curious about current perceptions on this issue in Malaysia, I posted a question to the Big Data Malaysia discussion group:

People in this group may well be regarded as Big Data experts by their friends and family, and I’m curious… are you hearing any concern about potential mass electronic surveillance* in Malaysia?

(*I mean the sort of thing brought to light by the Snowden disclosures.)

The following is not my personal opinion; rather, it is my personal summary of the opinions provided on the above-mentioned discussion thread. There were 39 comments in response. I coded, categorized, and weighted (by likes) each comment to produce the following summary. It is unavoidably subjective, but hopefully it is not too far off from a useful snapshot of the opinion of the members of Big Data Malaysia on the subject of mass electronic surveillance in Malaysia.

Continue reading

gets serious

Deals have been signed, commitments made, and Big Data Week is just around the corner – so perhaps it’s time for another review of the official Open Data portal of the Malaysian Federal Government.

Unlike the previous review, where not much change was found, this time there’s been a lot of activity.

Continue reading


Reflecting on the importance of domain knowledge in data science

When I was a PhD student, we had a regular internal seminar series for postgrads to present anything relevant to their work, most typically their intermediate findings. At one such session, a mobile telecommunications researcher was presenting his findings on energy consumption. He had created a simulated annealing model, crunched some data, and presented some graphs. One professor in the audience was apparently paying closer attention than most. He leaned in, squinted, then observed: “So basically what you found was that with two batteries the thing lasts twice as long?”

Continue reading


Ramblings on Australian democracy – theory vs. reality

In 2011, the UK mulled the Alternative Vote (AV), known as preferential voting in Australia. The measure failed to pass, but pundits in favour of AV held up Australia as a worthy example.

Indeed, most Aussies are rather proud of their preferential voting system because, in theory, it gives fringe parties a reasonable chance of success by eliminating fears of vote-splitting, the “spoiler” effect (a fear seemingly justified by Al Gore’s loss to George W. Bush because of Ralph Nader).

So yes, in theory, preferential voting is awesome.

Next, there’s compulsory voting, which gives Australia over 80% voter turnout. This is also something Aussies are proud of, because high voter turnout means election outcomes are representative of the population.

So yes, in theory, compulsory voting is awesome.

If Aussies are proud of compulsory voting, and proud of preferential voting – which are key features of their political system – then how come these days they don’t seem to be all that proud of their politics?

Tony Abbott’s approval ratings currently sit at 31%, but it’s not just a Tony Abbott problem – the Kevin Rudd/Julia Gillard era was hardly anything to be proud of either.


Tensors for all

Let’s build open source tensor libraries for data science? Sounds good! But wait, this sounds familiar…

Years ago I was dabbling in computational quantum chemistry, which exposed me to the wild and whacky world of quantum physics, and the almost-as-wild-and-almost-as-whacky world of numerical approximations to Schrödinger’s equation.


Schrödinger’s Cat on an abacus computing some resting energy… geddit??

It was there that I encountered tensors. Basically, tensors are generalizations of matrices, and vectors, and (maybe? I think?) scalars. If a matrix has 2 dimensions, then it stands to reason that a vector has 1 dimension. I’m fuzzy on the definitions, but perhaps scalars technically have 0 dimensions. Anyway, my point is that a vector is a 1-tensor, a matrix is a 2-tensor (maybe a scalar is a 0-tensor), and it follows that we can have arbitrary N-tensors which have N “dimensions”.

We’ve long had well-established linear algebra libraries, most prominently LAPACK, which handles matrices (*ahem*, I mean 2-tensors) quite happily, but which apparently did not generalize very well to tensors with N > 2. That is understandable, since N <= 2 is very special-casey. While there are some general-purpose tensor libraries out there, their coverage of tensor functionality is apparently not sufficiently comprehensive.

Tensors (of the N > 2 variety) were causing grief to computational chemists for some time already, so some cool people created the Tensor Contraction Engine. I don’t know if the project is still being actively maintained, nor do I know if it covers precisely what the author of the first-mentioned paper is looking for.

The computational quantum chemistry field never really exploded the way the data science field is exploding now, but I think everybody understands that so much of this “new” stuff underpinning data science actually has legacy academic roots, so a healthy impulse would be to look backwards. Lesson number 1: don’t reinvent the wheel.

And more importantly, maybe the reason the Tensor Contraction Engine did not endure (unless it did?) is because it settled into a domain-specific mandate. I’ll be the first to defend an application-specific approach (or my PhD would be a 4-year act of blatant hypocrisy) but clearly it’s not just data scientists who need decent tensor support, and clearly it’s not just theoretical chemists who need decent tensor support. So lesson number 2 could be: opportunities to generalize numerical libraries beyond their founding discipline ought to be eagerly embraced.


Self-driving cars vs parking lots

Self-driving cars made a splash at CES 2015. As people debate the precise market size, they enumerate the use cases of self-driving cars, mostly from the perspective of accessibility for those who ordinarily would not be able to drive (e.g. the disabled). The counterargument is that people generally do love driving and would not want to give up the steering wheel.

In my mind there is a clear killer application that would make it all worthwhile: parking. A car that can drop me off at my destination and then drive off to park itself is a huge win:

  • As a driver, I won’t have to look for a parking spot. But more importantly…
  • As a property developer I don’t have to co-locate parking spaces close to destinations.

Business districts, public transport hubs, entertainment venues etc. won’t need dedicated parking spaces; there can be one huge lot serving all locations within some (say, 5km) area – too far to park and walk, but close enough to be recalled via a mobile app within a reasonable amount of time. The premium property that frees up will be worth a fortune.

Parking spaces are an economic waste, and self-driving cars can unlock the value trapped within their grey, uninspiring, soon-to-be-no-longer-necessary walls.


bash type to the rescue

tl;dr the bash builtin type is handy for troubleshooting large messy bash things.

UEFI vulnerabilities have been coming up a lot lately. At a recent InfoSec conference that I attended, one speaker remarked that the state of the code is bad and that “electrical engineers should not be writing code”. That’s quite an uncharitable comment, but looking at the state of the EDK2 source code by way of CloverGrower I’ve experienced some of that pain first-hand, in the state of the build scripts. The build command was failing and I managed to track it down to this segment in ./edk2/Clover/

    echo "Running edk2 build for Clover$TARGETARCH using the command:"
    echo "$cmd"
    eval "$cmd"

From terminal output we know that cmd is:

    build  -p Clover/Clover.dsc -a X64 -b RELEASE -t GCC47 -n 9

Now the question is, what exactly is “build”? The script has one “source” command:

        source BaseTools

I now know that this ends up calling ./edk2/BaseTools/BuildEnv and stuff, but really I’m just trying to locate what “build” function is being called and surely there must be some way to find that out quickly. That’s where the bash type command comes in. Placing it just before the eval:

    echo "Running edk2 build for Clover$TARGETARCH using the command:"
    echo "$cmd"
    type build
    eval "$cmd"

… reveals the following:

    build is /Users/tramdas/repos/Clover/edk2/BaseTools/BinWrappers/PosixLike/build

That’s all I needed. Much more efficient than untangling the web of source commands.
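
For anyone who wants to try the same trick outside of the Clover build, here is a minimal sketch of how type behaves for different kinds of names. The function names greet and describe are made up for illustration and have nothing to do with EDK2 or CloverGrower. It assumes bash, since the -t and -a options are bash features and are not guaranteed in other shells.

```shell
#!/usr/bin/env bash
# Hypothetical names throughout: `greet` and `describe` are illustrative only.

# A shell function, standing in for something defined by a sourced script.
greet() { echo "hello"; }

# `type -t` prints a single word: alias, keyword, function, builtin, or file.
# It prints nothing and returns non-zero when the name cannot be resolved.
describe() {
    local kind
    kind=$(type -t "$1") || kind="not found"
    echo "$1: $kind"
}

describe greet   # a function defined above
describe cd      # a shell builtin
describe if      # a shell keyword
describe ls      # normally an external file found via PATH

# `type -a` lists every match, which helps when a function or alias
# shadows an executable of the same name.
type -a echo
```

Plain type (as used above in the build script) prints a human-readable sentence, which is usually all you need when debugging by eye. The -t form is more convenient when a script has to branch on the answer.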
