WFP gotcha: connections, FwpsFlowAssociateContext, and ALE reauthorization

Anyone building endpoint security software on Windows is likely to need some kind of firewall-like functionality, and in that case they will probably encounter the Windows Filtering Platform (WFP) at some point. I’ll let you search the interwebs for an introduction to or overview of WFP, but in this post I want to highlight an issue that isn’t well documented.

But first, a brief explanation of how new connections are typically handled with WFP.

Flow IDs accompany callbacks so that stateful connections can be uniquely identified. When a connection is started, it arrives in a WFP callback with a layer ID of FWPM_LAYER_ALE_AUTH_CONNECT_V4 (or V6), which is typically your cue to create a new context object and associate it with the corresponding flow ID by calling FwpsFlowAssociateContext. Subsequent activity on the same connection then produces callbacks with different layer IDs corresponding to the type of activity, the flow ID corresponding to the connection, and the associated context object. So far so good: we have a one-to-one mapping between flow IDs and context objects.

We may summarise the above as a rule of thumb for how to typically deal with new connections: when we get a callback for FWPM_LAYER_ALE_AUTH_CONNECT_V4 (or V6) with some flow ID, we should create a context object and use FwpsFlowAssociateContext to associate the context object with that flow ID. With that, all is well in the world… right?
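
To make that concrete, here is a minimal sketch of such a classify callback, following the naive rule above. This is a sketch, not verbatim code from a real driver: ConnContextCreate and gStreamCalloutId are hypothetical (your own context allocator and the runtime ID of the callout that should later be handed the context), and error handling is elided.

```c
#include <ntddk.h>
#include <fwpsk.h>   // Windows 7-era WFP kernel API (classifyFn1 / FWPS_FILTER1)

// Hypothetical per-connection state and helpers.
typedef struct _CONN_CONTEXT { LONG64 bytesSeen; } CONN_CONTEXT;
extern UINT32 gStreamCalloutId;                 // from FwpsCalloutRegister
extern CONN_CONTEXT* ConnContextCreate(void);   // hypothetical allocator

VOID NTAPI
ConnectClassifyFn(
    _In_ const FWPS_INCOMING_VALUES0* inFixedValues,
    _In_ const FWPS_INCOMING_METADATA_VALUES0* inMetaValues,
    _Inout_opt_ VOID* layerData,
    _In_opt_ const VOID* classifyContext,
    _In_ const FWPS_FILTER1* filter,
    _In_ UINT64 flowContext,
    _Inout_ FWPS_CLASSIFY_OUT0* classifyOut)
{
    UNREFERENCED_PARAMETER(inFixedValues);
    UNREFERENCED_PARAMETER(layerData);
    UNREFERENCED_PARAMETER(classifyContext);
    UNREFERENCED_PARAMETER(filter);
    UNREFERENCED_PARAMETER(flowContext);

    if (FWPS_IS_METADATA_FIELD_PRESENT(inMetaValues,
                                       FWPS_METADATA_FIELD_FLOW_HANDLE))
    {
        CONN_CONTEXT* ctx = ConnContextCreate();
        if (ctx != NULL)
        {
            // One flow ID, one context object... or so we assume. The layer ID
            // passed here names the layer whose callout will later be handed
            // the context (a stream-layer callout in this sketch).
            FwpsFlowAssociateContext0(inMetaValues->flowHandle,
                                      FWPS_LAYER_STREAM_V4,
                                      gStreamCalloutId,
                                      (UINT64)(ULONG_PTR)ctx);
        }
    }

    if (classifyOut->rights & FWPS_RIGHT_ACTION_WRITE)
    {
        classifyOut->actionType = FWP_ACTION_PERMIT;
    }
}
```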

The assumption made by the above rule of thumb is that, when we get a FWPM_LAYER_ALE_AUTH_CONNECT_V4 (or V6), we are going to get a brand new flow ID along with it (neglecting the case of reusing old flow IDs for terminated flows). This seems a reasonable assumption, and it almost always holds. There is, however, at least one exception. That exception is ALE reauthorization.

Learning more about ALE reauthorization is left as an exercise for the reader, but the upshot is that ALE reauthorization has the annoying effect of producing FWPM_LAYER_ALE_AUTH_CONNECT_V4 (or V6) callbacks with repeated flow IDs when they do not actually represent new connections at all, and the only way to recognise this occurrence is to check the filtering condition flags. This creates a problem because now, conceptually, we have one flow ID associated with multiple context objects. The practical reality is that all but the most recently associated of the context objects are left dangling.

I should also point out that I’m not entirely sure what external stimulus is required to trigger ALE reauthorization: I observed it happening somewhat intermittently on Windows 7 running in VMware Fusion, doing nothing more than streaming music.

The impact of failing to distinguish ALE reauthorization from a “real” new connection is hard to anticipate definitively because it depends on the implementation of the code using the framework. Perhaps your state tracking will be rendered incorrect because some counters will be wrong. It’s extremely likely that you will be leaking context objects.

Anyway, the consequence of this annoying behaviour of ALE reauthorization is that we should tweak our rule of thumb for handling new connections: when we get a callback for FWPM_LAYER_ALE_AUTH_CONNECT_V4 (or V6) with some flow ID, and the FWP_CONDITION_FLAG_IS_REAUTHORIZE flag is not set, then we should create a context object and use FwpsFlowAssociateContext to associate the context object with that flow ID.
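
In code, the tweak amounts to one extra check near the top of the classify callback sketched earlier – again a sketch, with the helpers remaining hypothetical:

```c
// Inside ConnectClassifyFn, before creating any context: consult the
// ALE_AUTH_CONNECT condition flags delivered with this classification.
UINT32 conditionFlags =
    inFixedValues->incomingValue[FWPS_FIELD_ALE_AUTH_CONNECT_V4_FLAGS].value.uint32;

if ((conditionFlags & FWP_CONDITION_FLAG_IS_REAUTHORIZE) == 0)
{
    // Genuinely new connection: create a context and associate it, as before.
}
else
{
    // ALE reauthorization of an existing flow: this flow ID already has a
    // context associated; associating another would leave the old one dangling.
}
```

At the V6 layer the field index is FWPS_FIELD_ALE_AUTH_CONNECT_V6_FLAGS; the flag itself is the same.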


Performance-based contracting for cybersecurity

“Show me the incentive and I will show you the outcome.”

— Charlie Munger

Have you heard the one about how all the viruses are written by the antivirus companies? This joke makes the rounds at cocktail parties because it strikes the right balance between being absurd and making cynical sense. More viruses means more demand for antivirus, right? So the antivirus companies must be making all the viruses! While baseless in reality, this caricature highlights a fundamental problem in cybersecurity: misalignment of incentives.

Incentives.

Remember when your car gave you trouble and your mechanic couldn’t seem to fix it permanently? Every few months the issue would recur, but your mechanic insisted they had fixed it right the first time and that this was a different issue now, even though it looked/sounded the same to you. You chose to believe them, but you were aware that every time one of these “new” issues came up, your mechanic got more money from you. We’ve all been there. I’ve had an “arm bush” replaced only to replace the “mounting” just two weeks later – apparently both definitely needed replacing. It’s just one of life’s annoyances that we’ve gotten used to, because there’s no choice but to live with it.

Now imagine that annoyance scaled out across a fleet of corporate jets.

Power by the Hour.

It’s the 1960s and you’ve developed a very cost-efficient aircraft engine. A side-effect of the low-cost design is that it suffers recurring maintenance issues (none catastrophic, but they still need attention). So how do you assuage customer concerns about this literally high-maintenance product? You invent the “Power by the Hour” model.

With “Power by the Hour”, as the buyer of engine maintenance, you pay Rolls-Royce only for the time that the engine is running. That means engine downtime is a cost to Rolls-Royce, so incentives are aligned: the longer the engine is in a runnable state, the happier the customer is, and the more money the mechanic makes. It’s a win-win and everyone is happy.

This application of the “pay for what you eat” model to a loss-minimisation problem was well ahead of its time, but today it is considered a classic example of performance-based contracting.

Misaligned incentives in cybersecurity.

Every cybersecurity vendor experiences a moment of happy excitement when their product is seen to stop an attack on a customer. This is understandable. But consider an analogy: would you feel happy the day your home alarm scared burglars away? Most people would be freaked out that someone tried to get in. It’s good that you had the alarm system and that it did its job – if the alarm were not there (or did not work) then you would have had a burglar in your home – but nevertheless it is a misalignment of incentives when someone who is supposed to be on your side, i.e. the home security service provider, gains a benefit when you are attacked.

Cybersecurity buyers like to complain about “alert fatigue”, but from anecdotal evidence I think there needs to be some baseline level of alerts to secure renewals and expansions, because otherwise most contemporary buyers don’t see value. Home alarm systems come at a cost, and if yours hasn’t triggered in a year and the general sense is that your neighbourhood has become safer, then why wouldn’t you consider cancelling your service and spending the money on a family vacation?

So if you’re the home alarm company, it’s not to your benefit for neighbourhoods to become safer because with your business model, you don’t get paid to make people safe or even to make them feel safer. You are paid for alerts. The more alerts, the more you benefit.

Cyber Power by the Hour.

Cybersecurity is a trickier problem than engine uptime because:

  • Performance is harder to define in this context. What is a breach? How do you measure the impact of a breach?
  • Driving performance would most likely require invasive controls. You don’t just go pick up an engine and take it back to your workshop (optionally leaving a spare engine with the customer). To do cybersecurity, you generally need to be embedded on the front line.

If (and I acknowledge this is a huge if) the above two points can be sufficiently addressed, then how would you structure a performance-based holistic solution to cybersecurity? The easiest model I can imagine is “pay me $1M per year to take care of your cybersecurity and I will pay you back $250k every time I fail”. Astute readers will point out that penalties for non-performance are standard in many major IT procurement programs, but I’m not talking about failure to deliver on-time or on-spec; I’m talking about a payout on occurrence of a breach (however breach is defined).

If this model of “pay me X and I will pay you Y when Z happens” sounds familiar, it’s because that’s basically how insurance works. Getting to Cyber Power by the Hour requires a further step: the insurance company is responsible for providing your cybersecurity. Then it’s “pay me X and I will pay you Y when Z happens, but I’m going to do everything I can to prevent Z so that I can minimise Y”.

Presumably the more control you hand over to the insurance company, the lower X is (or the higher Y). On a fundamental business structure level I believe insurance is already well matched to a performance-based approach for cybersecurity. Plus they have the actuarial rigour and discipline to back it up with risk and pricing models, not to mention the float to offer such a product without a beyond-obscene amount of venture capital.
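
To make the alignment concrete with illustrative numbers (mine, not from any real policy): at X = $1M and Y = $250k, an insurer expecting two breaches a year clears $1M − 2 × $250k = $500k. Any control that cuts the expected breach count from two to one is then worth up to an extra $250k a year to the insurer – it is now directly paid to prevent breaches rather than to generate alerts.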

But, there’s a gap in the cybersecurity delivery side of things:

  • How to define performance? The insurance industry has tonnes of experience with this issue in a general sense (e.g. was your house destroyed by water damage or by a hurricane?). Applying it to cybersecurity will obviously take a lot of effort, but I think it can be done for the most part. The problem is that some attacks may be hard to detect/recognise, e.g. industrial espionage.
  • How to deliver performance? What controls do you mandate? What is the economic impact (e.g. on productivity) of these controls? How are these controls implemented and enforced, and by whom? The insurance industry has some experience with this in a general sense (e.g. they may give you a health checkup as a prerequisite to health/life insurance, or make you stick a tracking device in your car for vehicle insurance) but it’s definitely a qualitatively different ballgame with cybersecurity controls.

As they say, the devil is in the details, and cybersecurity is a field that is especially unforgiving of ignoring the details. So we shall see if the insurance industry is willing and able to seize this opportunity to move beyond alerts and incident response, and pay more attention to controls and interventions that move the needle at a far higher level than the average enterprise can even catch a whiff of. Maybe you could focus your efforts on controls like convincing Barack Obama to have a frank word with Xi Jinping, then sit back and reap the increased profits from reduced breach payouts that naturally flow from a reduction in APT attacks.

Suddenly you’re in the business of safer neighbourhoods. Imagine that.


Podcasting: 5 dos and 5 don’ts

About 10 years ago (or was it 11 years… or 12 years… let’s cap it at 10 before I start to feel too old), I got a Sansa Clip. I loved my Sansa Clip. The killer app for me was podcasts. Today, my Sansa Clip is long gone, but I’m still a voracious podcast listener.

I’ve come to realise that there are things I like to hear in podcasts, and things that I don’t like. In this post I’m going to list 5 things I like, and 5 things I don’t like. If you produce a podcast, read on – you just might find something to make your podcast a little bit better.

As a bonus, you’ll see me specifically mention some of the podcasts I listen to – while this post isn’t really about recommending podcasts, I suppose you could infer that since these are shows I listen to, I must think they’re pretty good. And I do.



The cyber killchain: wrong, but is it useful?

All models are wrong, but some are useful. The trick is to determine in what circumstances a model may be useful. This is where mistakes are made.

Today I was at a seminar on cybersecurity in the context of individuals, and I asked the speakers whether the cyber killchain is useful in the context of personal cybersecurity, or whether there are other models that are more appropriate.

What surprised me was the attitude of one of the speakers towards the model itself – something along the lines of:

“I think we need to get away from the militarisation of cyber security*”

(*I’m paraphrasing because I can’t remember the precise words used, but that was the meaning conveyed).

I don’t know if there is a science to deciding whether a model is useful, but I feel confident that the provenance of a model is not the best discriminant. It’s tempting to say an idea born in the military is going to be too militaristic in perspective, but ideas move between fields all the time, and in this era where interdisciplinary approaches yield the most progress, I think the default position should be that it doesn’t matter where an idea came from; what matters is how we can use it.

Yes, the cyber killchain concept came from the military-industrial complex. Does that mean it has no place in the civilian, non-enterprise Internet?

If there were a better model, then great. But if not, something is better than nothing, and the cyber killchain is something. The stages (Recon, Weaponisation, Delivery, etc.) are understandable, relatable, and relatively generic, and I suspect they could kind of map to the individual context. Maybe they wouldn’t map well, but it would be a start. It would be something.

It seems to me like the current offering to concerned individuals doesn’t amount to much more than a laundry list of horror stories (“you know what happened to this person? Well first they revealed their birthday, and then the scammers used that to get their phone number, and then bla bla bla”), and a list of chores (“always update, always backup bla bla bla”).

Why doesn’t that work? Because there is no model, so there is no comprehension. I think that’s why it doesn’t stick and people are left exposed.

And then they are told that they are to blame because they didn’t do all their chores.

Maybe they need to hear more horror stories…


Should Australia join ASEAN?

Former Australian Prime Minister Paul Keating believes that Australia should join ASEAN. This is a call that he has repeated since the surprise victory of Donald Trump. There are concerns about whether or not Australia could join ASEAN – certainly this is something that requires more seriousness than, say, Eurovision participation – but as they say, where there’s a will there’s a way.

So, is there a will to include Australia in the ASEAN community? Political leaders will always have their own individualistic take on things, but I’m curious about how the common man feels about this fuzzy question. Personally I lack the means to carry out “rigorous polling” (whatever that even means anymore), but I do have a Twitter account and a few hundred bucks to spare. So with that, I ran a small experiment to gauge public sentiment for Australian membership of ASEAN.

Here’s what I found.



A high-level evaluation of the OpenBSM audit system in OS X

One of the BSD legacy security mechanisms included with OS X is OpenBSM, an audit mechanism. In contrast, TrustedBSD (also included with OS X) is a mandatory access control mechanism which can block system calls based on some policy. Audit doesn’t block anything; it only reports what happened. Also unlike TrustedBSD, the audit implementation is quite a full-stack offering, including user-space tools to control the kernel audit system and even to parse audit records.
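
As a small illustration of that user-space side, here is a minimal trail dumper built on libbsm’s documented record/token API (au_read_rec, au_fetch_tok, au_print_tok) – roughly what the bundled praudit(1) tool does. A sketch with thin error handling, not production code:

```c
// Compile on OS X with: cc -o dumptrail dumptrail.c -lbsm
#include <stdio.h>
#include <stdlib.h>
#include <bsm/libbsm.h>

int main(int argc, char* argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <audit-trail-file>\n", argv[0]);
        return 1;
    }
    FILE* fp = fopen(argv[1], "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }

    u_char* record;
    int reclen;
    // au_read_rec() pulls one complete audit record off the trail at a time.
    while ((reclen = au_read_rec(fp, &record)) > 0) {
        tokenstr_t tok;
        int pos = 0;
        // Each record is a sequence of tokens (header, subject, return, ...).
        while (pos < reclen &&
               au_fetch_tok(&tok, record + pos, reclen - pos) == 0) {
            au_print_tok(stdout, &tok, ",", 0, 0);  // delimiter, raw, short form
            printf("\n");
            pos += tok.len;
        }
        free(record);
    }
    fclose(fp);
    return 0;
}
```

Point it (as root) at a trail file under /var/audit to see the records in human-readable form.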

What does this system look like on a high-level, and how does it perform? Read on…



Disabling revoked cert checking for malware research on OS X

Malware research involves running malware samples, typically in VMs. Because developer codesigning certificates are trivial to acquire in the Apple ecosystem, OS X malware samples are very often code signed. When malware is discovered, Apple can and often does revoke the associated cert. While this is a good thing for end-users, it can be a hindrance for malware research.

[Screenshot: “This app has a revoked certificate”]

Fortunately, it is easy enough to disable this security feature.

