Anyone building endpoint security software on Windows may need some kind of firewall-like functionality, and if so they will probably encounter the Windows Filtering Platform (WFP) at some point. I’ll let you search the interwebs for an introduction to or overview of WFP; in this post I want to highlight an issue that isn’t well documented.
But first, a brief explanation of how new connections are typically handled with WFP.
Flow IDs accompany callbacks so that stateful connections can be uniquely identified. When a connection is started, it arrives in a WFP callback with a layer ID of FWPM_LAYER_ALE_AUTH_CONNECT_V4 (or V6), which is typically your cue to create a new context object and associate it with the corresponding flow ID by calling FwpsFlowAssociateContext. Subsequent activity on the same connection then produces callbacks with different layer IDs, according to the type of activity, each carrying the connection’s flow ID and the associated context object. So far so good. We have a one-to-one mapping between flow IDs and context objects.
We may summarise the above as the following rule of thumb for how to typically deal with new connections: when we get a callback for FWPM_LAYER_ALE_AUTH_CONNECT_V4 (or V6) with some flow ID, we should create a context object and use FwpsFlowAssociateContext to associate the context object with that flow ID. With that, all is well in the world… right?
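To make the rule of thumb concrete, here is a minimal user-mode sketch of it. It is a toy model, not driver code: `flow_context`, `associate_context`, `lookup_context` and `on_auth_connect` are hypothetical names of my own, and `associate_context` merely stands in for FwpsFlowAssociateContext by recording the pairing in a small table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy user-mode model of the rule of thumb. None of these names are WFP APIs. */
#define MAX_FLOWS 16

typedef struct {
    uint64_t flow_id;     /* flow this context was created for */
    uint64_t bytes_seen;  /* example of per-connection state */
} flow_context;

typedef struct {
    uint64_t      flow_id;
    flow_context *context;  /* NULL when the slot is free */
} flow_slot;

static flow_slot g_flows[MAX_FLOWS];

/* Stand-in for FwpsFlowAssociateContext: bind a context to a flow ID. */
int associate_context(uint64_t flow_id, flow_context *ctx)
{
    assert(ctx != NULL);
    for (size_t i = 0; i < MAX_FLOWS; i++) {
        if (g_flows[i].context == NULL) {
            g_flows[i].flow_id = flow_id;
            g_flows[i].context = ctx;
            return 0;
        }
    }
    return -1;  /* table full */
}

/* What later callbacks on the same flow would do: find the flow's context. */
flow_context *lookup_context(uint64_t flow_id)
{
    for (size_t i = 0; i < MAX_FLOWS; i++) {
        if (g_flows[i].context != NULL && g_flows[i].flow_id == flow_id) {
            return g_flows[i].context;
        }
    }
    return NULL;
}

/* Rule of thumb: every ALE_AUTH_CONNECT callback creates and associates a context. */
flow_context *on_auth_connect(uint64_t flow_id)
{
    flow_context *ctx = calloc(1, sizeof *ctx);
    if (ctx == NULL) {
        return NULL;
    }
    ctx->flow_id = flow_id;
    if (associate_context(flow_id, ctx) != 0) {
        free(ctx);
        return NULL;
    }
    return ctx;
}
```

Calling `on_auth_connect(101)` and then `lookup_context(101)` hands back the same context object, which is the one-to-one mapping described above.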
The assumption made by the above rule of thumb is that, when we get a FWPM_LAYER_ALE_AUTH_CONNECT_V4 (or V6), we are going to get a brand new flow ID along with it (neglecting the case of reusing old flow IDs for terminated flows). This seems a reasonable assumption, and it almost always holds. There is, however, at least one exception. That exception is ALE reauthorization.
Learning more about ALE reauthorization is left as an exercise for the reader, but the upshot is that it has the annoying effect of producing FWPM_LAYER_ALE_AUTH_CONNECT_V4 (or V6) callbacks with repeat flow IDs that do not represent new connections at all. The only way to recognise this occurrence is to check the filtering condition flags. This creates a problem because, conceptually, we now have one flow ID associated with multiple context objects. The practical reality is that all but the most recently associated of the context objects are left dangling.
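The dangling contexts are easy to reproduce in a toy model. The sketch below is user-mode C with hypothetical names (`naive_on_auth_connect`, `g_slot`, `g_created`, `g_orphaned`), and it assumes, as described above, that a newer association simply displaces the older one. It does not model what the real FwpsFlowAssociateContext returns when a flow already has a context, so treat it as an illustration of the bookkeeping problem only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model: one association slot per flow ID. Nothing here calls WFP. */
#define MAX_FLOWS 8

typedef struct {
    uint64_t flow_id;
} flow_context;

static flow_context *g_slot[MAX_FLOWS];  /* indexed directly by flow ID for brevity */
static unsigned g_created;               /* contexts allocated so far */
static unsigned g_orphaned;              /* contexts displaced from their slot */

/* Naive handler: always allocate and associate, ignoring the condition flags. */
flow_context *naive_on_auth_connect(uint64_t flow_id)
{
    assert(flow_id < MAX_FLOWS);  /* simplification: small flow IDs only */

    flow_context *ctx = calloc(1, sizeof *ctx);
    if (ctx == NULL) {
        return NULL;
    }
    ctx->flow_id = flow_id;
    g_created++;

    if (g_slot[flow_id] != NULL) {
        /* The earlier context can no longer be reached through the flow. */
        g_orphaned++;
    }
    g_slot[flow_id] = ctx;
    return ctx;
}
```

Feeding the same flow ID through `naive_on_auth_connect` three times, as a connect followed by two reauthorizations would, leaves `g_created` at 3 and `g_orphaned` at 2: only the last context is still attached to the flow.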
I should also point out that I’m not entirely sure what external stimulus is required to trigger ALE reauthorization: I observed it happening somewhat intermittently on Windows 7 running in VMware Fusion while doing nothing more than streaming music.
The impact of failing to distinguish ALE reauthorization from a “real” new connection is hard to anticipate definitively because it depends on the implementation of the code using the framework. Perhaps your state tracking will be rendered incorrect because some counters will be wrong. It’s extremely likely that you will be leaking context objects.
Anyway, the consequence of this annoying behaviour is that we should tweak our rule of thumb for how to handle new connections: when we get a callback for FWPM_LAYER_ALE_AUTH_CONNECT_V4 (or V6) with some flow ID, and the FWP_CONDITION_FLAG_IS_REAUTHORIZE flag is not set, we should create a context object and use FwpsFlowAssociateContext to associate it with that flow ID.
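Here is a rough sketch of the tweaked rule, again as a user-mode toy rather than real callout code. `SIM_FLAG_IS_REAUTHORIZE` is a hypothetical stand-in for FWP_CONDITION_FLAG_IS_REAUTHORIZE with an arbitrary bit value, and `on_auth_connect`, `find_context` and `g_allocations` are names invented for the example. In a real callout I would expect the flags to come from the layer's flags field in the incoming fixed values (FWPS_FIELD_ALE_AUTH_CONNECT_V4_FLAGS for the V4 layer), but check the headers rather than trusting this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for FWP_CONDITION_FLAG_IS_REAUTHORIZE. The bit value is arbitrary;
   a real driver should use the constant from the SDK headers. */
#define SIM_FLAG_IS_REAUTHORIZE 0x1u

#define MAX_FLOWS 16

typedef struct {
    uint64_t flow_id;
    unsigned reauth_count;  /* how often this flow has been reauthorized */
} flow_context;

static flow_context *g_contexts[MAX_FLOWS];
static unsigned g_allocations;  /* contexts created so far */

/* Find the context already tracked for a flow, if any. */
flow_context *find_context(uint64_t flow_id)
{
    for (size_t i = 0; i < MAX_FLOWS; i++) {
        if (g_contexts[i] != NULL && g_contexts[i]->flow_id == flow_id) {
            return g_contexts[i];
        }
    }
    return NULL;
}

/* Revised rule: only a callback without the reauthorize flag creates a context. */
flow_context *on_auth_connect(uint64_t flow_id, uint32_t condition_flags)
{
    if (condition_flags & SIM_FLAG_IS_REAUTHORIZE) {
        /* Same connection being re-evaluated: reuse what we already track. */
        flow_context *existing = find_context(flow_id);
        if (existing != NULL) {
            existing->reauth_count++;
        }
        return existing;
    }

    /* A genuinely new flow should not already be tracked. */
    assert(find_context(flow_id) == NULL);

    for (size_t i = 0; i < MAX_FLOWS; i++) {
        if (g_contexts[i] == NULL) {
            flow_context *ctx = calloc(1, sizeof *ctx);
            if (ctx == NULL) {
                return NULL;
            }
            ctx->flow_id = flow_id;
            g_contexts[i] = ctx;
            g_allocations++;
            return ctx;
        }
    }
    return NULL;  /* table full */
}
```

With this version, a connect on flow 7 followed by a reauthorization on flow 7 returns the same context both times and allocates only once. Whether a reauthorization should update the existing context, as the `reauth_count` field does here, or be ignored entirely is a design choice that depends on what your context tracks.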