You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We should document the pflang compiler. In the meantime here's some raw material, that was part of a status mail.
the architecture goes like this:
parse -> lower -> optimize -> codegen
Some notes follow on these phases and their status.
Parsing
The parser is complete, to the best of my knowledge, and validating.
The one deviation is that keywords are always required; the
documentation says:
If an identifier is given without a keyword, the most recent keyword
is assumed. For example,
not host vs and ace
is short for
not host vs and host ace
which should not be confused with
not ( host vs or ace )
I can fix this if it's important. As it is, such a construction fails
to parse.
The documentation also says:
Note that explicit "and" tokens, not juxtaposition, are now required
for concatenation.
However this is not the case in practice; "tcp port 80" is a common
filter, but it actually parses as "tcp and port 80". I don't see them
ever being able to change this.
The result of parsing is a Lisp-like AST of the following pseudo-BNF
form, where {} indicates a Lua table in the table constructor syntax:
Unary primitives have restrictions on the kinds of args they take and
validate them at parse-time. AddressOp is the way that the
left-hand-side (LHS) of the expression "ip[0:2] = 0" parses, as:
{ '=', { '[ip]', 0, 2 }, 0 }
Lowering
The lowering phase is a macro-expansion in which all primitives are
expanded out to packet access and arithmetic comparisons, and all packet
access is transformed to byte access. "and" and "or" are expanded out
to the new form:
Three new logical expressions are available: { 'true' } and { 'false' },
with their usual meanings, and { 'fail' }, which immediately rejects the
packet. 'fail' is necessary because the pcap/bpf engine has the
interesting characteristic that packet access like "ip[0]" actually
immediately aborts the filter if the packet is not an ip packet, or if 0
does not address a byte/short/long within the packet.
Here is an example lowering pass, for the "port" operator:
Here '[ip_]' and '[ip6_]' are special addressing operators that indicate
the payload of ip/ipv6 packets, regardless of the protocol. The
has_foo_protocol functions are helper functions that return LogicalExpr
values.
The result of a primitive lowering is re-expanded. In particular the
packet accessors are expanded out to a new form:
{ '[]', ArithExpr, 1 | 2 | 4 }
The '[]' form indicates host-order byte access within the packet. For
short/long access, the result is wrapped in the new unary arithmetic
expressions:
{ 'htons', ArithExpr }
{ 'htonl', ArithExpr }
A later optimization phase tries to remove these (and mostly succeeds).
Because '[]' is byte access from the beginning of the packet, the
lowering phase depends on the link-level encapsulation; i.e. ethernet,
ip, etc. Currently only ethernet encapsulation is implemented, but
extending that shouldn't be too bad.
The lowering phase is currently unfinished. Only a small subset of
primitives have lowering operators. However finishing this is not as
difficult as it might seem, because we have the libpcap BPF compiler,
and the BPF-to-Lua compiler. To implement a new lowering operator, you
just have to look at the generated Lua code for the BPF filter in
question, and then do something similar in the lowering code.
Optimizing
The optimizer is great!!! At least, I think so ;) It does:
if (if A B C) (if A D E) (if A F G) -> if A (if B D F) (if C E G)
These happen if the copied expression is "simple" (i.e. just true,
false, or fail):
if (if A B C) (if A D E) F -> if A (if B D F) (if C E F)
if (if A B C) D (if A E F) -> if A (if B D E) (if C D F)
Global conditional folding:
if A (if A B C) D -> if A B D
if ( ... (if A B fail) ... ) ( ... (if A C D) ... ) ...
-> if ( ... (if A B fail) ... ) ( ... C ... ) ...
Global range propagation on packet access:
if (= ([] A 1) 13) (if (< ([] A 1) 14) B C) ...
-> if (= ([] A 1) 13) C ...
htonl / htons distribution (where possible and desirable):
(= (htons ([] A 2)) 13)
-> (= (htons (htons ([] A 2))) (htons 13))
-> (= ([] A 2) (htons 13))
Length assertion hoisting: for each subexpression, hoist constant
length checks that must be true as far up the tree as possible, and
elide duplicate length checks.
Length assertion and htonl distribution are interesting cases because
the BPF/pcap semantic model does not favor reasoning on the packet
length or byte order. Every e.g. ip[0] expression requires a runtime
check on the packet size, to provide safety in the kernel. However
since we control the compiler from pcap, we can guarantee that the []
accesses that we residualize are in range, by explicitly representing
length checks.
Likewise access to shorts and longs are specified in the pcap language
as having an implicit htons/htonl, but by representing them explicitly
in our IL we can optimize them out in most cases.
Code generation
The code generator is quite simple, since the lowerer and optimizer
already expanded out all of the domain knowledge. However it does do
one key optimization: common subexpression elimination. We can't do CSE
at optimization time because our IL doesn't have names. However we can
do it just fine at codegen time.
The result is, for the expression "tcp port 5555":
return function(P,length)
if not (24 <= length) then return false end
local v1 = ffi.cast("uint16_t_", P+12)[0]
if not (v1 == 8) then goto L2 end
do
local v2 = P[23]
if not (v2 == 6) then return false end
do
local v3 = ffi.cast("uint16_t_", P+20)[0]
local v4 = bit.band(v3,65311)
if not (v4 == 0) then return false end
local v5 = P[14]
local v6 = bit.band(v5,15)
local v7 = bit.lshift(v6,2)
local v8 = v7+16
if not (v8 <= length) then return false end
local v9 = v7+14
local v10 = ffi.cast("uint16_t_", P+v9)[0]
if v10 == 45845 then return true end
do
local v11 = v7+18
if not (v11 <= length) then return false end
local v12 = ffi.cast("uint16_t_", P+v8)[0]
do return v12 == 45845 end
end
end
end
::L2::
do
if not (56 <= length) then return false end
if not (v1 == 56710) then return false end
do
local v13 = P[20]
if v13 == 6 then goto L5 end
do
if not (v13 == 44) then return false end
do
local v14 = P[54]
if not (v14 == 6) then return false end
end
end
end
::L5::
do
if not (v1 == 56710) then return false end
local v15 = P[20]
if v15 == 6 then goto L9 end
do
if not (v15 == 44) then return false end
do
local v16 = P[54]
if v16 == 6 then goto L9 end
do
if not (v16 == 6) then return false end
end
end
end
::L9::
do
local v17 = ffi.cast("uint16_t_", P+54)[0]
if v17 == 45845 then return true end
do
if not (58 <= length) then return false end
local v18 = ffi.cast("uint16_t_", P+56)[0]
do return v18 == 45845 end
end
end
end
end
end
Note that this is somewhat different from what pcap does; for pcap,
"port 5555" implies ipv4 only, but I see no reason to make this
restriction. Anyway it doesn't impose an overhead on the trace, as the
ipv4/v6 difference is just one branch that was going to be taken anyway,
and the IPv4 code is entirely above L2.
It would be possible to infer a bit more about the length with some
additional optimization passes, but it's not the bottleneck at the
moment. Anyway this code is almost optimal, I think.
Summary
I'll send a following mail about perf and discussing the generated
traces. However I wanted to close with a "future work" list of things
that we might want to work on, feature-wise:
The pcap filter language (I guess we should call it pflang, to give
it a name) errors that you get when compiling a bad expression are
somewhat cryptic and could benefit from source positions. A bit
more time there would make a better user experience.
Currently, symbolic constant strings like icmp-echoreply, etc in
arithmetic expressions are not validated at parse-time. The idea
was to leave them to be expanded out at lower-time, but perhaps
that's not a good idea. In any case it's not yet implemented.
The parser doesn't allow keywords to be elided. Perhaps it should
implement this.
Encapsulations other than EN10MB are not implemented by the lowerer.
We need to implement lowering operators for all primitives.
There is currently an issue in which the results of arithmetic
operations can be negative (due bitops producing signed int32
values), and that would allow pflang filters to access bytes before
0 unsafely. The pflang compiler is currently unsafe in this way.
Probably we need to add range-constricting operators in the lowering
phase around all arithmetic operations, then optimize them out in
the expander.
We currently emit Lua code but it's not unthinkable to emit assembly
instead using DynAsm. The hardest problem there is register
allocation. I'll mention that in my next mail about perf.
OK, so that's the major status.
The text was updated successfully, but these errors were encountered:
We should document the pflang compiler. In the meantime here's some raw material, that was part of a status mail.
the architecture goes like this:
parse -> lower -> optimize -> codegen
Some notes follow on these phases and their status.
Parsing
The parser is complete, to the best of my knowledge, and validating.
The one deviation is that keywords are always required; the
documentation says:
If an identifier is given without a keyword, the most recent keyword
is assumed. For example,
is short for
which should not be confused with
I can fix this if it's important. As it is, such a construction fails
to parse.
The documentation also says:
Note that explicit "and" tokens, not juxtaposition, are now required
for concatenation.
However this is not the case in practice; "tcp port 80" is a common
filter, but it actually parses as "tcp and port 80". I don't see them
ever being able to change this.
The result of parsing is a Lisp-like AST of the following pseudo-BNF
form, where {} indicates a Lua table in the table constructor syntax:
LogicalExpr :=
{ 'and', LogicalExpr, LogicalExpr }
| { 'or', LogicalExpr, LogicalExpr }
| { 'not', LogicalExpr }
| { NullaryPrimitiveOp }
| { UnaryPrimitiveOp, UnaryPrimitiveArg }
| { RelOp, ArithExpr, ArithExpr }
NullaryPrimitiveOp := 'tcp' | 'udp' | ...
UnaryPrimitiveOp := 'dst_host' | 'dst_net' | 'host' | ...
UnaryPrimitiveArg :=
non-negative integer
| string
| { 'ipv4', byte * 4 }
| { 'ipv6', short * 8 }
| { 'ehost', byte * 6 }
| { 'decnet', byte * 2 }
RelOp := '<' | '<=' | '=' | '!=' | '>' | '>='
ArithExpr :=
non-negative integer
| 'len'
| { AddressOp, ArithExpr, 1 | 2 | 4 }
| { ArithBinaryOp, ArithExpr, ArithExpr }
AddressOp := '[ip]' | '[ether]' | ...
ArithBinaryOp := '+' | '-' | ...
Unary primitives have restrictions on the kinds of args they take and
validate them at parse-time. AddressOp is the way that the
left-hand-side (LHS) of the expression "ip[0:2] = 0" parses, as:
{ '=', { '[ip]', 0, 2 }, 0 }
Lowering
The lowering phase is a macro-expansion in which all primitives are
expanded out to packet access and arithmetic comparisons, and all packet
access is transformed to byte access. "and" and "or" are expanded out
to the new form:
IfExpr := { 'if', LogicalExpr, LogicalExpr, LogicalExpr }
Three new logical expressions are available: { 'true' } and { 'false' },
with their usual meanings, and { 'fail' }, which immediately rejects the
packet. 'fail' is necessary because the pcap/bpf engine has the
interesting characteristic that packet access like "ip[0]" actually
immediately aborts the filter if the packet is not an ip packet, or if 0
does not address a byte/short/long within the packet.
Here is an example lowering pass, for the "port" operator:
function(expr)
local port = expr[2]
return { 'if', { 'ip' },
{ 'and',
{ 'or', has_ipv4_protocol(6),
{ 'or', has_ipv4_protocol(17), has_ipv4_protocol(132) } },
{ 'or',
{ '=', { '[ip_]', 0, 2 }, port },
{ '=', { '[ip_]', 2, 2 }, port } } },
{ 'and',
{ 'or', has_ipv6_protocol(6),
{ 'or', has_ipv6_protocol(17), has_ipv6_protocol(132) } },
{ 'or',
{ '=', { '[ip6_]', 0, 2 }, port },
{ '=', { '[ip6_]', 2, 2 }, port } } } }
end
Here '[ip_]' and '[ip6_]' are special addressing operators that indicate
the payload of ip/ipv6 packets, regardless of the protocol. The
has_foo_protocol functions are helper functions that return LogicalExpr
values.
The result of a primitive lowering is re-expanded. In particular the
packet accessors are expanded out to a new form:
{ '[]', ArithExpr, 1 | 2 | 4 }
The '[]' form indicates host-order byte access within the packet. For
short/long access, the result is wrapped in the new unary arithmetic
expressions:
{ 'htons', ArithExpr }
{ 'htonl', ArithExpr }
A later optimization phase tries to remove these (and mostly succeeds).
Because '[]' is byte access from the beginning of the packet, the
lowering phase depends on the link-level encapsulation; i.e. ethernet,
ip, etc. Currently only ethernet encapsulation is implemented, but
extending that shouldn't be too bad.
The lowering phase is currently unfinished. Only a small subset of
primitives have lowering operators. However finishing this is not as
difficult as it might seem, because we have the libpcap BPF compiler,
and the BPF-to-Lua compiler. To implement a new lowering operator, you
just have to look at the generated Lua code for the BPF filter in
question, and then do something similar in the lowering code.
Optimizing
The optimizer is great!!! At least, I think so ;) It does:
Constant folding
Reassociation: (+ (+ EXPR CONST) CONST) -> (+ EXPR + (+ CONST CONST))
Local conditional simplification:
if (if A B C) (if A D E) (if A F G) -> if A (if B D F) (if C E G)
These happen if the copied expression is "simple" (i.e. just true,
false, or fail):
if (if A B C) (if A D E) F -> if A (if B D F) (if C E F)
if (if A B C) D (if A E F) -> if A (if B D E) (if C D F)
Global conditional folding:
if A (if A B C) D -> if A B D
if ( ... (if A B fail) ... ) ( ... (if A C D) ... ) ...
-> if ( ... (if A B fail) ... ) ( ... C ... ) ...
Global range propagation on packet access:
if (= ([] A 1) 13) (if (< ([] A 1) 14) B C) ...
-> if (= ([] A 1) 13) C ...
htonl / htons distribution (where possible and desirable):
(= (htons ([] A 2)) 13)
-> (= (htons (htons ([] A 2))) (htons 13))
-> (= ([] A 2) (htons 13))
Length assertion hoisting: for each subexpression, hoist constant
length checks that must be true as far up the tree as possible, and
elide duplicate length checks.
Length assertion and htonl distribution are interesting cases because
the BPF/pcap semantic model does not favor reasoning on the packet
length or byte order. Every e.g. ip[0] expression requires a runtime
check on the packet size, to provide safety in the kernel. However
since we control the compiler from pcap, we can guarantee that the []
accesses that we residualize are in range, by explicitly representing
length checks.
Likewise access to shorts and longs are specified in the pcap language
as having an implicit htons/htonl, but by representing them explicitly
in our IL we can optimize them out in most cases.
Code generation
The code generator is quite simple, since the lowerer and optimizer
already expanded out all of the domain knowledge. However it does do
one key optimization: common subexpression elimination. We can't do CSE
at optimization time because our IL doesn't have names. However we can
do it just fine at codegen time.
The result is, for the expression "tcp port 5555":
return function(P,length)
if not (24 <= length) then return false end
local v1 = ffi.cast("uint16_t_", P+12)[0]
if not (v1 == 8) then goto L2 end
do
local v2 = P[23]
if not (v2 == 6) then return false end
do
local v3 = ffi.cast("uint16_t_", P+20)[0]
local v4 = bit.band(v3,65311)
if not (v4 == 0) then return false end
local v5 = P[14]
local v6 = bit.band(v5,15)
local v7 = bit.lshift(v6,2)
local v8 = v7+16
if not (v8 <= length) then return false end
local v9 = v7+14
local v10 = ffi.cast("uint16_t_", P+v9)[0]
if v10 == 45845 then return true end
do
local v11 = v7+18
if not (v11 <= length) then return false end
local v12 = ffi.cast("uint16_t_", P+v8)[0]
do return v12 == 45845 end
end
end
end
::L2::
do
if not (56 <= length) then return false end
if not (v1 == 56710) then return false end
do
local v13 = P[20]
if v13 == 6 then goto L5 end
do
if not (v13 == 44) then return false end
do
local v14 = P[54]
if not (v14 == 6) then return false end
end
end
end
::L5::
do
if not (v1 == 56710) then return false end
local v15 = P[20]
if v15 == 6 then goto L9 end
do
if not (v15 == 44) then return false end
do
local v16 = P[54]
if v16 == 6 then goto L9 end
do
if not (v16 == 6) then return false end
end
end
end
::L9::
do
local v17 = ffi.cast("uint16_t_", P+54)[0]
if v17 == 45845 then return true end
do
if not (58 <= length) then return false end
local v18 = ffi.cast("uint16_t_", P+56)[0]
do return v18 == 45845 end
end
end
end
end
end
Note that this is somewhat different from what pcap does; for pcap,
"port 5555" implies ipv4 only, but I see no reason to make this
restriction. Anyway it doesn't impose an overhead on the trace, as the
ipv4/v6 difference is just one branch that was going to be taken anyway,
and the IPv4 code is entirely above L2.
It would be possible to infer a bit more about the length with some
additional optimization passes, but it's not the bottleneck at the
moment. Anyway this code is almost optimal, I think.
Summary
I'll send a following mail about perf and discussing the generated
traces. However I wanted to close with a "future work" list of things
that we might want to work on, feature-wise:
it a name) errors that you get when compiling a bad expression are
somewhat cryptic and could benefit from source positions. A bit
more time there would make a better user experience.
arithmetic expressions are not validated at parse-time. The idea
was to leave them to be expanded out at lower-time, but perhaps
that's not a good idea. In any case it's not yet implemented.
implement this.
operations can be negative (due bitops producing signed int32
values), and that would allow pflang filters to access bytes before
0 unsafely. The pflang compiler is currently unsafe in this way.
Probably we need to add range-constricting operators in the lowering
phase around all arithmetic operations, then optimize them out in
the expander.
instead using DynAsm. The hardest problem there is register
allocation. I'll mention that in my next mail about perf.
OK, so that's the major status.
The text was updated successfully, but these errors were encountered: