The pcre parser in the dfa backend is not correctly converting escaped
hex string like
\0x0d
This is the minimal patch to fix, and we should investigate just using
the C/C++ conversion routines here.
I also I nominated for the 2.7 series.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@gmail.com>
The removal of deny information is a one way operation, that can result
in a smaller dfa, but also results in a dfa that should not be used in
future operations because the deny rules from the precomputed dfa would
not get applied.
For now default filtering out of deny information to off, as it takes
extra time and seldom results in further state reduction.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Previously permission information was thrown away early and permissions
where packed to their CHFA form at the start of DFA construction. Because
of this permissions hashing to setup the initial DFA partitions was
required as x transition conflicts, etc. could not be resolved.
Move the mapping of permissions to CHFA construction, and track the full
permission set through DFA construction. This allows removal of the
perm_hashing hack, which prevented a full minimization from happening
in some DFAs. It also could result in x conflicts not being correctly
detected, and deny rules not being fully applied in some situations.
Eg.
pre full minimization
Created dfa: states 33451
Minimized dfa: final partitions 17033
with full minimization
Created dfa: states 33451
Minimized dfa: final partitions 9550
Dfa minimization no states removed: partitions 9550
The tracking of deny rules through to the completed DFA construction creates
a new class of states. That is states that are marked as being accepting
(carry permission information) but infact are non-accepting as they
only carry deny information. We add a second minimization pass where such
states have their permission information cleared and are thus moved into the
non-accepting partion.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Delay the packing of audit and quiet permissions until chfa construction,
and track deny and quiet perms during DFA construction, so that we will
be able to do full minimization. Also delay the packing of audit and
Signed-off-by: John Johansen <john.johansen@canonical.com>
Currently hfa::match calls hfa::match_len to do matching. However this
requires walking the input string twice. Instead provide a match routine
for input that is supposed to terminate at a given input character.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-By: Steve Beattie <sbeattie@ubuntu.com>
Add the ability to match strings directly from the hfa instead of needing
to build a cfha.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-By: Steve Beattie <sbeattie@ubuntu.com>
instead of a NodeSet.
We need to store sets of Nodes, to compute the dfa but the C++ set is
not the most efficient way to do this as, it has a has a lot of overhead
just to store a single pointer.
Instead we can use an array of tightly packed pointers + a some header
information. We can do this because once the Set is finalized it will
not change, we just need to be able to reference and compare to it.
We don't use C++ Vectors as they have more overhead than a plain array
and we don't need their additional functionality.
We only replace the use of hashedNodeSets for non-accepting states as
these sets are only used in the dfa construction, and dominate the memory
usage. The accepting states still may need to be modified during
minimization and there are only a small number of entries (20-30), so
it does not make sense to convert them.
Also introduce a NodeVec cache that serves the same purpose as the NodeSet
cache that was introduced earlier.
This is not abstracted this out as nicely as might be desired but avoiding
the use of a custom iterator and directly iterating on the Node array
allows for a small performance gain, on larger sets.
This patch reduces the amount of heap memory used by dfa creation by about
4x - overhead. So for small dfas the savings is only 2-3x but on larger
dfas the savings become more and more pronounced.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
non-accepting, and have the proto-state use them.
To reduce memory overhead each set gains its own "cache" that make sure
there is only a single instance of each NodeSet generated. And since
we have a cache abstraction, move relavent stats into it.
Also refactor code slightly to make caches and work_queue etc, DFA member
variables instead of passing them as parameters.
The split + caching results in a small reduction in memory use as the
cost of ProtoState + Caching is less than the redundancy that is eliminated.
However this results in a small decrease in performance.
Sorry I know this really should have been split into multiple patches
but the patch evolved and I got lazy and decided to just not bother
splitting it.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
It is the functional equivalent of ProtoState. We do this to provide a
new level of abstraction that ProtoState can leverage, when the node types
are split.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Create a new ProtoState class that will encapsulate the split, but for
this patch it will just contain what was done previously with NodeSet
Signed-off-by: John Johansen <john.johansen@canonical.com>
is done to be clear what TransitionTable is, as we will then add matching
capabilities. Renaming the files is just to make them consistent with
the class in the file.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Allow dumping out which states where dropped during partition minimization
and which state became the partitions representative state.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-By: Steve Beattie <sbeattie@ubuntu.com>
The dfa graph dump was broken by previous dfa cleanups so that the graph
transition target is the output of a pointer instead of the dfa state
number.
Signed-off-by: John Johansen <john.johansen@canonical.com>
32bit arch, due to size_t objects being passed to fprintf with format
strings expecting longs. It does this by adjusting the fprintf rules
to expect size_t objects.
the default compilation rules when compiling C++ files, so that things
like CFLAGS et al will be honored. Without this, doing 'make DEBUG=y'
in the parser/ tree will not have its added -pg flag honored, breaking
profiling of the parser.
Split hfa into hfa and compressed_hfa files. The hfa portion focuses on
creating an manipulating hfas, while compressed_hfa is used for creating
compressed hfas that can be used/reused at run time with much less memory
usage than the full blown hfa.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-By: Steve Beattie <sbeattie@ubuntu.com>
Split out the aare_rule bits that encapsulate the convertion of apparmor
rules into the final compressed dfa.
This patch will not compile because of the it needs hfa to export an interface
but hfa is going to be split so just delay until hfa and transtable are
split and they can each export their own interface.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-By: Steve Beattie <sbeattie@ubuntu.com>
Start of splitting regexp.y into logical components instead of the mess
it is today. Split out the expr-tree and parsing components from regexp.y
int expr-tree.x and parse.y and since regexp.y no longer does parsing
rename it to hfa.cc
Some code cleanups snuck their way into this patch and since I am to
lazy to redo it, I have left them in.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-By: Steve Beattie <sbeattie@ubuntu.com>
Output a better failure message when a conflict of x permissions cause
policy compilation to fail. We don't have enough information available
to output which rules during the dfa compilation so just improve the
message to let people know that it means there are conflicting x modifiers
in the rules.
Signed-off-by: John Johansen <john.johansen@canonical.com>
The is_merged_x_consistend macro was incorrect in that is tested for
USER_EXEC_TYPE to determine if there was an x transition. This fails
for unconfined execs so an unconfined exec would not correctly conflict
with another exec type.
The dfa match flag table for xtransitions was not large enough and not
indexed properly for pux, and cux transitions. The index calculation did
not take into account the pux flag so that pux and px aliased to the same
location and cux and cx aliased to the same location.
This would result in the first rule being processed defining what the
transition type was for all following rules of the type following. So
if a px transition was processed first all pux, transitions in the profile
would be treated pux.
Signed-off-by: John Johansen <john.johansen@canonical.com>
The dfa engine uses the defines from immunix.h for permission conflict
checking, so make the build depend on it.
Signed-off-by: John Johansen <john.johansen@canonical.com>
During some of the dfa cleanups, the checks for conflicting xtransition
was removed. This adds the conflict checking back in and makes it part
of dfa creation.
Signed-off-by: John Johansen <john.johansen@canonical.com>
The other changes have made it so that using a macro really isn't justified
so rework the code to get rid of the hiddeous update_for_nodes macro.
Signed-off-by: John Johansen <john.johansen@canonical.com>
With the addition of the nodes field to the state we can make the work
queue, be based off of the state instead of the node, and avoid doing
the node to map lookup to get back to the state.
This means that the NodeMap is now only used for duplicate elimination.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Factoring the updating of the state transitions doesn't save on any code
but it provides a nice logical seperation and makes the dfa work_queue
loop and the updating of the state transitions easier to understand as
units.
Signed-off-by: John Johansen <john.johansen@canonical.com>
The match_count variable is a sum of the number of duplicates node sets
that have been encountered and discarded. Rename it to better reflect what
it is doing.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Embedding the nodes are part of the state gives fast back reference from
the state to the nodes that created it. This is useful for the state to
nodes mapping dump as it lets us output the states in order. It will also
let us avoid certain nodemap lookup in the future.
Overlay the nodes field (used only in dfa construction) with the partition
field which is only used during dfa minimization to avoid making the state
any larger.
Signed-off-by: John Johansen <john.johansen@canonical.com>
commits were made (as well as a few other minor warnings elsewhere).
The Makefile change is to avoid passing -Wstrict-prototypes and
-Wnested-externs to the C++ compiler, which the compiler yells about and
then ignores.
Since we compile with -Wmissing-field-initializers I dropped the
unreferenced zero-width fields in the header structs, and then explicitly
initialized the remaining fields.
I tagged several unused function parameters to silence those warnings.
And finally, I dropped the unused filter_escapes() too.
Embedding the the partition mapping into the State structure significantly
speeds up dfa minimization, by converting rbtree finds to straight direct
references when checking for same mappings.
The overall time improvement is small but it can half the time spent in
minimization.