The match_count variable is a sum of the number of duplicates node sets
that have been encountered and discarded. Rename it to better reflect what
it is doing.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Embedding the nodes are part of the state gives fast back reference from
the state to the nodes that created it. This is useful for the state to
nodes mapping dump as it lets us output the states in order. It will also
let us avoid certain nodemap lookup in the future.
Overlay the nodes field (used only in dfa construction) with the partition
field which is only used during dfa minimization to avoid making the state
any larger.
Signed-off-by: John Johansen <john.johansen@canonical.com>
commits were made (as well as a few other minor warnings elsewhere).
The Makefile change is to avoid passing -Wstrict-prototypes and
-Wnested-externs to the C++ compiler, which the compiler yells about and
then ignores.
Since we compile with -Wmissing-field-initializers I dropped the
unreferenced zero-width fields in the header structs, and then explicitly
initialized the remaining fields.
I tagged several unused function parameters to silence those warnings.
And finally, I dropped the unused filter_escapes() too.
Embedding the the partition mapping into the State structure significantly
speeds up dfa minimization, by converting rbtree finds to straight direct
references when checking for same mappings.
The overall time improvement is small but it can half the time spent in
minimization.
The nodemap.size() increases by one with each node added, every time we
add a state we label it so this provides the proper labeling without needing
a separate variable.
add short options to turn on all stats, and all progress indicators,
also allow adding "no-" prefix to dump options to allow subtracting
individual options when short options are used.
eg.
-D stats -D no-expr-simplify
Move the -O and -D options into tables, that keep the option and its
description. This will help keep the options consistent and the description
up to date, as all information is now in one place.
Previously the options, and descriptions kept getting out of sync as all
relavent parts were spread out.
help reduce peak memory usage in some cases.
Also disbale remove_unreachable, as the current dfa code isn't generating
unreachable states, and minimization removes any states that are connected
but redundant.
hold permission information. We currently keep them in a table with a
refcount so that they don't go away, until we delete the table.
We can simulate this by getting rid of the refcount, and making dup and release
virtual, and overriding it for the special accept nodes.
improves minimization performance, it can slow down total creation time and
result in larger compressed dfas.
This is because it results in the dfa not being completely minimized which
with the current O(n2) dfa table compression algorithm can result in slower
compressed dfa generation.
first hash does hashing on state just state transitions, which always results
in a performance improvement.
The second does hashing based off of accept permissions, which can create
more initial states but can result in not being able to achieve a true
minimum dfa. This can also lead to slowing down total dfa creation because
while minimization, compression can take longer if the dfa isn't completely
minimized.
permission hashing is currently required, as minimization does not accumulate
redundant Node permissions.
memory than just using the NodeSet size to short circuit comparison but it
improves on the case where compared sets have the same size. It is possible
that this will slow down small dfa generation slightly but the trade off for
large dfa's (which are the slow ones to generate) is worth it.
This results in another performance bump over using the NodeSize is NodeSet
comparison, and the amount of improvement increases with larger dfas
of pointers when it isn't necessary. This results in a nice little
performance increase in dfa creation.
This is more of a proof of concept patch, and is replaced by the next
patch which does better short circuiting via hashing
the large side, and I experimented with different ways to split this up but in
the end, anything I could do would result in a series of dependent patches
that would require all of them to be applied to get meaningful functional
changes.
The patch structural reworks the dfa so that
- there is a new State class, it takes the place of sets of nodes in the
dfa, and allows storing state information within the state
- removes the dfa transition table, which mapped sets of nodes to a
transition table, by moving the transition into the new state class
- computes dfa state permissions once (stored in the state)
- expression tree nodes are independent from a created dfa. This allows
computed expression trees, and sets of Nodes (used as protostates when
computing the dfa). To be managed independent of the dfa life time.
This will allow reducing the amount of memory used, in the future,
and will also allow separating the expression tree logic out into
its own file.
The patch has some effect on reducing peak memory usage, and computation
time. The actual amount of reduction is dependent on the number of states
in the dfa with larger saving being achieved on larger dfas. Eg. for
the test evince profile I was using it makes the parser about 7% faster with a
peak memory usage about 12% less.
This patch changes the initial partition hashing of minimization resulting
in slightly smaller dfas.
of a mix of symlinks to non-prefixed comands, and "apparmor_" prefixed
commands.
This also refactors the manpage generation slightly since we no longer
need special cases for the manpages, and drops aa-eventd from the default
list of tools to install (it also lacks a manpage).