Commit graph

70 commits

Author SHA1 Message Date
John Johansen
51f443c7b6 Update state progress/stats output to dump the number of accepting
states/partitions that occur in the minimized dfa.
2010-11-09 11:26:50 -08:00
John Johansen
c2601dbd30 Clean up the perm_map as soon as it is no longer needed. Cleaning up the map
before the end of the function reduces the function's peak memory use.
2010-11-09 11:26:18 -08:00
John Johansen
2fb64fa85e When hashing Nodes, ensure that cases.otherwise == NULL is treated the same
as pointing to the nonmatching state.  This mix shouldn't currently
exist, but adding the extra check makes the code more robust.
2010-11-09 11:25:44 -08:00
John Johansen
4e80416a4f Do permission accumulation in dfa minimization. This is necessary if accept
states with different permissions are to ever share a partition.
2010-11-09 11:24:51 -08:00
John Johansen
a949b075b4 The dfa flags are currently a weird mix of positive and negative assertions.
It's cleaner just to have them all assert one way and let the cmd line
options apply them correctly.
2010-11-09 11:23:45 -08:00
John Johansen
36e99af7fb Split dfa minimization hashing into two separately controllable hashes. The
first hash is based on just the state transitions, which always results
in a performance improvement.

The second hashes based on accept permissions, which can create
more initial states but can result in not being able to achieve a true
minimum dfa.  This can also slow down total dfa creation because, while
minimization may be faster, compression can take longer if the dfa isn't
completely minimized.

Permission hashing is currently required, as minimization does not accumulate
redundant Node permissions.
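
As a rough illustration of the split, a minimal sketch (hypothetical names, not
the parser's actual code):

  // Sketch: the two minimization hashes, each behind its own control flag.
  // trans_hash and accept_perms are hypothetical inputs standing in for
  // whatever the state exposes.
  #include <cstdint>

  uint64_t init_partition_key(uint64_t trans_hash, uint32_t accept_perms,
                              bool do_trans_hash, bool do_perm_hash)
  {
      uint64_t h = 5381;
      if (do_trans_hash)              // hashing on state transitions: always a win
          h = h * 33 + trans_hash;
      if (do_perm_hash)               // hashing on accept perms: more initial
          h = h * 33 + accept_perms;  // states, may block a truly minimal dfa
      return h;
  }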
2010-11-09 11:22:54 -08:00
John Johansen
9b99039fdb Convert Nodemap comparison to use a hash value. This uses a little more
memory than just using the NodeSet size to short circuit comparison, but it
improves on the case where compared sets have the same size.  It is possible
that this will slow down small dfa generation slightly, but the trade off for
large dfa's (which are the slow ones to generate) is worth it.

This results in another performance bump over using the NodeSet size in NodeSet
comparison, and the amount of improvement increases with larger dfas.
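
A minimal sketch of the idea, with hypothetical names (the real change applies
to the NodeSets used as proto-states):

  // Sketch: carry a precomputed hash next to each NodeSet so most
  // comparisons never have to walk the sets of pointers.
  #include <cstddef>
  #include <set>

  class Node;
  typedef std::set<Node *> NodeSet;

  struct hashedNodeSet {
      size_t hash;        // computed once when the set is built
      NodeSet *nodes;
  };

  inline bool operator<(const hashedNodeSet &a, const hashedNodeSet &b)
  {
      if (a.hash != b.hash)
          return a.hash < b.hash;      // cheap: differing hashes decide it
      return *a.nodes < *b.nodes;      // rare: full element-wise comparison
  }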
2010-11-09 11:20:08 -08:00
John Johansen
344e11a539 Use set size as part of set comparison, short circuiting the comparison of sets
of pointers when it isn't necessary.  This results in a nice little
performance increase in dfa creation.

This is more of a proof of concept patch, and is replaced by the next
patch, which does better short circuiting via hashing.
2010-11-09 11:18:46 -08:00
John Johansen
ca1d891799 This patch reworks the internal structures used to compute the dfa. It is on
the large side, and I experimented with different ways to split this up, but in
the end anything I could do would result in a series of dependent patches
that would require all of them to be applied to get meaningful functional
changes.

The patch structurally reworks the dfa so that
- there is a new State class, it takes the place of sets of nodes in the
  dfa, and allows storing state information within the state
- removes the dfa transition table, which mapped sets of nodes to a
  transition table, by moving the transition into the new state class
- computes dfa state permissions once (stored in the state)
- expression tree nodes are independent from a created dfa.  This allows
  computed expression trees and sets of Nodes (used as protostates when
  computing the dfa) to be managed independently of the dfa lifetime.
  This will allow reducing the amount of memory used in the future,
  and will also allow separating the expression tree logic out into
  its own file.


The patch has some effect on reducing peak memory usage and computation
time.  The actual amount of reduction is dependent on the number of states
in the dfa, with larger savings being achieved on larger dfas.  Eg. for
the test evince profile I was using, it makes the parser about 7% faster with
peak memory usage about 12% less.

This patch changes the initial partition hashing of minimization resulting
in slightly smaller dfas.
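
Roughly, the new State arrangement can be pictured like this (a sketch with
illustrative member names, not the exact classes):

  // Sketch: a State owns its transitions and its once-computed permissions,
  // instead of the dfa keeping side tables keyed by sets of Nodes.
  #include <cstdint>
  #include <map>
  #include <set>

  class Node;
  typedef std::set<Node *> NodeSet;   // proto-state; only needed while building

  class State {
  public:
      std::map<unsigned char, State *> trans;  // character transitions
      State *otherwise;                        // default transition
      uint32_t accept, audit;                  // permissions, computed once
      NodeSet *nodes;                          // droppable once the dfa is built
  };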
2010-11-09 11:14:55 -08:00
John Johansen
291066dcbd On certain graphs the dfa graph dump output can become messed up, as it isn't
properly handling non-printing characters in the case of single character
output.  Drop the cast to signed character, which messes up the output.
2010-08-17 08:02:27 -07:00
John Johansen
6259edac38 Update and expand comments on regex tree normalization 2010-08-04 10:23:22 -07:00
John Johansen
f0220611aa Epsnodes carry no information beyond the node type. Convert to using
a single static node, which will reduce allocations and peak memory
use slightly.
2010-08-04 09:53:46 -07:00
John Johansen
4be07c3265 This adds a basic debug dump for the conversion of each rule in a profile to its expression
tree.  It is limited in that it doesn't currently handle the permissions of a rule.

The conversion output presents an aare -> pcre conversion followed by 1 or more expression
tree rules, governed by what the rule does.
eg.
  aare: /**   ->   /[^/\x00][^\x00]*
  rule: /[^/\x00][^\x00]*  ->  /[^\0000/]([^\0000])*

eg.
echo "/foo { /** rwlkmix, } " | ./apparmor_parser -QT -D rule-exprs -D expr-tree

aare: /foo   ->   /foo
aare: /**   ->   /[^/\x00][^\x00]*
rule: /[^/\x00][^\x00]*  ->  /[^\0000/]([^\0000])*

rule: /[^/\x00][^\x00]*\x00/[^/].*  ->  /[^\0000/]([^\0000])*\0000/[^/](.)*


DFA: Expression Tree
(/[^\0000/]([^\0000])*(((((((((((((<513>|<2>)|<4>)|<8>)|<16>)|<32>)|<64>)|<8404992>)|<32768>)|<65536>)|<131072>)|<262144>)|<524288>)|<1048576>)|/[^\0000/]([^\0000])*\0000/[^/](.)*((<16>|<32>)|<262144>))


This simple example shows many things
1. The profile name undergoes pcre conversion.  But since no regular expressions were found
   it doesn't generate any expr rules
2. /** is converted into the pcre expression /[^\0000/]([^\0000])*
3. The pcre expression /[^\0000/]([^\0000])* is converted into two rules that are then
   converted into expression trees.

   The reason for this cannot be seen in the output, as it is actually triggered by
   permission separation for the rule.  In this case the link permission is separated
   into what is shown as the second rule: statement.
4. The DFA: Expression Tree dump shows how these rules are combined

You will notice that the rule conversion statement is fairly redundant currently, as it just
shows pcre to expression tree pcre.  This will change when direct aare parsing occurs,
but currently it serves to verify the pcre conversion step.


It is not the prettiest patch, as it's touching some ugly code that is scheduled to be cleaned
up/replaced. Eg. convert_aaregex_to_pcre is going to be replaced with native parse conversion
from an aare straight to the expression tree, and dfaflag passing will become part of the
rule set.
2010-07-23 13:29:35 +02:00
John Johansen
837f47c921 This is the user space fix for launchpad.net/bugs/599450.
It changes the table resizing so that there are always sufficient
high entries in the table, preventing bounds violations from
occurring.

Previously the resize allocation was always based on the character
set range for a state, which could be more or less than actually
required, and packing would waste some space when over allocation
was done.

As a result, this patch in general results in slightly smaller
transition tables even though it enforces the minimum required
padding to avoid bounds violations.
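
The effect can be sketched as follows (hypothetical names; the real change
lives in the transition table packing code):

  // Sketch: when a state's transitions are packed at offset `base`, grow the
  // next/check arrays to at least base + MAX_CHAR + 1 so that indexing
  // base + c can never run past the end of the table.
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  const size_t MAX_CHAR = 255;

  void ensure_padding(std::vector<uint32_t> &next_tab,
                      std::vector<uint32_t> &check_tab, size_t base)
  {
      size_t need = base + MAX_CHAR + 1;   // minimum required high entries
      if (next_tab.size() < need) {
          next_tab.resize(need, 0);
          check_tab.resize(need, 0);
      }
  }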
2010-07-23 04:30:31 +02:00
John Johansen
bfb96638f6 This is a preparatory patch for the fix to launchpad.net/bugs/599450.
It combines the two separate table resize code segments into a single
functionally equivalent segment.  It does not fix the bug.
2010-07-23 04:29:54 +02:00
John Johansen
6453a41a28 Add extra transition table labeling to help with interpretation of the
dump output.
2010-07-23 04:29:29 +02:00
John Johansen
af3476afb9 The templatization of deref_less_than is unnecessary and complicates the code.
Replace it with its non-templatized version.
2010-07-10 17:53:04 -07:00
John Johansen
4f8e01ff36 Expression tree node labeling is used during debugging dumps. Currently the node labels
are computed and stored in a map that is not cleaned up.  This means that the labeling
is retained across different dfas.

Move the labeling into the expr node, as this takes less memory than using a map and
also separates node labeling so it's per dfa instead of global.  In addition this means
the labeling is cleaned up/freed when the expr tree is freed, without any extra work.
2010-07-10 17:52:13 -07:00
John Johansen
d0dcab10f1 Make the transition table dump easier to understand by labeling each entry with its
index.
2010-07-10 17:49:32 -07:00
John Johansen
1004f039ec When creating the dfa the sets firstpos, lastpos, and followpos are computed for
each expression tree node and then used as input to create the dfa states.

Currently they are not being freed until the nodes are destroyed, but the information
is no longer needed once the dfa has been created.  Cleaning them up early reduces
peak memory usage.
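
A minimal sketch of the cleanup (hypothetical member names):

  // Sketch: once the dfa states exist, the per-node position sets are dead
  // weight, so release them early to lower peak memory.
  #include <set>

  class Node {
  public:
      std::set<Node *> firstpos, lastpos, followpos;
      void release_pos_sets()
      {
          firstpos.clear();     // std::set frees its nodes on clear()
          lastpos.clear();
          followpos.clear();
      }
  };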
2010-07-10 17:47:25 -07:00
John Johansen
9efd526f6f Fix memory leak during dfa minimization.
Dfa minimization wasn't deleting the states it eliminated during the
minimization process, and hence leaking memory.
2010-03-13 02:23:23 -08:00
John Johansen
8dd795dec1 Rework the partitioning to take advantage of Partitions now being a list 2010-01-31 23:21:00 -08:00
John Johansen
8bcfa1a32f Move partitions from using sets to lists as this is a better match
for what is being done.
2010-01-31 23:19:54 -08:00
John Johansen
e984b6ff74 Separate the Partition definition for States. This is a small step toward cleaning
up the code.
2010-01-31 23:18:14 -08:00
John Johansen
1179c1a42c Improve partitioning performance slightly by inserting new partitions
immediately after the current partition being considered, instead of
at the back of the partition list.  This does two things: it makes it
more likely the data is in cache, and it also in general results in
more partitions being created in a single pass.
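
In list terms the change is roughly this (a sketch; Partition is just a
placeholder type):

  // Sketch: insert the split-off partition right after the partition being
  // processed instead of appending it to the back of the list.
  #include <list>

  class Partition;
  typedef std::list<Partition *> PartitionList;

  void add_new_partition(PartitionList &parts,
                         PartitionList::iterator current, Partition *fresh)
  {
      PartitionList::iterator pos = current;
      ++pos;                       // std::list::insert inserts before pos,
      parts.insert(pos, fresh);    // i.e. immediately after current
  }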
2010-01-31 23:12:33 -08:00
John Johansen
80c7ee74a2 Speed up transition table compression. This is a basic improvement and
not an algorithmic improvement.  It does the same basic algorithm of
testing until it can insert the data, but instead of only tracking the
first free entry (and recomputing it each pass), it tracks all
free entries, reducing the number of comparisons done as the table
grows in size.

This may actually result in a small loss on small tables, but is a win
for larger tables.
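
The idea, sketched with hypothetical structures (not the parser's actual
packer):

  // Sketch: keep the set of free table slots and test candidate offsets
  // against it, instead of rescanning from the first free entry each pass.
  #include <cstddef>
  #include <cstdint>
  #include <set>

  typedef std::set<size_t> FreeSlots;   // ordered free positions in the table

  // Return a base offset whose needed slots are all free, or SIZE_MAX if the
  // caller should grow the table (growing/claiming slots is not shown).
  size_t find_base(const FreeSlots &free_slots,
                   const std::set<unsigned char> &chars)
  {
      if (chars.empty())
          return 0;
      unsigned char first = *chars.begin();
      for (size_t slot : free_slots) {
          if (slot < first)
              continue;
          size_t base = slot - first;   // candidate derived from a free slot
          bool fits = true;
          for (unsigned char c : chars)
              if (!free_slots.count(base + c)) { fits = false; break; }
          if (fits)
              return base;
      }
      return SIZE_MAX;
  }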
2010-01-27 17:20:13 -08:00
John Johansen
f9906a9584 Update hash calculation
Update the hash calculation to guarantee that states with a different
number of transition entries will be placed in separate partitions.

This will allow for a better character transition based state comparison.
2010-01-20 05:10:38 -08:00
John Johansen
91dd7527d9 Dfa minimization and unreachable state removal
Add basic Hopcroft based dfa minimization.  It currently does a simple
straight state comparison that can be quadratic in time to split partitions.
This is offset however by using hashing to set up the initial partitions so
that the number of states within a partition is relatively few.

The hashing of states for initial partition setup is linear in time.  This
means the closer the initial partition set is to the final set, the closer
the algorithm is to completing in linear time.  The hashing works as
follows:  For each state we know the number of transitions that are not
the default transition.  For each of these we hash the set of letters
it can transition on using a simple djb2 hash algorithm.  This creates
a unique hash based on the number of transitions and the input it can
transition on.  If a state does not have the same hash as another, we know it cannot
be the same, because it either has a different number of transitions
or it transitions on a different set.

To further distinguish states, the number of transitions of each transition's
target state is added into the hash.  This serves to further distinguish
states, as a transition to a state with a different number of transitions
cannot possibly be reduced to an equivalent state.

A further distinction of states is made for accepting states in that
we know each state with a unique set of accept permissions must be in
its own partition to ensure the unique accept permissions are in the
final dfa.
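
The hashing described above looks roughly like this (a sketch with
hypothetical types, not the parser's actual state class):

  // Sketch: djb2 hash over a state's transition characters, with each target
  // state's transition count mixed in; accepting states are additionally
  // keyed by their accept permissions so they land in their own partitions.
  #include <cstdint>
  #include <map>

  struct State {
      std::map<unsigned char, State *> trans;   // non-default transitions
      uint32_t accept;                          // accept permissions
  };

  uint64_t state_hash(const State &s)
  {
      uint64_t h = 5381;                        // djb2: h = h * 33 + c
      for (auto &t : s.trans)
          h = h * 33 + t.first;                 // the inputs it transitions on
      for (auto &t : s.trans)
          h = h * 33 + t.second->trans.size();  // each target's transition count
      return h;
  }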

The unreachable state removal is a basic walk of the dfa from the start
state, marking all states that are reached.  It then sweeps away any state
not reached.  This does not do dead state removal, where a non accepting
state gets into a loop that will never result in an accepting state.
2010-01-20 03:32:34 -08:00
John Johansen
d87145ad23 Update trans table reporting to include some statistics 2010-01-08 05:29:25 -08:00
John Johansen
dce395e7ad Add basic controls for dfa optimization 2010-01-08 04:30:56 -08:00
John Johansen
926b0c72e8 Update the output of transtable creation 2010-01-08 03:18:59 -08:00
John Johansen
4f044e753c Add basic dfa stats and debug dumps for
equivalence classes
expr tree (add stats, update parser switch)
dfa
transition table
2010-01-08 02:17:45 -08:00
John Johansen
17a67d7227 Update parser to allow for multiple debug dump options via -D or --dump.
This will allow turning on and off various debug dumps as needed.
Multiple dump options can be specified as needed by using multiple
options.
  eg. apparmor_parser -D variables
      apparmor_parser -D dfa-tree -D dfa-simple-tree


The help option has also been updated to take an optional argument
to display help about given parameters; currently only dump is supported.

  eg.  apparmor_parser -h       # standard help
       apparmor_parser -h=dump  # dump info about --dump options

Also enable the dfa expression tree dumps.
2010-01-07 16:21:02 -08:00
Kees Cook
da6c9246f5 clear remaining $Id$ tags, since bzr does not support them 2009-11-11 10:44:26 -08:00
John Johansen
aced280818 Make cache warning respect the quiet flag 2009-08-20 23:48:32 +00:00
John Johansen
397ead10af add aare_reset_matchflags() to reset match flags
Signed-Off-By: Kees Cook <kees.cook@canonical.com>
2009-07-24 07:33:09 +00:00
John Johansen
5148942b90 Fix a missing case in the pcre-expression parsing "\\"
Change the globbing conversion to include [^\x00].  This reduces cases of
artificial overlap between globbing rules and link rules.  Link rules
are encoded to use a \0 char to separate the 2 match parts of the rule.

Before this fix a glob * or ** could match against the \0 separator,
resulting in the generation of dfa states for that overlap.  This of course
can never happen, as \0 is not a valid path name character.

In one example stress policy when adding the rule
  owner /** rwl,
this change made the difference between having a dfa with 55152 states
and one with 30019
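
For illustration (separator shown as \0; the exact internal encoding of link
rules is not spelled out here): a link rule is matched against a single string
with a \0 between its two parts, roughly "/path/a\0/path/b".  With the
globbing conversion now excluding \x00 (eg. /** -> /[^/\x00][^\x00]*), a *
or ** can no longer run across the separator, so the artificial overlap
states are never generated.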
2008-12-04 10:44:02 +00:00
John Johansen
037d7b5a57 Clean up the tree simplification code, and make the following improvements
- disable char, charset merging.  This can actually hamper optimization
  in some cases and needs special cases added to the factoring code.

  The charset code is merged off into its own routines that can be
  reenabled at a later time.

- fix a couple bugs in tree simplification that would cause early
  exit before the tree had even reached a local minimum

  In particular the t != c portion of simplify_tree would cause
  the loop to exit early if it didn't change, even though other modifications
  had been made.

- remove the extra epsnode that was getting added to the created tree

- optimize the forward factor alt loop so that it will find all left
  factor matches down the alt subtree without having to loop and recompare
  against nodes that were already checked

These changes result in small improvements in most cases, but in some
policies the changes result in very large wins.  The early bailout of
optimizations was causing 2.5x as many dfa states in one particular
stress test policy.
2008-12-03 03:47:31 +00:00
John Johansen
b017899f12 Fix a bug in tree normalization, where it could get stuck in an infinite loop
when doing an Epsnode move while cating or alting two epsnodes.
2008-11-20 16:19:51 +00:00
John Johansen
0491e8d707 Add char node, and char node set merging. This does not have a substantive
impact on performance but makes tree debugging nicer.
2008-11-20 13:23:13 +00:00
John Johansen
c0533b390b Reintroduce calling back into tree simplification when any modifications have
been made but only from the top level.  This allows us to get the
optimizations that were missed, while not causing the massive recursive call
explosion we had before.
2008-11-20 13:21:23 +00:00
John Johansen
1855fde331 Reduce the use of simplify recursion; repeating the recursion for single
changes is a waste, especially as we get to larger subtrees.

Unfortunately this also means that a fair bit of optimization is lost.
2008-11-20 13:18:30 +00:00
John Johansen
91eb71e9fa Improve tree normalization
- reduce the amount it is called, and the amount of recursion it does
- fix a bug that would prevent trees from being fully normalized
2008-11-19 16:54:26 +00:00
John Johansen
5b97455878 Improve dfa generation.
Apply tree factoring and simplification techniques to reduce the number of
states used in computing the dfa.  This can have an exponential impact
on both space and time for dfa generation.
2008-11-07 13:00:05 +00:00
John Johansen
6b6c57887c Reverting previous commit. 2008-11-07 01:31:19 +00:00
John Johansen
1b0dd32cca fix race condition between boot.apparmor and boot.cleanup bnc#426149 2008-11-07 01:19:55 +00:00
John Johansen
b2f4863231 Fix to stop leaking the dfa ruleset. On large policies containing lots of
hats this will result in a marked improvement in memory usage.
2008-06-08 08:56:37 +00:00
John Johansen
db34aac811 Basis for named transitions 2008-04-16 04:44:21 +00:00
John Johansen
4dd0e8ead8 allow for ptrace rules 2008-04-09 09:04:08 +00:00
John Johansen
c460dcc52f update change_hats rules to generate rules for all hats 2008-04-06 18:52:47 +00:00