apparmor/parser/libapparmor_re
John Johansen 2809060bec parser: limit the number of passes expr tree simplification does
Expr tree simplification makes multiple passes over the expression
tree, using factoring rules and heuristics to try to achieve the
minimal tree, so that dfa construction has fewer nodes to deal with.

Unfortunately, depending on the type of expressions generated, expr
tree simplification can slow some policy compiles down, and worse, it
is currently subject to never terminating on some expressions because
the left and right passes keep undoing each other's work.

Limiting the number of passes that expr tree simplification does
provides most of its benefits (later passes generally have diminishing
returns), reduces the overhead it has on simple policy where it is of
little benefit, and ensures that simplification cannot get stuck in an
infinite loop due to the left and right passes ping-ponging on each
other's factoring.
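
Schematically, the change amounts to bounding what used to be a
run-until-no-change loop; the sketch below only illustrates that idea,
and none of the names in it come from the parser sources:

  /* illustrative sketch only -- not the parser's expr-tree code */
  struct expr_node;                                /* stand-in for the expr tree    */
  int simplify_pass_left(struct expr_node *tree);  /* hypothetical: return nonzero  */
  int simplify_pass_right(struct expr_node *tree); /* if the pass changed the tree  */

  #define MAX_SIMPLIFY_PASSES 2                    /* hypothetical pass limit */

  void simplify_tree(struct expr_node *tree)
  {
      /* instead of iterating to a fixed point (which can ping-pong
       * forever), stop after a fixed number of alternating passes */
      for (int i = 0; i < MAX_SIMPLIFY_PASSES; i++) {
          int changed = 0;
          changed |= simplify_pass_left(tree);
          changed |= simplify_pass_right(tree);
          if (!changed)
              break;
      }
  }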

Note: this also results in a performance improvement for evince
compiles, and for general policy compiles, because it achieves a
better balance between the time spent simplifying the tree to remove
nodes and the time the dfa build needs to handle the extra nodes and
then eliminate them during minimization.

$ time apparmor_parser -QT /etc/apparmor.d/usr.bin.evince
real	0m2.744s
user	0m2.714s
sys	0m0.028s

vs.

$ time apparmor_parser -QT /etc/apparmor.d/usr.bin.evince
real	0m2.992s
user	0m2.979s
sys	0m0.012s

and

$ time apparmor_parser -QT /etc/apparmor.d/
real	0m3.568s
user	0m14.529s
sys	0m0.152s

vs.

$ time apparmor_parser -QT /etc/apparmor.d/
real	0m3.741s
user	0m15.400s
sys	0m0.179s

PR: https://gitlab.com/apparmor/apparmor/merge_requests/246
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@canonical.com>
2018-11-09 13:01:01 -08:00

Regular Expression Scanner Generator
====================================

Notes on the Scanner File Format
--------------------------------

The file format used is based on the GNU flex table file format
(--tables-file option; see Table File Format in the flex info pages and
the flex sources for documentation). The magic number used in the header
is set to 0x1B5E783D instead of 0xF13C57B1 though, which is meant to
indicate that the file format logically is not the same: the YY_ID_CHK
(check) and YY_ID_DEF (default) tables are used differently.
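
For illustration only, checking that magic value might look like the
following; the constant names and the helper are ours, only the two
values come from the paragraph above:

  #include <stdint.h>
  #include <stdbool.h>

  #define FLEX_TABLES_MAGIC  0xF13C57B1u  /* stock flex --tables-file magic */
  #define APPARMOR_RE_MAGIC  0x1B5E783Du  /* deliberately different value   */

  /* hypothetical helper: is this header from this format rather than
   * from a plain flex tables file? */
  static bool is_apparmor_re_table(uint32_t header_magic)
  {
      return header_magic == APPARMOR_RE_MAGIC;
  }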

Flex uses state compression to store only the differences between states
for states that are similar. The amount of compression influences the parse
speed.

The following two states could be stored in the tables outlined
below:

States and transitions on specific characters to next states
------------------------------------------------------------
 1: ('a' => 2, 'b' => 3, 'c' => 4)
 2: ('a' => 2, 'b' => 3, 'd' => 5)

Flex-like table format
----------------------
index: (default, base)
    0: (      0,    0)  <== dummy state (nonmatching)
    1: (      0,    0)
    2: (      1,  256)

  index: (next, check)
      0: (   0,     0)  <== unused entry
	 (   0,     1)  <== ord('a') identical entries
  0+'a': (   2,     1)
  0+'b': (   3,     1)
  0+'c': (   4,     1)
	 (   0,     1)  <== (255 - ord('c')) identical entries
256+'c': (   0,     2)
256+'d': (   5,     2)

Here, state 2 is described as ('c' => 0, 'd' => 5), and everything else
as in state 1. The matching algorithm is as follows.

Flex-like scanner algorithm
---------------------------
  /* current state is in <state>, input character <c> */
  while (check[base[state] + c] != state)
    state = default[state];
  state = next[base[state] + c];
  /* continue with the next input character */
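
As a concrete, self-contained illustration of this loop, the C program
below builds the example tables and runs a few lookups; the table size,
the identifier names, and the driver are ours, only the entries come
from the tables above ("default" is spelled def because it is a C
keyword):

  #include <stdio.h>

  #define TABLE_SIZE 512                    /* room for base 256 + 'd' */

  static int def[3]  = { 0, 0, 1 };         /* state 2 falls back to state 1 */
  static int base[3] = { 0, 0, 256 };
  static int next_tbl[TABLE_SIZE];
  static int check_tbl[TABLE_SIZE];

  static void build_tables(void)
  {
      int c;

      /* state 1 owns entries 0+1 .. 0+255; characters other than
       * 'a', 'b', 'c' lead to the nonmatching state 0 */
      for (c = 1; c < 256; c++) {
          next_tbl[0 + c] = 0;
          check_tbl[0 + c] = 1;
      }
      next_tbl[0 + 'a'] = 2;
      next_tbl[0 + 'b'] = 3;
      next_tbl[0 + 'c'] = 4;

      /* state 2 stores only its differences from state 1 */
      next_tbl[256 + 'c'] = 0;  check_tbl[256 + 'c'] = 2;
      next_tbl[256 + 'd'] = 5;  check_tbl[256 + 'd'] = 2;
  }

  /* the flex-like lookup from the algorithm above */
  static int flex_next_state(int state, unsigned char c)
  {
      while (check_tbl[base[state] + c] != state)
          state = def[state];
      return next_tbl[base[state] + c];
  }

  int main(void)
  {
      build_tables();
      printf("2 --a--> %d\n", flex_next_state(2, 'a'));  /* 2: inherited from state 1 */
      printf("2 --c--> %d\n", flex_next_state(2, 'c'));  /* 0: overridden by state 2  */
      printf("2 --d--> %d\n", flex_next_state(2, 'd'));  /* 5: added by state 2       */
      return 0;
  }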

This state compression algorithm performs well, except when there are
many inverted or wildcard matches ("[^x]", "."). Each input character
may cause several iterations in the while loop.


We will have many inverted character classes ("[^/]") that wouldn't
compress very well. Therefore, the regexp matcher uses no state
compression, and uses the check and default tables differently. The
above states could be stored as follows:

Regexp table format
-------------------

index: (default, base)
    0: (      0,    0)  <== dummy state (nonmatching)
    1: (      0,    0)
    2: (      0,    3)

  index: (next, check)
      0: (   0,     0)  <== unused entry
	 (   0,     0)  <== ord('a') identical, unused entries
  0+'a': (   2,     1)
  0+'b': (   3,     1)
  0+'c': (   4,     1)
  3+'a': (   2,     2)
  3+'b': (   3,     2)
  3+'c': (   0,     0)  <== entry is unused
  3+'d': (   5,     2)
	 (   0,     0)  <== (255 - ord('d')) identical, unused entries

All the entries with 0 in check (except the first entry, which is
deliberately reserved) are still available for other states that
fit in there.

Regexp scanner algorithm
------------------------
  /* current state is in <state>, matching character <c> */
  if (check[base[state] + c] == state)
    state = next[base[state] + c];
  else
    state = default[state];
  /* continue with the next input character */

This representation and algorithm allow states that match more
characters than they fail to match to be represented as their inverse.
For example, a third state that accepts everything other than 'a' can
be added to the tables as one entry in (default, base) and one entry in
(next, check):

State
-----
 3: ('a' => 0, everything else => 5)

Regexp tables
-------------
index: (default, base)
    0: (      0,    0)  <== dummy state (nonmatching)
    1: (      0,    0)
    2: (      0,    3)
    3: (      5,    7)

  index: (next, check)
      0: (   0,     0)  <== unused entry
	 (   0,     0)  <== ord('a') identical, unused entries
  0+'a': (   2,     1)
  0+'b': (   3,     1)
  0+'c': (   4,     1)
  3+'a': (   2,     2)
  3+'b': (   3,     2)
  3+'c': (   0,     0)  <== entry is unused
  3+'d': (   5,     2)
  7+'a': (   0,     3)
	 (   0,     0)  <== (255 - ord('a')) identical, unused entries
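
Again purely for illustration, here is a self-contained C program that
builds these tables and exercises both an ordinary state and the
inverted one; the table size, the names, and the driver are ours, only
the entries come from the tables above:

  #include <stdio.h>

  #define TABLE_SIZE 512                    /* room for base 7 + any byte */

  static int def[4]  = { 0, 0, 0, 5 };      /* state 3: everything else => 5 */
  static int base[4] = { 0, 0, 3, 7 };
  static int next_tbl[TABLE_SIZE];
  static int check_tbl[TABLE_SIZE];

  static void build_tables(void)
  {
      /* state 1: 'a' => 2, 'b' => 3, 'c' => 4 */
      next_tbl[0 + 'a'] = 2;  check_tbl[0 + 'a'] = 1;
      next_tbl[0 + 'b'] = 3;  check_tbl[0 + 'b'] = 1;
      next_tbl[0 + 'c'] = 4;  check_tbl[0 + 'c'] = 1;

      /* state 2: 'a' => 2, 'b' => 3, 'd' => 5; entry 3+'c' stays unused */
      next_tbl[3 + 'a'] = 2;  check_tbl[3 + 'a'] = 2;
      next_tbl[3 + 'b'] = 3;  check_tbl[3 + 'b'] = 2;
      next_tbl[3 + 'd'] = 5;  check_tbl[3 + 'd'] = 2;

      /* state 3: only the one character it does NOT accept is stored */
      next_tbl[7 + 'a'] = 0;  check_tbl[7 + 'a'] = 3;
  }

  /* the regexp lookup from the algorithm above */
  static int re_next_state(int state, unsigned char c)
  {
      if (check_tbl[base[state] + c] == state)
          return next_tbl[base[state] + c];
      return def[state];
  }

  int main(void)
  {
      build_tables();
      printf("2 --d--> %d\n", re_next_state(2, 'd'));  /* 5: explicit entry             */
      printf("2 --c--> %d\n", re_next_state(2, 'c'));  /* 0: default, nonmatching       */
      printf("3 --a--> %d\n", re_next_state(3, 'a'));  /* 0: the single stored entry    */
      printf("3 --x--> %d\n", re_next_state(3, 'x'));  /* 5: inverted match via default */
      return 0;
  }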

While the current code does not implement any form of state compression,
the flex state compression representation could be combined with this
scheme by remembering (in a bit per state, for example) which default
entries refer to inverted matches, and which refer to parent states.
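
One possible shape of such a combination, purely as a sketch (nothing
like the state_is_inverted bit below exists in the current code):

  /* tables as in the examples above; state_is_inverted is the added
   * per-state bit this paragraph speculates about */
  extern const int def[], base[], next_tbl[], check_tbl[];
  extern const unsigned char state_is_inverted[];

  static int combined_next_state(int state, unsigned char c)
  {
      while (check_tbl[base[state] + c] != state) {
          if (state_is_inverted[state])
              return def[state];   /* regexp-style: take the inverted match      */
          state = def[state];      /* flex-style: retry in the parent's entries  */
      }
      return next_tbl[base[state] + c];
  }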