mirror of
https://gitlab.com/apparmor/apparmor.git
synced 2025-03-04 08:24:42 +01:00

Add a basic overview of the ordering of the backend of the compiler and which stages specific dump info lines up with. Signed-off-by: John Johansen <john.johansen@canonical.com>
430 lines
13 KiB
Text
430 lines
13 KiB
Text
apparmor_re.h - control flags for hfa generation
|
|
expr-tree.{h,cc} - abstract syntax tree (ast) built from a regex parse
|
|
parse.{h,y} - code to parse a regex into an ast
|
|
hfc.{h,cc} - code to build and manipulate a hybrid finite automata (state
|
|
machine).
|
|
flex-tables.h - basic defines used by chfa
|
|
chfa.{h,cc} - code to build a highly compressed runtime readonly version
|
|
of an hfa.
|
|
aare_rules.{h,cc} - code to that binds parse -> expr-tree -> hfa generation
|
|
-> chfa generation into a basic interface for converting
|
|
rules to a runtime ready state machine.
|
|
|
|
Notes on the compiler pipeline order
|
|
============================================
|
|
|
|
Front End: Program driver logic and policy text parsing into an
|
|
abstract syntax tree.
|
|
Middle Layer: Transforms and operations on the abstract syntax tree.
|
|
Converts syntax tree into expression tree for back end.
|
|
Back End: transforms of syntax tree, and creation of policy HFA from
|
|
expression trees and HFAs.
|
|
|
|
|
|
Basic order of the backend of the compiler pipe line and where the
|
|
dump information occurs in the pipeline.
|
|
|
|
===== Front End (parse -> AST ================
|
|
|
|
|
v
|
|
yyparse
|
|
|
|
|
+--->--+-->-+
|
|
| |
|
|
| +-->---- +---------------------------<-----------------------+
|
|
| | | |
|
|
| | v |
|
|
| | yylex |
|
|
| | | |
|
|
| ^ token match |
|
|
| | | |
|
|
| | +----------------------------+ |
|
|
| | | | ^
|
|
| | v v |
|
|
| +-<- rule match? preprocess |
|
|
| | | |
|
|
| early var expansion +----------+-----------+ |
|
|
| | | | | |
|
|
^ v v v v |
|
|
| new rule() / new ent include variable conditional |
|
|
| | | | | |
|
|
| v +---->-----+----->-----+----->----+
|
|
| new rule semantic check
|
|
| |
|
|
+-----<-----+
|
|
|
|
|
----------- | ------ End of Parse --------------------
|
|
|
|
|
v
|
|
post_parse_profile semantic check
|
|
|
|
|
v
|
|
post_process
|
|
|
|
|
v
|
|
add implied rules()
|
|
|
|
|
v
|
|
process_profile_variables()
|
|
|
|
|
v
|
|
rule->expand_variables()
|
|
|
|
|
+--------+
|
|
|
|
|
v
|
|
replace aliases (to be moved to backend rewrite)
|
|
|
|
|
v
|
|
merge rules
|
|
|
|
|
v
|
|
profile->merge_rules()
|
|
|
|
|
v
|
|
+-->--rule->is_mergeable()
|
|
| |
|
|
^ v
|
|
| add to table
|
|
| |
|
|
+-------+--------+
|
|
|
|
|
v
|
|
sort->cmp()/oper<()
|
|
|
|
|
rule->merge()
|
|
|
|
|
+------------+
|
|
|
|
|
v
|
|
process_profile_rules
|
|
|
|
|
v
|
|
rule->gen_policy_re()
|
|
|
|
|
v
|
|
===== Mid layer (AST -> expr tree) =================
|
|
|
|
|
+-> add_rule() (aare_rules.{h,cc})
|
|
| |
|
|
| v
|
|
| rule parse (parse.y)
|
|
| | |
|
|
| | v
|
|
| | expr tree (expr-tree.{h,cc})
|
|
| | |
|
|
| v |
|
|
| unique perms | (aare_rules.{h,cc})
|
|
| | |
|
|
| +------ +
|
|
| |
|
|
| v
|
|
| add to rules expr tree (aare_rules.{h,c})
|
|
| |
|
|
+------+
|
|
|
|
|
+------------------+
|
|
|
|
|
v
|
|
create_dfablob()
|
|
|
|
|
v
|
|
expr tree
|
|
|
|
|
v
|
|
create_chfa() (aare_rules.cc)
|
|
|
|
|
v
|
|
expr normalization (expr-tree.{h,cc})
|
|
|
|
|
v
|
|
expr simplification (expr-tree.{h,c})
|
|
|
|
|
+- D expr-tree
|
|
|
|
|
+- D expr-simplified
|
|
|
|
|
==== Back End - Create cHFA out of expr tree and other HFAs ====
|
|
v
|
|
hfa creation (hfa.{h,cc})
|
|
|
|
|
+- D dfa-node-map
|
|
|
|
|
+- D dfa-uniq-perms
|
|
|
|
|
+- D dfa-states-initial
|
|
|
|
|
v
|
|
hfa rewrite (not yet implemented)
|
|
|
|
|
v
|
|
filter deny (hfa.{h,cc})
|
|
|
|
|
+- D dfa-states-post-filter
|
|
|
|
|
v
|
|
minimization (hfa.{h,cc})
|
|
|
|
|
+- D dfa-minimize-partitions
|
|
|
|
|
+- D dfa-minimize-uniq-perms
|
|
|
|
|
+- D dfa-states-post-minimize
|
|
|
|
|
v
|
|
unreachable state removal (hfa.{h,cc})
|
|
|
|
|
+- D dfa-states-post-unreachable
|
|
|
|
|
+- D dfa-states constructed hfa
|
|
|
|
|
+- D dfa-graph
|
|
|
|
|
v
|
|
equivalence class construction
|
|
|
|
|
+- D equiv
|
|
|
|
|
diff encode (hfa.{h,cc})
|
|
|
|
|
+- D diff-encode
|
|
|
|
|
compute perms table
|
|
|
|
|
+- D compressed-dfa == perm table dump
|
|
|
|
|
compressed hfa (chfa.{h,cc}
|
|
|
|
|
+- D compressed-dfa == transition tables
|
|
|
|
|
+- D dfa-compressed-states - compress HFA in state form
|
|
|
|
|
v
|
|
Return to Mid Layer
|
|
|
|
|
|
Notes on the compress hfa file format (chfa)
|
|
==============================================
|
|
|
|
The file format used is based on the GNU flex table file format
|
|
(--tables-file option; see Table File Format in the flex info pages and
|
|
the flex sources for documentation). The magic number used in the header
|
|
is set to 0x1B5E783D instead of 0xF13C57B1 though, which is meant to
|
|
indicate that the file format logically is not the same: the YY_ID_CHK
|
|
(check) and YY_ID_DEF (default), YY_ID_BASE tables are used differently.
|
|
|
|
The YY_ID_ACCEPTX tables either encode permissions directly, or are an
|
|
index, into an external tables.
|
|
|
|
There are two DFA table formats to support different size state machines
|
|
DFA16
|
|
default/next/check - are 16 bit tables
|
|
DFA32
|
|
default/next/check - are 32 bit tables
|
|
|
|
DFA32 is limited to 2^24 states, due to the upper 8 bits being used
|
|
as flags in the base table, unless the flags table is defined. When
|
|
the flags table is defined, DFA32 can have a full 2^32 states.
|
|
|
|
In both DFA16 and DFA32
|
|
base and accept are 32 bit tables.
|
|
|
|
State 0 is always used as the trap state. Its accept, base and default
|
|
fields should be 0.
|
|
|
|
State 1 is the default start state. Alternate start states are stored
|
|
external to the state machine.
|
|
|
|
If the flags table is not defined, the base table uses the lower 24
|
|
bits as index into the next/check tables, and the upper 8 bits are used
|
|
as flags.
|
|
|
|
The currently defined flags are
|
|
#define MATCH_FLAG_DIFF_ENCODE 0x80000000
|
|
#define MARK_DIFF_ENCODE 0x40000000
|
|
#define MATCH_FLAG_OOB_TRANSITION 0x20000000
|
|
|
|
Note the default[state] is used in two different ways.
|
|
|
|
1. When diff_encode is set, the state stores the difference to another
|
|
state defined by default. The next field will only store the
|
|
transitions that are unique to this state. Those transition may mask
|
|
transitions in the state that the current state is relative to, also
|
|
note the state that this state is relative might also be relative to
|
|
another state. Cycles are forbidden and checked for by the verifier.
|
|
The exact algorithm used to build these state difference will be
|
|
discussed in another section.
|
|
|
|
|
|
States and transitions on specific characters to next states
|
|
------------------------------------------------------------
|
|
1: ('a' => 2, 'b' => 3, 'c' => 4)
|
|
2: ('a' => 2, 'b' => 3, 'd' => 5)
|
|
|
|
Table format - where D in base represnts Diff encode flag
|
|
----------------------
|
|
index: (default, base)
|
|
0: ( 0, 0) <== dummy state (nonmatching)
|
|
1: ( 0, 0)
|
|
2: ( 1, D 256)
|
|
|
|
index: (next, check)
|
|
0: ( 0, 0) <== unused entry
|
|
( 0, 1) <== ord('a') identical entries
|
|
0+'a': ( 2, 1)
|
|
0+'b': ( 3, 1)
|
|
0+'c': ( 4, 1)
|
|
( 0, 1) <== (255 - ord('c')) identical entries
|
|
256+'c': ( 0, 2)
|
|
256+'d': ( 5, 2)
|
|
|
|
Here, state 2 is described as ('c' => 0, 'd' => 5), and everything else
|
|
as in state 1. The matching algorithm is as follows.
|
|
|
|
Scanner algorithm
|
|
---------------------------
|
|
/* current state is in <state>, input character <c> */
|
|
|
|
while (check[base[state] + c] != state) {
|
|
diff = (FLAGS(base) & diff_encode);
|
|
state = default[state];
|
|
if (!diff)
|
|
goto done;
|
|
}
|
|
state = next[base[state] + c];
|
|
done:
|
|
|
|
/* continue with the next input character */
|
|
|
|
2. When diff_encode is NOT set, the default state is used to represent
|
|
all none matching transitions (ie. check[base[state] + c] != state).
|
|
The dfa build will compute the transition with the most transitions
|
|
and use that for the default state. ie.
|
|
|
|
if we have
|
|
1: ('a' => 2)
|
|
("[^a]" => 0)
|
|
then 0 will be used as the default state
|
|
|
|
if we have
|
|
1: ("[^a]" => 2)
|
|
('a' => 0)
|
|
then 2 will be used as the default state, and the only state encoded
|
|
in the next/check tables will be for 'a'
|
|
|
|
The combination of the diff-encoded and non-diff encoded states performs
|
|
well even when there are many inverted or wildcard matches ("[^x]", ".").
|
|
|
|
|
|
Simplified Regexp scanner algorithm for non-diff encoded state (note
|
|
diff encode algorithm above works as well)
|
|
|
|
------------------------
|
|
/* current state is in <state>, matching character <c> */
|
|
if (check[base[state] + c] == state)
|
|
state = next[base[state] + c];
|
|
else
|
|
state = default[state];
|
|
/* continue with the next input character */
|
|
|
|
|
|
Each input character may cause several iterations in the while loop,
|
|
but due to guarantees in the build at most 2n states will be
|
|
transitioned for n input characters. The expected number of states
|
|
walked is much closer to n and in practice due to cache locality the
|
|
diff encoded state machine is usually faster than a non-diff encoded
|
|
state machine with a strict n state for n input walk.
|
|
|
|
|
|
Comb Compression
|
|
-----------------
|
|
|
|
The next/check tables of states are only used to encode transitions
|
|
not covered by the default transition. The input byte is indexed off
|
|
the base value, covering 256 positions within the next/check
|
|
tables. However a state may only encode a few transitions within that
|
|
range, leaving holes. These holes are filled by other states
|
|
transitions whose range will overlap.
|
|
|
|
1: ('a' => 2, 'b' => 3, 'c' => 4)
|
|
2: ('a' => 2, 'b' => 3, 'd' => 5)
|
|
3: ('a' => 0, everything else => 5)
|
|
|
|
Regexp tables
|
|
-------------
|
|
index: (default, base)
|
|
0: ( 0, 0) <== dummy state (nonmatching)
|
|
1: ( 0, 0)
|
|
2: ( 1, 3)
|
|
3: ( 5, 7)
|
|
|
|
index: (next, check)
|
|
0: ( 0, 0) <== unused entry
|
|
( 0, 0) <== ord('a') identical, unused entries
|
|
0+'a': ( 2, 1)
|
|
0+'b': ( 3, 1)
|
|
0+'c': ( 4, 1)
|
|
3+'a': ( 2, 2)
|
|
3+'b': ( 3, 2)
|
|
3+'c': ( 0, 0) <== entry is unused, hole that could be filled
|
|
3+'d': ( 5, 2)
|
|
7+'a': ( 0, 3)
|
|
( 0, 0) <== (255 - ord('a')) identical, unused entries
|
|
|
|
|
|
Regexp tables comb compressed
|
|
-------------
|
|
index: (default, base)
|
|
0: ( 0, 0)
|
|
1: ( 0, 0)
|
|
2: ( 1, 3)
|
|
3: ( 5, 5)
|
|
|
|
index: (next, check)
|
|
0: ( 0, 0)
|
|
( 0, 0)
|
|
0+'a': ( 2, 1)
|
|
0+'b': ( 3, 1)
|
|
0+'c': ( 4, 1)
|
|
3+'a': ( 2, 2)
|
|
3+'b': ( 3, 2)
|
|
5+'a': ( 0, 3) <== entry was previously at 7+'a'
|
|
3+'d': ( 5, 2)
|
|
( 0, 0) <== (255 - ord('a')) identical, unused entries
|
|
|
|
|
|
Out of Band Transitions (oobs)
|
|
---------------------------------
|
|
|
|
Out of band transitions (oobs) allow for a state to have transitions
|
|
that can not be triggered by input. Any state that has oobs must have
|
|
the OOB flag set on the state. An oob is triggered by subtracting the
|
|
oob number from the the base index value, to find the next and check
|
|
value. Current only single oob is supported. And all states using
|
|
an oob must have the oob flag set.
|
|
|
|
if ((FLAG(base) & OOB) && check[base[state] - oob] == state)
|
|
state = next[base[state]] - oob]
|
|
|
|
oobs might be expressed as a negative number eg. -1 for the first
|
|
oob. In which case the oob transition above uses a + oob instead.
|
|
|
|
If more oobs are needed a second oob flag can be allocated, and if
|
|
used in combination with the original, would allow a state to have
|
|
up to 3 oobs
|
|
|
|
00 - none
|
|
01 - 1
|
|
10 - 2
|
|
11 - 3
|
|
|
|
|
|
Diff Encode Spanning Tree
|
|
============================================
|
|
To build the state machine with diff encoded states and to still meet
|
|
run time guaratees about traversing no more than 2n states for n input
|
|
a spanning tree is use.
|
|
|
|
* TODO *
|
|
|
|
|