This pages tracks improvements made to the AppArmor compiler performance over each release.
Areas for improvement
-
flags to make toggling between memory and speed optimizations easier
-O fast
-O mem
???
-
Better auto tuning of jobs
- -j0 really small systems
- small system
- large system
-
User side compression
- means compress once in userspace for cache
- kernel only has to unpack, and store compressed instead of doing compression
- better compression if done in userspace (less mem)
-
file cache (memory for speed)
- Include parse caching
- cache include files post parsing so that they don't need to be re-parsed
- generic file cache
- cache files that have already been read so they don't need to be reread
- requires multiple profiles be processed by one compiler call to be effective
- use mmap when possible
- Include parse caching
-
aare -> pcre convertion
- add native parsing of aare to the background eliminating this step
-
early tree node merging
- as part of tree creation instead of blindly creating new nodes, merge into existing tree as parsing
- requires: 3
-
tree simplification
- store alt nodes is sorted vector
- store cat nodes in vector
- do tree factoring on vectors, does not require tree rewrites or left/right manipulations
- merge char nodes into charset nodes (requires fix to dominance)
- replace char nodes with charset node to remove duplicate code
-
dfa creation - currently creates sets and removes duplicates which means a lot of creation and freeing
- push anode and nnode split into expr tree for nullable, firstpos, lastpos and follow calculations
- removes need to split sets, reducing allocation of set just to split it apart and lookup its constituent anode and nnode sets
- push anode and nnode caching into expr tree so it can be used by nullable, firstpos, lastpos
- add nnode vec merging, and move nnode to nodevec all the way through
- eliminates need to create nnode set to be immediately converted to nnode vec
- early accept node merging
- Instead of creating accept node sets do accept node merging and use single accept node to represent merged state
- nodevec merging cache
- merge operation cache provides duplicate elimination so that many node sets don't need to be recreated and freed after being lookuped at the cache level
- Eliminate nnode cache?
- its possible that between the nodevec merge cache and final combined nnode, anode cache the dedicated nnode cache is no longer benifical
- push anode and nnode split into expr tree for nullable, firstpos, lastpos and follow calculations
-
comb compression
- change to sliding window algorithm, to reduce set of slot comparisons done
- Note: diff compression also reduces the number of slot comparisons being done by reducing the number of transition
-
partial compiles - shared/precompiled
- share and potentially cache partial compile components
- includes are a natural boundary to precompile
- Requires dfa set operations
-
threaded parallel compile
- thread pool
- requires removal of global state
- shared file caches
- compile profile components in parallel (see partial compiles)
- once partial compiles are in place it should be easy to compile these components separately
- pipeline profile construction.
- different stages of the dfa constructions could be pipelined.
- tree optimization -> DFA construction -> minimization -> diff-encode -> comb compression
- since each stage is separate but dependent, separate threads could work on a given thread
Ways to reduce memory usage during a compile
- -O expr-simplify (may slow small policy compiles, will speed up large compiles)
- -O diff-encode (will slow down small compiles slightly, will speed up large compiles)
- -j N (will slow policy compiles that can happen in parallel) ** N=0 (no workers, everything done by driver, requires 3.0) ** N=1 (one worker) ** N=xC where C is less than the default of 8
- --max-jobs=N (cap workers used by -j irrespective of number of cpus
- ship pre-compiled policy
- tweak policy expressions
- ensure parser is stripped
Tunables to reduce policy size
- -O minimize (may slow small policy compiles, will speed-up large compiles)
- -O compress-small (will slow down compiles)
- -O diff-encode (may slow small policy compiles, will speed-up large compiles)
Tunables to increase compile speed
???TODO
Improvements per Release
- 2.1 DFA introduced
- 2.3 tree factoring
- 2.4 minimization
- 2.5
- 2.6 group info on state, singleton epsnode, remove ref count,
- 2.7 split dfa code - no major improvements
- 2.8 diff encoding
- 2.9 remove node hashing, fixing full minimization
- 2.10 early accept node grouping, and removal of accept alt node trees
- 2.11 -j N parallel compiles
- 2.12
- 2.13 per kernel cache files
- 3.0 --jobs=0, limit expr-simplification passes,
Performance Tests
small | Release | Time | Peak Mem | Nodes | States | Transitions 2.1 2.3 2.4 2.5 2.6 2.7 2.8
medium | Release | Time | Peak Mem | Nodes | States | Transitions 2.1 2.3 2.4 2.5 2.6 2.7 2.8
large | Release | Time | Peak Mem | Nodes | States | Transitions 2.1 2.3 2.4 2.5 2.6 2.7 2.8
huge | Release | Time | Peak Mem | Nodes | States | Transitions 2.1 2.3 2.4 2.5 2.6 2.7 2.8