mirrors/apparmor

Fork 0

mirror of https://gitlab.com/apparmor/apparmor.git synced 2025-03-04 00:14:44 +01:00

Table of Contents

Areas for improvement
Ways to reduce memory usage during a compile
Tunables to reduce policy size
Tunables to increase compile speed
Improvements per Release
Performance Tests

This pages tracks improvements made to the AppArmor compiler performance over each release.

Areas for improvement

flags to make toggling between memory and speed optimizations easier
- -O fast
- -O mem???
Better auto tuning of jobs
- -j0 really small systems
- small system
- large system
User side compression
- means compress once in userspace for cache
- kernel only has to unpack, and store compressed instead of doing compression
  - better compression if done in userspace (less mem)
file cache (memory for speed)
- Include parse caching
  - cache include files post parsing so that they don't need to be re-parsed
- generic file cache
- cache files that have already been read so they don't need to be reread
- requires multiple profiles be processed by one compiler call to be effective
- use mmap when possible
aare -> pcre convertion
- add native parsing of aare to the background eliminating this step
early tree node merging
- as part of tree creation instead of blindly creating new nodes, merge into existing tree as parsing
- requires: 3
tree simplification
- store alt nodes is sorted vector
- store cat nodes in vector
- do tree factoring on vectors, does not require tree rewrites or left/right manipulations
- merge char nodes into charset nodes (requires fix to dominance)
- replace char nodes with charset node to remove duplicate code
dfa creation - currently creates sets and removes duplicates which means a lot of creation and freeing
- push anode and nnode split into expr tree for nullable, firstpos, lastpos and follow calculations
  - removes need to split sets, reducing allocation of set just to split it apart and lookup its constituent anode and nnode sets
- push anode and nnode caching into expr tree so it can be used by nullable, firstpos, lastpos
- add nnode vec merging, and move nnode to nodevec all the way through
  - eliminates need to create nnode set to be immediately converted to nnode vec
- early accept node merging
  - Instead of creating accept node sets do accept node merging and use single accept node to represent merged state
- nodevec merging cache
  - merge operation cache provides duplicate elimination so that many node sets don't need to be recreated and freed after being lookuped at the cache level
- Eliminate nnode cache?
  - its possible that between the nodevec merge cache and final combined nnode, anode cache the dedicated nnode cache is no longer benifical
comb compression
- change to sliding window algorithm, to reduce set of slot comparisons done
- Note: diff compression also reduces the number of slot comparisons being done by reducing the number of transition
partial compiles - shared/precompiled
- share and potentially cache partial compile components
- includes are a natural boundary to precompile
- Requires dfa set operations
threaded parallel compile
- thread pool
- requires removal of global state
- shared file caches
- compile profile components in parallel (see partial compiles)
  - once partial compiles are in place it should be easy to compile these components separately
- pipeline profile construction.
  - different stages of the dfa constructions could be pipelined.
  - tree optimization -> DFA construction -> minimization -> diff-encode -> comb compression
  - since each stage is separate but dependent, separate threads could work on a given thread

Ways to reduce memory usage during a compile

-O expr-simplify (may slow small policy compiles, will speed up large compiles)
-O diff-encode (will slow down small compiles slightly, will speed up large compiles)
-j N (will slow policy compiles that can happen in parallel) ** N=0 (no workers, everything done by driver, requires 3.0) ** N=1 (one worker) ** N=xC where C is less than the default of 8
--max-jobs=N (cap workers used by -j irrespective of number of cpus
ship pre-compiled policy
tweak policy expressions
ensure parser is stripped

Tunables to reduce policy size

-O minimize (may slow small policy compiles, will speed-up large compiles)
-O compress-small (will slow down compiles)
-O diff-encode (may slow small policy compiles, will speed-up large compiles)

Tunables to increase compile speed

???TODO

Improvements per Release

2.1 DFA introduced
2.3 tree factoring
2.4 minimization
2.5
2.6 group info on state, singleton epsnode, remove ref count,
2.7 split dfa code - no major improvements
2.8 diff encoding
2.9 remove node hashing, fixing full minimization
2.10 early accept node grouping, and removal of accept alt node trees
2.11 -j N parallel compiles
2.12
2.13 per kernel cache files
3.0 --jobs=0, limit expr-simplification passes,

Performance Tests