Add the ability to match strings directly from the hfa instead of needing
to build a cfha.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-By: Steve Beattie <sbeattie@ubuntu.com>
instead of a NodeSet.
We need to store sets of Nodes, to compute the dfa but the C++ set is
not the most efficient way to do this as, it has a has a lot of overhead
just to store a single pointer.
Instead we can use an array of tightly packed pointers + a some header
information. We can do this because once the Set is finalized it will
not change, we just need to be able to reference and compare to it.
We don't use C++ Vectors as they have more overhead than a plain array
and we don't need their additional functionality.
We only replace the use of hashedNodeSets for non-accepting states as
these sets are only used in the dfa construction, and dominate the memory
usage. The accepting states still may need to be modified during
minimization and there are only a small number of entries (20-30), so
it does not make sense to convert them.
Also introduce a NodeVec cache that serves the same purpose as the NodeSet
cache that was introduced earlier.
This is not abstracted this out as nicely as might be desired but avoiding
the use of a custom iterator and directly iterating on the Node array
allows for a small performance gain, on larger sets.
This patch reduces the amount of heap memory used by dfa creation by about
4x - overhead. So for small dfas the savings is only 2-3x but on larger
dfas the savings become more and more pronounced.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
non-accepting, and have the proto-state use them.
To reduce memory overhead each set gains its own "cache" that make sure
there is only a single instance of each NodeSet generated. And since
we have a cache abstraction, move relavent stats into it.
Also refactor code slightly to make caches and work_queue etc, DFA member
variables instead of passing them as parameters.
The split + caching results in a small reduction in memory use as the
cost of ProtoState + Caching is less than the redundancy that is eliminated.
However this results in a small decrease in performance.
Sorry I know this really should have been split into multiple patches
but the patch evolved and I got lazy and decided to just not bother
splitting it.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
It is the functional equivalent of ProtoState. We do this to provide a
new level of abstraction that ProtoState can leverage, when the node types
are split.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Create a new ProtoState class that will encapsulate the split, but for
this patch it will just contain what was done previously with NodeSet
Signed-off-by: John Johansen <john.johansen@canonical.com>
Split hfa into hfa and compressed_hfa files. The hfa portion focuses on
creating an manipulating hfas, while compressed_hfa is used for creating
compressed hfas that can be used/reused at run time with much less memory
usage than the full blown hfa.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-By: Steve Beattie <sbeattie@ubuntu.com>