Commit graph

11 commits

Author SHA1 Message Date
John Johansen
3ff8b4d19a Add basic string matching to the hfa
Add the ability to match strings directly from the hfa instead of needing
to build a cfha.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-By: Steve Beattie <sbeattie@ubuntu.com>
2012-01-06 09:03:20 -08:00
John Johansen
18821b079b To reduce memory overhead of dfa creation convert to using a Node Vector
instead of a NodeSet.

We need to store sets of Nodes, to compute the dfa but the C++ set is
not the most efficient way to do this as, it has a has a lot of overhead
just to store a single pointer.

Instead we can use an array of tightly packed pointers + a some header
information.  We can do this because once the Set is finalized it will
not change, we just need to be able to reference and compare to it.

We don't use C++ Vectors as they have more overhead than a plain array
and we don't need their additional functionality.

We only replace the use of hashedNodeSets for non-accepting states as
these sets are only used in the dfa construction, and dominate the memory
usage.  The accepting states still may need to be modified during
minimization and there are only a small number of entries (20-30), so
it does not make sense to convert them.

Also introduce a NodeVec cache that serves the same purpose as the NodeSet
cache that was introduced earlier.

This is not abstracted this out as nicely as might be desired but avoiding
the use of a custom iterator and directly iterating on the Node array
allows for a small performance gain, on larger sets.

This patch reduces the amount of heap memory used by dfa creation by about
4x - overhead.  So for small dfas the savings is only 2-3x but on larger
dfas the savings become more and more pronounced.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2011-12-15 05:16:03 -08:00
John Johansen
2674a8b708 Split the nodeset used in computing the dfa into two sets, accepting and
non-accepting, and have the proto-state use them.

To reduce memory overhead each set gains its own "cache" that make sure
there is only a single instance of each NodeSet generated.  And since
we have a cache abstraction, move relavent stats into it.

Also refactor code slightly to make caches and work_queue etc, DFA member
variables instead of passing them as parameters.

The split + caching results in a small reduction in memory use as the
cost of ProtoState + Caching is less than the redundancy that is eliminated.
However this results in a small decrease in performance.

Sorry I know this really should have been split into multiple patches
but the patch evolved and I got lazy and decided to just not bother
splitting it.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2011-12-15 05:14:37 -08:00
John Johansen
8bc30c8851 Replace usage of NodeSet with ProtoState in dfa creation.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2011-12-15 05:12:30 -08:00
John Johansen
bd10235397 Add a new class hashedNodeSet.
It is the functional equivalent of ProtoState.  We do this to provide a
new level of abstraction that ProtoState can leverage, when the node types
are split.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2011-12-15 05:11:09 -08:00
John Johansen
35b7ee91eb Now that we have a proper class we don't need a functor to do comparisons,
we can fold it into the classes operator<.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2011-12-15 05:09:47 -08:00
John Johansen
d452f53576 Begin preparing to split accept nodes and non-accept nodes.
Create a new ProtoState class that will encapsulate the split, but for
this patch it will just contain what was done previously with NodeSet

Signed-off-by: John Johansen <john.johansen@canonical.com>
2011-12-15 05:08:31 -08:00
John Johansen
4beee46c52 Make sure that state always has otherwise set
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2011-12-15 05:01:35 -08:00
John Johansen
bd66fba55f This helps make the meaning of things a little clearer and provides a clear
distinction betwen NodeCases, and State transitions

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees@ubuntu.com>
2011-12-15 04:58:33 -08:00
John Johansen
9a377bb9da Lindent + some hand cleanups hfa
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Seth Arnold <seth.arnold@gmail.com>
2011-03-13 05:55:25 -07:00
John Johansen
6aad970d1c Split out compressed dfa "transition table" compression
Split hfa into hfa and compressed_hfa files.  The hfa portion focuses on
creating an manipulating hfas, while compressed_hfa is used for creating
compressed hfas that can be used/reused at run time with much less memory
usage than the full blown hfa.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-By: Steve Beattie <sbeattie@ubuntu.com>
2011-03-13 05:50:34 -07:00