apparmor/parser/libapparmor_re/hfa.h


/*
* (C) 2006, 2007 Andreas Gruenbacher <agruen@suse.de>
* Copyright (c) 2003-2008 Novell, Inc. (All rights reserved)
* Copyright 2009-2012 Canonical Ltd.
*
* The libapparmor library is licensed under the terms of the GNU
* Lesser General Public License, version 2.1. Please see the file
* COPYING.LGPL.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*
*
* Base of implementation based on the Lexical Analysis chapter of:
* Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman:
* Compilers: Principles, Techniques, and Tools (The "Dragon Book"),
* Addison-Wesley, 1986.
*/
#ifndef __LIBAA_RE_HFA_H
#define __LIBAA_RE_HFA_H
#include <list>
#include <map>
#include <vector>
#include <iostream>
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include "expr-tree.h"
#include "policy_compat.h"
#include "../rule.h"
extern int prompt_compat_mode;
#define DiffEncodeFlag 1
class State;
typedef map<transchar, State *> StateTrans;
typedef list<State *> Partition;
#include "../immunix.h"
ostream &operator<<(ostream &os, const State &state);
ostream &operator<<(ostream &os, State &state);
class perms_t {
public:
perms_t(void): allow(0), deny(0), prompt(0), audit(0), quiet(0), exact(0) { };
bool is_accept(void) { return (allow | deny | prompt | audit | quiet); }
void dump_header(ostream &os)
{
os << "(allow/deny/prompt/audit/quiet)";
}
void dump(ostream &os)
{
os << "(0x " << hex
allow << "/" << deny << "/" << prompt << "/" << audit << "/" << quiet
<< ')' << dec;
}
void clear(void) {
allow = deny = prompt = audit = quiet = exact = 0;
}
void clear_bits(perm32_t bits)
{
allow &= ~bits;
deny &= ~bits;
prompt &= ~bits;
audit &= ~bits;
quiet &= ~bits;
exact &= ~bits;
}
void add(perms_t &rhs, bool filedfa)
{
deny |= rhs.deny;
if (filedfa && !is_merged_x_consistent(allow & ALL_USER_EXEC,
rhs.allow & ALL_USER_EXEC)) {
if ((exact & AA_USER_EXEC_TYPE) &&
!(rhs.exact & AA_USER_EXEC_TYPE)) {
/* do nothing */
} else if ((rhs.exact & AA_USER_EXEC_TYPE) &&
!(exact & AA_USER_EXEC_TYPE)) {
allow = (allow & ~AA_USER_EXEC_TYPE) |
(rhs.allow & AA_USER_EXEC_TYPE);
} else
throw 1;
} else if (filedfa)
allow |= rhs.allow & AA_USER_EXEC_TYPE;
if (filedfa && !is_merged_x_consistent(allow & ALL_OTHER_EXEC,
rhs.allow & ALL_OTHER_EXEC)) {
if ((exact & AA_OTHER_EXEC_TYPE) &&
!(rhs.exact & AA_OTHER_EXEC_TYPE)) {
/* do nothing */
} else if ((rhs.exact & AA_OTHER_EXEC_TYPE) &&
!(exact & AA_OTHER_EXEC_TYPE)) {
allow = (allow & ~AA_OTHER_EXEC_TYPE) |
(rhs.allow & AA_OTHER_EXEC_TYPE);
} else
throw 1;
} else if (filedfa)
allow |= rhs.allow & AA_OTHER_EXEC_TYPE;
if (filedfa)
allow = (allow | (rhs.allow & ~ALL_AA_EXEC_TYPE));
else
allow |= rhs.allow;
prompt |= rhs.prompt;
audit |= rhs.audit;
quiet = (quiet | rhs.quiet);
/*
if (exec & AA_USER_EXEC_TYPE &&
(exec & AA_USER_EXEC_TYPE) != (allow & AA_USER_EXEC_TYPE))
throw 1;
if (exec & AA_OTHER_EXEC_TYPE &&
(exec & AA_OTHER_EXEC_TYPE) != (allow & AA_OTHER_EXEC_TYPE))
throw 1;
*/
}
/* returns true if perm is no longer accept */
int apply_and_clear_deny(void)
{
if (deny) {
allow &= ~deny;
exact &= ~deny;
prompt &= ~deny;
/* don't change audit or quiet based on clearing
* deny at this stage. This was made unique in
* accept_perms, and the info about whether
* we are auditing or quieting based on the explicit
* deny has been discarded and can only be inferred.
* But we know it is correct from accept_perms()
* audit &= deny;
* quiet &= deny;
*/
deny = 0;
return !is_accept();
}
return 0;
}
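/* Illustrative sketch, not part of the original header: how
* apply_and_clear_deny() folds an explicit deny into the other masks.
* The bit values below are made up for the example.
*
*   perms_t p;
*   p.allow = 0x5; p.deny = 0x4; p.audit = 0x4;
*   int not_accept = p.apply_and_clear_deny();
*   // now p.allow == 0x1, p.deny == 0, p.audit is still 0x4
*   // (audit/quiet are deliberately left untouched, see above),
*   // and not_accept == 0 because the state still accepts something.
*/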
bool operator<(perms_t const &rhs)const
{
/* lexicographic compare so perms_t provides a strict weak ordering */
if (allow != rhs.allow)
return allow < rhs.allow;
if (deny != rhs.deny)
return deny < rhs.deny;
if (prompt != rhs.prompt)
return prompt < rhs.prompt;
if (audit != rhs.audit)
return audit < rhs.audit;
return quiet < rhs.quiet;
}
perm32_t allow, deny, prompt, audit, quiet, exact;
};
int accept_perms(optflags const &opts, NodeVec *state, perms_t &perms,
bool filedfa);
/*
* ProtoState - NodeSet and ancillary information used to create a state
*/
class ProtoState {
public:
NodeVec *nnodes;
NodeVec *anodes;
/* init is used instead of a constructor because ProtoState is used
* in a union
*/
void init(NodeVec *n, NodeVec *a = NULL)
{
nnodes = n;
anodes = a;
}
bool operator<(ProtoState const &rhs)const
{
if (nnodes == rhs.nnodes)
return anodes < rhs.anodes;
return nnodes < rhs.nnodes;
}
unsigned long size(void)
{
if (anodes)
return nnodes->size() + anodes->size();
return nnodes->size();
}
};
/* Temporary state structure used when building differential encoding
* @parents - set of states that have transitions to this state
* @depth - level in the DAG
* @state - back reference to the state this DAG entry belongs to
* @rel - state that this state is relative to for differential encoding
*/
struct DiffDag {
Partition parents;
int depth;
State *state;
State *rel;
};
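/* Illustrative sketch, hypothetical values, not part of the original
* header: a DiffDag entry for a state S chosen to be encoded relative
* to a state R.
*
*   dag->state = S;             // the state being encoded
*   dag->rel = R;               // S will keep only the transitions
*                               // that differ from R's
*   dag->parents = { P1, P2 };  // states with transitions into S
*   dag->depth = d;             // S's level in the DAG of relative states
*/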
/*
* State - DFA individual state information
* label: a unique label to identify the state, used for pretty printing.
* the non-matching state is set up to have label == 0 and
* the start state is set up to have label == 1
* flags: state flags, e.g. DiffEncodeFlag when the state is
* differentially encoded
* perms: the permissions (allow/deny/prompt/audit/quiet) associated
* with the state
* trans: set of transitions from this state
* otherwise: the default state for transitions not in @trans
* partition: a temporary work variable used during dfa minimization.
* It can be replaced with a map, but that is slower and uses more
* memory.
* proto: a temporary work variable used during dfa creation. It can
* be replaced by using the nodemap, but that is slower
*/
class State {
public:
State(optflags const &opts, int l, ProtoState &n, State *other,
bool filedfa):
label(l), flags(0), idx(0), perms(), trans()
{
int error;
if (other)
otherwise = other;
else
otherwise = this;
proto = n;
/* Compute permissions associated with the State. */
error = accept_perms(opts, n.anodes, perms, filedfa);
if (error) {
//cerr << "Failing on accept perms " << error << "\n";
throw error;
}
};
State *next(transchar c) {
State *state = this;
do {
StateTrans::iterator i = state->trans.find(c);
if (i != state->trans.end())
return i->second;
if (!(state->flags & DiffEncodeFlag))
return state->otherwise;
state = state->otherwise;
} while (state);
/* never reached */
assert(0);
return NULL;
}
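/* Illustrative sketch, hypothetical states, not part of the original
* header: with differential encoding next() may walk a short chain of
* default states before a character is resolved.
*
*   // B is diff-encoded relative to A: B->otherwise == A and
*   // B->flags has DiffEncodeFlag set, so B->trans only holds the
*   // transitions that differ from A's.
*   State *dst = B->next(c);
*   // 1. look up c in B->trans; an explicit entry wins
*   // 2. otherwise, because B is diff-encoded, follow B->otherwise
*   //    to A and repeat
*   // 3. a state without DiffEncodeFlag ends the chain, returning
*   //    either its own match for c or its otherwise state
*/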
ostream &dump(ostream &os)
{
os << *this << "\n";
for (StateTrans::iterator i = trans.begin(); i != trans.end(); i++) {
os << " " << i->first.c << " -> " << *i->second << "\n";
}
return os;
}
int diff_weight(State *rel, int max_range, int upper_bound);
int make_relative(State *rel, int upper_bound);
void flatten_relative(State *, int upper_bound);
int apply_and_clear_deny(void) { return perms.apply_and_clear_deny(); }
void map_perms_to_accept(perm32_t &accept1, perm32_t &accept2,
perm32_t &accept3, bool prompt)
{
accept1 = perms.allow;
if (prompt && prompt_compat_mode == PROMPT_COMPAT_DEV)
accept2 = PACK_AUDIT_CTL(perms.prompt, perms.quiet);
else
accept2 = PACK_AUDIT_CTL(perms.audit, perms.quiet);
accept3 = perms.prompt;
}
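/* Illustrative sketch, hypothetical variables, not part of the original
* header: where a state's permissions land in the three accept words.
* PACK_AUDIT_CTL is assumed to simply combine the two masks it is
* given; its layout is defined elsewhere.
*
*   perm32_t a1, a2, a3;
*   state->map_perms_to_accept(a1, a2, a3, prompt_flag);
*   // a1: allow mask
*   // a2: packed audit/quiet masks (prompt/quiet masks when
*   //     prompt_flag is set and PROMPT_COMPAT_DEV is in effect)
*   // a3: prompt mask
*/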
int label;
int flags;
int idx;
perms_t perms;
StateTrans trans;
State *otherwise;
/* temporary work storage used during dfa creation, minimization and
* diff encoding */
union {
Partition *partition; /* used during minimization */
ProtoState proto; /* used during creation */
DiffDag *diff; /* used during diff encoding */
};
};
class NodeMap: public CacheStats
{
public:
typedef map<ProtoState, State *>::iterator iterator;
iterator begin() { return cache.begin(); }
iterator end() { return cache.end(); }
map<ProtoState, State *> cache;
NodeMap(void): cache() { };
~NodeMap() { clear(); };
virtual unsigned long size(void) const { return cache.size(); }
void clear()
{
cache.clear();
CacheStats::clear();
}
pair<iterator,bool> insert(ProtoState &proto, State *state)
{
pair<iterator,bool> uniq;
uniq = cache.insert(make_pair(proto, state));
if (uniq.second == false) {
dup++;
} else {
sum += proto.size();
if (proto.size() > max)
max = proto.size();
}
return uniq;
}
};
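/* Illustrative sketch, hypothetical variables, not part of the original
* header: NodeMap deduplicates ProtoStates while states are created.
*
*   ProtoState ps;
*   ps.init(nnodes, anodes);
*   pair<NodeMap::iterator, bool> res = node_map.insert(ps, candidate);
*   if (!res.second)
*       // an equivalent ProtoState already has a State; reuse
*       // res.first->second instead of the candidate State
*/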
typedef map<const State *, size_t> Renumber_Map;
/* DFA - the dfa's states and transitions, and the operations used to
* build, minimize and encode it. */
class DFA {
void dump_node_to_dfa(void);
State *add_new_state(optflags const &opts, NodeSet *nodes,
State *other);
State *add_new_state(optflags const &opts, NodeSet *anodes,
NodeSet *nnodes, State *other);
void update_state_transitions(optflags const &opts, State *state);
void process_work_queue(const char *header, optflags const &);
void dump_diff_chain(ostream &os, map<State *, Partition> &relmap,
Partition &chain, State *state,
unsigned int &count, unsigned int &total,
unsigned int &max);
/* temporary values used during computations */
NodeVecCache anodes_cache;
NodeVecCache nnodes_cache;
NodeMap node_map;
list<State *> work_queue;
public:
DFA(Node *root, optflags const &flags, bool filedfa);
virtual ~DFA();
State *match_len(State *state, const char *str, size_t len);
State *match_until(State *state, const char *str, const char term);
State *match(const char *str);
void remove_unreachable(optflags const &flags);
bool same_mappings(State *s1, State *s2);
void minimize(optflags const &flags);
int apply_and_clear_deny(void);
void clear_priorities(void);
void diff_encode(optflags const &flags);
void undiff_encode(void);
void dump_diff_encode(ostream &os);
void dump(ostream &os, Renumber_Map *renum);
void dump_dot_graph(ostream &os);
void dump_uniq_perms(const char *s);
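/* character equivalence classes: input characters that always behave the
 * same can be folded into a single class, shrinking the effective
 * alphabet before the transition tables are built.
 */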
map<transchar, transchar> equivalence_classes(optflags const &flags);
void apply_equivalence_classes(map<transchar, transchar> &eq);
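/* build the aa_perms permissions table from the accept information of the
 * hfa's states.
 */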
void compute_perms_table_ent(State *state, size_t pos,
vector <aa_perms> &perms_table,
bool prompt);
void compute_perms_table(vector <aa_perms> &perms_table,
bool prompt);
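/* bookkeeping from differential encoding (appears to track how many
 * states were diff encoded).
 */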
unsigned int diffcount;
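/* out-of-band (oob) transitions: transitions that can only be taken
 * explicitly by code, never triggered by input data, used to delimit
 * variable-length binary fields that may contain NUL.  In the compressed
 * hfa they are stored as a negative offset from a state's base; currently
 * only a single oob transition is supported.
 */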
int oob_range;
int max_range;
int ord_range;
int upper_bound;
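/* the expression tree the dfa was built from, its always-reject
 * (nonmatching) and start states, and the set of all states; filedfa is
 * set when the dfa encodes file rules.
 */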
Node *root;
State *nonmatching, *start;
Partition states;
bool filedfa;
};
void dump_equivalence_classes(ostream &os, map<transchar, transchar> &eq);
#endif /* __LIBAA_RE_HFA_H */