Queryables¶
Grapl provides powerful primitives for building graph-based queries.
At the root of this query logic is the Queryable base class, though you shouldn’t ever have to work with that directly.
Queries are themselves Python classes that can be composed and constrained.
A simple query would look like this:
ProcessQuery()
This query describes a process: any process at all, since it is totally unconstrained.
We can execute this query in a few ways. Here are three examples:
gclient = GraphClient()
all_processes = ProcessQuery().query(gclient)
one_process = ProcessQuery().query_first(gclient)
count = ProcessQuery().get_count(gclient)
Queryable.query¶
Query the graph for all nodes that match.
graph_client
- a GraphClient, which will determine which database to query
contains_node_key
- a node_key that must exist somewhere in the query
first
- return at most this many nodes. Defaults to 1000. When contains_node_key is provided, first is set to 1.
returns
- a list of nodes that matched your query
def query(
    self,
    graph_client: GraphClient,
    contains_node_key: Optional[str] = None,
    first: Optional[int] = 1000,
) -> List["NV"]:
    pass
Queryable.query_first¶
Query the graph for the first node that matches.
graph_client
- a GraphClient, which will determine which database to query
contains_node_key
- a node_key that must exist somewhere in the query
returns
- the first node that matched your query, or None if nothing matched
def query_first(
    self,
    graph_client: GraphClient,
    contains_node_key: Optional[str] = None
) -> Optional["NV"]:
    pass
Queryable.get_count¶
Query the graph, counting all matches.
graph_client
- a GraphClient, which will determine which database to query
first
- count up to first, and then stop.
returns
- the number of matches for this query. If first is set, only count up to first.
def get_count(
    self,
    graph_client: GraphClient,
    first: Optional[int] = None,
) -> int:
    pass
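The three methods above differ mainly in what they return and in how first caps the result. The following is a minimal sketch of that contract using a toy in-memory stand-in. ToyProcessQuery and FAKE_GRAPH are hypothetical names invented for illustration, and none of this is Grapl's implementation.

```python
from typing import List, Optional

# Hypothetical stand-in for a graph: just a list of process names.
FAKE_GRAPH = ["svchost.exe", "cmd.exe", "explorer.exe"]


class ToyProcessQuery:
    """Toy model of the query / query_first / get_count contract."""

    def query(self, graph: List[str], first: Optional[int] = 1000) -> List[str]:
        # Return every match, capped at `first` results.
        return graph[:first]

    def query_first(self, graph: List[str]) -> Optional[str]:
        # Return a single match, or None when nothing matches.
        matches = self.query(graph, first=1)
        return matches[0] if matches else None

    def get_count(self, graph: List[str], first: Optional[int] = None) -> int:
        # Count matches; when `first` is set, stop counting at `first`.
        return len(graph[:first])


toy = ToyProcessQuery()
all_processes = toy.query(FAKE_GRAPH)      # a list of matches
one_process = toy.query_first(FAKE_GRAPH)  # one match or None
count = toy.get_count(FAKE_GRAPH)          # an int
```

Callers of query_first should therefore handle None, while callers of query can iterate over the result directly.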
contains_node_key¶
In some cases, such as when writing Grapl Analyzers, we want to execute a query where a node’s node_key may be anywhere in that graph.
For example:
query = (
    ProcessQuery()  # A
    .with_bin_file(
        FileQuery()  # B
        .with_spawned_from(
            ProcessQuery()  # C
        )
    )
)
query.query_first(mclient, contains_node_key="node-key-to-query")
In this case, the query matches only if at least one of the nodes A, B, or C has the node_key “node-key-to-query”; otherwise there is no match.
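One way to picture contains_node_key is as a check over every node in a matched subgraph. The sketch below models that idea in plain Python. ToyNode, subgraph_contains_key, and the sample keys are hypothetical, and the way Grapl actually evaluates this constraint in the database may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyNode:
    node_key: str
    edges: List["ToyNode"] = field(default_factory=list)


def subgraph_contains_key(root: ToyNode, node_key: str) -> bool:
    """Return True if any node reachable from root has the given node_key."""
    stack, seen = [root], set()
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # guard against cycles
        seen.add(id(node))
        if node.node_key == node_key:
            return True
        stack.extend(node.edges)
    return False


# Mirrors the shape of the query above: A -> B -> C.
c = ToyNode("process-c")
b = ToyNode("file-b", [c])
a = ToyNode("process-a", [b])

matched = subgraph_contains_key(a, "file-b")          # key belongs to B
not_matched = subgraph_contains_key(a, "unrelated")   # key is nowhere in A, B, C
```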
Boolean Logic¶
And¶
Within a single predicate constraint (one with_* method call), all constraints are And’d together.
This query matches a process name that contains both “foo” and “bar”.
(
    ProcessQuery()
    .with_process_name(contains=["foo", "bar"])
)
Or¶
Multiple predicate constraints are considered Or’d.
This query matches a process name that contains either “foo” or “bar”.
(
    ProcessQuery()
    .with_process_name(contains="foo")
    .with_process_name(contains="bar")
)
Not¶
Any constraint can be wrapped in a Not to negate the constraint.
This query matches a process name that does not contain “foo”.
(
    ProcessQuery()
    .with_process_name(contains=Not("foo"))
)
All Together¶
This query matches a process whose process_name either does not contain “foo” but ends with “.exe”, or contains both “bar” and “baz”.
(
    ProcessQuery()
    .with_process_name(contains=Not("foo"), ends_with=".exe")
    .with_process_name(contains=["bar", "baz"])
)
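To make the And/Or/Not rules concrete, here is a small pure-Python model that evaluates them against plain strings. It does not use Grapl: the Not class, matches_call, and matches_query are hypothetical stand-ins written only to mirror the semantics described in this section, and only contains and ends_with are modeled.

```python
from typing import Callable, Dict, List, Optional, Union


class Not:
    """Toy negation wrapper, standing in for the Not used above."""

    def __init__(self, value: str) -> None:
        self.value = value


Cmp = Union[str, Not, List[Union[str, Not]]]


def _holds(name: str, cmp: Cmp, test: Callable[[str, str], bool]) -> bool:
    # A list of values means every value must hold (And).
    items = cmp if isinstance(cmp, list) else [cmp]
    return all(
        (not test(name, item.value)) if isinstance(item, Not) else test(name, item)
        for item in items
    )


def matches_call(
    name: str,
    contains: Optional[Cmp] = None,
    ends_with: Optional[Cmp] = None,
) -> bool:
    """One with_* call: every supplied constraint must hold (And)."""
    ok = True
    if contains is not None:
        ok = ok and _holds(name, contains, lambda n, s: s in n)
    if ends_with is not None:
        ok = ok and _holds(name, ends_with, lambda n, s: n.endswith(s))
    return ok


def matches_query(name: str, calls: List[Dict[str, Cmp]]) -> bool:
    """Several with_* calls: it is enough for one of them to hold (Or)."""
    return any(matches_call(name, **kwargs) for kwargs in calls)


# The "All Together" query, expressed as one dict per with_process_name call.
all_together = [
    {"contains": Not("foo"), "ends_with": ".exe"},
    {"contains": ["bar", "baz"]},
]

for candidate in ["cmd.exe", "foo.exe", "barbaz", "bar"]:
    print(candidate, matches_query(candidate, all_together))
```

In this model "cmd.exe" matches through the first call, "barbaz" through the second, and "foo.exe" and "bar" match neither.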
Filters and functions¶
with_* methods¶
Most Queryable classes provide a suite of methods starting with with_*.
For example, ProcessQuery provides with_process_name.
ProcessQuery.with_process_name
def with_process_name(
    self,
    eq: Optional["StrCmp"] = None,
    contains: Optional["StrCmp"] = None,
    ends_with: Optional["StrCmp"] = None,
    starts_with: Optional["StrCmp"] = None,
    regexp: Optional["StrCmp"] = None,
    distance: Optional[Tuple["StrCmp", int]] = None,
) -> ProcessQuery:
    pass
The process_name field is indexed such that we can constrain our query through:
eq¶
Matches a node’s process_name if it exactly matches the value passed as eq.
ProcessQuery().with_process_name(eq="svchost.exe")
contains¶
Matches a node’s process_name if it contains the value passed as contains.
ProcessQuery().with_process_name(contains="svc")
ends_with¶
Matches a node’s process_name if it ends with the value passed as ends_with.
ProcessQuery().with_process_name(ends_with=".exe")
starts_with¶
Matches a node’s process_name if it starts with the value passed as starts_with.
ProcessQuery().with_process_name(starts_with="svchost")
regexp¶
Matches a node’s process_name if it matches the regular expression passed as regexp.
ProcessQuery().with_process_name(regexp="svc.*exe")
distance¶
Matches a node’s process_name if its string distance to the given string is less than the provided threshold.
ProcessQuery().with_process_name(distance=("svchost", 2))
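The text above does not say which string distance is used. As an illustration only, the sketch below assumes Levenshtein edit distance (the number of single-character insertions, deletions, and substitutions) and follows the "less than the threshold" wording literally. The function names are hypothetical, and the real comparison may use a different metric or an inclusive bound.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # substitute (or keep) the character
                )
            )
        prev = curr
    return prev[-1]


def within_distance(name: str, target: str, threshold: int) -> bool:
    # Assumption: strict "less than", as worded in the description above.
    return levenshtein(name, target) < threshold


# Under these assumptions, a one-character swap is within the threshold of 2.
lookalike = within_distance("svch0st", "svchost", 2)
unrelated = within_distance("explorer", "svchost", 2)
```

A filter like this is useful for spotting names that imitate a well-known binary with a small typo.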
Example¶
Here’s an example where we look for processes with a process_name that is not equal to svchost.exe, but that is within a small string distance of svchost.
(
    ProcessQuery()
    .with_process_name(eq=Not("svchost.exe"), distance=("svchost", 2))
)