Grapl¶
Grapl is a Graph Platform for Detection and Response with a focus on helping Detection Engineers and Incident Responders stop fighting their data and start connecting it. Find out more on our GitHub.
For now, our documentation primarily focuses on grapl_analyzerlib. grapl_analyzerlib provides a Python interface for end-users to interact with the data in Grapl.
Note
Grapl’s documentation is still a work in progress.
Queryables¶
Grapl provides powerful primitives for building graph-based queries.
At the root of this query logic is the Queryable base class, though you shouldn’t ever have to work with that directly.
Queries are themselves Python classes that can be composed and constrained.
A simple query would look like this:
ProcessQuery()
This query describes a process - any process; it's totally unconstrained.
We can execute this query in a few ways. Here are three examples:
gclient = GraphClient()
all_processes = ProcessQuery().query(gclient)
one_process = ProcessQuery().query_first(gclient)
count = ProcessQuery().get_count(gclient)
Queryable.query¶
Query the graph for all nodes that match.
graph_client
- a GraphClient, which will determine which database to query
contains_node_key
- a node_key that must exist somewhere in the query
first
- return only up to first nodes. Defaults to 1000. When contains_node_key is provided, first is set to 1.
returns
- a list of nodes that matched your query
def query(
self,
graph_client: GraphClient,
contains_node_key: Optional[str] = None,
first: Optional[int] = 1000,
) -> List["NV"]:
pass
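For example, putting the pieces together (a minimal sketch; it assumes ProcessQuery and GraphClient have been imported from grapl_analyzerlib, and with_process_name is described under "Filters and functions" below):
# Minimal sketch; assumes ProcessQuery and GraphClient are imported
# from grapl_analyzerlib.
gclient = GraphClient()

# Fetch up to 50 processes whose process_name is exactly "svchost.exe"
procs = ProcessQuery().with_process_name(eq="svchost.exe").query(gclient, first=50)
print(f"matched {len(procs)} processes")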
Queryable.query_first¶
Query the graph for the first node that matches.
graph_client
- a GraphClient, which will determine which database to query
contains_node_key
- a node_key that must exist somewhere in the query
returns
- the first node that matched your query, or None if nothing matched
def query_first(
self,
graph_client: GraphClient,
contains_node_key: Optional[str] = None
) -> Optional["NV"]:
pass
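Because query_first returns an Optional, it's worth handling the no-match case explicitly. A minimal sketch, using the same client as above:
# query_first returns None when nothing in the graph matches
proc = ProcessQuery().with_process_name(eq="svchost.exe").query_first(gclient)
if proc is not None:
    # proc is a View over the matched node (Views are covered later)
    print("found a matching process")
else:
    print("no matching process")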
Queryable.get_count¶
Query the graph, counting all matches.
graph_client
- a GraphClient, which will determine which database to query
first
- count up to first matches, then stop
returns
- the number of matches for this query. If first is set, only counts up to first.
def get_count(
self,
graph_client: GraphClient,
first: Optional[int] = None,
) -> int:
pass
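Since first caps the count, it offers a cheap existence check; a minimal sketch:
# Counting with first=1 answers "is there at least one match?"
# without counting every matching node.
exists = ProcessQuery().with_process_name(eq="svchost.exe").get_count(gclient, first=1) > 0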
contains_node_key¶
In some cases, such as when writing Grapl Analyzers, we want to execute a query where a node’s node_key may be anywhere in that graph.
For example,
query = (
ProcessQuery() # A
.with_bin_file(
FileQuery() # B
.with_spawned_from(
ProcessQuery() # C
)
)
)
query.query_first(gclient, contains_node_key="node-key-to-query")
In this case, if our signature matches such that any of the nodes A, B, or C has the node_key "node-key-to-query", we have a match - otherwise, no match.
Boolean Logic¶
And¶
Within a single predicate constraint (a with_* method call), all constraints are And'd.
This query matches a process name that contains both “foo” and “bar”.
ProcessQuery()
.with_process_name(contains=["foo", "bar"])
Or¶
Multiple predicate constraints are considered Or’d.
This query matches a process name that contains either “foo” or “bar”.
ProcessQuery()
.with_process_name(contains="foo")
.with_process_name(contains="bar")
Not¶
Any constraint can be wrapped in a Not to negate the constraint.
This query matches a process name that is not “foo”.
ProcessQuery()
.with_process_name(contains=Not("foo"))
All Together¶
This query matches a process whose process_name either does not contain "foo" but ends with ".exe", or contains both "bar" and "baz".
ProcessQuery()
.with_process_name(contains=Not("foo"), ends_with=".exe")
.with_process_name(contains=["bar", "baz"])
Filters and functions¶
with_* methods¶
Most Queryable classes provide a suite of methods starting with with_*.
For example, ProcessQuery provides a with_process_name method.
ProcessQuery.with_process_name
def with_process_name(
self,
eq: Optional["StrCmp"] = None,
contains: Optional["StrCmp"] = None,
ends_with: Optional["StrCmp"] = None,
starts_with: Optional["StrCmp"] = None,
regexp: Optional["StrCmp"] = None,
distance: Optional[Tuple["StrCmp", int]] = None,
) -> ProcessQuery:
pass
The process_name field is indexed such that we can constrain our query through the following:
eq¶
Matches a node’s process_name
if it exactly matches eq
ProcessQuery().with_process_name(eq="svchost.exe")
contains¶
Matches a node’s process_name
if it contains contains
ProcessQuery().with_process_name(contains="svc")
ends_with¶
Matches a node’s process_name
if it ends with ends_with
ProcessQuery().with_process_name(ends_with=".exe")
starts_with¶
Matches a node’s process_name
if it starts with starts_with
ProcessQuery().with_process_name(starts_with="svchost")
regexp¶
Matches a node’s process_name
if it matches the regexp pattern regexp
ProcessQuery().with_process_name(regexp="svc.*exe")
distance¶
Matches a node’s process_name
if it has a string distance of less than the
provided threshold
ProcessQuery().with_process_name(distance=("svchost", 2))
Example¶
Here’s an example where we look for processes with a process_name that is not equal to svchost.exe, but that has a very close string distance to it.
ProcessQuery()
.with_process_name(eq=Not("svchost.exe"), distance=("svchost", 2))
Analyzers¶
Analyzers¶
Analyzers are the attack signatures that power Grapl’s realtime detection logic.
Though implementing analyzers is simple, we can build extremely powerful and efficient logic to catch all sorts of attacker behaviors.
(TODO: Fill this out once we’ve massaged out the new, late-2022 plugin SDK.)
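In the meantime, here is a rough sketch in the style of the legacy grapl_analyzerlib interface; the Analyzer base class and its method names are assumptions here, and the new SDK will likely differ:
# Hypothetical sketch modeled on the legacy grapl_analyzerlib interface;
# the late-2022 plugin SDK will likely look different.
class SuspiciousSvchost(Analyzer):  # Analyzer base class: assumed
    def get_queries(self) -> ProcessQuery:
        # The signature: a process whose name is close to, but not exactly,
        # "svchost.exe" (see "Filters and functions" for these constraints)
        return ProcessQuery().with_process_name(
            eq=Not("svchost.exe"), distance=("svchost", 2)
        )

    def on_response(self, response, output):
        # Called once per match; emit a detection (output API assumed)
        output.send(response)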
Setup¶
NOTE: We do not currently support running Grapl locally; it runs strictly on AWS.
AWS setup¶
Warnings¶
NOTE that setting up Grapl will incur AWS charges! This can amount to hundreds of dollars a month, depending on the configuration. This setup script is designed for testing, and may include breaking changes in future versions, increased charges in future versions, or may otherwise require manually working with CloudFormation. If you need a way to set up Grapl in a stable, forward-compatible manner, please get in contact with us directly.
Preparation¶
Local AWS credentials¶
See full instructions here.
You should have a local file ~/.aws/credentials, with an entry resembling this format:
[my_profile]
aws_access_key_id=...
aws_secret_access_key=...
aws_session_token=...
You will need to configure the profile for your account, if you haven’t already:
aws configure --profile "my_profile"
If your profile’s name is not “default”, then note it down, as you will need to include it as a parameter in later steps.
Installing Dependencies¶
You’ll need to have the following dependencies installed:
Pulumi: https://www.pulumi.com/docs/get-started/install/
AWS CLI:
your choice of the following:
pip install awscli
https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-docker.html
helpful alias:
alias aws='docker run --rm -it -v ~/.aws:/root/.aws -v $(pwd):/aws -e AWS_PROFILE amazon/aws-cli'
Clone Grapl Git repository¶
git clone https://github.com/grapl-security/grapl.git
cd grapl/
The remaining steps assume your working directory is the Grapl repository.
Build deployment artifacts¶
Previously we supported uploading deployment artifacts (Docker images) directly
from your dev machine, but the current state of Grapl requires that the Docker
images be downloaded from Dockerhub or Cloudsmith. If you truly wish to upload
an image to Cloudsmith, try bin/upload_image_to_cloudsmith.sh
Spin up infrastructure with Pulumi¶
(This section is actively under development, and as of Dec 2021 requires infrastructure defined in the private repository https://github.com/grapl-security/platform-infrastructure )
See pulumi/README.md for instructions to spin up infrastructure in AWS with Pulumi. Once you have successfully deployed Grapl with Pulumi, return here and follow the instructions in the following section to provision Grapl and run the tests.
graplctl¶
We use the graplctl
utility to manage Grapl in AWS.
Installation¶
To install graplctl
run the following command in the Grapl checkout root:
make graplctl
This will build the graplctl
binary and install it in the ./bin/
directory.
You can familiarize yourself with graplctl
by running
./bin/graplctl --help
Usage notes for setup¶
If your AWS profile is not named ‘default’, you will need to explicitly provide it, either:
as a command-line invocation parameter
as an environment variable (e.g. AWS_PROFILE)
Usage with Pulumi¶
Several commands will need references to things like S3 buckets or AWS log groups. While you can pass these values directly, you can also pull them from a Pulumi stack’s outputs automatically.
To do this, you will need to export GRAPLCTL_PULUMI_STACK
in your environment,
and then use the ./bin/graplctl-pulumi.sh
wrapper instead of invoking
graplctl
directly.
For further details, please read the documentation in that script.
Testing¶
Follow the instructions in this section to deploy analyzers, upload test data, and execute the end-to-end tests in AWS.
Deploy analyzers¶
TBD
Upload test data¶
TBD
Logging in to the Grapl UI with the test user¶
You may use the test user to log into Grapl and interact with the UI. The test username is the deployment name followed by -grapl-test-user. For example, if your deployment was named test-deployment, your username would be test-deployment-grapl-test-user.
To retrieve the password for your grapl deployment, navigate to “AWS Secrets Manager” and click on “Secrets”.
Click on the “Secret name” URL that represents your deployment name followed by -TestUserPassword. The link will bring you to the “secret details” screen.
Scroll down to the section labeled “Secret Value” and click the “Retrieve Secret
Value” button. The password for your deployment will appear under “Plaintext”.
DGraph operations¶
You can manage the DGraph cluster with the Docker Swarm tooling by logging into one of the swarm managers with SSM. If you forget which instances are the swarm managers, you can find them by running graplctl swarm managers. For your convenience, graplctl also provides an exec command you can use to run a bash command remotely on a swarm manager. For example, to list all the nodes in the Dgraph swarm you can run something like the following:
bin/graplctl swarm exec --swarm-id my-swarm-id -- docker node ls
If you forget which swarm-id is associated with your Dgraph cluster, you may list all the swarm IDs in your deployment by running bin/graplctl swarm ls.
Plugins¶
Implementing a Graph Generator¶
Graph Generators are Grapl’s parser services; they take in raw events and they produce a graph representation.
As an example, a generator for the OSQuery process_events table would take in an event like this:
{
"action": "added",
"columns": {
"uid": "0",
"time": "1527895541",
"pid": "30219",
"path": "/usr/bin/curl",
"auid": "1000",
"cmdline": "curl google.com",
"ctime": "1503452096",
"cwd": "",
"egid": "0",
"euid": "0",
"gid": "0",
"parent": "30200"
},
"unixTime": 1527895550,
"hostIdentifier": "vagrant",
"name": "process_events",
"numerics": false
}
And produce a graph that represents the entities and relationships in the event.
For example, we might have a graph that looks like this (minimally):
// A node representing the child process
ChildProcessNode {
pid: event.columns.pid, // The child process pid
created_timestamp: event.columns.time // The child process creation time
}
// A node representing the parent
ParentProcessNode {
pid: event.columns.parent, // The parent process pid
seen_at_timestamp: event.columns.time // The time that we saw the parent process
}
// An edge, relating the two processes
ChildrenEdge {
from: ParentProcess,
to: ChildProcess,
}
The goal of this document is to guide you through how to build that function.
Getting started¶
First off, Grapl’s graph generators are currently written in the Rust programming language. Rust offers a number of benefits for parsers, such as high performance while retaining memory safety.
Don’t be intimidated if you don’t know Rust! You don’t have to be an expert to write a generator.
Installing Requirements¶
You can install Rust by running this script:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Creating the Generator Project¶
cargo new our-graph-generator
cd ./our-graph-generator/
Modify the Cargo.toml
to include our Grapl generator library:
[dependencies]
graph-generator-lib = "*"
This library will provide the primitives we need in order to parse our data into a graph.
Implementing the EventHandler¶
Grapl handles all of the work to get data in and out of your function; all you need to do is add the entrypoint and implement an interface to do the parsing.
The interface is called the EventHandler.
Testing With Local Grapl¶
Implementing A Graph Model Plugin¶
Graph Model Plugins allow you to ‘Bring Your Own Model’ to Grapl. For example, if you wanted to implement a plugin for, say, AWS, which Grapl has no native support for, you would be adding an AWS Model to Grapl.
Models are split into a few components:
1. Python Schema Definitions - used for provisioning the GraphDB, among other things
2. Rust Schema Definitions - for graph generators to use
3. Analyzer Query and Views - used for detection and response
You only need to implement 1 and 2; the code for 3 will be generated for you.
Rust Schema Definitions¶
In order to generate your graphs and implement a Graph Generator, you’ll want to build a schema definition in Rust, the language that we currently support for graph generation. As a reminder, graph generators are the services that turn raw data, like event logs, into a graph format that Grapl can understand.
You’ll need a relatively recent installation of Rust: https://rustup.rs/
You can create a new rust library to define your schemas by running something like:
cargo new grapl-aws-models
We can then add the necessary dependencies for Grapl:
cargo add grapl-graph-descriptions
cargo add derive-dynamic-node
Then, in your favorite IDE, navigate to the src/lib.rs
file, where we’ll put
our first model - the Ec2Instance.
src/lib.rs
use derive_dynamic_node::{DynamicNode as DeriveDynamicNode, GraplStaticId};
use rust_proto::graph_descriptions::*;
#[derive(Clone, DeriveDynamicNode, GraplStaticId)]
struct Ec2Instance {
#[static_id]
arn: String,
image_id: String,
image_description: String,
instance_id: String,
launch_time: u64,
instance_state: String,
instance_type: String,
availability_zone: String,
platform: String,
}
impl IEc2InstanceNode for Ec2InstanceNode {
fn get_mut_dynamic_node(&mut self) -> &mut DynamicNode {
&mut self.dynamic_node
}
}
Currently Grapl’s nodes must have only String, u64, or i64 properties.
The Ec2Instance struct is tagged with two important macros - DeriveDynamicNode and GraplStaticId.
The DeriveDynamicNode macro generates some code for us; in this case it will generate an Ec2InstanceNode structure, which is what we’ll store data in.
The GraplStaticId macro allows us to define a property, or properties, that can be used to identify the underlying entity. In AWS this is very straightforward - identity is provided by an Arn. Every node in Grapl must have an identity.
When parsing, we can add data to this node type like this:
let mut ec2_instance = Ec2InstanceNode::new(
Ec2InstanceNode::static_strategy()
);
ec2_instance.with_launch_time(launch_time);
ec2_instance.with_instance_id(&instance_id);
The Ec2InstanceNode struct was generated by those macros, as were the static_strategy method and the methods for adding data.
Python Schema Definition¶
The Python schema definitions will serve two functions:
They will help us provision Grapl’s graph databases to understand our new model
They generate more Python code, which we’ll use in our Analyzers to detect and respond to threats using our new models
Our Python Schema for the Ec2InstanceNode will be relatively straightforward to implement.
# TODO replace with instructions for next-gen grapl_analyzerlib
Make sure that the return value of the self_type method is the same name as the struct in your Rust model, in this case Ec2Instance.
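As a stopgap, a hypothetical sketch of the shape such a schema might take (the NodeSchema base class here is an assumption, pending the instructions above):
# Hypothetical sketch; the NodeSchema base class is assumed.
class Ec2InstanceNodeSchema(NodeSchema):
    @staticmethod
    def self_type() -> str:
        # Must exactly match the Rust struct name
        return "Ec2Instance"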
Using this Ec2InstanceNodeSchema we can generate the rest of the code that we need for building signatures or responding to attacks.
# TODO replace with instructions for next-gen grapl_analyzerlib
This will generate and print out the code for querying or pivoting off of Ec2Instance nodes in Grapl.
Specifically it will generate the Ec2InstanceQuery
and Ec2InstanceView
classes.
You can just copy/paste this code into a file and load it up to use. There may be minor changes required, such as imports, but otherwise it should generally ‘just work’.
Modifying the Graph Schema¶
TODO replace with instructions about graph-schema-manager¶
Deploying Analyzers With Plugins¶
TBD
Debugging Tooling¶
Grapl has several tools to aid in debugging issues.
Grapl Debugger Container
Distributed Tracing
Grapl Debugger Container¶
This is a container that can attach to running containers and includes debug tools inside. See the debugger docs for details.
Distributed Tracing¶
We currently have tracing enabled for local Grapl. Our current backend is Jaeger.
Usage¶
1. Run make up
2. In your browser, go to http://localhost:16686. You should see the Jaeger front-end. On the left side are search options.
3. Select the service you’re interested in and click Search. You can also use any of the additional filters on the left (such as HTTP code, etc). If your service does not appear, it’s possible that a) it doesn’t have any traffic (i.e. the web UI needs a web request), or b) there are no traces within the Lookback window.
4. In the center, a list of traces will appear.
5. Click on one. You will go to a page with detailed trace information, including performance data for each span.
6. Click on a span, and then click on its tags to get more detailed information, such as HTTP code, etc.
7. On the top-right, there is a drop-down menu with Trace Timeline selected. Clicking on it will provide a few additional options.
Tracing pulumi¶
To have Pulumi send traces to Jaeger, run
WITH_PULUMI_TRACING=1 make $MAKECOMMAND
where $MAKECOMMAND can be up or test-e2e.
Tracing docker buildx bake¶
docker buildx supports sending traces to a backend. Since we build prior to running Jaeger, you will need to set Jaeger up explicitly, either by running make up first or by running it manually, either in Docker or as a standalone Nomad job.
This tracing is meant to help debug docker build issues including performance issues.
Warning!
This generates a LOT of traces, enough to potentially crash Jaeger.
This slows down the build process significantly :(
To run tracing for the docker build:
1. Do a one-time setup, if you haven’t already: make setup-docker-tracing
2. Run WITH_TRACING=1 make build-local-infrastructure or any other build command that uses bake, such as build-test-e2e. Alternatively, you can run traces for individual services via docker buildx bake --file $FILE --builder=grapl-tracing-builder
Error Codes¶
This lists the internal error codes that are defined and currently in use at Grapl.
Error Code | Usage
---|---
42 | buildx push error
43 | docker context too big
44 | devbox is local
45 | devbox cannot be reached
46 | Bad user input
47 | File doesn’t exist
48 | Missing key
49 | Missing venv directory
50 | Process exited unexpectedly
51 | Nomad job failed
Local Grapl credentials¶
(This article should expire by about Oct or Nov 2021, when we’ll have more robust user management.)
Your username is:
local-grapl-grapl-test-user
You can retrieve your password with:
awslocal secretsmanager get-secret-value --secret-id local-grapl-TestUserPassword | jq -r ".SecretString"
To auth against grapl-web-ui:
PASSWORD=$(awslocal secretsmanager get-secret-value --secret-id local-grapl-TestUserPassword | jq -r ".SecretString")
curl -i --location --request POST "http://localhost:1234/auth/login" \
--header 'content-type: application/json' \
--data @- << REQUEST_BODY
{
"username": "local-grapl-grapl-test-user",
"password": "${PASSWORD}"
}
REQUEST_BODY
Observability¶
We currently use Lightstep as our observability platform.
Local tracing¶
1. Go to lightstep.com
2. Log into Lightstep using Google.
3. On the left-hand side menu, go to developer mode (the angle brackets < >).
4. Copy the command and run it locally. This will spin up a docker container configured with an API key. Any data submitted will be forwarded to Lightstep.
5. Run make up. Once everything is up, check the Lightstep developer mode page. You should start seeing traces appear on the page.
Services¶
Network Diagram (Outdated)¶
Grapl Services - main pipeline¶
Pipeline Ingress¶
Input: Receives an RPC to insert event logs (e.g. sysmon logs, osquery logs, Cloudtrail logs). (Currently the plan is to allow one event/log per call, but we may revisit this in the future.)
Work: We wrap those logs in an Envelope and throw it in Kafka. No transforms are performed.
Output: Push logs to the ‘raw-logs’ topic for the next service, Generator Dispatcher.
Generator Dispatcher¶
Input: Pulls event logs from topic ‘raw-logs’
Work: This service will:
figure out which generators would respond to the incoming event-source
call into Plugin Work Queue to enqueue that work in a durable Postgres store.
Output: Push this work to the Plugin Work Queue.
Plugin Work Queue (for generators)¶
Input: Receives an RPC push_execute_generator
to store generator work
Work: PWQ is a simple facade over a postgres database that lets us store and manage work for a Generator plugin.
Output: Work is pulled from PWQ by Plugin Execution Sidecar.
Plugin Execution Sidecar (for generators)¶
Input: Pulls work from Plugin Work Queue over gRPC
Work:
Loop, querying for new work from PWQ
When there is new work, delegate it to the Generator binary that runs alongside this sidecar over gRPC
When the Generator’s work is completed - successful or failure - call
acknowledge_generator
in PWQ. This will send the generator’s output onto a Kafka queue.
Output: Put generated graphs on the ‘generated-graphs’ topic for the Node Identifier.
Plugin (generator)¶
Input: Receives an RPC containing an event log (i.e. a sysmon event)
Work: Turns these events into a standalone subgraph, independent of existing Dgraph/Scylla state.
Output: The subgraph is returned as a response to the incoming RPC, going to the Plugin Execution Sidecar.
Node Identifier¶
Input: Pulls generated graphs from topic ‘generated-graphs’
Work: Identifies nodes in the incoming subgraph against the canonical identities of nodes in DynamoDB. The incoming nodes may be new, or they may represent something already known in the master graph.
For instance, an incoming subgraph may refer to a process {PID: 1234, name: "coolthing.exe", started at 1:00AM}; it’s possible that Dgraph already has a node representing this exact same process. We’d like to de-duplicate this process node.
Output: Push identified subgraphs to the ‘identified-graphs’ topic for the next service, Graph Merger.
Graph Merger¶
Input: Pulls identified graphs from topic ‘identified-graphs’
Work: Write the new edges and nodes from Node Identifier to Dgraph.
Output: Push merged graphs to the ‘merged-graphs’ topic for the next service.
TODO: Analyzer-dispatcher and analyzer subsystem¶
Managerial RPC services¶
Services that manage and configure Grapl itself via RPC calls.
Organization Management¶
TODO
Plugin Registry¶
This service manages Generator and Analyzer plugins, letting one create, deploy and teardown those plugins via RPC calls.
Other services¶
Model Plugin Deployer¶
TODO
Event Source¶
Create, get and update an Event Source, which is an ID that lets us tie incoming Generator work to which Plugin we think should process it.
Other services¶
Engagement View (aka UX)¶
Provides the main customer interaction with Grapl. This is not actually a standalone service, but hosted as static assets inside Grapl Web UI.
Graphql Endpoint¶
Graphql interface into our Dgraph database.
Grapl Web UI¶
Provides authn/authz functions, and acts as a router to other services:
Graphql Endpoint (/graphqlEndpoint)
Model Plugin Deployer (currently undergoing rewrite)
Also, hosts static assets like Engagement View.
Visual Studio Code Debugger¶
Python debugger¶
Setup VSCode¶
Add the following as a launch.json debug configuration in VSCode. You’ll want a different configuration for each service you want to debug; in this case, we’re debugging the grapl_e2e_tests. Each service’s configuration should likely have a different path-mapping and a different port.
{
"version": "0.2.0",
"configurations": [
{
"name": "E2E tests debug",
"type": "python",
"request": "attach",
"connect": {
"host": "127.0.0.1",
"port": 8400
},
// Also debug library code, like grapl-tests-common
"justMyCode": false,
"pathMappings": [
{
"localRoot": "${workspaceFolder}/src/python/grapl_e2e_tests",
"remoteRoot": "/home/grapl/grapl_e2e_tests"
},
{
"localRoot": "${workspaceFolder}/src/python/grapl-tests-common/grapl_tests_common",
"remoteRoot": "../venv/lib/python3.10/site-packages/grapl_tests_common"
},
]
}
]
}
Run the tests¶
Run the tests with the following. (Again, this example is strictly about the E2E Tests.)
DEBUG_SERVICES=grapl_e2e_tests make test-e2e
You’ll see the test start up and output the following:
>> Debugpy listening for client at <something>
At this point, start the debug configuration in your VSCode. The Python code should start moving along.
If you need to debug more than one service, DEBUG_SERVICES can be fed a comma-separated list of service names.
Enable python debugging for another service¶
You can see all of these steps in action in this pull request.
1. Add the service name, and an unused port in the 84xx range, to SERVICE_TO_PORT in vsc_debugger.py.
2. Forward that port, for that service, in docker-compose.yml.
3. Call wait_for_vsc_debugger("name_of_service_goes_here") in the main entrypoint for the service. (Don’t worry, it won’t trigger the debugger unless you declare DEBUG_SERVICES. See the sketch after this list.)
4. Add a new launch.json debug configuration for that port.
5. Finally, run your make test-e2e command with DEBUG_SERVICES=name_of_service_goes_here.
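For reference, a sketch of what such a wait_for_vsc_debugger helper might look like, built on the debugpy package; the real implementation lives in vsc_debugger.py and may differ:
# Hypothetical sketch; the real helper lives in vsc_debugger.py.
import os

import debugpy

# Assumed mapping; the real SERVICE_TO_PORT lives in vsc_debugger.py
SERVICE_TO_PORT = {"grapl_e2e_tests": 8400}

def wait_for_vsc_debugger(service: str) -> None:
    if service not in os.environ.get("DEBUG_SERVICES", "").split(","):
        return  # debugging was not requested for this service
    port = SERVICE_TO_PORT[service]
    debugpy.listen(("0.0.0.0", port))
    print(f">> Debugpy listening for client at {port}")
    debugpy.wait_for_client()  # block until the VSCode debugger attaches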
Rust debugger¶
We haven’t invested in this yet!
Queries and Views¶
Queries and Views are the main constructs to work with the graph.
Queries allow you to pull data from the graph that matches a structure.
Views represent an existing graph, which you can expand by pivoting off of its edges.
Let’s query for some processes with the name “svchost”.
# TODO replace with instructions for next-gen grapl_analyzerlib
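In the meantime, a sketch using the query API from the Queryables section above (the next-gen library may differ):
# Sketch reusing the API documented in the "Queryables" section.
gclient = GraphClient()
svchosts = ProcessQuery().with_process_name(eq="svchost").query(gclient)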
Now we can pivot around that data. Let’s look at the parent processes of these svchosts:
# TODO replace with instructions for next-gen grapl_analyzerlib
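Again as a sketch only; the parent-process edge helper below is an assumption based on the older API:
# Hypothetical pivot; the get_parent() helper is assumed.
for svchost in svchosts:
    parent = svchost.get_parent()  # expand the parent-process edge
    if parent is not None:
        print(parent.get_process_name())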
Installation¶
Install grapl_analyzerlib by running:
pip install --user grapl_analyzerlib
License¶
The project is licensed under the Apache 2.0 license.