Grapl

Grapl is a Graph Platform for Detection and Response with a focus on helping Detection Engineers and Incident Responders stop fighting their data and start connecting it. Find out more on our GitHub.

For now, our documentation primarily focuses on grapl_analyzerlib. grapl_analyzerlib provides a Python interface for end-users to interact with the data in Grapl.

Note

Grapl’s documentation is still a work in progress.

Queryables

Grapl provides powerful primitives for building graph-based queries.

At the root of this query logic is the Queryable base class, though you shouldn’t ever have to work with that directly.

Queries are themselves Python classes that can be composed and constrained.

A simple query would look like this:

ProcessQuery()

This query describes a process - any process; it’s totally unconstrained.

We can execute this query in a few ways. Here are three examples:

gclient = GraphClient()

all_processes = ProcessQuery().query(gclient)
one_process = ProcessQuery().query_first(gclient)
count = ProcessQuery().get_count(gclient)

Queryable.query

Query the graph for all nodes that match.

graph_client - a GraphClient, which determines which database to query

contains_node_key - a node_key that must exist somewhere in the matched graph

first - the maximum number of nodes to return. Defaults to 1000. When contains_node_key is set, first is set to 1.

returns - a list of nodes that matched your query

    def query(
        self,
        graph_client: GraphClient,
        contains_node_key: Optional[str] = None,
        first: Optional[int] = 1000,
    ) -> List["NV"]:
        pass
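
For example, to cap the number of results (reusing gclient from the earlier example):

first_hundred = ProcessQuery().query(gclient, first=100)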

Queryable.query_first

Query the graph for the first node that matches.

graph_client - a GraphClient, which determines which database to query

contains_node_key - a node_key that must exist somewhere in the matched graph

returns - the first node that matched your query, or None if nothing matched

    def query_first(
        self,
        graph_client: GraphClient,
        contains_node_key: Optional[str] = None
    ) -> Optional["NV"]:
        pass

Queryable.get_count

Query the graph, counting all matches.

graph_client - a GraphClient, which determines which database to query

first - count up to first, and then stop

returns - the number of matches for this query. If first is set, only count up to first.

    def get_count(
        self,
        graph_client: GraphClient,
        first: Optional[int] = None,
    ) -> int:
        pass

contains_node_key

In some cases, such as when writing Grapl Analyzers, we want to execute a query where a node’s node_key may be anywhere in that graph.

For example,

query = (
    ProcessQuery()  # A
    .with_bin_file(
        FileQuery()  # B
        .with_spawned_from(
            ProcessQuery()  # C
        )
    )
)

query.query_first(gclient, contains_node_key="node-key-to-query")

In this case, if our signature matches such that any of the nodes A, B, or C has the node_key “node-key-to-query”, we have a match - otherwise, no match.

Boolean Logic

And

Within a single predicate constraint (one with_* method call), all constraints are And’d.

This query matches a process name that contains both “foo” and “bar”.

ProcessQuery()
.with_process_name(contains=["foo", "bar"])

Or

Multiple predicate constraints (separate with_* calls on the same predicate) are Or’d.

This query matches a process name that contains either “foo” or “bar”.

ProcessQuery()
.with_process_name(contains="foo")
.with_process_name(contains="bar")

Not

Any constraint can be wrapped in a Not to negate the constraint.

This query matches a process name that is not “foo”.

ProcessQuery()
.with_process_name(contains=Not("foo"))

All Together

This query matches a process whose process_name either does not contain “foo” but ends with “.exe”, or contains both “bar” and “baz”.

ProcessQuery()
.with_process_name(contains=Not("foo"), ends_with=".exe")
.with_process_name(contains=["bar", "baz"])

Filters and functions

with_* methods

Most Queryable classes provide a suite of methods starting with with_*.

For example, ProcessQuery provides a with_process_name.

ProcessQuery.with_process_name

def with_process_name(
    self,
    eq: Optional["StrCmp"] = None,
    contains: Optional["StrCmp"] = None,
    ends_with: Optional["StrCmp"] = None,
    starts_with: Optional["StrCmp"] = None,
    regexp: Optional["StrCmp"] = None,
    distance: Optional[Tuple["StrCmp", int]] = None,
) -> ProcessQuery:
    pass

The process_name field is indexed such that we can constrain our query through:

eq

Matches a node’s process_name if it exactly matches eq

ProcessQuery().with_process_name(eq="svchost.exe")

contains

Matches a node’s process_name if it contains contains

ProcessQuery().with_process_name(contains="svc")

ends_with

Matches a node’s process_name if it ends with ends_with

ProcessQuery().with_process_name(ends_with=".exe")

starts_with

Matches a node’s process_name if it starts with starts_with

ProcessQuery().with_process_name(starts_with="svchost")

regexp

Matches a node’s process_name if it matches the regexp pattern regexp

ProcessQuery().with_process_name(regexp="svc.*exe")

distance

Matches a node’s process_name if its string distance from the given value is less than the provided threshold

ProcessQuery().with_process_name(distance=("svchost", 2))

Example

Here’s an example where we look for processes with a process_name that is not equal to svchost.exe, but that has a very close string distance to it.

ProcessQuery()
.with_process_name(eq=Not("svchost.exe"), distance=("svchost", 2))

Analyzers

Analyzers are the attack signatures that power Grapl’s realtime detection logic.

Though implementing an analyzer is simple, analyzers can express extremely powerful and efficient logic to catch all sorts of attacker behaviors.

(TODO: Fill this out once we’ve massaged out the new, late-2022 plugin SDK.)
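
In the meantime, here is a minimal sketch in the style of the legacy grapl_analyzerlib Analyzer interface; the import paths, the Analyzer base class, and the ExecutionHit fields all reflect older versions and will change with the new SDK:

from typing import Any

# Legacy import paths, assumed from pre-SDK versions of grapl_analyzerlib
from grapl_analyzerlib.analyzer import Analyzer, OneOrMany
from grapl_analyzerlib.execution import ExecutionHit
from grapl_analyzerlib.prelude import Not, ProcessQuery, ProcessView


class SvchostLookalike(Analyzer):
    def get_queries(self) -> OneOrMany[ProcessQuery]:
        # Match processes whose name is close to, but not exactly, "svchost.exe"
        return ProcessQuery().with_process_name(
            eq=Not("svchost.exe"),
            distance=("svchost", 2),
        )

    def on_response(self, response: ProcessView, output: Any) -> None:
        # Every matched process is reported as a risk with a score attached
        output.send(
            ExecutionHit(
                analyzer_name="svchost_lookalike",
                node_view=response,
                risk_score=75,
            )
        )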

Setup

NOTE: We do not currently support running Grapl locally; it runs strictly on AWS.

AWS setup

Warnings

NOTE that setting up Grapl will incur AWS charges! This can amount to hundreds of dollars a month, depending on the configuration. This setup script is designed for testing, and future versions may include breaking changes, increased charges, or may otherwise require manually working with CloudFormation. If you need a way to set up Grapl in a stable, forwards-compatible manner, please get in contact with us directly.

Preparation

Local AWS credentials

See full instructions here.

You should have a local file ~/.aws/credentials, with an entry resembling this format:

[my_profile]
aws_access_key_id=...
aws_secret_access_key=...
aws_session_token=...

You will need the profile to configure your account, if you haven’t already:

aws configure --profile "my_profile"

If your profile’s name is not “default”, then note it down, as you will need to include it as a parameter in later steps.

Installing Dependencies

You’ll need to have the following dependencies installed:

  • Pulumi: https://www.pulumi.com/docs/get-started/install/

  • AWS CLI:

    • your choice of the following:

      • pip install awscli

      • https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-docker.html

        • helpful alias: alias aws='docker run --rm -it -v ~/.aws:/root/.aws -v $(pwd):/aws -e AWS_PROFILE amazon/aws-cli'

Clone Grapl Git repository
git clone https://github.com/grapl-security/grapl.git
cd grapl/

The remaining steps assume your working directory is the Grapl repository.

Build deployment artifacts

Previously we supported uploading deployment artifacts (Docker images) directly from your dev machine, but Grapl currently requires that the Docker images be downloaded from Docker Hub or Cloudsmith. If you truly wish to upload an image to Cloudsmith, try bin/upload_image_to_cloudsmith.sh.

Spin up infrastructure with Pulumi

(This section is actively under development, and as of Dec 2021 requires infrastructure defined in the private repository https://github.com/grapl-security/platform-infrastructure )

See pulumi/README.md for instructions to spin up infrastructure in AWS with Pulumi. Once you have successfully deployed Grapl with Pulumi, return here and follow the instructions in the following section to provision Grapl and run the tests.

graplctl

We use the graplctl utility to manage Grapl in AWS.

Installation

To install graplctl run the following command in the Grapl checkout root:

make graplctl

This will build the graplctl binary and install it in the ./bin/ directory. You can familiarize yourself with graplctl by running

./bin/graplctl --help

Usage notes for setup

If your AWS profile is not named ‘default’, you will need to explicitly provide it as a parameter:

  • as a command line invocation parameter

  • as an environment variable, as shown below
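
For example, using the standard AWS_PROFILE environment variable (assuming graplctl honors the usual AWS SDK environment variables):

AWS_PROFILE=my_profile ./bin/graplctl swarm ls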

Usage with Pulumi

Several commands will need references to things like S3 buckets or AWS log groups. While you can pass these values directly, you can also pull them from a Pulumi stack’s outputs automatically.

To do this, you will need to export GRAPLCTL_PULUMI_STACK in your environment, and then use the ./bin/graplctl-pulumi.sh wrapper instead of invoking graplctl directly.

For further details, please read the documentation in that script.
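
A hypothetical invocation (the stack name here is a placeholder):

export GRAPLCTL_PULUMI_STACK="grapl/my-stack"
./bin/graplctl-pulumi.sh swarm ls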

Testing

Follow the instructions in this section to deploy analyzers, upload test data, and execute the end-to-end tests in AWS.

Deploy analyzers

TBD

Upload test data

TBD

Logging in to the Grapl UI with the test user

You may use the test user to log into Grapl and interact with the UI. The test username is the deployment name followed by -grapl-test-user. For example, if your deployment was named test-deployment, your username would be test-deployment-grapl-test-user.

To retrieve the password for your grapl deployment, navigate to “AWS Secrets Manager” and click on “Secrets”.

Click on the “Secret name” url that represents your deployment name followed by -TestUserPassword. The link will bring you to the “secret details” screen. Scroll down to the section labeled “Secret Value” and click the “Retrieve Secret Value” button. The password for your deployment will appear under “Plaintext”.
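
Alternatively, the same secret can be read with the AWS CLI; the deployment name and profile below are placeholders:

aws secretsmanager get-secret-value \
  --profile my_profile \
  --secret-id test-deployment-TestUserPassword \
  | jq -r ".SecretString"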

DGraph operations

You can manage the DGraph cluster with the docker swarm tooling by logging into one of the swarm managers with SSM. If you forget which instances are the swarm managers, you can find them by running graplctl swarm managers. For your convenience, graplctl also provides an exec command you can use to run a bash command remotely on a swarm manager. For example, to list all the nodes in the Dgraph swarm you can run something like the following:

bin/graplctl swarm exec --swarm-id my-swarm-id -- docker node ls

If you forget which swarm-id is associated with your Dgraph cluster, you may list all the swarm IDs in your deployment by running bin/graplctl swarm ls.

Plugins

Implementing a Graph Generator

Graph Generators are Grapl’s parser services; they take in raw events and produce a graph representation.

As an example, a generator for osquery’s process_events table would take in an event like this:

{
  "action": "added",
  "columns": {
    "uid": "0",
    "time": "1527895541",
    "pid": "30219",
    "path": "/usr/bin/curl",
    "auid": "1000",
    "cmdline": "curl google.com",
    "ctime": "1503452096",
    "cwd": "",
    "egid": "0",
    "euid": "0",
    "gid": "0",
    "parent": "30200"
  },
  "unixTime": 1527895550,
  "hostIdentifier": "vagrant",
  "name": "process_events",
  "numerics": false
}

And produce a graph that represents the entities and relationships in the event.

For example, we might have a graph that looks like this (minimally):


// A node representing the child process
ChildProcessNode {
    pid: event.columns.pid,  // The child process pid
    created_timestamp: event.columns.time  // The child process creation time
}

// A node representing the parent
ParentProcessNode {
    pid: event.columns.parent,  // The parent process pid
    seen_at_timestamp: event.columns.time  // The time that we saw the parent process
}

// An edge, relating the two processes
ChildrenEdge {
    from: ParentProcess,
    to: ChildProcess,
}

The goal of this document is to guide you through how to build that function.

Getting started

First off, Grapl’s graph generators are currently written in the Rust programming language. Rust has a number of benefits for parsers, such as its high performance while retaining memory safety.

Don’t be intimidated if you don’t know Rust! You don’t have to be an expert to write a generator.

Installing Requirements

You can install Rust by running this script:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Creating the Generator Project

cargo new our-graph-generator
cd ./our-graph-generator/

Modify the Cargo.toml to include our Grapl generator library:

[dependencies]
graph-generator-lib = "*"

This library will provide the primitives we need in order to parse our data into a graph.

Implementing the EventHandler

Grapl is going to handle all of the work to get data in and out of your function; all you need to do is add the entrypoint and implement an interface to do the parsing.

The interface is called the EventHandler.

Testing With Local Grapl

Implementing A Graph Model Plugin

Graph Model Plugins allow you to ‘Bring Your Own Model’ to Grapl. For example, if you wanted to implement a plugin for, say, AWS, which Grapl has no native support for, you would be adding an AWS Model to Grapl.

Models are split into a few components.

  1. Python Schema Definitions - used for provisioning the GraphDB, among other things

  2. Rust Schema Definitions - for graph generators to use

  3. Analyzer Query and Views - used for detection and response

You only need to implement 1 and 2; the code for 3 will be generated for you.

Rust Schema Definitions

In order to generate your graphs and implement a Graph Generator, you’ll want to build a schema definition in Rust, the language we currently support for graph generation. As a reminder, graph generators are the services that turn raw data, like event logs, into a graph format that Grapl can understand.

You’ll need a relatively recent installation of Rust: https://rustup.rs/

You can create a new rust library to define your schemas by running something like:

cargo new grapl-aws-models

We can then add the necessary dependencies for Grapl:

cargo add grapl-graph-descriptions
cargo add derive-dynamic-node

Then, in your favorite IDE, navigate to the src/lib.rs file, where we’ll put our first model - the Ec2Instance.

src/lib.rs

use derive_dynamic_node::{DynamicNode as DeriveDynamicNode, GraplStaticId};
use rust_proto::graph_descriptions::*;

#[derive(Clone, DeriveDynamicNode, GraplStaticId)]
struct Ec2Instance {
  #[static_id]
  arn: String,
  image_id: String,
  image_description: String,
  instance_id: String,
  launch_time: u64,
  instance_state: String,
  instance_type: String,
  availability_zone: String,
  platform: String,
}

impl IEc2InstanceNode for Ec2InstanceNode {
    fn get_mut_dynamic_node(&mut self) -> &mut DynamicNode {
        &mut self.dynamic_node
    }
}

  • Currently Grapl’s nodes must have only String, u64, or i64 properties.

The Ec2Instance struct is tagged with two important macros - DeriveDynamicNode and GraplStaticId.

The DeriveDynamicNode macro generates some code for us; in this case it generates an Ec2InstanceNode structure, which is what we’ll store data in.

The GraplStaticId macro allows us to define a property, or properties, that can be used to identify the underlying entity. In AWS this is very straightforward - identity is provided by an Arn. Every node in Grapl must have an identity.

When parsing, we can add data to this node type like this:

let mut ec2_instance = Ec2InstanceNode::new(
  Ec2InstanceNode::static_strategy()
);

ec2_instance.with_launch_time(launch_time);
ec2_instance.with_instance_id(&instance_id);

The Ec2InstanceNode struct was generated by those macros, as was the method static_strategy, and the methods for adding data.

Python Schema Definition

The Python schema definitions will serve two functions:

  1. They will help us provision Grapl’s graph databases to understand our new model

  2. They generate more Python code, which we’ll use in our Analyzers to detect and respond to threats using our new models

Our Python Schema for the Ec2InstanceNode will be relatively straightforward to implement.

# TODO replace with instructions for next-gen grapl_analyzerlib
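
For orientation, here is a legacy-style sketch; the NodeSchema base class, its with_str_prop/with_int_prop helpers, and the import path are assumptions carried over from older grapl_analyzerlib versions:

from grapl_analyzerlib.schemas.schema_builder import NodeSchema  # legacy path, assumed


class Ec2InstanceNodeSchema(NodeSchema):
    def __init__(self) -> None:
        super().__init__()
        # Mirror the properties declared on the Rust Ec2Instance struct
        (
            self.with_str_prop("arn")
            .with_str_prop("image_id")
            .with_str_prop("image_description")
            .with_str_prop("instance_id")
            .with_int_prop("launch_time")
            .with_str_prop("instance_state")
            .with_str_prop("instance_type")
            .with_str_prop("availability_zone")
            .with_str_prop("platform")
        )

    @staticmethod
    def self_type() -> str:
        # Must match the name of the Rust struct: Ec2Instance
        return "Ec2Instance"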

Make sure that the return value of the self_type method is the same name as the struct in your Rust model, in this case Ec2Instance.

Using this Ec2InstanceNodeSchema we can generate the rest of the code that we need for building signatures or responding to attacks.

# TODO replace with instructions for next-gen grapl_analyzerlib

This will generate and print out the code for querying or pivoting off of Ec2Instance nodes in Grapl.

Specifically it will generate the Ec2InstanceQuery and Ec2InstanceView classes.

You can just copy/paste this code into a file and load it up to use. There may be minor changes required, such as imports, but otherwise it should generally ‘just work’.
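
For example, querying with the generated class might look like this; with_instance_id is a hypothetical generated method, following the with_* naming convention for the instance_id property:

gclient = GraphClient()

suspicious_instances = (
    Ec2InstanceQuery()
    .with_instance_id(eq="i-0123456789abcdef0")
    .query(gclient)
)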

Modifying the Graph Schema

TODO replace with instructions about graph-schema-manager

Deploying Analyzers With Plugins

TBD

Debugging Tooling

Grapl has several tools to aid in debugging issues.

  • Grapl Debugger Container

  • Distributed Tracing

Grapl Debugger Container

This is a container that can attach to running containers and includes debug tools inside. See the debugger docs for details.

Distributed Tracing

We currently have tracing enabled for local Grapl. Our current backend is Jaeger.

Usage

  1. Run make up

  2. In your browser go to http://localhost:16686. You should see the Jaeger front-end.

  3. On the left side are search options.

  4. Select the service you’re interested in and click Search. You can also use any of the additional filters on the left (such as HTTP code). If your service does not appear, it’s possible that a) it doesn’t have any traffic (i.e. the web UI needs a web request), or b) there are no traces within the Lookback window.

  5. In the center a list of traces will appear.

  6. Click on one. You will go to a page with detailed trace information, including performance data for each span.

  7. Click on a span, and then click on its tags to get more detailed information, such as the HTTP code.

  8. On the top-right, there is a drop-down menu with Trace Timeline selected. Clicking on it will provide a few additional options.

Tracing pulumi

To have Pulumi send traces to Jaeger, run WITH_PULUMI_TRACING=1 make $MAKECOMMAND, where $MAKECOMMAND can be up or test-e2e.

Tracing docker buildx bake

docker buildx supports sending traces to a backend. Since we build prior to running Jaeger, you will need to set Jaeger up explicitly, either by running make up first or by running it manually, in Docker or as a standalone Nomad job.

This tracing is meant to help debug docker build issues including performance issues.

Warning!

  1. This generates a LOT of traces, enough to potentially crash Jaeger.

  2. This slows down the build process significantly :(

To run tracing for the docker build:

  1. Do a one-time setup make setup-docker-tracing if you haven’t already.

  2. Run WITH_TRACING=1 make build-local-infrastructure or any other build command that uses bake such as build-test-e2e. Alternatively, you can run traces for individual services via docker buildx bake --file $FILE --builder=grapl-tracing-builder

Error Codes

This lists the internal error codes that are defined and currently in use at Grapl.

Error Code   Usage
----------   -----
42           buildx push error
43           docker context too big
44           devbox is local
45           devbox cannot be reached
46           Bad user input
47           File doesn’t exist
48           Missing key
49           Missing venv directory
50           Process exited unexpectedly
51           Nomad job failed

Local Grapl credentials

(This article should expire by about Oct or Nov 2021, when we’ll have more robust user management.)

Your username is:

local-grapl-grapl-test-user

You can retrieve your password with:

awslocal secretsmanager get-secret-value --secret-id local-grapl-TestUserPassword | jq -r ".SecretString"

To auth against grapl-web-ui:

PASSWORD=$(awslocal secretsmanager get-secret-value --secret-id local-grapl-TestUserPassword | jq -r ".SecretString")
curl -i --location --request POST "http://localhost:1234/auth/login" \
--header 'content-type: application/json' \
--data @- << REQUEST_BODY
{
    "username": "local-grapl-grapl-test-user",
    "password": "${PASSWORD}"
}
REQUEST_BODY

Observability

We currently use Lightstep as our observability platform.

Local tracing

  1. Go to lightstep.com

  2. Log into Lightstep using Google.

  3. On the left-hand side menu go to developer mode (the angle brackets < >).

  4. Copy the command and run that locally. This will spin up a docker container configured with an api key. Any data submitted will be forwarded to Lightstep.

  5. Run make up. Once everything is up, check the Lightstep developer mode page. You should start seeing traces appear on the page.

Services

Network Diagram (Outdated)


Grapl Services - main pipeline

Pipeline Ingress

Input: Receives an RPC to insert event logs (e.g. sysmon logs, osquery logs, Cloudtrail logs). (Currently the plan is to allow one event/log per call, but we may revisit this in the future.)

Work: We wrap those logs in an Envelope and throw it in Kafka. No transforms are performed.

Output: Push logs to the ‘raw-logs’ topic for the next service, Generator Dispatcher.

Generator Dispatcher

Input: Pulls event logs from topic ‘raw-logs’

Work: This service will:

  • figure out which generators would respond to the incoming event-source

  • call into Plugin Work Queue to enqueue that work in a durable Postgres store.

Output: Push this work to the Plugin Work Queue.

Plugin Work Queue (for generators)

Input: Receives an RPC push_execute_generator to store generator work

Work: PWQ is a simple facade over a postgres database that lets us store and manage work for a Generator plugin.

Output: Work is pulled from PWQ by Plugin Execution Sidecar.

Plugin Execution Sidecar (for generators)

Input: Pulls work from Plugin Work Queue over gRPC

Work:

  • Loop, querying for new work from PWQ

  • When there is new work, delegate it to the Generator binary that runs alongside this sidecar over gRPC

  • When the Generator’s work is completed - successfully or not - call acknowledge_generator in PWQ. This will send the generator’s output onto a Kafka queue.

Output: Put generated graphs on the ‘generated-graphs’ topic for the Node Identifier.

Plugin (generator)

Input: Receives an RPC containing an event log (i.e. a sysmon event)

Work: Turns these events into a standalone subgraph, independent of existing Dgraph/Scylla state.

Output: The subgraph is returned as a response to the incoming RPC, going to the Plugin Execution Sidecar.

Node Identifier

Input: Pulls generated graphs from topic ‘generated-graphs’

Work: Identifies nodes in the incoming subgraph against the canonical identities of nodes in DynamoDB. The incoming nodes may be new, or they may represent something already known in the master graph.

For instance, an incoming subgraph may refer to a process {PID:1234, name: "coolthing.exe", started at 1:00AM}; it’s possible that Dgraph already has a node representing this exact same process. We’d like to de-duplicate this process node.

Output: Push identified subgraphs to the ‘identified-subgraph’ topic for the next service, Graph Merger.

Graph Merger

Input: Pulls identified graphs from topic ‘identified-graphs’

Work: Write the new edges and nodes from Node Identifier to Dgraph.

Output: Push merged graphs to the ‘merged-graphs’ topic for the next service.

TODO: Analyzer-dispatcher and analyzer subsystem

Managerial RPC services

Services that expose RPC APIs for managing Grapl itself, rather than processing pipeline data.

Organization Management

TODO

Plugin Registry

This service manages Generator and Analyzer plugins, letting one create, deploy and teardown those plugins via RPC calls.

Other services

Model Plugin Deployer

TODO

Event Source

Create, get and update an Event Source, which is an ID that lets us tie incoming Generator work to which Plugin we think should process it.

Other services

Engagement View (aka UX)

Provides the main customer interaction with Grapl. This is not actually a standalone service, but is hosted as static assets inside Grapl Web UI.

Graphql Endpoint

A GraphQL interface into our Dgraph database.

Grapl Web UI

Provides authn/authz functions, and acts as a router to other services:

  • Graphql Endpoint (/graphqlEndpoint)

  • Model Plugin Deployer (currently undergoing a rewrite)

Also, hosts static assets like Engagement View.

Visual Studio Code Debugger

Python debugger

Setup VSCode

Add the following as a launch.json debug configuration in VSCode. You’ll want a different configuration for each service you want to debug; in this case, we’re debugging the grapl_e2e_tests. Each service’s configuration should likely have a different path-mapping and a different port.

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "E2E tests debug",
            "type": "python",
            "request": "attach",
            "connect": {
                "host": "127.0.0.1",
                "port": 8400
            },
            // Also debug library code, like grapl-tests-common
            "justMyCode": false,
            "pathMappings": [
                {
                    "localRoot": "${workspaceFolder}/src/python/grapl_e2e_tests",
                    "remoteRoot": "/home/grapl/grapl_e2e_tests"
                },
                {
                    "localRoot": "${workspaceFolder}/src/python/grapl-tests-common/grapl_tests_common",
                    "remoteRoot": "../venv/lib/python3.10/site-packages/grapl_tests_common"
                }
            ]
        }
    ]
}

Run the tests

Run the tests with the following. (Again, this example is strictly about the E2E Tests.)

DEBUG_SERVICES=grapl_e2e_tests make test-e2e

You’ll see the test start up and output the following:

>> Debugpy listening for client at <something>

At this point, start the debug configuration in your VSCode. The Python code should start moving along. If you need to debug more than one service, DEBUG_SERVICES can be fed a comma-separated list of service names.

Enable python debugging for another service

You can see all of these steps in action in this pull request.

  • Add the service name, and an unused port in the 84xx range, to SERVICE_TO_PORT in vsc_debugger.py.

  • Forward that port, for that service, in docker-compose.yml

  • Call wait_for_vsc_debugger("name_of_service_goes_here") in the main entrypoint for the service. (Don’t worry, it won’t trigger the debugger unless you declare DEBUG_SERVICES.)

  • Add a new launch.json debug configuration for that port.

  • Finally - run your make test-e2e command with DEBUG_SERVICES=name_of_service_goes_here.

Rust debugger

We haven’t invested in this yet!

Queries and Views

Queries and Views are the main constructs to work with the graph.

Queries allow you to pull data from the graph that matches a structure.

Views represent an existing graph, which you can expand by pivoting off of its edges.

Let’s query for some processes with the name “svchost”.

# TODO replace with instructions for next-gen grapl_analyzerlib

Now we can pivot around that data. Let’s look at the parent processes of these svchosts:

# TODO replace with instructions for next-gen grapl_analyzerlib
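
Until then, here is a legacy-style sketch of the query-then-pivot flow; the view accessors get_parent and get_process_name are assumptions based on older grapl_analyzerlib versions:

gclient = GraphClient()

# Query: every process whose name contains "svchost"
svchosts = ProcessQuery().with_process_name(contains="svchost").query(gclient)

# Pivot: expand each matched view along its parent edge
for svchost in svchosts:
    parent = svchost.get_parent()  # assumed accessor on ProcessView
    if parent is not None:
        print(parent.get_process_name())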

Installation

Install grapl_analyzerlib by running:

pip install --user grapl_analyzerlib

License

The project is licensed under the Apache 2.0 license.