pyanodot.collectors.sqlv2 - SQL Collector for multiple RDBMS backends

  • This Collector enables the user to send results from a SQL query to Anodot.
  • The query is given as a jinja2 templated SQL.
  • This collector uses Python’s SQLAlchemy package and may be used together with any SQLAlchemy-supported engine ( including MySQL, Postgresql, SQLite… see the full list of supported backends ).
  • Amazon Redshift is a Database-as-a-service by Amazon Web Services which allows connections from PostgreSQL clients, and thus is supported.
  • MemSQL is an in-memory database which supports MySQL protocol.

Account-level properties

  • connection - a DB URL String for connecting to the queried database. The string must follow the format described in the SQLAlchemy documentation.

Examples:

  • MySQL on localhost, user scott, password tiger, database mydatabase
  • Postgresql on host example.com port 5432

Query-level properties

  • temporal_field - name of the column ( returned by the query ) to use as a timestamp.
  • query ( { file: "filename" }) - contains the query template to use.
  • If query is a string, then the collector treats this as the query to runThis is a Jinja2 template and may use
  • extra_params ( object ) - contains parameters to be passed as template variables into the Jinja2 template processor.

Example:

{
   "query": [
       "select date_trunc('hour',timestamp) as timestamp, action_type, user_type as user_type,",
       "       count(1) as action_count, ",
       "       sum(amount) as action_volume ",
       "from account_activity where timestamp >= '{{start_time}}'",
       "and timestamp < '{{end_time}}'",
       "group by 1,2,3"
       "order by 1"
   ]
}

Note

the query gives new column names to computed values - this is necessary to enable the collector to read the query results.

See also

dimension_defs and metric_defs below

  • dimension_defs(array) - which columns to treat as dimension.

Example ( for the query above ):

{
    "dimension_defs": [ { "dimension": "action_type" }, { "dimension": "user_type" } ],
}
  • metric_defs (array) - specifies which columns to treat as metrics.
{
    "metric\_defs": [ { "metric": "action\_count",
                        "target\_type": "counter" } ]
}

Command-line

$ bin/anodot-collect.py pyanodot.collectors.sqlv2 -h

usage: anodot-collect.py pyanodot.collectors.sqlv2 [-h] -w WORK_DIR
                                                   [-E ENDPOINT]
                                                   [-T ANODOT_API_TOKEN]
                                                   [-V VER] [-a] [-J]
                                                   [--json-keep-last JSON_KEEP_LAST]
                                                   [-d] [-p] [-D API_DELAY]
                                                   [-C API_CHUNK]
                                                   [-L LOG_FILENAME]
                                                   [-Q QUERY] [-A ACCOUNT]
                                                   [-r] [-t]
                                                   [--timestamp-shift TIMESTAMP_SHIFT]
                                                   [--zero-fill-start-time ZERO_FILL_START_TIME]
                                                   [--zero-fill-end-time ZERO_FILL_END_TIME]
                                                   [--zero-fill-step ZERO_FILL_STEP]
                                                   [-Z] [-X DEFINE]
                                                   [--db-connection DB_CONNECTION]
                                                   [--rowcount-hard-limit ROWCOUNT_HARD_LIMIT]
                                                   [collector_name]

pyanodot.collectors.sqlv2 - Collector for SQL Relational Databases

positional arguments:
  collector_name

optional arguments:
  -h, --help            show this help message and exit
  -w WORK_DIR, --work-dir WORK_DIR
                        working directory ( logs saved there )
  -E ENDPOINT, --endpoint ENDPOINT
                        Anodot API endpoint: `poc` or `production`
  -T ANODOT_API_TOKEN, --anodot-api-token ANODOT_API_TOKEN
                        Token for Anodot API
  -V VER, --ver VER     data version number to send ( unless specified here or
                        in the config, ver=1 )
  -a, --save-anodot-csv
                        write a CSV file with the Anodot format
  -J, --save-anodot-json
                        write a JSON file with the same format being sent to
                        the Anodot API
  --json-keep-last JSON_KEEP_LAST
                        Remove the previously saved JSON files, except the
                        latest JSON_KEEP_LAST files
  -d, --debug           Print verbose debug output
  -p, --production
  -D API_DELAY, --api-delay API_DELAY
                        Anodot API: delay between requests
  -C API_CHUNK, --api-chunk API_CHUNK
                        Anodot API: max. number of metrics to send per request
  -L LOG_FILENAME, --log-filename LOG_FILENAME
                        output log filename, (relative is wrt work_dir)
  -Q QUERY, --query QUERY
                        include only queries matching (fully) the regex in
                        QUERY ( may be specified multiple times )
  -A ACCOUNT, --account ACCOUNT
                        include only accounts matching (fully) the regex in
                        ACCOUNT ( may be specified multiple times)
  -r, --save-raw-csv    write a CSV file corresponding to raw data
  -t, --save-trans-csv  write a CSV file corresponding to data with
                        transformations
  --timestamp-shift TIMESTAMP_SHIFT
                        Add TIMESTAMP_SHIFT seconds to every reported
                        timestamp
  --zero-fill-start-time ZERO_FILL_START_TIME
                        Zero filling start time
  --zero-fill-end-time ZERO_FILL_END_TIME
                        Zero filling end time
  --zero-fill-step ZERO_FILL_STEP
                        Zero filling step size in seconds
  -Z, --zero-fill-missing
                        For each query read the previous JSON and fill the
                        missing samples with zeros
  -X DEFINE, --define DEFINE
                        add key=value as a definition for the query
  --db-connection DB_CONNECTION
                        DB URL for the database to be used
  --rowcount-hard-limit ROWCOUNT_HARD_LIMIT
                        raise and exception in case the returned rows are more
                        than ROWCOUNT_HARD_LIMIT

Configuration examples

The following example shows how to use a configuration file with an external query file.

The table schema is given by the following CREATE DDL:

create table account_activity(  timestamp timestamp,
    action_type text,
    user_type text,
    amount bigint );

Example anodot-collect-conf.json: (Download)

{"anodot_api_token": "PASS the -T flag in the CLI", "collectors": { "sqlv2" : {
    "accounts": { 
        "bank_db": {
        "connection": "postgresql+psycopg2://postgres@172.17.0.2/test",
        "queries": {
            "hourly_summary":{   
                "ver": "101",
                "query_type": "time_range",
                "temporal_field": "hour",
                "extra_params": {
                    "interesting_stuff": [ "list", "elements", "here" ]
                },
                "dimension_defs": [ { "dimension": "action_type" }, { "dimension": "user_type" } ],
                "metric_defs": [ {  "metric": "action_count", 
                                    "target_type": "counter" },
                                 {  "metric": "action_volume",
                                    "target_type": "counter" } ],
                "query": { "file": "query_template.sql" }
            }
        }
    }
    }
}}}

External query file query_template.sql: (Download)

select date_trunc('hour',timestamp) as hour, action_type, user_type as user_type,
count(1) as action_count,
sum(amount) as action_volume
from account_activity where timestamp >= '{{start_time}}' -- You can insert parameters from the CLI with the "-X key=value"
and timestamp < '{{end_time}}'  
group by hour, action_type, user_type
order by hour                           -- Sort the output on timestamp