pyanodot.collectors.sqlv2 - SQL Collector for multiple RDBMS backends¶
- This Collector enables the user to send results from a SQL query to Anodot.
- The query is given as a jinja2 templated SQL.
- This collector uses Python’s SQLAlchemy package and may be used together with any SQLAlchemy-supported engine ( including MySQL, Postgresql, SQLite… see the full list of supported backends ).
- Amazon Redshift is a Database-as-a-service by Amazon Web Services which allows connections from PostgreSQL clients, and thus is supported.
- MemSQL is an in-memory database which supports MySQL protocol.
Account-level properties¶
connection- a DB URL String for connecting to the queried database. The string must follow the format described in the SQLAlchemy documentation.
Examples:
- MySQL on localhost, user scott, password tiger, database mydatabase
- Postgresql on host example.com port 5432
Query-level properties¶
temporal_field- name of the column ( returned by the query ) to use as a timestamp.query ( { file: "filename" })- contains the query template to use.
- If
queryis a string, then the collector treats this as the query to runThis is a Jinja2 template and may use
extra_params ( object )- contains parameters to be passed as template variables into the Jinja2 template processor.
Example:
{
"query": [
"select date_trunc('hour',timestamp) as timestamp, action_type, user_type as user_type,",
" count(1) as action_count, ",
" sum(amount) as action_volume ",
"from account_activity where timestamp >= '{{start_time}}'",
"and timestamp < '{{end_time}}'",
"group by 1,2,3"
"order by 1"
]
}
Note
the query gives new column names to computed values - this is necessary to enable the collector to read the query results.
See also
dimension_defs and metric_defs below
dimension_defs(array)- which columns to treat as dimension.
Example ( for the query above ):
{
"dimension_defs": [ { "dimension": "action_type" }, { "dimension": "user_type" } ],
}
metric_defs (array)- specifies which columns to treat as metrics.
{
"metric\_defs": [ { "metric": "action\_count",
"target\_type": "counter" } ]
}
Command-line¶
$ bin/anodot-collect.py pyanodot.collectors.sqlv2 -h
usage: anodot-collect.py pyanodot.collectors.sqlv2 [-h] -w WORK_DIR
[-E ENDPOINT]
[-T ANODOT_API_TOKEN]
[-V VER] [-a] [-J]
[--json-keep-last JSON_KEEP_LAST]
[-d] [-p] [-D API_DELAY]
[-C API_CHUNK]
[-L LOG_FILENAME]
[-Q QUERY] [-A ACCOUNT]
[-r] [-t]
[--timestamp-shift TIMESTAMP_SHIFT]
[--zero-fill-start-time ZERO_FILL_START_TIME]
[--zero-fill-end-time ZERO_FILL_END_TIME]
[--zero-fill-step ZERO_FILL_STEP]
[-Z] [-X DEFINE]
[--db-connection DB_CONNECTION]
[--rowcount-hard-limit ROWCOUNT_HARD_LIMIT]
[collector_name]
pyanodot.collectors.sqlv2 - Collector for SQL Relational Databases
positional arguments:
collector_name
optional arguments:
-h, --help show this help message and exit
-w WORK_DIR, --work-dir WORK_DIR
working directory ( logs saved there )
-E ENDPOINT, --endpoint ENDPOINT
Anodot API endpoint: `poc` or `production`
-T ANODOT_API_TOKEN, --anodot-api-token ANODOT_API_TOKEN
Token for Anodot API
-V VER, --ver VER data version number to send ( unless specified here or
in the config, ver=1 )
-a, --save-anodot-csv
write a CSV file with the Anodot format
-J, --save-anodot-json
write a JSON file with the same format being sent to
the Anodot API
--json-keep-last JSON_KEEP_LAST
Remove the previously saved JSON files, except the
latest JSON_KEEP_LAST files
-d, --debug Print verbose debug output
-p, --production
-D API_DELAY, --api-delay API_DELAY
Anodot API: delay between requests
-C API_CHUNK, --api-chunk API_CHUNK
Anodot API: max. number of metrics to send per request
-L LOG_FILENAME, --log-filename LOG_FILENAME
output log filename, (relative is wrt work_dir)
-Q QUERY, --query QUERY
include only queries matching (fully) the regex in
QUERY ( may be specified multiple times )
-A ACCOUNT, --account ACCOUNT
include only accounts matching (fully) the regex in
ACCOUNT ( may be specified multiple times)
-r, --save-raw-csv write a CSV file corresponding to raw data
-t, --save-trans-csv write a CSV file corresponding to data with
transformations
--timestamp-shift TIMESTAMP_SHIFT
Add TIMESTAMP_SHIFT seconds to every reported
timestamp
--zero-fill-start-time ZERO_FILL_START_TIME
Zero filling start time
--zero-fill-end-time ZERO_FILL_END_TIME
Zero filling end time
--zero-fill-step ZERO_FILL_STEP
Zero filling step size in seconds
-Z, --zero-fill-missing
For each query read the previous JSON and fill the
missing samples with zeros
-X DEFINE, --define DEFINE
add key=value as a definition for the query
--db-connection DB_CONNECTION
DB URL for the database to be used
--rowcount-hard-limit ROWCOUNT_HARD_LIMIT
raise and exception in case the returned rows are more
than ROWCOUNT_HARD_LIMIT
Configuration examples¶
The following example shows how to use a configuration file with an external query file.
The table schema is given by the following CREATE DDL:
create table account_activity( timestamp timestamp,
action_type text,
user_type text,
amount bigint );
Example anodot-collect-conf.json: (Download)
{"anodot_api_token": "PASS the -T flag in the CLI", "collectors": { "sqlv2" : {
"accounts": {
"bank_db": {
"connection": "postgresql+psycopg2://postgres@172.17.0.2/test",
"queries": {
"hourly_summary":{
"ver": "101",
"query_type": "time_range",
"temporal_field": "hour",
"extra_params": {
"interesting_stuff": [ "list", "elements", "here" ]
},
"dimension_defs": [ { "dimension": "action_type" }, { "dimension": "user_type" } ],
"metric_defs": [ { "metric": "action_count",
"target_type": "counter" },
{ "metric": "action_volume",
"target_type": "counter" } ],
"query": { "file": "query_template.sql" }
}
}
}
}
}}}
External query file query_template.sql: (Download)
select date_trunc('hour',timestamp) as hour, action_type, user_type as user_type,
count(1) as action_count,
sum(amount) as action_volume
from account_activity where timestamp >= '{{start_time}}' -- You can insert parameters from the CLI with the "-X key=value"
and timestamp < '{{end_time}}'
group by hour, action_type, user_type
order by hour -- Sort the output on timestamp