pyanodot.collectors.cloudwatch - Amazon CloudWatch Collector

Overview

The pyanodot.collectors.cloudwatch collector uses two AWS CloudWatch API calls:

  • ListMetrics - returns the list of available metrics (the names of the time series).
  • GetMetricStatistics - returns the data of a given time series over a given time range.

The collector is a thin wrapper around these API calls and works as follows:

Phase 1.
The collector calls the ListMetrics API and caches the list of metrics in a JSON file inside the work directory. This phase is slow, so it runs only when there is no cached metric list or when forced with the -f/--force-refresh flag.
Phase 2.
The collector reads the list of metrics to collect from the cached file and calls the GetMetricStatistics API for each one. The resulting data is sent to Anodot and/or saved locally.
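The two phases can be sketched as follows. This is an illustration under assumptions, not the collector's own code: the cache file name and the functions load_metric_list and collect are hypothetical, and the AWS calls are injected as plain callables so that the caching logic can be read (and tested) without AWS access.

```python
import json
import os
from typing import Callable, Dict, List

# Hypothetical cache file name; the collector's actual file name is not documented.
CACHE_FILE = "cloudwatch_metrics.json"


def load_metric_list(work_dir: str,
                     list_metrics: Callable[[], List[Dict]],
                     force_refresh: bool = False) -> List[Dict]:
    """Phase 1: return the cached metric list, calling list_metrics() only if needed."""
    path = os.path.join(work_dir, CACHE_FILE)
    if force_refresh or not os.path.exists(path):
        metrics = list_metrics()  # slow step: stands in for the ListMetrics API
        with open(path, "w") as f:
            json.dump(metrics, f)
        return metrics
    with open(path) as f:
        return json.load(f)


def collect(work_dir: str,
            list_metrics: Callable[[], List[Dict]],
            get_statistics: Callable[[Dict], object],
            force_refresh: bool = False) -> List[object]:
    """Phase 2: call get_statistics() once for every metric in the cached list."""
    metrics = load_metric_list(work_dir, list_metrics, force_refresh)
    return [get_statistics(metric) for metric in metrics]
```

In a real setup, list_metrics could wrap boto3's CloudWatch paginator (client.get_paginator("list_metrics")) and get_statistics could wrap client.get_metric_statistics; keeping them as parameters is what lets force_refresh be the only thing that triggers the slow phase.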

Basic Setup

  1. Create an AWS user, or choose an existing one, that has an access key pair.

  2. Grant the user read-only permissions as described here. The collector only needs the CloudWatch ListMetrics and GetMetricStatistics calls.

  3. On the machine that runs the collector, log in as the user that will run the collector. Create the directory ~/.aws and the file ~/.aws/credentials (where ~ is that user's home directory), and add an entry for the key pair to the file.

    For example, the following section (in ~/.aws/credentials) declares a profile named anodot-collector with the key-pair details:

[anodot-collector]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = 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

Refer to the Boto3 documentation for a complete explanation of credentials and profiles.
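Before running the collector, it can help to check that the profile is readable. The following is a minimal sketch using only Python's standard configparser module, which reads the same INI layout as the example above; the function name has_profile is hypothetical. Boto3 itself raises botocore.exceptions.ProfileNotFound when boto3.Session(profile_name=...) is given a profile it cannot find.

```python
import configparser
import os


def has_profile(profile_name: str, path: str = "~/.aws/credentials") -> bool:
    """Return True if the credentials file defines both keys for profile_name."""
    parser = configparser.ConfigParser()
    parser.read(os.path.expanduser(path))  # a missing file is silently skipped
    if not parser.has_section(profile_name):
        return False
    section = parser[profile_name]
    # Both halves of the key pair must be present for the profile to be usable.
    return "aws_access_key_id" in section and "aws_secret_access_key" in section
```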

Configuration

Account-level properties

  • profile_name (string, required) - the name of the credentials profile to connect with, as described above.
  • region_name (string, required) - the name of the AWS region to query.
Since AWS CloudWatch API queries work per region, a separate account section is required for each region to be collected.

Query-level properties

In AWS CloudWatch, a metric is identified by the combination of three properties: Namespace, MetricName and Dimensions.

  • ListMetrics (ListMetrics, required) - specifies which metrics to fetch during Phase 1.

    The valid properties of a ListMetrics object are Namespace, MetricName and Dimensions. They are passed as-is to the ListMetrics request and have the same meaning as in the AWS ListMetrics API call.

  • MetricNameFilter - Filters the metrics based on a regular expression. e.g.

  • FlexibleDimensionFilter - In some use cases the full list of metrics is not needed; instead, only certain metrics should be collected at the finest level of granularity.
    For example, one may want the CPUUtilization metric aggregated over the whole region, and also at the per-instance level for one specific instance. To achieve this, use the following definition:
ListMetrics:
  Namespace: AWS/EC2
  MetricName: CPUUtilization
FlexibleDimensionFilter:
  InstanceId:
    - i-1234567

In general, FlexibleDimensionFilter allows a metric to be used in a call to GetMetricStatistics if, for every dimension name it specifies (in the example, “InstanceId”), the metric’s Dimensions property either:

  1. does not contain a dimension with this name, or
  2. contains a dimension with this name whose value appears in the list (in the example, ["i-1234567"]).
  • GetMetricStatistics (GetMetricStatistics, required) - this object may contain the properties:
      Period - the granularity of the time series, in seconds (default: 3600).
      Statistics - which statistics to retrieve for each period (default: [Average]).
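The filtering rules above can be expressed as two small predicates. This is a sketch of the documented FlexibleDimensionFilter semantics, not the collector's implementation; the function names are hypothetical, and the MetricNameFilter predicate assumes Python re.search matching, since the exact matching semantics of MetricNameFilter are not documented here. Metrics are represented as the dictionaries returned by ListMetrics (Namespace, MetricName, and a Dimensions list of Name/Value pairs).

```python
import re
from typing import Dict, List, Optional


def passes_flexible_dimension_filter(metric: Dict, flex_filter: Dict[str, List[str]]) -> bool:
    """For every filtered dimension name, the metric must either lack that
    dimension (rule 1) or carry it with one of the listed values (rule 2)."""
    dims = {d["Name"]: d["Value"] for d in metric.get("Dimensions", [])}
    return all(
        name not in dims or dims[name] in allowed
        for name, allowed in flex_filter.items()
    )


def passes_metric_name_filter(metric: Dict, pattern: Optional[str]) -> bool:
    """Assumption: the pattern is applied with re.search to MetricName."""
    return pattern is None or re.search(pattern, metric["MetricName"]) is not None
```

Under rule 1, a metric that carries only some other dimension (for example AutoScalingGroupName) also passes, and an empty filter ({}, as in the sample configuration) allows every metric.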

Sample Configuration

anodot_api_endpoint: production
anodot_api_token: XXXXXXXXXXXXXXX
collectors:
  pyanodot.collectors.cloudwatch:
    accounts:
      AWSAnodot:
        profile_name: "profile name from ~/.aws/credentials"
        queries:
          instance_example:   # this key is arbitrary
            FlexibleDimensionFilter: {}
            GetMetricStatistics: { Period: 3600 }
            ListMetrics:
              Dimensions:
              - {Name: InstanceId, Value: i-008ff9d2739f78622}
              MetricName: CPUUtilization
              Namespace: AWS/EC2
            extra_params: {test: true}
            ver: 19
        region_name: us-east-1
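To make the mapping concrete, here is a sketch of how a query section such as instance_example above could be turned into keyword arguments for boto3's CloudWatch client methods list_metrics and get_metric_statistics. The helper names are hypothetical; the defaults are the ones documented for the GetMetricStatistics object, and keys such as extra_params and ver are simply ignored here.

```python
from datetime import datetime
from typing import Any, Dict

# Defaults documented for the GetMetricStatistics query object.
DEFAULT_PERIOD = 3600
DEFAULT_STATISTICS = ["Average"]


def list_metrics_kwargs(query: Dict[str, Any]) -> Dict[str, Any]:
    """Pass Namespace, MetricName and Dimensions through unchanged, as documented."""
    allowed = ("Namespace", "MetricName", "Dimensions")
    return {k: v for k, v in query["ListMetrics"].items() if k in allowed}


def get_metric_statistics_kwargs(query: Dict[str, Any], metric: Dict[str, Any],
                                 start: datetime, end: datetime) -> Dict[str, Any]:
    """Build one GetMetricStatistics request for a metric returned by ListMetrics."""
    options = query.get("GetMetricStatistics") or {}
    return {
        "Namespace": metric["Namespace"],
        "MetricName": metric["MetricName"],
        "Dimensions": metric.get("Dimensions", []),
        "StartTime": start,
        "EndTime": end,
        "Period": options.get("Period", DEFAULT_PERIOD),
        "Statistics": options.get("Statistics", DEFAULT_STATISTICS),
    }
```

The resulting dictionaries can be passed as client.list_metrics(**kwargs) and client.get_metric_statistics(**kwargs). AWS returns at most 1,440 data points from a single GetMetricStatistics call, so with Period: 3600 a one-day window (24 data points) is well within that limit.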

Command-line

$ anodot-collect.py pyanodot.collectors.cloudwatch -h

usage: anodot-collect.py pyanodot.collectors.cloudwatch [-h] [-s START_TIME] [-e END_TIME] [-f]
                                    [-w WORK_DIR] [-E ENDPOINT] [-V VER] [-a]
                                    [-J] [-d] [-p] [-D API_DELAY]
                                    [-C API_CHUNK]
                                    [--producer-concurrency PRODUCER_CONCURRENCY]
                                    [--anodot-api-concurrency ANODOT_API_CONCURRENCY]

optional arguments:
  -h, --help            show this help message and exit
  -s START_TIME, --start-time START_TIME
                        (default: yesterday 00:00)
  -e END_TIME, --end-time END_TIME
                        End time for the query. Format is "YYYY-MM-DD
                        hh:mm:ss" (default: today 00:00)
  -f, --force-refresh   force refreshing the cached metric lists
  -w WORK_DIR, --work-dir WORK_DIR
                        working directory ( logs saved there )
  -E ENDPOINT, --endpoint ENDPOINT
                        Anodot API endpoint: `poc` or `production`
  -V VER, --ver VER     data version number to send ( unless specified here or
                        in the config, ver=1 )
  -a, --save-anodot-csv
                        write a CSV file with the Anodot format
  -J, --save-anodot-json
                        write a JSON file with the same format being sent to
                        the Anodot API
  -d, --debug           Print verbose debug output
  -p, --production
  -D API_DELAY, --api-delay API_DELAY
                        Anodot API: delay between requests
  -C API_CHUNK, --api-chunk API_CHUNK
                        Anodot API: max. number of metrics to send per request
  --producer-concurrency PRODUCER_CONCURRENCY
                        Number of concurrent processes to use for producers
  --anodot-api-concurrency ANODOT_API_CONCURRENCY
                        Number of concurrent processes to use for sending to
                        anodot
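The default time range and the time format in the help text above can be sketched as below. The helper resolve_time_range is hypothetical, and it assumes naive local timestamps; the collector's actual timezone handling is not documented here.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Matches the "YYYY-MM-DD hh:mm:ss" format described in the help text.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def resolve_time_range(start: Optional[str] = None,
                       end: Optional[str] = None,
                       now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Parse -s/-e values, falling back to yesterday 00:00 and today 00:00."""
    if now is None:
        now = datetime.now()  # assumption: naive local time
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start_dt = datetime.strptime(start, TIME_FORMAT) if start else today - timedelta(days=1)
    end_dt = datetime.strptime(end, TIME_FORMAT) if end else today
    if start_dt >= end_dt:
        raise ValueError("start time must be earlier than end time")
    return start_dt, end_dt
```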

Examples:

  1. To test the collector and inspect the fetched metrics (using multiprocessing), writing them to a CSV file in the Anodot format:
$ anodot-collect.py pyanodot.collectors.cloudwatch -w . --producer-concurrency 20 -a -s '2017-05-01 00:00:00' -e '2017-05-02 00:00:00'
  2. To actually send metrics to Anodot using multiprocessing, with 20 processes on the AWS API side and 5 processes on the Anodot API side:
$ anodot-collect.py pyanodot.collectors.cloudwatch -w . --producer-concurrency 20 --anodot-api-concurrency 5 -p -s '2017-05-01 00:00:00' -e '2017-05-02 00:00:00'

Notes and recommendations

  • The CloudWatch collector requires a substantial number of API calls to list, collect and send metrics, although the exact number depends on the amount of managed resources. Use the --producer-concurrency and --anodot-api-concurrency parameters to reduce the run time of the collection process.
  • AWS CloudWatch has its own data retention policy, and the finest granularity is not available for all historical data. One implication is that for a query with Period: 60, the start time cannot be more than about two weeks in the past (AWS keeps 60-second data points for 15 days).
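The retention constraint can be checked before a run. The sketch below encodes the period rules from the AWS GetMetricStatistics API reference (start time up to 15 days ago: multiples of 60 seconds; 15 to 63 days ago: multiples of 300; 63 to 455 days ago: multiples of 3600). The function names are hypothetical, sub-minute high-resolution metrics are ignored, and the exact handling of the boundaries is an assumption.

```python
from datetime import datetime, timedelta

# Period multiple required by GetMetricStatistics, by age of the start time
# (standard-resolution metrics only).
RETENTION_RULES = [
    (timedelta(days=15), 60),     # 1-minute data points
    (timedelta(days=63), 300),    # 5-minute data points
    (timedelta(days=455), 3600),  # 1-hour data points
]


def required_period_multiple(start: datetime, now: datetime) -> int:
    """Finest period (seconds) whose data is still retained for `start`."""
    age = now - start
    for max_age, multiple in RETENTION_RULES:
        if age <= max_age:
            return multiple
    raise ValueError("start time is older than CloudWatch retains data")


def period_is_valid(period: int, start: datetime, now: datetime) -> bool:
    """True if `period` is a multiple of the granularity still available at `start`."""
    return period % required_period_multiple(start, now) == 0
```

For example, on 2017-06-01 a start time of 2017-05-01 00:00:00 is about a month old, so Period: 60 would be rejected while 300 or 3600 would be accepted.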