Metrics

Satella’s metrics

Metrics and instruments are a system to output real-time statistics.

Metrics are defined, and meant to be used, in a similar way to Python loggers. Each metric has a hierarchical, dot-separated name that does not have to correspond to particular modules or classes. A metric can be in one of four states:

  • DISABLED

  • RUNTIME

  • DEBUG

  • INHERIT

These are contained in an enum:

class satella.instrumentation.metrics.MetricLevel(value)

An enumeration.

By default, metrics run in RUNTIME mode. This means that statistics are collected only from metrics that are set to at least RUNTIME. If a user wants to dig deeper, they can switch a metric to DEBUG, which will cause more data to be registered. If a metric is in state INHERIT, it will inherit the metric level from its parent, traversing the tree if required. The tree node separator is a dot in the metric’s name, e.g. satella.metrics.my_metric.

INHERIT is the default state for every metric except the root, whose default is RUNTIME. The root metric cannot be set to INHERIT, as it has no parent to inherit from.

Also, if a parent is RUNTIME and its child is DEBUG, the metrics reported by the child won’t be included in the parent’s metric data.

You can switch a metric’s level at any time by assigning a correct value to its level property, or by specifying its metric level during a call to getMetric().

Note that a decision to accept/reject a handle()-provided value happens when handle() is called, based on current level. If you change the level, it may take some time for the metric to return correct values.
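The level rules above can be pictured with a small standalone model. This is only a sketch and not Satella's implementation: the Level enum, its numeric values, and the effective_level() and accepts() helpers are hypothetical names introduced for illustration.

```python
from enum import IntEnum
from typing import Dict


class Level(IntEnum):
    """Stand-in for MetricLevel; the numeric values are an assumption."""
    DISABLED = 1
    RUNTIME = 2
    DEBUG = 3
    INHERIT = 4


def effective_level(name: str, levels: Dict[str, Level]) -> Level:
    """Resolve the level of `name`, walking up the dotted path on INHERIT."""
    while name:
        level = levels.get(name, Level.INHERIT)
        if level is not Level.INHERIT:
            return level
        # Drop the last dotted component to reach the parent.
        name = name.rpartition('.')[0]
    # The root metric (empty name) defaults to RUNTIME and is never INHERIT.
    return levels.get('', Level.RUNTIME)


def accepts(metric_level: Level, reported_at: Level) -> bool:
    """A value reported at `reported_at` is kept if the metric is at least that verbose."""
    return reported_at <= metric_level
```

In this model a metric left at INHERIT under a DEBUG parent resolves to DEBUG, while a RUNTIME metric rejects values reported at DEBUG.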

The call to getMetric() is specified as follows:

satella.instrumentation.metrics.getMetric(metric_name: str = '', metric_type: str = 'base', metric_level: MetricLevel | None = None, **kwargs)

Obtain a metric of given name.

Parameters:
  • metric_name – a metric name. Subsequent nesting levels have to be separated with a dot

  • metric_type – metric type

  • metric_level – a metric level to set this metric to.

Raises:
  • MetricAlreadyExists – a metric having this name already exists, but with a different type

  • ValueError – metric name contains a forbidden character

You obtain metrics using getMetric() as follows:

metric = getMetric(__name__+'.StringMetric', 'string', MetricLevel.RUNTIME, **kwargs)

Please note that a metric name must match the following regex:

[a-zA-Z_:][a-zA-Z0-9_:]*
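A quick way to pre-check names against this pattern is sketched below. The helper is_valid_metric_name() is hypothetical, and applying the pattern to each dot-separated component (rather than to the whole name) is an assumption: the pattern itself does not allow dots, while metric names are dot-separated.

```python
import re

# Pattern copied from the text above.
NAME_PATTERN = re.compile(r'[a-zA-Z_:][a-zA-Z0-9_:]*')


def is_valid_metric_name(name: str) -> bool:
    """Check every dot-separated component against the pattern (assumed behaviour)."""
    if name == '':
        return True  # the empty name denotes the root metric
    return all(NAME_PATTERN.fullmatch(part) is not None for part in name.split('.'))
```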

The internal keyword argument is for those cases where the application itself is the consumer of the metrics and you don’t want them exposed to the outside. Take care to examine this field of MetricData if you write custom exporters!

The second argument to getMetric() is the metric type. The following metric types are available:

  • base - for just a container metric

  • int - for int values

  • float - for float values

  • empty - disregard all provided values, outputs nothing

  • counter - starts from zero, increments or decrements the counter value. Can optionally also register the number of calls

    class satella.instrumentation.metrics.metric_types.CounterMetric(name, root_metric: Metric = None, metric_level: MetricLevel | None = None, internal: bool = False, sum_children: bool = True, count_calls: bool = False, *args, **kwargs)

    A counter that can be adjusted by a given value.

    Parameters:
    • sum_children – whether to sum up all calls to children

    • count_calls – count the amount of calls to handle()

  • cps - counts the number of calls to handle() during the last time period, as specified by the user

    class satella.instrumentation.metrics.metric_types.ClicksPerTimeUnitMetric(*args, time_unit_vectors: List[float] | None = None, aggregate_children: bool = True, internal: bool = False, **kwargs)

    This tracks the number of calls to handle() during the last time periods, as specified by time_unit_vectors (in seconds). You may specify multiple time periods as consecutive entries in the list.

    By default (if you do not specify otherwise) this will track calls made during the last second.

    This was once deprecated, but because some platforms suck at calculating derivatives of their series (AWS, I’m looking at you!), it was decided to undeprecate it.

Note

Normally you should use a counter and calculate a rate() from it, but since some platforms suck at computing rates, a decision was made to keep this.

  • linkfail - for tracking whether given link is online or offline

    class satella.instrumentation.metrics.metric_types.LinkfailMetric(name: str, root_metric: Metric = None, metric_level: MetricLevel | None = None, labels: dict | None = None, internal: bool = False, consecutive_failures_to_offline: int = 100, consecutive_successes_to_online: int = 10, callback_on_online: Callable[[int, dict], None] = <function LinkfailMetric.<lambda>>, callback_on_offline: Callable[[int, dict], None] = <function LinkfailMetric.<lambda>>, *args, **kwargs)

    Metric that measures whether given link is operable.

    Parameters:
    • consecutive_failures_to_offline – consecutive failures needed for link to become offline

    • consecutive_successes_to_online – consecutive successes needed for link to become online after a failure

    • callback_on_online – callback that accepts an address of a link that becomes online and labels

    • callback_on_offline – callback that accepts an address of a link that becomes offline and labels

  • summary - a metric that keeps a rolling window of values and provides a way to calculate percentiles. Corresponds to Prometheus’ summary metrics.

    class satella.instrumentation.metrics.metric_types.SummaryMetric(name, root_metric: Metric = None, metric_level: MetricLevel | None = None, internal: bool = False, last_calls: int = 100, quantiles: Sequence[float] = (0.5, 0.95), aggregate_children: bool = True, count_calls: bool = True, *args, **kwargs)

    A metric that can register some values, sequentially, and then calculate quantiles from them. It calculates configurable quantiles over a sliding window of the most recent measurements.

    Parameters:
    • last_calls – last calls to handle() to take into account

    • quantiles – a sequence of quantiles to return in to_metric_data

    • aggregate_children – whether to sum up children values (if present)

    • count_calls – whether to count total amount of calls and total time

  • histogram - a metric that puts given values into predefined buckets. Corresponds to Prometheus’ histogram metric

    class satella.instrumentation.metrics.metric_types.HistogramMetric(name: str, root_metric: Metric = None, metric_level: MetricLevel | None = None, internal: bool = False, buckets: Sequence[float] = (0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0), aggregate_children: bool = True, *args, **kwargs)

    A histogram, by Prometheus’ interpretation.

    Parameters:
    • buckets – buckets to add. First bucket will be from zero to first value, second from first value to second, last bucket will be from last value to infinity. So there are len(buckets)+1 buckets. Buckets are expected to be passed in sorted!

    • aggregate_children – whether to accept child calls to be later presented as total

  • callable - a metric whose value is a result of a given callable

    class satella.instrumentation.metrics.metric_types.CallableMetric(name, root_metric: Metric = None, metric_level: MetricLevel | None = None, labels: dict | None = None, internal: bool = False, value_getter: Callable[[], float] | None = None, *args, **kwargs)

    A metric whose value at any given point in time is the result of its callable.

    Parameters:

    value_getter – a callable() that returns a float - the current value of this metric. It should be easy and cheap to compute, as this callable will be called each time a snapshot of metric state is requested

  • uptime - a metric to report uptime

    class satella.instrumentation.metrics.metric_types.UptimeMetric(*args, time_getter: Callable[[], float] = <built-in function monotonic>, **kwargs)

    A metric that gives the difference between the current value of time_getter and its value at the initialization of this metric

    Parameters:

    time_getter – a zero-argument callable that returns a float representing the passage of time. By default it’s the safe time.monotonic
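To make the bucket layout of the histogram type concrete, here is a minimal standalone sketch that follows the description of the buckets parameter: sorted boundaries yield len(buckets)+1 buckets. The bucket_counts() helper is hypothetical, and placing a value that equals a boundary into the higher bucket is an assumption the text above does not settle.

```python
from bisect import bisect_right
from typing import List, Sequence


def bucket_counts(values: Sequence[float], buckets: Sequence[float]) -> List[int]:
    """Count values per bucket; returns len(buckets) + 1 counters."""
    counts = [0] * (len(buckets) + 1)
    for value in values:
        # bisect_right gives the number of boundaries <= value,
        # which is exactly the index of the bucket the value falls into.
        counts[bisect_right(buckets, value)] += 1
    return counts
```

With boundaries (0.1, 0.5, 1.0) this produces four counters: below 0.1, 0.1 up to 0.5, 0.5 up to 1.0, and 1.0 upwards.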

Note that metric.measure() will include time spent processing the generator’s content by the client, so you might want to avoid measuring generators. However, if this is the behaviour that you want, you get it.

Note that if you request an existing metric with a different type via getMetric(), a MetricAlreadyExists exception will be raised:

class satella.exceptions.MetricAlreadyExists(msg, name, requested_type, existing_type)

Metric with given name already exists, but with a different type

The third parameter, metric_level, is optional. If set, all child metrics created during this metric’s instantiation will receive that metric level. If the metric already exists, its level will be set to the provided metric level.

All metrics created along the way (going from the root metric down to the requested one) will be initialized with the level that you just passed. To keep things consistent, metric_level, if specified, is also applied to the returned metric even when that metric already existed.

This will be set on all children created by this call. If you have any children from previous calls, they will remain unaffected.

If you specify any kwargs, they will be passed to the constructor of the last metric in the chain.

Since metrics in Satella are primarily meant to end up in Prometheus, it is very important to understand Prometheus’ data model.

Root metric’s to_metric_data will output a flat set, called MetricDataCollection:

class satella.instrumentation.metrics.MetricDataCollection(*values: MetricData | MetricDataCollection)

A bunch of metric datas

postfix_with(postfix: str) MetricDataCollection

Postfix every child with given postfix and return self

prefix_with(prefix: str) MetricDataCollection

Prefix every child with given prefix and return self

remove_internals()

Remove entries marked as internal

set_timestamp(timestamp: float) MetricDataCollection

Assign every child this timestamp and return self

set_value(value) MetricDataCollection

Set all children to a particular value and return self. Most useful

strict_eq(other: MetricDataCollection) bool

Do values in other MetricDataCollection match also?

to_json() list | dict | str | int | float | None

Return a JSON-able representation of this object

which consists of MetricData:

class satella.instrumentation.metrics.MetricData(name: str, value: float, labels: dict | None = None, timestamp: float | None = None, internal: bool = False)
to_json(prefix: str = '') list | dict | str | int | float | None

Return a JSON-able representation of this object

On most metrics you can specify additional labels. They will serve to create an independent “sub-metric” of sorts, e.g.:

from satella.instrumentation.metrics import getMetric, MetricData, MetricDataCollection

metric = getMetric('root', 'int')
metric.runtime(2, label='value')
metric.runtime(3, label='key')
assert metric.to_metric_data() == MetricDataCollection(MetricData('root', 2, {'label': 'value'}),
                                                       MetricData('root', 3, {'label': 'key'}))

This functionality is provided by the below class:

class satella.instrumentation.metrics.metric_types.EmbeddedSubmetrics(name, root_metric: Metric | None = None, metric_level: str | None = None, labels: dict | None = None, internal: bool = False, *args, **kwargs)

A metric that can optionally accept some labels in its handle, and this will be counted as a separate metric. For example:

>>> metric = getMetric('root.test.IntValue', 'int', enable_timestamp=False)
>>> metric.handle(2, label='key')
>>> metric.handle(3, label='value')

If you try to inherit from it, refer to simple.IntegerMetric to see how to do it. Also, please pass all the arguments received from the child class into this constructor, as this constructor actually stores them! Refer to cps.ClicksPerTimeUnitMetric on how to do that.

clone(labels: dict) LeafMetric

Return a fresh instance of this metric, with its parent being set to this metric and having a particular set of labels, and being of level INHERIT.

get_specific_metric_data(labels: dict) MetricDataCollection

Return a MetricDataCollection for a child with given labels
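The idea of one independent sub-metric per label set can be sketched without Satella at all. The LabeledValue class below is a hypothetical toy, not the EmbeddedSubmetrics implementation; it only shows how a label set can serve as a key.

```python
from typing import Dict, FrozenSet, List, Tuple

LabelKey = FrozenSet[Tuple[str, str]]


class LabeledValue:
    """Toy stand-in for a metric that keeps one value per distinct label set."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._values: Dict[LabelKey, float] = {}

    def handle(self, value: float, **labels: str) -> None:
        # A frozenset of items makes the label set hashable and order-independent.
        self._values[frozenset(labels.items())] = value

    def snapshot(self) -> List[Tuple[str, float, Dict[str, str]]]:
        """Return (name, value, labels) triples, one per label set seen so far."""
        return [(self.name, value, dict(key)) for key, value in self._values.items()]
```

Reporting again with the same labels overwrites that sub-metric only; other label sets keep their values.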

Rolling your own metrics

In order to roll your own metrics, you must first subclass Metric. You can subclass any of the following classes, whichever suits you best. Please also refer to the existing metric implementations to see how best to subclass them.

class satella.instrumentation.metrics.metric_types.Metric(name, root_metric: Metric | None = None, metric_level: MetricLevel | int | None = None, internal: bool = False, *args, **kwargs)

Container for child metrics. A base metric class, as well as the default metric.

Switch levels by setting metric.level to a proper value

Parameters:
  • enable_timestamp – append timestamp of last update to the metric

  • internal – if True, this metric won’t be visible in exporters

get_timestamp() float | None

Return this timestamp, or None if no timestamp support is enabled

reset() None

Delete all child metrics that this metric contains.

Also, if called on root metric, sets the runlevel to RUNTIME

class satella.instrumentation.metrics.metric_types.LeafMetric(name, root_metric: Metric | None = None, metric_level: str | None = None, labels: dict | None = None, internal: bool = False, *args, **kwargs)

A metric capable of generating only leaf entries.

You cannot hook up any children to a leaf metric.

class satella.instrumentation.metrics.metric_types.base.EmbeddedSubmetrics(name, root_metric: Metric | None = None, metric_level: str | None = None, labels: dict | None = None, internal: bool = False, *args, **kwargs)

Remember to define a class attribute called CLASS_NAME, a string that defines what your metric type is called. Once everything is done, register it by applying the following decorator to your metric class:

satella.instrumentation.metrics.metric_types.register_metric(cls)

Decorator to register your custom metrics
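The CLASS_NAME plus decorator pattern can be illustrated with a standalone model. Everything here (METRIC_TYPES, register(), MaxMetric, make_metric()) is hypothetical and only mirrors the shape of the mechanism; a real custom metric would subclass one of the Satella classes listed above and use register_metric.

```python
from typing import Dict

# Hypothetical registry mapping type names to classes.
METRIC_TYPES: Dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator that files `cls` under its CLASS_NAME attribute."""
    METRIC_TYPES[cls.CLASS_NAME] = cls
    return cls


@register
class MaxMetric:
    """Toy metric remembering the largest value it was given."""
    CLASS_NAME = 'max'

    def __init__(self) -> None:
        self.value = float('-inf')

    def handle(self, value: float) -> None:
        self.value = max(self.value, value)


def make_metric(metric_type: str):
    """Look a class up by its registered name and instantiate it."""
    return METRIC_TYPES[metric_type]()
```

The point of the registry is that callers only need the string name, so new types can be added without touching the factory.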

To zip together two or more metrics, you can use the following class:

class satella.instrumentation.metrics.AggregateMetric(*metrics)

A virtual metric grabbing a few other metrics and having a single .handle() call represent a bunch of calls to other metrics. I.e., the following:

>>> m1 = getMetric('summary', 'summary')
>>> m2 = getMetric('histogram', 'histogram')
>>> m1.runtime()
>>> m2.runtime()

Is the same as:

>>> am = AggregateMetric(getMetric('summary', 'summary'), getMetric('histogram', 'histogram'))
>>> am.runtime()

Note that this class supports only reporting. It doesn’t read data, or read/write metric levels.
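The fan-out behaviour can be modelled in a few lines. Recorder and FanOut are hypothetical stand-ins written for this sketch, not Satella classes; like the class described above, the model only forwards reports and never reads data back.

```python
from typing import List


class Recorder:
    """Toy metric that just remembers what it was given."""

    def __init__(self) -> None:
        self.seen: List[float] = []

    def runtime(self, value: float, **labels) -> None:
        self.seen.append(value)


class FanOut:
    """Forwards each runtime() call to every wrapped metric."""

    def __init__(self, *metrics) -> None:
        self.metrics = metrics

    def runtime(self, *args, **kwargs) -> None:
        for metric in self.metrics:
            metric.runtime(*args, **kwargs)
```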

To automatically apply labels you can use this class:

class satella.instrumentation.metrics.LabeledMetric(metric_to_wrap, **labels)

A wrapper around another metric that will always call its .runtime and .handle with some predefined labels

Use like:

>>> a = getMetric('a', 'counter')
>>> b = LabeledMetric(a, key=5)

Then this:

>>> a.runtime(1, key=5)

Will be equivalent to this:

>>> b.runtime(1)
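The equivalence shown above can be checked against a small model. LabelRecorder and WithLabels are hypothetical names for this sketch; letting per-call labels override the predefined ones on conflict is an assumption, not documented behaviour.

```python
from typing import Any, Dict, List, Tuple


class LabelRecorder:
    """Toy metric that remembers each value together with its labels."""

    def __init__(self) -> None:
        self.calls: List[Tuple[float, Dict[str, Any]]] = []

    def runtime(self, value: float, **labels: Any) -> None:
        self.calls.append((value, labels))


class WithLabels:
    """Wrapper that adds predefined labels to every runtime() call."""

    def __init__(self, metric, **labels: Any) -> None:
        self.metric = metric
        self.labels = labels

    def runtime(self, value: float, **labels: Any) -> None:
        # Per-call labels win over predefined ones on conflict (an assumption).
        self.metric.runtime(value, **{**self.labels, **labels})
```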

Exporting data

In order to export data to Prometheus, you can use the following function:

satella.instrumentation.metrics.exporters.metric_data_collection_to_prometheus(mdc: MetricDataCollection) str

Render the data in the form understandable by Prometheus.

Values marked as internal will be skipped.

Parameters:
  • mdc – the MetricDataCollection to render, as returned by the root metric (or any metric, for that matter)

Returns:

a string output to present to Prometheus

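For example:

from satella.instrumentation.metrics import getMetric
from satella.instrumentation.metrics.exporters import metric_data_collection_to_prometheus

def export_to_prometheus():
    metric = getMetric()
    return metric_data_collection_to_prometheus(metric.to_metric_data())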

Dots in metric names will be replaced with underscores.
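For a feel of what the rendered output looks like, here is a simplified standalone renderer. It is not Satella's exporter: Sample and render() are hypothetical, label values are not escaped, and timestamps are ignored. It does follow the two rules stated above, skipping internal values and replacing dots with underscores.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Sample:
    """Toy stand-in for MetricData."""
    name: str
    value: float
    labels: Dict[str, str] = field(default_factory=dict)
    internal: bool = False


def render(samples: Iterable[Sample]) -> str:
    """Render samples as Prometheus text-format lines."""
    lines = []
    for sample in samples:
        if sample.internal:
            continue  # internal values are not exported
        name = sample.name.replace('.', '_')
        if sample.labels:
            pairs = ','.join(f'{key}="{value}"'
                             for key, value in sorted(sample.labels.items()))
            lines.append(f'{name}{{{pairs}}} {sample.value}')
        else:
            lines.append(f'{name} {sample.value}')
    return ''.join(line + '\n' for line in lines)
```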

Or, if you need an HTTP server that will export metrics for Prometheus, use this class, a daemonic thread that makes it easy to expose metrics to Prometheus:

class satella.instrumentation.metrics.exporters.PrometheusHTTPExporterThread(interface: str, port: int, extra_labels: dict | None = None, enable_metric: bool = False)

A daemon thread that listens on given interface as an HTTP server, ready to serve as a connection point for Prometheus to scrape metrics off this service.

This additionally (if the user so requests) may export a metric called prometheus.exports_per_time which is a cps with time_unit_vectors=[1, 20, 60] counting the number of exports in given time period.

Parameters:
  • interface – an interface to bind to

  • port – a port to bind to

  • extra_labels – extra labels to add to each metric data point, such as the name of the service or the hostname

  • enable_metric – whether to enable the metric

get_metric_data() MetricDataCollection

Obtain metric data.

Overload to provide custom source of metric data.

run() None

Calls self.loop() indefinitely, until terminating condition is met

terminate(force: bool = False) PrometheusHTTPExporterThread

Order this thread to terminate and return self.

You will need to .join() on this thread to ensure that it has quit.

Parameters:

force – whether to terminate this thread by injecting an exception into it

Useful data structures

Sometimes you want to have some data structures with metrics about themselves. Here they are:

class satella.instrumentation.metrics.structures.MetrifiedThreadPoolExecutor(max_workers=None, thread_name_prefix='', initializer=None, initargs=(), time_spent_waiting=None, time_spent_executing=None, waiting_tasks: CallableMetric | None = None, metric_level: MetricLevel = MetricLevel.RUNTIME)

A thread pool executor that provides execution statistics as metrics.

This class will also backport some of Python 3.8’s thread pool executor features to earlier Pythons: thread name prefix, initializer, initargs and BrokenThreadPool behaviour.

Parameters:
  • time_spent_waiting – a metric (can be aggregate) to which times spent waiting in the queue will be deposited

  • time_spent_executing – a metric (can be aggregate) to which times spent executing will be deposited

  • waiting_tasks – a fresh CallableMetric that will be patched to yield the number of currently waiting tasks

  • metric_level – a level with which to log to these two metrics

get_queue_length() int

Return the amount of tasks currently in the queue

submit(**kwargs)

Submits a callable to be executed with the given arguments.

Schedules the callable to be executed as fn(*args, **kwargs) and returns a Future instance representing the execution of the callable.

Returns:

A Future representing the given call.
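How waiting and executing times might be captured can be sketched on top of the standard library executor. TimedThreadPoolExecutor is a hypothetical class for illustration, not the Satella implementation; plain lists stand in for the time_spent_waiting and time_spent_executing metrics, and Python 3.8 or newer is assumed.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List


class TimedThreadPoolExecutor(ThreadPoolExecutor):
    """ThreadPoolExecutor that records queue-wait and execution times in seconds."""

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        # Plain lists stand in for the two timing metrics.
        self.waiting_times: List[float] = []
        self.executing_times: List[float] = []

    def submit(self, fn: Callable, /, *args, **kwargs) -> Future:
        submitted_at = time.monotonic()

        def timed():
            started_at = time.monotonic()
            # Time between submit() and the moment a worker picked the task up.
            self.waiting_times.append(started_at - submitted_at)
            try:
                return fn(*args, **kwargs)
            finally:
                # Recorded whether fn returned normally or raised.
                self.executing_times.append(time.monotonic() - started_at)

        return super().submit(timed)
```

Wrapping the callable inside submit() keeps the measurement transparent to callers: they still receive an ordinary Future.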

class satella.instrumentation.metrics.structures.MetrifiedCacheDict(stale_interval, expiration_interval, value_getter, value_getter_executor=None, cache_failures_interval=None, time_getter=<built-in function monotonic>, default_value_factory=None, cache_hits: CounterMetric | None = None, cache_miss: CounterMetric | None = None, refreshes: CounterMetric | None = None, how_long_refresh_takes: MeasurableMixin | None = None)

A CacheDict with metrics!

Parameters:
  • cache_hits – a counter metric that will be updated with +1 each time there’s a cache hit

  • cache_miss – a counter metric that will be updated with +1 each time there’s a cache miss

  • refreshes – a metric that will be updated with +1 each time there’s a cache refresh

  • how_long_refresh_takes – a metric that will be ticked with time value_getter took

class satella.instrumentation.metrics.structures.MetrifiedLRUCacheDict(stale_interval: float, expiration_interval: float, value_getter, value_getter_executor=None, cache_failures_interval=None, time_getter=<built-in function monotonic>, default_value_factory=None, max_size: int = 100, cache_hits: Metric | None = None, cache_miss: Metric | None = None, refreshes: Metric | None = None, how_long_refresh_takes: MeasurableMixin | None = None, evictions: Metric | None = None, **kwargs)

An LRUCacheDict with metrics!

Parameters:
  • cache_hits – a counter metric that will be updated with +1 each time there’s a cache hit

  • cache_miss – a counter metric that will be updated with +1 each time there’s a cache miss

  • refreshes – a metric that will be updated with +1 each time there’s a cache refresh

  • how_long_refresh_takes – a metric that will be ticked with time value_getter took
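The cache_hits and cache_miss parameters can be understood through a much simpler read-through cache. Counter and CountingCache are hypothetical toys for this sketch; they leave out staleness, expiration, refreshes and eviction, all of which the real classes handle.

```python
from typing import Any, Callable, Dict, Hashable, Optional


class Counter:
    """Toy counter standing in for a counter metric."""

    def __init__(self) -> None:
        self.value = 0

    def runtime(self, delta: int = 1) -> None:
        self.value += delta


class CountingCache:
    """Read-through cache that bumps a counter on every hit and every miss."""

    def __init__(self, value_getter: Callable[[Hashable], Any],
                 cache_hits: Optional[Counter] = None,
                 cache_miss: Optional[Counter] = None) -> None:
        self.value_getter = value_getter
        self.cache_hits = cache_hits
        self.cache_miss = cache_miss
        self._data: Dict[Hashable, Any] = {}

    def __getitem__(self, key: Hashable) -> Any:
        if key in self._data:
            if self.cache_hits is not None:
                self.cache_hits.runtime(1)
            return self._data[key]
        if self.cache_miss is not None:
            self.cache_miss.runtime(1)
        # Fetch the missing value and remember it for next time.
        value = self._data[key] = self.value_getter(key)
        return value
```

Both counters are optional, mirroring the None defaults in the signatures above, so the cache works the same with or without instrumentation.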

class satella.instrumentation.metrics.structures.MetrifiedExclusiveWritebackCache(*args, cache_hits: CounterMetric | None = None, cache_miss: CounterMetric | None = None, entries_waiting: CallableMetric | None = None, **kwargs)