Implement Service Ping

Service Ping consists of two kinds of data:

  • Counters: Track how often a certain event happened over time, such as how many CI/CD pipelines have run. They are monotonic and always trend up.
  • Observations: Facts collected from one or more GitLab instances that can carry arbitrary data. There are no general guidelines for how to collect them, because each observation is specific to its data.

To implement a new metric in Service Ping, follow these steps:

  1. Implement the required counter
  2. Name and place the metric
  3. Test counters manually using your Rails console
  4. Generate the SQL query
  5. Optimize queries with #database-lab
  6. Add the metric definition to the Metrics Dictionary
  7. Add the metric to the Versions Application
  8. Create a merge request
  9. Verify your metric
  10. Set up and test Service Ping locally

Instrumentation classes

note
Implementing metrics directly in usage_data.rb is deprecated. When you add or change a Service Ping metric, you must migrate it to an instrumentation class. For information about the progress on migrating Service Ping metrics, see this epic.

For example, we have the following instrumentation class: lib/gitlab/usage/metrics/instrumentations/count_boards_metric.rb.

You should add it to usage_data.rb as follows:

boards: add_metric('CountBoardsMetric', time_frame: 'all'),

Types of counters

There are several types of counters for metrics:

note
Only use the provided counter methods. Each counter method contains a built-in fail-safe mechanism that isolates each counter to avoid breaking the entire Service Ping process.
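
The fail-safe idea can be sketched in plain Ruby. This is an illustration only, not the GitLab implementation: the safe_counter helper and the FALLBACK_VALUE constant are hypothetical names, and the -1 fallback is assumed from the error values described for other counters on this page.

```ruby
# Hypothetical fallback value; -1 mirrors the error value this page
# describes for other counters.
FALLBACK_VALUE = -1

# Runs one counter block in isolation. Any StandardError raised by the
# block is swallowed and replaced with the fallback value, so one broken
# counter cannot abort the rest of the payload.
def safe_counter
  yield
rescue StandardError
  FALLBACK_VALUE
end

# Builds a small payload in which the middle counter fails.
def build_payload
  {
    issues: safe_counter { 42 },
    broken: safe_counter { raise 'simulated statement timeout' },
    boards: safe_counter { 7 }
  }
end
```

In this sketch, the failing counter reports the fallback value while the counters around it still report their real values.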

Batch counters

For large tables, PostgreSQL can take a long time to count rows due to MVCC (Multi-version Concurrency Control). Batch counting is a counting method where a single large query is broken into multiple smaller queries. For example, instead of a single query over 1,000,000 records, batch counting can execute 100 queries of 10,000 records each. Batch counting helps avoid database timeouts, because each batch query is significantly shorter than a single long-running query.
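
A minimal sketch of the batching idea, using an in-memory array as a stand-in for a table. The ROW_IDS data and the batch_count helper are hypothetical and exist only for illustration. In a real database, each window would be a separate short query over a primary key range.

```ruby
# Hypothetical stand-in for a table: row ids from 1 to 1,000 with gaps,
# as happens after rows are deleted.
ROW_IDS = (1..1_000).reject { |id| (id % 7).zero? }.freeze

# Counts rows by walking the id range in fixed-size windows and adding up
# the per-window counts, instead of counting everything in one pass.
def batch_count(ids, batch_size:)
  return 0 if ids.empty?

  finish = ids.max
  batch_start = ids.min
  total = 0

  while batch_start <= finish
    batch_end = batch_start + batch_size # exclusive upper bound
    total += ids.count { |id| id >= batch_start && id < batch_end }
    batch_start = batch_end
  end

  total
end
```

Calling batch_count(ROW_IDS, batch_size: 100) walks ten windows and returns the same total as ROW_IDS.size. The minimum and maximum lookups in the sketch are the kind of queries that benefit from an index.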

On GitLab.com, some tables are extremely large and queries time out after 15 seconds, so we use batch counting to avoid timeouts. Here are the sizes of some GitLab.com tables:

Table                        Row counts in millions
merge_request_diff_commits   2280
ci_build_trace_sections      1764
merge_request_diff_files     1082
events                       514

Batch counting requires indexes on columns to calculate max, min, and range queries. In some cases, you must add a specialized index on the columns involved in a counter.

Ordinary batch counters

Create a new database metrics instrumentation class with a count operation for a given ActiveRecord_Relation.

Method:

add_metric('CountIssuesMetric', time_frame: 'all')

Examples:

Examples using usage_data.rb have been deprecated. We recommend using instrumentation classes.

Distinct batch counters

Create a new database metrics instrumentation class with a distinct_count operation for a given ActiveRecord_Relation.

Method:

add_metric('CountUsersAssociatingMilestonesToReleasesMetric', time_frame: 'all')

caution
Counting over non-unique columns can lead to performance issues. For more information, see the iterating tables in batches guide.

Examples:

Examples using usage_data.rb have been deprecated. We recommend using instrumentation classes.

Sum batch operation

Sums the values of a given ActiveRecord_Relation on a given column. Handles the ActiveRecord::StatementInvalid error.

Method:

add_metric('JiraImportsTotalImportedIssuesCountMetric')

Average batch operation

Averages the values of a given ActiveRecord_Relation on a given column and handles errors.

Method:

add_metric('CountIssuesWeightAverageMetric')

Examples:

Examples using usage_data.rb have been deprecated. We recommend using instrumentation classes.

Grouping and batch operations

The count, distinct_count, sum, and average batch counters can accept an ActiveRecord::Relation object that is grouped by a specified column. With a grouped relation, the methods do batch counting, handle errors, and return a hash table of key-value pairs.

Examples:

count(Namespace.group(:type))
# returns => {nil=>179, "Group"=>54}

distinct_count(Project.group(:visibility_level), :creator_id)
# returns => {0=>1, 10=>1, 20=>11}

sum(Issue.group(:state_id), :weight)
# returns => {1=>3542, 2=>6820}

average(Issue.group(:state_id), :weight)
# returns => {1=>3.5, 2=>2.5}
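
To show how a grouped result ends up as a hash, here is a minimal in-memory sketch. It is an illustration under assumptions, not the GitLab implementation: the Row struct and the grouped_batch_count helper are hypothetical, and batches are sliced by row position rather than by primary key range.

```ruby
# Hypothetical stand-in for a database row with a grouping column.
Row = Struct.new(:id, :type)

# Counts rows per group, one batch at a time, and merges the per-batch
# tallies into a single hash keyed by the group value.
def grouped_batch_count(rows, batch_size:)
  totals = Hash.new(0)

  rows.sort_by(&:id).each_slice(batch_size) do |batch|
    batch.each { |row| totals[row.type] += 1 }
  end

  totals
end
```

For rows whose type is either nil or "Group", the result has one key per distinct value, in the same shape as the count(Namespace.group(:type)) example.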

Add operation

Sums the values given as parameters. Handles StandardError. Returns -1 if any of the arguments are -1.

Method:

add(*args)

Examples:

project_imports = distinct_count(::Project.where.not(import_type: nil), :creator_id)
bulk_imports = distinct_count(::BulkImport, :user_id)

add(project_imports, bulk_imports)
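
The behavior described above can be sketched as follows. The add_sketch name is hypothetical and this is not the GitLab implementation. Returning -1 when a StandardError is rescued is an assumption, because this page does not say which value the real method returns in that case.

```ruby
# Illustrative re-implementation of the described semantics.
def add_sketch(*args)
  # A -1 argument signals that an upstream counter failed, so the
  # failure is propagated instead of being folded into the sum.
  return -1 if args.any? { |value| value == -1 }

  args.sum
rescue StandardError
  # Assumed fallback: report failure the same way as a failed counter.
  -1
end
```

With this sketch, add_sketch(3, 4) sums normally, while add_sketch(3, -1) propagates the failure marker rather than returning 2.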

Estimated batch counters

Introduced in GitLab 13.7.

Estimated batch counter functionality handles ActiveRecord::StatementInvalid errors when used through the provided estimate_batch_distinct_count method. Errors return a value of -1.

caution
This functionality uses the HyperLogLog algorithm to estimate the distinct count of a given column in a specific ActiveRecord_Relation. Because the HyperLogLog algorithm is probabilistic, the results always include some error. The highest encountered error rate is 4.9%.

When used correctly, the estimate_batch_distinct_count method enables efficient counting over columns that contain non-unique values, which other counters cannot guarantee.
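
To illustrate why a HyperLogLog-based counter copes with non-unique values across batches, here is a small self-contained sketch. It is not GitLab's implementation: the TinyHll class, the estimate_distinct helper, the MD5-based hashing, and the 4,096-register size are all assumptions chosen for illustration.

```ruby
require 'digest'

# Illustrative HyperLogLog sketch with 2**index_bits registers.
class TinyHll
  attr_reader :registers

  def initialize(index_bits = 12)
    @index_bits = index_bits
    @registers = Array.new(1 << index_bits, 0)
  end

  # The top bits of the hash pick a register. The register keeps the
  # largest "leading zeros plus one" rank seen in the remaining bits.
  def add(value)
    hash = Digest::MD5.digest(value.to_s).unpack1('Q>') # first 64 bits
    rest_bits = 64 - @index_bits
    index = hash >> rest_bits
    rest = hash & ((1 << rest_bits) - 1)
    rank = rest_bits - rest.bit_length + 1
    @registers[index] = rank if rank > @registers[index]
    self
  end

  # Merging keeps the per-register maximum, so a value seen in several
  # batches does not inflate the estimate.
  def merge(other)
    @registers = @registers.zip(other.registers).map(&:max)
    self
  end

  def estimate
    m = @registers.size
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / @registers.sum { |rank| 2.0**(-rank) }
    zeros = @registers.count(0)
    # Small-range correction (linear counting) for sparse sketches.
    raw = m * Math.log(m.to_f / zeros) if raw <= 2.5 * m && zeros.positive?
    raw.round
  end
end

# Builds one sketch per batch and merges them into a single estimate.
def estimate_distinct(values, batch_size:)
  values.each_slice(batch_size).reduce(TinyHll.new) do |merged, batch|
    merged.merge(batch.each_with_object(TinyHll.new) { |v, hll| hll.add(v) })
  end.estimate
end
```

For example, estimate_distinct((1..30_000).map { |row_id| row_id % 20_000 }, batch_size: 10_000) processes 30,000 values that contain 20,000 distinct ones in three batches. The result is an approximation close to 20,000 rather than an exact count.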

estimate_batch_distinct_count method

Method:

estimate_batch_distinct_count(relation, column = nil, batch_size: nil, start: nil, finish: nil)

The method includes the following arguments:

  • relation: The ActiveRecord_Relation on which to perform the count.
  • column: The column on which to perform the distinct count. The default is the primary key.
  • batch_size: The size of each batch. Defaults to Gitlab::Database::PostgresHll::BatchDistinctCounter::DEFAULT_BATCH_SIZE, which is 10,000.
  • start: The custom start of the batch count, to avoid complex minimum calculations.
  • finish: The custom end of the batch count to avoid complex maximum calculations.

The method includes the following prerequisites:

  • The supplied relation must include the primary key, defined as a numeric column. For example: id bigint NOT NULL.
  • The estimate_batch_distinct_count method can handle a joined relation. To use its ability to count non-unique columns, the joined relation must not have a one-to-many relationship, such as has_many :boards.
  • Both start and finish arguments should always represent primary key relationship values, even if the estimated count refers to another column, for example:

      estimate_batch_distinct_count(::Note, :author_id, start: ::Note.minimum(