Develop and test Service Ping

To add a new metric and test Service Ping:

  1. Name and place the metric
  2. Test counters manually using your Rails console
  3. Generate the SQL query
  4. Optimize queries with #database-lab
  5. Add the metric definition
  6. Add the metric to the Versions Application
  7. Create a merge request
  8. Verify your metric
  9. Set up and test Service Ping locally

Name and place the metric

Add the metric in one of the top-level keys:

  • settings: for settings-related metrics.
  • counts_weekly: for counters that have data for the most recent 7 days.
  • counts_monthly: for counters that have data for the most recent 28 days.
  • counts: for counters that have data for all time.
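
As a rough illustration of how these keys partition a payload, the following plain-Ruby sketch builds a hash with the four top-level keys. The method name example_payload, the metric names (example_setting_enabled, example_events), and the values are hypothetical placeholders, not real Service Ping metrics.

```ruby
# Hypothetical payload skeleton; metric names and values are placeholders.
def example_payload
  {
    settings:       { example_setting_enabled: true }, # settings-related metrics
    counts_weekly:  { example_events: 12 },            # most recent 7 days
    counts_monthly: { example_events: 48 },            # most recent 28 days
    counts:         { example_events: 1_024 }          # all time
  }
end

# Print each top-level key with the metric names it holds.
example_payload.each do |top_level_key, metrics|
  puts "#{top_level_key}: #{metrics.keys.join(', ')}"
end
```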

How to get a metric name suggestion

The metric YAML generator can suggest a metric name for you. To get a suggestion, first instrument the metric at the key_path you provide to the generator. Then generate the metric’s YAML definition, return to the instrumentation, and update it with the suggested name.

  1. Add the metric instrumentation to lib/gitlab/usage_data.rb inside one of the top-level keys, using any name you choose.
  2. Run the metrics YAML generator.
  3. Use the metric name suggestion to select a suitable metric name.
  4. Update the instrumentation you created in the first step and change the metric name to the suggested name.
  5. Update the metric’s YAML definition with the correct key_path.

Test counters manually using your Rails console

# count
Gitlab::UsageData.count(User.active)
Gitlab::UsageData.count(::Clusters::Cluster.aws_installed.enabled, :cluster_id)

# count distinct
Gitlab::UsageData.distinct_count(::Project, :creator_id)
Gitlab::UsageData.distinct_count(::Note.with_suggestions.where(time_period), :author_id, start: ::User.minimum(:id), finish: ::User.maximum(:id))

Generate the SQL query

Your Rails console returns the generated SQL queries. For example:

pry(main)> Gitlab::UsageData.count(User.active)
   (2.6ms)  SELECT "features"."key" FROM "features"
   (15.3ms)  SELECT MIN("users"."id") FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (6, 4))
   (2.4ms)  SELECT MAX("users"."id") FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (6, 4))
   (1.9ms)  SELECT COUNT("users"."id") FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (6, 4)) AND "users"."id" BETWEEN 1 AND 100000
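
The query log above shows the counter reading the minimum and maximum id first, and then counting inside an id range. A minimal in-memory sketch of that batching idea follows. It is not GitLab's implementation: batch_count, its batch_size default (chosen only to match the range seen in the log), and the sample ids are hypothetical, and a real counter runs SQL rather than scanning an array.

```ruby
# Count ids in fixed-size id ranges, mirroring MIN/MAX followed by
# COUNT ... WHERE id BETWEEN start AND finish.
def batch_count(ids, batch_size: 100_000)
  return 0 if ids.empty?

  start  = ids.min # stands in for SELECT MIN(id)
  finish = ids.max # stands in for SELECT MAX(id)
  total  = 0

  while start <= finish
    batch_end = [start + batch_size - 1, finish].min
    # Stands in for SELECT COUNT(id) ... BETWEEN start AND batch_end.
    total += ids.count { |id| id.between?(start, batch_end) }
    start = batch_end + 1
  end

  total
end

sample_ids = [1, 2, 50, 150_000, 250_001]
puts batch_count(sample_ids) # every id is counted exactly once across the batches
```

Splitting the work this way keeps each individual query bounded by the batch size, which is the property the execution-time limit in the next section cares about.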

Optimize queries with #database-lab

#database-lab is a Slack channel that uses a production-sized environment to test your queries. Paste the SQL query into #database-lab to see how the query performs at scale.

  • GitLab.com’s production database has a 15 second timeout.
  • Any single query must stay below 1 second of execution time with cold caches.
  • Add a specialized index on columns involved to reduce the execution time.

To understand the query’s execution, we add the following information to a merge request description:

  • For counters that have a time_period test, we add information for both:
    • time_period = {} for all time periods.
    • time_period = { created_at: 28.days.ago..Time.current } for the last 28 days.
  • Execution plan and query time before and after optimization.
  • Query generated for the index and time.
  • Migration output for up and down execution.
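
To make the two time_period cases concrete, here is a plain-Ruby sketch that applies each constraint to an in-memory list. It assumes nothing from Rails: 28.days.ago is approximated with Time arithmetic, and within_period and the sample records are hypothetical stand-ins for a where(time_period) scope.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Keep records whose attributes fall inside every range in time_period.
# An empty hash applies no constraint, so every record is kept.
def within_period(records, time_period)
  records.select do |record|
    time_period.all? { |attribute, range| range.cover?(record[attribute]) }
  end
end

now = Time.now
records = [
  { id: 1, created_at: now - 2 * SECONDS_PER_DAY },
  { id: 2, created_at: now - 40 * SECONDS_PER_DAY }
]

all_time     = {}                                                 # all time periods
last_28_days = { created_at: (now - 28 * SECONDS_PER_DAY)..now }  # last 28 days

puts within_period(records, all_time).size
puts within_period(records, last_28_days).size
```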

We also use #database-lab and explain.depesz.com. For more details, see the database review guide.

Optimization recommendations and examples

Add the metric definition

See the Metrics Dictionary guide for more information.

Add the metric to the Versions Application

Check if the new metric must be added to the Versions Application. See the usage_data schema and Service Data parameters accepted. Any metrics added under the counts key are saved in the stats column.

Create a merge request

Create a merge request for the new Service Ping metric, and do the following:

  • Add the feature label to the merge request. A metric is a user-facing change and is part of expanding the Service Ping feature.
  • Add a changelog entry that complies with the changelog entries guide.
  • Ask for a Product Intelligence review. On GitLab.com, we have DangerBot set up to monitor Product Intelligence-related files and recommend a Product Intelligence review.

Verify your metric

On GitLab.com, the Product Intelligence team regularly monitors Service Ping. They may alert you that your metrics need further optimization to run quicker and with greater success.

The Service Ping JSON payload for GitLab.com is shared in the #g_product_intelligence Slack channel every week.

You may also use the Service Ping QA dashboard to check how well your metric performs. The dashboard allows filtering by GitLab version, by “Self-managed” and “SaaS”, and shows you how many failures have occurred for each metric. Whenever you notice a high failure rate, you can re-optimize your metric.

Set up and test Service Ping locally

To set up Service Ping locally, you must:

  1. Set up local repositories.
  2. Test local setup.
  3. (Optional) Test Prometheus-based Service Ping.

Set up local repositories

  1. Clone and start GitLab.
  2. Clone and start Versions Application. Make sure you run docker-compose up to start a PostgreSQL and Redis instance.
  3. Point GitLab to the Versions Application endpoint instead of the default endpoint:
    1. Open service_ping/submit_service.rb in your local checkout and modify PRODUCTION_URL.
    2. Set it to the local Versions Application URL: http://localhost:3000/usage_data.
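
The change described in the steps above might look roughly like the following excerpt. This is an illustrative sketch only: the real file contains more code than shown here, and only the PRODUCTION_URL constant and the local URL come from this guide. The URI check at the end is an optional sanity check added for this example, not part of the GitLab codebase.

```ruby
require 'uri'

# Illustrative excerpt: everything else in the real class stays unchanged.
module ServicePing
  class SubmitService
    # Local Versions Application endpoint, replacing the default value.
    PRODUCTION_URL = 'http://localhost:3000/usage_data'
  end
end

# Optional sanity check that the value parses as the intended endpoint.
uri = URI.parse(ServicePing::SubmitService::PRODUCTION_URL)
puts "#{uri.host}:#{uri.port}#{uri.path}"
```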

Test local setup

  1. Using the gitlab Rails console, manually trigger Service Ping:

    ServicePing::SubmitService.new.execute
    
  2. Use the versions Rails console to check that Service Ping was successfully received, parsed, and stored in the Versions database:

    UsageData.last
    

Test Prometheus-based Service Ping

If the data submitted includes metrics queried from Prometheus that you want to inspect and verify, you must:

  • Ensure that a Prometheus server is running locally.
  • Ensure the respective GitLab components are exporting metrics to the Prometheus server.

If you do not need to test data coming from Prometheus, no further action is necessary. Service Ping should degrade gracefully in the absence of a running Prometheus server.
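
Before triggering Service Ping, it can help to confirm that a local Prometheus server is actually reachable. The sketch below is one way to do that from plain Ruby. It assumes Prometheus listens on its default port 9090 on localhost and uses the standard /-/healthy endpoint; prometheus_healthy? is a hypothetical helper written for this example, not part of GitLab.

```ruby
require 'net/http'
require 'uri'

# Returns true when the Prometheus health endpoint answers with HTTP 2xx,
# and false when the URL is invalid, the server is unreachable, or it
# responds with an error status.
def prometheus_healthy?(base_url, timeout: 2)
  uri = URI.join(base_url, '/-/healthy')
  response = Net::HTTP.start(uri.host, uri.port,
                             open_timeout: timeout, read_timeout: timeout) do |http|
    http.get(uri.request_uri)
  end
  response.is_a?(Net::HTTPSuccess)
rescue StandardError
  false
end

# Assumed local address; adjust it to wherever your Prometheus server runs.
puts prometheus_healthy?('http://localhost:9090')
```

Returning false instead of raising mirrors the guide's expectation that a missing Prometheus server should not break anything else.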

Three kinds of components may export data to Prometheus, and are included in Service Ping:

  • node_exporter: Exports node metrics from the host machine.
  • gitlab-exporter: Exports process metrics from various GitLab components.
  • Various other GitLab services, such as Sidekiq and the Rails server, which export their own metrics.

Test with an Omnibus container

This is the recommended approach to test Prometheus-based Service Ping.

To verify your change, build a new Omnibus image from your code branch using CI/CD, download the image, and run a local container instance:

  1. From your merge request, select the qa stage, then trigger the package-and-qa job. This job triggers an Omnibus build in a downstream pipeline of the omnibus-gitlab-mirror project.
  2. In the downstream pipeline, wait for the gitlab-docker job to finish.
  3. Open the job logs and locate the full container name including the version. It takes the following form: registry.gitlab.com/gitlab-org/build/omnibus-gitlab-mirror/gitlab-ee:<VERSION>.
  4. On your local machine, make sure you are signed in to the GitLab Docker registry. You can find the instructions for this in Authenticate to the GitLab Container Registry.
  5. Once signed in, download the new image by using docker pull registry.gitlab.com/gitlab-org/build/omnibus-gitlab-mirror/gitlab-ee:<VERSION>
  6. For more information about working with and running Omnibus GitLab containers in Docker, refer to GitLab Docker images in the Omnibus documentation.

Test with GitLab development toolkits

This is the less recommended approach, because it comes with a number of difficulties when emulating a real GitLab deployment.

The GDK is not set up to run a Prometheus server or node_exporter alongside other GitLab components. If you would like to do so, Monitoring the GDK with Prometheus is a good start.

The GCK has limited support for testing Prometheus-based Service Ping. By default, it comes with a fully configured Prometheus service that is set up to scrape a number of components. However, it has the following limitations:

  • It does not run a gitlab-exporter instance, so several process_* metrics from services such as Gitaly may be missing.
  • While it runs a node_exporter, docker-compose services emulate hosts, meaning that it normally reports itself as not associated with any of the other running services. That is not how node metrics are reported in a production setup, where node_exporter always runs as a process alongside other GitLab components on any given node. For Service Ping, none of the node data would therefore appear to be associated with any of the running services, because they all appear to be running on different hosts. To alleviate this problem, the node_exporter in GCK was arbitrarily “assigned” to the web service, meaning that node_* metrics appear in Service Ping only for this service.