GitLab Developers Guide to Logging

GitLab Logs play a critical role for both administrators and GitLab team members to diagnose problems in the field.

Don’t use Rails.logger

Currently Rails.logger calls all get saved into production.log, which contains a mix of Rails’ logs and other calls developers have inserted in the codebase. For example:

Started GET "/gitlabhq/yaml_db/tree/master" for 168.111.56.1 at 2015-02-12 19:34:53 +0200
Processing by Projects::TreeController#show as HTML
  Parameters: {"project_id"=>"gitlabhq/yaml_db", "id"=>"master"}

  ...

  Namespaces"."created_at" DESC, "namespaces"."id" DESC LIMIT 1 [["id", 26]]
  CACHE (0.0ms) SELECT  "members".* FROM "members"  WHERE "members"."source_type" = 'Project' AND "members"."type" IN ('ProjectMember') AND "members"."source_id" = $1 AND "members"."source_type" = $2 AND "members"."user_id" = 1  ORDER BY "members"."created_at" DESC, "members"."id" DESC LIMIT 1  [["source_id", 18], ["source_type", "Project"]]
  CACHE (0.0ms) SELECT  "members".* FROM "members"  WHERE "members"."source_type" = 'Project' AND "members".
  (1.4ms) SELECT COUNT(*) FROM "merge_requests"  WHERE "merge_requests"."target_project_id" = $1 AND ("merge_requests"."state" IN ('opened','reopened')) [["target_project_id", 18]]
  Rendered layouts/nav/_project.html.haml (28.0ms)
  Rendered layouts/_collapse_button.html.haml (0.2ms)
  Rendered layouts/_flash.html.haml (0.1ms)
  Rendered layouts/_page.html.haml (32.9ms)
Completed 200 OK in 166ms (Views: 117.4ms | ActiveRecord: 27.2ms)

These logs suffer from a number of problems:

  1. They often lack timestamps or other contextual information (for example, project ID or user)
  2. They may span multiple lines, which make them hard to find via Elasticsearch.
  3. They lack a common structure, which make them hard to parse by log forwarders, such as Logstash or Fluentd. This also makes them hard to search.

Note that currently on GitLab.com, any messages in production.log aren’t indexed by Elasticsearch due to the sheer volume and noise. They do end up in Google Stackdriver, but it is still harder to search for logs there. See the GitLab.com logging documentation for more details.

Use structured (JSON) logging

Structured logging solves these problems. Consider the example from an API request:

{"time":"2018-10-29T12:49:42.123Z","severity":"INFO","duration":709.08,"db":14.59,"view":694.49,"status":200,"method":"GET","path":"/api/v4/projects","params":[{"key":"action","value":"git-upload-pack"},{"key":"changes","value":"_any"},{"key":"key_id","value":"secret"},{"key":"secret_token","value":"[FILTERED]"}],"host":"localhost","ip":"::1","ua":"Ruby","route":"/api/:version/projects","user_id":1,"username":"root","queue_duration":100.31,"gitaly_calls":30}

In a single line, we’ve included all the information that a user needs to understand what happened: the timestamp, HTTP method and path, user ID, and so on.

How to use JSON logging

Suppose you want to log the events that happen in a project importer. You want to log issues created, merge requests, and so on, as the importer progresses. Here’s what to do:

  1. Look at the list of GitLab Logs to see if your log message might belong with one of the existing log files.
  2. If there isn’t a good place, consider creating a new filename, but check with a maintainer if it makes sense to do so. A log file should make it easy for people to search pertinent logs in one place. For example, geo.log contains all logs pertaining to GitLab Geo. To create a new file:
    1. Choose a filename (for example, importer_json.log).
    2. Create a new subclass of Gitlab::JsonLogger:

      module Gitlab
        module Import
          class Logger < ::Gitlab::JsonLogger
            def self.file_name_noext
              'importer'
            end
          end
         end
      end
      
    3. In your class where you want to log, you might initialize the logger as an instance variable:

      attr_accessor :logger
      
      def initialize
        @logger = Gitlab::Import::Logger.build
      end
      

      Note that it’s useful to memoize this because creating a new logger each time you log opens a file, adding unnecessary overhead.

  3. Now insert log messages into your code. When adding logs, make sure to include all the context as key-value pairs:

    # BAD
    logger.info("Unable to create project #{project.id}")
    
    # GOOD
    logger.info(message: "Unable to create project", project_id: project.id)
    
  4. Be sure to create a common base structure of your log messages. For example, all messages might have current_user_id and project_id to make it easier to search for activities by user for a given time.

Implicit schema for JSON logging

When using something like Elasticsearch to index structured logs, there is a schema for the types of each log field (even if that schema is implicit / inferred). It’s important to be consistent with the types of your field values, otherwise this might break the ability to search/filter on these fields, or even cause whole log events to be dropped. While much of this section is phrased in an Elasticsearch-specific way, the concepts should translate to many systems you might use to index structured logs. GitLab.com uses Elasticsearch to index log data.

Unless a field type is explicitly mapped, Elasticsearch infers the type from the first instance of that field value it sees. Subsequent instances of that field value with different types either fail to be indexed, or in some cases (scalar/object conflict), the whole log line is dropped.

GitLab.com’s logging Elasticsearch sets ignore_malformed, which allows documents to be indexed even when there are simpler sorts of mapping conflict (for example, number / string), although indexing on the affected fields breaks.

Examples:

# GOOD
logger.info(message: "Import error", error_code: 1, error: "I/O failure")

# BAD
logger.info(message: "Import error", error: 1)
logger.info(message: "Import error", error: "I/O failure")

# WORST
logger.info(message: "Import error", error: "I/O failure")
logger.info(message: "Import error", error: { message: "I/O failure" })

List elements must be the same type:

# GOOD
logger.info(a_list: ["foo", "1", "true"])

# BAD
logger.info(a_list: ["foo", 1, true])

Resources:

Logging durations

Similar to timezones, choosing the right time unit to log can impose avoidable overhead. So, whenever challenged to choose between seconds, milliseconds or any other unit, lean towards seconds as float (with microseconds precision, that is, Gitlab::InstrumentationHelper::DURATION_PRECISION).

In order to make it easier to track timings in the logs, make sure the log key has _s as suffix and duration within its name (for example, view_duration_s).

Multi-destination Logging

GitLab is transitioning from unstructured/plaintext logs to structured/JSON logs. During this transition period some logs are recorded in multiple formats through multi-destination logging.

How to use multi-destination logging

Create a new logger class, inheriting from MultiDestinationLogger and add an array of loggers to a LOGGERS constant. The loggers should be classes that descend from Gitlab::Logger. For example, the user-defined loggers in the following examples could be inheriting from Gitlab::Logger and Gitlab::JsonLogger, respectively.

You must specify one of the loggers as the primary_logger. The primary_logger is used when information about this multi-destination logger is displayed in the application (for example, using the Gitlab::Logger.read_latest method).

The following example sets one of the defined LOGGERS as a primary_logger.

module Gitlab
  class FancyMultiLogger < Gitlab::MultiDestinationLogger
    LOGGERS = [UnstructuredLogger, StructuredLogger].freeze

    def self.loggers
      LOGGERS
    end

    def primary_logger
      UnstructuredLogger
    end
  end
end

You can now call the usual logging methods on this multi-logger. For example:

FancyMultiLogger.info(message: "Information")

This message is logged by each logger registered in FancyMultiLogger.loggers.

Passing a string or hash for logging

When passing a string or hash to a MultiDestinationLogger, the log lines could be formatted differently, depending on the kinds of LOGGERS set.

For example, let’s partially define the loggers from the previous example:

module Gitlab
  # Similar to AppTextLogger
  class UnstructuredLogger < Gitlab::Logger
    ...
  end

  # Similar to AppJsonLogger
  class StructuredLogger < Gitlab::JsonLogger
    ...
  end
end

Here are some examples of how messages would be handled by both the loggers.

  1. When passing a string
FancyMultiLogger.info("Information")

# UnstructuredLogger
I, [2020-01-13T18:48:49.201Z #5647]  INFO -- : Information

# StructuredLogger
{:severity=>"INFO", :time=>"2020-01-13T11:02:41.559Z", :correlation_id=>"b1701f7ecc4be4bcd4c2d123b214e65a", :message=>"Information"}
  1. When passing a hash
FancyMultiLogger.info({:message=>"This is my message", :project_id=>123})

# UnstructuredLogger
I, [2020-01-13T19:01:17.091Z #11056]  INFO -- : {"message"=>"Message", "project_id"=>"123"}

# StructuredLogger
{:severity=>"INFO", :time=>"2020-01-13T11:06:09.851Z", :correlation_id=>"d7e0886f096db9a8526a4f89da0e45f6", :message=>"This is my message", :project_id=>123}

Logging context metadata (through Rails or Grape requests)

Gitlab::ApplicationContext stores metadata in a request lifecycle, which can then be added to the web request or Sidekiq logs.

The API, Rails and Sidekiq logs contain fields starting with meta. with this context information.

Entry points can be seen at:

Adding attributes

When adding new attributes, make sure they’re exposed within the context of the entry points above and:

  • Pass them within the hash to the with_context (or push) method (make sure to pass a Proc if the metho