GitLab Flavored Markdown (GLFM) Specification Guide

GitLab supports Markdown in various places. The Markdown dialect we use is called GitLab Flavored Markdown, or GLFM.

The specification for the GLFM dialect is based on the GitHub Flavored Markdown (GFM) specification, which is in turn based on the CommonMark specification. The GLFM specification includes several extensions to the GFM specification.

This guide is a developer-facing document that describes the terms and definitions, goals, tools, and implementations related to the GLFM specification. It is intended to support and augment the user-facing documentation for GitLab Flavored Markdown.

note
In this document, GFM refers to GitHub Flavored Markdown, not GitLab Flavored Markdown. Refer to the section on acronyms for a detailed explanation of the various acronyms used in this document.

note
This guide and the implementation and files described in it are still a work in progress. As the work progresses, rewrites and consolidation between this guide and the user-facing documentation for GitLab Flavored Markdown are likely.

Terms and definitions

Acronyms: GLFM, GHFM, GFM, CommonMark

GitHub Flavored Markdown is widely referred to by the acronym GFM, and this document follows that convention as well. GitLab Flavored Markdown is referred to as GLFM in this document, to distinguish it from GitHub Flavored Markdown.

Unfortunately, this convention is not followed consistently in the rest of the documentation or GitLab codebase. In many places, the GFM acronym is used to refer to GitLab Flavored Markdown. An open issue exists to resolve this inconsistency.

Some places in the code refer to both the GitLab and GitHub specifications simultaneously in the same areas of logic. In these situations, GitHub Flavored Markdown may be referred to with variable or constant names like ghfm_ to avoid confusion. For example, we use the ghfm acronym for the ghfm_spec_v_0.29.txt GitHub Flavored Markdown specification file, which is committed to the gitlab repository and used as input to the update_specification.rb script.

The original CommonMark specification is referred to as CommonMark (no acronym).

Various Markdown specifications

The specification format we use is based on the approach used in CommonMark, where a spec.txt file serves both as documentation and as input to automated conformance tests. The CommonMark specification explains this approach:

This document attempts to specify Markdown syntax unambiguously. It contains many examples with side-by-side Markdown and HTML. These examples are intended to double as conformance tests.

Here are the HTML-rendered versions of the specifications:

note
The creation of the GitLab Flavored Markdown (GLFM) specification file is still pending.

However, GLFM has more complex parsing, rendering, and testing requirements than GFM or CommonMark. Therefore, it does not have a static, hardcoded, manually updated spec.txt. Instead, the GLFM spec.txt is automatically generated based on other input files. This process is explained in detail in the Implementation sections below.

Markdown examples

Throughout the specification and this guide, the term examples refers specifically to the Markdown + HTML pairs that illustrate the canonical parsing (or rendering) behavior of various Markdown source strings, written in the standard CommonMark specification format.

In this context, it should not be confused with other similar or related meanings of example, such as RSpec examples.
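As a rough illustration of what these examples look like in a spec.txt file, the following Python sketch extracts them from CommonMark-style text. In that format each example is fenced by a line of 32 backticks followed by the word example (GFM appends an extension name, such as example table), a line containing a single "." separates the Markdown from the expected HTML, and tab characters are written as "→" so they stay visible. The sketch is loosely modeled on how CommonMark's spec_tests.py reads spec.txt; the Example class and parse_examples function are hypothetical names, not part of any GitLab or CommonMark tool.

```python
from dataclasses import dataclass

# Example fences in CommonMark-style spec.txt files are 32 backticks.
FENCE = "`" * 32


@dataclass
class Example:
    number: int
    markdown: str
    html: str


def parse_examples(spec_text: str) -> list[Example]:
    """Extract Markdown + HTML example pairs from spec.txt-formatted text."""
    examples: list[Example] = []
    state = "prose"  # one of "prose", "markdown", "html"
    md_lines: list[str] = []
    html_lines: list[str] = []
    for line in spec_text.splitlines():
        if state == "prose" and line.startswith(FENCE + " example"):
            # Opening fence, possibly with a GFM extension name after "example".
            state, md_lines, html_lines = "markdown", [], []
        elif state == "markdown" and line == ".":
            # A lone "." separates the Markdown input from the expected HTML.
            state = "html"
        elif state == "html" and line.strip() == FENCE:
            examples.append(
                Example(
                    number=len(examples) + 1,
                    # The spec writes tab characters as "→"; restore real tabs.
                    markdown="".join(l + "\n" for l in md_lines).replace("→", "\t"),
                    html="".join(l + "\n" for l in html_lines).replace("→", "\t"),
                )
            )
            state = "prose"
        elif state == "markdown":
            md_lines.append(line)
        elif state == "html":
            html_lines.append(line)
        # Lines in the "prose" state are documentation and are ignored.
    return examples
```

Everything outside the fences is treated as prose, which is what lets a single spec.txt file double as human-readable documentation and machine-readable test input.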

Parsers and renderers

To understand the various ways in which a specification is used, and how it relates to a given Markdown dialect, it’s important to understand the distinction between a parser and a renderer:

  • A Markdown parser accepts Markdown as input and produces a Markdown Abstract Syntax Tree (AST) as output.
  • A Markdown renderer accepts the AST produced by a parser, and produces HTML (or a PDF, or any other relevant rendering format) as output.
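To make the parser/renderer split concrete, here is a deliberately tiny Python sketch. The Node class, parse, and render_html are hypothetical illustrations, not GitLab code, and the parser only understands "# " headings and blank-line-separated paragraphs, which is far less than any real Markdown dialect.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Node:
    """A minimal AST node: a type name, optional text, and child nodes."""
    type: str
    value: str = ""
    children: list["Node"] = field(default_factory=list)


def parse(markdown: str) -> Node:
    """Toy parser: Markdown in, AST out. Handles '# ' headings and paragraphs."""
    root = Node("root")
    for block in markdown.strip().split("\n\n"):
        block = block.strip()
        if block.startswith("# "):
            root.children.append(Node("heading", children=[Node("text", block[2:])]))
        else:
            root.children.append(Node("paragraph", children=[Node("text", block)]))
    return root


def render_html(node: Node) -> str:
    """Toy renderer: AST in, HTML out. It never looks at Markdown syntax."""
    if node.type == "text":
        return html.escape(node.value)
    inner = "".join(render_html(child) for child in node.children)
    if node.type == "heading":
        return f"<h1>{inner}</h1>\n"
    if node.type == "paragraph":
        return f"<p>{inner}</p>\n"
    return inner  # the root node just concatenates its children
```

For example, render_html(parse("# Title\n\nSome text")) produces an h1 followed by a p element. Because render_html only consumes the tree, a different renderer (for example, one that builds an editor document instead of HTML) could reuse the same parser output.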

Types of Markdown tests driven by the GLFM specification

Two main types of automated testing are driven by the Markdown examples and data contained in the GLFM specification. We refer to them as:

  • Markdown conformance testing.
  • Markdown snapshot testing.

Many other types of tests also occur in the GitLab codebase, and some of these tests are also related to the GLFM Markdown dialect. Therefore, to avoid confusion, we use these standard terms for the two types of specification-driven testing referred to in this documentation and elsewhere.

Markdown conformance testing

Markdown conformance testing refers to the standard testing method used by all CommonMark Markdown dialects to verify that a specific implementation conforms to the CommonMark Markdown specification. It is enforced by running the standard CommonMark tool spec_tests.py against a given spec.txt specification and the implementation.

note
spec_tests.py may eventually be re-implemented in Ruby, to remove the dependency on Python.
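The conformance loop can be sketched roughly as follows, assuming the examples have already been extracted from spec.txt and that the implementation under test is exposed as a plain function from Markdown to HTML. This is not spec_tests.py itself: the real tool invokes an external program and uses a more thorough HTML normalizer, while normalize_html here only collapses whitespace between tags. Example, normalize_html, and run_conformance are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    number: int
    markdown: str
    html: str


def normalize_html(html: str) -> str:
    """Simplified stand-in for a real HTML normalizer.

    Collapses whitespace between adjacent tags and trims the ends, so that
    insignificant formatting differences do not cause failures. (It would
    mishandle whitespace-sensitive content such as <pre> blocks.)
    """
    return re.sub(r">\s+<", "><", html).strip()


def run_conformance(
    examples: list[Example], to_html: Callable[[str], str]
) -> dict[str, list[int]]:
    """Render each example's Markdown and compare it with the expected HTML."""
    results: dict[str, list[int]] = {"passed": [], "failed": []}
    for ex in examples:
        actual = normalize_html(to_html(ex.markdown))
        expected = normalize_html(ex.html)
        results["passed" if actual == expected else "failed"].append(ex.number)
    return results
```

Normalizing both sides before comparing is the key design choice: it lets implementations that differ only in insignificant whitespace still count as conforming.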

Markdown snapshot testing

Markdown snapshot testing refers to the automated testing performed in the GitLab codebase, which is driven by “example_snapshots” fixture data derived from all of the examples in the GLFM specification. It consists of both backend RSpec tests and frontend Jest tests which use the fixture data. This fixture data is contained in YAML files. These files are generated and updated based on the Markdown examples in the specification, and the existing GLFM parser and renderer implementations. They may also be manually updated as necessary to test-drive incomplete implementations. Regarding the terminology used here:

  1. The Markdown snapshot tests can be considered a form of the Golden Master Testing approach, which is also referred to as Approval Testing or Characterization Testing.
    1. The term Golden Master originally comes from the recording industry, and refers to the process of mastering, or making a final mix from which all other copies are produced.
    2. For more information and background, you can read about Characterization Tests and Golden Masters.
  2. The usage of the term snapshot does not refer to the approach of Jest snapshot testing, as used elsewhere in the GitLab frontend testing suite. However, the Markdown snapshot testing does follow the same philosophy and patterns as Jest snapshot testing:
    1. Snapshot example fixture data is represented as files which are checked into source control.
    2. The files can be automatically generated and updated based on the implementation of the code under test.
    3. The files can also be manually updated when necessary, for example, to test-drive changes to an incomplete or buggy implementation.
  3. The usage of the term fixture does not refer to standard Rails database fixture files. It instead refers to test fixtures in the more general sense: input data to support automated testing.
  4. These example snapshot fixture files are generated from, and closely related to, the rest of the GLFM specification. Therefore, the example_snapshots directory is colocated under the glfm_specification directory with the rest of the GLFM specification files. They are intentionally not located under the spec/fixtures directory with the rest of the fixture data for the GitLab Rails application. In practice, developers have found it simpler and more understandable to keep everything under the glfm_specification directory rather than splitting these files out into the spec/fixtures directory.
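The golden master pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it stores snapshots as JSON because Python's standard library has no YAML support (the GitLab fixtures are YAML), and check_snapshots, its parameters, and the snapshot layout are hypothetical rather than the actual GitLab tooling.

```python
import json
from pathlib import Path
from typing import Callable


def check_snapshots(
    examples: dict[str, str],
    render: Callable[[str], str],
    snapshot_path: Path,
    update: bool = False,
) -> list[str]:
    """Compare current rendered output against a stored snapshot file.

    examples maps a (hypothetical) example name to its Markdown source.
    Returns the sorted names of examples whose output differs from the
    snapshot. With update=True, or when no snapshot exists yet, the file is
    regenerated from the current implementation instead ("approving" the
    new golden master), and no differences are reported.
    """
    actual = {name: render(markdown) for name, markdown in examples.items()}
    if update or not snapshot_path.exists():
        snapshot_path.write_text(json.dumps(actual, indent=2, sort_keys=True))
        return []
    expected = json.loads(snapshot_path.read_text())
    return sorted(name for name in actual if expected.get(name) != actual[name])
```

Editing the snapshot file by hand, rather than regenerating it with update=True, corresponds to test-driving an incomplete or buggy implementation: the test fails until the code produces the hand-written expected output.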

See also the section on normalization below, which is an important concept used in the Markdown snapshot testing.

Parsing and Rendering

The Markdown dialect used in the GitLab application has a dual requirement for rendering:

  1. Rendering to static read-only HTML format, to be displayed in various places throughout the application.
  2. Rendering editable content in the Content Editor, a “What You See Is What You Get” (WYSIWYG) editor. The Content Editor supports real-time instant switching between an editable Markdown source and an editable WYSIWYG document.

These requirements mean that GitLab has two independent parser and renderer implementations:

  1. The backend parser / renderer supports parsing and rendering to static read-only HTML. It is implemented in Ruby. It leverages the commonmarker gem, a Ruby wrapper for libcmark-gfm, which is GitHub’s extended fork of cmark, the C reference implementation of CommonMark.
  2. The frontend parser / renderer supports parsing and WYSIWYG rendering for the Content Editor. It is implemented in JavaScript. Parsing is based on the Remark Markdown parser, which produces an MDAST (Markdown Abstract Syntax Tree). Rendering is the process of turning an MDAST into a ProseMirror document; ProseMirror then renders that document as WYSIWYG HTML. In this document, we refer to the process of turning Markdown into an MDAST as the frontend / JavaScript parser, and to the entire process of rendering Markdown to WYSIWYG HTML in ProseMirror as the Content Editor. Several requirements drive the need for an independent frontend parser / renderer implementation, including:
    1. Lack of necessary support for accurate source mapping in the HTML renderer implementation used on the backend.
    2. Latency and bandwidth concerns: eliminating the need for a round-trip to the backend every time the user switches between the Markdown source and the WYSIWYG document.
    3. Different HTML and browser rendering requirements for WYSIWYG documents. For example, displaying read-only elements such as diagrams and references in an editable form.
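The frontend rendering step (MDAST to ProseMirror document) is essentially a tree transformation. The Python sketch below illustrates the idea on plain dictionaries; the real implementation is JavaScript built on Remark and ProseMirror. The MDAST field names (type, children, value, depth) follow the MDAST format, but the ProseMirror node and mark names (heading, paragraph, italic, bold) assume a hypothetical schema and may not match the Content Editor's, and the function names are also hypothetical.

```python
from typing import Any

# Hypothetical mapping from MDAST inline node types to mark names in an
# assumed ProseMirror schema (the real Content Editor schema may differ).
MARKS = {"emphasis": "italic", "strong": "bold"}


def inline_to_pm(node: dict[str, Any], marks: list[dict[str, str]]) -> list[dict[str, Any]]:
    """Flatten nested MDAST inline nodes into ProseMirror text nodes with marks."""
    if node["type"] == "text":
        text_node: dict[str, Any] = {"type": "text", "text": node["value"]}
        if marks:
            text_node["marks"] = list(marks)
        return [text_node]
    if node["type"] in MARKS:
        # Descend into the formatted span, accumulating one more mark.
        inner_marks = marks + [{"type": MARKS[node["type"]]}]
        return [pm for child in node["children"] for pm in inline_to_pm(child, inner_marks)]
    raise ValueError(f"unsupported inline node: {node['type']}")


def mdast_to_prosemirror(root: dict[str, Any]) -> dict[str, Any]:
    """Convert an MDAST root into a ProseMirror-style JSON document."""
    content = []
    for block in root["children"]:
        inline = [pm for child in block["children"] for pm in inline_to_pm(child, [])]
        if block["type"] == "heading":
            content.append({"type": "heading", "attrs": {"level": block["depth"]}, "content": inline})
        elif block["type"] == "paragraph":
            content.append({"type": "paragraph", "content": inline})
        else:
            raise ValueError(f"unsupported block node: {block['type']}")
    return {"type": "doc", "content": content}
```

Note how nested emphasis and strong nodes become flat text nodes carrying a list of marks: ProseMirror represents inline formatting as marks on text rather than as nested elements, which is one reason the frontend needs its own rendering step instead of reusing backend HTML.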

Multiple versions of rendered HTML

Both of these GLFM renderer implementations (static and WYSIWYG) produce HTML which differs from the canonical HTML examples from the specification. For every Markdown example in the GLFM specification, three versions of HTML can potentially be rendered from the example:

  • Static HTML.
  • WYSIWYG HTML.
  • Canonical HTML.

Static HTML

Static HTML is HTML produced by the backend (Ruby) renderer, which contains extra styling and behavioral HTML. For example, Create task buttons added for dynamically creating an issue fro