GitLab Flavored Markdown (GLFM) Specification Guide
GitLab supports Markdown in various places. The Markdown dialect we use is called GitLab Flavored Markdown, or GLFM.
The specification for the GLFM dialect is based on the GitHub Flavored Markdown (GFM) specification, which is in turn based on the CommonMark specification. The GLFM specification includes several extensions to the GFM specification.
See the section on acronyms for a detailed explanation of the various acronyms used in this document. This guide is a developer-facing document that describes the various terms and definitions, goals, tools, and implementations related to the GLFM specification. It is intended to support and augment the user-facing documentation for GitLab Flavored Markdown.
Terms and definitions
Acronyms: GLFM, GHFM, GFM, CommonMark
GitHub Flavored Markdown is widely referred to by the acronym GFM, and this document follows that convention as well. GitLab Flavored Markdown is referred to as GLFM in this document, to distinguish it from GitHub Flavored Markdown.
Unfortunately, this convention is not followed consistently in the rest of the documentation or GitLab codebase. In many places, the GFM acronym is used to refer to GitLab Flavored Markdown. An open issue exists to resolve this inconsistency.
Some places in the code refer to both the GitLab and GitHub specifications simultaneously in the same areas of logic. In these situations, GitHub Flavored Markdown may be referred to with variable or constant names like ghfm_ to avoid confusion. For example, we use the ghfm acronym for the ghfm_spec_v_0.29.txt GitHub Flavored Markdown specification file, which is committed to the gitlab repo and used as input to the update_specification.rb script.
The original CommonMark specification is referred to as CommonMark (no acronym).
Various Markdown specifications
The specification format we use is based on the approach used in CommonMark, where a spec.txt file serves as documentation, as well as being in a format that can serve as input to automated conformance tests. It is explained in the CommonMark specification:
This document attempts to specify Markdown syntax unambiguously. It contains many examples with side-by-side Markdown and HTML. These examples are intended to double as conformance tests.
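To make the dual role of spec.txt concrete, the following is a minimal sketch of how examples could be pulled out of a file in the CommonMark spec.txt format. It assumes the usual layout: an example opens with a line of 32 backticks followed by the word example, the Markdown and HTML halves are separated by a line containing a single period, a line of 32 backticks closes the example, and the → character stands for a tab. The function name extract_examples and the dictionary keys are hypothetical and are not taken from spec_tests.py or from the GitLab scripts.

```python
FENCE = "`" * 32  # delimiter line used around examples in the spec.txt format


def extract_examples(spec_text):
    """Return a list of dicts with "section", "markdown", and "html" keys."""
    examples = []
    section = ""
    state = "text"  # one of: "text", "markdown", "html"
    markdown_lines, html_lines = [], []

    for line in spec_text.splitlines():
        if state == "text":
            if line.strip() == FENCE + " example":
                state = "markdown"
                markdown_lines, html_lines = [], []
            elif line.startswith("#"):
                # Remember the most recent heading as the example's section.
                section = line.lstrip("#").strip()
        elif state == "markdown":
            if line.strip() == ".":
                state = "html"
            else:
                markdown_lines.append(line.replace("→", "\t"))
        else:  # state == "html"
            if line.strip() == FENCE:
                examples.append({
                    "section": section,
                    "markdown": "".join(l + "\n" for l in markdown_lines),
                    "html": "".join(l + "\n" for l in html_lines),
                })
                state = "text"
            else:
                html_lines.append(line.replace("→", "\t"))
    return examples


# A tiny, made-up specification fragment with one example.
sample_spec = "\n".join([
    "## Emphasis",
    "",
    FENCE + " example",
    "*hello*",
    ".",
    "<p><em>hello</em></p>",
    FENCE,
])

examples = extract_examples(sample_spec)
```

Each extracted pair can then be read as documentation (input and expected output side by side) or fed to a test runner.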
Here are the HTML-rendered versions of the specifications:
- GitLab Flavored Markdown (GLFM) specification, which extends the:
  - GitHub Flavored Markdown (GFM) specification (rendered from the source spec.txt for GFM specification), which extends the:
    - CommonMark specification (rendered from the source spec.txt for CommonMark specification)
However, GLFM has more complex parsing, rendering, and testing requirements than GFM or CommonMark. Therefore, it does not have a static, hardcoded, manually updated spec.txt. Instead, the GLFM spec.txt is automatically generated based on other input files. This process is explained in detail in the Implementation sections below.
Markdown examples
Everywhere in the context of the specification and this guide, the term examples is specifically used to refer to the Markdown + HTML pairs used to illustrate the canonical parsing (or rendering) behavior of various Markdown source strings in the standard CommonMark specification format.
In this context, it should not be confused with other similar or related meanings of example, such as RSpec examples.
Parsers and renderers
To understand the various ways in which a specification is used, and how it relates to a given Markdown dialect, it’s important to understand the distinction between a parser and a renderer:
- A Markdown parser accepts Markdown as input and produces a Markdown Abstract Syntax Tree (AST) as output.
- A Markdown renderer accepts the AST produced by a parser, and produces HTML (or a PDF, or any other relevant rendering format) as output.
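The split between the two roles can be illustrated with a deliberately tiny sketch. It is not a real Markdown implementation and is unrelated to the GitLab parsers: the toy dialect only understands paragraphs and *emphasis*, and the node shapes and function names (parse, render_html, render_plain_text) are invented for illustration. The point is that the parser stops at the AST, and that more than one renderer can consume the same AST.

```python
import html
import re


def parse(markdown):
    """Parser: Markdown text in, AST out (nested dicts in this sketch)."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", markdown.strip()):
        if not block.strip():
            continue
        children = []
        # re.split with one capture group alternates plain text (even
        # indexes) with the captured emphasis content (odd indexes).
        for index, part in enumerate(re.split(r"\*([^*]+)\*", block)):
            if not part:
                continue
            kind = "emphasis" if index % 2 == 1 else "text"
            children.append({"type": kind, "value": part})
        paragraphs.append({"type": "paragraph", "children": children})
    return {"type": "document", "children": paragraphs}


def render_html(node):
    """Renderer: AST in, HTML out."""
    if node["type"] == "document":
        return "".join(render_html(child) for child in node["children"])
    if node["type"] == "paragraph":
        inner = "".join(render_html(child) for child in node["children"])
        return "<p>" + inner + "</p>\n"
    if node["type"] == "emphasis":
        return "<em>" + html.escape(node["value"], quote=False) + "</em>"
    return html.escape(node["value"], quote=False)


def render_plain_text(node):
    """A second renderer over the same AST, producing plain text."""
    if "children" in node:
        joined = "".join(render_plain_text(child) for child in node["children"])
        return joined + ("\n" if node["type"] == "paragraph" else "")
    return node["value"]


ast = parse("Hello *world*!")
html_output = render_html(ast)
text_output = render_plain_text(ast)
```

Because the AST is the only contract between the two steps, a renderer can be swapped without touching the parser.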
Types of Markdown tests driven by the GLFM specification
Two main types of automated testing are driven by the Markdown examples and data contained in the GLFM specification. We refer to them as:
- Markdown conformance testing.
- Markdown snapshot testing.
Many other types of tests also occur in the GitLab codebase, and some of these tests are also related to the GLFM Markdown dialect. Therefore, to avoid confusion, we use these standard terms for the two types of specification-driven testing referred to in this documentation and elsewhere.
Markdown conformance testing
Markdown conformance testing refers to the standard testing method used by all CommonMark Markdown dialects to verify that a specific implementation conforms to the CommonMark Markdown specification. It is enforced by running the standard CommonMark tool spec_tests.py against a given spec.txt specification and the implementation.
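In outline, a conformance check feeds the Markdown half of each example to the implementation and compares the result with the expected HTML half. The sketch below shows only that idea and is not the spec_tests.py implementation. The function name run_conformance, the dictionary keys, and the stand-in converter toy_to_html are all assumptions made for illustration.

```python
def run_conformance(examples, to_html):
    """Return one failure record per example whose output differs."""
    failures = []
    for number, example in enumerate(examples, start=1):
        actual = to_html(example["markdown"])
        if actual != example["html"]:
            failures.append({
                "example": number,
                "expected": example["html"],
                "actual": actual,
            })
    return failures


def toy_to_html(markdown):
    """Stand-in implementation that only knows how to wrap a paragraph."""
    return "<p>" + markdown.strip() + "</p>\n"


# Two made-up examples: the second needs emphasis support, which the
# stand-in implementation lacks, so it is reported as a failure.
spec_examples = [
    {"markdown": "hello\n", "html": "<p>hello</p>\n"},
    {"markdown": "*hello*\n", "html": "<p><em>hello</em></p>\n"},
]

failures = run_conformance(spec_examples, toy_to_html)
```

An implementation conforms when the list of failures is empty for every example in the specification.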
spec_tests.py may eventually be re-implemented in Ruby, to remove the dependency on Python.
Markdown snapshot testing
Markdown snapshot testing refers to the automated testing performed in the GitLab codebase, which is driven by “example_snapshots” fixture data derived from all of the examples in the GLFM specification. It consists of both backend RSpec tests and frontend Jest tests which use the fixture data. This fixture data is contained in YAML files. These files are generated and updated based on the Markdown examples in the specification, and the existing GLFM parser and renderer implementations. They may also be manually updated as necessary to test-drive incomplete implementations.
Regarding the terminology used here:
- The Markdown snapshot tests can be considered a form of the Golden Master Testing approach, which is also referred to as Approval Testing or Characterization Testing.
  - The term Golden Master originally comes from the recording industry, and refers to the process of mastering, or making a final mix from which all other copies are produced.
  - For more information and background, you can read about Characterization Tests and Golden Masters.
- The usage of the term snapshot does not refer to the approach of Jest snapshot testing, as used elsewhere in the GitLab frontend testing suite. However, the Markdown snapshot testing does follow the same philosophy and patterns as Jest snapshot testing:
  - Snapshot example fixture data is represented as files which are checked into source control.
  - The files can be automatically generated and updated based on the implementation of the code under test.
  - The files can also be manually updated when necessary, for example, to test-drive changes to an incomplete or buggy implementation.
- The usage of the term fixture does not refer to standard Rails database fixture files. It instead refers to test fixtures in the more generic definition, as input data to support automated testing.
  - These example snapshots fixture files are generated from and closely related to the rest of the GLFM specification. Therefore, the example_snapshots directory is colocated under the glfm_specification directory with the rest of the GLFM specification files. They are intentionally not located under the spec/fixtures directory with the rest of the fixture data for the GitLab Rails application. In practice, developers have found it simpler and more understandable to have everything under the glfm_specification directory rather than splitting these files into the spec/fixtures directory.
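The generate, check in, and compare cycle described in the list above can be sketched in a few lines. This is only an illustration of the Golden Master pattern, not the GitLab tooling: the real fixtures are YAML files, whereas this sketch uses JSON so that it needs nothing beyond the Python standard library. The file name html.json, the example name emphasis_01, and the two stand-in renderers are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def update_snapshots(path, examples, render):
    """Regenerate the fixture file from the current implementation."""
    snapshots = {name: render(markdown) for name, markdown in examples.items()}
    path.write_text(json.dumps(snapshots, indent=2, sort_keys=True))
    return snapshots


def find_mismatches(path, examples, render):
    """Return names of examples whose output no longer matches the fixture."""
    snapshots = json.loads(path.read_text()) if path.exists() else {}
    return [
        name
        for name, markdown in examples.items()
        if render(markdown) != snapshots.get(name)
    ]


def render_v1(markdown):
    """Stand-in renderer: wraps the raw text in a paragraph."""
    return "<p>" + markdown + "</p>"


def render_v2(markdown):
    """Stand-in renderer after a behavior change: adds emphasis support."""
    return "<p><em>" + markdown.strip("*") + "</em></p>"


examples = {"emphasis_01": "*hi*"}

with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "html.json"
    update_snapshots(fixture, examples, render_v1)  # record the golden master
    unchanged = find_mismatches(fixture, examples, render_v1)
    changed = find_mismatches(fixture, examples, render_v2)
```

A mismatch means either the implementation regressed or the fixture needs to be regenerated (or edited by hand) to reflect an intended change.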
See also the section on normalization below, which is an important concept used in the Markdown snapshot testing.
Parsing and Rendering
The Markdown dialect used in the GitLab application has a dual requirement for rendering:
- Rendering to static read-only HTML format, to be displayed in various places throughout the application.
- Rendering editable content in the Content Editor, a “What You See Is What You Get” (WYSIWYG) editor. The Content Editor supports real-time instant switching between an editable Markdown source and an editable WYSIWYG document.
These requirements mean that GitLab has two independent parser and renderer implementations:
- The backend parser / renderer supports parsing and rendering to static read-only HTML. It is implemented in Ruby. It leverages the commonmarker gem, which is a Ruby wrapper for libcmark-gfm, GitHub’s fork of the reference parser for CommonMark. libcmark-gfm is an extended version of the C reference implementation of CommonMark.
- The frontend parser / renderer supports parsing and WYSIWYG rendering for the Content Editor. It is implemented in JavaScript. Parsing is based on the Remark Markdown parser, which produces an MDAST Abstract Syntax Tree (MDAST). Rendering is the process of turning an MDAST into a ProseMirror document. Then, ProseMirror is used to render a ProseMirror document to WYSIWYG HTML. In this document, we refer to the process of turning Markdown into an MDAST as the frontend / JavaScript parser, and the entire process of rendering Markdown to WYSIWYG HTML in ProseMirror as the Content Editor. Several requirements drive the need for an independent frontend parser / renderer implementation, including:
  - Lack of necessary support for accurate source mapping in the HTML renderer implementation used on the backend.
  - Latency and bandwidth concerns: eliminating the need for a round-trip to the backend every time the user switches between the Markdown source and the WYSIWYG document.
  - Different HTML and browser rendering requirements for WYSIWYG documents. For example, displaying read-only elements such as diagrams and references in an editable form.
Multiple versions of rendered HTML
Both of these GLFM renderer implementations (static and WYSIWYG) produce HTML which differs from the canonical HTML examples from the specification. For every Markdown example in the GLFM specification, three versions of HTML can potentially be rendered from the example:
- Static HTML.
- WYSIWYG HTML.
- Canonical HTML.
Static HTML
Static HTML is HTML produced by the backend (Ruby) renderer, which contains extra styling and behavioral HTML. For example, Create task buttons added for dynamically creating an issue fro