JSON Converter · 7 min read

YAML explained: why DevOps abandoned JSON for configuration files

YAML started as a data serialisation language and became the default config format for Kubernetes, GitHub Actions, and Docker Compose. Here is why, and what to watch out for.

YAML, which stands for "YAML Ain't Markup Language", was first proposed by Clark Evans in 2001 as a human-friendly data serialisation format. Twenty-five years later it is the dominant syntax for infrastructure configuration, CI/CD pipelines, and container orchestration. Understanding what YAML does differently from JSON, where it excels, and where it fails is essential for anyone working in a modern DevOps environment.

A brief history of YAML

YAML 1.0 was published in 2004 by Evans, Oren Ben-Kiki, and Brian Ingerson. The stated goal was a format that humans could read and write without the syntactic overhead of XML or the strictness of JSON. Early adoption was slow, confined mostly to Ruby configuration files (Rails used YAML for database configuration from the start) and Perl data interchange.

The real inflection point came in the 2010s with the rise of configuration management tools. Ansible, released in 2012, chose YAML as its playbook language. Kubernetes, open-sourced by Google in 2014, adopted YAML for all resource manifests. GitHub Actions, launched in 2019, used YAML for workflow files. Each of these platforms brought millions of developers into contact with YAML, making it effectively unavoidable in infrastructure work.

Indentation-based syntax

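YAML's block style uses whitespace indentation to denote structure, similar to Python. It needs no braces, brackets, or closing delimiters: hierarchy is expressed through indentation level alone. Indentation must use spaces (tabs are not allowed), and content indented further than a key belongs to that key. Two spaces per level is the common convention, although any consistent number of spaces works.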

# Kubernetes Deployment (abbreviated)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: app
          image: my-app:latest

The same structure in JSON requires braces, double quotes around every key and string, and commas between entries. For configuration files read and edited by humans daily, YAML's syntax is genuinely less noisy. For data exchanged programmatically between systems, JSON's strictness is an advantage.
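For comparison, here is the abbreviated Deployment from above in JSON syntax. YAML 1.2 is designed as a superset of JSON, so a YAML 1.2 parser should accept this text as flow-style YAML too; it carries no comments because JSON has none.

```yaml
{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {
    "name": "my-app",
    "labels": { "app": "my-app" }
  },
  "spec": {
    "replicas": 3,
    "template": {
      "spec": {
        "containers": [
          { "name": "app", "image": "my-app:latest" }
        ]
      }
    }
  }
}
```

The data is identical; only the punctuation differs.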

Why DevOps adopted YAML over JSON

Three properties drove YAML's adoption in configuration contexts. First, YAML supports comments with the # character. JSON has no comment syntax, a deliberate omission by Douglas Crockford to keep JSON a pure data format. In a Kubernetes manifest or Ansible playbook, inline comments explaining why a value is set to a particular number are essential for maintainability. JSON cannot accommodate them without nonstandard extensions such as JSON5 or JSONC.
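A minimal sketch of what such comments look like in practice. The fragment is hypothetical: the field values and the reasons given in the comments are illustrative, not taken from a real deployment.

```yaml
# Hypothetical manifest fragment; numbers and reasons are illustrative only.
spec:
  replicas: 3                # one replica per availability zone
  strategy:
    rollingUpdate:
      maxUnavailable: 1      # keep at least two replicas serving during a rollout
```

Everything from # to the end of the line is ignored by the parser, so the explanation lives next to the value it justifies.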

Second, YAML supports multi-line strings naturally, using block scalars (| for literal blocks, > for folded blocks). Embedding a shell script or a certificate inside a JSON string requires escaping every newline as \n and every double quote as \", which is error-prone and unreadable. YAML handles these cases cleanly.
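The two block scalar styles can be sketched as follows. The key names and the script contents are made up for illustration.

```yaml
# Hypothetical keys; the point is the two block scalar styles.
script: |
  #!/bin/sh
  echo "starting"
  exec ./run --port 8080
description: >
  This paragraph is written across
  several lines but is folded into
  a single line when parsed.
```

With |, line breaks are preserved exactly, and the #!/bin/sh line is string content, not a YAML comment. With >, the single line breaks inside the paragraph are folded into spaces. A JSON string holding the same script would need to be written on one line as "#!/bin/sh\necho \"starting\"\nexec ./run --port 8080\n".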

Third, YAML supports anchors and aliases, a form of reference that lets you define a value once (marked with &name) and reuse it elsewhere in the file (with *name). In a Docker Compose file with multiple services sharing the same environment variables, anchors eliminate repetition without requiring a templating layer.
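A sketch of the shared-environment case, assuming a Compose file with two services. The service names, images, and variables are hypothetical; the x- prefix is the Compose convention for top-level extension fields that Compose itself ignores.

```yaml
# Hypothetical Compose file; service names, images, and variables are illustrative.
x-common-env: &common-env      # anchor: define the mapping once
  LOG_LEVEL: info
  TZ: UTC

services:
  web:
    image: example/web:1.0
    environment: *common-env   # alias: reuse the anchored mapping
  worker:
    image: example/worker:1.0
    environment: *common-env
```

After parsing, both services carry the same environment mapping, and a change to the anchored block applies to every alias.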

Common pitfalls

YAML's flexibility is also the source of its most frustrating bugs. Indentation errors can be silent: a misaligned block that still forms valid YAML raises no parse error and simply changes the meaning of the document. A key that should be nested under another key becomes a sibling at the wrong level, and the mistake surfaces only at runtime, when the tool consuming the YAML behaves unexpectedly.
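The following sketch shows the failure mode with two documents separated by ---. The keys are borrowed from a Kubernetes container spec purely as an illustration; the second document is the misindented version.

```yaml
# Intended: resources is nested under the container entry.
containers:
  - name: app
    resources:
      limits:
        memory: 256Mi
---
# Misindented: still valid YAML, but resources is now a top-level
# sibling of containers and no longer belongs to the container.
containers:
  - name: app
resources:
  limits:
    memory: 256Mi
```

Both documents parse without error. Whether the second is rejected later depends entirely on how strictly the consuming tool validates unknown or misplaced keys.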

The most notorious pitfall is implicit type coercion, sometimes called the Norway Problem. In YAML 1.1 (the version many real-world parsers still follow), unquoted scalars are automatically resolved to typed values. The two-letter ISO country code for Norway is NO. In YAML 1.1, an unquoted NO is parsed as the boolean false. Similarly, yes, on, and off are parsed as booleans. Version numbers like 1.2 become floats. Numbers written with a leading 0, such as 0755, are parsed as octal integers.

YAML 1.2 (2009) removed most of this implicit coercion, but many widely used parsers, including those used by Kubernetes and Ansible, still follow YAML 1.1 behaviour in whole or in part for historical compatibility. The safe rule: quote any string value that could be misinterpreted as a boolean, number, or null.
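The quoting rule can be sketched as below. The key names are hypothetical, and the comments describe YAML 1.1 resolution rules; a parser that follows the YAML 1.2 core schema would treat several of the unquoted values differently (for example, NO as a string and 0755 as the decimal integer 755).

```yaml
# Hypothetical keys; comments describe YAML 1.1 resolution.
country_unquoted: NO      # boolean false
country_quoted: "NO"      # the string NO
feature_unquoted: yes     # boolean true
feature_quoted: "yes"     # the string yes
version_unquoted: 1.2     # float 1.2
version_quoted: "1.2"     # the string 1.2
mode_unquoted: 0755       # octal integer, decimal value 493
mode_quoted: "0755"       # the string 0755
```

Quoting removes the dependence on which specification version the parser implements.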

YAML in Kubernetes, GitHub Actions, and Docker Compose

In Kubernetes, every resource (Deployments, Services, ConfigMaps, Secrets) is typically written as a YAML manifest and submitted to the API server. The schema is strict (fields are validated against the resource definition), but YAML is the surface format. Large Kubernetes deployments commonly use tools like Helm (templating) or Kustomize (overlays and patches) to generate YAML programmatically rather than maintaining it by hand.

GitHub Actions workflows are YAML files stored in .github/workflows/. The format maps closely to the Actions event model: triggers (on), jobs, steps, and environment variables. YAML's comment support is valuable here for annotating why a step runs or what a secret variable is for.
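A minimal workflow sketch showing those four elements. The workflow name, the job name, the NODE_ENV variable, and the make test command are assumptions for illustration; the file would live at a path such as .github/workflows/ci.yml.

```yaml
# Hypothetical workflow; names and commands are illustrative.
name: ci
on:                          # trigger: run on pushes to main
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    env:
      NODE_ENV: test         # environment variable visible to every step in this job
    steps:
      - uses: actions/checkout@v4   # fetch the repository contents
      - name: Run tests
        run: make test       # assumes the repository provides a Makefile with a test target
```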

Docker Compose files (docker-compose.yml) define multi-container applications. YAML anchors are particularly useful in Compose files for sharing common service configurations such as logging drivers, restart policies, and network assignments.

When to choose YAML over JSON

YAML is the right choice when the file will be read and edited by humans regularly, when comments are needed for clarity, when multi-line string values are common, or when the ecosystem you are working in has standardised on YAML (Kubernetes, Ansible, GitHub Actions). JSON is the right choice when the data will be consumed programmatically, when unambiguous typing matters, when you need the broadest parser and tooling support, or when the consuming system does not support YAML. For cases where human readability and strict typing are both required, TOML is worth evaluating.

References

  1. Ben-Kiki, O., Evans, C., & Ingerson, B. (2021). YAML Ain't Markup Language (YAML) version 1.2.2. yaml.org.
  2. Burns, B., Beda, J., Hightower, K., & Evenson, L. (2022). Kubernetes: Up and Running (3rd ed.). O'Reilly Media.
  3. GitHub. (2024). Workflow syntax for GitHub Actions. docs.github.com.
  4. Merkel, D. (2014). Docker: Lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239).
  5. Poul-Henning Kamp. (2010). The Norway problem: YAML implicit type coercion. Blog post archived at The Register.