
Canonical
on 24 August 2017

From docs to Schema


This article was originally featured on Chad Smith’s blog

Is there an echo in here? While looking through the cloud-config modules, it seemed that each module carried a lot of boilerplate documentation and logic to document and validate the configuration keys it accepts.

Houston, we have a problem

Problem 1: Doc rot

Cloud-init has 51 Python modules which define the configuration functions for cloud-config features. Each module has a set of supported YAML configuration options which are documented at http://cloudinit.readthedocs.io. That documentation has to be updated whenever a module's supported options change. We’re all (mostly) human, and here’s where our friend “doc rot” enters our project: it is easy to forget to update the documentation to match changed features.

Problem 2: Repetitive docs and configuration option parsing

Each cloud-config module has a boilerplate reStructuredText docstring describing all configuration options for the module. Most modules also check for the presence of a top-level configuration key before parsing or skipping a given config. This key definition could be encoded in a simple structure which serves as the source for both the documentation and the initial config parse. Let’s take a DRY approach to docs and module configuration definitions.
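As a rough illustration of that DRY idea, the sketch below derives the "should this module run?" check from the schema's top-level property names instead of hard-coding the key in each handler. The helper name `has_module_config` and the trimmed schema are hypothetical, for illustration only, and are not cloud-init's API.

```python
def has_module_config(config, schema):
    """Return True if config contains any top-level key the schema declares.

    Hypothetical helper: the keys come from the schema, so one definition
    drives both the generated docs and this initial parse/skip decision.
    """
    declared_keys = schema.get('properties', {}).keys()
    return any(key in config for key in declared_keys)


# Trimmed, hypothetical schema used only for this illustration.
NTP_SCHEMA = {'id': 'cc_ntp', 'properties': {'ntp': {}}}

print(has_module_config({'ntp': {'pools': ['0.pool.ntp.org']}}, NTP_SCHEMA))  # True
print(has_module_config({'apt': {}}, NTP_SCHEMA))  # False
```

A handler could then skip early with a single call, and adding a new top-level key to the schema would update both the docs and the skip logic at once.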

Problem 3: Absent config validation

Most cloud-config modules do little validation on the configuration options provided to them. While this appears flexible, the lack of validation ultimately costs the user time and clarity: invalid input surfaces as terse KeyError or ValueError tracebacks, which could be turned into clear error messages if stricter validation were performed.

Solution: One schema to rule them all

Performing validation using a strict declarative schema has the following benefits:

  • a declared schema is an explicit API contract between the module and its configuration, making it easier to consume due to reduced ambiguity
  • a schema definition improves automated test coverage by describing all supported options, which can then be exercised
  • tightly coupling documentation to config validation avoids stale docs
  • strict validation, versus permissive acceptance, reduces the cost of failures by addressing invalid configuration explicitly and early instead of at deployment time
  • performing upfront schema validation on the entire config allows reporting multiple errors in one pass instead of hitting them individually at runtime

Step 1: Add JSON Schema definitions for each cloud-config module, from which documentation can also be generated.

The ntp module, which supports optional servers and pools keys, shows an easy schema which codifies each property's name, type and expected format:

schema = {
    'id': 'cc_ntp',
    'name': 'NTP',
    'title': 'enable and configure ntp',
    'description': 'Something with ntp',
    'frequency': 'once-per-instance',  # rendered by SCHEMA_DOC_TMPL in Step 2
    'distros': ['centos', 'ubuntu',...],
    'examples': [...],
    'properties': {
        'ntp': {
            'properties': {
                'pools': {
                    'type': 'array',
                    'items': {
                        'type': 'string',
                        'format': 'hostname'
                    },
                    'uniqueItems': True,
                },
                'servers': {
                    'type': 'array',
                    'items': {
                        'type': 'string',
                        'format': 'hostname'
                    },
                    'uniqueItems': True,
                }
            },
            'required': [],  # No required properties
            'additionalProperties': False  # Error on unregistered properties
        }
    }
}
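To make that contract concrete, here is a deliberately tiny, standard-library-only checker that understands just the keywords used in the ntp schema (arrays of strings, `uniqueItems`, `additionalProperties: False`). It is a hypothetical stand-in for illustration, not jsonschema, and it does not check `format: hostname`. Note how it collects every problem instead of stopping at the first one.

```python
def check_ntp_config(ntp_config, ntp_schema):
    """Return a list of (path, message) errors; an empty list means valid.

    Toy validator for illustration only: it handles just the keywords in the
    ntp schema and skips 'format': 'hostname'. Use jsonschema for real work.
    """
    if not isinstance(ntp_config, dict):
        return [('ntp', 'is not of type object')]
    errors = []
    declared = ntp_schema['properties']
    # additionalProperties: False -> report every undeclared key.
    if ntp_schema.get('additionalProperties') is False:
        for key in sorted(set(ntp_config) - set(declared)):
            errors.append(('ntp', 'unexpected property {0!r}'.format(key)))
    for key, spec in sorted(declared.items()):
        if key not in ntp_config:
            continue  # 'required' is empty, so absent keys are fine
        value, path = ntp_config[key], 'ntp.' + key
        # Both declared properties are arrays of strings in this schema.
        if not isinstance(value, list):
            errors.append((path, 'is not of type array'))
            continue
        for index, item in enumerate(value):
            if not isinstance(item, str):
                errors.append(('{0}.{1}'.format(path, index), 'is not of type string'))
        # uniqueItems: compare reprs so unhashable items do not break the check.
        if spec.get('uniqueItems') and len(set(map(repr, value))) != len(value):
            errors.append((path, 'has non-unique elements'))
    return errors  # every problem found, not just the first


NTP_SUBSCHEMA = {
    'properties': {
        'pools': {'type': 'array', 'items': {'type': 'string'}, 'uniqueItems': True},
        'servers': {'type': 'array', 'items': {'type': 'string'}, 'uniqueItems': True},
    },
    'additionalProperties': False,
}

bad = {'servers': ['ntp.example.com', 'ntp.example.com', 42], 'pool': ['x']}
for path, message in check_ntp_config(bad, NTP_SUBSCHEMA):
    print(path, message)
# ntp unexpected property 'pool'
# ntp.servers.2 is not of type string
# ntp.servers has non-unique elements
```

A single pass surfaces the misspelled `pool` key, the non-string entry and the duplicate server together, which is the "multiple errors in one pass" benefit listed above.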

Step 2: Add simple helper functions to generate Sphinx docs from the schema dict instead of from module docstrings

The magic in Sphinx doc generation is overriding the default module-level docstring behavior so that it uses docs rendered from the schema definition. This docstring-generating callback needs to be added to the conf.py in the directory where you run Sphinx:

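from cloudinit.config.schema import get_schema_doc  # the renderer shown below

def generate_docstring_from_schema(app, what, name, obj, options, lines):
    """Override module docs from schema when present."""
    if what == 'module' and hasattr(obj, "schema"):
        # Replace the docstring lines in place; autodoc reads this list back.
        del lines[:]
        lines.extend(get_schema_doc(obj.schema).split('\n'))

def setup(app):
    # Register the callback for autodoc's docstring-processing event.
    app.connect('autodoc-process-docstring', generate_docstring_from_schema)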
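Because autodoc passes the docstring as a mutable list of lines, the callback has to edit that list in place (`del lines[:]`, then `extend`) rather than rebind the name. The sketch below exercises the callback outside Sphinx, using a fake module object and a stub `get_schema_doc`; both stand-ins, and the `rebinding_callback` anti-pattern, are hypothetical and exist only for illustration.

```python
from types import SimpleNamespace


def get_schema_doc(schema):
    # Hypothetical stub; the real renderer builds the full RST page.
    return '{name}\n---\n**Summary:** {title}'.format(**schema)


def generate_docstring_from_schema(app, what, name, obj, options, lines):
    """Override module docs from schema when present."""
    if what == 'module' and hasattr(obj, 'schema'):
        del lines[:]  # mutate in place: the caller keeps this same list
        lines.extend(get_schema_doc(obj.schema).split('\n'))


def rebinding_callback(app, what, name, obj, options, lines):
    # Anti-pattern: rebinding the local name leaves the caller's list unchanged.
    lines = get_schema_doc(obj.schema).split('\n')  # noqa: F841


# A fake module object standing in for a module such as cc_ntp.
fake_module = SimpleNamespace(
    schema={'name': 'NTP', 'title': 'enable and configure ntp'})

lines = ['Old hand-written docstring.']
generate_docstring_from_schema(None, 'module', 'cc_ntp', fake_module, {}, lines)
print(lines)  # ['NTP', '---', '**Summary:** enable and configure ntp']

untouched = ['Old hand-written docstring.']
rebinding_callback(None, 'module', 'cc_ntp', fake_module, {}, untouched)
print(untouched)  # ['Old hand-written docstring.']
```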

And here is the simple function that renders documentation from a schema:

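SCHEMA_DOC_TMPL = """
{name}
{title_underbar}
**Summary:** {title}

{description}

**Internal name:** ``{id}``

**Module frequency:** {frequency}

**Supported distros:** {distros}

**Config schema:**
{property_doc}
{examples}
"""

def get_schema_doc(schema):
    """Return reStructuredText rendering the provided jsonschema.

    @param schema: Dict of jsonschema to render.
    @raise KeyError: If schema lacks an expected key.
    """
    # Render from a shallow copy so the module's schema dict is not mutated
    # (otherwise a second call would re-join the already-joined distros).
    schema = dict(schema)
    # An RST title underline must be at least as long as the title itself.
    schema['title_underbar'] = '-' * len(schema['name'])
    # _get_property_doc and _get_schema_examples (not shown in this post)
    # render the option list and the examples.
    schema['property_doc'] = _get_property_doc(schema)
    schema['examples'] = _get_schema_examples(schema)
    schema['distros'] = ', '.join(schema['distros'])
    return SCHEMA_DOC_TMPL.format(**schema)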
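The post does not show `_get_property_doc`. As a hedged illustration of what such a helper might produce, the hypothetical `render_property_doc` below walks the nested `properties` of a schema and emits one reStructuredText bullet per option with its type, using dotted names to keep the list flat. It is a sketch, not cloud-init's implementation.

```python
def render_property_doc(schema, parent=''):
    """Return RST bullet lines, one per (possibly nested) property.

    Hypothetical helper for illustration only.
    """
    lines = []
    for prop_name, prop in sorted(schema.get('properties', {}).items()):
        full_name = parent + prop_name
        # Properties without an explicit type are assumed to be objects here.
        prop_type = prop.get('type', 'object')
        if prop_type == 'array' and 'items' in prop:
            prop_type = 'array of {0}'.format(prop['items'].get('type', 'any'))
        lines.append('- **{0}:** ({1})'.format(full_name, prop_type))
        # Recurse into nested objects; dotted names avoid nested RST lists.
        lines.extend(render_property_doc(prop, full_name + '.'))
    return lines


schema = {'properties': {'ntp': {'properties': {
    'pools': {'type': 'array', 'items': {'type': 'string'}},
    'servers': {'type': 'array', 'items': {'type': 'string'}},
}}}}
print('\n'.join(render_property_doc(schema)))
# - **ntp:** (object)
# - **ntp.pools:** (array of string)
# - **ntp.servers:** (array of string)
```

Because the bullets come straight from the schema, adding or renaming an option changes the rendered docs without touching any docstring.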

Step 3: In the module handler, iterate over the schema errors reported by a jsonschema validator and collect every schema infraction, so that all of them can be reported (logged as warnings or raised) at once.

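from jsonschema import Draft4Validator, FormatChecker
from cloudinit.config.schema import SchemaValidationError

# `schema` is the module schema from Step 1; `config` is the parsed cloud-config.
# FormatChecker() enables 'format' checks such as 'hostname'.
validator = Draft4Validator(schema, format_checker=FormatChecker())
errors = ()
# Stringify path parts for sorting, since paths can mix str keys and int indexes.
for error in sorted(validator.iter_errors(config),
                    key=lambda e: [str(p) for p in e.path]):
    path = '.'.join([str(p) for p in error.path])
    errors += ((path, error.message),)  # collect all errors, don't stop at the first
if errors:
    raise SchemaValidationError(errors)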
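The loop above raises, while the step describes logging warnings. A small hypothetical helper can support both: log every collected `(path, message)` pair as one warning, or fail fast when a strict flag is set. The name `report_schema_errors`, the `strict` parameter and the sample messages are illustrative assumptions, not cloud-init's API.

```python
import logging

LOG = logging.getLogger(__name__)


def report_schema_errors(errors, strict=False):
    """Log or raise every collected (path, message) pair in one message.

    Hypothetical helper: strict=False lets the module keep running while the
    user sees all problems together; strict=True stops with a ValueError.
    """
    if not errors:
        return ''
    summary = 'Invalid config:\n' + '\n'.join(
        '  {0}: {1}'.format(path or '<root>', message) for path, message in errors)
    if strict:
        raise ValueError(summary)
    LOG.warning(summary)
    return summary


# Sample (path, message) pairs, made up for illustration.
errors = (('ntp.servers', "['a', 'a'] has non-unique elements"),
          ('ntp', "unexpected property 'pool'"))
print(report_schema_errors(errors))
```

Emitting one combined message keeps the "all errors in one pass" property at the reporting end as well.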

Step 4: Simple command-line tools to validate cloud-config files against the known schema, avoiding costly errors during instance deployment.

Already included with cloud-init 0.7.9 is a minimal schema validation development tool:

python3 -m cloudinit.config.schema --help
