YAML is a human-readable configuration file format that is flexible and easy to understand, but fraught with unexpected pitfalls. Here’s how to dodge its most precarious issues.

The YAML (“YAML Ain’t Markup Language”) configuration language sits at the heart of many modern applications including Kubernetes, Ansible, CircleCI, and Salt. After all, YAML offers many advantages, like readability, flexibility, and the ability to work with JSON files. But YAML is also a source of pitfalls and gotchas for the uninitiated or incautious.

Many aspects of YAML’s behavior allow for momentary convenience, but at the cost of unexpected zigs or zags later on down the line. Even folks with plenty of experience assembling or deploying YAML can be bitten by these issues, which often surface in the guise of seemingly innocuous behavior.

Here are seven steps you can take to guard against the most troublesome gotchas in YAML.

When in doubt, quote strings

The single most powerful defensive practice you can adopt when writing YAML: Quote everything that is meant to be a string.

One of YAML’s best-known quirks is that you can write strings without quoting:

    - movie:
        title: Blade Runner
        year: 1982

In this example, the keys movie, title, and year will be interpreted as strings, as will the value Blade Runner. The value 1982 will be parsed as a number.

But what happens here?

    - movie:
        title: 1979
        year: 2016

That’s right: the movie title will be interpreted as a number. And that’s not even the worst thing that can happen:

    - movie:
        title: No
        year: 2012

What are the odds this title will be interpreted as a boolean?

If you want to make absolutely sure that keys and values will be interpreted as strings, and guard against any potential ambiguities (and a lot of ambiguities can creep into YAML), then quote your strings:

    - "movie":
        "title": "Blade Runner"
        "year": 1982

If you’re unable to quote strings for some reason, you can use a shorthand prefix to indicate the type.
These prefixes make YAML a little noisier to read than quoted strings, but they are just as unambiguous as quoting:

    movie: !!str Blade Runner

Beware of multiline strings

YAML has multiple ways to represent multiline strings, depending on how those strings are formatted. For instance, unquoted strings can simply be broken across multiple lines when prefixed with a >:

    long string: >
      This is a long string
      that spans multiple lines.

Note that using > automatically appends a newline (\n) at the end of the string. If you don’t want the trailing newline, then use >- instead of >.

If you use double-quoted strings, you can continue the string onto the next line by ending each line with a backslash:

    long string: "This is a long string \
      that spans multiple lines."

Note that leading spaces on a continuation line are interpreted as YAML formatting, not as part of the string. This is why the space is inserted before the backslash in the example above. It ensures the words string and that don’t run together.

Beware of booleans

As hinted above, one of YAML’s other big gotchas is boolean values. There are so many ways to specify booleans in YAML that it is all too easy for an intended string to be interpreted as a boolean.

One notorious example of this is the two-letter country code problem. If your country is US or UK, fine. If your country is Norway, the country code for which is NO, that is no longer a string; it’s a boolean that evaluates to false!

Whenever possible, be deliberately explicit with both boolean values and shorter strings that might be misinterpreted as booleans. YAML’s shorthand prefix for booleans is !!bool.

Watch out for multiple forms of octal

This is an out-of-the-way gotcha, but it can be troublesome. YAML 1.1 uses a different notation for octal numbers than YAML 1.2. In YAML 1.1, octal numbers look like 0777. In YAML 1.2, that same octal becomes 0o777, which is much less ambiguous.

Kubernetes, one of the biggest users of YAML, uses YAML 1.1.
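
To make the two octal notations concrete, here is a minimal Python sketch of how a plain scalar such as 0777 would resolve as an integer under each version of the spec. It is not a real YAML parser: the function name resolve_int and the simplified regular expressions are assumptions made for illustration, loosely modeled on the YAML 1.1 integer type and the YAML 1.2 core schema, and they ignore details such as underscores, hexadecimal, and sexagesimal forms.

```python
import re

# Simplified patterns (assumptions for illustration, not the full spec grammar).
YAML11_OCTAL = re.compile(r"[-+]?0[0-7]+")            # YAML 1.1 style: 0777
YAML11_DECIMAL = re.compile(r"[-+]?(0|[1-9][0-9]*)")  # no leading zeros
YAML12_OCTAL = re.compile(r"0o[0-7]+")                # YAML 1.2 style: 0o777
YAML12_DECIMAL = re.compile(r"[-+]?[0-9]+")           # leading zeros allowed


def resolve_int(scalar: str, version: str):
    """Return the integer a plain scalar would resolve to, or None if it
    would be left as a string under this simplified model."""
    if version == "1.1":
        if YAML11_OCTAL.fullmatch(scalar):
            return int(scalar, 8)
        if YAML11_DECIMAL.fullmatch(scalar):
            return int(scalar, 10)
        return None
    if version == "1.2":
        if YAML12_OCTAL.fullmatch(scalar):
            return int(scalar[2:], 8)  # drop the "0o" prefix
        if YAML12_DECIMAL.fullmatch(scalar):
            return int(scalar, 10)
        return None
    raise ValueError(f"unsupported YAML version: {version}")


if __name__ == "__main__":
    for text in ("0777", "0o777"):
        for version in ("1.1", "1.2"):
            result = resolve_int(text, version)
            print(f"{text!r} under YAML {version}: {result!r}")
```

In this model, 0777 comes out as 511 under YAML 1.1 but as the decimal number 777 under YAML 1.2, while 0o777 is an integer only under YAML 1.2. A file permission written in the wrong notation is therefore either a different number or not a number at all.
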
If you use YAML with other applications that use version 1.2 of the spec, be extra careful not to use the wrong octal notation. Since octal is generally used only for file permissions these days, it’s a corner case compared to other YAML gotchas. Still, YAML octal can bite you if you’re not careful.

Beware of executable YAML

Executable YAML? Yes. Many YAML libraries, such as PyYAML for Python, have allowed the execution of arbitrary commands when deserializing YAML. Amazingly, this isn’t a bug, but a capability YAML was designed to allow.

In PyYAML’s case, the default behavior for deserialization was eventually changed to support only a safe subset of YAML that doesn’t allow this sort of thing. The original behavior can be restored manually (the PyYAML documentation explains how), but you should avoid using this feature if you can, and disable it by default if it isn’t already disabled.

Beware of inconsistencies when serializing and deserializing

Another potential issue with YAML is that different YAML-handling libraries across different programming languages sometimes generate different results.

Consider: If you have a YAML file that includes boolean values represented as true and false, and you re-serialize that to YAML using a different library that represents booleans as y and n or on and off, you could get unexpected results. Even if the code remains functionally the same, it could look totally different.

Don’t use YAML

The most general way to avoid problems with YAML? Don’t use it. Or at least, don’t use it directly.

If you have to write YAML as part of a configuration process, it could be safer to write the code in JSON or native code (e.g., Python dictionaries), then serialize that to YAML. You’ll have more control over the types of objects, and you’ll be more comfortable using a language you already work with.

Failing that, you could use a linter such as yamllint to check for common YAML problems.
For instance, you can forbid truthy values like YES or off in favor of simply true and false, or enforce string quoting.
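
As a rough illustration of the advice above to write configuration in native code and serialize it, the sketch below builds a configuration as a Python dictionary and emits it with the standard-library json module. It relies on YAML 1.2 having been designed to accept JSON as valid input; parsers that implement YAML 1.1 handle most JSON but are not guaranteed to accept every document. The configuration contents and the function name dump_config are hypothetical.

```python
import json

# Hypothetical configuration written as native Python objects.
# The types are explicit here: "NO" and "0777" are strings, True is a boolean.
config = {
    "movie": {"title": "No", "year": 2012},
    "country": "NO",
    "file_mode": "0777",
    "enabled": True,
}


def dump_config(data: dict) -> str:
    """Serialize a configuration dictionary to JSON text, which a
    YAML 1.2 parser should also be able to read."""
    return json.dumps(data, indent=2, sort_keys=True)


text = dump_config(config)

# Round-trip check: the serialized text loads back to the same values and types.
assert json.loads(text) == config

if __name__ == "__main__":
    print(text)
```

Because JSON always quotes strings, values such as "NO" and "0777" stay strings, and the only boolean spellings that appear are true and false. If a YAML library such as PyYAML is available, yaml.safe_dump(config) is a more direct way to produce YAML from the same dictionary.
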