Haml/Yaml/Markdown

Tags

Knowing when to cut and run is a great skill to have, because there's nothing like continuing to invest in sunk-cost projects. Given a recent opportunity to cut my losses on the many draft articles on this site, I naturally decided to double down and actually finish a few of them (possibly all of them).

This article was started on February 15, 2016. In the intervening years, I've changed employment a number of times, watched my gaming activity wax and wane, and seen my interest in specific technologies do the same. But writing and document engineering remain a perennial attraction. As part of a current drive to reduce the number of "draft" articles (mostly unfinished game sessions), it's a perfect time to get this one out the door.

Significantly, in the last couple of years, Generative AI technology built on Large Language Models has become nearly ubiquitous in software engineering. This article leverages ChatGPT and Cursor/Sonnet 3.5, and several considerations shaped the approach:

  1. Hooking up databases (and a file system can be considered one) is rapidly becoming trivial. LLMs can solve in seconds technical issues that might take me a few hours of deep reading.
  2. There is still value in static data, such as the data presented in this article. Having a backing database operating at run time would be overkill.
  3. While a runtime database is overkill, the file system can be regarded as a database, which is why the technical decisions around data formatting matter.
  4. Alternatively, a static, archived SQLite3 database is also a viable option. It would only be queried at build time and would never ship to production.
  5. Haml is a very fine templating language that gets too little attention. Here it provides access to Ruby, the programming language used by the platform (MiddlemanApp) and needed for processing the data.

With the initial version of this article wrapped up, the original goal of using static but mutable data to create static web pages is met. The game data is stored in completely separate files, which are loaded and processed at build time to supply the game data when the page renders. More could be done; some of it is noted in the article text.

Data processing a few Vietnam wargames

This is neither my complete collection of Vietnam wargames nor an exhaustive list of all known games, and the descriptions that follow are incomplete. It is, however, a proof of concept for integrating YAML processing directly into a Haml template.

Design

Two aspects of design are relevant. First, how should the data be structured so that it is accessible at build time? Second, how should the data be processed and presented at build time? Build time is when the engine processes the data and the template; in this case, the engine is an appropriately configured Middleman App. It first renders the Haml, then renders the Markdown, with the final output being the HTML displayed here.

Data design

Among the many choices for storing the game data, this project uses Yaml. It's easy to read, store, and process. Given Yaml, a decision has to be made whether to store all the games in a single file or each game in its own file. Each option has pros and cons.

However, there is a strong argument that having each game in its own file reduces the cognitive load when the files need manual editing or when a new game is added. With a single large file, it's easier to introduce syntax errors and the annoying debugging that follows. With multiple smaller files, adding a new game is as easy as adding its file. The tradeoff is that opening many small files is slower than reading a single file, even a large one.

If multiple files ever become an issue, it's not that difficult to concatenate all the individual files into a single file in some out-of-band process. This provides the best of both worlds at the one-time expense of setting up the concatenation.
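A minimal sketch of that out-of-band concatenation in plain Ruby, assuming each per-game file holds a single top-level `game` hash (like the example hash later in this article) and that the combined output should be one Yaml list. The `concatenate_games` helper and the file names are hypothetical.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper: merge every per-game Yaml file in src_dir into a
# single Yaml file holding a list of games. Files are sorted by name so
# the combined output is stable from run to run.
def concatenate_games(src_dir, dest_file)
  games = Dir.glob(File.join(src_dir, '*.yaml')).sort.map do |path|
    YAML.safe_load(File.read(path)) # each file: {"game" => {...}}
  end
  File.write(dest_file, games.to_yaml)
  games.size
end

# Usage with throwaway files in a temporary directory. The combined file
# uses .yml so the *.yaml glob never picks it up on a later run.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'viet_nam.yaml'),
             { 'game' => { 'title' => 'Viet Nam', 'published' => 1965 } }.to_yaml)
  File.write(File.join(dir, 'bad_moon_rising.yaml'),
             { 'game' => { 'title' => 'Bad Moon Rising' } }.to_yaml)
  combined = File.join(dir, 'all_games.yml')
  count = concatenate_games(dir, combined)
  titles = YAML.safe_load(File.read(combined)).map { |g| g['game']['title'] }
  puts count # prints 2
  p titles   # prints ["Bad Moon Rising", "Viet Nam"]
end
```

Loading the combined list afterwards is a single `YAML.load_file` call, which is the "single file" side of the tradeoff.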

Regardless of whether the data is stored in a single file or multiple files, Yaml is not the most efficient format for data entry. Dedicated data entry tools or spreadsheets are far more efficient. For this tiny application, writing a dedicated tool is out of scope; it would require a lot more web programming than is warranted. However, spreadsheets typically support CSV export, and the CSV file can be loaded into a database, from which Yaml is then emitted. Alternatively, CSV could have been used directly for this project, but out-of-band reasons dictated Yaml. Generating Yaml from a database could produce either a single file or multiple files.
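As a rough illustration of the spreadsheet route, the sketch below converts a CSV export straight into one Yaml document per game. It skips the intermediate database described above, and the column names, the slug-based file names, and `csv_to_game_files` itself are assumptions for illustration.

```ruby
require 'csv'
require 'yaml'

# Hypothetical converter: CSV export -> { "file-name.yaml" => "Yaml text" }.
# Goes straight from CSV to Yaml; the database step is omitted here.
def csv_to_game_files(csv_text)
  CSV.parse(csv_text, headers: true).each_with_object({}) do |row, files|
    # Drop blank cells so missing fields are simply absent from the Yaml.
    game = row.to_h.reject { |_, value| value.nil? || value.strip.empty? }
    # CSV yields strings; store the year as an Integer, as in the example hash.
    game['published'] = Integer(game['published']) if game.key?('published')
    # Derive a file name from the title, stripping accents ("é" becomes "e").
    slug = game.fetch('title')
               .unicode_normalize(:nfkd)
               .gsub(/\p{Mn}/, '')
               .downcase
               .gsub(/[^a-z0-9]+/, '-')
               .gsub(/\A-+|-+\z/, '')
    files["#{slug}.yaml"] = { 'game' => game }.to_yaml
  end
end

rows = <<~DATA
  title,published,designer,publisher
  Viet Nam,1965,Phil Orbanes,Game Science
  Bad Moon Rising,,Paul Rohrbaugh,High Flying Dice Games
DATA

csv_to_game_files(rows).each_key { |name| puts name }
# prints:
# viet-nam.yaml
# bad-moon-rising.yaml
```

Writing each entry to disk is then a `File.write` per key, which yields the multiple-file layout; dumping all the games as one list instead would give the single-file layout.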

Processor design

There is not much to document. For individual, game-specific files, each file is read and then displayed.

Batch processing reads all the files at once, then iterates over the collection of files to display the contents of each.

The interesting aspect is that the processing is done by Ruby code embedded in the text of the article's source file. The reader doesn't see the code, only what it outputs.
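Haml is a gem rather than part of Ruby's standard library, so the sketch below substitutes ERB, which ships with Ruby, to show the same idea: Ruby embedded in the document's text runs at build time, and only its output reaches the reader. The template, the inline Yaml, and `render_games` are illustrative, not this article's actual source.

```ruby
require 'erb'
require 'yaml'

# Illustrative template: text with embedded Ruby. The article's real
# template is Haml; ERB is swapped in here only because it ships with Ruby.
TEMPLATE = <<~ERB
  <%- games.each do |game| -%>
  <%= game['title'] %>

  Published: <%= game['published'] %>

  <%- end -%>
ERB

# trim_mode '-' makes the <%- ... -%> control lines vanish from the output.
def render_games(games)
  ERB.new(TEMPLATE, trim_mode: '-').result_with_hash(games: games)
end

games = YAML.safe_load(<<~YAML)
  - title: Viet Nam
    published: 1965
  - title: 'Indochine 1952: Opération "Bruno"'
    published: 2013
YAML

puts render_games(games)
```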

Individual file processing

The YAML-formatted data for each game is loaded into a Ruby hash, and each game is explicitly named in the processor. Here's an example of the hash: {"game"=>{"title"=>"Viet Nam", "published"=>1965, "designer"=>"Phil Orbanes", "publisher"=>"Game Science", "comment"=>"Given its initial publication in 1965, the game seems prescient to how the war eventually played out."}}.
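For reference, Yaml along these lines loads into exactly that hash. Only the resulting hash comes from the article; the file layout shown here (including the folded `>-` block for the long comment) is an assumption.

```ruby
require 'yaml'

# Per-game Yaml whose contents match the example hash. The exact layout
# and file name are assumptions; only the resulting hash is known.
VIET_NAM_YAML = <<~YAML
  game:
    title: Viet Nam
    published: 1965
    designer: Phil Orbanes
    publisher: Game Science
    comment: >-
      Given its initial publication in 1965, the game seems
      prescient to how the war eventually played out.
YAML

data = YAML.safe_load(VIET_NAM_YAML)
p data['game']['title']     # prints "Viet Nam"
p data['game']['published'] # prints 1965 (an Integer, not a String)
```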

The hash is dereferenced appropriately to display the game's title, designer, and so on. This is standard stuff; the novelty is in how it's done in the Middleman template. Presenting each game this way also requires duplicating markup: each game has its own explicit section in the article, and each section has to be defined by hand. Here are two of them:

Viet Nam

Published: 1965

Given its initial publication in 1965, the game seems prescient to how the war eventually played out.

Indochine 1952: Opération "Bruno"

Published: 2013

Each of the games above was added to this file individually, a very tedious process that nevertheless leverages having the game data in an independent, parseable format.
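Stripped of its Haml markup, the explicit, per-game dereferencing amounts to something like the sketch below, written against the example hash from earlier in this section. The hand-written output mirrors the Viet Nam entry above; the point is that every additional game needs its own copy of this code.

```ruby
# The example hash from this section, as loaded from the Viet Nam file.
data = {
  'game' => {
    'title' => 'Viet Nam',
    'published' => 1965,
    'designer' => 'Phil Orbanes',
    'publisher' => 'Game Science',
    'comment' => 'Given its initial publication in 1965, the game seems ' \
                 'prescient to how the war eventually played out.'
  }
}

# Explicit, hand-written presentation for this one game. Showing another
# game means duplicating these lines with a different hash.
viet_nam = data['game']
section = [
  viet_nam['title'],
  "Published: #{viet_nam['published']}",
  viet_nam['comment']
].join("\n\n")

puts section
```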

Batched processing

Reading individual files is not difficult, but it is time-consuming. Each file has to be read by name into the processor (which is this article), with the presentation duplicated for each file. A better way is to read all the files at once and iterate over the collection. This allows a single, standard presentation, as demonstrated by the article-specific header styling.

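require 'yaml'
require 'pathname'
require 'active_support/core_ext/hash/indifferent_access' # with_indifferent_access comes from ActiveSupport
games_dir = Pathname.new('data/indochina/games')
game_files = Dir.glob(games_dir.join('*.yaml'))
games = game_files.map { |file| YAML.load_file(file).with_indifferent_access } # one hash per game file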

Once the games are loaded, the template iterates over the collection, emitting formatted output for each game. Standard stuff, really, but always fun to build.

Bad Moon Rising

Designer: Paul Rohrbaugh

Publisher: High Flying Dice Games

Indochine 1952: Opération "Bruno"

Published: 2013

Viet Nam

Published: 1965

Designer: Phil Orbanes

Publisher: Game Science

Comments: Given its initial publication in 1965, the game seems prescient to how the war eventually played out.
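The iteration that produces a listing like the one above can be sketched as follows, assuming the `games` array from the loader (each element a hash with a top-level `game` key). Plain hashes stand in for ActiveSupport's indifferent-access hashes so the snippet runs outside Middleman, and the field list, the title sort, and the `render_all` name are illustrative rather than the article's actual template.

```ruby
# Label for each optional field, in display order; missing fields are skipped.
FIELDS = [
  %w[published Published],
  %w[designer Designer],
  %w[publisher Publisher],
  %w[comment Comments]
].freeze

# One shared presentation applied to every game.
def render_section(game)
  lines = [game['title']]
  FIELDS.each { |key, label| lines << "#{label}: #{game[key]}" if game[key] }
  lines.join("\n\n")
end

# Hypothetical batch renderer: unwrap, sort by title, render, join.
def render_all(games)
  games.map { |data| data['game'] }
       .sort_by { |game| game['title'] }
       .map { |game| render_section(game) }
       .join("\n\n")
end

games = [
  { 'game' => { 'title' => 'Viet Nam', 'published' => 1965 } },
  { 'game' => { 'title' => 'Bad Moon Rising', 'designer' => 'Paul Rohrbaugh' } }
]
puts render_all(games)
```

Because every game goes through `render_section`, changing the presentation (for example, adding hanging indentation) happens in one place.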


And that's all there is to it. Adding a game is as simple as adding another game file; everything else is automatic. Future opportunities for batched processing include adding more games and further customizing the presentation. In particular, hanging indentation might be a great way to separate game-related text from the article text proper.

Summary

From Cursor/Sonnet 3.5:

This document explores the technical implementation of content generation using HAML and YAML, using Vietnam wargames as a practical example. It demonstrates the evolution from manual, individual file processing to automated batch processing, while examining the tradeoffs between different data storage approaches (single vs multiple YAML files). The article serves as both a technical reference and a practical demonstration of how to structure, process, and present data in a static site generator context, with particular attention to the benefits and challenges of each approach.

While it took over 9 years to ship this article, it was definitely worth finishing and publishing. The techniques here sit between completely static text and a fully database-driven application. As such, the approach retains flexibility while remaining lightweight.

At the time of publication, the article is generated from a MiddlemanApp build, which may change in the future.

