mov eax, rgb - Stephen Whittle

My current contract is for an SDK that uses asio as its asynchronous event loop, and standard practice for asio projects is for callbacks to receive a std::error_code as a status indicator. std::error_code is pretty useful, but it requires a fair bit of boilerplate if you want to create your own codes.

To ease that process, I've cooked up a code-generation step that can be driven by CMake to produce a nice header containing all the error categories and codes we return to consumers.

A custom std::error_code requires these components:

An enum class containing the numeric values for the error codes
A class derived from std::error_category, and a static instance of that class
- This class needs overrides for error_category::name() and error_category::message(); message() returns a string for a given numeric value in the category

In addition for convenience:

An implementation of make_error_code to simplify creating a std::error_code instance containing a specific numerical value and the correct error_category instance
An implementation of operator== to simplify comparisons between a std::error_code and a numeric value in the enum class
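Putting those pieces together for the HttpError category gives roughly the following. This is a minimal sketch, not the SDK's generated header: the numeric values, the http_error_category() accessor name and the function-local static are my assumptions. It also includes the std::is_error_code_enum specialization, which the standard library requires before it will implicitly convert the enum to a std::error_code.

```cpp
#include <string>
#include <system_error>
#include <type_traits>

// The enum class holding the numeric values. They start at 1 (an assumption)
// because an error_code whose value() is 0 conventionally means "no error".
enum class HttpError
{
    HttpNotInitialized = 1,
    HttpAlreadyInitialized,
    CannotOpenConnection
};

// Opt the enum in to implicit conversion to std::error_code.
namespace std
{
template <>
struct is_error_code_enum<HttpError> : true_type {};
}

// The category: overrides name() and message().
struct HttpErrorCategoryImpl : std::error_category
{
    const char* name() const noexcept override { return "HttpError"; }

    std::string message(int ErrorValue) const override
    {
        switch (static_cast<HttpError>(ErrorValue))
        {
            case HttpError::HttpNotInitialized:     return "HTTP service not initialized";
            case HttpError::HttpAlreadyInitialized: return "HTTP service already initialized";
            case HttpError::CannotOpenConnection:   return "Unable to connect to server";
            default:                                return "Unknown HttpError error";
        }
    }
};

// One shared category instance: std::error_category compares by address.
const std::error_category& http_error_category()
{
    static HttpErrorCategoryImpl Instance;
    return Instance;
}

// Found by argument-dependent lookup when an HttpError is converted
// to a std::error_code.
std::error_code make_error_code(HttpError e)
{
    return {static_cast<int>(e), http_error_category()};
}

// Convenience comparison between an error_code and an HttpError value.
bool operator==(const std::error_code& ec, HttpError e)
{
    return ec.category() == http_error_category() && ec.value() == static_cast<int>(e);
}
```

With all of that in place, `std::error_code ec = HttpError::CannotOpenConnection;` compiles, and `ec.message()` returns "Unable to connect to server".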

In particular, the implementation of error_category::message() is quite tedious if you want unique, i.e. helpful, strings attached to your error codes.


#include <string>
#include <system_error>

// HttpError is the enum class holding this category's numeric values.
struct HttpErrorCategoryImpl : std::error_category
{
    const char* name() const noexcept override { return "HttpError"; }

    std::string message(int ErrorValue) const override
    {
        switch (static_cast<HttpError>(ErrorValue))
        {
            case HttpError::CannotOpenConnection:
                return "Unable to connect to server";
            case HttpError::HttpAlreadyInitialized:
                return "HTTP service already initialized";
            case HttpError::HttpNotInitialized:
                return "HTTP service not initialized";
            // <Repeat n times for each individual status code>
            default:
                return "Unknown HttpError error";
        }
    }
};

This is just the boilerplate for the error category. It's not bad for a few codes, as in this example, but if you have many to do, the manual approach really, really sucks. Needless to say, after doing this exactly once, I decided to use code generation instead.

So for code generation to work, there are a few considerations:

The format of the input data
The format of the output template
Dependency tracking for both data and template files
What will actually transform the input data and template into the output, and how the build will use that output

Input data format

Given that the project already deals with REST APIs and HTTP, JSON was a natural choice for the input data format. It also allows for slightly more structure than, say, CSV or INI files.

JSON, however, remains flexible enough to let us specify the values in an error category in two ways: either hardcoding the numeric value of each error, or leaving it unspecified:

{
  "ErrorTypes": [
    {
      "Name": "HttpError",
      "Values": {
        "HttpNotInitialized": "HTTP service not initialized",
        "HttpAlreadyInitialized": "HTTP service already initialized",
        "CannotOpenConnection": "Unable to connect to server"
      }
    },
    {
      "Name": "ApiError",
      "Values": {
        "InvalidApiVersion": {
          "Code": 10003,
          "Description": "API version supplied is invalid."
        },
        "MissingAPIKey": {
          "Code": 11000,
          "Description": "api_key is missing from your request."
        }
      }
    }
  ]
}

Output file template format

I wasn't particularly worried about which templating engine I used, but I did want one that could express loops and more complex logic. For instance, with the above JSON, I wanted the output template to be able to handle either:

a key-value pair indicating the particular error code did not have a fixed numeric value
a JSON object containing a numeric value for the code as well as a text description

I also wanted to be able to say 'for each element in the error types, generate the following output' in a concise way.
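To make the two value forms concrete, here is a hypothetical C++ model of what a template has to do with them, whatever engine renders it. The names CodedValue, ErrorValue, enumerator_line, description_of and enum_declaration are mine, and a std::variant stands in for the parsed JSON; the real generator works on the JSON directly.

```cpp
#include <string>
#include <utility>
#include <variant>
#include <vector>

// One "Values" entry: either a bare description string, or an object with
// an explicit Code and Description.
struct CodedValue
{
    int Code;
    std::string Description;
};

using ErrorValue = std::variant<std::string, CodedValue>;

// Enumerator line for one entry: a fixed Code becomes an explicit
// initializer, a bare string leaves the numbering to the compiler.
std::string enumerator_line(const std::string& Name, const ErrorValue& Value)
{
    if (const auto* Coded = std::get_if<CodedValue>(&Value))
        return Name + " = " + std::to_string(Coded->Code) + ",";
    return Name + ",";
}

// The human-readable description, whichever form was used.
std::string description_of(const ErrorValue& Value)
{
    if (const auto* Coded = std::get_if<CodedValue>(&Value))
        return Coded->Description;
    return std::get<std::string>(Value);
}

// "For each value, generate this line": emits a whole enum class declaration.
std::string enum_declaration(const std::string& TypeName,
                             const std::vector<std::pair<std::string, ErrorValue>>& Values)
{
    std::string Out = "enum class " + TypeName + "\n{\n";
    for (const auto& [Name, Value] : Values)
        Out += "    " + enumerator_line(Name, Value) + "\n";
    Out += "};\n";
    return Out;
}
```

The point of a templating engine is to express exactly this kind of branching and looping in the template file rather than in hand-written C++.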

Dependency tracking for use as part of the build system

I also wanted something I could invoke as part of our internal build system (we use CMake), so that changing either the input data or the output template would automatically give me an updated header. This took a few steps:

Create a custom target for each file that requires code generation, which depends on the output file
Set the custom target as a dependency of the main project
Create a custom command to generate the output file
Mark the output file as GENERATED so that CMake doesn't expect it to exist at configure time

This means that when the main project is built, the custom target is pulled in as a dependency. Because that target depends on the output file, the custom command runs and regenerates the file whenever any of its inputs have changed.

This does have the side effect of adding some extra utility targets to the Target View of Visual Studio's CMake integration, but I view that as a fairly minimal downside.
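The rebuild behaviour comes from the DEPENDS list on the custom command, which the underlying build tool turns into a timestamp check. As a rough model only (real generators such as Ninja, Make or MSBuild implement this themselves), the rule looks like this with std::filesystem; needs_regeneration is a hypothetical name.

```cpp
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Regenerate when the output is missing, or when any input (the generator
// binary, the template, or the JSON data) was modified after the output.
bool needs_regeneration(const fs::path& Output, const std::vector<fs::path>& Inputs)
{
    if (!fs::exists(Output))
        return true;
    const auto OutputTime = fs::last_write_time(Output);
    for (const auto& Input : Inputs)
    {
        if (fs::last_write_time(Input) > OutputTime)
            return true;
    }
    return false;
}
```

This is also why the generator binary itself belongs in DEPENDS: rebuilding the generator should regenerate every header it produced.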

Code generator binary

Given the dependency tracking I wanted, the code generator itself needed to be something CMake could invoke to perform the data-and-template-to-output transformation.

CMake has its own code generation support (configure_file), but it is pretty limited in the use cases it supports and mostly consists of simple text substitution of CMake variable values.

Bearing in mind my other requirements (JSON input, loops and logic in templates), I eventually came across Inja, a template engine for modern C++ loosely inspired by Python's Jinja. It meets those needs and is easy both to integrate into an executable and to extend.

Importantly, Inja's template syntax is flexible without being overly complicated, and loop support is super simple too. This snippet iterates over the error values of a category in the above JSON, checking whether each value is a plain description string or an object containing a description and a numeric code:

## for Name, Value in ErrorType.Values
            case {{ ErrorType.Name }}::{{ Name }}:
{% if isObject(Value) %}
                return "{{ Value.Description }}";
{% else %}
                return "{{ Value }}";
{% endif %}
## endfor
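As a worked example, this is roughly what the output looks like for the ApiError category in the JSON above, written out by hand. Both values are objects, so each case takes the isObject(Value) branch. The loop only produces the case labels; the enum, the class skeleton and the default message are my assumptions about the rest of the template, modelled on the hand-written HttpError example.

```cpp
#include <string>
#include <system_error>

// Both entries carry an explicit Code, so the enumerators get initializers.
enum class ApiError
{
    InvalidApiVersion = 10003,
    MissingAPIKey = 11000,
};

struct ApiErrorCategoryImpl : std::error_category
{
    const char* name() const noexcept override { return "ApiError"; }

    std::string message(int ErrorValue) const override
    {
        switch (static_cast<ApiError>(ErrorValue))
        {
            // One case per entry in "Values", emitted by the ## for loop.
            case ApiError::InvalidApiVersion:
                return "API version supplied is invalid.";
            case ApiError::MissingAPIKey:
                return "api_key is missing from your request.";
            default:
                return "Unknown ApiError error";
        }
    }
};
```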

Inja uses the concept of an environment for code generation: you create an environment, set properties on it that control code generation, then ask it to process your input files and write the specified output file. Adding custom functions that can be invoked from templates is as simple as providing your own lambda to transform the input. For example, here's a simple 'nospaces' function that strips all spaces from its first argument:

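// Register a template function that removes every space from its first
// argument (std::remove comes from <algorithm>).
Env.add_callback("nospaces", [](inja::Arguments& args) {
    auto StringVal = args[0]->get<std::string>();
    StringVal.erase(std::remove(StringVal.begin(), StringVal.end(), ' '), StringVal.end());
    return StringVal;
});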

This would be invoked in the template file using {{ nospaces(some.json.field) }}.

So our code generator is a single-.cpp project that parses command-line arguments for the template, input data and output paths, and simply forwards them to Inja after adding some custom string-handling functions for CamelCasing identifiers and stripping spaces out of them.
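Here is a minimal, standard-library-only sketch of the non-Inja parts of such a generator. The names GeneratorArgs, parse_args and camel_case are hypothetical, and the post doesn't describe the exact CamelCase rules, so treating spaces, underscores and hyphens as word separators is an assumption. The argument order follows the COMMAND line in the CMake macro: template, data, output. In the real tool the string helpers would be registered with Env.add_callback like nospaces, and the three paths handed to the Inja environment (recent Inja versions provide Environment::write_with_json_file(template, data, output); check it against the version you pin).

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

struct GeneratorArgs
{
    std::string TemplatePath;
    std::string DataPath;
    std::string OutputPath;
};

// Expects: code_generator <template> <data> <output>
std::optional<GeneratorArgs> parse_args(int argc, const char* const argv[])
{
    if (argc != 4)
        return std::nullopt;
    return GeneratorArgs{argv[1], argv[2], argv[3]};
}

// Same behaviour as the "nospaces" callback: drop every space character.
std::string nospaces(std::string Value)
{
    Value.erase(std::remove(Value.begin(), Value.end(), ' '), Value.end());
    return Value;
}

// Capitalise the first letter of each word and drop the separators.
std::string camel_case(const std::string& Value)
{
    std::string Out;
    bool Capitalize = true;
    for (char Ch : Value)
    {
        if (Ch == ' ' || Ch == '_' || Ch == '-')
        {
            Capitalize = true;
            continue;
        }
        const auto UCh = static_cast<unsigned char>(Ch);
        Out += Capitalize ? static_cast<char>(std::toupper(UCh)) : Ch;
        Capitalize = false;
    }
    return Out;
}
```

For example, camel_case("cannot open connection") yields "CannotOpenConnection", which is a valid enumerator name for the generated header.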

Putting it all together

Making this process a little easier simply required a couple of CMake macros that can be used throughout the project. The following macros define a generated file and make sure it gets included in the build:


macro(add_generated_header_to_target Target HeaderTemplate Data OutputPath)
    # Make sure the code generator sources have been fetched.
    FetchContent_GetProperties(code_generator)
    if (NOT code_generator_POPULATED)
        set(FETCHCONTENT_QUIET OFF CACHE INTERNAL "" FORCE)
        FetchContent_Populate(code_generator)
    endif()

    message(STATUS "Registering ${OutputPath} as output from template ${HeaderTemplate} for target ${Target}")
    # Remember the output so mark_generated_headers can flag it later.
    set_property(GLOBAL APPEND PROPERTY generated_headers ${OutputPath})

    # Regenerate the header whenever the generator, template or data changes.
    set(CodeGenPath "${code_generator_BINARY_DIR}/code_generator.exe")
    add_custom_command(
        OUTPUT  ${OutputPath}
        COMMAND ${CodeGenPath} ${HeaderTemplate} ${Data} ${OutputPath}
        DEPENDS ${CodeGenPath} ${HeaderTemplate} ${Data}
        VERBATIM
    )

    # One utility target per header, pulled in as a dependency of Target.
    get_filename_component(FileNameOnly ${OutputPath} NAME_WLE)
    add_custom_target(generate_${FileNameOnly} DEPENDS ${OutputPath})
    add_dependencies(${Target} generate_${FileNameOnly})
    add_dependencies(${Target} code_generator)
endmacro()

macro(mark_generated_headers)
    get_property(GeneratedFiles GLOBAL PROPERTY generated_headers)
    foreach(OutFile IN LISTS GeneratedFiles)
        set_source_files_properties(${OutFile} PROPERTIES GENERATED 1)
    endforeach()
endmacro()

add_generated_header_to_target can be invoked anywhere in the project, but due to some CMake limitations at the time this solution was developed, the GENERATED property needed to be set at the project's top level. As a result, add_generated_header_to_target builds a list of all generated sources used throughout the project, and mark_generated_headers is called as the last line of the top-level CMakeLists.txt file to ensure it runs after all the generated headers are defined.

Data-driven code generation for C++ projects with CMake and inja
