Hello and welcome to the eighth instalment in the series where we build a parser for a domain specific language in Rust. I’d highly recommend going through the previous articles to make sense of what we’ll talk about today.

After a bit of back and forth with my mentor, we landed on moving the logic that imports other config files into the parser crate itself. Config files can reference other modules using import statements of the following form:

include some_other_module.swhkd

The grammar side is fairly simple to implement, we match the token “include” followed by a path to some other file.

import_file = { (!NEWLINE ~ ANY)+ }
import      = { "include" ~ import_file }

We’ll add this to the core set of variants so that we can actually match the expression.

content = _{ comment | mode | unbind | binding | import | NEWLINE }

Now we could very well blindly recurse through modules imported one after another but that comes with the subtle pitfall of an infinite recursion. Allow me to elaborate:

Assume you have a module called module_a that is the top-level or the root config file. Let’s say it imports another module, module_b. If module_b now imports module_a, our code enters an infinite recursion state, continuously evaluating these two modules forever.

Thus, the key takeaway is to implement book-keeping for the import paths so that they form a directional acyclic graph. This requires us to write some additional code for our parser.

First, let’s create a field in our parser struct that stores tha names of all the imports it has seen.

pub struct SwhkdParser {
    pub bindings: Vec<Binding>,
    pub unbinds: Vec<Definition>,
    pub imports: BTreeSet<String>,
    pub modes: Vec<Mode>,
}

Notice that the import field is a BTreeSet or a binary tree set. As you might know, adding duplicate elements to a set discards them, keeping only the unique elements behind. Although we could have used a HashSet here, a binary tree set is faster since it does not require a dedicated hashing function. Considering that the average setup would not wield even a thousand submodules, it’s sufficient to store the imports in a set.

We’ll create slightly separate implementations to differentiate between the root module and any submodules it imports. For now, let’s tackle the implementation for the submodules.

We create a method for the parser result called as_import for loading any of these aforementioned submodules.

fn as_import(input: ParserInput, seen: &mut BTreeSet<String>) -> Result<Self, ParseError> {
    // ...
}

The seen argument is how the caller tells the callee about what import paths it has already seen.

While processing import expressions, we keep adding the imports we have seen so far to a local BTreeSet.

let mut imports = BTreeSet::new();
for decl in contents.into_inner() {
    match decl.as_rule() {
        // other rules like bindings
        Rule::import => imports.extend(import_parser(decl)),
    }
}

Once all the tokens in the current config have been parsed, we can move on to adding the imports to the set of seen imports.

while let Some(import) = imports.pop_first() {
    if !seen.insert(import.clone()) {
        continue;
    }
    let child = Self::as_import(ParserInput::Path(Path::new(&import)), seen)?;
    imports.extend(child.imports);
    bindings.extend(child.bindings);
    unbinds.extend(child.unbinds);
    modes.extend(child.modes);
}

Although we recurse here, the base case when the set of seen elements already contains an import saves us from entering an infinite loop.

Once that’s done, we can return the newly parsed result.

Ok(SwhkdParser {
    bindings,
    unbinds,
    imports,
    modes,
})

Coming back to the root config, this is where we create the topmost set of seen imports that can be passed on to any Self::as_import calls.

pub fn from(input: ParserInput) -> Result<Self, ParseError> {
    let mut root_imports = BTreeSet::new();
    let mut root = Self::as_import(input, &mut root_imports)?;
    root.imports = root_imports;
    Ok(root)
}

We start off with an empty set and delegate the loading of the config to the as_import function, sending it a mutable reference to this (kind of) global source of truth, at least throughout the call stack of import related functions.

Lastly, for the sake of backwards compatibility, we assign the imports we have seen so far to the root parser result. This was the behavior present in the original parser. Note that the import fields in the submodules will all be empty since we popped them one by one in this loop:

while let Some(import) = imports.pop_first() {
    // ...
}

Okay, that’s all for now. See you soon!