How to parse CLI arguments in Zig

I am busy building a tool that will help me manage my IMAP mailboxes. Usually I would dive head-first into the logic and then figure out the CLI semantics later. This time I wanted to do it the other way around, and I wanted to parse arguments from scratch, without any libraries.

Why not use a library?

I am relatively new to Zig and this is a super easy way to add some learning while building something. Having a library at hand and docs to refer to usually hides a lot of cool details that could serve as a learning opportunity. I also just wanted to write code, not jump between library docs and my code.

Meta-programming and enumerations were a godsend

Let's take this small example:

Imagine your CLI accepts two commands, add and remove, and each command accepts a different set of flags.

Before Enums

Starting with a naive implementation, we will first get an ArgIterator, which is super handy for walking through the arguments one at a time.

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    // argsWithAllocator has better platform support
    var args = try std.process.argsWithAllocator(allocator);
    defer args.deinit();
}

Now the logic is:

Skip the first arg (the program name)
Check if the next arg
- exists
- is add or remove
Handle unknown commands

(We will also need to parse flags, but that's coming up in a sec.)

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    // argsWithAllocator has better platform support
    var args = try std.process.argsWithAllocator(allocator);
    defer args.deinit();

    // Skip the first arg (the program name)
    _ = args.next();

    while(args.next()) |arg| {
        if (std.mem.eql(u8, arg, "add")) {
            // Handle add command
            return std.debug.print("Add command\n", .{});
        } else if (std.mem.eql(u8, arg, "remove")) {
            // Handle remove command
            return std.debug.print("Remove command\n", .{});

        // Handle unknown command
        } else {
            // {s} is required to format a string slice
            return std.debug.print("Unknown command: {s}\n", .{arg});
        }
    }

    // Handle empty args (all code paths in while return)
    std.debug.print("usage goes here\n", .{});
}

That works, but it is not very elegant. Here are some of the reasons why:

We need to handle exhaustive checks

Whenever our program needs a new command, we have to add a whole new branch to the if statement. This is not a big deal, but it is not elegant either.

We don't have a central place to add a command and let the compiler fail if we don't handle it.

We need to handle the flags for each command separately

This creates quite a lot of repetition. Let's look at how we'll handle flags for the add command:

// Rest of the code omitted for brevity
if (std.mem.eql(u8, arg, "add")) {
  var first_none_optional: ?[]const u8 = null;

  while(args.next()) |flag| {
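      // startsWith also copes with an empty argument, where flag[0] would be out of bounds
      if (!std.mem.startsWith(u8, flag, "-")) {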
          first_none_optional = flag;
          break;
      }

      if (std.mem.eql(u8, flag, "--my-flag")) {
          // Process --my-flag
      }
  }

  return std.debug.print("Add command, {?s}\n", .{first_none_optional});
}

Here we are checking every arg after the command. As soon as we hit a non-flag, we store it, since that is the positional argument that may come after the command (e.g. the value in add value). If the command doesn't accept positionals, we can simply ignore it, and if none was given the variable stays null.

The meh part about this is that we need to repeat this logic for every command. We also need to check if the flag is valid for the command. We can only do it within the if because we need to know what the value of the command is to determine if the flag is valid.

Introducing Enums

Enums in Zig are a powerful way to represent a set of related values. They can be used to create a more structured and type-safe way to handle commands and flags. Let's create an enum with fields that are exactly the same as the strings we expect:

const CommandType = enum {
    add,
    remove
};

With this enum, we can get a few benefits:

We get rid of string comparisons and can switch on the enum value.
We can use meta-programming to dynamically cast the user input to the enum value.
We can use that same cast to determine if a command is unknown.
We can use a tagged union to represent the command and its flags.
We get exhaustive checks for free.
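
To make the casting points a bit more concrete, here is a minimal sketch of how std.meta.stringToEnum behaves on its own. It returns an optional: the enum value when the string matches a field name exactly, and null otherwise. The commandFromArg helper and the sample inputs are just made up for illustration.

```zig
const std = @import("std");

const CommandType = enum { add, remove };

/// Hypothetical helper: the matching command, or null if the string
/// is not the name of a field in CommandType.
fn commandFromArg(arg: []const u8) ?CommandType {
    return std.meta.stringToEnum(CommandType, arg);
}

pub fn main() void {
    // "rename" is not a field of CommandType, so it maps to null.
    for ([_][]const u8{ "add", "remove", "rename" }) |arg| {
        if (commandFromArg(arg)) |cmd| {
            std.debug.print("{s} -> known command .{s}\n", .{ arg, @tagName(cmd) });
        } else {
            std.debug.print("{s} -> unknown command\n", .{arg});
        }
    }
}
```

Note that the match is case-sensitive, so ADD would be treated as unknown here.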

Let's refactor the whole main.zig file:

const std = @import("std");

const CommandType = enum {
    add,
    remove,
    list
};

const Command = union(CommandType) {
    add: struct {
        flags: [][]const u8,
        // Optional: the user may not have passed a positional
        positional: ?[]const u8,
    },
    remove: struct {
        flags: [][]const u8,
        positional: ?[]const u8,
    },
    list: struct {
        flags: [][]const u8,
    },
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    // argsWithAllocator has better platform support
    var args = try std.process.argsWithAllocator(allocator);
    defer args.deinit();

    // Skip the first arg (the program name)
    _ = args.next();

    while(args.next()) |arg| {
      // Meta-programming to cast the user input to the enum value
      // Use failure as the way to check for unknown commands
      const cmd_type = std.meta.stringToEnum(CommandType, arg) orelse {
          return std.debug.print("Unknown command: {s}\n", .{arg});
      };

      // We can now use the type to generate
      // the correct tagged union command
      switch(cmd_type) {
          .add => {},
          .remove => {},
          .list => {},
      }
    }
}

Above we do not handle the cases or flags yet, but it illustrates that whenever we add a new command, we simply have to update our enum. The compiler will then tell us that the tagged union is missing a field for it and that our switch is no longer exhaustive. Nice!
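
As a small, isolated sketch of that exhaustiveness check: the describe function below is hypothetical (as are the description strings), and it switches on CommandType without an else prong. If a new field is added to the enum and not handled in the switch, the file stops compiling.

```zig
const std = @import("std");

const CommandType = enum { add, remove, list };

/// Hypothetical helper returning a short description per command.
fn describe(cmd: CommandType) []const u8 {
    // No `else` prong on purpose: an unhandled enum field is a
    // compile error, not a silent fall-through.
    return switch (cmd) {
        .add => "add an item",
        .remove => "remove an item",
        .list => "list all items",
    };
}

pub fn main() void {
    for ([_]CommandType{ .add, .remove, .list }) |cmd| {
        std.debug.print("{s}: {s}\n", .{ @tagName(cmd), describe(cmd) });
    }
}
```

Adding an else prong would make the switch compile again after a new command is added, but it would also throw away the check, so I would leave it out.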

Handling flags

Now we can use the same logic as above to parse out flags, but we only have to do it once and pass it to our tagged union structs. We'll use an ArrayList temporarily to collect flags and then pass them as owned slices.

const std = @import("std");

const CommandType = enum {
    add,
    remove,
    list
};

const Command = union(CommandType) {
    add: struct {
        flags: [][]const u8,
        // Optional: the user may not have passed a positional
        positional: ?[]const u8,
    },
    remove: struct {
        flags: [][]const u8,
        positional: ?[]const u8,
    },
    list: struct {
        flags: [][]const u8,
    },
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    // argsWithAllocator has better platform support
    var args = try std.process.argsWithAllocator(allocator);
    defer args.deinit();

    // Skip the first arg (the program name)
    _ = args.next();

    var flags = std.ArrayList([]const u8).init(allocator);
    defer flags.deinit();

    while(args.next()) |arg| {
      // Meta-programming to cast the user input to the enum value
      // Use failure as the way to check for unknown commands
      const cmd_type = std.meta.stringToEnum(CommandType, arg) orelse {
          return std.debug.print("Unknown command: {s}\n", .{arg});
      };

      var first_none_optional: ?[]const u8 = null;

      while(args.next()) |flag| {
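          // startsWith also copes with an empty argument, where flag[0] would be out of bounds
          if (!std.mem.startsWith(u8, flag, "-")) {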
              first_none_optional = flag;
              break;
          }
          try flags.append(flag);
      }

      // We can now use the type to generate
      // the correct tagged union command
      switch(cmd_type) {
          .add => {
              _ = Command{
                  .add = .{
                      .flags = try flags.toOwnedSlice(),
                      .positional = first_none_optional,
                  }
              };
              return;
          },
          .remove => {
              _ = Command{
                  .remove = .{
                      .flags = try flags.toOwnedSlice(),
                      .positional = first_none_optional,
                  }
              };
              return;
          },
          .list => {
              _ = Command{
                  .list = .{
                      .flags = try flags.toOwnedSlice(),
                  }
              };
              return;
          },
      }
    }
}

We have introduced a new list command to illustrate how commands can vary: it does not accept a positional. With this update, we only have to collect the given flags once and pass them to the Command, which can then handle and check them on an individual basis. For example via a method on the tagged union:

const Command = union(CommandType) {
    add: struct {
        flags: [][]const u8,
        // Optional: the user may not have passed a positional
        positional: ?[]const u8,
    },
    remove: struct {
        flags: [][]const u8,
        positional: ?[]const u8,
    },
    list: struct {
        flags: [][]const u8,
    },

    // Only `add` command can be silenced in this example
    pub fn shouldSilence(self: Command) bool {
        return switch (self) {
            .add => |add_cmd| {
                for (add_cmd.flags) |flag| {
                    if (std.mem.eql(u8, flag, "--silence")) {
                        return true;
                    }
                }
                return false;
            },
            .remove => false,
            .list => false,
        };
    }
};
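
To see the pieces working together without touching the real process arguments, here is a rough sketch that parses a plain slice of strings into a Command and then calls shouldSilence. The parse and hasFlag functions and the two error names are my own assumptions for this example. It also borrows the flags as a const sub-slice of the input instead of collecting them in an ArrayList, which keeps the sketch allocation-free.

```zig
const std = @import("std");

const CommandType = enum { add, remove, list };

/// Hypothetical helper: true if `name` appears in `flags`.
fn hasFlag(flags: []const []const u8, name: []const u8) bool {
    for (flags) |flag| {
        if (std.mem.eql(u8, flag, name)) return true;
    }
    return false;
}

const Command = union(CommandType) {
    add: struct { flags: []const []const u8, positional: ?[]const u8 },
    remove: struct { flags: []const []const u8, positional: ?[]const u8 },
    list: struct { flags: []const []const u8 },

    // Only the `add` command can be silenced in this example
    pub fn shouldSilence(self: Command) bool {
        return switch (self) {
            .add => |add_cmd| hasFlag(add_cmd.flags, "--silence"),
            .remove, .list => false,
        };
    }
};

/// Hypothetical parser: `args` is everything after the program name.
fn parse(args: []const []const u8) error{ MissingCommand, UnknownCommand }!Command {
    if (args.len == 0) return error.MissingCommand;
    const cmd_type = std.meta.stringToEnum(CommandType, args[0]) orelse
        return error.UnknownCommand;

    // Flags are the leading run of args that start with '-';
    // the first arg after them (if any) is the positional.
    const rest = args[1..];
    var n: usize = 0;
    while (n < rest.len and std.mem.startsWith(u8, rest[n], "-")) : (n += 1) {}
    const flags = rest[0..n];
    const positional: ?[]const u8 = if (n < rest.len) rest[n] else null;

    return switch (cmd_type) {
        .add => Command{ .add = .{ .flags = flags, .positional = positional } },
        .remove => Command{ .remove = .{ .flags = flags, .positional = positional } },
        .list => Command{ .list = .{ .flags = flags } },
    };
}

pub fn main() !void {
    const cmd = try parse(&.{ "add", "--silence", "inbox" });
    std.debug.print("silence: {}\n", .{cmd.shouldSilence()});
}
```

Keeping the parsing in a function that takes a slice also makes it easy to cover with test blocks, since no real command line is needed.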

If you want to be really strict, you can also check each command's flags and return an error such as error.UnknownOption. This way you can ensure that both the command and its flags are valid.
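
One possible shape for that strict check is sketched below. Everything in it is an assumption for the sake of the example: the allowedFlags and validateFlags functions, the per-command flag tables, and the --force flag are all hypothetical.

```zig
const std = @import("std");

const CommandType = enum { add, remove, list };

// Hypothetical tables of the flags each command accepts.
const add_flags = [_][]const u8{ "--silence", "--my-flag" };
const remove_flags = [_][]const u8{"--force"};
const list_flags = [_][]const u8{};

fn allowedFlags(cmd: CommandType) []const []const u8 {
    switch (cmd) {
        .add => return &add_flags,
        .remove => return &remove_flags,
        .list => return &list_flags,
    }
}

/// Returns error.UnknownOption as soon as one flag is not in the
/// table for the given command.
fn validateFlags(cmd: CommandType, flags: []const []const u8) error{UnknownOption}!void {
    outer: for (flags) |flag| {
        for (allowedFlags(cmd)) |allowed| {
            if (std.mem.eql(u8, flag, allowed)) continue :outer;
        }
        return error.UnknownOption;
    }
}

pub fn main() void {
    // --verbose is not in the add table, so validation fails.
    const flags = [_][]const u8{ "--silence", "--verbose" };
    validateFlags(.add, &flags) catch |err| {
        std.debug.print("add: {s}\n", .{@errorName(err)});
    };
}
```

Because allowedFlags switches on the enum without an else prong, a new command also forces you to decide which flags it accepts.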

Conclusion

This is a simple way to parse CLI arguments in Zig without any libraries. It uses enums, tagged unions, and meta-programming to create a structured and type-safe way to handle commands and flags. It's not the most advanced way to do it, and you should probably use a library once your args become really complicated. This was a fun discovery for me though, and it taught me some new things about Zig.
