Internal DSLs in PowerShell

(Harry is on a secret mission in uncharted space this week, so instead of the daily Morning Coffee post, you get a series of autoposted essays. This post combines some leftover learnings about Ruby from Harry’s Web 2.0 days with his recent obsession with PowerShell.)

My first introduction to the idea of internal DSLs was an article on Ruby Rake by Martin Fowler. Rake is Ruby’s make/build utility. Like build tools such as Ant and MSBuild, Rake is a dependency management system. Unlike Ant and MSBuild, Rake doesn’t use an XML-based language. It uses Ruby itself, which has huge benefits when you start doing custom tasks. In Ant or MSBuild, building a custom task requires you to use an external environment (batch file, script file or custom compiled task object). In Rake, since it’s just a Ruby file, you can start writing imperative Ruby code in place.

Here’s the simple Rake sample from Fowler’s article:

task :codeGen do  
  # do the code generation
end

task :compile => :codeGen do  
  # do the compilation
end

task :dataLoad => :codeGen do  
  # load the test data
end

task :test => [:compile, :dataLoad] do  
  # run the tests
end

The task method takes three pieces of information: the task name, the task dependencies, and a block containing the code to execute to complete the task. Ruby’s flexible syntax allows you to specify a task without any dependencies (:codeGen), with a single dependency (:compile => :codeGen), and with multiple dependencies (:test => [:compile, :dataLoad]).

So what would this look like if you used PowerShell instead of Ruby? How about this:

task codeGen {  
  # do the code generation
}

task compile codeGen {
  # do the compilation
}

task dataLoad codeGen {  
  # load the test data
}

task test compile,dataLoad {
  # run the tests
}

Not much different. PS uses curly braces for script blocks while Ruby uses do / end, but that’s just syntax. Since it lacks Ruby’s concept of symbols (identifiers that start with a colon), PS has to use strings instead. Otherwise, it’s almost identical. They even both use the # symbol to represent a line comment.

There is one significant difference. For tasks with dependencies, Rake uses a hash table to package the task name and its dependencies. The => syntax in Ruby creates a hash table. Since the hash table is the last argument in the method call, you can leave off the surrounding curly braces. The key of this single-item hash table is the task name, while the value is the task name or array of task names this task depends on. Rake is flexible here too: if you have only a single dependency, you don’t need to surround it in square brackets.
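To make the hash table packaging concrete, here is a minimal sketch of how a task-style method could unpack its argument. The helper name parse_task_arg is hypothetical and this is not Rake's actual implementation; it only illustrates that the braceless :compile => :codeGen arrives as a one-pair hash.

```ruby
# Hypothetical helper (not Rake's real code): normalize the argument that a
# Rake-style `task` call receives into [name, dependencies].
def parse_task_arg(arg)
  if arg.is_a?(Hash)
    # `:compile => :codeGen` arrives as the one-pair hash {:compile => :codeGen};
    # Hash#first returns that pair as [key, value]
    name, deps = arg.first
  else
    # A bare symbol means a task with no dependencies
    name = arg
    deps = []
  end
  # Array() wraps a lone symbol in an array and leaves an array untouched
  [name, Array(deps)]
end

p parse_task_arg(:codeGen)
p parse_task_arg(:compile => :codeGen)
p parse_task_arg(:test => [:compile, :dataLoad])
```

All three call shapes from the Rake sample end up as the same pair of a name and an array of dependencies, which is what the rest of a build tool would want to work with.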

In PowerShell, the hash table syntax isn’t quite so flexible: you have to surround the literal with @{ } and quote the string values. So using Rake’s syntax directly would result in something that looked like “task @{test = 'compile','dataLoad'} {…}”, which is fairly ugly. You don’t need to specify the square brackets on the array, but having to add the @{ } is a non-starter, especially since you wouldn’t have it on a task with no dependencies.

So instead, I thought a better approach would be to use PS’s support for variable-length argument lists. Since all tasks have a name, the task function is declared simply as “function task ([string] $name)”. This says there’s a function called task with one declared parameter called $name. (All variables in PS start with a dollar sign.) Any arguments passed to the function beyond those declared in its signature are available inside the function in the $args variable.

This approach does mean having to write logic in the function to validate the $args parameters. Originally, I specified all the parameters, so that it looked like this: “function global:task([string] $name, [string[]] $depends, [scriptblock] $taskDef)”. That didn’t work for tasks with no dependencies, since it tried to pass the script block in as the $depends parameter.

Here’s a sample task function that implements the task syntax shown above. It validates the $args input and builds a custom object that represents the task. (Note, the various PS* objects are in the System.Management.Automation namespace. I omitted the namespaces to make the code readable.)

function task([string] $name) {
  if (($args.length -gt 2) -or ([string]::isnullorempty($name))) {
    throw "task syntax: task name [<dependencies>] [<scriptblock>]"
  }
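  if ($args[0] -is [scriptblock]) {
    # task name { ... }: no dependencies, so don't inherit $depends from an outer scope
    $depends = @()
    $taskDef = $args[0]
  }
  elseif ($args[1] -is [scriptblock]) {
    # task name dep1,dep2 { ... }
    $depends = [object[]]$args[0]
    $taskDef = $args[1]
  }
  else {
    # task name [dep1,dep2]: dependencies only
    $depends = [object[]]$args[0]
    #if a script block isn't passed in, use an empty one
    $taskDef = {}
  }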

  $task = new-object PSObject
  $nameProp = new-object PSNoteProperty Name,$name
  $task.psobject.members.add($nameProp)
  $dependsProp = new-object PSNoteProperty Dependencies,$depends
  $task.psobject.members.add($dependsProp)
  $taskMethod = new-object PSScriptMethod ExecuteTask,$taskDef
  $task.psobject.members.add($taskMethod)
  $task
}

Of course, you would need much more than this if you were going to build a real build system like Rake in PowerShell. For example, you’d need code to collect the tasks, order them in the correct dependency order, execute them, etc. Furthermore, Rake supports other types of operations, like file tasks and utilities that you’d need to build.
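The collecting and ordering steps are less work than they might sound. As a rough, language-neutral illustration, here is a sketch written in Ruby. It is not Rake's actual implementation and not a PowerShell port; the names TASKS, LOG and run_order are made up for this example, and the task method here is a toy stand-in. It registers tasks in a hash and walks the dependencies depth-first so that each task runs after everything it depends on.

```ruby
# Hypothetical registry: task name => { deps: [...], action: block }
TASKS = {}
# Records which task bodies ran, in order (only for this demonstration)
LOG = []

# Toy stand-in for a Rake-style `task` method (assumption, not Rake's code)
def task(arg, &action)
  name, deps = arg.is_a?(Hash) ? arg.first : [arg, []]
  TASKS[name] = { deps: Array(deps), action: action || proc {} }
end

# Depth-first walk: a task is appended only after all of its dependencies,
# and a task that is already in the list is never added twice.
def run_order(name, order = [], visiting = [])
  return order if order.include?(name)
  raise "circular dependency at #{name}" if visiting.include?(name)
  raise "unknown task #{name}" unless TASKS.key?(name)

  visiting.push(name)
  TASKS[name][:deps].each { |dep| run_order(dep, order, visiting) }
  visiting.pop
  order << name
end

task :codeGen do
  LOG << "code generation"
end

task :compile => :codeGen do
  LOG << "compilation"
end

task :dataLoad => :codeGen do
  LOG << "test data load"
end

task :test => [:compile, :dataLoad] do
  LOG << "tests"
end

# Execute everything :test needs, dependencies first
run_order(:test).each { |name| TASKS[name][:action].call }
p LOG
```

The same shape should carry over to PowerShell: a hash table keyed by task name plus a recursive function that visits dependencies before adding the task itself.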

However, the point of this post isn’t to rebuild Rake in PS, but to show how PS rivals Ruby as a language for building internal DSLs. On that front, I think PowerShell performs beautifully.

I’m looking forward to using PowerShell’s metaprogramming capabilities often in the future.