I'm a Rockstar Developer!—Building a Rockstar Language Transpiler

Polcode Team

It's been a long-running joke in the developer community that a lot of companies want to hire "rockstar developers." But what does it even mean? Do I need to be able to play an instrument, or, worse yet, be able to sing?

Lately, someone had a really silly idea to actually create a Rockstar language, so that question finally has an answer.

The language’s GitHub repository has a pretty detailed specification. There are also a bunch of transpilers linked at the bottom of the README. This article is a write-up of a few lessons I’ve learned by building a Rockstar language transpiler.

In the first part, we’ll start with the basics: making the gem skeleton, filling it with some base code, and publishing it on RubyGems. The following parts will deal with more of the problems I’ve encountered while parsing the language. The tool I’m using, Parslet, is unfortunately a bit lacking in both examples and documentation, so I hope this write-up will be useful. All the example code is also available on GitHub.

Generating a Gem Skeleton

First things first: we should have some structure for our project. The easiest way of starting a new gem is to use Bundler to generate a basic skeleton of everything a gem requires to work. All you need to do is to run a simple command:

$ bundle gem kaiser-tutorial

This may ask you a few questions about licenses and so on, and will generate a project folder with all the required files. The next step is to open your .gemspec file and fill in all the missing information there, as Bundler won’t let you publish the gem if there are still TODOs in it. This information is later used on the RubyGems website, so it’s best to put something useful there.

Also, the .gemspec is the place where you should put all the dependencies.

spec.add_development_dependency "pry"
spec.add_dependency "parslet"

Take care to always keep development tools like pry or rspec as development dependencies, so that the gem doesn’t later pollute the user’s Gemfile and Gemfile.lock with unnecessary things. Even more importantly, it keeps the gem from pulling in things the user does need but in severely outdated versions, which has happened to me a few times.
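To see the difference between the two kinds of dependencies, here is a minimal sketch that builds a throwaway specification in memory and lists what ends up in each group. The name and version mirror this tutorial, rspec is added only for illustration, and nothing is written to disk.

```ruby
require 'rubygems'

# A throwaway, in-memory specification (values are for illustration only).
spec = Gem::Specification.new do |s|
  s.name    = 'kaiser-tutorial'
  s.version = '0.1.0'
  s.summary = 'Rockstar to Ruby transpiler (tutorial)'

  # Needed by anyone who installs the gem.
  s.add_dependency 'parslet'

  # Needed only while working on the gem itself.
  s.add_development_dependency 'pry'
  s.add_development_dependency 'rspec'
end

runtime_names     = spec.runtime_dependencies.map(&:name)
development_names = spec.development_dependencies.map(&:name)

puts "runtime:     #{runtime_names.join(', ')}"
puts "development: #{development_names.join(', ')}"
```

Only the runtime group is installed alongside the gem for its users; the development group matters just to people working on the gem.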

We also might have to tidy up the skeleton if we entered the gem’s name with a hyphen, because Bundler then generates an extra level of nesting (lib/kaiser/tutorial.rb and a Kaiser::Tutorial module). You do need to keep the main structure intact: a kaiser_tutorial.rb file in the lib directory, and a version.rb in the kaiser_tutorial subdirectory. We could keep the generated nesting, but this layout has one less level and is a bit more readable.

Don’t forget to update the require paths in the .gemspec, spec/spec_helper.rb and in the bin/console files.

Finally, running bundle inside the project’s folder will install all the dependencies.

Starting with Some Tests, Like a Proper Developer Should

Now that we have a skeleton, it’s time to flesh it out. We’ll use Parslet to help us write all the rules for parsing the Rockstar language files into actual Ruby code. To make sure we follow the specification, we’ll start with writing some tests and then develop the parser along so that they pass. We can be rock stars, but even rock stars have to follow the rules once in a while.

Let’s start with simple variable assignment. The specification has an example:

Tommy was a lean mean wrecking machine

This should assign 14487 to the variable Tommy. Since Parslet works through everything depth-first, it will first turn Tommy into a variable name and a lean mean wrecking machine into a number before it all gets put together. So let’s write a couple of tests for that.

RSpec.describe KaiserTutorial do
  context 'proper variable name' do
    it 'converts a word to a variable name' do
      expect(KaiserTutorial.transpile("Tommy")).to eq "tommy"
    end
  end
 
  context 'poetic number assignment' do
    it "assigns a number to a variable" do
      expect(KaiserTutorial.transpile("Tommy was a lean mean wrecking machine")).to eq "tommy = 14487"
    end
  end
end

This should be pretty self-explanatory to anyone familiar with RSpec testing—basically what we have here is an input string. We feed it into a KaiserTutorial.transpile method and expect the result to be a snippet of actual valid Ruby code.

These tests will obviously fail horribly if we try to run them right now, so it’s time to make them work.

Parsing the Rockstar Code with Parslet

The Parslet gem helps us transform text into a tree of names and values. Let’s start with the RockstarParser class first.

module KaiserTutorial
  class RockstarParser < Parslet::Parser
    rule(:proper_word) { match['A-Z'] >> match['A-Za-z'].repeat }
    rule(:proper_variable_name) { (proper_word >> (space >> proper_word).repeat).repeat(1).as(:variable_name) }
 
    rule(:string_as_number) { match['^\n'].repeat.as(:string_as_number) }
 
    rule(:poetic_number_keywords) { str('is') | str('was') | str('were') }
    rule(:poetic_number_literal) do
      (
        proper_variable_name.as(:left) >>
        space >> poetic_number_keywords >> space >>
        string_as_number.as(:right)
      ).as(:assignment)
    end
 
    rule(:space) { match[' \t'].repeat(1) }
    rule(:string_input) { poetic_number_literal | proper_variable_name }
    root(:string_input)
  end
end

This might seem a bit cryptic at first glance, so let’s explain it a bit.

Parslet works with something it calls atoms. Here we’re using things like match['A-Z'], which matches a single character against a regular-expression character class; str('is'), which matches a literal string; and >>, which chains atoms into a sequence as we move forward through the input. We also use | (alternative) and repeat (repetition).

This makes the rules pretty readable, unlike the pure regular expression form which we would need to use otherwise.
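For comparison, here is roughly what the same poetic assignment could look like as a single hand-written regular expression. This is only a sketch for contrast and not part of the gem; the constant and method names are made up for this example.

```ruby
# Hypothetical one-regex version of the poetic number assignment.
POETIC_ASSIGNMENT = /\A([A-Z][A-Za-z]*(?:[ \t]+[A-Z][A-Za-z]*)*)[ \t]+(?:is|was|were)[ \t]+([^\n]*)\z/

# Returns the two captured parts, or nil when the input is not an assignment.
def split_poetic_assignment(input)
  match = POETIC_ASSIGNMENT.match(input)
  return nil unless match

  { variable_name: match[1], string_as_number: match[2] }
end

p split_poetic_assignment('Tommy was a lean mean wrecking machine')
p split_poetic_assignment('tommy was here') # lowercase name, so no match
```

It works, but every new statement type means another dense pattern like this one, which is where named rules start to pay off.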

All this is put together in the rule definitions which are then used by the Parslet::Parser class. You also can nest the rules so that the resulting file is more readable. Finally, the root definition points at the starting point.

Let’s go through the above code line by line.

First, we handle the proper Rockstar variable name, which is one or more capitalized words. The first rule matches a single capitalized word, and the second matches several capitalized words separated by spaces.

rule(:proper_word) { match['A-Z'] >> match['A-Za-z'].repeat }
rule(:proper_variable_name) { (proper_word >> (space >> proper_word).repeat).repeat(1).as(:variable_name) }

The next rule is the poetic representation of a number. It’s called “poetic” because that’s what the Rockstar language calls it. Here in the parser class, it’s just everything up to the end of the line, stopping at the newline character.

rule(:string_as_number) { match['^\n'].repeat.as(:string_as_number) }

The next set of rules deals with the actual assignment.

First we declare the keywords our assignment can use: in Rockstar, Tommy was a rebel, Tommy is a rebel, and Tommy were a rebel are all the same expression, which assigns the number 15 to the variable Tommy. The poetic_number_literal is then declared in the next rule. As expected, it has a variable name, a keyword and then some value.

rule(:poetic_number_keywords) { str('is') | str('was') | str('were') }
rule(:poetic_number_literal) do
  (
    proper_variable_name.as(:left) >>
    space >> poetic_number_keywords >> space >>
    string_as_number.as(:right)
  ).as(:assignment)
end

The last three rules are just helpers to make everything easier.

We declare a space—or a bunch of spaces really, as the .repeat(1) atom makes it require at least one matched element—to simplify the rest of the rules.

rule(:space) { match[' \t'].repeat(1) }

Then we make both of our tests pass by listing the whole assignment and the bare variable name as alternatives. We use that rule as the root in the last line, so that the Parslet parser knows what it should look for in the input.

rule(:string_input) { poetic_number_literal | proper_variable_name }
root(:string_input)

Running the parser on its own with Tommy was a lean mean wrecking machine as the input returns the following hash, which matches what we wrote in the :poetic_number_literal rule. The numbers after the strings, @0 and @10, are not really important to us right now, as they just point at where in the input each string was matched.

{
  assignment: {
    left: { variable_name: "Tommy"@0 },
    right: { string_as_number: "a lean mean wrecking machine"@10 }
  }
}

That’s very cool, but also not exactly useful, yet.

Transforming a Tree into Code

To get some actually usable code from the above structure, we need to use another part of Parslet—the Transform class.

module KaiserTutorial
  class RockstarTransform < Parslet::Transform
    rule(variable_name: simple(:str)) { |context| parameterize(context[:str]) }
    rule(string_as_number: simple(:str)) { |context| str_to_num(context[:str]) }
    rule(assignment: { left: simple(:left), right: simple(:right) }) { "#{left} = #{right}" }
 
    def self.parameterize(string)
      string.to_s.downcase.gsub(/\s+/, '_')
    end
 
    def self.str_to_num(string)
      string.to_s.split(/\s+/).map { |e| e.length % 10 }.join.to_i
    end
  end
end

Since this class doesn’t parse the input and instead deals with the tree hashes from the Parser class, it’s even simpler and easier to read.

First, we take a variable name (which can be more than one word) and snake_case it. That simple(:str) is just Parslet’s way to express that this is a single value rather than a hash or an array. We take the |context| argument because a block with an argument runs in the scope where it was defined, so it can call the helper methods on our class; the matched values are then read from the context hash.

rule(variable_name: simple(:str)) { |context| parameterize(context[:str]) }

def self.parameterize(string)
  string.to_s.downcase.gsub(/\s+/, '_')
end

Next we make an actual number out of our set of words by taking the lengths of the words, doing a modulo 10 operation on them and joining them together. This means that metal will have a value of 5, while rock and roll will equal 434 and so on.

rule(string_as_number: simple(:str)) { |context| str_to_num(context[:str]) }

def self.str_to_num(string)
  string.to_s.split(/\s+/).map { |e| e.length % 10 }.join.to_i
end
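The arithmetic is easy to check by hand. The sketch below repeats the same conversion as a standalone method (the name poetic_number is made up for this example) and prints the digit contributed by each word.

```ruby
# Standalone copy of the conversion: each word contributes (length % 10) as one digit.
def poetic_number(text)
  text.split(/\s+/).map { |word| word.length % 10 }.join.to_i
end

# Show the digit each word contributes.
words = 'a lean mean wrecking machine'.split(/\s+/)
words.each { |word| puts "#{word.ljust(10)} -> #{word.length % 10}" }

puts poetic_number('a lean mean wrecking machine') # 14487, as in the specification example
puts poetic_number('rock and roll')                # 434
puts poetic_number('fun programming language')     # "programming" has 11 letters, so it contributes 1
```

The modulo is what keeps every word down to a single digit, however long it is.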

Then we take an :assignment hash that has one variable name and one value, and put it together into a valid Ruby string.

rule(assignment: { left: simple(:left), right: simple(:right) }) { "#{left} = #{right}" }

The last thing that I’m going to show here is the KaiserTutorial module itself, which calls all of our methods. It’s pretty short as well.

require 'parslet'
require 'kaiser_tutorial/rockstar_parser'
require 'kaiser_tutorial/rockstar_transform'

module KaiserTutorial
  def self.parse(input)
    KaiserTutorial::RockstarParser.new.parse(input)
  rescue Parslet::ParseFailed => failure
    puts failure.parse_failure_cause.ascii_tree
  end

  def self.transform(tree)
    KaiserTutorial::RockstarTransform.new.apply(tree)
  end

  def self.transpile(input)
    transform(parse(input))
  end
end

This should not require much explanation, so I’m just adding it here for completeness’ sake. And with this, all our tests should finally pass.

Running the Code

Obviously you shouldn’t just believe me that this will work, so we should test it. We can run the rspec command and see all the tests passing, but there’s also another way—running the console.

I like to update it first so that it runs Pry instead of the basic IRB: Pry is much more useful with code completion and command history. The bin/console file actually should include instructions on how to do this, but let’s show what goes into that file anyway:

#!/usr/bin/env ruby

require "bundler/setup"
require "kaiser_tutorial"
require "pry"

Pry.start

We can then try out our code easily like this:

$ bin/console
[1] pry(main)> KaiserTutorial.transpile 'Mary is a great developer'
=> "mary = 159"
[2] pry(main)>

As you can see, it works as expected.

Extending the Parser and Transformer

Since we have our first element of the Rockstar language working, we should continue by adding something a bit more advanced. First, let’s add a print statement, which in Rockstar looks like this: Shout Tommy. Once again, we’ll start with writing a test for this case:

context 'print statement' do
  it "prints a variable" do
    expect(KaiserTutorial.transpile("Shout Tommy")).to eq "puts tommy"
  end
end

Running this test fails, but in a bit of an unexpected way:

Failures:
 
  1) KaiserTutorial print statement prints a variable
     Failure/Error: expect(KaiserTutorial.transpile("Shout Tommy")).to eq "puts tommy"
 
       expected: "puts tommy"
            got: "shout_tommy"

As you can see, the parser thought we were declaring another variable name, which isn’t really what we want to do here, but it doesn’t know that yet. We should add a separate rule for this, and we have to add it to our :string_input rule so that the parser actually uses it.

rule(:print_function) do
  (str('Shout') >> space >> proper_variable_name.as(:output)).as(:print)
end

rule(:string_input) { print_function | poetic_number_literal | proper_variable_name }

One important thing to notice is that the ordering of the rules really matters; getting it wrong can later introduce all sorts of problems that are very hard to find and debug. We have to test for 'Shout ' before we start trying to make a variable name out of the input, or our print rule won’t ever get a chance to match. Now we can run the test again and see that it does:
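The effect of ordering can be reproduced without Parslet. The following plain-Ruby sketch imitates ordered choice with a list of patterns where the first match wins; the patterns and the first_match helper are invented for this illustration and are not Parslet API.

```ruby
# Simplified stand-ins for two of the parser rules (illustration only).
VARIABLE = /\A[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*\z/
PRINT    = /\AShout [A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*\z/

# Ordered choice: try each rule in turn and return the name of the first that matches.
def first_match(rules, input)
  found = rules.find { |_name, pattern| pattern.match?(input) }
  found && found.first
end

variable_first = [[:variable_name, VARIABLE], [:print, PRINT]]
print_first    = [[:print, PRINT], [:variable_name, VARIABLE]]

p first_match(variable_first, 'Shout Tommy') # the variable rule claims the whole input
p first_match(print_first, 'Shout Tommy')    # the print rule gets its chance first
```

Both patterns accept the input, so the only thing deciding the outcome is which one is tried first.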

1) KaiserTutorial print statement prints a variable
   Failure/Error: expect(KaiserTutorial.transpile("Shout Tommy")).to eq "puts tommy"
 
     expected: "puts tommy"
          got: {:print=>{:output=>"tommy"}}

We got our result from the parser, so we can write another transformer rule to make a Ruby statement out of it. Since our parser returns a :print hash that has only :output in it, the rule is really simple as well:

rule(print: { output: simple(:output) }) { "puts #{output}" }

And this is all we need to make our tests pass.

Time for Something Less Boring

Let’s finish the first part of the tutorial by making something a bit more advanced, so it’s less boring. So far all we’ve been doing is parsing single lines, but programs usually have more than one of these. We proceed as always.

First, we write a test:

context 'transpiles multiple lines' do
  let(:input) do <<~END
      Jane is a dancer
      World was spinning
    END
  end
 
  it 'makes multiple lines properly' do
    expect(KaiserTutorial.transpile(input)).to eq <<~RESULT
      jane = 16
      world = 8
    RESULT
  end
end

The squiggly heredocs (<<~END, with the tilde; that really is their official name, and they’ve been available since Ruby 2.3) strip the common leading indentation from the text up to the closing keyword, so we can indent it nicely and still treat it as if it didn’t have all these additional spaces inside.
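A quick way to see what the tilde changes is to compare it with the older dash form, which keeps the indentation. This is a minimal sketch with made-up variable names:

```ruby
# <<~ strips the common leading indentation; <<- keeps it.
squiggly = <<~RESULT
  jane = 16
  world = 8
RESULT

dashed = <<-RESULT
  jane = 16
  world = 8
RESULT

p squiggly # no leading spaces on either line
p dashed   # both lines keep their two leading spaces
```

Both forms keep the trailing newline after the last line, which is why the expected result in the test ends with one.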

This test will also obviously fail, but let’s look at the error output from Parslet for a moment, as it tells us what the parser tried to do and why it failed.

Expected one of [PRINT_FUNCTION, POETIC_NUMBER_LITERAL, PROPER_VARIABLE_NAME] at line 1 char 1.
|- Failed to match sequence ('Shout' SPACE output:PROPER_VARIABLE_NAME) at line 1 char 1.
| `- Expected "Shout", but got "Jane " at line 1 char 1.
|- Failed to match sequence (left:PROPER_VARIABLE_NAME SPACE POETIC_NUMBER_KEYWORDS SPACE right:STRING_AS_NUMBER) at line 1 char 9.
| `- Extra input after last repetition at line 1 char 17.
|    `- Failed to match [^\\n] at line 1 char 17.
`- Don't know what to do with " is a danc" at line 1 char 5.
    makes multiple lines properly (FAILED - 1)

First it tried to match a print function, but the input didn’t have ‘Shout’ at the beginning, so it moved on to the next rule, which is the assignment. The line 1 char 17 mentioned in the error message is the newline character, which we explicitly disallowed in the :string_as_number rule, so this fails as well.

Next, the parser tries to make a proper variable name out of what it got, but it would have to be “Jane Is,” not “Jane is,” so that’s not going to work either. The parser exhausted its rules, so it throws an error.

What do we need to make it work? Obviously, we need to handle lines that end with a newline character. We do that by extending our parser with a few additional rules:

rule(:eol) { match["\n"] }
rule(:line) { (string_input >> eol.maybe).as(:line) }
rule(:lyrics) { line.repeat.as(:lyrics) }
root(:lyrics)

We declare a newline character in the :eol rule to make the next rule look better. Then we take our :string_input rule and make it a :line that might or might not end with a newline character. Finally, we declare :lyrics as a set of repeated lines and make that our root rule. We also have to delete our previous root declaration, as there should only ever be one. If we run our new test again, we can see that the parser works correctly:

1) KaiserTutorial transpiles multiple lines makes multiple lines properly
   Failure/Error:
           expect(KaiserTutorial.transpile(input)).to eq <<~RESULT
             jane = 16
             world = 8
           RESULT
 
     expected: "jane = 16\nworld = 8\n"
          got: {:lyrics=>[{:line=>"jane = 16"}, {:line=>"world = 8"}]}

We can see here that the transformer already transformed the contents of the lines into Ruby, but it doesn’t yet know what to do with the :lyrics or the :line itself. Fortunately, that’s easy to fix by extending the RockstarTransform class a little:

rule(line: simple(:line)) { line }
rule(lyrics: sequence(:lines)) { lines.size > 1 ? lines.join("\n") + "\n" : lines.join }

The :line contents are already transformed, so there’s nothing else to do there other than simply output them. The :lyrics rule joins the lines together with newlines and adds an additional newline at the end if we got more than one line, so that we don’t have to go back and fix all of our earlier tests by adding an extra newline to their results.
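The joining logic is easy to try on its own. A minimal sketch, with the method name join_lines made up for this example:

```ruby
# Same joining logic as the :lyrics transform rule, as a standalone method.
def join_lines(lines)
  lines.size > 1 ? lines.join("\n") + "\n" : lines.join
end

p join_lines(['tommy = 14487'])          # a single line stays without a trailing newline
p join_lines(['jane = 16', 'world = 8']) # several lines get joined and a newline appended
```

The single-line branch is what keeps the earlier one-line tests passing unchanged.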

With this we’ve successfully made our very basic implementation of the Rockstar language. But we still have things to do.

A Common Variable and Its Assignment

Another type of variable in Rockstar is a common variable. It consists of one of the words a or the (and some others too, but we’re only going to use these two now), followed by a lowercase word. Sounds simple? It’s because it is. At the same time we’re going to introduce another type of assignment to make it more interesting.

Put the love into the heart

This should be parsed into the_heart = the_love. Oh, and by the way—we just simply described what we need our code to do and we almost got another test ready. Now we need to wrap it up with some RSpec. Isn’t TDD great?

context 'common variable name' do
  it 'converts words to a variable name' do
    expect(KaiserTutorial.transpile("the world")).to eq "the_world"
  end
end

context 'assignment' do
  it 'assigns variables' do
    expect(KaiserTutorial.transpile("Put the love into the heart")).to eq "the_heart = the_love"
  end
end

First off, let’s make some variables—we need to match one of our keywords with a space and then just a word with lowercase letters.

rule(:common_variable_name) do
  ((str('A ') | str('a ') | str('The ') | str('the ')) >> match['[[:lower:]]'].repeat).as(:variable_name)
end

We return this to the Transformer as a :variable_name, so it uses the same rule as it does for the proper variables we introduced before.

Next, we make a rule for the assignment—we can also reuse the Transform :assignment rule here, so it expects a :left side and a :right side, which we have to mark appropriately, as it’s reversed in the Rockstar assignment.

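# Assumed helper (not defined anywhere above): either kind of variable name.
rule(:variable_names) { common_variable_name | proper_variable_name }
rule(:basic_assignment_expression) do
  (str('Put ') >> variable_names.as(:right) >> str(' into ') >> variable_names.as(:left)).as(:assignment)
end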
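The swap of sides is the only subtle part here. As a plain-Ruby sketch (a regular-expression stand-in for illustration, not the Parslet rule itself; all names are made up), the value comes first in the Rockstar source but ends up on the right-hand side of the Ruby assignment.

```ruby
# Either kind of variable name: "the love" or "Tommy" (simplified stand-in).
NAME = /(?:(?:A|a|The|the) [[:lower:]]+|[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*)/
PUT_INTO = /\APut (?<value>#{NAME}) into (?<target>#{NAME})\z/

# Lowercase the name and replace whitespace with underscores.
def snake(name)
  name.downcase.gsub(/\s+/, '_')
end

def transpile_put(input)
  match = PUT_INTO.match(input)
  return nil unless match

  # Rockstar: Put <value> into <target>  =>  Ruby: <target> = <value>
  "#{snake(match[:target])} = #{snake(match[:value])}"
end

puts transpile_put('Put the love into the heart')
puts transpile_put('Put Tommy into the heart')
```

Naming the captures value and target, rather than first and second, makes it harder to get the direction wrong.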

Finally, we need to update our helper methods for the parser to include our new rules.

rule(:string_input) { print_function | basic_assignment_expression | poetic_number_literal | proper_variable_name | common_variable_name }

This final rule makes our tests pass again.

Publishing the Gem on RubyGems

We are now at a decent point of coding: we have a gem skeleton with our code in it, it does the things we expect it to do, and it even has passing tests. We can finish this post with how to actually release our code into the wild by putting it onto the RubyGems website.

Releasing a gem is a very simple process: all we need is a public git repository with our code and an account on the RubyGems website.

The next step is easy. Thanks to the Bundler gem, rake release will build the gem, create a release tag and push it to our repository on GitHub, and finally push the package to the RubyGems website. It should result in our gem being registered and visible on the website, except that here it will throw an error instead.

rake aborted!
ERROR: While executing gem ... (Gem::CommandLineError)
    Too many gem names (/Volumes/Projects/kaiser-tutorial/pkg/kaiser-tutorial-0.1.0.gem, Set, to, http://mygemserver.com); please specify only one

This error comes from the placeholder that Bundler leaves in the generated .gemspec (the allowed_push_host metadata, still set to its TODO text), which explicitly prevents pushing to RubyGems. We left it in on purpose; in a real gem we would remove it or set it properly. But we can still easily use our gem locally: we can run rake install:local instead and then just require 'kaiser_tutorial' wherever we want:

$ rake install:local
kaiser-tutorial 0.1.0 built to pkg/kaiser-tutorial-0.1.0.gem.
kaiser-tutorial (0.1.0) installed.

$ pry
[1] pry(main)> require 'kaiser_tutorial'
=> true
[2] pry(main)> KaiserTutorial.transpile('Rockstar is a fun programming language')
=> "rockstar = 1318"
[3] pry(main)>

And that’s it for the first part of this article. In the next one, we’ll take a look at some of the more advanced tricks and gotchas of parsing text with Parslet so that we can write and run a pretty simple program in Rockstar. We’ll also use the Thor gem to put together a CLI for our transpiler to make the code usable on its own. And of course, there will be tests for everything.
