Just-in-time, or JIT compilation, is no new development at all; languages like Java, Python, and the countless languages running on the .NET framework have leveraged JIT compilation for many years now.
It can be said that JIT compilation provides the best of the compiled and interpreted worlds: fast startup and high-level features, but execution at native speeds (after all, native code is what JIT’s produce).
If you are interested in creating your own language runtime, and are also planning on working in just-in-time features, there are two major ways to go about it. Either learn the architecture of the processor you are targeting, and emit opcodes through your own code, OR use a library like LLVM or GNU LibJIT as a portable backend.
My personal favorite is LibJIT. LLVM is also quite good, but it is more aimed at AOT-compilers, and is not as fast as LibJIT when it comes to being a "just-in-time compiler," likely because of all of its optimizations, SSA, etc.
Last night’s project was an attempt to give RinGy (really just a random programming language I found on the Internet) a fast implementation, especially given that its only implementation lies behind a 404, and has for a long time.
Simple interpreters for a single-byte instruction set are child’s play. On the other hand, compiling to C and then shelling out to the system’s compiler is far too slow, and effectively requires a separate build step. JIT it was.
The actual implementation was more complex than I had imagined, and spans a few hundred lines, so I won’t paste the whole thing here.
The JIT compilation logic is all contained in a C++ class named ringy::Compiler
.
In typical RAII style, I create a jit_context_t
in the constructor, and destroy it in the destructor (duh).
The interesting bit of the constructor is the jit_type_create_signature(...)
call. To create a function in LibJIT, as one can imagine, you must pass its signature. It’s possibly to mark the function as "on-demand," allowing true, lazy, "just-in-time" compilation, but I omitted this, as RinGy has no functions:
// Copyright (c) 2018, Tobechukwu Osakwe.//// All rights reserved.//// Use of this source code is governed by an// MIT-style license that can be found in the LICENSE file.#include <jit/jit-dump.h>#include <deque>#include "Compiler.h"ringy::Compiler::Compiler() { jitContext = jit_context_create(); // The entry point takes no parameters, and returns void. auto signature = jit_type_create_signature(jit_abi_cdecl, jit_type_void, nullptr, 0, 0); function = jit_function_create(jitContext, signature);}ringy::Compiler::~Compiler() { jit_context_destroy(jitContext);}
The next task was to implement each instruction as a series of LibJIT calls. Most were simple, like incrementing the memory pointer, increasing the value at the pointer, or printing the current value.
LibJIT is, honestly, a very convenient library to work with. There is an abundance of top-level functions that one can use to implement any expression or block imaginable. In this case, I made very frequent use of jit_insn_load_relative
, in order to obtain the value at the current memory pointer.
One thing to note is the call to jit_insn_call_native
; it could not be any simpler. When building a JIT compiler, it is extremely unlikely that the language will be freestanding. That is to say, there is almost certainly some standard library functionality you will need to invoke. LibJIT makes FFI quite easy, and only requires a single call:
if (ch == '.') { // Print the character representing the value at the current memory cell. // Get the signature for putchar. auto putcharReturnType = jit_type_int; auto putcharSignature = jit_type_create_signature(jit_abi_cdecl, jit_type_int, &putcharReturnType, 1, 0); // Fetch the character. auto currentElement = jit_insn_load_relative(function, memoryPointer, 0, jit_type_sys_char); jit_insn_call_native(function, "putchar", (void *) &putchar, putcharSignature, ¤tElement, 1, JIT_CALL_NOTHROW);} else if (ch == ',') { // Print the NUMERICAL character representing the value at the current memory cell. // Get the signature for putchar. auto putcharReturnType = jit_type_int; auto putcharSignature = jit_type_create_signature(jit_abi_cdecl, jit_type_int, &putcharReturnType, 1, 0); // Fetch the character. auto currentElement = jit_insn_load_relative(function, memoryPointer, 0, jit_type_sys_char); auto zeroChar = jit_value_create_nint_constant(function, jit_type_sys_char, '0'); auto addedChars = jit_insn_add(function, currentElement, zeroChar); jit_insn_call_native(function, "putchar", (void *) &putchar, putcharSignature, &addedChars, 1, JIT_CALL_NOTHROW);} else if (ch == 'q') { // Quits the program. Just return. jit_insn_return(function, nullptr);}
The most complicated part of the entire compiler was the implementation of the _
command, which inserts a 0
at the current pointer, and then moves every subsequent value to the right by 1. Mentally mapping this took a few minutes, but ultimately, it boiled down to:
- Compute the distance left until the memory pointer reaches the end of declared memory; this is the number of values to move.
- Create a local/temporary variable. I named it
i
, for the sake of convention. - Loop until
i
is equal to the earlier-calculated distance. On each iteration, movedata[i]
todata[i + 1]
.
LibJIT’s IR (intermediate representation), like Assembly, does not contain a native looping construct.
Those familiar with Assembly programming will already know the process necessary to produce a while
loop:
- Store a number that represents the total number of iterations somewhere (in x86, this is often the
ecx
register). - Create a label that performs the desired action, and then decrements the value of the counter.
- Compare the current value of the counter to
0
. If it’s not equal, jump back to the label that performs work. - If it IS equal to 0, however, then return from the current procedure.
This can be implemented as follows in LibJIT:
if (ch == '_') { // Inserts a memory set to 0 at the location of MP, shifting all values after it to the right by 1. // Insert the character 0. auto zeroChar = jit_value_create_nint_constant(function, jit_type_sys_char, 0); jit_insn_store_relative(function, memoryPointer, 0, zeroChar); // The end of the buffer is the original pointer, PLUS the size. auto end = jit_insn_add(function, memoryBuffer, memorySize); // The distance from the current pointer is the number of times we will need to loop. auto diff = jit_insn_sub(function, end, memoryPointer); // Maintain an index pointer, i. Initialize to 0. auto i = jit_value_create(function, jit_type_void_ptr); auto zeroPtr = jit_value_create_nint_constant(function, jit_type_void_ptr, 0); jit_insn_store(function, i, zeroPtr); // While `i` < `diff`, move the value at buf[i] to buf[i + 1]. jit_label_t loopProc = jit_label_undefined; jit_insn_label(function, &loopProc); auto currentValue = jit_insn_load_elem(function, memoryBuffer, i, jit_type_sys_char); auto one = jit_value_create_nint_constant(function, jit_type_void_ptr, 1); auto iPlusOne = jit_insn_add(function, i, one); jit_insn_store_elem(function, memoryBuffer, iPlusOne, currentValue); // Continue looping, ONLY IF i < diff. auto continueLooping = jit_insn_lt(function, diff, i); jit_insn_branch_if(function, continueLooping, &loopProc); // Otherwise, increment i. jit_insn_store(function, i, iPlusOne);}
The end result was pretty satisfactory; the Hello, world!
program runs in under a second (as it should).
However, I’m certain there is a logical error in my implementation somewhere, or I misunderstood the RinGy specification, as this is the output of the 99 Bottles
program:
ottledeoffbeerronntheewall,,ottledeoffbeer...ittaround,,ottledeoffbeerronntheewall...
I’m sure I’ll get the next one right.
The complete implementation can be found on Github (follow me!): https://github.com/thosakwe/RinGy