
Metaprogramming and decorators #16

LPeter1997 opened this issue Feb 8, 2022 · 0 comments
Labels
Language idea One single idea which may be a start of a design document


Introduction

One of the most fundamental features we should plan out is metaprogramming. Instead of incrementally introducing small features and only later providing the stronger tools that could implement them, metaprogramming could be the sandbox for new features, which can later get built-in forms in the language. Notice that this is the exact opposite of how languages usually progress.

In my opinion, most languages eventually need to support some kind of metaprogramming, or force their users to work around its absence. C#, for example, has Fody for IL weaving, and only introduced metaprogramming in the form of Source Generators in .NET 5, tying it to Roslyn instead of making it a language feature.

My key point: Metaprogramming should be an early, first-class citizen that embraces the creation of new language features. It should not be an answer to feature creep, where designers tried to plug the endless holes in the language and then got tired of it.

Uses of metaprogramming

Generally, metaprogramming gives us the ability to build abstractions outside the classic functional boundaries. This usually manifests as some kind of code generation, where we can inspect syntactic/semantic parts of the program and inject or modify source code.

Below I'd like to show a few real-world examples that could have been solved with metaprogramming instead of a language feature.

GetHashCode and Equals

Implementing GetHashCode and Equals on a C# class is tedious and repetitive. Sure, the IDE can generate the implementation, but who will make sure it stays up to date when someone adds a field? In my opinion, the new records feature is a good example: an earlier metaprogramming system could have completely eliminated the need for it.
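To make this concrete, here is a sketch in Python (rather than C#) of how a decorator-style metaprogramming feature could derive equality and hashing from the declared fields, so they never go stale. The `auto_eq_hash` name and the reliance on class-level annotations are assumptions of this example:

```python
# Hypothetical sketch: derive __eq__ and __hash__ from the declared fields,
# so adding a field automatically updates both implementations.
def auto_eq_hash(cls):
    fields = list(cls.__annotations__)  # assumes fields are annotated on the class

    def __eq__(self, other):
        if not isinstance(other, cls):
            return NotImplemented
        return all(getattr(self, f) == getattr(other, f) for f in fields)

    def __hash__(self):
        return hash(tuple(getattr(self, f) for f in fields))

    cls.__eq__ = __eq__
    cls.__hash__ = __hash__
    return cls

@auto_eq_hash
class Point:
    x: int
    y: int

    def __init__(self, x, y):
        self.x, self.y = x, y
```

A derive-style decorator in our language could do the same at compile time, with no runtime reflection.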

Null-checks

In C# we often want to null-check reference-type parameters. It's a very repetitive process:

public static void Foo(T1 a1, T2 a2, ...)
{
    if (a1 is null) throw new ArgumentNullException(nameof(a1));
    if (a2 is null) throw new ArgumentNullException(nameof(a2));
    // ...
}

A decorator could simply inject the null-checks at the start of the method. It would only need to know the parameter names and which parameters are reference types. Again, C# has decided to introduce a new operator (!!) for this. I believe this would be completely unnecessary with a proper metaprogramming solution.
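As a runtime approximation of such a decorator, here is a Python sketch; the `null_checked` name is hypothetical, and a compile-time version would inject the checks directly into the method body instead of wrapping the call:

```python
import inspect
from functools import wraps

def null_checked(func):
    # Inspect the signature once, at decoration time
    sig = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        # Reject any argument that is None, naming the offending parameter
        for name, value in bound.arguments.items():
            if value is None:
                raise ValueError(f"{name} must not be None")
        return func(*args, **kwargs)
    return wrapper

@null_checked
def foo(a1, a2):
    return (a1, a2)
```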

Memoization

When we are doing a heavy computation and the computation itself is side-effect free, we can simply memoize the results in a dictionary like so:

private static readonly Dictionary<(T1, T2, T3), Foo> fooResults = new();
public static Foo CalculateFoo(T1 a, T2 b, T3 c)
{
    // If the value is already computed, return that
    if (fooResults.TryGetValue((a, b, c), out var result)) return result;
    // Do some heavy computation
    // ... assign the computed value to result ...
    // Save the result for later
    fooResults.Add((a, b, c), result);
    return result;
}

Again, this is really repetitive and could be done with decorators. In fact, Python has a really beautiful solution for this. Currently, with C# Source Generators, we'd need partial types and an ugly API that generates a proxy function under a different name, passed in by the user as a string.
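The Python solution alluded to above is `functools.lru_cache`: a single decorator line memoizes a function by its argument tuple. The heavy computation below is just a stand-in:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache keyed by the argument tuple
def calculate_foo(a, b, c):
    # Stand-in for some heavy, side-effect-free computation
    return a * b + c
```

After the first call with a given argument tuple, subsequent calls become dictionary lookups.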

Others

There are a bunch of other uses for metaprogramming, like implementing Aspect Oriented Programming designs or a parser driven by a grammar written in an attribute, like rply.

Existing metaprogramming solutions

Here I'd like to summarize the different styles of metaprogramming out there. Knowing the existing solutions could help us settle on one for our language. The list is partially inspired by this Wikipedia article.

Macro systems

There are various macro systems out there, ranging from primitive to very sophisticated. The gist of macros is that, based on some pattern, input syntax is mapped to output syntax - so they are very much like functions operating directly on syntax.

Text-based macros

Text-based macros are a very primitive concept, and can be found in old languages like C. C macros have no semantic knowledge of the language, all they do is paste text based on their definition. For example, you can define the mathematical max function with macros like so:

#define max(a, b) ((a) > (b) ? (a) : (b))

These macros are not hygienic, meaning that you can't even use them to reliably define a variable for some intermediate value. They also repeat their arguments textually: max(i++, j) evaluates i++ twice.

Hygienic macros

Hygienic macros usually describe how they match their parameters on the syntax-tree level and map the output to another syntax tree description. The "hygienic" word puts the emphasis on only describing the shape, not worrying about things like name collisions. With hygienic macros you can safely introduce helper variables without the need for a complex mechanism to generate unique names.

For example, here is a macro for Rust that creates a Vec<T> (analogous to List<T> in .NET) from a sequence of values:

// macro_rules! <name> { ... } defines a new macro
macro_rules! vec {
    // This is a match arm, this means match 0 or more comma-separated expressions, binding the expression to the name 'x'
    // A macro could have multiple of these match arms, some use it to do macro-recursion for example
    ( $( $x:expr ),* ) => {
        // Open a block, as Rust blocks are expressions, and it will result in the constructed Vec<T>
        // This is part of the output!
        {
            // This is also part of the output, just creates a temporary vector
            let mut temp_vec = Vec::new();
            // This with the matching )* at the end expands the expressions bound to the name 'x'
            $(
                // For each expression $x we write a temp_vec.push(...) to the output
                temp_vec.push($x);
            )*
            // Writing the variable here means the block will evaluate to it
            temp_vec
        }
    };
}

// Usage
let my_vec = vec![1, 2, 3];

// Which expands into
let my_vec = {
    let mut temp_vec = Vec::new();
    temp_vec.push(1);
    temp_vec.push(2);
    temp_vec.push(3);
    temp_vec
};

Nim has decided to include macros in a more template-engine-like way, essentially allowing you to paste parts of the syntax tree into the output, much like string interpolation. This is almost like Rust procedural macros (see below), but with batteries included. An example:

# Repeats the passed in statement twice
macro twice(arg: untyped): untyped =
  result = quote do:
    `arg`
    `arg`

# Usage
twice echo "Hello world!"

# Expands into
echo "Hello world!"
echo "Hello world!"

These kinds of macros are a huge improvement over the text-based ones. Still, declarative Rust macros can get really ugly, and they don't cover many cases.

Procedural macros

Procedural macros essentially hand the input over to a function and let that function emit some other token stream as a substitution. Any computation can happen in between. They allow for a lot of flexibility, but they are usually cumbersome to develop. Rust supports them, but almost all procedural macros import the syn and quote crates to help out. Procedural macros receive and hand back a token stream, so for syntax-tree parsing users include syn, and for code templating they use quote.

There are a lot of variations in Rust, but I'll include just one derive-style macro, which helps the user implement a custom trait. The custom trait to implement:

trait TypeName {
    fn type_name(&self) -> String;
}

The derive-macro implementation:

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, DeriveInput};

#[proc_macro_derive(TypeName)]
pub fn derive(input: TokenStream) -> TokenStream {
    // `ident` holds the annotated type's name in syn's DeriveInput
    let DeriveInput { ident, .. } = parse_macro_input!(input);
    let name_str = ident.to_string();
    let output = quote! {
        impl TypeName for #ident {
            fn type_name(&self) -> String { String::from(#name_str) }
        }
    };
    output.into()
}

Usage:

#[derive(TypeName)]
struct Foo { ... }

I think these are the most flexible solutions as far as macro systems without semantic information go. They can be a bit cumbersome to write, but we can aid them with proper utilities (for example, shipping something like syn and quote with the feature). Rust managed to demystify things like how derive works under the hood with them, which meant the compiler had less magic to do and allowed users to add their own extensions.

Metaclasses

Metaclasses step into the territory of multi-level modeling. Metaclasses are to classes as classes are to instances; hence, the instances of metaclasses are classes. Sadly, I haven't worked with them enough to justify writing an example about them, but this page shows a really neat one in Python. They seem like an interesting concept, but using something like this would imply that we'd want to build heavily on classes or some similar language feature.
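Still, a minimal sketch gives a flavor of the mechanism; the `Registered` metaclass and its auto-registration behavior below are purely illustrative:

```python
# A metaclass intercepts class creation: type.__new__ builds the class object,
# and we can inspect or modify it before anyone uses it.
class Registered(type):
    registry = {}

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        Registered.registry[name] = cls  # e.g. auto-register every class
        return cls

# Plugin is an instance of Registered, so its creation ran through __new__ above
class Plugin(metaclass=Registered):
    pass
```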

Template metaprogramming

Template metaprogramming - or TMP for short - is essentially a higher level substitution mechanism in the compiler. One of the most notable languages with TMP is C++. C++ templates can be used to do a lot of fancy compile-time computations. The wiki page is full of great examples, so I won't include any here.

The Nim language also includes templates which are really close to their macros, operating on the AST. An example log template, that only prints while debugging:

# Definition
template log(msg: string) =
  if debug: stdout.writeLine(msg)

# Usage
log("Hello, world!")

The syntax is very similar to simple procedures.

Other solutions

  • The D programming language has a bunch of metaprogramming tools: templates, string mixins and template mixins
  • JAI macros are essentially functions that are guaranteed to be inlined in the AST. This means that all language features - like named arguments - automatically work on macros as well. The only extra feature they support is that they can access variables in their surrounding scope using a special notation.
  • JAI also allows the user to write event hooks for the compiler. The handlers receive the AST and can modify it.
  • LISP metaprogramming is beautiful in general, mainly because the language is homoiconic
  • C# Source Generators are very similar to what JAI does, but on the source-text level. In my opinion it's fairly inconvenient, as it litters user code with partial and doesn't allow modifying existing code, ruling out nice decorators. Working with strings is also very inconvenient, and the alternative of using Roslyn syntax trees is hard to integrate with the semantic information we might work with in a Source Generator. To put it simply, it lacks tooling and fatally limits itself.

Decorators

The concept of decorators as metaprogramming elements arises from the Decorator Pattern being implemented with some metaprogramming feature. Very often they allow us to decorate some entity in our program in a very declarative way - usually in the form of annotations/attributes. Since decorators are - in many cases - a desirable way to do metaprogramming, I believe it's worth a section on how different languages allow defining them.

Python

I believe one of the most beautiful ways to write decorators is coming from Python:

def uppercase_decorator(function):
    def wrapper():
        result = function()
        return result.upper()
    return wrapper

@uppercase_decorator
def say_hello():
    return 'Hello, World!'

print(say_hello()) # Prints HELLO, WORLD!

In Python decorators are nothing more than functions that are invoked with @ before the entity they wrap. They receive the wrapped entity as a parameter and return the transformed entity. This requires almost no new language concepts to be introduced and is fairly simple to understand.
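To make the desugaring concrete: the @ syntax is just sugar for reassigning the name after the definition. This sketch mirrors the example above with a hypothetical `shout` decorator:

```python
def shout(function):
    def wrapper():
        return function().upper()
    return wrapper

def say_hello():
    return 'Hello, World!'

# Writing @shout above the definition is equivalent to this reassignment:
say_hello = shout(say_hello)
```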

Rust

Rust chose to shove all metaprogramming - including decorators - into procedural macros. As mentioned before, this is really flexible, but really tedious to work with without the extra tools (syn and quote). Another problem is that there is no semantic information available at the time the macros are invoked. Still, it's powerful enough to implement the most common trait derives and many other things.

Derive macros in Rust are essentially procedural macros that are invoked with the annotation/attribute syntax over entities, and they append to the existing entity instead of modifying it. Procedural macros can of course modify their annotated entity, but then they are called attribute macros. This is a minor convenience distinction: derive macros could simply be written as attribute macros.

C#

Currently C# decorators are very limited in nature, because Source Generators do not allow for code modification, only addition. This means that while most derive-style features can be implemented, things like null-checks or memoization logic can't be simply injected. Since C# doesn't allow external definitions by default, this usually means that decorated elements have to be marked partial.

What we would like for the new language

Again, I'd like to emphasise that looking at how other languages have progressed, we should really get metaprogramming in early and as close to perfect as possible. We might be able to slow down feature-creep, or not have to implement certain features at all.

I think the prettiest decorator is from Python. The only extra language feature it requires is invoking with @, which would transform the entity after it. As much as I like it, I don't yet see a way to make this work without a lot of reflection and runtime overhead for a non-duck-typing language.

Rust and Nim seem to have powerful metaprogramming capabilities for source transformation and generation. They both rely on being able to quote and template source code, but Nim has the advantage of a built-in for it. If we go that way, we might also want it as a built-in feature, instead of ending up with a module that just reimplements the language syntax and that everyone includes anyway.

I'm not sure whether we want semantic information to be available to macros/decorators. It could certainly be handy, but the only such system I've seen so far is C# Source Generators.
