Version 0.3 of the Merg-E language specification: Attenuation, decomposition, and membranes

in HiveDevs, 8 days ago (edited)


This is the 8th post in this series on the 0.3 version of the Merg-E language spec. The previous posts are available here.

  • part 1 : coding style, files, merging, scoping, name resolution and synchronisation
  • part 2 : reverse markdown for documentation
  • part 3 : Actors and pools.
  • part 4 : Semantic locks, blockers, continuation points and hazardous blockers
  • part 5 : Semantic lexing, DAGs, prune / ent and alias.
  • part 6 : DAGs and DataFrames as only data structures, and inline lambdas for pure compute.
  • part 7 : Freezing
  • part 8 : Attenuation, decomposition, and membranes
  • part 10 : Scalars and High Fidelity JSON
  • part 11 : Operators, expressions and precedence.
  • part 12 : Robust integers and integer bitwidth generic programming
  • part 13 : The Merg-E ownership model, capture rules, and the --trustmebro compiler flag.

In this post we take a first look at attenuation, decomposition and membranes. For the 0.3 version this is limited and very preliminary, because the ambient DAG is still largely undefined, so some of the examples here are more conceptual than a reliable guide to the future. In short, whatever is in lang is part of the v0.3 spec, but what is in ambient is still just rough ideas.

DAG membranes

The strongest form of attenuation is done by the runtime using the idea of a membrane. We already discussed the membrane pattern when discussing freezing.

myblocker += my_function( membrane<nonfreezable> mydag );

This gives my_function a non-freezable version of mydag. The function can traverse the DAG, but whatever sub graph or leaf node it reaches is not freezable either, because the membrane extends to every sub graph and leaf node the function accesses through it.
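To make the transitive part concrete, here is a minimal Python sketch of the membrane idea. It is not how the Merg-E runtime implements membranes: Node, Membrane and the "freeze" operation name are stand-ins invented for illustration, and Python's underscore-prefixed attributes are only a convention, not real isolation.

```python
class Node:
    """Hypothetical stand-in for a DAG node: a value plus named children."""

    def __init__(self, value=None, **children):
        self.value = value
        self.children = dict(children)
        self.frozen = False

    def freeze(self):
        self.frozen = True


class Membrane:
    """Wraps a node and denies a fixed set of operations, transitively."""

    def __init__(self, node, denied):
        self._node = node
        self._denied = frozenset(denied)

    @property
    def value(self):
        return self._node.value

    def child(self, name):
        # Whatever is reached through the membrane is wrapped again,
        # so the restriction follows the traversal down to the leaves.
        return Membrane(self._node.children[name], self._denied)

    def freeze(self):
        if "freeze" in self._denied:
            raise PermissionError("freeze denied by membrane")
        self._node.freeze()


mydag = Node(config=Node(port=Node(8080)))
attenuated = Membrane(mydag, denied={"freeze"})
leaf = attenuated.child("config").child("port")
```

In this sketch the holder of the original mydag can still freeze any node. Only access that goes through attenuated is restricted, and that includes the leaf reached two levels down.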

A similar membrane is available to attenuate a DAG with read-only access:

myblocker += my_function( membrane<nonwritable> mydag );

Here writable refers to mutable leaf nodes and invocable mutable functions. A nonwritable DAG can still be pruned, grafted (ent) or linked. There are three membranes that address these separately.

myblocker += my_function( membrane<nonprunable> mydag );
myblocker += my_function( membrane<nonentable> mydag );
myblocker += my_function( membrane<nonlinkable> mydag );

The nontopworkable membrane combines nonprunable, nonentable and nonlinkable in one.

myblocker += my_function( membrane<nontopworkable> mydag );

And if we want to combine all of the above so far, nonmutable is our friend:

myblocker += my_function( membrane<nonmutable> mydag );

We also define two other kinds of attenuation membrane. The copyonmutate membrane acts as if there is no membrane at all: you can freeze, assign to scalars, prune, ent and link, but you are actually doing that on a shadow DAG. Everywhere the DAG is unchanged, the original DAG is used. It is like copy-on-write, but extended to all kinds of mutation.

mutable dag newdag = membrane<copyonmutate> olddag; 
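The shadow DAG idea can be sketched in Python as an overlay on a nested dict. This is only an illustration under assumptions: the CopyOnMutate class and its get, set and prune methods are hypothetical names, nested dicts stand in for sub graphs, and freezing, ent and link are left out.

```python
class CopyOnMutate:
    """Overlay on a nested dict: reads fall through, mutations stay in a shadow."""

    def __init__(self, base):
        self._base = base      # the original structure, never modified
        self._shadow = {}      # values written or wrapped through the overlay
        self._pruned = set()   # keys removed through the overlay

    def get(self, key):
        if key in self._pruned:
            raise KeyError(key)
        if key in self._shadow:
            return self._shadow[key]
        value = self._base[key]
        if isinstance(value, dict):
            # Wrap sub graphs lazily so nested mutation is shadowed too.
            wrapped = CopyOnMutate(value)
            self._shadow[key] = wrapped
            return wrapped
        return value

    def set(self, key, value):
        self._pruned.discard(key)
        self._shadow[key] = value

    def prune(self, key):
        self.get(key)  # raises KeyError if there is nothing to prune
        self._shadow.pop(key, None)
        self._pruned.add(key)


olddag = {"a": 1, "sub": {"b": 2}, "c": 3}
newdag = CopyOnMutate(olddag)
newdag.set("a", 10)
newdag.get("sub").set("b", 20)
newdag.prune("c")
```

After these calls olddag is untouched, while reads through newdag see the changed values and no longer see c.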

And finally there is the versionable membrane. This language feature is defined for 0.3 and will likely get a very minimal implementation, but the complete spec and implementation are meant for a later version of the draft language spec. The idea is to turn the DAG into a lightweight in-memory version control system for DAGs. Think git-light for DAGs in memory.

mutable dag newdag = membrane<versionable> olddag; 

Argument filters

Argument filters are building blocks for function and actor attenuation. While you can use them on their own, idiomatic usage combines them in an attenuation expression.
Let's focus on the idiomatic way:

proxy<int64> limitedCountFunction = attenuate myfunction {
  count attenuation.int64.assert<range>[1 31] discard;
  };

There is a lot to unpack here. Let's start with the proxy<int64> type. It tells us that limitedCountFunction is to be a proxy function that takes an int64 as its one and only argument. Now the attenuate expression. It takes an existing function, in this case myfunction, as its first argument, and builds an attenuation proxy function with one or more argument/assertion/action triplets on function arguments.
In this case there is one triplet, but multiple are possible if the function has multiple arguments. Every triplet consists of:

  • An argument name matching one of the argument names in the original callable definition, in our case count
  • An attenuation assertion, in our case attenuation.int64.assert<range>[1 31]
  • An action defining what to do if the assertion fails, in this case discard indicating the invocation is silently discarded

We need to dive a little deeper into the attenuation.int64.assert<range>[1 31] syntax. It too has three parts:

  • The type family of assertions tied to the argument type. In this case attenuation.int64.assert indicates we are looking up the assertion type from the int64 assertions.
  • The assertion type. The <range> part selects the specific range assertion.
  • The assertion config values. The [1 31] value tells us that all int64 values from 1 up to and including 31 are valid.

Now when the limitedCountFunction proxy gets called with a value of 17, then myfunction will get invoked, but if it is called with a count value of 0 or of 99, the invocation will be silently discarded.
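For readers who think in mainstream languages, the behaviour of this proxy can be approximated in Python. This is a sketch of the semantics described above, not of the Merg-E implementation. The attenuate and assert_range helpers are hypothetical names, and returning None stands in for a silently discarded invocation.

```python
import inspect


def assert_range(low, high):
    """Build an assertion that accepts values from low up to and including high."""
    return lambda value: low <= value <= high


def attenuate(func, filters):
    """Wrap func in a proxy; filters maps argument name -> (assertion, action)."""
    signature = inspect.signature(func)

    def proxy(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        for name, (assertion, action) in filters.items():
            if not assertion(bound.arguments[name]):
                if action == "discard":
                    return None  # the invocation is silently dropped
                raise ValueError(f"assertion failed for argument {name!r}")
        return func(*args, **kwargs)

    return proxy


calls = []


def myfunction(count):
    calls.append(count)


limitedCountFunction = attenuate(
    myfunction, {"count": (assert_range(1, 31), "discard")}
)
limitedCountFunction(17)  # forwarded to myfunction
limitedCountFunction(0)   # discarded
limitedCountFunction(99)  # discarded
```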

Currently only the following actions are defined:

  • discard : silently discard the invocation if the assertion fails.
  • raise : raise an error if the assertion fails.

We hope to add a branch option to this in the 0.4 or 0.5 version of the language spec, but this isn't certain yet.

If the action is raise, then the expression should end in an error scope code block.

proxy<int64> limitedCountFunction = attenuate myfunction {
  count attenuation.int64.assert<range>[1 31] raise;
  }!!{
    ...
    };

Note that this block only captures filter assertion errors.
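The scoping of that error block can be approximated in Python as follows. This is a sketch under assumptions: attenuate_with_error_scope is a hypothetical helper, and passing the rejected value to the error scope is our guess at how the argument stays visible inside the block.

```python
def attenuate_with_error_scope(func, assertion, error_scope):
    """Proxy for a one-argument function with a raise action and an error scope."""

    def proxy(value):
        if not assertion(value):
            # Only a failed filter assertion ends up in the error scope.
            return error_scope(value)
        # Errors raised by func itself are not routed to the error scope.
        return func(value)

    return proxy


def myfunction(count):
    if count == 13:
        raise RuntimeError("failure inside myfunction")
    return f"primary {count}"


def myfunction2(count):
    return f"fallback {count}"


limitedCountFunction = attenuate_with_error_scope(
    myfunction, lambda count: 1 <= count <= 31, myfunction2
)
```

A call with 99 lands in the error scope, while the RuntimeError for 13 comes from myfunction itself and passes straight through to the caller.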

To elaborate on why branch might or might not make it into a future language spec draft, the effect of a branch can almost be implemented by a raise as well:

proxy<int64> limitedCountFunction = attenuate myfunction {
  count attenuation.int64.assert<range>[1 31] raise;
  }!!{
    myfunction2(count);
    };

or

proxy<int64> limitedCountFunction = attenuate myfunction {
  count attenuation.int64.assert<range>[1 31] raise;
  }!!{
    blocker aw1;
    aw1 += myfunction2(count);
    await.all aw1;
    };

I hope the fact that there are two ways to do this illustrates why a branch might or might not be needed.

In addition to range, numeric scalars will have maxval and minval as one-sided ranges.

String and bytes filters

Note that regexes are not yet part of the 0.3 language spec. This part of the spec is expected to change once regexes become part of the language.

The first string attenuation we have is the prefix. This requires a string argument to always start with a given prefix.

proxy<string> limitedFsFunction = attenuate myfunction {
  filepath attenuation.string.assert<prefix>["~/.myApp/"] discard;
  };

Instead of requiring the caller to supply the prefix, we can attenuate by always adding it.

proxy<string> limitedFsFunction = attenuate myfunction {
  filepath attenuation.string.rewrite<prefixrelative>["~/.myApp/"];
  };

Note that there is no action defined because it is not an assertion.

In this example we used a file path, but plain string rewrites and assertions don't know enough about file paths to make this safe. We can use a less naive alternative.

proxy<string> limitedFsFunction = attenuate myfunction {
  filepath attenuation.string.rewrite<fspathrelative>["~/.myApp"];
  };

This option prevents for example ".." from being used, protecting from break outs.
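To see why the plain prefix check is not enough for paths, here is a small Python sketch. The function names are hypothetical, and the path-aware version is only our approximation of what fspathrelative is meant to guarantee. It works purely on the path string, so it does not expand "~" and knows nothing about symlinks.

```python
import posixpath


def naive_prefix_ok(path, prefix):
    """Plain string prefix check, as in the prefix assertion."""
    return path.startswith(prefix)


def fs_path_relative(root, relative):
    """Resolve relative under root and refuse anything that escapes root."""
    root = posixpath.normpath(root)
    if relative.startswith("/"):
        raise ValueError("absolute paths are not relative to the root")
    resolved = posixpath.normpath(posixpath.join(root, relative))
    if resolved != root and not resolved.startswith(root + "/"):
        raise ValueError("path escapes the root")
    return resolved
```

The string "~/.myApp/../.ssh/id_rsa" passes naive_prefix_ok with the prefix "~/.myApp/", even though it points outside the application directory. The path-aware version normalises first and rejects "../.ssh/id_rsa".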

We can do the same with suffixes.

proxy<string> limitedHostFunction = attenuate myfunction {
  hostname attenuation.string.assert<suffix>[".innuen.do"] discard;
  };

Again we can choose to rewrite instead of assert and work with relative values.

proxy<string> limitedHostFunction = attenuate myfunction {
  hostname attenuation.string.rewrite<suffixrelative>[".innuen.do"];
  };

So far we have been working naively with DNS names, without taking note of the relevant RFCs. We can instead opt for a less naive option:

proxy<string> limitedHostFunction = attenuate myfunction {
  hostname attenuation.string.assert<dnsnamerelative>["innuen.do"] discard;
  };

And the rewrite option with relative values:

proxy<string> limitedHostFunction = attenuate myfunction {
  hostname attenuation.string.rewrite<dnsnamerelative>["innuen.do"];
  };
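The difference between a string suffix and a DNS-aware check shows up as soon as the suffix lacks a leading dot. The Python sketch below compares whole labels instead of characters. The dns_name_within helper is a hypothetical name, and treating the zone itself as a match is an assumption on our part, not something the 0.3 spec states.

```python
def dns_name_within(hostname, zone):
    """True if hostname equals zone or is a subdomain of it, compared per label."""
    host_labels = hostname.rstrip(".").lower().split(".")
    zone_labels = zone.strip(".").lower().split(".")
    if any(label == "" for label in host_labels):
        return False  # empty labels such as in "a..b" are not valid
    if len(host_labels) < len(zone_labels):
        return False
    # Assumption: the zone itself counts as being within the zone.
    return host_labels[-len(zone_labels):] == zone_labels
```

A plain suffix check with "innuen.do" would accept "evilinnuen.do", while the label comparison rejects it and still accepts "API.Innuen.Do." with its mixed case and trailing dot.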

URI filters

Because Merg-E is to be a Web 3.0 DSL, URIs are a very important concept in the language, important enough to put in special features for attenuating string arguments if the string is a URI.

proxy<string> limitedUriFunction = attenuate myfunction {
  uri attenuation.string.assert<isuri> discard;
  uri.__scheme__ attenuation.string.assert<set>["https" "wss"] discard;
  uri.__host__ attenuation.string.assert<dnsnamerelative>[".innuen.do"] discard;
  uri.__userinfo__ attenuation.string.assert<set>["" "pibara"] discard;
  uri.__port__ attenuation.int16.assert<set>[443 8443] discard;
  uri.__path__ attenuation.string.assert<fspathrelative>["/api/v1/"] discard;
  uri.__query__.user attenuation.string.assert<set>["pibara" "test123"] discard;
  };

After asserting the string is a valid URI, a number of special URI parts become available for individual assertion. The above shows a few good examples of that. This way we can do surgical string based attenuation with function arguments.
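As a rough analogy, the same per-part checks can be written in Python with urllib.parse. This is a sketch, not a definition of the Merg-E semantics. How a missing port or a missing user query parameter should be treated is not stated above, so both choices below are assumptions and are marked as such in the comments. The host check is a plain suffix match and percent-encoded path segments are not decoded.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit

ALLOWED_SCHEMES = {"https", "wss"}
ALLOWED_USERINFO = {"", "pibara"}
ALLOWED_PORTS = {443, 8443}
ALLOWED_QUERY_USERS = {"pibara", "test123"}


def uri_allowed(uri):
    """Apply the scheme, host, userinfo, port, path and query checks in turn."""
    try:
        parts = urlsplit(uri)
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return False
    if not parts.scheme or not parts.netloc:
        return False  # rough stand-in for the isuri assertion
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    host = (parts.hostname or "").rstrip(".")
    if not host.endswith(".innuen.do"):
        return False
    if (parts.username or "") not in ALLOWED_USERINFO:
        return False
    # Assumption: a missing port means the default 443 for https and wss.
    if (443 if port is None else port) not in ALLOWED_PORTS:
        return False
    path = posixpath.normpath(parts.path or "/")
    if path != "/api/v1" and not path.startswith("/api/v1/"):
        return False
    # Assumption: the user parameter is optional, but every supplied
    # value must be in the allowed set.
    users = parse_qs(parts.query, keep_blank_values=True).get("user", [])
    return all(user in ALLOWED_QUERY_USERS for user in users)
```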

Annotations for ambient attenuation

Not everything can be done with string based attenuation. Often the check would come too early, or it would require asynchronous operations that we want to keep out of string assertion logic.
But how then do we communicate further attenuation, for example for our URI example above?

proxy<string> limitedUriFunction = attenuate myfunction {
  uri attenuation.string.assert<isuri> discard;
  uri.__scheme__ attenuation.string.assert<set>["https" "wss"] discard;
  uri.__host__ attenuation.string.assert<dnsnamerelative>[".innuen.do"] discard;
  uri.__annotations__ annotation.required<hasdnssec>[True] discard;
  uri.__annotations__ annotation.required<x509authority>["WR3"] discard;
  };

We do that by adding annotations to our arguments. These annotations will be available to the ambient part of the runtime, for example after doing DNS lookups, or after connecting to a server with TLS.
In the above example we annotate the URI to say that the supplied hostname should resolve and validate with DNSSEC, and that the server's x509 certificate should be signed by Google's WR3 CA.

Revocability and quotas

The final least authority patterns we need to look at are patterns for revocability and quotas. For this we have a special type, the caretaker. The caretaker in Merg-E is a type of tuple that, on top of the classical caretaker pattern for revocation, can optionally act as a quota.

caretaker revokeMyFunction = quotacaretaker myfunction;
proxy<string> revokableFunction = revocable revokeMyFunction;

or

caretaker revokeMyFunction = quotacaretaker myfunction 8;
proxy<string> revokableFunction = revocable revokeMyFunction;

Revocation is not freezing with the null function as we discussed in the post about freezing.

The caretaker itself is a callable, and if we call it, we turn the revokable function into a null function like we did in the freezing post.

revokeMyFunction();

Or, if we need to, asynchronously:

blocker bl1;
bl1 += revokeMyFunction();
await.all bl1;

If we use the caretaker as a quota, we don't need to revoke it ourselves.

The line

caretaker revokeMyFunction = quotacaretaker myfunction 8;

tells the caretaker pattern to revoke itself after 8 invocations of the revokable function.
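The caretaker pattern itself is language independent, so a Python closure can illustrate it. This is a minimal sketch: quota_caretaker here is a hypothetical Python function that returns the revoker and the proxy as a pair, and a call on a revoked proxy simply returns None to mimic a null function.

```python
def quota_caretaker(func, quota=None):
    """Return (revoke, proxy); proxy forwards to func until it is revoked."""
    if quota is not None and quota < 1:
        raise ValueError("quota must be at least 1")
    state = {"target": func, "remaining": quota}

    def revoke():
        state["target"] = None

    def proxy(*args, **kwargs):
        target = state["target"]
        if target is None:
            return None  # revoked: behaves like a null function
        if state["remaining"] is not None:
            state["remaining"] -= 1
            if state["remaining"] == 0:
                revoke()  # the quota is used up by this invocation
        return target(*args, **kwargs)

    return revoke, proxy


log = []

# Quota use: the proxy revokes itself after two invocations.
_, limited = quota_caretaker(log.append, 2)
limited("a")
limited("b")
limited("c")  # ignored, the quota is exhausted

# Manual use: the holder of revoke decides when access ends.
revoke, revokable = quota_caretaker(log.append)
revokable("x")
revoke()
revokable("y")  # ignored, access was revoked
```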

Coming up

In this post we looked at the principles of attenuation and decomposition as defined for the 0.3 version of the Merg-E language specs. These language features help make Merg-E a more effective least authority language.

I'll need a few more posts to cover parallelism models and a few other topics, possibly including how to handle sensitive data in constants, a subject that I'm currently seeking community feedback on.
