
Reowolf 1.2: Release Notes

The Reowolf Team — Fri, 17 Dec 2021 13:36:44 +0000

We are happy to release this milestone of the Reowolf project: Reowolf version 1.2. This is an alpha release. The milestone improves the concurrency of the Protocol Description Language (PDL) run-time interpreter. This post summarizes the improvements, and further lays out the milestones we will be working on next. This release is sponsored by the Next Generation Internet fund.

For this release, we have migrated to GitLab as our public-facing repository. GitLab includes an issue tracker that is open for alpha users to submit bug reports and feature requests. The release tag is v1.2.0. The software is licensed under the MIT license.

The following aspects of the language have been improved, and in the sections below we demonstrate their functionality by small examples:

Decentralized synchronization implementation. In previous versions, the PDL run-time interpreter would communicate among all components to synchronize them, and in doing so required a central leader. We have improved this aspect of the implementation: the run-time now dynamically discovers the neighboring components with which synchronization takes place, so no single component acts as the central leader any longer. This improvement allows different components in the system to run at different speeds, and allows the composition of slow components (e.g. in embedded systems) with fast components (e.g. running on dedicated high-performance hardware).
Multi-threaded run-time implementation. In previous versions, all components ran on a single processor. We have adapted the run-time interpreter of PDL so that it distributes the execution of components over multiple threads, and can influence the scheduling of components using information provided in the protocol.
Multiple communications can now take place over a single port between synchronizations. In previous versions, either no datum or exactly one datum could be communicated over a port between synchronizations. This change makes it possible to let the implementations of data transmission and synchronization between components race.
We changed the syntax of PDL to more directly express the branches in speculative executions. Since speculative execution in some cases leads to expensive but discarded computations, it seems sensible to make explicit in a PDL specification where this happens.
We have performed some initial experimentation with error handling in sync blocks. Components might encounter runtime errors, which might make them unavailable for future interactions using the ports they owned. The components might also be programmed in such a way that the synchronization algorithm cannot find a satisfactory global behaviour. In all of these cases we need to be able to tell the programmer what went wrong.

Furthermore, this release has fixed some minor bugs that were present in previous releases. The final section shows the roadmap ahead, explaining what milestones will be worked on next, and our plans for the future.

Decentralized synchronization

The PDL run-time interpreter of versions 1.0 and 1.1 made use of a centralized synchronization algorithm. In this release we have replaced it with a decentralized synchronization algorithm. This section explains in more detail what this means for the execution of protocol programs written in PDL.

The centralized synchronization algorithm assumed authority over all of the running components: all of the components were executed until reaching the point in their sync block where a decision had to be made about the speculative branch that could be committed to memory. This made it a lot simpler to provide an initial implementation, but has a downside in that unrelated components now have their execution speed linked to one another. The region over which synchronization had to be achieved (the so-called sync region) spanned the entire constellation of components.

The decentralized synchronization algorithm instead discovers the synchronous regions based on the ports that are involved in a synchronous code block. So if two components communicate with one another synchronously, then they belong to a single synchronous region. Consequently, those components will have to wait on one another until consensus on their shared behavior has been achieved. Conversely, if those two components do not share any ports to communicate with, then they can run independently of one another.
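To make the notion of a synchronous region concrete, here is a minimal Python sketch. It is not the runtime's algorithm, which discovers regions without any global view; it only computes the result one would expect, by grouping components that share a channel with a union-find structure. The component names, the channel names and the function `sync_regions` are hypothetical.

```python
from collections import defaultdict

def sync_regions(port_usage):
    """Group components into regions: two components end up in the same
    region when they use a common channel inside their sync block."""
    parent = {component: component for component in port_usage}

    def find(component):
        while parent[component] != component:
            parent[component] = parent[parent[component]]  # path halving
            component = parent[component]
        return component

    # Collect, per channel, the components that use it.
    users = defaultdict(list)
    for component, channels in port_usage.items():
        for channel in channels:
            users[channel].append(component)

    # Components sharing a channel are merged into one region.
    for components in users.values():
        for other in components[1:]:
            parent[find(other)] = find(components[0])

    regions = defaultdict(set)
    for component in port_usage:
        regions[find(component)].add(component)
    return sorted(sorted(region) for region in regions.values())

# Hypothetical constellation: two workers feed a merger, a logger stands alone.
usage = {
    "worker_a": {"chan_a"},
    "merger": {"chan_a", "chan_b"},
    "worker_b": {"chan_b"},
    "logger": set(),
}
regions = sync_regions(usage)
print(regions)
```

In this sketch the two workers and the merger form one region because they are transitively connected through channels, while the logger can run independently.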

Planned for the next release is a mechanism for more control on how synchronous regions are constructed. If we consider the following composite component, then we see that there are two workers which are performing some kind of asynchronous work. We may assume that each time they’ve finished some work, they will send the result over their out<u32> port.

composite network(out<u32> result) {
    channel output_a -> input_a;
    channel output_b -> input_b;
    new worker(output_a); // create a worker
    new worker(output_b); // create another worker
    new merger(input_a, input_b, result); // merge the streams produced by the workers
}
primitive merger(in<u32> a, in<u32> b, out<u32> c) {
    while (true) {
        sync {
            if (fires(a)) {
                auto m = get(a);
                put(c, m);
            } else if (fires(b)) {
                auto m = get(b);
                put(c, m);
}   }   }   }

It is perfectly reasonable to block the execution of the workers if there is nobody to receive their results on the other end of the merger. But if there is a component rapidly consuming results from the result port, then the two workers should be able to run independently to produce results on input_a and input_b as fast as possible. This is one of the goals of Reowolf 1.3.

Multi-threaded run-time

So far, the Reowolf runtime has been implemented to run on a single thread to simplify development. With this release we’ve moved to a multi-threaded runtime. In essence, it is a green-thread scheduler. Because we are working towards an implementation of the Reowolf runtime that operates within the context of the operating system kernel, the scheduler has been kept simple.

The scheduler makes sure that when components are solely waiting on input, then they will not be scheduled until they receive messages allowing them to continue executing. Likewise, when a component has finished its execution (cleanly, or due to an error) the peers are notified such that they are no longer able to send messages to components that cannot receive them.

As a simple example, if we wish to execute the following code:

primitive producer(out<u32> output) {
    sync put(output, 1);
    sync put(output, 2);
}

primitive consumer(in<u32> input) {
    sync {
        auto value = get(input);
        assert(value == 1);
    }

    // Exit without receiving the second value
}

Then we are greeted by the following error message:

ERROR: attempted to 'put' on port that is no longer owned
 +-  at 4:10
 | 
 |      sync put(output, 2);
 |           ~~~~~~~~~~~~~~

 +-  Stack trace:
 | component producer:4

Because we are still running in user-space, the scheduler is implemented such that it will exit if it is certain that all components have finished their work and components can no longer be created through the API.
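The scheduling behaviour described in this section can be illustrated with a small single-threaded Python sketch. It is an assumption-laden model, not the Reowolf scheduler: components are Python generators, messages are buffered rather than synchronized, and no error is reported for undelivered messages. It only shows that a component waiting on input is parked until a message arrives, and that the loop exits once nothing is runnable. The function `run` and the request tuples are hypothetical.

```python
from collections import defaultdict, deque

def run(components):
    """components: name -> generator yielding ("send", port, value) or ("recv", port)."""
    ready = deque(components)      # names of runnable components
    parked = {}                    # port -> component waiting for input on it
    inbox = defaultdict(deque)     # port -> buffered messages
    resume_value = {}              # component -> value handed over on resume
    trace = []                     # order in which components were scheduled
    while ready:                   # nothing runnable left: the scheduler exits
        name = ready.popleft()
        trace.append(name)
        try:
            request = components[name].send(resume_value.pop(name, None))
        except StopIteration:
            continue               # finished components are never rescheduled
        if request[0] == "send":
            _, port, value = request
            inbox[port].append(value)
            ready.append(name)
            if port in parked:     # wake the receiver that was waiting
                receiver = parked.pop(port)
                resume_value[receiver] = inbox[port].popleft()
                ready.append(receiver)
        else:                      # ("recv", port)
            _, port = request
            if inbox[port]:
                resume_value[name] = inbox[port].popleft()
                ready.append(name)
            else:
                parked[port] = name  # not scheduled again until a message arrives
    return trace

def producer():
    yield ("send", "p", 1)
    yield ("send", "p", 2)

def consumer(received):
    value = yield ("recv", "p")
    received.append(value)         # exits without receiving the second value

received = []
trace = run({"consumer": consumer(received), "producer": producer()})
print(trace, received)
```

The consumer is scheduled once, parks itself on the empty port, and is only scheduled again after the producer has sent the first value.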

Multiple port firings

The synchronization algorithm is tasked with finding an agreeable global behavior of all of the participating components in a sync region. Finding this solution is achieved by considering (among other things) the ports that each component has used in the interaction with its peers. In the previous implementation this algorithm imposed the requirement that a port could either fire or not fire. As a result, a component could only put on a particular port once per sync block.

The new decentralized algorithm seeks a solution in a different way: still transmitting data through ports, but now allowing a programmer to use a port multiple times. So in the following simple example we may safely expect the receiver to always execute its second behaviour.

primitive sender(out<u32> output) {
    sync {
        // sender will only allow exactly two messages to be sent
        put(output, 1);
        put(output, 2);
    }
}

primitive receiver(in<u32> input) {
    sync {
        auto num = 0;

        fork      num = 1; // first behaviour: receive once
        or fork   num = 2; // second behaviour: twice
        or        num = 3; // third time's a charm?

        // num is now 1, 2 or 3
        auto index = 0;
        while (index < num) {
            auto value = get(input);
            index += 1;
        }
    }
}
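Why only the second behaviour of the receiver survives can be shown with a brute-force Python sketch. This is not the decentralized algorithm of the runtime: it enumerates every combination of branches with a global view and only compares message counts per channel, ignoring message contents. The function `global_solutions`, the branch labels and the channel name `ch` are hypothetical.

```python
from itertools import product

def global_solutions(components):
    """components: name -> list of (label, traffic) branches, where traffic maps
    a channel to a signed count (+n for n puts, -n for n gets). A solution picks
    one branch per component such that every channel balances to zero."""
    names = list(components)
    found = []
    for choice in product(*(components[name] for name in names)):
        balance = {}
        for _, traffic in choice:
            for channel, count in traffic.items():
                balance[channel] = balance.get(channel, 0) + count
        if all(total == 0 for total in balance.values()):
            found.append({name: label for name, (label, _) in zip(names, choice)})
    return found

# The sender puts twice; the receiver speculates on one, two or three gets.
components = {
    "sender": [("put twice", {"ch": +2})],
    "receiver": [
        ("first", {"ch": -1}),
        ("second", {"ch": -2}),
        ("third", {"ch": -3}),
    ],
}
solutions = global_solutions(components)
print(solutions)
```

Only the combination in which the receiver performs exactly two gets balances the channel, so the other two speculative branches are discarded.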

Explicit forking

As hinted at in the example above: firing a port multiple times no longer meshes well with the concept of the fires function. We have also come to realize that the fires predicate was redundant. As a programmer you first have to make sure that the assertion fires(port) == true holds, and then remember to actually use that port. Conversely, asserting that a port is silent must be followed by not using that port. Any other use of the port is a runtime error.

To better reflect the usage of ports, we have replaced the firing predicate with a more explicit fork statement. As an example, consider the following snippet that uses the fires method:

u32 hash = 0;
if (fires(request) && fires(response)) {
    hash = compute_hash(get(request));
    put(response, hash);
} else if (fires(response)) {
    hash = compute_hash(default_request);
    put(response, hash);
} else {
    assert(false); // do not allow any other combination
}

This would now be written as:

fork {
    hash = compute_hash(get(request));
    put(response, hash);
    // Used both 'request' and 'response'
} or {
    hash = compute_hash(default_request);
    put(response, hash);
    // Used only 'response'
} // No more behaviour specified
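The redundancy of the fires predicate can be checked mechanically. The following Python sketch, which models only the set of ports each branch uses, enumerates all four truth assignments for the two ports, runs the old-style condition chain, and compares the accepted behaviours with the branches of the fork statement. The function `old_style` and the set representation are assumptions made for illustration.

```python
from itertools import product

def old_style(fires):
    """The fires-based snippet: returns the set of ports used,
    or None when the final assert(false) would be reached."""
    if fires["request"] and fires["response"]:
        return {"request", "response"}
    elif fires["response"]:
        return {"response"}
    return None

# The fork statement lists the same behaviours directly, one per branch.
fork_branches = [{"request", "response"}, {"response"}]

allowed = []
for req, resp in product([True, False], repeat=2):
    fires = {"request": req, "response": resp}
    used = old_style(fires)
    if used is not None:
        # In every accepted case the ports used are exactly the ports that fire,
        # so the predicate carries no information beyond the port usage itself.
        assert used == {port for port, fired in fires.items() if fired}
        allowed.append(used)

print(allowed == fork_branches)
```

Each branch of the fork corresponds to one accepted assignment of the old predicate, which is why the port usage alone suffices to describe the behaviour.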

Roadmap

After this release we can continue our work in the following directions:

Allowing further control over the synchronous region in which the synchronization algorithm seeks consensus about the global behaviour of a set of composed components.
Modelling existing transport layer protocols, such as TCP and UDP, as Reowolf protocols. This allows us to convincingly demonstrate the expressiveness of the protocol description language, and to compare our implementation’s efficiency with existing networking stacks. These transport layer implementations would make use of native IP components. Further ahead, we can model existing Internet protocols such as ICMP, DNS, BGP.
We are interested in the SCION internet architecture, and are investigating whether the Reowolf connector API can be used for programming internet applications that run on top of SCION, and whether we can specify components in PDL that allow applications to make use of all capabilities SCION networks offer. Towards this, we are setting up a subnet that will be connected to the SCIONlab network. Our experiences will be published in a series of blog posts.
Make first approaches to integrating Reowolf into the operating system kernel. We are exploring which operating system is most suitable for integration, so that we can offer the Reowolf connector API to user-mode processes. Further, we are investigating the compilation of PDL component specifications into loadable kernel modules, thereby increasing the performance of applications that can instantiate pre-compiled components.
Work on the specification of the Protocol Description Language (PDL), leading to a standardization track. Part of this specification work is the need to formalize, in an unambiguous manner, the semantics of protocols specified in PDL. We have submitted an article that describes the formal semantics for centralized synchronization, but we still need to investigate how to adapt the semantics to deal with decentralized synchronization. Formalized semantics increases the future potential for formal verification of protocols, and allows us to define the correctness criteria of Reowolf implementations.

We will keep you updated!

The Reowolf Team
– December 17, 2021

Reowolf 1.1: Release Notes

The Reowolf Team — Fri, 04 Jun 2021 13:50:40 +0000

We are happy to release this milestone of the Reowolf project: Reowolf version 1.1. This is an alpha release. The milestone improves the structural aspects of Protocol Description Language (PDL), which increases the declarative aspects of protocol descriptions needed for modeling Internet protocols (e.g. TCP, UDP, ICMP, DNS). This post summarizes the improvements, and further lays out the milestones we will be working on next. This release would not be here without Max Henger, who joined the Reowolf project in November 2020, whose contributions have had a major impact on the feature completeness of this release. This release is sponsored by the Next Generation Internet fund.

The Git repository associated with this release can be checked out here. The release tag is v1.1.0. The software is licensed under the MIT license.

The following aspects of the language have been improved, and in the sections below we demonstrate their functionality by small examples:

Introduced algebraic data types (“enum”, “union”, “struct”) for user-defined structuring of data. For handling elements, we introduced “if let” statements for deconstructing “enum” and “union” types, and field dereferencing of “struct” types. The “if let” statement allows extensibility of the type definition of “union” and “enum” types. We also introduced constructor literals for constructing elements in data types, including arrays.
Introduced a type system, an algorithm for type checking, and a type inference system. Type checking ensures that the execution of protocols does not misinterpret data (i.e. it avoids “type confusion”), and thus rules out a class of errors.
Introduced generic types in function and datatype definitions, by the use of type variables. Parametric polymorphism is a structural feature of the language only, and is erased during the execution of protocols (in a process called monomorphization). Parametric polymorphism is also available for the port types, allowing rich type information to describe the kind of messages that components can exchange over time.
Improved usability. The module system and namespaces are implemented. This is important for protocols that are authored by multiple independent developers, to avoid namespace conflicts. Further, error messages are improved, which increases the usability of the system and makes life easier for developers who use PDL.

The final section shows the roadmap ahead, explaining what milestones will be worked on in the future.

Algebraic Data Types

We have introduced a system for user-defined algebraic data types, in addition to the primitive types (signed and unsigned integers, and arrays). User-defined types can be declared to aid the protocol programmer in structuring data. There are three kinds of user-defined data types:

enumeration types (enum)
tagged union types (union)
product types (struct)

Enumeration and tagged union types serve a similar purpose: to discriminate different cases. Tagged unions further allow for data to be stored per variant, whereas enumeration types do not store data. Enumerations are used for named constants.

For example, consider an enumeration of DNS record types (we list only a few variants here):

enum DNSRecordType { A, NS, CNAME, SOA, MX, TXT }

Then one can access each constant as DNSRecordType::A, DNSRecordType::NS, et cetera.

A classic use of tagged union types is to implement so-called option types. For example, in places where either some variant of the enumeration above is expected or no value can be given, one may use the following data type:

union OptionalDNSRecordType { None, Some(DNSRecordType) }

To construct an element of a tagged union type, one uses the constructor for each variant. In the case of no-argument variants, this is similar in use to constants in enumerations. For variants that accept parameters, these have to be supplied to the variant constructor. For example, OptionalDNSRecordType::Some(DNSRecordType::A) is an element of OptionalDNSRecordType.

To be able to test what variant is used, we introduce the “if let” statement. This statement tests the variant of a union type and at the same time performs a pattern match to extract data from the matched variant.

auto x = OptionalDNSRecordType::Some(DNSRecordType::A);

if (let OptionalDNSRecordType::Some(y) = x) {
    // y is bound to DNSRecordType::A here
    if (let DNSRecordType::A = y) {
        assert true;
    } else {
        assert false;
    }
}
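For readers more familiar with general-purpose languages, the variant test of the “if let” statement can be approximated in Python. This is only an analogy under stated assumptions: the classes `Some` and `NoValue` and the function `describe` are hypothetical stand-ins for the two variants of OptionalDNSRecordType, and Python performs the check at run time rather than statically.

```python
from dataclasses import dataclass
from enum import Enum, auto

class DNSRecordType(Enum):
    A = auto()
    NS = auto()
    CNAME = auto()

@dataclass(frozen=True)
class Some:
    value: DNSRecordType   # variant that carries data

@dataclass(frozen=True)
class NoValue:
    pass                   # variant without data, playing the role of None

def describe(x):
    if isinstance(x, Some):   # tests the variant ...
        y = x.value           # ... and binds the data stored in it
        return f"some {y.name}"
    return "none"

print(describe(Some(DNSRecordType.A)))
print(describe(NoValue()))
```

The test and the binding are two separate steps here, whereas the “if let” statement combines them into a single pattern match.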

Product types can be used for modeling structured data, such as packet headers. Each field has an associated type, thus constraining what elements can be stored in the type. Product types are also known as records. For example, here is a structure modeling the UDP packet header:

struct UDPHeader {
    u16 source_port,
    u16 dest_port,
    u16 length,
    u16 checksum
}

Elements of product types can be constructed by a constructor literal, which takes an element for each of the fields that are part of the product. For example, UDPHeader{ source_port: 82, dest_port: 1854, length: 0, checksum: 0 } constructs an element of the type defined above.
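As an illustration of how such a record relates to bytes on the wire, the following Python sketch mirrors the UDPHeader structure and serializes it with the standard struct module. The mapping to a wire format is an assumption of this sketch and not something PDL prescribes here; it relies on the UDP header consisting of four 16-bit fields in network byte order. The method names `to_bytes` and `from_bytes` are hypothetical.

```python
import struct
from dataclasses import dataclass

@dataclass
class UDPHeader:
    source_port: int
    dest_port: int
    length: int
    checksum: int

    def to_bytes(self) -> bytes:
        # Four unsigned 16-bit fields in network (big-endian) byte order.
        return struct.pack(
            "!HHHH", self.source_port, self.dest_port, self.length, self.checksum
        )

    @classmethod
    def from_bytes(cls, data: bytes) -> "UDPHeader":
        # Only the first eight bytes belong to the header.
        return cls(*struct.unpack("!HHHH", data[:8]))

header = UDPHeader(source_port=82, dest_port=1854, length=0, checksum=0)
wire = header.to_bytes()
print(wire.hex())
```

Because every field is declared as u16, a value outside the range 0 to 65535 is rejected by struct.pack, which loosely mirrors the constraint that the field types place on the stored elements.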

Further, user-defined types may be recursive, and thus allow modeling interesting structures such as binary trees. Consequently, one can define recursive functions that traverse such structures. See for example:

union Tree {
    Leaf(u8),
    Node(Tree,u8,Tree)
}
func mirror(Tree t) -> Tree {
    if (let Tree::Leaf(u) = t) {
        return t;
    }
    if (let Tree::Node(l, n, r) = t) {
        return Tree::Node(mirror(r), n, mirror(l));
    }
    return Tree::Leaf(0); // not reachable
}
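The same recursive structure can be sketched in Python to experiment with the mirror function outside of PDL. The classes `Leaf` and `Node` are hypothetical stand-ins for the two variants of Tree; unlike PDL, Python does not check statically that both variants are handled.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Leaf:
    value: int

@dataclass(frozen=True)
class Node:
    left: "Tree"
    value: int
    right: "Tree"

Tree = Union[Leaf, Node]

def mirror(t):
    """Swap the left and right subtree at every node."""
    if isinstance(t, Leaf):
        return t
    return Node(mirror(t.right), t.value, mirror(t.left))

tree = Node(Leaf(1), 2, Node(Leaf(3), 4, Leaf(5)))
mirrored = mirror(tree)
print(mirrored)
```

Mirroring twice yields the original tree again, which is a convenient sanity check for this kind of traversal.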

Type Checking and Inference

The type checker ensures that protocols written in PDL are type safe: it is ensured statically that no element of one type is assigned to another type. It is still possible to transform elements from one type to another, either by the means of casting or by calling a function.

The type checker in Reowolf furthermore elaborates protocols in which insufficient typing information is supplied. This is called type inference. It reduces the need for protocol programmers to supply typing information when such information can be deduced automatically from the surrounding context.

The type checker and inference system work in tandem with user-defined algebraic data types. Also, in pattern matching constructs such as the “if let” statement, the types of the variables occurring in patterns are automatically inferred.

Consider the following function, which computes the size of a binary tree. It declares an automatic variable (s) that contains the result of the function. Further, the types of the pattern variables (l, n, r) are inferred automatically from the definition of the Node variant in the Tree data type.

func size(Tree t) -> u32 {
    auto s = 0;
    if (let Tree::Node(l, n, r) = t) {
        s += 1;
        s += size(l) + size(r);
    } else {
        s = 1;
    }
    return s;
}

We shall now consider a number of negative examples. The first declares two variables of different types (val16 and val32), and then leaves the type of a third variable (a) unspecified by the use of the auto keyword.

func error1() -> u32 {
    u16 val16 = 123;
    u32 val32 = 456;
    auto a = val16;
    a = val32;
    return 0;
}

In this case, an error is triggered, because there exists no type to which both 16-bit unsigned integers and 32-bit unsigned integers can be assigned. The same kind of error occurs whenever one performs an operation on two different types. Reowolf has no automatic implicit casting. This type strictness is added to ensure that code is never ambiguous. Instead, cast operators are used to mark explicitly where casting happens in the code.

func error2() -> s32 {
    s8 b = 0b00;
    s64 l = 1234;
    auto r = b + l;
    return 0;
}
func good1() -> s64 {
    s8 b = 0b00;
    s64 l = 1234;
    auto r = cast<s64>(b) + l; // explicit target type
    return r;
}
func good2() -> s64 {
    s8 b = 0b00;
    s64 l = 1234;
    auto r = cast(b) + l; // type inferencer can make the jump
    return r;
}
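The rules at play in these examples can be mimicked with a toy checker. The Python sketch below is not the inference algorithm used by Reowolf; it is a small model, under our own assumptions, of three rules stated above: an auto variable takes the type of its first value, both operands of an operation must have the same type, and a cast adopts the type that its context demands. The expression encoding and the names `type_of`, `check` and `accepts` are hypothetical.

```python
class TypeMismatch(Exception):
    pass

def type_of(expr, env, expected=None):
    kind = expr[0]
    if kind == "var":                     # ("var", name)
        actual = env[expr[1]]
    elif kind == "cast":                  # ("cast", operand)
        type_of(expr[1], env)             # the operand must be well typed by itself
        if expected is None:
            raise TypeMismatch("cannot infer the target type of a cast")
        actual = expected                 # a cast adopts the type its context demands
    elif kind == "add":                   # ("add", left, right)
        left, right = expr[1], expr[2]
        if left[0] == "cast":             # type the other operand first
            actual = type_of(right, env, expected)
            type_of(left, env, actual)
        else:
            actual = type_of(left, env, expected)
            type_of(right, env, actual)   # both operands must have the same type
    else:
        raise ValueError(f"unknown expression kind: {kind}")
    if expected is not None and actual != expected:
        raise TypeMismatch(f"expected {expected}, found {actual}")
    return actual

def check(env, statements):
    env = dict(env)
    for statement in statements:
        if statement[0] == "auto":        # ("auto", name, expr)
            _, name, expr = statement
            env[name] = type_of(expr, env)
        else:                             # ("assign", name, expr)
            _, name, expr = statement
            type_of(expr, env, env[name])
    return env

def accepts(env, statements):
    try:
        check(env, statements)
        return True
    except TypeMismatch:
        return False

error1 = [("auto", "a", ("var", "val16")), ("assign", "a", ("var", "val32"))]
error2 = [("auto", "r", ("add", ("var", "b"), ("var", "l")))]
good = [("auto", "r", ("add", ("cast", ("var", "b")), ("var", "l")))]

print(accepts({"val16": "u16", "val32": "u32"}, error1))
print(accepts({"b": "s8", "l": "s64"}, error2))
print(check({"b": "s8", "l": "s64"}, good)["r"])
```

In the accepted example the cast has no type of its own; it receives s64 from the other operand, which is the kind of jump the comment in the last PDL function refers to.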

Generic Types and Functions

Reowolf now supports generic type parameters, which can be used both in user-defined data type definitions and in function definitions. Generic type parameters are also used by the type checker and type inferencer. For example, it is possible to define the generic option type:

union Option<T> { None, Some(T) }

The generic type can be instantiated by a concrete type, including the primitive types such as integers. It is also possible to define generic functions, for example:

func some<T>(Option<T> s) -> T {
    if (let Option::Some(c) = s) { return c; }
    while (true) {} // does not terminate for Option::None
}

Furthermore, generic types are also added to input and output ports: this allows protocol programmers to specify precisely what value is expected during communication. For example, the sync channel is defined as:

primitive sync<T>(in<T> i, out<T> o) {
    while (true) {
        sync {
            if (fires(i) && fires(o)) {
                auto m = get(i);
                put(o, m);
}   }   }   }

The sync channel can then be instantiated by different concrete types: e.g. sync<u8> is a byte channel, and sync<u8[]> is a byte array channel. The additional type information is useful to avoid communicating with incompatible message types.
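The idea behind monomorphization, producing one specialized copy of a generic definition per concrete instantiation, can be sketched in a few lines of Python. This is an illustration under our own assumptions and not how the Reowolf implementation represents types: types are plain names or tuples of a constructor with its arguments, and the functions `substitute` and `monomorphize` are hypothetical.

```python
def substitute(ty, binding):
    """Replace type variables in a type. A type is either a name (str)
    or a tuple (constructor, argument, ...)."""
    if isinstance(ty, str):
        return binding.get(ty, ty)
    head, *args = ty
    return (head, *(substitute(arg, binding) for arg in args))

def monomorphize(definitions, uses):
    """definitions: name -> (type parameters, signature);
    uses: list of (name, concrete type arguments)."""
    specialized = {}
    for name, type_args in uses:
        params, signature = definitions[name]
        if len(params) != len(type_args):
            raise ValueError(f"{name} expects {len(params)} type argument(s)")
        key = (name, tuple(type_args))
        if key not in specialized:        # one copy per distinct instantiation
            binding = dict(zip(params, type_args))
            specialized[key] = [substitute(t, binding) for t in signature]
    return specialized

definitions = {
    "sync": (["T"], [("in", "T"), ("out", "T")]),
    "Option": (["T"], [("None",), ("Some", "T")]),
}
uses = [
    ("sync", ["u8"]),
    ("sync", [("array", "u8")]),
    ("sync", ["u8"]),                     # repeated use, no new copy
    ("Option", ["u32"]),
]
specialized = monomorphize(definitions, uses)
for key, signature in specialized.items():
    print(key, signature)
```

After this step no type variable remains in any signature, which is what it means for the polymorphism to be erased.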

Usability Improvements

The module and namespace system is improved. Protocol descriptions live in their own namespace (each domain name is a separate namespace) to prevent naming conflicts of definitions among multiple protocol authors. Importing symbols from other modules is allowed, and the module system checks for naming conflicts between symbols imported from other modules and locally defined symbols.

An important aspect of this release is user-friendly error messages. These help the protocol programmer to identify the error in the protocol description, so that it can be quickly resolved. For example:

struct Pair<T1, T2>{ T1 first, T2 second }
func bar(s32 arg1, s8 arg2) -> s32 {
    auto shoo = Pair{ first: arg1, seond: arg2 };
    return arg1;
}

produces the following user-friendly error message:

ERROR: This field does not exist on the struct 'Pair'
 +-  at 3:45
 | 
 |              auto shoo = Pair{ first: arg1, seond: arg2 };
 |                                                      ~~~~~

Roadmap

After this release we can continue our work in the following directions:

The semantics of Reowolf’s sync block has to be adapted to make it possible to be driven by an efficient distributed consensus algorithm. For this, we introduced so-called scoped sync statements, which allow for a run-time discovery of neighboring components.
Modelling existing transport layer protocols, such as TCP and UDP, as Reowolf protocols. This allows us to convincingly demonstrate the expressiveness of the protocol description language, and to compare our implementation’s efficiency with existing networking stacks. These transport layer implementations would make use of native IP components. Further ahead, we can model existing Internet protocols such as ICMP, DNS, HTTP, ….
Make first approaches to integrating Reowolf into the operating system kernel. We are exploring which operating system is most suitable for integration. Considering that our user-mode implementation is written in Rust, we are investigating whether our kernel implementation can also be written (mostly) in Rust.
Work on the specification of the Protocol Description Language (PDL), leading to a standardization track. Part of this specification work is the need to formalize, in an unambiguous manner, the semantics of protocols specified in PDL. Formalized semantics increases the future potential for formal verification of protocols, and allows us to define the correctness criteria of Reowolf implementations.

We will keep you updated!

The Reowolf Team
– June 4, 2021

Reowolf 1.0 Project Code and Documentation

The Reowolf Team — Fri, 30 Oct 2020 14:59:21 +0000

The Reowolf 1.0 project files are released on Zenodo. The project documentation (technical report) is available at CWI’s Institutional Repository.

The repository serves as the documentation and specification of the Reowolf project, aiming to provide connectors as a generalization of BSD-sockets for multi-party communication over the Internet. A copy of the source code repository of version v1.0.0, and an overview presentation and its slides, are included. The repository comprises the final deliverables of the Reowolf 1.0 project.

The main contributor to the release is Christopher Esterhuyse, core developer of Reowolf 1.0. On Tuesday, October 27, 2020 he gave a talk in the Amsterdam Coordination Group (ACG).

Title: Overview of the Reowolf Project
Abstract:
The Reowolf project introduces connectors as a replacement for BSD-style sockets for multi-party network programming. Connectors encourage applications to make explicit their requirements on the behavior of the session, by facilitating configuration using protocol code, expressed in a domain-specific protocol language (based on Reo). These protocols are retained, and shared over the network, such that the underlying runtime and middleware can cooperate on realizing the session as safely and efficiently as possible. The presentation summarizes the project’s developments, and lays out promising directions for the sequel.