Time Modulation Protocol: a ‘Killer App’ for Reowolf? https://reowolf.net/time-modulation-protocol-a-killer-app-for-reowolf/ Thu, 27 Oct 2022

During the execution of the Reowolf project over the past years, we have discovered a deep and interesting technique that we would like to share with the wider community. Over the course of the past months, we assembled a small team (comprising Dalia Papuc, Benjamin Lion, Hans-Dieter Hiep) and worked together intensively to create a prototype and write an article about our discovery. This blog post summarizes what we have done and what we have accomplished.

Progress in science is not linear and is hardly predictable: our discovery was not planned during the execution of the Reowolf project, but we believe it now showcases a number of important aspects of our work. First, we figured that the concept of synchronization is essential and central to our project, and we now have a very clear idea of what synchronization in communication systems means. Second, we found that it is always possible to challenge an established field of study, in our case Information Theory: we have studied the fundamental assumptions underlying this field, which potentially opens up new avenues of research and development. Third, and lastly, the outcome of research comes at unexpected moments: that is what makes doing research such an exciting activity!

In the Reowolf project, we originally set out to replace BSD sockets with Reowolf connectors, for numerous reasons we have outlined before. We still believe in pursuing this end, but we have learned that doing so effectively requires bringing more convincing arguments to the table. After implementing two prototypes (Reowolf 1.0 and Reowolf 2.0), in which we experimented with many different set-ups and approaches, and writing an article about them, our conclusion was: we need a realistic supporting use-case. Thus, we started looking for a so-called “killer app” that can demonstrate why connectors are superior to sockets.

During our work on the semantics of Reowolf’s Protocol Description Language (PDL), we became aware that all our components are so-called time-sensitive. More generally speaking, for every component present in a communication system (whether ‘low-level’, i.e. handling the encoding and decoding of messages communicated over channels, or ‘high-level’, i.e. protocol components that implement e.g. packet-switched networks or communication sessions with transmission control), there is always a (hidden) dependency on time. This insight was not obvious to us at the time, even though it seems so in hindsight.

The fundamental problem with sockets is that sockets abstract away time: the application programmer who writes socket-based applications must manage time entirely by herself. This seems the logical thing to do in asynchronous networks, such as the Internet: what does “time” even mean in such an asynchronous setting? Now that we have realized that all components are time-sensitive and need to synchronize, our motto became: synchronization is necessary before communication is possible. But then, what is easier? Programming with synchronized connectors or programming with asynchronous sockets? Maybe we can demonstrate this point with our killer app! So, without further ado: here is the discovery (the “killer app”) we made:

In this article we describe a novel technique for increasing the effective capacity or increasing the end-to-end privacy and security of communication channels using high-resolution clocks.  Under specific conditions more data may be sent on channels than is traditionally the case, by means of advanced synchronization and packet delivery scheduling algorithms.  This may alleviate the need for capital investment for extending bandwidth for Tier 2 or Tier 3 network operators, who are purchasing Internet Protocol (IP) transit to connect with remote networks.  Alternatively, this technique may be used by end-users of the network to increase the end-to-end confidentiality beyond classical encryption techniques, without the need for adapting the underlying transit networks.

So, let us jump ahead and give the protocol realizing our novel technique a name: Time Modulation Protocol, or TMP for short. The essence of TMP is this: instead of sending data (a sequence of bits) contained in packets over the Internet from one party to another, we ensure that the two parties have synchronized clocks and transmit data by merely sending a signal and reading off the clock value at the moment the signal is received. The data is no longer present in the transmitted message (as a bit sequence); instead it is encoded in the moment at which the signal is received: the value of the clock on the receiving side is the value of the message. This ensures that the content of the communication can no longer be sniffed by a man-in-the-middle who does not know the parameters of the clock synchronization between the parties, but it also ensures that metered links under-report the information content of the message.
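To make the idea concrete, here is a minimal sketch (in Rust, not in PDL) of the encoding and decoding steps. All parameters are hypothetical: we assume both parties share a clock epoch and a slot duration, and we ignore network delay and clock drift, which the real protocol has to deal with.

use std::time::{Duration, Instant};

// Hypothetical slot duration; both parties must agree on it and on `epoch`.
const SLOT: Duration = Duration::from_millis(10);

// Transmitter: encode `value` by scheduling the (content-free) signal for the
// clock interval whose index equals `value`.
fn send_instant(epoch: Instant, value: u32) -> Instant {
    epoch + SLOT * value
}

// Receiver: decode the message by reading off in which clock interval the
// signal arrived, relative to the shared epoch.
fn decode(epoch: Instant, arrival: Instant) -> u32 {
    (arrival.duration_since(epoch).as_micros() / SLOT.as_micros()) as u32
}

fn main() {
    let epoch = Instant::now();
    let value = 7u32;
    // Pretend the signal arrives instantaneously at the scheduled instant.
    let arrival = send_instant(epoch, value);
    assert_eq!(decode(epoch, arrival), value);
}

Note that in this sketch the message content never crosses the wire: only the timing of the signal carries information.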

We compare two little programs, one based on Reowolf’s time-sensitive Protocol Description Language and the other on time-insensitive BSD sockets. The purpose of these programs is to implement a time-sensitive channel between two parties. One party merely sends a signal to the other party: the signal is meaningless to anyone seeing the traffic between the parties (if unaware of the synchronization between the parties). At the moment the signal arrives, the clock value at the recipient is significant and allows the receiver to interpret the signal as a message.

In a simplified setting of implementing this time-sensitive protocol in PDL, we write the transmitter and receiver components as follows:

primitive void transmitter(boolean msg, out<void> p) {
  if (msg) put(p, ()); // send signal
  sync; // let clock tick
  if (!msg) put(p, ());
  sync;
}
primitive boolean receiver(in<void> p) {
  boolean result = false;
  if (let x = get(p)) result = true; // if signal was received
  sync;
  if (!result) get(p); // signal must be received now
  sync;
  return result;
}

We now let a transmitter component and a receiver component run, one per side of a channel that connects the two components. The semantics of the PDL-program above is as follows: the transmitter wishes to send a true or a false. To send true, it immediately sends a signal over the channel (in the current clock interval). Otherwise, the transmitter lets the clock tick, and sends a signal after the clock has progressed (in the next clock interval). The receiver (that is synchronized with the transmitter) tests whether it has received a signal in the current clock interval and determines the value of the result: if no signal was received, it progresses to the next clock interval and again tests whether it has received a signal. Both the transmitter and the receiver have progressed two clock intervals at the end. Note that there are two error conditions:

  • The receiver receives two signals (one in each clock interval).
  • The receiver receives no signal (in either clock interval).

We say that the two components are synchronized if running these two components with some particular clock speed over some channel is reliable: the receiver encounters no error, and the result of the receiver is equal to the input of the transmitter. Indeed, to establish the synchronization between two parties in the first place, we could run this simplified protocol and tweak the synchronization parameters (e.g. delay-adjusted clock precision) until the two components behave reliably!
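As a hypothetical illustration of that calibration step, the sketch below (in Rust) searches over candidate slot durations and clock offsets, accepting the first pair of parameters for which a number of consecutive rounds of the one-bit protocol succeed. The run_round closure is an assumption on our part: it stands for actually running the transmitter/receiver pair with the given parameters.

use std::time::Duration;

// Hypothetical calibration loop: `run_round` runs one round of the one-bit
// protocol with the given slot duration and clock offset, and reports whether
// the received bit matched the transmitted bit.
fn calibrate(mut run_round: impl FnMut(Duration, Duration) -> bool)
    -> Option<(Duration, Duration)>
{
    for slot_ms in [1u64, 2, 5, 10, 20, 50] {
        let slot = Duration::from_millis(slot_ms);
        for offset_ms in 0..slot_ms {
            let offset = Duration::from_millis(offset_ms);
            // Require a run of consecutive successful rounds before trusting
            // these parameters.
            if (0..32).all(|_| run_round(slot, offset)) {
                return Some((slot, offset));
            }
        }
    }
    None // no reliable parameters found in the searched range
}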

Now, the Reowolf runtime provides an underlying synchronization algorithm that ensures this transmission is reliable. Imagine implementing such a synchronization algorithm manually using sockets! Writing a program using sockets requires significantly more effort: the programmer must manage the clock synchronization entirely herself, which is error-prone.
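For contrast, a socket-based version of just the receiving side might look like the sketch below (UDP is an assumption on our part; the prototype repository may use a different setup). Note how the slot duration and the shared epoch are entirely the programmer's problem: nothing checks that they actually match the transmitter's.

use std::net::UdpSocket;
use std::time::{Duration, Instant};

// Hypothetical parameters that must match the transmitter exactly.
const SLOT: Duration = Duration::from_millis(10);

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("127.0.0.1:9000")?;
    let epoch = Instant::now(); // must somehow equal the transmitter's epoch!
    let mut buf = [0u8; 1];

    // The datagram carries no meaningful content; only its arrival time does.
    socket.recv_from(&mut buf)?;
    let interval = Instant::now().duration_since(epoch).as_micros() / SLOT.as_micros();

    // Interval 0 means the transmitter sent `true`, interval 1 means `false`.
    println!("received message: {}", interval == 0);
    Ok(())
}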

More details about the mathematics of synchronization and time-sensitive communication can be found in the article we wrote (see below). In that article we also present more detail about the Time Modulation Protocol, which is essentially a sophisticated version of the simple program above.

The source files of our prototype based on sockets are available in this public GitHub repository. The article we’ve written over the past months, giving more details on our discovery, is available below:

In the coming week, we will prepare and submit a condensed version of the article (adapted to the particular audience of the publication venue) to the FSEN 2023 conference, for multiple reasons:

  • We believe our discovery is important enough to share widely.
  • We want to establish consensus on the basic set of (mathematical) concepts on which we can base further development of our project.
  • We wish to obtain expert feedback from different academic fields of expertise. This allows us to receive critical viewpoints that we, ourselves, may have missed. Science is all about being proven wrong: and we are very much open to criticism.

Research never stops, so we will keep you updated when we’ve made more progress!

Consensus Algorithm (version 2.x) https://reowolf.net/consensus-algorithm-version-2-x/ Mon, 23 May 2022

Introduction

For all of the reasons described in the document on the previous consensus algorithm, speculative execution was removed from the runtime. This greatly simplifies the consensus algorithm. In fact, there isn’t much need for a consensus algorithm anymore, apart from some edge cases. Hence the fork statement is gone from the language.

Most of this document will describe a different problem that wasn’t discussed in the document describing the previous consensus algorithm: the introduction of a select statement and the way in which components are allowed into a synchronous region.

Reducing the Previous Consensus Algorithm

Without speculative execution, there is no more execution tree: there is only one control flow path a component can take through its PDL code. That execution may not form a complete trace (i.e. reach the end of the sync block) because it encounters a blocking operation. But to reiterate: if there is a complete trace, then it can only have taken one control flow path through its code.

However, there are still reasons for incorporating parts of the old consensus algorithm. That is: there is still a port mapping, put operations update this port mapping, and get operations (which no longer fork, but simply block until a message is present) do the same thing.

There are two reasons for this. Firstly there is the design choice of enforcing strict ordering on the way channels are used between components. If there are two channels between components, then we may have the following code:

comp sender_variant_a(out<u32> tx_a, out<u32> tx_b) {
    sync {
        put(tx_a, 1);
        put(tx_b, 2);
    }
}

comp sender_variant_b(out<u32> tx_a, out<u32> tx_b) {
    sync {
        put(tx_b, 2);
        put(tx_a, 1);
    }
}

comp receiver(in<u32> rx_a, in<u32> rx_b) {
    sync {
        auto va = get(rx_a);
        auto vb = get(rx_b);
    }
}

If we would not send along the port mapping, then both sender_variant_a and sender_variant_b would be valid peers for the receiver component. This is because put operations are asynchronous (that is: they send the message and continue executing, not waiting for the corresponding get to complete), so both messages will arrive at the receiver component, which can then retrieve them in any order. However, if we do send along the port mapping, then only sender_variant_a can be a valid peer of receiver. Note: this is a design choice.

We could relax this design choice by only sending along the port mapping of the port over which the message is sent. This would imply that messages over the same channel have to be received in the correct order, but messages over different channels can be received in any desired order. We have kept the strict ordering in place by sending along the full port mapping.

The second reason for still including the port mapping has to do with the fact that, by design, put operations are not acknowledged. That is to say: the put operation is asynchronous, and does not prevent the sending component from continuing its execution. For a put operation to be valid, the message must be received by the peer. Without any kind of port mapping we have no way of detecting the following mismatch:

comp sender(out<u32> tx) {
    sync {
        put(tx, 0);
        put(tx, 1);
    }
}

comp receiver(in<u32> rx) {
    sync get(rx);
}

However, there are much simpler ways of designing a consensus algorithm without speculative execution: one may annotate the port with the number of messages sent, one may (as stated above) only send along the port mapping for a single port instead of all shared channels, etc.

That being said, the current implementation still uses branch IDs (differently interpreted in this case: a simple operation counter). There is still a port mapping from port ID to branch ID, and finding the global solution still proceeds in the same way (except now there is only one complete trace per component).
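A rough sketch of what this bookkeeping amounts to is shown below (in Rust; the identifier types and method names are ours, not the runtime's). The mapping records, per port, the counter value of the last operation on that port, and a received mapping is accepted only if it agrees with the local mapping on all shared channels.

use std::collections::HashMap;

// Hypothetical identifier types; the real runtime has its own.
type PortId = u32;
type BranchId = u32; // without speculative execution: a plain operation counter

// The per-component port mapping kept during a sync round.
#[derive(Default, Clone, PartialEq)]
struct PortMapping(HashMap<PortId, BranchId>);

impl PortMapping {
    // A `put` bumps the counter for the sending port; the mapping travels
    // along with the message.
    fn record_put(&mut self, port: PortId, counter: BranchId) {
        self.0.insert(port, counter);
    }

    // On `get`, check that the mapping received with the message agrees with
    // the local mapping on every channel shared between the two components.
    fn consistent_with(&self, received: &PortMapping, shared: &[PortId]) -> bool {
        shared.iter().all(|p| self.0.get(p) == received.0.get(p))
    }
}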

Synchronous Regions

There were some aspects of the consensus algorithm that were specifically left out in the previous document. Among them is when to consider a component part of the synchronous region. In version 1.0 of the Reowolf runtime each peer of each component was considered part of the synchronous region. The idea behind this decision was that the fact that a channel remains unused in a synchronous round should be seen as part of the consensus algorithm: if a component did not get on a port, then the peer must agree that it did not put on a port.

So when a complete trace was found (that is: a component reached the end of its sync block), a component went over all its ports and sent a special message over the unused/silent channels asking the peers to join the synchronous round with their local solutions, where those local solutions are required not to have sent/received anything over those silent channels.

The problem with this approach may be illustrated with a simple example: suppose a set of requesting servers that are connected to a database server of some sort. In the normal situation it would be perfectly fine for multiple servers to be storing database resources at the same time. These can all be handled in sequence by the database server, but the requesting servers do not necessarily have to wait for one another. Some pseudocode for this example:

union Request { 
    StoreData(u8[]),
    /* more things here in a more complicated example */ 
}

comp requester(out<Request> tx_cmd) {
    // Setup code here
    // Perhaps some code that requires retrieving database resources
    while (!should_exit) {
        u8[] data = { 1, 2, 3 }; /* something more complicated */
        sync {
            put(tx_cmd, Request::StoreData(data));
        }
    }
}

comp database(in<Request>[] rx_cmds) {
    // Note the array of input ports: there are multiple requesters
    // Lot of complicated stuff
    while (!should_exit && length(rx_cmds) > 0) {
        sync {
            // Here is the conundrum (and a second one, mentioned later in this document):
            auto command = get(rx_cmds[0]); // take a message from the first requester
            if (let Request::StoreData(to_store) = command) {
                store_the_data(to_store);
            } else {
                // other commands, in a more complicated example
            }
        }
    }
}

In this case the code seems reasonable, but it will always fail if there is more than one requester connected to the database component. Once the database reaches the end of its sync block, it will have a mapping for rx_cmds[0], but the remaining ports were all silent. So the consensus algorithm asks the peer requester components whether their channels were silent. But they aren’t! Each requester can only send data in its sync block.

So one might be inclined (in fact, it’s the only way to solve this in the unmodified language) to write the requester as:

comp requester(out<Request> tx_cmd) {
    // Setup code here
    // Perhaps some code that requires retrieving database resources
    while (!should_exit) {
        sync {
            fork {
                // We either do nothing
            } or fork {
                // Or we communicate with the database server 
                u8[] data = { 1, 2, 3 }; /* something more complicated */
                put(tx_cmd, Request::StoreData(data));
            }
        }
        
        some_code_that_may_take_a_really_long_time();
    }
}

In some sense the problem is solved: only one requester component will have its request go through, the other ones have silent channels, and the database component is capable of receiving that message. But there are two further problems here:

  • There is no way to give any guarantee about fairness: the component that can communicate with the database component the fastest will probably have its trace completed first, hence be the first one submitted to the consensus algorithm, hence likely be the one picked. If this happens over the internet then the slow connections will almost never be able to submit their information.
  • None of the components can run at their own pace. Each component always has to wait until all of the others have joined the synchronous region. There is no way in which a single requester can do some other work on its own, unfettered by the execution speeds of its peers.

And so the rule for joining a synchronous region was changed. Now it is simply: if a component performs a get and receives a message from another component, then they both become part of the same synchronous region (and perhaps this region is already larger, because the two components are each part of larger separate synchronous regions). If a message is in the inbox of a component, but no get operation is performed on it, then the message is kept around for perhaps a later synchronous round. Note that this works together with the consensus algorithm in such a way that joining a synchronous region happens upon the first accepted message on a channel; later messages within that synchronous round must then arrive, due to the consensus algorithm.

Select Statement

The example above hints at a second problem (figured out a long time ago by our unix overlords): there is no way within PDL code to express that a component can perform a variety of behaviours triggered by the arrival of messages. For our database component above, we see that it can have multiple requesters connected to it, but there is no way to indicate that we have valid behaviour for any one of them. So we introduce the select statement. It is a statement with a set of arms. Each arm has a guard, in which we indicate which ports need to have incoming messages, and a block of code that is executed when all of the indicated ports have actually received a message.

The select statement may only appear within a sync block. The arm’s guard is formulated in terms of an expression or a variable declaration (that may contain get calls, but not put calls). The corresponding block of code is executed when all of the get calls in the guard have a message ready for them. In case multiple arm guards are satisfied, a random arm is chosen by the runtime for execution.

So the select statement takes the form:

select {
    auto value = get(rx_a) + get(rx_b) 
        -> do_something_with_the_value(value),
    auto value = get(rx_a)
        -> do_something_with_the_value(value),
}

This code is transformed by the compiler into something akin to:

while (still_waiting_for_a_message) {
    if (value_present(rx_a) && value_present(rx_b)) {
        auto value = get(rx_a) + get(rx_b);
        do_something_with_the_value(value);
        
        break;
    } else if (value_present(rx_a)) {
        auto value = get(rx_a);
        do_something_with_the_value(value);
        
        break;
    }
    block_until_there_is_a_new_message();
}

The combination of the select statement and the way we introduce components into the synchronous region permits components to run independently of one another when their protocol admits it.

The rule where components only enter a synchronous region when they get a message that is present in the inbox still applies here. If, in the example above, the arm requiring only a message on channel a executes, then only the peer component of that channel joins the synchronous region. A message that came over channel b will still be present in the inbox for later interactions or sync blocks.

The Current Consensus Algorithm

Because the current consensus algorithm is a scaled down version of the previous one, we’ll be a bit more concise: each component has a local counter; this produces numbers we (still) call branch numbers. The counter is incremented each time a get or a put call is performed. We’ll use this counter to annotate ports in the port mapping each time a put call is performed. The port mapping is sent along with messages sent through put calls. This message will arrive in the inbox of the receiver. When the peer performs a get call on the receiving end of the channel, it will check if the received port mapping matches the local port mapping. If it doesn’t, we can immediately let the component crash. If it does, then the message is received and the component continues execution. Once a get call has completed, the sender is incorporated into the synchronous region, if it wasn’t already.

Once a component has reached the end of the sync block it will submit its local solution (i.e. the port mapping) for validation by the consensus algorithm. If all components in the synchronous region have submitted a local solution, whose port mappings are pairwise consistent with one another, then we’ve found a global solution. With this global solution the components in the synchronous region are ordered to continue the execution of their PDL code.

As a small side-note: with speculative execution the consensus algorithm amounted to finding, for each component, a complete trace whose interactions are consistent with all of its peers. Without speculative execution we have the multi-party equivalent of an Ack message in the TCP protocol. (Yes, this statement skips over a lot of details of the TCP protocol, but) in TCP both parties perform a best-effort attempt to ensure that a sent message has arrived at the receiver, by the receiver acknowledging that the message has been received. In our case the consensus algorithm performs this function: it ensures that the sent messages have all arrived at their peers in a single multi-party interaction.

Consensus Algorithm (version 1.x) https://reowolf.net/consensus-algorithm-version-1-x/ Mon, 23 May 2022

Introduction

The previous consensus algorithm (the one within Reowolf 1.0 and 1.1) had support for speculative execution. This means that the user may (directly or indirectly) fork the execution of a component. That particular execution (which may itself be a forked execution already) then becomes more than one execution. At some point a component will have to choose which particular execution will be committed to memory. This is one reason for the existence of a sync block: a block of code wherein one may perform forking, and at the end of which a component will have to choose the execution that is committed to memory.

With speculative execution we may have multiple components that are all forking their execution and sending/receiving messages. So we do not end up with one component that has to choose its final execution, but all components choosing their final execution. Note that one component’s execution may apply restrictions on the validity of another component’s execution. As an example, suppose the following components and their executions:

  • Component A: Has two executions:
    • Execution A1: Component A has sent a message to component B.
    • Execution A2: Component A has received a message from component B.
  • Component B: Has three executions:
    • Execution B1: Component B has received a message from component A, then sends a message back to component A.
    • Execution B2: Component B has received a message from component A.
    • Execution B3: Component B has sent two messages to component A.

Without delving into too much detail, one may see that the only valid solution to this problem is the combination of A1 and B2: every other combination leaves a sent message that is never received, or a receive without a matching send (e.g. A1 with B1 would have B’s reply never received by A, and A2 with B2 would have both components receiving without anyone sending). So the components cannot just pick any execution, but must pick an execution that is agreeable with the chosen executions of the components they have interacted with.

Component Execution Tree, and Execution Traces

Components execute PDL code, which may contain calls to fork, put, and get. A fork explicitly forks the execution of the code. A put sends a message to a particular component, and a get receives a message from a component and forks (as explained later).

As the component enters a sync block, it has only one possible execution. But as stated above there are reasons for the execution to split up. These individual executions may themselves split up later, thereby forming a so-called “execution tree”:

                             +-----+       +------+
                             | put |------>| sync |
+-------+      +------+----->+-----+       | end  |
| sync  |      | fork |                    +------+
| start |----->+------+----->+-----+
+-------+                    | get |------>+------+
                             +-----+       | sync |
                                   |       | end  |
                                   |       +------+
                                   |
                                   +------>+------+
                                   |       | sync |
                                   |       | end  |
                                   |       +------+
                                   |
                                   +--> ...

This tree was formed by executing the following PDL code:

primitive some_component(out<u32> tx, in<u32> rx) {
  sync {
    fork {
      put(tx, 5);
    } or fork {
      get(rx);
    }
  }
}

We can see the reason for calling the execution tree a “tree”. There are several things to note about the execution tree: Firstly that some executions have been completed and form a complete trace, that is: starting from the “sync start” a complete trace may be represented by the path running to the “sync end”. Conversely, there is one trace that is incomplete: there is a trace waiting at the get for a message. We’ll call a place where the execution splits into multiple branches/executions a “branching point”.

Note that the branching points can in the general case only be discovered at runtime. Any code may have control flow points like if statements, or while loops. Consider the following code:

primitive some_component(out<u32> tx, bool which_way) {
  sync {
    if (which_way) {
      fork {
        put(tx, 1);
      } or fork {
        put(tx, 2);
      }
    } else {
      put(tx, 3);
    }
  }
}

Depending on the value of which_way we produce two different execution trees (of which we can determine all traces). The compiler cannot decide at compile time which execution tree will be generated.

Note that the get branching points have an arbitrary number of forked executions arising from them. We’ll call them “waiting points”. In the general case we cannot figure out how many forked executions arise from a get branching point. The reason may be illustrated by the following simple example:

primitive sender(out<u32> tx, u32 num_forks) {
  sync {
    auto fork_counter = 1;
    while (fork_counter < num_forks) {
      fork {
        put(tx, fork_counter);
      } or fork { } // empty case
      fork_counter += 1;
    }
    put(tx, num_forks);
  }
}

primitive receiver(in<u32> rx) {
  u32[] values = {};
  sync {
    bool keep_going = true;
    while (keep_going) {
      auto new_value = get(rx);
      values @= { new_value }; // append
      fork {
        keep_going = false; 
      } or fork { }
    }
  }
}

If the sender is connected to the receiver, then the sender will send anywhere between 1 and num_forks messages (distributed over num_forks forks), depending on a user-supplied parameter (which we cannot figure out at compile-time). The isolated receiver can generate an infinite number of forked executions. We can analyze that the receiver will at most have num_forks + 1 forked executions arising from its get branching point (the num_forks branches that do receive, and one final fork that is infinitely waiting on another message), but the compiler cannot.

For this reason a get branching point needs to be kept around for the entire duration of the sync block. The runtime will always need to have a copy of the component’s memory and execution state the moment it encountered a get instruction, because it might just be that another component (in perhaps a new fork, which we cannot predict) will send it another message, such that it needs to produce a new forked execution.

A get operation is also a “blocking operation”: in the general case the component needs to know the value produced by the get operation in order to continue its execution (perhaps more specifically: the first time a read operation is performed on the variable that will store the transmitted message). Consider the simple case where the received message contains a boolean that is used in the test expression of an if statement: we’ll need to have actually received that boolean before we can decide which control flow path to take. Speculating on the contents of messages is too computationally expensive to be taken seriously. A put operation is not a blocking operation: the message is sent and the component continues executing its code.

We’ve touched upon control flow points multiple times. We’ll touch upon some aspects of control flow here, to more easily introduce the algorithm for finding consensus later. A component is fully described by its memory (i.e. all of the memory locations it has access to through its variables) and execution state (i.e. its current position in its code). So once a component encounters a control flow point, it can only take one control flow path. Which path may be computed from its memory state. The calling of certain impure functions (e.g. retrieving a cryptographically secure random number) does not change this fact. Note that receiving values from other components might change a component’s memory state, hence influence the control flow path it takes in the subsequent forked execution. Conversely, a component sending a value might influence another component’s memory state.

So before treading into more detail, here we’ve found that in the general case:

  • Speculative execution may only occur inside sync blocks.
  • Speculative execution implies that we end up with an execution tree.
  • A path through the execution tree that reaches the end of the sync block is called a complete trace, and represents a valid execution of the sync block for the component (but perhaps not compatible with a particular complete trace of a peer it interacted with).
  • The set of traces produced by a component in its sync block can practically only be discovered at runtime.
  • A get operation is necessarily a blocking operation that always incurs a branching point. A put operation is a nonblocking operation that will not fork into multiple executions.
  • The control flow path of a trace of a component may be influenced by the messages it has received.

Towards a Consensus Algorithm

The key to the consensus problem is somehow discovering the ways in which the components have influenced the memory state of their peers. If we have a complete trace for each component, for which all peer components agree on the way they have influenced that complete trace, then we’ve found a solution to the consensus problem. Hence we can subdivide the consensus problem into four parts:

  1. Keeping track of the messages that influence the memory state of components.
  2. Keeping track of the peers that influence the memory state of components.
  3. Finding a set of interactions between components on which all involved components agree, i.e. each put should have a corresponding get at the peer.
  4. Somehow having a communication protocol that finds these agreeable interactions.

We’ll not consider the last point, as this is essentially a gossip protocol, and the appropriate gossip protocol varies with the requirements of the user (e.g. robust to failure, memory efficient, runtime efficiency, message complexity). We define some terms to make the following discussion easier:

  • “component graph”: A graph where each node is a component, and each channel between components forms an edge in that graph.
  • “sync region”: The group of components that have interacted with one another and should agree on the global consensus solution.
  • “local solution”: A complete trace of a component. For the component this is a valid local solution, but might not be part of a global solution.
  • “global solution”: A set of traces, one for each of the components in the sync region, that all agree on the interactions that took place between the components in the sync region.

We’ll now work incrementally to the final consensus algorithm, making it a bit of a story in order to explain the reasoning and intuition behind the consensus algorithm.

Suppose a component can somehow predict exactly which messages it is going to receive during the execution of its code; we’ll assume that each received message has the appropriate get call associated with it. In this case we’re able to produce the set of complete traces that a component produces by symbolically executing its code: we start out with the initial memory state, might perhaps do some explicit forking, know exactly which messages we receive and how they influence the control flow, and arrive at the end of the sync block. Hence each component can figure out independently which complete trace is the solution to its consensus problem.

However, as we’ve outlined above, we cannot know exactly which messages we’re going to receive. We’ll have to discover these messages while executing a component. The next best thing is to keep track of the values of the messages that we’ve received in a complete trace. Once we have complete traces for all of the interacting components, we can check that the received value corresponds to a sent value. e.g.

primitive sender(out<u32> tx) {
  sync {
    fork {
      put(tx, 1);   
    } or fork {
      put(tx, 2);
    }
  }
}

primitive receiver(in<u32> rx) {
  u32 value = 0; 
  sync {
    value = get(rx);
  }
}

Where tx is part of the same channel as rx. In this case we’ll have two traces for each of the components, resulting in two valid global consensus solutions. In one solution the message 1 was transferred, in the other the message 2 was transferred. There are two problems with this solution: firstly, it doesn’t take the identity of the channel into account; and secondly, it doesn’t take the effects of previous messages into account.

To illustrate the first problem, consider:

primitive sender(out<u32> tx_a, out<u32> tx_b) {
  u32 some_variable_in_memory = 0;
  sync {
    fork {
      put(tx_a, 1);
      some_variable_in_memory = 1;
    } or fork {
      put(tx_b, 1);
      some_variable_in_memory = 2;
    }
  }
}

primitive receiver(in<u32> rx_a, in<u32> rx_b) {
  u32 value = 0; 
  sync {
    value = get(rx_a);
  }
}

Here the fact that the sender has the solutions 1 and 1 does not help the receiver figure out which of those corresponds to its own solution of 1.

To illustrate the second problem, consider:

primitive sender(out<u32> tx) {
  sync {
    fork {
      put(tx, 1);
      put(tx, 2);
    } or fork {
      put(tx, 2);
    }
  }
}

primitive receiver(in<u32> rx) {
  u32 value = 0; 
  sync {
    value = get(rx);
  }
}

Now we’ll have the sender contributing the solutions (1, 2) and (2), while the receiver will generate the solutions 1, 2 and 2. The reason there are three solutions for the receiver is that it cannot figure out that the message 2 from the sender’s first fork depended on the first message 1 from the sender having arrived.

So we need to change the algorithm. Instead of just tracking which messages were sent, each component needs to have a mapping from port identities to sent messages (internally the runtime will generate port/channel IDs, but for the sake of this discussion we’ll use the postfixes of the port names in the PDL code to indicate to which channel they belong, e.g. the tx_a out-port is part of the same channel a as the rx_a in-port). Secondly, if we send a message, we need to transmit in which way it depends on previously received messages, by sending along the sender’s port mapping. The receiver, upon getting a message, checks the port mapping to see if there is any of its own executions that can accept that message.

We’re already calling this information the “port mapping” (because we’ll truly turn it into a mapping later), but for now the sent mapping is a list of pairs containing (port ID, sent value).

This changes the way we can interpret the execution tree: each node is not only associated with the performed operation (fork, put or get), but also with a particular port mapping that indicates the influence of other components that allowed it to reach that execution node. We modify the port mapping per node in the following way:

  • For a fork: we fork the execution as many times as needed, and for those forks we copy the port mapping of the ancestor node in the execution tree.
  • For a put: we transmit the current port mapping, and the transmitted value in the message. We then update the mapping from the sending port: that particular port ID now maps to the recently transmitted value.
  • For a get: we receive the transmitted port mapping and value. Note that this get might be a particular statement executed by multiple different forked executions (each with their own port mapping). And so for a get to succeed, we need the shared channels between the sender and the receiver to agree on their port mapping. If such a get succeeds, then it forks into a new execution where the receiving port now maps to the received value.

So, for a slightly more complicated example, combining the two previous examples:

primitive initiator(out<u32> tx_a, out<u32> tx_b, in<u32> rx_c) {
  u32 value = 0;
  sync {
    put(tx_a, 1); // operation i1
    fork {
      put(tx_a, 1); // operation i2
      value = get(rx_c); // operation i3
    } or fork {
      put(tx_b, 2); // operation i4
      value = get(rx_c); // operation i5
    }
  }
}

primitive responder(in<u32> rx_a, in<u32> rx_b, out<u32> tx_c) {
  sync {
    auto value_1 = get(rx_a); // operation r1
    auto value_2 = 0;
    fork {
      value_2 = get(rx_a); // operation r2
    } or fork {
      value_2 = get(rx_b); // operation r3
    }
    put(tx_c, value_1 + value_2); // operation r4
  }
}

Here, once the components have brought as many forked executions to completion as possible, we’ll have the following execution trees (and mappings). The square bracketed terms denote the port mapping. The parenthesized terms correspond to the operations in the code, and the curly bracketed terms are the names for the traces (so we can refer to them in this document).

For the initiator:

sync  --> put (i1) --> fork +--> put (i2) -----> get (i3) -+-----> sync end {A}
start     [(a,1)]           |    [(a,1),(a,1)]   [(a,1),(a,1),(c,2)]
                            |                              |
                            |                              |
                            |                              +-> ...
                            |
                            +--> put (i4) -----> get (i5) -+-----> sync end {B}
                                 [(a,1),(b,2)]   [(a,1),(b,2),(c,3)]
                                                           |
                                                           |
                                                           +-> ...

For the responder:

sync  --> get (r1) -+--> fork +--> get (r2) -----> put (r4) ----> sync end {C}
start     [(a,1)]   |         |    [(a,1),(a,1)]   [(a,1),(a,1),(c,2)]
                    |         |
                    +-> ...   +--> get (r3) -----> put (r4) ----> sync end {D}
                                   [(a,1),(b,2)]   [(a,1),(b,2),(c,3)]

We can see that the put operation at i2 does not end up being received at r1. The reason is that at r1 the responder expects not to have received anything on rx_a yet. The message that the initiator sends at i2 carries the annotation [(a,1),(a,1)], meaning: I have previously sent [(a,1)], and am now sending [(a,1)]. The only operation that can receive this message is r2, because that one expects the mapping [(a,1)]!

Similarly: when the responder puts at r4, this happens in two branches. The branch that ends up being trace C expects the initiator to be in the state [(a,1),(a,1)] (hence can only be received at operation i3, resulting in trace A of the initiator). The branch that ends up being trace D expects the initiator to be in the state [(a,1),(b,2)] (hence can only be received at operation i5, resulting in trace B of the initiator).

And of course, when both components are finished, they can compare their mappings and conclude that traces A and C are compatible, since their port mappings agree. Similarly traces B and D are compatible. So there are two global solutions to the consensus problem.

For the sake of simplicity, we’ve only considered two components. But there may be many more components involved in a single synchronous region. In this case we’ll have to clarify when receiving a message is valid. When a message is sent to another component, the receiving component first filters the port mapping (both for the mapping stored in the execution tree and the mapping sent over the channel) such that only the channels shared between the two components are left. If those two mappings are equal, then the message can be received in that branch.

The same process needs to be applied when we seek a global solution. In rather silly pseudocode, but the simplest way to explain this process, is to have the following algorithm seeking the global solution:

all_components = [ component_1, component_2, ..., component_N ]
global_solutions = []

// Nested loop through all components
for trace_of_1 in all complete traces of component_1:
  for trace_of_2 in all complete traces of component_2:
    ...
      ...
        for trace_of_N in all complete traces of component_N:
          let success = true;
          let all_traces = [trace_of_1, trace_of_2, ..., trace_of_N];
          
          // Looping through every pair of traces
          for index_a in 0..N:
            for index_b in index_a + 1..N:
              let component_a = all_components[index_a]
              let trace_a = all_traces[index_a]
              let component_b = all_components[index_b]
              let trace_b = all_traces[index_b]
              
              trace_a = filter_on_shared_channels(trace_a, component_a, component_b)
              trace_b = filter_on_shared_channels(trace_b, component_a, component_b) 
            
              if trace_a != trace_b:
                success = false;
                break
            
            if !success:
              break
              
          if success:
            global_solutions = append(global_solutions, all_traces)

We’ll now apply the last bit of trickery to this algorithm. Firstly, keeping track of the sent message values may be prohibitively expensive. Suppose some kind of streaming data processor that receives gigabytes of data in a single synchronous round. It would be an unwise design decision to store all of that data in the port mapping. So what we’ll do instead is assigning each node in the execution tree a unique number (unique only for that execution tree, different execution trees might contain the same number). With that unique number we’ll redefine the port mapping to consist of a list of (port ID, branch number) pairs.

The reason this is a valid trick is because of the arguments made earlier regarding control flow being influenced by received messages. If we know how a component was influenced by external influences, then the control flow path it takes is deterministic, hence the content of the sent messages will be deterministic. Locally a component A may only describe the way it was influenced by its peer B, but B also records how it was influenced by its peers C and D. So transitively A will also know the indirect mutual influences between it and C and D.

Lastly, we can turn the list of (port ID, branch number) pairs into a true mapping {port ID -> branch number}: we do not actually need to keep the entire history around. The reason behind this is the fact that the get operation is blocking and requires the sent port mapping to be compatible with its execution node’s port mapping. So when a get operation succeeds, it agrees that the executions of both sender and receiver are compatible up until that point. Continued execution only has to check that the subsequent interactions are compatible. Consider the following cases:

  • A get operation never receives a message: in that case we keep blocking indefinitely, we’ll never get a complete trace, hence are prevented from finding a local solution.
  • A get operation receives a message containing an incompatible port mapping: in that case we have roughly the same case as above. The message is not accepted and we keep blocking indefinitely. The interpretation is different: the sender and the receiver did not agree on the control flow paths they took.
  • A get operation receives a message containing a compatible port mapping: in this case both components agree on the order of operations that took place for the message to be transferred.
  • A put operation is never received: again, we’re fine. The putting component might submit a local solution for a complete trace. But the mapping for that never-received message will also never appear in the supposed receiver’s port mapping. Hence the two components will never agree on the control flow paths they took.

The Consensus Algorithm

With the story above, we may describe the complete consensus finding algorithm as follows.

Each component will locally construct an execution tree. Branches appear whenever it encounters a fork (producing a set of new branches), a put operation (this will create a single new branch) or a get operation (which will create new branches if the sent message’s port mapping is compatible with the port mapping stored at that get‘s branch). Whenever a new branch is created it is assigned a locally unique identifier.

The port mapping at each branch consists of a mapping {port ID -> branch number}. This port mapping starts out empty at the start of a sync block. When a component puts a value through a channel, it will:

  • Generate the put branch and assign it a unique ID.
  • Send the message, annotated with the ancestor branch’s port mapping together with the (sending port's ID, newly assigned branch ID) pair.
  • Update the port mapping of the put branch: it will copy the ancestor branch’s port mapping, and update it with the (send port's ID, newly assigned branch ID) pair. In case a previous entry is present for that specific port ID, then it is overwritten.

When a component encounters a fork, it will simply copy the ancestor branch’s port mapping for each new branch. Each new branch will receive a new unique ID. When a component performs a get, it will block until it receives a message annotated with a port mapping. If that happens, then it will:

  • Compare the port mapping in the message with the branch’s port mapping. If one of the shared channels does not agree on the port mapping, then the message is not valid for that get operation. Note that the message must be kept around, because in the future there may be a get operation that is valid for that port mapping.
  • If the port mappings for the shared channels do agree, then we:
    • Generate a new fork originating from the get branch and assign it a unique ID.
    • Update the port mapping of the get branch: copy the ancestor branch’s port mapping, and update it with the (sending port ID, peer's branch number) pair.
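The bookkeeping described above can be summarized in a small sketch (in Rust; all names are ours, and the speculative forking of the memory and execution state itself is left out). It only shows how the port mapping and branch numbers evolve on fork, put and get.

use std::collections::HashMap;

// Hypothetical identifier types.
type PortId = u32;
type BranchId = u32;

#[derive(Clone)]
struct Branch {
    id: BranchId,
    mapping: HashMap<PortId, BranchId>, // {port ID -> branch number}
}

// A fork copies the ancestor's mapping into each new branch.
fn fork(ancestor: &Branch, new_id: BranchId) -> Branch {
    Branch { id: new_id, mapping: ancestor.mapping.clone() }
}

// A put creates one new branch. The message is annotated with the ancestor's
// mapping plus the separate (sending port, new branch ID) pair; the new
// branch's own mapping is updated with that same pair.
fn put(ancestor: &Branch, port: PortId, new_id: BranchId)
    -> (Branch, (HashMap<PortId, BranchId>, (PortId, BranchId)))
{
    let mut mapping = ancestor.mapping.clone();
    mapping.insert(port, new_id);
    let branch = Branch { id: new_id, mapping };
    (branch, (ancestor.mapping.clone(), (port, new_id)))
}

// A get only succeeds if the annotated ancestor mapping agrees with the local
// mapping on all shared channels; it then forks a new branch that records the
// peer's (port, branch) pair.
fn get(
    ancestor: &Branch,
    annotation: &HashMap<PortId, BranchId>,
    peer_pair: (PortId, BranchId),
    shared: &[PortId],
    new_id: BranchId,
) -> Option<Branch> {
    if !shared.iter().all(|p| ancestor.mapping.get(p) == annotation.get(p)) {
        return None; // not valid for this branch; keep the message around
    }
    let mut mapping = ancestor.mapping.clone();
    mapping.insert(peer_pair.0, peer_pair.1);
    Some(Branch { id: new_id, mapping }) // new fork arising from the get
}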

Reasons for not implementing this Consensus Algorithm

There are a lot of practical issues with this consensus algorithm:

  1. The fact that a get operation never knows when it will receive a new message requires us to keep a complete copy of the component’s memory and execution state at that point. Hence for each get operation we’re incurring a rather large memory overhead.
  2. The fact that we never know if a received message can be discarded because it cannot be received by any of the get operations in the component’s code. There may be another message coming in that causes a fork with a get operation that can receive this message. Hence we need to keep around all of the messages received in a synchronous round.
  3. The incredible computational complexity of finding a global solution to the consensus algorithm. We need to check for each component all of its completed traces. For all of those N components, each supplying T traces (to simplify the math), we need to check each pair of traces. So the maximum computational complexity is (N*T)*(N*(T-1))/2. In reality this is a bit less, because we can very likely quickly eliminate certain traces.
  4. Considering the previous points: how easy it is (especially when running over the internet) for a nefarious actor to inflict computational overhead on the receiver. All it has to do is keep sending messages to the receiver with an acceptable port mapping, but never offer the consensus algorithm a valid local solution. Each accepted message will spawn a new fork at the receiver.
Runtime Design (version 2.x) https://reowolf.net/runtime-design-version-2-x/ Mon, 23 May 2022

General Architecture

Roughly speaking the project consists of the following parts:

  1. The compiler itself. This transforms the PDL source code into an executable format.
  2. The interpreter. This takes the executable format and executes it. It is a very unoptimized AST walker based on stack frames. Generally speaking the bottommost frame in the stack contains the code and memory associated with a component. Once the interpreter hits a point where it needs to interact with the runtime (generally in order to communicate with another component) it will halt and emit a signal to the runtime.
  3. The runtime. This is the code that keeps track of the components, decides when they can run and where their messages should end up, and brings various control algorithms (running behind the scenes) to completion.

We’ll not go into points 1 and 2 in this document. One may simply assume that at the language level there is support for all of the things that are implemented in the runtime: sync blocks, channels, ports, component creation, sending and receiving messages, etc.

Once such builtin features are encountered in the interpreter (e.g. the creation of a channel), a signal will be emitted to the runtime. A rough outline of the architecture, and handling these signals, is discussed in this document.

Runtime Architecture

The runtime’s code essentially consists of:

  • The machinery that keeps track of the components. That is to say: there is some memory reserved for each of the components. And using some kind of component ID we can look up this memory in a registry. If the runtime finds that there are no more user-controlled components running (i.e. the ones that a programmer uses to create more components) and there are no more regular components running, then the runtime shuts down.
  • A bit of shared memory for all of the OS threads that will be managed by the runtime. This mainly consists of a work queue. This work queue contains the identities of the components that are scheduled for execution.
  • A set of scheduler threads. They attempt to retrieve work from the work queue. This work is in the form of component IDs, which they use to retrieve the associated component and run its PDL code. They do this by invoking the interpreter on the component’s current execution and memory state. Once a runtime signal (as mentioned above) is emitted by the interpreter, the scheduler thread will deal with it appropriately.
  • An auxiliary polling thread. Not of great importance, so a short description suffices: although most components react to one another, some components (e.g. a TCP socket) might have nothing to do until the OS instructs them to do something. This polling thread ensures that once there is something to do, the component is put back onto the work queue.

As long as the runtime doesn’t shut down there will be T threads executing N components. A component can either be running (by being executed by a scheduler thread), scheduled for execution (its ID is in the work queue), or sleeping. All of these states are exclusive. Maintaining the exclusivity of these states is of great importance! We never want to end up in a place where two threads are both modifying the code/memory state of the same component!

A component will start its lifecycle by being put into the work queue (not exactly true, but this will be dealt with later) by the component that created it. At some point a scheduler will pick up the newly created component’s ID from the work queue and start executing its code. Once running (and we’ll exclude fatal errors for now) the component may at some point reach the end of its code and terminate, or it may encounter a place in its code where it blocks and needs to wait for external input (e.g. it performed a get, but a message has not arrived yet).

Once the execution of a component is blocked, it will attempt to go to sleep. Meaning: it will be in a state where it is not running, but also not scheduled for execution. A component may then be woken up by a different component, or by the polling thread, sending the sleeping component a message. To prevent components from scheduling a sleeping component multiple times, the memory of a component contains an atomic sleeping boolean.

It is instructive at this point to roughly explain how components are stored in memory. The component’s memory is roughly divided into two regions: a publicly accessible memory region and a private memory region. The public part is accessible by all scheduler threads. So all of the memory in the public memory region is somehow behind a lock, or some kind of locking mechanism (we will include the concept of atomics into “some kind of locking mechanism” for now). Hence the aforementioned sleeping boolean lives in this region. Conversely, the private memory region of a component is only accessed by the scheduler thread that is running the component. So here we store things like the memory state and execution state of the component.

Returning to the idea of a component wishing to enter the “sleeping” state. The procedure in pseudo-code is written as:

func go_to_sleep(this_component) {
   // We are currently executing, so our sleeping flag MUST be `false`
   assert(atomic::load(this_component.sleeping) == false);
   atomic::store(this_component.sleeping, true); // we force the flag to true
   // Note that while we were trying to go to sleep, someone may have sent us
   // a new message before it could observe that we stored `true` in the
   // sleeping flag. In that case the sender did not wake us up, so we need to
   // check the inbox ourselves.
   if messages_in_inbox() {
      // We try to set the flag back to false, but another scheduler thread may
      // have already done this
      let swap_success = atomic::cas(this_component.sleeping, true, false);
      if swap_success {
         put_in_work_queue(this_component.id);
      }
   }
}

Similarly, each time we try to send a component a message, we must do the following:

func send_message_to(target_component, message_data) {
   put_message_in_inbox_locked(target_component.inbox, message_data);
   let swap_success = atomic::cas(target_component.sleeping, true, false);
   if swap_success {
      put_in_work_queue(target_component.id);
   }
}

Note that, because we cannot predict how the OS threads are scheduled, we can also not predict the way in which our own schedulers (which are running on OS threads) will schedule the execution of the components. Hence as a mental model, one may assume that each component is running in its own thread. The system above ensures that if a component has something to do (because it has received a message), it will eventually end up being executed by a scheduler. With the code for the “sleeping” state we’ll ensure that a component can only be executed by one scheduler at a time.

General Messaging

With this rough architecture, components can send each other messages. One will find three kinds of messages in the runtime (okay, four, but the last one is just to make the OS polling thread work):

  1. Data messages: these are the messages that are sent by put calls and received with get calls. As will be explained later in this document, more information is attached to data messages than the values given as argument to the put call. These messages will arrive in the target component’s inbox. When the target component performs a call to get they’re pulled out of the inbox and transferred to the component’s memory state, such that the PDL code can read the transferred values.
  2. Control messages: to facilitate certain operations permitted within PDL, the scheduler thread may decide to send messages to other components. These messages are called control messages. We’ll encounter them later in this document when describing component creation, transmitting ports, and closing channels. Different from data messages, which may linger in the component’s inbox for a while, control messages are handled by the receiving component immediately. This is important for various control algorithms.
  3. Sync messages: to facilitate the consensus algorithm, there are messages initiated by the scheduler thread as well. That is to say: when a component sends data messages, information is attached to them that facilitates a successful sync round. But apart from that, when the components are all done executing the code inside their sync blocks they must somehow reach consensus. This is done through these sync messages.
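
As a rough illustration, the separation could be expressed as a single message type with one variant per kind (plus the polling-related one). This is a Rust sketch with invented payload types; it mirrors the description above rather than the runtime's real data structures.

// Port identifiers, local to a machine.
type PortId = u32;

enum RuntimeMessage {
    // Sent by put, read by get; carries the encoded values plus the extra
    // consensus-related information mentioned above.
    Data { target_port: PortId, payload: Vec<u8> },
    // Emitted by scheduler threads for internal control algorithms; handled
    // immediately by the receiving component.
    Control { target_port: PortId, content: ControlContent },
    // Used at the end of a synchronous round to reach consensus.
    Sync { round_info: Vec<u8> },
    // The fourth kind: wakes up components waiting on the OS polling thread.
    Poll,
}

// The control messages encountered later in this document.
enum ControlContent {
    PortPeerChangedBlock,
    PortPeerChangedUnblock,
    ClosePort,
    Ack,
}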

Note that the concept of a channel, each having its own little slot (or limited buffer), already becomes a superfluous design decision for the runtime: the scheduler threads themselves also need to be able to send messages to other components. Hence we need something more generic than a per-channel buffer, namely a generic message inbox.

The concept of a directional channel may be a useful tool within PDL code (and we're not arguing against such a concept), but as we'll see throughout this document, control messages can (conceptually) flow both ways over a channel.

Runtime Design Drivers

Most design choices in the runtime were based on the fact that the Reowolf language should facilitate easier programming over the internet. So although the entire system currently only considers components running on a single machine, those components are conceptually regarded as living on different machines. In other words: components are considered not to have any shared memory they can both access.

This has some implications for channels. Certainly a component that sends a value must store it somewhere temporarily, and the component receiving it needs to keep it around as well. But the channel is not an entity that you'll find in memory. Rather, there is one component owning one port, and when a message is put through it, it will arrive at another component owning the peer port; there is no memory shared between components that stores a message flowing through a channel.
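
Put differently, a channel exists only as two port records, one in each component's metadata. A hedged sketch of such a record (field names are ours, not the runtime's) might look as follows; the closed and blocked flags relate to the control algorithms described later in this document.

type PortId = u32;
type ComponentId = u32;

// One such record per port, stored in the owning component's metadata. There
// is no channel object and no shared buffer: a value that is put simply ends
// up in the inbox of whichever component currently owns the peer port.
struct PortInfo {
    self_id: PortId,             // only meaningful within the owning component
    peer_component: ComponentId, // best-known peer; may be temporarily out of date
    peer_port: PortId,           // best-known peer port; may be temporarily out of date
    closed: bool,                // set by the ClosePort algorithm described later
    blocked: bool,               // set by PortPeerChanged_Block, described later
}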

A multi-machine runtime also requires the runtime to embrace the asynchronous nature of components. puts are non-blocking and can be performed one after the other before the peer has performed a corresponding get. The language does not contain the concept of a "lock" with which two components can agree on who owns a shared bit of memory. Rather, each component is executed in its own thread of execution, and for multiple components to coordinate their actions they must use the messaging facilities. In order to make this coordination-through-messages somewhat simple to reason about, one of the design drivers of the runtime was to ensure that messages sent from one component to another arrive in the same order in which they were sent.

And so we have a multi-machine runtime where components running in their own thread can only coordinate through messages. As a result, an ever-important consideration in designing internal (control) algorithms is something called "message crossing": two components may decide to initiate a protocol at the same time, and hence send each other the exact same protocol-initiating message (e.g. if a protocol requires an initiator to send a Request message and then wait for a Response message, we may have components A and B both sending each other a Request at the same time).

Yet another result is that we decided to design the runtime without any globally unique component and/or port IDs. Certainly: on a single machine a component ID allows one to retrieve a component's memory. But when sending a message to a component living on another machine, it may well be that we're sending it through a port that has the same port ID as ours, targeting a component that has the same ID as ours.

Control Algorithms

We'll now discuss several of the control algorithms. These control algorithms may be initiated by the scheduler threads when certain runtime signals are emitted by the interpreter, and are brought to completion by sending messages. We'll talk about these messages as if they're sent from one component to another (this is for the sake of clarity: in reality they're sent by one scheduler thread to the memory location reserved for the target component's inbox). Because messages may be relayed one or more times before they arrive at the intended receiver (we'll introduce this concept soon), most messages include their intended target port in some kind of message header. This is true for all data messages and most control messages. Only when a component is certain about the identity of the receiving component can it send messages without a target port in the header.

Changing Port Peers due to Component Creation

Components, when they're not in sync mode, may decide to create new components. Ports may be passed as arguments to these newly created components. The rule we place on such a port transfer is that the component creating the new component fully relinquishes ownership of the transferred port: after the new component is created, the new component owns that port. As an annotated example:

comp creator(in<u32> one_port) {
   channel another_port -> and_a_final_one;
   sync {
      auto value = get(one_port); // legal, just receiving an integer
      put(another_port, 1337); // legal, sending a value over a port we still own
   }
   // perform component creation
   new some_other_component(one_port, and_a_final_one); // transferring two ports
   
   sync get(one_port); // illegal! Port was transferred
   sync put(another_port, 1338); // still legal, we still own this port
   sync get(and_a_final_one); // also illegal, port was transferred.
}

We have several runtime properties to uphold when we’re transferring ports:

  • No globally unique port IDs, so the new component is allowed to choose new port IDs for the ports it is adopting ownership of.
  • The peers of the transferred ports may be unaware that a new component has been created. In fact, those peers may have already sent messages to the instantiating component! As a design decision (the one that we find makes sense), any incoming but unread messages for a port are transferred along to the new component.
  • Similarly to the above: a peer of a transferred port needs to be aware at some point that its peer port has changed ownership.
  • Together with the requirement that peers need to become aware of the transferred ports, we also need to maintain the ordering of messages that are intended to arrive at a transferred port at some point in time.

Here we see the asynchronous nature of the runtime rear its head, because the transferring of ports does not just happen to the receiving end of a port (in which case we transfer the already received messages, so messages still arrive at their correct destination eventually); it may happen to the transmitting end of a port as well. What this means for the receiver is that it is never sure which component is its peer until it has received a data message that is annotated with the origin of the message. At that moment in time the peer of the port is known, but only until the end of the synchronous round, because after the synchronous round it is perfectly possible for the port to be passed around again.

For all of the requirements above, the internal control algorithm to transfer a port to a new component is as follows:

  1. The component that is creating the new component (we'll call the creator the instantiator component, and the created one the new component) temporarily has access to the private memory of the new component. The reason is that a component is always created on the same machine as the instantiator component. And so the first step it takes is to create new port IDs (that make sense for the newly created component, instead of for the instantiator component) and map the old port IDs to the new ones.
  2. The instantiator component transfers all of the metadata associated with the transferred ports, together with all of the messages targeted at those ports, to the new component.
  3. For each transferred port the instantiator sends a PortPeerChanged_Block control message to the peers. This message instructs the peer that the port should be temporarily blocked. Any component that tries to send a message through that port enters a blocked state that can only be lifted if the corresponding PortPeerChanged_Unblock control message is sent. At the same time the instantiator sets up a special bit of code that will relay all incoming messages from that peer to the new component. We’ve mentioned earlier that all messages will have a target port. So when messages arrive at the instantiator component that need to be relayed, the instantiator component will modify the target port to the new component’s chosen port ID.
  4. Once a peer has received a PortPeerChanged_Block, it will, as stated above, stop sending messages over that channel: not only data messages, but control messages as well. This also means that the other component cannot start transferring that port itself. In any case, it will respond with an Acknowledgement (Ack) back to the instantiator component.
  5. The instantiator component waits until it has received an Ack for each of the PortPeerChanged_Block messages it has sent. This is done so that we're sure we've received all of the messages that are actually intended for the new component (because while the new component is being created, the peer may still be sending messages intended for the new component, but addressed to the instantiator component). As a result, the new component will have all of the data messages in its inbox in the order in which they were sent, therefore maintaining the runtime property of message ordering.
  6. When all of the Acks are received, the instantiator component will remove the bit of code that relays all of the messages and will schedule the new component for execution. At this point the instantiator component will no longer access the private memory of the new component. Since the instantiator component is aware of the new component’s ID and the new port IDs for all of the transferred ports, it will send PortPeerChanged_Unblock messages to all of the peer components. This message will also contain the new component’s ID and its port ID.
  7. The peers, upon receiving the PortPeerChanged_Unblock message, will update the metadata of their ports such that they point to the new component’s ports. They will also unblock the port such that messages can be sent again.

With this control algorithm, all peers are now aware of the new ports' location. We've also maintained message ordering for the messages sent to the new component. Although it was mentioned in point (4), we'll mention it here to be extra clear: creating a new component is blocked until all of the transferred ports are unblocked. If we don't do this, a data/control message may end up at the wrong component.

Likewise we see the asynchronous nature of ports: the peers are eventually consistent. This is why we stressed earlier that almost all messages have their target port in their message header. This is needed so that a component like the instantiator discussed above knows when to relay messages. In this process the relaying component will also update the target port ID in the header to the new port ID.
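
A minimal sketch of that relaying step (again in Rust-like code with invented names, not the runtime's actual implementation): while the control algorithm is in progress, the instantiator inspects the target port in each incoming message's header, rewrites it to the port ID chosen for the new component, and forwards the message.

use std::collections::HashMap;

type PortId = u32;
type ComponentId = u32;

struct MessageHeader {
    target_port: PortId,
    // other fields (sending component, sync metadata, ...) omitted
}

// Set up by the instantiator in step 3 and removed again in step 6.
struct PortRelay {
    new_component: ComponentId,
    // maps the instantiator's old port IDs to the IDs chosen for the new component
    port_mapping: HashMap<PortId, PortId>,
}

impl PortRelay {
    // Returns the component that should actually receive the message. If the
    // message targets a transferred port, its header is rewritten on the way.
    fn route(&self, header: &mut MessageHeader, own_id: ComponentId) -> ComponentId {
        match self.port_mapping.get(&header.target_port) {
            Some(new_port) => {
                header.target_port = *new_port;
                self.new_component
            }
            None => own_id,
        }
    }
}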

Shutting Down Components

A component requires a bit of memory to run. So when we're done executing a component (either because it has crashed, or because its program has terminated) we would like to release this memory again. Earlier we mentioned that components send messages by accessing an inbox in the public memory region of a component. This memory will, of course, be freed as well. So we need to make sure that when a component shuts down, all of its peers are somehow notified that they can never send messages to that terminated component again.

In order to do so we have another control protocol. We’ll extend this protocol when we discuss encountering crashing components, but we’ll introduce a simpler variant here. The protocol is relatively simple. For each of the ports that are not yet closed and are owned by the terminating component we will:

  1. Make sure that the port is not blocked. If the port is blocked then the component blocks until the associated port becomes unblocked. If the port is already closed then we do not execute the other steps in this control algorithm.
  2. The terminating component sends a ClosePort message to the peer of the port that is being closed. Note that this ClosePort message will have a target port. If it happens that the terminating component will receive a PortPeerChanged_Block message for that port in the near future, we're still certain that the ClosePort message will arrive at the correct peer (since the target port will be used to relay that message to the correct receiver).
  3. The peer of the port, upon receiving a ClosePort message, will mark the port as being closed in its metadata. From that point onwards, any attempt to put or get on that port will result in the peer component crashing. In response to the ClosePort message, the peer component will send an Ack. There is one exception, and that is when the peer component itself already initiated a ClosePort control algorithm for that port. In that case the incoming ClosePort message is treated like an Ack.
  4. The terminating component waits until all of the Acks have arrived (or crossing ClosePort messages, as stated in point (3)). Once they have, it instructs the runtime to remove the component from memory.

To reiterate: we have to be careful and annotate the ClosePort message with the target port. The terminating component will delay sending a ClosePort message if the port is blocked, but it may be that our ClosePort message crosses with a PortPeerChanged_Block message, which implies that our ClosePort message will be relayed by the peer component.
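
As an illustration of points (3) and (4) above, the receiving side's handling could be summarized as follows. This is a sketch with names we invented; it only encodes the rules stated in the protocol description.

type PortId = u32;

struct Port {
    id: PortId,
    closed: bool,
    // true if this component has itself already sent ClosePort for this port
    close_initiated: bool,
}

enum Reply {
    SendAck,    // normal case: acknowledge the ClosePort
    TreatAsAck, // crossing ClosePort messages: the incoming ClosePort doubles as the Ack
}

// Handling an incoming ClosePort message for one of our ports.
fn handle_close_port(port: &mut Port) -> Reply {
    // Any later put/get on this port will make this component crash.
    port.closed = true;
    if port.close_initiated {
        // We were shutting this port down ourselves; no further reply is needed.
        Reply::TreatAsAck
    } else {
        Reply::SendAck
    }
}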

Transferring Ports through Data Messages

The PDL code allows for ports to be transferred through ports. As a simple example, consider the following code:

struct Pair {
   in<bool> command,
   out<u32> response,
}

comp some_component(
   in<u32> to_transmit,
   out<in<u32>> tx_one_port,
   out<Pair> tx_two_ports
) {
   // Transmitting a port directly
   sync put(tx_one_port, to_transmit);
   
   // Transmitting multiple ports at a time using a data structure
   channel command_tx -> command_rx;
   channel response_tx -> response_rx;
   
   sync {
      let message = Pair{ 
         command: command_rx,
         response: response_tx,
      };
      put(tx_two_ports, message);
   }
   
   // Sending a command and receiving a response
   sync {
      put(command_tx, true);
      auto response = get(response_rx);
   }
}

To facilitate this, we’ll follow roughly the same procedure as when we’re transferring ports to a newly created component. But we have one complication: we do not have direct access to the private memory of the component we’re sending the ports to (we’ll call this component the “adopting component”, and the sending component the “relinquishing component”). And so we’ll have to follow a control protocol that is slightly different.

Note that it is perfectly okay to send closed ports. The adopting component will receive the port together with the information that the port is closed. In this way, if the adopting component attempts a put or get on that received port, it will crash.

We'll enforce a second rule upon transferring ports: ports transferred in a synchronous round may not have been used in get or put operations within that same round. We are certain that it is possible to come up with a set of rules that would allow this, but the protocol for transferring ports over channels is a lot easier if we disallow it. For this reason we introduce a field in the metadata of each port that registers when the port was last used. If the relinquishing component attempts to transfer a port that has been used within the same sync round, then it will crash.
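
In other words, each port record carries a "last used in round" marker, and the transfer is rejected (by crashing the relinquishing component) if that marker equals the current round. A tiny sketch, with invented names:

type RoundNumber = u64;

struct PortMeta {
    // round in which this port last appeared in a put/get, if any
    last_used_in_round: Option<RoundNumber>,
}

// Returns true if transferring this port in the current round is allowed.
fn may_transfer(port: &PortMeta, current_round: RoundNumber) -> bool {
    port.last_used_in_round != Some(current_round)
}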

Like before we want to ensure that all messages intended for the transferred port arrive in the correct order at the adopting component.

And so the control protocol for transmitting ports proceeds as follows:

  1. The relinquishing component will first make sure that none of the ports are blocked. If the ports are blocked then it will sleep until the ports become unblocked. As stated above the relinquishing component will also make sure that the ports were not previously used within the synchronous round.
  2. The relinquishing component will send a PortPeerChanged_Block message to all of the peers of the ports that will be transferred. However, in this case (unlike component creation) it will not relay any incoming messages; they will simply pile up in the relinquishing component's inbox.
  3. The peers, upon receiving the PortPeerChanged_Block message, will proceed as they would in the case where ports were transferred to a new component: they’ll block the port and send an Ack.
  4. The relinquishing component will wait until all of the expected Ack messages have been received. Once they have, the component will wait until the port over which the data message will travel becomes unblocked (that is: the port that is used to transfer the ports to the adopting component).
  5. The relinquishing component will send the data message containing the transferred ports to the adopting component. It will annotate this message with a list of (transferred port ID, peer component ID, peer port ID) triples. Note that since those peer ports are blocked, they will not be transferred in the meantime. This is essential for the next step.
  6. The adopting component will receive the annotated data message containing the transferred ports. For each transferred port it will decide upon a new port ID.
  7. The adopting component will, for each adopted port, send out a PortPeerChanged_Unblock message to the blocked peer ports. This message is annotated with the (adopting component ID, new port ID) pair, such that the peers all know where their new peer ports can be found.
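
The annotation attached in step 5 can be pictured as a list of triples riding along with the data message. A sketch with invented names (the real message layout differs):

type PortId = u32;
type ComponentId = u32;

// One entry per port carried inside the data message (step 5).
struct TransferredPort {
    transferred_port: PortId,    // the ID the relinquishing component used
    peer_component: ComponentId, // current owner of the peer port
    peer_port: PortId,           // the peer's port ID
}

struct DataMessageWithPorts {
    payload: Vec<u8>,                  // the encoded PDL value being sent
    transferred: Vec<TransferredPort>, // empty if the value contains no ports
}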

Dealing with Crashing Components

The cases in which peers crash in response

A component may at any point during its execution be triggered to crash. This may be because of something simple like an out-of-bounds array access, but as described above, using closed ports may lead to such an event as well. In such a case we not only need to go through the ClosePort control protocol, to make sure that we can remove the crashing component's memory from the runtime, but we also have to make sure that all of the peers are aware that their peer has crashed. Here we'll make a design decision: if a peer component crashes during a synchronous round and there were interactions with that component, then the interacting component should crash as well. The exact reasons will be introduced later, but it comes down to needing to do something about the fact that the synchronous round will never be able to complete.

We’ll talk ourselves through the case of a component crashing before coming up with the control algorithm to deal with components crashing.

We’ll first consider that a component may crash inside or outside of a synchronous block. From the point of view of the peer component, we’ll have four cases to consider:

  1. The peer component is not in a synchronous block.
  2. The crashing component died before the peer component entered the synchronous block.
  3. The crashing component died during the same synchronous block as the peer component.
  4. The crashing component died after reaching consensus on the synchronous block that the peer component is currently still in.

Before discussing these cases, it is important to remember that the entire runtime has components running in their own thread of execution. We may have that the crashing component is unaware of its peers (due to the fact that peer ports might change ownership at any point in time). We’ll discuss the consensus algorithm in more detail later within the documentation. For now it is important to note that the components will discover the synchronous region they are part of while the PDL code is executing. So if a component crashes within a synchronous region before the end of the sync block is reached, it may be possible that it will not discover the full synchronous region it would be part of.

Because the crashing component potentially does not know the IDs of the components it should notify of its failure, we cannot design the crash-handling algorithm such that the crashing component tells its peers when they have to crash. We'll do the opposite: the crashing component simply crashes and makes a best effort to notify its peers. Those peers themselves decide whether they have to crash in response to such a notification.

For this reason, it does not make a lot of sense to deal with component failure through the consensus algorithm. Dealing with the failure through the consensus algorithm only makes sense if we can find the synchronous region that we would have discovered if we were able to fully execute the sync block of each participating component. As explained above: we can’t, and so we’ll opt to deal with failure on a peer-by-peer basis.

We'll go back to the four cases we've discussed above, but change our point of view: we're now considering a component (the "handling component") that has to deal with the failure of a peer (the "crashing component"). We'll introduce a small part of our solution a priori: like a component shutting down, a failing component will simply end its life by broadcasting a ClosePort message over all of its owned ports that are not yet closed (and, like in the other control algorithms, the failing component will wait for the port that is being closed to become unblocked before it sends the ClosePort message).

In the first case, we're dealing with a failing component while the handling component is not in a synchronous block. This means that if there was a previous synchronous block, it has succeeded. We might still have data messages in our inbox that were sent by the failing component, but this case is rather easy to deal with: we mark the ports as closed, and if we end up using them in the next synchronous block, then we will crash ourselves.

In the second case the peer component died before we ourselves entered the synchronous block. This case is somewhat equivalent to the one described above. The crashing component cannot have sent the handling component any messages, so we mark the port as closed, potentially failing in the future if it ends up being used. However, the handling component itself might already have performed put operations. So when the handling component receives a ClosePort message, it realizes that those earlier put operations can never be acknowledged. For this reason a component stores in the metadata associated with a port when it last used that port. When, in this second case, a ClosePort message comes in while the port has already been used, the handling component should crash as well.

Next up is the third case, where the crashing component and the handling component were in the same synchronous round. Like before, we mark the port as closed and future use will cause a crash. As in the second case, if the handling component has already used the port (which here may also mean having received a message from the crashing component), then it should crash as well.

The fourth case is where the failing component crashes after the handling component finished its sync round. This is an edge case dealing with the following situation: both the handling and the crashing component have submitted their local solution to the consensus algorithm (assumed to be running somewhere in a thread of execution different from the two components). The crashing component receives a global solution, finishes the sync round, and then crashes, thereby sending the ClosePort message to the handling component. The handling component, due to the asynchronous nature of the runtime, receives the ClosePort message before the global solution has a chance to reach it. In this case, however, the handling component should be able to finish the synchronous round, and it shouldn't crash.

Distinguishing the crashing cases

So far we’ve pretended like we could already determine the relation between the crashing component’s synchronous round and the handling component’s synchronous round. But in order to do this we need to add a bit of extra information to the ClosePort message.

The simplest case is to determine whether the two components are in the same synchronous round (case three, as described above). The crashing component annotates the ClosePort message with whether it was in a synchronous round or not. If both components are in a synchronous round (as checked by the handling component), and the about-to-be-closed port at the handling component was used in that round, or will be used in that round, then the handling component should crash.

Equally simple: the handling component can figure out itself if it is in a synchronous round (case one, as described above). If not: then the port is marked closed and future use causes crashes.

The last two cases require a bit more work: how do we distinguish the edge case where the handling component's round will complete in the future from the case where it should crash? To distinguish the edge case, the handling component needs to know whether the last interaction the crashing component handled was the one in the handling component's current synchronous round.

For this reason we keep track of the synchronous round number. That is to say: there is a counter that increments each time a synchronous round completes for a component. We have a field in the metadata for a port that registers this round number. If a component performs a put operation, then it stores its own round number in that port’s metadata, and sends this round number along with the message. If a component performs a get operation, then it stores the received round number in the port’s metadata.

When a component closes a port, it will also send along the last registered round number in the ClosePort message. If the handling component receives a ClosePort message, and the last registered round number in the port’s metadata matches the round number in the ClosePort message, and the crashing component was not in a synchronous round, then the crashing component crashed after the handling component’s sync round. Hence: the handling component can complete its sync round.

To conclude: if we receive a ClosePort message, then we always mark the port as closed. If the handling and the crashing component were in a synchronous round, and the closed port was used in that synchronous round, then the handling component crashes as well. If the handling component is in a synchronous round but the crashing component is not in a synchronous round, the port of the handling component is used in the synchronous round and the port’s last registered round number does not match the round number in the ClosePort message, then the handling component crashes as well.
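
Putting the three rules together, the decision the handling component makes upon receiving a ClosePort message could be sketched as follows. The names are invented; the sketch only encodes the rules stated in the paragraph above.

type RoundNumber = u64;

struct ClosePortMsg {
    peer_was_in_sync_round: bool,
    last_registered_round: RoundNumber,
}

struct LocalPortState {
    used_in_current_round: bool,
    last_registered_round: RoundNumber,
}

// Returns true if the handling component must crash as well. In every case
// the port is marked as closed (not shown here).
fn must_crash(in_sync_round: bool, port: &LocalPortState, msg: &ClosePortMsg) -> bool {
    if !in_sync_round {
        // Case 1: not in a synchronous block; only future use of the port crashes us.
        return false;
    }
    if msg.peer_was_in_sync_round {
        // Case 3: both in a synchronous round; crash if the port took part in it.
        return port.used_in_current_round;
    }
    // Cases 2 and 4: the peer was no longer in a synchronous round. Crash unless
    // the peer's last registered round matches ours, in which case the peer
    // crashed after our round's interactions and we may still complete the round.
    port.used_in_current_round && port.last_registered_round != msg.last_registered_round
}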

Sync Algorithm

A description of the synchronous algorithm is present in different documents. We will mention here that central to the consensus algorithm is that two components agree on the interactions that took place over a specific channel. In order for this to happen we’ll send along a lot of metadata when trying to reach consensus, but here we’re just concerned with attempting to match up the two ends of a channel.

A port is identified by a (component ID, port ID) pair, and a channel is a pair of those identifying pairs. So to match up the two ends of a channel we have to find a consistent pair of ports that agree on who their peers are. However, we're dealing with the problem of eventual consistency: putting ports never know who their peer is, because the sent message might be relayed. Getting ports, on the other hand, will know who their peer is for the duration of a single synchronous round once they've received a single message.

This is the trick we will apply in the consensus algorithm. If a channel did not see any messages passing through it, then the components that own those ports will not have to reach consensus because they will not be part of the same synchronous region. However if a message did go through the channel then the components join the same synchronous region, and they’ll have to form some sort of consensus on what interaction took place on that channel.

And so the putting component will only submit its own (component ID, port ID, metadata_for_sync_round) triple. The getting component will submit information containing (self component ID, self port ID, peer component ID, peer port ID, metadata_for_sync_round). The consensus algorithm can then figure out which two ports belong to the same channel.
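
The asymmetry between the two ends could be pictured as follows (a sketch with invented names; the actual submissions carry more information): the putter cannot know its peer, while the getter learns it from the first data message it receives in the round.

type PortId = u32;
type ComponentId = u32;

// Placeholder for the per-port metadata gathered during the sync round.
struct SyncMetadata;

// Submitted for a port that only performed puts: the peer is unknown because
// the sent messages may have been relayed.
struct PutterSubmission {
    component: ComponentId,
    port: PortId,
    metadata: SyncMetadata,
}

// Submitted for a port that performed a get: the first received data message
// revealed the peer for the remainder of this round.
struct GetterSubmission {
    component: ComponentId,
    port: PortId,
    peer_component: ComponentId,
    peer_port: PortId,
    metadata: SyncMetadata,
}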

Component Nomenclature

Earlier versions of the Reowolf runtime featured a distinction between primitive and composite components. This was put into the language from a design perspective: primitive components did the nitty-gritty protocol execution (performing put/get operations and entering sync blocks), while composite components were tasked with setting up a network of interconnected components: creating channels and handing off the appropriate ports to the instantiated components.

Once the runtime was capable of sending ports over channels, it became apparent that this distinction no longer made sense. Because if only primitive components can send/receive ports, and cannot create new components, then the programmer is limited to using those received ports directly in the primitive’s code! And so the split between primitive and composite components was removed: only the concept of a “component” is left.

Reowolf 2.0: Release Notes https://reowolf.net/reowolf-2-0-release-notes/ Fri, 20 May 2022 19:55:58 +0000 https://reowolf.net/?p=9155 We are happy to release version 2 of the Reowolf project. This version introduces many new features: a select statement, run-time error handling, dynamic port hand-off, native TCP components and detailed project documentation. This post summarizes the most important features, and further lays out the vision we have for the future of Reowolf. This release is sponsored by the Next Generation Internet fund.

This release can be found on our Gitlab repository page. Gitlab includes an issue tracker that is open for users to submit bug reports and feature requests. The release tag is v2.0.1. The software is licensed under the MIT license.

The following aspects of the Protocol Description Language (PDL) and its supporting run-time and compiler have been improved, and in the sections below we demonstrate their functionality by small examples:

  1. Select statements
  2. Run-time error handling
  3. Transmitting ports through ports (dynamic port hand-off)
  4. Native components
  5. Project documentation

Furthermore, this release has fixed a number of bugs that were present in previous releases. The final section shows the vision for the future of Reowolf.

Select statements

We have reworked the component synchronization mechanism, and the underlying consensus algorithm supporting components.

Imagine we instantiate a data_producer a number of times (say a, b and c), and link them up with a data_receiver. The data receiver takes a datum from the producers one by one.

In the old synchronization mechanism, all data producers had to indicate they were ready to synchronize, even when only one producer actually gives data for the receiver to process. So the following example causes the inadvertent synchronization of all participating components, which causes all producing components to wait on each other:

comp data_producer(out<u64> tx, u64 min_val, u64 max_val) {
    while (true) {
        sync {
            auto value = lots_of_work(min_val, max_val);
            put(tx, value);
        }
    }
}

comp data_receiver_v1(in<u64> rx_a, in<u64> rx_b, in<u64> rx_c, u32 num_rounds) {
    u32 counter = 0;
    auto rxs = { rx_a, rx_b, rx_c };
    while (counter < num_rounds) {
        auto num_peers = length(rxs);
        auto peer_index = 0;
        while (peer_index < num_peers) {
            sync {
                auto result = get(rxs[peer_index]);
                peer_index += 1;
            }
        }
        counter += 1;
    }
}

The reason was that a synchronous interaction checked all ports for a valid interaction. So while the round-robin receiver communicates with one peer per round, it still requires the other peers to agree that they didn't send anything at all! Note that this already implies that all running components need to synchronize. We could fix this by writing:

comp data_receiver_v2(in<u64> rx_a, in<u64> rx_b, in<u64> rx_c, u32 num_rounds) {
    u32 counter = 0;
    auto rxs = { rx_a, rx_b, rx_c };
    while (counter < num_rounds) {
        auto num_peers = length(rxs);
        auto peer_index = 0;
        sync {
            while (peer_index < num_peers) {
                auto result = get(rxs[peer_index]);
                peer_index += 1;
            }
        }
        counter += 1;
    }
}

But this is not the intended behavior. We want the producer components to be able to run independently of one another. This requires a change in the semantics of the language! We no longer have each peer automatically dragged into the synchronous round. Instead, only after the first message of a peer is received through a get call do we merge the two components' synchronous rounds.

With such a change to the runtime, we now have that the first version (written above) produces the intended behavior: the consumer accepts one value and synchronizes with its sender. Then it goes to the next round and synchronizes with the next sender.

But what we would really like to do is to synchronize with any of the peers that happens to have its work ready for consumption. And so the select statement is introduced into the language. This statement can be used to describe a set of possible behaviors we could execute, each with an associated set of ports. When every port in such an associated set has a message ready to be read, the corresponding behavior will execute. So to complete the example above, we have:

comp data_receiver_v3(in<u64> rx_a, in<u64> rx_b, in<u64> rx_c, u32 num_rounds) {
    u32 counter = 0;
    auto rxs = { rx_a, rx_b, rx_c };

    u32 received_from_a = 0;
    u32 received_from_b_or_c = 0;
    u32 received_from_a_or_c = 0;
    u64 sum_received_from_c = 0;

    while (counter < num_rounds*3) {
        sync {
            select {
                auto value = get(rx_a) -> {
                    received_from_a += 1;
                    received_from_a_or_c += 1;
                }
                auto value = get(rx_b) -> {
                    received_from_b_or_c += 1;
                }
                auto value = get(rx_c) -> {
                    received_from_a_or_c += 1;
                    received_from_b_or_c += 1;
                    sum_received_from_c += value;
                }
            }
        }
        counter += 1;
    }
}

Run-time error handling

We have an initial implementation of error handling and reporting. Roughly speaking: if a component has failed then it cannot complete any current or future synchronous rounds anymore. Hence, apart from some edge cases, any message that a peer (attempts to) receive from it should cause a failure at that peer as well. Let's have a look at the various places where a component can crash, and at how its neighboring peer handles receiving messages: sometimes the crash of the first component propagates, and sometimes it is blocked.

enum ErrorLocation {
    BeforeSync,
    DuringSyncBeforeFirstInteraction,
    DuringSyncBeforeSecondInteraction,
    DuringSyncAfterInteractions,
    AfterSync,
}

func crash() -> u8 {
    return {}[0]; // access index 0 of an empty array
}

comp sender_and_crasher(out<u32> value, ErrorLocation loc) {
    if (loc == ErrorLocation::BeforeSync) { crash(); }
    sync {
        if (loc == ErrorLocation::DuringSyncBeforeFirstInteraction) { crash(); }
        put(value, 0);
        if (loc == ErrorLocation::DuringSyncBeforeSecondInteraction) { crash(); }
        put(value, 1);
        if (loc == ErrorLocation::DuringSyncAfterInteractions) { crash(); }
    }
    if (loc == ErrorLocation::AfterSync) { crash(); }
}

comp receiver(in<u32> value) {
    sync {
        auto a = get(value);
        auto b = get(value);
    }
}

comp main() {
    channel tx -> rx;

    new sender_and_crasher(tx, ErrorLocation::AfterSync);
    new receiver(rx);
}

Note that when we run the example with the error location before sync, or during sync, the receiver always crashes. However, the location where it will crash is somewhat random! Due to the asynchronous nature of the runtime, a sender of messages will just put the value onto the port and continue execution. So even though the sender component might already be done with its sync round, the receiver officially still has to receive its first message. In any case, a neat error message is displayed in the console (or in some other place where such diagnostics are reported).

Note in particular that, given the asynchronous nature of the runtime, there is a case where the receiver finds out that the peer component has crashed but can still finish the current synchronous round. This happens when the peer component crashes just after the synchronous round: the receiver may receive the information that the peer crashed before it receives the information that the synchronous round has succeeded.

Transmitting ports through ports

Since this release, transmitting ports is possible. This means that we can send ports through ports; in fact, we can send ports that may send ports that may send ports, etc. But don't be fooled by the apparent complexity. The inner type T of a port like in<T> simply states the message type. Should the type T contain one or more ports, then we kick off a bit of code that takes care of the transfer of those ports. Should a port inside of T itself, after being received, be used to send another port, then we simply kick off that same procedure again.

In the simplest case, we have someone transmitting the receiving end of a channel to another component, which then uses that receiving end to receive a value. The example below shows this:

comp port_sender(out<in<u32>> tx, in<u32> to_transmit) {
    sync put(tx, to_transmit);
}

comp port_receiver_and_value_getter(in<in<u32>> rx, u32 expected_value) {
    u32 got_value = 0;
    sync {
        auto port = get(rx);
        got_value = get(port);
    }
    if (expected_value == got_value) {
        print("got the expected value :)");
    } else {
        print("got a different value :(");
    }
}

comp value_sender(out<u32> tx, u32 to_send) {
    sync put(tx, to_send);
}

comp main() {
    u32 value = 1337_2392;

    channel port_tx -> port_rx;
    channel value_tx -> value_rx;
    new port_sender(port_tx, value_rx);
    new port_receiver_and_value_getter(port_rx, value);
    new value_sender(value_tx, value);
}

Of course we may do something a little more complicated than this. Suppose that we don't just send one port, but a series of ports. That is, we use an Option union type to turn an array of ports that we're going to transmit into a series of messages containing ports, each sent to a specific component.

union Option<T> {
    Some(T),
    None,
}

comp port_sender(out<Option<in<u32>>>[] txs, in<u32>[] to_transmit) {
    auto num_peers = length(txs);
    auto num_ports = length(to_transmit);

    auto num_per_peer = num_ports / num_peers;
    auto num_remaining = num_ports - (num_per_peer * num_peers);

    auto peer_index = 0;
    auto port_index = 0;
    while (peer_index < num_peers) {
        auto peer_port = txs[peer_index];
        auto counter = 0;

        // Distribute part of the ports to one of the peers.
        sync {
            // Sending the main batch of ports for the peer
            while (counter < num_per_peer) {
                put(peer_port, Option::Some(to_transmit[port_index]));
                port_index += 1;
                counter += 1;
            }

            // Sending the remainder of ports, one per peer until they're gone
            if (num_remaining > 0) {
                put(peer_port, Option::Some(to_transmit[port_index]));
                port_index += 1;
                num_remaining -= 1;
            }

            // Finish the custom protocol by sending nothing, which indicates to
            // the peer that it has received all the ports we have to hand out.
            put(peer_port, Option::None);
        }

        peer_index += 1;
    }
}

And here we have the component which will receive on that port. We can design the synchronous regions any way we want. In this case, when we receive ports we just synchronize with port_sender, but the moment we receive the values we synchronize with everyone.

comp port_receiver(in<Option<in<u32>>> port_rxs, out<u32> sum_tx) {
    // Receive all ports
    auto value_rxs = {};

    sync {
        while (true) {
            auto maybe_port = get(port_rxs);
            if (let Option::Some(certainly_a_port) = maybe_port) {
                value_rxs @= { certainly_a_port };
            } else {
                break;
            }
        }
    }

    // Receive all values
    auto received_sum = 0;

    sync {
        auto port_index = 0;
        auto num_ports = length(value_rxs);
        while (port_index < num_ports) {
            auto value = get(value_rxs[port_index]);
            received_sum += value;
            port_index += 1;
        }
    }

    // And send the sum
    sync put(sum_tx, received_sum);
}

Now we need something to send the values; we'll make something incredibly simple, namely:

comp value_sender(out<u32> tx, u32 value_to_send) {
    sync put(tx, value_to_send);
}

comp sum_collector(in<u32>[] partial_sum_rx, out<u32> total_sum_tx) {
    auto sum = 0;
    auto index = 0;
    while (index < length(partial_sum_rx)) {
        sync sum += get(partial_sum_rx[index]);
        index += 1;
    }
    sync put(total_sum_tx, sum);
}

And we need the component to set this entire system of components up. So we write the following entry point.

comp main() {
    auto num_value_ports = 32;
    auto num_receivers = 3;

    // Construct the senders of values
    auto value_port_index = 1;
    auto value_rx_ports = {};
    while (value_port_index <= num_value_ports) {
        channel value_tx -> value_rx;
        new value_sender(value_tx, value_port_index);
        value_rx_ports @= { value_rx };
        value_port_index += 1;
    }

    // Construct the components that will receive groups of value-receiving
    // ports
    auto receiver_index = 0;
    auto sum_combine_rx_ports = {};
    auto port_tx_ports = {};

    while (receiver_index < num_receivers) {
        channel sum_tx -> sum_rx;
        channel port_tx -> port_rx;
        new port_receiver(port_rx, sum_tx);

        sum_combine_rx_ports @= { sum_rx };
        port_tx_ports @= { port_tx };
        receiver_index += 1;
    }

    // Construct the component that redistributes the total number of input
    // ports.
    new port_sender(port_tx_ports, value_rx_ports);

    // Construct the component that computes the sum of all sent values
    channel total_value_tx -> total_value_rx;
    new sum_collector(sum_combine_rx_ports, total_value_tx);

    auto expected = num_value_ports * (num_value_ports + 1) / 2;
    auto received = 0;

    sync received = get(total_value_rx);

    if (expected == received) {
        print("got the expected value!");
    } else {
        print("got something entirely different");
    }
}

Native TCP components

Also new in this release are native components. Native components are provided by the underlying implementation of Reowolf and allow protocols to be built on top of other protocols. We demonstrate this by introducing native components for the Transmission Control Protocol (TCP). Hence, Reowolf can now be used to express protocols that assume an underlying implementation of TCP.

We'll start by importing the standard library that defines the builtin components supporting a TCP listener and a TCP client. We'll also define a little utility function (listen_port), used throughout this example to retrieve the port we're going to listen on.

import std.internet::*;

func listen_port() -> u16 {
    return 2392;
}

Next we define our server. The server accepts (for the purpose of this example) a fixed number of connections, after which it stops listening. At that point it will wait until it receives a signal that allows it to shut down.

comp server(u32 num_connections, in<()> shutdown) {
    // Here we set up the channels for commands, going to the listener
    // component, and the channel that sends new connections back to us.
    channel listen_cmd_tx -> listen_cmd_rx;
    channel listen_conn_tx -> listen_conn_rx;

    // And we create the tcp_listener, imported from the standard library, here.
    new tcp_listener({}, listen_port(), listen_cmd_rx, listen_conn_tx);

    // Here we set up a variable that will hold our received connections
    channel client_cmd_tx -> unused_client_cmd_rx;
    channel unused_client_data_tx -> client_data_rx;
    auto new_connection = TcpConnection{
        tx: client_cmd_tx,
        rx: client_data_rx,
    };

    auto connection_counter = 0;
    while (connection_counter < num_connections) {
        // We wait until we receive a new connection
        sync {
            // The way the standard library is currently written, we need to
            // send the `tcp_listener` component the command that it should
            // listen to for the next connection. This is only one way in which
            // the standard library could be written. We could also write it
            // such a way such that a separate component buffers new incoming
            // connections, such that we only have to `get` from that separate
            // component.
            //
            // Note that when we get such a new connection, (see the
            // TcpConnection struct in the standard library), the peers of the
            // two ports are already hooked up to a `tcp_client` component, also
            // defined in the standard library.
            put(listen_cmd_tx, ListenerCmd::Accept);
            new_connection = get(listen_conn_rx);
        }

        // In any case, now that the code is here, the synchronous round that
        // governed receiving the new connection has completed. And so we send
        // that connection off to a handler component. In this case we have the
        // `echo_machine` component, defined in this file as well.
        new echo_machine(new_connection);
        connection_counter += 1;
    }

    // When all of the desired connections have been handled, we first await a
    // shutdown signal from another component.
    sync auto v = get(shutdown);

    // And once we have received that signal, we'll instruct the listener
    // component to shut down.
    sync put(listen_cmd_tx, ListenerCmd::Shutdown);
}

The following piece of code represents the component that is spawned by the server component to handle new connections. All it does is wait for a single incoming TCP packet, where it expects a single byte of data, and then echo that back to the peer.

comp echo_machine(TcpConnection conn) {
    auto data_to_echo = {};

    // Here is where we receive a message from a peer ...
    sync {
        put(conn.tx, ClientCmd::Receive);
        data_to_echo = get(conn.rx);
        put(conn.tx, ClientCmd::Finish);
    }

    // ... and send it right back to our peer.
    sync put(conn.tx, ClientCmd::Send(data_to_echo));

    // And we ask the `tcp_client` to shut down neatly.
    sync put(conn.tx, ClientCmd::Shutdown);
}

// Here is the component that we will instantiate to connect to the `server`
// component above (more specifically, to the `tcp_listener` component
// instantiated by the `server`). This is the component that will ask the
// `echo_machine` component to echo a byte of data.

comp echo_requester(u8 byte_to_send, out<()> done) {
    // We instantiate the `tcp_client` from the standard library. This will
    // perform the "connect" call to the `tcp_listener`.
    channel cmd_tx -> cmd_rx;
    channel data_tx -> data_rx;
    new tcp_client({127, 0, 0, 1}, listen_port(), cmd_rx, data_tx);

    // And once we are connected, we send the single byte to the other side.
    sync put(cmd_tx, ClientCmd::Send({ byte_to_send }));

    // This sent byte will arrive at the `echo_machine`, which will send it
    // right back to us. So here is where we wait for that byte to arrive.
    auto received_byte = byte_to_send + 1;
    sync {
        put(cmd_tx, ClientCmd::Receive);
        received_byte = get(data_rx)[0];
        put(cmd_tx, ClientCmd::Finish);
    }

    // We make sure that we got back what we sent
    if (byte_to_send != received_byte) {
        crash();
    }

    // And we shut down the TCP connection
    sync put(cmd_tx, ClientCmd::Shutdown);

    // And finally we send a signal to another component (the `main` component)
    // to let it know we have finished our little protocol.
    sync put(done, ());
}

And here the entry point for our program:

comp main() {
    // Some settings for the example
    auto num_connections = 12;

    // We create a new channel that allows us to shut down our server component.
    // That channel being created, we can instantiate the server component.
    channel shutdown_listener_tx -> shutdown_listener_rx;
    new server(num_connections, shutdown_listener_rx);

    // Here we create all the requesters that will ask their peer to echo back
    // a particular byte.
    auto connection_index = 0;
    auto all_done = {};
    while (connection_index < num_connections) {
        channel done_tx -> done_rx;
        new echo_requester(cast(connection_index), done_tx);
        connection_index += 1;
        all_done @= {done_rx};
    }

    // Here our program starts to shut down. First we'll wait until all of our
    // requesting components have gotten back the byte they're expecting.
    auto counter = 0;
    while (counter < length(all_done)) {
        sync auto v = get(all_done[counter]);
        counter += 1;
    }

    // And we shut down our server.
    sync put(shutdown_listener_tx, ());
}

Project documentation

Detailed documentation has been written, providing users and developers with background information about the current implementation of Reowolf 2.

  • The Runtime Design documents the general architecture, the Reowolf run-time, design decisions, control algorithms, dealing with crashing components, and synchronization.
  • A new version of the consensus algorithm is documented, as well as the previous implementation of the consensus algorithm (which was replaced in order to implement the select statement).

Developers interested in contributing to the Reowolf project are invited to read the Known Issues document. It describes various known limitations in the implementation, ranging from small issues to large issues. Pull requests are welcome!

What is ahead?

After this release we continue our work in the following directions:

  • Further ahead are improvements to the native component libraries, and starting to model existing Internet protocols, such as BGP, that build on top of the TCP protocol.
  • We are interested in the SCION internet architecture, and are investigating whether the Reowolf connector API can be used for programming internet applications that run on top of SCION, and whether we can specify components in PDL that allow applications to make use of all capabilities SCION networks offer. Towards this, we are setting up a subnet that will be connected to the SCIONlab network. Our experiences will be published in a series of blog posts.
  • We decided to wait a bit longer before we start work on integrating Reowolf into the operating system kernel. Although we are exploring which operating system is most suitable for integration, we have not yet reached a stable protocol description language.
  • Work on the specification of the Protocol Description Language (PDL), leading to a standardization track. Part of this specification work is the need to formalize, in an unambiguous manner, the semantics of protocols specified in PDL.
  • Work on formal verification of protocols specified in PDL. We have received funding from NLnet / NGI ASSURE for continuing the project in this direction. More about this topic soon!

We will keep you updated!

The Reowolf Team
– May 20, 2022

Reowolf: Executable, Compositional, Synchronous Protocol Specifications https://reowolf.net/reowolf-executable-compositional-synchronous-protocol-specifications/ Tue, 08 Feb 2022 10:00:00 +0000 https://reowolf.net/?p=9165 We submitted an article to the PLDI 2022 conference, but unfortunately the article was rejected. The version we submitted can be accessed in full below.

We found the reviews very helpful for improving our article (and project!) over time. For full disclosure, here are a couple of the reviews we received.

Review 1

Overall merit: Reject

Reviewer expertise: Some familiarity

Paper summary: The paper presents a formal protocol specification language in the spirit of synchronous languages like Esterel. The protocol designer describes sets of possible inputs and outputs produced by each component at every logical clock cycle. The behavior of the system is determined by the intersection of traces of system components. The main advantage of such formalisms is compositional semantics, which comes at a cost: similar to other synchronous languages, one can express behaviors in Reowolf that cannot be easily implemented, e.g., a pair of components whose output in each cycle depends on the other component’s output; hence the Reowolf runtime must perform a form of constraint solving to determine the set of inputs and outputs at each step.

Comments for authors:

This looks like an interesting formalism that may in the future have practical applications in networking and distributed systems. However, more work is needed to realize this potential, most importantly: (1) support for real distributed systems with complex failure modes, (2) evaluation using a range of real applications, (3) presenting the language and the system in a way that would make its benefits accessible to networking researchers and practitioners.

As it stands, it is not clear that Reowolf solves real problems in networking, or at least the paper does not make it clear that it does. In particular, it does not explain what types of distributed applications can be implemented using Reowolf. What does the API look like? How does one define a protocol and interact with it in practice? Moreover, Reowolf’s implementation is on top of a consensus protocol with a centralized leader. It does not seem to handle reconfigurations, member failures (transient or permanent), partitioning, i.e., all the things that make networks and distributed systems difficult to program. Overall I have the impression that this is a potentially useful language for programming local component-based systems or possibly even electronic circuits, but not necessarily asynchronous networks, at least not as a general replacement for socket-based programming.

The paper does not present any real applications or performance results, which makes it even harder to evaluate the practicality of the proposed solution.

A comment on related work: the oracle problem in Reowolf reminds me of the constructive causality semantics in Esterel. Can you explain how your approach compares to Esterel and other synchronous languages? As far as I understand their solution is to restrict the set of programs to ones that can be efficiently implemented in a sequential language (without constraint solving or look-ahead).

Review 2

Overall merit: weak reject

Reviewer expertise: Some familiarity

Paper summary: This paper proposes Protocol Description Language (PDL), a language to describe sessions between different parties in a network. The design goal of PDL as stated is to provide a high-level and compositional alternative to BSD sockets. The language is given a pair of idealistic semantics that relies on oracles, and a realistic/implementable semantics that operationally constructs these oracles for a subset of PDL programs. The final contribution of the paper is to implement the realistic semantics on top of a distributed runtime instead of a shared-memory one.

Comments for authors:

While I like the idea of a high-level language to describe protocols, my main concerns with this paper is in terms of presentation. I had a hard time following the motivation and understanding what the contributions of the paper are.

  • First, in the introduction Reowolf is presented as an alternative to BSD sockets. Reading that I expected the rest of the paper to present some sort of higher-level API to sockets, or a way to translate high-level Reowolf protocols to low-level socket implementations. As far as I can tell, the paper really presents a formalization and its more of a vision on how networked protocols should be programmed rather than an actual alternative. Maybe I misunderstood, but it is important to clarify whether someone can use Reowolf as an alternative to sockets in their real-world application.
  • The paper mentions that Reowolf is heavily inspired from Reo, but there is very little context about Reo in the paper, other than a small paragraph that mentions that Reowolf is sequential as the only difference. I think it would help the readers if the delta with Reo is more thoroughly addressed. Do the two languages share the same syntax? Are there programs that you can write in one but not the other? Is Reowolf somehow enabling the implementation of the realistic semantics/distributed semantics?
  • For language design papers, I really appreciate seeing some examples that demonstrate 1. why this language is better than existing solutions if any, 2. how that language simplifies (some aspects of) the problem, e.g., is it much easier to express the same protocol in this language rather than with sockets? Is it easier to verify programs? Or maybe, is compilation to efficient implementations simplified? The paper mentions that protocols written in PDL are more suitable for verification and proofs of security properties are tractable, yet no proof technique or an example proof is shown as evidence to this claim.
Reowolf 1.2: Release Notes https://reowolf.net/reowolf-1-2-release-notes/ Fri, 17 Dec 2021 13:36:44 +0000 https://reowolf.net/?p=9140 We are happy to release this milestone of the Reowolf project: Reowolf version 1.2. This is an alpha release. The milestone improves the concurrency of the Protocol Description Language (PDL) run-time interpreter. This post summarizes the improvements, and further lays out the milestones we will be working on next. This release is sponsored by the Next Generation Internet fund.

For this release, we have migrated to Gitlab as our public facing repository. Gitlab includes an issue tracker that is open for alpha users to submit bug reports and feature requests. The release tag is v1.2.0. The software is licensed under the MIT license.

The following aspects of the language have been improved, and in the sections below we demonstrate their functionality by small examples:

  1. Decentralized synchronization implementation. In previous versions, the PDL run-time interpreter would communicate among all components to synchronize them, and in doing so required a central leader. We have improved this aspect of the implementation, and now dynamically discover the neighboring components with which synchronization takes place. Thus, no single component acts as the central leader any longer. This improvement allows different components in the system to run at different speeds, and allows the composition of slow components (e.g. in embedded systems) with fast components (e.g. running on dedicated high-performance hardware).
  2. Multi-threaded run-time implementation. In previous versions, all components ran on a single processor. We have adapted the run-time interpreter of PDL so that it distributes the execution of components over multiple threads, and can influence the scheduling of components using information provided in the protocol.
  3. We have added the capability for components to perform multiple communications over a single port between synchronizations. In previous versions, either zero or exactly one datum could be communicated over a port between synchronizations. This change makes it possible to let the implementation of data transmission race with the synchronization between components.
  4. We changed the syntax of PDL to express more directly the branches in speculative executions. Since speculative execution in some cases leads to expensive but discarded computations, it seems sensible to make it explicit in a PDL specification where this happens.
  5. We have performed some initial experimentation with error handling in sync blocks. Components might encounter runtime errors, which might make them unavailable for future interactions using the ports they owned. The components might also be programmed in such a way that the synchronization algorithm cannot find a satisfactory global behaviour. In all of these cases we need to be able to tell the programmer what went wrong.

Furthermore, this release has fixed some minor bugs that were present in previous releases. The final section shows the roadmap ahead, explaining what milestones will be worked on next, and our plans for the future.

Decentralized synchronization

The PDL run-time interpreter of versions 1.0 and 1.1 made use of a centralized synchronization algorithm. In this release we have replaced it with a decentralized synchronization algorithm. This section explains in more detail what this means for the execution of protocol programs written in PDL.

The centralized synchronization algorithm assumed authority over all of the running components: all of the components were executed until reaching the point in their sync block where a decision had to be made about the speculative branch that could be committed to memory. This made it a lot simpler to provide an initial implementation, but had the downside that unrelated components had their execution speed linked to one another. The region over which synchronization had to be achieved (the so-called sync region) spanned the entire constellation of components.

The distributed synchronization algorithm instead performs the discovery of the synchronous regions based on the ports that are involved in a synchronous code block. So if two components communicate with one another synchronously, then they belong to a single synchronous region. Consequently, those components will have to wait on one another until consensus on their shared behavior has been achieved. Conversely, if those two components do not share any ports to communicate with, then they can run independently of one another.
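As a minimal sketch of this (the source and sink components are hypothetical placeholders for any pair of components that communicate synchronously over a single channel), the two pairs in the following composite share no ports, so they form two separate synchronous regions that reach consensus, and may run, independently of one another:

composite two_regions() {
    channel output_a -> input_a;
    channel output_b -> input_b;
    new source(output_a); // this pair shares channel a: one synchronous region
    new sink(input_a);
    new source(output_b); // this pair shares channel b: a second, independent region
    new sink(input_b);
}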

Planned for the next release is a mechanism for more control over how synchronous regions are constructed. If we consider the following composite component, then we see that there are two workers performing some kind of asynchronous work. We may assume that each time they've finished some work, they will send the result over their out<u32> port.

composite network(out<u32> result) {
    channel output_a -> input_a;
    channel output_b -> input_b;
    new worker(output_a); // create a worker
    new worker(output_b); // create another worker
    new merger(input_a, input_b, result); // merge the streams produced by the workers
}
primitive merger(in<u32> a, in<u32> b, out<u32> c) {
    while (true) {
        sync {
            if (fires(a)) {
                auto m = get(a);
                put(c, m);
            } else if (fires(b)) {
                auto m = get(b);
                put(c, m);
}   }   }   }

It is perfectly reasonable to block the execution of the workers if there is nobody to receive their results on the other end of the merger. But if there is a component eagerly consuming results from the result port, then the two workers should be able to run independently to produce results on input_a and input_b as fast as possible. This is one of the goals of Reowolf 1.3.

Multi-threaded run-time

So far, the Reowolf runtime had been implemented to run on a single thread to simplify development. With this release we've moved to a multi-threaded runtime. In its essence it is a green-thread scheduler. Working towards an implementation of the Reowolf runtime that operates within the context of the operating system kernel, the scheduler has been kept simple.

The scheduler makes sure that when components are solely waiting on input, then they will not be scheduled until they receive messages allowing them to continue executing. Likewise, when a component has finished its execution (cleanly, or due to an error) the peers are notified such that they are no longer able to send messages to components that cannot receive them.

As a simple example, if we wish to execute the following code:

primitive producer(out<u32> output) {
    sync put(output, 1);
    sync put(output, 2);
}

primitive consumer(in<u32> input) {
    sync {
        auto value = get(input);
        assert(value == 1);
    }

    // Exit without receiving the second value
}

Then we are greeted by the following error message:

ERROR: attempted to 'put' on port that is no longer owned
 +-  at 4:10
 | 
 |      sync put(output, 2);
 |           ~~~~~~~~~~~~~~

 +-  Stack trace:
 | component producer:4

Because we are still running in user-space, the scheduler is implemented such that it will exit if it is certain that all components have finished their work and components can no longer be created through the API.

Multiple port firings

The synchronization algorithm is tasked with finding an agreeable global behavior of all of the participating components in a sync region. Finding this solution is achieved by considering (among other things) the ports that each component has used in the interaction with its peers. In the previous implementation this algorithm imposed the requirement that ports could either fire, or not fire. As a result, a component could only put on a particular port once per sync block.

The new decentralized algorithm seeks a solution in a different way: still transmitting data through ports, but now allowing a programmer to use a port multiple times. So in the following simple example we may safely expect the receiver to always execute its second behaviour.

primitive sender(out<u32> output) {
    sync {
        // sender will only allow exactly two messages to be sent
        put(output, 1);
        put(output, 2);
    }
}

primitive receiver(in<u32> input) {
    sync {
        auto num = 0;

        fork      num = 1; // first behaviour: receive once
        or fork   num = 2; // second behaviour: twice
        or        num = 3; // third time's a charm?

        // num is now 1, 2 or 3
        auto index = 0;
        while (index < num) {
            auto value = get(input);
            index += 1;
        }
    }
}

Explicit forking

As hinted at in the example above: firing a port multiple times no longer meshes well with the concept of the fires function. We have also come to realize that the fires predicate was redundant. As a programmer you first have to make sure that the assertion fires(port) == true holds, and then remember to actually use that port. Conversely, asserting that a port is silent must be followed by not using that port. Any other use of the port is a runtime error.

To better reflect the usage of ports, we have replaced the firing predicate with a more explicit fork statement. As an example, consider the following snippet that uses the fires method:

u32 hash = 0;
if (fires(request) && fires(response)) {
    hash = compute_hash(get(request));
    put(response, hash);
} else if (fires(response)) {
    hash = compute_hash(default_request);
    put(response, hash);
} else {
    assert(false); // do not allow any other combination
}

This would now be written as:

fork {
    hash = compute_hash(get(request));
    put(response, hash);
    // Used both 'request' and 'response'
} or {
    hash = compute_hash(default_request);
    put(response, hash);
    // Used only 'response'
} // No more behaviour specified

Roadmap

After this release we can continue our work in the following directions:

  • Allowing further control over the synchronous region in which the synchronization algorithm seeks consensus about the global behaviour of a set of composed components.
  • Modelling existing transport layer protocols, such as TCP and UDP, as Reowolf protocols. This allows us to convincingly demonstrate the expressiveness of the protocol description language, and to compare our implementation’s efficiency with existing networking stacks. These transport layer implementations would make use of native IP components. Further ahead, we can model existing Internet protocols such as ICMP, DNS, BGP.
  • We are interested in the SCION internet architecture, and are investigating whether the Reowolf connector API can be used for programming internet applications that run on top of SCION, and whether we can specify components in PDL that allow applications to make use of all capabilities SCION networks offer. Towards this, we are setting up a subnet that will be connected to the SCIONlab network. Our experiences will be published in a series of blog posts.
  • Make first approaches to integrating Reowolf into the operating system kernel. We are exploring which operating system is most suitable for integration, so that we can offer the Reowolf connector API to user-mode processes. Further, we are investigating the compilation of PDL component specifications into loadable kernel modules, thereby increasing the performance of applications that can instantiate pre-compiled components.
  • Work on the specification of the Protocol Description Language (PDL), leading to a standardization track. Part of this specification work is the need to formalize, in an unambiguous manner, the semantics of protocols specified in PDL. We have submitted an article that describes the formal semantics for centralized synchronization, but we still need to investigate how to adapt the semantics to deal with decentralized synchronization. Formalized semantics increases the future potential for formal verification of protocols, and allows us to define the correctness criteria of Reowolf implementations.

We will keep you updated!

The Reowolf Team
– December 17, 2021

Dealing with Recursive Types in a Value-Based Language https://reowolf.net/dealing-with-recursive-types-in-a-value-based-language/ Mon, 12 Jul 2021 16:00:00 +0000 https://reowolf.net/?p=9162 In this blog post, Max Henger describes one of the challenges encountered during the design of Reowolf’s Protocol Description Language (PDL). PDL is a value-based language, meaning that programmers are never exposed to implementation details such as pointers or references. The challenge is how to add recursive data types: types that in their type declaration refer to values of the type itself that is being declared.

The Problem

In many languages it is illegal to construct structs with members that, when expanded, end up having a field that refers to the struct itself by value. This makes sense: firstly, one could never actually construct such a struct, because writing out a literal would take an infinite amount of typing. Secondly, even if the struct could be constructed, it would take up an infinite amount of memory. As a practical example: a binary tree can conceptually grow to an unbounded size, and such a tree cannot be constructed in a fixed, finite amount of space.

And so, in low-level programming languages one would replace these fields with pointers (or something equivalent) in order to deal with both points. The first point no longer applies, because the pointer can be null (in the example: this would terminate a node in the tree, making it a leaf of the tree). Secondly, the size is no longer infinite and becomes calculable, because a pointer always has a fixed size.

In Reowolf we have value-based semantics. To achieve reasonable cache coherency and quick struct/union member access, and to prevent memory (de)allocation overhead, we want to put as many values as possible onto a stack-like construct. For this reason we would like to precalculate the alignment and offset of each type and its members.

Point one above still stands: if one wishes to construct a type that never terminates, then the compiler should throw an error. By termination we mean that the type declaration eventually reaches a point where a value can be constructed that inhabits the type. An example of such a non-terminating type would be:

// Simple truly infinite type
struct Infinite { 
    Infinite member,
}

// Infinite type of cycle length 2
struct Left { Right other }
struct Right { Left other }

// Infinite type of cycle length 3
struct A { B b }
struct B { C c }
struct C { A a }

// Etcetera

// But we can also do this using unions
union A { B(B), C(C) }
union B { A(A), C(C) }
union C { A(A), B(B) }

If one wishes to express a type whose value causes it to terminate somewhere, the only option in this language is to use unions to indicate the optionality of a certain member. For example:

// One option, allows setting branches individually
union Option<T>{ Some(T), None }
struct U32TreeExample {
    u32 value,
    Option<U32TreeExample> left,
    Option<U32TreeExample> right,
}

// Another option, must be a well-formed binary tree
union U32TreeLink<N> {
    Leaf,
    Node(U32TreeNode, U32TreeNode)
}
struct U32TreeNode {
    u32 value,
    U32TreeLink<U32TreeNode> link,
}

// Another kind of thing
union List<T> { 
    End,
    Entry(T, List<T>)
}

These are all valid types, and we should be able to express them, but naively calculating their size causes one to run into issues.

Towards an Algorithm

So far one can see that structs and unions are the elements that might constitute an infinite type. All of the other types (user-defined enums and all of the builtins like u32, string, s64, etc.) do not have any members and have a fixed size. Whenever there is a union in the type, we might have a type that is not truly infinite, but may terminate somewhere.

Imagine a graph containing all user-defined struct and union types. The edges in this graph are directional, and indicate that one type is embedded in the other. It is important to construct this graph on a per-member basis. A struct member may have one edge running towards it, coming from the member's type. A union member can have multiple edges running towards it, one for each type embedded in that variant. Note that types may refer to themselves.

Having constructed this graph, one can visualize the potentially infinite types by loops in the graph. A struct is potentially infinite if one of its members is part of a loop. A union is potentially infinite if all of its members are part of a loop. A potentially infinite type is a truly infinite type if the relevant loops all contain potentially infinite types. Or, conversely: a potentially infinite type is not actually infinite if the relevant loops contain a union that is not potentially infinite.

More practically: a potentially infinite struct (implying that at least one of its members is part of a type loop) is not infinite if each of its “looped members” is part of a loop which contains a non-infinite union. Likewise, a potentially infinite union (implying that all of its members are part of a loop) is not actually infinite if, for one of its members that contains a set of loops (because a union can embed multiple types per variant), each of those loops contain a non-infinite union.

And finally, in human terms: a type is not actually infinite if, given enough time, a programmer can express a literal of that type. So one may type (using the examples above):

auto thing = List::Entry(3, List::Entry(2, List::Entry(1, List::End)));
auto flong = U32TreeExample{
    value: 0,
    left: Option::Some(U32TreeExample{
        value: 1,
        left: Option::None,
        right: Option::None
    }),
    right: Option::None
};

But one simply cannot express:

auto cannot_construct = A::B(B::C(C::A(A::C(C::B( /* onwards, to infinity */ )))));

Now we arrive at several conditions for a concrete algorithm. Firstly, types whose literals are expressible should be valid in the language. Secondly, for the performance reasons mentioned above we would like to specify the memory layout for each type: the size and alignment of the type itself, and the offset (with implicit alignment and size) of each of its members. Thirdly, because we are going to have to put pointers somewhere in the aforementioned type loops (necessarily so: since we have variably sized values, we need to perform some kind of allocation somewhere, hence we need pointers somewhere), we need these pointers to be placed consistently throughout the code. This last point is a design decision: we could decide to have some types include pointers to members, and some types exclude pointers to members. That would require special conversion functions whenever two values of the same type, but with a different memory layout, are converted into one another, on top of the fact that each of the types containing a pointer will require a constructor and a destructor. Finally, although this point looks ahead a bit at the implementation, we desire that subsequent compilations of the same declarations, potentially in a different order, result in the same layout for each of the types.

We may observe that only unions may break up potentially infinite types. So for each loop that contains a non-infinite union we need at least one union that is implemented using some kind of pointer. There are multiple ways we can fix the byte size of the union, thereby breaking the type loop:

  1. Make each instance of the union a pointer to the data in that union: both the tag and the potentially embedded values. This has the upside of being relatively simple to implement. Furthermore the union itself will just have a fixed size. The downside is that for a tag lookup we have to follow the pointer, generally causing cache misses in the case that we just want to check the tag.
  2. Keep the tag in the union value, but dynamically allocate all of the contents. This way the size of a union value becomes a tag and a pointer. On 32-bit systems this is most likely fine; on 64-bit systems (the norm, if we're not talking about embedded computers) alignment will generally cause the union size to bloat to 16 bytes. This kind of implementation has two sub-cases to consider.
    • Allocate as much memory as the maximum union variant. This way changing the union variant will not force a reallocation. Instead we can use up the allocated memory.
    • Allocate as much memory as the variant requires. This way we will not waste any memory, at the cost of having to reallocate when the variant is changed.
  3. Keep the tag in the union value, with the same up- and downsides as the previous point. Then dynamically allocate the members of the variants (that is, if we have union Foo { Bar(CyclicType, u64, u64), Qux }, then we will just make CyclicType a pointer). The obvious downside is that changing variants might cause (de)allocations. And if we have multiple members that cause a loop then we need to allocate each of them. The upside is that when we're only accessing the non-pointer members, they're likely already in the cache.

There are many more tiny variants here that are not explicitly stated above. I’ll document my reasons for choosing the variant I think works best:

  • The whole idea of breaking the type cycles is not to allocate an infinite amount of memory, so always allocating the union is naturally out of the question. Furthermore type loops will generally only occur in particular data structures that are potentially infinite: trees, linked lists, hash maps, etc.
  • The contract when using a union is that the byte size of a union value is the size of the largest member plus the size of the tag (plus padding). So allocating the union just once, with enough size to fit all possible allocated variants seems like a good idea to me. I suspect that in most cases these unions will express something along the lines of: there is nothing, or there is something.
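As a hedged, worked illustration of this contract (the union below is illustrative only; we assume, as in the worked examples further down, that the tag plus padding occupies 8 bytes on a 64-bit machine):

union Sample {
    Unsigned(u64), // largest member: 8 bytes
    Signed(s64),   // also 8 bytes
    Nothing        // no embedded data
}
// Sample is not part of any type loop, so it is stored entirely by value:
// size = 8 bytes (tag + padding) + 8 bytes (largest member) = 16 bytes.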

Finally, as a note: all of this type loop checking has to be performed per monomorphization. A polymorphic type specification itself does not need to be checked for type loops. Checking type loops is deferred until all type parameters are known.

Lastly, we have to consider the point of making sure that multiple compilations, where the AST is constructed in a different order, hence where the types are "discovered" in a different order, will result in the same type layouts. As a rather simple solution, instead of just marking one union in the type loop as pointer-like, all of the unions in the type loops will be implemented using some kind of pointer scheme.

As an initial conclusion:

  • All non-infinite unions in type loops are implemented using dynamic allocation to break the type loops and to prevent potentially infinite types from being truly infinite.
  • The tag of the union will be a non-allocated value associated with the union. Considering that most of the time these kinds of unions will represent an option-like type (there is one more entry in the linked list, there is one more node in the tree, etc.) I think it is nice to not have cache misses when checking whether a value is present, or whether it is not present.
  • Considering that we cannot allocate all of the time (because then we will still end up with an infinite amount of allocated memory), but I do not currently want to implement very complicated logic in the compiler for handling memory allocations, I will initially implement the following scheme: all union variants that do not contain any types part of a type loop will be stored on the stack, contributing to the byte size of the union. All variants that do contain such types will be fully allocated on the heap in one allocation. The size of the allocated region will be the size of the largest variant that requires dynamic allocation.

Informal Specification of the Algorithm

I might be missing some stuff here; whatever is missing will probably pop up while I'm implementing this:

  • Perform the basic pre-compilation for symbol discovery and definition discovery. This constructs the AST such that we can explore the AST to determine what fields are present on each struct and what variants are used in each union.
  • For each type added to the type table (including monomorphs where we have all of the polymorphic variables fully specified), we will traverse all of the members and their types. For each type we will check if it was already checked not to be cyclic. If it was already completely checked, then all types up until the most recently checked type cannot be cyclic (another member might still make them cyclic). If the type was not completely checked, but previously encountered in our search for loops, then all of the types from the first encounter of that type are considered part of a type loop.
  • Note that this algorithm is recursive in the sense that for structs we need to check each member for type loops. For unions we need to check each embedded type in each variant. So when we encounter a type loop for a specific set of members, we walk back and mark each of the unions along the way as requiring allocation. If there were no such unions then we immediately return an error. We keep all of the unions that require allocation around in a list.
  • While we’re doing the above, if we encounter a union who has a member that does not exist in a type loop then we mark it as non-infinite.
  • After all of the above is done, we need to check all of the unions we have encountered. Necessarily if we started with a particular type, and recursively checked all of the embedded types, then we will discover all type loops. If none of the unions is non-infinite, then we throw an error (which might be kind of hard, because what kind of error do we show in the case of the union A; union B; union C; example?). With the algorithm above, we now know that if there are type loops, that each of them contains at least one union. With the condition that there is at least one non-infinite union we now know that values of the type are expressible. The constructed list of unions only contains unions that were part of a type loop. With that knowledge, to guarantee consistent compilation, we will decide to make each union in each type loop, whether it is infinite or non-infinite, a pointer-like union.
  • A pointer-like union’s value will consist of a tag, plus reserved size (with the proper alignment) for the largest non-pointer-like union variant if it is larger than the size of a pointer on the particular machine. If it is smaller, then we just need a tag, and a pointer that is properly aligned. So the union value will have a fixed size that is not dependent on the types that were part of the type cycle. I’ll call the variants that were not part of type cycles “value variants”, and parts that were part of type cycles “pointer variants”.

Then, during runtime, as an initial implementation (we could be smarter if we knew both the old value and the new value, but that would take a lot of static analysis, and we're not yet at that point in the compiler), we will check the tag value to determine whether the old and new variants are pointer variants. If the old and new variants are both pointer variants, or when they're both value variants, then we do not (re)allocate. If we go from a value variant to a pointer variant then we allocate. If we go from a pointer variant to a value variant then we deallocate.

Some Examples of the Algorithm

I wrote these before I started writing all of the above; I'll keep them here as some concrete examples of the algorithm.

Truly Infinite Example with Structs

// Most simple case
struct SelfReferential{ SelfReferential field }

// Slightly more complicated case
struct DualA { DualB b }
struct DualB { DualA a }

In some sense we would be able to compute the sizes of these data structures: we simply make their fields pointers. Let's assume 64-bit systems for all of these examples. For just the first case we would then have that SelfReferential has a size of 8 bytes. However, we always need to allocate its member field. So this would always take up an infinite amount of memory. We disallow this case because we would never be able to construct a literal of these kinds of data types.

Truly Infinite Example with Unions

struct S { UOne one, UTwo two }
union UOne { Struct(S), One(UOne), Two(UTwo) }
union UTwo { One(UOne), Two(UTwo) }

Again, here we have a case where we can compute the sizes. We make the unions have pointer-like contents. So the unions will be 16 bytes (8 bytes for the tag and padding for pointer alignment, then an 8 byte pointer). Hence the struct will be 32 bytes. But once more we can never construct a literal and instantiating an element would always take up an infinite amount of memory.

Slightly Ambiguous Example

With the following example we get one union which is non-infinite.

struct SEnd { u32 some_value }
struct SMiddle {
   SEnd this_ends,
   UOne union_one,
   UTwo union_two,
}
union UOne { One(UOne), Two(UTwo), Struct(SMiddle) }
union UTwo { One(UOne), Two(UTwo), End(u32) }

If we draw the type graph, we get something like:

+--------+        +---------+        +-------+        +-------+
| struct |        | struct  | <----- | union | <----- | union |
| SEnd   | -----> | SMiddle | -----> | UOne  | -----> | UTwo  |
+--------+        +---------+        +-------+        +-------+
                                      |     ^          |     ^
                                      |     |          |     |
                                      \-----/          \-----/

For this case we can actually always construct a literal of SMiddle. Viewing the literal as a tree (each node a type instance), we can terminate the tree branches using UTwo::End. However, if we were to make just UTwo pointer-like, we could not compute the size of UOne (it contains a reference to itself, so it would grow infinitely in size). This hints that we should make UOne pointer-like as well.

Slightly Ambiguous Example, Version 2

Not really ambiguous, but just to show that not all unions which are part of a connected type graph need to be turned into pointer-like unions.

union Value {
    Unsigned(u64),
    Signed(s64),
    Undefined
}
struct Node {
    Value value,
    Children children
}
union Children {
    One(Node),
    Two(Node, Node),
    None
}

We get a cycle between Node and Children, so we would want to implement Children as being pointer-like. But we don’t have to make Value pointer-like. In terms of the described algorithm: Value is not part of a type loop, so is not considered a candidate for being pointer-like.
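To make the allocation behaviour concrete, here is a small, hypothetical construction using these types. Children::None embeds no looped type, so it is a value variant and stored inline; Children::One and Children::Two embed the looped Node type, so they are pointer variants and their contents are allocated:

func sample_tree() -> Node {
    // value variant: no allocation takes place
    auto leaf = Node{ value: Value::Unsigned(1), children: Children::None };
    // pointer variant: embeds the looped type Node, so its contents are allocated
    auto root = Node{ value: Value::Undefined, children: Children::One(leaf) };
    return root;
}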

Slightly Ambiguous Example, Version 3

Contains part of the type graph that is not part of a type loop. So the associated union should not be pointer-like.

// Parts that are outside of the type loop
struct SOutside {
    UOutside next,
}
union UOutside {
    Next(SInside)
}

// Parts that are inside the type loop
struct SInside {
    UInside next,
}
union UInside {
    Next(SInside),
    NoNext,
}

Here UInside should become pointer-like, but UOutside should remain a normal value.

Which Union Breaks the Cycle?

struct S { UOne one }
union UOne { Two(UTwo), Nope }
union UTwo { Struct(S), Nope }

Here we see that we have a type loop to which both unions contribute. We can lay out either UOne as pointer-like, or UTwo. Both would allow us to calculate the size of the types and make values of the type expressible. However, we need consistent compilation. For this reason, and this reason only (it would be more efficient to lay out only one of the unions as a pointer), we have to make both unions pointer-like. Perhaps in the future some other consistent metric can be applied.

Does it even matter?

// Obviously doesn't use `T`
enum Something<T> { A, B }

// So this should be fine
struct Foo {
    Something<Foo> some_member,
}

Here we have a struct Foo that relies on Something<Foo>, but this shouldn't cause a type loop, because Foo doesn't actually appear in Something. So whatever algorithm we come up with should resolve member types not by iterating over each individual type, but by attempting to instantiate a monomorph of that type and then checking the members of that monomorph.

Reowolf 1.1: Release Notes https://reowolf.net/reowolf-1-1-release-notes/ Fri, 04 Jun 2021 13:50:40 +0000 https://reowolf.net/?p=9110 We are happy to release this milestone of the Reowolf project: Reowolf version 1.1. This is an alpha release. The milestone improves the structural aspects of Protocol Description Language (PDL), which increases the declarative aspects of protocol descriptions needed for modeling Internet protocols (e.g. TCP, UDP, ICMP, DNS). This post summarizes the improvements, and further lays out the milestones we will be working on next. This release would not be here without Max Henger, who joined the Reowolf project in November 2020, whose contributions have had a major impact on the feature completeness of this release. This release is sponsored by the Next Generation Internet fund.

The Git repository associated to this release can be checked out here. The release tag is v1.1.0. The software is licensed under the MIT license.

The following aspects of the language have been improved, and in the sections below we demonstrate their functionality by small examples:

  1. Introduced algebraic data types (“enum”, “union”, “struct”) for user-defined structuring of data. For handling elements, we introduced “if let” statements for deconstructing “enum” and “union” types, and field dereferencing of “struct” types. The “if let” statement allows extensibility of the type definition of “union” and “enum” types. We also introduced constructor literals for constructing elements in data types, including arrays.
  2. Introduced a type system, an algorithm for type checking, and a type inference system. Type checking ensures that the execution of protocols do not misinterpret data (i.e. avoiding “type confusion”), thus rules out a class of errors.
  3. Introduced generic types in function and datatype definitions, by the use of type variables. Parametric polymorphism is a structural feature of the language only, and is erased during the execution of protocols (in a process called monomorphization). Parametric polymorphism is also available for the port types, allowing rich type information to describe the kind of messages that components can exchange over time.
  4. Improved usability. The module system and namespaces are implemented. This is important for protocols that are authored by multiple independent developers, to avoid namespace conflicts. Further, error messages are improved, which increases the usability of the system and makes life easier for developers who use PDL.

The final section shows the roadmap ahead, explaining what milestones will be worked on in the future.

Algebraic Data Types

We have introduced a system for user-defined algebraic data types, in addition to the primitive types (for signed, unsigned integer handling, and arrays). User-defined types can be declared to aid the protocol programmer in structuring data. There are three kinds of user-defined data types:

  • enumeration types (enum)
  • tagged union types (union)
  • product types (struct)

Enumeration and tagged union types serve a similar purpose: to discriminate different cases. Tagged unions further allow for data to be stored per variant, whereas enumeration types do not store data. Enumerations are used for named constants.

For example, consider an enumeration of DNS record types (we list here only a few variants):

enum DNSRecordType { A, NS, CNAME, SOA, MX, TXT }

Then one can access each constant as DNSRecordType::A, DNSRecordType::NS, et cetera.

A classic example of disjoint union types is the so-called option type. For example, in places where either some variant of the enumeration above is expected or no value can be given, one may use the following data type:

union OptionalDNSRecordType { None, Some(DNSRecordType) }

To construct an element of a tagged union type, one uses the constructor for each variant. In the case of no-argument variants, this is similar in use as constants in enumerations. For variants that accept parameters, these have to be supplied to the variant constructor. For example, OptionalDNSRecordType::Some(DNSRecordType::A) is an element of OptionalDNSRecordType.

To be able to test what variant is used, we introduce the “if let” statement. This statement tests the variant of a union type and at the same time performs a pattern match to extract data from the matched variant.

auto x = OptionalDNSRecordType::Some(DNSRecordType::A);

if (let OptionalDNSRecordType::Some(y) = x) {
    // y is bound to DNSRecordType::A here
    if (let DNSRecordType::A = y) {
        assert(true);
    } else {
        assert(false);
    }
    }
}

Product types can be used for modeling structured data, such as packet headers. Each field has an associated type, thus constraining what elements can be stored in the type. Product types are also known as records. For example, here is a structure modeling the UDP packet header:

struct UDPHeader {
    u16 source_port,
    u16 dest_port,
    u16 length,
    u16 checksum
}

Elements of product types can be constructed by a constructor literal, that takes an element for each of the fields that are part of the product. For example, UDPHeader{ source_port: 82, dest_port: 1854, length: 0, checksum: 0 } constructs an element of the type defined above.
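Fields of a constructed element can be read back through field dereferencing. A minimal sketch (the function name is illustrative, and the dot notation for field access is assumed here):

func header_length() -> u16 {
    auto header = UDPHeader{ source_port: 82, dest_port: 1854, length: 0, checksum: 0 };
    return header.length; // dereference the 'length' field
}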

Further, user-defined types may be recursive, and thus allow modeling interesting structures such as binary trees. Consequently, one can define recursive functions that traverse such structures. See for example:

union Tree {
    Leaf(u8),
    Node(Tree,u8,Tree)
}
func mirror(Tree t) -> Tree {
    if (let Tree::Leaf(u) = t) {
        return t;
    }
    if (let Tree::Node(l, n, r) = t) {
        return Tree::Node(mirror(r), n, mirror(l));
    }
    return Tree::Leaf(0); // not reachable
}

Type Checking and Inference

The type checker ensures that protocols written in PDL are type safe: it is statically ensured that no element of one type is assigned to a variable of another type. It is still possible to transform elements from one type to another, either by means of casting or by calling a function.

The type checker in Reowolf furthermore elaborates protocols in which insufficient typing information is supplied. This is called type inference. It reduces the need for protocol programmers to supply typing information, if such information can be deduced automatically from the surrounding context.

The type checker and inference system works in tandem with user-defined algebraic data types. Also, in pattern matching constructs such as the “if let” statement, the types of the variables occurring in patterns are automatically inferred.

Consider the following function, that computes the size of a binary tree. It declares an automatic variable (s) that contains the result of the function. Further, it automatically infers the type of the pattern variables (l, n, r) that follows from the definition of the Node variant in the Tree data type.

func size(Tree t) -> u32 {
    auto s = 0;
    if (let Tree::Node(l, n, r) = t) {
        s += 1;
        s += size(l) + size(r);
    } else {
        s = 1;
    }
    return s;
}

We shall now consider a number of negative examples. The first assigns two different variables (val16 and val32), and then leaves the type of a third variable (a) unspecified by the use of the auto keyword.

func error1() -> u32 {
    u16 val16 = 123;
    u32 val32 = 456;
    auto a = val16;
    a = val32;
    return 0;
}

In this case, an error is triggered, because there exists no type to which both 16-bit unsigned integers and 32-bit unsigned integers can be assigned. The same kind of error occurs whenever one performs an operation on two different types. Reowolf has no automatic implicit casting. This type strictness is added to ensure that code is never ambiguous. However, casting operators are used to explicitly mark where casting happens in the code.

func error2() -> s32 {
    s8 b = 0b00;
    s64 l = 1234;
    auto r = b + l;
    return 0;
}
func good1() -> s64 {
    s8 b = 0b00;
    s64 l = 1234;
    auto r = cast<s64>(b) + l;
    return r;
}
func good2() -> s64 {
    s8 b = 0b00;
    s64 l = 1234;
    auto r = cast(b) + l; // type inferencer can make the jump
    return r;
}

Generic Types and Functions

Reowolf now supports generic type parameters, that can be used both in user-defined data type definitions and in function definitions. Generic type parameters are also used by the type checker and type inferencer. For example, it is possible to define the generic option type:

union Option<T> { None, Some(T) }

The generic type can be instantiated by a concrete type, including the primitive types such as integers. It is also possible to define generic functions, for example:

func some<T>(Option<T> s) -> T {
    if (let Option::Some(c) = s) { return c; }
    while (true) {} // does not terminate for Option::None
}
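As a small usage sketch (the demo function is hypothetical, and we assume local variables may be declared with a generic type, just like struct fields), the generic type and function can be instantiated with a concrete type such as u32:

func demo() -> u32 {
    Option<u32> x = Option::Some(5); // instantiates Option<T> with T = u32
    return some(x);                  // T is inferred as u32; returns 5
}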

Furthermore, generic types are also added to input and output ports: this allows protocol programmers to specify precisely what value is expected during communication. For example, the sync channel is defined as:

primitive sync<T>(in<T> i, out<T> o) {
    while (true) {
        sync {
            if (fires(i) && fires(o)) {
                auto m = get(i);
                put(o, m);
}   }   }   }

The sync channel can then be instantiated by different concrete types: e.g. sync<u8> is a byte channel, and sync<u8[]> is a byte array channel. The additional type information is useful to avoid communicating with incompatible message types.
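For instance, here is a hypothetical composite that relays bytes through two instances of the sync channel. This is a sketch only: it assumes that type arguments may be written at component instantiation, and it passes the composite's own ports to its children in the same way as the earlier examples.

composite byte_relay(in<u8> source, out<u8> sink) {
    channel middle_out -> middle_in;
    new sync<u8>(source, middle_out); // first hop: a byte channel
    new sync<u8>(middle_in, sink);    // second hop: a byte channel
}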

Usability Improvements

The module and namespace system is improved. Protocol descriptions live in their own namespace (each domain name is a separate namespace) to prevent naming conflicts of definitions among multiple protocol authors. Importing symbols from other modules is allowed, and the compiler checks for naming conflicts among symbols imported from other modules and locally defined symbols.

An important aspect of this release is user-friendly error messages. These help the protocol programmer to identify errors in the protocol description so that they can be quickly resolved. For example:

struct Pair<T1, T2>{ T1 first, T2 second }
func bar(s32 arg1, s8 arg2) -> s32 {
    auto shoo = Pair<s32, s8>{ first: arg1, seond: arg2 };
    return arg1;
}

produces the following user-friendly error message:

ERROR: This field does not exist on the struct 'Pair'
 +-  at 3:45
 | 
 |              auto shoo = Pair<s32, s8>{ first: arg1, seond: arg2 };
 |                                                      ~~~~~

Roadmap

After this release we can continue our work in the following directions:

  • The semantics of Reowolf’s sync block has to be adapted to make it possible to be driven by an efficient distributed consensus algorithm. For this, we are introducing so-called scoped sync statements, which allow for run-time discovery of neighboring components.
  • Modelling existing transport layer protocols, such as TCP and UDP, as Reowolf protocols. This allows us to convincingly demonstrate the expressiveness of the protocol description language, and to compare our implementation’s efficiency with existing networking stacks. These transport layer implementations would make use of native IP components. Further ahead, we can model existing Internet protocols such as ICMP, DNS, HTTP, ….
  • Make first approaches to integrating Reowolf into the operating system kernel. We are exploring which operating system is most suitable for integration. Considering that our user-mode implementation is written in Rust, we are investigating whether our kernel implementation can also be written (mostly) in Rust.
  • Work on the specification of the Protocol Description Language (PDL), leading to a standardization track. Part of this specification work is the need to formalize, in an unambiguous manner, the semantics of protocols specified in PDL. Formalized semantics increases the future potential for formal verification of protocols, and allows us to define the correctness criteria of Reowolf implementations.

We will keep you updated!

The Reowolf Team
– June 4, 2021

Reowolf 1.0 Project Code and Documentation https://reowolf.net/reowolf-1-0-project-code-and-documentation/ Fri, 30 Oct 2020 14:59:21 +0000 https://reowolf.net/?p=9104 The Reowolf 1.0 project files are released on Zenodo. The project documentation (technical report) is available at CWI’s Institutional Repository.

The repository serves as the documentation and specification of the Reowolf project, aiming to provide connectors as a generalization of BSD-sockets for multi-party communication over the Internet. A copy of the source code repository of version v1.0.0, and an overview presentation and its slides, are included. The repository comprises the final deliverables of the Reowolf 1.0 project.

The main contributor of the release is Christopher Esterhuyse, core developer of Reowolf 1.0. On Tuesday, October 27, 2020 he gave a talk in the Amsterdam Coordination Group (ACG).

Title: Overview of the Reowolf Project
Abstract:
The Reowolf project introduces connectors as a replacement for BSD-style sockets for multi-party network programming. Connectors encourage applications to make explicit their requirements on the behavior of the session, by facilitating configuration using protocol code, expressed in a domain-specific protocol language (based on Reo). These protocols are retained, and shared over the network, such that the underlying runtime and middleware can cooperate on realizing the session as safely, and efficiently as possible. The presentation summarizes the project’s developments, and lays out promising directions for the sequel.
