plunder

COG.md (7476B)


Machine
-------

In the plunder VM, a "Machine" is a set of persistent processes, which
are referred to as Cogs.

Cogs can survive restarts. Their inputs are written to an event log, and
their state is occasionally snapshotted to disk. On restart, we recover
the most recent state by loading the most recent snapshot and replaying
each input logged since that snapshot.

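The recovery rule above can be sketched in Python. This is a minimal model, not plunder code: `step`, `recover`, and the list-shaped state are hypothetical names chosen for illustration, and a real cog's transition is function application rather than a list append.

```python
from functools import reduce


def step(state, event):
    """Hypothetical pure transition: fold one logged input into the state."""
    return state + [event]


def recover(snapshot, inputs_since_snapshot):
    """Rebuild the latest state by replaying logged inputs over a snapshot."""
    return reduce(step, inputs_since_snapshot, snapshot)


# A live run that processed four inputs, with a snapshot taken after two.
event_log = ["a", "b", "c", "d"]
live_state = reduce(step, event_log, [])
snapshot = reduce(step, event_log[:2], [])

# Restarting from the snapshot and replaying the rest gives the same state.
recovered = recover(snapshot, event_log[2:])
```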
Cogs interact with the world by making system calls. The set of active
system calls made by a Cog is part of its state. When the Cog is
reloaded, all of its calls are resumed as well.

There is no hidden state in a machine. A machine can shut down and resume
without a visible effect. The formal state of a full Machine is the
plunder value `(Tab Nat Fan)`, where each cog's process id maps to its
formal state.

Cogs
----

A cog is a plunder function partially applied to a row of syscalls.

    ($cog [[%eval 10 add 2 3] [%rand 0 %byte 8]])

An event is given to a cog by calling it as a function. Each event is a
set of syscall responses `(Map Nat Any)`, keyed by the index of the
request in that row:

    | ($cog [[%eval 10 add 2 3] [%rand 0 %byte 8]])
    | %% =0 [5]
    | %% =1 x#0011223344556677

There are four types of syscalls: `%eval` requests, `%cog` requests,
`%what` requests to detect the attached hardware, and everything else,
which is treated as a CALL to hardware. The interpreter ignores any
value that it does not understand.

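As a rough illustration of that four-way split, the sketch below classifies a request row by its first element. It is a Python model under stated assumptions: `classify` is a hypothetical helper, rows are modeled as Python lists with string tags, and treating anything that is not such a list as "not understood" is a guess at what the interpreter ignores.

```python
def classify(request):
    """Return the syscall kind for a request row, or None if it is ignored."""
    well_formed = (
        isinstance(request, list)
        and len(request) > 0
        and isinstance(request[0], str)
    )
    if not well_formed:
        return None  # assumption: values like this are simply ignored
    head = request[0]
    if head in ("%eval", "%cog", "%what"):
        return head
    return "CALL"  # everything else targets a piece of hardware


kinds = [
    classify(["%eval", 10, "add", 2, 3]),
    classify(["%cog", "%who"]),
    classify(["%what", []]),
    classify(["%rand", 0, "%byte", 8]),
    classify(42),
]
```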
A CALL to hardware looks like `[%rand 0 %byte 8]`. Breaking it down:

-   `%rand` is the name of the hardware targeted.

-   The `0` is the "durability" flag that tells the runtime that the
    effect should not be executed until the input that led to the
    effect has been committed to the log.

-   `[%byte 8]` is the argument list that is passed to the `%rand`
    device.

If the current plunder VM does not have a piece of hardware with the
given name, no attempt to handle the request will be made.

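The breakdown above, together with the rule for unknown hardware, can be modeled in a few lines of Python. `parse_call`, `handle_call`, and the `devices` table are hypothetical names. The fake `%rand` device returns zero bytes rather than random ones, and the durability flag is parsed but its commit-ordering effect is not modeled.

```python
def parse_call(row):
    """Split a CALL row into (hardware name, durability flag, arguments)."""
    name, durability, *args = row
    return name, durability, args


def handle_call(row, devices):
    """Dispatch a CALL to a device, or return None if no such hardware."""
    name, _durability, args = parse_call(row)
    device = devices.get(name)
    if device is None:
        return None  # unknown hardware: no attempt is made to handle it
    return device(args)


# Stand-in device table; this fake %rand returns zero bytes, not random ones.
devices = {"%rand": lambda args: bytes(args[1])}

parsed = parse_call(["%rand", 0, "%byte", 8])
answered = handle_call(["%rand", 0, "%byte", 8], devices)
ignored = handle_call(["%http", 0, "%get"], devices)
```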
That's why we need `%what` to synchronize what the cog thinks are the
callable pieces of hardware. A cog cannot detect when the plunder
interpreter has been restarted or replaced with a different one, so it
must be told which capabilities CALL currently provides. At first, a
cog issues a `[%what %[]]` request, receives a `%[%rand %http]` response,
and then holds open a `[%what %[%rand %http]]` request, which will only
receive a response if the interpreter changes.

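The `%what` handshake can be pictured as a comparison between the set the cog already knows and the set the interpreter has attached. The Python sketch below is one reading of that description, not the runtime's actual logic: `answer_what` is a hypothetical function, and "no response yet" is modeled as `None`.

```python
def answer_what(known, attached):
    """Answer a [%what known] request.

    Returns the attached hardware set if it differs from what the cog
    already knows, or None to leave the request open.
    """
    attached = frozenset(attached)
    if frozenset(known) == attached:
        return None
    return attached


interpreter_hardware = {"%rand", "%http"}

# 1. The cog starts out knowing nothing, so it is answered immediately.
first = answer_what([], interpreter_hardware)

# 2. It re-issues %what with that answer; nothing changed, so no response.
second = answer_what(first, interpreter_hardware)

# 3. The interpreter is replaced by one without %http: now it is answered.
third = answer_what(first, {"%rand"})
```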
`%eval` asks for plunder code to be evaluated asynchronously. The result
is that we can take advantage of parallelism, and that the main loop is
not slowed down when the Cog needs to perform an expensive computation.

-   The `10` in `[%eval 10 add 2 3]` is an upper bound on the number of
    seconds that an evaluation is allowed to run for. An evaluation that
    takes longer than that is canceled.

-   The `[add 2 3]` indicates that EVAL should evaluate the expression
    `(add 2 3)`.

-   `%eval` is special because the event log does not actually contain
    the result of an EVAL call. Instead, the event log simply records
    that the call succeeded, and the result is recalculated on replay.

-   This is important because it means that extremely large values can
    be returned by EVAL without bogging down the log.

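To make the logging rule for `%eval` concrete, here is a small Python model in which the log entry carries only a success marker and the value is recomputed during replay. Everything named here (`FUNCS`, `run_eval`, `log_entry`, `replay`, and the dictionary shape of a log entry) is an assumption for illustration, and the timeout is parsed but not enforced.

```python
import operator

# Hypothetical stand-in for the plunder functions an expression can name.
FUNCS = {"add": operator.add}


def run_eval(request):
    """Evaluate a row shaped like ["%eval", timeout, fn, *args]."""
    _tag, _timeout, fn, *args = request
    return FUNCS[fn](*args)


def log_entry(index, request):
    """Run the evaluation live; return (result, what is written to the log)."""
    result = run_eval(request)
    # Only the fact of success is logged, never the (possibly huge) result.
    return result, {"index": index, "eval": "ok"}


def replay(entry, requests):
    """On replay, recompute the result from the request the entry points at."""
    return run_eval(requests[entry["index"]])


requests = [["%eval", 10, "add", 2, 3]]
live_result, entry = log_entry(0, requests[0])
replayed_result = replay(entry, requests)
```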
Finally, there are the `%cog` requests. A user is likely to have multiple
processes that they wish to run, and having those processes communicate
over hardware CALLs would mean that each IPC message must be written into
the event log. So we have a few special calls for process management and
IPC between cogs. Like `%eval`, most `%cog` requests have special event
log representations: the log stores only a record that something
happened, and the result is recalculated on log replay.

(If cog A sends a message to cog B, all you need to do is record that B
processed the message from cog A at a given request index, instead of
serializing and storing the full noun sent in the event log.)

The `%cog` requests are:

-   `[%cog %spin fan] -> IO Pid`: Starts a cog and returns its cog id.

-   `[%cog %ask pid chan fan] -> IO (Maybe Fan)`: Sends the fan value to
    pid on a channel and returns once the exchange has completed.
    `%ask`/`%tell` operate on Word64 channels, which allows a cog to
    offer more than one port or "service". `%ask` makes a request of a
    different cog which has an open `%tell` request.

    Returns `0` on any error (remote cog doesn't exist, remote cog's
    `%tell` function crashed), or the Just value (`0-result`) on success.

-   `[%cog %tell chan fun] -> IO a`: Given a function with the type
    `> CogId > Any > [Any a]`, waits on the channel `chan` for a
    corresponding `%ask`. The runtime will atomically match one `%ask`
    with one `%tell`, and will run the tell function with the ask value.
    The output must be a row: the row's index-zero value will be sent
    back to the `%ask`, and the row's index-one value will be sent back
    to the `%tell`.

    Execution and response are atomic; you'll never have one without the
    other in the written event log. This operation is used to allow two
    different threads to act in concert. Each response map that contains
    an `%ask` or `%tell` will *only* contain an `%ask` or `%tell`: unlike
    all other responses, the runtime will not put as many responses as
    possible in the event which delivers an `%ask` or a `%tell` response.

    Any crash while evaluating `fun` with the arguments will count as
    crashing the telling cog.

-   `[%cog %stop pid] -> IO (Maybe CogState)`: If the cog does not exist,
    immediately returns None. Otherwise, stops and removes the cog from
    the set of cogs and returns the `CogState` value.

-   `[%cog %reap pid] -> IO (Maybe CogState)`: If the cog does not exist,
    immediately returns None. Otherwise, waits for a cog to enter an
    error state, removes the cog from the set of cogs and returns the
    `CogState`.

    (In the case where there's a `%reap` and a `%stop` open, the calling
    `%stop` takes precedence and receives the cog value, and the `%reap`
    receives None.)

-   `[%cog %wait pid] -> IO ()`: If the cog is not running (non-existent,
    finished, crashed or timed out), immediately returns 0. Otherwise,
    waits for the cog to no longer be in the running state and returns 0.

    Separate from `%reap` and `%stop`, cogs need a way to detect that
    other cogs do not exist or are no longer running, even when they
    aren't responsible for stopping or cleaning up after a crash.

-   `[%cog %who] -> IO Pid`: Tells the cog who it is. Any other way of
    implementing this would change the type of the cog function to take
    an extra `Pid ->`.

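The `%ask`/`%tell` pairing in the list above can be sketched as a single function that runs the tell function once and splits its output row. This Python model rests on several assumptions: `match_ask_tell` is a hypothetical name, `None` stands in for the `0` error result, the tuple `("just", value)` stands in for the Just value, and a Python exception stands in for a crash of the telling cog.

```python
def match_ask_tell(asker_pid, ask_value, tell_fun):
    """Match one %ask with one %tell.

    Returns (ask_response, tell_response, tell_crashed).
    """
    try:
        row = tell_fun(asker_pid, ask_value)
        to_ask, to_tell = row[0], row[1]
    except Exception:
        # The tell function crashed: the asker gets the error result and
        # the crash counts against the telling cog.
        return None, None, True
    return ("just", to_ask), to_tell, False


def doubler(pid, msg):
    """Example tell function: reply with double, remember who asked."""
    return [msg * 2, ("served", pid)]


def broken(pid, msg):
    """Example tell function that always crashes."""
    raise RuntimeError("boom")


ok = match_ask_tell(7, 21, doubler)
failed = match_ask_tell(7, 21, broken)
```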
The on-disk snapshot of a whole Machine is just the serialized noun value
of `(Tab Pid CogState)`, where Pid is a natural number and `CogState` is
a row matching one of the following patterns:

-   `[0 fan]`: represents a spinning cog which has requests and can
    process responses.

-   `[1 fan]`: represents a finished cog, a cog which shut down cleanly
    by having no requests, so it will never receive a response in the
    future.

-   `[2 (op : nat) (arg : fan) (final : fan)]`: represents a crashed cog,
    with `op` and `arg` being the values that caused the crash and
    `final` being the final value of the cog before the crashing event.

-   `[3 (duration : nat) (final : fan)]`: represents a cog which had a
    request time out.

These patterns are also what the `[%cog %stop]` and `[%cog %reap]`
requests return.

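The four `CogState` patterns above can be decoded with a small table lookup. The Python below is only a sketch: `describe_cog_state` and the status labels are hypothetical, rows are modeled as Python lists, and fans are treated as opaque values.

```python
def describe_cog_state(row):
    """Decode a CogState row into (status, named fields)."""
    tag, *fields = row
    shapes = {
        0: ("spinning", ["fan"]),
        1: ("finished", ["fan"]),
        2: ("crashed", ["op", "arg", "final"]),
        3: ("timed_out", ["duration", "final"]),
    }
    if tag not in shapes or len(fields) != len(shapes[tag][1]):
        raise ValueError(f"not a CogState row: {row!r}")
    status, names = shapes[tag]
    return status, dict(zip(names, fields))


spinning = describe_cog_state([0, "cog-fan"])
crashed = describe_cog_state([2, 1, "bad-arg", "last-fan"])
timed_out = describe_cog_state([3, 10, "last-fan"])
```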
<!---
Local Variables:
fill-column: 73
End:
-->