Machine
-------

In the plunder VM, a "Machine" is a set of persistent processes, which
are referred to as Cogs.

Cogs can survive restarts. Their inputs are written to an event log, and
their state is occasionally snapshotted to disk. On restart, we recover
the most recent state by loading the most recent snapshot and replaying
each input logged after it.

Cogs interact with the world by making system calls. The set of active
system calls made by a cog is part of its state. When the cog is
reloaded, all of its calls are resumed as well.

There is no hidden state in a machine. A machine can shut down and resume
without a visible effect. The formal state of a full Machine is the
plunder value `(Tab Nat Fan)`, which maps each cog's process id to its
formal state.

Cogs
----

A cog is a plunder function partially applied to a row of syscalls.

    ($cog [[%eval 0 add 2 3] [%rand 0 %byte 8]])

An event is given to a cog by calling it as a function. Each event is a
set of syscall responses `(Map Nat Any)`:

    | ($cog [[%eval 10 add 2 3] [%rand 0 %byte 8]])
    | %% =0 [5]
    | %% =1 x#0011223344556677

There are four types of syscalls: `%eval` requests, `%cog` requests,
`%what` requests to detect the attached hardware, and everything else,
which is treated as a CALL to hardware. The interpreter ignores any value
that it does not understand.

A CALL to hardware looks like `[%rand 0 %byte 8]`. Breaking it down:

- `%rand` is the name of the hardware targeted.

- The `0` is the "durability" flag that tells the ships that the
  effect should not be executed until the input that led to the
  effect has been committed to the log.

- `[%byte 8]` is the argument-list that is passed to the `%rand`
  device.

If the current plunder VM has no piece of hardware with the given name,
no attempt is made to handle the request.

That's why we need `%what` to synchronize what the cog thinks are the
callable pieces of hardware. A cog cannot otherwise detect that the
plunder interpreter has been restarted or replaced with a different one,
so it must be told which capabilities CALL currently provides. At first,
a cog issues a `[%what %[]]` request, receives a `%[%rand %http]`
response, and then holds open a `[%what %[%rand %http]]` request, which
will only change if the interpreter does.

`%eval` asks for plunder code to be evaluated asynchronously. The result
is that we can take advantage of parallelism, and that the main loop is
not slowed down when the Cog needs to perform an expensive computation.

- The `10` in `[%eval 10 add 2 3]` is an upper bound on the number of
  seconds that an evaluation is allowed to run for. An evaluation that
  takes longer than that is canceled.

- The `[add 2 3]` indicates that EVAL should evaluate the expression
  `(add 2 3)`.

- `%eval` is special because the event log does not actually contain
  the result of an EVAL call. Instead, the event log simply records
  that the evaluation succeeded, and the result is recalculated on
  replay.

- This is important because it means that extremely large values can
  be returned by EVAL without bogging down the log.

Finally, there are the `%cog` requests. A user is likely to have multiple
processes that they wish to run, and having those processes communicate
over hardware CALLs would mean that each IPC message must be written into
the event log. So we have a few special calls for process management and
IPC between cogs. Like `%eval`, most `%cog` requests have special event
log representations, so that the log stores only a record that something
happened and the result can be recalculated on log replay.
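
The "record that it happened, recompute on replay" idea can be sketched
for `%eval` as follows. This is a toy Python model, not the plunder
runtime: the names `run_eval`, `record_eval`, `replay_eval`, the
`FUNCTIONS` table, and the shape of the log entry are all assumptions
made for illustration, and the timeout field is ignored.

```python
# Toy model of logging and replaying an %eval response.
# Every name and the log-entry shape here is hypothetical.
import operator

# Hypothetical table of functions that an %eval request may name.
FUNCTIONS = {"add": operator.add, "mul": operator.mul}


def run_eval(request):
    """Evaluate a request shaped like ["%eval", timeout, fn_name, *args].

    The timeout is not enforced in this sketch.
    """
    _tag, _timeout, fn_name, *args = request
    return FUNCTIONS[fn_name](*args)


def record_eval(log, request_index, request):
    """Run the eval live, but log only that it succeeded, not the result."""
    result = run_eval(request)
    log.append(("eval-ok", request_index))
    return result


def replay_eval(entry, requests):
    """On replay, recompute the result from the cog's open request."""
    kind, request_index = entry
    if kind != "eval-ok":
        raise ValueError(f"not an eval entry: {entry!r}")
    return run_eval(requests[request_index])


requests = [["%eval", 10, "add", 2, 3]]
log = []
live = record_eval(log, 0, requests[0])
replayed = replay_eval(log[0], requests)
```

The log entry stays the same small size no matter how large the
evaluated value is, at the cost of redoing the computation on replay.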

(If cog A sends a message to cog B, all you need to do is record that B
processed the message from cog A at a given request index, instead of
serializing and storing the full noun sent in the event log.)

The `%cog` requests are:

- `[%cog %spin fan] -> IO Pid`: Starts a cog and returns its cog id.

- `[%cog %ask pid chan fan] -> IO (Maybe Fan)`: Sends the fan value to
  `pid` on channel `chan`, returning afterwards. `%ask`/`%tell` operate
  on Word64 channels, which allows a cog to offer more than one port or
  "service". `%ask` makes a request of a different cog which has an
  open `%tell` request.

  Returns `0` on any error (remote cog doesn't exist, remote cog's
  `%tell` function crashed), or the Just value (`0-result`) on success.

- `[%cog %tell chan fun] -> IO a`: Given a function with a type
  `> CogId > Any > [Any a]`, waits on the channel `chan` for a
  corresponding `%ask`. The runtime will atomically match one `%ask`
  with one `%tell`, and will run the tell function with the ask value.
  The output must be a row: the row's index-zero value will be sent
  back to the `%ask`, and the row's index-one value will be sent back
  to the `%tell`.

  Execution and response are atomic; you'll never have one without the
  other in the written event log. This operation is used to allow two
  different threads to act in concert.

  As part of this, each response map that contains an `%ask` or `%tell`
  will contain *only* an `%ask` or `%tell`. Unlike all other responses,
  the runtime will not put as many responses as possible in the event
  which delivers an `%ask` or a `%tell` response.

  Any crash while evaluating `fun` with the arguments will count as
  crashing the telling cog.

- `[%cog %stop pid] -> IO (Maybe CogState)`: If the cog does not exist,
  immediately returns None. Otherwise, stops and removes the cog from
  the set of cogs and returns the `CogState` value.

- `[%cog %reap pid] -> IO (Maybe CogState)`: If the cog does not exist,
  immediately returns None. Otherwise, waits for a cog to enter an
  error state, removes the cog from the set of cogs and returns the
  `CogState`.

  (In the case where there's a `%reap` and a `%stop` open, the calling
  `%stop` takes precedence and receives the cog value, and the `%reap`
  receives None.)

- `[%cog %wait pid] -> IO ()`: If the cog is not running (non-existent,
  finished, crashed or timed out), immediately returns 0. Otherwise,
  waits for the cog to no longer be in the running state and returns 0.

  Separate from `%reap` and `%stop`, cogs need a way to detect that
  other cogs do not exist even when they aren't responsible for
  stopping them or cleaning up after a crash.

- `[%cog %who] -> IO Pid`: Tells the cog who it is. Any other way of
  implementing this would change the type of the cog function to take
  an extra `Pid ->` argument.

The on-disk snapshot of a whole Machine is just the serialized noun
value of `(Tab Pid CogState)`, where Pid is a natural number and
`CogState` is a row matching one of the following patterns:

- `[0 fan]`: represents a spinning cog which has requests and can
  process responses.

- `[1 fan]`: represents a finished cog, a cog which shut down cleanly
  by having no requests, so it will never receive a response in the
  future.

- `[2 (op : nat) (arg : fan) (final : fan)]`: represents a crashed cog,
  with the `op` and `arg` being the values that caused the crash and
  `final` being the final value of the cog before the crashing event.

- `[3 (duration : nat) (final : fan)]`: represents a cog which had a
  request timeout.

These patterns are also what the `[%cog %stop]` and `[%cog %reap]`
requests return.
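
To make the four `CogState` patterns concrete, here is a small Python
sketch that decodes such rows. It is only a model of the shapes listed
above: rows are represented as Python lists, fans as placeholder
strings, and the names `describe_cog_state` and `running_pids` and the
returned dict layout are assumptions, not part of plunder.

```python
# Toy decoder for the CogState row patterns listed above.
# Function names and the returned dict shape are hypothetical.

def describe_cog_state(row):
    """Map a CogState row to a labelled dict; raise on an unknown shape."""
    tag = row[0] if row else None
    if tag == 0 and len(row) == 2:
        return {"status": "spinning", "fan": row[1]}
    if tag == 1 and len(row) == 2:
        return {"status": "finished", "fan": row[1]}
    if tag == 2 and len(row) == 4:
        return {"status": "crashed", "op": row[1],
                "arg": row[2], "final": row[3]}
    if tag == 3 and len(row) == 3:
        return {"status": "timed-out", "duration": row[1], "final": row[2]}
    raise ValueError(f"not a CogState row: {row!r}")


def running_pids(machine):
    """Given a snapshot-like dict of pid -> CogState row, list spinning cogs."""
    return sorted(pid for pid, row in machine.items()
                  if describe_cog_state(row)["status"] == "spinning")


# A snapshot-like table with one cog in each state (fans are placeholders).
machine = {
    1: [0, "cog-a"],
    2: [1, "cog-b"],
    3: [2, 7, "bad-arg", "cog-c"],
    4: [3, 10, "cog-d"],
}
```

With this table, `running_pids(machine)` picks out only pid 1, which is
the set of cogs a `%wait` on would not return immediately.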

<!---
Local Variables:
fill-column: 73
End:
-->