<\body>
<section|Introduction>

In this document we explain the purpose of the NARP protocol and provide a draft specification for it. NARP is a general-purpose networking protocol intended to be used in many layers of a new operating system and networking system.

We begin by remarking that a basic operation in all computing consists of naming objects and providing access to these named objects. Here are a few examples of naming in real use cases:

<\itemize>
<item>Naming of files on a local or remote file system
<item>Naming of devices in the virtual filesystem on Unix machines
<item>Naming of networked machines (with IP addresses and DNS records)
<item>Naming of internet resources over protocols such as HTTP, IMAP, IRC, specific web services, ...
</itemize>

We propose here a novel architecture whose purpose is to unify the naming that happens at all levels of the system, with two base concepts: objects and services.

<\itemize>
<item>Objects are resources that may implement different semantics: bidirectional communication (such as sockets); unidirectional communication (FIFO-like); file semantics; etc.
<item>Services are a way of naming objects, querying the interfaces they implement, and multiplexing communications with them.
</itemize>

We suggest that a NARP service may be provided on any bidirectional channel of communication supporting the (reliable) sending and receiving of messages. In addition, NARP objects may implement such a send/receive interface; therefore a NARP service can be channeled into an object. Such a construction of using a NARP object to access a NARP service is a fundamental operation that we call [...], or just [...].

<section|High-level overview>

The NARP protocol is a client/server protocol meant to include a variety of different operations that may or may not be implemented by a specific NARP server.
<subsection|The basic operations on services and objects>

A NARP service is basically any object that implements the following operations:

<\itemize>
<item>Stat: get information on a resource identified by name
<item>List: know the names of the resources presented by the service (possibly in a specific sub-path)
<item>Attach: get an object interface for accessing a resource, identified by name
</itemize>

A NARP object is basically any object that implements the following operations:

<\itemize>
<item>Send: send a message (an arbitrary byte string) to the object
<item>Receive: receive a message from the object (this may be done asynchronously with handler functions)
<item>Detach: delete the object connection
</itemize>

<subsection|The basics of the NARP protocol>

Given any interface with send/receive capabilities, considered in an asymmetric (client/server) configuration, the following client messages constitute the basics of the NARP protocol for providing a NARP service on the interface:

<\itemize>
<item>Hello: initialize a connection, check version information, ...
<item>Authenticate and appropriate response messages: use credentials (user/password or access token) to gain access to some resources provided by the server (the protocol is thus stateful)
<item>Stat, List and appropriate response messages: get information about the available resources
<item>Attach and appropriate response messages: give an identifier (a descriptor) to a resource in order to communicate with it
<item>Send: send a message to an attached resource, identified by its descriptor
<item>Detach: close a descriptor and detach from a resource
<item>Create, Delete, Link, Rename: request the creation or modification of a resource in the namespace
</itemize>

The server may also send a message at any moment, including:

<\itemize>
<item>a response to a query
<item>Receive: a notification of a message sent from the object to the client
<item>Detached: the connection to the object has been terminated by the object server
</itemize>

<subsection|Recursion>

If an object is a NARP server, the messages sent to it and received from it are messages of the NARP protocol. Otherwise, they are arbitrary.
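As an illustration only, the operations on services (Stat, List, Attach) and on objects (Send, Receive, Detach) can be written down as two abstract interfaces. The sketch below is in Python; the class names `NarpObject`, `NarpService`, `EchoObject` and `InMemoryService`, the method names, and the `"object"` interface label are all hypothetical and are not part of the specification.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional


class NarpObject(ABC):
    """Something that messages can be sent to and received from."""

    @abstractmethod
    def send(self, message: bytes) -> None: ...

    @abstractmethod
    def on_receive(self, handler: Callable[[bytes], None]) -> None: ...

    @abstractmethod
    def detach(self) -> None: ...


class NarpService(ABC):
    """Something that names objects and hands out connections to them."""

    @abstractmethod
    def stat(self, name: str) -> List[str]: ...

    @abstractmethod
    def list_names(self, path: str = "") -> List[str]: ...

    @abstractmethod
    def attach(self, name: str) -> NarpObject: ...


class EchoObject(NarpObject):
    """Toy object (assumption): every message sent comes straight back."""

    def __init__(self) -> None:
        self._handler: Optional[Callable[[bytes], None]] = None
        self._open = True

    def send(self, message: bytes) -> None:
        if not self._open:
            raise ConnectionError("object is detached")
        if self._handler is not None:
            self._handler(message)

    def on_receive(self, handler: Callable[[bytes], None]) -> None:
        self._handler = handler

    def detach(self) -> None:
        self._open = False


class InMemoryService(NarpService):
    """Toy service (assumption): a dictionary from names to object factories."""

    def __init__(self, factories: Dict[str, Callable[[], NarpObject]]) -> None:
        self._factories = factories

    def stat(self, name: str) -> List[str]:
        if name not in self._factories:
            raise KeyError(name)
        return ["object"]  # hypothetical interface label

    def list_names(self, path: str = "") -> List[str]:
        return sorted(n for n in self._factories if n.startswith(path))

    def attach(self, name: str) -> NarpObject:
        return self._factories[name]()
```

Because an object only exposes a send/receive pair, a `NarpService` implementation could itself be driven through a `NarpObject`, which is the recursive construction described in the text.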
<subsection|Reverse object>

Some NARP servers may support reverse object serving: the client creates an object on the server and handles all the requests arriving at this object (the initial NARP server therefore only serves as a relay between the new server and its clients<\footnote>Research is to be done on shortcutting mechanisms for specific situations where too many levels of recursion cause a performance issue.</footnote>). A client wishing to act as a reverse object server may use the following commands:

<\itemize>
<item>Serve: listen for attach requests on a servable (empty) object created in the server namespace (if authorized)
<item>[...] and [...]: accept (or reject) an attach request to the object
<item>Detach: close the connection between object and client (this is the same detach message as in standard communications)
<item>[...]: stop serving the object. Attached clients continue to be attached.
</itemize>

The server may in turn send the following messages concerning the server object:

<\itemize>
<item>Incoming: a client is willing to attach to the object. A descriptor is already associated to the connection to be established, but the server may reject it.
</itemize>

Once a client is attached to the object, a classical send/receive interface is provided. Typically, the protocol exchanged over the object is the NARP protocol, which enables the reverse server to provide its own namespace and other functionality.

<subsection|Specific object types and associated messages>

<subsubsection|Objects are sockets>

Sockets are the basis of the NARP protocol: attaching to an object opens a socket connection to the process serving the object, and when the connection is accepted, basic send/receive functionality is provided. See also the reverse object protocol described in section 2.4.
<subsubsection|File objects>

Small files may implement the following interface:

<\itemize>
<item>[...]: erase the whole file and replace it with the transmitted content
<item>[...]: retrieve the whole file content
</itemize>

Big files may implement the following interface:

<\itemize>
<item>[...]: write a portion of the file at a given offset
<item>[...]: read a portion of the file at a given offset
</itemize>

<subsubsection|User IO (terminals...)>

Virtual terminals can be seen as objects implementing simple send/receive semantics, where the data transmitted is unstructured (or structured according to a specific terminal data structure). More specific interfaces can be defined for advanced terminals and GUIs.

<subsubsection|Specific applications>

Specific applications may define custom messages. Examples include:

<\itemize>
<item>e-mail
<item>instant messaging
<item>collaborative editing of text-based documents
</itemize>

and many other applications yet to be invented.

<subsection|Big messages>

The message size in the NARP protocol is limited to 64 KB, and it is recommended not to exceed 4 KB plus header (4 KB is the size of a memory page on many machines). Therefore a possibility would be for the NARP protocol to include a way to transmit big messages by fragmenting them into small messages. Optional error correction may be included. This can be useful for example when using [...] or [...] on large files, or [...]s and [...]s of big file portions.

The receiving of a large fragmented message may have a specific implementation allowing the receiver to work with the partial data as soon as it starts arriving, without having to wait for the whole message to be transmitted and buffered. Research is yet to be done on this specific subject.

<subsection|Permissions>

For each attached client the server may keep track of associated permissions, and accept or reject requests according to those permissions. The client may use an authentication command to gain supplementary privileges on the server's resources. The client may request a token to delegate its privileges on a given object to another client. Advanced rights management functionality is to be discussed.

<subsection|Reliability concerns>

The NARP protocol relies on the fact that when a message is transmitted, the other end will receive it.
It is nevertheless recommended that NARP implementations support the repeating of messages if an expected acknowledgment has not arrived after a given delay.

<subsection|Example NARP servers>

<subsubsection|Virtual NARP server (i.e. NARP router)>

This server implements a namespace where any client may create an empty object and serve connections to it. Additionally, the server may implement the possibility to create virtual files, virtual directories, FIFO queues, etc.

This server may be connected to other virtual NARP servers in order to provide a global namespace accessible to all. Each virtual NARP server acts as an endpoint into the network and may have functionality for routing the communications to objects to the clients that serve them.

<subsubsection|NARP file server>

This server simply implements access to a filesystem: the listed resources are the files present in a served directory, each of which implements the filing protocol (served directly by the file server); the creation of files and directories may also be implemented.

<subsubsection|NARP terminal/GUI server>

Clients may create objects on the server; each of these objects corresponds to a GUI window. Two interfaces may be implemented: text IO (terminal) and graphical interaction. Advanced terminal interaction features may be implemented at the protocol level, such as auto-completion of commands or of text being edited.

Suggestion for a third kind of window: the data sent by the client is a description of the scene in a given markup language, and the server does the rendering. The client can also subscribe to events such as clicking on an item or entering text. This possibility is to be explored.
<subsubsection|NARP e-mail and newsgroup server>

Several features to be implemented:

<\itemize>
<item>user login and private user mailboxes
<item>bridge to standard SMTP/POP3/IMAP services
<item>private threads of conversation with access rights (the users don't each have a copy of the thread)
<item>synchronization between many servers
<item>public discussion forums
</itemize>

<subsubsection|NARP chat server>

<\itemize>
<item>user login and status notification
<item>online and offline private messaging
<item>public chat rooms, chat room logging independently of user being online or offline
<item>bridging and synchronization between many servers
</itemize>

<subsubsection|NARP applicative server>

TODO...

<section|Specifics of the NARP protocol>

<subsection|Protocol description format>

A protocol message is given in the following form:

[...]

The following element types apply:

<\itemize>
<item>[...], [...], [...]: 16-bit, 32-bit or 64-bit little-endian integers
<item>[...]: a string, prefixed by a 16-bit length header
<item>[...]: an array of [...]'s (where [...] is another element type), prefixed by a 16-bit length header
<item>* (for the last element): consider all the rest of the message as a byte string
</itemize>

<subsection|Basic message format>

The basic format of a message is:

[...]

We will abbreviate by ``header'' the first 32 bits (4 bytes) of the message. The list of message types is given in section 3.10.1.

Messages for communication with an attached resource will have the following format:

[...]

Many client messages awaiting a response include a message ID; this message ID is an arbitrary number generated by the client and used by the server when giving its response. The header then looks like this:

[...]

<subsection|Message list for core NARP protocol>

Client messages have an up arrow (<math|\<uparrow\>>) next to their name, while server messages have a down arrow (<math|\<downarrow\>>).

The core NARP protocol is meant to be small and fast (so that many layers can be encapsulated with minimal overhead); therefore no acknowledgment is sent for recursive send/receive messages. Other messages usually imply some kind of action or retrieval of information; therefore an acknowledgment or an error is usually sent as a response.

<paragraph|Hello <math|\<uparrow\>\<downarrow\>>>

<\indent>
[...]

When a NARP connection is established, the client is always the first to send a message.
The object may then respond either with a Hello message indicating that the requested interfaces can be provided, or with an Error message. The two common error causes are [...] and [...]. For interface numbers, see the table in section 3.10.3.
</indent>

<paragraph|Error <math|\<downarrow\>>>

<\indent>
Generic error response message for any operation.

[...]

Common error IDs are specified in section 3.10.2.
</indent>

<paragraph|Ack <math|\<downarrow\>>>

<\indent>
[...]

Generic acknowledgment message for commands that require it. An acknowledgment implies that the command has been successfully executed (otherwise an error message is sent).
</indent>

<paragraph|Stat <math|\<uparrow\>>>

<\indent>
[...]

The request ID is an ID decided by the client so that it can identify the answer.
</indent>

<paragraph|StatR <math|\<downarrow\>>>

Response to the Stat message.

<\indent>
[...]

Common interface numbers are to be found in section 3.10.3. If a Stat query on an object gives a certain list of interfaces, then when connecting to the object at least all these interfaces must be included in the server's Hello message as supported interfaces.

Note that some interface numbers correspond to actions that can be done on the object from the connection where the object exists (e.g. symbolic link, directory), and others correspond to actions that can be performed after attaching to the object (e.g. file, terminal, ...).
</indent>

<paragraph|List <math|\<uparrow\>>>

<\indent>
[...]
</indent>

<paragraph|ListR <math|\<downarrow\>>>

Response to the List message.

<\indent>
One message is passed for each entry in the requested range:

[...]

After the directory has finished being enumerated, a supplementary entry is given whose entry number is the last valid entry number plus one and whose entry name is empty. This supplementary entry is only given if its (fictitious) entry number is included in the range requested by the client.

Possible extension: combine List and Stat so that when the answer to List is given, information is also given on the object's implemented interfaces.
</indent>

<paragraph|Attach <math|\<uparrow\>>>

<\indent>
[...]
</indent>

<paragraph|Attached <math|\<downarrow\>>>

Response to the Attach command.

<\indent>
[...]

(the handle, i.e. the resource descriptor, is attributed by the server)
</indent>

<paragraph|Send <math|\<uparrow\>>>

<\indent>
[...]

This message does not expect a response.
</indent>
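The element types of the protocol description format (little-endian integers, strings and arrays prefixed by a 16-bit length header, and a trailing `*` byte string) lend themselves to a small encoder and decoder. The following Python sketch is only an illustration: the helper names are hypothetical, and it assumes that integers are unsigned, that the 16-bit length headers are themselves little-endian, and that an array's length header counts elements rather than bytes. None of these three points is stated in the specification.

```python
import struct


def put_u16(value: int) -> bytes:
    """16-bit little-endian integer (unsigned is an assumption)."""
    return struct.pack("<H", value)


def put_u32(value: int) -> bytes:
    """32-bit little-endian integer (unsigned is an assumption)."""
    return struct.pack("<I", value)


def put_string(data: bytes) -> bytes:
    """Byte string prefixed by a 16-bit length header."""
    return put_u16(len(data)) + data  # struct.error if longer than 65535


def put_array(items, put_item) -> bytes:
    """Array prefixed by a 16-bit element count (assumption: count, not bytes)."""
    return put_u16(len(items)) + b"".join(put_item(item) for item in items)


class Reader:
    """Sequential decoder over one received message."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def _take(self, count: int) -> bytes:
        if self.pos + count > len(self.data):
            raise ValueError("truncated message")
        chunk = self.data[self.pos:self.pos + count]
        self.pos += count
        return chunk

    def u16(self) -> int:
        return struct.unpack("<H", self._take(2))[0]

    def u32(self) -> int:
        return struct.unpack("<I", self._take(4))[0]

    def string(self) -> bytes:
        return self._take(self.u16())

    def array(self, get_item):
        return [get_item() for _ in range(self.u16())]

    def rest(self) -> bytes:
        """The `*` element type: everything up to the end of the message."""
        return self._take(len(self.data) - self.pos)


# Round trip of a made-up message body: a u32, a string, a u16 array, then `*`.
message = put_u32(7) + put_string(b"etc") + put_array([1, 2], put_u16) + b"tail"
reader = Reader(message)
fields = (reader.u32(), reader.string(), reader.array(reader.u16), reader.rest())
```

The field sequence in the round trip is invented for the example and does not correspond to any actual NARP message layout.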
<paragraph|Receive <math|\<downarrow\>>>

<\indent>
Spontaneous server message indicating that some data has been sent by an attached resource. This message does not expect a response.

[...]
</indent>

<paragraph|Detach <math|\<uparrow\>>>

<\indent>
[...]

This message does not expect a response.
</indent>

<paragraph|Detached <math|\<downarrow\>>>

<\indent>
Spontaneous server message indicating that the object has been detached.

[...]
</indent>

<paragraph|Create <math|\<uparrow\>>>

<\indent>
[...]

A create request is accompanied by a list of needed interfaces that direct the server to create the corresponding type of object (e.g. an empty object to be served, a directory, a file, ...).
</indent>

<paragraph|Created <math|\<downarrow\>>>

Response to the Create command.

<\indent>
[...]

Signals that the object has been created, and has the corresponding interfaces associated to it.
</indent>

<paragraph|Delete <math|\<uparrow\>>>

<\indent>
[...]

This message expects a standard response message.
</indent>

<paragraph|Link <math|\<uparrow\>>>

<\indent>
[...]

This message expects a standard response message.

Semantics of the link object:

<\itemize>
<item>attaching to or serving on this object corresponds to resolving the linked path and attaching to or serving on the linked object
<item>stating the link will stat the linked object and add the ``this is a symlink'' information as an implemented interface
<item>directory listings follow links
<item>deleting the link will not delete the original file but only the link
</itemize>
</indent>

<paragraph|ReadLink <math|\<uparrow\>>>

<\indent>
[...]
</indent>

<paragraph|ReadLinkR <math|\<downarrow\>>>

Response to the ReadLink message.

<\indent>
[...]

This will only return the first level of linking, i.e. the link data directly associated with the link object.
</indent>

<paragraph|Rename <math|\<uparrow\>>>

<\indent>
[...]

This message expects a standard response message.
</indent>

<paragraph|Serve <math|\<uparrow\>>>

<\indent>
[...]

This message is a request for the client to be a reverse server to an object. The response message to this message is an Attached message. The handle attributed to the served object is known as the server handle and is used in the Incoming and [...] messages. To stop serving an object, the client simply sends a Detach command on the server handle.
The semantics is that all connections that have been opened through the reverse-served object are preserved when the object stops being served, and an individual Detach message must be sent to each of them if we want to close them.

The [...] serves to answer [...] messages on the object while we are serving it.
</indent>

<paragraph|Incoming <math|\<downarrow\>>>

<\indent>
[...]

This message is sent by the server when another client wishes to attach to an object reverse-served by this client. The server handle is the one given as a response to the Serve message. The client handle is a handle associated to the connection. The reverse server may reject the connection by issuing a Detach command on the client handle, or may accept it using the Accept message given below.
</indent>

<paragraph|Accept <math|\<uparrow\>>>

<\indent>
[...]

Once a connection has been accepted, the reverse server may at any moment close it by sending a Detach command on the corresponding client handle.
</indent>

<paragraph|Unbox <math|\<uparrow\>>>

<\indent>
[...]

Consider the handle [...] as a NARP protocol service, and associate a handle in the outer layer to the handle of the inner layer with handle [...].

Example: in connection A we have a connection open on handle 5 which carries NARP data that we will call B, and in connection B we have another connection open on handle 7. Issuing an Unbox(id, 5, 7) request on A will lead to the server creating a handle (say 12) such that sending on it corresponds to sending a message to handle 7 on connection B, and such that all messages received on handle 5 (i.e. on connection B) are filtered: messages whose destination is handle 7 on connection B are removed from the stream and issued on handle 12 of connection A instead.

The answer to such a request is an Attached response giving a handle to the unboxed connection.

Systematically unboxing open connections may in some cases allow the network infrastructure to simplify the interconnections. In other cases it may result in useless overhead on the server side: in such a case the server may refuse an unbox request.
</indent>
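The filtering performed by an unboxed handle can be illustrated with the numbers of the example above (outer handle 5, inner handle 7, new handle 12). This Python sketch is a simplification under stated assumptions: the `Unboxer` class and its method names are hypothetical, and the inner connection is assumed to frame each message as a 16-bit little-endian handle followed by the payload, which stands in for the real NARP message layout.

```python
import struct
from typing import Tuple


def frame(handle: int, payload: bytes) -> bytes:
    """Hypothetical inner framing: 16-bit little-endian handle, then payload."""
    return struct.pack("<H", handle) + payload


def unframe(data: bytes) -> Tuple[int, bytes]:
    (handle,) = struct.unpack_from("<H", data)
    return handle, data[2:]


class Unboxer:
    """Server-side state for one accepted Unbox(id, outer, inner) request."""

    def __init__(self, outer_handle: int, inner_handle: int, new_handle: int) -> None:
        self.outer_handle = outer_handle
        self.inner_handle = inner_handle
        self.new_handle = new_handle

    def on_object_message(self, data: bytes) -> Tuple[int, bytes]:
        """Route a message coming out of the object behind the outer handle.

        Returns the (handle, payload) pair to deliver to the client on the
        outer connection.
        """
        handle, payload = unframe(data)
        if handle == self.inner_handle:
            # Removed from the inner stream, issued on the new outer handle.
            return self.new_handle, payload
        # Everything else stays on the outer handle, untouched.
        return self.outer_handle, data

    def on_client_send(self, handle: int, payload: bytes) -> bytes:
        """Return the bytes to forward to the object behind the outer handle."""
        if handle == self.new_handle:
            return frame(self.inner_handle, payload)
        if handle == self.outer_handle:
            return payload
        raise KeyError(handle)


unboxer = Unboxer(outer_handle=5, inner_handle=7, new_handle=12)
```

With this state, a message for inner handle 7 surfaces on handle 12, while traffic for any other inner handle keeps flowing on handle 5 as before.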
<paragraph|Plug <math|\<uparrow\>>>

<\indent>
[...]

Ask the server to redirect all messages received on handle A to handle B and all messages received on handle B to handle A. The messages received on either handle are not sent to the client anymore. The answer messages are standard Ack / Error messages.
</indent>

<paragraph|Unplug <math|\<uparrow\>>>

<\indent>
[...]

Undoes a plugging.
</indent>

<subsection|Big message protocol>

To be defined. Is it really useful? What role exactly does it have? Can it implement repetition in the case where the message hasn't been acknowledged? ...

<subsection|Authentication and rights management commands>

<paragraph|Authenticate <math|\<uparrow\>>>

<\indent>
[...]

Used to gain access using credentials (user/password, token, ...). Response messages are a standard Ack on success or an Error on failure. Authentication methods include:

<\itemize>
<item>1: user + password
<item>2: token
</itemize>
</indent>

<paragraph|NewToken <math|\<uparrow\>>>

<\indent>
[...]

Requests the server to create an authentication token for accessing a given object with the privileges of the connected client. Once the token has been returned, it may be transmitted to another client, which can then use it to gain the same access to the object.
</indent>

<paragraph|NewTokenR <math|\<downarrow\>>>

Response to the NewToken message.

<\indent>
[...]
</indent>

TODO: request account creation, manage user groups and ACLs, ...

<\itemize>
<item>file protocol
<item>system protocols (see section on OS design using NARP)
<item>UI protocols (terminal, GUI)
<item>communication protocols (mail, IM)
</itemize>

<subsection|Table of IDs>

The tables presented in this section give the numbers associated to the message types. These tables are the reference on the subject; any information found elsewhere that differs from what is found here is wrong. This is for protocol version 1.
<subsubsection|Message types>

<paragraph|Base protocol>

<\indent>
[...]
</indent>

<paragraph|Authentication & privileges>

<\indent>
[...]
</indent>

<subsubsection|Error messages>

<\indent>
[...]
</indent>

<subsubsection|Object interfaces>

<\indent>
[...]
</indent>

<paragraph|Servable>

This interface specifies that the object is currently an empty object waiting for someone to issue a Serve command on it, providing it with an implementation of some interfaces.

<paragraph|non-NARP inside>

This interface indicates that once attached to the object, the messages sent to it and received from it are not supposed to be in NARP format but in any arbitrary format. If this interface is not specified, then it is expected that the messages transmitted will follow the general NARP protocol (message format, standard hello/ack/error messages).

<paragraph|NARP service>

This interface indicates that once attached to the object, one can have access to a new NARP namespace where at least the following operations are supported: [...], [...], [...], [...], [...]. Additional messages may or may not be supported.

<section|Architecture of a NARP implementation in OCaml or Haskell>

An asynchronous implementation can be easily programmed in functional languages such as OCaml or Haskell, using closures as continuations for [...].

TODO

<section|Using NARP to design an Operating System>

When designing the NARP protocol, we had in mind that it would be possible to use it in a new operating system design at many levels: access to devices, process management, memory management, filesystems, IPC, GUI, ...

Kernel helpers could be developed so that part of the NARP multiplexing and demultiplexing takes place in kernel land, before messages are passed to userspace. For instance, this would allow the simplification of useless mux-demux chains taking place on the same machine. The mux-demux helper can be implemented via the [...] protocol message, handled at the level of the root stream of NARP communication with the kernel.
Another possible helper would be to map a virtual memory region to a NARP resource implementing a standard filing protocol, much like memory-mapped files in standard OSes (only this would work with arbitrary resources).

In this section we will develop a concrete proposal for a NARP-based operating system.

The basic primitive of the system being message-passing, the system looks a lot like a micro-kernel. However, the message format has complex semantics and the communication layer is not really ``simple''. Furthermore, the system has device drivers, file systems and networking running as kernel-mode processes, making the kernel more monolithic (but still having a micro-kernel spirit). It should be easy to make any user-mode process run as a kernel-mode process instead, for the sake of performance (e.g. graphical server & compositor).

The kernel land is divided into three major parts, with a strict dependency order:

<\itemize>
<item>Level 0: System resource management: physical memory, virtual memory, hardware interaction (IRQ, v86), debug output
<item>Level 1: Scheduler, IPC & NARP core server: builds on top of level 0, adds support for processes and communication between them restricted to NARP protocol data.
<item>Level 2: System processes: hardware, file systems, network, ... (may access level 0 and level 1 features)
</itemize>

User processes are restricted to syscalls that call level 1 primitives.

Here are a few basic principles for the design of these three levels:

<\itemize>
<item>Level 2 processes may not communicate directly nor share memory: they must go through level 0 and level 1 primitives to achieve such a goal. Each level 2 process has a separate heap, which is completely freed when the process dies. Level 2 processes do not use separate virtual memory spaces: since the kernel memory space is mapped in all page directories, a level 2 process may run with any page directory. Benefits: critical system parts are restricted to level 0 and level 1.
Level 2 components may leak or crash with fewer consequences.
<item>All synchronization & locking is handled by level 1, except for level 0, which must implement its own locking mechanisms (since it cannot rely on level 1). Benefits: no complex synchronization in most of the code (which is either level 2 or userland), only simple message passing and waiting for events to happen.
<item>No concept of ``threads'': system processes are actually kernel threads, but we call them processes since they use separate parts of memory. Userland processes cannot spawn multiple threads of execution either: they must fork and communicate through NARP if they want to do so (e.g. launching an expensive communication in the background). (Since fork is a complicated system call, and features such as copy-on-write depend on processes using different page directories, the fork system call is accessible only to userland processes: level 2 processes may not fork, but only create new processes.)
</itemize>

Level 1 also has a memory heap; it is used with [...]. Level 2 processes use standard [...], which are modified to act on the heap of the current process.

Each process (system or user) has a mailbox, i.e. a queue of incoming NARP messages waiting to be transferred. The mailbox has a maximum (buffer size), and a [...] call may fail with a [...] error. This is the only possible failure for a [...] call.

System processes (level 2) spend most of their time in [...]; they may be woken up either by receiving a NARP message or by a hardware event. Therefore the [...] function that composes the main loop may return either [...] or [...]. If the reason is [...], the process is free not to read the message immediately.

On the other hand, user processes can wait for only one thing: receiving a NARP message. Each user process has a [...] in its memory space, and the [...] function just copies the first message of the mailbox into this zone (overwriting whatever was there before) and returns control to the process (returning the length of the message).
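The mailbox behaviour described above (a bounded queue whose only failure is being full, and a user-side wait that copies the first message into a fixed zone and returns its length) can be sketched as follows. This is a Python model for illustration, not kernel code: the names `Mailbox`, `MailboxFull`, `post` and `wait_into` are hypothetical, and it assumes the buffer size is counted in bytes of queued messages.

```python
from collections import deque


class MailboxFull(Exception):
    """The only failure a send can report in this model."""


class Mailbox:
    """Bounded queue of incoming messages for one process."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes  # assumption: the limit is in bytes
        self.used = 0
        self.queue = deque()

    def post(self, message: bytes) -> None:
        """Enqueue a message, or fail if the buffer would overflow."""
        if self.used + len(message) > self.max_bytes:
            raise MailboxFull()
        self.queue.append(message)
        self.used += len(message)

    def wait_into(self, zone: bytearray) -> int:
        """Copy the first message into `zone` and return its length.

        A real kernel would block the process while the queue is empty;
        this model raises instead.
        """
        if not self.queue:
            raise LookupError("no message waiting")
        if len(self.queue[0]) > len(zone):
            raise ValueError("message does not fit in the receive zone")
        message = self.queue.popleft()
        self.used -= len(message)
        zone[:len(message)] = message  # overwrites whatever was there before
        return len(message)
```

Keeping the receive zone at a fixed place in the process's memory means the wait call needs no allocation, at the cost of the process having to consume or copy each message before waiting again.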
Handling of IRQs: some hardware requires action as soon as the interrupt is fired, therefore a specific IRQ handler may be used. Such a handler must do as little as possible and, when it is done, signal level 1 that an IRQ has happened (it may add specific data to the ``IRQ happened'' message). Level 1 adds a message to the queue of the recipient process (if there is one) and returns immediately: the IRQ handler must leave as soon as possible. An IRQ is handled on whatever stack is currently used, and the IF flag is constantly off while the IRQ handler is running. The timer IRQ is the only one that behaves differently, since it has to trigger a task switch.

<\enumerate>
<item>Develop level 0 completely and with the cleanest possible design
<item>Develop level 1 with only basic functionality
<item>Develop some basic applications in level 2: display, keyboard, mini kernel shell, mini file system, ...
<item>Improve level 1 with more complex features; try to quickly attain a complete level 1
<item>Work on everything else
</enumerate>
</body>

<\auxiliary>
<\collection>
<\associate|toc>
1 Introduction
2 High-level overview
2.1 The basic operations on services and objects
2.2 The basics of the NARP protocol
2.3 Recursion
2.4 Reverse object
2.5 Specific object types and associated messages
2.5.1 Objects are sockets
2.5.2 File objects
2.5.3 User IO (terminals...)
2.5.4 Specific applications
2.6 Big messages
2.7 Permissions
2.8 Reliability concerns
2.9 Example NARP servers
2.9.1 Virtual NARP server (i.e. NARP router)
2.9.2 NARP file server
2.9.3 NARP terminal/GUI server
2.9.4 NARP e-mail and newsgroup server
2.9.5 NARP chat server
2.9.6 NARP applicative server
3 Specifics of the NARP protocol
3.1 Protocol description format
3.2 Basic message format
3.3 Message list for core NARP protocol
    Hello
    Error
    Ack
    Stat
    StatR
    List
    ListR
    Attach
    Attached
    Send
    Receive
    Detach
    Detached
    Create
    Created
    Delete
    Link
    ReadLink
    ReadLinkR
    Rename
    Serve
    Incoming
    Accept
    Unbox
    Plug
    Unplug
3.4 Big message protocol
3.5 Authentication and rights management commands
    Authenticate
    NewToken
    NewTokenR
3.6 File protocol
3.7 UI protocols
3.7.1 Terminal protocol
3.7.2 Graphical user interface protocol
3.8 Communication protocols
3.8.1 Email and newsgroups protocol
3.8.2 Instant messaging protocol
3.9 Other protocols
3.10 Table of IDs
3.10.1 Message types
    Base protocol
    Authentication & privileges
3.10.2 Error messages
3.10.3 Object interfaces
    Servable
    non-NARP inside
    NARP service
4 Architecture of a NARP implementation in OCaml or Haskell
5 Using NARP to design an Operating System
</associate>
</collection>
</auxiliary>