summaryrefslogblamecommitdiff
path: root/doc/rapport_final.lytex
blob: b730ac05beb64eeae1df8391ffb31061602963e0 (plain) (tree)
1
2
3
4
5
6
7
8
9








                                      

                     



                                
                                     




                
                                                                




                                                                 
                             
                                                                                     
                                                                      



















                                                                                                          
                            
 

                                                                                             


                                                                                   
                                                    
 
                                                                                   


                                                                                   
                                  
 
                                                                                                   





                                                                                        
                                                                                         


                                                                                          
                                                                                      






                                                                                           
                                                                                     






























                                                                                  
                                      











                                                                   
                                             






                             
                                                                                                    

















































                                                                                                                     
                                                                
 
                              




















































                                                                                                                                         

























































































































                                                                                                     

                    
                                                                                   


                                                                                           
                                                                                        

































































                                                                                                    
                                                                                    










































































                                                                                                             
                        






























                                                                                              






                                                                                                
                                                                             



                                                                                              

                                  



                                                                                             
                                                                                             




                                                                                                  
              
\documentclass[11pt, a4paper]{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[margin=1.0in]{geometry}
\usepackage[british]{babel}
\usepackage{indentfirst}
\usepackage{array,booktabs,longtable}
\usepackage{multirow}
\usepackage{listings}

\usepackage{comment}

\newcommand{\prog}[1]{{\tt#1}}
\newcommand{\underscore}{$\_\,$}
\newcommand{\vivace}{\textsc{Vivace}}

\begin{document}



\title{Conception and realization of the \vivace{}  architecture
             \\ \normalsize{\textsc{Projet de Système digital}}}
\author{A. Auvolat \and É. Enguehard \and J. Laurent}
\maketitle


The \vivace{} architecture is
a minimalistic 16 bits RISC microprocessor architecture, largely inspired by the MIPS
microprocessor. The principal characteristics of the architecture are:

\begin{itemize}
\item \textit{8 general-purpose registers}, which can hold 16 bits integers: \prog{Z, A, B, C, D, E, F, G}
\item \textit{16 bit memory addressing}, enabling the CPU to use up to $64kb$ of memory.
\end{itemize}

In order to implement and run the architecture, the following programs have been written:

\begin{itemize}
\item \textit{Netlist  simulator} and \textit{netlist optimizer}
\item An \textit{OCaml library} for generating netlists from Caml code
\item The code for the \textit{CPU implementation} (written in Caml)
\item A \textit{monitor} which is used to interact dynamically with the netlist simulator
\item An \textit{assembler} which can be used to produce the ROM files run by the CPU.
\end{itemize}

The reader keen on learning valuable and useful information should skip the next
section and jump to section \ref{sec:useful}. He who does not strive for usefulness
may proceed.

\section{The name \vivace{}}

\vivace{} is a recursive acronym standing for “\emph{Virtually Infallible \vivace Automated
Computing Environment}”. We do not think anyone would dispute that \vivace{} is an
automated computing environment, and shall thus discuss the rest of the allegations
that are made in this name.

\subsection{Why is \vivace{} virtually infallible ?}

\vivace{} bears a great many similarities to the pope, in that, among other things,
it is virtually infallible the sole and only reason that we, its creators, are able
to claim as loudly as needs be that it is so.

\subsection{Why “\vivace{}” ?}

“\vivace{}”, as anyone knows, is an italian word meaning, surprisingly enough, “vivacious”.
Outside of its homeland, the word is mostly known for its place of honour in the musical
glossary. Its inscription at the beginning of a musical piece means the composer
of said piece intends it to be played in a lively tone, yet the interpret should not
strive for fastness and virtuosity, as he should if the inscription were “Allegro”
or “Presto”.

Its name, thus, suits perfectly \vivace{}; it is certainly as fast as a software-emulated
microprocessor can be, yet, as it is the sad fate of any software-emulated microprocessor,
it is extremely slow.

One may ask: what does \vivace{} has to do with music? In fact, the \vivace{} CPU has 
eight registers, seven of which are referred to with the letters \prog{A} through \prog{G}.
In English musical notation, those letters happen to be the names of the seven notes
of the usual scale.\footnote{In German musical notation, the letters \prog{A}
through \prog{G} are the names of the same seven notes, but in F major instead of C
major. Thus the letter \prog{B} represents an English B-flat, while an English B
is represented by the letter H.}

This relation between \vivace{} and music can be used to translate \vivace{} programs
into musical pieces, which, while extremely interesting from an intellectual point
of view, is remarkably useless. Consider, for instance, the following program,
that uses only elementary instructions (see
section \ref{sec:assembly} for a description of the VIVACE assembly language):

\begin{verbatim}
.text
    # reads a nonnegative rational number and outputs its integer part and
    # whether it is an integer or not
    liuz A 0x41 # input address
    lbr B 0(A) # numerator
    lra E 2
    lbr C 0(A)
    jer E C Z # denominator must be non-zero
    divu D B C
    se G E Z
    mulu F D C
    lil A 0x02 # output address
    sb F 0(A)
    sb G 0(A)
\end{verbatim}

Mapping cycles to notes' length and write access to registers to the pitch, we get
the following piece, for the input \prog{31 10}:

\lilypond[fragment,notime,quote]{
\set Timing.defaultBarType = ""
a'4 b'4. e'4 c'4. s \appoggiatura e'8 d'\breve g'4 f'2 a'4 s2}

As should be expected, it sounds awful.

\section{How to run the \vivace{} cpu}
\label{sec:useful}
\subsection{Preparation}

All the tools described in the introduction must first be compiled:

\begin{verbatim}
    $ cd csim; make; cd .. 
    $ cd sched; make; cd .. 
    $ cd monitor; make; cd ..
    $ cd asm; make; cd ..
\end{verbatim}

To run the \vivace{} CPU, type the following:

\begin{verbatim}
    $ cd cpu; make
\end{verbatim}

\subsection{Monitor commands}

You are now running the \vivace{} CPU. The monitor accepts a few commands to control the simulation.
First, you must configure the monitor to communicate with the CPU. Type:

\begin{verbatim}
    t 0
    s 1 19 18
    d7 20 21 22 23 24 25 26 27
\end{verbatim}

The first command sets up the tick input (a tick is sent once every second on this input by the monitor). The
second command sets up the serial input/output. The third command sets up the 7-segment display (8 digits displayed).
Now, use the following commands to control the simulation:

\begin{itemize}
    \item \prog{a} run the simulation at full speed
    \item \prog{m} run the simulation step by step (enter an empty command to run a step)
    \item \prog{f <freq>} run the simulation at fixed frequency (frequency is dynamically ajusted so
            this is not very accurate)
    \item \prog{q} exit simulation
\end{itemize}

The CPU recieves commands on the serial input. To send a command to the CPU, use the following syntax:

\begin{verbatim}
    :<cpu_command>
\end{verbatim}

For instance:

\begin{verbatim}
    :Y2014
\end{verbatim}

These commands are essentially used to set one of the six variables \prog{YMDhms} ; the syntax is similar to the
example command given above. An empty CPU command tells the CPU to just tell us what time and what day it is.


\section{Program details}
\subsection{Generating netlists from Caml code}

We have developped a library that enables us to easily generate netlists from Caml code. The Caml code we write
has the same abstraction level that MiniJazz has, but it is more comfortable to write circuits like this than
with MiniJazz code.

The library functions are defined in \prog{cpu/netlist\_gen.mli}. Basically, we have created functions that build
the graph of logical operations. The abstract type \prog{t} is actually a closure to a function that adds the
required equation to a netlist program being built, therefore the generation of a netlist consists in two steps:
the generation of a closure graph that describes the graph of logical operations, and the execution of these
closures on a program which, at the beginning, has only the circuit inputs. The equations are progressively
added to the program when the closures are called.

The \vivace{} CPU has been entirely realized using this library.

\subsection{The \vivace{} CPU}

\subsubsection{Control structure}

The CPU is able to execute instructions that need several cycles to run. The two first cycles of an instruction's
execution are used to load that instruction (16 bits have to be read, ie two bytes). Most instructions finish
their execution on the second cycle, but some executions need more cycles to run:

\begin{itemize}
    \item Load and store instructions need one or two extra cycles
    \item The multiplication operation needs as many cycles as the position
        of the most-significant non-null bit in the second operand.
    \item The division always runs on 16 cycles.
\end{itemize}

The execution of instructions on several cycles is implemented using a ``control bit'' that cycles through
several steps: load instruction, various steps of instruction execution. A few of these step control bits
appear in the simulator, as CPU outputs:

\begin{itemize}
    \item \prog{read\_ilow}, \prog{read\_ihi} CPU is reading low byte/high byte of the instruction
    \item \prog{ex\_instr} CPU begins execution of the instruction
    \item \prog{ex\_finish} CPU finishes execution of the instruction (modified registers may only appear
        in the monitor at the next step)
\end{itemize}


\subsubsection{ROM, RAM and MMIO}

The CPU has uniform acces to a 64kb address space, which contains the ROM (\prog{0x0000-0x3FFF}), MMIO (\prog{0x4000-0x7FFF})
and the RAM (\prog{0x8000-0xFFFF}).
The \prog{cpu\_ram} (\prog{cpu.ml}) subcircuit is basically a bunch of multiplexers that redirect reads and writes to the correct places.

The serial input/output is implemented using one input and two outputs :

\begin{itemize}
    \item Input \prog{ser\_in} (8 bits) : when this input is non-null, the character entered is buffered by
        the CPU. This buffer can be read by reading MMIO byte at address \prog{0x4100}. The buffer is reset to zero
        on read.
    \item Output \prog{ser\_in\_busy} (1 bit) : signals when the input buffer is nonzero (ie a character is
        pending, waiting for the CPU to read and handle it).
    \item Output \prog{ser\_out} (8 bits) : when non-null, the CPU is sending a character to the serial output.
        This output can be written by writing MMIO byte at address \prog{0x4102}.
\end{itemize}

The clock is also handled by MMIO : the CPU recieves a tick every second on input \prog{tick}. When a tick is
recieved, the tick buffer is incremented by one. This tick buffer can be read by reading MMIO word at address
\prog{0x4000}. When the word is read, the buffer is reset to zero.

The 7-segment display is also handled by MMIO : the 8 digits can be modified by writing a byte to MMIO addresses
\prog{0x4200} to \prog{4207}.

\subsubsection{The ALU}

The \emph{Arithmetic and Logic Unit} implements the following features :




\begin{itemize}

    \item A \textbf{parallel addition circuit} consisting in a linear chain of
    full--adders. The same circuit is used for doing substractions. There is no
    distinction between \textit{signed} and \textit{unsigned} addition.

    \item A \textbf{serial unsigned multiplication} circuit that needs as many cycles as the position
            of the most-significant non-null bit in the second operand.

    \item A \textbf{serial unsigned division circuit} that computes one digit of the
    quotient and one digit of the remainder per cycle. It relies on the long
    division algorithm every child has learnt in school. The algorithm and the
    source code are given in example in Figure~\ref{divalg} and
    Figure~\ref{divcode}.

\end{itemize}


We first tried to keep all the instructions working on one cycle. This is
feasible concerning the multiplication, by using $16$ addition circuits working
in parallel. However, the generated overhead in the netlist is of the same order
of magnitude as the total size of the CPU without this feature. Moreover, it
appeared it was neither natural nor efficient to write a parallelized version of
the division circuit. Therefore, we changed the ALU in order to allow a few
instructions executing on several cycles, by adding two bus
\prog{start\underscore{}signal} and \prog{work\underscore{}remains}. See
Figure~\ref{divcode} for an example.


Arithmetic overflows are not handled, and the behaviour of the ALU is
unspecified in the case of a division by $0$. Signed multiplication and division
are not yet implemented, but it would be easy to use some multiplexers and
negation circuits to reuse the unsigned version of these circuits, computing the
sign of the result independently. 


In regard to the simulator's implementation, there's no need to optimize the
depth of the different circuits but the total number of gates. However, some
functions that are used to generate circuits working on \emph{Redudant Binary
Representation} were left in the source code.


\begin{figure}[p]

\textsc{Long-div} : computes the remainder \prog{R} and the quotient \prog{Q} of the
	division of \prog{N} by \prog{D}:
	\medskip

\begin {lstlisting}[basicstyle=\ttfamily, frame=lines] 
Q := 0                    
R := 0                              
for i = n-1 to 0 do
  R := R << 1
  R(0) := N(i)
  if R >= D then
    R = R - D               
    Q(i) := 1
  end
end  

\end{lstlisting}

\caption{The binary long division algorithm}
\label{divalg}

\end{figure}

\begin{figure}[p]
\lstset{language=ML}
\begin{footnotesize}

\begin {lstlisting}[basicstyle=\ttfamily, frame=lines] 
let rec ndivu n a b start_signal = 

  let next_busy, set_next_busy = loop 1 in
  let busy = start_signal ^| (reg 1 next_busy) in

  let dd, set_dd = loop n in
  let q,  set_q  = loop n in
  let r,  set_r  = loop n in
  let c,  set_c  = loop n in  
  
  let c = set_c (
    mux start_signal 
      (shiftl1 n (reg n c)) 
      ((const "0") ++ (rep (n-1) (const "1"))) ) in

  let q  = mux start_signal (reg n q) (zeroes n) in
  let r  = mux start_signal (reg n r) (zeroes n) in
  let dd = set_dd (mux start_signal (shiftl1 n (reg n dd)) a) in
  let r = (dd ** (n-1)) ++ (r % (0, n-2)) in

  let rq = mux (ule_n n b r)
    (r               ++ ((const "0") ++ (q % (0, n-2))))
    ((nsubber n r b) ++ ((const "1") ++ (q % (0, n-2)))) in

  let r = set_r (rq % (0, n-1)) in
  let q = set_q (rq % (n, 2*n-1)) in

  let work_remains = nonnull n c in

  let finished = 
    set_next_busy (busy ^& work_remains) ^. 
      (not work_remains) ^& busy in

  dd ^. c ^.
    q, r, finished
\end{lstlisting}
\end{footnotesize}
\caption{The unsigned division circuit}
\label{divcode}
\end{figure}




\subsection{The \vivace{} assembly}
\label{sec:assembly}

The \vivace{} assembly language is mostly inspired from the MIPS assembly language.
An assembly program is made of a \prog{.text} and an optional \prog{.data} segment, both of
which may contain labels that behave exactly like in MIPS, except that they may not
begin with a capital letter. End-of-line characters are
used as delimiters. Comments begin with \prog{\#} and end with an end-of-line character.

\subsubsection{The \prog{.data} segment}

Due to limitations of the simulator that we did not consider were worth the trouble to remove,
the \prog{.data} segment only contains uninitialized data. Its main use is to declare labels for use
in the \prog{.text} segment. Instead of formally describing the straightforward syntax of the
\prog{.data} segment, we will provide a simple example:

\begin{verbatim}
.data
label1:
    word 3 # three 16-bit words
label2: # label2 is label1 + 6
    byte 1 # one byte
\end{verbatim}

\subsubsection{The \prog{.text} segment}

The \prog{.text} segment contains both instructions and read-only data. The programmer
has to ensure that the program counter never reaches the data. Read-only data is declared
in the following manner:

\begin{verbatim}
    # here be instructions
    # they'd better end with a jump
data:
    word 0b101010 42 '*' # three words with the same value
    word -1 # 0xFFFF
    byte -1 255 # 0xFF 0xFF
string1:
    ascii "hello world!\n" # a trailing null character is automatically added
string2:
    ascii "well hello to you good sir!"
strings_addr:
    word string1 string2 # this will make things easier
\end{verbatim}

Instructions have between 0 and 3 operands, which are to be separated by spaces.
Depending on each instruction, operands may be integers, labels or registers.
Registers are referenced either by their name (\prog{Z}, \prog{A} etc.) or by
their number (\prog{\$0}, \prog{\$1} and so on). Here is a list of possible
instruction formats:

\begin{verbatim}
# format R3:
    add A B C
    add A B -5 # signed, 5 bits
# format R2:
    move A B
# format I:
    incri A 2 # signed byte
# format J:
    j -5 # signed, 11 bits
    j label
# format R:
    jr RA # RA is G
# format MEM:
    lw A 2(D) # signed, 5 bits
    lw A addr_label
# format H:
    hlt
\end{verbatim}

The first register is usually the destination register. Like in MIPS, the \prog{sw}
and \prog{sb} instructions are exceptions to this rule.

Here is a list of all instructions supported by the \vivace{} assembly language. The
column specifies wether the instruction is actually supported by the processor or
it is translated to elementary instructions. Elementary instructions may still
be translated to several instructions, for example if the operands are too big,
or if integer operands are used in a command that only accepts registers. In that
case, the \prog{E} register is used to store the operands.\footnote{While the assembler will process
all listed instructions, some of them may not have been implemented in the CPU itself.}

\begin{center}
\begin{longtable}{lccl}
\toprule
Name & Elementary? & Format & Comments \\ \midrule
\prog{add} & Yes & R3 & \\ \cmidrule{1-4}
\prog{sub} & Yes & R3 & \\ \cmidrule{1-4}
\prog{mul} & Yes & R3 & \\ \cmidrule{1-4}
\prog{div} & Yes & R3 & \\ \cmidrule{1-4}
\prog{addu} & Yes & R3 & Unsigned version of \prog{add} \\ \cmidrule{1-4}
\prog{subu} & Yes & R3 & \\ \cmidrule{1-4}
\prog{mulu} & Yes & R3 & \\ \cmidrule{1-4}
\prog{divu} & Yes & R3 & \\ \cmidrule{1-4}
\prog{and} & Yes & R3 & \\ \cmidrule{1-4}
\prog{or} & Yes & R3 & \\ \cmidrule{1-4}
\prog{xor} & Yes & R3 & \\ \cmidrule{1-4}
\prog{nor} & Yes & R3 & \\ \cmidrule{1-4}
\prog{lsl} & Yes & R3 & \\ \cmidrule{1-4}
\prog{lsr} & Yes & R3 & \\ \cmidrule{1-4}
\prog{asr} & Yes & R3 & \\ \cmidrule{1-4}
\prog{se} & Yes & R3 & “Set if Equal” \\ \cmidrule{1-4}
\prog{sne} & Yes & R3 & \\ \cmidrule{1-4}
\prog{slt} & Yes & R3 & “Set if Lower Than” \\ \cmidrule{1-4}
\prog{sle} & Yes & R3 & “Set if Lower or Equal” \\ \cmidrule{1-4}
\prog{sltu} & Yes & R3 & Unsigned version of \prog{slt} \\ \cmidrule{1-4}
\prog{sleu} & Yes & R3 & “Set if Lower Than” \\ \cmidrule{1-4}
\prog{incri} & Yes & I & Allows for larger integers than \prog{add} \\ \cmidrule{1-4}
\prog{shi} & Yes & I & Shifts first operand by second operand; mostly useless\\ \cmidrule{1-4}
\prog{j} & Yes & J & If operand is an integer, performs a relative jump \\ \cmidrule{1-4}
\prog{jal} & Yes & J & “Jump and link;” return address is stored in \prog{G}\\ \cmidrule{1-4}
\prog{jr} & Yes & R & \\ \cmidrule{1-4}
\prog{jalr} & Yes & R & \\ \cmidrule{1-4}
\prog{jer} & Yes & R3 & “Jump if (R2 and R3 are) Equal to (address stored in) Register” \\ \cmidrule{1-4}
\prog{jner} & Yes & R3 & \\ \cmidrule{1-4}
\prog{jltru} & Yes & R3 & \\ \cmidrule{1-4}
\prog{jleru} & Yes & R3 & \\ \cmidrule{1-4}
\prog{lra} & Yes & J & Instead of performing jump, loads target address in register
\footnote{Labels may be loaded with command \prog{li}.} \\ \cmidrule{1-4}
\prog{lw} & Yes & MEM & \\ \cmidrule{1-4}
\prog{sw} & Yes & MEM & \\ \cmidrule{1-4}
\prog{lb} & Yes & MEM & \\ \cmidrule{1-4}
\prog{sb} & Yes & MEM & \\ \cmidrule{1-4}
\prog{lwr} & Yes & R3 & Looks up at address R2 + R3\\ \cmidrule{1-4}
\prog{swr} & Yes & R3 & \\ \cmidrule{1-4}
\prog{sbr} & Yes & R3 & \\ \cmidrule{1-4}
\prog{lbr} & Yes & R3 & \\ \cmidrule{1-4}
\prog{hlt} & Yes & H & Infinite loop \\ \cmidrule{1-4}
\prog{li} & No & I & Translates to appropriate \prog{li*} commands \\ \cmidrule{1-4}
\prog{lilz} & Yes & I & Loads lower byte, zeroes upper byte \\ \cmidrule{1-4}
\prog{liuz} & Yes & I & Loads upper byte, zeroes lower byte \\ \cmidrule{1-4}
\prog{lil} & Yes & I & Loads lower byte \\ \cmidrule{1-4}
\prog{liu} & Yes & I & Loads upper byte \\ \cmidrule{1-4}
\prog{push} & No & R & Stores contents of register on top of stack  \\ \cmidrule{1-4}
\prog{pop} & No & R & Loads contents of register from top of stack  \\ \cmidrule{1-4}
\prog{move} & No & R2 & \\ \cmidrule{1-4}
\prog{not} & No & R2 & \\ \cmidrule{1-4}
\prog{move} & No & R2 & \\ \cmidrule{1-4}
\prog{jz} & No & R2 & “Jump if Zero“ \\ \cmidrule{1-4}
\prog{jnz} & No & R2 & \\ \bottomrule
\end{longtable}
\end{center}

In addition to this, labels \prog{\underscore clock}, \prog{\underscore output}
and \prog{\underscore input} are
mapped respectively to the clock counter, the serial output and the serial input.

Here is an example function that outputs a string on the serial output:

\begin{verbatim}
# PROCEDURE: ser_out_str
# ROLE: write null-terminated string to serial output
# ARGUMENTS: address of string in register A
ser_out_str:
    li C _output
_ser_out_str_loop:
    lb B 0(A)
    jz B _ser_out_str_ret
    sb B 0(C)
    incri A 1
    j _ser_out_str_loop
_ser_out_str_ret:
    jr RA
\end{verbatim}

\subsubsection{The assembler}

The assembler itself is written in OCaml in a quite straightforward manner, using
the \prog{ocamllex} and \prog{menhir} tools, which allows for great flexibility.

\subsection{The simulator and the monitor}

The simulator is written in C for performance reasons.

The monitor is a C program, using the curses library for output to the console.

The simulator and the monitor communicate via Unix named pipes (FIFO's), which are created in
the files \prog{/tmp/sim2mon} and \prog{/tmp/mon2sim}. The synchronization of the two programs
has been somewhat problematic, due to incorrect use of \prog{scanf} making the programs hang.

\subsection{The operating system}

The operating system (source file : \prog{os.asm}) is a simple collection of assembly functions,
that implement the following features :

\begin{itemize}
    \item Serial input/output
    \item Writing of string and integers to serial output
    \item Date calculation
    \item Parsing of elementary command-line calls to set the clock variables
    \item Unit tests for addition, substraction, unsigned multiplication and usigned division.
\end{itemize}


\section{Results and benchmarking}

We have managed to run the processor sucessfully, and to make it display the date and time
as expected.

With only basic ALU operations implemented (basically, unsigned division and comparison), the
netlist generated for the CPU is 5572 lines long without simplifications, and 2560 lines long
after simplifications (see report on the simulator for details about these simplification passes).

The CPU runs at a reasonnable frequency of more than $10khz$, which enables it to interact
with the user almost instantly.

\end{document}