Project
The goal of the project is to write a simple static analyzer by abstract interpretation in OCaml to discover automatically numeric invariants about the variables of programs written in a simple programming language.
The project is due on the day of the exam, on May the 28th. The project should be sent by email to the teachers.
Result
For the project, you should deliver an archive that contains the following items:
- the source code of the static analyzer, well-commented and complete with a Makefile;
- a report of 4 pages or more that describes the language features supported, the choices in semantics and implementation (if any), the usage of the analyzer; it must also discuss the result of a few analysis tests (we provide some tests that you can use).
Required features
Input language
The input language can be based on the language used in the lab session, or you can design your own language with similar features. To get started, you can download the codeV1x.tgz package (see the first practical session for an explanation of the syntax of the language).
More precisely, the language must feature at least:
- blocks (nested sequences of instructions);
- declarations of variables of integer type, with proper scoping with respect to blocks (to simplify, you can assume that all variable names are distinct);
- assignments of integer expressions, with the standard arithmetic operators: +, -, * (and optionally division / and modulo %);
- if-then-else instructions, using boolean expressions containing integer comparisons (==, !=, <, >, <=, >=) and boolean operators (&&, ||, !) (support for boolean variables is not required);
- while loops;
- a non-deterministic operation, such as rand(l,h) in integer expressions;
- an assert instruction, that checks the validity of a boolean expression;
- a print instruction, that takes as argument a list of variables and prints their value.
(You are free to reject programs using other language constructions, such as functions, non-integer variable declarations, gotos, etc.)
Analysis
The analyzer should contain an iterator that analyzes the program forwards, by induction on the abstract syntax tree. Loops can be unrolled a fixed number of times (e.g., 3 times), but you should ultimately use iterations with widening in order to always terminate for domains with infinite increasing chains. To increase the precision, the first few (e.g., 3) applications of the widening may be replaced with a join.
The analysis should be parametric in the choice of a numeric abstract domain. At least the three following domains must be provided:
- the constant domain, that you should implement yourself;
- the interval domain, that you should also implement yourself;
- the polyhedra domain, which calls the Apron library to perform the abstract operations (you are not asked to implement the polyhedra operators yourself).
Depending on the domain, at least the following kinds of expressions should be handled precisely:
- in assignments: all arithmetic expressions for the constant and interval domains; affine expressions for polyhedra;
- in the conditions of if-then-else and while instructions: for polyhedra, boolean formulas whose atoms are affine constraints; for constants and intervals, boolean formulas whose atoms are simple comparisons between two variables or between a variable and a constant.
All the expressions allowed in the program syntax must be handled (no "not implemented" exception); however, any expression not belonging to the above categories may be handled in a coarse way (e.g., considering assignments as non-deterministic and ignoring conditions).
Output
The analysis should at least output a textual file containing:
- a list of the assertions that cannot be proved to hold, and their location;
- for each print instruction, the invariant found for the specified variables at the location of the instruction.
For debugging, it can be useful to implement a trace mode that outputs the abstract element computed at each step of the analysis.
Tests
To help you test your analyzer, we provide a set of small test programs, written in the small language used in the practical sessions. We also provide the output of their analysis using our reference analyzer, so that you can easily check the correctness and the precision of your analyzer.
Extensions
In addition to the mandatory features described above, you may implement one or several extensions, such as for instance:
- functions and function calls (without recursion);
- checks for arithmetic overflows and divisions by zero errors in expressions;
- other numeric abstract domains (such as non-relational integer congruences, or linear equalities);
- several types of numeric variables (such as rationals or floating-point numbers, which are supported directly by Apron);
- boolean variables (you can handle them as integer variables that can only take two values: 0 for false, and 1 for true);
- arrays (each array can be abstracted by a single variable, and assignments result in weak updates);
- non-local control-flow instructions, such as break and continue in loops, forward goto, or even arbitrary goto;
- a backward analysis (find necessary conditions on the start of the program so that all assertions are satisfied);
- a nicer presentation for the output of the analysis, such as a graphical visualisation of numeric invariants (particularly useful for polyhedra in 2D or 3D);
- any extension of your choice.
Resources
Files
- codeV1x.tgz: language parser;
- tests.tgz: some programs to test your analyzer.
Documentation
- OCaml programming language;
- Menhir parser generator;
- ZArith arbitrary precision number library;
- Apron abstract domain library;
- Mapext maps with some additional useful functions (the source code is included in the codeV1x.tgz bundle).
Author: Antoine Miné