0 | 0 |
Robin's Approach to Static Analysis
|
1 | 1 |
===================================
|
2 | 2 |
|
3 | |
This document is a draft.
|
|
3 |
This document is a draft. It contains a bunch of not-necessarily
|
|
4 |
well-integrated notes on how Robin approaches the problem of static
|
|
5 |
analysis.
|
4 | 6 |
|
5 | 7 |
Static Analysis
|
6 | 8 |
---------------
|
|
96 | 98 |
I don't expect this will be easy, of course, but I do hope it is somehow
|
97 | 99 |
possible.
|
98 | 100 |
|
|
101 |
Some Implications of this Approach
|
|
102 |
----------------------------------
|
|
103 |
|
|
104 |
The "static" in "static analysis" means that the code is analyzed without
|
|
105 |
running it; it does not strictly mean that it happens *before the code is
|
|
106 |
run*. Since, in Robin, you (the programmer) write static analyzers in
|
|
107 |
Robin, then it's your responsibility to make sure they run when you want
|
|
108 |
them to run (and yes, typically this will be before the code itself is
|
|
109 |
run; otherwise you might as well just not have any static analysis, and
|
|
110 |
wait for incorrect code to crash when it runs.)
|
|
111 |
|
|
112 |
Fortuitously, there is a convenient time to run static analyses on a
|
|
113 |
module -- when that module is imported. The code to construct the
|
|
114 |
environment exported by the module needs to run at that point anyway, and
|
|
115 |
the static analysis might as well run then too. This also means that
|
|
116 |
statically analyzing a set of a modules is as simple as constructing a
|
|
117 |
test program which simply imports all of those modules; if any of them
|
|
118 |
fail static analysis, an exception will be thrown at import time:
|
|
119 |
|
|
120 |
(robin (0 . 1) (foo (0 . 1) bar (0 . 1) baz (0 . 1))
|
|
121 |
#t)
|
|
122 |
|
|
123 |
Another implication of running static analyzers "in" the code they are
|
|
124 |
statically analyzing -- and this relates to having the option to punt --
|
|
125 |
is that they, like the rest of your program, may not be perfect. In the
|
|
126 |
end, they are constructed from the same goo as the rest of your program,
|
|
127 |
and to have confidence that those goo formations do what you expect them
|
|
128 |
to, you should have tests for them. If a static analyzer is imperfect,
|
|
129 |
this just means you can't rely on it as heavily as you could if it was
|
|
130 |
perfect; it doesn't mean you can't still use it. When you think about it,
|
|
131 |
the static analyzer in some Java compiler you're using might contain a bug,
|
|
132 |
too, but such a bug might be harder for you to fix, what with that compiler
|
|
133 |
possibly being written in a different language. (The flip side of this is
|
|
134 |
that Robin's static analyzers ought, in some cases at least, to be able to
|
|
135 |
analyze themselves easily -- leading to better confidence that the analyzers
|
|
136 |
themselves are correct.)
|
|
137 |
|
|
138 |
Concrete Applications
|
|
139 |
---------------------
|
|
140 |
|
|
141 |
Let's provide an overview of some of the possible concrete applications of
|
|
142 |
static analysis in Robin.
|
|
143 |
|
99 | 144 |
`pure`
|
100 | 145 |
------
|
101 | 146 |
|
102 | |
Probably the most sensible place to start with all this is a macro which
|
103 | |
defines a function if and only if it can prove that the function has no
|
104 | |
side-effects. (Otherwise, it presumably raises an exception.)
|
|
147 |
A macro is referentially transparent if, for every set of possible particular
|
|
148 |
actual arguments, it always evaluates to the same particular result value.
|
105 | 149 |
|
106 | |
This goes along with the convention that the names of such functions end
|
107 | |
with `!`. However, we'd like to ensure this at the level of values rather
|
108 | |
than names; in Robin, values have types, but names do not.
|
|
150 |
"Referentially transparent" is a bit of a mouthful, so in Robin, we (for
|
|
151 |
better or worse) call macros with this property *pure*.
|
109 | 152 |
|
110 | |
As described in the Style document, the only "side-effects" in Robin are
|
111 | |
spawning a process, sending a message to a process, and receiving a message
|
112 | |
from a process, and these can be distinguished at a lower level than simply
|
113 | |
"might this have side-effects or not". But for now, for simplicitly, I'm
|
114 | |
going to glom them all together under this banner.
|
|
153 |
There is a convention that macros that are not pure are bound to names that
|
|
154 |
end with an exclamation point (`!`). However, this is a convention, and so
|
|
155 |
is not enforced; also, purity analysis deals with the values themselves, not
|
|
156 |
the names they may or may not be bound to.
|
115 | 157 |
|
116 | |
One problem I've faced is finding a succinct adjective to describe functions
|
117 | |
which can not possibly cause side-effects. "Referentially transparent" is
|
118 | |
one of the more official adjectives, but it's far too long. I've
|
119 | |
provisionally settled on *pure*, even though it is far from the most
|
120 | |
descriptive word.
|
121 | |
|
122 | |
So we can imagine our static analyzer, with its provisional name, like so.
|
123 | |
We can define a macro called, say, `pure-fun`. It accepts the same kinds
|
124 | |
of arguments as `fun`:
|
125 | |
|
126 | |
(bind perimeter (pure-fun (w h) (* 2 (+ w h)))
|
127 | |
...)
|
128 | |
|
129 | |
`pure-fun` however, examines its second argument in detail before
|
130 | |
evaluating to a function value which implements this function. It looks up
|
131 | |
`*` in its environment, sees that there is metadata on the value referred
|
132 | |
to `*` that indicates that it is pure, and continues. It descends into the
|
133 | |
term, and sees that `2`, being a literal value, is pure; it sees that `+`
|
134 | |
is also pure; and it sees that `w` and `h` are arguments to the function.
|
135 | |
(If these aren't pure, that's not a problem with this function per se.)
|
136 | |
Having thus proven the expression to be pure, it evaluates the function
|
137 | |
value in the exact same way that `fun` would, then adds metadata to that
|
138 | |
value that marks it as `pure`.
|
139 | |
|
140 | |
Then `bind` binds the identifier `perimeter` to this value, which has
|
141 | |
been marked as `pure`; so when we look up `perimeter` in this environment,
|
142 | |
we know it refers to a pure function. We can use this information in
|
143 | |
subsequent checks, like:
|
144 | |
|
145 | |
(bind perimeter (pure-fun (w h) (* 2 (+ w h)))
|
146 | |
(bind psquare (pure-fun (w) (perimeter w w))
|
147 | |
...))
|
148 | |
|
149 | |
This is all well and good for functions, but for other macros, we may
|
150 | |
need to do more work. Specifically, a macro like `fun` itself, which
|
151 | |
defines a custom syntax, might need to describe what their syntax is
|
152 | |
like, in their metadata, so that the purity analyzer can recognize them
|
153 | |
and process them correctly.
|
|
158 |
For more information on `pure`, see [the documentation for the `pure`
|
|
159 |
module](module/Pure.falderal).
|
154 | 160 |
|
155 | 161 |
`constant`
|
156 | 162 |
----------
|
157 | 163 |
|
158 | |
On top of this we can easily build another level of static analysis,
|
|
164 |
On top of `pure` we can easily build another level of static analysis,
|
159 | 165 |
`constant`. An expression is constant if it is a literal, or if it
|
160 | 166 |
an application of a pure, constant function to constant arguments.
|
161 | 167 |
|
|
163 | 169 |
this even more complex: the `eval`ing of a `constant` value inside a
|
164 | 170 |
function may make that function `pure`, whereas if it is not a `constant`
|
165 | 171 |
value, there might be no way to prove this.
|
|
172 |
|
|
173 |
`total`
|
|
174 |
-------
|
|
175 |
|
|
176 |
A macro is total if it always terminates. In the general case, this is
|
|
177 |
undecidable. However, it is not too difficult to analyze a macro and
|
|
178 |
determine that it is primitive recursive, and all primitive recursive
|
|
179 |
functions are total.
|