git @ Cat's Eye Technologies Falderal / freestyle-input TODO.markdown

Tree @freestyle-input (Download .tar.gz)

TODO.markdown @freestyle-inputview markup · raw · history · blame


(collected from the Falderal issue tracker on Bitbucket and the py-falderal issue tracker on github)

Falderal Literate Test Format

Policy for expecting both errors and output, success and failure

Policy should be this, by example:

| test
= foo

Means: I expect this to succeed and to produce foo on stdout, and I don't care what's on stderr (or — stderr should be empty?)

| test
= foo
? bar

Means: I expect this to succeed, to produce foo on stdout, and to produce bar on stderr.

| test
? foo

Means: I expect this to fail, and to produce foo on stderr. And to not care about stdout (or expect it to be empty.)

| test
? foo
= bar

Means: I expect this to fail, to produce foo on stderr, and to produce bar on stdout.

In other words, an error expectation may follow an output expectation and vice versa. Error expectations always match stderr, output expectations always match stdout. Which one's first should dictate whether we expect the command to succeed or fail.

What's after here hasn't been re-edited yet.

When you have a program that produces output on both stdout and stderr, whether it fails or not, you might want to expect text on both stdout and stderr.

Currently it expects the text on stdout if it is a = expectation, and on stderr if it is a ? expectation.

You can't work around this so well by tacking 2>&1 onto the end of the command, because then stderr will always be empty.

We could, by default, tack 2>&1 on the end ourselves and look only at stdout. This might be the simplest approach.

We might want to add options that avoid doing that, but if so, what should they be? Should each test be able to configure this? Should a single test be able to have both = and ? expectations, each for each stream?

This is complicated by the presence of %(output-file); currently, if that is given, stdout is ignored in preference to it (but stderr is still checked, if the command failed. There should probably be a corresponding %(error-file) variable.)

I think the current behaviour could work, with the following policy:

If the command succeeds, your = expectation will be matched against stdout only. If you wish to match against both stdout and stderr in these cases, add 2>&1 to your shell command.

If the command fails, your ? expectation will be matched against stderr only. If you wish to match against both stdout and stderr in these cases, add 1>&2 to your shell command.

Either way, it's still worth investigating whether it's worthwhile to have both = and ? expectations on a single test. (I can't convince myself that stdout and stderr will always be combined deterministically, and having both kinds of expectations would allow non-deterministic combinations of the two to be matched.)

Allow expectations to be transformed during comparison

It would be nice to allow expectations to be transformed before they are compared to the actual output. The main use case for this that I can think of is to allow the expected output to be "pretty printed" (that is, nicely formatted) in the Falderal file, while the functionality being tested just produces a dump. The nicely formatted expected output should be "crunched" into the same ugly format as the dump.

This doesn't work as well the other way; although one could compose the functionality being tested with an actual pretty-printer, that would prescribe a certain indentation scheme etc. that the expected output would have to match exactly. It would be rather better if the writer of the tests could format their expected output as they find most aesthetically pleasing in their literate tests, and have that be transformed instead.

This might be somewhat tricky, however; if the transformation applied is too powerful, it can distort or eliminate the meaning of the test, and erode confidence.

Allow use of patterns in expected output

Likely by way of regexps. This would be particularly valuable in exception-expecting tests, where we don't care about details such as the line number of the Haskell file at which the exception occurred.

Allow equivalency tests to be defined and run.

To test functions of type (Eq a) => String -> a, you should be able to give give multiple input strings in a set; if the function does not map them all to the same value, that's a test failure.

Syntax for an equivalency test might look like this:

| 2+2
| 3+1
| 7-3


Test report accumulation

Multiple runs of falderal ought to be able to accumulate their results to a temporary file. No report should be generated if this is selected. At the end, the report can be generated from the file.

rm -f FILE
falderal --record-to FILE tests1.markdown
falderal --record-to FILE tests2.markdown
falderal --report-from FILE

Support 'weak' testing

In which we only care about whether the command succeeded or failed. In practice, this could be useful for testing the parser (just test if these forms parse.) Or, if not this, then think of something that would make just testing parsers more useful.

Split InterveningMarkdown blocks to make nice test descriptions

For example, if we have

...test #1...
Some text.
More text
...test #2...

The description for test #2 should consist of "More text"; possibly also the heading, but not "Some text". This can take place in a pre-processing phase which simply splits every InterveningMarkdown block into multiple blocks, at its headers. It should understand both underlined and atx-style headers.

option to colourize test result output

Using one of the approaches listed here: ought to provide an option (not default, of course, and not if stdout is not a tty) to colorize the output with, of course, pass=green, fail=red.

But, you'd often want to pipe the output to less, which by default makes control characters visible, defeating colourization. But there is a flag to less, less -R, which retains colourization. So use that.

Flag invalid sequences of lines as errors


Currently, in convertLinesToBlocks, some invalid sequences of lines are ignored. They should be flagged as errors in the test suite file.

(This was written against Test.Falderal but similar considerations could be made for py-falderal.)