This file was obtained from the RDFA testing document using tidyhtml.

Text-based specification and ncval

Targeted tests

Exhaustive decoder test

Exhaustive validator test

How do we apply text-based specification to sequences of bytes?

Presubmit script

Text-based specification and ncval

https://code.google.com/p/nativeclient/issues/detail?id=3453

Basically, it is a handful of Python functions which accept a disassembly listing and say whether it is correct from the sandboxing point of view.

This executable specification is intended to be more or less readable and automatically up-to-date.

Note that there is no goal to reflect all validator quirks in the spec. For example, for purely technical reasons the old validator (and consequently the new one) rejects 16-bit atomics. We plan to allow them eventually, and the specification allows them from the beginning.
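
For illustration, a specification function in this style could look roughly as follows; the function name, the helper set, and the checks are hypothetical (the real functions live in validator_ragel/spec.py):

  class SandboxingError(Exception):
      """Raised when an instruction violates the sandboxing rules."""

  # Illustrative subset only; the real specification covers far more cases.
  FORBIDDEN_MNEMONICS = frozenset(['int', 'syscall', 'sysenter'])

  def ValidateInstruction(disassembly_line):
      # disassembly_line is one line of an objdump-style listing,
      # e.g. "nopw   0x0(%eax,%eax,1)".
      mnemonic = disassembly_line.split()[0]
      if mnemonic in FORBIDDEN_MNEMONICS:
          raise SandboxingError(
              'unrecognized instruction %r' % disassembly_line)
      # ... further checks: memory operand sandboxing, writes to
      # reserved registers, and so on ...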

This specification is not a test per se, but it is used in two sets of tests: targeted tests and exhaustive validator tests.

Targeted tests

https://code.google.com/p/nativeclient/issues/detail?id=3037

https://code.google.com/p/nativeclient/issues/detail?id=3452

These tests serve several purposes:

  1. as regression tests and a convenient way to report bugs and document behaviors
  2. as positive tests for the RDFA validator (other tests only check that it rejects something that is unsafe, not that it accepts what it is supposed to accept)
  3. as negative tests for the text-based specification (in other places we only check that it allows what RDFA actually accepts; here we can check that it is not too forgiving)
  4. as a cheap, although incomplete, test suite that can be run on bots consistently

There are about 200 manually written test files (incorporating ~300 test cases) for the 32- and 64-bit validators. They originate from tests Karl used for the old validator. Since then they have been supplemented with tests for most (hopefully, all) defects and subtle behaviors we encountered during the work on the RDFA validator. They also include tests Mark used for his prototype DFA-based validator. They are somewhat poorly structured, so it may not be immediately obvious where to find the test case for a specific problem, but there is a lot of stuff there.

Test format. Test files (validator_ragel/testdata/32/*.test, validator_ragel/testdata/64/*.test) consist of one or more test cases separated by ‘------------’. Each test case consists of sections. Example:

@hex:
  # This is the correct nop case.
  # nopw   0x0(%eax,%eax,1)
  66 0f 1f 44 00 00
@dis:
     0:        66 0f 1f 44 00 00            nopw   0x0(%eax,%eax,1)
@rdfa_output:
  return code: 0
@spec:
  SAFE
----------------------------------------------------------------------
@hex:
  # This is an example where we have incremented the nop by 1.
  66 0f 1f 44 00 01
@dis:
     0:        66 0f 1f 44 00 01            nopw   0x1(%eax,%eax,1)
@rdfa_output:
  0: [0] unrecognized instruction
  return code: 1
@spec:
  0: unrecognized instruction 'nopw 0x1(%eax,%eax,1)'

The ‘@hex’ section contains the input data as a sequence of bytes. By convention, in complex cases the bytes corresponding to an instruction are preceded with a comment giving the assembly form of that instruction. This correspondence is not enforced, so in principle the comments can lie. To make it easier to spot errors in the comments, there is a ‘@dis’ section containing the output of nacl-objdump on the given input.

The ‘@spec’ section contains the expected output of the text-based ncval (which receives its input from the @dis section).

The ‘@rdfa_output’ section contains the expected output of the RDFA-based ncval (actually, it is processed in a certain way; see https://code.google.com/p/nativeclient/issues/detail?id=3037 under the ‘tricky part’ paragraph). In an ideal world this simulation of error recovery would not be needed, as every test would contain at most one violation of the sandboxing rules, but for historical reasons we have tests with several errors in a single chunk of code and we want to check them all. New tests should exhibit only one failure per test case.
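
To make the layout concrete, here is a minimal parsing sketch. The helper is hypothetical (the real test harness in validator_ragel parses these files itself and may differ in details):

  import re

  def ParseTestFile(text):
      # A .test file is a sequence of test cases separated by lines of
      # dashes; each test case is a sequence of sections, where a section
      # starts with a '@name:' line and is followed by its content lines.
      test_cases = []
      for chunk in re.split(r'^-{4,}\s*$', text, flags=re.MULTILINE):
          sections = {}
          current = None
          for line in chunk.splitlines():
              if line.startswith('@') and line.rstrip().endswith(':'):
                  current = line.strip()[1:-1]          # e.g. 'hex', 'dis', 'spec'
                  sections[current] = []
              elif current is not None:
                  sections[current].append(line)
          if sections:
              test_cases.append(sections)
      return test_cases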

For each of these three sections there is a dedicated scons target:

  1. Content of @dis is checked with ./scons run_dis_section_test_32 (or _64).
  2. Content of @spec is checked with ./scons run_spec_val_test_32 (or _64).
  3. Content of @rdfa_output is checked with ./scons run_rdfa_targeted_tests_32 (or _64).
    Additionally, these two targets (the 32- and 64-bit variants) check that the RDFA validator determines valid jump targets correctly by appending a jump instruction leading to each location in the input code. How the @hex section is split into lines determines which jump targets are allowed (it is expected that the bytes corresponding to a single instruction occupy one line in the hex files; for superinstructions, the line continuation mark ‘\\’ is used to disallow a jump to the end of the marked instruction). A sketch of this mapping from hex lines to allowed jump targets follows the list.
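
Here is a hedged sketch of that mapping, using a hypothetical helper; it only illustrates the rule that every @hex line normally ends an instruction (and therefore marks a valid jump target), while a trailing continuation mark suppresses the boundary after that line:

  def AllowedJumpTargets(hex_lines):
      # hex_lines: the non-comment lines of an @hex section, each holding
      # whitespace-separated hex byte pairs, optionally ending with the
      # continuation mark. This helper is hypothetical.
      targets = set([0])
      offset = 0
      for line in hex_lines:
          stripped = line.strip()
          continued = stripped.endswith('\\')
          if continued:
              stripped = stripped.rstrip('\\').rstrip()
          offset += len(stripped.split())     # one byte per hex pair
          if not continued:
              targets.add(offset)             # instruction boundary: jump allowed
          # a continued line flows into the next one, so the boundary after
          # it is not a valid jump target (used for superinstructions)
      return targets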

All these test targets are run on bots.

If the ‘regenerate_golden=1’ option is passed to scons with any of these targets (for example, ./scons run_rdfa_targeted_tests_32 regenerate_golden=1), the content of the section is replaced with the actual output of the corresponding tool. This is helpful when tests are edited. Of course, each such change has to be manually reviewed.

TODO: when @rdfa_output and @spec disagree and how to check for it.

Note: there is legacy stuff in validator_x86/testdata.

How to run all these tests:

  ./scons small_tests

(or, more specifically,

  ./scons run_dis_section_test_32 run_rdfa_targeted_tests_32 run_spec_val_test_32

and the same for 64)

Exhaustive decoder test

https://code.google.com/p/nativeclient/issues/detail?id=3154

The primary purpose of this test is to find errors in our instruction definition files.

We enumerate all instruction sequences accepted by the decoder automaton and compare the output of our decoder with a specific version of objdump.

We do not enumerate all possible values of immediates (there are too many of them). Transitions corresponding to immediate bytes (as well as direct jump/call targets and relative offsets, collectively ‘anybytes’) are marked in the automaton, so we recognize them in our traversal and only generate one representative instance of the immediate.
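
As an illustration of the idea only (the attribute names below are invented, and in this sketch the ‘anybyte’ mark is attached to a state rather than to a transition), the traversal looks roughly like this:

  def EnumerateAcceptedSequences(dfa, state=None, prefix=()):
      # Depth-first walk over the decoder DFA. For a state whose outgoing
      # transitions are marked as 'anybyte' (immediates, direct jump/call
      # targets, relative offsets) we try only one representative byte
      # instead of all 256 values.
      if state is None:
          state = dfa.start_state
      if dfa.is_accepting(state) and prefix:
          yield bytes(prefix)
      if dfa.is_anybyte(state):
          candidate_bytes = [0x01]                 # single positive representative
      else:
          candidate_bytes = sorted(dfa.transitions[state])
      for byte in candidate_bytes:
          next_state = dfa.transitions[state].get(byte)
          if next_state is not None:
              for sequence in EnumerateAcceptedSequences(dfa, next_state, prefix + (byte,)):
                  yield sequence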

Our decoder behaves differently from objdump when it comes to the fwait instruction (https://code.google.com/p/nativeclient/issues/detail?id=3251); for example, the ‘FWAIT; FNINIT’ sequence is decoded as a single ‘FINIT’ instruction by objdump. We decided not to reproduce this behavior and instead took precautions to ensure that the FWAIT instruction is always followed by a NOP in the stream we generate.

Also, the RDFA decoder does not sign-extend negative immediates (https://code.google.com/p/nativeclient/issues/detail?id=3164), but this does not show up in this test because we use positive numbers for ‘anybytes’.

How to run:

  ./scons dfacheckdecoder

Since this test requires ragel, it can only be run on Linux. Also, it takes a while (about an hour on a z620), so we do not even attempt to run it on bots.

There are ~250 million 32-bit instructions and ~4 billion 64-bit instructions accepted by the decoder.

Exhaustive validator test

https://code.google.com/p/nativeclient/issues/detail?id=3167

This test is designed to catch the following problems:

  1. unexpected interactions between DFA actions due to nondeterminism
  2. mistakes in sandboxing logic in DFA actions
  3. wrong nacl-specific annotations in .def files (for example, when one forgets to mark a forbidden instruction as nacl-forbidden)
  4. typos and other errors in superinstructions and special instructions that are manually specified in .rl files (as opposed to .def files)

(requires ragel, platform=x86-64 (because of the python/validator integration), and the old ncval built)

Basically we are solving a kind of ‘inverse Kleene star problem’: given a DFA, we attempt to find a set of words such that a word is accepted by this DFA iff it is a concatenation of words from this set. We do not know how to solve this problem efficiently in general, so we use an algorithm which makes certain assumptions about the DFA structure (and verifies these assumptions along the way). The corresponding code lives in validator_ragel/verify_validators_dfa.py.
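
As a simplified illustration only (this is not the actual algorithm in verify_validators_dfa.py, and the attribute names are invented): if every accepting state has exactly the same outgoing transitions as the start state, then the accepted language is the Kleene closure of the words that lead from the start state to the nearest accepting state, and those words can be enumerated directly.

  def CheckStartStateAssumption(dfa):
      # Simplified structural assumption: every accepting state must behave
      # like the start state, i.e. have identical outgoing transitions.
      start_transitions = dfa.transitions[dfa.start_state]
      for state in dfa.accepting_states:
          if dfa.transitions[state] != start_transitions:
              raise AssertionError('DFA does not have the assumed structure')

  def EnumerateSingleWords(dfa):
      # Depth-first search from the start state, cutting at the first
      # accepting state, so that each yielded word is a single instruction
      # or superinstruction rather than a concatenation. Assumes the only
      # cycles in the DFA go through accepting states.
      stack = [(dfa.start_state, ())]
      while stack:
          state, word = stack.pop()
          if word and dfa.is_accepting(state):
              yield bytes(word)
              continue
          for byte, next_state in dfa.transitions[state].items():
              stack.append((next_state, word + (byte,)))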

For technical reasons we subdivide all such words into two categories: regular instructions and superinstructions.

Superinstructions enumerated by validator_ragel/verify_validators_dfa.py are checked in validator_ragel/verify_superinstructions.py. For each byte sequence we call objdump to make sure that it does not end mid-instruction. Then we parse the disassembly listing to determine whether it is indeed a valid superinstruction (since part of the validation logic resides in DFA actions, some of the byte sequences accepted by the automaton are invalid from the sandboxing point of view). And then we invoke the validator itself through the python interface and check that it accepts or rejects the given byte sequence according to the sandboxing rules. There are fewer than a dozen types of superinstructions and they are relatively easy to parse (this is done in the functions ValidateSuperinstruction32/64 in validator_ragel/spec.py), so for simplicity we do not bother to compare against the old validator.
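
Schematically, the per-candidate check could look like this; all helper names below are hypothetical placeholders for what validator_ragel/verify_superinstructions.py and the python validator interface actually provide:

  def CheckSuperinstruction(byte_sequence, bitness):
      # 1. objdump must not stop in the middle of an instruction.
      disassembly = RunObjdump(byte_sequence, bitness)           # hypothetical helper
      assert not EndsMidInstruction(disassembly), byte_sequence  # hypothetical helper
      # 2. the spec decides whether this is really a valid superinstruction
      #    (some DFA-accepted sequences are unsafe on their own, because part
      #    of the validation logic lives in DFA actions).
      expected_safe = SpecAllowsSuperinstruction(disassembly, bitness)
      # 3. the RDFA validator, invoked through its python interface, must agree.
      actual_safe = RdfaValidatorAccepts(byte_sequence, bitness)
      assert expected_safe == actual_safe, (byte_sequence, disassembly)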

Regular instructions are enumerated and checked by validator_ragel/verify_regular_instructions.py. There are about 4M 32-bit instructions and 70M 64-bit instructions accepted by the DFA, so it is quite a costly test (about an hour on a z620). For each instruction, we call objdump to ensure that it is indeed a single instruction. If the text-based specification rejects this instruction, we make sure the RDFA validator rejects it as well. There is no point in checking the other direction, because we ultimately enumerate only byte sequences that RDFA accepts (enumerating all sequences would be impossible).

Actually, this scheme works for the 32-bit validator. The 64-bit one is additionally complicated by the fact that information flows between instructions (in the form of the current restricted register). So we have to ensure that the specification and the RDFA validator agree on instruction pre- and post-conditions.
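
Roughly, and with hypothetical interfaces (the real restricted-register handling is more detailed), the 64-bit agreement check per instruction can be pictured like this:

  # Illustrative subset; the real set of possible restricted registers differs.
  POSSIBLE_RESTRICTED_REGISTERS = [None, 'eax', 'ebx', 'ecx', 'edx', 'esi', 'edi']

  def CheckInstruction64(byte_sequence, disassembly):
      for restricted_in in POSSIBLE_RESTRICTED_REGISTERS:
          # Both calls are hypothetical; each returns a pair
          # (accepted, restricted_register_out).
          spec_ok, spec_out = SpecValidate64(disassembly, restricted_in)
          rdfa_ok, rdfa_out = RdfaValidate64(byte_sequence, restricted_in)
          if not spec_ok:
              # Whatever the spec rejects, the RDFA validator must reject too.
              assert not rdfa_ok, (byte_sequence, restricted_in)
          elif rdfa_ok:
              # When both accept, they must agree on the post-condition.
              assert spec_out == rdfa_out, (byte_sequence, restricted_in)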

Also, just as in the exhaustive decoder test, we do not actually try all possible values for ‘anybytes’ (and direct jump targets fall into this category). Anyway, checking the jump target logic is not the goal of this test (we rely on the manually written targeted tests for jumps instead).

There is a similar test in validator_ragel/verify_regular_instruction_old.py. Instead of comparing against the text-based specification, it compares against the old validator (and additionally objdump is used to check that the instruction length is determined correctly). Hopefully we will be able to get rid of it soon.

How to run:

  ./scons dfacheckvalidator platform=x86-64

This test requires ragel and takes a lot of time, so it can only be run on Linux. It uses the python interface to the validator (implemented as a DSO), so the supplied value of the ‘platform’ parameter should match the python bitness. Additionally, since it uses both the 32-bit and 64-bit ncvals, the following commands should be run manually beforehand:

  ./scons ncval platform=x86-32

  ./scons ncval platform=x86-64

(this requirement can’t be represented as scons dependencies because these targets span across different platform configurations)

How do we apply text-based specification to sequences of bytes?

Of course we could use objdump to get the disassembly, but that raises the question of how reliable objdump is in the presence of invalid instructions (which is not its intended use case).

Objdump is well tested for the instructions which make sense and which the CPU accepts (each time someone adds an instruction to gas, it is added to objdump with the appropriate tests and everything), but it is not all that good for incorrect instructions (especially ones which are similar to other, existing instructions). E.g.:

   0:        66 0f 78 c0 02 01            extrq  $0x1,$0x2,%xmm0
   6:        c5 f8 28 d1                  vmovaps %xmm1,%xmm2
   a:        c5 f2 2a d0                  vcvtsi2ss %eax,%xmm1,%xmm2
   e:        66 0f 78 00                  extrq  $0x0,$0x78,(bad)
  12:        c5 fa 28 d1                  vmovaps %xmm1,%xmm2
  16:        c5 f0 2a                     (bad)
  19:        d0                           .byte 0xd0

The first three instructions are correct, and the instructions after that point are minor modifications of existing instructions (extrq with register, not memory; vmovaps with vex.pp changed from 00 to 10; and vcvtsi2ss with vex.pp changed from 10 to 00). Objdump can declare an instruction "(bad)", it can detect that it is "(bad)" in the middle of the instruction, or it can just confuse it with a different, real instruction!

Suppose that for some reason the DFA accepts a completely meaningless sequence of bytes, but objdump incorrectly decodes it as an innocent instruction which the text-based specification allows. This situation is undesirable. That is why we use our own RDFA decoder (which is tailored to mimic objdump output) instead of objdump itself.

The RDFA decoder is designed to never accept invalid instructions (it always produces the same ‘unrecognized instruction’ message where objdump might try some guesswork). For a decoding problem to go unnoticed by the exhaustive decoder test, the RDFA decoder would have to accept an invalid sequence, objdump would have to accept the same invalid sequence, and both would have to produce identically incorrect output. So using the RDFA decoder in the exhaustive validator test makes it extremely unlikely that the text-based specification would have to deal with an incorrectly decoded instruction.

Presubmit script

The script resides in validator_ragel/PRESUBMIT.py and performs two checks (a rough sketch follows the list):

  1. when files that can affect the generated validator files are updated, it reminds the developer to run ./scons dfagen to regenerate these files
  2. when files that determine validator behavior (including the generated files) are changed, it reminds the developer to run ./scons dfacheckvalidator (because it can only be run locally)
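
A rough sketch of this pattern, assuming the usual depot_tools presubmit API; the actual file lists and messages in validator_ragel/PRESUBMIT.py differ:

  # Illustrative only: the real PRESUBMIT.py tracks a specific list of inputs.
  DFAGEN_INPUT_SUFFIXES = ('.def', '.rl')

  def CheckChangeOnUpload(input_api, output_api):
      results = []
      affected = [f.LocalPath() for f in input_api.AffectedFiles()]
      if any(path.endswith(DFAGEN_INPUT_SUFFIXES) for path in affected):
          results.append(output_api.PresubmitPromptWarning(
              'Validator definition files changed; run ./scons dfagen to '
              'regenerate the generated validator files.'))
      if any('validator_ragel' in path for path in affected):
          results.append(output_api.PresubmitPromptWarning(
              'Validator behavior may have changed; run '
              './scons dfacheckvalidator locally.'))
      return results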