September 05, 2007, 12:46
RTC
After posting the above, I found an article on a California DWI defense lawyer's webpage that discusses the importance of this very issue (knowing the source code). While the breath machine is a different model, the principle behind why knowing the source code matters is the same.
Wednesday, August 29, 2007
Successful DUI Breath Test Machine Attacks - Source Code update
SOURCE CODE OF THE DRAEGER ALCOTEST 7110 MKIII-C
After two years of attempting to get the computer based source code for the Alcotest 7110 MKIII-C, premier DUI defense attorneys in State v. Chun were successful in obtaining the code, and had it analyzed by Base One Technologies, Inc.
By making itself a party to the litigation after the oral arguments in April, Draeger subjected itself to the Supreme Court's directive that Draeger ultimately provide the source code to the defendants' software analysis house, Base One.
Despite Draeger's protestations that the code was proprietary, Base One found that the code consists mostly of general algorithms arranged in a manner to implement the breath testing sequence. "That is, the code is not really unique or proprietary."
In a report released August 28, 2007, Base One determined:
As a matter of public safety, the Alcotest should be suspended from use until the software has been reviewed against an acceptable set of software development standards, and recoded and tested if necessary. An incorrect breath test could lead to accidents and possible loss of life, because the device might not detect a person who is under the influence, and that person would be allowed to drive. The possibility also exists that a person not under the influence could be wrongly accused and/or convicted.
Draeger reviewed the code, as well, through its software house, SysTest Labs, which agreed with Base One, that the patchwork code that makes up the 7110 is not written well, nor is it written to any defined coding standard. SysTest said, "The Alcotest NJ3.11 source code appears to have evolved over numerous transitions and versioning, which is responsible for cyclomatic complexity."
The best thing SysTest said about the machine was, "The translation from German to English of the comments within the major components shows the logical intent of the programmers to produce reliable and valid test results. SysTest was unable to find any evidence of any intention to mis-direct or re-direct the test results or report anything other than valid results."
SysTest looked only for "mal-ware", not at how the code functions.
Base One, however, did an extensive evaluation, finding 19,400 potential errors in the code.
Among its findings are:
1. The Alcotest Software Would Not Pass U.S. Industry Standards for Software Development and Testing: The program presented shows ample evidence of incomplete design, incomplete verification of design, and incomplete "white box" and "black box" testing. Therefore the software has to be considered unreliable and untested, and in several cases it does not meet stated requirements. The planning and documentation of the design is haphazard. Sections of the original code and modified code show evidence of using an experimental approach to coding, or use what is best described as the "trial and error" method. Several sections are marked as "temporary, for now". Other sections were added to existing modules or inserted in a code stream, leading to a patchwork design and coding style.
The software development life-cycle concept is governed by one of the nationally and internationally recognized development standards to prevent defects from entering the software during the design process, and to find and eliminate more defects as the software is coded, tested, and released to the field. This concept of software development using standards requires extensive and meticulous supporting data, and notations in source files, and a configuration management system. None of this methodology is evident in the Alcotest code. Further, the decision method of how to allocate the architecture and assignment of tasks does not match any of the software standards. This further substantiates that software development standards were not used to verify or test the software, including the ISO 9000 family of standards.
It is clear that, as submitted, the Alcotest software would not pass development standards and testing for the U.S. Government or Military. It would fail software standards for the Federal Aviation Administration (FAA) and Food and Drug Administration (FDA), as well as commercial standards used in devices for public safety. This means the Alcotest would not be considered for military applications such as analyzing breath alcohol for fighter pilots. If the FAA imposed mandatory alcohol testing for all commercial pilots, the Alcotest would be rejected based upon the FAA safety and software standards.
2. Readings are Not Averaged Correctly: When the software takes a series of readings, it first averages the first two readings. Then, it averages the third reading with the average just computed. Then the fourth reading is averaged with the new average, and so on. There is no comment or note detailing a reason for this calculation, which weights the readings unequally: each new reading counts as much as all earlier readings combined, so the last reading carries the most weight and the first the least. Nonetheless, the comments say that the values should be averaged, and they are not.
3. Results Limited to Small, Discrete Values: The A/D converters measuring the IR readings and the fuel cell readings can produce values between 0 and 4095. However, the software divides the final average(s) by 256, meaning the final result can only have 16 values to represent the five-volt range (or less), or, represent the range of alcohol readings possible. This is a loss of precision in the data; of a possible twelve bits of information, only four bits are used. Further, because of an attribute in the IR calculations, the result value is further divided in half. This means that only 8 values are possible for the IR detection, and this is compared against the 16 values of the fuel cell.
4. Catastrophic Error Detection Is Disabled: An interrupt that detects that the microprocessor is trying to execute an illegal instruction is disabled, meaning that the Alcotest software could appear to run correctly while executing wild branches or invalid code for a period of time. Other interrupts ignored are the Computer Operating Properly (COP) interrupt (a watchdog timer) and the Software Interrupt.
5. Implemented Design Lacks Positive Feedback: The software controls electrical lines, which switch devices on and off, such as an air pump, infrared source, etc. The design does not provide a monitoring sensory line (loop back) for the software to detect that the device state actually changed. This means that the software assumes the change in state is always correct, but it cannot verify the action.
6. Diagnostics Adjust/Substitute Data Readings: The diagnostic routines for the Analog to Digital (A/D) Converters will substitute arbitrary, favorable readings for the measured device if the measurement is out of range, either too high or too low. The values will be forced to a high or low limit, respectively. This error condition is suppressed unless it occurs frequently enough.
7. Flow Measurements Adjusted/Substituted: The software takes an airflow measurement at power-up, and presumes this value is the "zero line" or baseline measurement for subsequent calculations. No quality check or reasonableness test is done on this measurement. Subsequent calculations are compared against this baseline measurement, and the difference is the change in airflow. If the airflow is slower than the baseline, this would result in a negative flow measurement, so the software simply adjusts the negative reading to a positive value.
If the measurement of a later baseline is taken, and the measurement is declared in error by the software, the software simply uses the last "good" baseline, and continues to read flow values from a declared erroneous measurement device.
8. Range Limits Are Substituted for Incorrect Average Measurements: In a manner similar to the diagnostics, voltage values are read and averaged into a value. If the resulting average is a value out of range, the averaged value is changed to the low or high limit value. If the value is out of range after averaging, this should indicate a serious problem, such as a failed A/D converter.
9. Code Does Not Detect Data Variations
10. Error Detection Logic: The software design detects measurement errors, but ignores these errors unless they occur a consecutive total number of times. For example, in the airflow measuring logic, if a flow measurement is above the prescribed maximum value, it is called an error, but this error must occur 32 consecutive times for the error to be handled and displayed. This means that the error could occur 31 times, then appear within range once, then appear 31 times, etc., and never be reported. The software uses different criteria values (e.g. 10 instead of 32) for the measurements of the various Alcotest components, but the error detection logic is the same as described.
11. Timing Problems: The design of the code is to run in timed units of 8.192 milliseconds, by means of an interrupt signal to a handler, which then signals the main program control that it can continue to the next segment. The interrupt goes off every 8.192 ms, not 8.192 ms from the software's latest request for a time delay. The more often the code calls a single 8.192 ms interrupt, the more inaccurate the software timing can be, because the requests from the mainline software instructions are out of phase with the continuously operating timer interrupt routine.
12. Defects In Three Out Of Five Lines Of Code: A universal tool in the open-source community, called Lint, was used to analyze the source code written in C. This program uncovers a range of problems from minor to serious problems that can halt or cripple the program operation. This Lint program has been used for many years. It uncovered that there are 3 error lines for every 5 lines of source code in C.
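Findings 2 and 3 are both arithmetic claims, so they can be checked with a few lines of code. The following is a minimal sketch in Python (not the firmware's C); the function names and sample readings are made up for illustration, and nothing here is taken from the Alcotest source. It computes the weight each reading ends up with when every new reading is averaged with the previous average, and counts how many distinct results survive an integer divide of a 12-bit A/D value by 256.

```python
from fractions import Fraction


def running_pairwise_average(readings):
    """Average each new reading with the average so far (finding 2's scheme)."""
    avg = Fraction(readings[0])
    for r in readings[1:]:
        avg = (avg + r) / 2
    return avg


def running_pairwise_weights(n):
    """Weight that each of n readings carries under the scheme above."""
    weights = [Fraction(1)]
    for _ in range(n - 1):
        # Every earlier weight is halved; the new reading gets one half.
        weights = [w / 2 for w in weights] + [Fraction(1, 2)]
    return weights


def quantize(adc_value, divisor=256):
    """Integer-divide a 12-bit A/D value (0..4095), as finding 3 describes."""
    return adc_value // divisor


if __name__ == "__main__":
    readings = [100, 104, 108, 112]  # hypothetical sample values
    print("pairwise running average:", float(running_pairwise_average(readings)))
    print("plain arithmetic mean:   ", sum(readings) / len(readings))
    print("weights:", [str(w) for w in running_pairwise_weights(len(readings))])
    print("distinct values after /256:",
          len({quantize(v) for v in range(4096)}))
```

Under this scheme the weights for four readings come out as 1/8, 1/8, 1/4, 1/2, so the most recent reading counts for half of the result. Dividing 0..4095 by 256 leaves 16 distinct values; modeling the extra halving in the IR path as a divisor of 512 (an assumption about how that step is applied) leaves 8.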
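The consecutive-count logic in finding 10 can be sketched as follows. This is an illustrative Python model, not the firmware's code: the class and function names are hypothetical, and the threshold of 32 is the airflow example quoted in that finding.

```python
class ConsecutiveErrorFilter:
    """Report an error only after `threshold` consecutive bad samples."""

    def __init__(self, threshold=32):
        self.threshold = threshold
        self.run_length = 0

    def update(self, out_of_range):
        """Feed one sample; return True if the error would be reported."""
        if out_of_range:
            self.run_length += 1
        else:
            self.run_length = 0  # a single good sample resets the count
        return self.run_length >= self.threshold


def count_reports(samples, threshold=32):
    """Number of samples at which the filter would report an error."""
    error_filter = ConsecutiveErrorFilter(threshold)
    return sum(1 for s in samples if error_filter.update(s))


if __name__ == "__main__":
    # 31 out-of-range samples, then one in range, repeated 100 times.
    pattern = ([True] * 31 + [False]) * 100
    print(sum(pattern), "bad samples out of", len(pattern))
    print("errors reported:", count_reports(pattern))
```

In this pattern about 97% of the samples are out of range, yet the counter never reaches 32, so nothing is reported.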
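Finding 11 describes a wait on a free-running timer. A minimal model of that in Python, assuming (hypothetically) that interrupts fire at exact multiples of 8.192 ms and that a wait for n ticks ends at the n-th interrupt after the request. Times are integer microseconds to avoid rounding, and the function name is made up.

```python
TICK_US = 8192  # timer period in microseconds (8.192 ms)


def actual_delay_us(request_time_us, ticks=1):
    """Elapsed time until the `ticks`-th timer interrupt after the request.

    Interrupts are assumed to fire at 0, TICK_US, 2*TICK_US, ... regardless
    of when the mainline code asks for a delay.
    """
    next_tick_index = request_time_us // TICK_US + 1
    wake_time_us = (next_tick_index + ticks - 1) * TICK_US
    return wake_time_us - request_time_us


if __name__ == "__main__":
    for t in (0, 1000, 4096, 8191):
        print(f"request at {t:5d} us -> waits {actual_delay_us(t):5d} us")
```

A request made right at an interrupt waits a full tick; one made just before the next interrupt waits almost nothing. A nominal "one tick" delay can therefore last anywhere from near 0 up to 8.192 ms depending on phase, and a request for n ticks can fall short by up to one tick.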
While Draeger's counsel claims that "The Alcotest [7110] is the single best microprocessor-driven evidential breath tester on the market", Draeger has already replaced the antiquated 7110 with a newer Windows-based version, the 9510. The computer code in the 7110 is written on an Atari-styled chip, utilizing fifteen- to twenty-year-old technology in 1970s coding style.
There is no doubt that the Supreme Court should declare this machine to be unreliable. If this happens, based on an agreement entered into over 4 years ago between the State and Draeger, the taxpayers of New Jersey can recover the almost $7 million spent on these machines.
The premier DUI criminal defense lawyer returns to court on September 17th to hash this out, unless the Special Master decides the issues without a hearing.
For those California jurisdictions using the Draeger, California DUI defense lawyers prepare.
January 13, 2008, 21:50
ericcc
Recent developments in DUI litigation unexpectedly bleed into the realm of
computer security.
INTRODUCTION
Computer security enthusiasts are naturally interested in software quality.
They know that proper software engineering and development is necessary for
the justified extension of trust to computing and communication systems.
The search for trust appears to have lately received an unexpected ally:
according to a small but growing number of DUI defendants, breath alcohol
testing devices cannot be trusted unless defense experts are permitted to
analyze the source code for the software that controls them.
Is there now an alliance between DUI defendants and computer security
professionals? To the extent that they are both interested in trust of
computing services, the answer is, "yes."
The search for trust is really a search for dependability. Dependability is
an umbrella concept in computer science that includes five core components:
integrity, availability, safety, maintainability and reliability.1 Those
who pursue computer security recognize the first two components as
essential. Those who use evidence that is i) scientific or technical, and
ii) the output of a computer should recognize the last as critical.2 Thus,
DUI defense and computer security are indeed joined by their respective
pursuits of computer dependability and trust.
However, this alliance is certainly not to the exclusion of police, crime
labs, and prosecutors. To the extent evidence is the output of a computer,
such as a breath test device, law enforcement pursues computer dependability
with zeal equal to (probably exceeding) that of the defense.
Law enforcement pursues the reliability of breath test evidence using a
range of elaborate methods. Central to those methods is black box testing.
In this context, black box testing involves the input of certified known
solutions of ethanol into a breath testing instrument. The idea is that, if
the instrument measures the known inputs correctly both before and after the
defendant's tests, then by implication the instrument must be working
properly and accurately at the time of the defendant's tests. At trial,
prosecutors depend, in part, on this "before/after" testing to persuade
judges and juries that evidence from a given breath testing instrument is
reliable and trustworthy.
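The before/after procedure can be sketched as a small Python function. The
instrument is modeled as a plain function from true concentration to reading,
and the names and the 0.005 tolerance are assumptions chosen for illustration,
not values from any State protocol.

```python
def within_tolerance(measured, expected, tolerance):
    """True if a reading is close enough to the known reference value."""
    return abs(measured - expected) <= tolerance


def bracketed_test(instrument, subject_sample, reference, tolerance=0.005):
    """Run a known reference before and after the subject sample.

    Returns the subject reading only if both reference checks pass,
    otherwise None.
    """
    before = instrument(reference)
    result = instrument(subject_sample)
    after = instrument(reference)
    if not (within_tolerance(before, reference, tolerance)
            and within_tolerance(after, reference, tolerance)):
        return None
    return result


if __name__ == "__main__":
    ideal = lambda x: x            # hypothetical perfect instrument
    biased = lambda x: x + 0.02    # hypothetical constant offset
    print("ideal: ", bracketed_test(ideal, 0.08, 0.10))
    print("biased:", bracketed_test(biased, 0.08, 0.10))
```

The instrument with a constant offset fails the reference checks, so no
result is returned. The check exercises the instrument only at the reference
concentration.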
Some DUI defendants have recently claimed that this black box testing is
insufficient to establish the reliability of breath test evidence. One
notable example is the case of State v. Chun, a consolidated case involving
20 defendants who collectively demanded that the State of New Jersey
(hereinafter, "State") disclose the source code for its breath testing
instrument, the Draeger brand Alcotest 7110 MKIII-C.3 The Chun defendants
alleged that the reliability of the State's breath test evidence could only
be established by a post-hoc source code review or audit. In particular,
they claimed that "an actual source code review is necessary as there could
be hidden techniques [in the software] that would allow for altering data
and/or blatant coding errors that skew the accuracy of the instrument's
results."4 If permitted, a post-hoc source code review would be quite a
commitment, since the firmware for the Alcotest breath tester contained more
than 45,000 lines of C/C++ code.
After protracted litigation, the Chun defendants convinced a court to grant
review of the Draeger Alcotest source code firmware, version NJ3.11 (the
actual version at issue in New Jersey). So that the defense was not left
with the first, last, and only word on the "quality" of the NJ3.11 firmware,
Draeger also contracted an expert to conduct a source code review. Finally,
to resolve anticipated differences and to facilitate understanding, the
court appointed its own expert to report on the work of the parties'
experts.
THE CHUN SOURCE CODE REVIEWS
The defense hired Base One Technologies to conduct a static source code
review. Base One used the following tools to conduct its review: Lint, MS
Visual C++ Development Environment and Compiler, Borland C++, IAR Embedded C
Compiler, Understand C code analyzer, Source Format X, Beyond Compare, and
others. Since at least some of the comments for the NJ3.11 source code were
in German, Base One used AltaVista Babelfish web translation service to
translate the comments into English.5
In its final report, Base One made a number of criticisms of the NJ3.11
firmware.6 Perhaps the most incendiary charge, and the one most quoted on
DUI defense attorney blogs, was that, in some cases, if a diagnostic routine
fails, then the Alcotest "will substitute arbitrary canned data values"
thereby affecting the breath measurements. The apparent implication of this
allegation is that the Alcotest (at least for version NJ3.11) fabricates
breath test evidence.
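What substituting a value would look like in code can be sketched in a few
lines of Python. This illustrates the behavior as alleged, following the
findings list in the first post (an out-of-range reading forced to the high or
low limit). The 12-bit range of 0 to 4095 used in the example and the function
names are hypothetical. It is not the NJ3.11 code, and whether the firmware
actually behaves this way is what the experts disputed. The second function
shows the alternative of stopping with an error instead of continuing with a
substituted value.

```python
def clamp_reading(value, low, high):
    """Silently force an out-of-range reading to the nearest limit."""
    return max(low, min(high, value))


class OutOfRangeError(ValueError):
    """Raised when a reading falls outside the permitted range."""


def checked_reading(value, low, high):
    """Alternative: refuse to continue with an out-of-range reading."""
    if not low <= value <= high:
        raise OutOfRangeError(f"reading {value} outside [{low}, {high}]")
    return value


if __name__ == "__main__":
    print(clamp_reading(5000, 0, 4095))   # the caller cannot tell this was out of range
    try:
        checked_reading(5000, 0, 4095)
    except OutOfRangeError as err:
        print("rejected:", err)
```

With the clamping version, downstream code sees an ordinary in-range number;
with the checked version, the fault is visible at the point where it occurs.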
Base One made other notable findings. It said there was "proof of
incomplete testing" of the code. This is an odd observation to make since
it is well established that complete testing of non-trivial software is
"impossible."7 Base One also wrote that "catastrophic error detection" was
improperly disabled; that the firmware would not pass "U.S. industry
standards" for software and testing; that the programming "does not
insulate/protect modules or data"; and that "incorrectly coded or modified
functions can inadvertently modify a data value not part of that routine's
sphere of influence."
Prior to submission to the Chun court, Base One's report was assessed by the
court's source code expert, the CMX Group.8 CMX was mostly critical of Base
One's report. In particular, CMX wrote that more than a few of Base One's
claims were "unsupported," or contained "misleading observations," or were
"pure speculation," or had no supporting evidence, or were flatly
contradictory. CMX also impugned Base One's knowledge of software standards
as being "inaccurate." Further, CMX said that Base One used inappropriate
"innuendo" as well as unsubstantiated phrases such as "clearly" and "ample
evidence," and also used non-specific phrases such as "industry standards"
without sufficient elaboration. Finally, CMX found as empirically
unsupported Base One's claim that the NJ3.11 firmware substitutes arbitrary
data values for authentic ones.
CMX also wrote that the Base One reviewer may be "unaware" of some system
testing tools necessary to perform an adequate review, or may not have had
much experience in the relevant technologies. CMX noted that Base One's
unspecific, misdirected, or false statements demonstrated "why companies do
not want to expose their internal code.[since] [i]t looks as if they are
covering up error while, in reality, this is the way that all code has to be
written for controlling and coordinating hardware." In sum, CMX concluded
Base One "[did] not succeed" in dislodging the presumption of reliability of
the Alcotest 7110 MKIII-C breath testing device, firmware version NJ3.11.9
For its part, Alcotest manufacturer Draeger hired SysTest Labs, a nationally
known software testing company, to review the NJ3.11 firmware. SysTest
conducted a line-by-line, static code review, but did not stop there: it
also performed code tracing, reverse engineering, code navigation and code
metrics. SysTest used Understand C, Fortify SCA, and in-house software
assessment tools. Instead of using Babelfish, SysTest employed a
professional, human translation service to interpret the German source code
comments. SysTest documented 602 hours of labor on its source code review.
SysTest also found problems with the NJ3.11 code. It noted that critical
test data was stored in global variables, a practice that is undesirable
"because any function in the application can [theoretically] change the
data." SysTest noted at least 56 uncalled functions, at least as many
documented uncalled objects, one documented unused type, numerous functions
with higher than recommended "cyclomatic complexity,"10 non-descriptive
variable names such as "dummy" and "temp," and a buffer overflow. However,
in spite of the problems found, SysTest concluded that none affected the
reliability of the NJ3.11 firmware breath tests.
In contrast to its assessment of Base One, the Chun court's expert (CMX) wrote
favorably of SysTest's review. CMX found almost all of SysTest's claims
were "substantiated," and that its analysis was "impressive" in that SysTest was
not only able to run "code stylistic" tests, through the use of
automation tools, (as Base One did) but also a series of logical tests of
the application by submitting combinations and permutations of data that
would expose the potential buffer overflow condition. CMX also noted that,
"[i]n contrast to the Base One Technologies review, the SysTest Labs report
is replete with empirical listings and line counts of examples of the
conditions, and criticisms they found."
CONCLUSION
The facts in Chun presented an enormous opportunity to advance the cause of
dependable computing. Were the defense able to raise legitimate reliability
issues regarding the NJ3.11 firmware, it is likely that the issue of
dependable computing would have received increased attention, understanding
and respect from the public at large.
Unfortunately, however, the defense flubbed this important opportunity.
Interested readers who take the time to read the Chun litigation material
will likely conclude that the defense accomplished very little with its
source code review. Base One's review was contradictory, undocumented,
non-empirical, misleading, and speculative. And although the SysTest report
was mostly supportive, some will undoubtedly question whether 602 hours of
post-hoc analysis, by a manufacturer-contracted expert, is sufficient to
guarantee the reliability of NJ3.11 code. Consequently, computer security
enthusiasts and genuine dependable computing advocates shall continue to
wait for the untutored establishment to understand and to appreciate the
importance of proper software quality assurance.
1 Avizienis, et al. "Basic Concepts and Taxonomy of Dependable and Secure
Computing," IEEE Transactions on Dependable and Secure Computing, Vol. 1,
No. 1, at 13, January-March 2004.
2 Kumho Tire Co. v. Carmichael, 526 U.S. 137 (1999).
3 Supreme Court of New Jersey, Docket No. 58-879, available at
http://www.risk-averse.com/index_files/chun.pdf.
4 Norman Dee, CMX Group "Comments on the Source Code Reviews," available at
http://www.risk-averse.com/index_files/sm.pdf.
5 John J. Wisniewski, Base One Technologies, "Report on Behalf of the
Defendants," available at
http://www.risk-averse.com/index_files/bo.pdf.
6 Id.
7 Cem Kaner, "The Impossibility of Complete Testing," SOFTWARE QA, v.4, #4,
p. 28 (1997), available at
http://www.kaner.com/pdfs/imposs.pdf.
8 Supra note 3.
9 Supra note 3.
10 In its report, SysTest defined "cyclomatic complexity" as a "standard
measure of source code complexity indicative of both understandability and
maintainability." See SysTest, "Assessment Report for Draeger Safety
Diagnostics, Inc.," available at
http://www.risk-averse.com/index_files/st.pdf.
Eric Van Buskirk, JD, MA, CISSP