
Overview of Year 2000 Analysis

The Year 2000 Analysis Tool operates by making inferences about possible year-related impacts in each byte of each piece of data manipulated by a program. The analysis is based on seed information specified by the user, as well as the structure of the program itself.

Components

The Year 2000 Analysis Tool has two main components:
  • A front end component that takes a program's SYSADATA representation, and turns that representation into a set of simpler statements and expressions.
  • An inference engine that performs date inference on the collection of simpler statements and expressions.
This discussion will focus on the inference engine and its results.

Conservativity of Analysis: False Negatives and False Positives

The analysis engine must be conservative in making its inferences: it makes worst-case assumptions about possible inputs, on the principle that a false positive (incorrectly reporting year impacts in non-year-related data) is preferable to a false negative (missing year impacts in year-affected data).

Flow-Insensitivity

The Year 2000 analysis is conservative in an additional important way: it is oblivious to a program's control flow, and views a program as a collection of statements that could be executed in any order. This assumption sometimes results in false positives, but it also allows the inference algorithm to be reasonably fast and space-efficient without sacrificing accuracy in the most important cases.
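
The effect of ignoring control flow can be illustrated with a small sketch. This is not the tool's implementation: the function name `infer_year_vars`, the `(target, source)` statement form, and the variable names are assumptions made for the example, and the sketch tracks a single "year-related" fact per variable rather than per-byte kinds.

```python
def infer_year_vars(assignments, seeds):
    """Flow-insensitive sketch: treat assignments as an unordered collection.

    assignments: iterable of (target, source) name pairs (hypothetical form).
    seeds: names the user asserted to be year-related.
    Returns the set of names inferred to be year-related.
    """
    year = set(seeds)
    changed = True
    while changed:  # repeat until no new facts appear (a fixed point)
        changed = False
        for target, source in assignments:
            # Kinds flow in both directions across an assignment.
            if (target in year) != (source in year):
                year.update((target, source))
                changed = True
    return year


# TMP is reused: first to hold a year, later to hold a counter.
program = [
    ("TMP", "BIRTH_YR"),
    ("OUT_YR", "TMP"),
    ("TMP", "COUNT"),
    ("TOTAL", "COUNT"),
]
result = infer_year_vars(program, seeds={"BIRTH_YR"})
```

Because `TMP` is reused for a counter, `COUNT` and `TOTAL` end up reported as year-related, a false positive of the kind described above. The result does not depend on the order in which the statements are listed.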

Analysis of Program Statements

The Year 2000 Analysis Tool makes inferences based on the data flow of individual statements in the user program. Inferences are made in two varieties of statement: assignments and expressions. Two varieties of inference can be made: kind assertion and kind propagation.
  • In an assignment, the basic inference is that the source kind and the target kind are the same. We say that the kind of the source is propagated to the target, and vice-versa.
  • In an expression, the inference rules are more complex. In a conditional expression such as IF A=B ..., the kinds of A and B are inferred to be the same via propagation, as in an assignment. For some operators and functions (for example, A+B or MAX(A,B)), the kinds of the operands are likewise inferred to be the same via propagation. However, there are also many expressions, such as A**B or SUBSTR(A,B), where the kinds of A and B are not inferred to be the same. Independently of propagation, there may be a kind assertion about one of the arguments to a function or operator. For example, in the case of SUBSTR(A,B) the inference engine can assert that B is, at least once in the program, <USED-AS-NON-YEAR>, since it is being used as a character position in the string A.
As an added complication, "sameness" of kinds may refer to byte-by-byte sameness, or the kinds of all bytes may be "scrambled", depending on the operators and type conversions in the expression or assignment.
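
The difference between kind propagation and kind assertion can be sketched as follows. This is an illustration under stated assumptions, not the tool's algorithm: the `KindClasses` class, the `apply_statement` function, the operator names in `PROPAGATING`, and the use of a union-find structure are all hypothetical, and the sketch works on whole variables rather than on individual bytes.

```python
class KindClasses:
    """Union-find over variable names; members of one class share their kinds."""

    def __init__(self):
        self.parent = {}
        self.asserted = {}  # class representative -> set of asserted kinds

    def find(self, name):
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]  # path halving
            name = self.parent[name]
        return name

    def propagate(self, a, b):
        """Kind propagation: a and b are inferred to have the same kind."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
            merged = self.asserted.pop(rb, set())
            self.asserted.setdefault(ra, set()).update(merged)

    def assert_kind(self, name, kind):
        """Kind assertion: record direct evidence about one operand."""
        self.asserted.setdefault(self.find(name), set()).add(kind)

    def kinds(self, name):
        return set(self.asserted.get(self.find(name), set()))


# Hypothetical operator table: two-operand forms that propagate kinds.
PROPAGATING = {"assign", "compare", "add", "max"}


def apply_statement(classes, op, a, b):
    if op in PROPAGATING:
        classes.propagate(a, b)
    elif op == "substr":
        # SUBSTR(A,B): no propagation, but B is used as a character position.
        classes.assert_kind(b, "USED-AS-NON-YEAR")
    # Other operators, such as "power" (A**B), neither propagate nor assert.


classes = KindClasses()
classes.assert_kind("YR", "USED-AS-YEAR")      # seed assertion
apply_statement(classes, "assign", "X", "YR")  # X = YR
apply_statement(classes, "substr", "S", "X")   # SUBSTR(S, X)
apply_statement(classes, "power", "P", "Q")    # P ** Q
```

In this sketch `X` and `YR` end up with both year and non-year evidence, because the assignment ties them together and `SUBSTR` asserts a non-year use of `X`; `S`, `P`, and `Q` receive no evidence at all.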

"Year" vs. "Year-Related"

The primary goal of the inference engine is to discover uses of year and year-duration digits. (A year-duration describes the difference between dates when years are relevant. For example, we say that the difference between 1996 and 2002 is 6, and describe the 6 as a year-duration.) The Year 2000 Analysis inference engine does not attempt to distinguish between years, year-durations, and values that are "less year-related" than year-durations. Examples of such "less year-related" data include month- and day-durations, some months and days resulting from year calculations, and some financial values resulting from computations on year-related durations. The lack of a distinction between "year" and "year-related" is often useful. For instance, if a variable containing a year is expanded from two to four digits, other "less year-related" values will have to be expanded as well. On the other hand, some data that are only "distantly" year-related (and which do not require alteration) will also be inferred to be year-related.
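
The year-duration example above can be worked through in a few lines. The function name `year_duration` is hypothetical, and the second call assumes that a two-digit year is stored simply as the last two digits of the four-digit year.

```python
def year_duration(start_year, end_year):
    """Difference between two years; the result is a year-duration, not a year."""
    return end_year - start_year


four_digit = year_duration(1996, 2002)  # years stored with four digits
two_digit = year_duration(96, 2)        # the same years stored with two digits
```

With four-digit years the duration is 6; with two-digit years the same subtraction gives -94. This suggests why a value computed from years is worth tracking together with the years themselves.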

Garbage In/Garbage Out

The accuracy of the analysis is affected to a great degree by the accuracy with which year impacts in seed assertions have been specified. If a piece of data has been inaccurately identified in the seed file as having a year impact, the results of the inference will be inaccurate as a consequence.

Taking out the Garbage: Finding and Fixing Incorrect Inferences

The <USED-AS-YEAR-AND-NON-YEAR> result indicates an unusual situation that the user should understand. There are many reasons why a variable may be marked <USED-AS-YEAR-AND-NON-YEAR>. For example, the variable may truly have been used as both a year and a non-year (e.g. I=1 and I=1995) at different points in the program. Or an incorrect seed match or inference may have been made (e.g. a variable named "MyEar" matched the pattern *YEAR*). As these examples show, each <USED-AS-YEAR-AND-NON-YEAR> is likely to be some unusual or special case.
  • It is strongly recommended that the user review all cases where a variable is marked <USED-AS-YEAR-AND-NON-YEAR>.
Modifying the input seed file and re-running the Year 2000 Analysis can then correct the reporting problem. The user may use the <ALWAYS-YEAR> or <ALWAYS-NON-YEAR> controls, modify the <PATTERN> match, or exclude certain matches with the <ATTRIBUTE> or <EXCLUDE> tags. False positives may also result from "over-propagation" of year-relatedness. A primary culprit in this problem is the use of "multi-kind" or "multi-typed" variables. For example, some programs may contain serviceability-oriented code that, when an error occurs, dumps all important variables one by one through an output buffer (say an 80 character line buffer). The Year 2000 Analysis Tool inference engine will infer that all variables which "go through" the buffer have the same kind. In these cases, the simplest and "most correct" solution is to modify the input seed file, marking the buffer variable <PROPAGATE>NO.
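
The "MyEar" mismatch mentioned above is easy to reproduce. The sketch below assumes that seed patterns behave like case-insensitive shell-style wildcards, which this page does not state beyond the *YEAR* example; the function `seed_matches` and its `exclude` parameter are hypothetical stand-ins for the seed file's matching and exclusion controls, not its actual syntax.

```python
from fnmatch import fnmatchcase


def seed_matches(names, pattern, exclude=()):
    """Return the names matching pattern (case-insensitively), minus exclusions."""
    excluded = {name.upper() for name in exclude}
    return [
        name
        for name in names
        if fnmatchcase(name.upper(), pattern.upper())
        and name.upper() not in excluded
    ]


names = ["BirthYear", "MyEar", "Total"]
loose = seed_matches(names, "*YEAR*")                     # also catches "MyEar"
tight = seed_matches(names, "*YEAR*", exclude=["MyEar"])  # mismatch excluded
```

Here `loose` contains both `BirthYear` and `MyEar`, while `tight` contains only `BirthYear`.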

Storage-Based Inference

Inference information is conceptually associated with each byte of memory to which the user can refer in the program. No distinction is made when the same storage is referred to by different names (user source symbols), provided the names are of the same data type. This can happen in PL/I, for instance, when one variable is DEFINED on top of another, or in COBOL when a variable REDEFINES another. The inference results reported for bytes of data common to both variables will be the same. Note, however, that data type information can influence the Year 2000 Analysis inference engine, in the same way that referring to the same storage by different names with different data types can affect the code generated by a compiler. The smallest unit of memory about which inference is performed is the byte; information about PL/I bit strings of length less than one byte is combined and associated with the nearest containing byte.
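
A minimal sketch of storage-based bookkeeping, assuming a single block of memory and names mapped onto it by offset and length. The `Storage` class and the field names are hypothetical, and `mark` simply overwrites the kind of each byte instead of combining evidence as the real engine would.

```python
class Storage:
    """Per-byte kinds for one block of memory, shared by all names mapped onto it."""

    def __init__(self, size):
        self.kinds = ["UNKNOWN"] * size
        self.symbols = {}  # name -> (offset, length) within the block

    def declare(self, name, offset, length):
        self.symbols[name] = (offset, length)

    def mark(self, name, kind):
        """Record a kind for every byte that name covers (simplified: overwrite)."""
        offset, length = self.symbols[name]
        for i in range(offset, offset + length):
            self.kinds[i] = kind

    def report(self, name):
        offset, length = self.symbols[name]
        return self.kinds[offset:offset + length]


record = Storage(8)
record.declare("DATE_REC", 0, 8)   # e.g. an 8-byte date field
record.declare("DATE_YYYY", 0, 4)  # a second name laid over the first 4 bytes
record.mark("DATE_YYYY", "USED-AS-YEAR")
```

Because the kinds are stored per byte and not per name, `record.report("DATE_REC")` shows the first four bytes as `USED-AS-YEAR` even though only `DATE_YYYY` was marked.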

Basic Kinds

Date inference works by propagating and manipulating four simple values: UNKNOWN, <USED-AS-YEAR>, <USED-AS-NON-YEAR>, and <USED-AS-YEAR-AND-NON-YEAR>. These basic kind values describe possible year-related impacts in every byte of memory that the program manipulates. For any given byte of memory, the basic kinds can be interpreted as follows:
  • UNKNOWN means that the byte is not known to contain a year-related value.
  • <USED-AS-YEAR> means that the byte has been inferred to contain a year-related value at least one time during the program's execution.
  • <USED-AS-NON-YEAR> means that the byte has been inferred to contain a non-year-related value at least one time during the program's execution.
  • <USED-AS-YEAR-AND-NON-YEAR> means that the byte has been inferred to contain both year and non-year related values during the program's execution.
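
One way to model the four basic kinds is as two independent bits, "seen as year" and "seen as non-year", combined with a bitwise OR. This encoding is an assumption made for illustration; the page does not say how the tool represents kinds internally, and the names `Kind` and `combine` are hypothetical.

```python
from enum import Flag


class Kind(Flag):
    UNKNOWN = 0
    USED_AS_YEAR = 1
    USED_AS_NON_YEAR = 2
    USED_AS_YEAR_AND_NON_YEAR = 3  # both bits set


def combine(current, evidence):
    """Accumulate evidence for one byte; information is only ever added."""
    return current | evidence


byte_kind = Kind.UNKNOWN
byte_kind = combine(byte_kind, Kind.USED_AS_YEAR)      # e.g. I = 1995
after_year = byte_kind
byte_kind = combine(byte_kind, Kind.USED_AS_NON_YEAR)  # e.g. I = 1
```

After the first step the byte is `USED_AS_YEAR`; after the second it is `USED_AS_YEAR_AND_NON_YEAR`, and no further evidence can move it back.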

Constant Kinds

When specifying seed information, the user can assert that a piece of data has a constant kind. If the user makes such an assertion, it is a promise to the inference engine that a piece of memory has exactly the kind specified, and that the inference process should not attempt to change the kind based on information inferred (conservatively) from other parts of the program. Use of (correct!) constant kind seed assertions can help the inference engine produce more accurate results. There are two constant kinds that the user can specify: <ALWAYS-YEAR> and <ALWAYS-NON-YEAR>.
  • <ALWAYS-YEAR> means that the user has asserted in the seed file that the byte always contains a year-related value.
  • <ALWAYS-NON-YEAR> means that the byte has been asserted never to contain a year-related value.
The inference engine ignores any evidence that contradicts a constant kind. Only the user can cause a constant kind to be associated with some piece of memory; the inference engine never infers a constant kind for a piece of memory which the user has not asserted to have a constant kind.
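
The rules for constant kinds can be summarized in a small update function. This is a sketch under assumptions: the function name `update_kind` and the string encoding of kinds are hypothetical, and the real engine works per byte with rules this page does not fully specify.

```python
BASIC = {
    "UNKNOWN",
    "USED-AS-YEAR",
    "USED-AS-NON-YEAR",
    "USED-AS-YEAR-AND-NON-YEAR",
}
CONSTANT = {"ALWAYS-YEAR", "ALWAYS-NON-YEAR"}


def update_kind(current, evidence):
    """Return the kind of a byte after the engine sees one piece of evidence."""
    if evidence not in BASIC:
        # Only the user can assert a constant kind; the engine never infers one.
        raise ValueError("the engine only infers basic kinds")
    if current in CONSTANT:
        return current  # contrary evidence is ignored
    if current == "UNKNOWN":
        return evidence
    if evidence in ("UNKNOWN", current):
        return current
    return "USED-AS-YEAR-AND-NON-YEAR"


pinned = update_kind("ALWAYS-YEAR", "USED-AS-NON-YEAR")
mixed = update_kind("USED-AS-YEAR", "USED-AS-NON-YEAR")
```

A byte seeded as `ALWAYS-YEAR` stays `ALWAYS-YEAR` whatever the engine infers elsewhere, whereas the same evidence turns an ordinary `USED-AS-YEAR` byte into `USED-AS-YEAR-AND-NON-YEAR`.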

Special Constant Kind: <PROPAGATE>NO

In addition to the user-specified constant kinds above, there is a special non-propagating constant kind, <PROPAGATE>NO, associated with memory locations specified that way in the seed file. As with the "<ALWAYS...>" constant kinds, the inference engine never changes the kind associated with bytes possessing the <PROPAGATE>NO kind. In addition, only the user can assert a <PROPAGATE>NO kind.
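
The effect of a non-propagating variable on a shared output buffer can be sketched as follows. The function `propagate_year`, its `no_propagate` parameter, and the variable names are hypothetical; the sketch assumes that no kind flows into or out of a variable marked this way, and it tracks only a single "year-related" fact per variable.

```python
def propagate_year(assignments, seeds, no_propagate=()):
    """Spread year-relatedness across (target, source) assignments.

    Assignments touching a name in no_propagate are skipped entirely.
    """
    blocked = set(no_propagate)
    year = set(seeds)
    changed = True
    while changed:
        changed = False
        for target, source in assignments:
            if target in blocked or source in blocked:
                continue  # nothing flows through a non-propagating variable
            if (target in year) != (source in year):
                year.update((target, source))
                changed = True
    return year


# Error-handling code that dumps each variable through one line buffer.
dump = [
    ("LINE_BUF", "HIRE_YEAR"),
    ("LINE_BUF", "EMP_NAME"),
    ("LINE_BUF", "SALARY"),
]
over_propagated = propagate_year(dump, seeds={"HIRE_YEAR"})
contained = propagate_year(dump, seeds={"HIRE_YEAR"}, no_propagate={"LINE_BUF"})
```

Without the marking, every variable that passes through `LINE_BUF` is reported as year-related; with it, only the seeded `HIRE_YEAR` is.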

Direct <USED-AS-YEAR> and <USED-AS-NON-YEAR> Inferences

There are some program constructs for which year or non-year inferences can be made directly, without other seed information. For instance, the PL/I "date()" built-in function yields some year-related and some non-year-related (month and day) results, and the "sin()" function has both a non-year-related argument and result. The user can enable or disable such "direct" inferences through the use of <seed><builtin-year> in the seed file. The default is not to make these direct inferences.

Direct <USED-AS-YEAR> and <USED-AS-NON-YEAR> Inferences: PL/I literals

Another form of direct inference is done with PL/I literals. For example, if there is a statement str="Hello world." then the Year 2000 Analysis Tool can infer that bytes 1 through 12 of the string str are, at least once, used as non-year. This inference is currently done only for PL/I, not for COBOL. As with the other direct inferences, <seed><builtin-year> must be specified in the seed file.
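
The literal example can be sketched per byte. The function `infer_from_literal` is hypothetical, positions are 1-based to match the wording above, and the sketch assumes the literal is a character string with no year content.

```python
def infer_from_literal(byte_kinds, literal):
    """Return a copy of byte_kinds (1-based position -> kind) after target = literal."""
    updated = dict(byte_kinds)
    for position in range(1, len(literal) + 1):
        previous = updated.get(position, "UNKNOWN")
        if previous in ("UNKNOWN", "USED-AS-NON-YEAR"):
            updated[position] = "USED-AS-NON-YEAR"
        else:
            # The byte already had year evidence, so it now has both.
            updated[position] = "USED-AS-YEAR-AND-NON-YEAR"
    return updated


str_kinds = infer_from_literal({}, "Hello world.")
```

The literal is 12 characters long, so positions 1 through 12 are marked `USED-AS-NON-YEAR`; a byte that already carried year evidence would become `USED-AS-YEAR-AND-NON-YEAR` instead.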



Comments to: shieldhe@us.ibm.com

