Table of Contents
Verification Testing of Rule Bases
Local Verification Criteria
One major problem in building and maintaining rule-based systems is verification, which evaluates the rule base for criteria such as missing rules. It can be thought of as "Are we building the system right?" Suitable verification techniques are difficult to design and implement because current systems heuristically analyze the complete rule base. The complexity of any interesting rule base makes exhaustive testing appear impossible.
However, if we carefully examine established verification criteria, they fall into one of two tiers. The first tier of verification criteria is focused on the global relationships between the rule clusters (sets of conditions which satisfy common conclusions) such as reachability and dead-end IFs. The second tier of verification criteria is concerned with the local properties of a specific rule cluster such as missing rules and inconsistent rules. The significance of this approach is that global criteria are not computationally expensive to test and we can perform exhaustive testing for local criteria on each rule cluster because of the smaller size. If properly verified, the rule base represents a closed world: local verification criteria are met for the rule base as a whole if they are met for each rule cluster individually. This approach allows exhaustive verification testing for all but the most exotic domains.
Keywords: Expert system, rule base, verification, normalization, simplification, decomposition, anomaly.
Some authors use the terms verification and validation as synonyms. We will use the term verification to refer to the objective properties that are desirable in a system and the term validation to refer to the desirable subjective properties. Examples of verification criteria are missing, conflicting, or redundant rules.
A recent editorial in AI Expert hints at the problems in verification today. In an informal survey, expert system builders stated that the systems they have delivered cover from 60% to 95% of the search space. In other words, up to 40% of the conclusions reached by some systems are unreliable because the system does not have the necessary knowledge
to reach a valid conclusion. The users of these systems (not surprisingly) were dissatisfied with the accuracy of the systems . For another example, XCON has an accuracy rating of 95% in a domain that is 100% known . Most experts consider exhaustive verification testing impractical , but the heuristic approaches being practiced do not appear to be sufficiently rigorous.
Verification is one of the driving forces behind TRIMM, The Relational Intelligence Management Methodology . TRIMM is a full life-cycle development methodology for rule-based systems that is focused on verifiability, low maintenance demands, long-term accuracy, ease of use, and efficiency. It takes the position that verification should be addressed by design, not correction. TRIMM constructs systems that meet most verification criteria transparently and has been implemented in the expert system CASE tool EZ-Xpert.
TRIMM transparently eliminates many verification problems during construction and refinement. It divides verification criteria into local tests, such as missing rules, and global tests, such as reachability. It uses an Attribute Dictionary and decision tables to transparently meet most local verification criteria; the Rules Dictionary and Attribute Dictionary to ensure global criteria; and a three-stage refinement technique (the R3 Algorithms) to minimize opportunities for maintenance anomalies and eliminate irrelevant conditions. This paper discusses the derivation of local and global verification criteria. Readers desiring more detail on maintenance anomalies and the R3 Algorithms are directed to , and those desiring more detail on the TRIMM Methodology are directed to . This paper will first present an overview of verification approaches. Next, we will derive a taxonomy of global and local verification criteria. Finally, we conclude with ideas for further research in this domain.
Verification Testing of Rule Bases
The first verification program was TEIRESIAS . It was designed to work in conjunction with a completed MYCIN rule base, and would generate a rule model showing which conditions were used to reach which conclusions. When new rules were added to MYCIN, TEIRESIAS would compare the new rule against the rule model and suggest the addition of any missingconditions.
ONCOCIN  attempted to perform verification as knowledge was input to the system by partitioning the rule base into what this paper will call rule clusters. Each rule cluster is an entire set of rules that use the same conditions to reach the same conclusions. The rule base is the sum of all of the rule clusters. ONCOCIN creates a table for each rule cluster and tests the rule cluster for conflicts, redundancy, subsumption, and missing rules. The users of ONCOCIN found the rule checker to be very useful in debugging the rule base as it evolved. The only significant problem with this approach was (and is) combinatorial explosion, in which an impossibly large number of combinations exist.
Another pioneering effort came from TIMM , an induction-based shell, which tests for inconsistency, both in the normal sense and TIMM's implementation-specific sense; a rule with two or more conclusions is considered inconsistent by TIMM. Another shell with limited verification testing is KES. KES includes a tool named INSPECTOR, which tests for recursive and unattached attributes . Both of these systems were used on the complete rule base.
In more recent times, two major efforts have come from Nyugen's CHECK system  and Stachowitz's EVA system . Both systems heuristically test the complete rule base for a wide variety of criteria, and present the most ambitious sets of verification criteria.
Stachowitz proposes the following verification criteria.
1. Consistency, both direct (a) and indirect (b).
2. Numeric Completeness.
3. Inconsistency under generalization.
4. Inconsistency under incompatibility.
5. Inconsistency under synonymy.
6. Conflict under generalization.
7. Conflict under incompatibility.
8. Conflict under synonymy.
16. Indirect Cycle.
17. Legal range.
18. Legal values.
19. Data types.
20. Legal values for individual arguments.
21. Minimum and Maximum occurrences of rules.
22. Illegal values (incompatible).
23. Legal value combinations.
24. Subrelation argument and data type consistency.
25. Omission testing.
26. Rule proposer.
27. Behavior Verifier.
28. Control Checker.
Nyugen verifies the following criteria with CHECK:
A. Subsumed rules.
C. Conflicting Rules.
D. Unnecessary IF statements.
E. Circular rules.
G. Unreferenced attribute values.
H. Illegal attribute values.
I. Unreachable conclusions.
J. Dead-end IFs and goals.
Both EVA and CHECK approach the rule base as a whole, making exhaustive testing impossible. They both take a heuristic approach to verification. A heuristic verification of the rule base results in the statement "I believe that the rule base is correct," where exhaustive verification results in the statement "I know that the rule base is correct." By partitioning the established verification criteria into local and global tests, we may achieve exhaustive verification testing and more accurate expert systems.
Local Verification Criteria
Any rule base may be organized into a set of rule clusters. A rule cluster is a set of rules using the same conditions to reach the same conclusion(s). Local verification criteria are evaluated for each rule cluster. If every rule cluster meets the criteria and the Closed World Assumption (CWA) is valid, then the entire rule base meets the criteria. As the rule cluster is far simpler than the rule base, we can exhaustively verify all but the most exotic domains. There are four types of local verification criteria: completeness, consistency, domain constraints, and conciseness.
1. Completeness(Nyugen F, Stachowitz 25, 26) is a fundamental verification criteria and an essential element of the CWA. Missing rules cause the expert system to be unreliable in consultations and difficult to maintain. To be complete, some rule in the cluster must account for every combination of condition values. At some point, the rule cluster will have to be compared to a Cartesian product of the condition values to evaluate completeness. This may be computationally intractable in the most exotic domains due to combinatorial explosion; however, a 648-rule cluster is computationally tractable in a PC environment.
2. Consistency.Each rule cluster must be internally consistent (Nyugen C, Stachowitz 1a, 2, 3, 4, 5, 6, 7, 8). No rules may reach different conclusions when given the same input. Special considerations include generalization, incompatibility, and synonomy.
3. Domain Constraints. All values must be consistent with the domain (Nyugen H, Stachowitz 17, 18, 19, 20, 21, 22, and 23). These values must be consistent both locally and globally with respect to legal values, range, and data types. In addition to these general verification criteria, the domain must also meet implementation-specific criteria such as host language reserved words and syntax constraints.
4. Conciseness. The knowledge contained in the expert system should be as concise as possible. Subsumed rules (Nyugen A, Stachowitz 11), redundancy (Nyugen B, Stachowitz 10), and structural flaws are all special cases of unnecessary IF statements (Nyugen D). Each unnecessary IF statement represents an opportunity for an explicit maintenance anomaly. Structural flaws in the rule cluster treatable by normalization or decomposition also create opportunities for explicit anomalies during maintenance . Local cycles (Stachowitz 12) are another type of structural flaw. Knowledge refinement techniques that derive a more concise rule base include normalization, decomposition, text simplification, and range simplification.
To meet local verification criteria, TRIMM constructs each rule cluster in a Decision Table Relation (DTR). A DTR is a two-dimensional table that is populated with a Cartesian product of the domain of the conditions and a mapping of conclusions to each set of conditions. This results in a complete, consistent rule cluster that satisfies domain constraints, but it also contains rules that are too specific. The most concise representation of the knowledge in the rule cluster may be derived algorithmically by ID3  or the R3 Algorithms implemented in EZ-Xpert.
Many of the tests for local verification criteria are computationally intractable for the entire rule base, but practical for an individual rule cluster. Testing for consistency, for example, involves exhaustively comparing each rule in the rule cluster to every other rule in the rule cluster. In a DOS environment, EZ-Xpert's logical simplification algorithm is suitable for domains where each conclusion can be satisfied with 512 or fewer rules; rule base size is limited by disk space. In practice, this includes virtually all rule clusters. In a 1989 survey of Fortune 500 companies, less than half of the companies reported having systems larger than 60 rules and the largest rule base reported was 4000 rules ; the domains in which combinatorial explosion makes decision tables unsuitable are not pathological, but they are uncommon in business expert systems. Other algorithms such as ID3 are more suitable for exotic domains.
For domains (or portions of domains) that are computationally tractable, DTRs offer many benefits. Properly executed, they eliminate most local verification problems. As two-dimensional arrays, they are well-suited to algorithmic manipulation and refinement. They may be stored in a relational database. Finally, they can be syntactically transformed into ready-to-run expert system code for a variety of platforms.
The side effects of refinement are also very positive. Computational efficiency, storage requirements, maintenance requirements, and ease of understanding are all enhanced. Two standard tests for evaluating induction techniques are the Lens and Chess cases. The Lens rule cluster initially contains 24 rules with 4 conditions in each rule for a total of 96 clauses. ID3 refines this cluster to 9 rules with 31 clauses, where the R3 Simplification Algorithm yields 9 rules with 25 clauses. The Chess end-game test initially contains 648 rules with 7 conditions in each rule for a total of 4536 clauses. ID3 refines this cluster to 86 rules with 335 clauses, where the R3 Simplification Algorithm refines it to 20 rules with 60 clauses.
In addition to the tests performed on individual rule clusters, there are several global verification criteria. These criteria focus on the relationships between rule clusters and may be tested through the Attribute Dictionary and the Rules Dictionary. The Rules Dictionary contains the conditions and conclusions used in the rules, information about the state of the DTR (if it is complete and refined), and the location of the DTR, while the Attribute Dictionary contains the definitions and sources for each condition and conclusion.
There are only two types of Global Verification Criteria: instantiation and domain constraints.
Instantiation criteria are concerned with ensuring that every condition and conclusion will obtain a value during every run of the system. Problems with instantiation include a set of values missing from the rule set (Nyugen G), cycles (Nyugen E, Stachowitz 16) interfering with the inference engine, conclusions that do not match a condition or goal (Nyugen I, Stachowitz 15), conditions or conclusions that have no source such as a question of the user or another set of rules (Nyugen J, Stachowitz 9), and duplicate and subsumed rule clusters (Stachowitz 13 and 14).
Domain constraints must also be enforced at a global level. The local verification domain constraints must be enforced between rule clusters as well as within the rule cluster (Nyugen H, Stachowitz 1b and 24).
TRIMM meets global verification criteria through the Knowledge Dictionary. The DTR is formed using the values in the Attribute Dictionary, enforcing domain constraints both locally and globally. The Attribute Dictionary also records how each condition is instantiated. If it is instantiated by the user, a question is elicited; if it is instantiated by another source such as a database, the source is recorded; if it is instantiated by other rules, another rule cluster is elicited. This technique ensures that all conditions and conclusions will be instantiated and that no irrelevant rule clusters are created.
The structure of the rule clusters should also be considered. Each condition in the rule cluster should be independent of the others; if not, normalization should be performed. Rule clusters containing multiple conclusions should be evaluated to determine if decomposition is possible . Each rule cluster should contain the minimal set of conditions necessary to determine a minimal set of conclusions (preferably one).
The major limitation to this approach is that the Cartesian product may be inappropriate for (portions of) extraordinary domains because of combinatorial explosion; in other words, an impossibly large DTR may be generated. EZ-Xpert allows the inclusion of user-generated code for these domains. In the vast majority of expert systems, the benefits of the Cartesian product combined with refinement far outweigh its cost.
As described in this paper, TRIMM derives monotonic, backward-chaining, propositional systems. The extension to forward-chaining is straight forward. Nonmonotonicity problems are currently addressed only by changing the specifications.
More research is necessary to determine the optimum algorithms for refinement. The R3 algorithms favor completeness over computational efficiency [3, 10]. We argue that the cost expended during development is recovered many times by the computational efficiency of the delivered implementation, but more efficient algorithms will speed the development process.
The process described here also lends itself to deductive database systems. In this manner, executable specification can be realized. Again, more research is needed to make these systems practical for everyday use.
It is hoped that this paper will generate further interest into the refinement and verification of acquired knowledge.
This research is funded in part by a grant from the First Interstate Bank.
1. Ansari, A. and Modarress, Batoil. "Commercial Uses of Expert Systems in the U.S.A" Journal of Systems Management. December 1990.
2. Barker, Virginia E. and O'Connor, Dennis E. "Expert Systems for Configuration at Digital: XCON and Beyond." Communications of the ACM. Volume 32, Number 3, March 1989.
3. Cendrowska, Jadzia. "PRISM: An Algorithm for Inducing Modular Rules." Knowledge Acquisition for Knowledge-Based Systems, Gaines, B. and Boose, J., editors. Academic Press, 1988.
4. Davis, R. "Application of Meta-level Knowledge to the Construction, Maintenance, and Use of Large Knowledge Bases." Ph.D. Dissertation, Dept. of Computer Science, Stanford University, 1976.
5. Eliot, Lance. "Verification in Expert Systems."AI Expert. July 1992.
6. Hicks, Richard C. "The TRIMM Methodology for Maintainable Rule-Based Systems." Proceedings of the Fifth Golden West Conference on Intelligent Systems. Reno, Nevada, June 1992.
7. Hicks, Richard C. "Deriving Appropriate Rule Modules for Rule-Based Expert Systems." Journal of Computer Information Systems. Fall 1994.
8. KES General Description Manual. Software Architecture and Engineering, Inc. Arlington, VA, 1983.
9. Nyugen, T.A., Perkins, W.A., Laffey, T.J. and Pedora, D. "Checking an Expert System for Consistency and Completeness." Proceedings of the Ninth International Joint Conference on Artificial Intelligence.1985.
10. Quinlan, J. R. "Simplifying Decision Trees." Knowledge Acquisition for Knowledge-Based Systems. Gaines, B. and Boose, J., editors. Academic Press, 1988.
11. Stachowitz, Rolf. "Validation of Knowledge-Based Systems." Tutorial at the Hawaii International Conference on System Sciences., Kona, Hawaii, January 1990. 12. Suwa, M., Scott, A. C., and Shoftliffe, E. H. "An Approach to Verifying Completeness and Consistency in a Rule-Based Expert System." AI Magazine. Volume 3, No. 4, 1982.
13. TIMM Users Manual. General Research Corp., Santa Barbara, CA.