Thomas R. Gruber, Sunil Vemuri, James Rice
Knowledge Systems Laboratory, Stanford University
gruber@ksl.stanford.edu
The work illustrates several of the advantages of delivering product information in virtual documents. Since the documentation is generated on demand from engineering models, the information presented always reflects the current design model of the artifact. Because the documentation is delivered using standard WWW protocols, it can be truly integrated into other WWW-based documentation such as email-based design discussions, version-managed design documents, interactive tutorials, and information retrieval systems. Moreover, delivering product information in the form of virtual documents changes the way that documentation is "authored". Engineers can work in the medium of their practice --- annotated engineering models --- while a virtual document generator handles the rhetorical task of composing information to meet the needs of individual readers.
In this paper we demonstrate the application (with examples that run), describe some techniques used in deploying it on the Web, and discuss general properties of virtual documents exemplified by the system.
1.2 Virtual documents on the Web
1.3 The use of virtual documents for design documentation
2. The DME Virtual Document Generator
2.2 How does a DME virtual document look and feel?
2.3 How does DME work as a virtual document?
3. Three Virtual Document Techniques
3.1 A One-button Query Interface
3.2 Preserving persistent query identity
3.3 Compilation versus on-demand generation
4. Discussion: Advantages and Limitations of Virtual Documents
4.1 Integrating documentation with professional practice
4.2 Presenting information in a collaborative context of use
Perhaps the simplest case of a virtual document is a WWW document that is constructed in response to a search query. For example, Example 1 is a virtual document about the use of the term "virtual document" in sources indexed by the CUI W3 catalog. The page that is returned looks like a document (e.g., a newsletter column that lists announcements). The subject of the document is stable and the document has a persistent name (http://cuiwww.unige.ch/w3catalog?virtual+document), yet the contents are generated on demand and may change over time.
Another example is the Xerox map viewer, which dynamically generates maps from an underlying database. It looks and feels like an atlas, except the pages are custom made in response to queries. In this case, the queries may be specified using form fields and interactive graphic images.
Consider the problem of design documentation. Documentation about a designed artifact covers a broad range of topics: functional specification, conceptual design, detailed design, components and materials, manufacturing processes, maintenance and operations procedures, diagnosis and repair procedures, and product revision. Typically, though, the static documentation on a design captures only a small fraction of the potentially relevant information. Engineers ask a wide range of questions about a design, from simple queries about artifact structure to complex questions about the intended or possible dynamic behavior [5]. The information about the artifact changes over time as the design progresses from concept to manufacturing, new product versions are designed, and new components become available. Maintaining accurate design documentation is difficult and expensive, and much of the information that an engineer might need is only available by finding and asking people who might know the answer.
Fortunately, in many engineering domains there is a succinct representation of knowledge that can be the basis for generating answers: engineering models. Engineering models include specifications of physical and logical structure, expected behavior, desired functionality, constraints that must be satisfied, and objective criteria that should be optimized.
Engineers use Computer Aided Design (CAD) and other modeling tools to create models that represent their designs. They use simulation and analysis tools to ask questions about possible structures and behaviors. However, engineers typically use a different set of tools --- document preparation tools --- to write specifications, capture decision rationale, and present the results of their formal analyses to colleagues in design discussions. The task of creating designs is therefore disconnected from the task of creating documentation about them.
Tools that "understand" engineering models can help to integrate documentation with design practice by generating answers from the models, rather than recording and replaying static documentation [5]. For example, Cristina Garcia has created an "Active Design Documentation" system to support the process of parametric design [2]. In parametric design problems, design decisions can be characterized in terms of the values of design parameters. Designers work with the system to specify parameter values, and the system assimilates the decisions into its internal model of the design. The program can then create justifications of the decisions by analyzing the relevant constraints, objective criteria, and preferences. "Readers" can then query the design support system to get information about the design that is up to date and accurately reflects the current model.
The technology to generate answers from an underlying model can be the basis for a virtual document system for design. To build such a virtual document one also needs techniques for (1) presenting answers in a readable, document-like format, (2) eliciting questions from readers, and (3) delivering the virtual document in the same context as other design documentation.
This paper reports on a system with these capabilities, called DME (Device Modeling Environment) [1, 4, 8, 9, 10]. DME generates explanations about how things work from annotated component and constraint models. It can answer questions about the structure and dynamic behavior of an engineered artifact. It presents the answers as natural language explanations that are filtered and organized for human readability. It delivers the explanations as pages of a hypermedia document, so that each page is the answer to a question, and each link is another question that leads to another answer. It elicits questions from readers by offering a set of relevant follow-up questions in the context of each explanation. Instead of typing in free-form strings, users ask questions by traversing hyperlinks emanating from the current page. It delivers the virtual document in the context of other design documentation, by assigning a persistent, context-independent identifier (a unique URL) to each explanation. This allows one to embed references to pages of the virtual document in other WWW documents. Figure 1 illustrates how DME's question-answering interface is delivered as a virtual document.
Before going into the details of DME, it is worth considering the contexts in which virtual documents of this sort may be used. Figure 2 shows the role of virtual documents in three kinds of design documentation. One application is a requirements document that specifies the intended behavior of a system. Instead of describing behavior in words and drawings, the designer could point to a demonstration of the intended behavior that is delivered as a virtual document of a simulation. Similarly, specifications of operating procedures make reference to the behavior of a system under certain configurations and operating conditions. References to accurate and operational specifications of behavior are important both for the design of operating procedures and for their delivery in user manuals and training modules. A third use is in collaborative design discussions. An engineer might argue for a position in a note or email message by referring to the results of an analysis that are documented by a virtual document. The reader of the message can not only see a nice presentation of results, but can explore the model upon which the analysis depends. Example 2 is a hypothetical email message that contains embedded references to a DME-generated virtual document.
The DME system was originally implemented as a proprietary design support system with a sophisticated graphical user interface. The advent of the WWW offered an opportunity to deliver it to a larger user population on a variety of platforms with minimal computational resources. Delivering the explanation capability in the form of a virtual document raised some interesting challenges in interface design and the use of client-server protocols. We believe it was the first question-answering system deployed on the WWW that generates explanations in natural language (the first DME virtual document appeared in late 1993 and has been running continuously since then). This section will explain what it does and show how it works. If this paper is being read on a web browser, the reader will be able to try out the examples of dynamically generated documentation.
DME presents explanations to the user by generating text and graphical images in response to queries. A query is a request for a given kind of explanation (e.g., explain the causal influences on a quantity) instantiated on specific arguments (e.g., the pressure quantity at a given valve at a given point in time). The answer to a query is constructed to present the relevant information at an appropriate level of detail. Causality, salience, level of abstraction, and level of detail are determined using domain-independent techniques. For example, the answer to a causal influence query is a description of exactly those variables that had a causal effect on the quantity in question, as determined by algebraic analysis of the constraint equations. More information on DME and the explanation generation algorithms is available in other reports [3, 4].
The DME user interface organizes the dialog with the user as a question-answering dialog, as shown in Figure 1. Each explanation is the answer to a question. Included with each explanation is a set of follow-up questions. The reader can ask for more information by choosing one of these questions. As a result, the path of questions and explanations is generated on demand on the basis of the reader's choices. If the level of detail is too coarse in an explanation, the reader can click on one of the objects in question to get further information. If the explanation answered the wrong question, the reader can select alternate, related questions listed on the same page. If the reader is lost in hyperspace, general navigation options are always available.
Note to the reader: when exploring these pages, notice that each page is rich with hyperlinks to follow-up questions. Remember to come back to the paper! It may be helpful to view the examples in a separate window.
Example 3: A "Title Page" for the virtual document
In this presentation, DME introduces the RCS scenario and provides an ISMAP image of the component topology. A scenario is defined by a particular engineering model under specified initial conditions and simulated over some period of time. In this scenario, exogenous actions occur (operators doing things) as well as events predicted by simulation.
Example 4: Summary of a complete scenario
In this explanation, DME reports on salient events that occurred during the simulation. Although there are over 160 quantities and 150 components simulated over 14 states, DME summarizes what happened in a few pages of natural language. This serves as a kind of "table of contents" for a simulation; each reference to an object, event, quantity, or state is linked to relevant follow-up questions.
Example 5: Summarizing Salient Changes in Simulation State
At State 10, DME identifies an important milestone in the simulation. In the explanation of what happened, the system reports only the most important events of the state and filters extraneous detail (i.e., although many quantities changed values, only this pressure variable had interesting consequences). The techniques for identifying important events and filtering extraneous detail are independent of the domain and model.
Example 6: Explaining Logical Preconditions
In this example, DME explains why a qualitative event occurred (i.e., why the pressure regulator is in its pass-through mode). This style of explanation is obtained by analyzing the logical preconditions of the model fragment that represents the "pass through mode" of the component and then filtering any variables that can be proven irrelevant.
Example 7: Explaining Causal Influences
In this example, DME explains that the change in pressure that led to the qualitative change in the pressure regulator is determined by the pressure at the helium tank, because the isolation valve that lies between them is open (see the RCS system schematic). Considering that there are over 160 variables involved, most of which are linked by constraint equations to the pressure at the regulator, this is a remarkable reduction of complexity for the user who is trying to understand what is happening.
DME virtual documents are delivered on the WWW by servers using the HTTP protocol. Figure 3 illustrates the document generation process. Questions are elicited as ordinary hypertext links (i.e., HTTP GET requests for URLs), and answers are produced in HTML. Each URL encodes a query with its arguments. Given a query, a domain model, and a trace of a simulation on that model, DME composes an explanation to answer the query. It then renders the explanation in HTML and sends it to the WWW client. Rendering in HTML requires linearization of a recursive explanation plan, using a method akin to the code generation phase of a compiler. Section 3 will describe some of the virtual document techniques employed in this work.
As stated above, each URL encodes a query. The space of queries is parameterized by component, variable, simulation state, and explanation type. The routines that generate natural language take these parameters as inputs. For example, the query answered in Example 7 is "explain causal influences" and its arguments are the component called "primary regulator A" and the simulation state called "State 10".
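The parameterization can be illustrated with a minimal sketch. The paper does not give DME's actual URL syntax, so the path `/dme/explain`, the parameter names, and the identifier spellings below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


@dataclass(frozen=True)
class Query:
    """One point in the query space: an explanation type plus its arguments."""
    explanation_type: str            # e.g. "explain-causal-influences"
    component: str                   # e.g. "primary-regulator-a"
    state: str                       # e.g. "state-10"
    variable: Optional[str] = None   # optional quantity, e.g. "pressure"


def query_to_url(q: Query, base: str = "/dme/explain") -> str:
    """Encode a query as a URL (hypothetical syntax); sorted keys keep it canonical."""
    params = {"type": q.explanation_type, "component": q.component, "state": q.state}
    if q.variable is not None:
        params["variable"] = q.variable
    return base + "?" + urlencode(sorted(params.items()))


def url_to_query(url: str) -> Query:
    """Decode the URL of a GET request back into the query it names."""
    fields = parse_qs(urlsplit(url).query)
    return Query(
        explanation_type=fields["type"][0],
        component=fields["component"][0],
        state=fields["state"][0],
        variable=fields.get("variable", [None])[0],
    )


# The query of Example 7, written with assumed identifier spellings.
example = Query("explain-causal-influences", "primary-regulator-a", "state-10")
url = query_to_url(example)
```

Because the encoding is a pure function of the query parameters, the same question always maps to the same URL, and the server can recover the question from the URL alone.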
Using a technique called compositional text generation [4], phrases corresponding to model entities are composed and the resulting sentence structure reflects their model structure. Since each phrase corresponds to a different entity, each phrase is given its own URL. For instance, in Example 7, the phrase "The pressure at the input-terminal of primary-regulator A" contains two anchors, one referring to the pressure (a quantity) and the other to the terminal (a component structure).
For each entity mentioned in an explanation, there is actually a set of relevant queries about that entity in that context. For example, when a component is mentioned, possible queries include what is this component?, what are its subcomponents?, what are its quantities?, how is it modeled?, and what are the constraint equations associated with it? Instead of offering a pop-up menu of questions under each object, DME makes a default assumption about which query is most appropriate in the given context and offers an answer to it. On the answer page, the alternative queries are offered. This technique, called the One Button Query Interface, is discussed in more detail in Section 3.1.
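The two-anchor phrase can be sketched as a small recursive rendering. This is an illustration only, not DME's generator: the `Phrase` structure, the `render` function, and the URLs are hypothetical.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class Phrase:
    """A phrase for one model entity, optionally followed by a sub-phrase."""
    entity_url: str                  # hypothetical URL of a query about the entity
    head: str                        # the words that name the entity itself
    connective: str = ""             # text joining the head to the sub-phrase
    sub: Optional["Phrase"] = None   # phrase for a related model entity


def render(phrase: Phrase) -> str:
    """Linearize the phrase tree into HTML, one anchor per model entity."""
    html = f'<a href="{escape(phrase.entity_url, quote=True)}">{escape(phrase.head)}</a>'
    if phrase.sub is not None:
        html += escape(phrase.connective) + render(phrase.sub)
    return html


terminal = Phrase(
    "/dme/component/primary-regulator-a/input-terminal",
    "the input-terminal of primary-regulator A",
)
pressure = Phrase(
    "/dme/quantity/pressure/primary-regulator-a/input-terminal",
    "The pressure",
    " at ",
    terminal,
)
rendered = render(pressure)
```

The anchors are emitted side by side rather than nested, since HTML does not allow one anchor inside another.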
Delivering DME capability as a virtual document using WWW standards freed us from platform, resource, and access constraints. Although DME is a research prototype using proprietary technology, anyone on the Internet can use the service. DME virtual documents have been used in collaborative design along with WWW-based teleconference and group discussion tools (e.g., the MADEFAST project). Based on this experience, the Stanford Knowledge Systems Lab has expanded the range of services offered to include ontology development and remote access to modeling tools.
In its original Graphical User Interface (GUI), DME presented a menu of possible questions per object in its presentation context. In the HTML interface, these menus are "flattened". For each object type and context, there is a default question. This is the target of the anchor assigned to that object, so when a reader clicks on the object the default question about that object in that context is answered. (See Figure 1.) The question is rephrased as the title of the HTML page, so as to be clear about which question is being answered. On that page, the remaining, less common questions are listed explicitly under the heading "Other Questions". The phrase corresponding to the currently answered question is rendered in italics without its anchor, which is akin to graying out menu items in contemporary desktop GUIs. There are some questions that transcend any particular object or state. These serve as navigation aids and are listed at the bottom of each page as "Other Options".
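A minimal sketch of the flattened menu follows. The question wordings come from the component example in Section 2.3; the table of defaults, the context names, and the URL layout are assumptions, not DME's actual tables.

```python
from html import escape

# Question catalogue per object type: (query id, wording). Ids are hypothetical.
QUESTIONS = {
    "component": [
        ("what-is", "What is this component?"),
        ("subcomponents", "What are its subcomponents?"),
        ("quantities", "What are its quantities?"),
        ("model", "How is it modeled?"),
        ("equations", "What are its constraint equations?"),
    ],
}

# Hypothetical default question for each (object type, presentation context).
DEFAULTS = {
    ("component", "schematic"): "what-is",
    ("component", "causal-explanation"): "equations",
}


def default_link(obj_type: str, obj_name: str, context: str, label: str) -> str:
    """The single anchor on an object: clicking it asks the default question."""
    qid = DEFAULTS[(obj_type, context)]
    return f'<a href="/dme/{qid}/{obj_name}">{escape(label)}</a>'


def other_questions(obj_type: str, obj_name: str, current_qid: str) -> str:
    """List every question on the answer page; the current one is italic, unlinked."""
    items = []
    for qid, wording in QUESTIONS[obj_type]:
        if qid == current_qid:
            items.append(f"<li><i>{escape(wording)}</i></li>")
        else:
            items.append(f'<li><a href="/dme/{qid}/{obj_name}">{escape(wording)}</a></li>')
    return "<h2>Other Questions</h2>\n<ul>\n" + "\n".join(items) + "\n</ul>"


link = default_link("component", "primary-regulator-a", "schematic", "primary regulator A")
menu = other_questions("component", "primary-regulator-a", "what-is")
```

The reader needs one click to get the most likely answer and a second click, on the answer page, to reach any of the less common questions.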
Achieving persistent query identity requires that the namespaces for the parameters of the query (e.g., object and simulation state names) be persistent over invocations of the server and independent of the user. This precludes two techniques commonly used in CGI programs: generating random object URLs on the fly and encoding dialog state in a form using the POST method of HTTP. These techniques do not preserve the mapping from URLs to information across sessions or reboots of the server.
Note that persistent query identity does not require that the results of the query --- the contents of that page of a virtual document --- be identical on every access. One of the purposes of a virtual document is to generate new content to reflect changes in the underlying sources of information. The persistency requirement is semantic rather than syntactic: that all responses retrieved under that query identifier are always meaningful answers to the "same" query.
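The difference between the two URL schemes can be made concrete with a toy comparison. Both classes and their URL layouts are hypothetical; a server restart is modeled by constructing a fresh object.

```python
import uuid


class RandomIdServer:
    """Anti-pattern: URLs are minted on the fly and die with the process."""

    def __init__(self):
        self._table = {}   # token -> query; lost whenever the server restarts

    def url_for(self, query):
        token = uuid.uuid4().hex
        self._table[token] = query
        return f"/dme/tmp/{token}"

    def resolve(self, url):
        return self._table.get(url.rsplit("/", 1)[-1])


class StableNameServer:
    """URLs are built from persistent names only, so no session table is needed."""

    def url_for(self, query):
        # query = (explanation type, component, state); names must not contain "/"
        return "/dme/" + "/".join(query)

    def resolve(self, url):
        parts = url.split("/")[2:]   # drop the empty prefix and "dme"
        return tuple(parts) if len(parts) == 3 else None


query = ("explain-causal-influences", "primary-regulator-a", "state-10")

random_url = RandomIdServer().url_for(query)
stable_url = StableNameServer().url_for(query)

# Simulate a reboot: brand-new server objects receive the old URLs.
after_reboot_random = RandomIdServer().resolve(random_url)
after_reboot_stable = StableNameServer().resolve(stable_url)
```

After the simulated reboot only the stable URL still names the query. What the server then generates for that query may differ from one access to the next, which is consistent with the semantic notion of persistence described above.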
The extreme of compilation is to precompute all possible explanations and save the resulting HTML files (i.e., cache the transitive closure over the question-answer relation). As long as the URLs are relative (i.e., don't mention the server or an absolute directory path), this technique can be used to produce a standalone web for a CD-ROM or high-performance server (we also found it useful for regression testing). For the RCS example demonstrated in this paper, expanding all of the questions and answers produces about 80,000 virtual document pages weighing about 150 MB (much larger than the program that generates the virtual document). However, simply walking the hypertext web does not cover the interactive graphics (i.e., using ISMAP), in which the server maps pixel locations to pages of the virtual document. To precompile a virtual document with interactive graphics would require a standard for declaratively assigning geometric regions to URLs, so that HTML viewers could determine the URL associated with a region on a graphic without depending on a particular HTTP server.
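Caching the transitive closure amounts to a graph traversal over pages. The sketch below assumes a hypothetical `generate(url)` callback that returns a page's HTML together with the relative URLs it links to; the three-page toy web stands in for the real generator.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Tuple


def compile_document(
    start_url: str,
    generate: Callable[[str], Tuple[str, Iterable[str]]],
) -> Dict[str, str]:
    """Breadth-first walk of the question-answer links; returns {url: html}."""
    pages: Dict[str, str] = {}
    frontier = deque([start_url])
    while frontier:
        url = frontier.popleft()
        if url in pages:          # already generated; links may form cycles
            continue
        html, links = generate(url)
        pages[url] = html
        frontier.extend(link for link in links if link not in pages)
    return pages


# Toy stand-in for the generator: a tiny web with a cycle (hypothetical names).
TOY_LINKS = {
    "title.html": ["summary.html"],
    "summary.html": ["state-10.html", "title.html"],
    "state-10.html": ["summary.html"],
}


def toy_generate(url: str) -> Tuple[str, Iterable[str]]:
    links = TOY_LINKS[url]
    body = "".join(f'<a href="{link}">{link}</a>' for link in links)
    return f"<html><body>{body}</body></html>", links


pages = compile_document("title.html", toy_generate)
```

Writing each entry of `pages` to a file of the same name would give the standalone web. A walk of this kind only follows explicit links, so pages reachable solely through ISMAP clicks would be missed, as noted above.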
Virtual documentation may change the practice of design. If virtual documentation were to be generated as a side effect of normal practice, then it could be of use to practitioners while they are doing their work [5]. If design documentation were generated by design support tools, for instance, then more of the designer's attention could be devoted to design and less to writing perishable documentation. Instead of working in the medium of text, designers would work in the medium of design models (see Figure 4). Documentation derived from design models would also be more authentic, reflecting the current hypotheses and commitments of designers rather than an idealized post hoc rationalization. The standards and installed base of the World Wide Web now make it practical to deliver virtual documentation and to integrate it with access to remote engineering services.
Virtual documents on the WWW allow the possibility of documentation that is delivered in the shared context of collaborative work. The context of a distributed team collaborating over a computer network includes shared databases and threaded discussions. Virtual documentation can be used to integrate shared information with what people say about it. For example, annotation servers (e.g., ComMentor [11]) and joint authoring systems (e.g., Hypernews, the KSL Ontology Editor) generate virtual documents by combining the input of several contributors. Similarly, MUD/WWW gateways (e.g., ChibaMoo) use virtual documents to describe objects in shared virtual environments. Interactive improvisation systems (e.g., interactive fiction, poetry generators) are other examples where virtual document technology blurs the distinction between documentation as a repository of knowledge and documentation as a medium for collaboration.
Acknowledgments
The work described in this document was conducted at the Stanford Knowledge Systems Laboratory, with funding from ARPA and NASA under NASA grants NAG 2-581 (ARPA Order 8607) and NCC 2-537. The DME system described is a product of the How Things Work project, under the direction of Richard Fikes. The code and algorithms for explanation generation were developed by Tom Gruber and Patrice Gautier. The code and algorithms for putting it on the web were developed by Sunil Vemuri, Tom Gruber, James Rice, Patrice Gautier and Yves Peligry. The simulation and model formulation approaches in DME are largely due to Yumi Iwasaki. Ken Forbus, Dan Russell, and Philippe Piernot have also been very influential in this work.
[1] R. Fikes, T. Gruber, Y. Iwasaki, A. Levy, & P. Nayak. How Things Work Project Overview. Stanford University, Knowledge Systems Laboratory, Technical Report KSL 91-70, November 1991.
[2] A. C. B. Garcia & H. C. Howard. Acquiring design knowledge through design decision justification. Artificial Intelligence for Engineering Design, Analysis, and Manufacturing (AI EDAM), 6(1):59-71, 1992.
[3] P. O. Gautier & T. R. Gruber. Generating Explanations of Device Behavior Using Compositional Modeling and Causal Ordering. Proceedings of the Eleventh National Conference on Artificial Intelligence, Washington, D.C., AAAI Press/The MIT Press, 1993.
[4] T. R. Gruber & P. O. Gautier. Machine-generated explanations of engineering models: A compositional modeling approach. Proceedings of the 13th International Joint Conference on Artificial Intelligence, Chambery, France, pages 1502-1508. San Mateo, CA: Morgan Kaufmann, 1993.
[5] T. R. Gruber & D. M. Russell. Generative design rationale: Beyond the record and replay paradigm. In Thomas Moran & John H. Carroll, Eds., Design Rationale: Concepts, Techniques, and Use, Lawrence Erlbaum Associates, 1995.
[6] T. R. Gruber, A. B. Tenenbaum, & J. M. Tenenbaum. NIKE: A National Infrastructure for Knowledge Exchange. Enterprise Integration Technologies, Technical Report 1994.
[7] Y. Iwasaki, M. Vescovi, R. Fikes, & B. Chandrasekaran. A Causal Functional Representation Language with Behavior-Based Semantics. 9(1):5-31, 1994.
[8] Y. Iwasaki & A. Y. Levy. Automated model selection for simulation. Proceedings of the Twelfth National Conference on Artificial Intelligence, 1994.
[9] Y. Iwasaki & C.-M. Low. Device Modeling Environment: An integrated model-formulation and simulation environment for continuous and discrete phenomena. Proceedings of the Conference on Intelligent Systems Engineering, 1992.
[10] Y. Iwasaki & C. M. Low. Model Generation and Simulation of Device Behavior with Continuous and Discrete Changes. Intelligent Systems Engineering, 1(2), 1993. Also available as Technical Report, KSL 91-69, Knowledge Systems Laboratory, Stanford University, 1991.
[11] M. Röscheisen, C. Mogensen, & T. Winograd. A platform for third-party value-added information providers: architecture, protocols, and usage examples. Stanford Integrated Digital Library Project, Computer Science Department, Stanford University, Technical Report 1994.
[12] N. Stephenson. The Diamond Age; Or, Young Lady's Illustrated Primer. Bantam Books, New York, 1995.