Reference: Karp, P. D. Hypothesis Formation as Design. 1989.
Abstract: In the 1960s and 1970s biologists discovered a new mechanism of gene regulation in bacteria, called attenuation. Dr. Charles Yanofsky and his colleagues discovered attenuation in a gene regulation system called the tryptophan (trp) operon (Yanofsky, 1981). This chapter describes a computational investigation of scientific reasoning that is based on an analysis of the biological research on the trp operon. Karp and Friedland performed a detailed historical study of the discovery of attenuation in which they reconstructed the different intermediate states of knowledge that the biologists possessed as their understanding of the trp operon evolved (Karp, 1989). Karp and Friedland analyzed the differences between these states of knowledge to elucidate examples of how and why the biologists modified their theories (such as by postulating the existence of a new chemical reaction). This chapter describes two computer programs. The GENSIM program provides a framework for representing theories of molecular biology and has been used to represent a theory of bacterial gene regulation. GENSIM can use these theories to predict the outcomes of biological experiments. The HYPGENE program formulates hypotheses that improve the predictive power of GENSIM theories, given experimental data. Both programs have been tested on examples from the history of attenuation. I argue that it is productive to treat the task of hypothesis formation as a design problem, because AI methods developed for design and planning are well suited to the task of hypothesis formation. To treat hypothesis formation as a design problem, I view a hypothesis as an artifact to be synthesized. Its synthesis is performed subject to design constraints, such as the constraint that predictions generated in the context of the hypothesis should match experimental observations. HYPGENE is a designer of hypotheses. It uses design operators to modify a theory to satisfy design constraints.
I derived these operators from our study of the history of attenuation, and by considering the space of allowable syntactic changes to the GENSIM representation language. This approach to hypothesis formation is theory-driven because it assumes that one of HYPGENE's inputs is a good, but not perfect, theory for predicting experimental outcomes.
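
The design framing in the abstract can be illustrated with a toy sketch. This is not GENSIM or HYPGENE, whose representations and operators are not given here. It assumes a theory is simply a set of reactions, a prediction is the set of objects reachable by forward chaining, and the only design operators are adding or removing one reaction. All names (`predict`, `design_hypothesis`, the example objects) are hypothetical.

```python
from collections import deque

# Assumption: a reaction is (frozenset of required objects, produced object),
# and a theory is a frozenset of such reactions.

def predict(theory, initial):
    """Forward-chain reactions from the initial objects until nothing new appears."""
    present = set(initial)
    changed = True
    while changed:
        changed = False
        for inputs, output in theory:
            if inputs <= present and output not in present:
                present.add(output)
                changed = True
    return frozenset(present)

def satisfies(theory, experiments):
    """Design constraint: every prediction must equal the observed outcome."""
    return all(predict(theory, initial) == observed
               for initial, observed in experiments)

def neighbors(theory, candidate_reactions):
    """Design operators: remove one reaction, or add one candidate reaction."""
    for r in theory:
        yield ("remove", r), theory - {r}
    for r in candidate_reactions:
        if r not in theory:
            yield ("add", r), theory | {r}

def design_hypothesis(theory, experiments, candidate_reactions, max_depth=2):
    """Breadth-first search for the smallest edit sequence that meets the constraints."""
    start = frozenset(theory)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        current, edits = queue.popleft()
        if satisfies(current, experiments):
            return current, edits
        if len(edits) >= max_depth:
            continue
        for edit, nxt in neighbors(current, candidate_reactions):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, edits + [edit]))
    return None, None

# Toy input: a good but imperfect theory that cannot explain the observed protein.
transcription = (frozenset({"gene", "polymerase"}), "transcript")
translation = (frozenset({"transcript", "ribosome"}), "protein")
shortcut = (frozenset({"gene"}), "protein")

theory = frozenset({transcription})
experiments = [
    (frozenset({"gene", "polymerase", "ribosome"}),
     frozenset({"gene", "polymerase", "ribosome", "transcript", "protein"})),
    (frozenset({"gene", "polymerase"}),
     frozenset({"gene", "polymerase", "transcript"})),
]

new_theory, edits = design_hypothesis(theory, experiments, [shortcut, translation])
```

In this toy case the search rejects the `shortcut` reaction because it would wrongly predict protein in the second experiment, and it settles on adding `translation`. Breadth-first search is used here only because it returns a smallest edit sequence; the abstract does not say what search strategy HYPGENE uses.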