Our ambitious goal is the development of a novel framework for scientific experimentation - AdaLab - that will adapt the scientific method for semi-automated and automated knowledge discovery. If successful it will have significant technological and societal impact through the faster and cheaper generation of scientific knowledge. AdaLab will require coordinated advances in a number of ICT methodologies: knowledge representation, ontology engineering, semantic technologies, ML, and automated scientific discovery.

Biomedical Applications

We will evaluate our AdaLab framework on a biomedically important application area: modelling the yeast diauxic shift. When yeast (Saccharomyces cerevisiae) is grown on glucose with oxygen it first produces ethanol, and when the glucose is exhausted it reorganises (shifts) its metabolism to grow using the ethanol it previously produced (Dickinson, 1999). This reorganisation of metabolism is known as the diauxic shift.

In biology the diauxic shift is used as a 'model' for general cellular reorganisation, and it has significant medical applications in the understanding of ageing and cancer. In cancer the Warburg effect (a metabolic shift towards producing lactic acid from glucose; lactic acid is the animal equivalent of ethanol) is typically observed in solid tumours, and is believed to partly explain tumour growth (Gatenby and Gawlinski, 2003). In ageing a metabolic shift away from lactic acid is related to the cell’s stress response, and this is associated with increased cell life-span (Kenyon, 2010).

We have established a robust laboratory automation protocol for a high-throughput growth-curve assay. This enables Eve to observe qualitative and quantitative differences in the batch growth of yeast strains. In particular we have good statistical methods that recognise biphasic growth, i.e. diauxic shifts in culture (Fig. 5). These have enabled us to discover novel mutants with aberrant diauxic shifts.
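As a rough illustration of what recognising biphasic growth involves, the following Python sketch counts distinct growth phases in an optical-density time series. It is not the statistical method used with Eve: the function names, the rate threshold, and the synthetic readings are all assumptions made for illustration, and real assay data would need smoothing and a proper statistical test rather than a fixed threshold.

```python
import math


def specific_growth_rates(times, od):
    """Finite-difference slope of ln(OD) between consecutive time points."""
    return [
        (math.log(od[i + 1]) - math.log(od[i])) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]


def count_growth_phases(times, od, rate_threshold=0.05):
    """Count contiguous runs of intervals whose growth rate exceeds the threshold.

    rate_threshold (per hour) is an illustrative assumption, not a calibrated value.
    """
    phases = 0
    growing = False
    for rate in specific_growth_rates(times, od):
        if rate > rate_threshold and not growing:
            phases += 1  # a new growth phase starts here
        growing = rate > rate_threshold
    return phases


def is_biphasic(times, od, rate_threshold=0.05):
    """A curve is treated as biphasic if it shows at least two growth phases."""
    return count_growth_phases(times, od, rate_threshold) >= 2


# Synthetic readings (hypothetical): growth, a pause, then slower growth.
hours = list(range(12))
diauxic_od = [0.10, 0.15, 0.22, 0.33, 0.50, 0.51, 0.51, 0.52, 0.60, 0.70, 0.80, 0.81]
# Synthetic readings (hypothetical): a single growth phase, then a plateau.
single_phase_od = [0.10, 0.15, 0.22, 0.33, 0.50, 0.51, 0.51, 0.51]

print(is_biphasic(hours, diauxic_od))
print(is_biphasic(hours[:8], single_phase_od))
```

A mutant with an aberrant diauxic shift would show up in such an analysis as a missing, delayed, or unusually slow second phase, which is why the per-interval growth rates are kept available as a separate function.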

Eve can grow specific yeast strains under specific environmental conditions and make a quantitative measurement of any diauxic shift. The yeast strains will typically have one or more specific genes deleted. The main environmental variations will consist of a defined set of compounds (e.g. glucose, ammonia) and concentrations (e.g. 1 molar). It is also possible to change the temperature and oxygen content. In addition it is possible to add approved drugs (e.g. aspirin) that are known to interact with specific enzymes. Such high-throughput protocols will be used as the main experimental test of the hypotheses generated by ML and by collaborating human scientists. The objective measure of success will be the accuracy of the developed models, verified by experimental observation of model predictions.

Developing the OUK ontology

The AdaLab framework requires the OUK knowledge model in order to efficiently record and reason with uncertain knowledge. OUK will enable the recording of verified and unverified knowledge in a consistent manner through the utilisation of probability theory. OUK will import relevant classes from existing biomedical ontologies (e.g. Gene Ontology, Protein Ontology) and employ the HELO framework to represent biomedical knowledge about the diauxic shift.

We will develop OUK to support the key modules of the AdaLab framework:

Bioinformatic knowledge

We will formalise bioinformatic knowledge relevant to the yeast diauxic shift. This knowledge will be taken from published models, bioinformatic databases, and the literature. The subject area is a well-studied one, and there is a substantial amount of background knowledge for formalisation - this is important as it enables the AdaLab framework to be fully utilised. A major problem with almost all bioinformatic databases is that they include no explicit concept of uncertainty. All facts have the same probability (1.0) no matter whether they are well established (e.g. stated in hundreds of papers), or not (e.g. contradicted in multiple papers). This presents a general problem for the principled utilisation of probabilistic inference in biology. We will use SRL as a unifying formalism to express scientific knowledge and probabilities. To assign a priori probabilities to facts related to our bioinformatic application area we will use facts from bioinformatic databases (e.g. SGD), nano-publications (nanopub.org), and generic techniques such as maximum entropy, and where appropriate we will seek expert opinion. It is certain that human biologists implicitly use such probabilities when they reason, and making them explicit in a domain will be of great benefit in itself. The probabilities will be used for further automated inference and experimentation; as more experimental evidence is added, probabilistic inference will constrain them to reasonable and consistent values.
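To make the idea of explicit prior probabilities concrete, the sketch below attaches a Beta distribution to a single database fact and updates it as evidence arrives. This is a deliberately simple stand-in and not the SRL formalism the project will use: the class name `FactBelief`, the treatment of each paper or experiment as one independent observation, and all the counts are assumptions made for illustration. The uniform Beta(1, 1) starting point plays the role of a maximum-entropy prior when nothing is known about a fact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactBelief:
    """Beta(alpha, beta) belief that a database fact is true (hypothetical model)."""

    alpha: float = 1.0  # pseudo-count of supporting evidence
    beta: float = 1.0   # pseudo-count of contradicting evidence

    @property
    def probability(self):
        """Posterior mean probability that the fact is true."""
        return self.alpha / (self.alpha + self.beta)

    def updated(self, supporting=0, contradicting=0):
        """Return a new belief after adding independent pieces of evidence."""
        return FactBelief(self.alpha + supporting, self.beta + contradicting)


# No information yet: the uniform prior gives probability 0.5.
unknown = FactBelief()

# Illustrative counts only: a fact stated in many papers versus a contested one.
well_established = unknown.updated(supporting=200)
contested = unknown.updated(supporting=3, contradicting=4)

# New experimental observations shift the contested fact's probability.
after_experiments = contested.updated(supporting=5)

for label, belief in [
    ("unknown", unknown),
    ("well established", well_established),
    ("contested", contested),
    ("contested, after experiments", after_experiments),
]:
    print(f"{label}: {belief.probability:.3f}")
```

In this toy model a database that stores every fact with probability 1.0 corresponds to discarding the `beta` counts entirely, which is the loss of information described above.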

Robot Scientists and automated scientific discovery

We will extend the capabilities of Robot Scientists and enable their better integration with human scientists through the AdaLab framework. At a high level the AdaLab framework will keep track of the current state of the research using the OUK logical framework, suggest which research goals are most interesting, formulate hypotheses towards the most important goals, design experiments to test these hypotheses, and perform suitable statistical analysis. The AdaLab framework will unite the super-human abilities of computers/robots (high inference speed, perfect recall, accuracy of movement) with the creativity and intellectual flexibility of human scientists. To achieve this we will: