Research project
In this project we develop stochastic models and corresponding tools for handling databases with uncertain data.
A significant and rapidly growing share of the data stored in modern databases is uncertain, either because of inherent randomness (e.g., in bioinformatics, medicine, telecommunication, economics) or because of incorrect, missing, and noisy data (e.g., official databases). These data must nevertheless be used. This leads to qualitatively new interdisciplinary problems for both statistics and computer science. Database systems are designed for data models that assume correct data, so new tools are needed to retrieve and evaluate information in these uncertain data sets. This project aims to develop new statistical models and techniques for database computations with uncertain data. The results will give database researchers and designers methods for building integrated database systems that work with such data. They can also be useful for related problems in bioinformatics, data mining, and environmental and health sciences.
The second aim is to study related problems of quantization, compression, and approximation of random signals in databases. The goal of data compression is to reduce the volume of data significantly while keeping the essential information (in signals, images) that a given application needs. We also develop approximation and numerical analysis methods for simulation with a given accuracy, in order to evaluate means and distributions of functionals of realizations of random functions. The results can be applied, for example, to various problems in telecommunication and multimedia databases, financial mathematics, and bioinformatics.