Database Architecture Fertilizers: Just-in-time, Just-enough, and Autonomous Growth
Martin Kersten, CWI Amsterdam, The Netherlands
Organic Databases
Ambient application environments call for innovations in database technology to fulfil the dream of an organic database: a database system that can be embedded in a wide collection of hardware appliances and provides an autonomous, self-descriptive, self-organizing, self-repairable, self-aware and stable data store-recall functionality to its environment.
The envisioned setting consists of a large collection of database servers, each holding a portion of the database. Each server joins this assembly voluntarily, donating storage and processing capacity, but without a "contract" to act as an obedient agent for the user in coordinating access to all other servers. The servers are aware of being part of a distributed database, but do not carry the burden of making this situation transparent to the application.
Applications should be prepared for updates sent to a server to be accepted, rejected with referrals, or only partly dealt with. An active client is the sole basis for exploiting the distributed system and realizing the desired level of ACID properties. The query language envisioned for this system avoids the trap of allocating unbounded resources to semantically ill-phrased, erroneous, or simply too time-consuming queries. It limits the amount of resources spent and returns a partial answer together with referral queries. The user can come back at any point in time and use the referral queries to obtain more answers.
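The idea of a resource-bounded query that returns a partial answer plus a referral can be illustrated with a minimal Python sketch. This is not MonetDB code or the query language of the talk; the names `Referral`, `PartialAnswer`, `bounded_query` and `collect_all`, and the use of a scan budget as the limited resource, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Referral:
    """Hypothetical follow-up query: resume scanning at this offset."""
    offset: int


@dataclass
class PartialAnswer:
    rows: List[int]
    referral: Optional[Referral]  # None once the data is exhausted


def bounded_query(data: List[int], predicate: Callable[[int], bool],
                  budget: int, offset: int = 0) -> PartialAnswer:
    """Inspect at most `budget` items, then stop and hand back a referral."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    end = min(offset + budget, len(data))
    rows = [x for x in data[offset:end] if predicate(x)]
    referral = Referral(end) if end < len(data) else None
    return PartialAnswer(rows, referral)


def collect_all(data: List[int], predicate: Callable[[int], bool],
                budget: int) -> List[int]:
    """An active client that keeps following referrals until none is left."""
    rows: List[int] = []
    answer = bounded_query(data, predicate, budget)
    rows.extend(answer.rows)
    while answer.referral is not None:
        answer = bounded_query(data, predicate, budget, answer.referral.offset)
        rows.extend(answer.rows)
    return rows
```

In this sketch the server never spends more than `budget` steps per request; it is the client that decides whether to follow the referral, which mirrors the active-client role described above.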
The topics presented are part of ongoing long-term research in novel database technology based on and extending MonetDB (see http://monetdb.cwi.nl).
Biography
Martin Kersten is a professor at the University of Amsterdam and head of the Information Systems department of CWI, Amsterdam. He founded the CWI database research group in 1985. He has devoted most of his scientific career to the development of database kernel software. The latest incarnation is the open-source system MonetDB, which demonstrates the maturity of the decomposed storage scheme as an efficient basis for both SQL and XQuery front-ends. In 1995 he co-founded the company Data Distilleries to commercialize data mining technology based on MonetDB. The company was sold to SPSS in 2003. In recent years his focus has been shifting to the implications of highly demanding applications for next-generation systems. He is a member emeritus of the VLDB Endowment.
Digital Video: Just Another Data Stream?
Alan F. Smeaton, Centre for Digital Video Processing & Adaptive Information Cluster, Dublin City University, Ireland
Tuesday, 2006-03-28 08:45-10:05, Ball Room, Chair: Yannis Ioannidis
Technology is making huge progress in allowing us to generate data of all kinds, and the volume of data we routinely generate is exceeded only by its variety and diversity. Certain kinds of data we can manage very efficiently (web searching and enterprise database lookup are good examples), but most of the data we generate we are not at all good at managing effectively. As an example, video information in digital format can be generated or captured very easily and in huge quantities. It can also be compressed, stored, transmitted and played back on devices that range from large-format displays to portable handhelds, and we now take all of this for granted. What we cannot yet do with video, however, is manage it effectively based on its actual content. In this presentation I will summarise where we are in terms of being able to automatically analyse and index video, and then provide searching, summarisation, browsing and linking within large video libraries, and I will outline what I see as the current challenges to the field.
Alan Smeaton is Professor of Computing at Dublin City University, where he is Director of the Centre for Digital Video Processing and a member of the Adaptive Information Cluster. He was Dean of the Faculty of Computing and Mathematical Sciences from 1998 to 2004 and was Head of the School of Computer Applications from January 1999 to December 2001. His early research interests covered the application of natural language processing techniques to information retrieval, but this has broadened to cover the indexing and content-based retrieval of information in all media: text, image, audio and especially digital video. Currently his major research funding is in the indexing and retrieval of digital video, where most of his work concerns the analysis, indexing, searching, browsing and summarisation of digital video information.
Alan Smeaton was program co-chair of the ACM SIGIR Conference in Toronto in 2003, general chair of CIVR, which he hosted in Dublin in 2004, and co-chair of ICME in 2005. He is a founding coordinator of TRECVid, the annual benchmarking activity for content-based video analysis and searching. He has published over 150 book chapters, journal and conference papers, and he is on the editorial boards of four journals. He holds the B.Sc., M.Sc. and PhD degrees in Computer Science from the National University of Ireland.
Charting a Dataspace: Lessons from Lewis and Clark
David Maier, Department of Computer Science, Portland State University, USA
Wednesday, 2006-03-29 08:45-10:05, Ball Room, Chair: Marc H. Scholl
Learning from the Past
A dataspace system (DSS) aims to manage all the data in an enterprise or project, be it structured, unstructured or somewhere between. A fundamental task in deploying a DSS is discovering the data sources in a space and understanding their relationships. Charting these connections helps prepare the way for other DSS services, such as cataloging, search, query, indexing, monitoring and extension. In this, the bicentennial of the Lewis and Clark Expedition, it is enlightening to look back at the problems and issues they encountered in crossing an unfamiliar territory. Many challenges they confronted are not that different from those that arise in exploring a new dataspace: evaluating existing maps, understanding local legends and myths, translating between languages, reconciling different world models, identifying landmarks and surveying the countryside. I will illustrate these issues and possible approaches using examples from medication vocabularies and gene annotation.
The work I describe on dataspaces is joint with Mike Franklin and Alon Halevy. Nick Rayner, Bill Howe, Ranjani Ramakrishnan and Shannon McWeeney have all contributed to the work on understanding relationships among data sources.
Dr. David Maier is Maseeh Professor of Emerging Technologies at Portland State University. Prior to joining PSU, he taught at the OGI School of Science & Engineering at Oregon Health & Science University, and at the State University of New York at Stony Brook. He is the author of books on relational databases, logic programming and object-oriented databases, as well as papers in database theory, object-oriented technology, query processing and scientific databases. He received the Presidential Young Investigator Award from the National Science Foundation in 1984 and was awarded the 1997 SIGMOD Innovations Award for his contributions in objects and databases. He is also an ACM Fellow. His current research interests include data stream processing, superimposed information management, data product generation and forensic system reconstruction. He holds a double B.A. in Mathematics and Computer Science from the University of Oregon (Honors College, 1974) and a Ph.D. in Electrical Engineering and Computer Science from Princeton University (1978).