EDBT 2006 Tutorials
Participants of EDBT 2006 can take part in the tutorials free of charge.
Introduction to Text Mining
René Witte (Universität Karlsruhe, Germany)
Monday, 2006-03-27, 11:30 - 13:00 & 14:30 - 16:00, Club Room
Do you have a lack of information? Or do you rather feel overwhelmed by the sheer amount of available (online) content, like emails, news, web pages, and electronic documents? The rather young field of Text Mining developed from the observation that most knowledge today - by common estimates more than 80% of stored data - is hidden within documents written in natural language and thus cannot be automatically processed by traditional information systems.
Text Mining, "also known as intelligent text analysis, text data mining or knowledge-discovery in text (KDT), refers generally to the process of extracting interesting and non-trivial information and knowledge from unstructured text." Text Mining is a highly interdisciplinary field, drawing on foundations and technologies from fields like computational linguistics, database systems, and artificial intelligence, but applying these in new and often unconventional ways.
There is already strong commercial interest as well, both in applied research performed by companies like Google or IBM and in industrially deployed applications, especially in the pharmaceutical domain and within governmental intelligence agencies.
In this tutorial, we will give an introduction to the field of text mining and provide participants with the necessary theoretical and technical foundations for an understanding of current and emerging research and technology. Several application examples are then examined in detail, such as the automatic summarization of documents, the extraction of biological knowledge from research papers, and the analysis of users' opinions on products from web sites.
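To give a flavour of the simplest end of this spectrum, the following is a minimal, hypothetical sketch of frequency-based extractive summarization in Python. It is not the method presented in the tutorial; real text mining systems build on full NLP pipelines (tokenization, tagging, parsing, and more).

import re
from collections import Counter

# Illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "that", "was"}

def summarize(text, k=2):
    """Return the k highest-scoring sentences, in document order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)  # document-wide word frequencies

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[t] for t in tokens if t not in STOPWORDS)

    top = set(sorted(sentences, key=score, reverse=True)[:k])
    return [s for s in sentences if s in top]

doc = ("Text mining extracts knowledge from unstructured text. "
       "Most stored data is unstructured text. The weather was nice. "
       "Mining text draws on linguistics, databases, and AI.")
print(summarize(doc))  # the two most 'central' sentences, by word frequency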
The target audience is researchers and practitioners with a database and information systems background, but no experience in natural language processing (NLP) or computational linguistics. The tutorial therefore provides the necessary theoretical foundations and terminology to understand issues and methods from the text mining domain. A strong focus is placed on practical applications of text mining, introducing publicly available tools and resources in the presentation.
Tutorial Slides
Graph Mining Techniques and Their Applications
Sharma Chakravarthy (The University of Texas at Arlington, USA)
Monday, 2006-03-27, 16:30 - 18:00, Club Room
In this tutorial, we argue that graph mining techniques are extremely important and that very little attention has been paid to this research and technology so far. Most currently used mining approaches assume transactional and similar forms of data. However, there are a large number of applications where the relationships among data objects are extremely important. For those applications, the use of conventional approaches results in a loss of information that critically affects the knowledge that is discovered. Mining techniques that preserve and exploit domain characteristics are therefore essential, and graph mining is one such general-purpose technique: it uses an arbitrary graph representation in which complex relationships can be expressed. Graph mining, as opposed to transaction mining (association rules, decision trees, and others), is suitable for mining structural data.
Graph mining is certainly appropriate for mining chemical compounds, proteins, DNA, etc., which have an inherent structure. The complex relationships that exist between the entities that comprise these structures (for example, the double and triple bonds between carbon and other elements in a complex organic compound, or the helical structure of DNA molecules) can be faithfully represented in graph form. A graph representation is a natural choice for representing complex relationships because it is relatively simple to visualize compared to a transactional representation. The various associations between objects in a complex structure are easy to understand and represent graphically. Most importantly, the graph representation preserves the structural information of the original application, which may be lost when the problem is translated or mapped to other representation schemes.
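As a purely illustrative sketch (not taken from the tutorial), a small molecule can be encoded as a labeled graph in a few lines of Python; note how the bond types and connectivity, which a flat transactional encoding would lose, remain explicit:

# Ethene (C2H4): two carbons joined by a double bond,
# each carbon also bonded to two hydrogens.
atoms = {0: "C", 1: "C", 2: "H", 3: "H", 4: "H", 5: "H"}
bonds = [
    (0, 1, "double"),
    (0, 2, "single"), (0, 3, "single"),
    (1, 4, "single"), (1, 5, "single"),
]

# Adjacency-list view that preserves the bond labels.
adj = {n: [] for n in atoms}
for u, v, label in bonds:
    adj[u].append((v, label))
    adj[v].append((u, label))
print(adj[0])  # [(1, 'double'), (2, 'single'), (3, 'single')]

# A frequent-subgraph miner counts labeled substructures; the simplest
# possible pattern query: all carbon-carbon double bonds.
double_cc = [(u, v) for u, v, label in bonds
             if label == "double" and atoms[u] == atoms[v] == "C"]
print(double_cc)  # [(0, 1)]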
In this tutorial, we give an overview of conventional mining techniques, contrast them with the requirements of applications, and introduce graph mining as an alternative approach for a large class of applications. We present details of several graph mining approaches along with their advantages and disadvantages. We then present a few applications that have benefited from graph mining techniques.
Tutorial Slides
Location-aware Query Processing
Mohamed F. Mokbel (University of Minnesota, USA)
Walid G. Aref (Purdue University, USA)
Tuesday, 2006-03-28, 13:30 - 15:00, Seidl Room
The widespread use of cellular phones, handheld devices, and GPS-like technology enables location-aware environments in which virtually all objects are aware of their locations. Location-aware environments are characterized by large numbers of moving objects and moving queries (also known as spatio-temporal queries). Such environments call for new query processing techniques that deal with the continuous movement and frequent updates of both spatio-temporal objects and queries. The goals of this tutorial are to:
(1) Give an in-depth view of supporting location-aware queries as an increasingly interesting area of research,
(2) Present the state-of-the-art techniques for efficient handling of location-aware snapshot queries and location-aware continuous queries,
(3) Motivate the need for integrating location-awareness as a new query processing and optimization dimension, and
(4) Raise several research challenges that need to be addressed towards true and scalable support for location-aware queries in database management systems.
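To make goal (2) concrete, the following is a minimal, hypothetical sketch of incremental evaluation of a continuous range query over moving objects: on each location update, only the changes (+/-) to the answer set are reported, instead of recomputing the full answer. Real systems additionally index both the objects and the queries (e.g., with grids or TPR-trees).

class ContinuousRangeQuery:
    """A standing query: report objects inside a fixed rectangle."""

    def __init__(self, xmin, ymin, xmax, ymax):
        self.box = (xmin, ymin, xmax, ymax)
        self.answer = set()  # object ids currently inside the range

    def contains(self, x, y):
        xmin, ymin, xmax, ymax = self.box
        return xmin <= x <= xmax and ymin <= y <= ymax

    def on_update(self, oid, x, y):
        """Process one location update; emit only +/- deltas."""
        if self.contains(x, y):
            if oid not in self.answer:
                self.answer.add(oid)
                return ("+", oid)  # object entered the range
        elif oid in self.answer:
            self.answer.discard(oid)
            return ("-", oid)      # object left the range
        return None                # unchanged: nothing is re-reported

q = ContinuousRangeQuery(0, 0, 10, 10)
print(q.on_update("car1", 5, 5))   # ('+', 'car1')
print(q.on_update("car1", 6, 6))   # None
print(q.on_update("car1", 20, 6))  # ('-', 'car1')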
Tutorial Slides
Information Quality: Fundamentals, Techniques, and Use
Felix Naumann (Humboldt-Universität zu Berlin, Germany)
Kai-Uwe Sattler (Technische Universität Ilmenau, Germany)
Tuesday, 2006-03-28, 15:30 - 17:00, Seidl Room
Information quality is an important issue for any large-scale information system. The well-known garbage-in-garbage-out problem is compounded in integrated systems, such as data warehouses and federated databases. The purpose of this tutorial is to introduce database researchers and practitioners to the broad scope, the challenges, and the opportunities of information quality research. Beyond the mere cleansing of data errors, there is a strong body of research on IQ models, optimization, and integration. After a brief motivation of information quality considerations, we turn to a definition of IQ that goes beyond the usual "fitness for use" and introduces a range of criteria, such as understandability, timeliness, completeness, accuracy, relevancy, and reputation. We present several basic assessment approaches for individual criteria, describe the aggregation of scores, and point to more comprehensive metrics. Once IQ assessment has shown that the quality should be improved, an important task (and problem) is to clean the data. We give a survey of the corresponding techniques and relate them to the IQ problems they solve. The tutorial concludes with an overview of existing work both in the database area and in other communities.
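As a purely illustrative sketch (the tutorial's assessment and aggregation methods are considerably richer), per-criterion scores in [0, 1] can be computed and combined into one weighted IQ score along these lines; every function, weight, and threshold below is hypothetical:

from datetime import date

def completeness(records, fields):
    """Fraction of non-null field values across all records."""
    total = len(records) * len(fields)
    filled = sum(1 for r in records for f in fields if r.get(f) is not None)
    return filled / total if total else 0.0

def timeliness(last_update, max_age_days=365):
    """Linear decay from 1.0 (fresh) to 0.0 (older than max_age_days)."""
    age = (date.today() - last_update).days
    return max(0.0, 1.0 - age / max_age_days)

def iq_score(scores, weights):
    """Weighted sum of criterion scores; weights should sum to 1."""
    return sum(weights[c] * scores[c] for c in scores)

records = [{"name": "Alice", "email": None},
           {"name": "Bob", "email": "bob@example.org"}]
scores = {"completeness": completeness(records, ["name", "email"]),
          "timeliness": timeliness(date(2024, 6, 1))}
print(iq_score(scores, {"completeness": 0.6, "timeliness": 0.4}))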
Tutorial Slides: PowerPoint format, PDF format
Introduction to Scientific Data and Workflow Management
Bertram Ludäscher (University of California at Davis, USA)
Michael Gertz (University of California at Davis, USA)
Wednesday, 2006-03-29, 13:30 - 17:00, Seidl Room
Scientific data
and workflow management pose unique computer science research
challenges, in particular for the database community. While
traditional techniques, developed mostly for commercial/business
applications, remain an important ingredient for scientific
data and workflow management, it has been recognized that
the volume and complexity of data and tasks in scientific
application domains require the development of new technologies
and tools.
The overall goal
of this tutorial is to provide an introduction to several
important areas of research and development in scientific
data and workflow management. Given the breadth of the topic
area, the tutorial will inevitably be mostly a high-level
introduction to the various subfields in scientific data and
workflow management. However, some concepts, techniques, and
applications will be presented in more detail. The tutorial
is organized into the following modules:
We first provide
a high-level overview of the challenges and basic techniques
of scientific data and workflow management, such as different
types of scientific data and metadata, data management processes,
scientific workflows, and the role of data provenance and
knowledge representation (ontologies).
The second module
focuses on streaming scientific data that is processed on-line,
often in real-time. We present the fundamental models, techniques,
and architectures underlying data processing pipelines that
deal with different types of streaming scientific data. A
particular focus is on spatio-temporal data originating from
sensor networks and remote-sensing equipment such as satellites
and telescopes.
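As a purely illustrative sketch of the pipeline idea (not a tutorial artifact), a sliding-window aggregate over a stream of sensor readings can be written as a generator; real stream engines add richer windowing semantics, scheduling, and load shedding:

from collections import deque

def sliding_average(readings, window=3):
    """Filter out missing values, then yield the mean of the last `window` readings."""
    buf = deque(maxlen=window)  # oldest reading falls out automatically
    for value in readings:
        if value is None:       # drop malformed or missing sensor values
            continue
        buf.append(value)
        yield sum(buf) / len(buf)

stream = [21.0, 21.4, None, 22.1, 25.9, 22.3]  # e.g., temperature readings
print([round(avg, 2) for avg in sliding_average(stream)])
# [21.0, 21.2, 21.5, 23.13, 23.43]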
The concept of "scientific workflow" covers a wide range of applications, ranging from experiment and workflow design to the execution, monitoring, archival, and reuse of data management and analysis pipelines. For example, aspects such as automated metadata and provenance management need to be integrated into scientific workflow systems. Such mechanisms facilitate data analysis and interpretation as well as workflow reuse.
The tutorial is aimed at researchers and scientists from industry, government, and academia who are interested in the management and analysis of large-scale scientific data sets, and who want to get an overview of new models, techniques, and architectures to effectively manage scientific data. We use illustrative examples from different scientific application domains, including the life sciences, geosciences, and cosmology.
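To illustrate the workflow notion itself, here is a minimal, hypothetical sketch of a scientific workflow as a DAG of tasks run in dependency order; production workflow systems add actors, data typing, provenance recording, monitoring, and distributed execution:

from graphlib import TopologicalSorter  # Python 3.9+

# Each task name maps to the set of tasks it depends on.
workflow = {
    "fetch_data": set(),
    "clean_data": {"fetch_data"},
    "align":      {"clean_data"},
    "analyze":    {"align"},
    "visualize":  {"analyze"},
}

# Placeholder task bodies; a real system would invoke tools and record provenance.
tasks = {name: (lambda n=name: print("running", n)) for name in workflow}

for name in TopologicalSorter(workflow).static_order():
    tasks[name]()  # each task runs only after all of its dependencies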
Tutorial Slides:
I. Overview on Scientific Data Management
II. From Conventional to Scientific Data Integration
III. From Scientific Data Formats to Data Stream Processing
IV. Introduction to Scientific Workflows
EDBT 2006 Panel
Data Management in the Social Web
Karl Aberer (School of Computer and Communications Science, EPFL, Switzerland)
Wednesday, 2006-03-29, 13:30 - 15:00, Ball Room
Panelists: Stefan Decker (DERI, NUI Galway, Ireland), Wolfgang Kellerer (Docomo Eurolabs, Germany), Peter Triantafillou (University of Patras, Greece), John Mylopoulos (University of Trento, Italy)
An interesting observation is that the most successful applications on the Web incorporate some sort of social mechanism. This is true for commercial success stories, such as eBay with its reputation mechanism, Amazon with its recommendation tool, and Google with PageRank, a recommendation-based ranking algorithm. Peer-to-peer file sharing and photo sharing are other recent examples where the essence of the application consists of social interactions. In these applications, large numbers of anonymous participants interact, so that mechanisms for social control become increasingly important. This explains the recent interest in reputation-based trust management. The same issues will emerge when large numbers of services are deployed over the Web through Web services and Grid computing technology.
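For readers unfamiliar with the one concrete algorithm named above, here is a minimal sketch of PageRank by power iteration; the tiny graph and the parameters are illustrative, not Google's actual implementation:

def pagerank(links, damping=0.85, iterations=50):
    """links: node -> list of nodes it points to."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u, outs in links.items():
            targets = outs if outs else nodes  # dangling node: spread evenly
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        rank = new
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))  # 'c' ranks highest: most incoming links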
The goal of the panel is to reflect on these developments, identify important classes of applications involving social interaction that require data management support and information management capabilities, and project from there the potential future impact on the field of data management.
A characteristic property of applications involving social interactions is the large number of participants whose behavior needs to be tracked and analyzed. This implies a strong need for scalable data management capabilities. Will this require novel approaches in the area of data management, or will existing technology be sufficient? The past has shown that new applications frequently open new avenues in data management research. Examples are semi-structured data management, which responded to the need to manage data on the Web, and stream data management, which responded to the need to manage data in networked environments and from sensors.
Recently, in the context of the Semantic Web, social mechanisms for semantic tagging, so-called folksonomies, have attracted considerable interest. There, the creation and alignment of structured data annotations for Web content becomes a social activity. As with collaborative filtering in information retrieval, the social context is exploited to deal with the semantics problem, namely the proper interpretation of data. Is this a promising approach for dealing with one of the hardest problems in data management, namely the semantic heterogeneity of structured data?
In social settings uncertainty is omnipresent, since the intentions and interpretations of autonomous participants cannot be made completely transparent. This also holds true when it comes to the exchange and shared use of data. Is it possible that the database community's recent growing interest in applying probabilistic techniques to data management is also rooted in the need for appropriate tools for dealing with the uncertainty that results from interaction in a social context?
Finally, from a more general perspective, new requirements on data management often initiate new directions of interdisciplinary research for the field. Will the need to provide solutions for data management on the Social Web lead database researchers to look into areas such as agent technologies, game theory, or microeconomics to better understand the mechanics of social interactions and their impact on data management solutions?
These are some of the questions that we will pose to the panel in order to identify interesting directions for future data management research.