EDBT 2006 Tutorials
Participants of EDBT 2006 can take part in the tutorials free of charge.
Introduction to Text Mining
René Witte (Universität Karlsruhe, Germany)
Monday, 2006-03-27, 11:30 - 13:00 & 14:30 - 16:00, Club Room
Do you have a lack of information? Or do you rather feel overwhelmed by the sheer amount of available (online) content, like emails, news, web pages, and electronic documents? The rather young field of Text Mining developed from the observation that most knowledge today - by common estimates more than 80% of stored data - is hidden within documents written in natural language and thus cannot be automatically processed by traditional information systems.
Text Mining, "also known as intelligent text analysis, text data mining or knowledge-discovery in text (KDT), refers generally to the process of extracting interesting and non-trivial information and knowledge from unstructured text." Text Mining is a highly interdisciplinary field, drawing on foundations and technologies from fields like computational linguistics, database systems, and artificial intelligence, but applying these in new and often unconventional ways.
There is already strong commercial interest as well, both in applied research performed by companies like Google or IBM and in industrially deployed applications, especially in the pharmaceutical domain and within governmental intelligence agencies.
In this tutorial, we will give an introduction to the field of text mining and provide participants with the necessary theoretical and technical foundations for an understanding of current and emerging research and technology. Several application examples are then examined in detail, such as the automatic summarization of documents, the extraction of biological knowledge from research papers, and the analysis of users' opinions on products from web sites.
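To give a flavour of the simplest end of this spectrum, the following is a minimal, hypothetical sketch of frequency-based extractive summarization in Python. It is not the method presented in the tutorial; real text mining systems build on full NLP pipelines (tokenization, tagging, parsing, and more).

import re
from collections import Counter

# Illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "that", "was"}

def summarize(text, k=2):
    """Return the k highest-scoring sentences, in document order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)  # document-wide word frequencies

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[t] for t in tokens if t not in STOPWORDS)

    top = set(sorted(sentences, key=score, reverse=True)[:k])
    return [s for s in sentences if s in top]

doc = ("Text mining extracts knowledge from unstructured text. "
       "Most stored data is unstructured text. The weather was nice. "
       "Mining text draws on linguistics, databases, and AI.")
print(summarize(doc))  # the two most 'central' sentences, by word frequency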
The target audience is researchers and practitioners with a database and information systems background, but no experience in natural language processing (NLP) or computational linguistics. The tutorial therefore provides the necessary theoretical foundations and terminology to understand issues and methods from the text mining domain. A strong focus is placed on practical applications of text mining, introducing publicly available tools and resources in the presentation.
Tutorial Slides
Graph Mining Techniques and Their Applications
Sharma Chakravarthy (The University of Texas at Arlington, USA)
Monday, 2006-03-27, 16:30 - 18:00, Club Room
In this tutorial, we argue that graph mining techniques are extremely important and that very little attention has been paid to this research and technology so far. Most currently used mining approaches assume transactional and similar forms of data. However, there are a large number of applications where the relationships among data objects are extremely important. For those applications, the use of conventional approaches results in a loss of information that critically affects the knowledge that is discovered. Mining techniques that preserve and exploit domain characteristics are therefore essential, and graph mining is one such general-purpose technique: it uses an arbitrary graph representation in which complex relationships can be expressed. Graph mining, as opposed to transaction mining (association rules, decision trees, and others), is suitable for mining structural data.
Graph mining is certainly appropriate for mining chemical compounds, proteins, DNA, etc., which have an inherent structure. The complex relationships that exist between the entities that comprise these structures (for example, the double and triple bonds between carbon and other elements in a complex organic compound, or the helical structure of DNA molecules) can be faithfully represented in graph form. A graph representation is a natural choice for representing complex relationships because it is relatively simple to visualize compared to a transactional representation. The various associations between objects in a complex structure are easy to understand and represent graphically. Most importantly, the graph representation preserves the structural information of the original application, which may be lost when the problem is translated or mapped to other representation schemes.
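As a purely illustrative sketch (not taken from the tutorial), a small molecule can be encoded as a labeled graph in a few lines of Python; note how the bond types and connectivity, which a flat transactional encoding would lose, remain explicit:

# Ethene (C2H4): two carbons joined by a double bond,
# each carbon also bonded to two hydrogens.
atoms = {0: "C", 1: "C", 2: "H", 3: "H", 4: "H", 5: "H"}
bonds = [
    (0, 1, "double"),
    (0, 2, "single"), (0, 3, "single"),
    (1, 4, "single"), (1, 5, "single"),
]

# Adjacency-list view that preserves the bond labels.
adj = {n: [] for n in atoms}
for u, v, label in bonds:
    adj[u].append((v, label))
    adj[v].append((u, label))
print(adj[0])  # [(1, 'double'), (2, 'single'), (3, 'single')]

# A frequent-subgraph miner counts labeled substructures; the simplest
# possible pattern query: all carbon-carbon double bonds.
double_cc = [(u, v) for u, v, label in bonds
             if label == "double" and atoms[u] == atoms[v] == "C"]
print(double_cc)  # [(0, 1)]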
In this tutorial, we give an overview of conventional mining techniques, contrast them with the requirements of applications, and introduce graph mining as an alternative approach for a large class of applications. We present details of several graph mining approaches along with their advantages and disadvantages. We then present a few applications that have benefited from graph mining techniques.
Tutorial Slides
Location-aware Query Processing
Mohamed F. Mokbel (University of Minnesota, USA)
Walid G. Aref (Purdue University, USA)
Tuesday, 2006-03-28, 13:30 - 15:00, Seidl Room
The widespread use of cellular phones, handheld devices, and GPS-like technology enables location-aware environments in which virtually all objects are aware of their locations. Location-aware environments are characterized by large numbers of moving objects and moving queries (also known as spatio-temporal queries). Such environments call for new query processing techniques that deal with the continuous movement and frequent updates of both spatio-temporal objects and queries. The goals of this tutorial are to:
(1) Give an in-depth view of supporting location-aware queries as an increasingly interesting area of research,
(2) Present the state-of-the-art techniques for efficient handling of location-aware snapshot queries and location-aware continuous queries,
(3) Motivate the need for integrating location-awareness as a new query processing and optimization dimension, and
(4) Raise several research challenges that need to be addressed towards true and scalable support for location-aware queries in database management systems.
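To make goal (2) concrete, the following is a minimal, hypothetical sketch of incremental evaluation of a continuous range query over moving objects: on each location update, only the changes (+/-) to the answer set are reported, instead of recomputing the full answer. Real systems additionally index both the objects and the queries (e.g., with grids or TPR-trees).

class ContinuousRangeQuery:
    """A standing query: report objects inside a fixed rectangle."""

    def __init__(self, xmin, ymin, xmax, ymax):
        self.box = (xmin, ymin, xmax, ymax)
        self.answer = set()  # object ids currently inside the range

    def contains(self, x, y):
        xmin, ymin, xmax, ymax = self.box
        return xmin <= x <= xmax and ymin <= y <= ymax

    def on_update(self, oid, x, y):
        """Process one location update; emit only +/- deltas."""
        if self.contains(x, y):
            if oid not in self.answer:
                self.answer.add(oid)
                return ("+", oid)  # object entered the range
        elif oid in self.answer:
            self.answer.discard(oid)
            return ("-", oid)      # object left the range
        return None                # unchanged: nothing is re-reported

q = ContinuousRangeQuery(0, 0, 10, 10)
print(q.on_update("car1", 5, 5))   # ('+', 'car1')
print(q.on_update("car1", 6, 6))   # None
print(q.on_update("car1", 20, 6))  # ('-', 'car1')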
Tutorial Slides
Information Quality: Fundamentals, Techniques, and Use
Felix Naumann (Humboldt-Universität zu Berlin, Germany)
Kai-Uwe Sattler (Technische Universität Ilmenau, Germany)
Tuesday, 2006-03-28, 15:30 - 17:00, Seidl Room
Information quality is an important issue for any large-scale information system. The well-known garbage-in-garbage-out problem is compounded in integrated systems, such as data warehouses and federated databases. The purpose of this tutorial is to introduce database researchers and practitioners to the broad scope, the challenges, and the opportunities of information quality research. Beyond the mere cleansing of data errors, there is a strong body of research on IQ models, optimization, and integration. After a brief motivation of information quality considerations, we turn to a definition of IQ that goes beyond the usual "fitness for use" and introduces a range of criteria, such as understandability, timeliness, completeness, accuracy, relevancy, and reputation. We present several basic assessment approaches for individual criteria, describe the aggregation of scores, and point to more comprehensive metrics. Once IQ assessment has shown that the quality should be improved, an important task (and problem) is to clean the data. We give a survey of the corresponding techniques and relate them to the IQ problems they solve. The tutorial concludes with an overview of existing work both in the database area and in other communities.
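As a purely illustrative sketch (the tutorial's assessment and aggregation methods are considerably richer), per-criterion scores in [0, 1] can be computed and combined into one weighted IQ score along these lines; every function, weight, and threshold below is hypothetical:

from datetime import date

def completeness(records, fields):
    """Fraction of non-null field values across all records."""
    total = len(records) * len(fields)
    filled = sum(1 for r in records for f in fields if r.get(f) is not None)
    return filled / total if total else 0.0

def timeliness(last_update, max_age_days=365):
    """Linear decay from 1.0 (fresh) to 0.0 (older than max_age_days)."""
    age = (date.today() - last_update).days
    return max(0.0, 1.0 - age / max_age_days)

def iq_score(scores, weights):
    """Weighted sum of criterion scores; weights should sum to 1."""
    return sum(weights[c] * scores[c] for c in scores)

records = [{"name": "Alice", "email": None},
           {"name": "Bob", "email": "bob@example.org"}]
scores = {"completeness": completeness(records, ["name", "email"]),
          "timeliness": timeliness(date(2024, 6, 1))}
print(iq_score(scores, {"completeness": 0.6, "timeliness": 0.4}))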
Tutorial Slides: PowerPoint format, PDF format
Introduction to Scientific Data and Workflow Management
Bertram Ludäscher (University of California at Davis, USA)
Michael Gertz (University of California at Davis, USA)
Wednesday, 2006-03-29, 13:30 - 17:00, Seidl Room
Scientific data
and workflow management pose unique computer science research
challenges, in particular for the database community. While
traditional techniques, developed mostly for commercial/business
applications, remain an important ingredient for scientific
data and workflow management, it has been recognized that
the volume and complexity of data and tasks in scientific
application domains require the development of new technologies
and tools.
The overall goal
of this tutorial is to provide an introduction to several
important areas of research and development in scientific
data and workflow management. Given the breadth of the topic
area, the tutorial will inevitably be mostly a high-level
introduction to the various subfields in scientific data and
workflow management. However, some concepts, techniques, and
applications will be presented in more detail. The tutorial
is organized into the following modules:
We first provide
a high-level overview of the challenges and basic techniques
of scientific data and workflow management, such as different
types of scientific data and metadata, data management processes,
scientific workflows, and the role of data provenance and
knowledge representation (ontologies).
The second module
focuses on streaming scientific data that is processed on-line,
often in real-time. We present the fundamental models, techniques,
and architectures underlying data processing pipelines that
deal with different types of streaming scientific data. A
particular focus is on spatio-temporal data originating from
sensor networks and remote-sensing equipment such as satellites
and telescopes.
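As a purely illustrative sketch of the pipeline idea (not a tutorial artifact), a sliding-window aggregate over a stream of sensor readings can be written as a generator; real stream engines add richer windowing semantics, scheduling, and load shedding:

from collections import deque

def sliding_average(readings, window=3):
    """Filter out missing values, then yield the mean of the last `window` readings."""
    buf = deque(maxlen=window)  # oldest reading falls out automatically
    for value in readings:
        if value is None:       # drop malformed or missing sensor values
            continue
        buf.append(value)
        yield sum(buf) / len(buf)

stream = [21.0, 21.4, None, 22.1, 25.9, 22.3]  # e.g., temperature readings
print([round(avg, 2) for avg in sliding_average(stream)])
# [21.0, 21.2, 21.5, 23.13, 23.43]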
The concept of "scientific workflow" covers a wide range of applications, ranging from experiment and workflow design to the execution, monitoring, archival, and reuse of data management and analysis pipelines. For example, aspects such as automated metadata and provenance management need to be integrated into scientific workflow systems. Such mechanisms facilitate data analysis and interpretation as well as workflow reuse.
The tutorial is aimed at researchers and scientists from industry, government, and academia who are interested in the management and analysis of large-scale scientific data sets, and who want to get an overview of new models, techniques, and architectures to effectively manage scientific data. We use illustrative examples from different scientific application domains, including the life sciences, geosciences, and cosmology.
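To illustrate the workflow notion itself, here is a minimal, hypothetical sketch of a scientific workflow as a DAG of tasks run in dependency order; production workflow systems add actors, data typing, provenance recording, monitoring, and distributed execution:

from graphlib import TopologicalSorter  # Python 3.9+

# Each task name maps to the set of tasks it depends on.
workflow = {
    "fetch_data": set(),
    "clean_data": {"fetch_data"},
    "align":      {"clean_data"},
    "analyze":    {"align"},
    "visualize":  {"analyze"},
}

# Placeholder task bodies; a real system would invoke tools and record provenance.
tasks = {name: (lambda n=name: print("running", n)) for name in workflow}

for name in TopologicalSorter(workflow).static_order():
    tasks[name]()  # each task runs only after all of its dependencies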
Tutorial Slides:
I. Overview on Scientific Data Management
II. From Conventional to Scientific Data Integration
III. From Scientific Data Formats to Data Stream Processing
IV. Introduction to Scientific Workflows
EDBT 2006 Panel
Data Management in the Social Web
Karl Aberer (School of Computer and Communications Science, EPFL, Switzerland)
Wednesday, 2006-03-29, 13:30 - 15:00, Ball Room
Panelists: Stefan Decker (DERI, NUI Galway, Ireland), Wolfgang Kellerer (Docomo Eurolabs, Germany), Peter Triantafillou (University of Patras, Greece), John Mylopoulos (University of Trento, Italy)
An interesting observation is that the most successful applications on the Web incorporate some sort of social mechanism. This is true for commercial success stories, such as eBay with its reputation mechanism, Amazon with its recommendation tool, and Google with PageRank, a recommendation-based ranking algorithm. Peer-to-peer file sharing and photo sharing are other recent examples where the essence of the application consists of social interactions. In these applications, large numbers of anonymous participants interact, so that mechanisms for social control become increasingly important. This explains the recent interest in reputation-based trust management. The same issues will emerge when large numbers of services are deployed over the Web through Web services and Grid computing technology.
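For readers unfamiliar with the one concrete algorithm named above, here is a minimal sketch of PageRank by power iteration; the tiny graph and the parameters are illustrative, not Google's actual implementation:

def pagerank(links, damping=0.85, iterations=50):
    """links: node -> list of nodes it points to."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u, outs in links.items():
            targets = outs if outs else nodes  # dangling node: spread evenly
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        rank = new
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))  # 'c' ranks highest: most incoming links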
The goal of the panel is to reflect on these developments, identify important classes of applications involving social interaction that require data management support and information management capabilities, and project from there the potential future impact on the field of data management.
A characteristic property of applications involving social interactions is the large number of participants whose behavior needs to be tracked and analyzed. This implies a strong need for scalable data management capabilities. Will this require novel approaches in the area of data management, or will existing technology be sufficient? The past has shown that new applications frequently open new avenues in data management research. Examples are semi-structured data management, which responded to the need to manage data on the Web, and stream data management, which responded to the need to manage data in networked environments and from sensors.
Recently, in the context of the Semantic Web, social mechanisms for semantic tagging, so-called folksonomies, have attracted considerable interest. There, the creation and alignment of structured data annotations for Web content becomes a social activity. As with collaborative filtering in information retrieval, the social context is exploited to deal with the semantics problem, namely the proper interpretation of data. Is this a promising approach for dealing with one of the hardest problems in data management, namely the semantic heterogeneity of structured data?
In social settings uncertainty is omnipresent, since the intentions and interpretations of autonomous participants cannot be made completely transparent. This also holds true when it comes to the exchange and shared use of data. Is it possible that the database community's recent growing interest in applying probabilistic techniques to data management is also rooted in the need for appropriate tools for dealing with the uncertainty that results from interaction in a social context?
Finally, from a more general perspective, new requirements on data management often initiate new directions of interdisciplinary research for the field. Will the need to provide solutions for data management on the Social Web lead database researchers to look into areas such as agent technologies, game theory, or microeconomics to better understand the mechanics of social interactions and their impact on data management solutions?
These are some of the questions that we will pose to the panel in order to identify interesting directions for future data management research.