Current commercial and academic OLAP tools do not process XML data that contains XLink. Aiming at overcoming this issue, this paper proposes an analytical system composed by LMDQL, an analytical query language. Also, the XLDM metamodel is given to model cubes of XML documents with XLink and to deal with syntactic, semantic and structural heterogeneities commonly found in XML documents. As current W3C query languages for navigating in XML documents do not support XLink, XLPath is discussed in this article to provide features for the LMDQL query processing. A prototype system enabling the analytical processing of XML documents that use XLink is also detailed. This prototype includes a driver, named sql2xquery, which performs the mapping of SQL queries into XQuery. To validate the proposed system, a case study and its performance evaluation are presented to analyze the impact of analytical processing over XML/XLink documents.; FAPESP; FACEPE; CAPES; CNPq; INEP; FINEP
XML has become an important medium for data exchange, and is frequently used as an interface to - i.e. a view of - a relational database. Although lots of work have been done on querying relational databases through XML views, the problem of updating relational databases through XML views has not received much attention. In this work, we give the rst steps towards solving this problem. Using query trees to capture the notions of selection, projection, nesting, grouping, and heterogeneous sets found throughout most XML query languages, we show how XML views expressed using query trees can be mapped to a set of corresponding relational views. Thus, we transform the problem of updating relational databases through XML views into a classical problem of updating relational databases through relational views. We then show how updates on the XML view are mapped to updates on the corresponding relational views. Existing work on updating relational views can then be leveraged to determine whether or not the relational views are updatable with respect to the relational updates, and if so, to translate the updates to the underlying relational database. Since query trees are a formal characterization of view de nition queries, they are not well suited for end-users. We then investigate how a subset of XQuery can be used as a top level language...
O uso da XML (Extensible Markup Language) em aplicações envolvendo bancos de dados vem se consolidando nos últimos dois anos. Os principais sistemas de gerenciamento de banco de dados já incorporam essa tecnologia em suas mais recentes versões. Dentre diversas aplicações destaca-se a publicação de dados relacionais em visões XML. Diferentemente da XML, o Modelo Temporal de Versões (TVM) não apresenta suporte entre os bancos de dados atuais. Esse modelo, que une características temporais com o conceito de versão para projetar aplicações orientadas a objetos, precisa ser mapeado para ser adequadamente controlado em um SGBD (Sistema de Gerenciamento de Banco de Dados). Cumprida essa etapa, aplicações do TVM também podem gerar visões XML. Nesse trabalho é inicialmente apresentada uma forma de representar instâncias de aplicações do TVM em um formato XML. Os documentos definidos a partir desse formato de representação são utilizados como base para consultas. Em seguida, é proposta uma extensão de uma linguagem de consulta XML visando proporcionar recursos para a recuperação de informações temporais e de versão representadas em documentos XML. São definidas funções temporais e versionadas que são incorporadas à linguagem base. O funcionamento das funções e a especificação de consultas temporais versionadas são descritos em detalhes no decorrer do trabalho. Uma ferramenta que implementa a linguagem base é utilizada na realização de testes visando validar as novas funções.; The use of the XML in applications involving databases has grown in the last two years. Recent versions of the main database management systems already incorporate this technology. Publishing relational data in XML can be identified as one of the different applications of XML. The Temporal Version Model (TVM) has no support in current databases. This model matches temporal features with the version concept to project object-oriented applications and needs to be mapped to be managed in a DBMS (Database Management System). Once this mapping is achieved...
The proposed work explores the application of temporal and versioning management techniques to the content of XML documents. Recently, many methods have been proposed to address this problem. These methods differ in many aspects, such as which XML objects should be temporalized, thus possessing each one a set of advantages and limitations. We present the TVX model, whose goal is to combine these techniques into a single approach; its united use should supply great flexibility in the temporal treatment of XML documents. This will be accomplished through a high-level data model which can be easily implemented in XML format, along with XQuery-like query and data manipulation languages to handle the temporal XML documents. We also present a case study considering a historic query module over the text of the Brazilian Constitution, that demonstrates the employment of the TVX model in a real-life application.
Various programming languages allow the construction of structure-shy programs. Such programs are defined generically for many different datatypes and only specify specific behavior for a few relevant subtypes. Typical examples are XML query languages that allow selection of subdocuments without exhaustively specifying intermediate element tags. Other examples are languages and libraries for polytypic or strategic functional programming and for adaptive object-oriented programming.
In this paper, we present an algebraic approach to transformation of declarative structure-shy programs, in particular for strategic functions and XML queries. We formulate a rich set of algebraic laws, not just for transformation of structure-shy programs, but also for their conversion into structure-sensitive programs and vice versa. We show how subsets of these laws can be used to construct effective rewrite systems for specialization, generalization, and optimization of structure-shy programs. We present a type-safe encoding of these rewrite systems in Haskell which itself uses strategic functional programming techniques. We discuss the application of these rewrite systems for XPath query optimization and for query migration in the context of schema evolution.
A mineração de padrões freqüentes em dados representados por estruturas mais complexas como árvores e grafos vêm crescendo muito nos últimos tempos. Entre as razões para esse crescimento está o fato do padrão arborescente ou em forma de grafo possuir mais informações do que os padrões seqüenciais, e na possibilidade de aplicação desse tipo de mineração em várias áreas como XML Mining, Web Mining e Bioinformática. Um problema que ocorre na mineração de padrões em geral é a grande quantidade de padrões gerados; sendo que muitos deles nem são do interesse do usuário. A diminuição da quantidade de padrões gerados pode ser feita restringido o tipo de padrão produzido através de especificações do usuário. Mesmo incorporando restrições no processo de mineração, a quantidade de padrões arborescentes minerados é grande, o que torna necessário uma ferramenta de análise dos padrões, possibilitando ao usuário especificar consultas para extrair da massa de padrões minerados aqueles que satisfazem os critérios de seleção da consulta. A mineração de padrões com restrição, visa obter como resultado de um processo de mineração apenas os padrões de real interesse do usuário. Uma restrição sobre padrões será representada de acordo com a estrutura dos mesmos. Para a mineração de padrões seqüencias uma forma de representá-la seria através de expressões regulares...
Fonte: Universidade da CorunhaPublicador: Universidade da Corunha
Tipo: Tese de Doutorado
Relevância na Pesquisa
[Abstract] The popularity of the eXtensible Markup Language (XML) has been continuously growing since its first introduction, being today acknowledged as the de facto standard for semi-structured data representation and data exchange on the World Wide Web. In this scenario, several query languages were proposed to exploit the expressiveness of XML data, as well as systems to provide an eficient support. At the same time, as research in compression became more and more relevant, works also focused their efforts on studying new approaches to provide eficient solutions, using the minimum amount of space. Today, however, there is a lack of practical available tools that join both eficient query support, and minimum space requirements.
In this thesis we address this problem, and propose a new approach for storing, processing and querying XML documents in time and space eficient way, by specially focusing on XPath queries. We have developed a new compressed selfindexed representation of XML documents that obtains compression ratios about 30%-40%, over which a query module providing eficient XPath query evaluation has also been developed. As a whole, both parts make up a complete system, we called XXS, for the eficient evaluation of XPath queries over compressed self-indexed XML documents. Experimental results show the outstanding performance of our proposal...
Computation with large-scale heterogeneous data typically requires universal traversal to search for all occurrences of a substructure that matches a possibly complex search pattern, whose context may be different in different places within the data. Both aspects cause difficulty for existing general-purpose programming languages, because these languages are designed for homogeneous data and have problems typing the different substructures in heterogeneous data, and the complex patterns to match with the substructures. Programmers either have to hard-code the structures and search patterns, preventing programs from being reusable and scalable, or
have to use low-level untyped programming or programming with special-purpose query languages, opening the door to type mismatches that cause a high risk of program correctness and security problems.
This thesis invents the concept of pattern structures, and proposes a general solution to the above problems - a programming technique using pattern structures. In this solution, well-typed pattern structures are
defined to represent complex search patterns, and pattern searching over heterogeneous data is programmed with pattern parameters, in a statically-typed language that supports first-class typing of structures and patterns. The resulting programs are statically-typed...
Artículo de publicación ISI; The eXtensible Markup Language (XML) is acknowledged as the de facto standard for semistructured data
representation and data exchange on the Web and many other scenarios. A well-known shortcoming of XML
is its verbosity, which increases manipulation, transmission, and processing costs. Various structure-blind
and structure-conscious compression techniques can be applied to XML, and some are even access-friendly,
meaning that the documents can be efficiently accessed in compressed form. Direct access is necessary to
implement the query languages XPath and XQuery, which are the standard ones to exploit the expressiveness
of XML. While a good deal of theoretical and practical proposals exist to solve XPath/XQuery operations on
XML, only a few ones are well integrated with a compression format that supports the required access
operations on the XML data. In this work we go one step further and design a compression format for XML
collections that boosts the performance of XPath queries on the data. This is done by designing compressed
representations of the XML data that support some complex operations apart from just accessing the data,
and those are exploited to solve key components of the XPath queries. Our system...
Various programming languages allow the construction of structure-shy programs. Such programs are defined generically for many different datatypes and only specify specific behavior for a few relevant subtypes. Typical examples are XML query languages that allow selection of subdocuments without exhaustively specifying intermediate element tags. Other examples are languages and libraries for polytypic or strategic functional programming and for adaptive object-oriented programming. In this paper, we present an algebraic approach to transformation of declarative structure-shy programs, in particular for strategic functions and XML queries. We formulate a rich set of algebraic laws, not just for transformation of structure-shy programs, but also for their conversion into structure-sensitive programs and vice versa. We show how subsets of these laws can be used to construct effective rewrite systems for specialization, generalization, and optimization of structure-shy programs. We present a type-safe encoding of these rewrite systems in Haskell which itself uses strategic functional programming techniques.
Various query languages have been proposed to extract and restructure
information in XML documents. These languages, usually claiming to be
declarative, mainly consider the conjunctive relationships among data elements.
In order to present the operations where the hierarchical and the disjunctive
relationships need to be considered, such as restructuring hierarchy and
handling heterogeneity, the programs in these languages often exhibit a
procedural style and thus the declarativeness in them is not so prominent as in
conventional query languages like SQL.
In this paper, we propose a declarative pattern-based functional XML query
language named XML Tree Query (XTQ). XTQ adopts expressive composite patterns
to present data extraction, meanwhile establishing the conjunctive, the
disjunctive and the hierarchical relationships among data elements. It uses the
matching terms, a composite structure of the variables bound to the matched
data elements, to present a global sketch of the extracted data, and develops a
deductive restructuring mechanism of matching terms to indicate data
transformation, especially for restructuring hierarchy and handling
heterogeneity. Based on matching terms, XTQ employs a coherent approach to
function declaration and invocation to consistently extract and construct
composite data structure...
In contrast to XML query languages as e.g. XPath which require knowledge on
the query language as well as on the document structure, keyword search is open
to anybody. As the size of XML sources grows rapidly, the need for efficient
search indices on XML data that support keyword search increases. In this
paper, we present an approach of XML keyword search which is based on the DAG
of the XML data, where repeated substructures are considered only once, and
therefore, have to be searched only once. As our performance evaluation shows,
this DAG-based extension of the set intersection search algorithm, , can
lead to search times that are on large documents more than twice as fast as the
search times of the XML-based approach. Additionally, we utilize a smaller
index, i.e., we consume less main memory to compute the results.
XML is becoming the most relevant new standard for data representation and
exchange on the WWW. Novel languages for extracting and restructuring the XML
content have been proposed, some in the tradition of database query languages
(i.e. SQL, OQL), others more closely inspired by XML. No standard for XML query
language has yet been decided, but the discussion is ongoing within the World
Wide Web Consortium and within many academic institutions and Internet-related
major companies. We present a comparison of five, representative query
languages for XML, highlighting their common features and differences.; Comment: TeX v3.1415, 17 pages, 6 figures, to be published in ACM Sigmod
Record, March 2000
This thesis describes the theoretical and practical foundations of a system
for the static analysis of XML processing languages. The system relies on a
fixpoint temporal logic with converse, derived from the mu-calculus, where
models are finite trees. This calculus is expressive enough to capture regular
tree types along with multi-directional navigation in trees, while having a
single exponential time complexity. Specifically the decidability of the logic
is proved in time 2^O(n) where n is the size of the input formula.
Major XML concepts are linearly translated into the logic: XPath navigation
and node selection semantics, and regular tree languages (which include DTDs
and XML Schemas). Based on these embeddings, several problems of major
importance in XML applications are reduced to satisfiability of the logic.
These problems include XPath containment, emptiness, equivalence, overlap,
coverage, in the presence or absence of regular tree type constraints, and the
static type-checking of an annotated query.
The focus is then given to a sound and complete algorithm for deciding the
logic, along with a detailed complexity analysis, and crucial implementation
techniques for building an effective solver. Practical experiments using a full
implementation of the system are presented. The system appears to be efficient
in practice for several realistic scenarios.
The main application of this work is a new class of static analyzers for
programming languages using both XPath expressions and XML type annotations
(input and output). Such analyzers allow to ensure at compile-time valuable
properties such as type-safety and optimizations...
Extensible Markup Language (XML) is a simple, very flexible text format
derived from SGML. Originally designed to meet the challenges of large-scale
electronic publishing, XML is also playing an increasingly important role in
the exchange of a wide variety of data on the Web and elsewhere. XPath language
is the result of an effort to provide address parts of an XML document. In
support of this primary purpose, it becomes in a query language against an XML
document. In this paper we present a proposal for the implementation of the
XPath language in logic programming. With this aim we will describe the
representation of XML documents by means of a logic program. Rules and facts
can be used for representing the document schema and the XML document itself.
In particular, we will present how to index XML documents in logic programs:
rules are supposed to be stored in main memory, however facts are stored in
secondary memory by using two kind of indexes: one for each XML tag, and other
for each group of terminal items. In addition, we will study how to query by
means of the XPath language against a logic program representing an XML
document. It evolves the specialization of the logic program with regard to the
XPath expression. Finally, we will also explain how to combine the indexing and
the top-down evaluation of the logic program. To appear in Theory and Practice
of Logic Programming (TPLP)"
Macro tree transducers (mtts) are a useful formal model for XML query and
transformation languages. In this paper one of the fundamental decision
problems on translations, namely the "translation membership problem" is
studied for mtts. For a fixed translation, the translation membership problem
asks whether a given input/output pair is element of the translation. For
call-by-name mtts this problem is shown to be NP-complete. The main result is
that translation membership for call-by-value mtts is in polynomial time. For
several extensions, such as addition of regular look-ahead or the
generalization to multi-return mtts, it is shown that translation membership
still remains in PTIME.; Comment: 9 pages, appeared at International Workshop on Programming Language
Techniques for XML (PLAN-X 2009)
XML database query languages have been studied extensively, but XML database
updates have received relatively little attention, and pose many challenges to
language design. We are developing an XML update language called Flux, which
stands for FunctionaL Updates for XML, drawing upon ideas from functional
programming languages. In prior work, we have introduced a core language for
Flux with a clear operational semantics and a sound, decidable static type
system based on regular expression types.
Our initial proposal had several limitations. First, it lacked support for
recursive types or update procedures. Second, although a high-level source
language can easily be translated to the core language, it is difficult to
propagate meaningful type errors from the core language back to the source.
Third, certain updates are well-formed yet contain path errors, or ``dead''
subexpressions which never do any useful work. It would be useful to detect
path errors, since they often represent errors or optimization opportunities.
In this paper, we address all three limitations. Specifically, we present an
improved, sound type system that handles recursion. We also formalize a source
update language and give a translation to the core language that preserves and
reflects typability. We also develop a path-error analysis (a form of dead-code
analysis) for updates.; Comment: Extended version of ICFP 2008 paper
During the life cycle of an XML application, both schemas and queries may
change from one version to another. Schema evolutions may affect query results
and potentially the validity of produced data. Nowadays, a challenge is to
assess and accommodate the impact of theses changes in rapidly evolving XML
This article proposes a logical framework and tool for verifying
forward/backward compatibility issues involving schemas and queries. First, it
allows analyzing relations between schemas. Second, it allows XML designers to
identify queries that must be reformulated in order to produce the expected
results across successive schema versions. Third, it allows examining more
precisely the impact of schema changes over queries, therefore facilitating
The Web of Data is an open environment consisting of a great number of large
inter-linked RDF datasets from various domains. In this environment,
organizations and companies adopt the Linked Data practices utilizing Semantic
Web (SW) technologies, in order to publish their data and offer SPARQL
endpoints (i.e., SPARQL-based search services). On the other hand, the dominant
standard for information exchange in the Web today is XML. The SW and XML
worlds and their developed infrastructures are based on different data models,
semantics and query languages. Thus, it is crucial to develop interoperability
mechanisms that allow the Web of Data users to access XML datasets, using
SPARQL, from their own working environments. It is unrealistic to expect that
all the existing legacy data (e.g., Relational, XML, etc.) will be transformed
into SW data. Therefore, publishing legacy data as Linked Data and providing
SPARQL endpoints over them has become a major research challenge. In this
direction, we introduce the SPARQL2XQuery Framework which creates an
interoperable environment, where SPARQL queries are automatically translated to
XQuery queries, in order to access XML data across the Web. The SPARQL2XQuery
Framework provides a mapping model for the expression of OWL-RDF/S to XML
Schema mappings as well as a method for SPARQL to XQuery translation. To this
XML database query languages such as XQuery employ regular expression types
with structural subtyping. Subtyping systems typically have two presentations,
which should be equivalent: a declarative version in which the subsumption rule
may be used anywhere, and an algorithmic version in which the use of
subsumption is limited in order to make typechecking syntax-directed and
decidable. However, the XQuery standard type system circumvents this issue by
using imprecise typing rules for iteration constructs and defining only
algorithmic typechecking, and another extant proposal provides more precise
types for iteration constructs but ignores subtyping. In this paper, we
consider a core XQuery-like language with a subsumption rule and prove the
completeness of algorithmic typechecking; this is straightforward for XQuery
proper but requires some care in the presence of more precise iteration typing
disciplines. We extend this result to an XML update language we have introduced
in earlier work.; Comment: ESOP 2008. Companion technical report with proofs