Page 8 of results; 23237 digital items found in 0.022 seconds

Querying and Manipulating Temporal Databases

Mkaouar, Mohamed; Bouaziz, Rafik; Moalla, Mohamed
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Published on 03/03/2011
Search relevance
36.31%
Many works have focused, for over twenty-five years, on the integration of the time dimension in databases (DBs). However, the SQL3 standard still does not allow easy definition, manipulation, and querying of temporal DBs. In this paper, we study how to simplify querying and manipulating temporal facts in SQL3, using a model that integrates time in a native manner. To do this, we propose new keywords and syntax to define temporal versions of many relational operators and functions used in SQL. It then becomes possible to perform various queries and updates appropriate to temporal facts. We illustrate the use of these proposals on many examples from a real application.
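The abstract does not quote the proposed SQL3 syntax, so purely as an illustration of the kind of operation it targets, here is a minimal Python sketch of two classic valid-time operations, a timeslice and an interval-intersecting join; the relation, attribute names, and data are all invented:

```python
from collections import namedtuple

# Hypothetical valid-time relation: each fact carries a [start, end) interval.
Fact = namedtuple("Fact", ["emp", "dept", "start", "end"])

facts = [
    Fact("ann", "sales", 2000, 2005),
    Fact("ann", "rnd",   2005, 2012),
    Fact("bob", "sales", 2003, 2008),
]

def timeslice(rel, t):
    """Temporal selection: tuples whose validity interval contains t."""
    return [f for f in rel if f.start <= t < f.end]

def temporal_join(r, s):
    """Pairs of distinct employees in the same dept, annotated with the
    intersection of their validity intervals (empty intersections dropped)."""
    out = []
    for a in r:
        for b in s:
            lo, hi = max(a.start, b.start), min(a.end, b.end)
            if a.dept == b.dept and a.emp != b.emp and lo < hi:
                out.append((a.emp, b.emp, a.dept, lo, hi))
    return out
```

A natively temporal SQL would express both operations declaratively; the sketch only shows the interval logic such operators must implement.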

Unifying Causality, Diagnosis, Repairs and View-Updates in Databases

Bertossi, Leopoldo; Salimi, Babak
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Search relevance
36.31%
In this work we establish and point out connections between the notion of query-answer causality in databases and database repairs, model-based diagnosis in its consistency-based and abductive versions, and database updates through views. The mutual relationships among these areas of data management and knowledge representation shed light on each of them and help to share notions and results they have in common. In one way or another, these are all approaches to uncertainty management, which becomes even more relevant in the context of big data that have to be made sense of.; Comment: On-line Proc. First International Workshop on Big Uncertain Data (BUDA 2014). Co-located with ACM PODS 2014. arXiv admin note: text overlap with arXiv:1404.6857

Design Issues of JPQ: a Pattern-based Query Language for Document Databases

Li, Xuhui; Liu, Mengchi; Wu, Xiaoying; Zhu, Shanfeng
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Published on 14/04/2015
Search relevance
36.31%
Document databases are becoming popular, but how to express complex document queries that extract useful information from documents remains an important topic of study. In this paper, we describe the design issues of a pattern-based document database query language named JPQ. JPQ uses various expressive patterns to extract and construct document fragments following a JSON-like document data model. It adopts tree-like extraction patterns with a coherent pattern composition mechanism to extract data elements from hierarchically structured documents and maintain the logical relationships among the elements. Based on these relationships, JPQ deploys a deductive mechanism to declaratively specify data transformation requests and also considers data filtering on hierarchical data structures. We use various examples to show the features of the language and to demonstrate its expressiveness and declarativeness in presenting complex document queries.; Comment: 12 pages

A Data Cleansing Method for Clustering Large-scale Transaction Databases

Loh, Woong-Kee; Moon, Yang-Sae; Kang, Jun-Gyu
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Published on 27/04/2010
Search relevance
36.31%
In this paper, we emphasize the need for data cleansing when clustering large-scale transaction databases and propose a new data cleansing method that improves clustering quality and performance. We evaluate our data cleansing method through a series of experiments. As a result, the clustering quality and performance were significantly improved by up to 165% and 330%, respectively.; Comment: 6 pages, 5 figures

Faster Query Answering in Probabilistic Databases using Read-Once Functions

Roy, Sudeepa; Perduca, Vittorio; Tannen, Val
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Search relevance
36.31%
A boolean expression is in read-once form if each of its variables appears exactly once. When the variables denote independent events in a probability space, the probability of the event denoted by the whole expression in read-once form can be computed in polynomial time (whereas the general problem for arbitrary expressions is #P-complete). Known approaches to checking the read-once property seem to require putting these expressions in disjunctive normal form. In this paper, we tell a better story for a large subclass of boolean event expressions: those that are generated by conjunctive queries without self-joins on tuple-independent probabilistic databases. We first show that given a tuple-independent representation and the provenance graph of an SPJ query plan without self-joins, we can, without using the DNF of a result event expression, efficiently compute its co-occurrence graph. From this, the read-once form can already, if it exists, be computed efficiently using existing techniques. Our second and key contribution is a complete, efficient, and simple-to-implement algorithm for computing the read-once forms (whenever they exist) directly, using a new concept, that of the co-table graph, which can be significantly smaller than the co-occurrence graph.; Comment: Accepted in ICDT 2011
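To see why read-once form matters, here is a minimal sketch (not the paper's algorithm) of the polynomial-time probability computation it enables, assuming independent variables and a nested-tuple expression encoding chosen here for illustration:

```python
# Read-once expressions as nested tuples:
#   ("var", p)    - a variable denoting an event with probability p
#   ("and", l, r) - conjunction of two subexpressions
#   ("or", l, r)  - disjunction of two subexpressions
# Read-once means each variable appears exactly once, so the variable
# sets of sibling subtrees are disjoint and independence lets the
# probability factor over the expression tree in linear time.

def prob(e):
    """Probability of a read-once expression over independent events."""
    op = e[0]
    if op == "var":
        return e[1]
    p, q = prob(e[1]), prob(e[2])
    if op == "and":
        return p * q                          # disjoint variable sets
    return 1.0 - (1.0 - p) * (1.0 - q)        # or: complement of both-false

# (x AND y) OR z with P(x)=0.5, P(y)=0.4, P(z)=0.1
expr = ("or", ("and", ("var", 0.5), ("var", 0.4)), ("var", 0.1))
```

For an arbitrary (non-read-once) expression no such factoring exists, which is exactly where the #P-hardness comes from.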

Exact Indexing for Massive Time Series Databases under Time Warping Distance

Niennattrakul, Vit; Ruengronghirunya, Pongsakorn; Ratanamahatana, Chotirat Ann
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Published on 13/06/2009
Search relevance
36.31%
Among many existing distance measures for time series data, Dynamic Time Warping (DTW) distance has been recognized as one of the most accurate and suitable distance measures due to its flexibility in sequence alignment. However, DTW distance calculation is computationally intensive. Especially in very large time series databases, sequential scan through the entire database is definitely impractical, even with random access that exploits some index structures since high dimensionality of time series data incurs extremely high I/O cost. More specifically, a sequential structure consumes high CPU but low I/O costs, while an index structure requires low CPU but high I/O costs. In this work, we therefore propose a novel indexed sequential structure called TWIST (Time Warping in Indexed Sequential sTructure) which benefits from both sequential access and index structure. When a query sequence is issued, TWIST calculates lower bounding distances between a group of candidate sequences and the query sequence, and then identifies the data access order in advance, hence reducing a great number of both sequential and random accesses. Impressively, our indexed sequential structure achieves significant speedup in a querying process by a few orders of magnitude. In addition...
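For context on the lower-bounding step, here is a sketch of a textbook DTW distance and an LB_Keogh-style lower bound over a warping window; this is generic background, not TWIST itself:

```python
def dtw(a, b):
    """Dynamic Time Warping distance (squared-error cost) via dynamic
    programming over all monotonic alignments of a and b."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def lb_keogh(q, c, r):
    """Cheap lower bound on dtw(q, c) for a warping window of radius r:
    only points of q outside the envelope of c contribute."""
    total = 0.0
    for i, qi in enumerate(q):
        window = c[max(0, i - r):i + r + 1]
        lo, hi = min(window), max(window)
        if qi > hi:
            total += (qi - hi) ** 2
        elif qi < lo:
            total += (qi - lo) ** 2
    return total
```

Because the bound never exceeds the true DTW distance, candidates whose bound already exceeds the best-so-far distance can be skipped without computing the quadratic-time DP.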

NoSQL Database: New Era of Databases for Big data Analytics - Classification, Characteristics and Comparison

Moniruzzaman, A B M; Hossain, Syed Akhter
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Published on 30/06/2013
Search relevance
36.31%
The digital world is growing very fast and becoming more complex in volume (terabytes to petabytes), variety (structured, unstructured, and hybrid), and velocity (high speed of growth). This is referred to as Big Data, a global phenomenon. It is typically considered to be a data collection that has grown so large it cannot be effectively managed or exploited using conventional data management tools: e.g., classic relational database management systems (RDBMSs) or conventional search engines. To handle this problem, traditional RDBMSs are complemented by a rich set of specifically designed alternative DBMSs, such as NoSQL, NewSQL, and search-based systems. The motivation of this paper is to provide a classification, characteristics, and evaluation of NoSQL databases for Big Data analytics. This report is intended to help users, especially organizations, obtain an independent understanding of the strengths and weaknesses of various NoSQL database approaches to supporting applications that process huge volumes of data.; Comment: 14 pages, 10 figures, 44 references used and with authors biographies

LHC Databases on the Grid: Achievements and Open Issues

Vaniachine, A. V.
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Published on 12/07/2010
Search relevance
36.31%
To extract physics results from the recorded data, the LHC experiments use Grid computing infrastructure. Event data processing on the Grid requires scalable access to non-event data (detector conditions, calibrations, etc.) stored in relational databases. The database-resident data are critical for the event data reconstruction processing steps and often required for physics analysis. This paper reviews LHC experience with database technologies for Grid computing. Topics include: database integration with the Grid computing models of the LHC experiments; the choice of database technologies; examples of database interfaces; distributed database applications (data complexity, update frequency, data volumes, and access patterns); and the scalability of database access in the Grid computing environment of the LHC experiments. The review describes areas in which substantial progress was made and remaining open issues.; Comment: 10 pages, invited talk presented at the IV International Conference on "Distributed computing and Grid-technologies in science and education" (Grid2010), JINR, Dubna, Russia, June 28 - July 3, 2010

Query-Answer Causality in Databases: Abductive Diagnosis and View-Updates

Salimi, Babak; Bertossi, Leopoldo
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Search relevance
36.31%
Causality has recently been introduced in databases to model, characterize, and possibly compute causes for query results (answers). Connections between query causality and consistency-based diagnosis and database repairs (wrt. integrity constraint violations) have been established in the literature. In this work we establish connections between query causality and abductive diagnosis and the view-update problem. The unveiled relationships allow us to obtain new complexity results for query causality (the main focus of our work) and also for the two other areas.; Comment: To appear in Proc. UAI Causal Inference Workshop, 2015. One example was fixed

Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

Yuan, Ye; Wang, Guoren; Chen, Lei; Wang, Haixun
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Published on 30/05/2012
Search relevance
36.31%
Many studies have been conducted on seeking efficient solutions for subgraph similarity search over certain (deterministic) graphs, due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy-preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Different from previous works, which assume that edges in an uncertain graph are independent of each other, we study uncertain graphs whose edges' occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete; thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and upper bounds of subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds of subgraph isomorphism probability. Based on PMI...

Relational Division in Rank-Aware Databases

Vaverka, Ondrej; Vychodil, Vilem
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Published on 02/07/2015
Search relevance
36.31%
We present a survey of existing approaches to relational division in rank-aware databases, discuss issues of the present approaches, and outline generalizations of several types of classic division-like operations. We work in a model which generalizes the Codd model of data by considering tuples in relations annotated by ranks, indicating degrees to which tuples in relations match queries. The approach utilizes complete residuated lattices as the basic structures of degrees. We argue that unlike the classic model, relational divisions are fundamental operations which cannot in general be expressed by means of other operations. In addition, we compare the existing and proposed operations and identify those which are faithful counterparts of universally quantified queries formulated in relational calculi. We introduce Pseudo Tuple Calculus in the ranked model which is further used to show mutual definability of the various forms of divisions presented in the paper.
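As a rough illustration of division in a ranked model, here is a sketch over the unit interval using the Gödel residuum as one concrete residuated structure; the paper works with arbitrary complete residuated lattices, and this is not its exact set of operations:

```python
def godel_residuum(a, b):
    """Gödel implication on [0, 1]: a -> b = 1 if a <= b, else b."""
    return 1.0 if a <= b else b

def graded_division(dividend, divisor):
    """Ranked relational division. dividend maps pairs (x, y) to ranks,
    divisor maps y to ranks; the degree of x in the result is the
    infimum over y of (divisor rank -> dividend rank), i.e. the degree
    to which x is related to EVERY y required by the divisor."""
    xs = {x for (x, _) in dividend}
    return {
        x: min(godel_residuum(ry, dividend.get((x, y), 0.0))
               for y, ry in divisor.items())
        for x in xs
    }

# Invented example: suppliers x supplying parts y to graded degrees.
dividend = {("a", "p"): 1.0, ("a", "q"): 0.6, ("b", "p"): 1.0}
divisor = {"p": 1.0, "q": 0.8}
```

When all ranks are 0 or 1 this collapses to classic Codd-style division, which matches the abstract's point that division expresses universally quantified queries.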

A Compositional Query Algebra for Second-Order Logic and Uncertain Databases

Koch, Christoph
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Published on 29/07/2008
Search relevance
36.31%
World-set algebra is a variable-free query language for uncertain databases. It constitutes the core of the query language implemented in MayBMS, an uncertain database system. This paper shows that world-set algebra captures exactly second-order logic over finite structures, or equivalently, the polynomial hierarchy. The proofs also imply that world-set algebra is closed under composition, a previously open problem.; Comment: 22 pages, 1 figure

Proposing Cluster_Similarity Method in Order to Find as Much Better Similarities in Databases

Feizi-Derakhshi, Mohammad-Reza; Roohany, Azade
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Published on 14/12/2011
Search relevance
36.31%
Different ways of entering data into databases result in duplicate records that increase database size; this is a fact that cannot easily be ignored, and several methods have been used for this purpose. In this paper, we try to increase the accuracy of duplicate detection by using cluster similarity instead of direct similarity of fields: clustering is performed on the fields of the database, and the similarity degree of records is derived from the resulting clusters. In this method, by using the information present in the database, a more logical similarity is obtained for deficient information; overall, the cluster-similarity method improved results by 24% compared with previous methods.
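One way to picture cluster similarity versus direct field similarity (a hypothetical sketch with invented data, not the authors' exact method): field values that fall in the same cluster count as matching even when they differ textually:

```python
# Hypothetical clustering of one field's values into cluster ids, e.g.
# produced by some clustering step over the field's domain.
clusters = {
    "NYC": 0, "New York": 0, "N.Y.": 0,
    "LA": 1, "Los Angeles": 1,
}

def record_similarity(r1, r2, clusters):
    """Fraction of aligned fields whose values share a cluster.
    Values absent from the clustering fall back to direct equality."""
    matches = sum(
        1 for a, b in zip(r1, r2)
        if clusters.get(a, a) == clusters.get(b, b)
    )
    return matches / len(r1)
```

Direct string comparison would score ("New York", ...) against ("NYC", ...) as a mismatch; cluster membership recovers the intended match, which is the gain the abstract describes.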

A Uniform Fixpoint Approach to the Implementation of Inference Methods for Deductive Databases

Behrend, Andreas
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Published on 27/08/2011
Search relevance
36.31%
Within the research area of deductive databases, three different database tasks have been deeply investigated: query evaluation, update propagation, and view updating. Over the last thirty years, various inference mechanisms have been proposed for realizing these main functionalities of a rule-based system. However, these inference mechanisms have rarely been used in commercial DB systems until now. One important reason for this is the lack of a uniform approach well-suited for implementation in an SQL-based system. In this paper, we present such a uniform approach in the form of a new version of the soft consequence operator. Additionally, we present improved transformation-based approaches to query optimization, update propagation, and view updating, all of which use this operator as the underlying evaluation mechanism.; Comment: to appear in the Proceedings of the 19th International Conference on Applications of Declarative Programming and Knowledge Management (INAP 2011)
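As background for bottom-up rule evaluation, here is a minimal naive fixpoint iteration for one deductive rule set (transitive closure of an edge relation); the paper's soft consequence operator is considerably more general than this sketch:

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of the rules
        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), path(Y, Z).
    The immediate-consequence step is applied until no new facts appear,
    i.e. until the least fixpoint is reached."""
    facts = set(edges)
    while True:
        derived = {
            (a, d)
            for (a, b) in facts
            for (c, d) in facts
            if b == c and (a, d) not in facts
        }
        if not derived:
            return facts
        facts |= derived
```

Semi-naive evaluation (joining only against the newly derived facts each round) is the standard optimization of this loop; the uniform approach in the paper targets exactly this family of fixpoint computations.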

Automating Fine Concurrency Control in Object-Oriented Databases

Malta, Carmelo; Martinez, José
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Published on 25/03/2010
Search relevance
36.31%
Several proposals have been made to provide concurrency control adapted to object-oriented databases. However, most of these proposals miss the fact that considering solely read and write access modes on instances may lead to less parallelism than in relational databases! This paper copes with that issue, and the advantages are numerous: (1) commutativity of methods is determined a priori and automatically by the compiler, without measurable overhead; (2) run-time checking of commutativity is as efficient as for compatibility; (3) inverse operations need not be specified for recovery; (4) this scheme does not preclude more sophisticated approaches; and, last but not least, (5) relational and object-oriented concurrency control schemes with read and write access modes are subsumed under this proposition.

The evaluation of geometric queries: constraint databases and quantifier elimination

Giusti, Marc; Heintz, Joos; Kuijpers, Bart
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Search relevance
36.31%
We model the algorithmic task of geometric elimination (e.g., quantifier elimination in the elementary field theories of real and complex numbers) by means of certain constraint database queries, called geometric queries. As a particular case of such a geometric elimination task, we consider sample point queries. We show exponential lower complexity bounds for evaluating geometric queries in the general and in the particular case of sample point queries. Although this paper is of theoretical nature, its aim is to explore the possibilities and (complexity-)limits of computer implemented query evaluation algorithms for Constraint Databases, based on the principles of the most advanced geometric elimination procedures and their implementations, like, e.g., the software package "Kronecker".; Comment: This paper is representing work in progress of the authors. It is not aimed for publication in the present form

Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)

Liu, Feilong; Blanas, Spyros
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Search relevance
36.31%
Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may lead to query plans that run up to 2X slower than the optimal multi-join query plan over memory-resident data. This paper introduces a memory I/O cost model to identify good evaluation strategies for complex query plans with multiple hash-based equi-joins over memory-resident data. The proposed cost model is carefully validated for accuracy using three different systems, including an Amazon EC2 instance, to control for hardware-specific differences. Prior work in parallel query evaluation has advocated right-deep and bushy trees for multi-join queries due to their greater parallelization and pipelining potential. A surprising finding is that the conventional wisdom from shared-nothing disk-based systems does not directly apply to the modern shared-everything memory hierarchy. As corroborated by our model, the performance gap between the optimal left-deep and right-deep query plan can grow to about 10X as the number of joins in the query increases.; Comment: 15 pages...
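For reference, the hash-based equi-join at the heart of such plans can be sketched as follows (a generic illustration, not the paper's cost model); a multi-join plan chains several of these, and which side is built versus probed is what distinguishes left-deep from right-deep trees:

```python
def hash_join(build, probe, build_key, probe_key):
    """In-memory hash equi-join: build a hash table on one input,
    then stream the other input through it, emitting merged rows."""
    table = {}
    for row in build:                     # build phase: one pass, hashed
        table.setdefault(row[build_key], []).append(row)
    out = []
    for row in probe:                     # probe phase: one lookup per row
        for match in table.get(row[probe_key], []):
            out.append({**match, **row})
    return out

# Invented example relations.
R = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
S = [{"rid": 1, "b": "u"}, {"rid": 1, "b": "v"}, {"rid": 3, "b": "w"}]
```

The memory I/O cost the paper models comes from the random accesses of the probe phase against hash tables that may not fit in cache, which is why plan shape matters even without disk.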

Explicit probabilistic models for databases and networks

De Bie, Tijl
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Published on 29/06/2009
Search relevance
36.31%
Recent work in data mining and related areas has highlighted the importance of the statistical assessment of data mining results. Crucial to this endeavour is the choice of a non-trivial null model for the data, to which the found patterns can be contrasted. The most influential null models proposed so far are defined in terms of invariants of the null distribution. Such null models can be used by computation intensive randomization approaches in estimating the statistical significance of data mining results. Here, we introduce a methodology to construct non-trivial probabilistic models based on the maximum entropy (MaxEnt) principle. We show how MaxEnt models allow for the natural incorporation of prior information. Furthermore, they satisfy a number of desirable properties of previously introduced randomization approaches. Lastly, they also have the benefit that they can be represented explicitly. We argue that our approach can be used for a variety of data types. However, for concreteness, we have chosen to demonstrate it in particular for databases and networks.; Comment: Submitted

Adaptive Logging for Distributed In-memory Databases

Yao, Chang; Agrawal, Divyakant; Chen, Gang; Ooi, Beng Chin; Wu, Sai
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Search relevance
36.31%
A new type of log, the command log, is being employed to replace the traditional data log (e.g., the ARIES log) in in-memory databases. Instead of recording how tuples are updated, a command log only tracks the transactions being executed, thereby effectively reducing the size of the log and improving performance. Command logging, on the other hand, increases the cost of recovery, because all the transactions in the log after the last checkpoint must be completely redone in case of a failure. In this paper, we first extend the command logging technique to a distributed environment, where all nodes can perform recovery in parallel. We then propose an adaptive logging approach that combines data logging and command logging. The percentage of data logging versus command logging becomes an optimization between transaction processing performance and recovery performance, to suit different OLTP applications. Our experimental study compares the performance of our proposed adaptive logging, ARIES-style data logging, and command logging on top of H-Store. The results show that adaptive logging can achieve a 10x boost for recovery and a transaction throughput comparable to that of command logging.; Comment: 13 pages
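The trade-off can be pictured with a toy sketch (invented names, not H-Store's implementation): a data log records after-images that recovery can blindly reapply, while a command log records only the transaction name and arguments for deterministic re-execution:

```python
def transfer(db, src, dst, amount):
    """A deterministic transaction; command replay simply re-runs it."""
    db[src] -= amount
    db[dst] += amount

TXNS = {"transfer": transfer}

def execute(db, data_log, command_log, name, *args):
    """Run a transaction and append to both logs for comparison."""
    before = dict(db)
    TXNS[name](db, *args)
    command_log.append((name, args))                 # one tiny record
    data_log.extend((k, v) for k, v in db.items()    # after-images of
                    if v != before[k])               # every touched tuple

def recover_commands(checkpoint, command_log):
    db = dict(checkpoint)
    for name, args in command_log:   # redo by re-execution: CPU-heavy
        TXNS[name](db, *args)
    return db

def recover_data(checkpoint, data_log):
    db = dict(checkpoint)
    for key, value in data_log:      # redo by blind overwrite: fast
        db[key] = value
    return db
```

Both recoveries reach the same state; the command log is smaller at run time but pays for it at recovery, which is exactly the balance the adaptive approach tunes.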

Using Object-Relational Mapping to Create the Distributed Databases in a Hybrid Cloud Infrastructure

Lukyanchikov, Oleg; Pluzhnik, Evgeniy; Payain, Simon; Nikulchev, Evgeny
Source: Cornell University Publisher: Cornell University
Type: Scientific journal article
Published on 04/01/2015
Search relevance
36.31%
One of the current challenges in the use of cloud services is the design of specialized data management systems. This is especially important for hybrid systems in which data are located in both public and private clouds. Monitoring, querying, scheduling, and processing functions must be properly implemented and are an integral part of the system. To provide these functions, it is proposed to use object-relational mapping (ORM). The article presents an approach to designing databases for information systems hosted in a hybrid cloud infrastructure, and also provides an example of the development of an ORM library.