Page 1 of results: 23237 digital items found in 0.021 seconds
Results filtered by Publisher: Cornell University

Semantics and Evaluation of Top-k Queries in Probabilistic Databases

Zhang, Xi; Chomicki, Jan
Source: Cornell University Publisher: Cornell University
Type: Scientific Journal Article
Search Relevance
36.62%
We study here fundamental issues involved in top-k query evaluation in probabilistic databases. We consider simple probabilistic databases in which probabilities are associated with individual tuples, and general probabilistic databases in which, additionally, exclusivity relationships between tuples can be represented. In contrast to other recent research in this area, we do not limit ourselves to injective scoring functions. We formulate three intuitive postulates that the semantics of top-k queries in probabilistic databases should satisfy, and introduce a new semantics, Global-Topk, that satisfies those postulates to a large degree. We also show how to evaluate queries under the Global-Topk semantics. For simple databases we design dynamic-programming based algorithms, and for general databases we show polynomial-time reductions to the simple cases. For example, we demonstrate that for a fixed k the time complexity of top-k query evaluation is as low as linear, under the assumption that probabilistic databases are simple and scoring functions are injective.; Comment: 60 pages, section 4.4 added, section 6 added, typos corrected
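As a rough illustration of the dynamic-programming idea mentioned in the abstract above, the following Python sketch handles only the simplest setting: independent tuples with distinct scores. It assumes that the Global-Topk answer consists of the k tuples with the highest probability of appearing in the top-k of a possible world; the abstract does not spell out this definition, so treat it as an assumption. The function names and the `(name, score, probability)` input format are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, float, float]  # (name, score, probability); hypothetical format


def topk_membership_probabilities(rows: List[Row], k: int) -> Dict[str, float]:
    """Probability that each tuple lands in the top-k of a random possible world.

    Assumes tuples are independent and all scores are distinct.
    """
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    # exact[j] = probability that exactly j of the higher-scored tuples
    # processed so far are present, tracked only for j = 0 .. k-1.
    exact = [1.0] + [0.0] * (k - 1)
    result: Dict[str, float] = {}
    for name, _score, p in ranked:
        # A tuple is in the top-k iff it is present and at most k-1
        # higher-scored tuples are present.
        result[name] = p * sum(exact)
        # Fold the current tuple into the table for the next iteration.
        updated = [0.0] * k
        for j in range(k):
            updated[j] = exact[j] * (1.0 - p)
            if j > 0:
                updated[j] += exact[j - 1] * p
        exact = updated
    return result


def global_topk(rows: List[Row], k: int) -> List[str]:
    """Names of the k tuples with the highest top-k membership probability.

    Ties are broken arbitrarily in this sketch.
    """
    probs = topk_membership_probabilities(rows, k)
    return sorted(probs, key=probs.get, reverse=True)[:k]


rows = [("a", 3.0, 0.5), ("b", 2.0, 0.5), ("c", 1.0, 1.0)]
print(topk_membership_probabilities(rows, 2))
```

After the initial sort, the loop does O(k) work per tuple, so for a fixed k the pass over the ranked tuples is linear in their number, which is in the spirit of the complexity claim in the abstract.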

Aggregate Estimation Over Dynamic Hidden Web Databases

Liu, Weimo; Thirumuruganathan, Saravanan; Zhang, Nan; Das, Gautam
Source: Cornell University Publisher: Cornell University
Type: Scientific Journal Article
Search Relevance
36.57%
Many databases on the web are "hidden" behind (i.e., accessible only through) their restrictive, form-like search interfaces. Recent studies have shown that it is possible to estimate aggregate query answers over such hidden web databases by issuing a small number of carefully designed search queries through the restrictive web interface. A problem with this existing work, however, is that it assumes the underlying database to be static, while most real-world web databases (e.g., Amazon, eBay) are frequently updated. In this paper, we study the novel problem of estimating/tracking aggregates over dynamic hidden web databases while adhering to the stringent query-cost limitation they enforce (e.g., at most 1,000 search queries per day). Theoretical analysis and extensive real-world experiments demonstrate the effectiveness of our proposed algorithms and their superiority over baseline solutions (e.g., the repeated execution of algorithms designed for static web databases).

An Approach for Normalizing Fuzzy Relational Databases Based on Join Dependency

S, Deepa
Source: Cornell University Publisher: Cornell University
Type: Scientific Journal Article
Published on 11/03/2014
Search Relevance
36.57%
Fuzziness in databases is used to denote uncertain or incomplete data. Relational databases require data to be certain, and this certainty-based data is the basis of the normalization approach designed for traditional relational databases. But real-world data may not always be certain, thereby making it necessary to design an approach for normalization that deals with fuzzy data. This paper focuses on the approach for designing the fifth normal form (5NF) based on join dependencies for fuzzy data. The basis of join dependency for fuzzy relational databases is derived from the basic relational database concepts. As a join dependency implies a multivalued dependency by symmetry, the proof of join-dependency-based normalization is stated from the perspective of multivalued-dependency-based normalization on fuzzy relational databases.; Comment: 3 pages

Empirical Probabilities in Monadic Deductive Databases

Ng, Raymond T.; Subrahmanian, V. S.
Source: Cornell University Publisher: Cornell University
Type: Scientific Journal Article
Published on 13/03/2013
Search Relevance
36.57%
We address the problem of supporting empirical probabilities in monadic logic databases. Though the semantics of multivalued logic programs has been studied extensively, the treatment of probabilities as results of statistical findings has not been studied in logic programming/deductive databases. We develop a model-theoretic characterization of logic databases that facilitates such a treatment. We present an algorithm for checking consistency of such databases and prove its total correctness. We develop a sound and complete query processing procedure for handling queries to such databases.; Comment: Appears in Proceedings of the Eighth Conference on Uncertainty in Artificial Intelligence (UAI1992)

Defining and Mining Functional Dependencies in Probabilistic Databases

De, Sushovan; Kambhampati, Subbarao
Source: Cornell University Publisher: Cornell University
Type: Scientific Journal Article
Search Relevance
36.57%
Functional dependencies -- traditional, approximate and conditional -- are of critical importance in relational databases, as they inform us about the relationships between attributes. They are useful in schema normalization, data rectification and source selection. Most of these were, however, developed in the context of deterministic data. Although uncertain databases have started receiving attention, these dependencies have not been defined for them, nor are fast algorithms available to evaluate their confidences. This paper defines the logical extensions of various forms of functional dependencies for probabilistic databases and explores the connections between them. We propose a pruning-based exact algorithm to evaluate the confidence of functional dependencies, a Monte-Carlo based algorithm to evaluate the confidence of approximate functional dependencies, and algorithms for their conditional counterparts in probabilistic databases. Experiments are performed on both synthetic and real data, evaluating the performance of these algorithms in assessing the confidence of dependencies and mining them from data. We believe that having these dependencies and algorithms available for probabilistic databases will drive adoption of probabilistic data storage in the industry.; Comment: 9 pages...

Querying Databases of Annotated Speech

Cassidy, Steve; Bird, Steven
Source: Cornell University Publisher: Cornell University
Type: Scientific Journal Article
Published on 11/04/2002
Search Relevance
36.57%
Annotated speech corpora are databases consisting of signal data along with time-aligned symbolic "transcriptions". Such databases are typically multidimensional, heterogeneous and dynamic. These properties present a number of tough challenges for representation and query. The temporal nature of the data adds an additional layer of complexity. This paper presents and harmonises two independent efforts to model annotated speech databases, one at Macquarie University and one at the University of Pennsylvania. Various query languages are described, along with illustrative applications to a variety of analytical problems. The research reported here forms a part of several ongoing projects to develop platform-independent open-source tools for creating, browsing, searching, querying and transforming linguistic databases, and to disseminate large linguistic databases over the internet.; Comment: 9 pages, 4 figures

Automatic Classification of Text Databases through Query Probing

Ipeirotis, Panagiotis; Gravano, Luis; Sahami, Mehran
Source: Cornell University Publisher: Cornell University
Type: Scientific Journal Article
Published on 08/03/2000
Search Relevance
36.6%
Many text databases on the web are "hidden" behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the contents of such search-only databases. Recently, Yahoo-like directories have started to manually organize these databases into categories that users can browse to find these valuable resources. We propose a novel strategy to automate the classification of search-only text databases. Our technique starts by training a rule-based document classifier, and then uses the classifier's rules to generate probing queries. The queries are sent to the text databases, which are then classified based on the number of matches that they produce for each query. We report some initial exploratory experiments that show that our approach is promising to automatically characterize the contents of text databases accessible on the web.; Comment: 7 pages, 1 figure
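The probing step described in the abstract above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' method: the probe queries are written by hand rather than derived from a trained rule-based classifier, the "database" is an in-memory list of strings, and a database is assigned to the category whose probes return the most matches in total (the abstract does not specify the exact decision rule). All names are hypothetical.

```python
from typing import Callable, Dict, List


def make_match_counter(documents: List[str]) -> Callable[[str], int]:
    """Stand-in for a search interface that reports only a match count."""
    tokenized = [set(doc.lower().split()) for doc in documents]

    def count_matches(query: str) -> int:
        words = query.lower().split()
        # A document matches when it contains every query word.
        return sum(1 for doc in tokenized if all(w in doc for w in words))

    return count_matches


def classify_by_probing(
    probes: Dict[str, List[str]],
    count_matches: Callable[[str], int],
) -> str:
    """Return the category whose probe queries yield the most matches.

    Summing match counts per category is an assumed aggregation rule.
    """
    totals = {
        category: sum(count_matches(q) for q in queries)
        for category, queries in probes.items()
    }
    return max(totals, key=totals.get)


# Hand-written probes standing in for classifier-derived rules.
probes = {
    "Health": ["cancer", "heart disease"],
    "Finance": ["stock market", "interest rates"],
}
database = make_match_counter([
    "heart disease treatment",
    "cancer treatment trial",
    "stock market report",
])
print(classify_by_probing(probes, database))
```

Only match counts cross the interface in this sketch, which mirrors the setting where documents are reachable solely through querying.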

Scalable Continual Top-k Keyword Search in Relational Databases

Xu, Yanwei
Source: Cornell University Publisher: Cornell University
Type: Scientific Journal Article
Published on 23/08/2011
Search Relevance
36.57%
Keyword search in relational databases has been widely studied in recent years because it requires users neither to master a structured query language nor to know the complex underlying database schemas. Most existing methods focus on answering snapshot keyword queries in static databases. In practice, however, databases are updated frequently, and users may have long-term interests in specific topics. To deal with such a situation, it is necessary to build an effective and efficient facility in a database system to support continual keyword queries. In this paper, we propose an efficient method for answering continual top-$k$ keyword queries over relational databases. The proposed method is built on an existing scheme of keyword search on relational data streams, but incorporates the ranking mechanisms into the query processing methods and makes two improvements to support efficient top-$k$ keyword search in relational databases. Compared to the existing methods, our method is more efficient both in computing the top-$k$ results in a static database and in maintaining the top-$k$ results when the database is continually being updated. Experimental results validate the effectiveness and efficiency of the proposed method.