Page 1 of results: 1721 digital items found in 0.012 seconds

## Operações de consulta por similaridade em grandes bases de dados complexos; Similarity search operations in large complex databases

Barioni, Maria Camila Nardini
Source: Biblioteca Digital de Teses e Dissertações da USP; Publisher: Biblioteca Digital de Teses e Dissertações da USP
Type: Master's Dissertation; Format: application/pdf
Search Relevance
56.25%

## Sistema de busca e exibição de dados georreferenciados; Georeferenced data search and display system

Santos, Vinícius Rosa dos
Type: Undergraduate Final Project; Format: application/pdf
POR
Search Relevance
56.11%

## Similaridade de series temporais na bolsa de valores; Time Series similarity applied to Brazilian stock market

Jeske, Jonas
Type: Undergraduate Final Project; Format: application/pdf
POR
Search Relevance
56.27%

## Busca por similaridade em uma base de dados de genealogia; Similarity search in a personal database

Veronez, Rovian Voelz
Type: Undergraduate Final Project; Format: application/pdf
POR
Search Relevance
66.26%
In genealogy, names are often spelled in several different but similar ways. The reasons vary: changes in grammar over the years, differences in how certain names are spelled across languages, and spelling errors made throughout history. It is therefore important that a search over a genealogical database offer the option of searching for similar words, so that relevant results are not ignored merely because they are not identical to the search term. Relational databases do not natively support similarity search, so this work presents an implementation of similarity search in the genealogy software TNG, using an efficient technique, thereby avoiding the loss of relevant results in historical research.
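The abstract does not say which technique the author used, so the following is only an illustrative sketch of one common way to match name variants: ranking candidates by Levenshtein edit distance. The function names, the sample surnames, and the `max_distance` threshold are assumptions made for this example, not details taken from the work or from TNG.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def similar_names(query, names, max_distance=2):
    """Return names within max_distance edits of query, closest first."""
    q = query.lower()
    scored = sorted((levenshtein(q, n.lower()), n) for n in names)
    return [n for d, n in scored if d <= max_distance]


# Hypothetical sample data: spelling variants of one surname plus unrelated names.
matches = similar_names("Schmidt", ["Schmitt", "Smith", "Schmid", "Silva"])
```

A linear scan like this is only practical for small name lists; a production system would likely combine it with an index or a phonetic pre-filter.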

## Biosequence Similarity Search on the Mercury System

Krishnamurthy, Praveen; Buhler, Jeremy; Chamberlain, Roger; Franklin, Mark; Gyang, Kwame; Jacob, Arpith; Lancaster, Joseph
Type: Scientific Journal Article
Search Relevance
46.21%
Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement off a data store and into reconfigurable hardware. An important component of application deployment on the Mercury system is the functional decomposition of the application onto both the reconfigurable hardware and the traditional processor. Both the Mercury BLASTN application design and its performance analysis are described.

## G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases

Wang, Xiaohong; Smalter, Aaron; Huan, Jun; Lushington, Gerald H.
Type: Scientific Journal Article
Search Relevance
46.21%
Structured data, including sets, sequences, trees, and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains including cheminformatics, bioinformatics, sensor network management, social network management, and XML documents, among others.

## RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data

Zhao, Yongan; Tang, Haixu; Ye, Yuzhen
Source: Oxford University Press; Publisher: Oxford University Press
Type: Scientific Journal Article
EN
Search Relevance
46.21%
Summary: With the wide application of next-generation sequencing (NGS) techniques, fast tools for protein similarity search that scale well to large query datasets and large databases are highly desirable. In a previous work, we developed RAPSearch, an algorithm that achieved a ~20–90-fold speedup relative to BLAST while still achieving similar levels of sensitivity for short protein fragments derived from NGS data. RAPSearch, however, requires a substantial memory footprint to identify alignment seeds, due to its use of a suffix array data structure. Here we present RAPSearch2, a new memory-efficient implementation of the RAPSearch algorithm that uses a collision-free hash table to index a similarity search database. The use of an optimized data structure further speeds up the similarity search by another 2–3 times. We also implemented multi-threading in RAPSearch2, and the multi-thread modes achieve significant acceleration (e.g. 3.5X for 4-thread mode). RAPSearch2 requires up to 2 GB of memory when running in single-thread mode, or up to 3.5 GB of memory when running in 4-thread mode.

## Querying Event Sequences by Exact Match or Similarity Search: Design and Empirical Evaluation

Wongsuphasawat, Krist; Plaisant, Catherine; Taieb-Maimon, Meirav; Shneiderman, Ben
Type: Scientific Journal Article
Search Relevance
46.24%
Specifying event sequence queries is challenging even for skilled computer professionals familiar with SQL. Most graphical user interfaces for database search use an exact match approach, which is often effective, but near misses may also be of interest. We describe a new similarity search interface, in which users specify a query by simply placing events on a blank timeline and retrieve a similarity-ranked list of results. Behind this user interface is a new similarity measure for event sequences which the users can customize by four decision criteria, enabling them to adjust the impact of missing, extra, or swapped events or the impact of time shifts. We describe a use case with Electronic Health Records based on our ongoing collaboration with hospital physicians. A controlled experiment with 18 participants compared exact match and similarity search interfaces. We report on the advantages and disadvantages of each interface and suggest a hybrid interface combining the best of both.

## Distributed Efficient Similarity Search Mechanism in Wireless Sensor Networks

Ahmed, Khandakar; Gregory, Mark A.
Type: Scientific Journal Article
Search Relevance
46.26%
The Wireless Sensor Network similarity search problem has received considerable research attention due to sensor hardware imprecision and environmental parameter variations. Most state-of-the-art distributed data-centric storage (DCS) schemes lack optimization for similarity queries of events. In this paper, a DCS scheme with metric-based similarity searching (DCSMSS) is proposed. DCSMSS is motivated by a vector distance index called iDistance, and transforms the problem of similarity searching into an interval search in one dimension. In addition, a sector-based distance routing algorithm is used to efficiently route messages. Extensive simulation results reveal that DCSMSS is highly efficient and significantly outperforms previous approaches in processing similarity search queries.
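To illustrate the idea of reducing similarity search to a one-dimensional interval search, here is a minimal single-machine sketch in the spirit of iDistance. It is not the DCSMSS scheme and involves no sensor-network routing; the reference points `refs`, the partition spacing constant `c` (assumed larger than any point-to-reference distance), and all function names are assumptions for this example.

```python
import bisect
import math


def build_index(points, refs, c):
    """Give each point the 1-D key: partition_id * c + distance to nearest reference."""
    entries = []
    for p in points:
        i, d = min(((i, math.dist(p, o)) for i, o in enumerate(refs)),
                   key=lambda t: t[1])
        entries.append((i * c + d, p))
    entries.sort()
    keys = [k for k, _ in entries]
    pts = [p for _, p in entries]
    return keys, pts


def range_query(keys, pts, refs, c, q, r):
    """Scan one key interval per partition, then verify candidates exactly."""
    hits = []
    for i, o in enumerate(refs):
        dq = math.dist(q, o)
        # By the triangle inequality, any answer in partition i has a key
        # in [i*c + dq - r, i*c + dq + r].
        lo = bisect.bisect_left(keys, i * c + max(0.0, dq - r))
        hi = min(bisect.bisect_right(keys, i * c + dq + r),
                 bisect.bisect_left(keys, (i + 1) * c))  # stay inside partition i
        for p in pts[lo:hi]:
            if math.dist(p, q) <= r:
                hits.append(p)
    return hits


# Hypothetical 2-D data with two reference points.
points = [(0, 0), (1, 1), (2, 2), (5, 5), (8, 8), (9, 9)]
refs = [(0, 0), (10, 10)]
keys, pts = build_index(points, refs, c=100.0)
found = range_query(keys, pts, refs, 100.0, q=(1, 0), r=1.5)
```

The sorted key list stands in for the B+-tree that a real one-dimensional index would use.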

## Optimizing similarity queries in metric spaces meeting user's expectation; Otimização de operações de busca por similaridade em espaços métricos

Ferreira, Mônica Ribeiro Porto
Source: Biblioteca Digital de Teses e Dissertações da USP; Publisher: Biblioteca Digital de Teses e Dissertações da USP
Type: Doctoral Thesis; Format: application/pdf
Search Relevance
46.31%
The complexity of data stored in large databases has increased at a very fast pace. Hence, operations more elaborate than traditional queries are essential in order to extract all required information from the database. Therefore, the interest of the database community in similarity search has increased significantly. Two of the well-known types of similarity search are the Range ($R_q$) and the k-Nearest Neighbor ($kNN_q$) queries, which, like any traditional query, can be sped up by the indexing structures of the Database Management System (DBMS). Another way of speeding up queries is to perform query optimization. In this process, metrics about the data are collected and employed to adjust the parameters of the search algorithms in each query execution. However, although the integration of similarity search into DBMSs has recently begun to be studied in depth, query optimization has been developed and employed only to answer traditional queries. The execution of similarity queries, even using efficient indexing structures, tends to present a higher computational cost than the execution of traditional ones. Two strategies can be applied to speed up the execution of any query, and thus they are worth employing to answer similarity queries as well. The first strategy is query rewriting based on algebraic properties and cost functions. The second technique is when external query factors are applied...
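For readers unfamiliar with the two query types named in this abstract, the following sketch shows their semantics with an unindexed linear scan. It is a baseline for illustration only, not the optimization method of the thesis; the function names and the sample data are assumptions.

```python
import heapq
import math


def range_query(data, q, radius, dist):
    """Range query: every element whose distance to q is at most radius."""
    return [x for x in data if dist(q, x) <= radius]


def knn_query(data, q, k, dist):
    """k-nearest-neighbor query: the k elements closest to q."""
    return heapq.nsmallest(k, data, key=lambda x: dist(q, x))


# Hypothetical 2-D points compared with Euclidean distance.
data = [(0, 0), (1, 0), (0, 2), (3, 3), (5, 5)]
in_range = range_query(data, (0, 0), 2.0, math.dist)
nearest = knn_query(data, (0, 0), 2, math.dist)
```

Both functions evaluate the distance function once per stored element, which is the cost that index structures and query optimization aim to reduce.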

## Dynamic Similarity Search in MultiMetric Spaces

Skopal, Tomás; Bustos Cárdenas, Benjamín Eugenio
Type: Journal Article
EN_US
Search Relevance
46.21%
ISI publication article; An important research issue in multimedia databases is the retrieval of similar objects. For most applications in multimedia databases, an exact search is not meaningful. Thus, much effort has been devoted to developing efficient and effective similarity search techniques. A recent approach that has been shown to improve the effectiveness of similarity search in multimedia databases resorts to combinations of metrics, where the desired contribution (weight) of each metric is chosen at query time. This paper presents the Multi-Metric M-tree (M3-tree), a metric access method that supports similarity queries with dynamic combinations of metric functions. The M3-tree, an extension of the M-tree, stores partial distances to better estimate the weighted distances between routing/ground entries and each query, where a single distance function is used to build the whole index. An experimental evaluation shows that the M3-tree may be as efficient as having multiple M-trees (one for each combination of metrics).
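The notion of a query-time weighted combination of metrics can be sketched as follows. This shows only the combined distance and a linear-scan nearest-neighbor search, not the M3-tree index itself; the choice of L1 and L2 as component metrics, the weights, and the function names are assumptions for this example.

```python
import math


def l1(x, y):
    """Manhattan distance."""
    return sum(abs(a - b) for a, b in zip(x, y))


def l2(x, y):
    """Euclidean distance."""
    return math.dist(x, y)


def combined_distance(x, y, metrics, weights):
    """Weighted sum of component metrics; the weights are supplied per query."""
    return sum(w * d(x, y) for d, w in zip(metrics, weights))


def knn(data, q, k, metrics, weights):
    """Linear-scan k-NN under the combined distance."""
    return sorted(data, key=lambda x: combined_distance(q, x, metrics, weights))[:k]


# The same data can be queried with different weights without rebuilding anything.
value = combined_distance((0, 0), (3, 4), [l1, l2], [0.5, 0.5])
closest = knn([(3, 4), (1, 1), (6, 0)], (0, 0), 1, [l1, l2], [0.2, 0.8])
```

A weighted sum of metrics with non-negative weights (at least one positive) still satisfies the triangle inequality, which is what allows a metric index to support such combinations.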

## Bayesian Locality Sensitive Hashing for Fast Similarity Search

Satuluri, Venu; Parthasarathy, Srinivasan
Type: Scientific Journal Article
Search Relevance
46.29%
Given a collection of objects and an associated similarity measure, the all-pairs similarity search problem asks us to find all pairs of objects with similarity greater than a certain user-specified threshold. Locality-sensitive hashing (LSH) based methods have become a very popular approach for this problem. However, most such methods only use LSH for the first phase of similarity search - i.e. efficient indexing for candidate generation. In this paper, we present BayesLSH, a principled Bayesian algorithm for the subsequent phase of similarity search - performing candidate pruning and similarity estimation using LSH. A simpler variant, BayesLSH-Lite, which calculates similarities exactly, is also presented. BayesLSH is able to quickly prune away a large majority of the false positive candidate pairs, leading to significant speedups over baseline approaches. For BayesLSH, we also provide probabilistic guarantees on the quality of the output, both in terms of accuracy and recall. Finally, the quality of BayesLSH's output can be easily tuned and does not require any manual setting of the number of hashes to use for similarity estimation, unlike standard approaches. For two state-of-the-art candidate generation algorithms, AllPairs and LSH...
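The all-pairs similarity search problem defined at the start of this abstract can be stated as a short brute-force baseline. This sketch is not BayesLSH and uses no hashing; Jaccard similarity over sets is an assumed choice of similarity measure, and the names and sample data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def all_pairs(objects, threshold):
    """Index pairs (i, j), i < j, whose similarity exceeds the threshold."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(objects), 2)
            if jaccard(a, b) > threshold]


# Hypothetical collection of token sets.
objects = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}]
pairs = all_pairs(objects, threshold=0.4)
```

The quadratic number of comparisons in this baseline is what LSH-based candidate generation and pruning are meant to avoid.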

## Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

Yuan, Ye; Wang, Guoren; Chen, Lei; Wang, Haixun
Type: Scientific Journal Article
Search Relevance
46.26%
Many studies have sought efficient solutions for subgraph similarity search over certain (deterministic) graphs because of its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy-preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Unlike previous works, which assume that edges in an uncertain graph are independent of each other, we study uncertain graphs where edges' occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete; thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and upper bounds of subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds of subgraph isomorphism probability. Based on PMI...

## SEAL: Spatio-Textual Similarity Search

Fan, Ju; Li, Guoliang; Zhou, Lizhu; Chen, Shanshan; Hu, Jun
Type: Scientific Journal Article
Search Relevance
46.23%
Location-based services (LBS) have become more and more ubiquitous recently. Existing methods focus on finding relevant points-of-interest (POIs) based on users' locations and query keywords. Nowadays, modern LBS applications generate a new kind of spatio-textual data, regions-of-interest (ROIs), containing region-based spatial information and textual description, e.g., mobile user profiles with active regions and interest tags. To satisfy search requirements on ROIs, we study a new research problem, called spatio-textual similarity search: Given a set of ROIs and a query ROI, we find the similar ROIs by considering spatial overlap and textual similarity. Spatio-textual similarity search has many important applications, e.g., social marketing in location-aware social networks. It calls for an efficient search method to support large scales of spatio-textual data in LBS systems. To this end, we introduce a filter-and-verification framework to compute the answers. In the filter step, we generate signatures for the ROIs and the query, and utilize the signatures to generate candidates whose signatures are similar to that of the query. In the verification step, we verify the candidates and identify the final answers. To achieve high performance...

## Scalable Locality-Sensitive Hashing for Similarity Search in High-Dimensional, Large-Scale Multimedia Datasets

Teixeira, Thiago S. F. X.; Teodoro, George; Valle, Eduardo; Saltz, Joel H.
Type: Scientific Journal Article
Search Relevance
46.28%
Similarity search is critical for many database applications, including the increasingly popular online services for Content-Based Multimedia Retrieval (CBMR). These services, which include image search engines, must handle an overwhelming volume of data while keeping response times low. Thus, scalability is imperative for similarity search in Web-scale applications, but most existing methods are sequential and target shared-memory machines. Here we address these issues with a distributed, efficient, and scalable index based on Locality-Sensitive Hashing (LSH). LSH is one of the most efficient and popular techniques for similarity search, but its poor referential locality properties have made its implementation a challenging problem. Our solution is based on a widely asynchronous dataflow parallelization with a number of optimizations that include a hierarchical parallelization to decouple indexing and data storage, locality-aware data partition strategies to reduce message passing, and multi-probing to limit memory usage. The proposed parallelization attained an efficiency of 90% in a distributed system with about 800 CPU cores. In particular, the original locality-aware data partition reduced the number of messages exchanged by 30%. Our parallel LSH was evaluated using the largest public dataset for similarity search (to the best of our knowledge), with $10^9$ 128-d SIFT descriptors extracted from Web images. This is two orders of magnitude larger than datasets that previous LSH parallelizations could handle.

## Representation Independent Proximity and Similarity Search

Chodpathumwan, Yodsawalai; Aleyasin, Amirhossein; Termehchy, Arash; Sun, Yizhou
Type: Scientific Journal Article
Search Relevance
46.38%
Finding similar or strongly related entities in a graph database is a fundamental problem in data management and analytics with applications in similarity query processing, entity resolution, and pattern matching. Similarity search algorithms usually leverage the structural properties of the data graph to quantify the degree of similarity or relevance between entities. Nevertheless, the same information can be represented in many different structures and the structural properties observed over particular representations do not necessarily hold for alternative structures. Thus, these algorithms are effective on some representations and ineffective on others. We postulate that a similarity search algorithm should return essentially the same answers over different databases that represent the same information. We formally define the property of representation independence for similarity search algorithms as their robustness against transformations that modify the structure of databases and preserve their information content. We formalize two widespread groups of such transformations called relationship reorganizing and entity rearranging transformations. We show that current similarity search algorithms are not representation independent under these transformations and propose an algorithm called R-PathSim...

## Curse of Dimensionality in the Application of Pivot-based Indexes to the Similarity Search Problem

Volnyansky, Ilya
Type: Scientific Journal Article
Search Relevance
46.23%
In this work we study the validity of the so-called curse of dimensionality for indexing of databases for similarity search. We perform an asymptotic analysis, with a test model based on a sequence of metric spaces $(\Omega_d)$ from which we pick datasets $X_d$ in an i.i.d. fashion. We call the subscript $d$ the dimension of the space $\Omega_d$ (e.g. for $\mathbb{R}^d$ the dimension is just the usual one) and we allow the size of the dataset $n=n_d$ to be such that $d$ is superlogarithmic but subpolynomial in $n$. We study the asymptotic performance of pivot-based indexing schemes where the number of pivots is $o(n/d)$. We pick the relatively simple cost model of similarity search where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the spaces $\Omega_d$ exhibit the (fairly common) concentration of measure phenomenon, the performance of similarity search using such indexes is asymptotically linear in $n$. That is, for large enough $d$, the difference between using such an index and performing a search without an index at all is negligible. Thus we confirm the curse of dimensionality in this setting.; Comment: 56 pages, 7 figures. Master's Thesis in Mathematics, University of Ottawa (Canada). Supervisor: Vladimir Pestov
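The pivot-based scheme and the cost model described here (count only distance calculations) can be made concrete with a small sketch. It is a generic pivot-table range search, not the thesis's analysis; the function names, the choice of pivot, and the one-dimensional sample data are assumptions for this example.

```python
import math


def build_pivot_table(data, pivots, dist):
    """Precompute the distance from every data point to every pivot."""
    return [[dist(x, p) for p in pivots] for x in data]


def pivot_range_query(data, table, pivots, q, r, dist):
    """Range query that prunes with the triangle inequality.

    Returns the hits and the number of distance computations made at query time.
    """
    q_to_pivots = [dist(q, p) for p in pivots]
    computations = len(pivots)
    hits = []
    for x, row in zip(data, table):
        # |d(q, p) - d(x, p)| <= d(q, x) for every pivot p, so a lower bound
        # above r proves that x cannot be an answer.
        lower_bound = max((abs(dq - dx) for dq, dx in zip(q_to_pivots, row)),
                          default=0.0)
        if lower_bound > r:
            continue
        computations += 1
        if dist(q, x) <= r:
            hits.append(x)
    return hits, computations


# Hypothetical low-dimensional data, where a single pivot prunes well.
data = [(float(i),) for i in range(10)]
pivots = [(0.0,)]
table = build_pivot_table(data, pivots, math.dist)
hits, cost = pivot_range_query(data, table, pivots, (5.0,), 1.0, math.dist)
```

In this one-dimensional example the pivot discards most points, so the cost is well below a full scan. The abstract's result is that when distances concentrate in high dimensions, the lower bounds stop pruning and the count approaches the size of the dataset.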

## Performance Evaluation and Optimization of Math-Similarity Search

Zhang, Qun; Youssef, Abdou
Type: Scientific Journal Article
Search Relevance
46.29%
Math similarity search aims to find mathematical expressions that are similar to a user's query. We conceptualized the similarity factors between mathematical expressions, and proposed an approach to math similarity search (MSS) by defining metrics based on those similarity factors [11]. Our preliminary implementation indicated the advantage of MSS compared to non-similarity-based search. To search for similar math expressions more effectively and efficiently, MSS is further optimized. This paper focuses on performance evaluation and optimization of MSS. Our results show that the proposed optimization process significantly improved the performance of MSS with respect to both relevance ranking and recall.; Comment: 15 pages, 8 figures

## Fast Structural Similarity Search of Noncoding RNAs Based on Matched Filtering of Stem Patterns

Yoon, Byung-Jun; Vaidyanathan, P. P.
Type: Book Section; PeerReviewed; Format: application/pdf