Page 1 of results: 50 digital items found in 0.002 seconds

VGLib2D: class library for the 2D graphic visualisation of large volumes of data

Gameiro, Sofia; Almeida, Luís; Marcos, Adérito
Source: ADETTI-ISCTE  Publisher: ADETTI-ISCTE
Type: Journal article
Published 10/2004, ENG
Search relevance: 26.13%
This article presents the implementation of a class library in the C# language, built on the recent .NET technologies, for creating documents or images in Scalable Vector Graphics (SVG) format for the representation, visualisation and printing of large volumes of two-dimensional (2D) data. The goal is to exploit SVG in applications that are, for instance, Web based and need to render large volumes of data graphically, dynamically and in real time, with the results immediately available for visualisation, printing and downloading.
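The core idea of emitting SVG directly for a 2D data set fits in a few lines. Below is a minimal illustration in Python rather than the article's C#/.NET library; `points_to_svg` and its parameters are hypothetical, not the library's API:

```python
def points_to_svg(points, width=400, height=300, radius=2):
    """Serialise a list of (x, y) pairs as a standalone SVG document."""
    circles = "".join(
        f'<circle cx="{x}" cy="{y}" r="{radius}" fill="black"/>'
        for x, y in points
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">{circles}</svg>'
    )

# The resulting string can be served to a browser, printed, or offered
# for download with no rasterisation step.
svg = points_to_svg([(10, 20), (30, 40), (50, 60)])
```

Because SVG is plain text, documents like this can be generated on the fly server-side and remain resolution-independent for both screen and print.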

Document engineering approaches toward scalable and structured multimedia, web and printable documents

PIMENTEL, Maria da Graca; BULTERMAN, Dick C. A.; SOARES, Luiz Fernando Gomes
Source: SPRINGER  Publisher: SPRINGER
Type: Journal article
ENG
Search relevance: 46.5%
Document engineering is the computer science discipline that investigates systems for documents in any form and in all media. As with the relationship between software engineering and software, document engineering is concerned with principles, tools and processes that improve our ability to create, manage, and maintain documents (http://www.documentengineering.org). The ACM Symposium on Document Engineering is an annual meeting of researchers active in document engineering; it is sponsored by ACM through the ACM SIGWEB Special Interest Group. In this editorial, we first point to work carried out in the context of document engineering that is directly related to multimedia tools and applications. We conclude with a summary of the papers presented in this special issue.

Mobile Usage at the Base of the Pyramid in South Africa

World Bank
Source: Washington, DC  Publisher: Washington, DC
EN_US
Search relevance: 25.94%
Mobile phones are the primary means of accessing information or communicating for those who live at the base of the pyramid (BoP). It is likely that the mobile phone will therefore also be the preferred medium to provide value-added services to those at the BoP, whether they are private users or informal businesses, for the foreseeable future. Although the prepaid mobile model has brought voice and text services to this group, sustainable, replicable models for enhanced services, products and applications are far more limited. The purpose of the study is to investigate the demand for mobile applications, services and products, with a view to increasing economic opportunities and improving well-being for users at the BoP. The key objectives of the study are the following: i) to increase understanding of the actual usage of mobile services, products and applications at the BoP, and to understand their potential for economic and social empowerment; ii) to identify scalable examples of services, products and applications at the concept...

Central America : Big Data in Action for Development

World Bank
Source: Washington, DC  Publisher: Washington, DC
EN_US
Search relevance: 25.94%
This report stemmed from a World Bank pilot activity to explore the potential of big data to address development challenges in Central American countries. As part of this activity we collected and analyzed a number of examples of leveraging big data for development. Because of the growing interest in this topic this report makes available to a broader audience those examples as well as the underlying conceptual framework to think about big data for development. To make effective use of big data, many practitioners emphasize the importance of beginning with a question instead of the data itself. A question clarifies the purpose of utilizing big data, whether it is for awareness, understanding, and/or forecasting. In addition, a question suggests the kinds of real-world behaviors or conditions that are of interest. These behaviors are encoded into data through some generating process which includes the media through which behavior is captured. Then various data sources are accessed, prepared, consolidated and analyzed. This ultimately gives rise to insights into the question of interest...

Parallelisation strategies for rendering XSL-FO documents with the FOP tool

Zambon, Rogério Timmers
Source: Pontifícia Universidade Católica do Rio Grande do Sul; Porto Alegre  Publisher: Pontifícia Universidade Católica do Rio Grande do Sul; Porto Alegre
Type: Master's dissertation
PORTUGUESE
Search relevance: 16.39%
Large print workloads are increasingly common due to the growing demand for personalised documents. In this context, Variable Data Printing (VDP) has become a very useful tool for marketing professionals who need to personalise messages for each customer in promotional materials and advertising campaigns. VDP allows the creation of documents based on a template containing static and variable parts. The rendering tool must be able to transform the variable part into a composed format, or PDL (Page Description Language), such as PDF (Portable Document Format), PS (PostScript) or SVG (Scalable Vector Graphics). The amount of variable content in a document depends entirely on the publication layout defined by a design professional. Moreover, the variable content to be rendered may vary according to the data read from the database. This process is therefore invoked repeatedly and can easily become a bottleneck, especially in a production environment, entirely compromising the generation of a document. In this scenario, high-performance techniques emerge as an interesting alternative for increasing the throughput of the rendering phase. This work introduces a portable and scalable parallel solution for the rendering tool called FOP (Formatting Objects Processor)...
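Since each personalised document instance depends only on the template plus one database record, the rendering stage parallelises naturally. A minimal sketch in Python (the `render_record` stand-in and record fields are hypothetical; the dissertation targets FOP, a Java tool):

```python
from concurrent.futures import ThreadPoolExecutor

def render_record(record):
    # Stand-in for invoking an XSL-FO renderer such as FOP on the merged
    # template + record; here we just fabricate a placeholder PDL string.
    return f"%PDF-stub for {record['customer']}"

def render_batch(records, workers=4):
    # Each record is independent, so rendering is embarrassingly parallel;
    # map() preserves the input order of the print job.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_record, records))
```

The preserved ordering matters for print production: the output stream must match the database order even though records finish rendering at different times.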

Raw fabric hardware implementation and characterization

Sun, Albert (Albert G.)
Source: Massachusetts Institute of Technology  Publisher: Massachusetts Institute of Technology
Type: Doctoral thesis  Format: 110 p.
ENG
Search relevance: 16.23%
The Raw architecture is scalable, improving performance not by pushing the limits of clock frequency but by spreading computation across numerous simple, replicated tiles. The first Raw processors fabricated have 16 RISC processor tiles that share the workload. The Raw Fabric system extends Raw's scalability by weaving together multiple 16-tile Raw processors. The Raw Fabric is a modular and scalable system comprising two board types: one that houses 4 Raw processors (the Processor board) and one that handles communications (the I/O board). The design is modular because it breaks the system down into smaller parts, and scalable because these modules may be combined to create large Fabrics. The ultimate goal is to produce a Raw Fabric with 16 Processor boards (equivalently, 64 Raw processors or 1024 tiles), though the current largest Fabric system includes one Processor board and 3 I/O boards. This thesis walks through the important design and implementation challenges and documents how they were solved. The most basic challenge was to design a system flexible enough to accommodate a variety of Fabric sizes.; (cont.) Next, the distribution of vital signals such as power and clock poses a problem unique to the Fabric system because of the possible size of the final product. Finally...

A method for interactive medical instruction utilizing the World Wide Web.

McEnery, K. W.; Roth, S. M.; Kelley, L. K.; Hirsch, K. R.; Menton, D. N.; Kelly, E. A.
Source: American Medical Informatics Association  Publisher: American Medical Informatics Association
Type: Journal article
Published 1995, EN
Search relevance: 16.18%
We describe the implementation of interactive medical teaching programs in radiology and histology which utilize the Internet's World Wide Web (WWW). The WWW standard hypertext interface allows for simple navigation between related documents but does not provide a method for student tracking or question queries. Electronic forms, a recent feature of the WWW, provide the means to present question documents to remote clients and track student performance. A feature of our approach is dynamic creation of HTML documents based upon interaction with database applications. The approach allows multiple simultaneous, yet asynchronous interactions by geographically dispersed students upon the same instructional database and is scalable, providing the capability for multiple image/document servers. The security of the database is assured given that it is not accessible through the Internet.
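The mechanism described, generating HTML question documents on the fly from a database and tracking student performance from submitted forms, can be sketched as follows (a hypothetical minimal version in Python; names like `question_form` are illustrative, not from the paper):

```python
def question_form(qid, prompt, choices):
    """Build an HTML question form dynamically from database content."""
    options = "".join(
        f'<input type="radio" name="answer" value="{c}"> {c}<br>'
        for c in choices
    )
    return (
        f'<form method="post" action="/grade">'
        f'<p>{prompt}</p>{options}'
        f'<input type="hidden" name="qid" value="{qid}">'
        f'<input type="submit" value="Answer"></form>'
    )

def track_scores(answer_key, submissions):
    """Tally per-student performance from (student, qid, answer) tuples,
    allowing asynchronous submissions from many dispersed students."""
    scores = {}
    for student, qid, answer in submissions:
        scores[student] = scores.get(student, 0) + int(answer == answer_key[qid])
    return scores
```

The hidden `qid` field is what lets the server connect a stateless HTTP form submission back to a database row, which is the key trick forms added over plain hypertext.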

Scaling Inference for Markov Logic with a Task-Decomposition Approach

Niu, Feng; Zhang, Ce; Ré, Christopher; Shavlik, Jude
Source: Cornell University  Publisher: Cornell University
Type: Journal article
Search relevance: 16.23%
Motivated by applications in large-scale knowledge base construction, we study the problem of scaling up a sophisticated statistical inference framework called Markov Logic Networks (MLNs). Our approach, Felix, uses the idea of Lagrangian relaxation from mathematical programming to decompose a program into smaller tasks while preserving the joint-inference property of the original MLN. The advantage is that we can use highly scalable specialized algorithms for common tasks such as classification and coreference. We propose an architecture to support Lagrangian relaxation in an RDBMS which we show enables scalable joint inference for MLNs. We empirically validate that Felix is significantly more scalable and efficient than prior approaches to MLN inference by constructing a knowledge base from 1.8M documents as part of the TAC challenge. We show that Felix scales and achieves state-of-the-art quality numbers. In contrast, prior approaches do not scale even to a subset of the corpus that is three orders of magnitude smaller.

Extraction of Salient Sentences from Labelled Documents

Denil, Misha; Demiraj, Alban; de Freitas, Nando
Source: Cornell University  Publisher: Cornell University
Type: Journal article
Search relevance: 25.94%
We present a hierarchical convolutional document model with an architecture designed to support introspection of the document structure. Using this model, we show how to use visualisation techniques from the computer vision literature to identify and extract topic-relevant sentences. We also introduce a new scalable evaluation technique for automatic sentence extraction systems that avoids the need for time-consuming human annotation of validation data.; Comment: arXiv admin note: substantial text overlap with arXiv:1406.3830

Boosting XML Filtering with a Scalable FPGA-based Architecture

Mitra, Abhishek; Vieira, Marcos; Bakalov, Petko; Najjar, Walid; Tsotras, Vassilis
Source: Cornell University  Publisher: Cornell University
Type: Journal article
Published 09/09/2009
Search relevance: 26.29%
The growing amount of XML-encoded data exchanged over the Internet increases the importance of XML-based publish-subscribe (pub-sub) and content-based routing systems. The input to such systems typically consists of a stream of XML documents and a set of user subscriptions expressed as XML queries. The pub-sub system filters the published documents and passes them to the subscribers. Pub-sub systems are characterized by very high input rates, so processing time is critical. In this paper we propose a "pure hardware" solution that evaluates XPath query blocks on an FPGA to solve the filtering problem. By exploiting the high throughput an FPGA provides for parallel processing, our approach achieves drastically better throughput than existing software or mixed (hardware/software) architectures. The XPath queries (subscriptions) are translated into regular expressions, which are then mapped onto FPGA devices. By introducing stacks within the FPGA we are able to express and process a wide range of path queries very efficiently in a scalable environment. Moreover, performing the parsing and the filter processing on the same FPGA chip eliminates expensive communication costs (which a multi-core system would need), enabling very fast and efficient pipelining. Our experimental evaluation reveals more than an order of magnitude improvement over traditional pub-sub systems.; Comment: CIDR 2009
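The translation step from a restricted XPath subset to regular expressions, which the paper then maps onto FPGA logic, can be illustrated in software. A sketch assuming only child (`/`) and descendant (`//`) axes over element names; `xpath_to_regex` is illustrative, not the paper's compiler:

```python
import re

def xpath_to_regex(xpath):
    """Compile /a//b-style paths into a regex over slash-separated tag paths."""
    pattern = "^"
    for step in re.split(r"(//|/)", xpath):
        if step == "//":
            pattern += "/([^/]+/)*"   # any number of intermediate elements
        elif step == "/":
            pattern += "/"
        elif step:
            pattern += re.escape(step)
    return pattern + "$"

def matches(xpath, element_path):
    """Test a document's element path (e.g. '/a/x/b') against a subscription."""
    return re.match(xpath_to_regex(xpath), element_path) is not None
```

On the FPGA, each such regular expression becomes a small parallel state machine, which is where the throughput advantage over software engines comes from.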

ZenLDA: An Efficient and Scalable Topic Model Training System on Distributed Data-Parallel Platform

Zhao, Bo; Zhou, Hucheng; Li, Guoqiang; Huang, Yihua
Source: Cornell University  Publisher: Cornell University
Type: Journal article
Published 02/11/2015
Search relevance: 26.13%
This paper presents our recent effort, zenLDA, an efficient and scalable Collapsed Gibbs Sampling (CGS) system for Latent Dirichlet Allocation training. Training at this scale is challenging because both data parallelism and model parallelism are required: the sampling data can reach billions of documents and the model size trillions of parameters. zenLDA combines algorithm-level improvements and system-level optimizations. It first presents a novel CGS algorithm that balances time complexity, model accuracy and parallelization flexibility. The input corpus in zenLDA is represented as a directed graph, with model parameters annotated as vertex attributes. Distributed training is parallelized by partitioning the graph: in each iteration, a CGS step is applied to all partitions in parallel, followed by synchronizing the computed models with each other. In this way, both data parallelism and model parallelism are achieved by converting them to graph parallelism. We revisit the tradeoff between system efficiency and model accuracy and present approximations such as an unsynchronized model, sparse model initialization and "converged" token exclusion. zenLDA is built on GraphX in Spark, which provides a distributed data abstraction (RDD) and expressive APIs that simplify programming while hiding system complexity; this enabled us to implement other CGS algorithms with a few lines of code change. To better fit the distributed data-parallel framework and achieve performance comparable with contemporary systems...
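The sequential core that zenLDA parallelises, a collapsed Gibbs sampling sweep for LDA, fits in a short sketch. This is the standard single-machine CGS update, not zenLDA's graph-partitioned implementation:

```python
import random

def cgs_lda(docs, K, V, iters=20, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids < V."""
    rng = random.Random(seed)
    ndk = [[0] * K for _ in docs]          # document-topic counts
    nkw = [[0] * V for _ in range(K)]      # topic-word counts
    nk = [0] * K                           # topic totals
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):         # initialise counts from z
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                # remove the current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [                # full conditional for z_di
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k                # add the new assignment back
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk, nkw
```

zenLDA's contribution is running many such sweeps concurrently over graph partitions and only approximately synchronising the counts, which this sketch does not attempt.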

Scalable XSLT Evaluation

Guo, Zhimao; Li, Min; Wang, Xiaoling; Zhou, Aoying
Source: Cornell University  Publisher: Cornell University
Type: Journal article
Published 21/08/2004
Search relevance: 26.1%
XSLT is an increasingly popular language for processing XML data, and it is widely supported by application platform software. However, little optimization effort has been made inside current XSLT processing engines: evaluating even a very simple XSLT program on a large XML document with a simple schema may consume extensive memory. In this paper, we present a novel notion of \emph{Streaming Processing Model} (\emph{SPM}) to evaluate a subset of XSLT programs on XML documents, especially large ones. With SPM, an XSLT processor can transform an XML source document to other formats without requiring extra memory buffers. Our approach can therefore not only handle large source documents but also produce large results. We demonstrate the advantages of the SPM approach with a performance study. Experimental results clearly confirm that SPM improves XSLT evaluation, typically by a factor of 2 to 10 over existing approaches. Moreover, the SPM approach also features high scalability.; Comment: appeared at the international conference APWeb 04; 10 pages
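The streaming idea, transforming a document without ever materialising it in memory, can be demonstrated with Python's `iterparse`. This is only an analogy for SPM, not the paper's XSLT engine:

```python
import io
import xml.etree.ElementTree as ET

def stream_transform(xml_bytes):
    """Convert <item> elements to CSV lines while parsing incrementally."""
    out = []
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "item":
            out.append(f"{elem.get('id')},{elem.text}")
            elem.clear()  # discard the subtree: no full DOM is ever built
    return out
```

Because each element is released as soon as it is transformed, memory use stays constant regardless of document size, which is exactly the property SPM gives a subset of XSLT.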

A Geometric Model for Information Retrieval Systems

Kim, Myung Ho
Source: Cornell University  Publisher: Cornell University
Type: Journal article
Published 05/12/1999
Search relevance: 16.18%
This decade has seen a great deal of progress in the development of information retrieval systems. Unfortunately, we still lack a systematic understanding of the behavior of these systems and their relationship with documents. In this paper we present a completely new approach to understanding information retrieval systems. Recently, it has been observed that retrieval systems in TREC 6 show some remarkable patterns in retrieving relevant documents. Based on the TREC 6 observations, we introduce a geometric linear model of information retrieval systems. We then apply the model to predict the number of relevant documents retrieved by the systems. The model also scales to much larger data sets. Although the model was developed on the TREC 6 routing test data, we believe it is readily applicable to other information retrieval systems. In the Appendix, we explain a simple and efficient way of building a better system from existing systems.; Comment: 13 pages

A Distributed Framework for Scalable Search over Encrypted Documents

Kuzu, Mehmet; Islam, Mohammad Saiful; Kantarcioglu, Murat
Source: Cornell University  Publisher: Cornell University
Type: Journal article
Published 23/08/2014
Search relevance: 26.18%
Nowadays, huge amounts of documents are increasingly transferred to remote servers due to the appealing features of cloud computing. On the other hand, the privacy and security of sensitive information in an untrusted cloud environment is a big concern. To alleviate such concerns, encrypting sensitive data before its transfer to the cloud has become an important risk-mitigation option. Encrypted storage provides protection at the expense of a significant increase in data-management complexity. For effective management, it is critical to provide efficient selective document retrieval over the encrypted collection. In fact, a considerable number of searchable symmetric encryption schemes have been designed in the literature to achieve this task. However, with the emergence of big data everywhere, available approaches are insufficient to address some crucial real-world problems such as scalability. In this study, we focus on the practical aspects of a secure keyword search mechanism over encrypted data on a real cloud infrastructure. First, we propose a provably secure distributed index along with a parallelizable retrieval technique that can easily scale to big data. Second, we integrate authorization into the search scheme to limit information leakage in the multi-user setting, where users are allowed to access only particular documents. Third...
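The basic building block of searchable symmetric encryption, a keyword trapdoor computed with a keyed hash so the server can match documents without learning the keyword, can be sketched as follows. This is a toy single-server version for intuition only; it is not the paper's distributed, provably secure scheme:

```python
import hashlib
import hmac

def trapdoor(key, keyword):
    """Deterministic keyed token; the server sees this, never the keyword."""
    return hmac.new(key, keyword.encode(), hashlib.sha256).hexdigest()

def build_index(key, docs):
    """Client side: map keyword trapdoors to ids of documents containing them."""
    index = {}
    for doc_id, words in docs.items():
        for w in set(words):
            index.setdefault(trapdoor(key, w), set()).add(doc_id)
    return index

def search(index, key, keyword):
    """Server-side lookup, given only the trapdoor for the query keyword."""
    return index.get(trapdoor(key, keyword), set())
```

Scaling this design is what the paper addresses: the index is sharded across machines and lookups are parallelised, while authorization restricts which trapdoor-to-document mappings each user may query.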

A Scalable Asynchronous Distributed Algorithm for Topic Modeling

Yu, Hsiang-Fu; Hsieh, Cho-Jui; Yun, Hyokun; Vishwanathan, S. V. N; Dhillon, Inderjit S.
Source: Cornell University  Publisher: Cornell University
Type: Journal article
Published 16/12/2014
Search relevance: 26.29%
Learning meaningful topic models on massive document collections containing millions of documents and billions of tokens is challenging for two reasons: first, one needs to deal with a large number of topics (typically on the order of thousands); second, one needs a scalable and efficient way of distributing the computation across multiple machines. In this paper we present a novel algorithm, F+Nomad LDA, which simultaneously tackles both problems. To handle a large number of topics we use an appropriately modified Fenwick tree. This data structure allows us to sample from a multinomial distribution over $T$ items in $O(\log T)$ time; moreover, when topic counts change, the data structure can be updated in $O(\log T)$ time. To distribute the computation across multiple processors we present a novel asynchronous framework inspired by the Nomad algorithm of \cite{YunYuHsietal13}. We show that F+Nomad LDA significantly outperforms the state of the art on massive problems involving millions of documents, billions of words, and thousands of topics.
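The Fenwick-tree trick, drawing from a multinomial in $O(\log T)$ and updating a weight in $O(\log T)$, can be sketched directly (a minimal illustration of the data structure, not the authors' code):

```python
class FenwickSampler:
    """Binary indexed tree over unnormalised weights: both sampling an index
    in proportion to its weight and adjusting a weight cost O(log T)."""

    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0.0] * (self.n + 1)
        for i, w in enumerate(weights):
            self.update(i, w)

    def update(self, i, delta):
        """Add delta to weight i (e.g. when a topic count changes)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)

    def total(self):
        """Sum of all weights (the normalising constant)."""
        s, i = 0.0, self.n
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)
        return s

    def sample(self, u):
        """Return the smallest index whose prefix sum exceeds u, for
        u drawn uniformly from [0, total())."""
        pos, bit = 0, 1
        while bit * 2 <= self.n:
            bit *= 2
        while bit:
            nxt = pos + bit
            if nxt <= self.n and self.tree[nxt] <= u:
                pos, u = nxt, u - self.tree[nxt]
            bit >>= 1
        return pos
```

A flat array would need $O(T)$ per draw; with thousands of topics and billions of tokens, the logarithmic sampler is the difference between feasible and infeasible.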

Multi-GPU Distributed Parallel Bayesian Differential Topic Modelling

Li, Aaron Q
Source: Cornell University  Publisher: Cornell University
Type: Journal article
Published 22/10/2015
Search relevance: 16.37%
There is an explosion of data, documents, and other content, and people need tools to analyze and interpret them, tools to turn the content into information and knowledge. Topic modeling has been developed to solve these problems. Topic models such as LDA [Blei et al. 2003] allow salient patterns in data to be extracted automatically; when analyzing texts, these patterns are called topics. Among the numerous extensions of LDA, few can reliably analyze multiple groups of documents and extract topic similarities. Recently introduced differential topic modeling (SPDP) [Chen et al. 2012] performs uniformly better than many topic models in a discriminative setting. There is also a need to improve the sampling speed of topic models. While some effort has been made on distributed algorithms, no work has so far used graphics processing units (GPUs), even though the GPU framework has already become the most cost-efficient platform for many problems. In this thesis, I propose and implement a scalable multi-GPU distributed parallel framework which approximates SPDP. Experiments show my algorithms achieve a speed-up of about 50 times while remaining almost as accurate, using only a single cheap laptop GPU. Furthermore...

Architecture of A Scalable Dynamic Parallel WebCrawler with High Speed Downloadable Capability for a Web Search Engine

Mukhopadhyay, Debajyoti; Mukherjee, Sajal; Ghosh, Soumya; Kar, Saheli; Kim, Young-Chon
Source: Cornell University  Publisher: Cornell University
Type: Journal article
Published 03/02/2011
Search relevance: 26.13%
Today the World Wide Web (WWW) has become a huge ocean of information, and it grows in size every day. Downloading even a fraction of this mammoth data is like sailing through a huge ocean, and it is a challenging task indeed. To download a large portion of data from the WWW, it has become absolutely essential to make the crawling process parallel. In this paper we present the architecture of a dynamic parallel Web crawler, christened "WEB-SAILOR," which offers a scalable, client-server approach to speeding up the download process on behalf of a Web search engine in a distributed, domain-set-specific environment. WEB-SAILOR removes the possibility of multiple crawlers downloading overlapping documents without incurring the cost of communication overhead among the several parallel "client" crawling processes.; Comment: 6 pages, 6 figures
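One standard way to guarantee that parallel crawlers never download the same document without exchanging messages is deterministic partitioning of the URL space. A sketch of that idea (WEB-SAILOR's actual coordination is a client-server scheme; this hash partitioning is only illustrative):

```python
import hashlib

def assign_crawler(url, n_crawlers):
    """Map a URL's domain to exactly one crawler. Identical domains always
    land on the same crawler, so no document is fetched twice and the
    parallel crawling processes need no communication to stay disjoint."""
    domain = url.split("//", 1)[-1].split("/", 1)[0].lower()
    digest = hashlib.md5(domain.encode()).hexdigest()
    return int(digest, 16) % n_crawlers
```

Partitioning by domain (rather than by full URL) also keeps all pages of one site on one crawler, which makes per-site politeness limits easy to enforce locally.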

Scalable Text and Link Analysis with Mixed-Topic Link Models

Zhu, Yaojia; Yan, Xiaoran; Getoor, Lise; Moore, Cristopher
Source: Cornell University  Publisher: Cornell University
Type: Journal article
Published 28/03/2013
Search relevance: 26.13%
Many data sets contain rich information about objects, as well as pairwise relations between them. For instance, in networks of websites, scientific papers, and other documents, each node has content consisting of a collection of words, as well as hyperlinks or citations to other nodes. In order to perform inference on such data sets, and make predictions and recommendations, it is useful to have models that are able to capture the processes which generate the text at each node and the links between them. In this paper, we combine classic ideas in topic modeling with a variant of the mixed-membership block model recently developed in the statistical physics community. The resulting model has the advantage that its parameters, including the mixture of topics of each document and the resulting overlapping communities, can be inferred with a simple and scalable expectation-maximization algorithm. We test our model on three data sets, performing unsupervised topic classification and link prediction. For both tasks, our model outperforms several existing state-of-the-art methods, achieving higher accuracy with significantly less computation, analyzing a data set with 1.3 million words and 44 thousand links in a few minutes.; Comment: 11 pages...

Document image representation, classification and retrieval in large-scale domains

Gordo, Albert
Source: [Barcelona]: Universitat Autònoma de Barcelona  Publisher: [Barcelona]: Universitat Autònoma de Barcelona
Type: Electronic thesis and dissertation; info:eu-repo/semantics/doctoralThesis  Format: application/pdf
Published 2013, ENG
Search relevance: 16.35%
In the preliminary pages: The research described in this book was carried out at the Computer Vision Center. Despite the ideal of the "paperless office" born in the 1970s, most companies are still struggling with an enormous amount of paper documentation. Although many companies are making an effort to convert part of their internal documentation to a digital format without going through paper, communicating with other companies and customers in a purely digital format is a much more complex problem because of the limited adoption of standards. Companies receive a large amount of paper documentation that needs to be analysed and processed, mostly by hand. One solution to this task consists, first, of automatically scanning incoming documents. The document images can then be analysed and information extracted from the data. Documents can also be automatically routed to the appropriate workflows, used to search for similar documents in databases in order to transfer information, and so on. Given the nature of this digital "mailroom," document representation methods need to be general...

Segmentation and indexing of complex objects in comic book images

Rigaud, Christophe; Karatzas, Dimosthenis; Ogier, Jean-Marc
Source: [Barcelona]: Universitat Autònoma de Barcelona  Publisher: [Barcelona]: Universitat Autònoma de Barcelona
Type: Electronic thesis and dissertation; info:eu-repo/semantics/doctoralThesis; info:eu-repo/semantics/publishedVersion  Format: application/pdf
Published 2015, ENG
Search relevance: 26.1%
Born in the 19th century, comics are used to express ideas through sequences of images, often in combination with text and graphics. Comics are considered a ninth art, sequential art, which spread with advances in printing and the Internet worldwide in newspapers, books and magazines. Today, the growing development of new technologies and the World Wide Web gives rise to new forms of expression that carry the paper medium into the freedom of the virtual world. Nevertheless, the traditional comic persists and is an important cultural heritage in many countries. Unlike music, cinema or classical literature, it has not yet found its counterpart in the digital world. The use of information and telecommunication technologies could ease the exploration of online libraries, accelerate translation, and widen export to a larger readership (enriching the content during reading, on demand and personalised), or enable listening to text and sound effects for visually impaired students and pupils. Cultural-heritage preservation agencies such as CIBDI in Angoulême (International Centre for Comics and Images)...