Page 1 of results: 230 digital items found in 0.006 seconds

Representação de coleções de documentos textuais por meio de regras de associação; Representation of textual document collections through association rules

Rossi, Rafael Geraldeli
Source: Biblioteca Digital de Teses e Dissertações da USP Publisher: Biblioteca Digital de Teses e Dissertações da USP
Type: Master's Dissertation Format: application/pdf
Published on 16/08/2011 PT
Search Relevance
26.16%
The number of textual documents available in digital form has grown steadily. Text mining techniques are increasingly used to organize and extract knowledge from large collections of textual documents. To apply these techniques, the documents must be represented in an appropriate format. Most text mining research uses the bag-of-words approach to represent the documents in a collection. This representation treats every word in the collection as a candidate attribute, ignoring word order, punctuation, and structural information, and is characterized by high dimensionality and sparse data. On the other hand, most concepts consist of more than one word, such as Artificial Intelligence, Neural Network, and Text Mining. Approaches that generate attributes composed of more than one word bring problems beyond those of the bag-of-words representation, such as attributes with little meaning and a much higher dimensionality. This master's project proposed an approach to representing textual documents named bag-of-related-words. The proposed approach generates attributes composed of related words by means of association rules. With the association rules...
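As a rough illustration of the two representations this abstract contrasts, the sketch below builds a plain bag-of-words matrix and then keeps only word pairs that pass support and confidence thresholds, the two standard measures for association rules. The function names, thresholds, and toy documents are assumptions made for illustration; this is not the method of the dissertation, whose details are truncated in this listing.

```python
from collections import Counter
from itertools import combinations


def bag_of_words(docs):
    """Return (vocabulary, rows) with one term-count row per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def related_word_pairs(docs, min_support=0.5, min_confidence=0.75):
    """Hypothetical helper: treat each document as one transaction and
    keep rules x -> y whose support and confidence pass the thresholds."""
    transactions = [set(doc.lower().split()) for doc in docs]
    n = len(transactions)
    word_count = Counter(w for t in transactions for w in t)
    pair_count = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), together in pair_count.items():
        support = together / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = together / word_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Toy collection, invented for this example.
docs = [
    "neural network training",
    "neural network model",
    "text mining model",
    "text mining rules",
]

vocab, rows = bag_of_words(docs)
print(len(vocab), "single-word attributes")
for x, y, support, confidence in related_word_pairs(docs):
    print(f"{x} -> {y}  support={support:.2f}  confidence={confidence:.2f}")
```

Even on four short documents the bag-of-words matrix is mostly zeros, while the rule filter keeps only the pairs that co-occur consistently, which hints at why related-word attributes may be both fewer and more meaningful than arbitrary word combinations.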

Using a Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon to Assign SNOMED CT Codes to Anatomic Sites and Pathologic Diagnoses in Full Text Pathology Reports

Lowe, Henry J.; Huang, Yang; Regula, Donald P.
Source: American Medical Informatics Association Publisher: American Medical Informatics Association
Type: Journal Article
EN
Search Relevance
26.02%
To address the problem of extracting structured information from pathology reports for research purposes in the STRIDE Clinical Data Warehouse, we adapted the ChartIndex Medical Language Processing system to automatically identify and map anatomic and diagnostic noun phrases found in full-text pathology reports to SNOMED CT concept descriptors. An evaluation of the system’s performance showed a positive predictive value for anatomic concepts of 92.3% and positive predictive value for diagnostic concepts of 84.4%. The experiment also suggested strategies for improving ChartIndex’s performance coding pathology reports.
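For readers unfamiliar with the metric reported above, positive predictive value is the share of positive predictions that are correct, TP / (TP + FP). The sketch below is a minimal illustration; the function name and the counts are hypothetical and are not taken from the paper.

```python
def positive_predictive_value(true_positives, false_positives):
    """Fraction of positive predictions that are correct: TP / (TP + FP)."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no positive predictions to evaluate")
    return true_positives / total


# Hypothetical counts: 90 correctly coded phrases, 10 incorrectly coded ones.
print(positive_predictive_value(90, 10))
```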

Spelling is Just a Click Away – A User-Centered Brain–Computer Interface Including Auto-Calibration and Predictive Text Entry

Kaufmann, Tobias; Völker, Stefan; Gunesch, Laura; Kübler, Andrea
Source: Frontiers Research Foundation Publisher: Frontiers Research Foundation
Type: Journal Article
Published on 23/05/2012 EN
Search Relevance
46.29%
Brain–computer interfaces (BCI) based on event-related potentials (ERP) allow for selection of characters from a visually presented character-matrix and thus provide a communication channel for users with neurodegenerative disease. Although they have been a topic of research for more than 20 years and have repeatedly been proven to be a reliable communication method, BCIs are almost exclusively used in experimental settings, handled by qualified experts. This study investigates whether ERP–BCIs can be handled independently by laymen without expert support, which is essential for establishing BCIs in end users' daily life situations. Furthermore, we compared the classic character-by-character text entry against a predictive text entry (PTE) that directly incorporates predictive text into the character-matrix. N = 19 BCI novices handled a user-centered ERP–BCI application on their own without expert support. The software individually adjusted classifier weights and control parameters in the background, invisible to the user (auto-calibration). All participants were able to operate the software on their own and to twice correctly spell a sentence with the auto-calibrated classifier (once with PTE, once without). Our PTE increased spelling speed and...

Text Mining Driven Drug-Drug Interaction Detection

Yan, Su; Jiang, Xiaoqian; Chen, Ying
Source: PubMed Publisher: PubMed
Type: Journal Article
Published in 2013 EN
Search Relevance
25.98%
Identifying drug-drug interactions is an important and challenging problem in computational biology and healthcare research. There are accurate, structured but limited domain knowledge and noisy, unstructured but abundant textual information available for building predictive models. The difficulty lies in mining the true patterns embedded in text data and developing efficient and effective ways to combine heterogeneous types of information. We demonstrate a novel approach of leveraging augmented text-mining features to build a logistic regression model with improved prediction performance (in terms of discrimination and calibration). Our model based on synthesized features significantly outperforms the model trained with only structured features (AUC: 96% vs. 91%, Sensitivity: 90% vs. 82% and Specificity: 88% vs. 81%). Along with the quantitative results, we also show learned “latent topics”, an intermediary result of our text mining module, and discuss their implications.

Applying a Novel Combination of Techniques to Develop a Predictive Model for Diabetes Complications

Sangi, Mohsen; Win, Khin Than; Shirvani, Farid; Namazi-Rad, Mohammad-Reza; Shukla, Nagesh
Source: Public Library of Science Publisher: Public Library of Science
Type: Journal Article
Published on 22/04/2015 EN
Search Relevance
26.02%
Among the many related issues of diabetes management, its complications constitute the main part of the heavy burden of this disease. The aim of this paper is to develop a risk advisor model to predict the chances of diabetes complications according to the changes in risk factors. As the starting point, an inclusive list of (k) diabetes complications and (n) their correlated predisposing factors are derived from the existing endocrinology text books. A type of data meta-analysis has been done to extract and combine the numeric value of the relationships between these two. The whole n (risk factors) - k (complications) model was broken down into k different (n-1) relationships and these (n-1) dependencies were broken into n (1-1) models. Applying regression analysis (seven patterns) and artificial neural networks (ANN), we created models to show the (1-1) correspondence between factors and complications. Then all 1-1 models related to an individual complication were integrated using the naïve Bayes theorem. Finally, a Bayesian belief network was developed to show the influence of all risk factors and complications on each other. We assessed the predictive power of the 1-1 models by R2, F-ratio and adjusted R2 equations; sensitivity...

Extracting Concepts Related to Homelessness from the Free Text of VA Electronic Medical Records

Gundlapalli, Adi V.; Carter, Marjorie E.; Divita, Guy; Shen, Shuying; Palmer, Miland; South, Brett; Durgahee, B.S. Begum; Redd, Andrew; Samore, Matthew
Source: American Medical Informatics Association Publisher: American Medical Informatics Association
Type: Journal Article
Published on 14/11/2014 EN
Search Relevance
25.98%
Mining the free text of electronic medical records (EMR) using natural language processing (NLP) is an effective method of extracting information not always captured in administrative data. We sought to determine if concepts related to homelessness, a non-medical condition, were amenable to extraction from the EMR of Veterans Affairs (VA) medical records. As there were no off-the-shelf products, a lexicon of terms related to homelessness was created. A corpus of free text documents from outpatient encounters was reviewed to create the reference standard for NLP training and testing. V3NLP Framework was used to detect instances of lexical terms and was compared to the reference standard. With a positive predictive value of 77% for extracting relevant concepts, this study demonstrates the feasibility of extracting positively asserted concepts related to homelessness from the free text of medical records.

Statistical Machine Learning for Text Mining with Markov Chain Monte Carlo Inference

Drummond, Anna
Source: Rice University Publisher: Rice University
ENG
Search Relevance
36.02%
This work concentrates on mining textual data. In particular, I apply Statistical Machine Learning to document clustering, predictive modeling, and document classification tasks undertaken in three different application domains. I have designed novel statistical Bayesian models for each application domain, as well as derived Markov Chain Monte Carlo (MCMC) algorithms for the model inference. First, I investigate the usefulness of using topic models, such as the popular Latent Dirichlet Allocation (LDA) and its extensions, as a pre-processing feature selection step for unsupervised document clustering. Documents are clustered using the proportion of the various topics that are present in each document; the topic proportion vectors are then used as an input to an unsupervised clustering algorithm. I analyze two approaches to topic model design utilized in the pre-processing step: (1) a traditional topic model, such as LDA; (2) a novel topic model integrating a discrete mixture to simultaneously learn the clustering structure and the topic model that is conducive to the learned structure. I propose two variants of the second approach, one of which is experimentally found to be the best option. Given that clustering is one of the most common data mining tasks...

Modelling Deception Detection in Text

Gupta, Smita
Source: Queen's University Publisher: Queen's University
Type: Doctoral Thesis Format: 12133825 bytes; application/pdf
EN
Search Relevance
35.94%
As organizations and government agencies work diligently to detect financial irregularities, malfeasance, fraud and criminal activities through intercepted communication, there is an increasing interest in devising an automated model/tool for deception detection. We build on Pennebaker's empirical model which suggests that deception in text leaves a linguistic signature characterised by changes in frequency of four categories of words: first-person pronouns, exclusive words, negative emotion words, and action words. By applying the model to the Enron email dataset and using an unsupervised matrix-decomposition technique, we explore the differential use of these cue-words/categories in deception detection. Instead of focusing on the predictive power of the individual cue-words, we construct a descriptive model which helps us to understand the multivariate profile of deception based on several linguistic dimensions and highlights the qualitative differences between deceptive and truthful communication. This descriptive model can not only help detect unusual and deceptive communication, but also possibly rank messages along a scale of relative deceptiveness (for instance from strategic negotiation and spin to deception and blatant lying). The model is unintrusive...

Reading a Suspenseful Literary Text Activates Brain Areas Related to Social Cognition and Predictive Inference

Lehne, Moritz; Engel, Philipp; Rohrmeier, Martin; Menninghaus, Winfried; Jacobs, Arthur M.; Koelsch, Stefan
Source: Public Library of Science Publisher: Public Library of Science
Type: Journal Article
Published on 06/05/2015 EN
Search Relevance
26.13%
Stories can elicit powerful emotions. A key emotional response to narrative plots (e.g., novels, movies, etc.) is suspense. Suspense appears to build on basic aspects of human cognition such as processes of expectation, anticipation, and prediction. However, the neural processes underlying emotional experiences of suspense have not been previously investigated. We acquired functional magnetic resonance imaging (fMRI) data while participants read a suspenseful literary text (E.T.A. Hoffmann's “The Sandman”) subdivided into short text passages. Individual ratings of experienced suspense obtained after each text passage were found to be related to activation in the medial frontal cortex, bilateral frontal regions (along the inferior frontal sulcus), lateral premotor cortex, as well as posterior temporal and temporo-parietal areas. The results indicate that the emotional experience of suspense depends on brain areas associated with social cognition and predictive inference.

A Novel 9-Class Auditory ERP Paradigm Driving a Predictive Text Entry System

Höhne, Johannes; Schreuder, Martijn; Blankertz, Benjamin; Tangermann, Michael
Source: Frontiers Research Foundation Publisher: Frontiers Research Foundation
Type: Journal Article
Published on 22/08/2011 EN
Search Relevance
46.09%
Brain–computer interfaces (BCIs) based on event-related potentials (ERPs) strive to offer communication pathways which are independent of muscle activity. While most visual ERP-based BCI paradigms require good control of the user's gaze direction, auditory BCI paradigms overcome this restriction. The present work proposes a novel approach using auditory evoked potentials for the example of a multiclass text spelling application. To control the ERP speller, BCI users focus their attention on two-dimensional auditory stimuli that vary in both pitch (high/medium/low) and direction (left/middle/right) and that are presented via headphones. The resulting nine different control signals are exploited to drive a predictive text entry system. It enables the user to spell a letter by a single nine-class decision plus two additional decisions to confirm a spelled word. This paradigm – called PASS2D – was investigated in an online study with 12 healthy participants. Users spelled with more than 0.8 characters per minute on average (3.4 bits/min) which makes PASS2D a competitive method. It could enrich the toolbox of existing ERP paradigms for BCI end users like people with amyotrophic lateral sclerosis disease in a late stage.

Architecture of a Web-based Predictive Editor for Controlled Natural Language Processing

Guy, Stephen; Schwitter, Rolf
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 26/06/2014
Search Relevance
36.21%
In this paper, we describe the architecture of a web-based predictive text editor being developed for the controlled natural language PENG$^{ASP}$. This controlled language can be used to write non-monotonic specifications that have the same expressive power as Answer Set Programs. In order to support the writing process of these specifications, the predictive text editor communicates asynchronously with the controlled natural language processor that generates lookahead categories and additional auxiliary information for the author of a specification text. The text editor can display multiple sets of lookahead categories simultaneously for different possible sentence completions and anaphoric expressions, and supports the addition of new content words to the lexicon.

A predictive coding account of OCD

Moore, P. J.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Search Relevance
25.97%
This paper presents a predictive coding account of obsessive-compulsive disorder (OCD). We extend the predictive coding model to include the concept of a 'formal narrative', or temporal sequence of cognitive states inferred from sense data. We propose that human cognition uses a hierarchy of narratives to predict changes in the natural and social environment. Each layer in the hierarchy represents a distinct view of the world, but it also contributes to a global unitary perspective. We suggest that the global perspective remains intact in OCD but there is a dysfunction at a sub-linguistic level of cognition. The consequent failure of recognition is experienced as the external world being 'not just right', and its automatic correction is felt as compulsion. A wide variety of symptoms and some neuropsychological findings are thus explained by a single dysfunction. We conclude that the model provides a deeper explanation for behavioural observations than current models, and that it has potential for further development for application to neuropsychological data.; Comment: arXiv admin note: substantial text overlap with arXiv:1503.00999

Adaptive Model Predictive Control of a Batch Solution Polymerization Process using Trajectory Linearization

Abbaszadeh, Masoud
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 12/10/2015
Search Relevance
25.97%
A sequential trajectory linearized adaptive model based predictive controller is designed using the DMC algorithm to control the temperature of a batch MMA polymerization process. Using the mechanistic model of the polymerization, a parametric transfer function is derived to relate the reactor temperature to the power of the heaters. Then, a multiple model predictive control approach is taken to track a desired temperature trajectory. The coefficients of the multiple transfer functions are calculated along the selected temperature trajectory by sequential linearization and the model is validated experimentally. The controller performance is studied on a small scale batch reactor.; Comment: 12 pages, 5 figures. arXiv admin note: text overlap with arXiv:1502.04266

Predictive Non-equilibrium Social Science

Colbaugh, Richard; Glass, Kristin; Johnson, Curtis
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 30/12/2012
Search Relevance
26.07%
Non-Equilibrium Social Science (NESS) emphasizes dynamical phenomena, for instance the way political movements emerge or competing organizations interact. This paper argues that predictive analysis is an essential element of NESS, occupying a central role in its scientific inquiry and representing a key activity of practitioners in domains such as economics, public policy, and national security. We begin by clarifying the distinction between models which are useful for prediction and the much more common explanatory models studied in the social sciences. We then investigate a challenging real-world predictive analysis case study, and find evidence that the poor performance of standard prediction methods does not indicate an absence of human predictability but instead reflects (1.) incorrect assumptions concerning the predictive utility of explanatory models, (2.) misunderstanding regarding which features of social dynamics actually possess predictive power, and (3.) practical difficulties exploiting predictive representations.; Comment: arXiv admin note: substantial text overlap with arXiv:1212.6806

Visual and Predictive Analytics on Singapore News: Experiments on GDELT, Wikipedia, and ^STI

Phua, Clifton; Feng, Yuzhang; Ji, Junyao; Soh, Timothy
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 07/04/2014
Search Relevance
26.1%
The open-source Global Database of Events, Language, and Tone (GDELT) is the most comprehensive and updated Big Data source of important terms extracted from international news articles. We focus only on GDELT's Singapore events to better understand the data quality of its news articles, accuracy of its term extraction, and potential for prediction. To test news completeness and validity, we visually compared GDELT (Singapore news articles' terms from 1979 to 2013) to Wikipedia's timeline of Singaporean history. To test term extraction accuracy, we visually compared GDELT (CAMEO codes and TABARI system of extraction from Singapore news articles' text from April to December 2013) to SAS Text Miner's term and topic extraction. To perform predictive analytics, we propose a novel feature engineering method to transform row-level GDELT from articles to a user-specified temporal resolution. For example, we apply a decision tree using daily counts of feature values from GDELT to predict Singapore stock market's Straits Times Index (^STI). Of practical interest from the above results is SAS Visual Analytics' ability to highlight the various impacts of June 2013 Southeast Asian haze and December 2013 Little India riot on Singapore. Although Singapore is unique as a sovereign city-state...
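The step of turning row-level records into daily counts of feature values, mentioned in the abstract above, can be sketched roughly as follows. The record layout, the function name, and the feature labels are assumptions made for illustration and do not reflect the actual GDELT schema or the authors' implementation.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_feature_counts(events):
    """Aggregate (day, feature_value) rows into {day: Counter of values}."""
    per_day = defaultdict(Counter)
    for day, value in events:
        per_day[day][value] += 1
    return dict(per_day)


# Hypothetical row-level records: one (date, feature value) pair per article.
events = [
    (date(2013, 4, 1), "code_a"),
    (date(2013, 4, 1), "code_a"),
    (date(2013, 4, 1), "code_b"),
    (date(2013, 4, 2), "code_b"),
]

counts = daily_feature_counts(events)
for day in sorted(counts):
    print(day.isoformat(), dict(counts[day]))
```

Each day's counter can then serve as one feature row for a downstream model such as a decision tree; a coarser resolution (weekly, monthly) would only require mapping each date to its bucket before counting.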

PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks

Tang, Jian; Qu, Meng; Mei, Qiaozhu
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 02/08/2015
Search Relevance
46.39%
Unsupervised text embedding methods, such as Skip-gram and Paragraph Vector, have been attracting increasing attention due to their simplicity, scalability, and effectiveness. However, compared to sophisticated deep learning architectures such as convolutional neural networks, these methods usually yield inferior results when applied to particular machine learning tasks. One possible reason is that these text embedding methods learn the representation of text in a fully unsupervised way, without leveraging the labeled information available for the task. Although the low dimensional representations learned are applicable to many different tasks, they are not particularly tuned for any task. In this paper, we fill this gap by proposing a semi-supervised representation learning method for text data, which we call the predictive text embedding (PTE). Predictive text embedding utilizes both labeled and unlabeled data to learn the embedding of text. The labeled information and different levels of word co-occurrence information are first represented as a large-scale heterogeneous text network, which is then embedded into a low dimensional space through a principled and efficient algorithm. This low dimensional embedding not only preserves the semantic closeness of words and documents...

Sparse preconditioning for model predictive control

Knyazev, Andrew; Malyshev, Alexander
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 01/12/2015
Search Relevance
25.97%
We propose fast O(N) preconditioning, where N is the number of gridpoints on the prediction horizon, for iterative solution of (non)-linear systems appearing in model predictive control methods such as forward-difference Newton-Krylov methods. The Continuation/GMRES method for nonlinear model predictive control, suggested by T. Ohtsuka in 2004, is a specific application of the Newton-Krylov method, which uses the GMRES iterative algorithm to solve a forward difference approximation of the optimality equations on every time step.; Comment: 6 pages, 5 figures, submitted to the 2016 American Control Conference, July 6-8, Boston, MA, USA. arXiv admin note: text overlap with arXiv:1509.02861

On the Predictive Properties of Binary Link Functions

Gunduz, Necla; Fokoue, Ernest
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 16/02/2015
Search Relevance
26.02%
This paper provides a theoretical and computational justification of the long-held claim of the similarity of the probit and logit link functions often used in binary classification. Despite this widespread recognition of the strong similarities between these two link functions, very few (if any) researchers have dedicated time to carry out a formal study aimed at establishing and characterizing firmly all the aspects of the similarities and differences. This paper proposes a definition of both structural and predictive equivalence of link function-based binary regression models, and explores the various ways in which they are either similar or dissimilar. From a predictive analytics perspective, it turns out that not only are probit and logit perfectly predictively concordant, but the other link functions like cauchit and complementary log-log enjoy a very high percentage of predictive equivalence. Throughout this paper, simulated and real life examples demonstrate all the equivalence results that we prove theoretically.; Comment: 17 pages, 10 figures. arXiv admin note: text overlap with arXiv:math-ph/0607066 by other authors
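The closeness of the two links can be checked numerically. The sketch below compares the inverse logit with the inverse probit after rescaling the linear predictor by 1.702, a commonly used constant for matching the two curves. The grid, the function names, and the choice of scaling constant are assumptions of this illustration, not the paper's own procedure or definitions of equivalence.

```python
import math


def inv_logit(eta):
    """Inverse logit link: logistic CDF."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse probit link: standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


SCALE = 1.702  # assumed rescaling constant, not taken from the paper


def max_gap(lo=-8.0, hi=8.0, steps=1601):
    """Largest absolute difference between the two curves on an even grid."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(abs(inv_logit(x) - inv_probit(x / SCALE)) for x in grid)


print(f"max |logit - probit| on the grid: {max_gap():.4f}")
```

On this grid the two probability curves never differ by as much as 0.01, and both equal exactly 0.5 at a linear predictor of zero, so thresholding either curve at 0.5 classifies any given linear predictor the same way. Fitted coefficients will still differ between the two models, which is part of what a formal treatment has to account for.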

Predicting Abnormal Returns From News Using Text Classification

Luss, Ronny; d'Aspremont, Alexandre
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Search Relevance
26.01%
We show how text from news articles can be used to predict intraday price movements of financial assets using support vector machines. Multiple kernel learning is used to combine equity returns with text as predictive features to increase classification performance and we develop an analytic center cutting plane method to solve the kernel learning problem efficiently. We observe that while the direction of returns is not predictable using either text or returns, their size is, with text features producing significantly better performance than historical returns alone.; Comment: Larger data sets, results on time of day effect, and use of delta hedged covered call options to trade on daily predictions

Building a predictive modeling system for sentence classification: a case study using tardive dyskinesia

Bi, Xia
Source: University of Delaware Publisher: University of Delaware
Type: Doctoral Thesis
Search Relevance
26.02%
Wu, Cathy H.; Advances in computational and biological methods have greatly accelerated the pace of scientific discovery and produced a tremendous amount of experimental and computational data in the biomedical domain. Given the wealth of information that is available both in scientific papers and electronic databases, one particular challenge in biomedicine is to detect disease-drug associations and to organize them in a meaningful way that will accelerate pharmacogenetic research. Several text mining tools have been developed to facilitate this purpose. They perform adequately in identifying facts and entities using on-the-fly search of scientific articles from many different databases; however, they cannot analyze the type of relationship that exists between the objects identified. In this thesis, we propose a novel method to analyze drug-disease relationships using a combination of in-house and open-source tools that exploit the Multinomial Naïve Bayes (MNB) modeling technique. The main motivation behind this thesis work is to help researchers quickly identify disease-drug relationships from the biomedical literature using the case study of tardive dyskinesia (TD) and to classify those relationships into specific categories to enable better understanding of various drug effects. We have manually developed and annotated a biomedical training corpus for TD via sentence classification. Using the MNB modeling technique...