Página 1 dos resultados de 1272 itens digitais encontrados em 0.011 segundos

Reconhecimento de texto e rastreamento de objetos 2D/3D; Text recognition and 2D/3D object tracking

Rodrigo Minetto
Fonte: Biblioteca Digital da Unicamp Publicador: Biblioteca Digital da Unicamp
Tipo: Tese de Doutorado Formato: application/pdf
Publicado em 19/03/2012 PT
Relevância na Pesquisa
56.23%
Nesta tese abordamos três problemas de visão computacional: (1) detecção e reconhecimento de objetos de texto planos em imagens de cenas reais; (2) rastreamento destes objetos de texto em vídeos digitais; e (3) o rastreamento de um objeto tridimensional rígido arbitrário com marcas conhecidas em um vídeo digital. Nós desenvolvemos, para cada um dos problemas, algoritmos inovadores, que são pelo menos tão precisos e robustos quanto outros algoritmos estado-da-arte. Especificamente, para reconhecimento de texto nós desenvolvemos (e validamos extensivamente) um novo descritor de imagem baseado em HOG especializado para escrita romana, que denominamos T-HOG, e mostramos sua contribuição como um filtro em um detector de texto (SNOOPERTEXT). Nós também melhoramos o algoritmo SNOOPERTEXT através do uso da técnica multiescala para tratar caracteres de tamanhos bastante variados e limitar a sensibilidade do algoritmo a vários artefatos. Para rastreamento de texto, nós descrevemos quatro estratégias básicas para combinar a detecção e o rastreamento de texto, e desenvolvemos também um rastreador específico baseado em filtro de partículas que explora o uso do reconhecedor T-HOG. Para o rastreamento de objetos rígidos...

Scene Text Recognition using Similarity and a Lexicon with Sparse Belief Propagation

Weinman, Jerod J.; Learned-Miller, Erik; Hanson, Allen R.
Fonte: PubMed Publicador: PubMed
Tipo: Artigo de Revista Científica
Publicado em /10/2009 EN
Relevância na Pesquisa
46.23%
Scene text recognition (STR) is the recognition of text anywhere in the environment, such as signs and store fronts. Relative to document recognition, it is challenging because of font variability, minimal language context, and uncontrolled conditions. Much information available to solve this problem is frequently ignored or used sequentially. Similarity between character images is often overlooked as useful information. Because of language priors, a recognizer may assign different labels to identical characters. Directly comparing characters to each other, rather than only a model, helps ensure that similar instances receive the same label. Lexicons improve recognition accuracy but are used post hoc. We introduce a probabilistic model for STR that integrates similarity, language properties, and lexical decision. Inference is accelerated with sparse belief propagation, a bottom-up method for shortening messages by reducing the dependency between weakly supported hypotheses. By fusing information sources in one model, we eliminate unrecoverable errors that result from sequential processing, improving accuracy. In experimental results recognizing text from images of signs in outdoor scenes, incorporating similarity reduces character recognition error by 19%...

Neural network based off-line handwritten text recognition system

Han, Changan
Fonte: FIU Digital Commons Publicador: FIU Digital Commons
Tipo: Artigo de Revista Científica
EN
Relevância na Pesquisa
46.12%
This dissertation introduces a new system for handwritten text recognition based on an improved neural network design. Most of the existing neural networks treat mean square error function as the standard error function. The system as proposed in this dissertation utilizes the mean quartic error function, where the third and fourth derivatives are non-zero. ^ Consequently, many improvements on the training methods were achieved. The training results are carefully assessed before and after the update. To evaluate the performance of a training system, there are three essential factors to be considered, and they are from high to low importance priority: (1) error rate on testing set, (2) processing time needed to recognize a segmented character and (3) the total training time and subsequently the total testing time. It is observed that bounded training methods accelerate the training process, while semi-third order training methods, next-minimal training methods, and preprocessing operations reduce the error rate on the testing set. Empirical observations suggest that two combinations of training methods are needed for different case character recognition. ^ Since character segmentation is required for word and sentence recognition...

Neural Network Based Off-line Handwritten Text Recognition System

Han, Changan
Fonte: FIU Digital Commons Publicador: FIU Digital Commons
Tipo: Artigo de Revista Científica Formato: application/pdf
Relevância na Pesquisa
56.13%
This dissertation introduces a new system for handwritten text recognition based on an improved neural network design. Most of the existing neural networks treat mean square error function as the standard error function. The system as proposed in this dissertation utilizes the mean quartic error function, where the third and fourth derivatives are non-zero. Consequently, many improvements on the training methods were achieved. The training results are carefully assessed before and after the update. To evaluate the performance of a training system, there are three essential factors to be considered, and they are from high to low importance priority: 1) error rate on testing set, 2) processing time needed to recognize a segmented character and 3) the total training time and subsequently the total testing time. It is observed that bounded training methods accelerate the training process, while semi-third order training methods, next-minimal training methods, and preprocessing operations reduce the error rate on the testing set. Empirical observations suggest that two combinations of training methods are needed for different case character recognition. Since character segmentation is required for word and sentence recognition, this dissertation provides also an effective rule-based segmentation method...

Bangla Text Recognition from Video Sequence: A New Focus

Bhowmick, Souvik; Banerjee, Purnendu
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 06/01/2014
Relevância na Pesquisa
46.25%
Extraction and recognition of Bangla text from video frame images is challenging due to complex color background, low-resolution etc. In this paper, we propose an algorithm for extraction and recognition of Bangla text form such video frames with complex background. Here, a two-step approach has been proposed. First, the text line is segmented into words using information based on line contours. First order gradient value of the text blocks are used to find the word gap. Next, a local binarization technique is applied on each word and text line is reconstructed using those words. Secondly, this binarized text block is sent to OCR for recognition purpose.

Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

Goodfellow, Ian J.; Bulatov, Yaroslav; Ibarz, Julian; Arnoud, Sacha; Shet, Vinay
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Relevância na Pesquisa
46.19%
Recognizing arbitrary multi-character text in unconstrained natural photographs is a hard problem. In this paper, we address an equally hard sub-problem in this domain viz. recognizing arbitrary multi-digit numbers from Street View imagery. Traditional approaches to solve this problem typically separate out the localization, segmentation, and recognition steps. In this paper we propose a unified approach that integrates these three steps via the use of a deep convolutional neural network that operates directly on the image pixels. We employ the DistBelief implementation of deep neural networks in order to train large, distributed neural networks on high quality images. We find that the performance of this approach increases with the depth of the convolutional network, with the best performance occurring in the deepest architecture we trained, with eleven hidden layers. We evaluate this approach on the publicly available SVHN dataset and achieve over $96\%$ accuracy in recognizing complete street numbers. We show that on a per-digit recognition task, we improve upon the state-of-the-art, achieving $97.84\%$ accuracy. We also evaluate this approach on an even more challenging dataset generated from Street View imagery containing several tens of millions of street number annotations and achieve over $90\%$ accuracy. To further explore the applicability of the proposed system to broader text recognition tasks...

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

Shi, Baoguang; Bai, Xiang; Yao, Cong
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 21/07/2015
Relevância na Pesquisa
56.28%
Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most of the existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences in arbitrary lengths, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performances in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. The experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets, demonstrate the superiority of the proposed algorithm over the prior arts. Moreover, the proposed algorithm performs well in the task of image-based music score recognition...

Similarity-based Text Recognition by Deeply Supervised Siamese Network

Hosseini-Asl, Ehsan; Guha, Angshuman
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Relevância na Pesquisa
56.15%
In this paper, we propose a new text recognition model based on measuring the visual similarity of text and predicting the content of unlabeled texts. First a Siamese network is trained with deep supervision on a labeled training dataset. This network projects texts into a similarity manifold. The Deeply Supervised Siamese network learns visual similarity of texts. Then a K-nearest neighbor classifier is used to predict unlabeled text based on similarity distance to labeled texts. The performance of the model is evaluated on three datasets of machine-print and hand-written text combined. We demonstrate that the model reduces the cost of human estimation by $50\%-85\%$. The error of the system is less than $0.5\%$. The results also demonstrate that the predicted labels are sometimes better than human labels e.g. spelling correction.; Comment: Under review as a conference paper at ICLR 2016

Reading Ancient Coin Legends: Object Recognition vs. OCR

Kavelar, Albert; Zambanini, Sebastian; Kampel, Martin
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 26/04/2013
Relevância na Pesquisa
46.18%
Standard OCR is a well-researched topic of computer vision and can be considered solved for machine-printed text. However, when applied to unconstrained images, the recognition rates drop drastically. Therefore, the employment of object recognition-based techniques has become state of the art in scene text recognition applications. This paper presents a scene text recognition method tailored to ancient coin legends and compares the results achieved in character and word recognition experiments to a standard OCR engine. The conducted experiments show that the proposed method outperforms the standard OCR engine on a set of 180 cropped coin legend words.; Comment: Part of the OAGM/AAPR 2013 proceedings (arXiv:1304.1876)

Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition

Jaderberg, Max; Simonyan, Karen; Vedaldi, Andrea; Zisserman, Andrew
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Relevância na Pesquisa
56.23%
In this work we present a framework for the recognition of natural scene text. Our framework does not require any human-labelled data, and performs word recognition on the whole image holistically, departing from the character based recognition systems of the past. The deep neural network models at the centre of this framework are trained solely on data produced by a synthetic text generation engine -- synthetic data that is highly realistic and sufficient to replace real data, giving us infinite amounts of training data. This excess of data exposes new possibilities for word recognition models, and here we consider three models, each one "reading" words in a different way: via 90k-way dictionary encoding, character sequence encoding, and bag-of-N-grams encoding. In the scenarios of language based and completely unconstrained text recognition we greatly improve upon state-of-the-art performance on standard datasets, using our fast, simple machinery and requiring zero data-acquisition costs.

Deep Structured Output Learning for Unconstrained Text Recognition

Jaderberg, Max; Simonyan, Karen; Vedaldi, Andrea; Zisserman, Andrew
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Relevância na Pesquisa
56.15%
We develop a representation suitable for the unconstrained recognition of words in natural images: the general case of no fixed lexicon and unknown length. To this end we propose a convolutional neural network (CNN) based architecture which incorporates a Conditional Random Field (CRF) graphical model, taking the whole word image as a single input. The unaries of the CRF are provided by a CNN that predicts characters at each position of the output, while higher order terms are provided by another CNN that detects the presence of N-grams. We show that this entire model (CRF, character predictor, N-gram predictor) can be jointly optimised by back-propagating the structured output loss, essentially requiring the system to perform multi-task learning, and training uses purely synthetically generated data. The resulting model is a more accurate system on standard real-world text recognition benchmarks than character prediction alone, setting a benchmark for systems that have not been trained on a particular lexicon. In addition, our model achieves state-of-the-art accuracy in lexicon-constrained scenarios, without being specifically modelled for constrained recognition. To test the generalisation of our model, we also perform experiments with random alpha-numeric strings to evaluate the method when no visual language model is applicable.; Comment: arXiv admin note: text overlap with arXiv:1406.2227

A hypothesize-and-verify framework for Text Recognition using Deep Recurrent Neural Networks

Ray, Anupama; Rajeswar, Sai; Chaudhury, Santanu
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 26/02/2015
Relevância na Pesquisa
56.21%
Deep LSTM is an ideal candidate for text recognition. However text recognition involves some initial image processing steps like segmentation of lines and words which can induce error to the recognition system. Without segmentation, learning very long range context is difficult and becomes computationally intractable. Therefore, alternative soft decisions are needed at the pre-processing level. This paper proposes a hybrid text recognizer using a deep recurrent neural network with multiple layers of abstraction and long range context along with a language model to verify the performance of the deep neural network. In this paper we construct a multi-hypotheses tree architecture with candidate segments of line sequences from different segmentation algorithms at its different branches. The deep neural network is trained on perfectly segmented data and tests each of the candidate segments, generating unicode sequences. In the verification step, these unicode sequences are validated using a sub-string match with the language model and best first search is used to find the best possible combination of alternative hypothesis from the tree structure. Thus the verification framework using language models eliminates wrong segmentation outputs and filters recognition errors.

End-to-End Text Recognition with Hybrid HMM Maxout Models

Alsharif, Ouais; Pineau, Joelle
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 07/10/2013
Relevância na Pesquisa
56.17%
The problem of detecting and recognizing text in natural scenes has proved to be more challenging than its counterpart in documents, with most of the previous work focusing on a single part of the problem. In this work, we propose new solutions to the character and word recognition problems and then show how to combine these solutions in an end-to-end text-recognition system. We do so by leveraging the recently introduced Maxout networks along with hybrid HMM models that have proven useful for voice recognition. Using these elements, we build a tunable and highly accurate recognition system that beats state-of-the-art results on all the sub-problems for both the ICDAR 2003 and SVT benchmark datasets.; Comment: 9 pages, 7 figures

Arabic Text Recognition in Video Sequences

Halima, M. Ben; Karray, H.; Alimi, A. M.
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 14/08/2013
Relevância na Pesquisa
46.21%
In this paper, we propose a robust approach for text extraction and recognition from Arabic news video sequence. The text included in video sequences is an important needful for indexing and searching system. However, this text is difficult to detect and recognize because of the variability of its size, their low resolution characters and the complexity of the backgrounds. To solve these problems, we propose a system performing in two main tasks: extraction and recognition of text. Our system is tested on a varied database composed of different Arabic news programs and the obtained results are encouraging and show the merits of our approach.; Comment: 10 pages - International Journal of Computational Linguistics Research. arXiv admin note: substantial text overlap with arXiv:1211.2150

Text recognition in both ancient and cartographic documents

Zaghden, Nizar; Khelifi, Badreddine; Alimi, Adel M.; Mullot, Remy
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 28/08/2013
Relevância na Pesquisa
46.12%
This paper deals with the recognition and matching of text in both cartographic maps and ancient documents. The purpose of this work is to find similar text regions based on statistical and global features. A phase of normalization is done first, in object to well categorize the same quantity of information. A phase of wordspotting is done next by combining local and global features. We make different experiments by combining the different techniques of extracting features in order to obtain better results in recognition phase. We applied fontspotting on both ancient documents and cartographic ones. We also applied the wordspotting in which we adopted a new technique which tries to compare the images of character and not the entire images words. We present the precision and recall values obtained with three methods for the new method of wordspotting applied on characters only.; Comment: 4 pages

Desarrollo de una aplicación móvil para la detección de texto en imágenes TDetect

Saura Lopez, Macari
Fonte: Universidade Autônoma de Barcelona Publicador: Universidade Autônoma de Barcelona
Tipo: info:eu-repo/semantics/bachelorThesis; Text Formato: application/pdf
Publicado em 12/02/2015 SPA
Relevância na Pesquisa
46.16%
En la actualidad existe gran diversidad de aplicaciones móviles para abastecer las distintas necesidades de los usuarios, entre ellas están las que nos ayudan a localizar lugares de interés, reconocer canciones en tiempo real con solo escucharla, traducir textos en múltiples idiomas o los típicos y más comunes como son los buscadores web entre otros. Dentro de esta gran variedad aparecen algunas que nos ayudan al reconocimiento de caracteres o texto en imágenes. Actualmente existen algunos ejemplos como los OCRs que, a partir de una imagen capturada, son capaces de detectar el texto dentro de una imagen y convertirlo a un formato en concreto, o otras más interesantes como la recién adquirida por Google, Word Lens la cual está integrada en su aplicación Google Translate capaz de traducir texto en tiempo real con solo enfocar la cámara a la imagen a tratar. Este proyecto no es tan sofisticado como la de Google pero se podría decir que esta dentro de este grupo, el de reconocimiento de texto o caracteres a partir de una imagen, la finalidad es realizar una aplicación en Adroid que mediante unas librerías OpenCV sea capaz de detectar texto dentro de las imágenes.; Actualment hi ha gran diversitat d'aplicacions mòbils per a proveir les diferents necessitats dels usuaris...

Human or Computer Assisted Interactive Transcription: Automated Text Recognition, Text Annotation, and Scholarly Edition in the Twenty-First Century

Castro-Bleda, M. J.; Vilar, J. M.; España, S.; Llorens, D.; Marzal, A.; Prat, F.; Zamora, F.
Fonte: Universidade Autônoma de Barcelona Publicador: Universidade Autônoma de Barcelona
Tipo: Artigo de Revista Científica Formato: application/pdf
Publicado em //2014 ENG
Relevância na Pesquisa
56.05%
Computer assisted transcription tools can speed up the initial process of reading and transcribing texts. At the same time, new annotation tools open new ways of accessing the text in its graphical form. The balance and value of each method still needs to be explored. STATE, a complete assisted transcription system for ancient documents, was presented to the audience of the 2013 International Medieval Congress at Leeds. The system offers a multimodal interaction environment to assist humans in transcribing ancient documents: the user can type, write on the screen with a stylus, or utter a word. When one of these actions is used to correct an erroneous word, the system uses this new information to look for other mistakes in the rest of the line. The system is modular, composed of different parts: one part creates projects from a set of images of documents, another part controls an automatic transcription system, and the third part allows the user to interact with the transcriptions and easily correct them as needed. This division of labour allows great flexibility for organising the work in a team of transcribers.; Las herramientas de ayuda a la transcripción automática pueden acelerar el proceso inicial de la lectura y transcripción de textos. Al mismo tiempo...

Optimization of Weights in a Multiple Classifier Handwritten Word Recognition System Using a Genetic Algorithm

Günter, Simon; Bunke, Horst
Fonte: Universidade Autônoma de Barcelona Publicador: Universidade Autônoma de Barcelona
Tipo: Artigo de Revista Científica Formato: application/pdf
Publicado em //2004 ENG
Relevância na Pesquisa
56.12%
Automatic handwritten text recognition by computer has a number of interesting applications. However, due to a great variety of individual writing styles, the problem is very difficult and far from being solved. Recently, a number of classifier creation methods, known as ensemble methods, have been proposed in the field of machine learning. They have shown improved recognition performance over single classifiers. For the combination of these classifiers many methods have been proposed in the literature. In this paper we describe a weighted voting scheme where the weights are obtained by a genetic algorithm.

Methods for text segmentation from scene images

Kumar, Deepak
Fonte: Universidade Autônoma de Barcelona Publicador: Universidade Autônoma de Barcelona
Tipo: Artigo de Revista Científica Formato: application/pdf; application/pdf; application/pdf
Publicado em //2014 ENG
Relevância na Pesquisa
46.42%
Advisor/s: A. G. Ramakrishnan. Date and location of PhD thesis defense: 24 February 2014, Indian Institute of Science.; Camera-captured scene/born-digital image analysis helps in the development of vision for robots to read text, transliterate or translate text, navigate and retrieve search results. However, text in such images does nor follow any standard layout, and its location within the image is random in nature. In addition, motion blur, non-uniform illumination, skew, occlusion and scale-based degradations increase the complexity in locating and recognizing the text in a scene/born-digital image. OTCYMIST method is proposed to segment text from the born-digital images. This method won the first place in ICDAR 2011 and placed in the third position in ICDAR 2013 for its performance on the text segmentation task in robust reading competitions for born-digital image data set. Here, Otsu’s binarization and Canny edge detection are separately carried out on the three colour planes of the image. Connected components (CC’s) obtained from the segmented image are pruned based on thresholds applied on their area and aspect ratio. CC’s with sufficient edge pixels are retained. The centroids of the individual CC’s are used as nodes of a graph. A minimum spanning tree is built using these nodes of the graph. Long edges are broken from the minimum spanning tree of the graph. Pairwise height ratio is used to remove likely non-text components. CC’s are grouped based on their proximity in the horizontal direction to generate bounding boxes (BB’s) of text strings. Overlapping BB’s are removed using an overlap area threshold. Non-overlapping and minimally overlapping BB’s are retained for text segmentation. These BB’s are split vertically to localize text at the word level. A word cropped from a document image can easily be recognized using a traditional optical character recognition (OCR) engine. However...

Context sensitive optical character recognition using neural networks and hidden Markov models

Elliott, Steven C.
Fonte: Rochester Instituto de Tecnologia Publicador: Rochester Instituto de Tecnologia
Tipo: Tese de Doutorado
EN_US
Relevância na Pesquisa
56.12%
This thesis investigates a method for using contextual information in text recognition. This is based on the premise that, while reading, humans recognize words with missing or garbled characters by examining the surrounding characters and then selecting the appropriate character. The correct character is chosen based on an inherent knowledge of the language and spelling techniques. We can then model this statistically. The approach taken by this Thesis is to combine feature extraction techniques, Neural Networks and Hidden Markov Modeling. This method of character recognition involves a three step process: pixel image preprocessing, neural network classification and context interpretation. Pixel image preprocessing applies a feature extraction algorithm to original bit mapped images, which produces a feature vector for the original images which are input into a neural network. The neural network performs the initial classification of the characters by producing ten weights, one for each character. The magnitude of the weight is translated into the confidence the network has in each of the choices. The greater the magnitude and separation, the more confident the neural network is of a given choice. The output of the neural network is the input for a context interpreter. The context interpreter uses Hidden Markov Modeling (HMM) techniques to determine the most probable classification for all characters based on the characters that precede that character and character pair statistics. The HMMs are built using an a priori knowledge of the language: a statistical description of the probabilities of digrams. Experimentation and verification of this method combines the development and use of a preprocessor program...