Didactic speech synthesizer – acoustic module, formants model

Teixeira, João Paulo; Fernandes, Anildo
Source: Instituto Politécnico de Bragança Publisher: Instituto Politécnico de Bragança
Type: Conference or Conference Object
ENG
Search relevance
75.91%
Text-to-speech synthesis is the main subject of this work. The work presents the structure of a generic text-to-speech conversion system, explains the functions of its various modules, and describes development techniques based on the formant model. It also describes the development of a didactic formant synthesiser in the Matlab environment, intended to support a didactic understanding of the formant model of speech production.
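At the heart of a formant synthesiser is a set of second-order digital resonators, one per formant, driven by a glottal source. The sketch below is written in Python rather than the Matlab used by the authors and follows the standard Klatt-style resonator difference equation. The formant frequencies, bandwidths, sampling rate and the bare impulse-train source are illustrative assumptions, not values taken from the paper.

```python
import math

def resonator(signal, freq_hz, bw_hz, fs_hz):
    """Klatt-style second-order resonator:
    y[n] = A*x[n] + B*y[n-1] + C*y[n-2], normalised to unity gain at 0 Hz."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0  # the two previous output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(f0_hz, fs_hz, n_samples):
    """Crude voiced source (assumption): one unit impulse per pitch period."""
    period = int(round(fs_hz / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def synthesize_vowel(formants, f0_hz=120.0, fs_hz=16000, duration_s=0.3):
    """Cascade model: filter the source through one resonator per (F, BW) pair."""
    signal = impulse_train(f0_hz, fs_hz, int(fs_hz * duration_s))
    for freq_hz, bw_hz in formants:
        signal = resonator(signal, freq_hz, bw_hz, fs_hz)
    return signal

# Hypothetical (frequency, bandwidth) pairs roughly in the range of an /a/ vowel.
samples = synthesize_vowel([(700, 130), (1220, 70), (2600, 160)])
```

Changing only the `(F, BW)` pairs changes the perceived vowel while the source stays the same, which is the separation of source and filter that a didactic formant synthesiser is meant to expose.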

Grapheme-phoneme translation for Portuguese based on adaptive automata

Shibata, Danilo Picagli
Source: Biblioteca Digitais de Teses e Dissertações da USP Publisher: Biblioteca Digitais de Teses e Dissertações da USP
Type: Master's Dissertation Format: application/pdf
Published 25/03/2008 PT
Search relevance
65.95%
This work presents a study on the use of adaptive devices to perform text-to-speech translation. Its focus is the creation of a method for grapheme-phoneme translation for Portuguese based on adaptive automata, and the use of that method in text-to-speech software. The method seeks to mimic human behaviour in handling stress rules, syllable separation, and the influence that syllables exert on their neighbours. This makes the method easy to apply to other varieties of Portuguese, since these characteristics are invariant with respect to the locality and period of the chosen variety. The contemporary variety spoken in the city of São Paulo was chosen as the target for analysis and testing in this work. For this variety, the model achieves satisfactory results: it exceeds 95% accuracy in grapheme-phoneme translation of words, reaches 90% accuracy when the resolution of ambiguities caused by words with two possible pronunciations is taken into account, and produces speech output intelligible to native speakers through syllable-based concatenative synthesis. As a result of this work...

OPWI and LDM-GA algorithms for high-quality text-to-speech synthesis based on automatic unit selection (SCAUS)

Edmilson da Silva Morais
Source: Biblioteca Digital da Unicamp Publisher: Biblioteca Digital da Unicamp
Type: Doctoral Thesis Format: application/pdf
Published 20/04/2006 PT
Search relevance
65.72%
This thesis presents two new algorithms, OPWI (Optimized Prototype Waveform Interpolation) and LDM-GA (Linguistic Data Mining Using Genetic Algorithm). Both are formulated in the context of CTF-SCAUS systems (text-to-speech conversion systems based on the automatic selection and concatenation of synthesis units). OPWI is presented as a new alternative for the back-end module of CTF-SCAUS systems, allowing high-quality prosodic modification and spectral smoothing. LDM-GA was developed to mitigate training problems in CTF-SCAUS systems related to probability distributions with LNRE (Large Number of Rare Events) characteristics. Evaluation results for both algorithms are presented and discussed in detail. In addition, the thesis provides an extensive literature review of the main modules of a CTF-SCAUS system: the front-end (linguistic module), the prosodic module, the unit-selection module, and the back-end (synthesis module).
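Prototype waveform interpolation represents voiced speech as a sequence of pitch-cycle prototypes and rebuilds the signal between update points by interpolating them. The minimal Python sketch below shows only that basic idea, under stated assumptions: each prototype is one pitch cycle stored as a list of samples, cycles are aligned by linear resampling to a common length, and both the period and the shape are interpolated with a plain linear crossfade. It is not the optimised OPWI algorithm of the thesis, whose details are not given in this abstract.

```python
def resample_cycle(cycle, length):
    """Linearly resample one pitch cycle to `length` points (cycle treated as periodic)."""
    n = len(cycle)
    out = []
    for i in range(length):
        pos = i * n / length          # fractional index into the original cycle
        k = int(pos)
        frac = pos - k
        nxt = cycle[(k + 1) % n]      # wrap around, since the cycle is periodic
        out.append((1.0 - frac) * cycle[k] + frac * nxt)
    return out

def interpolate_prototypes(proto_a, proto_b, n_cycles):
    """Generate n_cycles pitch cycles moving linearly from proto_a to proto_b.
    Both the cycle length (pitch period) and the waveform shape are interpolated."""
    cycles = []
    for j in range(n_cycles):
        alpha = j / (n_cycles - 1) if n_cycles > 1 else 0.0
        length = round((1.0 - alpha) * len(proto_a) + alpha * len(proto_b))
        a = resample_cycle(proto_a, length)
        b = resample_cycle(proto_b, length)
        cycles.append([(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)])
    return cycles

# Hypothetical prototypes: two pitch cycles with different periods and shapes.
proto_a = [0.0, 1.0, 0.0, -1.0]                         # period of 4 samples
proto_b = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]   # period of 8 samples
cycles = interpolate_prototypes(proto_a, proto_b, 5)
signal = [s for cycle in cycles for s in cycle]
```

Because the period is interpolated together with the shape, the same mechanism can stretch or compress pitch, which is why waveform interpolation is attractive for the prosodic modification the thesis targets.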

Study of an HMM-based text-to-speech system

Sarah Negreiros de Carvalho
Source: Biblioteca Digital da Unicamp Publisher: Biblioteca Digital da Unicamp
Type: Master's Dissertation Format: application/pdf
Published 18/02/2013 PT
Search relevance
75.9%
With the continuous development of technology, there is a growing demand for speech synthesis systems that can speak like humans, so that they can be integrated into a wide range of applications, whether in robotic automation, accessibility for people with disabilities, or cultural and leisure applications. Speech synthesis based on hidden Markov models (HMMs) appears promising for meeting this technological need. Its statistical and parametric nature makes it flexible: it can adapt artificial voices, add emotion to speech, and produce good-quality synthetic speech from a limited training corpus. This dissertation presents a study of the HMM-based speech synthesis system (HTS), describing the steps involved in training the HMMs and generating the speech signal. It presents the spectral, pitch and duration models that make up the HMMs of context-dependent phonemes, considering the various techniques for structuring them. Some of the problems found in HTS, such as the muffled and monotonous character of the synthetic speech, are analysed together with techniques proposed to improve the final quality of the synthesised speech signal.

Text to speech : "a rewriting system approach"

Almeida, J. J.; Simões, Alberto
Source: Universidade do Minho Publisher: Universidade do Minho
Type: Conference or Conference Object
Published 2001 ENG
Search relevance
65.9%
In this document we present an open-source Portuguese text-to-speech system. Our first goal is to make it easy to extend, using a generic method to convert Portuguese words into SAMPA phonemes and consulting dictionaries only for exceptions. The text-to-speech system is composed of five layers, each based on simple rules so that it can be easily tuned. To achieve this, we wrote a generic text rewriting system, which is presented in section two. The result of this work is a tool that can be used as a standalone text-to-speech system or as a natural language processing library for various tasks. We present some examples of how it can be used in the Applications section.
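The exceptions-first design described above can be illustrated with a small ordered rewriting system. The following Python sketch is not the authors' rule set: the exception dictionary and the handful of grapheme-to-SAMPA rules are hypothetical examples that cover only a few Portuguese consonant spellings, and vowels pass through unchanged. Rule order matters, because each rule rewrites the output of the previous one.

```python
import re

# Hypothetical exception dictionary: words the rules below would get wrong.
EXCEPTIONS = {"táxi": "taksi"}

# Hypothetical ordered rules (pattern, SAMPA replacement); order is significant.
RULES = [(re.compile(pattern), replacement) for pattern, replacement in [
    (r"ch", "S"),                         # chave  -> S
    (r"lh", "L"),                         # filho  -> L
    (r"nh", "J"),                         # banho  -> J
    (r"(?<=[aeiou])s(?=[aeiou])", "z"),   # intervocalic s: casa -> z
    (r"ss", "s"),                         # massa  -> s
    (r"ç", "s"),                          # caça   -> s
    (r"qu(?=[ei])", "k"),                 # queijo -> k
    (r"c(?=[ei])", "s"),                  # cebola -> s
    (r"c", "k"),                          # any remaining c -> k
    (r"j", "Z"),                          # queijo -> Z
]]

def to_sampa(word):
    """Consult the exception dictionary first; otherwise apply the rules in order."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    for pattern, replacement in RULES:
        word = pattern.sub(replacement, word)
    return word

print([to_sampa(w) for w in ["chave", "filho", "casa", "caça", "queijo", "táxi"]])
```

Placing the intervocalic-s rule before the `ç` rule is what keeps "caça" from being rewritten to /kaza/; this kind of ordering dependency is exactly what a rule-based layer has to be tuned for.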

Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems

GREENE, BETH G.; LOGAN, JOHN S.; PISONI, DAVID B.
Source: PubMed Publisher: PubMed
Type: Journal Article
Published 03/1986 EN
Search relevance
65.9%
We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.
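In the Modified Rhyme Test, listeners choose the word they heard from a closed set of rhyming alternatives that differ either in the initial or in the final consonant, so intelligibility can be reported separately by consonant position. A minimal Python sketch of that scoring, assuming a hypothetical list of (position, presented word, chosen word) trials rather than the study's data:

```python
from collections import defaultdict

def mrt_scores(trials):
    """Percent correct per consonant position ('initial' or 'final')."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for position, presented, chosen in trials:
        total[position] += 1
        if chosen == presented:
            correct[position] += 1
    return {pos: 100.0 * correct[pos] / total[pos] for pos in total}

# Hypothetical responses for one synthetic voice.
trials = [
    ("initial", "bat", "bat"),
    ("initial", "sit", "fit"),
    ("initial", "lay", "lay"),
    ("final", "pin", "pin"),
    ("final", "sad", "sat"),
    ("final", "dig", "did"),
]
print(mrt_scores(trials))
```

Splitting the score by position is what makes a finding such as "equivalent to natural speech only on initial consonants" expressible at all.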

Development of Grapheme-to-Phoneme Conversion System for Yorùbá Text-to-Speech Synthesis

Ìyàndá, Abímbólá R.; Ọbáfẹ́mi Awólọ́wọ̀ University; Odéjobí, Odétúnjí A.; Ọbáfẹ́mi Awólọ́wọ̀ University; Soyoye, Festus A.; University of the West of England; Akinadé, Olúbénga O.; Bristol Enterprise Research and Innovation C
Source: Editora da UFLA Publisher: Editora da UFLA
Type: info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; Peer-reviewed Article Format: application/pdf
Published 26/08/2015 ENG
Search relevance
65.72%

The effects of word prediction and text-to-speech technologies on the narrative writing skills of students with specific learning disabilities

Silio, Monica C
Source: FIU Digital Commons Publisher: FIU Digital Commons
Type: Journal Article
EN
Search relevance
66.03%
This study investigated the effects of word prediction and text-to-speech on the narrative composition writing skills of six fifth-grade Hispanic boys with specific learning disabilities (SLD). A multiple-baseline design across subjects was used to explore the efficacy of word prediction and text-to-speech, alone and in combination, on four dependent variables: writing fluency (words per minute), syntax (T-units), spelling accuracy, and overall organization (holistic scoring rubric). Data were collected and analyzed during baseline, during the assistive technology interventions, and at 2-, 4-, and 6-week maintenance probes. Participants were equally divided into Cohorts A and B, and two separate but related studies were conducted. Throughout all phases of the study, participants wrote narrative compositions in 15-minute sessions. During baseline, participants used word processing only. During the assistive technology intervention condition, Cohort A participants used word prediction followed by word prediction with text-to-speech. Concurrently, Cohort B participants used text-to-speech followed by text-to-speech with word prediction. The results of this study indicate that word prediction, alone or in combination with text-to-speech, has a positive effect on the narrative compositions of students with SLD. Overall...

Avoiding communication barriers in the classroom: the APEINTA project

Iglesias, Ana; Jiménez, Javier; Revuelta, Pablo; Moreno, Lourdes
Source: Taylor & Francis Publisher: Taylor & Francis
Type: info:eu-repo/semantics/acceptedVersion; info:eu-repo/semantics/article
Published 09/06/2014 ENG
Search relevance
65.72%
Education is a fundamental human right; unfortunately, however, not everybody has the same learning opportunities. For instance, a student with a hearing impairment may face communication barriers in the classroom that affect his or her learning process. APEINTA is a Spanish educational project that aims at inclusive education for all. The project proposes two main accessibility initiatives: (1) real-time captioning and text-to-speech (TTS) services in the classroom and (2) an accessible Web-learning platform with accessible digital resources for use outside the classroom. This paper presents the inclusive initiatives of APEINTA, together with an evaluation of the in-classroom initiative (real-time captioning and TTS services). The evaluation was conducted during a regular undergraduate course at a university and during a seminar at an integration school for deaf children. Forty-five hearing students, 1 foreign student, 3 experts in captioning, usability and accessibility, and 20 students with hearing impairments evaluated these services in the classroom. The evaluation results show that these initiatives are suitable for classroom use and that students are satisfied with them.

A Comparison of Open-Source Segmentation Architectures for Dealing with Imperfect Data from the Media in Speech Synthesis

Gallardo-Antolín, Ascensión; Montero, Juan Manuel; King, Simon
Source: International Speech Communication Association Publisher: International Speech Communication Association
Type: info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/bookPart; info:eu-repo/semantics/conferenceObject
Published 2014 ENG
Search relevance
85.87%
Traditional text-to-speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. To develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used to train new voices with a whole new range of speaking styles. However, to process this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music, and noise) by filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization with music and noise detection that improve the precision and overall quality of the segmentation.; This work has been carried out during the research stay of A. Gallardo-Antolín and J. M. Montero at the Centre for Speech Technology Research (CSTR), University of Edinburgh, supported by the Spanish Ministry of Education, Culture and Sports under the National Program of Human Resources Mobility from the I+D+i 2008-2011 National Program, extended by agreement of the Council of Ministers on October 7th, 2011. The work leading to these results has received funding from the European Union under grant agreement No 287678. It has also been supported by EPSRC Programme Grant grant...
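The filtering step described above, discarding segments that contain music or noise and grouping the rest into mono-speaker clusters, can be sketched as a post-processing pass over diarization output. The Python below assumes a hypothetical segment record whose speaker label comes from a diarization system and whose music and noise flags come from separate detectors; the minimum-duration threshold is also an assumption. It does not reproduce any of the architectures compared in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Segment:
    start_s: float
    end_s: float
    speaker: str      # label assigned by a speaker-diarization system
    has_music: bool   # output of a music detector
    has_noise: bool   # output of a noise detector

def mono_speaker_clusters(segments, min_duration_s=1.0):
    """Drop low-quality segments and group the remaining ones by speaker."""
    clusters = defaultdict(list)
    for seg in segments:
        too_short = seg.end_s - seg.start_s < min_duration_s
        if seg.has_music or seg.has_noise or too_short:
            continue  # keep only clean, sufficiently long speech
        clusters[seg.speaker].append(seg)
    return dict(clusters)

segments = [
    Segment(0.0, 4.2, "spk1", False, False),
    Segment(4.2, 6.0, "spk2", True, False),   # background music: discarded
    Segment(6.0, 6.5, "spk1", False, False),  # shorter than 1 s: discarded
    Segment(6.5, 9.0, "spk1", False, False),
]
clusters = mono_speaker_clusters(segments)
```

Each resulting cluster is a candidate training set for one voice; how well the diarization and detectors perform upstream determines how clean those clusters are.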

Blind Estimation of Perceptual Quality for Modern Speech Communications

Falk, Tiago
Source: Queen's University Publisher: Queen's University
Type: Doctoral Thesis Format: 1412501 bytes; application/pdf
EN
Search relevance
75.99%
Modern speech communication technologies expose users to perceptual quality degradations that were not experienced earlier with conventional telephone systems. Since perceived speech quality is a major contributor to the end user's perception of quality of service, speech quality estimation has become an important research field. In this dissertation, perceptual quality estimators are proposed for several emerging speech communication applications, in particular for i) wireless communications with noise suppression capabilities, ii) wireless-VoIP communications, iii) far-field hands-free speech communications, and iv) text-to-speech systems. First, a general-purpose speech quality estimator is proposed based on statistical models of normative speech behaviour and on innovative techniques to detect multiple signal distortions. The estimators do not depend on a clean reference signal and are hence termed "blind". Quality meters are then distributed along the network chain to allow both quality degradations and quality enhancements to be handled. To improve estimation performance for wireless communications, statistical models of noise-suppressed speech are also incorporated. Next, a hybrid signal-and-link-parametric quality estimation paradigm is proposed for emerging wireless-VoIP communications. The algorithm uses VoIP connection parameters to estimate a base quality representative of the packet-switching network. Signal-based distortions are then detected and quantified in order to adjust the base quality accordingly. The proposed hybrid methodology is shown to overcome the limitations of existing purely signal-based and purely link-parametric algorithms. Temporal dynamics information is then investigated for quality diagnosis for hands-free speech communications. A spectro-temporal signal representation...

Using a hybrid approach to build a pronunciation dictionary for Brazilian Portuguese

Mendonça, Gustavo; Aluisio, Sandra Maria
Source: Chinese and Oriental Languages Information Processing Society - COLIPS; Institute for Infocomm Research - I2R; International Speech Communication Association - ISCA; Singapore Publisher: Chinese and Oriental Languages Information Processing Society - COLIPS; Institute for Infocomm Research - I2R; International Speech Communication Association - ISCA; Singapore
Type: Conference or Conference Object
ENG
Search relevance
65.72%
This paper describes the method employed to build a machine-readable pronunciation dictionary for Brazilian Portuguese. The dictionary uses a hybrid approach for converting graphemes into phonemes, based on both manual transcription rules and machine learning algorithms. It is built from a word list compiled from the Portuguese Wikipedia dump: Wikipedia articles were transformed into plain text and tokenized, and word types were extracted. A language identification tool was developed to detect loanwords in the data. The syllable boundaries and stress of each word were identified. The transcription task was carried out in two steps: i) words are submitted to a set of transcription rules, in which predictable graphemes (mostly consonants) are transcribed; ii) a machine learning classifier predicts the transcription of the remaining graphemes (mostly vowels). The method was evaluated through 5-fold cross-validation, yielding an F1-score of 0.98. The dictionary and all the resources used to build it were made publicly available.; Samsung Eletrônica da Amazônia Ltda.
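The evaluation protocol mentioned above, 5-fold cross-validation scored with F1, can be sketched in a few lines of Python. The fold assignment (contiguous splits after a seeded shuffle) and the F1 computed from true-positive, false-positive and false-negative counts are illustrative assumptions; the abstract does not state how the folds or the F1 averaging were defined.

```python
import random

def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train, test) index lists; every item lands in exactly one test fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    fold_size, remainder = divmod(n_items, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds get one extra item so all items are used.
        end = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, test
        start = end

def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall (0.0 when nothing is correct)."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

In use, a classifier would be trained on each `train` split, its transcriptions scored on the matching `test` split, and the five F1 values averaged.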

Multilingual Text Analysis for Text-to-Speech Synthesis

Sproat, Richard
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published 19/08/1996
Search relevance
65.81%
We present a model of text analysis for text-to-speech (TTS) synthesis based on (weighted) finite-state transducers, which serves as the text-analysis module of the multilingual Bell Labs TTS system. The transducers are constructed using a lexical toolkit that allows declarative descriptions of lexicons, morphological rules, numeral-expansion rules, and phonological rules, inter alia. To date, the model has been applied to eight languages: Spanish, Italian, Romanian, French, German, Russian, Mandarin and Japanese.
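Numeral expansion is one of the rule types the abstract lists. The Bell Labs system compiles such rules into weighted finite-state transducers; the Python sketch below is only a plain-function stand-in that shows what a Spanish numeral-expansion rule has to produce for 0 to 99. It makes no attempt to reproduce the transducer formalism or its weights, and the range restriction is an assumption made to keep the example short.

```python
import re

UNITS = ["cero", "uno", "dos", "tres", "cuatro", "cinco",
         "seis", "siete", "ocho", "nueve"]
TEENS = ["diez", "once", "doce", "trece", "catorce", "quince",
         "dieciséis", "diecisiete", "dieciocho", "diecinueve"]
TWENTIES = ["veinte", "veintiuno", "veintidós", "veintitrés", "veinticuatro",
            "veinticinco", "veintiséis", "veintisiete", "veintiocho", "veintinueve"]
TENS = {3: "treinta", 4: "cuarenta", 5: "cincuenta", 6: "sesenta",
        7: "setenta", 8: "ochenta", 9: "noventa"}

def expand_spanish(n):
    """Spell out 0 <= n <= 99 in Spanish (citation form: 'uno', not 'un')."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0-99")
    if n < 10:
        return UNITS[n]
    if n < 20:
        return TEENS[n - 10]
    if n < 30:
        return TWENTIES[n - 20]     # 20-29 are written as single words
    tens, units = divmod(n, 10)
    return TENS[tens] if units == 0 else f"{TENS[tens]} y {UNITS[units]}"

def expand_digits_in_text(text):
    """Replace each standalone one- or two-digit number with its spelled-out form."""
    return re.sub(r"\b\d{1,2}\b", lambda m: expand_spanish(int(m.group())), text)
```

A transducer-based implementation has the advantage that the same rules can be composed with the lexicon and phonological rules, which plain functions like these cannot offer.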

A Text to Speech (TTS) System with English to Punjabi Conversion

Singh, Prabhsimran; Singh, Amritpal
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published 13/11/2014
Search relevance
65.76%
The paper shows how to develop an application that translates English text into Punjabi and can also convert text to speech (TTS), i.e., pronounce the text. Such an application can be particularly beneficial for people with special needs.; Comment: 5 pages, 8 figures, 3 tables

Generating Segment Durations in a Text-To-Speech System: A Hybrid Rule-Based/Neural Network Approach

Corrigan, Gerald; Massey, Noel; Karaali, Orhan
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published 24/11/1998
Search relevance
65.84%
A combination of a neural network with rule firing information from a rule-based system is used to generate segment durations for a text-to-speech system. The system shows a slight improvement in performance over a neural network system without the rule firing information. Synthesized speech using these generated segment durations was accepted by listeners as having about the same quality as speech generated using segment durations extracted from natural speech.; Comment: 4 pages, PostScript

Text-To-Speech Conversion with Neural Networks: A Recurrent TDNN Approach

Karaali, Orhan; Corrigan, Gerald; Gerson, Ira; Massey, Noel
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published 24/11/1998
Search relevance
65.81%
This paper describes the design of a neural network that performs the phonetic-to-acoustic mapping in a speech synthesis system. The use of a time-domain neural network architecture limits discontinuities that occur at phone boundaries. Recurrent data input also helps smooth the output parameter tracks. Independent testing has demonstrated that the voice quality produced by this system compares favorably with speech from existing commercial text-to-speech systems.; Comment: 4 pages, PostScript
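A time-delay layer computes each output frame from a window of current and past input frames, and a recurrent connection feeds the previous output back in; both make the output change gradually rather than jumping at a boundary. The scalar Python sketch below illustrates one such unit with assumed, hand-picked weights. The paper's actual layer sizes, delays, inputs and training are not described in this abstract.

```python
import math

def recurrent_tdnn_unit(inputs, delay_weights, recurrent_weight, bias=0.0):
    """y[t] = tanh(sum_k w[k] * x[t-k] + r * y[t-1] + b), with x[t] = 0 for t < 0."""
    outputs = []
    prev_y = 0.0
    for t in range(len(inputs)):
        acc = bias + recurrent_weight * prev_y      # recurrent feedback
        for k, w in enumerate(delay_weights):       # time-delay taps
            if t - k >= 0:
                acc += w * inputs[t - k]
        prev_y = math.tanh(acc)
        outputs.append(prev_y)
    return outputs

# Hypothetical input track with an abrupt step, as at a phone boundary.
track = [0.0] * 5 + [1.0] * 5
smoothed = recurrent_tdnn_unit(track, delay_weights=[0.4, 0.3, 0.2], recurrent_weight=0.5)
```

The step input becomes a gradual rise over several frames in `smoothed`, a toy version of the boundary-discontinuity reduction the abstract attributes to the time-domain architecture.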

A High Quality Text-To-Speech System Composed of Multiple Neural Networks

Karaali, Orhan; Corrigan, Gerald; Massey, Noel; Miller, Corey; Schnurr, Otto; Mackie, Andrew
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published 04/12/1998
Search relevance
65.89%
While neural networks have been employed to handle several different text-to-speech tasks, ours is the first system to use neural networks throughout, for both linguistic and acoustic processing. We divide the text-to-speech task into three subtasks: a linguistic module mapping from text to a linguistic representation, an acoustic module mapping from the linguistic representation to speech, and a video module mapping from the linguistic representation to animated images. The linguistic module employs a letter-to-sound neural network and a postlexical neural network. The acoustic module employs a duration neural network and a phonetic neural network. The visual neural network is employed in parallel with the acoustic module to drive a talking head. The use of neural networks that can be retrained on the characteristics of different voices and languages affords our system a degree of adaptability and naturalness heretofore unavailable.; Comment: Source link (9812006.tar.gz) contains: 1 PostScript file (4 pages) and 3 WAV audio files. If your system does not support Windows WAV files, try a tool like "sox" to translate the audio into a format of your choice

Use Pronunciation by Analogy for text to speech system in Persian language

Jowharpour, Ali; dezfuli, Masha allah abbasi; Yektaee, Mohammad hosein
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published 24/07/2011
Search relevance
65.93%
Interest in text-to-speech synthesis has increased worldwide. Text-to-speech systems have been developed for many popular languages such as English, Spanish and French, and much research and development has been applied to those languages. Persian, on the other hand, has received little attention compared to other languages of similar importance, and research on Persian is still in its infancy. The Persian language has many difficulties and exceptions that increase the complexity of text-to-speech systems, for example the absence of short vowels in written text and the existence of homographs. In this paper we propose a new method for Persian text-to-phonetic conversion, based on pronunciation by analogy in words, semantic relations and grammatical rules, for finding the proper phonetic form. Keywords: PbA, text to speech, Persian language, FPbA

Text-to-speech vs. human voiced audio descriptions: a reception study in films dubbed into Catalan

Fernández i Torné, Anna; Matamala, Anna
Source: Universitat Autònoma de Barcelona Publisher: Universitat Autònoma de Barcelona
Type: Journal Article Format: application/pdf
Published 2015 ENG
Search relevance
95.91%
This article presents an experiment that aims to determine whether blind and visually impaired people would accept the implementation of text-to-speech in the audio description of dubbed feature films in the Catalan context. A user study was conducted with 67 blind and partially sighted people who assessed two synthetic voices when applied to audio description, as compared to two natural voices. All of the voices had been previously selected in a preliminary test. The analysis of the data (both quantitative and qualitative) concludes that most participants accept Catalan text-to-speech audio description as an alternative solution to the standard human-voiced audio description. However, natural voices obtain statistically higher scores than synthetic voices and are still the preferred solution.