Page 1 of results: 2062 digital items found in 0.013 seconds

Uso de heurísticas para a aceleração do aprendizado por reforço.; Heuristically accelerated reinforcement learning.

Bianchi, Reinaldo Augusto da Costa
Source: Biblioteca Digitais de Teses e Dissertações da USP Publisher: Biblioteca Digitais de Teses e Dissertações da USP
Type: Doctoral Thesis Format: application/pdf
Published on 05/04/2004 PT
Search Relevance
66.46%
This work proposes a new class of algorithms that allows the use of heuristics to accelerate reinforcement learning. This class of algorithms, called "Heuristically Accelerated Learning" (HAL), is formalized in terms of Markov Decision Processes, introducing a heuristic function H that influences the agent's choice of actions during learning. The heuristic is used only to choose the action to be taken; it does not modify the operation of the reinforcement learning algorithm and preserves many of its properties. The heuristics used in HALs can be defined from prior knowledge about the domain or extracted, at run time, from cues present in the learning process itself. In the first case, the heuristic is defined from previously learned cases or defined ad hoc. In the second case, automatic methods for extracting the heuristic function H, called "Heuristic from X", are used. To validate this work several algorithms are proposed, among them "Heuristically Accelerated Q-Learning" (HAQL), which implements a HAL by extending the well-known Q-Learning algorithm...
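
A minimal sketch of the action-selection rule described above, assuming a tabular Q-Learning agent whose greedy choice adds a heuristic bonus H(s, a) to the learned Q-values; the action set, the weighting factor xi, and the epsilon-greedy exploration are illustrative assumptions, not the thesis's exact formulation:

import random
from collections import defaultdict

# Q-values and the heuristic function H, both indexed by (state, action).
Q = defaultdict(float)
H = defaultdict(float)          # e.g. filled in from prior domain knowledge
ACTIONS = ["up", "down", "left", "right"]

def choose_action(state, xi=1.0, epsilon=0.1):
    """Epsilon-greedy choice where the greedy step maximizes Q + xi * H.
    The heuristic only biases action selection; the learning rule is untouched."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)] + xi * H[(state, a)])

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-Learning update, identical to the unaccelerated algorithm."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])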

Aprendizado por reforço relacional para o controle de robôs sociáveis; Relational reinforcement learning to control sociable robots

Silva, Renato Ramos da
Source: Biblioteca Digitais de Teses e Dissertações da USP Publisher: Biblioteca Digitais de Teses e Dissertações da USP
Type: Master's Dissertation Format: application/pdf
Published on 10/03/2009 PT
Search Relevance
66.55%
Artificial intelligence seeks not only to understand but also to build intelligent entities. Intelligence can be divided into several faculties, one of which is known as learning. The field of machine learning aims to develop techniques for the automatic learning of machines, which include computers, robots, or any other device. Among these techniques is Reinforcement Learning, the main focus of this work. More specifically, relational reinforcement learning (RRL) was investigated, which represents in relational form what is learned through direct interaction with the environment. RRL is particularly interesting in robotics because, in general, no model of the environment is available and resources must be used economically. The RRL technique was investigated in the context of learning for a robotic head. A modification of the RRL algorithm, called ETG, was proposed and incorporated into a control architecture for a robotic head. The architecture was evaluated on a non-trivial real-world problem: learning shared attention. The results obtained show that the architecture is able to exhibit appropriate behaviours during a controlled social interaction...

Agente topológico de aprendizado por reforço; Topological reinforcement learning agent

Braga, Arthur Plínio de Souza
Source: Biblioteca Digitais de Teses e Dissertações da USP Publisher: Biblioteca Digitais de Teses e Dissertações da USP
Type: Doctoral Thesis Format: application/pdf
Published on 07/04/2004 PT
Search Relevance
66.48%
Reinforcement Learning (RL) methods are well suited to decision-making problems in many domains because of their flexible, adaptable structure. Although promising, RL methods often have their practical scope restricted to problems with small or medium-sized state spaces, largely because of the way they estimate the value function. In this thesis, a new RL approach, called the Topological Reinforcement Learning Agent (ATAR) and inspired by latent learning, is proposed to accelerate reinforcement learning through an alternative mechanism for selecting the state-action pairs whose value estimates are updated. Latent learning refers to animal learning that occurs in the absence of reinforcement and that is not apparent until a reinforcement signal is perceived by the agent. This kind of learning allows an agent to partially learn a task even before it receives any reinforcement signal. Cognitive maps are usually employed to encode information about the environment in which the agent is immersed. Accordingly, ATAR uses a topological map, based on Self-Organizing Maps, to play the role of the cognitive map and to allow a simple mechanism for propagating the updates. ATAR was tested...
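
A minimal sketch of propagating value updates over a topological map, assuming a plain neighbour graph stands in for the Self-Organizing-Map-based cognitive map used in the thesis; the graph, the breadth-first sweep, and the update rule are illustrative assumptions:

from collections import defaultdict, deque

# Value estimates and a topological map given as an adjacency list over states.
V = defaultdict(float)
neighbours = {
    "s0": ["s1"], "s1": ["s0", "s2"], "s2": ["s1", "s3"], "s3": ["s2"],
}

def propagate(start_state, reward, gamma=0.9):
    """Breadth-first propagation of one observed reward over the topological map.
    Each state's value is pushed towards the discounted value of its best neighbour."""
    V[start_state] = max(V[start_state], reward)
    queue = deque([start_state])
    visited = {start_state}
    while queue:
        s = queue.popleft()
        for n in neighbours.get(s, []):
            if n not in visited:
                V[n] = max(V[n], gamma * V[s])
                visited.add(n)
                queue.append(n)

propagate("s3", reward=1.0)   # a single observed reward updates the whole map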

Aprendizado por reforço em lote: um estudo de caso para o problema de tomada de decisão em processos de venda; Batch reinforcement learning: a case study for the problem of decision making in sales processes

Lacerda, Dênis Antonio
Source: Biblioteca Digitais de Teses e Dissertações da USP Publisher: Biblioteca Digitais de Teses e Dissertações da USP
Type: Master's Dissertation Format: application/pdf
Published on 12/12/2013 PT
Search Relevance
66.53%
Probabilistic Planning studies sequential decision-making problems for an agent whose actions have probabilistic effects, modelled as a Markov Decision Process (MDP). Given the probabilistic state-transition function and the reward values of the actions, it is possible to determine a policy (i.e., a mapping from environment states to agent actions) that maximizes the expected accumulated reward (or minimizes the expected accumulated cost) obtained by executing a sequence of actions. When the MDP model is not completely known, the best policy must be learned through the agent's interaction with the real environment. This process is called reinforcement learning. However, in applications where experiments in the real environment are not allowed, for example sales operations, reinforcement learning can be performed over a sample of past experiences, a process called Batch Reinforcement Learning. In this work we study batch reinforcement learning techniques using a history of past interactions stored in a process database...
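
A minimal sketch of the batch setting described above, assuming tabular Q-learning replayed over a fixed log of transitions with no further interaction with the environment; the toy sales-process transitions are an illustrative assumption:

from collections import defaultdict

# A fixed log of past interactions: (state, action, reward, next_state) tuples.
batch = [
    ("contact", "offer_discount", 0.0, "negotiating"),
    ("negotiating", "follow_up", 1.0, "closed"),
    ("contact", "follow_up", 0.0, "lost"),
]
ACTIONS = ["offer_discount", "follow_up"]
Q = defaultdict(float)

def batch_q_learning(sweeps=50, alpha=0.1, gamma=0.9):
    """Repeatedly replay the stored experience; no new interaction is needed."""
    for _ in range(sweeps):
        for s, a, r, s_next in batch:
            best_next = max(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

batch_q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s, _, _, _ in batch}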

Aprendizado por reforço utilizando tile coding em cenários multiagente; Reinforcement learning using tile coding in multiagent scenarios

Waskow, Samuel Justo
Source: Universidade Federal do Rio Grande do Sul Publisher: Universidade Federal do Rio Grande do Sul
Type: Dissertation Format: application/pdf
POR
Search Relevance
66.44%
Researchers in artificial intelligence are currently looking for methods to solve reinforcement learning problems that demand a large amount of computational resources. In multiagent scenarios where the state and action spaces are high-dimensional, traditional reinforcement learning approaches are inadequate. As an alternative, there are state-space generalization techniques that extend learning capacity through abstraction. The main focus of this work is therefore to apply existing reinforcement learning techniques with function approximation through tile coding to the following scenarios: predator-prey, urban vehicular traffic control, and coordination games. The experimental results show that the tile coding state representation outperforms the tabular representation.; Nowadays, researchers are seeking methods to solve reinforcement learning (RL) problems in complex scenarios. RL is an efficient, widely used machine learning technique in single-agent problems. Regarding multiagent systems, in which the state space generally has high dimensionality, standard reinforcement learning approaches may not be adequate. As alternatives...
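
A minimal sketch of tile coding as a state-generalization mechanism, assuming a single continuous state variable and a few offset tilings; the tiling sizes and offsets are illustrative assumptions:

def tile_features(x, n_tilings=4, n_tiles=8, lo=0.0, hi=1.0):
    """Map a scalar state x in [lo, hi] to one active tile index per tiling.
    Each tiling is shifted by a fraction of a tile, so nearby states share tiles."""
    features = []
    tile_width = (hi - lo) / n_tiles
    for t in range(n_tilings):
        offset = t * tile_width / n_tilings
        idx = int((x - lo + offset) / tile_width)
        idx = min(max(idx, 0), n_tiles)           # clamp to the valid range
        features.append(t * (n_tiles + 1) + idx)  # unique index per (tiling, tile)
    return features

def value(x, weights):
    """Linear value estimate: sum of the weights of the active tiles."""
    return sum(weights[f] for f in tile_features(x))

# Usage: with 4 tilings of 9 indices each, the weight vector has 36 entries.
weights = [0.0] * (4 * 9)
print(tile_features(0.37))
print(value(0.37, weights))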

Using Reinforcement Learning in the tuning of Central Pattern Generators

Duarte, Ana Filipa de Sampaio Calçada
Source: Universidade do Minho Publisher: Universidade do Minho
Type: Master's Dissertation
Published on 12/12/2012 ENG
Search Relevance
66.6%
Master's dissertation in Informatics Engineering; The aim of this work is to apply Reinforcement Learning techniques to robot learning and locomotion tasks. Reinforcement Learning is a useful learning technique for robot locomotion because of the emphasis it places on direct interaction between the agent and the environment, and because it requires neither supervision nor complete models, unlike classical approaches. The goal of this technique is to decide which actions to take so as to maximize a cumulative reward, taking into account that decisions may affect not only immediate rewards but also future ones. This work presents the structure and operation of Reinforcement Learning and its application to Central Pattern Generators, with the aim of generating optimized adaptive locomotion. To investigate and identify the strengths and capabilities of Reinforcement Learning, and to demonstrate this type of algorithm in a simple way, two case studies based on the state of the art were implemented. Regarding the main goal of this thesis, two different solutions were pursued: a first one based on Natural Actor-Critic methods...

Dynamic equilibrium through reinforcement learning

Faustino, Paulo Fernando Pinho
Source: Instituto Politécnico de Lisboa Publisher: Instituto Politécnico de Lisboa
Type: Master's Dissertation
Published on /09/2011 ENG
Search Relevance
66.54%
Reinforcement Learning is an area of Machine Learning that deals with how an agent should take actions in an environment so as to maximize the notion of accumulated reward. This type of learning is inspired by the way humans learn and has led to the creation of various algorithms for reinforcement learning. These algorithms focus on the way in which an agent's behaviour can be improved, assuming independence from its surroundings. The current work studies the application of reinforcement learning methods to solve the inverted pendulum problem. The importance of the variability of the environment (factors that are external to the agent) on the execution of reinforcement learning agents is studied by using a model that seeks to obtain equilibrium (stability) through dynamism – a Cart-Pole system or inverted pendulum. We sought to improve the behaviour of the autonomous agents by changing the information passed to them, while maintaining the agent's internal parameters constant (learning rate, discount factors, decay rate, etc.), instead of the classical approach of tuning the agent's internal parameters. The influence of changes on the state set and the action set on an agent's capability to solve the Cart-pole problem was studied. We have studied typical behaviour of reinforcement learning agents applied to the classic BOXES model and a new form of characterizing the environment was proposed using the notion of convergence towards a reference value. We demonstrate the gain in performance of this new method applied to a Q-Learning agent.; Reinforcement Learning is an area of Machine Learning concerned with how an agent should take actions in an environment so as to maximize the notion of accumulated reward. This form of learning is inspired by the way humans learn and has led to the creation of several reinforcement learning algorithms. These algorithms focus on how to improve the agent's behaviour...
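
A minimal sketch of a tabular Q-Learning agent on a BOXES-style discretization of the cart-pole state, assuming hand-picked bin edges; the thresholds, action coding, and parameters are illustrative assumptions, not the values studied in the dissertation:

import random
from collections import defaultdict

def discretize(x, x_dot, theta, theta_dot):
    """BOXES-style discretization: each continuous variable falls into a coarse bin."""
    return (
        0 if x < 0 else 1,
        0 if x_dot < 0 else 1,
        int(min(max((theta + 0.2) / 0.1, 0), 3)),   # 4 angle boxes
        0 if theta_dot < 0 else 1,
    )

Q = defaultdict(float)
ACTIONS = [-1, +1]   # push the cart left or right

def act(state, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, alpha=0.5, gamma=0.99):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])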

Individual Differences in Reinforcement Learning: Behavioral, Electrophysiological, and Neuroimaging Correlates

Bogdan, Ryan; Goetz, Elena; Holmes, Avram J.; Birk, Jeffrey L.; Dillon, Daniel G.; Santesso, Diane L.; Pizzagalli, Diego
Source: Elsevier Publisher: Elsevier
Type: Journal Article
EN_US
Search Relevance
66.44%
During reinforcement learning, phasic modulations of activity in midbrain dopamine neurons are conveyed to the dorsal anterior cingulate cortex (dACC) and basal ganglia (BG) and serve to guide adaptive responding. While the animal literature supports a role for the dACC in integrating reward history over time, most human electrophysiological studies of dACC function have focused on responses to single positive and negative outcomes. The present electrophysiological study investigated the role of the dACC in probabilistic reward learning in healthy subjects using a task that required integration of reinforcement history over time. We recorded the feedback-related negativity (FRN) to reward feedback in subjects who developed a response bias toward a more frequently rewarded ("rich") stimulus ("learners") versus subjects who did not ("non-learners"). Compared to non-learners, learners showed more positive (i.e., smaller) FRNs and greater dACC activation upon receiving reward for correct identification of the rich stimulus. In addition, dACC activation and a bias to select the rich stimulus were positively correlated. The same participants also completed a monetary incentive delay (MID) task administered during functional magnetic resonance imaging. Compared to non-learners...

Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis

Huys, Quentin JM; Pizzagalli, Diego A; Bogdan, Ryan; Dayan, Peter
Source: BioMed Central Publisher: BioMed Central
Type: Journal Article
EN_US
Search Relevance
66.49%
Background: Depression is characterised partly by blunted reactions to reward. However, tasks probing this deficiency have not distinguished insensitivity to reward from insensitivity to the prediction errors for reward that determine learning and are putatively reported by the phasic activity of dopamine neurons. We attempted to disentangle these factors with respect to anhedonia in the context of stress, Major Depressive Disorder (MDD), Bipolar Disorder (BPD) and a dopaminergic challenge. Methods: Six behavioural datasets involving 392 experimental sessions were subjected to a model-based, Bayesian meta-analysis. Participants across all six studies performed a probabilistic reward task that used an asymmetric reinforcement schedule to assess reward learning. Healthy controls were tested under baseline conditions, stress or after receiving the dopamine D2 agonist pramipexole. In addition, participants with current or past MDD or BPD were evaluated. Reinforcement learning models isolated the contributions of variation in reward sensitivity and learning rate. Results: MDD and anhedonia reduced reward sensitivity more than they affected the learning rate, while a low dose of the dopamine D2 agonist pramipexole showed the opposite pattern. Stress led to a pattern consistent with a mixed effect on reward sensitivity and learning rate. Conclusion: Reward-related learning reflected at least two partially separable contributions. The first related to phasic prediction error signalling...
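
A minimal sketch of the kind of two-parameter model such analyses fit, assuming a simple update rule in which rho scales reward sensitivity and epsilon is the learning rate; the probabilistic reward task below is an illustrative stand-in, not any of the six datasets:

import random

def update_q(q, action, reward, rho=1.0, epsilon=0.2):
    """One trial of a two-parameter reinforcement learning model.
    rho scales how strongly reward is felt; epsilon controls how fast values change."""
    q = dict(q)
    q[action] += epsilon * (rho * reward - q[action])
    return q

# Two stimuli with an asymmetric reinforcement schedule: "rich" is rewarded more often.
q = {"rich": 0.0, "lean": 0.0}
for _ in range(200):
    action = max(q, key=q.get) if random.random() > 0.1 else random.choice(list(q))
    p_reward = 0.6 if action == "rich" else 0.2
    reward = 1.0 if random.random() < p_reward else 0.0
    q = update_q(q, action, reward)
print(q)   # a response bias towards the rich stimulus emerges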

Importance Sampling for Reinforcement Learning with Multiple Objectives

Shelton, Christian Robert
Source: MIT - Massachusetts Institute of Technology Publisher: MIT - Massachusetts Institute of Technology
Format: 108 p.; 10551422 bytes; 1268632 bytes; application/postscript; application/pdf
EN_US
Search Relevance
66.47%
This thesis considers three complications that arise from applying reinforcement learning to a real-world application. In the process of using reinforcement learning to build an adaptive electronic market-maker, we find the sparsity of data, the partial observability of the domain, and the multiple objectives of the agent to cause serious problems for existing reinforcement learning algorithms. We employ importance sampling (likelihood ratios) to achieve good performance in partially observable Markov decision processes with few data. Our importance sampling estimator requires no knowledge about the environment and places few restrictions on the method of collecting data. It can be used efficiently with reactive controllers, finite-state controllers, or policies with function approximation. We present theoretical analyses of the estimator and incorporate it into a reinforcement learning algorithm. Additionally, this method provides a complete return surface which can be used to balance multiple objectives dynamically. We demonstrate the need for multiple goals in a variety of applications and natural solutions based on our sampling method. The thesis concludes with example results from applying our algorithm to the domain of automated electronic market-making.
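
A minimal sketch of importance sampling with likelihood ratios for off-policy return estimation, assuming episodic data, known behaviour-policy action probabilities, and a weighted estimator; the toy trajectories and policies are illustrative assumptions, not the thesis's estimator:

def importance_weighted_return(episodes, target_prob, behaviour_prob, gamma=1.0):
    """Estimate the target policy's expected return from trajectories collected
    under a different behaviour policy, by reweighting each episode with the
    product of likelihood ratios target_prob / behaviour_prob along the episode."""
    total, weight_sum = 0.0, 0.0
    for episode in episodes:
        weight, ret, discount = 1.0, 0.0, 1.0
        for state, action, reward in episode:
            weight *= target_prob(state, action) / behaviour_prob(state, action)
            ret += discount * reward
            discount *= gamma
        total += weight * ret
        weight_sum += weight
    return total / weight_sum if weight_sum > 0 else 0.0   # weighted estimator

# Usage with a trivial two-action problem.
episodes = [[("s", "a", 1.0)], [("s", "b", 0.0)]]
behaviour = lambda s, a: 0.5
target = lambda s, a: 0.9 if a == "a" else 0.1
print(importance_weighted_return(episodes, target, behaviour))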

A Study of Cooperative Mechanisms for Faster Reinforcement Learning

Whitehead, Steven Douglas (1960 - )
Source: University of Rochester. Computer Science Department. Publisher: University of Rochester. Computer Science Department.
Type: Report
ENG
Search Relevance
66.44%
Using pure reinforcement learning to solve a multi-stage decision problem is computationally equivalent to performing a search over the entire state space. When a priori knowledge is not available for guidance, search can be excessive - limiting the applicability of reinforcement learning for real-world tasks. Cooperative mechanisms help reduce search by providing the learner with shorter-latency feedback and auxiliary sources of trial-and-error experience. These mechanisms are based on the observation that in nature, intelligent agents exist in a cooperative social environment that helps structure and guide learning. Within this context, learning involves information transfer as much as it does trial-and-error discovery. Two general cooperative mechanisms are described: Learning-with-an-External-Critic (or LEC) and Learning-By-Watching (or LBW). Specific algorithms for each are studied empirically in a simple grid-world and shown to significantly improve agent adaptability. Analytical results for both, under various learning conditions, are also provided. These results indicate that while an unbiased search can be expected to require time exponential in the size of the state space, the LEC and LBW algorithms require at most time linear in the size of the state space and, under appropriate conditions, are independent of the state space size and require time proportional to the length of the optimal solution path. The issue of behavior interpretation is also discussed.
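
A minimal sketch of the Learning-with-an-External-Critic idea, under the assumption that the critic simply adds an immediate evaluative signal to the environment's reward before a standard Q-learning update; the critic function and action set are illustrative assumptions, not the report's exact formulation:

from collections import defaultdict

Q = defaultdict(float)
ACTIONS = ["left", "right"]

def external_critic(state, action):
    """A hand-written critic giving immediate, short-latency feedback,
    standing in for the long wait for delayed environmental reward."""
    return 0.5 if action == "right" else -0.5

def lec_step(state, action, env_reward, next_state, alpha=0.1, gamma=0.9):
    reward = env_reward + external_critic(state, action)   # critic augments feedback
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])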

Thesis Proposal: Scaling Reinforcement Learning Systems

Whitehead, Steven Douglas (1960 - )
Source: University of Rochester. Computer Science Department. Publisher: University of Rochester. Computer Science Department.
Type: Report
ENG
Search Relevance
66.45%
Thesis proposal.; Reinforcement learning systems are interesting because they meet three major criteria for animate control, namely: competence, responsiveness, and autonomous adaptability. Unfortunately, these systems have not been scaled to complex task domains. For my thesis I propose to study three separate problems that arise when scaling reinforcement learning systems to larger task domains. These are: the propagation problem, the transfer problem, and the attention problem. The propagation problem arises when the number of states in the problem domain is scaled up and the distance the system must go for reinforcement is increased. The transfer problem occurs when reinforcement learning systems are applied to problem-solving tasks where it is desirable to transfer knowledge useful for solving one problem to another. The attention problem arises when a system with a fixed-length input vector is applied to task domains containing an arbitrary number of objects. Each of these problems is discussed along with possible approaches for their solution. A schedule for performing the research is also given.

Reinforcement learning with value advice

Daswani, Mayank; Sunehag, Peter; Hutter, Marcus
Source: Journal of Machine Learning Research Publisher: Journal of Machine Learning Research
Type: Conference paper
Search Relevance
76.37%
The problem we consider in this paper is reinforcement learning with value advice. In this setting, the agent is given limited access to an oracle that can tell it the expected return (value) of any state-action pair with respect to the optimal policy. The agent must use this value to learn an explicit policy that performs well in the environment. We provide an algorithm called RLAdvice, based on the imitation learning algorithm DAgger. We illustrate the effectiveness of this method in the Arcade Learning Environment on three different games, using value estimates from UCT as advice.
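
A rough sketch in the spirit of the DAgger-style loop described above, assuming the oracle is queried on states visited by the current policy and the policy is refit to the aggregated advice; the toy oracle, states, and majority-vote "training" step are illustrative assumptions, not RLAdvice itself:

import random
from collections import defaultdict, Counter

def oracle_value(state, action):
    """Stand-in for the limited oracle returning the optimal value of (state, action)."""
    return 1.0 if action == "good" else 0.0

ACTIONS = ["good", "bad"]
dataset = []                                     # aggregated (state, advised action) pairs
policy = defaultdict(lambda: random.choice(ACTIONS))

for iteration in range(5):
    # Roll out the current policy to collect visited states (here a fixed toy trajectory).
    visited = [f"s{t}" for t in range(10)]
    # Query the oracle on those states and aggregate the advice, DAgger-style.
    for s in visited:
        best = max(ACTIONS, key=lambda a: oracle_value(s, a))
        dataset.append((s, best))
    # Refit the explicit policy on the aggregated dataset (here: majority vote per state).
    votes = defaultdict(Counter)
    for s, a in dataset:
        votes[s][a] += 1
    for s, counter in votes.items():
        policy[s] = counter.most_common(1)[0][0]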

Q-learning for history-based reinforcement learning

Daswani, Mayank; Sunehag, Peter; Hutter, Marcus
Source: Journal of Machine Learning Research Publisher: Journal of Machine Learning Research
Type: Conference paper
Search Relevance
76.46%
We extend the Q-learning algorithm from the Markov Decision Process setting to problems where observations are non-Markov and do not reveal the full state of the world, i.e. to POMDPs. We do this in a natural manner by adding l0 regularisation to the pathwise squared Q-learning objective function and then optimising this over both a choice of map from history to states and the resulting MDP parameters. The optimisation procedure involves a stochastic search over the map class nested with classical Q-learning of the parameters. This algorithm fits perfectly into the feature reinforcement learning framework, which chooses maps based on a cost criterion. The cost criterion used so far for feature reinforcement learning has been model-based and aimed at predicting future states and rewards. Instead we directly predict the return, which is what is needed for choosing optimal actions. Our Q-learning criterion also lends itself immediately to a function approximation setting where features are chosen based on the history. This algorithm is somewhat similar to the recent line of work on lasso temporal difference learning which aims at finding a small feature set with which one can perform policy evaluation. The distinction is that we aim directly for learning the Q-function of the optimal policy and we use l0 instead of l1 regularisation. We perform an experimental evaluation on classical benchmark domains and find improvement in convergence speed as well as in economy of the state representation. We also compare against MC-AIXI on the large Pocman domain and achieve competitive performance in average reward. We use less than half the CPU time and 36 times less memory. Overall...
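
A minimal sketch of the kind of cost being optimised, assuming a pathwise squared Q-learning error plus an l0-style penalty on the number of states a candidate history-to-state map induces; the toy trajectory, action set, and candidate maps are illustrative assumptions, not the paper's exact objective:

def q_cost(trajectory, phi, q, actions=("L", "R"), gamma=0.99, penalty=0.1):
    """Pathwise squared Q-learning error for a candidate map phi from histories
    to states, plus an l0-style penalty on the number of states the map uses."""
    err = 0.0
    for t in range(len(trajectory) - 1):
        history, action, reward = trajectory[t]
        next_history, _, _ = trajectory[t + 1]
        s, s_next = phi(history), phi(next_history)
        target = reward + gamma * max(q.get((s_next, b), 0.0) for b in actions)
        err += (target - q.get((s, action), 0.0)) ** 2
    n_states = len({phi(h) for h, _, _ in trajectory})
    return err + penalty * n_states

def search_maps(trajectory, candidate_maps, q):
    """The stochastic search over maps, reduced here to picking the cheapest candidate."""
    return min(candidate_maps, key=lambda phi: q_cost(trajectory, phi, q))

# Usage: compare a map that keeps only the last observation with the identity map.
trajectory = [("h0", "L", 0.0), ("h0L", "R", 1.0), ("h0LR", "L", 0.0)]
phi_last = lambda h: h[-1]
phi_full = lambda h: h
best_map = search_maps(trajectory, [phi_last, phi_full], q={})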

Reinforcement learning

Auer, Peter; Hutter, Marcus; Orseau, Laurent
Source: Schloss Dagstuhl - Leibniz-Zentrum für Informatik Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Type: Journal Article
Search Relevance
66.44%
This Dagstuhl Seminar also stood as the 11th European Workshop on Reinforcement Learning (EWRL11). Reinforcement learning gains more and more attention each year, as can be seen at the various conferences (ECML, ICML, IJCAI, ...). EWRL, and in particular this Dagstuhl Seminar, aimed at gathering people interested in reinforcement learning from all around the globe. This unusual format for EWRL helped participants view the field and discuss topics differently.

The sample-complexity of general reinforcement learning

Lattimore, Tor; Hutter, Marcus; Sunehag, Peter
Source: Journal of Machine Learning Research Publisher: Journal of Machine Learning Research
Type: Conference paper
Search Relevance
76.3%
We present a new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary models. The algorithm is shown to be near-optimal for all but O(N log^2 N) timesteps with high probability. Infinite classes are also considered, where we show that compactness is a key criterion for determining the existence of uniform sample-complexity bounds. A matching lower bound is given for the finite case.

A reinforcement learning algorithm in cooperative multi-robot domains

Fernández, Fernando; Borrajo, Daniel; Parker, Lynne E.
Source: Springer Publisher: Springer
Type: Journal Article Format: application/pdf
Published on /08/2005 ENG
Search Relevance
66.47%
Reinforcement learning has been widely applied to solve a diverse set of learning tasks, from board games to robot behaviours. In some of them, results have been very successful, but some tasks present several characteristics that make the application of reinforcement learning harder to define. One of these areas is multi-robot learning, which has two important problems. The first is credit assignment, or how to define the reinforcement signal for each robot belonging to a cooperative team depending on the results achieved by the whole team. The second one is working with large domains, where the amount of data can be large and different at each moment of a learning step. This paper studies both issues in a multi-robot environment, showing that domain knowledge and machine learning algorithms can be combined to achieve successful cooperative behaviours.; This work has been partially funded by grants from the Spanish Science and Technology Department, numbers TAP1999-0535-C02-02 and TIC2002-04146-C05-05.

Improving Reinforcement Learning by using Case-Based Heuristics

Bianchi, Reinaldo; Ros, Raquel; Lopez de Mantaras, Ramon
Source: Springer Publisher: Springer
Type: Article Format: 189739 bytes; application/pdf
ENG
Search Relevance
66.53%
The original publication is available at www.springerlink.com; This work presents a new approach that allows the use of cases in a case base as heuristics to speed up Reinforcement Learning algorithms, combining Case Based Reasoning (CBR) and Reinforcement Learning (RL) techniques. This approach, called Case Based Heuristically Accelerated Reinforcement Learning (CB-HARL), builds upon an emerging technique, Heuristically Accelerated Reinforcement Learning (HARL), in which RL methods are accelerated by making use of heuristic information. CB-HARL is a subset of RL that makes use of a heuristic function derived from a case base, in a Case Based Reasoning manner. An algorithm that incorporates CBR techniques into Heuristically Accelerated Q-Learning is also proposed. Empirical evaluations were conducted in a simulator for the RoboCup Four-Legged Soccer Competition, and the results obtained show that, using CB-HARL, agents learn faster than with either RL or HARL methods.; This work has been partially funded by the FI grant and the BE grant from the AGAUR, the 2005-SGR-00093 project, supported by the Generalitat de Catalunya, the MID-CBR project grant TIN 2006-15140-C03-01 and FEDER funds. Reinaldo Bianchi is supported by CNPq grant 201591/2007-3 and FAPESP grant 2009/01610-1.; Peer reviewed
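
A minimal sketch of deriving a heuristic from a case base, assuming each case stores a state and the action that worked there, and that the resulting bonus is used to bias action selection just as the heuristic function does in the HAQL sketch earlier; the case format and similarity measure are illustrative assumptions:

# Each case pairs a state description with the action that succeeded in it.
case_base = [
    {"state": (1, 1), "action": "right"},
    {"state": (4, 2), "action": "up"},
]

def similarity(s1, s2):
    """Illustrative similarity: negative Manhattan distance between grid states."""
    return -sum(abs(a - b) for a, b in zip(s1, s2))

def heuristic_from_cases(state, actions, bonus=1.0):
    """Retrieve the most similar case and favour its action with a heuristic bonus."""
    best_case = max(case_base, key=lambda c: similarity(state, c["state"]))
    return {a: (bonus if a == best_case["action"] else 0.0) for a in actions}

H = heuristic_from_cases((2, 1), ["up", "down", "left", "right"])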

Effective reinforcement learning following cerebellar damage requires a balance between exploration and motor noise; Cerebellar damage and reinforcement learning

Therrien, Amanda S.; Wolpert, Daniel M.; Bastian, Amy J.
Source: Oxford University Press Publisher: Oxford University Press
Type: Article; published version
EN
Search Relevance
66.61%
This is the final version of the article. It was first available from Oxford University Press via http://dx.doi.org/10.1093/brain/awv329; Reinforcement and error-based processes are essential for motor learning, with the cerebellum thought to be required only for the error-based mechanism. Here we examined learning and retention of a reaching skill under both processes. Control subjects learned similarly from reinforcement and error-based feedback, but showed much better retention under reinforcement. To apply reinforcement to cerebellar patients, we developed a closed-loop reinforcement schedule in which task difficulty was controlled based on recent performance. This schedule produced substantial learning in cerebellar patients and controls. Cerebellar patients varied in their learning under reinforcement but fully retained what was learned. In contrast, they showed a complete lack of retention in error-based learning. We developed a mechanistic model of the reinforcement task and found that learning depended on a balance between exploration variability and motor noise. While the cerebellar and control groups had similar exploration variability, the patients had greater motor noise and hence learned less. Our results suggest that cerebellar damage indirectly impairs reinforcement learning by increasing motor noise...

Sparse Value Function Approximation for Reinforcement Learning

Painter-Wakefield, Christopher Robert
Source: Universidade Duke Publisher: Universidade Duke
Type: Dissertation
Published on //2013
Search Relevance
66.45%

A key component of many reinforcement learning (RL) algorithms is the approximation of the value function. The design and selection of features for approximation in RL is crucial, and an ongoing area of research. One approach to the problem of feature selection is to apply sparsity-inducing techniques in learning the value function approximation; such sparse methods tend to select relevant features and ignore irrelevant features, thus automating the feature selection process. This dissertation describes three contributions in the area of sparse value function approximation for reinforcement learning.

One method for obtaining sparse linear approximations is the inclusion in the objective function of a penalty on the sum of the absolute values of the approximation weights. This L1 regularization approach was first applied to temporal difference learning in the LARS-inspired, batch learning algorithm LARS-TD. In our first contribution, we define an iterative update equation which has as its fixed point the L1 regularized linear fixed point of LARS-TD. The iterative update gives rise naturally to an online stochastic approximation algorithm. We prove convergence of the online algorithm and show that the L1 regularized linear fixed point is an equilibrium fixed point of the algorithm. We demonstrate the ability of the algorithm to converge to the fixed point...
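
A minimal sketch of an online, L1-regularised temporal-difference step in the spirit described above, assuming linear value approximation with a soft-thresholding operation after each update; the step sizes, thresholding scheme, and synthetic features are illustrative assumptions, not the dissertation's exact iterative operator:

import numpy as np

def soft_threshold(w, tau):
    """Shrink weights towards zero; weights smaller than tau become exactly zero."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def l1_td_step(w, phi_s, phi_next, reward, alpha=0.05, gamma=0.95, beta=0.01):
    """One stochastic approximation step: a TD update followed by soft thresholding,
    which drives irrelevant feature weights to zero and induces sparsity."""
    td_error = reward + gamma * phi_next @ w - phi_s @ w
    w = w + alpha * td_error * phi_s
    return soft_threshold(w, alpha * beta)

# Usage with random features; only the first feature carries reward information.
rng = np.random.default_rng(0)
w = np.zeros(10)
for _ in range(1000):
    phi_s, phi_next = rng.random(10), rng.random(10)
    w = l1_td_step(w, phi_s, phi_next, reward=phi_s[0])
print(w)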