
Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data

Tan, Yongxi; Shi, Leming; Tong, Weida; Wang, Charles
Source: Oxford University Press Publisher: Oxford University Press
Type: Journal Article
EN
Search Relevance
25.88%
DNA microarray technology provides a promising approach to the diagnosis and prognosis of tumors on a genome-wide scale by monitoring the expression levels of thousands of genes simultaneously. One problem arising from the use of microarray data is the difficulty of analyzing the high-dimensional gene expression data, typically with thousands of variables (genes) and far fewer observations (samples), in which severe collinearity is often observed. This makes it difficult to apply classical statistical methods directly to microarray data. In this paper, total principal component regression (TPCR) was proposed to classify human tumors by extracting the latent variable structure underlying microarray data from the augmented subspace of both independent and dependent variables. One of the salient features of our method is that it takes into account not only the latent variable structure but also the errors in the microarray gene expression profiles (independent variables). The prediction performance of TPCR was evaluated by both leave-one-out and leave-half-out cross-validation using four well-known microarray datasets. The stability and reliability of the classification models were further assessed by re-randomization and permutation studies. A fast kernel algorithm was applied to decrease the computation time dramatically. (MATLAB source code is available upon request.)
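The authors' MATLAB code is not included in this listing. Purely as a rough illustration of the general idea (ordinary principal component regression with a classifier, not the total-least-squares variant used in TPCR), a leave-one-out evaluation of a PCA-plus-classifier pipeline might look like the following sketch; the data X, y and the number of components are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline

# Placeholder data: 60 tumor samples x 2000 genes, 4 tumor classes (n << p).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))     # gene expression profiles
y = rng.integers(0, 4, size=60)     # tumor class labels

# Ordinary PCR-style pipeline: project onto a few latent components, then classify.
# (TPCR additionally models the errors in X via a total-least-squares-type step.)
model = make_pipeline(PCA(n_components=5), LogisticRegression(max_iter=1000))

# Leave-one-out cross-validation, one of the evaluation schemes used in the paper.
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("LOO accuracy:", scores.mean())
```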

An Intelligent Architecture Based on Field Programmable Gate Arrays Designed to Detect Moving Objects by Using Principal Component Analysis

Bravo, Ignacio; Mazo, Manuel; Lázaro, José L.; Gardel, Alfredo; Jiménez, Pedro; Pizarro, Daniel
Source: Molecular Diversity Preservation International (MDPI) Publisher: Molecular Diversity Preservation International (MDPI)
Type: Journal Article
Published on 15/10/2010 EN
Search Relevance
26.01%
This paper presents a complete implementation of the Principal Component Analysis (PCA) algorithm in Field Programmable Gate Array (FPGA) devices applied to high-rate background segmentation of images. The classical sequential execution of the different parts of the PCA algorithm has been parallelized. This parallelization has led to the specific development and implementation in hardware of the different stages of PCA, such as computation of the correlation matrix, matrix diagonalization using the Jacobi method, and subspace projections of images. On the application side, the paper presents a motion detection algorithm, also entirely implemented on the FPGA and based on the developed PCA core. It consists of dynamically thresholding the differences between the input image and its reconstruction in the PCA linear subspace previously learned as a background model. The proposal achieves a high processing rate (up to 120 frames per second) and high-quality segmentation results, with a completely embedded and reliable hardware architecture based on commercial CMOS sensors and FPGA devices.
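A software-level sketch of the underlying algorithm (not the FPGA implementation): learn a PCA background subspace from a stack of background frames, then flag pixels whose reconstruction error exceeds a threshold. The frame data, subspace dimension k, and threshold value are placeholders.

```python
import numpy as np

def pca_background_model(frames, k):
    """Learn a k-dimensional PCA subspace from a stack of background frames
    (each frame flattened to a vector); the FPGA design instead computes the
    correlation matrix and diagonalizes it with the Jacobi method."""
    X = frames.reshape(len(frames), -1).astype(float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T                 # p-vector mean, p x k orthonormal basis

def detect_motion(frame, mean, basis, thresh):
    """Flag pixels whose difference from the PCA-subspace reconstruction
    exceeds a threshold (the paper uses a dynamic threshold)."""
    x = frame.ravel().astype(float) - mean
    recon = basis @ (basis.T @ x)         # projection onto the background subspace
    return (np.abs(x - recon) > thresh).reshape(frame.shape)
```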

Computing matrix symmetrizers. Part 2: new methods using eigendata and linear means; a comparison

Martínez Dopico, Froilán C.; Uhlig, Frank
Source: Elsevier Publisher: Elsevier
Type: info:eu-repo/semantics/acceptedVersion; info:eu-repo/semantics/article
Published on 10/07/2015 ENG
Search Relevance
46.02%
Over any field $\mathbb{F}$, every square matrix A can be factored into the product of two symmetric matrices as $A = S_1 S_2$ with $S_i = S_i^T \in \mathbb{F}^{n \times n}$, and either factor can be chosen nonsingular, as was discovered by Frobenius in 1910. Frobenius' symmetric matrix factorization has been lying almost dormant for a century. The first successful method for computing matrix symmetrizers, i.e., symmetric matrices S such that SA is symmetric, appeared in 2013 [29, 30], inspired by an iterative linear systems algorithm of Huang and Nong (2010). The resulting iterative algorithm solves this computational problem over $\mathbb{R}$ and $\mathbb{C}$, but at high computational cost. This paper develops and tests another linear equations solver, as well as eigen-, principal-vector- and Schur-Normal-Form-based algorithms for solving the matrix symmetrizer problem numerically. Four new eigendata based algorithms use, respectively, SVD based principal vector chain constructions, Gram-Schmidt orthogonalization techniques, the Arnoldi method, or the Schur Normal Form of A in their formulations. They are helped by Datta's 1973 method that symmetrizes unreduced Hessenberg matrices directly. The eigendata based methods work well and quickly for generic matrices A and create well conditioned matrix symmetrizers through eigenvector dyad accumulation. But all of the eigendata based methods have differing deficiencies with matrices A that have ill-conditioned or complicated eigenstructures with nontrivial Jordan normal forms. Our symmetrizer studies for matrices with ill-conditioned eigensystems lead to two open problems of matrix optimization.; This research was partially supported by the Ministerio de Economía y Competitividad of Spain through the research grant MTM2012-32542.
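None of the paper's eigendata or linear-means algorithms is reproduced here. Purely as a brute-force illustration of the problem statement (find symmetric S with SA symmetric), one can solve the defining linear constraints directly over the free entries of S; this sketch is only practical for small n.

```python
import numpy as np
from scipy.linalg import null_space

def symmetrizers(A, tol=1e-10):
    """Return a basis of symmetrizers of A: symmetric S with SA = (SA)^T,
    found by solving the linear constraints in the n(n+1)/2 free entries of S."""
    n = A.shape[0]
    # Basis of the space of symmetric n x n matrices.
    basis = []
    for i in range(n):
        for j in range(i, n):
            E = np.zeros((n, n))
            E[i, j] = E[j, i] = 1.0
            basis.append(E)
    # Column k holds vec(E_k A - (E_k A)^T); we want a combination that vanishes.
    M = np.column_stack([(E @ A - (E @ A).T).ravel() for E in basis])
    coeffs = null_space(M, rcond=tol)      # each column gives one symmetrizer
    return [sum(c * E for c, E in zip(col, basis)) for col in coeffs.T]

# Tiny check on a random matrix: S and SA should both be symmetric.
A = np.random.default_rng(1).normal(size=(4, 4))
S = symmetrizers(A)[0]
print(np.allclose(S, S.T), np.allclose(S @ A, (S @ A).T))
```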

Robust Orthogonal Complement Principal Component Analysis

She, Yiyuan; Li, Shijie; Wu, Dapeng
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Search Relevance
45.94%
Recently, the robustification of principal component analysis has attracted lots of attention from statisticians, engineers and computer scientists. This work focuses on the type of outliers that are not necessarily apparent in the original observation space but could affect the principal subspace estimation. Based on a mathematical formulation of such transformed outliers, a novel robust orthogonal complement principal component analysis (ROC-PCA) is proposed. The framework combines the popular sparsity-enforcing and low rank regularization techniques to deal with row-wise outliers as well as element-wise outliers. A non-asymptotic oracle inequality guarantees the performance of ROC-PCA in finite samples. To tackle the computational challenges, an efficient algorithm is developed on the basis of Stiefel manifold optimization and iterative thresholding. Furthermore, a batch variant is proposed to significantly reduce the cost in ultra high dimensions. The paper also points out a pitfall of a common practice of SVD reduction in robust PCA. Experiments show the effectiveness and efficiency of ROC-PCA in simulation studies and real data analysis.
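ROC-PCA itself (Stiefel manifold optimization plus iterative thresholding) is not reproduced here. As a hint of the thresholding ingredient, the sketch below shows the two proximal operators associated with the element-wise l1 and row-wise l2,1 penalties that such sparsity-enforcing schemes typically iterate.

```python
import numpy as np

def soft_threshold(X, lam):
    """Element-wise soft-thresholding: proximal map of the l1 penalty,
    used against element-wise outliers."""
    return np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)

def row_soft_threshold(X, lam):
    """Row-wise (group) soft-thresholding: proximal map of the l2,1 penalty.
    Rows with small norm are shrunk to zero, so only rows flagged as
    row-wise outliers remain nonzero."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X * np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
```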

Optimization theory of Hebbian/anti-Hebbian networks for PCA and whitening

Pehlevan, Cengiz; Chklovskii, Dmitri B.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 30/11/2015
Search Relevance
25.88%
In analyzing information streamed by sensory organs, our brains face challenges similar to those solved in statistical signal processing. This suggests that biologically plausible implementations of online signal processing algorithms may model neural computation. Here, we focus on such workhorses of signal processing as Principal Component Analysis (PCA) and whitening which maximize information transmission in the presence of noise. We adopt the similarity matching framework, recently developed for principal subspace extraction, but modify the existing objective functions by adding a decorrelating term. From the modified objective functions, we derive online PCA and whitening algorithms which are implementable by neural networks with local learning rules, i.e. synaptic weight updates that depend on the activity of only pre- and postsynaptic neurons. Our theory offers a principled model of neural computations and makes testable predictions such as the dropout of underutilized neurons.; Comment: Annual Allerton Conference on Communication, Control, and Computing (Allerton) 2015
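The exact updates derived in the paper are not reproduced here. The sketch below is a simplified, constant-step-size toy of the general architecture the abstract describes: a single-layer network whose feedforward weights follow a Hebbian rule, whose lateral weights follow an anti-Hebbian rule, and whose outputs come from recurrent dynamics.

```python
import numpy as np

def hebbian_antihebbian_pca(X, k, eta=0.01, seed=0):
    """Toy single-layer network with local learning rules: feedforward weights W
    follow a Hebbian rule, lateral weights M an anti-Hebbian rule.  A simplified
    constant-step variant, not the exact updates derived in the paper."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((k, d))     # feedforward (input -> output) weights
    M = np.zeros((k, k))                      # lateral (output -> output) weights
    for x in X[rng.permutation(n)]:
        # Output at the fixed point of the recurrent dynamics y = W x - M y.
        y = np.linalg.solve(np.eye(k) + M, W @ x)
        # Local updates: each synapse sees only its pre- and postsynaptic activity.
        W += eta * (np.outer(y, x) - W * (y**2)[:, None])    # Hebbian
        M += eta * (np.outer(y, y) - M * (y**2)[:, None])    # anti-Hebbian
        np.fill_diagonal(M, 0.0)              # no self-connections
    return W, M
```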

Relations among Some Low Rank Subspace Recovery Models

Zhang, Hongyang; Lin, Zhouchen; Zhang, Chao; Gao, Junbin
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 05/12/2014
Search Relevance
26.1%
Recovering intrinsic low dimensional subspaces from data distributed on them is a key preprocessing step for many applications. In recent years, there has been a lot of work that models subspace recovery as low rank minimization problems. We find that some representative models, such as Robust Principal Component Analysis (R-PCA), Robust Low Rank Representation (R-LRR), and Robust Latent Low Rank Representation (R-LatLRR), are actually deeply connected. More specifically, we discover that once a solution to one of the models is obtained, the solutions to the other models can be obtained in closed form. Since R-PCA is the simplest, our discovery makes it the center of low rank subspace recovery models. Our work has two important implications. First, R-PCA has a solid theoretical foundation. Under certain conditions, we can find better solutions to these low rank models with overwhelming probability, although the models are non-convex. Second, we can obtain significantly faster algorithms for these models by solving R-PCA first. The computation cost can be further cut by applying low complexity randomized algorithms, e.g., our novel $\ell_{2,1}$ filtering algorithm, to R-PCA. Experiments verify the advantages of our algorithms over other state-of-the-art ones that are based on the alternating direction method.; Comment: Submitted to Neural Computation
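The paper's $\ell_{2,1}$ filtering algorithm is not shown here. For reference, a minimal ADMM sketch of the R-PCA (principal component pursuit) model to which the other models reduce might look as follows; the penalty weight defaults to the usual $1/\sqrt{\max(m, n)}$ heuristic.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal map of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Element-wise soft-thresholding: proximal map of tau * (l1 norm)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca_admm(X, lam=None, rho=1.0, n_iter=200):
    """Minimal ADMM sketch for R-PCA:  min ||L||_* + lam*||S||_1  s.t.  L + S = X."""
    m, n = X.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    L = np.zeros_like(X); S = np.zeros_like(X); Y = np.zeros_like(X)
    for _ in range(n_iter):
        L = svt(X - S + Y / rho, 1.0 / rho)       # low-rank update
        S = soft(X - L + Y / rho, lam / rho)      # sparse (outlier) update
        Y = Y + rho * (X - L - S)                 # dual update
    return L, S
```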

Nonparametric Partial Importance Sampling for Financial Derivative Pricing

Neddermeyer, Jan C.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Search Relevance
25.9%
Importance sampling is a promising variance reduction technique for Monte Carlo simulation based derivative pricing. Existing importance sampling methods are based on a parametric choice of the proposal. This article proposes an algorithm that estimates the optimal proposal nonparametrically using a multivariate frequency polygon estimator. In contrast to parametric methods, nonparametric estimation allows for close approximation of the optimal proposal. Standard nonparametric importance sampling is inefficient for high-dimensional problems. We solve this issue by applying the procedure to a low-dimensional subspace, which is identified through principal component analysis and the concept of the effective dimension. The mean square error properties of the algorithm are investigated and its asymptotic optimality is shown. Quasi-Monte Carlo is used for further improvement of the method. The algorithm is easy to implement, requires no analytical computation, and is computationally very efficient. We demonstrate through path-dependent and multi-asset option pricing problems that it leads to significant efficiency gains compared to other algorithms in the literature.; Comment: 26 pages, 4 figures
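The paper's nonparametric frequency-polygon proposal is not reproduced here. As a toy contrast, the sketch below prices a deep out-of-the-money call with plain Monte Carlo versus importance sampling under a simple parametric (mean-shifted Gaussian) proposal, the kind of baseline the nonparametric method is designed to improve on; all market parameters are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Black-Scholes setup for a deep out-of-the-money European call.
S0, K, r, sigma, T = 100.0, 160.0, 0.02, 0.2, 1.0
n = 100_000

def payoff(z):
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    return np.exp(-r * T) * np.maximum(ST - K, 0.0)

# Plain Monte Carlo under the nominal N(0, 1) density.
plain = payoff(rng.standard_normal(n))

# Importance sampling: draw z ~ N(mu, 1) and reweight by the likelihood ratio
# p(z) / q(z) = exp(-mu * z + mu^2 / 2).  The shift mu pushes paths toward the strike.
mu = (np.log(K / S0) - (r - 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
z_q = rng.standard_normal(n) + mu
is_est = payoff(z_q) * np.exp(-mu * z_q + 0.5 * mu**2)

print("plain MC:", plain.mean(), "+/-", plain.std(ddof=1) / np.sqrt(n))
print("IS      :", is_est.mean(), "+/-", is_est.std(ddof=1) / np.sqrt(n))
```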

Some Options for L1-Subspace Signal Processing

Markopoulos, Panos P.; Karystinos, George N.; Pados, Dimitris A.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 04/09/2013
Search Relevance
26%
We describe ways to define and calculate $L_1$-norm signal subspaces which are less sensitive to outlying data than $L_2$-calculated subspaces. We focus on the computation of the $L_1$ maximum-projection principal component of a data matrix containing N signal samples of dimension D and conclude that the general problem is formally NP-hard in asymptotically large N, D. We prove, however, that the case of engineering interest of fixed dimension D and asymptotically large sample support N is not and we present an optimal algorithm of complexity $O(N^D)$. We generalize to multiple $L_1$-max-projection components and present an explicit optimal $L_1$ subspace calculation algorithm in the form of matrix nuclear-norm evaluations. We conclude with illustrations of $L_1$-subspace signal processing in the fields of data dimensionality reduction and direction-of-arrival estimation.; Comment: In Proceedings Tenth Intern. Symposium on Wireless Communication Systems (ISWCS '13), Ilmenau, Germany, Aug. 27-30, 2013 (The 2013 ISWCS Best Paper Award in Physical Layer Comm. and Signal Processing); 5 pages; 3 figures
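The $O(N^D)$ algorithm itself is not reproduced here. The sketch below computes the exact $L_1$ maximum-projection principal component by brute force over the $2^N$ antipodal sign vectors, using the fact from this line of work that $\max_{\|q\|_2 = 1} \|X^T q\|_1 = \max_{b \in \{\pm 1\}^N} \|Xb\|_2$ with the optimum attained at $q = Xb^*/\|Xb^*\|_2$; this is only feasible for small $N$.

```python
import numpy as np
from itertools import product

def l1_principal_component(X):
    """Exact L1 (maximum-projection) principal component of a D x N data matrix
    by exhaustive search over sign vectors (exponential in N; an illustration,
    not the paper's polynomial-in-N algorithm for fixed D)."""
    D, N = X.shape
    best, best_val = None, -np.inf
    for signs in product([-1.0, 1.0], repeat=N):
        v = X @ np.asarray(signs)
        val = np.linalg.norm(v)
        if val > best_val:
            best, best_val = v, val
    return best / best_val        # unit-norm q maximizing ||X^T q||_1

# Tiny example with one outlying sample in the last column.
X = np.array([[1.0, 1.2, 0.9, 8.0],
              [0.1, -0.2, 0.0, -6.0]])
print(l1_principal_component(X))
```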

Optimal Algorithms for $L_1$-subspace Signal Processing

Markopoulos, Panos P.; Karystinos, George N.; Pados, Dimitris A.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 27/05/2014
Search Relevance
26.12%
We describe ways to define and calculate $L_1$-norm signal subspaces which are less sensitive to outlying data than $L_2$-calculated subspaces. We start with the computation of the $L_1$ maximum-projection principal component of a data matrix containing $N$ signal samples of dimension $D$. We show that while the general problem is formally NP-hard in asymptotically large $N$, $D$, the case of engineering interest of fixed dimension $D$ and asymptotically large sample size $N$ is not. In particular, for the case where the sample size is less than the fixed dimension ($N < D$)...

Fast, Exact Bootstrap Principal Component Analysis for p>1 million

Fisher, Aaron; Caffo, Brian; Schwartz, Brian; Zipunnikov, Vadim
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Search Relevance
36.17%
Many have suggested a bootstrap procedure for estimating the sampling variability of principal component analysis (PCA) results. However, when the number of measurements per subject ($p$) is much larger than the number of subjects ($n$), the challenge of calculating and storing the leading principal components from each bootstrap sample can be computationally infeasible. To address this, we outline methods for fast, exact calculation of bootstrap principal components, eigenvalues, and scores. Our methods leverage the fact that all bootstrap samples occupy the same $n$-dimensional subspace as the original sample. As a result, all bootstrap principal components are limited to the same $n$-dimensional subspace and can be efficiently represented by their low dimensional coordinates in that subspace. Several uncertainty metrics can be computed solely based on the bootstrap distribution of these low dimensional coordinates, without calculating or storing the $p$-dimensional bootstrap components. Fast bootstrap PCA is applied to a dataset of sleep electroencephalogram (EEG) recordings ($p=900$, $n=392$), and to a dataset of brain magnetic resonance images (MRIs) ($p\approx$ 3 million, $n=352$). For the brain MRI dataset, our method allows for standard errors for the first 3 principal components based on 1000 bootstrap samples to be calculated on a standard laptop in 47 minutes...
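A rough sketch of the central observation (not the authors' complete method, which also delivers exact bootstrap eigenvalues, scores, and standard errors): since every bootstrap resample of the columns of a p x n matrix stays in the same n-dimensional column space, each resample only requires an SVD of an n x n coordinate matrix.

```python
import numpy as np

def bootstrap_pca_scores(X, n_boot=1000, k=3, seed=0):
    """Bootstrap the leading principal components of a p x n matrix X (p >> n)
    without forming any p-dimensional component per resample: every bootstrap
    sample lies in the column space of X, so each resample only needs an SVD
    of an n x n matrix of low-dimensional coordinates."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=1, keepdims=True)
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)   # U: p x n, d: (n,), Vt: n x n
    A = d[:, None] * Vt                                  # n x n coordinates of the samples
    coords = []
    for _ in range(n_boot):
        idx = rng.integers(0, X.shape[1], size=X.shape[1])
        Ab = A[:, idx] - A[:, idx].mean(axis=1, keepdims=True)
        Ub, _, _ = np.linalg.svd(Ab, full_matrices=False)
        coords.append(Ub[:, :k])          # p-dim bootstrap PCs would be U @ Ub[:, :k]
    return U, np.stack(coords)            # basis + low-dim coordinates of bootstrap PCs
```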

Monotonicity of quantum relative entropy and recoverability

Berta, Mario; Lemm, Marius; Wilde, Mark M.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Search Relevance
15.8%
The relative entropy is a principal measure of distinguishability in quantum information theory, with its most important property being that it is non-increasing with respect to noisy quantum operations. Here, we establish a remainder term for this inequality that quantifies how well one can recover from a loss of information by employing a rotated Petz recovery map. The main approach for proving this refinement is to combine the methods of [Fawzi and Renner, arXiv:1410.0664] with the notion of a relative typical subspace from [Bjelakovic and Siegmund-Schultze, arXiv:quant-ph/0307170]. Our paper constitutes partial progress towards a remainder term which features just the Petz recovery map (not a rotated Petz map), a conjecture which would have many consequences in quantum information theory. A well known result states that the monotonicity of relative entropy with respect to quantum operations is equivalent to each of the following inequalities: strong subadditivity of entropy, concavity of conditional entropy, joint convexity of relative entropy, and monotonicity of relative entropy with respect to partial trace. We show that this equivalence holds true for refinements of all these inequalities in terms of the Petz recovery map. So either all of these refinements are true or all are false.; Comment: v3: 22 pages...
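For readers who want the background statements the abstract refers to, the standard monotonicity (data processing) inequality for the quantum relative entropy and the definition of the Petz recovery map are, in the usual notation (these are textbook definitions, not the paper's new remainder-term result):

```latex
% Monotonicity of relative entropy under a quantum channel \mathcal{N},
% with D(\rho\|\sigma) the quantum relative entropy.
\[
  D(\rho \,\|\, \sigma) \;\ge\; D\bigl(\mathcal{N}(\rho) \,\|\, \mathcal{N}(\sigma)\bigr),
  \qquad
  D(\rho \,\|\, \sigma) \;=\; \operatorname{Tr}\bigl[\rho\,(\log\rho - \log\sigma)\bigr].
\]
% Petz recovery map associated with \sigma and \mathcal{N}
% (\mathcal{N}^{\dagger} denotes the adjoint map).
\[
  \mathcal{P}_{\sigma,\mathcal{N}}(\omega)
  \;=\;
  \sigma^{1/2}\,
  \mathcal{N}^{\dagger}\!\bigl(\mathcal{N}(\sigma)^{-1/2}\,\omega\,\mathcal{N}(\sigma)^{-1/2}\bigr)\,
  \sigma^{1/2}.
\]
```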

Distributed Kernel Principal Component Analysis

Balcan, Maria-Florina; Liang, Yingyu; Song, Le; Woodruff, David; Xie, Bo
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Search Relevance
25.94%
Kernel Principal Component Analysis (KPCA) is a key technique in machine learning for extracting the nonlinear structure of data and pre-processing it for downstream learning algorithms. We study the distributed setting in which there are multiple workers, each holding a set of points, who wish to compute the principal components of the union of their pointsets. Our main result is a communication efficient algorithm that takes as input arbitrary data points and computes a set of global principal components, that give relative-error approximation for polynomial kernels, or give relative-error approximation with an arbitrarily small additive error for a wide family of kernels including Gaussian kernels. While recent work shows how to do PCA in a distributed setting, the kernel setting is significantly more challenging. Although the "kernel trick" is useful for efficient computation, it is unclear how to use it to reduce communication. The main problem with previous work is that it achieves communication proportional to the dimension of the data points, which would be proportional to the dimension of the feature space, or to the number of examples, both of which could be very large. We instead first select a small subset of points whose span contains a good approximation (the column subset selection problem...
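The paper's communication-efficient kernel algorithm is not reproduced here. For contrast, the standard distributed linear PCA baseline the abstract refers to can be sketched as follows: each worker sends a small SVD-based summary of its local data, and the coordinator runs PCA on the stacked summaries.

```python
import numpy as np

def local_summary(X_local, t):
    """Each worker sends a t x d sketch of its local (samples x features) data:
    the top-t right singular vectors scaled by their singular values
    (assumes t <= min(local sample count, d))."""
    _, s, Vt = np.linalg.svd(X_local, full_matrices=False)
    return s[:t, None] * Vt[:t]

def distributed_pca(workers_data, k, t):
    """Coordinator stacks the workers' summaries and takes the top-k principal
    directions of the stacked matrix.  This is the distributed *linear* PCA
    baseline; the paper's contribution is making the kernel setting communication
    efficient."""
    stacked = np.vstack([local_summary(X, t) for X in workers_data])
    _, _, Vt = np.linalg.svd(stacked, full_matrices=False)
    return Vt[:k]          # k x d approximate global principal directions
```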

A Hebbian/Anti-Hebbian Neural Network for Linear Subspace Learning: A Derivation from Multidimensional Scaling of Streaming Data

Pehlevan, Cengiz; Hu, Tao; Chklovskii, Dmitri B.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 02/03/2015
Search Relevance
36.31%
Neural network models of early sensory processing typically reduce the dimensionality of streaming input data. Such networks learn the principal subspace, in the sense of principal component analysis (PCA), by adjusting synaptic weights according to activity-dependent learning rules. When derived from a principled cost function these rules are nonlocal and hence biologically implausible. At the same time, biologically plausible local rules have been postulated rather than derived from a principled cost function. Here, to bridge this gap, we derive a biologically plausible network for subspace learning on streaming data by minimizing a principled cost function. In a departure from previous work, where cost was quantified by the representation, or reconstruction, error, we adopt a multidimensional scaling (MDS) cost function for streaming data. The resulting algorithm relies only on biologically plausible Hebbian and anti-Hebbian local learning rules. In a stochastic setting, synaptic weights converge to a stationary state which projects the input data onto the principal subspace. If the data are generated by a nonstationary distribution, the network can track the principal subspace. Thus, our result makes a step towards an algorithmic theory of neural computation.; Comment: Accepted for publication in Neural Computation
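The online network itself is not reproduced here. For intuition, the sketch below computes the offline optimum of the similarity-matching (strain MDS) cost the paper starts from, which indeed spans the principal subspace of the inputs; the matrix shapes (columns of X are samples) are assumptions of this sketch.

```python
import numpy as np

def similarity_matching_offline(X, k):
    """Offline solution of the similarity-matching / strain MDS cost
    min_Y || X.T @ X - Y.T @ Y ||_F^2  for a d x T input matrix X (columns are
    samples): the optimal k x T output Y spans the k-dimensional principal
    subspace of the inputs (an Eckart-Young-type fact)."""
    G = X.T @ X                                   # T x T input similarity (Gram) matrix
    vals, vecs = np.linalg.eigh(G)                # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]              # indices of the k largest eigenvalues
    return (vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))).T
```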

Probabilistic Approach to Neural Networks Computation Based on Quantum Probability Model Probabilistic Principal Subspace Analysis Example

Jankovic, Marko V.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 24/01/2010
Search Relevance
25.8%
In this paper, we introduce elements of a probabilistic model suitable for modeling learning algorithms in the framework of biologically plausible artificial neural networks. The model is based on two of the main concepts in quantum physics: the density matrix and the Born rule. As an example, we show that the proposed probabilistic interpretation is suitable for modeling on-line learning algorithms for PSA, which are preferably realized by parallel hardware based on very simple computational units. The proposed concept (model) can be used to improve algorithm convergence speed, to guide the choice of the learning factor, and to achieve robustness to the scale of the input signal. We also show how the Born rule and the Hebbian learning rule are connected.

Riemannian Geometry of Grassmann Manifolds with a View on Algorithmic Computation

Absil, P-A; Mahony, Robert; Sepulchre, R
Source: Kluwer Academic Publishers Publisher: Kluwer Academic Publishers
Type: Journal Article
Search Relevance
25.93%
We give simple formulas for the canonical metric, gradient, Lie derivative, Riemannian connection, parallel translation, geodesics and distance on the Grassmann manifold of p-planes in ℝ^n. In these formulas, p-planes are represented as the column space of n × p matrices...
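The entry is cut off here. As a small companion to the formulas the abstract mentions, the geodesic distance under the canonical metric can be computed from principal angles, with each plane represented by an orthonormal basis matrix (an assumption of this sketch).

```python
import numpy as np

def grassmann_distance(Y1, Y2):
    """Geodesic distance on the Grassmann manifold between the column spaces of
    two n x p matrices with orthonormal columns: the 2-norm of the vector of
    principal angles, obtained from the singular values of Y1.T @ Y2."""
    s = np.linalg.svd(Y1.T @ Y2, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))   # principal angles between the planes
    return np.linalg.norm(theta)
```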