Multipoint linkage analysis is a powerful method for mapping a rare disease gene on the human gene map despite limited genotype and pedigree data. However, there is no standard procedure for determining a confidence interval for gene location by using multipoint linkage analysis. A genetic counselor needs to know the confidence interval for gene location in order to determine the uncertainty of risk estimates provided to a consultand on the basis of DNA studies. We describe a resampling, or "bootstrap," method for deriving an approximate confidence interval for gene location on the basis of data from a single pedigree. This method was used to define an approximate confidence interval for the location of a gene causing nonsyndromal X-linked mental retardation in a single pedigree. The approach seemed robust in that similar confidence intervals were derived by using different resampling protocols. Quantitative bounds for the confidence interval depended on the genetic map chosen. Once an approximate confidence interval for gene location was determined for this pedigree, it was possible to use multipoint risk analysis to estimate risk intervals for women of unknown carrier status. Despite the limited genotype data, the combination of the resampling method and multipoint risk analysis had a dramatic impact on the genetic advice available to consultands.
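The resampling idea can be illustrated with a generic percentile bootstrap. This is a hedged sketch, not the paper's pedigree-resampling protocol: the `percentile_bootstrap_ci` helper and the per-observation "scores" below are hypothetical stand-ins for the genotype data actually resampled.

```python
import random
import statistics

def percentile_bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI: resample with replacement, recompute the
    statistic, and take empirical quantiles of the bootstrap replicates."""
    rng = random.Random(seed)
    reps = sorted(stat([rng.choice(data) for _ in data]) for _ in range(n_boot))
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical per-observation location scores standing in for pedigree data.
scores = [1.8, 2.1, 2.4, 1.9, 2.6, 2.2, 2.0, 2.5, 1.7, 2.3]
lo, hi = percentile_bootstrap_ci(scores, statistics.mean)
print(round(lo, 2), round(hi, 2))
```

The percentile interval simply brackets the middle 95% of the bootstrap replicates; more refined variants (bias-corrected, studentized) exist but follow the same resampling pattern.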
Several research fields frequently deal with the analysis of diverse classification results for the same entities. This calls for objective detection of overlaps and divergences between the resulting clusters. The congruence between classifications can be quantified by clustering agreement measures, including pairwise agreement measures. Several measures have been proposed, and the importance of obtaining confidence intervals for the point estimate when comparing these measures has been highlighted. A broad range of methods can be used for the estimation of confidence intervals. However, evidence is lacking about which methods are appropriate for calculating confidence intervals for most clustering agreement measures. Here we evaluate the resampling techniques of bootstrap and jackknife for the calculation of confidence intervals for clustering agreement measures. Contrary to what has been shown for some statistics, simulations showed that the jackknife performs better than the bootstrap at accurately estimating confidence intervals for pairwise agreement measures, especially when the agreement between partitions is low. The coverage of the jackknife confidence interval is robust to changes in cluster number and cluster size distribution.
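As a concrete illustration, a leave-one-out jackknife interval for a pairwise agreement measure (here the plain Rand index, with hypothetical labelings) can be sketched as follows; this is a minimal sketch, not the paper's simulation protocol:

```python
import math
from itertools import combinations
from statistics import NormalDist, mean

def rand_index(a, b):
    """Fraction of object pairs on which two partitions agree
    (both together in a cluster, or both apart)."""
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i, j in combinations(range(len(a)), 2))
    return agree / math.comb(len(a), 2)

def jackknife_ci(a, b, alpha=0.05):
    """Leave-one-out jackknife CI for the Rand index between labelings."""
    n = len(a)
    theta = rand_index(a, b)
    loo = [rand_index(a[:i] + a[i+1:], b[:i] + b[i+1:]) for i in range(n)]
    se = math.sqrt((n - 1) / n * sum((t - mean(loo)) ** 2 for t in loo))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta - z * se, theta + z * se

a = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]   # hypothetical partition 1
b = [0, 0, 1, 1, 1, 2, 2, 2, 0, 2]   # hypothetical partition 2
lo, hi = jackknife_ci(a, b)
print(round(rand_index(a, b), 3), round(lo, 3), round(hi, 3))
```

The same leave-one-out pattern applies to adjusted variants (e.g., the adjusted Rand index) by swapping the statistic.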
Many scientific investigations depend on obtaining data-driven, accurate, robust and computationally tractable parameter estimates. In the face of unavoidable intrinsic variability, there are different algorithmic approaches, prior assumptions and fundamental principles for computing point and interval estimates. Efficient and reliable parameter estimation is critical for making inferences about observable experiments, summarizing process characteristics and predicting experimental behavior. In this manuscript, we demonstrate simulation, construction, validation and interpretation of confidence intervals, under various assumptions, using the interactive web-based tools provided by the Statistics Online Computational Resource (http://www.SOCR.ucla.edu). Specifically, we present confidence interval examples for population means, with known or unknown population standard deviation; population variance; population proportion (exact and approximate); as well as confidence intervals based on bootstrapping or the asymptotic properties of the maximum likelihood estimates. Like all SOCR resources, these confidence interval resources may be openly accessed via an Internet-connected Java-enabled browser. The SOCR confidence interval applet enables the user to empirically explore and investigate the effects of the confidence-level...
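Two of the listed interval types can be sketched in a few lines, independently of the SOCR applets themselves (a minimal illustration using only textbook formulas):

```python
from statistics import NormalDist

def mean_ci_known_sigma(xbar, sigma, n, conf=0.95):
    """z-interval for a population mean when the population sigma is known."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    half = z * sigma / n ** 0.5
    return xbar - half, xbar + half

def proportion_ci_wald(successes, n, conf=0.95):
    """Approximate (Wald) interval for a population proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    half = z * (p * (1 - p) / n) ** 0.5
    return p - half, p + half

print(mean_ci_known_sigma(10.0, 2.0, 100))   # roughly (9.61, 10.39)
print(proportion_ci_wald(40, 100))           # roughly (0.30, 0.50)
```

The exact proportion interval and the variance interval mentioned in the abstract require binomial-tail inversion and chi-square quantiles, respectively, but follow the same pivot-then-invert logic.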
Ecologists often contrast diversity (species richness and abundances) using tests for comparing means or indices. However, many popular software applications do not support standard inferential statistics for estimates of species richness and/or density. In this study we simulated the behavior of asymmetric log-normal confidence intervals and determined the interval level at which non-overlap of two intervals mimics a statistical test at α = 0.05. Our results show that 84% confidence intervals robustly mimic 0.05-level statistical tests for asymmetric confidence intervals, as has been demonstrated for symmetric ones in the past. Finally, we provide detailed user guides for calculating 84% confidence intervals in two of the most robust and widely used freeware programs for diversity measurement in wildlife (i.e., EstimateS, Distance).
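The overlap heuristic can be sketched as follows. The means and standard errors are hypothetical, and symmetric normal-theory intervals stand in for the asymmetric log-normal intervals actually studied in the paper:

```python
from statistics import NormalDist

def ci(mean, se, conf=0.84):
    """Symmetric normal-theory interval at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # z ~ 1.405 for conf = 0.84
    return mean - z * se, mean + z * se

def overlap(ci1, ci2):
    """True if the two intervals share at least one point."""
    return ci1[0] <= ci2[1] and ci2[0] <= ci1[1]

# Two hypothetical species-richness estimates with standard errors.
a, b = ci(50.0, 3.0), ci(62.0, 3.5)
# Non-overlap of the two 84% intervals roughly mimics rejection by a
# two-sample test at alpha = 0.05.
print(overlap(a, b))  # False: the intervals separate
```

Note that comparing two 95% intervals for overlap is far more conservative than a 0.05-level test; the 84% level is what calibrates the visual overlap rule to α = 0.05.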
It is well known that standard asymptotic theory is not valid, or is extremely unreliable, in models with identification problems or weak instruments [Dufour (1997, Econometrica), Staiger and Stock (1997, Econometrica), Wang and Zivot (1998, Econometrica), Stock and Wright (2000, Econometrica), Dufour and Jasiak (2001, International Economic Review)]. One possible way out consists in using a variant of the Anderson-Rubin (1949, Ann. Math. Stat.) procedure. The latter, however, allows one to build exact tests and confidence sets only for the full vector of the coefficients of the endogenous explanatory variables in a structural equation, not for individual coefficients. This problem may in principle be overcome by using projection techniques [Dufour (1997, Econometrica), Dufour and Jasiak (2001, International Economic Review)]. Anderson-Rubin-type procedures are emphasized because they are robust to both weak instruments and instrument exclusion. However, these techniques can otherwise be implemented only by costly numerical methods. In this paper, we provide a complete analytic solution to the problem of building projection-based confidence sets from Anderson-Rubin-type confidence sets. The solution involves the geometric properties of “quadrics” and can be viewed as an extension of usual confidence intervals and ellipsoids. Only least squares techniques are required for building the confidence intervals. We also study by simulation how “conservative” projection-based confidence sets are. Finally...
This research introduces a new nonparametric technique: robust empirical likelihood. Robust empirical likelihood employs the empirical likelihood method to compute robust parameter estimates and confidence intervals. The technique uses constrained optimization to solve a robust version of the empirical likelihood function, thus allowing data analysts to estimate parameters accurately despite any potential contamination.
Empirical likelihood combines the utility of a parametric likelihood with the flexibility of a nonparametric method. Parametric likelihoods are valuable because they have a wide variety of uses; in particular, they are used to construct confidence intervals. Nonparametric methods are flexible because they produce accurate results without requiring knowledge about the data's distribution. Robust empirical likelihood's applications include regression models, hypothesis testing, and all areas that use likelihood methods.
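A minimal sketch of ordinary (non-robust) empirical likelihood for a population mean, in the sense of Owen, may clarify the machinery the robust variant builds on. The data are hypothetical; the profile statistic -2 log R(mu) is maximized (zero) at the sample mean and is compared to chi-square quantiles to form intervals:

```python
import math

def el_log_ratio(x, mu, tol=1e-10):
    """Owen's empirical likelihood: -2 log R(mu) for the mean. The optimal
    weights are w_i = 1 / (n * (1 + lam*(x_i - mu))), with lam chosen so the
    weighted mean equals mu; lam is found by bisection on the monotone score."""
    n = len(x)
    d = [xi - mu for xi in x]
    if min(d) >= 0 or max(d) <= 0:          # mu outside the convex hull
        return math.inf
    lo = -1.0 / max(d) + 1e-9               # keep all 1 + lam*d_i > 0
    hi = -1.0 / min(d) - 1e-9
    def score(lam):
        return sum(di / (1.0 + lam * di) for di in d)
    while hi - lo > tol:                    # score is decreasing in lam
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    return 2.0 * sum(math.log(1.0 + lam * di) for di in d)

x = [4.1, 5.2, 3.8, 6.0, 4.9, 5.5, 4.4, 5.1]   # hypothetical sample
xbar = sum(x) / len(x)
print(el_log_ratio(x, xbar))        # essentially zero at the sample mean
print(el_log_ratio(x, 5.5))         # grows as mu moves away from xbar
```

A 95% interval for the mean is then {mu : -2 log R(mu) <= 3.84}, the chi-square(1) calibration; the robust variant replaces the estimating function inside this same constrained optimization.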
Walley's Imprecise Dirichlet Model (IDM) for categorical data
overcomes several fundamental problems from which other approaches to
uncertainty suffer. Yet, to be useful in practice, one needs
efficient ways of computing the imprecise (robust) sets or
intervals. The main objective of this work is to derive exact,
conservative, and approximate, robust and credible interval
estimates under the IDM for a large class of statistical
estimators, including the entropy and mutual information.
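For a single category probability, the IDM lower and upper expectations have the well-known closed form n_j/(N+s) and (n_j+s)/(N+s), obtained by letting the Dirichlet prior of total strength s range over all possible mean vectors:

```python
def idm_interval(count, total, s=1.0):
    """Walley's IDM lower/upper expectations for a category probability:
    the posterior mean ranges over all Dirichlet priors of strength s."""
    return count / (total + s), (count + s) / (total + s)

# 7 observations of a category out of 20, with s = 1 (a common IDM choice):
lo, hi = idm_interval(7, 20)
print(round(lo, 3), round(hi, 3))  # 0.333 0.381
```

The interval width s/(N+s) shrinks as data accumulate, which is how the IDM encodes prior ignorance without a single arbitrary prior; the entropy and mutual information estimators in the abstract require optimizing such functionals over the same set of posteriors.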
The bibliographic profile of 125 undergraduate (licentiate) theses was analyzed, describing absolute quantities of several bibliometric variables, as well as within-document indexes and average lags of the references. The results show a consistent pattern across the years in the 6 cohorts included in the sample (2001-2007), with variations that fall within the robust confidence intervals for the global central tendency. The median number of references per document was 52 (99% CI 47-55); the median percentage of journal articles cited was 55%, with a median age for journal references of 9 years. Other highlights of the bibliographic profile were the use of foreign-language references (median 61%) and low reliance on open web documents (median 2%). A cluster analysis of the bibliometric indexes resulted in a typology of 2 main profiles, almost evenly distributed, one with the makeup of a natural-science bibliographic profile and the second in the style of the humanities. In general, the number of references, proportion of papers, and age of the references are close to those of PhD dissertations and Master's theses, setting a rather high standard for undergraduate theses.
This paper concerns robust inference on average treatment effects following
model selection. In the selection on observables framework, we show how to
construct confidence intervals based on a doubly-robust estimator that are
robust to model selection errors and prove that they are valid uniformly over a
large class of treatment effect models. The class allows for multivalued
treatments with heterogeneous effects (in observables), general
heteroskedasticity, and selection amongst (possibly) more covariates than
observations. Our estimator attains the semiparametric efficiency bound under
appropriate conditions. Precise conditions are given for any model selector to
yield these results, and we show how to combine data-driven selection with
economic theory. For implementation, we give a specific proposal for selection
based on the group lasso, which is particularly well-suited to treatment
effects data, and derive new results for high-dimensional, sparse multinomial
logistic regression. A simulation study shows our estimator performs very well
in finite samples over a wide range of models. Revisiting the National
Supported Work demonstration data, our method yields accurate estimates and
tight confidence intervals.
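The doubly robust (AIPW) score at the heart of such procedures can be illustrated on simulated data. This sketch uses the true propensity from the simulation and plain OLS outcome regressions, not the paper's group-lasso model selection; all names and the data-generating process are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                       # single observed covariate
e = 1 / (1 + np.exp(-0.5 * x))               # true propensity score
d = rng.binomial(1, e)                       # treatment assignment
y = 1.0 * d + 2.0 * x + rng.normal(size=n)   # true ATE = 1.0

def ols_predict(xs, ys, xq):
    """Fit y ~ 1 + x by least squares and predict at query points."""
    X = np.column_stack([np.ones_like(xs), xs])
    beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
    return beta[0] + beta[1] * xq

m1 = ols_predict(x[d == 1], y[d == 1], x)    # outcome regression, treated
m0 = ols_predict(x[d == 0], y[d == 0], x)    # outcome regression, control

# AIPW / doubly robust score: consistent if either the propensity model
# or the outcome model is correctly specified.
psi = m1 - m0 + d * (y - m1) / e - (1 - d) * (y - m0) / (1 - e)
ate = psi.mean()
se = psi.std(ddof=1) / np.sqrt(n)
print(f"ATE estimate {ate:.2f}, 95% CI half-width {1.96 * se:.2f}")
```

The uniform-validity results in the abstract concern what happens when the two nuisance functions are chosen by data-driven selection rather than fixed in advance, but the estimating equation is this same score.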
We propose a robust inferential procedure for assessing uncertainties of
parameter estimation in high-dimensional linear models, where the dimension $p$
can grow exponentially fast with the sample size $n$. Our method combines the
de-biasing technique with the composite quantile function to construct an
estimator that is asymptotically normal. Hence it can be used to construct
valid confidence intervals and conduct hypothesis tests. Our estimator is
robust and does not require the existence of first or second moments of the
noise distribution. It also preserves efficiency in the sense that the worst
case efficiency loss is less than 30% compared to the square-loss-based
de-biased Lasso estimator. In many cases our estimator is close to or better
than the latter, especially when the noise is heavy-tailed. Our de-biasing
procedure does not require solving the $L_1$-penalized composite quantile
regression. Instead, it allows for any first-stage estimator with desired
convergence rate and empirical sparsity. The paper also provides new proof
techniques for developing theoretical guarantees of inferential procedures with
non-smooth loss functions. To establish the main results, we exploit the local
curvature of the conditional expectation of composite quantile loss and apply
empirical process theories to control the difference between empirical
quantities and their conditional expectations. Our results are established
under weaker assumptions compared to existing work on inference for
high-dimensional quantile regression. Furthermore...
Instrumental variables have been widely used to estimate the causal effect of
a treatment on an outcome. Existing confidence intervals for causal effects
based on instrumental variables assume that all of the putative instrumental
variables are valid; a valid instrumental variable is a variable that affects
the outcome only by affecting the treatment and is not related to unmeasured
confounders. However, in practice, some of the putative instrumental variables
are likely to be invalid. This paper presents a simple and general approach to
construct a confidence interval that is robust to possibly invalid instruments.
The robust confidence interval has theoretical guarantees on having the correct
coverage. The paper also shows that the robust confidence interval outperforms
traditional confidence intervals popular in instrumental variables literature
when invalid instruments are present. The new approach is applied to a study of
the causal effect of income on food expenditures.
We consider the problem of constructing robust nonparametric confidence
intervals and tests of hypothesis for the median when the data distribution is
unknown and the data may contain a small fraction of contamination. We propose
a modification of the sign test (and its associated confidence interval) which
attains the nominal significance level (probability coverage) for any
distribution in the contamination neighborhood of a continuous distribution. We
also define some measures of robustness and efficiency under contamination for
confidence intervals and tests. These measures are computed for the proposed
procedures. [Published at http://dx.doi.org/10.1214/009053604000000634 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics.]
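The sign-test confidence interval for the median has a standard order-statistic form: [X_(k+1), X_(n-k)] with coverage 1 - 2F(k; n, 1/2), where F is the binomial CDF. A sketch with hypothetical data follows; this is the plain distribution-free interval, without the paper's contamination adjustment:

```python
import math

def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def sign_test_ci(data, alpha=0.05):
    """Distribution-free CI for the median from order statistics:
    [X_(k+1), X_(n-k)] covers the median with prob 1 - 2*F(k; n, 1/2).
    Chooses the largest k whose coverage is still at least 1 - alpha."""
    xs = sorted(data)
    n = len(xs)
    k = 0
    while binom_cdf(k + 1, n) <= alpha / 2:
        k += 1
    return xs[k], xs[n - 1 - k], 1 - 2 * binom_cdf(k, n)

data = [2.1, 3.4, 1.9, 5.0, 2.8, 3.1, 4.2, 2.5, 3.9, 3.0, 2.2, 4.8]
lo, hi, coverage = sign_test_ci(data)
print(lo, hi, round(coverage, 4))  # 2.2 4.2 0.9614
```

Because the coverage is driven entirely by binomial probabilities under the sign pattern, the interval is valid for any continuous distribution, which is the baseline property the paper's modification extends to contamination neighborhoods.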
We illustrate how recently improved low-redshift cosmological measurements
can tighten constraints on neutrino properties. In particular we examine the
impact of the assumed cosmological model on the constraints. We first consider
the new HST H0 = 74.2 +/- 3.6 measurement by Riess et al. (2009) and the
sigma8*(Omegam/0.25)^0.41 = 0.832 +/- 0.033 constraint from Rozo et al. (2009)
derived from the SDSS maxBCG Cluster Catalog. In a Lambda CDM model and when
combined with WMAP5 constraints, these low-redshift measurements constrain sum
mnu<0.4 eV at the 95% confidence level. This bound does not relax when allowing
for the running of the spectral index or for primordial tensor perturbations.
When adding also Supernovae and BAO constraints, we obtain a 95% upper limit of
sum mnu<0.3 eV. We test the sensitivity of the neutrino mass constraint to the
assumed expansion history by both allowing a dark energy equation of state
parameter w to vary, and by studying a model with coupling between dark energy
and dark matter, which allows for variation in w, Omegak, and dark coupling
strength xi. When combining CMB, H0, and the SDSS LRG halo power spectrum from
Reid et al. 2009, we find that in this very general model, sum mnu < 0.51 eV
with 95% confidence. If we allow the number of relativistic species Nrel to
vary in a Lambda CDM model with sum mnu = 0...
Subsampling and block-based bootstrap methods have been used in a wide range
of inference problems for time series. To accommodate the dependence, these
resampling methods involve a bandwidth parameter, such as subsampling window
width and block size in the block-based bootstrap. In empirical work, using
different bandwidth parameters could lead to different inference results, but
the traditional first order asymptotic theory does not capture the choice of
the bandwidth. In this article, we propose to adopt the fixed-b approach, as
advocated by Kiefer and Vogelsang (2005) in the
heteroscedasticity-autocorrelation robust testing context, to account for the
influence of the bandwidth on the inference. Under the fixed-b asymptotic
framework, we derive the asymptotic null distribution of the p-values for
subsampling and the moving block bootstrap, and further propose a calibration
of the traditional small-b based confidence intervals (regions, bands) and
tests. Our treatment is fairly general as it includes both finite dimensional
parameters and infinite dimensional parameters, such as marginal distribution
function and normalized spectral distribution function. Simulation results show
that the fixed-b approach is more accurate than the traditional small-b
approach in terms of approximating the finite sample distribution...
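A minimal moving block bootstrap for the mean of a stationary series shows how the bandwidth (the block length b) enters the procedure; this illustrates the resampling scheme only, not the fixed-b calibration itself, and the AR(1) example data are hypothetical:

```python
import numpy as np

def moving_block_bootstrap(x, block_len, n_boot=1000, seed=0):
    """Moving block bootstrap for the mean of a stationary series:
    draw overlapping blocks of length b with replacement and concatenate
    them into bootstrap series, preserving short-range dependence."""
    rng = np.random.default_rng(seed)
    n = len(x)
    blocks = np.lib.stride_tricks.sliding_window_view(x, block_len)
    k = n // block_len                          # blocks per bootstrap series
    idx = rng.integers(0, len(blocks), size=(n_boot, k))
    return blocks[idx].reshape(n_boot, -1).mean(axis=1)

# Hypothetical AR(1) series; the block length is the bandwidth whose
# influence the fixed-b framework accounts for.
rng = np.random.default_rng(1)
e = rng.normal(size=500)
x = np.empty(500)
x[0] = e[0]
for t in range(1, 500):
    x[t] = 0.5 * x[t - 1] + e[t]

for b in (5, 20):   # different bandwidths give different interval widths
    reps = moving_block_bootstrap(x, b)
    lo, hi = np.quantile(reps, [0.025, 0.975])
    print(f"b={b}: [{lo:.3f}, {hi:.3f}]")
```

Under first-order (small-b) asymptotics the two intervals above would be treated as interchangeable; the fixed-b approach instead keeps b/n fixed in the limit so that the printed dependence on b is reflected in the reference distribution.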
We provide Buehler-optimal one-sided and some valid two-sided confidence
intervals for the average success probability of a possibly inhomogeneous fixed
length Bernoulli chain, based on the number of observed successes. Contrary to
some claims in the literature, the one-sided Clopper-Pearson intervals for the
homogeneous case are not completely robust here, not even if applied to
hypergeometric estimation problems. [Revised version for Probability and Mathematical Statistics.]
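For reference, the homogeneous-case Clopper-Pearson construction that the abstract discusses can be sketched by inverting the two binomial tail tests, here with a stdlib-only bisection instead of beta quantiles:

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05, tol=1e-9):
    """Exact (Clopper-Pearson) interval for a homogeneous Bernoulli success
    probability: the lower bound solves P(X >= k | p) = alpha/2 and the
    upper bound solves P(X <= k | p) = alpha/2, each found by bisection."""
    def solve(f):
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if f(mid) else (lo, mid)
        return (lo + hi) / 2
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p) < alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p) >= alpha / 2)
    return lower, upper

print(tuple(round(v, 3) for v in clopper_pearson(7, 20)))
```

The paper's point is precisely that these intervals, exact for the homogeneous (i.i.d.) case, need not retain their coverage for inhomogeneous Bernoulli chains or hypergeometric problems.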
We present a test of different error estimators for 2-point clustering
statistics, appropriate for present and future large galaxy redshift surveys.
Using an ensemble of very large dark matter LambdaCDM N-body simulations, we
compare internal error estimators (jackknife and bootstrap) to external ones
(Monte-Carlo realizations). For 3-dimensional clustering statistics, we find
that none of the internal error methods investigated reproduces, either
accurately or robustly, the errors of the external estimators on 1 to 25
Mpc/h scales. The standard bootstrap overestimates the variance of xi(s) by
~40% on all scales probed, but recovers, in a robust fashion, the principal
eigenvectors of the underlying covariance matrix. The jackknife returns the
correct variance on large scales, but significantly overestimates it on smaller
scales. This scale dependence in the jackknife affects the recovered
eigenvectors, which tend to disagree on small scales with the external
estimates. Our results have important implications for the use of galaxy
clustering in placing constraints on cosmological parameters. For example, in a
2-parameter fit to the projected correlation function, we find that the
standard bootstrap systematically overestimates the 95% confidence interval...
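The delete-one jackknife variance used in such comparisons has a simple form: recompute the statistic with each subvolume removed and scale the spread of the leave-one-out values by (n-1). A sketch with hypothetical per-subvolume values, where leave-one-out means stand in for recomputed correlation functions:

```python
import numpy as np

def jackknife_variance(region_stats):
    """Delete-one jackknife: drop each of the n subvolumes in turn,
    recompute the statistic, and scale the spread by (n - 1)."""
    region_stats = np.asarray(region_stats, dtype=float)
    n = len(region_stats)
    # Leave-one-out means stand in for recomputed clustering statistics.
    loo = (region_stats.sum() - region_stats) / (n - 1)
    return (n - 1) / n * ((loo - loo.mean()) ** 2).sum()

# Hypothetical per-subvolume estimates of one correlation-function bin:
stats = [1.12, 0.95, 1.03, 1.20, 0.88, 1.05, 0.99, 1.10]
print(jackknife_variance(stats))
```

For a plain mean this reduces to the usual s^2/n, which is a handy sanity check; the scale dependence reported in the abstract arises because correlation-function estimates from neighboring subvolumes are not independent on small scales.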
We present new estimators of the mean of a real valued random variable, based
on PAC-Bayesian iterative truncation. We analyze the non-asymptotic minimax
properties of the deviations of estimators for distributions having either a
bounded variance or a bounded kurtosis. It turns out that these minimax
deviations are of the same order as the deviations of the empirical mean
estimator of a Gaussian distribution. Nevertheless, the empirical mean itself
performs poorly at high confidence levels for the worst distribution with a
given variance or kurtosis (which turns out to be heavy tailed). To obtain
(nearly) minimax deviations in this broad class of distributions, it is
necessary to use some more robust estimator, and we describe an iterated
truncation scheme whose deviations are close to minimax. In order to calibrate
the truncation and obtain explicit confidence intervals, it is necessary to
dispose of a prior bound either on the variance or the kurtosis. When a prior
bound on the kurtosis is available, we obtain as a by-product a new variance
estimator with good large deviation properties. When no prior bound is
available, it is still possible to use Lepski's approach to adapt to the
unknown variance, although it is no longer possible to obtain observable
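The basic truncation idea, clamping observations at a level set by the variance bound and the confidence parameter, can be sketched as follows. This is a plain one-step truncation in the spirit of the abstract, not the paper's PAC-Bayesian iterated scheme, and the threshold formula is an illustrative choice:

```python
import math
import random

def truncated_mean(x, var_bound, delta=0.01):
    """Clamp each observation at a level B derived from a prior variance
    bound and confidence parameter delta, then average: a few huge values
    can each move the estimate by at most B/n. (Illustrative threshold;
    not the paper's PAC-Bayesian iterated truncation.)"""
    n = len(x)
    B = math.sqrt(var_bound * n / (2 * math.log(2 / delta)))
    return sum(max(-B, min(B, xi)) for xi in x) / n

rng = random.Random(0)
# Heavy-tailed sample: mostly standard normal, plus two huge outliers.
x = [rng.gauss(0, 1) for _ in range(1000)] + [500.0, 600.0]
naive = sum(x) / len(x)
robust = truncated_mean(x, var_bound=1.0)
print(abs(robust) < abs(naive))  # True: truncation tames the outliers
```

This is why the abstract notes that a prior bound on the variance (or kurtosis) is needed: the truncation level, and hence the explicit confidence interval, is calibrated from that bound.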
The colour-magnitude diagrams of resolved single stellar populations, such as
open and globular clusters, have provided the best natural laboratories to test
stellar evolution theory. Whilst a variety of techniques have been used to
infer the basic properties of these simple populations, systematic
uncertainties arise from the purely geometrical degeneracy produced by the
similar shape of isochrones of different ages and metallicities. Here we
present an objective and robust statistical technique which lifts this
degeneracy to a great extent through the use of a key observable: the number of
stars along the isochrone. Through extensive Monte Carlo simulations we show
that, for instance, we can infer the four main parameters (age, metallicity,
distance and reddening) in an objective way, along with robust confidence
intervals and their full covariance matrix. We show that systematic
uncertainties due to field contamination, unresolved binaries, initial or
present-day stellar mass function are either negligible or well under control.
This technique provides, for the first time, a proper way to infer with
unprecedented accuracy the fundamental properties of simple stellar
populations, in an easy-to-implement algorithm.
We study distributions of persistent homology barcodes associated to taking
subsamples of a fixed size from metric measure spaces. We show that such
distributions provide robust invariants of metric measure spaces, and
illustrate their use in hypothesis testing and providing confidence intervals
for topological data analysis.
In this paper a robust estimator against outliers, along with some other existing interval estimators, is considered for estimating the population standard deviation. An extensive simulation study has been conducted to compare and evaluate the performance of the interval estimators. The exact and the proposed robust methods are easy to calculate and are not overly computer-intensive. The proposed robust method appears to perform better than the other confidence intervals for estimating the population standard deviation, specifically in the presence of outliers and/or when the data come from a skewed distribution. Some real-life examples illustrate the application of the proposed confidence intervals and support the simulation findings to some extent.
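As a non-robust baseline against which such proposals are compared, a percentile-bootstrap interval for the standard deviation can be sketched as follows (hypothetical data; this is not the paper's proposed robust estimator):

```python
import random
import statistics

def bootstrap_sd_ci(data, n_boot=2000, alpha=0.05, seed=7):
    """Percentile bootstrap interval for the population standard deviation:
    resample with replacement, recompute the sample SD, take quantiles."""
    rng = random.Random(seed)
    reps = sorted(
        statistics.stdev(rng.choices(data, k=len(data)))
        for _ in range(n_boot)
    )
    return reps[int(alpha / 2 * n_boot)], reps[int((1 - alpha / 2) * n_boot) - 1]

# Hypothetical measurements; a single gross outlier would widen this
# interval sharply, which is the weakness robust methods target.
data = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 4.7, 5.4, 5.0, 4.6]
print(tuple(round(v, 3) for v in bootstrap_sd_ci(data)))
```

The classical exact interval instead uses chi-square quantiles of (n-1)s^2/sigma^2, which is cheap to compute but sensitive to non-normality, the setting where the paper reports the robust method doing best.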