## Alternatives to the Cox model in multi-state models

Introducing time-dependent covariates into the survival process allows a patient's survival to change from one time point to the next as the covariate values change. A popular choice for analysing such data is the time-dependent Cox regression model. In the present work we present multi-state models as an alternative for the analysis of such data.

## Clinical outcome of narrow diameter implants inserted into allografts

Franco,Maurizio; Viscioni,Alessandro; Rigo,Leone; Guidi,Riccardo; Zollino,Ilaria; Avantaggiato,Anna; Carinci,Francesco
OBJECTIVE: Narrow diameter implants (NDI) (i.e. diameter <3.75 mm) are a potential solution for specific clinical situations, such as reduced interradicular bone, thin alveolar crest and replacement of teeth with small cervical diameter. NDI have been available in clinical practice since the 1990s, but only a few studies have analyzed their clinical outcome and no study has investigated NDI inserted in fresh-frozen bone (FFB) grafts. Thus, a retrospective study on a series of NDI placed in homologue FFB was designed to evaluate their clinical outcome. MATERIAL AND METHODS: In the period between December 2003 and December 2006, 36 patients (22 females and 14 males, mean age 53 years) with FFB grafts were selected and 94 different NDI were inserted. The mean follow-up was 25 months. To evaluate the effect of several host-, surgery-, and implant-related factors, marginal bone loss (MBL) was considered an indicator of success rate (SCR). The Kaplan-Meier algorithm and Cox regression were used. RESULTS: Only 5 out of 94 implants were lost (i.e. survival rate - SVR 95.7%) and no differences were detected among the studied variables. On the contrary, the Cox regression showed that the graft site (i.e. maxilla) reduced MBL. CONCLUSIONS: NDI inserted in FFB have a high SVR and SCR similar to those reported in previous studies on regular implants and NDI inserted in non-grafted jaws. Homologue FFB is a valuable material in the insertion of NDI.

## Sinusoidal Cox Regression—A Rare Cancer Example

Efird, Jimmy Thomas
Evidence of an association between survival time and date of birth would suggest an etiologic role for a seasonally variable environmental exposure occurring within a narrow perinatal time period. Risk factors that may exhibit seasonal epidemicity include diet, infectious agents, allergens, and antihistamine use. Typically data has been analyzed by simply categorizing births into months or seasons of the year and performing multiple pairwise comparisons. This paper presents a statistically robust alternative, based upon a trigonometric Cox regression model, to analyze the cyclic nature of birth dates related to patient survival. Disease birth-date results are presented using a sinusoidal plot with peak date(s) of relative risk and a single P value that indicates whether an overall statistically significant seasonal association is present. Advantages of this derivative-free method include ease of use, increased power to detect statistically significant associations, and the ability to avoid arbitrary, subjective demarcation of seasons.
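
A minimal sketch of how a trigonometric term can enter a Cox model, assuming a single annual harmonic: the birth date is mapped to sine and cosine covariates, and the two fitted coefficients are summarized as an amplitude and a peak date. The function names, the one-harmonic form, and the 365.25-day period are illustrative assumptions, not details taken from the paper.

```python
import math


def seasonal_terms(day_of_year, period=365.25):
    """Map a birth date (day of year) to sine and cosine covariates."""
    angle = 2.0 * math.pi * day_of_year / period
    return math.sin(angle), math.cos(angle)


def amplitude_and_peak(beta_sin, beta_cos, period=365.25):
    """Summarize fitted coefficients as a log-hazard amplitude and a peak day.

    Uses the identity
        beta_sin*sin(a) + beta_cos*cos(a) = A*cos(a - phi),
    where A = hypot(beta_sin, beta_cos) and phi = atan2(beta_sin, beta_cos).
    The log hazard is largest at a = phi.
    """
    amplitude = math.hypot(beta_sin, beta_cos)
    phase = math.atan2(beta_sin, beta_cos) % (2.0 * math.pi)
    peak_day = phase * period / (2.0 * math.pi)
    return amplitude, peak_day
```

Under this parameterization the hazard ratio between the peak and the trough of the cycle is `exp(2 * amplitude)`, and a joint test of the two coefficients gives one overall P value for seasonality.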

## Cancer prognosis using support vector regression in imaging modality

Du, Xian; Dua, Sumeet
The proposed techniques investigate the strength of support vector regression (SVR) in cancer prognosis using imaging features. Cancer image features were extracted from patients and recorded as censored data. To employ censored data for prognosis, SVR methods need to be adapted to uncertain targets. The effectiveness of two principal breast features, tumor size and lymph node status, was demonstrated by the combination of sampling and feature selection methods. In sampling, breast data were stratified according to tumor size and lymph node status. Three types of feature selection methods, comprising no selection, individual feature selection, and feature subset forward selection, were employed. The prognosis results were evaluated by comparative study using the following performance metrics: concordance index (CI) and Brier score (BS). Cox regression was employed to compare the results. The support vector regression method (SVCR) performs similarly to Cox regression in three feature selection methods and better than Cox regression in non-feature selection methods measured by CI and BS. Feature selection methods can improve the performance of Cox regression measured by CI. Among all cross validation results, stratified sampling of tumor size achieves the best regression results for both feature selection and non-feature selection methods. The SVCR regression results...
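
For readers unfamiliar with the concordance index used above, the following is a minimal sketch of Harrell's C for right-censored data. It is a generic textbook form, not necessarily the exact variant used in the study. It assumes that a higher score means higher predicted risk and it treats pairs with tied times as not comparable.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data.

    A pair (i, j) is comparable when subject i had an observed event and
    a strictly shorter time than subject j. The pair is concordant when
    subject i also has the higher predicted risk; tied scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of the comparable pairs.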

## Variable Selection in the Cox Regression Model with Covariates Missing at Random

Garcia, Ramon I.; Ibrahim, Joseph G.; Zhu, Hongtu
We consider variable selection in the Cox regression model (Cox, 1975, Biometrika 62, 269–276) with covariates missing at random. We investigate the smoothly clipped absolute deviation penalty and adaptive least absolute shrinkage and selection operator (LASSO) penalty, and propose a unified model selection and estimation procedure. A computationally attractive algorithm is developed, which simultaneously optimizes the penalized likelihood function and penalty parameters. We also optimize a model selection criterion, called the ICQ statistic (Ibrahim, Zhu, and Tang, 2008, Journal of the American Statistical Association 103, 1648–1658), to estimate the penalty parameters and show that it consistently selects all important covariates. Simulations are performed to evaluate the finite sample performance of the penalty estimates. Also, two lung cancer data sets are analyzed to demonstrate the proposed methodology.

## Misspecification of Cox regression models with composite endpoints

Wu, Longyang; Cook, Richard J
Researchers routinely adopt composite endpoints in multicenter randomized trials designed to evaluate the effect of experimental interventions in cardiovascular disease, diabetes, and cancer. Despite their widespread use, relatively little attention has been paid to the statistical properties of estimators of treatment effect based on composite endpoints. We consider this here in the context of multivariate models for time-to-event data in which copula functions link marginal distributions with a proportional hazards structure. We then examine the asymptotic and empirical properties of the estimator of treatment effect arising from a Cox regression model for the time to the first event. We point out that even when the treatment effect is the same for the component events, the limiting value of the estimator based on the composite endpoint is usually inconsistent for this common value. We find that in this context the limiting value is determined by the degree of association between the events, the stochastic ordering of events, and the censoring distribution. Within the framework adopted, marginal methods for the analysis of multivariate failure time data yield consistent estimators of treatment effect and are therefore preferred. We illustrate the methods by application to a recent asthma study. Copyright © 2012 John Wiley & Sons...

## Pseudo-partial likelihood estimators for the Cox regression model with missing covariates

Luo, Xiaodong; Tsai, Wei Yann; Xu, Qiang
By embedding the missing covariate data into a left-truncated and right-censored survival model, we propose a new class of weighted estimating functions for the Cox regression model with missing covariates. The resulting estimators, called the pseudo-partial likelihood estimators, are shown to be consistent and asymptotically normal. A simulation study demonstrates that, compared with the popular inverse-probability weighted estimators, the new estimators perform better when the observation probability is small and improve efficiency of estimating the missing covariate effects. Application to a practical example is reported.

## Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso

Kong, Shengchun; Nan, Bin
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz. We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses.
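
The objective discussed above can be sketched as follows, assuming no tied event times and a 1/n scaling of the likelihood term; both are illustrative choices, and the function names are hypothetical. Each summand depends on the whole risk set at that event time, which is why the summands are not iid.

```python
import math


def neg_log_partial_likelihood(beta, X, times, events):
    """Negative log partial likelihood, averaged over n (no tied event times)."""
    n = len(times)
    # Linear predictor for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    total = 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at the i-th event time.
        risk_set = [eta[j] for j in range(n) if times[j] >= times[i]]
        m = max(risk_set)  # log-sum-exp shift for numerical stability
        log_sum = m + math.log(sum(math.exp(e - m) for e in risk_set))
        total -= eta[i] - log_sum
    return total / n


def lasso_objective(beta, X, times, events, lam):
    """Penalized objective: negative log partial likelihood plus lam * ||beta||_1."""
    penalty = lam * sum(abs(b) for b in beta)
    return neg_log_partial_likelihood(beta, X, times, events) + penalty
```

Minimizing `lasso_objective` over `beta` would require an optimizer such as coordinate descent, which this sketch does not include.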

## Comparison between artificial neural network and Cox regression model in predicting the survival rate of gastric cancer patients

ZHU, LUCHENG; LUO, WENHUA; SU, MENG; WEI, HANGPING; WEI, JUAN; ZHANG, XUEBANG; ZOU, CHANGLIN
The aim of this study was to determine the prognostic factors and their significance in gastric cancer (GC) patients, using the artificial neural network (ANN) and Cox proportional hazards (CPH) models. A retrospective analysis was undertaken, including 289 patients with GC who had undergone gastrectomy between 2006 and 2007. According to the CPH analysis, disease stage, peritoneal dissemination, radical surgery and body mass index (BMI) were selected as the significant variables. According to the ANN model, disease stage, radical surgery, serum CA19-9 levels, peritoneal dissemination and BMI were selected as the significant variables. The true prediction of the ANN was 85.3% and of the CPH model 81.9%. In conclusion, the present study demonstrated that the ANN model is a more powerful tool in determining the significant prognostic variables for GC patients, compared to the CPH model. Therefore, this model is recommended for determining the risk factors of such patients.

## Reweighting estimators for Cox regression with missing covariate data: Analysis of insulin resistance and risk of stroke in the Northern Manhattan Study

Xu, Qiang; Paik, Myunghee Cho; Rundek, Tatjana; Elkind, Mitchell S. V.; Sacco, Ralph L.
Incomplete covariates often obscure analysis results from a Cox regression. In an analysis of the Northern Manhattan Study (NOMAS) to determine the influence of insulin resistance on the incidence of stroke in non-diabetic individuals, insulin level is unknown for 34.1% of the subjects. The available data suggest that the missingness mechanism depends on outcome variables, which may generate biases in estimating the parameters of interest if only using the complete observations. This article aimed to introduce practical strategies to analyze the NOMAS data and present sensitivity analyses by using the reweighting method in standard statistical packages. When the data set structure is in counting process style, the reweighting estimates can be obtained by built-in procedures with variance estimated by the jackknife method. Simulation results indicate that the jackknife variance estimate provides reasonable coverage probability in moderate sample sizes. We subsequently conducted sensitivity analyses for the NOMAS data, showing that the risk estimates are robust to a variety of missingness mechanisms. At the end of this article, we present the core SAS and R programs used in the analysis.

## Factors Associated with Methadone Treatment Duration: A Cox Regression Analysis

Lin, Chao-Kuang; Hung, Chia-Chun; Peng, Ching-Yi; Chao, En; Lee, Tony Szu-Hsien
This study examined retention rates and associated predictors of methadone maintenance treatment (MMT) duration among 128 newly admitted patients in Taiwan. A semi-structured questionnaire was used to obtain demographic and drug use history. Daily records of methadone taken and test results for HIV, HCV, and morphine toxicology were taken from a computerized medical registry. Cox regression analyses were performed to examine factors associated with MMT duration. MMT retention rates were 80.5%, 68.8%, 53.9%, and 41.4% for 3, 6, 12, and 18 months, respectively. Excluding 38 patients incarcerated during the study period, retention rates were 81.1%, 73.3%, 61.1%, and 48.9% for 3, 6, 12, and 18 months, respectively. No participant seroconverted to HIV and 1 died during the 18-month follow-up. Results showed that being female, imprisonment, a longer distance from house to clinic, having a lower methadone dose after 30 days, being HCV positive, and being in the New Taipei City program predicted early patient dropout. The findings suggest favorable MMT outcomes of HIV seroincidence and mortality. Results indicate that the need to minimize travel distance and to provide programs that meet women’s requirements justifies expansion of MMT clinics in Taiwan.

## NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA

Sun, Hokeun; Lin, Wei; Feng, Rui; Li, Hongzhe
We consider estimation and variable selection in high-dimensional Cox regression when prior knowledge of the relationships among the covariates, described by a network or graph, is available. A limitation of the existing methodology for survival analysis with high-dimensional genomic data is that a wealth of structural information about many biological processes, such as regulatory networks and pathways, has often been ignored. In order to incorporate such prior network information into the analysis of genomic data, we propose a network-based regularization method for high-dimensional Cox regression; it uses an ℓ1-penalty to induce sparsity of the regression coefficients and a quadratic Laplacian penalty to encourage smoothness between the coefficients of neighboring variables on a given network. The proposed method is implemented by an efficient coordinate descent algorithm. In the setting where the dimensionality p can grow exponentially fast with the sample size n, we establish model selection consistency and estimation bounds for the proposed estimators. The theoretical results provide insights into the gain from taking into account the network structural information. Extensive simulation studies indicate that our method outperforms Lasso and elastic net in terms of variable selection accuracy and stability. We apply our method to a breast cancer gene expression study and identify several biologically plausible subnetworks and pathways that are associated with breast cancer distant metastasis.
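
The combined penalty described above can be sketched as below. This is a simplified illustration that assumes an unnormalized graph Laplacian, for which the quadratic form reduces to a weighted sum of squared coefficient differences over edges. The paper's exact Laplacian (for example any degree normalization or sign adjustment) may differ, and the names `edges`, `lam1`, and `lam2` are hypothetical.

```python
def network_penalty(beta, edges, lam1, lam2):
    """Sparsity plus network-smoothness penalty.

    Returns lam1 * ||beta||_1 + lam2 * beta' L beta, where for the
    unnormalized Laplacian L = D - W the quadratic form equals
    sum over edges (u, v, w) of w * (beta[u] - beta[v]) ** 2.
    """
    l1_term = sum(abs(b) for b in beta)
    smooth_term = sum(w * (beta[u] - beta[v]) ** 2 for u, v, w in edges)
    return lam1 * l1_term + lam2 * smooth_term
```

With `lam2 = 0` the penalty reduces to the ordinary lasso penalty; a larger `lam2` pulls the coefficients of connected variables toward each other.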

## Time-Dependent Propensity Score for Assessing the Effect of Vaccine Exposure on Pregnancy Outcomes through Pregnancy Exposure Cohort Studies

Xu, Ronghui; Luo, Yunjun; Glynn, Robert; Johnson, Diana; Jones, Kenneth L.; Chambers, Christina
Women are advised to be vaccinated for influenza during pregnancy and may receive vaccine at any time during their pregnancy. In observational studies evaluating vaccine safety in pregnancy, to account for such time-varying vaccine exposure, a time-dependent predictor can be used in a proportional hazards model setting for outcomes such as spontaneous abortion or preterm delivery. Also, due to the observational nature of pregnancy exposure cohort studies and relatively low event rates, propensity score (PS) methods are often used to adjust for potential confounders. Using Monte Carlo simulation experiments, we compare two different ways to model the PS for vaccine exposure: (1) logistic regression treating the exposure status as binary yes or no; (2) Cox regression treating time to exposure as time-to-event. Coverage probability of the nominal 95% confidence interval for the exposure effect is used as the main measure of performance. The performance of the logistic regression PS depends largely on how the exposure data is generated. In contrast, the Cox regression PS consistently performs well across the different data generating mechanisms that we have considered. In addition, the Cox regression PS allows adjusting for potential time-varying confounders such as season of the year or exposure to additional vaccines. The application of the Cox regression PS is illustrated using data from a recent study of the safety of pandemic H1N1 influenza vaccine during pregnancy.
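
One common way to represent a time-varying exposure for a proportional hazards model is the counting-process (start, stop] layout, in which follow-up is split at the exposure time. The sketch below shows this for one subject. The function name, the argument names, and the use of gestational weeks as the time scale are illustrative assumptions, not details from the study.

```python
def split_at_exposure(entry, exit_time, event, exposure_time=None):
    """Return (start, stop, exposed, event) rows for one subject.

    Follow-up before the exposure time is coded unexposed with no event;
    follow-up after it is coded exposed and carries the subject's outcome.
    """
    if exposure_time is None or exposure_time >= exit_time:
        # Never exposed during follow-up: one unexposed interval.
        return [(entry, exit_time, 0, event)]
    if exposure_time <= entry:
        # Exposed before entering the cohort: one exposed interval.
        return [(entry, exit_time, 1, event)]
    return [
        (entry, exposure_time, 0, 0),
        (exposure_time, exit_time, 1, event),
    ]
```

For example, a woman enrolled at week 8, vaccinated at week 14, and with an event at week 30 contributes the rows `(8, 14, 0, 0)` and `(14, 30, 1, 1)`.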

## Study of survival time in pulp export; Estudo do tempo de sobrevivência na exportação de celulose

This study analyzed how long a country survives as a pulp exporter, using a Cox regression model. Covariates included population, Gross Domestic Product, total exports of forest products as an aggregate, pulp production and balance of trade for pulp, economic markets and blocs, and geographic regions. To select and check the most significant covariates, a procedure proposed by Collett (1994) was used. It was concluded that: survival analysis via the Cox regression model proved to be a powerful tool for predicting the survival of a country exporting pulp; around 80% of countries that have pulp in their list of exports continue to export the commodity; out of the fifteen covariates selected for fitting the Cox model, four explain the model and two were found significant in explaining the survival of a country exporting pulp; international trade agreements were more significant in the Cox regression model than classes of macroeconomic forest indicators and geographic location; the covariates explaining a country's chance of surviving as a pulp exporter, according to the hazard ratio, were, in descending order, integration between ECLAC and the European Union, being a member of the European Union (V07) and being a member of ECLAC (V6); Brazil has 3.5 times as much chance of surviving as a pulp exporter through an integration between ECLAC and the European Union as a country that is not part of such integration; the probability that Brazil will survive exporting pulp is greater than the probability that Asian countries will.

## Modelling survival in acute severe illness: Cox versus accelerated failure time models

Moran, J.; Bersten, A.; Solomon, P.; Edibam, C.; Hunt, T.
BACKGROUND: The Cox model has been the mainstay of survival analysis in the critically ill and time-dependent covariates have infrequently been incorporated into survival analysis. OBJECTIVES: To model 28-day survival of patients with acute lung injury (ALI) and acute respiratory distress syndrome (ARDS), and compare the utility of Cox and accelerated failure time (AFT) models. METHODS: Prospective cohort study of 168 adult patients enrolled at diagnosis of ALI in 21 adult ICUs in three Australian States with measurement of survival time, censored at 28 days. Model performance was assessed as goodness-of-fit [GOF, cross-products of quantiles of risk and time intervals (P ≥ 0.1), Cox model] and explained variation ('R2', Cox and AFT). RESULTS: Over a 2-month study period (October-November 1999), 168 patients with ALI were identified, with a mean (SD) age of 61.5 (18) years and 30% female. Peak mortality hazard occurred at days 7-8 after onset of ALI/ARDS. In the Cox model, increasing age and female gender, plus interaction, were associated with an increased mortality hazard. Time-varying effects were established for patient severity-of-illness score (decreasing hazard over time) and multiple-organ-dysfunction score (increasing hazard over time). The Cox model was well specified (GOF...

## Accommodating Measurements Below a Limit of Detection: A Novel Application of Cox Regression

Dinse, Gregg E.; Jusko, Todd A.; Ho, Lindsey A.; Annam, Kaushik; Graubard, Barry I.; Hertz-Picciotto, Irva; Miller, Frederick W.; Gillespie, Brenda W.; Weinberg, Clarice R.
In environmental epidemiology, measurements of exposure biomarkers often fall below the assay's limit of detection. Existing methods for handling this problem, including deletion, substitution, parametric regression, and multiple imputation, can perform poorly if the proportion of “nondetects” is high or parametric models are misspecified. We propose an approach that treats the measured analyte as the modeled outcome, implying a role reversal when the analyte is a putative cause of a health outcome. Following a scale reversal as well, our approach uses Cox regression to model the analyte, with confounder adjustment. The method makes full use of quantifiable analyte measures, while appropriately treating nondetects as censored. Under the proportional hazards assumption, the hazard ratio for a binary health outcome is interpretable as an adjusted odds ratio: the odds for the outcome at any particular analyte concentration divided by the odds given a lower concentration. Our approach is broadly applicable to cohort studies, case-control studies (frequency matched or not), and cross-sectional studies conducted to identify determinants of exposure. We illustrate the method with cross-sectional survey data to assess sex as a determinant of 2...

## Factors Determining Disease Duration in Alzheimer's Disease: A Postmortem Study of 103 Cases Using the Kaplan-Meier Estimator and Cox Regression

Armstrong, R. A.
Factors associated with duration of dementia in a consecutive series of 103 Alzheimer's disease (AD) cases were studied using the Kaplan-Meier estimator and Cox regression analysis (proportional hazard model). Mean disease duration was 7.1 years (range: 6 weeks–30 years, standard deviation = 5.18); 25% of cases died within four years, 50% within 6.9 years, and 75% within 10 years. Familial AD cases (FAD) had a longer duration than sporadic cases (SAD), especially cases linked to presenilin (PSEN) genes. No significant differences in duration were associated with age, sex, or apolipoprotein E (Apo E) genotype. Duration was reduced in cases with arterial hypertension. Cox regression analysis suggested longer duration was associated with an earlier disease onset and increased senile plaque (SP) and neurofibrillary tangle (NFT) pathology in the orbital gyrus (OrG), CA1 sector of the hippocampus, and nucleus basalis of Meynert (NBM). The data suggest shorter disease duration in SAD and in cases with hypertensive comorbidity. In addition, degree of neuropathology did not influence survival, but spread of SP/NFT pathology into the frontal lobe, hippocampus, and basal forebrain was associated with longer disease duration.

## Beyond first-order asymptotics for Cox regression

Pierce, Donald A.; Bellio, Ruggero
To go beyond standard first-order asymptotics for Cox regression, we develop parametric bootstrap and second-order methods. In general, computation of $P$-values beyond first order requires more model specification than is required for the likelihood function. It is problematic to specify a censoring mechanism to be taken very seriously in detail, and it appears that conditioning on censoring is not a viable alternative to that. We circumvent this matter by employing a reference censoring model, matching the extent and timing of observed censoring. Our primary proposal is a parametric bootstrap method utilizing this reference censoring model to simulate inferential repetitions of the experiment. It is shown that the most important part of improvement on first-order methods, that pertaining to fitting nuisance parameters, is insensitive to the assumed censoring model. This is supported by numerical comparisons of our proposal to parametric bootstrap methods based on usual random censoring models, which are far less attractive to implement. As an alternative to our primary proposal, we provide a second-order method requiring less computing effort while providing more insight into the nature of improvement on first-order methods. However...

## Non-asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso

Kong, Shengchun; Nan, Bin
Weighted likelihood, in which one solves Horvitz-Thompson or inverse probability weighted (IPW) versions of the likelihood equations, offers a simple and robust method for fitting models to two-phase stratified samples. We consider semiparametric models for which solution of infinite-dimensional estimating equations leads to $\sqrt{N}$-consistent and asymptotically Gaussian estimators of both Euclidean and nonparametric parameters. If the phase two sample is selected via Bernoulli (i.i.d.) sampling with known sampling probabilities, standard estimating equation theory shows that the influence function for the weighted likelihood estimator of the Euclidean parameter is the IPW version of the ordinary influence function. By proving weak convergence of the IPW empirical process, and borrowing results on weighted bootstrap empirical processes, we derive a parallel asymptotic expansion for finite population stratified sampling. Whereas the asymptotic variance for Bernoulli sampling involves the within-strata second moments of the influence function, for finite population stratified sampling it involves only the within-strata variances. The latter asymptotic variance also arises when the observed sampling fractions are used as estimates of those known a priori. A general procedure is proposed for fitting semiparametric models with estimated weights to two-phase data. Several of our key results have already been derived for the special case of Cox regression with stratified case-cohort studies...