\documentclass[draft]{article}
\usepackage{amsmath,amssymb,latexsym,graphics,psfig,natbib}
\bibpunct{(}{)}{;}{a}{}{,}
\newcommand{\bt}{\mathbf{t}}
\newcommand{\x}{\mathbf{x}}
\newcommand{\z}{\mathbf{z}}
\DeclareMathOperator{\argmax}{argmax}
\DeclareMathOperator{\corr}{Corr}
\DeclareMathOperator{\var}{Var}
\begin{document}
\title{Protocol S1. Additional Models and Methods}
\author{{\sc Eric Bair}\thanks{Dept. of Statistics, Sequoia Hall, Stanford
Univ., CA 94305. ebair@stanford.edu}\\ and
{\sc Robert Tibshirani}\thanks{Depts. of Health, Research \&
Policy, and Statistics,
Stanford Univ, tibs@stat.stanford.edu}
}
\maketitle
\section{Supplementary Results}
\subsection{A Variant of Supervised Clustering} \label{S:SCvariant}
We note that simply ranking the genes based on their absolute Cox
scores may not be the optimal method for identifying genes that are
expressed differently in different underlying cell types. Recall the
underlying conceptual model illustrated in Figure
\ref{F:TwoCellTypes}. Since there is considerable overlap in the
survival times of patients with different cancer subtypes, simply
selecting genes that are correlated with survival may not produce the
best results. A better method would take into account both the
survival time and the underlying genetic profile of a patient. One
possible way to do this is to use the PLS-corrected Cox scores
described in Section \ref{S:plsSAM}.
Thus, we tested a variation of the supervised clustering idea
described in Section \ref{S:diagnose} on the DLBCL data. Rather than
simply ranking the genes based on their raw Cox scores, we projected
the Cox scores onto the first partial least squares component of the
expression matrix. See Section \ref{S:plsSAM} in the Supplementary
Materials and Methods for details. We used the procedure described in
Section \ref{S:diagnoseMM} to choose the genes on which to cluster; we
obtained a list of 693 genes. After applying 2-means clustering to
create two subgroups, we fit a nearest shrunken centroids model to
predict these subgroups on the test data. The survival times of the
two resulting predicted subgroups are shown in Figure
\ref{F:StaudtPLS}; this shrunken centroids model used 14 genes. Using
the PLS-derived Cox scores seems to have produced better results in
this case.
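The pipeline just described (rank genes by modified Cox score, apply 2-means clustering to the training samples, then fit a shrunken centroids classifier) can be sketched with scikit-learn stand-ins. KMeans, NearestCentroid with its shrink\_threshold option, and all parameter values below are our illustrative choices, not the implementation used in the analysis:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestCentroid

def predict_subgroups(X_train, X_test, scores, n_genes=693):
    """Illustrative sketch of the supervised-clustering variant
    (samples in rows, genes in columns).  KMeans stands in for
    2-means, and NearestCentroid(shrink_threshold=...) for nearest
    shrunken centroids; the parameter values here are arbitrary."""
    # Cluster the training samples on the top-scoring genes.
    top = np.argsort(np.abs(scores))[::-1][:n_genes]
    labels = KMeans(n_clusters=2, n_init=10,
                    random_state=0).fit_predict(X_train[:, top])
    # Fit shrunken centroids to the subgroups; predict on the test set.
    clf = NearestCentroid(shrink_threshold=0.5)
    clf.fit(X_train[:, top], labels)
    return clf.predict(X_test[:, top])
```

In practice the shrinkage threshold would be chosen by cross-validation, which is how a small final model (such as the 14-gene model mentioned above) is obtained.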
\begin{figure}
\begin{center}
\begin{tabular}{c}
\psfig{file=StaudtPLS.eps,height=8cm,angle=270}
\end{tabular}
\end{center}
\caption{\em Results of using PLS-derived Cox scores in the procedure
described in Section \ref{S:SemiCluster}.}
\label{F:StaudtPLS}
\end{figure}
We also tested this procedure on our simulated data sets. For most of
the simulations, the 2-means clustering based on the PLS-corrected Cox
scores produced more accurate predictions than the clustering based on
the raw Cox scores. However, the PLS-corrected Cox scores produced
extremely poor results in a few cases, so the average results were
approximately the same. We tested the procedure on other microarray
data sets and obtained similar results (results not shown). Using
PLS-corrected Cox scores can occasionally produce better predictions,
although the method sometimes fails and produces poor results.
\subsection{Another Continuous Predictor of Survival}
\label{S:contpred2}
Section \ref{S:contpred2MM} in the Supplementary Materials and Methods
describes two other possible continuous predictors of survival. We
calculated the estimators $\tilde{\beta}$ and $\hat{\gamma}$ for the
DLBCL data. Figure \ref{F:OLSbeta} shows that there appears to be a
very strong correlation between the value of $\tilde{\beta}$ and the
patient's survival. Fitting a Cox proportional hazards model to
$\tilde{\beta}$ confirmed this observation ($R^2 = 0.495$,
$\text{likelihood ratio test statistic} = 54.6$ on 1 df, $p=1.46
\times 10^{-13}$).
\begin{figure}
\begin{center}
\begin{tabular}{c}
\psfig{file=OLSbeta.eps,height=8cm,angle=270}
\end{tabular}
\end{center}
\caption{\em Plot of survival versus the least squares estimate of
$\tilde{\beta}$ on the DLBCL data.}
\label{F:OLSbeta}
\end{figure}
We also calculated $\hat{\gamma}$ for the DLBCL data, and fit a Cox
proportional hazards model. It was not as strong a
predictor as $\tilde{\beta}$ ($R^2 = 0.292$, $\text{likelihood ratio
test statistic} = 27.6$ on 1 df, $p=1.49 \times 10^{-7}$), but still
highly significant. Figure \ref{F:gamma} shows a plot of survival
versus $\hat{\gamma}$.
\begin{figure}
\begin{center}
\begin{tabular}{c}
\psfig{file=gamma.eps,height=8cm,angle=270}
\end{tabular}
\end{center}
\caption{\em Plot of survival versus the estimator
$\hat{\gamma}$ on the DLBCL data.}
\label{F:gamma}
\end{figure}
We have seen that $\tilde{\beta}$ is a very strong predictor of
survival on the DLBCL data. However, this approach has certain
limitations. It is only useful if an entire test data set $\tilde{X}$
exists; its utility is limited if we have, for example, only a single
test observation. Unfortunately, this is the situation that is most
likely to arise in practice; a doctor would like to be able to assign
a risk score to a single patient. Moreover, even if there were other
``test patients'' that could be used to calculate $\tilde{\beta}$, the
risk score of a given patient would depend upon the other patients in
the test sample, which is an undesirable property.
Thus, we proposed the predictor $\hat{\gamma}$. It is easy to verify
that $\hat{\gamma}$ has the desired properties that we described in
the above paragraph. (In particular, $\tilde{X}$ could be a column
vector.) On the DLBCL data, it is also a strong predictor of
survival, although not as strong as $\tilde{\beta}$.
In the DLBCL example, this method was clearly superior to the
supervised principal component method of Section \ref{S:SPCMM}.
However, when we analyzed other microarray data sets, supervised
principal components generally performed just as well or slightly
better than the method involving $\hat{\gamma}$. Interestingly, in
both simulation experiments, supervised principal components performed
much better than $\hat{\gamma}$. Although we were able to construct a
few examples where $\hat{\gamma}$ produced better results, we were not
able to find a model where $\hat{\gamma}$ performed better over the
average of several simulations. Furthermore, supervised principal
components performed better than $\hat{\gamma}$ on three of the four
data sets we examined in Section \ref{S:litcomp}. (Indeed,
$\hat{\gamma}$ was not even a significant predictor of survival on two
of these data sets.) These results suggest that supervised principal
components generally performs better than $\hat{\gamma}$. However,
$\hat{\gamma}$ gave significantly better results on the DLBCL data,
and it may perform better on other data sets as well.

Another reason to prefer supervised principal components to
$\hat{\gamma}$ is that the number of genes in the model is a tunable
parameter. On the DLBCL data, the optimal supervised principal
components model used 17 genes. By contrast, $\hat{\gamma}$ requires
the use of all $7399$ genes. As discussed above, in many
applications, it is important to know which genes are used to make
predictions. Supervised principal components would be the method of
choice in such circumstances.
\section{Supplementary Materials and Methods}
\subsection{Use of Partial Least Squares to Identify Significant
Genes} \label{S:plsSAM}
Let $X$ denote the $p \times n$ expression matrix, with genes in rows
and samples in columns, and let $d$ denote the vector of Cox scores
for each of the $p$ genes. Then let $\alpha$
be the first partial least squares direction for $d$ versus $X$. We
project $d$ onto $\alpha$ to obtain a vector of modified Cox scores.
Then we cluster using the genes with the largest modified Cox scores.
A general-purpose algorithm for finding partial least squares
coefficients is given in \citet{HTF01}. However, in this case, we
only need to calculate the first partial least squares direction, so
the solution is very simple:
\begin{enumerate}
\item Standardize $d$ to have mean 0, and standardize each column of
$X$ to have mean 0 and variance 1.
\item Let $\hat{\phi}_1 = X^T d$.
\item Let $\alpha = X \hat{\phi}_1$.
\item Center $\alpha$ to have mean 0 and scale it to have unit norm.
\item Compute new scores $d'$ equal to the least squares fit of $d$ on
$\alpha$. Since $\alpha$ is a unit vector with mean zero, this has
the simple form
\[
d' = \bar{d} + \langle d, \alpha \rangle \cdot \alpha
\]
\end{enumerate}
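The steps above can be sketched in a few lines of numpy. The function below is an illustrative implementation of the one-component computation (with $X$ taken as genes in rows), not the code used in our analysis:

```python
import numpy as np

def pls_modified_cox_scores(X, d):
    """Illustrative numpy sketch of the one-step algorithm above.

    X : (p, n) expression matrix, genes in rows.
    d : (p,) raw Cox scores, one per gene.
    Returns d', the modified scores used to rank genes.
    """
    # Step 1: center d; standardize each column of X.
    d_bar = d.mean()
    d_c = d - d_bar
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: first PLS weight vector, one entry per sample.
    phi1 = Xs.T @ d_c
    # Step 3: first PLS direction, one entry per gene.
    alpha = Xs @ phi1
    # Step 4: center alpha; unit-norm scaling keeps step 5 simple.
    alpha = alpha - alpha.mean()
    alpha = alpha / np.linalg.norm(alpha)
    # Step 5: least squares fit of d on alpha,
    #         d' = d_bar + <d, alpha> alpha.
    return d_bar + (d_c @ alpha) * alpha
```

Genes are then ranked by the absolute values of the returned scores, and clustering proceeds on the top-ranked genes.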
\subsection{Another Continuous Predictor of Survival}
\label{S:contpred2MM}
We describe a modification of the procedure described in Section
\ref{S:contpred} that can sometimes produce better results. The
motivation for this new procedure is as follows: Rather than simply
using the first few columns of $V$ as continuous predictors of
survival, we attempt to find a linear combination of all the columns
of $V$. One way to do this is to find the least squares solution
$\hat{\theta}$ of
\begin{equation} \label{E:Ureg}
UD \theta = d
\end{equation}
and let $\hat{v} = V \hat{\theta}$. (Here $d$ represents the
vector of Cox scores for each gene.) We would hope that if a given
linear combination of the columns of $U$ can predict the Cox score of
each gene accurately, then the corresponding linear combination of
columns of $V$ can predict survival. Note that (\ref{E:Ureg}) is
equivalent to
\begin{equation}
XV \theta = d
\end{equation}
Thus, if we let $\hat{\beta} = V\hat{\theta}$, then $\hat{\beta}$ is
simply the vector of ordinary least squares regression coefficients
for predicting $d$ based on $X$. Hence, if we find the least squares
solution $\hat{\beta}$ of
\begin{equation}
X \beta = d
\end{equation}
we would hope that $\hat{\beta}$ is correlated with survival. We can
use this idea to predict the survival of future patients. Suppose
that we have a set of test observations $\tilde{X}$. Then if we find
the least squares solution $\tilde{\beta}$ of
\begin{equation}
\tilde{X} \beta = d
\end{equation}
we would hope that $\tilde{\beta}$ would predict the survival times of
the patients in the test set. (Note that $d$ is calculated using only
the training data, so $\tilde{\beta}$ does not depend on the survival
times of the patients in the test data.)
We also describe an estimate of $\tilde{\beta}$ (call it
$\hat{\gamma}$). We begin by noting that the least squares estimate
$\hat{\beta}$ for estimating $d$ based on the training data $X$ can be
written in the form
\begin{equation} \label{E:ols}
\hat{\beta} = (X^TX)^{-1} X^T d
\end{equation}
Also, by taking a singular value decomposition of $X$, we can
substitute (\ref{E:svd}) into (\ref{E:ols}) and obtain
\begin{equation}
\hat{\beta} = VD^{-1}U^Td
\end{equation}
which may also be written as
\begin{equation}
\hat{\beta} = X^TUD^{-2}U^Td
\end{equation}
This shows that $\hat{\beta}$ can be expressed as a linear combination
of the rows of $X$. Thus, a logical estimator of $\tilde{\beta}$
would be to take the same linear combination of the rows of
$\tilde{X}$, in other words:
\begin{equation}
\hat{\gamma} = \tilde{X}^TUD^{-2}U^Td
\end{equation}
where $U$ and $D$ are obtained from the singular value decomposition
of $X$.
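As a sketch, $\hat{\gamma}$ can be computed directly from the thin SVD of the training matrix. The function below is an illustrative implementation of the formula above, not production code:

```python
import numpy as np

def gamma_hat(X_train, X_test, d):
    """Compute gamma-hat = X_test^T U D^{-2} U^T d, with U and D
    taken from the thin SVD of the (p x n) training matrix.
    An illustrative sketch of the formula above."""
    U, s, Vt = np.linalg.svd(X_train, full_matrices=False)
    # U is p x n and s holds the diagonal of D, so
    # U D^{-2} U^T d  ==  U @ ((U.T @ d) / s**2).
    return X_test.T @ (U @ ((U.T @ d) / s**2))
```

On the training data itself, `gamma_hat(X, X, d)` recovers the least squares solution $\hat{\beta} = (X^TX)^{-1}X^Td$; applied to a single test observation (one column), it yields a single risk score.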
\subsection{Theoretical Justification for the Continuous Predictor in
Section \ref{S:SPCMM}} \label{S:PCbest}
As above, let $X$ be the $p \times n$ matrix of expression values, for
$p$ genes and $n$ patients. Suppose that each patient has one of two
tumor types: tumor type 1 or tumor type 2. For each of the $p$ genes,
patients with tumor type 1 have an expression level that is Gaussian
with mean $\mu_1$ and variance $\sigma^2$, and patients with tumor
type 2 have an expression level that is Gaussian with mean $\mu_2$ and
variance $\sigma^2$. Assume that patients with tumor type 2 live
slightly longer, on average, than patients with tumor type 1, and that
no further information about survival is contained in the expression
data. Also, suppose that samples $1, \ldots, m$ belong to tumor type
1, and that samples $m+1, \ldots, n$ belong to tumor type 2, and that
$\mu_1 < \mu_2$. Thus, for a given patient, testing the null
hypothesis that the patient has tumor type 1 is equivalent to testing
the null hypothesis that $\mu \leq \mu_1$ against the
alternative that $\mu > \mu_1$. Moreover, it is well known that the
most powerful (MP) test of this hypothesis rejects the null when
$\sum_{i=1}^p x_i$ is large \citep{LC98}. (Here $x_i$ represents the
expression level of the $i$th gene in the patient.) We will show that
the predictor $\hat{v}$ in (\ref{E:PCpred}) is equivalent to this MP
test.
Let $\tau^2$ represent the variance of a given row of $X$, and let
$\upsilon$ denote the covariance between any two rows of $X$. (Since
all rows of $X$ have the same joint distribution, the covariance of
any two given rows of $X$ is the same.) Then the variance-covariance
matrix of $X$ (call it $X^{*}$) is of the form
\begin{equation}
X^{*} =
\begin{bmatrix}
\tau^2 & \upsilon & \ldots & \upsilon \\
\upsilon & \tau^2 & \ldots & \upsilon \\
\vdots & \vdots & \ddots & \vdots \\
\upsilon & \upsilon & \ldots & \tau^2
\end{bmatrix}
\end{equation}
It is easy to verify that $(1, 1, \ldots, 1)^T$ is an eigenvector of
$X^{*}$, and that the corresponding eigenvalue is $\tau^2 +
(p-1)\upsilon$. Let
\begin{equation}
\rho(X^{*}) = \max\{|\lambda| \mid \text{$\lambda$ is an eigenvalue
of $X^{*}$}\}
\end{equation}
and
\begin{align}
&\|X^{*}\|_1 = \max_{1 \leq j \leq p} \sum_{i=1}^p |x_{ij}^{*}|, \\
&\|X^{*}\|_\infty = \max_{1 \leq i \leq p} \sum_{j=1}^p |x_{ij}^{*}|
\end{align}
(Here, $x_{ij}^{*}$ represents the value of the $i$th row and $j$th
column of $X^{*}$.) Then
\begin{equation}
\rho(X^{*}) \leq \|X^{*}\|_1, \quad \rho(X^{*}) \leq \|X^{*}\|_\infty
\end{equation}
(For a proof of this result, see \citet{HJ85}.) Since $\upsilon > 0$
here (the two tumor types shift every gene in the same direction, so
any two genes are positively correlated), each row of $X^{*}$ sums to
$\tau^2 + (p-1)\upsilon$, and hence $\|X^{*}\|_\infty = \tau^2 +
(p-1)\upsilon$. This implies that $\tau^2 + (p-1)\upsilon$ must be
the largest eigenvalue of $X^{*}$, and hence $(1,1, \ldots, 1)^T$ is
the first eigenvector of $X^{*}$.
Therefore $u_1 = (1/\sqrt{p}, 1/\sqrt{p}, \ldots, 1/\sqrt{p})^T$, and,
for a given column $x_j$ of $X$,
\begin{equation}
x_j^T u_1 = \frac{1}{\sqrt{p}} \sum_{i=1}^p x_{ij}
\end{equation}
Note that this is equivalent to the MP test statistic for testing the
null hypothesis that the patient has tumor type 1. Thus, for a given
patient, the best possible rule for diagnosing the correct tumor type
(and hence the best possible predictor of survival) is given by
(\ref{E:PCpred}).
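The eigenvalue argument above is easy to check numerically. The snippet below (with arbitrary values of $p$, $\tau^2$, and $\upsilon$) verifies that the leading eigenvector of a compound-symmetric covariance matrix is proportional to $(1, \ldots, 1)^T$ with eigenvalue $\tau^2 + (p-1)\upsilon$:

```python
import numpy as np

# Numerical check: for a compound-symmetric matrix with tau^2 on the
# diagonal and upsilon > 0 off the diagonal, the largest eigenvalue is
# tau^2 + (p-1)*upsilon and its eigenvector is proportional to
# (1, ..., 1)^T.  The values of p, tau2, and upsilon are arbitrary.
p, tau2, upsilon = 6, 2.0, 0.5
Xstar = upsilon * np.ones((p, p)) + (tau2 - upsilon) * np.eye(p)
eigvals, eigvecs = np.linalg.eigh(Xstar)   # ascending eigenvalues
top = eigvecs[:, -1]                       # leading eigenvector
assert np.isclose(eigvals[-1], tau2 + (p - 1) * upsilon)
assert np.allclose(np.abs(top), 1 / np.sqrt(p))
```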
\subsection{A Brief Description of the Methods Used in Section
\ref{S:litcomp}} \label{S:beer}
The following is a brief description of the algorithm described in
\citet{dB02} for predicting the survival of cancer patients using
microarray data. A complete description is available in the original
paper. The results in Section \ref{S:litcomp} are based on our
implementation of this algorithm. Although we attempted to reproduce
the published algorithm as faithfully as possible, there may be some
minor differences in our implementation. Here is an outline of the
procedure:
\begin{enumerate}
\item Select the $N$ genes with the largest Cox scores. (In both the
published result of \citet{dB02} and our implementation, we let
$N=50$.)
\item For each of these $N$ genes, calculate the univariate Cox
regression coefficient using the expression level of the gene as a
predictor of survival.
\item Take a linear combination of these $N$ genes, where the
coefficient of each gene is equal to the univariate Cox regression
coefficient corresponding to that gene. Call the resulting sum the
``risk index.''
\item Compute the median of the risk indices. Patients whose risk
index are less than the median are assigned to the ``low-risk''
subgroup, whereas patients whose risk index exceed the median are
assigned to the ``high-risk'' subgroup. \label{Beer:stp}
\end{enumerate}
Note that the median is not the only possible cutoff in step
\ref{Beer:stp} above. In the published result of \citet{dB02}, the
60th percentile of the risk indices was used, although the authors
note that the choice of the 60th percentile was arbitrary and that
other cutoffs, such as the median, would also give reasonable
results. We chose the median as the cutoff for ease of
implementation and interpretability.
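Steps 3 and 4 of the procedure can be sketched as follows. The Cox coefficients are assumed to be precomputed (steps 1 and 2 require fitting univariate Cox models, which we omit here), and all names are ours:

```python
import numpy as np

def risk_groups(X, cox_coef, top_idx):
    """Steps 3 and 4 of the risk-index procedure (our paraphrase;
    variable names are ours).  Steps 1 and 2 -- selecting the N top
    genes and fitting univariate Cox models -- are assumed done.

    X : (n_samples, n_genes) expression matrix.
    cox_coef : univariate Cox coefficients for the genes in top_idx.
    """
    # Step 3: risk index = coefficient-weighted sum over the N genes.
    risk = X[:, top_idx] @ cox_coef
    # Step 4: median split into low- and high-risk subgroups.
    labels = np.where(risk < np.median(risk), "low", "high")
    return labels, risk
```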
\citet{LL03} use support vector machines (SVMs) to predict patient
survival. (For a description of SVMs, see \citet{HTF01}.)
Unfortunately, we were unable to procure an implementation
of their procedure. Thus, we implemented a simplified version of this
procedure by fitting an SVM (with a linear kernel) to the survival
times of the patients. (The censoring status of the patients was
ignored.) Applying the resulting model to an independent test set
produces a continuous predictor of survival. One can discretize this
predictor by assigning each patient to a ``low-risk'' or ``high-risk''
category based on whether or not this continuous risk score was above
or below the median. We considered both the discrete and continuous
versions of this predictor in our analysis.
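A minimal sketch of this simplified procedure, using scikit-learn's SVR as our stand-in for the SVM implementation (censoring ignored, as described; parameter defaults are arbitrary choices):

```python
import numpy as np
from sklearn.svm import SVR

def svm_survival_groups(X_train, surv_train, X_test):
    """Sketch of the simplified procedure: a linear-kernel support
    vector regression fit to the raw survival times, with censoring
    ignored.  sklearn's SVR is our illustrative stand-in."""
    model = SVR(kernel="linear").fit(X_train, surv_train)
    score = model.predict(X_test)   # continuous predicted survival
    # Shorter predicted survival than the median means higher risk.
    labels = np.where(score < np.median(score), "high", "low")
    return labels, score
```

The continuous scores and the discretized labels correspond to the two versions of the predictor considered in the analysis.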
\citet{NR02b} use partial least squares (PLS) to predict survival.
They fit a PLS regression model to the survival times of the patients
(the censoring status is ignored) and use the resulting model to
predict the survival of future patients. For a complete
description, see either the original paper or \citet{HTF01}. The
resulting predictor of survival is continuous; it can be discretized
as described in the previous paragraph. Again, we considered both the
discrete and continuous versions of this predictor in our analysis.
Finally, we examined the following naive approach: We partitioned the
training data into two subgroups so as to minimize the p-value of the
log-rank statistic for comparing the survival of the two subgroups.
We then fit a nearest shrunken centroid model using the two resulting
subgroups as the class labels and used the resulting model to
classify the test patients into one of these two groups. We wished to
demonstrate that this procedure would result in overfitting, implying
that more sophisticated methods must be used to partition the training
data.
\begin{thebibliography}{36}
\expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi
\expandafter\ifx\csname url\endcsname\relax
\def\url#1{{\tt #1}}\fi
\expandafter\ifx\csname urlprefix\endcsname\relax\def\urlprefix{URL }\fi
\bibitem[{Alizadeh et~al.(2000)Alizadeh, Eisen, Davis, Ma, Lossos
et~al.}]{aA00}
Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, et~al. (2000).
\newblock Distinct types of diffuse large B-cell lymphoma identified by gene
expression profiling.
\newblock Nature 403: 503--511.
\bibitem[{Beer et~al.(2002)Beer, Kardia, Huang, Giordano, Levin et~al.}]{dB02}
Beer DG, Kardia SL, Huang CC, Giordano TJ, Levin AM, et~al. (2002).
\newblock Gene-expression profiles predict survival of patients with lung
adenocarcinoma.
\newblock Nature Medicine 8: 816--824.
\bibitem[{Ben-Dor et~al.(2001)Ben-Dor, Friedman, and Yakhini}]{BFY01}
Ben-Dor A, Friedman N, Yakhini Z (2001).
\newblock Class discovery in gene expression data.
\newblock Proceedings of the fifth annual international conference on
computational biology. Montreal, Quebec, Canada: ACM Press. p. 31--38.
\newblock DOI: 10.1145/369133.369167.
\bibitem[{Bhattacharjee et~al.(2001)Bhattacharjee, Richards, Staunton, Li,
Monti et~al.}]{aB01}
Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, et~al. (2001).
\newblock Classification of human lung carcinomas by mRNA expression profiling
reveals distinct adenocarcinoma subclasses.
\newblock Proceedings of the National Academy of Sciences 98: 13790--13795.
\bibitem[{Bullinger et~al.(2004)Bullinger, D\"ohner, Bair, Fr\"ohling, Schlenk
et~al.}]{lB04}
Bullinger L, D\"ohner K, Bair E, Fr\"ohling S, Schlenk R, et~al. (2004).
\newblock Gene expression profiling identifies new subclasses and improves
outcome prediction in adult myeloid leukemia.
\newblock The New England Journal of Medicine. In press.
\bibitem[{Chu et~al.(2002)Chu, Narasimhan, Tibshirani, and Tusher}]{SAM02}
Chu G, Narasimhan B, Tibshirani R, Tusher V (2002).
\newblock Significance analysis of microarrays (SAM) software.
\newblock Available: http://www-stat.stanford.edu/\textasciitilde tibs/SAM/ via
the Internet. Accessed 2003 July 16.
\bibitem[{Coiffier(2001)}]{bC01}
Coiffier B (2001).
\newblock Diffuse large cell lymphoma.
\newblock Current Opinion in Oncology 13: 325--334.
\bibitem[{Cox and Oakes(1984)}]{dC84}
Cox DR, Oakes D (1984).
\newblock Analysis of survival data.
\newblock London, UK: Chapman and Hall. 208 p.
\bibitem[{Eisen et~al.(1998)Eisen, Spellman, Brown, and Botstein}]{mE98}
Eisen MB, Spellman PT, Brown PO, Botstein D (1998).
\newblock Cluster analysis and display of genome-wide expression patterns.
\newblock Proceedings of the National Academy of Sciences 95: 14863--14868.
\bibitem[{Golub et~al.(1999)Golub, Slonim, Tamayo, Huard, Gaasenbeek
et~al.}]{tG99}
Golub T, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, et~al. (1999).
\newblock Molecular classification of cancer: class discovery and class
prediction by gene expression monitoring.
\newblock Science 286: 531--536.
\bibitem[{Gordon(1999)}]{aG99}
Gordon AD (1999).
\newblock Classification.
\newblock Boca Raton, FL: Chapman \& Hall. 256 p.
\bibitem[{Hastie et~al.(2001{\natexlab{a}})Hastie, Tibshirani, Botstein, and
Brown}]{tH01}
Hastie T, Tibshirani R, Botstein D, Brown P (2001{\natexlab{a}}).
\newblock Supervised harvesting of expression trees.
\newblock Genome Biology 2(1): 1--12.
\bibitem[{Hastie et~al.(2001{\natexlab{b}})Hastie, Tibshirani, and
Friedman}]{HTF01}
Hastie T, Tibshirani R, Friedman J (2001{\natexlab{b}}).
\newblock The elements of statistical learning: data mining, inference and
prediction.
\newblock New York, NY: Springer-Verlag. 552 p.
\bibitem[{Hedenfalk et~al.(2001)Hedenfalk, Duggan, Chen, Radmacher, Bittner
et~al.}]{iH01}
Hedenfalk I, Duggan D, Chen Y, Radmacher M, Bittner M, et~al. (2001).
\newblock Gene-expression profiles in hereditary breast cancer.
\newblock The New England Journal of Medicine 344: 539--548.
\bibitem[{Horn and Johnson(1985)}]{HJ85}
Horn RA, Johnson CR (1985).
\newblock Matrix analysis.
\newblock New York, NY: Cambridge University Press. 575 p.
\bibitem[{Khan et~al.(2001)Khan, Wei, Ringn\'{e}r, Saal, Ladanyi et~al.}]{jK01}
Khan J, Wei JS, Ringn\'{e}r M, Saal LH, Ladanyi M, et~al. (2001).
\newblock Classification and diagnostic prediction of cancers using gene
expression profiling and artificial neural networks.
\newblock Nature Medicine 7: 673--679.
\bibitem[{Lapointe et~al.(2004)Lapointe, Li, van~de Rijn, Huggins, Bair
et~al.}]{jL04}
Lapointe J, Li C, van~de Rijn M, Huggins JP, Bair E, et~al. (2004).
\newblock Gene expression profiling identifies clinically relevant subtypes of
prostate cancer.
\newblock Proceedings of the National Academy of Sciences 101: 811--816.
\bibitem[{Lehmann and Casella(1998)}]{LC98}
Lehmann E, Casella G (1998).
\newblock Theory of Point Estimation.
\newblock New York, NY: Springer-Verlag. 589 p.
\bibitem[{Li and Luan(2003)}]{LL03}
Li H, Luan Y (2003).
\newblock Kernel Cox regression models for linking gene expression profiles to
censored survival data.
\newblock Pacific Symposium on Biocomputing Available:
http://www.smi.stanford.edu/projects/helix/psb03/li.pdf via the Internet.
Accessed 2003 Dec. 11.
\bibitem[{McLachlan(1992)}]{gM92}
McLachlan GJ (1992).
\newblock Discriminant analysis and statistical pattern recognition.
\newblock New York, NY: John Wiley \& Sons. 526 p.
\bibitem[{Nguyen and Rocke(2002{\natexlab{a}})}]{NR02b}
Nguyen DV, Rocke DM (2002{\natexlab{a}}).
\newblock Multi-class cancer classification via partial least squares with gene
expression profiles.
\newblock Bioinformatics 18: 1216--1226.
\bibitem[{Nguyen and Rocke(2002{\natexlab{b}})}]{NR02a}
Nguyen DV, Rocke DM (2002{\natexlab{b}}).
\newblock Tumor classification by partial least squares using microarray gene
expression data.
\newblock Bioinformatics 18: 39--50.
\bibitem[{{Non-Hodgkin's Lymphoma Classification Project}(1997)}]{lcP97}
{Non-Hodgkin's Lymphoma Classification Project} (1997).
\newblock A clinical evaluation of the International Lymphoma Study Group
  classification of non-Hodgkin's lymphoma.
\newblock Blood 89: 3909--3918.
\bibitem[{Nutt et~al.(2003)Nutt, Mani, Betensky, Tamayo, Cairncross
et~al.}]{cN03}
Nutt CL, Mani DR, Betensky RA, Tamayo P, Cairncross JG, et~al. (2003).
\newblock Gene expression-based classification of malignant gliomas correlates
better with survival than histological classification.
\newblock Cancer Research 63: 1602--1607.
\bibitem[{Ramaswamy et~al.(2001)Ramaswamy, Tamayo, Rifkin, Mukherjee, Yeang
et~al.}]{sR01}
Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, et~al. (2001).
\newblock Multiclass cancer diagnosis using tumor gene expression signatures.
\newblock Proceedings of the National Academy of Sciences 98: 15149--15154.
\bibitem[{Rosenwald et~al.(2002)Rosenwald, Wright, Chan, Connors, Campo
et~al.}]{aR02}
Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, et~al. (2002).
\newblock The use of molecular profiling to predict survival after chemotherapy
for diffuse large b-cell lymphoma.
\newblock The New England Journal of Medicine 346: 1937--1947.
\bibitem[{Shipp et~al.(2002)Shipp, Ross, Tamayo, Weng, Kutok et~al.}]{mS02}
Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, et~al. (2002).
\newblock Diffuse large B-cell lymphoma outcome prediction by gene-expression
profiling and supervised machine learning.
\newblock Nature Medicine 8: 68--74.
\bibitem[{Sorlie et~al.(2001)Sorlie, Perou, Tibshirani, Aas, Geisler
et~al.}]{tS01}
Sorlie T, Perou C, Tibshirani R, Aas T, Geisler S, et~al. (2001).
\newblock Gene expression patterns of breast carcinomas distinguish tumor
subclasses with clinical implications.
\newblock Proceedings of the National Academy of Sciences 98: 10969--10974.
\bibitem[{Therneau and Grambsch(2000)}]{tT00}
Therneau TM, Grambsch PM (2000).
\newblock Modelling survival data: extending the Cox model.
\newblock New York, NY: Springer-Verlag. 350 p.
\bibitem[{Tibshirani et~al.(2002)Tibshirani, Hastie, Narasimhan, and
Chu}]{rT02}
Tibshirani R, Hastie T, Narasimhan B, Chu G (2002).
\newblock Diagnosis of multiple cancer types by shrunken centroids of gene
expression.
\newblock Proceedings of the National Academy of Sciences 99: 6567--6572.
\bibitem[{Tibshirani et~al.(2003)Tibshirani, Hastie, Narasimhan, and
Chu}]{PAM03}
Tibshirani R, Hastie T, Narasimhan B, Chu G (2003).
\newblock Prediction analysis for microarrays (PAM) software.
\newblock Available: http://www-stat.stanford.edu/\textasciitilde
tibs/PAM/index.html via the Internet. Accessed 2003 July 12.
\bibitem[{Tusher et~al.(2001)Tusher, Tibshirani, and Chu}]{vT01}
Tusher VG, Tibshirani R, Chu G (2001).
\newblock Significance analysis of microarrays applied to the ionizing
radiation response.
\newblock Proceedings of the National Academy of Sciences 98: 5116--5121.
\bibitem[{van~de Vijver et~al.(2002)van~de Vijver, He, van~'t Veer, Dai, Hart
et~al.}]{mV02}
van~de Vijver MJ, He YD, van~'t Veer LJ, Dai H, Hart AA, et~al. (2002).
\newblock A gene-expression signature as a predictor of survival in breast
cancer.
\newblock The New England Journal of Medicine 347: 1999--2009.
\bibitem[{van~'t Veer et~al.(2002)van~'t Veer, Dai, van~de Vijver, He, Hart
et~al.}]{lV02}
van~'t Veer LJ, Dai H, van~de Vijver MJ, He YD, Hart AAM, et~al. (2002).
\newblock Gene expression profiling predicts clinical outcome of breast cancer.
\newblock Nature 415: 530--536.
\bibitem[{von Heydebreck et~al.(2001)von Heydebreck, Huber, Poustka, and
Vingron}]{aV01}
von Heydebreck A, Huber W, Poustka A, Vingron M (2001).
\newblock Identifying splits with clear separation: a new class discovery
method for gene expression data.
\newblock Bioinformatics 17: S107--S114.
\bibitem[{Vose(1998)}]{jV98}
Vose JM (1998).
\newblock Current approaches to the management of non-Hodgkin's lymphoma.
\newblock Seminars in Oncology 25: 483--491.
\end{thebibliography}
\end{document}