Classification of CHEK2 variants in breast cancer cases and controls

Please go to the prediction submission form to submit your predictions for the CHEK2 dataset (to submit, you need to be logged into your account).

Background: Variants in the ATM and CHEK2 genes are associated with breast cancer. For this experiment, predictors are asked to estimate the probability of an individual with a given mutation being in the case (cancer) or control (healthy) cohort. The data available include the targeted resequencing of ATM and CHEK2 from approximately 1250 breast cancer cases and 1250 controls.

The ATM sequencing results have already been published [1] and will thus serve as an example set, which can be downloaded here: ATM_dataset.xls.

[1] Tavtigian SV, Oefner PJ, Babikyan D, et al. Rare, evolutionarily unlikely missense substitutions in ATM confer increased risk of breast cancer. American Journal of Human Genetics. 2009;85(4):427-446. doi: 10.1016/j.ajhg.2009.08.018

Dataset: Predictors are provided with 41 rare missense, nonsense, splicing, and indel variants in CHEK2. The prediction dataset can be downloaded here: CHEK2_dataset.xls.

To see the alignments that were used to analyze the CHEK2 and ATM data, please go to the Align-GVGD website at http://agvgd.iarc.fr.

Prediction challenge: Predictors are asked to classify variants as occurring in cases or controls. For each variant, predictors provide their estimate of the probability that an individual carrying it is in the case set. Control probability is implicitly 1 – P(case). Correctness of each prediction will be weighted according to:

(a) how accurately P(case) was predicted
(b) the confidence measure provided
(c) the number of study participants with the variant.

While prediction for a single individual may not be meaningful in all cases, the sum across all predictions should give an informative measure of prediction accuracy. In addition, we ask predictors to submit the raw output data of the prediction algorithm.
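
The exact scoring formula is not given in this description. As a purely illustrative sketch, and not the organizers' method, the three weighting criteria above could be combined as follows: compare each predicted P(case) with the case fraction observed among carriers, weight the squared error by the number of carriers, and separately check how often the observed fraction falls within one reported standard deviation of the prediction. The function names, the choice of squared error, and the carrier counts in the example are all assumptions made for illustration.

```python
def weighted_squared_error(predictions):
    """Carrier-weighted mean squared error between predicted and observed P(case).

    `predictions` is a list of (p_case, sd, n_cases, n_controls) tuples.
    This is a hypothetical metric, not the official assessment.
    """
    total_error = 0.0
    total_carriers = 0
    for p_case, _sd, n_cases, n_controls in predictions:
        n = n_cases + n_controls
        observed = n_cases / n          # fraction of carriers who are cases
        total_error += n * (p_case - observed) ** 2
        total_carriers += n
    return total_error / total_carriers


def within_one_sd(predictions):
    """Fraction of variants whose observed case fraction lies within +/- 1 SD."""
    hits = 0
    for p_case, sd, n_cases, n_controls in predictions:
        observed = n_cases / (n_cases + n_controls)
        if abs(p_case - observed) <= sd:
            hits += 1
    return hits / len(predictions)


# Made-up example: two variants with invented carrier counts.
example = [
    (0.8, 0.1, 3, 1),   # predicted 0.80 +/- 0.10; 3 cases, 1 control
    (0.5, 0.2, 1, 1),   # predicted 0.50 +/- 0.20; 1 case, 1 control
]
error = weighted_squared_error(example)
coverage = within_one_sd(example)
```

Under such a scheme, a variant seen in only one or two participants contributes little to the total, which is consistent with the remark that single-individual predictions may not be meaningful on their own.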

Prediction submission format: The prediction submission is a simple text file. The organizers provide a file template, which should be used for submission. In the submitted file, each line should include the following columns:

1) The CHEK2 variant as listed in the prediction dataset file, in the same order as in the template file
2) P(case), the probability of individuals with a given variant being in the case set
3) Standard deviation of the P(case) value in column 2, as a measure of confidence in the prediction
4) Raw output data from your prediction algorithm

All columns are required for the prediction submission of each variant. In the template file, columns 2-4 are marked with an “*”. Submit your predictions by replacing each “*” with your prediction value. If predictions cannot be submitted for a specific line (variant), leave the “*” in these columns. The validity of the submitted prediction file will be checked with a script, so please make sure you follow these submission guidelines strictly.
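
Because the file is checked by a script, it can help to run a local check before submitting. The following is a minimal sketch, not the organizers' validation script. It assumes the template is tab-separated (the separator is not stated here), and the helper names `format_line` and `check_line` and the variant labels are hypothetical.

```python
def format_line(variant, p_case=None, sd=None, raw=None):
    """Build one tab-separated submission line; missing values stay as '*'."""
    fields = [variant]
    for value in (p_case, sd, raw):
        fields.append("*" if value is None else str(value))
    return "\t".join(fields)


def check_line(line):
    """Return a list of problems found in one submission line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return ["expected 4 columns, found %d" % len(fields)]
    variant, p_case, sd, raw = fields
    problems = []
    if not variant:
        problems.append("missing variant name")
    stars = [value == "*" for value in (p_case, sd, raw)]
    if all(stars):
        return problems                 # no prediction for this variant
    if any(stars):
        problems.append("columns 2-4 must be all filled or all '*'")
        return problems
    try:
        p = float(p_case)
        if not 0.0 <= p <= 1.0:         # also rejects NaN
            problems.append("P(case) must lie between 0 and 1")
    except ValueError:
        problems.append("P(case) is not a number")
    try:
        s = float(sd)
        if not 0.0 <= s < float("inf"): # also rejects NaN
            problems.append("standard deviation must be non-negative and finite")
    except ValueError:
        problems.append("standard deviation is not a number")
    if not raw:
        problems.append("raw output is empty")
    return problems


# Hypothetical variant labels; real names come from the template file.
good = format_line("VARIANT_A", 0.75, 0.1, 1.8)
skipped = format_line("VARIANT_B")
```

The raw-output column is only checked for being non-empty, since its form depends on the prediction algorithm.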

In addition, your submission should include a detailed description of the method used to make the predictions. This information will be submitted as a separate file.


Dataset provided by Sean Tavtigian, University of Utah