Breast cancer cell line pharmacogenomics dataset

Please go to the prediction submission form to submit your predictions for the Breast cancer cell line dataset (to submit, you need to be logged into your account).

Background: Cancer tissues differ in how they respond to different drugs. For this experiment, predictors are asked to predict the response of each of 54 breast cancer cell lines to a panel of drugs. Data about the cell lines include transcriptional profiling, SNP data and copy number profiles, all measured for cells grown in the absence of any treatment.

Background information on the breast cancer cell line panel used in this study may be found in:

Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T, Clark L, Bayani N, Coppe JP, Tong F, Speed T, Spellman PT, DeVries S, Lapuk A, Wang NJ, Kuo WL, Stilwell JL, Pinkel D, Albertson DG, Waldman FM, McCormick F, Dickson RB, Johnson MD, Lippman M, Ethier S, Gazdar A, Gray JW. (2006) A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell. Dec;10(6):515-27. http://www.ncbi.nlm.nih.gov/pubmed/17157791

Each cell line underwent transcriptional profiling to assess the expression of ~20,000 genes. These are baseline expression profiles: cells were grown in the absence of any treatment. These data may be downloaded here. In the file, each column represents a single breast cancer cell line, and each row represents a single gene. Values are on a log2 scale.
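As a rough illustration of the layout described above (genes in rows, cell lines in columns, log2 values), the following sketch reads a tiny stand-in table. The tab delimiter, the header row, the function names and all numeric values are assumptions made for this example; the downloadable file may be laid out differently.

```python
import csv
import io

# Hypothetical miniature of the expression file: tab-delimited, a header row
# of cell line names, the gene symbol in the first column, log2 values.
# All numbers are invented; the real file's layout may differ.
EXAMPLE = (
    "gene\tMCF7\tBT474\tHCC1954\n"
    "ERBB2\t8.0\t13.0\t12.5\n"
    "ESR1\t11.0\t10.0\t6.0\n"
)

def read_expression(handle):
    """Return {cell_line: {gene: log2_value}} from a genes-by-cell-lines table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    cell_lines = header[1:]
    profiles = {name: {} for name in cell_lines}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene = row[0]
        for name, value in zip(cell_lines, row[1:]):
            profiles[name][gene] = float(value)
    return profiles

def log2_difference(profiles, gene, line_a, line_b):
    """Difference of log2 values, i.e. log2 of the expression ratio a/b."""
    return profiles[line_a][gene] - profiles[line_b][gene]

profiles = read_expression(io.StringIO(EXAMPLE))
# A difference of 5 on the log2 scale corresponds to a 2**5 = 32-fold ratio.
fold = 2 ** log2_difference(profiles, "ERBB2", "BT474", "MCF7")
```

Because the values are on a log2 scale, differences between cell lines correspond to fold changes, which is why the sketch subtracts before exponentiating.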

Copy number profiles measure changes at the DNA level. Normal cells contain two complete copies of each chromosome. Cancer cells are frequently genomically unstable: regions of the genome can become amplified or deleted, leading to either multiple or no copies of a particular gene. The data are segmented copy number profiles (processed from the SNP6.0 CEL files, see below): after the number of copies is assessed at the probe level, the data are smoothed using circular binary segmentation. Each segment represents a region of the genome with similar copy number. Aberrant regions can be either focal or large. A PDF figure explaining the copy number data can be downloaded here.
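To make the idea of a segment concrete, here is a minimal sketch that looks up the segment covering a given genomic position. The tuple layout (chromosome, start, end, value), the coordinates, the values and the function names are all invented for illustration; the actual file may use different fields and a different value scale.

```python
from bisect import bisect_right

# Hypothetical segments for one cell line: (chromosome, start, end, value).
# Coordinates and values are invented; they do not come from the dataset.
SEGMENTS = [
    ("17", 1, 35_000_000, 0.02),
    ("17", 35_000_001, 36_500_000, 2.10),   # short segment: a focal aberration
    ("17", 36_500_001, 81_000_000, -0.45),  # long segment: a large aberration
]

def build_index(segments):
    """Group segments by chromosome, sorted by start position."""
    index = {}
    for chrom, start, end, value in segments:
        index.setdefault(chrom, []).append((start, end, value))
    for chrom_segments in index.values():
        chrom_segments.sort()
    return index

def value_at(index, chrom, position):
    """Return the value of the segment covering position, or None if uncovered."""
    chrom_segments = index.get(chrom, [])
    starts = [start for start, _, _ in chrom_segments]
    i = bisect_right(starts, position) - 1  # last segment starting at or before position
    if i < 0:
        return None
    start, end, value = chrom_segments[i]
    return value if position <= end else None

index = build_index(SEGMENTS)
focal = value_at(index, "17", 36_000_000)
```

Every probe inside a segment shares that segment's value, so a gene-level copy number estimate can be read off from the segment that covers the gene's position.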

The copy number data can be downloaded here.

The SNP6.0 CEL data for 53 cell lines may be downloaded here. Note: the file size is 1.6 GB.

Dataset: The example dataset consists of drug response data on 54 breast cancer cell lines. The three drugs of the example dataset are: BIBW2992, AKT1-2 inhibitor and Erlotinib. For each drug, the GI50 value as measured in each cell line is given.

Download the example dataset here.

The prediction dataset, including a list of the 54 drugs and their biological targets, can be downloaded here.

Prediction challenge: Participants are asked to predict the response of each of the 54 breast cancer cell lines described in the example dataset to each of the 54 drugs. Each prediction should be a GI50 value with its standard deviation.
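The challenge does not prescribe how the standard deviation should be obtained. One possible approach, shown here purely as an assumption, is to report the mean and sample standard deviation over an ensemble of models. The prediction values below are invented.

```python
import statistics

# Hypothetical GI50 predictions for one cell line and one drug, produced by
# five resampled models; the numbers are invented for illustration.
ensemble_predictions = [5.0, 5.5, 6.0, 6.5, 7.0]

gi50 = statistics.mean(ensemble_predictions)
gi50_sd = statistics.stdev(ensemble_predictions)  # sample SD (n - 1 denominator)
```

Any other well-founded uncertainty estimate would serve the same purpose; the method used should be explained in the accompanying method description.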

Prediction submission format: The prediction submission is a simple text file. The organizers provide a file template, which should be used for submission. In the submitted file, each line should include the following columns:

1) The cell line as listed in the example dataset file; use the order provided in the template form
2) The GI50 value of the first drug
3) Standard deviation as a measure of confidence in the prediction in column 2
4) The GI50 value of the second drug
5) Standard deviation as a measure of confidence in the prediction in column 4
Continue this pattern of paired columns (GI50 value, then standard deviation) for all 54 drugs.

In the template file, all columns except column 1 are marked with an “*”. Submit your predictions by replacing each “*” with your predicted value. If predictions cannot be submitted for a specific cell line or drug, leave the “*” in those columns.
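The column layout described above (one cell line column followed by a GI50 and standard deviation pair for each of the 54 drugs, with "*" kept wherever no prediction is made) can be sketched as follows. The tab separator, the function name, the number formatting and the example values are assumptions; the organizers' template remains the authoritative reference for the format and for the cell line order.

```python
N_DRUGS = 54
MISSING = "*"

def format_row(cell_line, predictions, n_drugs=N_DRUGS, sep="\t"):
    """Build one submission line.

    predictions maps a zero-based drug index (in template order) to a
    (gi50, sd) pair; drugs without an entry keep the "*" placeholder.
    The separator is an assumption and should match the template.
    """
    fields = [cell_line]
    for drug in range(n_drugs):
        if drug in predictions:
            gi50, sd = predictions[drug]
            fields += [f"{gi50:g}", f"{sd:g}"]
        else:
            fields += [MISSING, MISSING]
    return sep.join(fields)

# Invented example: predictions for the first and third drug only.
row = format_row("MCF7", {0: (6.2, 0.4), 2: (5.1, 0.9)})
columns = row.split("\t")
```

Each line then has 1 + 2 × 54 = 109 columns, and the second drug in this example keeps its two "*" placeholders.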

In addition, your submission should include a detailed description of the method you used to make the predictions. This information will be submitted as a separate file.


Dataset provided by Joe W. Gray, Lawrence Berkeley National Laboratory.