Single amino-acid changes in the human p53 core domain that can restore activity of inactive p53 found in human cancers

Please go to the prediction submission form to submit your predictions for the p53 dataset (you must be logged into your account to submit).

Background: The transcription factor p53 is a central tumor suppressor protein that controls DNA repair, cell cycle arrest, and apoptosis (programmed cell death). About half of human cancers have p53 mutations that inactivate p53. Over 250,000 US deaths yearly are due to tumors that express full-length p53 that has been inactivated by a single point mutation. For the past several years, the group of Rick Lathrop at University of California, Irvine, has been engaged in a complete functional census of p53 second-site suppressor (“cancer rescue”) mutations. These cancer rescue mutations are additional amino acid changes, introduced into p53 that already carries a cancer mutation, that have been found to rescue p53 tumor suppressor function and reactivate otherwise inactive p53. These intragenic rescue mutations reactivate cancer mutant p53 in yeast and human cell assays by providing structural changes that compensate for the cancer mutation.

For a reference on the primary biological strategies, please see: Baronio R, Danziger SA, Hall LV, Salmon K, Hatfield GW, Lathrop RH, Kaiser P. (2010) All-codon scanning identifies p53 cancer rescue mutations, Nucleic Acids Res. (Epub ahead of print). doi: 10.1093/nar/gkq571

Additional references on the computational and biological strategies:
Danziger, S.A., Baronio, R., Ho, L., Hall, L., Salmon, K., Hatfield, G.W., Kaiser, P., and Lathrop, R.H. (2009) Predicting Positive p53 Cancer Rescue Regions Using Most Informative Positive (MIP) Active Learning, PLOS Computational Biology, 5(9), e1000498. doi: 10.1371/journal.pcbi.1000498

Danziger, S.A., Zeng, J., Wang, Y., Brachmann, R.K. and Lathrop, R.H. (2007) Choosing where to look next in a mutation sequence space: Active Learning of informative p53 cancer rescue mutants, Bioinformatics, 23(13), i104-i114. doi: 10.1093/bioinformatics/btm166

Danziger, S.A., Swamidass, S.J., Zeng, J., Dearth, L.R., Lu, Q., Chen, J.H., Cheng, J., Hoang, V.P., Saigo, H., Luo, R., Baldi, P., Brachmann, R.K. and Lathrop, R.H. (2006) Functional census of mutation sequence spaces: the example of p53 cancer rescue mutants, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3, 114-125. doi: 10.1109/TCBB.2006.22

Dataset, general description: The dataset is composed of calculations performed on in silico models of the mutant p53 structures. Features per instance (mutant) represent various aspects of the changes induced by the mutations, such as:

1) Changes in residue properties (polarity, size, etc.) in the mutant.
2) Changes in electrostatic and surface-based features in the mutant.
3) Distance measures of the movement of the alpha carbon of each residue.
4) ddG predictions (stability metrics).

Training dataset: Overall, the training dataset contains 16,772 mutants, with each labeled as either ‘active’ or ‘inactive’. These class labels are determined by wet-lab experimental assays of p53 function in yeast and/or human cell lines.

The training dataset represents an exhaustive single-point mutagenesis of the entire core domain of p53 for the following p53 cancer mutations: r175h, r273h, and g245s.

The format here is x#y, where:
x is the single letter code for the wild type amino acid
# is the residue
y is the single letter code for the mutant amino acid.
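
The x#y notation can be split into its three parts mechanically. The following is a minimal Python sketch, not code from the dataset providers; the names `Mutation` and `parse_mutation` are hypothetical, and the choice to normalize letters to upper case is an assumption made for convenience.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    wild_type: str  # single-letter code of the wild-type amino acid (x)
    position: int   # residue number (#)
    mutant: str     # single-letter code of the mutant amino acid (y)


# One letter, one or more digits, one letter, e.g. "r175h".
_PATTERN = re.compile(r"^([A-Za-z])(\d+)([A-Za-z])$")


def parse_mutation(code: str) -> Mutation:
    """Split an x#y code into wild type, residue number, and mutant."""
    match = _PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not an x#y mutation code: {code!r}")
    wild_type, position, mutant = match.groups()
    # Assumption: case carries no meaning, so letters are upper-cased.
    return Mutation(wild_type.upper(), int(position), mutant.upper())


print(parse_mutation("r175h"))
# Mutation(wild_type='R', position=175, mutant='H')
```

Under this reading, r175h denotes arginine at residue 175 replaced by histidine.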

Additionally, regional saturation mutagenesis data are included for the following p53 cancer mutations: h179r, p151s, r280t, p278l, r248l, r273l, r249s, p152l, and r158l. While these mutations comprise most of the dataset, several hundred examples for other p53 cancer mutants are also included.

Download the training dataset in ARFF format.
More information on the ARFF format can be found at http://www.cs.waikato.ac.nz/~ml/weka/arff.html
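
For readers unfamiliar with ARFF, the sketch below reads a simple dense ARFF file using only the Python standard library. It is an illustration under stated assumptions: the function name `read_simple_arff`, the attribute names, and the miniature sample are all hypothetical and do not reflect the actual dataset header. The sketch does not handle quoted attribute names, quoted values, or the sparse ARFF format.

```python
import io


def read_simple_arff(handle):
    """Return (attribute_names, rows) from a simple dense ARFF stream."""
    attributes = []
    rows = []
    in_data = False
    for raw_line in handle:
        line = raw_line.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        if not in_data:
            lowered = line.lower()
            if lowered.startswith("@attribute"):
                # "@attribute <name> <type>": keep only the name
                attributes.append(line.split(None, 2)[1])
            elif lowered.startswith("@data"):
                in_data = True
            continue
        # Data section: one comma-separated instance per line
        rows.append([value.strip() for value in line.split(",")])
    return attributes, rows


# Hypothetical miniature file; the real dataset has many more features.
sample = io.StringIO(
    "% illustrative example only\n"
    "@relation p53_demo\n"
    "@attribute feature_1 numeric\n"
    "@attribute feature_2 numeric\n"
    "@attribute class {active,inactive}\n"
    "@data\n"
    "0.12,-1.5,inactive\n"
    "0.40,2.25,active\n"
)
names, rows = read_simple_arff(sample)
print(names)
print(rows)
```

For real work, an established reader such as `scipy.io.arff.loadarff` or Weka itself is likely a better choice than a hand-written parser.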

Download the training dataset in CSV format.
Download the training dataset header file.

Download the training dataset mutant list.

Prediction dataset: The amino acid mutations are provided for each mutant in case predictors wish to derive predictions not based upon the additional structural data supplied. We welcome creative ideas and alternative predictive strategies.

The basis for our structural models is the core domain of wild-type p53, which can be found in the PDB under the structure ID 1TSR, specifically chain B: http://www.rcsb.org/pdb/explore/explore.do?structureId=1tsr

The prediction dataset represents an exhaustive single-point mutagenesis of the entire core domain of p53 for the following p53 cancer mutations: m237i, r248q, r282w, and y220c. These cancer mutations were specifically chosen for CAGI to cover a diverse set of structural perturbations of p53 structure and function. The prediction dataset contains 14,668 mutants.

Residue numbering is as in the 1TSR PDB file, where residue number 96 begins the core domain of p53 and residue number 289 terminates the core domain.
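
As a consistency check on the numbers above, the size of the prediction dataset can be reproduced from the core-domain boundaries. The short Python sketch below assumes that each position receives the 19 possible non-wild-type substitutions and that the position of the cancer mutation itself is not mutated a second time. Both points are assumptions made for this illustration, not a description of how the dataset was generated, although together they reproduce the stated total.

```python
CORE_START, CORE_END = 96, 289  # residue numbering as in 1TSR
N_SUBSTITUTIONS = 19            # 20 standard amino acids minus the wild type
cancer_mutations = ["m237i", "r248q", "r282w", "y220c"]

core_length = CORE_END - CORE_START + 1  # residues in the core domain

# Assumption: the residue carrying the cancer mutation is left out,
# so each cancer mutant contributes (core_length - 1) mutable positions.
per_cancer_mutation = (core_length - 1) * N_SUBSTITUTIONS
total = per_cancer_mutation * len(cancer_mutations)

print(core_length, per_cancer_mutation, total)
# 194 3667 14668
```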

Download the prediction dataset in ARFF format.

Download the prediction dataset in CSV format.
Download the prediction dataset header file. Please note, the header files for training and prediction sets are identical.

Download the prediction dataset mutant list.

Prediction challenge: Predictors are asked to submit predictions of the effect of the cancer rescue mutants on four p53 cancer mutations, as measured by assays in yeast and/or human cell lines. The prediction should be the probability of a mutant being active. In addition, we ask predictors to submit the raw output data of the prediction algorithm.

Prediction submission format: The prediction submission is a simple text file. The organizers provide a file template, which should be used for submission. In the submitted file, each line should include the following columns:

1) The mutant pair as listed in the prediction dataset file, in the order provided in the template file
2) The probability of a mutant being active
3) Standard deviation as a measure of confidence of prediction in column 2
4) Raw output data from your prediction algorithm

In the template file, columns 2-4 are marked with an “*”. Submit your predictions by replacing each “*” with your prediction value. If predictions cannot be submitted for a specific line (mutant), leave the “*” in these columns.
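
Filling in the template can be automated. The Python sketch below is one possible approach under stated assumptions: the template is assumed to be tab-separated, the mutant identifiers shown are hypothetical placeholders rather than the actual identifiers in the template, the numeric values are made up, and the function name `fill_template` is invented for this illustration. Lines without a prediction keep their “*” markers, and the template order is preserved.

```python
from typing import Dict, Iterable, List, Tuple

# (probability of being active, standard deviation, raw algorithm output)
Prediction = Tuple[float, float, float]


def fill_template(
    template_lines: Iterable[str],
    predictions: Dict[str, Prediction],
    sep: str = "\t",  # assumption: columns are tab-separated
) -> List[str]:
    """Replace the '*' placeholders for every mutant that has a prediction."""
    filled = []
    for line in template_lines:
        fields = line.rstrip("\n").split(sep)
        mutant = fields[0]
        if mutant in predictions:
            probability, std_dev, raw = predictions[mutant]
            if not 0.0 <= probability <= 1.0:
                raise ValueError(f"probability out of range for {mutant}")
            fields[1:4] = [f"{probability:.4f}", f"{std_dev:.4f}", f"{raw:.4f}"]
        # Mutants without a prediction are written back unchanged.
        filled.append(sep.join(fields))
    return filled


# Hypothetical template lines and made-up values, for illustration only.
template = ["mutant_a\t*\t*\t*", "mutant_b\t*\t*\t*"]
output = fill_template(template, {"mutant_a": (0.82, 0.05, 1.37)})
print("\n".join(output))
```

Check the separator and identifier format against the organizers' template before using anything like this for an actual submission.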

In addition, your submission should include a detailed description of the method you used to make the predictions. This information will be submitted as a separate file.

Dataset provided by Rick Lathrop and the p53 "cancer rescue" team, UC Irvine