We've just integrated 1033 copy number profiles from the Clinical Lung Cancer Genome Project (CLCGP) and Network Genomic Medicine (NGM) study. It is a very nice and comprehensive dataset, including clinical follow-up etc. Make sure to read the article!
We have started a new blog, with an article discussing some aspects of array based molecular analyses and their utility in the current technological landscape. Enjoy! Comments are welcome...
We just switched to some new server hardware - small & black ... Fingers crossed - please let us know about oddities.
With PMID 24476156, PubMed now lists our discussion of clustered genomic copy number imbalances in cancer, which correspond to copy number aberration patterns commonly referred to as "Chromothripsis".
Based on the analysis of ~20'000 genomic array datasets from our arrayMap database, we discuss features associated with these patterns and suggest that they represent a number of, frequently cancer type related, genomic aberration types, beyond a well defined single "Chromothripsis" phenomenon.
The PDF of the article can be downloaded directly from here .... Open Access, naturally - Enjoy!
We have started to list WES / WGS publications of cancer genomes in the Progenetix literature collection. For now, we just try to extract copy number aberration data if it is annotated in a compatible format.
The first example here is the publication by Bea et al. of whole exome sequencing data in mantle cell lymphomas. This publication is exemplary in providing supplementary information, including listing of CNA regions from the WES data, and deposition of corresponding SNP (will be added to arrayMap later) and expression arrays in GEO.
Our new database update publication
- "Progenetix: 12 years of oncogenomic data curation"
which had been deposited at http://arxiv.org/abs/1311.2757 has now been published at Nucleic Acids Research, as part of NAR's 2014 database issue:
Open Access ...
We have implemented a map display for the currently selected articles or samples.
Essentially, all samples of the current subset / search result are projected to their origin, determined from the main institution associated with the publication.
This feature should come in particularly handy when e.g. finding out which institutions are especially active in a given area of cancer genome research. However, for the sample driven listings, this depends on the availability of the sample data through Progenetix/arrayMap.
Michael has been co-author of another non-cancer genetics publication:
Krieger, M., A. Roos, C. Stendel, K. G. Claeys, F. M. Sonmez, M. Baudis, P. Bauer, A. Bornemann, C. de Goede, A. Dufke, R. S. Finkel, H. H. Goebel, M. Haussler, H. Kingston, J. Kirschner, L. Medne, P. Muschke, F. Rivier, S. Rudnik-Schoneborn, S. Spengler, F. Inzana, F. Stanzial, F. Benedicenti, M. Synofzik, A. Lia Taratuto, L. Pirra, S. K. Tay, H. Topaloglu, G. Uyanik, D. Wand, D. Williams, K. Zerres, J. Weis, and J. Senderek. SIL1 mutations and clinical spectrum in patients with Marinesco-Sjogren syndrome. Brain (2013) PubMed
When looking for focal copy number aberrations, so far min/max values could be used to limit CNAs to a given size range. A typical scenario would be to e.g. set the "max" to "5000" when querying a given gene, thereby limiting the size of the called segment involving the gene to 5Mb - this is pretty much a focal hit (though it still may involve quite a number of other potential targets).
However, this method is not very specific: e.g. a whole chromosome loss in a "medium quality" array may present as hundreds of small segments, thereby triggering "focal" calls.
The addition or sole use of the new "MAX COVERAGE" option adds another layer of "selectability". A value of 5000 there means that a gene hit is only evaluated if, in the interval "gene CDR start - 5000" to "gene CDR end + 5000", all segments of the requested type (gain/1 or loss/-1) do not exceed 5000 kb. (A true sliding window approach around the target may be theoretically superior, but in practice would not make a lot of difference.)
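To illustrate the idea, here is a minimal Python sketch of such a coverage filter. It is not the code running on the site: the function name, the `Segment` type and the coordinates in the usage note are made up for illustration, and it assumes one particular reading of the rule, namely that the summed length of all same-type segments inside the flanked gene window must stay at or below the coverage value (segments are assumed not to overlap each other).

```python
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int  # base position
    end: int    # base position
    value: int  # 1 = gain, -1 = loss


def is_focal_hit(segments: Iterable[Segment], chrom: str,
                 gene_start: int, gene_end: int, cna_type: int,
                 max_coverage_kb: int = 5000) -> bool:
    """Hypothetical "MAX COVERAGE" style check (assumed interpretation)."""
    flank = max_coverage_kb * 1000          # kb -> bases
    win_start = max(0, gene_start - flank)  # window: gene +/- flank
    win_end = gene_end + flank
    covered = 0
    hits_gene = False
    for seg in segments:
        if seg.chrom != chrom or seg.value != cna_type:
            continue
        # length of the part of the segment that lies inside the window
        overlap = min(seg.end, win_end) - max(seg.start, win_start)
        if overlap <= 0:
            continue
        covered += overlap
        if seg.start <= gene_end and seg.end >= gene_start:
            hits_gene = True
    # focal only if the gene is hit and the window is not broadly covered
    return hits_gene and covered <= flank
```

With made-up gene coordinates on chromosome 9, a single 1 Mb loss around the gene passes, while a region-wide loss that was fragmented into many 400 kb pieces is rejected, even though each piece alone would pass a plain "max" size filter.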
As always, comments appreciated ...
Congratulations to Haoyang, who successfully defended his thesis on July 11, 2013. A substantial part of his project was the assembly of the arrayMap resource, which we reported in a 2012 [PLoS ONE paper](http://www.ncbi.nlm.nih.gov/pubmed/22629346).
And thanks to the other members of his thesis committee - Christian von Mering, Mark Robinson, Homayoun Bagheri and Nuria Lopez-Bigas!
Haoyang Cai from the Baudis group will present the results of his PhD thesis on Thursday, July 11 in Y55 L08, Irchel Campus. Guests are welcome; the presentation starts at 15:00.
- Martin Baumgartner: Imaging and molecular characterization of cancer cell motility
- Maurizio Provenzano: Diagnostic/prognostic procedures and treatment options for prostate cancer
Room: USZ PATH C22
Time: 16:30H - 18:00H
We have added the first series of copy number aberration data from methylation arrays (Sturm et al., PMID 23079654) to Progenetix and arrayMap. Among overall 210 glioma samples, the dataset contains 69 paediatric/young adult DIPG/high grade gliomas which are included in the [DIPG project](http://dipg.progenetix.org).
We will use this as a pilot project, to work on a future general use of this type of molecular screening data. However, we deem it worthwhile to provide the data in its current state - and we are very excited about these developments.
Haoyang will present his results from the analysis of chromothripsis-like genome patterns at this year's CNZ retreat in Grindelwald:
- Chromothripsis-like patterns are recurring but heterogeneously distributed features in a survey of 22,347 cancer genomes
From 2013-03-21 - 2013-03-23, the Swiss-Romanian cutaneous NHL collaboration meets in Timisoara.
By adding some new data sets and annotating some more of the evaluated data from arrayMap, Progenetix now has more than 30000 cancer genome copy number profiles, from 1001 publications. The data consists of 20531 chromosomal CGH and 10024 genomic array profiles, and covers 364 diagnostic entities according to ICD-O 3.
As of today, we have launched a way to access and integrate different versions of our gain/loss frequency plots into your resources. The information can be found in the guide (API).
Today Haoyang from the Baudis group is presenting the latest part of his thesis project, using arrayMap data to explore correlations of cancer CNA and local genome features, e.g. gene density.
The "Computational Biology in Zurich" site has a new web address:
Additionally, the domains
... will forward to this domain.
On Jan 09 and 10 we welcomed the members of the DIPG data working group for a meeting at the University of Zurich.
Progenetix and arrayMap are back to normal - happy cancer genome data mining! For the next year, we plan some nice data & tool updates - stay posted.
While the cancer genome profile array CGH data in Progenetix and the arrayMap data are fine, chromosomal CGH copy number profiles are still incorrect. Working on it ...
There are some database search errors when accessing sample specific data from both Progenetix and arrayMap. We're working on it - should be fixed before the year is over...
Cancer genome copy number profiles of samples from both Progenetix and arrayMap can now be queried for cases with similar CNA profiles. The function is currently accessible through the sample details pages of both Progenetix and arraymap. Enjoy!
To reflect common cancer incidence and information systems, we have introduced an additional classification scheme based on the categories used by the NIH's Surveillance, Epidemiology and End Results (SEER) resource. Compared to what we observe in SEER, we provide additional adjustment of outlier data.
A news feature by Mike Martin discusses our arrayMap resource in a recent issue of the Journal of the National Cancer Institute (JNCI).
Nitin Kumar, Haoyang Cai, Christian von Mering and Michael Baudis: Specific genomic regions are differentially affected by copy number alterations across distinct cancer types in aggregated cytogenetic data
has just been published at PLoS ONE.
Beleut et al.: Integrative genome-wide expression profiling identifies three distinct molecular subgroups of renal cell carcinoma with different patient outcome
... in which we had participated, has just become available through BMC Cancer. Congratulations to all co-authors!
Oncogenomic Pattern Detection in Cancer Copy Number Alteration Data for Pathway Description and Disease Classification: Nitin Kumar from our group at the University of Zurich has successfully passed his exam for a PhD. Congratulations from the members of the Baudis group!
Nitin has been instrumental in developing some of the analytical algorithms (to be) implemented in Progenetix and arrayMap, and also in the survey of available genomic array data, finally resulting in the arrayMap resource. More to come ...
Some interface and form elements have been streamlined, including the less commonly used selector fields as well as the sample selection options. Some common options are now displayed only if activated (e.g. "mouse over" to see all files available for download).
Also, icon quality has been enhanced for all but the details pages (where larger icons just would take up too much space).
Supported through the @CureStartsNow foundation, a joint #DIPG effort has been started to collect and share all genomics and related molecular data from diffuse intrinsic pontine gliomas and related childhood brain cancer samples. Details can be found at dipg.progenetix.org and at dipgdata.blogspot.com, as well as through the @dipgdata Twitter feed.
New: All pre-generated histogram and ideogram plots are now produced based on a 1Mb matrix, with a 500Kb minimum size filter to remove CNV/platform dependent background from some high resolution array platforms. The unfiltered data can still be visualized through the standard analysis procedures.
Bug fix: Interactive segment size filtering so far only worked for region specific queries, but not as a general filter (see above). This has been fixed; a minimum segment size in the visualization options now will remove all smaller segments.
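As a rough illustration of how a size filter and a fixed interval matrix fit together, the following Python sketch projects segments of one chromosome onto 1Mb bins after dropping everything below 500Kb. The function name and the tuple layout are assumptions for this example, and the rule that a bin simply takes the status of any surviving segment touching it (with later segments overwriting earlier ones) is a simplification, not necessarily what the site does.

```python
def bin_segments(segments, chrom_length, bin_size=1_000_000, min_size=500_000):
    """Project (start, end, value) segments onto fixed-size bins.

    value is 1 for gain and -1 for loss; bins without a call stay 0.
    Segments shorter than min_size are ignored (background filter).
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    status = [0] * n_bins
    for start, end, value in segments:
        if end - start < min_size:
            continue  # drop small CNV/platform-dependent segments
        first = start // bin_size
        last = min((end - 1) // bin_size, n_bins - 1)
        for i in range(first, last + 1):
            status[i] = value
    return status
```

Calling the function with `min_size=0` corresponds to the unfiltered view.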
The interval selector now has options to include the p-arms of acrocentric chromosomes (though the data itself there may be incompletely annotated!). This feature was requested by Melody Lam.
- bug fix: fixing lack of clustering for CNA frequency profiles in the analysis section
- removed "Series Search" from the arrayMap side bar; kind of confusing - just search for the samples & select the series
- introduced a method to combine sample annotations and segmentation files for user data processing (see "FAQ & GUIDE")
- fixed some array plot presentation and replotting problems
- consolidation of script names - again, don't use deep links (besides for "api.cgi?...")
- moving of remaining sample selection options (random sample number, segments number, age range) to the sample selection page, leaving the pre-analysis page (now "prepare.cgi") for plotting/grouping options
- fixed the KM-style survival plots
- re-factoring of the cytobands plotting for histograms and heatmaps; this also fixes missing histogram tiles
- analysis output page: the circular histogram/connections plot and group specific histograms are now all available as both SVG and PNG image files
Some changes to the plotting options:
- the circular plot is now added as a default; and connections are drawn in for <= 30 samples (subject to change)
- one can now mark up multiple genes (or other loci of interest), for all plot types
- added option to create custom analysis groups based on text match values
- rewritten circular plot code
- copied data for PMIDs 17327916, 17311676, 18506749 and 18246049 from arrayMap to Progenetix
- bug fix: gene selector was broken for about a week; fixed
- In many places, images are now converted server side to PNG data streams and embedded into the web pages. This will substantially decrease web data traffic and page download times. Fully linked SVG images (including region links etc.) are still available through the analysis pipeline.
- data fix: PMID 18160781 had missing loss values (due to irregular character encoding); fixed, thanks to Emanuela Felley-Bosco for the note!
- moved the region filter from the analysis to the sample selection page
- added a "mark region" option to the analysis page: one now can highlight a genome region in histograms and matrix plots
- added "select all" option to entity lists
- implemented first version of sample-to-entity match score
- added single sample annotation input field to "User File Processing"; i.e. one can now type in CNA data for a single case, and have this visualised and similar cases listed
- added per sample CNA visualisation to the samples details listings (currently if up to 100 samples)
- added direct access to sample details listing to the subsets pages
- adding of abstract search to the publication search page
- introduction of a matching function for similar cases by CNA profile, accessible through the sample details pages of both Progenetix and arraymap
- Introduction of SEER groups
The database now contains the copy number status for different interval sizes (e.g. 1MB). With this, users can now create their own data plots (histograms etc.) using more than 10000 cancer copy number profiles with a high resolution. The options here are still being tested and improved - comments welcome!
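For readers who want to build their own histograms from such per-interval status data, the core calculation is just a column-wise count. The sketch below assumes each profile is a list with one value per interval (1 = gain, -1 = loss, 0 = no change); this data layout and the function name are illustrative assumptions, not the database's actual export format.

```python
def cna_frequencies(profiles):
    """Return (gain %, loss %) per interval across a list of profiles."""
    if not profiles:
        return [], []
    n = len(profiles)
    n_bins = len(profiles[0])
    gains = [100.0 * sum(1 for p in profiles if p[i] == 1) / n
             for i in range(n_bins)]
    losses = [100.0 * sum(1 for p in profiles if p[i] == -1) / n
              for i in range(n_bins)]
    return gains, losses
```

The two returned lists can be fed directly into any plotting library as the upward (gain) and downward (loss) bars of a frequency histogram.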
- added a new export file format "ANNOTATED SEGMENTS FILE", which uses the first columns for standard segment annotation, followed by some diagnostic and clinical data; i.e., the information for a case is repeated for each segment:
    GSM255090 22 25063244 25193559 1 NA C50 8500/3 breast Infiltrating duct carcinoma, NOS Carcinomas: breast ca. NA 1 51 0.58
    GSM255090 22 25368299 48899534 -1 NA C50 8500/3 breast Infiltrating duct carcinoma, NOS Carcinomas: breast ca. NA 1 51 0.58
    GSM255091 1 2224111 30146401 -1 NA C50 8500/3 breast Infiltrating duct carcinoma, NOS Carcinomas: breast ca. NA 0 72 0.54
    GSM255091 1 35418712 37555461 1 NA C50 8500/3 breast Infiltrating duct carcinoma, NOS Carcinomas: breast ca. NA 0 72 0.54
- added gene selection for region specific replotting of array data
- the gene database has been changed to the last version of the complete (HUGO names only) Ensembl gene list for HG18; previously, only a subset of "cancer related genes" was offered in the gene selection search fields
- some interface and form elements have been streamlined (e.g. less commonly used selector fields, sample selection options)
- some common options are now displayed only if activated (e.g. "mouse over" to see all files available for download)
- icon quality has been enhanced for all but the details pages
- New: All pre-generated histogram and ideogram plots are now produced based on a 1Mb matrix, with a 500Kb minimum size filter to remove CNV/platform dependent background from some high resolution array platforms. The unfiltered data can still be visualized through the standard analysis procedures.
- Bug fix: Interactive segment size filtering so far only worked for region specific queries, but not as a general filter (see above). This has been fixed; a minimum segment size in the visualization options now will remove all smaller segments.
- NEW: change log; that is what is shown here
- FEATURE: The interval selector now has options to include the p-arms of acrocentric chromosomes (though the data itself there may be incompletely annotated!). Feature requested by Melody Lam.
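Because the "ANNOTATED SEGMENTS FILE" export mentioned in the list above repeats the case information on every segment row, it is convenient to regroup the rows by sample when reading the file. The following Python sketch assumes tab-separated columns (the delimiter is not stated here) with sample, chromosome, start, end and value as the first five fields; the remaining fields are kept as an uninterpreted list, and the function name is made up for this example.

```python
from collections import defaultdict


def read_annotated_segments(lines):
    """Group annotated segment rows by sample identifier.

    Assumes tab-separated rows: sample, chromosome, start, end, value,
    followed by diagnostic/clinical columns that repeat for each segment.
    """
    samples = defaultdict(lambda: {"segments": [], "annotation": None})
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        fields = line.split("\t")
        sample_id, chrom, start, end, value = fields[:5]
        entry = samples[sample_id]
        entry["segments"].append((chrom, int(start), int(end), int(value)))
        entry["annotation"] = fields[5:]  # identical on every row of a sample
    return dict(samples)
```

Passing an open file handle as `lines` works as well, since the function only iterates over its input.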
The navigation icons of the arrayMap and Progenetix sites have been updated. This is mostly a cosmetic change, but some of the linking has been streamlined, too.
Cai, H., N. Kumar, and M. Baudis. arrayMap: A Reference Resource for Genomic Copy Number Imbalances in Human Malignancies. PLoS ONE 2012: accepted.
The original version of the manuscript is available at
We will announce the final version as soon as it becomes available.
Progenetix and arrayMap news and guide are now available through RSS:
You can either subscribe to this, or follow it on Twitter @progenetix. Enjoy!
Samples can be queried by specifying a number of parameters and/or keywords. The following example shows a text query for anything with "renal" in diagnosis text, ICD-O text or locus text, limited to platforms containing "affy" in the name, and having a minimum of 45000 probes. Also, there will be a limitation to samples having any change overlapping the CDKN2A locus - the selection is just under way:
After performing the search, the user is presented with selection lists containing parameters encountered in the current samples, for further exclusion options.
As of March 2012, registration is only necessary for
- any commercial use of the database
- maintaining private projects
- participating in collaborative studies containing unpublished material
However, we are happy about feedback and suggestions.
For any use of the site by for-profit entities, an individual license has to be obtained. Starting in 2008, licensing proceeds for new licensees have been handled through the University of Zurich.
Please contact Michael Baudis for further information.