@progenetix | arrayMap Changes


Bug fix: Fixed the nasty MongoDB timeout

recv timed out (30000 ms)

which affected complex searches, by setting the timeout to 30sec


Please report back if there are still problems!

(... skipped a number of changes ...)


Bug fix: temporarily, no samples were listed, only pre-computed subset data. arrayMap was not affected. The problem has been fixed (but we are still checking); apologies!


Progenetix now collects cancer genome / exome sequencing publications


Is it a bug fix? Is it a feature? Progenetix & arrayMap NOW with much faster publication listing!


We have (re-)introduced pre-computed subsets (ICD-O 3 morphologies, ICD topographies, SEER groups, "clinical groups"), which greatly reduces page load times, especially for the listing pages. Enjoy!


  • bug fix: online array segmenting didn't work (missing dependency); fixed
  • bug fix: publication search by PMID was broken; fixed
  • added icon notes for mouseover
  • small modifications to array re-plotting


We have added a more refined option for the CNA search with a combination of genes/loci:

  • One can now exclude specific types of CNAs by prepending the type with an exclamation mark. E.g. 17:7512445-7531642:!-1 will exclude all samples with a loss in this locus (TP53).
  • This can be helpful to look for specific breakpoints, too. E.g.

    [MYC] 8:128816862-128822853:!1

    ... should be helpful in identifying samples with a breakpoint around the MYC locus. This, however, mostly makes sense when using arrayMap and following up by exploring the plots themselves. The following example (breast cancer sample GSM217412 from PMID 19336569) shows a complex re-arrangement around the MYC locus...
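
The locus syntax in the two examples above can be read as chromosome:start-end:type, with an optional exclamation mark in front of the type to exclude it. A minimal Python sketch of how such a string might be taken apart follows. The names `CnaQuery` and `parse_cna_query` are hypothetical, and this is an illustration only, not the parser used by the site.

```python
import re
from typing import NamedTuple


class CnaQuery(NamedTuple):
    chromosome: str
    start: int
    end: int
    cna_type: int   # 1 = gain, -1 = loss, as in the examples above
    exclude: bool   # True when the type was prefixed with "!"


# chromosome : start - end : optional "!" followed by the CNA type
_PATTERN = re.compile(r"^(\w+):(\d+)-(\d+):(!?)(-?\d+)$")


def parse_cna_query(text: str) -> CnaQuery:
    """Split a 'chr:start-end:[!]type' string into its parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid locus query: {text!r}")
    chromosome, start, end, bang, cna_type = match.groups()
    return CnaQuery(chromosome, int(start), int(end), int(cna_type), bang == "!")
```

With this sketch, `parse_cna_query("8:128816862-128822853:!1")` yields a query on chromosome 8 with `cna_type` 1 and `exclude` set, i.e. "no gain at this locus".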


  • streamlined sample selection: for single publications & subsets, the parameters for further sample selection are now displayed on the same page
  • bug fix: for some weeks, when searching for samples with a stemmed search (e.g. using "814" in ICD morphology), sub-selections of matching codes in the list selector were not recognised


  • improved the Kaplan-Meier plots by adding a max. time option (i.e., fixing the max. observation time plotted, to compare different datasets); also, colors for the region specific KM-plots now match the selected gain/loss color scheme


  • changed the way images are presented: in most cases, PNG data is now embedded into the HTML page, and clicking opens an SVG version (if available)
  • the clustering options are now more sensible, with the selection of Eisen-cluster vs. R hclust being applied to both sample and frequency profile clustering


  • added cluster method selector


  • fixed the statistics display for publications (aCGH/cCGH summary table) and added the "select all" toggle for the publications list


  • "Search Publications" now presents only a search window (no automatic listing of all publications)
  • moved external links to a separate listing


Another example for plotting of arrayMap data, here using a glioblastoma sample (associated GEO-ID GSM491153) to highlight some focal copy number alterations:


  • improved the color mixing model for the "heat map-like" gain/loss intensity strips


  • added a consistent option to plot markers and selected regions in all linear plot types; one can now add genes or custom labels to array plots, histograms, matrix plots ... see (garish) example below

last edit 2014-03-12

2012 Paper: Specific Genomic Regions Are Differentially Affected by Copy Number Alterations ...

Group publication

Specific Genomic Regions Are Differentially Affected by Copy Number Alterations across Distinct Cancer Types, in Aggregated Cytogenetic Data

Nitin Kumar, Haoyang Cai, Christian von Mering, Michael Baudis

PLoS One (2012) 7: e43689. PubMed

22937079.pdf [PDF]

last edit 2014-02-17

2012 Publication - arrayMap oncogenomic database announcement

The PDF of our PLoS ONE publication announcing the arrayMap cancer genome database.

22629346.pdf [PDF]

last edit 2013-01-31

2012 Publication: Specific Genomic Regions Are Differentially Affected by Copy Number Alterations ...

Specific Genomic Regions Are Differentially Affected by Copy Number Alterations across Distinct Cancer Types, in Aggregated Cytogenetic Data

Nitin Kumar, Haoyang Cai, Christian von Mering, Michael Baudis

PLoS One (2012) 7: e43689. [PubMed]

22937079.pdf [PDF]

last edit 2013-01-31

30449 cancer genome profiles from 1001 publications now included in Progenetix

By adding some new data sets and annotating some more of the evaluated data from arrayMap, Progenetix now has more than 30000 cancer genome copy number profiles, from 1001 publications. The data consists of 20531 chromosomal CGH and 10024 genomic array profiles, and covers 364 diagnostic entities according to ICD-O 3.

last edit 2013-03-01

Ni Ai

PhD student in the group of Michael Baudis at the Institute of Molecular Life Sciences, UZH. Method development for mining of oncogenomic data sets. Involved in the development and maintenance of the Progenetix and arrayMap cancer genome databases as well as the CNHL and DIPG projects.

Y55 K14 | ni.ai@uzh.ch

last edit 2013-08-12

Haoyang Cai

Postdoc in the group of Michael Baudis at the Institute of Molecular Life Sciences, UZH. Development and maintenance of the Progenetix and arrayMap cancer genome databases.

Y55 K14 | haoyang.cai@imls.uzh.ch

last edit 2013-08-02

Saumya Gupta

PhD student in the group of Michael Baudis at the Institute of Molecular Life Sciences, UZH. Meta-analysis of oncogenomic data sets, with a focus on gene specific aberrations. Involved in the development and maintenance of the Progenetix and arrayMap cancer genome databases as well as the CNHL and DIPG projects.

Y55 K14 | saumya.gupta@imls.uzh.ch

last edit 2013-08-02

Prisni Rath

PhD student in the group of Michael Baudis at the Institute of Molecular Life Sciences, UZH. Collection and analysis of cancer genome data, with a focus on genomic array data sets and data structures. Development and maintenance of the Progenetix and arrayMap cancer genome databases; contributions to the CNHL and DIPG projects. Recipient of a fellowship from the Swiss Institute of Bioinformatics.

Y55 K14 | prisni.rath@uzh.ch

last edit 2013-10-02

@progenetix | arrayMap Changes (- 2013-05-22)


  • bug fix: fixing lack of clustering for CNA frequency profiles in the analysis section
  • removed "Series Search" from the arrayMap side bar; kind of confusing - just search for the samples & select the series


  • introduced a method to combine sample annotations and segmentation files for user data processing (see "FAQ & GUIDE")
  • fixed some array plot presentation and replotting problems


  • consolidation of script names - again, don't use deep links (besides for "api.cgi?...")
  • moved the remaining sample selection options (random sample number, segments number, age range) to the sample selection page, leaving the pre-analysis page (now "prepare.cgi") for plotting/grouping options
  • fixed the KM-style survival plots


  • re-factoring of the cytobands plotting for histograms and heatmaps; this also fixes missing histogram tiles
  • analysis output page: the circular histogram/connections plot and group specific histograms are now all available as both SVG and PNG image files


Some changes to the plotting options:

  • the circular plot is now added as a default; and connections are drawn in for <= 30 samples (subject to change)
  • one can now mark up multiple genes (or other loci of interest), for all plot types


  • added option to create custom analysis groups based on text match values
  • rewritten circular plot code


  • copied data for PMIDs 17327916, 17311676, 18506749 and 18246049 from arrayMap to Progenetix


  • bug fix: gene selector was broken for about a week; fixed


  • In many places, images are now converted server side to PNG data streams and embedded into the web pages. This will substantially decrease web data traffic and page download times. Fully linked SVG images (including region links etc.) are still available through the analysis pipeline.


  • data fix: PMID 18160781 had missing loss values (due to irregular character encoding); fixed, thanks to Emanuela Felley-Bosco for the note!


  • moved the region filter from the analysis to the sample selection page
  • added a "mark region" option to the analysis page: one can now highlight a genome region in histograms and matrix plots


  • added "select all" option to entity lists
  • implemented first version of sample-to-entity match score
  • added single sample annotation input field to "User File Processing"; i.e. one can now type in CNA data for a single case, and have this visualised and similar cases listed
  • added per sample CNA visualisation to the samples details listings (currently for up to 100 samples)
  • added direct access to sample details listing to the subsets pages


  • added abstract search to the publication search page


  • introduction of a matching function for similar cases by CNA profile, accessible through the sample details pages of both Progenetix and arrayMap


  • Introduction of SEER groups


The database now contains the copy number status for different interval sizes (e.g. 1Mb). With this, users can now create their own data plots (histograms etc.) using more than 10000 cancer copy number profiles at high resolution. The options here are still being tested and improved - comments welcome!


  • added a new export file format "ANNOTATED SEGMENTS FILE", which uses the first columns for standard segment annotation, followed by some diagnostic and clinical data; i.e., the information for a case is repeated for each segment:
    GSM255090   22  25063244    25193559    1   NA  C50 8500/3  breast  Infiltrating duct carcinoma, NOS    Carcinomas: breast ca.  NA  1   51  0.58  
    GSM255090   22  25368299    48899534    -1  NA  C50 8500/3  breast  Infiltrating duct carcinoma, NOS    Carcinomas: breast ca.  NA  1   51  0.58  
    GSM255091   1   2224111 30146401    -1  NA  C50 8500/3  breast  Infiltrating duct carcinoma, NOS    Carcinomas: breast ca.  NA  0   72  0.54  
    GSM255091   1   35418712    37555461    1   NA  C50 8500/3  breast  Infiltrating duct carcinoma, NOS    Carcinomas: breast ca.  NA  0   72  0.54  
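
The export format above can be read back with a few lines of Python. The sketch below assumes the file is tab-delimited and that the first five columns are sample ID, chromosome, start, end and value, as in the sample rows; the names of the remaining diagnostic and clinical columns are not given here, so they are kept as an unnamed list. `Segment` and `parse_annotated_segments` are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    sample_id: str
    chromosome: str
    start: int
    end: int
    value: float
    annotations: List[str]  # remaining diagnostic/clinical columns, left unnamed


def parse_annotated_segments(text: str) -> Dict[str, List[Segment]]:
    """Group the segments of an annotated segments file by sample ID."""
    by_sample: Dict[str, List[Segment]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        fields = [field.strip() for field in line.strip().split("\t")]
        sample_id, chromosome, start, end, value = fields[:5]
        segment = Segment(sample_id, chromosome, int(start), int(end),
                          float(value), fields[5:])
        by_sample.setdefault(sample_id, []).append(segment)
    return by_sample
```

Since the case information is repeated for each segment, the annotations of any one segment of a sample can serve as that sample's annotation.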


  • added gene selection for region specific replotting of array data


  • the gene database has been changed to the last version of the complete (HUGO names only) Ensembl gene list for HG18; previously, only a subset of "cancer related genes" was offered in the gene selection search fields


  • some interface and form elements have been streamlined (e.g. less commonly used selector fields, sample selection options)
  • some common options are now displayed only if activated (e.g. "mouse over" to see all files available for download)
  • icon quality has been enhanced for all but the details pages


  • New: All pre-generated histogram and ideogram plots are now produced based on a 1Mb matrix, with a 500Kb minimum size filter to remove CNV/platform dependent background from some high resolution array platforms. The unfiltered data can still be visualized through the standard analysis procedures.
  • Bug fix: Interactive segment size filtering so far only worked for region specific queries, but not as a general filter (see above). This has been fixed; a minimum segment size in the visualization options now will remove all smaller segments.
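
The effect of a minimum segment size filter can be illustrated with a short Python sketch. It assumes that segment size is simply end minus start and that segments exactly at the limit are kept; both are assumptions, and `filter_by_min_size` is a hypothetical name, not the function used on the site.

```python
from typing import List, Tuple

# (chromosome, start, end, value)
Segment = Tuple[str, int, int, int]


def filter_by_min_size(segments: List[Segment],
                       min_size: int = 500_000) -> List[Segment]:
    """Keep only segments whose length in bases is at least min_size."""
    return [seg for seg in segments if seg[2] - seg[1] >= min_size]


# Two chromosome 22 segments from sample GSM255090 (see the export example):
segments = [
    ("22", 25063244, 25193559, 1),    # about 130 kb, removed by the 500 kb filter
    ("22", 25368299, 48899534, -1),   # about 23.5 Mb, kept
]
filtered = filter_by_min_size(segments)
```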


  • NEW: change log; that is what is shown here
  • FEATURE: The interval selector now has options to include the p-arms of acrocentric chromosomes (though the data itself there may be incompletely annotated!). Feature requested by Melody Lam.
last edit 2013-09-23

Vignette: Exploring focal gene hits in Progenetix and arrayMap

When exploring a candidate oncogene, one of the interesting questions is the frequency of copy number abnormalities involving the gene's locus in different cancer types. While Progenetix offers a powerful platform to detect cancers of interest, the specifics of those changes can be explored with the help of arrayMap.

Example: Focal gain/amplification involving the MYCN locus

1 Go to "Gene CNA Frequencies" in Progenetix

2 Start to type the gene's name and select the correct one

3 Options

  • select "More Options" and change the region size to 5000 (kb)
  • change the region type from "9" to "1" (only gains)

4 Receive the scores

  • for the different subsets, the relative number (percentage) of samples with the hit is shown
  • a "score" value weighs this by the overall genome complexity in the subset (i.e. higher complexity => reduced score)

(... to be continued)

last edit 2013-01-23

Vignette: Prepare annotated files for upload and processing

This is a workflow for processing your own data, including annotation fields (e.g. diagnoses, clinical data) for group visualisation.

  • process your samples (e.g. from segmentation file)
  • click on "Download Files ..." to show the options, and select "PROGENETIX TAB FILE"
  • open this in a spreadsheet application (e.g. OpenOffice or LibreOffice; or in a text editor and copy the content into a spreadsheet)
  • fill in the missing data
  • save as a tab-delimited text file (preferably Unix line feed endings, fields not quoted)
  • reload your file and select the "tab delimited" format; use the correct aCGH or cCGH assignment
  • process ...
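
For those who prepare the annotated file with a script instead of a spreadsheet, the "save" step can be sketched in Python as below: tab-delimited, Unix line endings, fields not quoted. The function names and the column names in the usage note are purely illustrative; the actual columns are those of the downloaded "PROGENETIX TAB FILE".

```python
import csv
import io
from typing import Iterable, Sequence


def to_tab_text(rows: Iterable[Sequence[str]]) -> str:
    """Render rows as tab-delimited text with "\n" line endings and no quoting."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,   # never wrap fields in quotes
        lineterminator="\n",      # Unix line feed endings
    )
    # With QUOTE_NONE and no escape character, a field that itself contains
    # a tab, a quote or a line break raises csv.Error instead of being quoted.
    writer.writerows(rows)
    return buffer.getvalue()


def save_tab_file(rows: Iterable[Sequence[str]], path: str) -> None:
    # newline="" stops Python from translating "\n" on Windows.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(to_tab_text(rows))
```

For example, `to_tab_text([["sample", "diagnosis"], ["S1", "breast ca."]])` gives two lines with a single tab between the fields.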

last edit 2013-05-08



New User Guide & API

The API documentation for Progenetix and arrayMap has now been folded into the Progenetix Wiki. The information below is to be considered deprecated (though most of it may remain stable).


New option: Publication based sample map from article data using "collection=publications"; e.g.:


While sample mapping described below is based on samples with data in Progenetix or arrayMap, the use of "collection=publications" will query the publication database, including also aCGH/cCGH publications for which no sample data is available, but for which we have extracted supposed sample numbers from the articles' texts.

Possible publication search tags:

  • author_m (multiple, "OR" treated; e.g. author_m=Lichter&author_m=Beroukhim)
  • text_m (multiple, "OR" treated; e.g. text_m=medulloblast&text_m=neuroblast)
  • pmid_m (multiple, "OR" treated; e.g. pmid_m=123&pmid_m=678)
  • techniques_m (though multiple, only 2 options; aCGH, cCGH; and empty=both)
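
The "multiple" tags above are simply repeated in the query string. A minimal Python sketch of assembling such a query follows. The base URL is a placeholder (the real address is not given in this text), and `publication_query` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Placeholder only: the real base URL is not given in this text.
BASE_URL = "https://example.org/api.cgi"


def publication_query(authors=(), texts=(), pmids=(), techniques=()):
    """Build a publication search URL; repeated tags are "OR" treated."""
    params = [("collection", "publications")]
    params += [("author_m", author) for author in authors]
    params += [("text_m", text) for text in texts]
    params += [("pmid_m", pmid) for pmid in pmids]
    params += [("techniques_m", technique) for technique in techniques]
    return BASE_URL + "?" + urlencode(params)


url = publication_query(authors=["Lichter", "Beroukhim"])
```

Here `url` ends in `collection=publications&author_m=Lichter&author_m=Beroukhim`, matching the pattern of the author example in the list.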


New option: sample map; e.g.:

... provides a Google Maps interface showing the location of submitters (corresponding author's institutions) of the publications including samples for the query.

This will be based on data included in the database, not on all published a/cCGH samples!

Please note in the example the relatively large amount of data from East Asia; ICDO3=8170 is hepatocellular ca.


New option: samplematrix; e.g.:

... will plot all individual samples (sorted by their ID; as of now, for clustering etc., one has to go through the browser ...).


Change: Standard plot format is now PNG. SVG images can be called by adding "&imgFormat=svg" to the call.



We now provide real-time copy number frequency plots, for both our Progenetix and arrayMap collections. At this time, the API calls will deliver SVG images only; they are the qualitatively best solution (scalable, clickable, embeddable ...), but may fail in ancient browsers - please use recent editions of Safari/Firefox/Chrome etc.

The link structure is shown below. We'll try to keep this stable; however, please let us know if you implement these links in production environments. And please follow our Twitter feed @progenetix.

Since the plots are generated in real-time and are rather complex (i.e. >1MB for a histoplot with 1Mb resolution), it may take some seconds until the image is returned & interpreted.

The base constructor starts with




... followed by one of the required base parameters

  • ICDO3=nnnn/n
  • PMID=nnnnnnnn
  • SERIESID=xxxxxxxxxx

Please note that the keys (ICDO3 ...) are all CAPS, and that the values have to be full matches to existing parameters in Progenetix or arrayMap.

Scope: Data is queried in the scope of either the Progenetix or arrayMap collection; the default is Progenetix, except for SERIESID queries, which default to arrayMap.

Correct minimal query examples would be:

Plot options

The standard return will be a histogram of genomic gains/losses (chromosomes 1-22) in the selected dataset, in the format of an SVG vector plot. Other options can be chosen by adding query parameters such as "plot":

  • adding "&plot=ideogram" will produce CNA frequencies in a standard chromosomal ideogram arrangement
  • adding "&plot=chr8" (with "8" being one of the chromosomes) will just deliver this chromosome in an upright gain/loss frequency plot - basically a cut-out from the histogram
  • adding "&plotLinks=1" will produce an SVG, in which each interval is linked to the UCSC genome browser; however, the image size will increase dramatically (for a histoplot from ~250kb to 1.5Mb)
  • adding "&chr2plot=8,11" to the histoplot (or without plot selection) will produce a histoplot of all the comma separated chromosomes; if fewer than 3 chromosomes are given, the image will default to the "linked" version
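
The way one required base parameter combines with the optional plot parameters can be sketched in Python as follows. The base URL is a placeholder, since the real base constructor is not reproduced in this text, and `plot_url` is a hypothetical helper; only the parameter names (ICDO3, PMID, SERIESID, plot, plotLinks, chr2plot, imgFormat) come from the description above.

```python
from urllib.parse import urlencode

# Placeholder only: the real base constructor is not reproduced in this text.
BASE_URL = "https://example.org/api.cgi"

BASE_KEYS = ("ICDO3", "PMID", "SERIESID")  # keys are all CAPS


def plot_url(key: str, value: str, **options: str) -> str:
    """Combine one required base parameter with optional plot parameters."""
    if key not in BASE_KEYS:
        raise ValueError(f"base parameter must be one of {BASE_KEYS}")
    params = {key: value, **options}
    # Keep "/" (ICD-O codes) and "," (chromosome lists) readable in the URL.
    return BASE_URL + "?" + urlencode(params, safe="/,")


histogram = plot_url("ICDO3", "8500/3")
ideogram = plot_url("ICDO3", "8500/3", plot="ideogram")
two_chromosomes = plot_url("PMID", "19336569", chr2plot="8,11")
```

The ICD-O code and PMID used here are values that appear elsewhere on this page; whether they return data depends on the collection queried.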


last edit 2014-04-09

Array-based analyses in the current technological landscape

We have started a new blog, with an article discussing some aspects of array based molecular analyses and their utility in the current technological landscape. Enjoy! Comments are welcome...

last edit 2014-03-20

arrayMap and Progenetix interface update

The navigation icons of the arrayMap and Progenetix sites have been updated. This is mostly a cosmetic change, but some of the linking has been streamlined, too.

last edit 2012-05-06

arrayMap feature update(s)

Over the last weeks, we have introduced a number of new search/ordering features to arrayMap. Some of those mimic functions previously implemented in Progenetix. Overall, the highlights are:

  • ICD entity aggregation: all ICD-O entities with their corresponding samples
  • ICD locus aggregation: all tumor loci with their corresponding samples
  • Clinical group aggregation: clinical super-entities (e.g. "breast ca.": all carcinoma types with locus breast) with their samples
  • Publication aggregation: all publications with samples in arrayMap

In contrast to Progenetix, we do not offer precomputed SCNA histograms. However, users can generate them on the fly, but should consider the specific challenges in doing so (e.g. noise background in frequency calculations).

last edit 2012-04-12

arrayMap featured at the Journal of the National Cancer Institute

A news feature by Mike Martin discusses our arrayMap resource in a recent issue of the Journal of the National Cancer Institute (JNCI).

last edit 2012-08-29

arrayMap File Names

We are unifying the structure of the arrayMap file naming scheme, to allow for multiple file versions. The order of the different tags doesn't really matter (I suggest using the base file descriptor as first element, though).


  • segments
    • any type of segments file following the "ID -tab- chro -tab- segstart -tab- segend -tab- value" (and optional "-tab- probenumber") order
  • probes
    • any type of file describing probe specific values "ID -tab- chro -tab- basepos -tab- value" order


  • calibrated
    • if values have been adjusted to account for the specific signal dynamics, or if a baseline correction has been performed
  • called
    • if status values are used instead of the original segments value
  • filtered
    • for segments files; after threshold and/or noise and/or CNV filtering
  • given
    • data as provided from the source; used for segments files
  • hg18
  • hg19


Example                       Explanation
segments.tab                  minimal canonical segments file name; this will be processed depending on calling parameters
segments,filtered.tab         segments file, but only having values accepted after thresholding etc.
segments,filtered,called.tab  segments file, but with values just being the calling indicators (at the moment "1" and "-1")
segments,called,hg19.tab      as above, but specifying the genome edition
segments,given.tab            segments values as provided from a source; may have been remapped by us to another genome edition
segments,given,called.tab     interpreted gain / loss regions as provided
probes,calibrated.tab         future project: calibration of probe dynamics (i.e. sample specific, so that presumptive single gains center around 0.25 ...)
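
Because the tags are comma separated and their order does not matter, a file name in this scheme can be taken apart mechanically. The Python sketch below assumes the ".tab" extension and the tag vocabulary listed above; `ArrayFileName` and `parse_file_name` are hypothetical names and not part of the arrayMap code.

```python
from typing import FrozenSet, NamedTuple, Optional

BASE_TYPES = {"segments", "probes"}
STATUS_TAGS = {"calibrated", "called", "filtered", "given"}
GENOME_EDITIONS = {"hg18", "hg19"}


class ArrayFileName(NamedTuple):
    base_type: str
    status: FrozenSet[str]
    genome: Optional[str]   # None when no genome edition tag is present


def parse_file_name(name: str) -> ArrayFileName:
    """Split e.g. 'segments,called,hg19.tab' into its tags, in any order."""
    stem, _, extension = name.rpartition(".")
    if extension != "tab" or not stem:
        raise ValueError(f"expected a '.tab' file name: {name!r}")
    tags = stem.split(",")
    base = [tag for tag in tags if tag in BASE_TYPES]
    genomes = [tag for tag in tags if tag in GENOME_EDITIONS]
    known = BASE_TYPES | STATUS_TAGS | GENOME_EDITIONS
    unknown = [tag for tag in tags if tag not in known]
    if len(base) != 1 or len(genomes) > 1 or unknown:
        raise ValueError(f"cannot interpret file name: {name!r}")
    status = frozenset(tag for tag in tags if tag in STATUS_TAGS)
    return ArrayFileName(base[0], status, genomes[0] if genomes else None)
```

Since tag order is ignored, `parse_file_name("hg19,called,segments.tab")` gives the same result as `parse_file_name("segments,called,hg19.tab")`.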
last edit 2012-06-28

arrayMap manuscript accepted at PLoS ONE

Cai, H., N. Kumar, and M. Baudis. arrayMap: A Reference Resource for Genomic Copy Number Imbalances in Human Malignancies. PLoS ONE 2012: accepted.

The original version of the manuscript is available at

arXiv quantitative biology

We will announce the final version as soon as it becomes available.

last edit 2012-04-17

arrayPlotter feature update

The arrayPlotter module underwent some enhancements:

  • BUG FIX: array segments without probe number (e.g. from GP annotated data) are no longer removed when re-plotting
  • NEW: Baseline correction. This is useful for re-plotting arrays which have a shift of the "normal" probe value away from 0. This correction is automatically applied to the thresholds, too (i.e. a BLC of 0.5 with GTH 0.15 and LTH -0.15 will call original values of -0.4 as gain and -0.6 as loss).
  • NEW: More parameters are now shifted towards the "PLOT FACTORS" field, for free text editing. Be careful, though ...


last edit 2012-05-25

BIO612: Haoyang Cai: Focal Copy Number Changes in Cancer and Genome Structure

last edit 2013-01-16

Browser Compatibility

last edit 2012-04-12


last edit 2013-12-18

CNA sample profile similarity search

last edit 2012-10-25

CNZ Zurich Joint Cancer meeting: Michael Baudis - Genomic copy number aberrations in cancer - Patterns, Targets, Resources

last edit 2013-06-17

Computer hardware

last edit 2012-06-14

Data Download

last edit 2012-04-12

DIPG data collaboration meeting in Zurich

last edit 2013-04-05

Filemaker database fields

last edit 2012-08-27

First copy number profiling data from methylation arrays added

last edit 2013-04-27

Haoyang Cai - a new arrayMap PhD from the Baudis group

last edit 2013-11-08

Haoyang Cai presenting chromothripsis data at Cancer Network Zurich retreat

last edit 2013-04-11

Happy New Year

last edit 2012-12-30

Improved search options, now with autocomplete

last edit 2013-11-11

Interface enhancements

last edit 2012-07-03

Internal documentation how-to

last edit 2012-06-14

Latex tips & tricks

last edit 2012-06-14


last edit 2012-07-03

Link: Atlas of Cytogenetics in Hematology/Oncology

last edit 2013-09-25

Link: CompBio Zurich

last edit 2013-09-25

Link: IntOGen

last edit 2013-09-25

Link: NCI SKY/M-FISH and CGH Database

last edit 2013-09-25


last edit 2013-11-08

More than 2500 publications registered in Progenetix

last edit 2014-04-30

New interval options

last edit 2012-06-05

New option for filtering focal copy number aberrations

last edit 2013-11-08

new post

last edit 2013-11-01

New Progenetix article published at PLoS ONE

last edit 2012-08-27

New Progenetix server hardware

last edit 2014-02-18

New publication - "Progenetix: 12 years of oncogenomic data curation"

last edit 2013-11-13

Nitin Kumar - a new @progenetix PhD

last edit 2012-07-10

Our arrayMap based "Chromothripsis-like patterns" article is out

last edit 2014-02-04

Perl script for reading sample data from mongoDB, performing some operations and writing output files

last edit 2012-06-14

Principle of CGH

last edit 2013-01-22

Progenetix & arrayMap RSS feed

last edit 2013-01-23

Progenetix and arrayMap API for cancer genome copy number aberration frequency profiles

last edit 2013-02-07

Progenetix and arrayMap cancer genome database server errors

last edit 2012-12-27

Progenetix and arrayMap status update

last edit 2012-12-28


last edit 2012-03-23

Renal cell carcinoma paper published at BMC Cancer

last edit 2012-07-27

Search Samples

last edit 2012-03-26

SEER categories in Progenetix and arrayMap

last edit 2012-10-22

Spring collaboration meeting in Timisoara

last edit 2013-03-26

Thesis Defence Haoyang Cai: Characterization of Cancer Genomes through Systematic Analyses of Oncogenomic Data Assemblies

last edit 2013-11-08

Website and database internal change notes

last edit 2012-12-05

www.compbio.ch - The new web address for @compbiozurich

last edit 2013-01-11
