Vignette: Exploring focal gene hits in Progenetix and arrayMap

When exploring a candidate oncogene, one of the interesting questions is the frequency of copy number abnormalities involving the gene's locus in different cancer types. While Progenetix offers a powerful platform to detect cancers of interest, the specifics of those changes can be explored with the help of arrayMap.

Example: Focal gain/amplification involving the MYCN locus

1 Go to "Gene CNA Frequencies" in Progenetix

2 Start to type the gene's name and select the correct one

3 Options

  • select "More Options" and change the region size to 5000 (kb)
  • change the region type from "9" to "1" (only gains)

4 Receive the scores

  • for the different subsets, the relative number (percentage) of samples with the hit is shown
  • a "score" value weighs this by the overall genome complexity in the subset (i.e. higher complexity => reduced score)

(... to be continued)

last edit 2013-01-23
Vignette: Prepare annotated files for upload and processing

This is a workflow for processing your own data, including annotation fields (e.g. diagnoses, clinical data) for group visualisation.

  • process your samples (e.g. from segmentation file)
  • click on "Download Files ..." to show the options, and select "PROGENETIX TAB FILE"
  • open this in a spreadsheet software (e.g. OpenOffice or LibreOffice; or in a text editor and copy the content into a spreadsheet)
  • fill in the missing data
  • save as a tab-delimited text file (preferably Unix line feed endings, fields not quoted)
  • reload your file and select the "tab delimited" format; use the correct aCGH or cCGH assignment
  • process ...
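
If the annotated table is produced by a script rather than a spreadsheet, the same file properties (tab-delimited, Unix line feeds, unquoted fields) can be generated directly. The following is a minimal sketch; the column names "UID" and "DIAGNOSIS" are placeholders and not necessarily those of the downloaded Progenetix tab file.

```python
def to_tab_delimited(rows, fieldnames):
    """Return rows as tab-delimited text with LF line endings and unquoted fields."""
    lines = ["\t".join(fieldnames)]
    for row in rows:
        # Collapse tabs and line breaks inside a value, since they would break the table.
        values = [" ".join(str(row.get(name, "")).split()) for name in fieldnames]
        lines.append("\t".join(values))
    return "\n".join(lines) + "\n"


def write_tab_file(path, rows, fieldnames):
    # newline="\n" keeps Unix line feeds on every platform.
    with open(path, "w", newline="\n", encoding="utf-8") as handle:
        handle.write(to_tab_delimited(rows, fieldnames))


# Hypothetical column names, for illustration only.
example = to_tab_delimited(
    [{"UID": "s1", "DIAGNOSIS": "renal\tcell ca."}],
    ["UID", "DIAGNOSIS"],
)
```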

last edit 2013-05-08
New User Guide & API

The API documentation for Progenetix and arrayMap has now been folded into the Progenetix Wiki. The information below is to be considered deprecated (though most of it may remain stable).


New option: Publication based sample map from article data using "collection=publications"; e.g.:

While sample mapping described below is based on samples with data in Progenetix or arrayMap, the use of "collection=publications" will query the publication database, including also aCGH/cCGH publications for which no sample data is available, but for which we have extracted supposed sample numbers from the articles' texts.

Possible publication search tags:

  • author_m (multiple, "OR" treated; e.g. author_m=Lichter&author_m=Beroukhim)
  • text_m (multiple, "OR" treated; e.g. text_m=medulloblast&text_m=neuroblast)
  • pmid_m (multiple, "OR" treated; e.g. pmid_m=123&pmid_m=678)
  • techniques_m (though multiple, only 2 options; aCGH, cCGH; and empty=both)
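
A query string with repeated, "OR" treated tags can be assembled as sketched below. The base address is a placeholder (the real service address is not given here), and the function name is made up for illustration; only the parameter names come from the list above.

```python
from urllib.parse import urlencode

# Placeholder only: substitute the actual service address.
BASE_URL = "https://example.org/api/"

ALLOWED_TECHNIQUES = {"aCGH", "cCGH"}


def publication_query(authors=(), texts=(), pmids=(), techniques=()):
    """Build a publication query; repeated tags are treated as "OR" by the service."""
    unknown = set(techniques) - ALLOWED_TECHNIQUES
    if unknown:
        raise ValueError("unknown technique(s): " + ", ".join(sorted(unknown)))
    params = [("collection", "publications")]
    params += [("author_m", a) for a in authors]
    params += [("text_m", t) for t in texts]
    params += [("pmid_m", str(p)) for p in pmids]
    # An empty techniques list means "both", so nothing is added in that case.
    params += [("techniques_m", t) for t in techniques]
    return BASE_URL + "?" + urlencode(params)


url = publication_query(authors=["Lichter", "Beroukhim"], techniques=["aCGH"])
```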


New option: sample map; e.g.:

... provides a Google Maps interface showing the location of submitters (corresponding author's institutions) of the publications including samples for the query.

This will be based on data included in the database, not on all published a/cCGH samples!

Please note in the example the relatively large amount of data from East Asia; ICDO3=8170 is hepatocellular ca.


New option: samplematrix; e.g.:

... will plot all individual samples (sorted by their ID; as of now, for clustering etc., one has to go through the browser ...).


Change: Standard plot format is now PNG. SVG images can be called by adding "&imgFormat=svg" to the call.



We now provide real-time copy number frequency plots, for both our Progenetix and arrayMap collections. At this time, the API calls will deliver SVG images only; they are the qualitatively best solution (scalable, clickable, embeddable ...), but may fail in ancient browsers - please use recent editions of Safari/Firefox/Chrome etc.

The link structure is shown below. We'll try to keep this stable; however, please let us know if implementing these links in production environments. And please follow our Twitter feed @progenetix.

Since the plots are generated in real-time and are rather complex (i.e. >1MB for a histoplot with 1Mb resolution), it may take some seconds until the image is returned & interpreted.

The base constructor starts with


... followed by one of the required base parameters

  • ICDO3=nnnn/n
  • PMID=nnnnnnnn
  • SERIESID=xxxxxxxxxx

Please note that the keys (ICDO3 ...) are all CAPS, and that the values have to be full matches to existing parameters in Progenetix or arrayMap.

Scope: Data is queried in the scope of either the Progenetix or the arrayMap collection. The default is Progenetix, except for SERIESID queries, which default to arrayMap.

Correct minimal query examples would be:

Plot options

The standard return will be a histogram of genomic gains/losses (chromosomes 1-22) in the selected dataset, in the format of an SVG vector plot. Other options can be chosen by adding further query parameters:

  • adding "&plot=ideogram" will produce CNA frequencies in a standard chromosomal ideogram arrangement
  • adding "&plot=chr8" (with "8" being one of the chromosomes) will just deliver this chromosome in an upright gain/loss frequency plot - basically a cut-out from the histogram
  • adding "&plotLinks=1" will produce an SVG, in which each interval is linked to the UCSC genome browser; however, the image size will increase dramatically (for a histoplot from ~250 kB to 1.5 MB)
  • adding "&chr2plot=8,11" to the histoplot (or without plot selection) will produce a histoplot of all the comma separated chromosomes; if fewer than 3 chromosomes are given, the image will default to the "linked" version
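
The parameters described above can be combined programmatically. This is a minimal sketch: the base address is a placeholder, the function and argument names are hypothetical, and only the query keys (ICDO3, PMID, SERIESID, plot, chr2plot, plotLinks, imgFormat) are taken from this guide.

```python
from urllib.parse import urlencode

# Placeholder only: substitute the actual base constructor.
BASE_URL = "https://example.org/api/"

# The keys are case sensitive and have to be given in capitals.
BASE_KEYS = {"ICDO3", "PMID", "SERIESID"}


def plot_url(key, value, plot=None, chromosomes=None, links=False, img_format=None):
    """Build a plot call from one required base parameter and optional plot options."""
    if key not in BASE_KEYS:
        raise ValueError("base parameter must be one of " + ", ".join(sorted(BASE_KEYS)))
    params = [(key, value)]
    if plot:
        params.append(("plot", plot))  # e.g. "ideogram" or "chr8"
    if chromosomes:
        params.append(("chr2plot", ",".join(str(c) for c in chromosomes)))
    if links:
        params.append(("plotLinks", "1"))
    if img_format:
        params.append(("imgFormat", img_format))
    # Keep "/" and "," readable, as in the parameter examples above.
    return BASE_URL + "?" + urlencode(params, safe="/,")


ideogram = plot_url("ICDO3", "8170/3", plot="ideogram")
two_chromosomes = plot_url("PMID", "12345678", chromosomes=[8, 11], img_format="svg")
```

The PMID value in the second call is a dummy; as noted above, values have to be full matches to entries existing in Progenetix or arrayMap.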


last edit 2014-04-09
Browser Compatibility

Pages are created dynamically and mostly are being served as XML. Some browsers have problems with the XHTML/XML doctype. For older browsers, all pages are served as HTML, which on the other hand breaks SVG compatibility.

Working browsers for all features are (oldest compatible versions listed):

  • Safari 3
  • Safari iOS
  • Firefox 3
  • Google Chrome
  • Internet Explorer 9

Most other recent browsers (Opera etc.) should be fine, too, but haven't been tested. The basic requirements for full display are:

  • inline SVG (but possibly can be achieved with plug-in)
  • HTML5 canvas support
last edit 2012-04-12
Progenetix: For any use of the Progenetix data, e.g. as a reference for aberration frequencies in a certain locus, it is necessary to cite both the website and the original Bioinformatics publication:

  • Baudis, M., & Cleary, M. L. (2001). Progenetix.net: an online repository for molecular cytogenetic aberration data. Bioinformatics, 17(12), 1228-1229.
  • Progenetix oncogenomic online resource: Baudis, M. (2013)

There is now a publication describing the database's current status. This can be used in addition to or as replacement for the 2001 publication:

  • Cai, H., N. Kumar, N. Ai, S. Gupta, P. Rath, and M. Baudis. Progenetix: 12 years of oncogenomic data curation. Nucleic Acids Res (2013)

In case of citation restrictions, you may just use the Bioinformatics citation, and put the website in the text. A proper citation would look e.g. like:

... according to the Progenetix resource ([1];, copy number ...

... and in the citations:

  • Baudis, M., & Cleary, M. L. (2001). Progenetix.net: an online repository for molecular cytogenetic aberration data. Bioinformatics, 17(12), 1228-1229.

arrayMap: For arrayMap data, the same rules apply: Citation of the article and the website:

  • Cai, H., Kumar, N., & Baudis, M. 2012. arrayMap: A Reference Resource for Genomic Copy Number Imbalances in Human Malignancies. PLoS One 7(5), e36944.
  • arrayMap: Genomic arrays for copy number profiling in human cancer. Baudis, M. (2012)

last edit 2013-12-18
CNA size filtering

New: All pre-generated histogram and ideogram plots are now produced based on a 1Mb matrix, with a 500Kb minimum size filter to remove CNV/platform dependent background from some high resolution array platforms. The unfiltered data can still be visualized through the standard analysis procedures.

Bug fix: Interactive segment size filtering so far only worked for region specific queries, but not as a general filter (see above). This has been fixed; a minimum segment size in the visualization options now will remove all smaller segments.
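
The effect of a minimum size filter can be reproduced on downloaded segment data. This is only a sketch of the principle, not the site's implementation; it assumes segments with inclusive "basestart"/"basestop" coordinates as in the segment download format.

```python
def filter_segments(segments, min_size_kb=500):
    """Drop all segments shorter than min_size_kb (coordinates treated as inclusive)."""
    min_bases = min_size_kb * 1000
    return [
        seg for seg in segments
        if seg["basestop"] - seg["basestart"] + 1 >= min_bases
    ]


segments = [
    {"chro": "1", "basestart": 0, "basestop": 124299999, "segvalue": -1},
    {"chro": "1", "basestart": 1000, "basestop": 200999, "segvalue": 1},  # 200 kb
    {"chro": "2", "basestart": 0, "basestop": 499999, "segvalue": 1},     # exactly 500 kb
]
kept = filter_segments(segments)
```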

last edit 2012-06-13
Data Download

Data files can be downloaded after having performed a database search or data analysis procedure. An example download box is shown below:

The corresponding file formats are:


This is a standard JSON file structure with each line being a sample entry. You can read it e.g. into a list in Perl:

use strict;
use warnings;
use JSON;

my $json = JSON->new->relaxed(1);
my $file = "myPathTo/progenetix.json";
# stop here if the file cannot be opened
open(my $fh, '<', $file) or die "No file $file: $!";
my @filecontent = <$fh>;    # one JSON encoded sample per line
close $fh;
chomp @filecontent;
my @data;
foreach my $line (@filecontent) {
    push(@data, $json->decode($line));
}


This is a tab-delimited text file containing most of the data fields. CNA segments are concatenated in one entry:



CNA segment information saved as a tab-delimited list, including the sample UID in the first column:

sampleID    chro    basestart   basestop    segvalue    probes
GIST-ass-15 1   0   124299999   -1  NA
GIST-ass-16 1   142400000   149599999   1   NA

Depending on the active page, the value may be the original log2 value from an array or more commonly the status marker. "Probes" will only display a value when plotting array specific data.
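
A segment file in this layout can be read into per-sample lists as sketched below. The column names are those of the example above; treating "NA" in the "probes" column as a missing value is an assumption.

```python
import csv
import io
from collections import defaultdict


def read_segments(handle):
    """Group the rows of a tab-delimited segment file by sample ID."""
    by_sample = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        by_sample[row["sampleID"]].append({
            "chro": row["chro"],
            "basestart": int(row["basestart"]),
            "basestop": int(row["basestop"]),
            # either a status marker (1, -1) or an original log2 value
            "segvalue": float(row["segvalue"]),
            "probes": None if row["probes"] == "NA" else int(row["probes"]),
        })
    return dict(by_sample)


example = (
    "sampleID\tchro\tbasestart\tbasestop\tsegvalue\tprobes\n"
    "GIST-ass-15\t1\t0\t124299999\t-1\tNA\n"
    "GIST-ass-16\t1\t142400000\t149599999\t1\tNA\n"
)
segments = read_segments(io.StringIO(example))
```

For a downloaded file, pass an open file handle instead of the in-memory example.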

last edit 2012-04-12
Improved search options, now with autocomplete

We have added autocomplete options to sample and publication search, and integrated a search across multiple fields (authors, title, abstract, PMID) for the publication search.


last edit 2013-11-11
LaTeX tips & tricks

Method for getting line numbers into a LaTeX document

* \usepackage{lineno}
* \linenumbers % this would be inserted before the start of the text
last edit 2012-06-14
Access to the site and data downloads are free for academic users.

Any commercial use (e.g. using the data for target validation, including Progenetix or arrayMap data into analysis systems) is dependent on a license granted through Michael Baudis, and managed through the University of Zurich.

last edit 2012-07-03
Linking - see API

Linking is now deprecated - please follow the API information!

Most of the information below is outdated or may become so soon.

Links to Progenetix should always use the base "", never specific IP-addresses (which are bound to change). Also, many pages/image files may become moved or renamed. Below are some notes.

1. Locus Score

 lists entities with a gain involving 1p36 
 1=gain, -1=loss, 9=any
 one can use GP annotation (e.g. chr2:15,998,134-16,004,580:1 for MYCN)

Values/ranking is calculated on the fly per case (we could use a score matrix, but then we would be limited to defined intervals).
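
The position annotation used for locus queries above (chromosome, start, stop and the type code 1/-1/9) can be split into its parts as follows. The parser is a sketch based only on the single example given here; other accepted spellings are not covered.

```python
import re

LOCUS_RE = re.compile(r"^chr(\w+):([\d,]+)-([\d,]+):(-1|1|9)$")
CNA_TYPES = {1: "gain", -1: "loss", 9: "any"}


def parse_locus(text):
    """Split e.g. 'chr2:15,998,134-16,004,580:1' into chromosome, range and type."""
    match = LOCUS_RE.match(text.strip())
    if match is None:
        raise ValueError("not a valid locus annotation: " + repr(text))
    chro, start, stop, code = match.groups()
    start = int(start.replace(",", ""))
    stop = int(stop.replace(",", ""))
    if start > stop:
        raise ValueError("start position is larger than stop position")
    return {"chro": chro, "start": start, "stop": stop, "type": CNA_TYPES[int(code)]}


mycn = parse_locus("chr2:15,998,134-16,004,580:1")
```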

2. ICD-O entities

... can be linked directly; e.g.
will link to ICD-O 8490/3 (Signet ring cell carcinoma). As well,

link to the respective plots in web (PNG) and print (PDF) format.

3. PMIDs

... etc. However, there are many information pages for publications without available data/plots, e.g.

4. Loci

... e.g. links to Larynx. However, although the codes are based on ICD locus topography, the code selection is a bit arbitrary and follows the amount of available data (e.g. most soft tissue tumors are mapped to "connective and soft tissue" instead of specific loci - upper arm, ...).

5. Clinical entities

... are defined as a mix of ICD-O entities and locus (e.g. any carcinoma of the breast tissue => "Ca.: breast ca."). Since the annotation may change, one shouldn't use hard links to these.

last edit 2013-05-17
We have implemented a map display for the currently selected articles or samples.

Essentially, all samples of the current subset / search result are projected to their origin, determined from the main institution associated with the publication.

This feature should come in particularly handy when e.g. finding out which institutions are especially active in a given area of cancer genome research. However, for the sample driven listings, this depends on the availability of the sample data through Progenetix/arrayMap.


last edit 2013-11-08
New option for filtering focal copy number aberrations

When looking for focal copy number aberrations, so far min/max values could be used to limit CNAs to a given size range. A typical scenario would be to e.g. set the "max" to "5000" when querying a given gene, thereby limiting the size of the called segment involving the gene to 5Mb - this is pretty much a focal hit (though it still may involve quite a number of other potential targets).

However, this method is not very specific: e.g. a whole chromosome loss in a "medium quality" array may present as hundreds of small segments, thereby triggering "focal" calls.

The addition or sole use of the new "MAX COVERAGE" option adds another layer of "selectability". A value of 5000 there means that a gene hit is only evaluated if, in the interval from "gene CDR start - 5000" to "gene CDR end + 5000", all segments of the requested type (gain/1 or loss/-1) do not exceed 5000 kb. (A true sliding window approach around the target may be theoretically superior, but in practice would not make a lot of difference.)
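
The idea can be illustrated with a small sketch. It is not the site's implementation, and it rests on one interpretation: the lengths of all segments of the requested type are summed inside the window around the gene, and the hit is kept only if this combined coverage stays within the limit. Segments are assumed to lie on the gene's chromosome, not to overlap each other, and to use inclusive coordinates.

```python
def is_focal_hit(segments, gene_start, gene_stop, cna_type, max_coverage_kb=5000):
    """Return True if the gene is hit and the window coverage stays within the limit."""
    limit = max_coverage_kb * 1000
    window_start = max(0, gene_start - limit)
    window_stop = gene_stop + limit
    covered = 0
    hits_gene = False
    for seg in segments:
        if seg["segvalue"] != cna_type:
            continue
        # part of the segment that falls inside the window (may be <= 0)
        overlap = min(seg["basestop"], window_stop) - max(seg["basestart"], window_start) + 1
        if overlap > 0:
            covered += overlap
        if seg["basestart"] <= gene_stop and seg["basestop"] >= gene_start:
            hits_gene = True
    return hits_gene and covered <= limit


# MYCN position as in chr2:15,998,134-16,004,580
MYCN = (15998134, 16004580)

focal = [{"basestart": 15500000, "basestop": 16499999, "segvalue": 1}]

# A large gain broken up into 2 Mb pieces: each piece alone would pass a 5000 kb size filter.
fragmented = [
    {"basestart": start, "basestop": start + 1999999, "segvalue": 1}
    for start in (10000000, 12100000, 14200000, 16300000, 18400000)
]

focal_result = is_focal_hit(focal, MYCN[0], MYCN[1], cna_type=1)
fragmented_result = is_focal_hit(fragmented, MYCN[0], MYCN[1], cna_type=1)
```

In this example the single 1 Mb gain is accepted, while the fragmented gain is rejected because its pieces together cover about 9 Mb of the window.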

As always, comments appreciated ...

last edit 2013-11-08
Principle of CGH

CGH schematics - "traditional" 2-color chromosomal CGH

last edit 2013-01-22
Progenetix & arrayMap RSS feed

Progenetix and arrayMap news and guide are now available through RSS:


You can either subscribe to this, or follow it on Twitter @progenetix. Enjoy!

last edit 2013-01-23
Progenetix & arrayMap site updates

Progenetix site update

All data entries have to be done in the ProgenetixCases FMP database, either through importing or editing there. When adding sample data to ProgenetixCases, one has to make sure that some fields are correct (e.g. ICD codes showing up), and that some fields are edited correctly.

One field that definitely requires editing is "In subsets": usually, one has to select "project_progenetix" in the selector field, and hit the "Add Subset" button. If samples belong to an additional project etc., also do it for this (e.g. "project_DIPG"). Please be very careful that the currently active samples are still correct (e.g. see that it is "124 of 30123" in the database sample indicator)! The same applies to the "Tags" field.

One has to be especially careful when updating records through the

import => update matching records

feature: Only the required fields etc.

After all changes have been done (from any machine on which the database had been opened), one has to go to the target machine for the website rebuild - either through VNC or directly.

Open FMP "ProgenetixCases" from remote => (current address), hit the "Web Export" button and wait until the export is finished.

Now go in the terminal to ~/Progenetix/scripts and issue:

perl -am -1 -pg 1 -psite 1

... if you don't want to update arrayMap, too. Will take some hours.

arrayMap site update

For arrayMap, the procedure is rather similar: Edit/import the samples, select the export script from the "scripts" menu, and wait until finished. The command then would be

perl -am 1 -pg -1 -psite -1

If doing both arrayMap and Progenetix updates, one could simply issue

last edit 2013-04-25
R API and examples

The R data access API (code at the bottom of the page) can be used for direct data calls into R. In the example(s), pgDataLoader.R is sourced from a general library DIR; pls. adjust.

  • change 2013-09-19
    • changed the library name
    • improved the library (updated parameter names, more options - regions etc., some feedback)
    • now public
  • change 2013-06-26
    • added the "db" option; you'll need this to e.g. access arraymap data
    • added the "valuematrix" format

Example - survival

One can use this example to search for gender related survival bias in an ICD entity (here "9500/3" - change acc. to your interest). Other modifications are possible.

rm(list = ls())
library(survival)  # provides Surv(), survfit() and survdiff()
source('pgDataLoader.R', chdir = FALSE)
ICDcode <- c("9500/3")
PGdata <- pgDataLoader(icdm_m=c(ICDcode), output="matrix")
# keep only samples with a follow-up time and a defined death status
survData <- subset(PGdata, is.na(PGdata$FOLLOWUP)==FALSE)
survData <- subset(survData, subset = survData$DEATH %in% c(0,1))
plot(survfit(Surv(survData$FOLLOWUP, survData$DEATH) ~ 1), main=paste("Overall survival"), xlab="months", ylab="survival", cex=1.2)
# "male" also matches "female", so this keeps all samples with a gender annotation
survData <- survData[grep("male", survData$GENDER),]
femaleNo <- nrow(subset(survData, survData$GENDER == "female"))
plot(survfit(Surv(survData$FOLLOWUP, survData$DEATH) ~ survData$GENDER == "female"), col=c("black","blue"), main=paste("Survival and Gender (ICD-O ", ICDcode, ")", sep=""), xlab="months", ylab="survival", cex=1.2)
sdf <- survdiff(Surv(survData$FOLLOWUP, survData$DEATH) ~ survData$GENDER == "female")
pcsq <- round(pchisq(sdf$chisq, df=1, lower.tail=FALSE), digits=5)
legend("bottomright", c("male", paste("female", ' (', femaleNo, ' of  ', nrow(survData), ')', sep="")), fill=c("black","blue"), inset=c(0.02,0.02), bg="azure1", cex=0.8)
legend("bottomleft", c(paste("p:", pcsq)), inset=c(0.02,0.02), bty="n", cex=1)


You can download the required R function here: pgdataloader.r

last edit 2013-10-01
As of March 2012, registration is only necessary for

  • any commercial use of the database
  • maintaining private projects
  • participating in collaborative studies containing unpublished material

However, we are happy about feedback and suggestions.

For any use of the site by for-profit entities, an individual license has to be obtained. Since 2008, licensing procedures for new licensees have been handled through the University of Zurich.

Please contact Michael Baudis for further information.

last edit 2012-03-23
Search Samples

Samples can be queried by specifying a number of parameters and/or keywords. The following example shows a text query for anything with "renal" in diagnosis text, ICD-O text or locus text, limited to platforms containing "affy" in the name, and having a minimum of 45000 probes. Also, there will be a limitation to samples having any change overlapping the CDKN2A locus - the selection is just under way:

After performing the search, the user is presented with selection lists containing parameters encountered in the current samples, for further exclusion options.

last edit 2012-03-26