PageMan


Introduction

PageMan helps you analyze data series from time courses and/or treatments. With such datasets, the sheer number of data points usually makes it difficult to get an overview of the data. PageMan can also be used to visualize individual experiments.

Workflow

The general workflow for this software is:
Load Expression Data -> Configure your Datafile -> Choose Ontology Information -> Choose Statistics and Parameters -> Modify Layout -> Export to Graphics

Load Expression Data

When you start an analysis, a wizard asks you for a file to load. The wizard also gives a short example of the expected file types and lists the last three files that were selected. (Please note that the wizard might look different depending on your operating system.)
 Add Data wizard view

After you have selected your data file, PageMan will try to autoconfigure it with respect to the decimal format and headers used. Although this should work correctly in at least 95% of all cases, PageMan still offers you the possibility to override the file format.
configure data menu
You can specify whether the first row contains a header with experiment names (checkbox "first row contains header") and which decimal format is used ("." or "," as separator, under "number format"). You can also deselect individual experiments to exclude them from further analysis.

Afterwards, you are prompted to select a Mapping file, which maps gene entries into functional classes.

If everything has worked correctly, you are presented with an options dialog where you can select the statistic to use (currently the Wilcoxon test or overrepresentation analysis, ORA) and the method for handling the multiple testing problem. Finally, if you selected ORA, you can set a threshold that is used to flag objects as interesting. Whenever the absolute value of an object exceeds this threshold, the object is flagged and counted.
For example, if you set 3.2 as a threshold, both 3.3 and -4 would be counted, but 2.9 would not.
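The flagging rule can be illustrated with a few lines of Python. This is only a sketch of the rule as described here, not PageMan's own code; the function name count_flagged is made up for the example.

```python
def count_flagged(values, threshold):
    """Count the values whose absolute value exceeds the threshold."""
    return sum(1 for v in values if abs(v) > threshold)


# With a threshold of 3.2, the values 3.3 and -4 are flagged, 2.9 is not.
print(count_flagged([3.3, -4.0, 2.9], 3.2))
```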

Statistics

Wilcoxon Test

The Wilcoxon test tests the hypothesis that objects within one functional class behave differently from the rest of the objects. It is a rank-based test and therefore reasonably immune to outliers. It detects whether a functional class consistently behaves differently, but it does not detect extreme changes of the distribution. If one sees a normal distribution in all objects but a bimodal distribution within class X, this might not be detected, since both modes might cancel out.
For simplicity, you can think of the Wilcoxon test as a rank-based t-test.
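To make the idea of a rank-based comparison concrete, here is a minimal Python sketch of a two-sided Wilcoxon rank-sum test using the normal approximation. It is an illustration under simplifying assumptions (no tie or continuity correction) and not necessarily how PageMan computes the test; the function names and the example values are invented.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(in_class, rest):
    """Two-sided rank-sum test (normal approximation). Returns (z, p)."""
    n1, n2 = len(in_class), len(rest)
    ranks = average_ranks(list(in_class) + list(rest))
    w = sum(ranks[:n1])                       # rank sum of the class
    mean_w = n1 * (n1 + n2 + 1) / 2           # expected rank sum by chance
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail
    return z, p


# Invented values: a class that is consistently higher than the rest.
class_values = [2.1, 1.8, 2.5, 1.9]
other_values = [0.1, -0.3, 0.4, 0.0, -0.2, 0.3, 0.2, -0.1]
print(rank_sum_test(class_values, other_values))
```

Because only the ranks enter the calculation, replacing 2.5 by 250 in the example would not change the result, which is what makes the test robust to outliers.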

Overrepresentation Analysis

This is a classical test that asks, for each class, whether the number of chosen objects from this class could be expected by chance, given the number of objects chosen, the total number of objects, and the class size. There are several ways to investigate this: commonly the hypergeometric distribution, Fisher's exact test or, for large samples, the binomial distribution can be used. PageMan uses Fisher's exact test or the chi-squared test. Both are based on contingency tables.
Please bear in mind that when you use such a test, you should not use data prefiltered by the criteria you want to test, since this decreases the power of your test.
As an example, consider the following: there are thirty people in a German class, twenty of them girls. Ten people failed, four of them girls.
Therefore you would have: number of boys = 30 - 20 = 10, and number of boys who failed = 10 - 4 = 6.

Let us transfer that into a contingency table of the form

        Failed  Passed  Total
Boys    a       b       NB
Girls   c       d       NG
Total   NF      NP      N

Contingency table
        Failed  Passed  Total
Boys    6       4       10
Girls   4       16      20
Total   10      20      30

The Fisher test is then calculated with the values 6, 4, 4, 16. PageMan calculates Fisher's exact test in its two-tailed version, meaning that it tests whether more or fewer girls/boys fail than expected. This is in fact equivalent to performing both hypergeometric tests. Since PageMan also switches the sign based on expected frequencies, the implementation is identical to first performing a hypergeometric test for overrepresentation and then one for underrepresentation.
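As an illustration, the following Python sketch computes a two-tailed Fisher's exact test for the example table from the hypergeometric distribution. It uses one common definition of "two-tailed" (summing all tables that are no more probable than the observed one); this is an assumption made for the example, and PageMan's implementation may differ in detail. The function name is invented.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    at most as probable as the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)          # number of possible tables

    def weight(k):
        # Unnormalized hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(k) for k in range(lowest, highest + 1)
                  if weight(k) <= observed)
    return extreme / total


# Boys: 6 failed, 4 passed; girls: 4 failed, 16 passed.
print(fisher_exact_two_tailed(6, 4, 4, 16))   # about 0.045
```

Working with integer counts until the final division avoids floating-point comparisons between nearly equal probabilities.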
The chi-square p-value is calculated using the Yates correction:
X² (Yates) = N (|ad - bc| - N/2)² / (NF * NP * NG * NB)
Here it would be
30 * (|6*16 - 4*4| - 30/2)² / (10 * 20 * 20 * 10)
which is ~3.17. However, in this case the chi-square test might not be applicable, since two cells have a value below 5. (Several different criteria are used to evaluate whether a chi-square test is applicable, and these usually involve looking at the class counts. Where class counts are low, the chi-square test probably does not give exact p-values.)
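The arithmetic can be checked with a short Python sketch. It follows the formula as given in this section; the conversion to a p-value assumes a chi-square distribution with one degree of freedom, which is the usual choice for a 2x2 table but is an assumption here, as is the function name.

```python
import math


def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square statistic and p-value for a 2x2 table."""
    n = a + b + c + d
    nb, ng = a + b, c + d        # row totals (boys, girls)
    nf, n_p = a + c, b + d       # column totals (failed, passed)
    stat = n * (abs(a * d - b * c) - n / 2) ** 2 / (nf * n_p * ng * nb)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


stat, p = yates_chi_square(6, 4, 4, 16)
print(round(stat, 2), p)
```

For this table the p-value is noticeably larger than the one from Fisher's exact test, which fits the warning about small cell counts.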

Multiple Test correction

Usually, when using PageMan, multiple hypotheses are tested at once (namely, whether any of the several hundred classes is different or over/underrepresented). Thus, multiple testing correction methods have to be employed.
PageMan implements different correction methods for the multiple testing problem: Bonferroni correction (simply multiply the p-values by the number of tests), Benjamini-Hochberg false discovery rate (FDR) control, and the Benjamini-Yekutieli correction. After correcting for multiple testing, new "p-values" are computed. For FDR-controlling corrections these new values actually represent the FDR level, e.g. a "p-value" of 0.05 represents an FDR level of 5%.
None of the test corrections currently implemented in PageMan takes the dependency structure of the nodes into account (e.g. Glycolysis is also in Metabolism). Methods to correct for this have been developed only recently. Once these methods become more accepted and tested, they will probably be included in PageMan.
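The three corrections named in this section can be sketched in Python as follows. These are the textbook forms of the adjustments (see the references for Benjamini-Hochberg and Benjamini-Yekutieli); the function names and the example p-values are invented, and the sketch is not taken from PageMan's source.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues, factor=1.0):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m * factor / rank)
        adjusted[i] = running_min
    return adjusted


def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli: BH with the extra factor 1 + 1/2 + ... + 1/m."""
    m = len(pvalues)
    harmonic = sum(1.0 / k for k in range(1, m + 1))
    return benjamini_hochberg(pvalues, factor=harmonic)


example = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(example))
print(benjamini_hochberg(example))
print(benjamini_yekutieli(example))
```

The example also shows why correcting a smaller set of tests gives lower adjusted values: every adjustment scales with the number of tests m.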

Visualization model

Usually all tests give (corrected/adjusted) p-values. However, p-values are difficult to display, since they range from 1 to 0. One common approach is to take the logarithm of these values, so that several orders of magnitude can be covered. PageMan compresses the p-values even more by converting them into z-scores. These z-scores come from statistical theory and can be looked up in many statistical tables. For example, a z-score of 1.96 corresponds to a (two-sided) p-value of 0.05.
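The conversion from a p-value to a z-score can be sketched with Python's standard library. The two-sided convention is inferred from the pairing of 0.05 with 1.96; the direction argument for the sign is a hypothetical addition for illustration, and the function name is invented.

```python
from statistics import NormalDist


def p_to_z(p, direction=1):
    """Convert a two-sided p-value (0 < p <= 1) into a z-score.

    direction is +1 or -1 and only sets the sign of the result
    (hypothetical; e.g. +1 for over- and -1 for underrepresentation).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # inv_cdf(p / 2) is the lower-tail quantile, which is <= 0.
    z = -NormalDist().inv_cdf(p / 2.0)
    return direction * z


print(round(p_to_z(0.05), 2))
print(round(p_to_z(1e-10), 2))
```

Even a very small p-value such as 1e-10 maps to a single-digit z-score, which is why this transformation compresses the range more strongly than a logarithm.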

Analyze Data Files

If you want to analyze data files, you will be guided by a wizard. Upon loading the software, choose File->Analyze Experiment File or press the Analysis button.
start analysis
First the wizard will ask you for an experiment file. These are usually tab-separated text files or Excel files, which can contain a header consisting of experiment identifiers, followed by rows consisting of an object identifier in the first column and the experimental values for the different experiments in the other cells.
The experimental values are usually log fold change (M) values. Missing data is represented by an X.


Id Cold Heat Meat
264517_at -1.3 2 1.4
264518_at 0 X 1.2
264519_at -4 4 1.3
264520_at -2 -1.3 4
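If you want to check such a file outside PageMan, a tab-separated experiment file of this shape can be read with a few lines of Python. This sketch assumes a header row, "." as the decimal separator and X for missing values; the function name is invented.

```python
import csv
import io


def read_experiment_table(handle):
    """Read a tab-separated experiment file.

    Returns (experiment_names, {identifier: [values]}), where a missing
    value (X) is stored as None.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    experiments = header[1:]
    data = {}
    for row in reader:
        if not row:
            continue  # skip empty lines
        identifier, cells = row[0], row[1:]
        data[identifier] = [None if c.strip() == "X" else float(c) for c in cells]
    return experiments, data


sample = (
    "Id\tCold\tHeat\tMeat\n"
    "264517_at\t-1.3\t2\t1.4\n"
    "264518_at\t0\tX\t1.2\n"
)
experiments, data = read_experiment_table(io.StringIO(sample))
print(experiments)
print(data["264518_at"])
```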

Once you have analyzed an experiment, you can also add additional data from other experiments (datasets), for example from a different species. Alternatively, you could analyze the same experiment with different statistics. However, note that adding data is slower than analyzing all data in one go. Also, please consider that when you correct the p-values of your data, the correction is applied within your newly added data only, i.e. if you added an ontological subset of the data (e.g. only metabolic nodes), you would get better (lower) p-values when using any kind of p-value correction.
Add in additional data

Layout

There are several options once you have successfully analyzed your data or if you load an already existing datafile. You can insert spacing columns between the individual experiments by right clicking on the experiment names and selecting "insert spacer here" from the pop-up menu. You can also delete experiments by right clicking on them. And finally you can move experiment columns to the left and right to put them into the layout you desire.
change experiments

If you want to annotate a functional class, just click on a box that represents an experiment of this functional class, and an annotation arrow of the same color as the box will appear. It will carry the name of the functional class. You can move these annotations by dragging the arrows around on your canvas. You can edit an annotation by right-clicking on its arrow. If you have MapMan installed, PageMan will also tell you in which pathways you would find this class.
add an annotation arrow

If you want to expand/collapse the hierarchy tree simply click on the nodes. If you right click on the hierarchy tree, you can choose to make everything visible. Moreover you have the opportunity to collapse all parent nodes which have no children containing any coloured (i.e. significant) entry in at least one experiment, to hide everything which is not significant, or to annotate all nodes which are significant in at least one experiment.

collapse/ expand pop up menu



Finally, when right-clicking on the nodes (Create instance of...), you have the opportunity to display just the current object and all its children in a separate window as a new instance, in order to annotate these in more detail. Please note that the new view is not linked to the original but is treated as a separate instance.

Influencing the behavior of individual boxes

You have several options to influence the visualization of the individual boxes. You can change the color scheme by choosing a different one (Options->SetModel). You are then presented with a drop-down list offering several color schemes.
In the scheme names, the color used for down (under) regulated classes is indicated by a - and the color for up (over) regulated classes by a +. If only two colors are indicated, the colors range from one to the other and have their midpoint in white (meaning no change). Otherwise black is the default middle color, meaning that no change is shown in black. (The standard behavior for two-color arrays is Red Black Green, meaning upregulation is red, no change is black and downregulation is green.)
The color scheme is also influenced by the option scaling which tells PageMan at which value a color should become saturated. Thus, if you choose 1 everything becomes saturated at a value of 1 or above (or below -1). However, this option would not be very useful, because PageMan uses a z-scoring scheme by default (see section statistics) and masks all p-values above 0.05 (z value ~1.96). Therefore, all values are either numerically zero or at least 1.96 when using the standard transformation scheme.
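The effect of the scaling option can be pictured with a small Python sketch that maps a z-score onto a two-color scheme with a white midpoint. This is purely illustrative: the linear interpolation, the default colors and the function name are assumptions, not PageMan's actual rendering code.

```python
def z_to_rgb(z, scaling, up=(0, 0, 255), down=(255, 0, 0), middle=(255, 255, 255)):
    """Map a z-score to an RGB color that saturates at +/- scaling.

    up, down and middle are hypothetical example colors.
    """
    if scaling <= 0:
        raise ValueError("scaling must be positive")
    fraction = max(-1.0, min(1.0, z / scaling))   # clip to [-1, 1]
    target = up if fraction >= 0 else down
    weight = abs(fraction)
    return tuple(round(m + (t - m) * weight) for m, t in zip(middle, target))


print(z_to_rgb(0.0, 3.0))    # no change: middle color
print(z_to_rgb(1.96, 3.0))   # barely significant: pale
print(z_to_rgb(5.0, 3.0))    # beyond the scaling value: fully saturated
```

With a scaling of 1, every value of 1.96 or more would already be fully saturated, which is why such a small scaling value is not useful with the default z-score scheme.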

Moreover, you can change the size of the individual boxes with the options Set Width and Set Height, which set the width and height in pixels, respectively. Finally, you can choose whether there should be a border around the boxes and set its size in pixels (a size of zero means no border).

Setting general options

You can choose whether to display a hierarchy tree by selecting Options->Set Tree or by pressing the tree icon.


Export / Save

You can export your generated graphics using File->Export. Here you are offered various export options:
You can export to the following vector based formats
Using the following bitmapped image formats (e.g. for legacy web pages, modern browsers support svg!)
If you just want to save your analysis in its current state, use File->Save As, or File->Save if you have saved your file previously. PageMan saves your analysis in two files. The file holding the actual data looks something like this:
Heat Cold
1.1 PS.light 2 3 1 0
1.1.1 Ps.light.something 1 2 1 0
2 KDE -4 4 1 0
3 bone 1 -0.4 1 0
The other file stores the annotation objects and is named like your data file, with the .NOD extension.

In your actual data file, the first column is the identifier code, the second is the human-readable identifier, and the following columns are the experiments, which are named in the first line. The last two columns store the visibility and whether a node is collapsed. (It is better not to tamper with these two.)
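The column layout described here can be made explicit with a short Python sketch that splits one data row into its parts. It assumes tab-separated fields and that the two trailing flags are written as 1 and 0, as in the example; the function name and the dictionary keys are invented for illustration.

```python
def parse_saved_row(line, n_experiments):
    """Split one row of a saved data file into its documented parts."""
    fields = line.rstrip("\n").split("\t")
    return {
        "code": fields[0],                                   # e.g. "1.1"
        "name": fields[1],                                   # human-readable name
        "values": [float(v) for v in fields[2:2 + n_experiments]],
        "visible": fields[-2] == "1",                        # visibility flag
        "collapsed": fields[-1] == "1",                      # collapsed flag
    }


row = parse_saved_row("1.1\tPS.light\t2\t3\t1\t0", n_experiments=2)
print(row)
```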


FAQ


What do the colors and the up/ down mean?

When you use over/underrepresentation analysis, your experiment is split into up- and downregulated genes (positive and negative sign). In both cases, if a class has more genes exceeding the threshold than expected, then it is colored in blue (red), and if it has fewer than expected, in red (green).

How do I open/modify/edit saved files of PageMan?

Use PageMan. Otherwise, you can open them easily using any text editor. Please note that PageMan, as of version 0.4, splits the data into annotation objects (.NOD) and the actual data. The actual data is represented as a tab-separated table with one header line.
Using this you can actually display any kind of data in a false color grid.
Just put some random identifier code in column 1 and always finish with a column containing 1 and one containing 0. As of PageMan version 0.12 PageMan autodetects experiment files to be represented as heatmaps and adds the additional data.

How are the Mapping files in PageMan structured?

PageMan uses tree-like ontology structures. These can be stored in xml, txt, Excel or PageMan's own m02 format (a bzipped, sealed xml that saves a lot of space).
Every node is described by a numerical code and a human readable name. Each node can contain identifiers. Moreover, every node identifier code also gives a path to its parent nodes, which are separated by dots.
So node 1.1.1.1 is_a 1.1.1 is_a 1.1 is_a 1.
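Because the parent path is encoded in the node code itself, the ancestors of a node can be derived by dropping trailing parts of the code. A minimal Python sketch (the function name is invented):

```python
def parent_codes(code):
    """Return the codes of all ancestors, from the direct parent to the root."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


print(parent_codes("1.1.1.1"))
```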

A typical mapping file might look like this:

<?xml version="1.0" encoding="UTF-16"?>

<Mapping>
    <Bin name="control genes" code="0"/>
    <Bin name="PS" code="1">
        <Bin name="PS.lightreaction" code="1.1">
            <Bin name="PS.lightreaction.photosystem II" code="1.1.1">
                <Bin name="PS.lightreaction.photosystem II.LHC-II" code="1.1.1.1">
                     <BinItem id="251082_at" type="">
                          <Text>At5g01530-&gt;chlorophyll A-B binding protein CP29 (LHCB4), identical to CP29 (Arabidopsis thaliana) GI:298036; contains Pfam profile: PF00504 chlorophyll A-B binding protein</Text>
                     </BinItem>
                     <BinItem id="254970_at" type="">
                          <Text>At4g10340-&gt;chlorophyll A-B binding protein CP26, chloroplast / light-harvesting complex II protein 5 / LHCIIc (LHCB5), identical to SP:Q9XF89 Chlorophyll A/B-binding protein CP26, chloroplast precursor (Light-harvesting complex II protein 5) (LHCB5) (LHCI</Text>
                     </BinItem>
                </Bin>
                <Bin name="PS.lightreaction.photosystem II.PSII polypeptide subunits" code="1.1.1.2">
                </Bin>
            </Bin>
        </Bin>
    </Bin>
</Mapping>

or like this:
BIN NAME IDENTIFIER DESCRIPTION
1 PS
1.1 PS.lightreaction
1.1.1 PS.lightreaction.photosystem II
1.1.1.1 PS.lightreaction.photosystem II.LHC-II 251082 At5g01530->chlorophyll A-B binding protein CP29 (LHCB4), identical to CP29 (Arabidopsis thaliana) GI:298036; contains Pfam profile: PF00504 chlorophyll A-B binding protein
1.1.1.2 PS.lightreaction.photosystem II.PSII polypeptide subunits
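To inspect an xml mapping file outside PageMan, the Bin/BinItem structure shown here can be read with Python's standard library. The sketch below uses a shortened, made-up sample with the same element and attribute names; the function name is invented, and real mapping files may contain details not covered here.

```python
import xml.etree.ElementTree as ET

# Shortened sample using the element and attribute names from the example.
SAMPLE = """<Mapping>
    <Bin name="PS" code="1">
        <Bin name="PS.lightreaction" code="1.1">
            <BinItem id="251082_at" type="">
                <Text>example description</Text>
            </BinItem>
        </Bin>
    </Bin>
</Mapping>"""


def items_by_bin(xml_text):
    """Return {bin code: [identifiers directly assigned to that bin]}."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for bin_node in root.iter("Bin"):           # all bins, at any depth
        ids = [item.get("id") for item in bin_node.findall("BinItem")]
        mapping[bin_node.get("code")] = ids
    return mapping


print(items_by_bin(SAMPLE))
```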


Where do I get Mapping files from?

From the standard MapMan repository at http://gabi.rzpd.de. Alternatively, use the converter to tap into KEGG, MIPS or GO ontologies.

What does the scalebar indicate?

Since for each Bin p-values are calculated, a transformation has to be made to display them on a linear scale for visualization. This could be achieved by taking for example the logarithm of the p-values. However, very small p-values would still get a very high value. Therefore, a z-transformation is used. Thus, a p-value of 0.05 is assigned to a value of 1.96.

I want to convert an OBO ontology into a mapping file for PageMan. Is that possible?

This will be made possible in the future. Currently you can use KEGG, GO, and MIPS ontologies.

What is the difference between a vector and a bitmapped image?

Many images encountered today are bitmapped graphics. A bitmapped graphic splits a picture into pixels (picture elements). These pixels are (usually) square blocks of a given color. Therefore when you enlarge such an image or try to increase it to a higher resolution (say 300 dpi from 100 dpi) you actually don't add any value.
See the following example

Here is an original image stored as a bitmapped image
original

When we now enlarge the image threefold, one starts seeing blocking artifacts (or stairs):
blow up
These blocking artifacts arise because, when the size is increased, the color information at each position is the same as it was in the smaller picture. One also sees that rectangles don't suffer, since they don't change shape.
To overcome these effects, programs like Corel Photopaint or Adobe Photoshop usually filter over the edges and reduce the contrast. This technique is commonly known as smoothing or anti-aliasing; they can also use techniques like bilinear filtering.
This results in something like the following picture
smoothed enlarged image

Now the jagged edges have disappeared. However, no matter how good the filtering/smoothing technique is, it does not add extra information. This new image is only as good as the original. The price paid for using these filters is that the image no longer looks as crisp.

Once an image is in bitmapped format, the information is fixed; it will always look jagged or blurred when blown up. Some professional tools try to trace the outlines of the shapes and generate vectors from them.

Vector formats on the other hand, try to describe a picture by basic geometric shapes and their position on the drawing canvas. In the above case this could be
put a blue block in the middle right,
put an arrow at the bottom very right
put the text "8h mean expression" on the top and rotate it by 45°
and so on.

Since the original information is now preserved, one can edit the text and enlarge the image without losing quality. Therefore, enlarging the previous picture, if it were vectorized, would result in something like this:

enlarged vector image

Therefore, whenever you can, use vector formats.

Common bitmapped formats and editors:
bmp, jpg, png, gif, tif, tga
Microsoft Paint
Corel Photopaint
Adobe Photoshop

Common vector formats and editors
svg and a lot of vendor specific formats
Corel Draw
Adobe Illustrator
Microsoft Powerpoint
These can also embed bitmapped graphics, but the embedded graphics remain bitmapped.

I want to get my data publication ready in which file format should I save?

Choose one of the vector-based formats mentioned above. Many journals accept eps, which can easily be generated from ps files. Alternatively, use svg or emf (on Windows), import these into e.g. Corel Draw, Adobe Illustrator or, if you insist, Microsoft PowerPoint, then edit the graphics in these programs and prepare them for final publication.

Help, the journal says 300 / 600 / 1 gazillion dpi, but I can't find such an option in PageMan?

Fret not, use one of the vector-based formats and you can scale your graphics to any size you want without losing quality. See "What is the difference between a vector and a bitmapped image?".

I want to use feature statistics X / I think I found a bug

Cool. Please let us know so we can fix it / implement it.

How can I convert a BioConductor limma analysis into PageMan format?

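Let fit be your eBayes fit and let coefficient 2 be of interest.
Then you can do the following:

# Log fold changes (M values) for the coefficient of interest
M <- fit$coefficients[, 2]
IDENTIFIER <- names(M)
ftable <- cbind(IDENTIFIER, M)
# Write a tab-separated file with a header row, without row names or quotes
write.table(ftable, file = "file.txt", quote = FALSE, sep = "\t", col.names = TRUE, row.names = FALSE)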

Acknowledgments:


Thanks are due to the developers of the Freehep graphics library.

References

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57, 289-300.

Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165-1188.

Usadel B, Nagel A, Thimm O, Redestig H, Blaesing OE, Palacios-Rojas N, Selbig J, Hannemann J, Piques MC, Steinhauser D, Scheible WR, Gibon Y, Morcuende R, Weicht D, Meyer S, Stitt M. (2005) Extension of the visualization tool MapMan to allow statistical analysis of arrays, display of corresponding genes, and comparison with known responses. Plant Physiol. 138(3):1195-204.