CyanoExpress Data Processing

Data processing

Microarray data

Microarray data were downloaded as raw data from their respective repositories, then pre-processed and analyzed using the R statistical language and various Bioconductor packages. All dual-channel microarray datasets were individually normalized by optimized intensity-dependent normalization (OIN), which is based on iterative regression of log fold changes against average log spot intensities. Single-channel microarrays were processed using functions from the limma package: normexp background correction, followed by quantile normalization to adjust signal intensities for each experiment. Intensities from spots corresponding to the same genomic feature were averaged to obtain one value per gene per array.

For each collected microarray experiment, a linear model was designed based on the sample annotation and fitted using the lmFit function. Using explicit models allowed us to define stringent statistical contrasts (with respect to the control condition), which were evaluated for each experiment. The derived contrasts were then imported into CyanoExpress.

An additional normalization step was applied to the data from the experiment by Kucho et al. 2005 (PMID: 15743968), which measured gene expression under circadian oscillations. Here, the expression values were adjusted for each gene individually so that the mean expression over the full day-night cycle equals zero. This facilitates the inspection of differential expression observed in the experiment.

More information regarding the experimental conditions and controls used in the included microarray experiments can be found on the Sample Information page.
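Two of the steps described above, averaging replicate spots per gene and centering each gene over the day-night cycle, are simple enough to sketch in a few lines. The following is a minimal illustration in Python, not the R code used for CyanoExpress. The function names, gene identifiers, and values are hypothetical, and it assumes that intensities are already on the scale on which they should be averaged.

```python
from collections import defaultdict
from statistics import mean


def average_spots_per_gene(spots):
    """Average the intensities of all spots that map to the same gene.

    `spots` is a list of (gene_id, intensity) pairs for a single array.
    """
    by_gene = defaultdict(list)
    for gene_id, intensity in spots:
        by_gene[gene_id].append(intensity)
    return {gene_id: mean(values) for gene_id, values in by_gene.items()}


def center_over_cycle(expression):
    """Shift each gene's time series so that its mean over the cycle is zero.

    `expression` maps gene_id -> list of values, one per time point.
    """
    centered = {}
    for gene_id, values in expression.items():
        gene_mean = mean(values)
        centered[gene_id] = [v - gene_mean for v in values]
    return centered


# Hypothetical example values, not taken from any dataset.
spots = [("geneA", 8.0), ("geneA", 10.0), ("geneB", 7.5)]
per_gene = average_spots_per_gene(spots)

series = {"geneA": [1.0, 2.0, 3.0], "geneB": [4.0, 4.0, 4.0]}
centered = center_over_cycle(series)
```

After centering, a gene with constant expression over the cycle (geneB here) is reported as zero at every time point, so only deviations from the gene's own daily average remain visible.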

RNA-seq data

RNA-seq data from the study by Kopf et al. are displayed as log fold changes relative to exponentially growing Synechocystis under normal conditions. This corresponds to the experimental design of the study, in which 9 different conditions were induced after transferring exponentially growing cultures. Thus, this RNA-seq dataset includes fold changes with respect to exponential growth for 9 different conditions, such as cold stress, heat stress, and Fe limitation. Note that Kopf et al. primarily assign expression to transcriptional units. To still enable listing by genes, we assumed the same expression for all genes belonging to the same transcriptional unit. Finally, to enable log transformation, the expression of a transcriptional unit was set to 1 for conditions in which zero transcription was detected in the original study. This step is also known as adding pseudo-counts and is a common procedure for this type of data.
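
The zero handling and the mapping from transcriptional units to genes described above can be sketched as follows. This is a minimal illustration in Python, not the code used for CyanoExpress. The function names, unit and gene identifiers, and values are hypothetical. The base-2 logarithm is an assumption, since the base is not stated here, as is applying the same replacement of zeros to the reference condition.

```python
import math


def log_fold_changes(condition, reference, floor=1.0):
    """Log2 fold change of each transcriptional unit (TU) against a reference.

    Expression values of zero are set to `floor` before taking the logarithm.
    Applying the floor to the reference as well is an assumption of this sketch.
    """
    result = {}
    for tu, value in condition.items():
        cond = value if value > 0 else floor
        ref = reference[tu] if reference[tu] > 0 else floor
        result[tu] = math.log2(cond / ref)
    return result


def expand_to_genes(tu_values, tu_genes):
    """Assign each gene the value of the transcriptional unit it belongs to."""
    return {
        gene: tu_values[tu]
        for tu, genes in tu_genes.items()
        for gene in genes
    }


# Hypothetical example values, not taken from the study.
condition = {"TU1": 8.0, "TU2": 0.0}
reference = {"TU1": 2.0, "TU2": 4.0}
tu_genes = {"TU1": ["geneA", "geneB"], "TU2": ["geneC"]}

tu_lfc = log_fold_changes(condition, reference)
gene_lfc = expand_to_genes(tu_lfc, tu_genes)
```

In this example geneA and geneB share the value of TU1, and TU2 yields a finite negative fold change instead of an undefined logarithm of zero.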