Downloading GENOTEXT programs
The following software is available from this project. These programs are not
meant to be downloaded and run as-is; each will need modification for one's own
computer system and database structure.
- A program in Perl that iterates through GSE .soft files in a directory, extracts seven free-text annotations of each sample and series, and stores these in a MySQL database.
- A program in Java that iterates through the annotations in the MySQL database and uses MetaMap to map the text into UMLS concepts. This program requires the MetaMap Transfer libraries.
- A program in Java that iterates through the mappings and removes many (but not all!) that were manually determined to be incorrect.
- A program in R and its data file that together can create a dendrogram figure, clustering GEO data sets by the concepts mapped from their annotations. See the gallery for an example of this dendrogram.
- A data file containing the mappings between GEO annotations and UMLS String Unique Identifiers (SUI). To make effective use of this file, you will need the UMLS MRCON or MRCONSO table to convert SUI to Concept Unique Identifiers (CUI) or readable strings. This file is tab-delimited and is approximately 16 MB in size.
  - Column 1 contains the GEO object (GDS = GEO Data Set, GSE = GEO Series, GSM = GEO Sample)
  - Column 2 contains the annotation (title, description, source, or keyword)
  - Column 3 contains a phrase of the annotation
  - Column 4 contains the mapping score (from the MetaMap programming libraries)
  - Column 5 contains the UMLS String Unique Identifier (SUI)
  - Column 6 indicates whether the mapping was considered erroneous and removed by the Java program (above); 1 indicates removed, 0 indicates not removed
- A data file containing relations between genes (indicated by NCBI Gene ID, formerly LocusLink) and UMLS concepts. A relation between a gene and a concept indicates that a statistically significant difference in expression level is seen for that gene in GEO data sets annotated with that concept, compared with those data sets not annotated with the concept. To make effective use of this file, you will need the UMLS MRCON or MRCONSO table to convert CUI to readable strings, as well as access to the NCBI Gene site to translate IDs into gene names and symbols. To reproduce the analysis in the manuscript, you will also require the HomoloGene table. This file is tab-delimited and is approximately 22 MB in size.
  - Column 1 contains the LocusLink (now NCBI Gene) identifier for this relation
  - Column 2 contains the UMLS Concept Unique Identifier (CUI) for this relation, obtained from the String Unique Identifier
  - Column 3 indicates whether this gene's expression was statistically significantly higher (1) or lower (0) in those GEO data sets annotated with the concept, compared to those GEO data sets measuring this gene but not annotated with the concept
  - Column 4 contains the p-value from the t-test performed for this gene and concept across the GEO data sets with and without the annotative concept
  - Column 5 contains the q-value for the t-test performed, computed using 100 permutations
  - Column 6 contains the mean rank-normalized expression level for this gene in the GEO data sets measuring this gene and annotated with the concept
  - Column 7 contains the mean rank-normalized expression level for this gene in the GEO data sets measuring this gene and not annotated with the concept
  - Column 8 contains the number of GEO data sets measuring this gene and annotated with the concept
  - Column 9 contains the number of GEO data sets measuring this gene and not annotated with the concept
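
Both data files are plain tab-delimited text, so they can be read with a few lines of code. The following is a minimal Python sketch, not part of the GENOTEXT distribution, that follows the column layouts listed above. It assumes the files have no header row and skips any row with too few columns; the class, field, and function names are hypothetical and chosen only for illustration.

```python
import csv
from typing import Iterable, Iterator, List, NamedTuple


class Mapping(NamedTuple):
    """One row of the annotation-to-SUI mapping file (6 columns)."""
    geo_object: str   # column 1: GDS, GSE, or GSM object
    annotation: str   # column 2: title, description, source, or keyword
    phrase: str       # column 3: phrase of the annotation
    score: float      # column 4: MetaMap mapping score
    sui: str          # column 5: UMLS String Unique Identifier
    removed: bool     # column 6: True if 1 (removed as erroneous)


class GeneRelation(NamedTuple):
    """One row of the gene-to-concept relation file (9 columns)."""
    gene_id: str        # column 1: LocusLink / NCBI Gene identifier
    cui: str            # column 2: UMLS Concept Unique Identifier
    higher: bool        # column 3: True if 1 (higher), False if 0 (lower)
    p_value: float      # column 4
    q_value: float      # column 5
    mean_with: float    # column 6: mean rank-normalized level, annotated sets
    mean_without: float # column 7: mean rank-normalized level, other sets
    n_with: int         # column 8: data sets annotated with the concept
    n_without: int      # column 9: data sets not annotated with the concept


def _rows(lines: Iterable[str], width: int) -> Iterator[List[str]]:
    # QUOTE_NONE treats quote characters in free-text phrases as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) >= width:  # assumption: short or blank rows are skipped
            yield row


def read_mappings(lines: Iterable[str]) -> Iterator[Mapping]:
    for r in _rows(lines, 6):
        yield Mapping(r[0], r[1], r[2], float(r[3]), r[4], r[5].strip() == "1")


def read_relations(lines: Iterable[str]) -> Iterator[GeneRelation]:
    for r in _rows(lines, 9):
        yield GeneRelation(
            r[0], r[1], r[2].strip() == "1",
            float(r[3]), float(r[4]), float(r[5]), float(r[6]),
            int(r[7]), int(r[8]),
        )
```

Either function accepts an open file handle (opened with `newline=""`) or any iterable of lines. For instance, `[m for m in read_mappings(handle) if not m.removed]` would keep only the mappings that were not flagged as erroneous, and `[g for g in read_relations(handle) if g.q_value < 0.05]` would keep relations under a q-value cutoff; the 0.05 threshold here is only an illustrative choice.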
Updated October 1, 2004