GENOTEXT
GENOTEXT
(GENOmics conTEXT) is an automated system that determines the experimental
context of samples and data sets in the NCBI
Gene Expression Omnibus. It extracts the contextual identifiers and
annotations of samples and models these annotations using the largest
available compendium of biomedical vocabularies, the Unified
Medical Language System (UMLS).
The program
and its results are pending publication.
The following
software is available from this project. These
programs are not meant to be downloaded and run as is; each will need modifications
for one's own computer system and database structure.
- A program in Perl that iterates through GSE .soft files in a directory,
extracts seven free-text annotations of each sample and series, and
stores these in a MySQL database.
- A program in Java that iterates through the annotations in the MySQL
database and uses MetaMap to map the text into UMLS concepts. This
program requires the MetaMap Transfer libraries.
- A program in Java that iterates through the mappings and removes many
(but not all!) that were manually determined to be incorrect.
- A program in R and its data file that together can create a dendrogram
figure, clustering GEO data sets by the concepts mapped from their
annotations. See the gallery for an example of this dendrogram.
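The first step in the list above, pulling free-text annotations out of GSE .soft files, can be illustrated with a minimal sketch. This is not the project's Perl program; it is written in Python for brevity and keeps results in memory instead of MySQL. The function name `parse_soft` and the set `FREE_TEXT_FIELDS` are hypothetical, and the fields listed are only an assumed selection, since the seven annotations the project extracts are not named here. The sketch relies on the SOFT convention that entity headers start with `^` and attribute lines start with `!`.

```python
from collections import defaultdict

# Assumed selection of free-text fields; the seven annotations actually
# extracted by the project are not listed in this document.
FREE_TEXT_FIELDS = {
    "Series_title",
    "Series_summary",
    "Sample_title",
    "Sample_source_name_ch1",
    "Sample_characteristics_ch1",
    "Sample_description",
}


def parse_soft(lines):
    """Collect free-text annotations per series and sample.

    Returns a dict keyed by (entity_type, accession), for example
    ("SAMPLE", "GSM1"), whose values map field names to lists of strings.
    """
    records = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("^"):
            # Entity header such as "^SAMPLE = GSM1"
            kind, _, accession = line[1:].partition("=")
            kind = kind.strip().upper()
            if kind in ("SERIES", "SAMPLE"):
                key = (kind, accession.strip())
                current = records.setdefault(key, defaultdict(list))
            else:
                # Other entities (e.g. PLATFORM) are skipped in this sketch
                current = None
        elif line.startswith("!") and current is not None:
            # Attribute line such as "!Sample_title = liver, control"
            field, sep, value = line[1:].partition("=")
            field = field.strip()
            if sep and field in FREE_TEXT_FIELDS:
                current[field].append(value.strip())
    return records
```

To process a directory, one could open each file and pass the file object to `parse_soft`, then write the collected values to whatever database layout is in use locally.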
Funding
for this work was provided in part by:
Updated January 10, 2006