The Molecular Biology Database Collection: 2005 update
Nucleic Acids Research
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
* Tel: +1 301 435 5910; Fax: +1 301 435 7794; Email: galperin@ncbi.nlm.nih.gov
ABSTRACT
The Nucleic Acids Research Molecular Biology Database Collection is a public online resource that lists the databases described in this and previous issues of Nucleic Acids Research together with other databases of value to the biologist and available throughout the world. All databases included in this Collection are freely available to the public. The 2005 update includes 719 databases, 171 more than the 2004 one. The databases are organized in a hierarchical classification that simplifies the process of finding the right database for any given task. The growing number of databases related to immunology, plant and organelle research has been accommodated by separating them into three new categories. The database summaries provide brief descriptions of the databases, contact details, appropriate references and acknowledgements. The online summaries also serve as a venue for the maintainers of each database to introduce database updates and other improvements in the scope and tools. These updates are particularly important for those databases that have not been described in print in the recent past. The database list and summaries are available online at the Nucleic Acids Research web site, http://nar.oupjournals.org/.
COMMENTARY
In its 12th annual database issue, Nucleic Acids Research presents 135 new and recently updated molecular biology databases. The current release of the Nucleic Acids Research online Molecular Biology Database Collection (Table 1) includes 719 databases, an increase of 171 over last year (1). The database geography also continues to expand. This year we have the first databases from Brazil, Cuba, Estonia, Greece, Hungary (2,3), Malaysia, Taiwan (4,5) and Turkey. The database authors have again shown remarkable creativity in naming their databases: last year's ORFanage (6) has been joined by H-ANGEL (7), PROPHECY (8), PANDIT (9), SIEGE (10) and other aptly named databases. The database list is divided into 14 major categories, 3 more than last year. One of them, the category for immunology-related databases, was created in response to the rapid growth in databases dedicated to immuno-polymorphisms, certainly an offshoot of the Human Genome Project. The proliferation of plant-related databases, sparked by the completion of the first two plant genomes (Arabidopsis thaliana and Oryza sativa) and steady progress in sequencing other plants, prompted elevation of their status from a subcategory to a separate category. One more category, organelle databases, was created to provide a single home for the databases on chloroplasts and mitochondria from various sources. As always, we hope that these database listings, organized into a hierarchical structure, will help introduce the community of biologists to the enormous body of data accumulated by their colleagues and simplify the process of finding the appropriate database for each particular task.
Table 1. Molecular Biology Database Collection
Certainly, this listing is far from exhaustive. To be included, databases had to be publicly available to any user and allow direct browsing of the data without downloading any special software that might interfere with institutional firewalls. This means leaving out several potentially interesting database projects.
Of the 548 databases featured in last year's compilation, 17 have been dropped from the list because they have been discontinued, merged into larger ones or, like the well-known Kabat database, converted to commercial access. The previous year saw a loss of 13 databases from a total of 386 in the 2003 release. These numbers and the history of Swiss-Prot (http://www.expasy.org/announce/) and the GDB Human Genome Database (http://www.gdb.org/gdb/aboutGDB.html) show that the databases that offer useful content usually manage to survive, even if they have to change their funding scheme or migrate from one host institution to another. This means that the open database movement is here to stay, and more and more people in the community (as well as in the financing bodies) now appreciate the importance of open databases in spreading knowledge. It is worth noting that the majority of database authors and curators receive little or no remuneration for their efforts and that it is still difficult to obtain money for creating and maintaining a biological database. However, disk space is relatively cheap these days and database maintenance tools are fairly straightforward, so that a decent database can be created on a shoestring budget, often by a graduate student or as a result of a postdoctoral project. Many databases in this compilation originated just that way—as collections of data on a certain research topic that a particular lab was studying anyway, formatted in a user-friendly way by graduate or even undergraduate students as part of their dissertations or course work. Subsequent maintenance and further development of these databases, however, require a commitment that can only be applauded. For scientists from China, France, Japan, Russia and many other countries, making their databases available to the worldwide community also means maintaining them in English, the lingua franca of science, which does not always come easily. Such efforts deserve special appreciation.
Speaking of appreciation, those who maintain databases often do not get much credit for their work either. Other than publication in the Database Issue of Nucleic Acids Research or in Bioinformatics, or an occasional publication in some other journal, there is currently no straightforward way to announce progress. The online summaries published by the database maintainers on the NAR web site partially fill this void. Since the entry number assigned to each database (Table 1) will be stable, these updates can be cited in just the same way as any other online resource. At this time, I would suggest the following format for citing these summaries: ‘The ooTFD database (11) is listed with Accession No. 185 in the NAR Molecular Biology Database compilation (1); see the recent summary at http://www3.oup.co.uk/nar/database/summary/185’. Suggestions for a better format are certainly welcome. Suggestions for the inclusion of additional databases in this Collection, as well as for improvements to the category structure, are also encouraged and should be directed to the author at galperin@ncbi.nlm.nih.gov.
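The suggested citation format can be sketched as a small helper. This is only an illustration, not part of the Collection: the function name `cite_summary` and its parameters are hypothetical, and it assumes that every summary URL follows the same pattern as the ooTFD example, with the accession number as the last path segment.

```python
# Hypothetical helper illustrating the citation format suggested in the text.
# Assumption: summary URLs share the base seen in the ooTFD example and end
# with the database's accession number.
SUMMARY_BASE = "http://www3.oup.co.uk/nar/database/summary/"


def cite_summary(name: str, accession: int, ref_no: int, collection_ref: int = 1) -> str:
    """Build a citation sentence for a database's online summary.

    name           -- database name as cited in the text
    accession      -- entry (accession) number from Table 1
    ref_no         -- reference-list number of the paper describing the database
    collection_ref -- reference-list number of the Collection paper itself
    """
    return (
        f"The {name} database ({ref_no}) is listed with Accession No. {accession} "
        f"in the NAR Molecular Biology Database compilation ({collection_ref}); "
        f"see the recent summary at {SUMMARY_BASE}{accession}"
    )


if __name__ == "__main__":
    print(cite_summary("ooTFD", 185, 11))
```

Called with the ooTFD values from the text, the helper reproduces the example sentence given above.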
ACKNOWLEDGEMENTS
I thank Rich Roberts, Alex Bateman and my colleagues at NCBI for support and helpful advice, Alice Ellingham and Gill Smith for logistical support, and Claire Saxby, Amanda Titmas and Kate Welsby at Oxford University Press for their patience in handling this compilation.
REFERENCES
1. Galperin,M.Y. (2004) The Molecular Biology Database Collection: 2004 update. Nucleic Acids Res., 32, D3–D22.
2. Barta,E., Sebestyén,E., Pálfy,T.B., Tóth,G., Ortutay,C.P. and Patthy,L. (2005) DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants. Nucleic Acids Res., 33, D86–D90.
3. Tusnády,G.E., Dosztányi,Z. and Simon,I. (2005) PDB_TM: selection and membrane localization of transmembrane proteins in the Protein Data Bank. Nucleic Acids Res., 33, D275–D278.
4. Huang,H.-D., Horng,J.-T., Lin,F.-M., Chang,Y.-C. and Huang,C.-C. (2005) SpliceInfo: an information repository for the modes of mRNA alternative splicing in human genome. Nucleic Acids Res., 33, D80–D85.
5. Chang,Y.-H., Su,W.-H., Lee,T.-C., Sun,H.-F.S., Chen,C.-H., Pan,W.-H., Tsai,S.-F. and Jou,Y.-S. (2005) TPMD: a database and resources of microsatellite marker genotyped in Taiwanese populations. Nucleic Acids Res., 33, D174–D177.
6. Siew,N., Azaria,Y. and Fischer,D. (2004) The ORFanage: an ORFan database. Nucleic Acids Res., 32, D281–D283.
7. Tanino,M., Debily,M.-A., Tamura,T., Hishiki,T., Ogasawara,O., Murakawa,K., Kawamoto,S., Itoh,K., Watanabe,S., José de Souza,S., Imbeaud,S., Graudens,E., Eveno,E., Hilton,P., Sudo,Y., Kelso,J., Ikeo,K., Imanishi,T., Gojobori,T., Auffray,C., Hide,W. and Okubo,K. (2005) The Human Anatomic Gene Expression Library (H-ANGEL), the H-Inv integrative display of human gene expression across disparate technologies and platforms. Nucleic Acids Res., 33, D567–D572.
8. Fernandez-Ricaud,L., Warringer,J., Ericson,E., Pylvänäinen,I., Kemp,G., Nerman,O. and Blomberg,A. (2005) PROPHECY – a database for high-resolution phenomics. Nucleic Acids Res., 33, D369–D373.
9. Whelan,S., de Bakker,P.I. and Goldman,N. (2003) Pandit: a database of protein and associated nucleotide domains with inferred trees. Bioinformatics, 19, 1556–1563.
10. Shah,V., Sridhar,S., Beane,J., Brody,J.S. and Spira,A. (2005) SIEGE: Smoking Induced Epithelial Gene Expression Database. Nucleic Acids Res., 33, D573–D579.
11. Ghosh,D. (2000) Object-oriented transcription factors database (ooTFD). Nucleic Acids Res., 28, 308–310.
(Michael Y. Galperin*)