Data submission and access


VectorBase welcomes any and all submissions to our repository and archive of genomic and related data on Invertebrate Vectors of Human Pathogens, including community gene annotations made through WebApollo, genes linked to publications, and population and most high-throughput "-omics" data.

Please read below for information on contributing to an ongoing genome project or other community activities.

The VectorBase BRC accepts and hosts only completely open and fully shared data. All derived data generated under the BRC contract will also be completely open and fully shared. Driving Biological Projects (DBPs) will be expected to comply with all NIH policies on Sharing Research Data and Sharing of Model Organisms for Biomedical Research as specified in “Principles and Guidelines for Recipients of NIH Research Grants and Contracts on Obtaining and Disseminating Biomedical Research Resources: Final Notice”. Data generated by the DBPs that are hosted or stored at the BRC web site will be required to be completely open and fully shared.

Release frequency

VectorBase is committed to a new release every two months, with all data freely available for public use based on NIH/NIAID policy. A list of the changes in each release and the state of the current versions as of that release date (e.g., current gene sets) can be found in the Releases section of VectorBase.

As for other VectorBase contract deliverables:

a. All data and information generated under the contract, including data and information generated by the DBPs, shall be released in the BRC web site and/or the BRCs Portal by the BRCs Portal Contractor within one month from publication or within one year of generation, whichever comes first.

b. All analysis tools, algorithms, software interfaces, source codes and documentation, and other software technologies developed and/or enhanced under the contract shall be made freely available to the scientific community with an open source license.

Data access

When using these data, it is important to acknowledge these resources by citing source data publications from the community and the VectorBase Bioinformatics Resource Center (BRC). To acknowledge use of this BRC, please cite the latest VectorBase publication below:

Giraldo-Calderón GI. et al. 2015. VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases.
Nucleic Acids Research 43(Database issue):D707-13.

Because many researchers have contributed to our resource, including through genome sequencing efforts, please see below for specific guidance on the vector data provided in the various VectorBase sections:

"Pre" genome data

As a public service to the biological research community, vector genome data are often made available by the sequence producers as "pre" sites before scientific publication. We use and refer to the general NIH/NHGRI policy for Release and Database Deposition of Sequence Data: "The producing laboratories intend to publish the sequence of the genome and certain large-scale analyses of the sequence in a timely manner. The sole exception to the unrestricted use of these unpublished data is that the data may not be used for the initial publication of the complete genome sequence assembly or other large-scale analyses. In this context, 'large-scale' refers to regions the size of the whole genome or individual chromosomes and examples of 'large-scale analyses' include identification of regions of evolutionary conservation across an entire genome and identification of complete sets of genomic features such as genes, repeat structures, GC content, etc. The producing laboratories will, however, be open to the possibility of collaboration on such assemblies or analyses." Any redistribution of these data hosted on VectorBase should carry this notice.

VectorBase would be happy to connect you to sequencing consortia if you would like to collaborate on any VectorBase genome. Community gene annotations, for example, are highly useful as "pre" sites move to official releases.

VectorBase gene sets and genomes

Current "pre" gene annotations are collaborations between VectorBase and sequencing consortia, and as such the "pre" genome data restrictions apply. In short, you are welcome to use single genes of interest to your research program, but you should contact the sequencing consortia to arrange any large-scale analysis as a collaboration before the genome manuscript is published.

Published genome annotations have been a collaboration between communities, genome sequencing centers and VectorBase, and as such researchers should cite the original genome papers to acknowledge the efforts of these research teams.

Microarray data

Data from gene expression profiling experiments are re-analysed using VectorBase's standard processing pipeline, such that both gene expression values and microarray probe/reporter to gene associations may differ slightly from the original publications. Some unpublished but "publication quality" studies that have been submitted to the global repositories (i.e. GEO and ArrayExpress) are also presented in VectorBase. Please cite the original publications, repositories and VectorBase as appropriate.