Reference genome selection is now incremental. In addition to quarterly releases, there will be weekly updates to create references for new species that do not yet have a reference genome and to correct inconsistencies in the set of references caused by taxonomic merges. As a result, the reference set may be updated more frequently.
There is now a history tracking file available under the ASSEMBLY_REPORTS path on FTP that lists the history of reference genome selection, including both prokaryotes and eukaryotes.
We recognize that the former species names like Human immunodeficiency virus 1 (HIV-1) are broadly used in public health, educational institutions, and research. To minimize the impact of this change on those who use NCBI resources, we will add the new binomial species names (e.g. Lentivirus humimdef1) while keeping the former names available in the lineage for each species. The former names will move below the new binomial species name in the taxonomy hierarchy, ensuring continuity. Examples are provided below. Continue reading “NCBI Taxonomy: Upcoming Changes to Viruses”→
Download the updated bacterial and archaeal reference genome collection! We built this collection of 20,403 genomes by selecting the “best” genome assembly for each species among the 350,000+ prokaryotic genomes in RefSeq (except for E. coli, for which two assemblies were selected as reference). The selection criteria have changed, including upgrades for type and complete assemblies, which resulted in a much larger set of changes than in previous updates.
What’s New?
2,298 species have an updated reference
1,123 species are represented in this collection for the first time
1,125 species have a better reference assembly than in the April 2024 set
50 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment
Effective August 2024, core_nt will become the default
Interested in faster nucleotide BLAST searches with more focused search results? As previously announced, NCBI has been re-evaluating the BLAST nucleotide database (nt) to make it more compact and more efficient. Thanks to your feedback, NCBI’s BLAST is excited to introduce the core nucleotide database (core_nt), an alternative to the default nt database that contains better-defined content and is less than half the size.
Benefits of BLAST core_nt over nt
Enables faster searches
Returns similar top results for most searches
Reduces redundancy for some highly represented organisms
Is easier to download and requires less storage space for standalone BLAST
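For standalone BLAST users, switching a nucleotide search to core_nt mostly comes down to the database name passed to blastn. The following is a minimal sketch, not an official NCBI script: the helper name blastn_args and the file names are hypothetical, and it assumes BLAST+ is installed and that core_nt is either downloaded locally or reachable through blastn's -remote option.

```python
import shlex


def blastn_args(query_path, db="core_nt", remote=False, out_path=None):
    """Build a blastn argument list (hypothetical helper, for illustration only).

    db defaults to core_nt; pass db="nt" to search the larger nt database.
    """
    args = ["blastn", "-query", query_path, "-db", db]
    if remote:
        # Ask NCBI's servers to run the search instead of a local database.
        args.append("-remote")
    if out_path:
        args += ["-out", out_path]
    return args


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works
    # even on a machine without BLAST+ installed.
    print(shlex.join(blastn_args("query.fasta", out_path="hits.txt")))
```

To actually run the search, the returned list can be passed to subprocess.run(args, check=True).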
Download the updated bacterial and archaeal reference genome collection! We built this collection of 19,328 genomes by selecting the “best” genome assembly for each species among the 350,000+ prokaryotic genomes in RefSeq (except for E. coli for which two assemblies were selected as reference).
What’s New?
413 species are represented in this collection for the first time
198 species are represented by a better assembly
27 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment
Removing contaminated sequences using NCBI quality assurance tools
Do you use BLAST to identify a sequence or the evolutionary scope of a gene? That can be challenging if contaminated and misclassified sequences are in the BLAST databases and show up in your search results. To address this problem, we now use the NCBI quality assurance tools listed below to systematically remove these misleading sequences from the default nucleotide (nt) and protein (nr) BLAST databases. Continue reading “Cleaner BLAST Databases for More Accurate Results”→
In April 2024, the FASTA (sequence text) files of the sequences in the Basic Local Alignment Search Tool (BLAST) databases will no longer be available on the FTP site. However, you can easily generate FASTA files yourself from the formatted BLAST databases by using the BLAST utility blastdbcmd, which comes with the standalone BLAST programs. This gives you the flexibility to generate organism-specific FASTA files using NCBI’s taxonomy IDs for specific organisms or groups.
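As a rough illustration of the blastdbcmd approach, the sketch below builds the command line for dumping either a whole database or only the sequences assigned to given taxonomy IDs. The helper name blastdbcmd_fasta_args and the output file names are hypothetical. It assumes a recent BLAST+ release whose blastdbcmd supports the -taxids option and a locally installed, taxonomy-aware (version 5) database. Whether a higher-level taxid also pulls in its descendant species depends on the BLAST+ version, so check the blastdbcmd help for your installation.

```python
import shlex


def blastdbcmd_fasta_args(db, out_path, taxids=None):
    """Build a blastdbcmd argument list that writes FASTA from a formatted
    BLAST database (hypothetical helper, for illustration only)."""
    # %f is blastdbcmd's output format code for FASTA.
    args = ["blastdbcmd", "-db", db, "-outfmt", "%f", "-out", out_path]
    if taxids:
        # Restrict output to sequences assigned to these NCBI taxonomy IDs.
        args += ["-taxids", ",".join(str(t) for t in taxids)]
    else:
        # No taxid filter: dump every sequence in the database.
        args += ["-entry", "all"]
    return args


if __name__ == "__main__":
    # Print the commands rather than running them, so the sketch works
    # even on a machine without BLAST+ or the database installed.
    print(shlex.join(blastdbcmd_fasta_args("nt", "nt.fasta")))
    # 9606 is the NCBI taxonomy ID for Homo sapiens.
    print(shlex.join(blastdbcmd_fasta_args("nt", "human.fasta", taxids=[9606])))
```

To actually generate the file, the returned list can be passed to subprocess.run(args, check=True).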
Download the updated bacterial and archaeal reference genome collection! This collection (18,941 genomes as of Jan 18, 2024) was built by selecting the “best” genome assembly for each species among the 330,000+ prokaryotic genomes in RefSeq (except for E. coli for which two assemblies were selected as reference). You can speed up your sequence searches by running them against these high-quality genomes instead of the entire nucleotide or protein database.
Are you a biology student working on a research project? NCBI offers free access to a wide variety of resources and tools to help you find and download data for your project.
How and why do you use our resources? Check out the example below:
Your professor has assigned you a research project looking at the sequence and structure of the TP53 gene in the domestic cat (Felis catus). In addition, you were asked to find information on this gene and its genomic region in other members of the cat family (Felidae). Continue reading “Using NCBI Data and Tools for Your Research Project”→