Effective August 2024, core_nt will become the default
Interested in faster nucleotide BLAST searches with more focused search results? As previously announced, NCBI has been re-evaluating the BLAST nucleotide database (nt) to make it more compact and more efficient. Thanks to your feedback, NCBI’s BLAST is excited to introduce the core nucleotide database (core_nt), an alternative to the default nt database that contains better-defined content and is less than half the size.
Benefits of BLAST core_nt over nt
- Enables faster searches
- Returns similar top results for most searches
- Reduces redundancy for some highly represented organisms
- Simplifies database download and requires less storage space for standalone BLAST
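For standalone BLAST users, the download step can be scripted. The following is a minimal Python sketch rather than an official NCBI recipe. It assumes the BLAST+ script update_blastdb.pl is on your PATH and that core_nt has been published to the BLAST FTP area. The helper names build_update_command and download_database are hypothetical and exist only for illustration.

```python
import shutil
import subprocess


def build_update_command(db_name="core_nt", decompress=True):
    """Return the update_blastdb.pl argument list for one database."""
    cmd = ["update_blastdb.pl"]
    if decompress:
        # --decompress unpacks the downloaded archives after download
        cmd.append("--decompress")
    cmd.append(db_name)
    return cmd


def download_database(db_name="core_nt", dry_run=True):
    """Print the command; run it only when dry_run is False."""
    cmd = build_update_command(db_name)
    print(" ".join(cmd))
    if dry_run:
        return None
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("update_blastdb.pl not found on PATH")
    # The real download is large, so it only runs when explicitly requested
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: show the command without downloading anything
    download_database(dry_run=True)
```

Keeping the dry run as the default is a deliberate choice here: the command can be inspected before committing disk space and bandwidth to the download.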
What is core_nt?
The core_nt database is nt without most eukaryotic chromosome sequences; it contains the same eukaryotic transcript and gene-related sequences as nt. Results from most nucleotide BLAST searches with core_nt will be similar to those from nt. However, core_nt is better suited than nt to the most common BLAST search goals, such as identifying gene-related sequences (for example, transcript sequences) and complete bacterial chromosomes. This is because, in recent years, nt has acquired more low-relevance, non-annotated, and non-gene content.
Example:
The screenshots below show nucleotide BLAST results using the Drosophila melanogaster rosy gene transcript (NM_079613) as the search query. The core_nt results (top panel) and the nt results (bottom panel) are the same for the top 12 results, which are mostly related transcript sequences from other Drosophila species. With the original nt database, most of the rest of the 100 results are unannotated D. melanogaster chromosome assemblies (boxed in red) that provide no gene-related information.
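A comparison like this one could also be set up with standalone BLAST+. The sketch below only builds the two blastn command lines so they can be inspected before running. It assumes blastn is installed, that the query has been saved to a local FASTA file (the file name rosy_NM_079613.fasta is hypothetical), and that both databases are available to your installation, either locally or through the -remote option. The helper name blastn_command is likewise hypothetical.

```python
def blastn_command(db, query_fasta, out_path, remote=False, max_targets=100):
    """Build a blastn argument list for a single database search."""
    cmd = [
        "blastn",
        "-db", db,                    # database name, e.g. core_nt or nt
        "-query", query_fasta,        # FASTA file holding the query sequence
        "-out", out_path,             # where the results are written
        "-outfmt", "6",               # tabular output, easy to compare
        "-max_target_seqs", str(max_targets),
    ]
    if remote:
        # -remote sends the search to NCBI servers instead of a local database
        cmd.append("-remote")
    return cmd


if __name__ == "__main__":
    # Print one command per database so the two runs can be compared
    for db in ("core_nt", "nt"):
        cmd = blastn_command(db, "rosy_NM_079613.fasta", f"rosy_vs_{db}.tsv")
        print(" ".join(cmd))
```

Writing both result sets in the same tabular format makes it straightforward to check where the two hit lists begin to diverge.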
If you’re interested in searching against eukaryote genome assemblies including full-length chromosomes, we suggest searching against the RefSeq Reference Genomes database (available in the BLAST database list) or individual assemblies from NCBI Datasets. Sets of genomes can also be searched through the WGS or microbial BLAST database options.
Try it out!
Search our new BLAST core_nt database and let us know what you think. Share your feedback using the yellow Feedback button at the bottom of the BLAST webpage.
Please note: As of August 2024, the new BLAST core_nt database will be the default. The original nt database will still be available as a choice in the database list.
Stay up to date
BLAST is a part of the NIH Comparative Genomics Resource (CGR). CGR facilitates reliable comparative genomics analyses for all eukaryotic organisms through an NCBI Toolkit and community collaboration. Follow us on social @NCBI and join our mailing list to keep up to date with BLAST and other CGR news.
Questions?
If you have questions or would like to provide feedback, please reach out to us at info@ncbi.nlm.nih.gov.
That’s great to know on this fantastic update.
Will the same database be made available on FTP as well?
Thanks
Yes, it should be up on the BLAST ftp area sometime in August.
Thanks. Is there a plan to provide core_nt for eukaryotes, prokaryotes, viruses and others, similar to the experimental databases nt_euk, nt_prok, nt_viruses & nt_others?
No plans to do that at the moment. In any case, only eukaryote sequence content would be affected in the taxonomy split dbs. By the way, core_nt is up on the FTP site now https://ftp.ncbi.nlm.nih.gov/blast/db/
But core_nt is not listed now in https://ftp.ncbi.nlm.nih.gov/blast/db/blastdb-manifest.json. Will it be added?
Please contact our help desk so we can assist you: https://support.nlm.nih.gov/support/create-case/
There is instead: https://ftp.ncbi.nlm.nih.gov/blast/db/core_nt-nucl-metadata.json
Trying to get this database but it is still not working! I am using ./update_blastdb.pl --decompress core_nt but all we get is:
Warning: No BLASTDB metadata for core_nt
core_nt not found, skipping.
Hello! Thanks for your question. Please contact our help desk so we can assist you: https://support.nlm.nih.gov/support/create-case/
Hi, is there a way to download the sequences from type material in core_nt?
Hello! Thanks for your question. Please contact our help desk so we can assist you: https://support.nlm.nih.gov/support/create-case/
So what is in the current nt database? Is it still the same sequences, just with duplicates merged? Does it still include all the normal transcripts and chromosomes, whereas core_nt has the transcripts and gene sequences only?
Hello! Thanks for your questions. Please contact our help desk so we can assist you: https://support.nlm.nih.gov/support/create-case/