Get Faster, More Focused Search Results with NCBI’s New BLAST Core Nucleotide Database (core_nt)

Effective August 2024, core_nt will become the default 

Interested in faster nucleotide BLAST searches with more focused search results? As previously announced, NCBI has been re-evaluating the BLAST nucleotide database (nt) to make it more compact and more efficient. Thanks to your feedback, NCBI’s BLAST is excited to introduce the core nucleotide database (core_nt), an alternative to the default nt database that contains better-defined content and is less than half the size. 

Benefits of BLAST core_nt over nt
  • Enables faster searches  
  • Returns similar top results for most searches 
  • Reduces redundancy for some highly represented organisms 
  • Makes the database easier to download and reduces the storage space needed for standalone BLAST 
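
For standalone BLAST users, the download step could look something like the sketch below. It assumes that the `update_blastdb.pl` script distributed with BLAST+ is on your PATH and that core_nt is published under that name on the NCBI FTP site. The helper names `build_download_command` and `download_database` are hypothetical and are not part of any NCBI tool.

```python
import shutil
import subprocess


def build_download_command(db_name="core_nt", script="update_blastdb.pl"):
    """Return the argument list that downloads and unpacks a BLAST database.

    Assumption: `script` is the update_blastdb.pl helper shipped with BLAST+.
    """
    return [script, "--decompress", db_name]


def download_database(db_name="core_nt", script="update_blastdb.pl"):
    """Run the download in the current directory (hypothetical helper)."""
    if shutil.which(script) is None:
        raise FileNotFoundError(f"{script} was not found on PATH")
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(build_download_command(db_name, script), check=True)


if __name__ == "__main__":
    # Only print the command here; call download_database() to actually run it.
    print(" ".join(build_download_command()))
```

Passing `"nt"` as `db_name` would build the equivalent command for the original database, which makes it easy to compare download size and disk usage for the two.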

What is core_nt?

The core_nt database is nt without most eukaryotic chromosome sequences; it contains the same eukaryotic transcript and gene-related sequences as nt. For most nucleotide BLAST searches, core_nt results will be similar to nt results. However, core_nt is better suited than nt to the most common BLAST search goals, such as identifying gene-related sequences (for example, transcript sequences) and complete bacterial chromosomes. This is because, in recent years, nt has acquired more low-relevance, non-annotated, and non-gene content. 

Example: 

The screenshots below show nucleotide BLAST results using the Drosophila melanogaster rosy gene transcript (NM_079613) as the search query. The top 12 results are the same for core_nt (top panel) and nt (bottom panel), and are mostly related transcript sequences from other Drosophila species. With the original nt database, the remaining results are dominated by unannotated D. melanogaster chromosome assemblies (boxed in red) that provide no gene-related information.

Screenshot of BLAST search results comparing core_nt (future default) and nt (current default).

If you’re interested in searching against eukaryote genome assemblies including full-length chromosomes, we suggest searching against the RefSeq Reference Genomes database (available in the BLAST database list) or individual assemblies from NCBI Datasets. Sets of genomes can also be searched through the WGS or microbial BLAST database options.

Try it out! 

Search our new BLAST core_nt database and let us know what you think. Share your feedback using the yellow Feedback button located on the bottom of the BLAST webpage. 

Please note: As of August 2024, the new BLAST core_nt database will be the default. The original nt database will still be available as a choice in the database list. 
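
If you run standalone BLAST+ rather than the web interface, you choose the database yourself with the `-db` option of `blastn`. The sketch below shows one way to switch between core_nt and nt for the same query. It assumes `blastn` is installed and both databases have been downloaded locally; the function name `build_blastn_command` and the file names are hypothetical placeholders.

```python
import subprocess

# The two databases discussed in this post.
SUPPORTED_DATABASES = {"core_nt", "nt"}


def build_blastn_command(query_fasta, database="core_nt", out_path="results.tsv"):
    """Build a blastn argument list for the chosen database (hypothetical helper)."""
    if database not in SUPPORTED_DATABASES:
        raise ValueError(f"unsupported database: {database}")
    return [
        "blastn",
        "-query", query_fasta,   # FASTA file containing the query sequence
        "-db", database,         # core_nt by default; pass "nt" for the original
        "-outfmt", "6",          # tabular output, one hit per line
        "-out", out_path,
    ]


if __name__ == "__main__":
    # Assumption: query.fasta exists and the database is on the BLASTDB path.
    command = build_blastn_command("query.fasta", database="core_nt")
    print(" ".join(command))
    # Uncomment to run the search:
    # subprocess.run(command, check=True)
```

Running the same query once with each database and comparing the two tabular outputs is a simple way to check how much the top hits differ for your own sequences.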

Stay up to date

BLAST is a part of the NIH Comparative Genomics Resource (CGR). CGR facilitates reliable comparative genomics analyses for all eukaryotic organisms through an NCBI Toolkit and community collaboration. Follow us on social @NCBI and join our mailing list to keep up to date with BLAST and other CGR news.   

Questions?

If you have questions or would like to provide feedback, please reach out to us at info@ncbi.nlm.nih.gov. 

13 thoughts on “Get Faster, More Focused Search Results with NCBI’s New BLAST Core Nucleotide Database (core_nt)”

  1. That’s great to know on this fantastic update.
    Will the same database be made available via FTP as well?
    Thanks

      1. Thanks. Is there a plan to provide core_nt subsets for eukaryotes, prokaryotes, viruses, and others, similar to the experimental databases nt_euk, nt_prok, nt_viruses, and nt_others?

    1. Trying to get this database, but it is still not working! I am using ./update_blastdb.pl --decompress core_nt, but all we get is:

      Warning: No BLASTDB metadata for core_nt
      core_nt not found, skipping.

  2. So what is in the current nt database? Is it still the same sequences, just with duplicates merged? Does it still include all the normal transcripts and chromosomes, whereas core_nt has the transcripts and gene sequences only?
