Tag: Basic Local Alignment Search Tool (BLAST)

An updated bacterial and archaeal reference genome collection is available!

Download the updated bacterial and archaeal reference genome collection! We built this collection of 21,258 genomes by selecting the “best” genome assembly for each species among the 400,000+ prokaryotic genomes in RefSeq.

What’s new?

As previously announced, we updated our release process:

There is now an incremental process. In addition to quarterly releases, there will be weekly updates to create references for new species which do not have a reference genome and to correct any inconsistencies in the set of references due to taxonomic merges. As a result, there may be more frequent updates to the reference set.
There is now a history tracking file available under the ASSEMBLY_REPORTS path on FTP that lists the history of reference genome selection, including both prokaryotes and eukaryotes.

Top of 2024: A Look at the NCBI Insights Blog

As we begin a new year, let’s look back at the top NCBI Insights Blog posts of 2024 based on number of views.

In case you missed any of these, check them out:

NCBI Taxonomy: Upcoming Changes to Viruses

To reflect changes to the International Code of Virus Classification and Nomenclature (ICVCN) made by the International Committee on Taxonomy of Viruses (ICTV), NCBI will add binomial species names to about 3000 viruses. These updates to NCBI Taxonomy are planned for spring 2025, but you can view the changes now in the ICTV’s Virus Metadata Resource.

We recognize that the former species names like Human immunodeficiency virus 1 (HIV-1) are broadly used in public health, educational institutions, and research. To minimize the impact of this change on those who use NCBI resources, we will add the new binomial species names (e.g., Lentivirus humimdef1) while keeping the former names available in the lineage for each species. The former names will move below the new binomial species name in the taxonomy hierarchy, ensuring continuity. Examples are provided below.

Updated Bacterial and Archaeal Reference Genome Collection now Available!

Download the updated bacterial and archaeal reference genome collection! We built this collection of 20,403 genomes by selecting the “best” genome assembly for each species among the 350,000+ prokaryotic genomes in RefSeq (except for E. coli, for which two assemblies were selected as reference). We changed the selection criteria, including upgrades for type and complete assemblies, which resulted in a much larger set of changes than in previous updates.

What’s New?

2,298 species have an updated reference
1,123 species are represented in this collection for the first time
1,125 species have a better reference assembly than in the April 2024 set
50 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment

Get Faster, More Focused Search Results with NCBI’s New BLAST Core Nucleotide Database (core_nt)

Effective August 2024, core_nt will become the default

Interested in faster nucleotide BLAST searches with more focused search results? As previously announced, NCBI has been re-evaluating the BLAST nucleotide database (nt) to make it more compact and more efficient. Thanks to your feedback, NCBI’s BLAST is excited to introduce the core nucleotide database (core_nt), an alternative to the default nt database that contains better-defined content and is less than half the size.

Benefits of BLAST core_nt over nt

Enables faster searches
Returns similar top results for most searches
Reduces redundancy for some highly represented organisms
Is easier to download and requires less storage space for standalone BLAST
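
As a rough illustration of pointing a standalone search at the smaller database, the sketch below builds a blastn command line that targets core_nt. It assumes BLAST+ is installed and that core_nt has either been downloaded locally or is searched with the -remote option; the helper name blastn_args and the file names are hypothetical and not part of any NCBI tool.

```python
def blastn_args(query_path, out_path, db="core_nt", remote=False):
    """Build a blastn argument list (hypothetical helper, for illustration)."""
    args = [
        "blastn",
        "-query", query_path,   # FASTA file with the query sequence(s)
        "-db", db,              # database name; core_nt is taken from the post
        "-outfmt", "6",         # tabular output
        "-out", out_path,
    ]
    if remote:
        # Send the search to NCBI servers instead of a local database copy.
        args.append("-remote")
    return args


if __name__ == "__main__":
    # Print the command rather than running it, since BLAST+ may not be installed.
    print(" ".join(blastn_args("query.fasta", "hits.tsv")))
```

Passing db="nt" to the same helper gives the equivalent search against the full nt database, which makes it easy to compare the top hits from the two.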

Now Available! Updated Bacterial and Archaeal Reference Genomes Collection

Download the updated bacterial and archaeal reference genome collection! We built this collection of 19,328 genomes by selecting the “best” genome assembly for each species among the 350,000+ prokaryotic genomes in RefSeq (except for E. coli for which two assemblies were selected as reference).

What’s New?

413 species are represented in this collection for the first time
198 species are represented by a better assembly
27 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment

Cleaner BLAST Databases for More Accurate Results

Removing contaminated sequences using NCBI quality assurance tools

Do you use BLAST to identify a sequence or the evolutionary scope of a gene? That can be challenging if contaminated and misclassified sequences are in the BLAST databases and show up in your search results. To address this problem, we now use the NCBI quality assurance tools listed below to systematically remove these misleading sequences from the default nucleotide (nt) and protein (nr) BLAST databases.

BLAST FASTA Files Will No Longer Be Available on the FTP Site Effective April 2024

Easily generate BLAST FASTA files yourself!

In April 2024, the FASTA (sequence text) files of the sequences in the Basic Local Alignment Search Tool (BLAST) databases will no longer be available on the FTP site. However, you can easily generate FASTA files yourself from the formatted BLAST databases by using the BLAST utility blastdbcmd that comes with the standalone BLAST programs. This gives you the flexibility to generate organism-specific FASTA files using NCBI’s taxonomy IDs for specific organisms or groups.

See the examples below and the BLAST Command Line Applications User Manual for more details on the standalone BLAST programs and working with the BLAST databases.
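
A minimal sketch of the blastdbcmd approach described above, written as a small Python helper that builds the command line. It assumes BLAST+ is installed and that a formatted database (here nt, in the version 5 format that carries taxonomy information) is available locally. The helper name blastdbcmd_fasta_args and the output file names are hypothetical; taxonomy ID 9685 is the domestic cat (Felis catus).

```python
import subprocess


def blastdbcmd_fasta_args(db, out_path, taxids=None):
    """Build a blastdbcmd argument list that writes sequences as FASTA."""
    args = ["blastdbcmd", "-db", db, "-outfmt", "%f", "-out", out_path]
    if taxids:
        # Restrict the output to sequences assigned to these taxonomy IDs.
        args += ["-taxids", ",".join(str(t) for t in taxids)]
    else:
        # No taxonomy filter: write every sequence in the database.
        args += ["-entry", "all"]
    return args


def run(args):
    """Run the command; raises CalledProcessError if blastdbcmd fails."""
    subprocess.run(args, check=True)


if __name__ == "__main__":
    # Print the commands only; call run(...) once BLAST+ and the database exist.
    print(" ".join(blastdbcmd_fasta_args("nt", "nt_all.fasta")))
    print(" ".join(blastdbcmd_fasta_args("nt", "felis_catus.fasta", taxids=[9685])))
```

Building the argument list separately from running it keeps the example easy to inspect: the printed command can be pasted into a shell, or passed to run() from a script.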

Updated Bacterial and Archaeal Reference Genome Collection is Available!

Download the updated bacterial and archaeal reference genome collection! This collection (18,941 genomes as of Jan 18, 2024) was built by selecting the “best” genome assembly for each species among the 330,000+ prokaryotic genomes in RefSeq (except for E. coli for which two assemblies were selected as reference). You can speed up your sequence searches by running them against these high-quality genomes instead of the entire nucleotide or protein database.

The criteria for selecting the reference assembly for a given species include assembly contiguity and completeness and quality of the RefSeq annotation.

Using NCBI Data and Tools for Your Research Project

Are you a biology student working on a research project? NCBI offers free access to a wide variety of resources and tools to help you find and download data for your project. 

How and why do you use our resources? Check out the example below:

Your professor has assigned you a research project looking at the sequence and structure of the TP53 gene in the domestic cat (Felis catus). In addition, you were asked to find information on this gene and its genomic region in other members of the cat family (Felidae).
