
################################################################################
README for the reference_assemblies_log.txt file found on the NCBI genomes FTP site:

Last updated: Jan 17, 2025
################################################################################

INTRODUCTION
------------
reference_assemblies_log.txt reports the date ranges during which assembly
accessions were labeled as the reference assembly for a species.

UPDATE FREQUENCY
----------------
reference_assemblies_log.txt is recreated nightly so that it reflects the
current list of reference assemblies.

CONTENTS
--------
The file is a tab-delimited text file with the 8 columns described below.
Header rows begin with '#'. "na" is used to indicate values that are not
available or not applicable.

COLUMNS

Column 1: Classification
   Classification: the taxonomic classification, at the domain level, of the
   organism for the genbank-accession (column 6).

Column 2: species-taxid
   Species taxonomy ID: the NCBI taxonomy identifier for the species of the
   genbank-accession (column 6). The species taxid will differ from the
   organism taxid (column 4) when the assembly taxid is at the subspecies level
   or is from a strain that has its own taxonomic identifier.

Column 3: species-name
   Species name: the scientific name of the organism matching the
   species-taxid (column 2). The species-name will differ from the
   organism-name (column 5) if the assembly is at the subspecies level or is
   from an older strain that had its own taxonomic identifier.

Column 4: taxid
   Taxonomy ID: the NCBI taxonomy identifier for the organism from which the
   genbank-accession (column 6) was derived. The NCBI Taxonomy Database is a
   curated classification and nomenclature for all of the organisms in the
   public sequence databases. The taxonomy record can be retrieved from the
   NCBI Taxonomy resource:

Column 5: organism-name
   Organism name: the scientific name of the organism from which the sequences
   in the genbank-accession were derived. This name is taken from the NCBI
   Taxonomy record for the taxid specified in column 4. Some older taxids were
   assigned at the strain level, and for these the organism name will include
   the strain. Current practice is to assign taxids only at the species level;
   for these, the organism name will be just the species name.

Column 6: genbank-accession
   GenBank assembly accession: the GenBank assembly accession.version, a
   unique identifier for the set of sequences in this particular version of
   the genome assembly.

Column 7: ref-from
   Reference From Date: the start date of the range during which the
   genbank-accession specified in column 6 was labeled as the reference for
   the species.

Column 8: ref-to
   Reference To Date: the end date of the range during which the
   genbank-accession specified in column 6 was labeled as the reference for
   the species. "current" is listed when the genbank-accession is labeled as
   the current reference. When column 8 is "current", the genbank-accession
   will have "reference genome" in column 5 of the assembly_summary.txt files.
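The layout described above can be read with a few lines of Python. This is a minimal sketch, not an NCBI-provided tool: it assumes exactly the 8 tab-delimited columns listed here, skips rows beginning with '#', and maps "na" to None. The names RefLogRecord, parse_log, current_references and reference_history, the header text, and all sample rows (taxids, species names, accessions, dates) are hypothetical and exist only for illustration. Dates are kept as plain strings because this README does not specify their format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical sample rows for illustration only (not real NCBI data).
SAMPLE_LINES = [
    "#classification\tspecies-taxid\tspecies-name\ttaxid\torganism-name\tgenbank-accession\tref-from\tref-to",
    "bacteria\t1001\tExamplea alpha\t1001\tExamplea alpha\tGCA_000000001.1\t2019-01-01\t2021-06-30",
    "bacteria\t1001\tExamplea alpha\t1002\tExamplea alpha str. X\tGCA_000000002.1\t2021-06-30\tcurrent",
    "na\t2001\tExampleb beta\t2001\tExampleb beta\tGCA_000000003.1\t2020-03-15\tcurrent",
]

N_COLUMNS = 8  # per the CONTENTS section


@dataclass
class RefLogRecord:
    """One row of reference_assemblies_log.txt (hypothetical field names)."""
    classification: Optional[str]
    species_taxid: Optional[str]
    species_name: Optional[str]
    taxid: Optional[str]
    organism_name: Optional[str]
    genbank_accession: Optional[str]
    ref_from: Optional[str]  # kept as text: date format not specified
    ref_to: Optional[str]    # "current" for the present reference

    @property
    def is_current(self) -> bool:
        # Column 8 holds "current" while the accession is the reference.
        return self.ref_to == "current"

    @property
    def below_species_level(self) -> bool:
        # Column 4 differs from column 2 for subspecies-level assemblies
        # or strains that have their own taxid.
        return self.taxid is not None and self.taxid != self.species_taxid


def parse_log(lines: Iterable[str]) -> List[RefLogRecord]:
    """Parse rows, skipping '#' header rows and blank lines."""
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != N_COLUMNS:
            raise ValueError(f"expected {N_COLUMNS} columns, got {len(fields)}: {line!r}")
        # "na" marks values that are not available or not applicable.
        records.append(RefLogRecord(*[None if f == "na" else f for f in fields]))
    return records


def current_references(records: Iterable[RefLogRecord]) -> Dict[str, str]:
    """Map species-taxid to the GenBank accession currently labeled as reference."""
    return {r.species_taxid: r.genbank_accession for r in records if r.is_current}


def reference_history(
    records: Iterable[RefLogRecord], species_taxid: str
) -> List[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """(accession, ref-from, ref-to) for one species, in file order."""
    return [
        (r.genbank_accession, r.ref_from, r.ref_to)
        for r in records
        if r.species_taxid == species_taxid
    ]
```

To read the real file, pass an open handle, for example `with open("reference_assemblies_log.txt") as fh: records = parse_log(fh)`. Keeping dates as strings and preserving file order avoids guessing at a date format or sort order that the README does not define.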