F1ALA: ultrafast and memory-efficient ancestral lineage annotation applied to the huge SARS-CoV-2 phylogeny (2024)

Journal Article Accepted manuscript

Yongtao Ye

State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, P. R. China

Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China

Marcus H Shum

Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China

Isaac Wu

Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China

Carlos Chau

Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China

Ningqi Zhao

Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China

David K Smith

State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, P. R. China

Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China

Joseph T Wu

State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, P. R. China

Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China

Tommy T Lam

State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, P. R. China

Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China

Guangdong-Hongkong Joint Laboratory of Emerging Infectious Diseases, Joint Institute of Virology (Shantou University/The University of Hong Kong), Shantou, Guangdong, 515063, P. R. China

EKIH (Gewuzhikang) Pathogen Research Institute, Futian District, Shenzhen City, Guangdong, 518045, P. R. China

Centre for Immunology & Infection, 17W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China

Corresponding author: tylam.tommy@gmail.com

Virus Evolution, veae056, https://doi.org/10.1093/ve/veae056

Published: 25 July 2024

Article history

Received: 19 January 2024

Revision received: 19 April 2024

Editorial decision: 21 May 2024

Accepted: 24 July 2024


Abstract

The unprecedented size of the global SARS-CoV-2 phylogeny makes any computation on the tree difficult. Lineage identification (e.g. the PANGO nomenclature for SARS-CoV-2) and assignment are key to tracking virus evolution. Assignment requires annotating unlabeled ancestral nodes in a phylogenetic tree as the clade roots of lineages, so that the descendant samples under each clade root can be inferred to belong to the corresponding lineage. This is the ancestral lineage annotation problem, for which matUtils (a package in pUShER) and PastML are commonly used. However, their computational tractability is a challenge, and their accuracy on huge SARS-CoV-2 phylogenies needs further exploration. We have developed an efficient and accurate method, called ‘F1ALA’, that uses the F1-score to evaluate the confidence with which a specific ancestral node can be annotated as the clade root of a lineage, given the lineage labels of a set of taxa in a rooted tree. Compared to these methods, F1ALA was roughly an order of magnitude faster while using ~12% of their memory when annotating 2,277 PANGO lineages in a phylogeny of 5.26 million taxa, which allows real-time lineage tracking to be performed on a laptop computer. In tests on empirical and simulated data, F1ALA outperformed matUtils (pUShER) with statistical significance and had accuracy comparable to PastML. F1ALA also enables tree refinement: taxa whose labels are inconsistent with their closest annotation nodes are pruned and re-inserted into the pruned tree, yielding a SARS-CoV-2 phylogeny with both higher log-likelihood and lower parsimony score. Given its ultrafast speed and high accuracy, we anticipate that F1ALA will also be useful for large phylogenies of other viruses. Code and benchmark datasets are publicly available at https://github.com/id-bioinfo/F1ALA.
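The F1-score criterion described in the abstract can be illustrated with a small sketch. This is not the F1ALA implementation or its interface, only one plausible reading of the idea: for a candidate node and a lineage, treat the labelled taxa below the node as the "predicted" members of the lineage, so that precision is the fraction of those taxa carrying the lineage label and recall is the fraction of all taxa with that label that fall below the node. The function name `annotate_clade_roots`, the dictionary-based tree encoding, and the toy labels are assumptions made for illustration, and the sketch ignores nested lineages and the memory optimisations a multi-million-taxon tree would need.

```python
from collections import Counter

def annotate_clade_roots(children, leaf_labels, root):
    """Naive sketch: for each lineage, pick the node with the highest F1-score.

    children    -- dict mapping an internal node to a list of its child nodes
    leaf_labels -- dict mapping a leaf (taxon) to its lineage label
    root        -- the root node of the rooted tree
    Returns {lineage: (best_node, best_f1)}.
    """
    # Iterative pre-order traversal (parents before children); recursion
    # would overflow the call stack on very deep trees.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, ()))

    totals = Counter(leaf_labels.values())  # taxa per lineage in the whole tree
    counts = {}                             # node -> Counter of labels below it
    best = {}
    # Reversed pre-order visits every child before its parent.
    for node in reversed(order):
        below = Counter()
        kids = children.get(node, ())
        if kids:
            for kid in kids:
                below.update(counts.pop(kid))  # free child counters once merged
        elif node in leaf_labels:
            below[leaf_labels[node]] = 1       # unlabelled leaves are ignored
        counts[node] = below
        size = sum(below.values())             # labelled taxa below this node
        for lineage, tp in below.items():
            # precision = tp / size, recall = tp / totals[lineage];
            # their harmonic mean simplifies to 2*tp / (size + total).
            f1 = 2 * tp / (size + totals[lineage])
            # Strict '>' keeps the deeper node when scores tie.
            if lineage not in best or f1 > best[lineage][1]:
                best[lineage] = (node, f1)
    return best

# Toy tree: root -> (n1 -> a, b, c), (n2 -> d, e); labels are made up.
toy_children = {"root": ["n1", "n2"], "n1": ["a", "b", "c"], "n2": ["d", "e"]}
toy_labels = {"a": "A.1", "b": "A.1", "c": "B.1", "d": "B.1", "e": "B.1"}
result = annotate_clade_roots(toy_children, toy_labels, "root")
for lineage, (node, f1) in sorted(result.items()):
    print(lineage, node, round(f1, 3))
```

In the toy tree, taxon `c` carries a B.1 label but sits under the node that best represents A.1. Taxa like this, whose labels disagree with their closest annotated node, are the kind the abstract's refinement step would prune and re-insert.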

Keywords: PANGO lineages, SARS-CoV-2, ancestral reconstruction, tree refinement, F1-score

Accepted manuscripts

Accepted manuscripts are PDF versions of the author’s final manuscript, as accepted for publication by the journal but prior to copyediting or typesetting. They can be cited using the author(s), article title, journal title, year of online publication, and DOI. They will be replaced by the final typeset articles, which may therefore contain changes. The DOI will remain the same throughout.


© The Author(s) 2024. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact reprints@oup.com for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site–for further information please contact journals.permissions@oup.com.
