Journal Article Accepted manuscript
Yongtao Ye: State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, P. R. China; Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China
Marcus H Shum: Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China
Isaac Wu: Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China
Carlos Chau: Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China
Ningqi Zhao: Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China
David K Smith: State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, P. R. China; Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China
Joseph T Wu: State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, P. R. China; Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China
Tommy T Lam: State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, P. R. China; Laboratory of Data Discovery for Health, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China; Guangdong-Hongkong Joint Laboratory of Emerging Infectious Diseases, Joint Institute of Virology (Shantou University/The University of Hong Kong), Shantou, Guangdong, 515063, P. R. China; EKIH (Gewuzhikang) Pathogen Research Institute, Futian District, Shenzhen City, Guangdong, 518045, P. R. China; Centre for Immunology & Infection, 17W Hong Kong Science & Technology Parks, Hong Kong SAR, P. R. China
Corresponding author: tylam.tommy@gmail.com
Virus Evolution, veae056, https://doi.org/10.1093/ve/veae056
Published: 25 July 2024
Article history
Received: 19 January 2024
Revision received: 19 April 2024
Editorial decision: 21 May 2024
Accepted: 24 July 2024
Yongtao Ye, Marcus H Shum, Isaac Wu, Carlos Chau, Ningqi Zhao, David K Smith, Joseph T Wu, Tommy T Lam, F1ALA: ultrafast and memory-efficient ancestral lineage annotation applied to the huge SARS-CoV-2 phylogeny, Virus Evolution, 2024, veae056, https://doi.org/10.1093/ve/veae056
Abstract
The unprecedented size of the global SARS-CoV-2 phylogeny makes any computation on the tree difficult. Lineage identification (e.g. the PANGO nomenclature for SARS-CoV-2) and assignment are key to tracking the evolution of the virus. Assignment requires annotating unlabeled ancestral nodes in a phylogenetic tree as the clade roots of lineages, so that the descendant samples under each clade root can be inferred to belong to the corresponding lineage. This is the ancestral lineage annotation problem, for which matUtils (a package in pUShER) and PastML are commonly used methods. However, their computational tractability is a challenge, and their accuracy on huge SARS-CoV-2 phylogenies needs further exploration. We have developed an efficient and accurate method, called ‘F1ALA’, that uses the F1-score to evaluate the confidence with which a specific ancestral node can be annotated as the clade root of a lineage, given the lineage labels of a set of taxa in a rooted tree. When annotating 2,277 PANGO lineages in a phylogeny of 5.26 million taxa, F1ALA was roughly an order of magnitude faster than these methods while using ~12% of their memory. F1ALA thus allows real-time lineage tracking to be performed on a laptop computer. In tests on empirical and simulated data, F1ALA outperformed matUtils (pUShER) by a statistically significant margin and had accuracy comparable to PastML. F1ALA also enables tree refinement: taxa whose labels are inconsistent with their closest annotation nodes are pruned and re-inserted into the pruned tree, which yields a SARS-CoV-2 phylogeny with both higher log-likelihood and lower parsimony score. Given its ultrafast speed and high accuracy, we anticipate that F1ALA will also be useful for large phylogenies of other viruses. Code and benchmark datasets are publicly available at https://github.com/id-bioinfo/F1ALA.
Keywords: PANGO lineages, SARS-CoV-2, ancestral reconstruction, tree refinement, F1-score
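The F1-score idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is available in the F1ALA repository); the function name `annotate_clade_roots`, the dict-based tree representation, the treatment of unlabeled taxa, and the tie-breaking rule are all assumptions made for illustration. Each node is scored, for each lineage, by how well "taxa descending from this node" predicts "taxa labeled with this lineage": with TP labeled descendants, precision is TP divided by the clade size and recall is TP divided by the lineage size, so F1 = 2·TP / (clade size + lineage size). The highest-scoring node is taken as the clade root.

```python
from collections import Counter


def annotate_clade_roots(children, root, leaf_labels):
    """Pick, for each lineage, the node whose clade best matches it by F1.

    children    : dict mapping a node to the list of its child nodes
                  (leaves may be absent or map to an empty list)
    root        : the root node of the rooted tree
    leaf_labels : dict mapping leaf -> lineage label (hypothetical input format)

    Returns {lineage: (best_node, f1_score)}.
    """
    lineage_sizes = Counter(leaf_labels.values())

    # Pre-order listing; reversing it visits every child before its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    label_counts = {}  # node -> Counter of lineage labels among its leaves
    clade_sizes = {}   # node -> number of leaves below (or equal to) the node
    best = {}

    for node in reversed(order):
        kids = children.get(node, [])
        if not kids:
            label = leaf_labels.get(node)
            # Assumption: unlabeled leaves still count toward the clade size.
            label_counts[node] = Counter([label]) if label is not None else Counter()
            clade_sizes[node] = 1
        else:
            merged = Counter()
            for kid in kids:
                merged.update(label_counts.pop(kid))  # free child counters
            label_counts[node] = merged
            clade_sizes[node] = sum(clade_sizes[kid] for kid in kids)

        for lineage, tp in label_counts[node].items():
            f1 = 2 * tp / (clade_sizes[node] + lineage_sizes[lineage])
            # Strict '>' keeps the deeper node when scores tie (an assumption).
            if lineage not in best or f1 > best[lineage][1]:
                best[lineage] = (node, f1)
    return best


# Toy tree: one taxon labeled "B" sits inside the clade that is mostly "A".
toy_children = {"root": ["n1", "n2"], "n1": ["a", "b", "c"], "n2": ["d", "e"]}
toy_labels = {"a": "A", "b": "A", "c": "B", "d": "B", "e": "B"}
toy_result = annotate_clade_roots(toy_children, "root", toy_labels)
```

In the toy tree, taxon `c` carries label "B" but sits under `n1` with the "A" taxa. The sketch still picks `n1` as the clade root of "A" and `n2` as the clade root of "B", and `c` is the kind of taxon with a label inconsistent with its closest annotation node that the abstract's refinement step targets. Merging per-node counters as done here is simple but not tuned for trees with millions of taxa.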
Accepted manuscripts
Accepted manuscripts are PDF versions of the author’s final manuscript, as accepted for publication by the journal but prior to copyediting or typesetting. They can be cited using the author(s), article title, journal title, year of online publication, and DOI. They will be replaced by the final typeset articles, which may therefore contain changes. The DOI will remain the same throughout.
© The Author(s) 2024. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact reprints@oup.com for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site–for further information please contact journals.permissions@oup.com.