Taxonomy and phylogeny of Papillomaviridae: precision, accuracy, stability and utility.
Marcus E. Siddall
Abstract
The systematic biology of Papillomaviridae, small double-stranded DNA viruses infecting vertebrate keratinocytes, reflects distinct evolutionary patterns of niche adaptation and clinical presentation. Human papillomaviruses (HPV), particularly Alphapapillomavirus, are significant for their role in various cancers, with HPV types 16 and 18 being notably oncogenic. The current taxonomy of Papillomaviridae, which classifies viruses based on a single-locus genetic distance ‘yardstick’ of the L1 major capsid protein, has demonstrated remarkable stability and utility by conveying information on common ancestry, host range, tissue affinity, and clinical characteristics. Despite the reliability of this system, recent increases in newly discovered papillomaviruses, especially from non-primate hosts, have led to concerns regarding the L1 yardstick’s precision and the accuracy of the phylogenetic relationships inferred from E1, E2, L1, and L2 loci. This study examines the precision of the L1 nucleotide percent identity (PID) metric and the phylogenetic accuracy of the E1, E2, L1, and L2 loci with respect to the taxonomy of Papillomaviridae. Only minor taxonomic adjustments are needed to enhance the classification’s robustness. Results indicate that while methodological variations affect PID calculation, a biologically appropriate and detailed approach to PID measurement maintains the L1 yardstick’s discriminatory power. There is a need for clear and consistent PID determination methods in new papillomavirus descriptions, and strategic recommendations for mitigating the proliferation of monotypic taxa are provided in a manner that will ensure a coherent and informative taxonomy for Papillomaviridae.
Systematic biology of Papillomaviridae
Papillomaviruses are small double-stranded DNA icosahedral viruses infecting keratinocytes of vertebrate animals. First discovered as the infectious agents of benign cutaneous warts [1], human papillomaviruses (HPV) are clinically important for their role in cancers especially of the cervix, but also of the penis, anus, oropharynx and genitourinary system more generally [2, 3]. Reflecting evolutionary patterns of niche-adaptation [4], clinical presentation and tissue preference of the 220 described HPV types (in 5 genera) correlate well with distinct phylogenetic clades. For example, while a few Gammapapillomavirus types (e.g., HPV4 and HPV65) are associated with cutaneous warts [5], and some Betapapillomavirus types (e.g., HPV5 and HPV8) are responsible for epidermodysplasia verruciformis [6], the vast majority of the 159 types in these two genera are cutaneous and unapparent. Meanwhile, the genus Alphapapillomavirus comprises 83 types with a mucosal tissue tropism, which are globally responsible for more than 600,000 new cancer cases each year, representing nearly 5% of all cancer burden, and overwhelmingly in women [3, 7]. Virus types that are responsible for most squamous cell carcinomas and adenocarcinomas of the cervix also are well-circumscribed by the species Alphapapillomavirus 9 (esp. HPV16) and Alphapapillomavirus 7 (esp. HPV18) in the high-risk (HR-HPV) subclade of Alphapapillomavirus [8, 9]. Apart from Alphapapillomavirus types infecting non-human primates (NHP-HPV), papillomaviruses isolated from other vertebrate groups form distinct clades that are approximately host-group specific. The most numerous are in the genera Lambdapapillomavirus and Chipapillomavirus of carnivores as well as in the genera Deltapapillomavirus and Xipapillomavirus of artiodactyls [10].
Insofar as the task of biological systematics is to reflect common ancestry while encompassing meaningful biological information content [11, 12], the classification system for Papillomaviridae is an enviable one. It is eminently useful that a taxonomic name simultaneously conveys common-ancestry, host range, tissue affinity and even clinical characteristics. A few exceptions exist, such as scant carcinogenic HPVs outside of Alphapapillomavirus, or that MmPV5 from macaques groups within an otherwise human Gammapapillomavirus 7. Yet, these are no more problematic than accepting Serpentes as a suborder of Tetrapoda notwithstanding that snakes lack feet. In addition to coherence, stability is a desirable property of any nomenclatural system [13]. Without identifiable differences in phenotypic morphology, and lacking variation or any reshuffling of genomic elements, papillomavirus systematics has relied on a single-locus genetic distance ‘yardstick’ for the naming of types and species [14]. It is worrying that such “DNA barcoding” strategies have been found to not be universally reliable for animals [15, 16], plants [17], fungi [18], and protists [19], particularly with the addition of more individuals.
Figure 1. Frequency distribution of all pairwise PID among 118 papillomavirus types in 2004. Reproduced with permission from de Villiers et al. (2004) [14].
The circular genome of HPV16 (Alphapapillomavirus 9) encompasses 7 coding regions, denoted in order from the origin of replication and transcription: E6, E7, E1, E2 (which includes the E4 overlapping reading frame), E5, L2, and L1. Only 107 papillomavirus types have an identifiable E5 locus (many of questionable homology), whereas 19 types lack E6 and 22 types lack E7 [10]. One of the slowest evolving loci, the major capsid protein L1, serves as the DNA barcoding ‘yardstick’ for taxonomy of the Papillomaviridae [14, 20, 21]. The original choice of L1 was premised not only on the availability of general primers for its amplification, but also on a multimodal frequency distribution (Fig. 1, reproduced here with permission) of pairwise percent identity (PID) that corresponded well with each taxonomic level of classification [14]. Consequently, newly discovered viruses are assigned to a new type if they share less than 90% PID with any existing type [22, 23]. Assignments to genus and species are made relative to a 60%–70% PID threshold, but also in reference to phylogenetic relationships [24].
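The 90% rule for new types can be written as a short check. The following is only a minimal sketch: the function names and the PID values in the usage note are hypothetical, and it assumes the PIDs against every existing reference type have already been computed by a consistent method.

```python
from typing import Dict, Tuple

# Threshold taken from the text: a new type shares < 90% L1 PID with every existing type.
TYPE_THRESHOLD = 90.0


def is_new_type(pids_to_known: Dict[str, float],
                threshold: float = TYPE_THRESHOLD) -> bool:
    """Return True if the query shares less than `threshold` PID with every known type."""
    return all(pid < threshold for pid in pids_to_known.values())


def closest_type(pids_to_known: Dict[str, float]) -> Tuple[str, float]:
    """Return the (name, PID) of the most similar known type."""
    return max(pids_to_known.items(), key=lambda item: item[1])
```

For example, with the hypothetical values `{"typeA": 85.2, "typeB": 71.9}`, `is_new_type` returns `True` and `closest_type` returns `("typeA", 85.2)`. Genus and species assignment is deliberately left out of the sketch because, as noted above, it also depends on phylogenetic relationships.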
Figure 2. Temporal accumulation of the total number of non-primate papillomavirus types (blue) and the number of unclassified papillomavirus types (red).
In the period since the last comprehensive assessment of the L1 yardstick [23], the number of newly discovered papillomaviruses has expanded considerably, especially from non-primate vertebrate orders including squamates and fish (Fig. 2). Fully 38% of described papillomavirus types are not assigned to species, of which more than half remain unassigned to genus [10]. That some 22% of named genera, and more than half of the named species, are monotypic (represented by a single type) has brought about a reëvaluation of papillomavirus taxonomy, particularly given loss of the multimodal PID distribution [25, 26]. Given that there are more than 15,000 vertebrate genera, and over 30,000 tetrapod species, it may be argued that the proliferation of taxon names aptly represents true papillomavirus diversity that has emerged from tens of millions of years of evolution. Nonetheless, the utility of a classification system is embodied by its information content – content that is lacking for monotypic taxa.
Here we assess the precision of the L1 nucleotide PID metric (and its standardization) as a continued yardstick suitable for the distinction of species. Furthermore, we examine the phylogenetic accuracy of E1, E2, L1 and L2 in terms of support for the monophyly of named species and genera. Leveraging widely accepted taxonomic conventions for the resolution of contemporary classification conundrums, we make formal proposals for minor adjustments to the taxonomy of Papillomaviridae and suggest best practices for its progress.
Percent Identity (PID)
How PID is measured in papillomavirus systematics typically goes unreported. In addition to the straightforward (complement of the) edit-distance between pairwise-aligned positions [27, 28], taxonomic assignment has included blastn [29, 30], the complement of the p-distance [31, 32], use of amino acids instead of nucleotides [33], as well as PIDs derived from multiple alignments instead of from pair-aligned sequences [23, 34, 35]. Naïve implementation of blastn risks reporting an inflated PID based only on the first high scoring segment pair (HSP), instead of ensuring that the comparison covers more than 90% of the query sequence. Limiting comparisons to amino acids runs counter to determinations that much of the change driving Papillomaviridae evolution is synonymous [36-38]. P-distances reported from phylogenetic software only count nucleotide mismatches because gaps are treated as missing data. The widely adopted Sequence Demarcation Tool (SDT) was specifically developed to counter this lack of clarity regarding how PID is measured in viral systematics [39]. Nonetheless, and though SDT assesses PID pairwise for all sites including gaps, it does so in a manner that is quite different from analyses respecting the codon triplets of the L1 coding region [23, 35, 40].
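The difference between an edit-distance PID and the complement of the p-distance can be illustrated on a single aligned pair. This is a minimal sketch, not the implementation of any tool cited above. It assumes two pre-aligned sequences of equal length with "-" for gaps, and it assumes that every gapped column counts as one edit. The function names and the toy sequences in the usage note are hypothetical.

```python
def _check(a: str, b: str) -> None:
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")


def pid_with_gaps(a: str, b: str) -> float:
    """Percent identity over all aligned columns; a base opposite a gap counts as a difference."""
    _check(a, b)
    # Columns that are gaps in both rows carry no information for this pair.
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        raise ValueError("no comparable columns")
    matches = sum(1 for x, y in cols if x == y)
    return 100.0 * matches / len(cols)


def pid_from_p_distance(a: str, b: str) -> float:
    """100 * (1 - p-distance), with any gapped column treated as missing data."""
    _check(a, b)
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        raise ValueError("no ungapped columns")
    mismatches = sum(1 for x, y in cols if x != y)
    return 100.0 * (1.0 - mismatches / len(cols))
```

For the toy pair `"ATGGCA---TTT"` and `"ATGACAGGGTTT"`, the gap-aware PID is 8 matches over 12 columns (about 66.7%), whereas the p-distance complement is 8 matches over the 9 ungapped columns (about 88.9%). The gap-blind measure is never lower than the gap-aware one on the same alignment.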
We compared the L1 nucleotide sequences for all 443 known reference types [10] using Mafft [41] to contrast pair alignment with multiple alignment, using TranslatorX [42] (with Mafft [41]) to compare alignments that respect codons with alignments that do not, and comparing those edit distances with p-distances also from multiple alignments (as reported by PAUP [43]). As expected (Fig. 3), p-distances overestimated similarity by not counting insertion-deletion events (treated as missing). In contrast, multiple-alignment based PIDs underestimated similarity because in multiple alignments, for example, the alignment of a particular Alphapapillomavirus type to a particular Betapapillomavirus type is not independent of how either is aligned to all other types in all other genera.
Figure 3. Frequency distributions of all pairwise PID for all 443 known papillomavirus types from all hosts as determined from a variety of different methods that are used for calculating PID.
The fact that methodological differences account for 10% variation in PID is sufficient to urge detailed description in the publication of new papillomavirus types as to how PID was obtained. It is axiomatic that non-biological barcode indices and degenerate papillomavirus primer sequences must be removed prior to determination of any PID [44-49]. Meanwhile, it is also clear that PIDs or p-distances that are obtained from less time-consuming multiple-alignment shortcuts also are inaccurate estimators of the true pairwise edit distances between viral types.
The most biologically appropriate and least distorted PIDs are based on a pairwise edit-distance that respects codons (avoiding impossible ancestors with broken reading frames) obtained strictly from pair-alignments that reflect a whole query (not HSPs returned from blastn). That said, PIDs that do not respect codons (e.g., SDT [39]) are only marginally inflated relative to those from biologically realistic codon-based alignments (Fig. 3) and may well be on the scale of variation in alignment algorithm/software choice (not examined).
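One way to see what "respecting codons" means is to project a gapped amino-acid alignment back onto the underlying coding sequence, so that every gap spans a whole codon. This is a minimal sketch of that general idea, not the TranslatorX implementation. The function name and the toy sequences are hypothetical, and the coding sequence is assumed to be in frame with no terminal stop codon.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Project one gapped amino-acid alignment row back onto its coding sequence.

    Each residue is replaced by its codon and each protein gap by '---', so
    gaps never break the reading frame.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) != 3 * n_residues:
        raise ValueError("coding sequence length must be 3x the number of residues")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")           # a protein gap becomes a whole-codon gap
        else:
            out.append(cds[pos:pos + 3])  # copy the codon that encodes this residue
            pos += 3
    return "".join(out)


# Toy pair: the first sequence lacks the lysine codon present in the second.
row1 = thread_codons("MA-L", "ATGGCTCTG")
row2 = thread_codons("MAKL", "ATGGCCAAACTG")
```

Here `row1` is `"ATGGCT---CTG"` and `row2` is `"ATGGCCAAACTG"`. The two rows have equal length and the gap falls on a codon boundary, so a pairwise PID can then be computed over columns without implying an ancestor with a broken reading frame.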
Figure 4. Frequency distributions of PID values between species (interspecific, green) and within species (intraspecific, blue), illustrated both as all values within and between species (solid) and as the differences of means within and between species (dashed). Values are predicated on each codon-aligned pairwise edit distance including gaps using TranslatorX and Mafft.
The L1 DNA barcode
It will not go unnoticed that the multimodal distribution once apparent when PIDs were dominated by Alphapapillomavirus and Betapapillomavirus HPV types (Fig. 1) is no longer apparent with the inclusion of virus diversity that is phylogenetically intermediate to those genera (Fig. 3). However, concerns that this casts doubt [25, 26] on the utility of L1 PIDs for the discrimination of types and species are misplaced. Already it is well-understood that the overall distribution of PIDs poorly reflects the discriminatory power of L1 similarities, which are better understood when comparing within-group to among-group similarities [23]. This “DNA barcoding gap,” comparing mean intraspecific PID to mean interspecific PID [50, 51], has been thoroughly investigated for other groups of organisms [52-57]. Leveraging the same 443 reference types, but excluding monotypic species (which have no intraspecific variation), and using PIDs from pairwise-alignments with Mafft [41], while respecting codons with TranslatorX [42], it is clear (Figure 4) that a DNA barcoding gap remains quite obvious for the distribution of all pairwise intraspecific versus interspecific PIDs. Remarkably, the means for types in species are even more precise at the 60% threshold [14]. Yet, this still underestimates the discriminatory power of the L1 framework, for as long as the minimum (Fig. 5A) interspecific distance between types of two species (Figure 5: d) is greater than the minimum intraspecific distances for those closest types and their conspecifics (Figure 5A: i), there is a barcoding gap regardless of the average or maximum (Fig. 5B,C) values those take [58]. Quite contrary to published concerns [13, 25], a barcoding gap is found for every papillomavirus species in every genus. Indeed, there are only 11 instances in which a maximum intraspecific distance (Figure 5C: x) exceeds an interspecific distance (Figure 5: d), and in none of those is the overlap reciprocal.
That is, even though HPV18 (Alphapapillomavirus 7) is more similar (69.4%) to HPV54 (Alphapapillomavirus 13) than it is to at least one other type in Alphapapillomavirus 7, HPV54 is more similar to all other types in Alphapapillomavirus 13 than to any type in any other species. This is an extraordinary performance for a metric that was tentatively proposed two decades ago merely as a heuristic ‘yardstick’ [14].
Figure 5. Illustration of the determination of DNA barcoding gaps on the basis of (A) minimum (i), (B) average, and (C) maximum (x) intra-specific PID relative to the minimum interspecific distance (d). It is the minimum intraspecific distances (i) of types with the smallest interspecific distances (d) that determine the barcoding gap (A), not the mean nor the maximum (x) intraspecific distances of those types (B or C). These determinations may also be non-reciprocal (C) should overlap exist for one species (maximum blue overlaps 4 red types) but not for the other (maximum red does not overlap any blue type).
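The criterion illustrated in Figure 5A can be expressed as a short procedure: find the closest pair of types between two species (d), then compare that distance with the nearest conspecific of each of those two types (i). The sketch below is one reading of that criterion. The function name, the type labels and the distances in the usage note are hypothetical, distances are taken as 100 minus PID, and both species are assumed to contain at least two types.

```python
from itertools import product
from typing import Dict, FrozenSet, Sequence, Tuple


def barcoding_gap(dist: Dict[FrozenSet[str], float],
                  species_a: Sequence[str],
                  species_b: Sequence[str]) -> Tuple[bool, float, float, float]:
    """Return (gap_exists, d, i_a, i_b) for two non-monotypic species."""
    def d(x: str, y: str) -> float:
        return dist[frozenset((x, y))]

    # Closest pair of types across the two species gives the minimum interspecific distance.
    a, b = min(product(species_a, species_b), key=lambda pair: d(*pair))
    d_inter = d(a, b)
    # Nearest conspecific of each member of that closest pair.
    i_a = min(d(a, t) for t in species_a if t != a)
    i_b = min(d(b, t) for t in species_b if t != b)
    return d_inter > max(i_a, i_b), d_inter, i_a, i_b
```

With hypothetical distances of 20 between `a1` and `a2`, 25 between `b1` and `b2`, and 35 to 42 between species, the closest interspecific pair sits at 35 and both nearest conspecifics are closer than that, so a gap exists. Raising the `a1` to `a2` distance to 36 removes the gap for that pair, even though the mean and maximum values play no part in the decision.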
Monophyly Matters
Modern systematics demands that taxonomy adhere first to the phylogenetic principle of monophyly, not to phenetic criteria [59-63]. Unlike the use of the L1 yardstick, there is no compelling reason to restrict phylogeny reconstruction to a single papillomavirus ORF like L1. That the oncogenic E5, E6, and E7 ORFs are not universally present in papillomaviruses limits their utility in phylogeny reconstruction for all of Papillomaviridae. This confines comparative consideration to the E1, E2, L1 and L2 elements conserved in all papillomavirus genomes. It has been suggested that “not all of the conserved elements in the PV genomes are suitable for phylogenetic inference” [64]. Specifically, the L2 ORF has been targeted for exclusion on the basis of character incongruence and the partition homogeneity test [26, 65, 66] in spite of long-standing determinations that such tests are ill-suited to this purpose [67-71]. Nor is the isolated L2 recombination event in some ancestral papillomavirus of cetaceans sufficient reason for the exclusion of this ORF (reviewed in [72]). None of E1, E2, L1 or L2 has been immune to at least one putative recombination event [73-76]; events that, in any case, do not respect ORF boundaries [74, 77]. Without invoking a kind of “spooky action at a distance” [78], it is difficult to imagine how a phylogenetically shallow (i.e., relatively recent), isolated recombination event in the ancestor of Upsilonpapillomavirus and Omikronpapillomavirus of dolphins should cause one to reject whatever support is inherent in L2 for the monophyly of species in Lambdapapillomavirus of dogs when those two virus genera have been separated by time and hosts for many millions of years.
Many papillomavirus phylogenetic studies have relied on translated amino acids for phylogeny reconstruction [26, 79-81]. As noted above, it has become apparent that synonymous nucleotide changes (e.g., at CpG, APOBEC and TLR9 sites [36]) have a greater role in structuring the evolution of papillomaviruses than nonsynonymous substitution [36-38]. In order to understand the difference, we examined the full-length open reading frames of E1, E2, L2 and L1, separately and in all possible combinations for all 443 described types in PAVE, both as amino acid sequences and as nucleotides aligned respecting codons [41, 42]. We also analyzed an alignment of whole genomes, though without respecting codons as that has no meaning in the non-coding upstream regulatory region (URR). Alignments were subject to phylogenetic analysis using FastTree [82] owing to its speed and performance in the face of taxa with uncertain alignment [83]. Assessment of relative monophyly was accomplished via MRP matrices [84] representing each papillomavirus genus and species and optimized with describe / chglist in PAUP [43] on each of the foregoing phylogenetic trees recovered from FastTree [82] for the 31 analyses.
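The count of 31 analyses follows from the design just described: every non-empty combination of the four ORFs, each as amino acids and as codon-aligned nucleotides, plus the whole-genome alignment. A minimal sketch of that enumeration, in which the labels for the data types are illustrative only:

```python
from itertools import combinations

ORFS = ("E1", "E2", "L1", "L2")
DATA_TYPES = ("amino acids", "codon-aligned nucleotides")

# All non-empty subsets of the four ORFs: 4 + 6 + 4 + 1 = 15.
orf_sets = [combo
            for k in range(1, len(ORFS) + 1)
            for combo in combinations(ORFS, k)]

# Each ORF set is analysed under both data types, plus one whole-genome analysis.
analyses = [(data_type, combo) for data_type in DATA_TYPES for combo in orf_sets]
analyses.append(("nucleotides, not codon-aligned", ("Genome",)))

print(len(orf_sets), len(analyses))  # 15 31
```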
Figure 6. Monophyly (white) and non-monophyly (black) of all papillomavirus genera found in phylogenetic analyses of the whole genome (“Genome”) and of the E1, E2, L1, and L2 ORFs individually and in all of their possible combinations. Monotypic genera (grey) cannot fail to be monophyletic.
Three analyses recover all genera as monophyletic (Figure 6, monotypic genera in grey): E1 with L1, E1 with L1 and L2, and the use of all four ORFs, each instance with nucleotides aligned to respect codons. Exclusion of L2 compromises the discovery of a monophyletic Gammapapillomavirus for which the alternative tree places Treistapapillomavirus as sister to Gammapapillomavirus 7, but with low support. It is not, however, incongruent information content of L2 alone that disfavors the monophyly of Gammapapillomavirus as there is evidence of this phylogenetic result from all other ORFs. Similarly, while there is disagreement as to the monophyly of Omikronpapillomavirus, perhaps reflecting the shallow (relatively recent) recombination event, it is E1 that disagrees alone or in combination with other ORFs, not L2. Analyses that fail to find a monophyletic Deltapapillomavirus do so because the nucleotide signal of L1 places Epsilonpapillomavirus with OjohPV1 in Deltapapillomavirus. Obviously, each locus entails its own idiosyncrasies regarding clade support. This is a well-documented compelling reason to use all of them in combination so that whatever signal they have in common will prevail in an ampliative way [85]. In that vein, our results also show that average clade support was highest using codon-aligned nucleotides from all 4 ORFs in combination (94.1%), which was significantly (Wilcoxon signed rank test ref, p < 0.0000616 Bonferroni transformed critical value) better than any analysis using 2 or fewer ORFs, but not better than those using 3 in combination. Support was indistinguishable between the combination of E1, E2, L1 (92.8%) and the combination of E1, L1, L2 (92.8%), both of which were significantly (Wilcoxon signed rank test ref, p < 0.0000616 Bonferroni transformed critical value) better only than use of a single ORF, not two in combination.
Figure 7. Monophyly (white) and non-monophyly (black) of all papillomavirus species found in phylogenetic analyses of the whole genome (“Genome”) and of the E1, E2, L1, and L2 ORFs individually and in all of their possible combinations. Monotypic species (grey) cannot fail to be monophyletic.
With the exceptions of Chipapillomavirus 2 and Pipapillomavirus 2, all papillomavirus species were recovered as monophyletic when more than 1 or 2 ORFs were used (Figure 7, monotypic species in grey). Here again, it is E1 (and E2), not L2, that conflict on support for Omikronpapillomavirus 1, ostensibly due to the shallow (relatively recent) recombination event in dolphins. Chipapillomavirus 2 is paraphyletic with respect to Chipapillomavirus 1 save if using amino acid sequences of E2 on their own. Every other analysis places Chipapillomavirus 1 as sister to CPV16, rendering Chipapillomavirus 2 paraphyletic (Figure 8A). Similarly, PsuPV1, the sole type in a monotypic Pipapillomavirus 1, invariably groups with MaPV1, rendering Pipapillomavirus 2 paraphyletic in all analyses (Figure 8B).
Figure 8. Phylogenetic relationships of Chipapillomavirus (A) and Pipapillomavirus (B) revealing that, regardless of ORF or combinations of ORFs, Chipapillomavirus 2 and Pipapillomavirus 2 are paraphyletic, requiring their respective subjective junior synonymies with Chipapillomavirus 1 and Pipapillomavirus 1.
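The monophyly criterion itself is simple to state in code: a named group is monophyletic on a rooted tree only if some clade contains exactly the members of that group and nothing else. The sketch below is a generic check on trees written as nested tuples. It is not the MRP and PAUP procedure used in this study, and the toy topology is a deliberately simplified, hypothetical tree that only mirrors the described placement of CPV16 as sister to Chipapillomavirus 1.

```python
from typing import FrozenSet, Iterable, Iterator, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def leaves(node: Tree) -> FrozenSet[str]:
    """Set of leaf labels under a node; a leaf is a plain string."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node: Tree) -> Iterator[FrozenSet[str]]:
    """Yield the leaf set of every node in the rooted tree."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree: Tree, group: Iterable[str]) -> bool:
    """True if some clade of the tree has exactly the members of `group`."""
    target = frozenset(group)
    return any(clade == target for clade in clades(tree))


# Hypothetical, simplified topology: two Chipapillomavirus 1 types sister to CPV16.
toy = ((("CPV1", "CPV3"), "CPV16"), "CPV4")
chi1 = {"CPV1", "CPV3"}
chi2 = {"CPV4", "CPV16"}
```

On this toy tree `is_monophyletic(toy, chi1)` is true, `is_monophyletic(toy, chi2)` is false, and the union of the two groups is again monophyletic, which is the situation that motivates synonymy rather than renaming.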
Taxonomy of the Papillomaviridae
It is undeniable that adherence to the L1 PID yardsticks for the naming of papillomavirus taxa already has proliferated the number of monotypic groups. That is, 22% of named genera are monotypic, and a majority (57%) of species are monotypic (see greyed out portions of Figs. 6 and 7). While we have demonstrated from the point of view of common ancestry that the result is accurate, so too would be a classification in which every type had a unique genus and species. Monotypic species and genus names do not carry with them the same information content that is associated with “Alphapapillomavirus” (i.e., in primate mucosa), much less “Alphapapillomavirus 9” (i.e., highly carcinogenic in human mucosa). In the following we do not seek to fully overhaul the current state of Papillomaviridae taxonomy because stability too is among the desiderata for any classification scheme. Rather, and borrowing especially from well-established zoological [86] and botanical [87] systematic rules that are also adopted by protozoologists, it is our intent to clarify the state of current Papillomaviridae systematics on the basis of monophyly while providing guidance as to the future naming of higher-level taxa.
- New taxonomic names at all levels should be established/proposed (for review) prior to acceptance for publication of a new taxon. Publication of a putative new type as, for example, “SE87” or “FA69” without a proper type number in the original publication [88] divorces the information content of the discovery from the later-assigned taxon name (HPV175 and HPV180 in the case of “SE87” and “FA69”). Relying on post hoc adjusted record descriptions in NCBI after the type is named remains problematic because NCBI policy allows only the original submitter(s) to edit their records regardless of incompleteness or inaccuracy. Presently, links to the original NCBI record, or links to the original publication for types, are not provided in PAVE search results [10].
- Currently published taxon names should not be changed unless the taxon is found to not be monophyletic (e.g., Pipapillomavirus 2 in Figure 8). Should a named taxon be found to be synonymous with another by virtue of monophyly, the first-named taxon takes priority regardless of the number of types in species, and the later-named taxon becomes a “subjective junior synonym”. That is, the current state of evidence indicates rather conclusively that Chipapillomavirus 2 is not monophyletic, but rather is paraphyletic relative to Chipapillomavirus 1. As such, CPV4 and CPV16 (currently Chipapillomavirus 2) should be included with CPV1, CPV3, CPV5, CPV11 and CPV20 (currently Chipapillomavirus 1) together in Chipapillomavirus 1. Chipapillomavirus 2 is then a subjective junior synonym of Chipapillomavirus 1. No nomenclatural changes are made to CPV8, CPV10, CPV14 and CPV15, which remain in Chipapillomavirus 3. Because Pipapillomavirus 1 has priority over Pipapillomavirus 2, all types in the genus should be assigned to Pipapillomavirus 1 with Pipapillomavirus 2 becoming a subjective junior synonym of Pipapillomavirus 1, regardless of PID.
- For the sake of information content, the naming of new monotypic taxa should be avoided going forward unless required by the monophyly of taxa already named and regardless of PID. For example, LwPV1 (currently without genus or species designation) is the sister type to all of Lambdapapillomavirus. As such, LwPV1 could either be given its own monotypic genus name (carrying no added information content), or LwPV1 could be included in Lambdapapillomavirus carrying with it the information content of common ancestry. This is the same sort of information content conveyed by including the platypus in Mammalia even though monotremes lack nipples and lay eggs. Similarly, ChPV2 (currently without genus or species designation) invariably falls within the Xipapillomavirus clade, and this papillomavirus of goats (ChPV2) should be assigned to the species Xipapillomavirus 1 with which it is monophyletic. In contrast, the monotypic genus Treiszetapapillomavirus (FgPV1) may deserve its monotypic status given that it is sister to a clade comprising depauperate genera: Dyozetapapillomavirus (ditypic), Etapapillomavirus (ditypic), Thetapapillomavirus (monotypic), Treisepsilonpapillomavirus (ditypic), and Dyoepsilonpapillomavirus (monotypic). Either Treiszetapapillomavirus must also be accepted as a monotypic genus, or all 6 genera should be synonymized under Etapapillomavirus (which has priority). That is, should a type lacking a species or genus designation fall sister to an already named clade with strong support, it may either take the name of the clade already named (e.g., LwPV1 above), or the author may present an argument for its receiving a new monotypic name subject to peer review. A third alternative is the conservative approach designating the type as “incertae sedis” – a formal indication, preferable to “unknown”, that clade membership has not yet been established even if phylogenetic placement is not “unknown”.
Regarding the latter, it would seem prudent to leave alone the deep-branched piscine papillomaviruses, which have no genus or species designations yet, but do so formally as HfrePV1 incertae sedis, HfrePV2 incertae sedis, SaPV1 incertae sedis, OmykPV1 incertae sedis, MaegPV1 incertae sedis and MaegPV2 incertae sedis.
- Rather than choose sides where there is credible evidence of a mosaic ancestry like the shallow (relatively recent) recombination event in Omikronpapillomavirus/Upsilonpapillomavirus, standard practice under other codes of nomenclature offers a way to reflect that some taxa are affected while others are not. For example, though TtPV1 falls out with other Upsilonpapillomavirus 1 types, a substantial portion of its genome seems to have come from an ancestral Omikronpapillomavirus. Uniting the sister clades Upsilonpapillomavirus and Omikronpapillomavirus into a single named clade is unsatisfactory because it merely obscures an important evolutionary phenomenon and fails to acknowledge the distinctness of the majority of types in those two genera. The informative alternative is the “sedis mutabilis” designation denoting a variable position. Formally the designation for TtPV1 would be Upsilonpapillomavirus / Omikronpapillomavirus sedis mutabilis: Upsilonpapillomavirus 1.
Conclusions
Concerns regarding the continued use of the L1 barcoding yardstick are misplaced. The use of L1 PIDs comparing interspecific and intraspecific similarities and differences among papillomavirus species has no contraindication. The two species-level findings of non-monophyly, neither of which is due to violation of a PID barcoding gap, are easily rectified leveraging established rules of nomenclature that respect the monophyly criterion. There is an obvious need for authors to adhere to a uniform assessment of PID when proffering new types, or at least a clear and consistent description of the methods used for PID determination. Nonetheless, the use of PID to establish new monotypic higher taxa leads to a lamentable lack of information content. Still, there are strategic ways in which to mitigate these outcomes without violating the primacy of monophyly and which can foster an authoritative, informative, and robust classification scheme in the systematics of Papillomaviridae.
References
1. Strauss, M.J., E.W. Shaw, et al., Crystalline virus-like particles from skin papillomas characterized by intranuclear inclusion bodies. Proc Soc Exp Biol Med, 1949. 72(1): p. 46-50.
2. IARC Working Group on the Evaluation of Carcinogenic Risks to Humans, Human papillomaviruses. IARC Monogr Eval Carcinog Risks Hum, 1995. 64: p. 1-378.
3. de Martel, C., et al., Worldwide burden of cancer attributable to HPV by site, country and HPV type. Int J Cancer, 2017. 141(4): p. 664-670.
4. Chen, Z., et al., Non-human Primate Papillomaviruses Share Similar Evolutionary Histories and Niche Adaptation as the Human Counterparts. Front Microbiol, 2019. 10: p. 2093.
5. Bruggink, S.C., et al., Cutaneous wart-associated HPV types: prevalence and relation with patient characteristics. J Clin Virol, 2012. 55(3): p. 250-5.
6. Patel, T., et al., Epidermodysplasia verruciformis and susceptibility to HPV. Dis Markers, 2010. 29(3-4): p. 199-206.
7. Bruni, L., et al., ICO/IARC Information Centre on HPV and Cancer (HPV Information Centre). Human Papillomavirus and Related Diseases in the World. 2021.
8. Burk, R.D., et al., Distribution of human papillomavirus types 16 and 18 variants in squamous cell carcinomas and adenocarcinomas of the cervix. Cancer Res, 2003. 63(21): p. 7215-20.
9. Schiffman, M., et al., A population-based prospective study of carcinogenic human papillomavirus variant lineages, viral persistence, and cervical neoplasia. Cancer Res, 2010. 70(8): p. 3159-69.
10. Van Doorslaer, K., et al., The Papillomavirus Episteme: a major update to the papillomavirus sequence database. Nucleic Acids Res, 2017. 45(D1): p. D499-D506.
11. Mayr, E., Principles of systematic zoology. 1969, New York: McGraw-Hill. xi, 428 p.
12. Farris, J.S., The Information Content of the Phylogenetic System. Systematic Biology, 1979. 28(4): p. 483-519.
13. Van Doorslaer, K., et al., Papillomaviruses: evolution, Linnaean taxonomy and current nomenclature. Trends Microbiol, 2011. 19(2): p. 49-50; author reply 50-1.
14. de Villiers, E.M., et al., Classification of papillomaviruses. Virology, 2004. 324(1): p. 17-27.
15. Hickerson, M.J., C.P. Meyer, and C. Moritz, DNA barcoding will often fail to discover new animal species over broad parameter space. Syst Biol, 2006. 55(5): p. 729-39.
16. Galtier, N., et al., Mitochondrial DNA as a marker of molecular diversity: a reappraisal. Mol Ecol, 2009. 18(22): p. 4541-50.
17. Spooner, D.M., DNA barcoding will frequently fail in complicated groups: An example in wild potatoes. Am J Bot, 2009. 96(6): p. 1177-89.
18. Begerow, D., et al., Current state and perspectives of fungal DNA barcoding and rapid identification procedures. Appl Microbiol Biotechnol, 2010. 87(1): p. 99-108.
19. Leliaert, F., et al., DNA-based species delimitation in algae. European Journal of Phycology, 2014. 49(2): p. 179-196.
20. Bernard, H.U., et al., Identification and assessment of known and novel human papillomaviruses by polymerase chain reaction amplification, restriction fragment length polymorphisms, nucleotide sequence, and phylogenetic algorithms. J Infect Dis, 1994. 170(5): p. 1077-85.
21. Van Doorslaer, K., et al., ICTV Virus Taxonomy Profile: Papillomaviridae (Summary). Journal of General Virology, 2018. 99(8): p. 989-990.
22. Burk, R.D., A. Harari, and Z. Chen, Human papillomavirus genome variants. Virology, 2013. 445(1-2): p. 232-243.
23. Bernard, H.U., et al., Classification of papillomaviruses (PVs) based on 189 PV types and proposal of taxonomic amendments. Virology, 2010. 401(1): p. 70-9.
24. Van Doorslaer, K., et al. ICTV Virus Taxonomy Profile: Papillomaviridae. 2018; Available from: https://ictv.global/report/chapter/papillomaviridae/papillomaviridae.
25. Daigle, B., V. Makarenkov, and A.B. Diallo, Effect of hundreds sequenced genomes on the classification of human papillomaviruses, in Data Science, Learning by Latent Structures, and Knowledge Discovery, B. Lausen, S. Krolak-Schwerdt, and M. Böhmer, Editors. 2015, Springer: Berlin. p. 309-318.
26. Van Doorslaer, K., Revisiting Papillomavirus Taxonomy: A Proposal for Updating the Current Classification in Line with Evolutionary Evidence. Viruses, 2022. 14(10).
27. Varsani, A., et al., A novel papillomavirus in Adélie penguin (Pygoscelis adeliae) faeces sampled at the Cape Crozier colony, Antarctica. J Gen Virol, 2014. 95(Pt 6): p. 1352-1365.
28. Lu, X., R. Zhu, and Z. Dai, Characterization of a novel papillomavirus identified from a whale (Delphinapterus leucas) pharyngeal metagenomic library. Virol J, 2023. 20(1): p. 48.
29. Rogovskyy, A.S., et al., A novel papillomavirus isolated from proliferative skin lesions of a wild American beaver (Castor canadensis). J Vet Diagn Invest, 2012. 24(4): p. 750-4.
30. Mokili, J.L., et al., Identification of a novel human papillomavirus by metagenomic analysis of samples from patients with febrile respiratory illness. PLoS One, 2013. 8(3): p. e58404.
31. Canuti, M., et al., New Insight Into Avian Papillomavirus Ecology and Evolution From Characterization of Novel Wild Bird Papillomaviruses. Front Microbiol, 2019. 10: p. 701.
32. Tse, H., et al., Identification of a novel bat papillomavirus by metagenomics. PLoS One, 2012. 7(8): p. e43986.
33. Stevens, H., et al., Novel papillomavirus isolated from the oral mucosa of a polar bear does not cluster with other papillomaviruses of carnivores. Vet Microbiol, 2008. 129(1-2): p. 108-16.
34. Dolz, G., et al., Leopardus wiedii Papillomavirus type 1, a novel papillomavirus species in the tree ocelot, suggests Felidae Lambdapapillomavirus polyphyletic origin and host-independent evolution. Infect Genet Evol, 2020. 81: p. 104239.
35. Munday, J.S., et al., Genomic Characterisation of Canis Familiaris Papillomavirus Type 24, a Novel Papillomavirus Associated with Extensive Pigmented Plaque Formation in a Pug Dog. Viruses, 2022. 14(11).
36. Burk, R.D., L. Mirabello, and R. DeSalle, Distinguishing genetic drift from selection in papillomavirus evolution. Viruses, 2023.
37. King, K.M., et al., Synonymous nucleotide changes drive papillomavirus evolution. Tumour Virus Research, 2022. 14.
38. Shackelton, L.A., C.R. Parrish, and E.C. Holmes, Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses. J Mol Evol, 2006. 62(5): p. 551-63.
39. Muhire, B.M., A. Varsani, and D.P. Martin, SDT: a virus classification tool based on pairwise sequence alignment and identity calculation. PLoS One, 2014. 9(9): p. e108277.
40. Chen, Z., et al., Classification and evolution of human papillomavirus genome variants: Alpha-5 (HPV26, 51, 69, 82), Alpha-6 (HPV30, 53, 56, 66), Alpha-11 (HPV34, 73), Alpha-13 (HPV54) and Alpha-3 (HPV61). Virology, 2018. 516: p. 86-101.
41. Katoh, K. and D.M. Standley, MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol, 2013. 30(4): p. 772-80.
42. Abascal, F., R. Zardoya, and M.J. Telford, TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res, 2010. 38(Web Server issue): p. W7-13.
43. Swofford, D., PAUP* (* Phylogenetic Analysis Using PAUP). 2021.
44. Abellan-Schneyder, I., et al., Primer, Pipelines, Parameters: Issues in 16S rRNA Gene Sequencing. mSphere, 2021. 6(1).
45. Lagstrom, S., et al., TaME-seq: An efficient sequencing approach for characterisation of HPV genomic variability and chromosomal integration. Sci Rep, 2019. 9(1): p. 524.
46. Huang, X., et al., Cervicovaginal microbiota composition correlates with the acquisition of high-risk human papillomavirus types. Int J Cancer, 2018. 143(3): p. 621-634.
47. Edgar, R. USEARCH v. 11; Read preparation: strip primer-binding sequences. 2018 [cited 7/27/2023]; Available from: https://drive5.com/usearch/manual/pipe_readprep_primers.html.
48. Sahlin, K., M.C.W. Lim, and S. Prost, NGSpeciesID: DNA barcode and amplicon consensus generation from long-read sequencing data. Ecol Evol, 2021. 11(3): p. 1392-1398.
49. Grubaugh, N.D., et al., An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol, 2019. 20(1): p. 8.
50. Rodrigues, M.S., K.A. Morelli, and A.M. Jansen, Cytochrome c oxidase subunit 1 gene as a DNA barcode for discriminating Trypanosoma cruzi DTUs and closely related species. Parasit Vectors, 2017. 10(1): p. 488.
51. Tian, Q., et al., DNA Barcoding for Efficient Species- and Pathovar-Level Identification of the Quarantine Plant Pathogen Xanthomonas. PLoS One, 2016. 11(11): p. e0165995.
52. Meyer, C.P. and G. Paulay, DNA barcoding: error rates based on comprehensive sampling. PLoS Biol, 2005. 3(12): p. e422.
53. Hebert, P.D., S. Ratnasingham, and J.R. deWaard, Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proc Biol Sci, 2003. 270 Suppl 1(Suppl 1): p. S96-9.
54. Cognato, A.I., Standard percent DNA sequence difference for insects does not predict species boundaries. J Econ Entomol, 2006. 99(4): p. 1037-45.
55. Lefebure, T., et al., Relationship between morphological taxonomy and molecular divergence within Crustacea: proposal of a molecular threshold to help species delimitation. Mol Phylogenet Evol, 2006. 40(2): p. 435-47.
56. Chase, M.W., et al., Land plants and DNA barcodes: short-term and long-term goals. Philos Trans R Soc Lond B Biol Sci, 2005. 360(1462): p. 1889-95.
57. Xu, J., Fungal DNA barcoding. Genome, 2016. 59(11): p. 913-932.
58. Meier, R., et al., The Use of Mean Instead of Smallest Interspecific Distances Exaggerates the Size of the “Barcoding Gap” and Leads to Misidentification. Systematic Biology, 2008. 57(5): p. 809-813.
59. Hillis, D.M., C. Moritz, and B.K. Mable, Molecular systematics. 2nd ed. 1996, Sunderland, Mass.: Sinauer Associates. xvi, 655 p.
60. Hennig, W., Phylogenetic systematics. 1966, Urbana: University of Illinois Press. 263 p.
61. Eldredge, N., How Systematics Became “Phylogenetic”. Evolution: Education and Outreach, 2010. 3(4): p. 491-494.
62. Brower, A.V.Z. and R.T. Schuh, Biological Systematics. 2021.
63. Wiley, E.O. and B.S. Lieberman, Phylogenetics. 2011.
64. Bravo, I.G. and A. Alonso, Phylogeny and evolution of papillomaviruses based on the E1 and E2 proteins. Virus Genes, 2007. 34(3): p. 249-62.
65. Gottschling, M., et al., Multiple evolutionary mechanisms drive papillomavirus diversification. Mol Biol Evol, 2007. 24(5): p. 1242-58.
66. Van Doorslaer, K., Evolution of the papillomaviridae. Virology, 2013. 445(1-2): p. 11-20.
67. DeSalle, R. and A.V. Brower, Process partitions, congruence, and the independence of characters: inferring relationships among closely related Hawaiian Drosophila from multiple gene regions. Syst Biol, 1997. 46(4): p. 751-64.
68. Siddall, M.E., Prior agreement: arbitration or arbitrary? Syst Biol, 1997. 46(4): p. 765-9.
69. Yoder, A.D., J.A. Irwin, and B.A. Payseur, Failure of the ILD to determine data combinability for slow loris phylogeny. Syst Biol, 2001. 50(3): p. 408-24.
70. Barker, F.K. and F.M. Lutzoni, The utility of the incongruence length difference test. Syst Biol, 2002. 51(4): p. 625-37.
71. Quicke, D.L., O.R. Jones, and D.R. Epstein, Correcting the problem of false incongruence due to noise imbalance in the incongruence length difference (ILD) test. Syst Biol, 2007. 56(3): p. 496-503.
72. Gong, Y., L. Sui, and Y. Li, Recombination in Papillomavirus: Controversy and Possibility. Virus Res, 2022. 314: p. 198756.
73. Varsani, A., et al., Evidence of ancient papillomavirus recombination. J Gen Virol, 2006. 87(Pt 9): p. 2527-2531.
74. Rector, A., et al., Genomic characterization of novel dolphin papillomaviruses provides indications for recombination within the Papillomaviridae. Virology, 2008. 378(1): p. 151-61.
75. Bolatti, E.M., et al., Characterization of novel human papillomavirus types 157, 158 and 205 from healthy skin and recombination analysis in genus gamma-Papillomavirus. Infect Genet Evol, 2016. 42: p. 20-9.
76. Murahwa, A.T., M. Tshabalala, and A.L. Williamson, Recombination Between High-Risk Human Papillomaviruses and Non-Human Primate Papillomaviruses: Evidence of Ancient Host Switching Among Alphapapillomaviruses. J Mol Evol, 2020. 88(5): p. 453-462.
77. Garcia-Perez, R., et al., Novel papillomaviruses in free-ranging Iberian bats: no virus-host co-evolution, no strict host specificity, and hints for recombination. Genome Biol Evol, 2014. 6(1): p. 94-104.
78. Einstein, A., Letter to Max Born, in The Born-Einstein letters: correspondence between Albert Einstein and Max and Hedwig Born from 1916–1955, with commentaries by Max Born. 1971, Macmillan: New York. p. 158.
79. Chan, S.Y., et al., Phylogenetic analysis of 48 papillomavirus types and 28 subtypes and variants: a showcase for the molecular evolution of DNA viruses. Journal of Virology, 1992. 66(10): p. 5714-5725.
80. Tachezy, R., et al., Avian papillomaviruses: the parrot Psittacus erithacus papillomavirus (PePV) genome has a unique organization of the early protein region and is phylogenetically related to the chaffinch papillomavirus. BMC Microbiol, 2002. 2: p. 19.
81. Schulz, E., et al., Isolation and genomic characterization of the first Norway rat (Rattus norvegicus) papillomavirus and its phylogenetic position within Pipapillomavirus, primarily infecting rodents. Journal of General Virology, 2009. 90(11): p. 2609-2614.
82. Price, M.N., P.S. Dehal, and A.P. Arkin, FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS One, 2010. 5(3): p. e9490.
83. Liu, K., C.R. Linder, and T. Warnow, RAxML and FastTree: comparing two methods for large-scale maximum likelihood phylogeny estimation. PLoS One, 2011. 6(11): p. e27731.
84. Purvis, A., A Modification to Baum and Ragan’s Method for Combining Phylogenetic Trees. Systematic Biology, 1995. 44(2): p. 251-255.
85. Wenzel, J.W. and M.E. Siddall, Noise. Cladistics, 1999. 15(1): p. 51-64.
86. International Commission on Zoological Nomenclature, et al., International code of zoological nomenclature = Code international de nomenclature zoologique. 4th ed. 1999, London: International Trust for Zoological Nomenclature. xxix, 306 p.
87. Greuter, W., International code of botanical nomenclature: Tokyo code: adopted by the Fifteenth International Botanical Congress, Yokohama, August-September 1993. Regnum vegetabile. 1994, Königstein, Germany: Koeltz Scientific Books. xviii, 389 p.
88. Johansson, H., et al., Metagenomic sequencing of “HPV-negative” condylomas detects novel putative HPV types. Virology, 2013. 440(1): p. 1-7.