Comparative Analyses of Acacia Plastomes to Detect Mutational Hotspots and Barcode Sites for the Identification of Important Timber Species

3.1. Genome Characteristics of Acacia Plastomes

Using MGI short reads, we assembled the plastomes of eight Acacia species, with the published A. crassicarpa plastome (NCBI accession number: NC_067032.1) used to extract plastid sequences from the whole-genome shotgun data. All eight plastomes were circular and ranged from 174,311 bp to 177,517 bp in length (Figure 1 and Figure S1, Table 1). Their GC contents were 35.32%–35.66%. The IR regions were similar in length, ranging from 39,482 to 40,813 bp, and were separated by an LSC region (90,323–91,989 bp) and an SSC region (4897–5049 bp; Table 1). Contraction or expansion of the IR region has been widely proposed as the main cause of variation in plastome size [20]. Compared with the plastomes of forest trees outside the Fabaceae family, the Acacia plastomes had a longer IR region (39–40 Kbp vs. 25–26 Kbp) due to the extension of the repeat region into the single-copy region [47]. The Acacia plastomes exhibited conserved gene order and gene content and showed the typical quadripartite structure (LSC, SSC, IRA, and IRB) that has been widely reported in green plants [48].
All eight newly assembled Acacia plastomes contain 111 unique genes, including 77 protein-coding genes (PCGs), 30 tRNAs, and four rRNAs, with the IR region containing 16 protein-coding genes (Table S2). Twelve genes contain introns: atpF, clpP, ndhA, ndhB, petB, petD, rpl16, rpl2, rpoC1, rps12, rps16, and ycf3. Three of these (ycf3, clpP, and rps12) contain two introns, while the other nine contain one intron (Figure S1, Table S2). We further examined codon preferences in these eight sequences. A common measure of codon usage frequency is relative synonymous codon usage (RSCU), in which a codon that is used more often than its synonymous alternatives receives a higher value (Figure S2). Because codon usage is conserved, a mutation that is detected is usually fixed and can therefore serve as a useful marker site. We therefore analyzed the codon usage of the Acacia, A. lucyi, and I. leiocalycina plastomes together. There was no significant difference in RSCU among the Acacia plastomes, indicating that codon usage in Acacia is conserved and that the number of loci in functional genes that can be used to distinguish lineages or identify species is limited (Figure S2).
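As an illustration of the RSCU measure described above, the following is a minimal sketch rather than the pipeline used in this study. It assumes the standard codon assignments (which plastid translation table 11 shares), and the function name `rscu` and the toy sequence are hypothetical. For each amino acid, RSCU is the observed count of a codon divided by the count expected if all synonymous codons were used equally.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard codon assignments in TCAG order (assumed here; shared by table 11).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group sense codons into synonymous families, one family per amino acid.
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)
        if total == 0:                        # amino acid absent from this CDS
            continue
        expected = total / len(family)        # count under equal codon usage
        for c in family:
            values[c] = counts[c] / expected
    return values


# Toy example (hypothetical sequence): four alanine codons, three of them GCT.
toy = rscu("GCTGCTGCTGCC")
```

An RSCU above 1 marks a codon used more often than expected under equal usage, and a value below 1 marks one used less often. Within each amino acid, the values sum to the number of synonymous codons.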
In addition to the eight Acacia plastomes newly sequenced in this study, 99 published Acacia plastomes and four outgroup species (A. lucyi, A. bracteata, I. edulis, and I. leiocalycina) were used for broader comparative analyses (Table S1). The 111 plastomes contained 76 or 77 protein-coding genes, 30 tRNA genes, and 8 rRNA genes. The difference in gene number among taxa resulted from the loss of rps16 in A. karina (LN885281) and of clpP in A. exocarpoides (LN885267, Table S1). We further analyzed dN (the nonsynonymous substitution rate), dS (the synonymous substitution rate), and the dN/dS ratio (which quantifies the strength of selection) for all protein-coding genes to look for genes under different modes of selection (clpP, rps16, and psbL were excluded due to outlier values, Figure 2). With the exception of ycf1 (1.50), ycf2 (1.94), rps3 (1.26), and rps7 (1.29), the dN/dS of all genes is less than 1, indicating that most genes evolved under purifying selection, especially those related to photosynthesis, such as psbL, ndhB, psaC, ndhI, ndhD, ndhG, petG, psbF, psbE, ndhA, and psbN; their values are all lower than the average of the other four functional categories (Figure 2). Ribosomal protein genes usually have higher substitution rates and greater genetic variation among species than photosynthesis genes (Figure S3) [49]. The evolutionary rates of genes encoding chloroplast proteins differ significantly (Figure S3). Significance analyses of the different gene types showed that the dN values of photosynthetic genes were significantly lower than those of the other gene types, except for conserved ORFs. The most significant difference in both dN and dN/dS was between photosynthetic genes and ribosomal protein genes.
Among the ribosomal protein genes, rps3 and rps7 have dN/dS greater than 1.2; the dS values of rps11, rps14, rps16, rps18, rps19, and rps20 are greater than 0.4, and that of rpl32 is greater than 1.1. In contrast, the dS values of genes related to photosynthesis are close to zero, indicating that their functions are highly conserved.
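The interpretation of dN/dS used above can be sketched as follows. This is a simplified illustration, not the estimation procedure of this study: the function names and the `min_ds` cutoff are assumptions, and only the four ratios reported in the text are used as input. The sketch also shows why genes with dS close to zero can yield undefined or outlying ratios.

```python
import math


def dn_ds(dn, ds, min_ds=1e-6):
    """Return dN/dS, or None when dS is too close to zero for a stable ratio.

    min_ds is an assumed cutoff, chosen here only for illustration.
    """
    if ds < min_ds:
        return None
    return dn / ds


def selection_mode(omega):
    """Label a dN/dS ratio with the selection mode it is consistent with."""
    if omega is None:
        return "undefined"
    if math.isclose(omega, 1.0):
        return "neutral"
    return "positive" if omega > 1.0 else "purifying"


# The four ratios above 1 that are reported in the text.
reported = {"ycf1": 1.50, "ycf2": 1.94, "rps3": 1.26, "rps7": 1.29}
modes = {gene: selection_mode(omega) for gene, omega in reported.items()}
```

A ratio above 1 is only consistent with positive selection; relaxed constraint or estimation noise in short genes can produce the same signal, so such genes are usually treated as candidates for further testing.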
The gene content and the distances between genes at the junctions of the LSC, SSC, and IRs varied across the ten plastomes we compared (Figure 3). The ndhD and ndhF genes are adjacent to the IRB–SSC junction in all eight newly sequenced Acacia species, A. lucyi, and I. leiocalycina. The SSC–IRA junction is nearest to the ccsA gene in these ten plastomes. The length of the intergenic spacer across the LSC–IRB junction varies greatly among the ten plastomes: the distance between rps3 and rps19, or between rpl2 and rpl23, ranges from 20 to 756 bp (Figure 3). Similarly, the intergenic region between rps19/rpl23 and trnH-GUG ranges from 177 bp to 1466 bp across these Acacia plastomes. Although the IR regions are considered the most conserved units of the plastome, expansion or contraction at their boundaries with the single-copy regions can change the copy number of the associated genes or generate pseudogenes that span the boundary. Among the eight species, A. confusa and A. mangium had different boundary genes and IRB lengths from the other Acacia species we assembled, and A. mangium had a boundary gene arrangement like that of the outgroups I. leiocalycina and A. lucyi (Figure 3). Even within the genus Acacia, the boundaries between regions differed among lineages; at the LSC–IRB boundary, for example, three different gene arrangements were found at the junction: rps3 and rps19, rpl2 and rpl23, and rps19 and rpl2. This diversity of junctions can be used to identify different lineages.

3.2. Phylogenetic Analyses

To resolve the plastome phylogeny of Acacia, we combined data from our eight newly sequenced plastomes with 99 publicly available Acacia plastomes and four outgroup plastomes to better understand the maternal diversity of common timber Acacia grown in Asia (Figure 4). Trees were resolved from concatenated CDSs, protein sequences, and full-length plastome sequences and were compared. The protein dataset contained a total of 22,869 informative sites, the CDS dataset contained 62,952 informative sites, and the full-length plastome alignment contained 145,595 informative sites (Figure 4 and Figure S4). The results for the three datasets are generally consistent with each other, and clade membership and relationships in our trees are in line with previous studies [50,51]. Our results resolved a clade L + M that is sister to clade K. However, early diverging to clade M is a well-supported clade of two A. crassicarpa from GenBank (MW649002, NC_067032) as well as plastomes sequenced in this study and identified as A. mangium (seq2), A. auriculiformis (seq4), A. aulacocarpa (seq6), and A. mangium × A. auriculiformis (seq3). Early diverging to the clade of L + M + the clade with seqs 2, 3, 4, and 6 (as well as two A. crassicarpa from GenBank) is a well-supported clade containing A. confusa (seq5) and A. crassicarpa (seq8). This polyphyly of A. crassicarpa may indicate past introgression between an A. crassicarpa mother and an A. confusa father, or possibly the misidentification of samples. However, the positions of A. confusa and A. crassicarpa remain problematic because the three datasets are not entirely consistent (Figure 4 and Figure S4). Additionally, the near-complete sequence identity between A. auriculiformis (seq4) and A. aulacocarpa (seq6) and their resolution in a well-supported clade also suggest possible past introgression or misidentification.
For clades N, O, and P, the results from the protein and full-length datasets support clades N + O as sister to clade P, whereas the CDS dataset conflicts with this placement, which may be due to long-branch attraction effects. The newly assembled A. cincinnata (seq7) was resolved in clade A, while A. melanoxylon (seq1) was resolved in clade I as sister to a previously published A. melanoxylon. In our phylogenetic results, most of the Acacia species resolved in the positions predicted by previous plastome phylogenies. The polyphyly of A. crassicarpa in our tree suggests that the introduction of Acacia into China may have been less diverse than previously thought, or that introgression has been more widespread than previously noted. To better understand Acacia germplasm in China, more comprehensive sampling of individuals and genomes is required. Such efforts will refine the association of wood attributes with genomic patterns, ultimately improving marker-assisted breeding in the future.

3.3. Repeat Analysis

To further understand the differences among Acacia plastomes, repeat sequences were annotated and compared among 107 Acacia accessions and four outgroup species. The 111 plastomes contained four repeat types: complement (C), forward (F), palindromic (P), and reverse (R). We classified the repeats by length and counted the number of each type; repeats shorter than 50 bp were the most common (Figure 5). Most repeats were 20–29 bp long, followed by those of 50+ bp and 30–39 bp, and the fewest were 40–49 bp long (Figure 5). No C-type repeats were detected above 40 bp (Figure 5). In all accessions, F and P repeats were the most common types and were more evenly distributed among the shorter repeats (Figure 5). There are some exceptions to this general trend, such as A. assimilis, A. jennerae, A. oldfieldii, and A. podalyriifolia, in which the F type is far more common than the P type. In the 40–49 bp and 50+ bp ranges, the R type was absent from most plastomes (Figure 5). Given these differences in the type and abundance of repeats, markers for identifying different lineages can be designed from the repeats themselves. In previous studies, differences in repeat position, abundance, and type among groups provided ideal characters for the identification of species or lineages [52,53]. Our results suggest that repeat abundance, especially for repeats longer than 50 bp, differs between species and can therefore be used for species identification in the genus Acacia.
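The length classification described above can be sketched as a simple tabulation. This is a hedged illustration only: the input format (a list of repeat type and length pairs, as might be parsed from a repeat-finder report) and the function names are assumptions, and the toy records are hypothetical rather than data from this study.

```python
from collections import Counter

# Length classes used in the text; anything of 50 bp or more falls in "50+".
BINS = [(20, 29), (30, 39), (40, 49)]


def length_bin(length):
    """Map a repeat length in bp to its size class."""
    for low, high in BINS:
        if low <= length <= high:
            return f"{low}-{high}"
    return "50+" if length >= 50 else "<20"


def tabulate_repeats(repeats):
    """Count repeats per (type, size class); types are F, P, R, or C."""
    return Counter((rtype, length_bin(length)) for rtype, length in repeats)


# Hypothetical records for one plastome: (repeat type, length in bp).
toy_repeats = [("F", 21), ("P", 25), ("F", 52), ("R", 31), ("F", 29)]
table = tabulate_repeats(toy_repeats)
```

Comparing such per-accession tables, for example the counts in the 50+ class, is one way the abundance differences discussed above could be turned into candidate markers.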
To better understand the dynamics of simple sequence repeats (SSRs) in the plastomes of Acacia, 107 Acacia accessions and four outgroup species were analyzed (Figure S5). Among all 111 plastomes, 94.1% (8197/8713) of the SSRs were mononucleotide A/T motifs. Most Acacia plastomes have fewer A/T SSRs than the four outgroup taxa (Figure S5). Four accessions (A. acuminata, A. burkittii (2), and A. lasiocalyx) contained A/T motifs only. A. cerastes possessed a unique AAAAAG/CTTTTT SSR, A. scleroclada possessed a unique AACAAT/ATTGTT SSR, and only three accessions (A. restiacea (2) and A. scleroclada) contained an AAAGAG/CTCTTT SSR. The number of AT/AT SSRs in Acacia plastomes varied widely, ranging from 0 to 12. As expected, the hybrid A. mangium × A. auriculiformis and its parents, A. mangium and A. auriculiformis, exhibited the same SSR types and abundance distributions. By integrating data on SSR motif types and length differences and performing nested analyses, we found that these genomic regions may be important for identifying different populations, species, and lineages of Acacia.
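To make the SSR survey above concrete, here is a minimal regular-expression sketch for locating perfect SSRs in a sequence. It is not the tool used in this study; the minimum repeat thresholds in `MIN_REPEATS` are assumed values chosen for illustration, and the function name and toy sequence are hypothetical. Unlike the motif labels in the text (for example A/T), this sketch does not merge a motif with its reverse complement.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (0-based start, motif, copy number) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are themselves tandem copies of a shorter
            # motif (for example "AA" or "ATAT"); those are reported once
            # under the shorter motif.
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence (hypothetical): an A x12 run and an AT x6 run.
toy_seq = "GC" + "A" * 12 + "GCGC" + "AT" * 6 + "CCG"
ssrs = find_ssrs(toy_seq)
```

Because `finditer` reports non-overlapping matches, an SSR that overlaps a skipped redundant match could be missed; a production tool would handle such cases and compound SSRs explicitly.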

3.4. Genome Sequence Divergence and Barcode Selection

To identify differences among the plastid sequences, we used mVISTA to compare the eight newly assembled Acacia plastomes with the two outgroups I. leiocalycina and A. lucyi (Figure S6). Sequences within most genes were highly conserved, except for ycf1, ycf2, rps3, and accD, which differed from the outgroups but remained highly conserved among the eight Acacia plastomes. The intergenic sequences (IGSs) have low similarity in some regions, especially in the LSC (for instance psaA to psbB and matK to atpI) (Figure S6). These results show that, although most regions of the plastome are conserved, numerous regions between and within genes can still be developed as markers to distinguish different Acacia species.
In addition, we compared divergent regions among all 111 plastomes (Figure 6). The CDSs had high sequence similarity, whereas accD-psaI (score 622; length 1811 bp), matK-rps16 (score 671; length 3134 bp), rps16-trnQ-UUG (score 687; length 1567 bp), psbZ-trnG-GCC (score 687; length 1352 bp), psbI-trnS-GCU (score 717; length 933 bp), rps8-rpl14 (score 745; length 517 bp), trnT-UGU-trnL-UAA (score 769; length 2716 bp), rpl14-rpl16 (score 794; length 215 bp), trnI-CAU-rpl23 (score 823; length 1792 bp), and ndhA intron 1 (score 792; length 1843 bp) had lower sequence similarity (Figure 6). These highly variable regions provide a list of candidates from which genetic markers such as barcodes can be generated. Such markers can provide important genetic resources for studying the evolution and diversity of Acacia in the future. Although most of the intergenic regions were more variable than the CDSs, psaB-psaA (score 1000; length 105 bp), rps15-ndhH (score 1000; length 104 bp), ndhA-ndhI (score 1000; length 79 bp), and psbL-psbF (score 1000; length 22 bp) were highly similar (Figure 6). Because these photosynthetic genes are clustered in a single operon, even the spacer regions are under strong functional selection, which has resulted in nucleotide conservation [47].
We further identified the key SNPs and InDels that differentiate the common Acacia species grown in China for timber production (Table S3). Among the eight species, A. melanoxylon contained 24 SNPs and 51 InDel loci unique to the species; A. mangium contained 3 SNPs and 10 InDel loci; A. auriculiformis contained 2 InDel loci; A. confusa contained 4 SNPs and 6 InDel loci; A. cincinnata contained 189 SNPs and 61 InDel loci; A. crassicarpa contained 3 SNPs and 6 InDel loci; and A. aulacocarpa and A. mangium × A. auriculiformis did not contain any such SNPs or InDels because they were the reference set for comparison (Table S3). It was also possible to separate A. mangium, A. mangium × A. auriculiformis, A. auriculiformis, and A. aulacocarpa from the other species using six InDel loci. The loci separating A. confusa and A. crassicarpa from the other species were even more numerous, with a total of 56 SNPs and 35 InDels (Table S3). The plastome is characterized by a small genome and a low sequence mutation rate, so the identification of species-specific InDels is an effective method for developing molecular markers [54]. In this study, unique InDels were identified among the eight newly assembled species. Overall, we have identified sufficient nucleotide differences for the development of genetic markers in the genus Acacia, as well as in closely related species that produce similar-appearing wood. This initial precise map of genomic variation (InDels and SNPs) reveals where variation is concentrated across plastomes in the Fabaceae family. The majority of the genomic alterations were situated in non-coding and intronic regions, a pattern that aligns with previous reports in rice [48].
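The idea of a species-specific SNP can be illustrated with a short sketch that scans a multiple sequence alignment for columns where one sequence carries a base that differs from a base shared by all the others. This is an assumption-laden toy, not the variant-calling procedure of this study: the function name, the dictionary input format, and the sequences `sp1` to `sp3` are hypothetical, and columns containing gaps or ambiguity codes are simply skipped (so InDels are not handled).

```python
def private_snps(alignment, target):
    """Return (1-based column, shared base, target base) for private SNPs.

    alignment: dict mapping sequence name -> aligned sequence of equal length.
    target:    name of the sequence whose private substitutions are wanted.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")

    target_seq = alignment[target].upper()
    others = [seq.upper() for name, seq in alignment.items() if name != target]
    valid = set("ACGT")

    sites = []
    for i, base in enumerate(target_seq):
        column = {seq[i] for seq in others}
        # All other sequences agree on one unambiguous base, and the target
        # carries a different unambiguous base at this column.
        if base in valid and len(column) == 1 and column <= valid \
                and base not in column:
            sites.append((i + 1, next(iter(column)), base))
    return sites


# Toy alignment (hypothetical sequences, not data from this study).
toy_alignment = {
    "sp1": "ACGTAC",
    "sp2": "ACGTAC",
    "sp3": "ACTTA-",
}
unique_to_sp3 = private_snps(toy_alignment, "sp3")
```

In practice, candidate sites found this way would still need to be checked across multiple individuals per species before being used as diagnostic markers.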
