Chromosomal-level genome of macadamia (Macadamia integrifolia)

Chengcai Xia; Sirong Jiang; Qiujin Tan; Wenquan Wang; Long Zhao; Chenji Zhang; Yuting Bao; Qi Liu; Jianjia Xiao; Ke Deng; Miaohua He; Pengliang An; Wenlin Wang; Meiling Zou; Zhiqiang Xia

doi:10.48130/TP-2022-0003

2022 Volume 1

ARTICLE Open Access

1. Sanya Nanfan Research Institute of Hainan University, Hainan Yazhou Bay Seed Laboratory, Sanya 572025, China
2. College of Tropical Crops, Hainan University, Haikou 570288, China
3. Guangxi South Subtropical Agricultural Science Research Institute, Longzhou 532415, China
4. Academy of Agriculture and Forestry Sciences, Qinghai University, Xining 810016, China
5. College of Agriculture, China Agricultural University, Beijing 100083, China
^# These authors contributed equally: Chengcai Xia, Sirong Jiang

Corresponding authors: 22312595@qq.com; mlzou@hainanu.edu.cn; zqiangx@gmail.com

Received: 08 April 2022
Accepted: 02 May 2022
Published online: 22 May 2022
Tropical Plants 1, Article number: 3 (2022)

Highlights

A high-quality, chromosome-scale reference genome of macadamia cultivar GR1, comprising 14 chromosomes, was constructed using Nanopore sequencing

Protein sequences of macadamia and 11 other species of the same family were compared to evaluate the expansion and contraction of gene families in macadamia

Proteaceae diverged from Nelumbonaceae nearly 115.37 million years ago and from Rubiaceae about 140 million years ago

A total of 120 GELP family members were identified using our assembled macadamia genome

Abstract

Macadamia, a member of the family Proteaceae, is native to Australia and has long been favoured for its crisp kernels and high nutritional and medicinal value. Here, the genome of GUIRE 1 (GR1), a highly heterozygous superior macadamia cultivar, was sequenced and assembled using Nanopore sequencing, yielding an 807-Mb genome (contig N50, 1.9 Mb; scaffold N50, 54.70 Mb) anchored to 14 chromosomes. A total of 453 Mb (about 55.95%) of repetitive sequences and 37,657 protein-coding genes were identified by gene annotation and homologous protein comparison. Proteaceae diverged from Nelumbonaceae nearly 115.37 million years ago and from Rubiaceae about 140 million years ago. A whole-genome duplication (WGD) event occurred in macadamia 41 million years ago according to the WGD analysis. Functional enrichment analysis of M. integrifolia-specific gene families revealed roles in signal transduction, protein phosphorylation, protein binding, and defense response. This assembly of a highly heterozygous M. integrifolia genome provides a data resource for breeding and molecular mechanism research.

Keywords
- Macadamia genome
- Proteaceae
- Genome assembly
- Genome annotation

Supplementary information

Supplemental Table S1 The comparison of GR1 and Kau genome assembly.
Supplemental Table S2 Repeat sequences in the GR1 genome.
Supplemental Table S3 Summary of the GR1 genome assembled chromosomes.
Supplemental Fig. 1 Gene family analysis of five species.
Supplemental Fig. 2 Chromosomal locations of GELP gene family.
Supplemental Fig. 3 Analysis of cis-acting elements of the GELP family.

Rights and permissions
Copyright: © 2022 by the author(s). Published by Maximum Academic Press on behalf of Hainan University. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https://creativecommons.org/licenses/by/4.0/.

References

[1]	Tan Q, Wang W, Wei YR, Zheng SF, Huang XY, et al. 2019. Diversity analysis of fruit traits related to yield in Macadamia germplasms. Journal of Fruit Science 36:1630−37 doi: 10.13925/j.cnki.gsxb.20190087
[2]	Wall MM. 2010. Functional lipid characteristics, oxidative stability, and antioxidant activity of macadamia nut (Macadamia integrifolia) cultivars. Food Chemistry 121:1103−8 doi: 10.1016/j.foodchem.2010.01.057
[3]	Birch J, Yap K, Silcock P. 2010. Compositional analysis and roasting behaviour of gevuina and macadamia nuts. International Journal of Food Science and Technology 45:81−86 doi: 10.1111/j.1365-2621.2009.02106.x
[4]	Maguire LS, O'Sullivan SM, Galvin K, O'Connor TP, O'Brien NM. 2004. Fatty acid profile, tocopherol, squalene and phytosterol content of walnuts, almonds, peanuts, hazelnuts and the macadamia nut. International Journal of Food Sciences and Nutrition 55:171−78 doi: 10.1080/09637480410001725175
[5]	Garg ML, Blake RJ, Wills RBH. 2003. Macadamia nut consumption lowers plasma total and LDL cholesterol levels in hypercholesterolemic men. The Journal of Nutrition 133:1060−63 doi: 10.1093/jn/133.4.1060
[6]	Garg ML, Blake RJ, Wills RBH, Clayton EH. 2007. Macadamia nut consumption modulates favourably risk factors for coronary artery disease in hypercholesterolemic subjects. Lipids 42:583−87 doi: 10.1007/s11745-007-3042-8
[7]	Liu J, Huang L. 2005. The nutritional value of macadamia and its development and utilization. Food and Nutrition in China 2:25−26 doi: 10.3969/j.issn.1006-9577.2005.02.008
[8]	Tu X, Zhang X, Liu Y, Du L, Huang M, et al. 2015. Study on the technology of activated carbon preparation of microwave irradiation of macadamia shell. Science and Technology of Food Industry 36:253−59 doi: 10.13386/j.issn1002-0306.2015.20.045
[9]	Geng J, Tao L, Yue H, Li Z, He X. 2021. Review on comprehensive utilization of macadamia nutshell. Tropical Agricultural Science & Technology 38:41−47
[10]	Nock CJ, Hardner CM, Montenegro JD, Termizi AAA, Batley J. 2019. Wild origins of macadamia domestication identified through intraspecific chloroplast genome sequencing. Frontiers in Plant Science 10:334 doi: 10.3389/fpls.2019.00334
[11]	Topp BL, Nock CJ, Hardner CM, Alam M, O'Connor KM. 2019. Macadamia (Macadamia spp.) breeding. In Advances in Plant Breeding Strategies: Nut and Beverage Crops, eds. Al-Khayri J, Jain S, Johnson D. Switzerland: Springer International Publishing. pp. 221−51 doi: 10.1007/978-3-030-23112-5_7
[12]	Nock CJ, Baten A, Mauleon R, Langdon KS, Topp B, et al. 2020. Chromosome-scale assembly and annotation of the Macadamia genome (Macadamia integrifolia HAES 741). G3 Genes|Genomes|Genetics 10:3497−3504 doi: 10.1534/g3.120.401326
[13]	Stace HM, Douglas AW, Sampson JF. 1998. Did 'Paleo-polyploidy' really occur in Proteaceae? Australian Systematic Botany 11:613−29 doi: 10.1071/SB98013
[14]	Nock CJ, Elphinstone MS, Ablett G, Kawamata A, Hancock W, et al. 2014. Whole genome shotgun sequences for microsatellite discovery and application in cultivated and wild Macadamia (Proteaceae). Applications in Plant Sciences 2:1300089 doi: 10.3732/apps.1300089
[15]	Nock CJ, Baten A, Barkla BJ, Furtado A, Henry RJ, et al. 2016. Genome and transcriptome sequencing characterises the gene space of Macadamia integrifolia (Proteaceae). BMC Genomics 17:937 doi: 10.1186/s12864-016-3272-3
[16]	Lin J, Zhang W, Zhang X, Ma X, Zhang S, et al. 2022. Signatures of selection in recently domesticated macadamia. Nature Communications 13:242 doi: 10.1038/s41467-021-27937-7
[17]	Akoh CC, Lee GC, Liaw YC, Huang TH, Shaw JF. 2004. GDSL family of serine esterases/lipases. Progress in Lipid Research 43:534−52 doi: 10.1016/j.plipres.2004.09.002
[18]	Chepyshko H, Lai CP, Huang LM, Liu JH, Shaw JF. 2012. Multifunctionality and diversity of GDSL esterase/lipase gene family in rice (Oryza sativa L. japonica) genome: New insights from bioinformatics analysis. BMC Genomics 13:309−27 doi: 10.1186/1471-2164-13-309
[19]	Otto SP. 2007. The evolutionary consequences of polyploidy. Cell 131:452−62 doi: 10.1016/j.cell.2007.10.022
[20]	Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, et al. 1999. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research 27:29−34 doi: 10.1093/nar/27.1.29
[21]	Belser C, Istace B, Denis E, Dubarry M, Baurens FC, et al. 2018. Chromosome-scale assemblies of plant genomes using nanopore long reads and optical maps. Nature Plants 4:879−87 doi: 10.1038/s41477-018-0289-4
[22]	Jain M, Koren S, Miga KH, Quick J, Rand AC, et al. 2018. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nature Biotechnology 36:338−45 doi: 10.1038/nbt.4060
[23]	Takahashi K, Shimada T, Kondo M, Tamai A, Mori M, et al. 2009. Ectopic expression of an esterase, which is a candidate for the unidentified plant cutinase, causes cuticular defects in Arabidopsis thaliana. Plant and Cell Physiology 51:123−31 doi: 10.1093/pcp/pcp173
[24]	Dong X, Yi H, Han CT, Nou IS, Hur Y. 2016. GDSL esterase/lipase genes in Brassica rapa L.: genome-wide identification and expression analysis. Molecular Genetics and Genomics 291:531−42 doi: 10.1007/s00438-015-1123-6
[25]	Kim GK, Kwon SJ, Jang YJ, Chung JH, Nam MH, et al. 2014. GDSL lipase 1 regulates ethylene signaling and ethylene-associated systemic immunity in Arabidopsis. FEBS Letters 588:1652−58 doi: 10.1016/j.febslet.2014.02.062
[26]	Ding L, Guo X, Li M, Fu Z, Yan S, et al. 2018. Improving seed germination and oil contents by regulating the GDSL transcriptional level in Brassica napus. Plant Cell Reports 38:243−53 doi: 10.1007/s00299-018-2365-7
[27]	Ling H. 2008. Sequence analysis of GDSL lipase gene family in Arabidopsis thaliana. Pakistan Journal of Biological Sciences 11:763−67 doi: 10.3923/pjbs.2008.763.767
[28]	Porebski S, Bailey LG, Baum BR. 1997. Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Molecular Biology Reporter 15:8−15 doi: 10.1007/BF02772108
[29]	Pryszcz LP, Gabaldón T. 2016. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Research 44:e113 doi: 10.1093/nar/gkw294
[30]	Hu J, Fan J, Sun Z, Liu S. 2019. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36:2253−55 doi: 10.1093/bioinformatics/btz891
[31]	Alonge M, Soyk S, Ramakrishnan S, Wang X, Goodwin S, et al. 2019. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biology 20:224 doi: 10.1186/s13059-019-1829-6
[32]	Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210−12 doi: 10.1093/bioinformatics/btv351
[33]	Stanke M, Keller O, Gunduz I, Hayes A, Waack S, et al. 2006. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Research 34:W435−W439 doi: 10.1093/nar/gkl200
[34]	Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, et al. 2008. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Research 18:188−96 doi: 10.1101/gr.6743907
[35]	Li L, Stoeckert CJ Jr., Roos DS. 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Research 13:2178−89 doi: 10.1101/gr.1224503
[36]	Price MN, Dehal PS, Arkin AP. 2010. FastTree 2 - approximately maximum-likelihood trees for large alignments. PLoS One 5:e9490 doi: 10.1371/journal.pone.0009490
[37]	Kumar S, Stecher G, Suleski M, Hedges SB. 2017. TimeTree: a resource for timelines, timetrees, and divergence times. Molecular Biology and Evolution 34:1812−19 doi: 10.1093/molbev/msx116
[38]	Wang Y, Tang H, DeBarry JD, Tan X, Li J, et al. 2012. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Research 40:e49 doi: 10.1093/nar/gkr1293
[39]	Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24:1586−91 doi: 10.1093/molbev/msm088
[40]	Ni P, Ji X, Guo D. 2020. Genome-wide identification, characterization, and expression analysis of GDSL-type esterases/lipases gene family in relation to grape berry ripening. Scientia Horticulturae 264:109162 doi: 10.1016/j.scienta.2019.109162
[41]	Chen C, Chen H, Zhang Y, Thomas HR, Frank MH, et al. 2020. TBtools: An integrative toolkit developed for interactive analyses of big biological data. Molecular Plant 13:1194−202 doi: 10.1016/j.molp.2020.06.009

About this article

Cite this article

Xia C, Jiang S, Tan Q, Wang W, Zhao L, et al. 2022. Chromosomal-level genome of macadamia (Macadamia integrifolia). Tropical Plants 1:3 doi: 10.48130/TP-2022-0003

INTRODUCTION

Macadamia (Macadamia F. Muell.), also known as Hawaii nut, belongs to the family Proteaceae. M. integrifolia is an evergreen tree that lacks a taproot and has a shallow root system; it is suited to subtropical climates and is classified as a subtropical fruit tree^[1]. Macadamia kernels contain more than 70% fat, 9% protein, and eight essential amino acids; the crisp kernels are rich in unsaturated fatty acids and have high nutritional and health value^[2−4]. Long-term consumption of macadamia nuts, which are known as the 'queen of dried fruits', contributes to the prevention and treatment of cardiovascular diseases^[5,6]. In addition, the kernel, shell, and oil meal of macadamia are processed into various drinks, oil^[7], activated carbon^[8], adsorbent, feed, and other products. Furthermore, macadamia trees are attractively shaped, with dense branches and foliage, fragrant flowers, and solid, compact wood, which makes them ideal insect-resistant landscaping plants.

Macadamia is endemic to the subtropical rainforests of eastern Australia and is one of the few crops to have been domesticated from dicotyledons^[9]. Cultivation is concentrated in Australia, the USA, Kenya, South Africa, Costa Rica, Guatemala, Brazil, and other tropical countries^[10]. Macadamia is one of the few rapidly expanding crops in the USA, Australia, South Africa, New Zealand, and China. More than 500 selected landraces and varieties exist. Macadamia germplasm is distributed worldwide between 34° N and 34° S across more than 20 countries, but most commercial production areas are located at 16°–24° N, and the main producing countries are China, Australia, South Africa, the USA, and Kenya. In 2020, the global cultivation area of macadamia was 6.05 million acres, of which 4.7 million acres, the largest share, were in China. Thus, China has the highest macadamia production in the world.

Macadamia is highly heterozygous and predominantly outcrossing. Its genome size is 896 Mb, and all cultivars are diploid^[11−14]. Studying its origin and evolutionary history requires a high-quality macadamia genome, and two macadamia genome research efforts have been reported. The release of the HAES 741 macadamia genome opened a new era of macadamia genomics and was an important milestone in macadamia research. In 2016, Nock et al. used Illumina short-read sequence data to construct a highly fragmented draft genome of HAES 741 with a total length of 518 Mb, a contig N50 of 3,522 bp, and a scaffold N50 of 4,745 bp, in which 35,337 protein-coding genes were annotated^[15]. In 2020, they reassembled the HAES 741 genome using a combination of Illumina short reads and PacBio long reads, obtaining a 745-Mb assembly with a scaffold N50 of 413 kb. After the scaffolds were anchored to 14 chromosomes using seven genetic linkage maps, 34,274 protein-coding genes were predicted^[12]. Lin et al.^[16] used the well-known macadamia cultivar Kau (Macadamia integrifolia Maiden & Betche) as material and completed a genome assembly using third-generation sequencing. That assembly is 794 Mb in size with a contig N50 of 281 kb, and 37,728 genes were annotated. They found that gene families related to oil synthesis and fruit shell development have expanded in the macadamia genome. They then resequenced 112 representative cultivated and wild macadamia accessions from Hawaii and Australia, and the results revealed the phylogenetic relationships among macadamia populations. Selective sweep regions arose within 2−3 generations of artificial selection, which provides a genomic basis and direct evidence for the 'one-step' theory of domestication in asexually propagated crops. These findings provide a theoretical basis for the rapid domestication of new species^[16]. At present, two studies on the macadamia genome have been reported, and both concern the same species, which is also the only macadamia species with a genome assembly. These genome resources fall short of what is required to study the plant.

GDSL esterase/lipase protein (GELP) is a hydrolase that acts on thioesters, aryl esters, phospholipids, amino acids, and other substrates. This type of lipase is widely present in prokaryotes and eukaryotes. GELPs are involved in a wide range of physiological processes, including normal plant growth and development, organ morphogenesis, secondary metabolism, and stress responses, and play an important role in oil metabolism in the seeds of oil crops^[17,18]. Although GELP family genes have been identified and studied in other plants, they have not been identified in macadamia.

GUIRE 1 (GR1), an excellent cultivar bred in China, has evident advantages such as early fruit setting, high and stable yield, high quality, and strong stress resistance. The high-quality GR1 assembled genome described in this study will help in seed selection and breeding of macadamia and accelerate research on the molecular mechanisms in macadamia, thereby providing a database for macadamia researchers.

DISCUSSION

Macadamia has important economic value in the global food industry. A high-quality genome sequence is needed as a basis to promote research on the molecular mechanisms of macadamia and accelerate the breeding of superior varieties. GR1 is the first macadamia variety in China that is protected by intellectual property rights. This variety features early fruit setting, high yield, and strong stress resistance. The macadamia genome is relatively complex because the species is highly heterozygous and outcrossing. To elucidate the genetic system and evolution of Proteaceae, we sequenced the genome of macadamia, providing new insights and genomic resources for breeding. We report a high-quality chromosome-scale genome assembly of macadamia, with a contig N50 of 1.9 Mb, anchored to 14 pseudo-chromosomes. This reference genome has higher continuity than previously published macadamia genomes such as HAES 741, which has a scaffold N50 of ~413 kb. The high quality of our assembly can be attributed to the combination of Nanopore sequencing^[21,22] with chromosome-scale scaffolding via RagTag. The macadamia genome sequence provides an important resource for future molecular breeding and evolutionary studies.

This annotated chromosome-level reference genome of macadamia provides important information on gene content, repetitive elements, gene locations on the 14 chromosomes, and RNA types in macadamia. In addition, Macadamia is the most important genus of the family Proteaceae, and this high-quality assembled genome sheds light on the place of macadamia in evolutionary history. The divergence between macadamia and its relative T. speciosissima occurred about 71.74 million years ago. WGD, a form of paleopolyploidy, is common in plants and can indicate the development of new gene functions or the formation of a new species. About 115.37 million years ago, Proteaceae and Nelumbonaceae diverged into separate families, which provides a basis for future research on the evolutionary relationships within Proteales. This genomic information for macadamia will help clarify the evolutionary processes in Proteaceae species and contribute to improving the understanding of the physiological and morphological diversity of Proteaceae species.

Macadamia is rich in oil and unsaturated fatty acids, and the GELP gene family is involved in the regulation of plant growth and development, secondary metabolism, and oil synthesis in fruits^[23−26]. GELP gene family members have been identified in Arabidopsis^[27], rice, rape^[24], and other plants, but this gene family has not been studied in macadamia. To aid future studies on oil synthesis in macadamia fruit as well as other molecular mechanisms, we identified 120 GELP family members using our assembled macadamia genome. The number of GELP family members varies among species, which may reflect differences in the evolution of the GELP family. Chromosome mapping analysis showed that GELP family genes were unevenly distributed across the 14 chromosomes of macadamia. Gene structure analysis showed that most GELP genes contained 5 exons and 4 introns, indicating that GELP genes are structurally conserved. Promoter cis-acting element analysis showed that the promoters of GELP family members contained cis-acting elements related to hormone response and to plant growth and metabolism. Therefore, the functions of GELP genes may be related to hormone response and to the growth and metabolism of plants.
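
The gene structure summary above (most GELP genes with 5 exons and 4 introns) is the kind of count that can be derived from a GFF3 annotation. The following is a minimal Python sketch, assuming exon features carry a Parent attribute as in standard GFF3. The function names, transcript IDs, and toy records are hypothetical illustrations and are not the authors' pipeline or data.

```python
from collections import Counter, defaultdict


def exon_counts(gff3_lines):
    """Count exon features per parent transcript in GFF3-style lines."""
    counts = defaultdict(int)
    for line in gff3_lines:
        # Skip blank lines and GFF3 directives/comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # GFF3 has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) != 9 or fields[2] != "exon":
            continue
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        # An exon may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                counts[parent] += 1
    return dict(counts)


def structure_summary(counts):
    """Map (exons, introns) to the number of transcripts with that structure.

    A transcript with n exons has n - 1 introns.
    """
    return Counter((n, n - 1) for n in counts.values())


def _exon(chrom, start, end, parent):
    """Build one toy GFF3 exon record (hypothetical data)."""
    return "\t".join(
        [chrom, "toy", "exon", str(start), str(end), ".", "+", ".",
         f"Parent={parent}"]
    )


# Toy annotation: one transcript with 5 exons, one with 2 exons.
toy_gff3 = ["##gff-version 3"]
toy_gff3 += [_exon("chr1", 100 * i, 100 * i + 50, "geneA.t1") for i in range(1, 6)]
toy_gff3 += [_exon("chr2", 100 * i, 100 * i + 50, "geneB.t1") for i in range(1, 3)]

counts = exon_counts(toy_gff3)
summary = structure_summary(counts)
print(counts)
print(summary)
```

On a real annotation, the most common key in the summary would correspond to the dominant exon/intron structure of the gene family, provided the input is first restricted to the family's transcripts.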

Statistic	Contig length (bp)	Contig no.	Scaffold length (bp)	Scaffold no.
N50	1,971,883	124	55,868,876	7
N80	494,409	388	50,524,966	11
N90	166,444	729	47,725,559	13
Longest	30,625,757	−	77,511,179	−
Total	924,331,529	1,757	792,388,653	14
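
The N50, N80, and N90 rows in the table follow the usual assembly convention: with sequences sorted from longest to shortest, Nx is the length of the sequence at which the running total first reaches x% of the total assembly length, and the accompanying count is the number of sequences needed to get there. A minimal Python sketch of that calculation is given below. The function name and the toy lengths are hypothetical; the per-contig lengths of the GR1 assembly are not given here, so the table values cannot be recomputed from this text.

```python
from typing import Iterable, Tuple


def nx_statistic(lengths: Iterable[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx length, number of sequences) for the given fraction.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences whose combined length reaches `fraction` of the total.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Not reachable for fraction <= 1, kept as a safeguard.
    raise RuntimeError("threshold was never reached")


# Toy lengths (hypothetical, not taken from the GR1 assembly).
toy = [80, 70, 50, 40, 30, 20, 15]
n50, l50 = nx_statistic(toy, 0.5)
n90, l90 = nx_statistic(toy, 0.9)
print(n50, l50, n90, l90)
```

Read this way, the N50 row of the table says that the 124 longest contigs, each at least 1,971,883 bp long, together cover at least half of the total contig length.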
