Domesticated in New Guinea approximately 10,000 years ago, ‘reeds that produce honey without bees’ were considered a luxury and an expensive spice from the sixth to fourth centuries BC.
After sugarcane was introduced to the Old World around the eighth century, its spread to Caribbean, South American, Indian Ocean and Pacific island nations drove large human migrations, including the forced movement of enslaved laborers.
Now the world’s number one crop by harvested tonnage and its fifth most valuable crop, sugarcane is cultivated on 26 million ha of land in over 90 countries. Some 1.83 billion metric tons are harvested annually, with a gross production value approaching $57 billion. As the primary sugar and biofuel feedstock crop, it provides 80% of the world’s sugar and 40% of its ethanol.
The sugarcane grown by most farmers is a hybrid of two species: Saccharum officinarum, which grows large plants with high sugar content, and Saccharum spontaneum, whose lesser size and sweetness are offset by increased disease resistance and tolerance of environmental stress.
Without a complete genome sequence to guide them, plant breeders have produced high-yielding, robust strains through generations of crossing and selection, but this is an arduous process that relies on time and luck.
“Sugarcane is the fifth most valuable crop, and the lack of a reference genome hindered genomic research and molecular breeding for sugarcane improvement,” said lead author Professor Ray Ming, from the University of Illinois.
“Sequencing technology was not ready to handle large autopolyploid genomes until 2015 when the throughput, read length, and cost of third generation sequencing technology became competitive enough.”
Why was sequencing the sugarcane genome so difficult?
Sometime during the evolutionary history of sugarcane, its genome was duplicated twice, resulting in four slightly different versions of each pair of chromosomes, all crammed into the same nucleus.
These events not only quadrupled the size of the genome, they also made the highly similar sequences produced by the genome-wide duplications much more difficult to assemble into distinct chromosomes.
Genomic DNA is typically sequenced, or read, in small, overlapping fragments, and the sequence data from those fragments become overlapping pieces of an enormous linear puzzle.
As the sugarcane genome size doubled, then doubled again, this puzzle didn’t just get larger; it took on repeated but not-quite-identical elements into which those many tiny pieces were difficult to correctly fit.
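The jigsaw analogy can be made concrete with a toy example. The Python sketch below is not the software used in the study; it is a minimal greedy overlap assembler written for illustration, and the reads, function names and `min_len` threshold are all invented. It first joins short reads by their overlapping ends, then shows how two nearly identical copies of a region leave one read with two equally good neighbors.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap until none remain."""
    reads = list(reads)
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return reads
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


def candidate_successors(read, reads, min_len=3):
    """All reads that could plausibly follow `read` based on overlap alone."""
    return [r for r in reads if r != read and overlap(read, r, min_len) > 0]


# Invented reads covering one short, unique sequence: the puzzle has one solution.
print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))
# -> ['ATGCGTACGTTAGC']

# Invented reads from two copies that differ by a single letter (G vs C):
# overlap alone cannot say which copy the shared read belongs with.
print(candidate_successors("ACGTAC", ["TACGGA", "TACGCA"]))
# -> ['TACGGA', 'TACGCA']
```

In a genome with four near-identical copies of every chromosome, the second situation arises constantly, which is why overlap information by itself was not enough.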
To overcome this challenge, Professor Ming and co-authors used a technique called high-throughput chromatin conformation capture, or Hi-C.
This method allows scientists to discover what parts of the long, tangled strands of chromosomal DNA lie in contact with one another inside the cell.
When analyzed using a customized algorithm called ALLHIC developed by the team, the resulting data served the purpose of the picture on the lid of a jigsaw puzzle box, providing a rough map of which sections of sequence most likely belonged to which chromosome.
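How contact data can sort sequences into chromosomes can be sketched in a few lines of Python. This is not ALLHIC itself, whose inner workings are not described here; it is a simplified stand-in that assumes pieces of the same chromosome share many more Hi-C contacts than pieces of different chromosomes. The contig names, link counts and `min_links` cutoff are invented for illustration.

```python
def group_contigs(contigs, contacts, min_links):
    """Group contigs that are connected by at least `min_links` Hi-C contacts.

    `contacts` maps a pair of contig names to the number of contacts observed
    between them. Grouping uses a simple union-find structure.
    """
    parent = {c: c for c in contigs}

    def find(c):
        # Follow parent links to the group representative, compressing the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), links in contacts.items():
        if links >= min_links:
            parent[find(a)] = find(b)

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical data: strong contacts within a chromosome, weak ones between.
contigs = ["c1", "c2", "c3", "c4", "c5"]
contacts = {
    ("c1", "c2"): 120,
    ("c2", "c3"): 95,
    ("c4", "c5"): 110,
    ("c1", "c4"): 8,
    ("c3", "c5"): 5,
}
print(group_contigs(contigs, contacts, min_links=50))
# -> [['c1', 'c2', 'c3'], ['c4', 'c5']]
```

A fixed cutoff is the crudest possible choice. Separating four nearly identical copies of the same chromosome, as the team did, requires considerably more care than this sketch suggests.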
“The biggest surprise was that by combining long sequence reads and the Hi-C physical map, we assembled an autotetraploid [quadrupled] genome into 32 chromosomes and realized our goal of allele-specific annotation among homologous chromosomes,” Professor Ming said.
In other words, the researchers now knew which gene sequences belonged to each of the four variations on the original, pre-duplication genome, a much higher level of detail than they had expected to attain.
Through comparison with the genomes of related species, scientists knew that at some point the number of unique chromosomes had dropped from ten to eight.
To the team’s surprise, the new data revealed that two different chromosomes had split apart, and all four halves had then fused to different existing chromosomes, a more complex set of events than the one they had hypothesized.
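The arithmetic behind these numbers can be checked with a small Python model. It is an illustration only: the chromosome labels, and the choice of which chromosomes split and which receive the halves, are placeholders rather than findings from the study.

```python
def split_and_fuse(chromosomes, donors, recipients):
    """Split each donor chromosome in two and fuse each half onto a different recipient."""
    halves = []
    for name in donors:
        halves.extend([name + "_half1", name + "_half2"])
    if (len(halves) != len(recipients)
            or len(set(recipients)) != len(recipients)
            or set(donors) & set(recipients)):
        raise ValueError("need one distinct, non-donor recipient per half")
    # Donors no longer exist as separate chromosomes; all others are kept.
    result = {name: [name] for name in chromosomes if name not in donors}
    for half, recipient in zip(halves, recipients):
        result[recipient].append(half)
    return result


# Ten ancestral chromosomes with placeholder names anc1 ... anc10.
ancestral = ["anc%d" % i for i in range(1, 11)]
rearranged = split_and_fuse(
    ancestral,
    donors=["anc9", "anc10"],                    # hypothetical choice
    recipients=["anc1", "anc2", "anc3", "anc4"]  # hypothetical choice
)

base_number = len(rearranged)       # 10 - 2 = 8 unique chromosomes
total_assembled = base_number * 4   # four copies of each = 32
print(base_number, total_assembled)
# -> 8 32
```

Two chromosomes disappear as separate units while no new ones are created, which takes the count from ten to eight. Four copies of each of those eight account for the 32 chromosomes in the assembly.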
How does understanding these physical changes help? Along with these large physical rearrangements within the genome come changes to the genes in the affected regions.
For example, the team found that the large chunks of chromosome that had been moved to new locations contained many more genes that help plants resist disease than were found in other locations.
“It resolved a mystery why Saccharum spontaneum is such a superior source of disease resistance and stress tolerance genes,” Professor Ming said.
“The chromosomal rearrangements are likely the cause, not the consequence, of this enrichment, although the [underlying] mechanism of this enrichment remains to be investigated.”
“This discovery will accelerate mining effective alleles of disease resistance genes that have [been] incorporated into elite modern sugarcane hybrid cultivars, and subsequently the [implementation] of molecular breeding of sugarcane.”
The wild sugarcane genome also allowed the team to identify possible origins of modern sugarcane’s incredible sweetness: even in the less sweet Saccharum spontaneum, mutations that produced multiple copies of genes for sugar-transporting proteins have accumulated.
They were also able to observe that in the hybridization between Saccharum officinarum and Saccharum spontaneum, the S. spontaneum-derived DNA sequence is scattered randomly throughout the hybrid genome.
“The ALLHIC method has already proven to be effective for the construction of the autopolyploid sugarcane genome,” Professor Ming said.
Jisen Zhang et al. Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L. Nature Genetics, published online October 8, 2018; doi: 10.1038/s41588-018-0237-2