Evolutionary Genomics Lab

Shin-Han Shiu

 

 

We are interested in questions about the evolution of duplicate genes. In particular, our research is focused on determining when gene duplication occurred, how it happened, and why certain genes tend to have a higher probability of being kept after duplication. Another focus in the lab is identifying novel genes and novel cis-regulatory elements. Are there genes remaining in the intergenic regions of genomes? How is gene expression controlled by cis-elements in the promoter regions? We address these questions using both computational and experimental approaches in the following four areas.

 

1. Gene family evolution and genome dynamics in polyploids

One of the major mechanisms for generating duplicate genes is whole-genome duplication (polyploidization). In plants, this process occurs frequently, and it is common to find polyploids in the vicinity of their diploid ancestors. We are looking into gene losses in a recently created allotetraploid, Arabidopsis suecica, and its putative progenitors using both a tiling microarray covering nearly the whole Arabidopsis genome and survey sequencing. The outcomes will provide clues on the rate of gene loss and information on the genes that are preferentially retained in the early history of polyploid evolution. In addition to monitoring the genome dynamics of polyploids, we analyze the available eukaryote genomes via computational approaches to evaluate the long-term trend of gene gains and losses and the evolutionary history of all gene families in various eukaryotes.

 

2. Functional divergence of duplicate genes

In a typical eukaryote, >80% of genes have at least one within-genome relative (paralog). Intuitively, if the organism gains nothing from the extra copy, loss-of-function mutations can accumulate rather quickly and lead to the loss of one of the duplicates. Why then are we still detecting so many duplicate genes in genomes? Is it because these duplicates offer some selective advantages? Or can neutral processes contribute to duplicate retention? To address these questions, we study the evolutionary histories and functional divergence of duplicate genes, particularly at the expression level, using publicly available expression data. We are interested in knowing whether certain types of genes tend to have faster expression divergence or acquire novel expression profiles.
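As a rough illustration of what "expression divergence" between two duplicates can mean in practice, the sketch below scores a pair of paralogs as one minus the Pearson correlation of their expression profiles across the same set of conditions. This particular measure, the function names, and the example values are assumptions made for illustration; the text above does not specify which divergence measure is used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def expression_divergence(profile_a, profile_b):
    """Hypothetical divergence score: 0 = identical pattern, 2 = opposite pattern."""
    return 1.0 - pearson(profile_a, profile_b)


if __name__ == "__main__":
    # Made-up expression values for two paralogs across four conditions.
    copy_1 = [5.0, 8.0, 2.0, 9.0]
    copy_2 = [4.5, 7.0, 6.0, 1.0]
    print(expression_divergence(copy_1, copy_2))
```

A correlation-based score ignores differences in absolute expression level and only compares the shape of the profiles, which is one reason other distance measures are sometimes preferred.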

 

3. Evolution of cis-regulatory elements

Cis-regulatory elements are short stretches of DNA sequence that control the expression of nearby genes. Change in these elements is thought to be fast and to contribute significantly to the initial functional divergence between duplicate genes. Together with trans-acting factors such as transcription factors and the genes they control, cis-regulatory elements form an integral part of the transcriptional regulatory network. How many cis-elements are there? Each gene usually has multiple cis-elements. How do they work together to specify a particular expression pattern? We are interested in first finding potential cis-elements in the Arabidopsis genome and then using machine-learning methods to dissect their interactions.

 

4. Novel genes in genomes

Transcriptome sequencing and whole-genome tiling array studies have detected significant levels of expression in intergenic regions of human, fly, Arabidopsis, and rice. These studies demonstrate the presence of genic sequences in unannotated "intergenic" regions. However, studies so far have not rigorously distinguished whether the expressed sequences are protein-coding genes. We devised a simplified version of current gene finders (Coding Index, CI), based solely on the composition bias of most coding sequences, and found that it performs well in detecting coding genes that tend to be missed. Combining CI with tiling array data, we found >1500 intergenic regions in the Arabidopsis genome that likely represent expressed novel genes. We plan to refine the computational pipeline and conduct experiments to verify our findings.
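To make the idea of a composition-based coding score concrete, here is a minimal sketch. It is not the lab's Coding Index, whose details are not given here. As an assumption, it scores a sequence by the average log-ratio of codon frequencies learned from known coding sequences versus a non-coding background, and takes the best of the three forward reading frames. All function names and the toy training sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_freqs(seqs, pseudocount=1.0):
    """Frame-0 codon frequencies from training sequences, with a pseudocount."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if all(base in "ACGT" for base in codon):
                counts[codon] += 1
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def composition_score(seq, coding, background):
    """Best mean log-ratio (coding vs. background) over the three forward frames.

    Higher values mean the triplet composition looks more like the coding
    training set. Returns -inf if the sequence has no scorable triplet.
    """
    seq = seq.upper()
    best = float("-inf")
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        codons = [c for c in codons if c in coding]  # skip triplets with N etc.
        if not codons:
            continue
        score = sum(math.log(coding[c] / background[c]) for c in codons) / len(codons)
        best = max(best, score)
    return best


if __name__ == "__main__":
    # Toy training data; real use would need many annotated sequences.
    coding_model = codon_freqs(["ATGGCTGCTGCTGAAGAAGCT"])
    background_model = codon_freqs(["TTATTATAATTATATTAATAT"])
    print(composition_score("GCTGAAGCTGCT", coding_model, background_model))
    print(composition_score("TTATAATATTTA", coding_model, background_model))
```

In a setting like the one described above, a score of this kind would be computed for each expressed intergenic region, and regions above a chosen threshold would be treated as candidate coding genes. The threshold and the training sets would have to be calibrated on annotated genes.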

 

 

Lab Homepage: shiulab.plantbiology.msu.edu