
Advanced Bioinformatics II

Vsevolod Yu.Makeev

Dr.Sci. (Phys.&Math.)

Vavilov Institute of General Genetics

Laboratory of System Biology and Computational Genetics

Course information:

Recommended texts:

  1. R.Durbin, S.Eddy, A.Krogh, G.Mitchison. Biological sequence analysis. Cambridge University Press, 1998 (2006 reprinting).
  2. M.Borodovsky, S.Ekisheva. Problems and solutions in biological sequence analysis, Cambridge University Press, 2006.
  3. D.Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
  4. A.Apostolico, C.Guerra, S.Istrail, P.Pevzner, M.Waterman, RECOMB06, Proceedings of the Tenth Conference on Research in Computational Molecular Biology, Springer LNCS Vol.3909, Berlin, 2006.
  5. P.Baldi and S.Brunak, Bioinformatics: the Machine Learning Approach. MIT Press, 2001.

Recommended web site (to start with): bioinformatics .  

Bioinformatics is the field of science growing from the application of mathematics, statistics and information technology to the study and analysis of very large biological and particularly genetic data sets.

This course is devoted to the study of mathematical models and computer algorithms used in DNA and protein sequence analysis. Students will implement simplified versions of the algorithms studied in the course as computer programs. They will also gain experience in using sequence analysis tools available either locally or via the Internet.

The lab time may also be used for tests, student presentations and additional lectures.

Closed-book, closed-notes “surprise” quizzes (10 min.) may take place on Mondays or Fridays.

Homework: Small group efforts are encouraged, but you are responsible for writing/typing and understanding your own solution.

Course outline:

Analysis of multiple sequences. Evolutionarily conserved regions in protein sequences. Multiple sequence alignment. Multidimensional dynamic programming. Progressive alignment methods.

Statistical models of protein domains. Analogy between models for functional sites in DNA and models of functional & structural motifs in protein sequences. The concept of a profile. Profile HMM: hidden Markov model for evolutionarily conserved sequences.

Estimation of parameters for a profile HMM. Predicting protein function with profile HMMs. PSI-BLAST, Pfam and SMART local similarity search methods. Finding remote homologs. Assessing the power of a homology search method using the SCOP database.

CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.

Building phylogenetic trees. Construction of a tree from pairwise distances. UPGMA clustering and the neighbor-joining clustering algorithm. Notion of parsimony.

Probabilistic approaches to phylogeny. Random genetic drift. Molecular clock. Synonymous and non-synonymous substitutions. The maximum likelihood approach to phylogenetic inference. Orthologs and paralogs. Evolutionary and comparative genomics.

Other topics. Protein secondary structure prediction. Profile-based neural network approach (PHD method). Information theory & Bayesian approach (GOR method). Prediction of the 3D structure of proteins.

Study of gene expression with DNA microarrays and RNA-seq. Detecting patterns in the expression of multiple genes. Clustering methods for gene expression data: k-means clustering.
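As an illustration of the k-means topic, here is a minimal sketch of Lloyd's iteration in plain Python. It assumes each gene is represented as a tuple of expression values and uses squared Euclidean distance; the function name `kmeans` and its parameters are hypothetical and not part of the course materials.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (labels, centers) for a list of numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initial centers are k random points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to the nearest center.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers

# Toy expression profiles (two conditions per gene), made up for illustration.
profiles = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(profiles, k=2)
```

The result depends on the initial centers, so in practice the procedure is usually restarted several times and the clustering with the smallest within-cluster sum of squares is kept.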

Lab schedule:

Lab 8. Implementation of an anchored alignment algorithm.

Lab 9. Use of scoring matrices for protein sequence alignment. Implementation of algorithms studied in Lab 7 for the case of protein sequences.

Labs 10–11. Multiple sequence alignment by Gibbs sampling.

Lab 12. Estimation of parameters of a profile HMM from a given multiple sequence alignment.
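One part of this task can be sketched as follows: estimating match-state emission probabilities from alignment columns with additive pseudocounts. This is a simplified illustration, not the lab's reference solution. It assumes a DNA alphabet, sequences that contain only alphabet letters and gaps, and the common heuristic that a column is a match column when at least half of the rows have a residue there. Transition probabilities and insert states are omitted, and the name `match_emissions` is hypothetical.

```python
from collections import Counter

def match_emissions(alignment, alphabet="ACGT", pseudocount=1.0, gap="-"):
    """Return one emission distribution (dict) per match column of the alignment."""
    n_seqs = len(alignment)
    profiles = []
    for column in zip(*alignment):
        residues = [c for c in column if c != gap]
        # Heuristic (assumption): skip columns where most rows have a gap;
        # a full profile HMM would model them with insert states.
        if len(residues) * 2 < n_seqs:
            continue
        counts = Counter(residues)
        total = len(residues) + pseudocount * len(alphabet)
        profiles.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return profiles

# Toy alignment, made up for illustration.
alignment = ["ACG-", "ACGT", "A-G-", "TCG-"]
emissions = match_emissions(alignment)
```

Here the last column has a residue in only one of four rows, so it is not treated as a match state and three emission distributions are returned.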

Lab 13. Phylogenetic tree building by the UPGMA and neighbor-joining methods.
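For orientation, a minimal sketch of UPGMA only (neighbor joining is not shown). It assumes a symmetric distance matrix given as a list of lists and returns the tree as nested tuples `(left, right, height)`; the function name `upgma` and this tree representation are choices made for the illustration.

```python
from itertools import combinations

def upgma(names, dist):
    """Build a UPGMA tree from a symmetric distance matrix."""
    # Each cluster id maps to (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        a, b = sorted(pair)
        height = d[pair] / 2.0            # node is placed at half the distance
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        merged = {c: (na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]) / (na + nb)
                  for c in clusters}
        d = {key: v for key, v in d.items() if a not in key and b not in key}
        for c, v in merged.items():
            d[frozenset((next_id, c))] = v
        clusters[next_id] = ((ta, tb, height), na + nb)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Toy distances, made up for illustration: A and B are close, C is distant.
tree = upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])
```

UPGMA assumes a molecular clock (an ultrametric tree); neighbor joining drops that assumption and chooses the pair to merge by a corrected distance criterion instead of the raw minimum.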

Lab 14. Finding the maximum parsimony phylogenetic tree topology by the weighted parsimony method.
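The scoring step of this lab can be sketched with Sankoff's dynamic programming for a single site on a fixed rooted binary topology. Searching over topologies would then call this score for each candidate tree. The tree encoding (nested pairs with observed characters at the leaves) and the cost function `ts_tv_cost` (transitions cost 1, transversions cost 2) are assumptions made for the illustration.

```python
PURINES = {"A", "G"}

def ts_tv_cost(a, b):
    """Hypothetical substitution costs: 0 for identity, 1 for a transition, 2 for a transversion."""
    if a == b:
        return 0.0
    return 1.0 if (a in PURINES) == (b in PURINES) else 2.0

def sankoff(tree, cost, alphabet="ACGT"):
    """Return {state: minimal cost of the subtree if its root has that state}."""
    if isinstance(tree, str):  # leaf: only the observed character is allowed
        return {a: (0.0 if a == tree else float("inf")) for a in alphabet}
    left, right = (sankoff(child, cost, alphabet) for child in tree)
    return {
        a: min(cost(a, b) + left[b] for b in alphabet)
           + min(cost(a, b) + right[b] for b in alphabet)
        for a in alphabet
    }

def parsimony_score(tree, cost, alphabet="ACGT"):
    """Weighted parsimony cost of one site on the given topology."""
    return min(sankoff(tree, cost, alphabet).values())

# Two candidate topologies for the same four leaf characters A, G, C, C.
score_1 = parsimony_score((("A", "G"), ("C", "C")), ts_tv_cost)  # 3.0
score_2 = parsimony_score((("A", "C"), ("G", "C")), ts_tv_cost)  # 4.0
```

Under these costs the first topology, which groups the two purines and the two identical leaves, is the more parsimonious of the two for this site. For a full alignment the per-site scores are summed.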

List of selected publications:

  1. I.Kulakovskiy, V.Levitsky, D.Oshchepkov, L.Bryzgalov, I.Vorontsov, V.Makeev. From binding motifs in ChIP-Seq data to improved models of transcription factor binding sites. J.Bioinform.Comput.Biol. 2013, Feb.; 11(1): 1340004.
  2. V.Makeev. Predictive biology using systems and integrative analysis and methods. J Biomol Struct Dyn. 2013; 31(1):1–3.
  3. E.Permina, Y.Medvedeva, P.Baeck, S.Hegde, S.Mande, V.Makeev. Identification of self-consistent modulons from bacterial microarray expression data with the help of structured regulon gene sets. J.Biomol.Struct.Dyn. 2013; 31(1): 115–24.
  4. I.Kulakovskiy, Y.Medvedeva, U.Schaefer, A.Kasianov, I.Vorontsov, V.Bajic, V.Makeev. HOCOMOCO: a comprehensive collection of human transcription factor binding sites models. Nucleic Acids Res. 2013, Jan.1; 41(D1): D195–D202. Epub. 2012, Nov.21. PubMed PMID: 23175603.
  5. A.Nikulova, A.Favorov, R.Sutormin, V.Makeev, A.Mironov. CORECLUST: identification of the conserved CRM grammar together with prediction of gene regulation. Nucleic Acids Res. 2012, Mar.15.
  6. A.A.Nikulova, M.S.Polishchuk, V.G.Tumanyan, V.Yu.Makeev, A.A.Mironov, A.V.Favorov. Correlation between clusters of binding sites and experimental data on protein-DNA binding suggests structures of regulatory modules. Biofizika. 2012; 57: 212–214 (in Russian).
  7. Y.Hara, N.Kadotani, H.Izui, J.Katashkina, T.Kuvaeva, I.Andreeva, L.Golubeva, D.Malko, V.Makeev, S.Mashko, Y.Kozlov. The complete genome sequence of Pantoea ananatis AJ13355, an organism with great biotechnological potential. Appl.Microbiol.Biotechnol. 2012, Jan.; 93(1): 331–41.
  8. S.Hegde, E.Yu.Klimova, S.Mande, Yu.A.Medvedeva, V.Yu.Makeev, E.A.Permina. Using pairs of genes belonging to the same operon to determine the significance threshold for the correlation coefficient of gene expression levels. Biofizika. 2011; 56(6): 1062–1064 (in Russian).
  9. I.Kulakovskiy, A.Belostotsky, A.Kasianov, N.Esipova, Y.Medvedeva, I.Eliseeva, V.Makeev. A Deeper Look Into Transcription Regulatory Code By Preferred Pair Distance Templates For Transcription Factor Binding Sites. Bioinformatics. 2011, 27: 2621–2624.
  10. M.Logacheva, A.Kasianov, D.Vinogradov, T.Samigullin, M.Gelfand, V.Makeev, A.Penin. De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum). BMC Genomics. 2011, Jan.13;12(1):30.
  11. I.Kulakovskiy, V.Boeva, A.Favorov, V.Makeev. Deep and wide digging for binding motifs in ChIP-Seq data. Bioinformatics. 2010, Oct.15; 26(20): 2622–3.
  12. Y.Medvedeva, M.Fridman, N.Oparina, D.Malko, E.Ermakova, I.Kulakovskiy, A.Heinzel, V.Makeev. Intergenic, gene terminal, and intragenic CpG islands in the human genome. BMC Genomics. 2010, Jan.19; 11: 48.  