MSc Thesis

Completed

This thesis reports the research I conducted as an intern at IBM Research Zurich, as part of a collaboration between its Systems Biology group and the Seminar for Statistics of ETH Zurich.

The following is the abstract of the thesis:
T-cells are a core component of the adaptive immune system: they play a major role in mounting an effective and tailored response to foreign pathogens, and they are also relevant in the context of cancer and certain autoimmune diseases.
T-cell receptors (TCRs) are protein complexes on the surface of T-cells that are responsible for recognising both foreign and self antigens. Because protein-protein interactions are complex, this recognition process exhibits a quasi-stochastic behaviour that can be described with probabilistic and statistical models.
Graphical models represent a multivariate distribution as a graph, in a convenient and transparent way. In this thesis we introduce COMIC-Tree, an undirected graphical model for protein-protein interactions, and DrawCOMIC-Tree, a greedy algorithm based on conditional mutual information for learning COMIC-Tree structures. We lay out their mathematical foundations, highlight some of their theoretical properties, and test them empirically on a dataset of T-cell receptors.
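DrawCOMIC-Tree is described as a greedy algorithm driven by conditional mutual information, I(X; Y | Z) = Σ p(x, y, z) log[ p(z) p(x, y, z) / (p(x, z) p(y, z)) ]. As a rough illustration of that quantity, the sketch below estimates it from paired discrete samples (for instance, hypothetically, residues at three aligned positions of receptor sequences). It uses a plain plug-in (empirical-frequency) estimator, which is an assumption: the estimator used in the thesis and the greedy structure-search step of DrawCOMIC-Tree itself are not reproduced here, and the function name is illustrative only.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence


def conditional_mutual_information(
    x: Sequence[Hashable],
    y: Sequence[Hashable],
    z: Sequence[Hashable],
) -> float:
    """Plug-in estimate of I(X; Y | Z) in nats from paired discrete samples.

    Illustrative sketch only (hypothetical helper, not the thesis's estimator):
    probabilities are replaced by empirical frequencies.
    """
    n = len(x)
    if n == 0 or len(y) != n or len(z) != n:
        raise ValueError("x, y and z must be non-empty and of equal length")

    # Empirical counts of the joint and marginal configurations.
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)

    cmi = 0.0
    for (xi, yi, zi), c in c_xyz.items():
        # p(x,y,z) * log[p(z) p(x,y,z) / (p(x,z) p(y,z))]; the factors of n
        # inside the log cancel, so raw counts can be used there directly.
        ratio = (c_z[zi] * c) / (c_xz[(xi, zi)] * c_yz[(yi, zi)])
        cmi += (c / n) * log(ratio)
    return cmi


if __name__ == "__main__":
    a = [0, 1, 0, 1]
    const = [0, 0, 0, 0]
    # Y is a copy of X and Z carries no information: I(X; Y | Z) = log 2.
    print(conditional_mutual_information(a, a, const))
    # Conditioning on X itself removes all dependence: I(X; Y | X) = 0.
    print(conditional_mutual_information(a, a, a))
```

A greedy learner would typically evaluate such a score for many candidate (X, Y, Z) triples and add the most informative edges first; how DrawCOMIC-Tree ranks and accepts candidates is specified in the thesis, not here. Note that plug-in estimates are biased upwards on small samples, which matters when many sparse configurations are compared.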