The adaptive immune system, composed of white blood cells called lymphocytes (B and T cells) that circulate in the lymph and blood, is a precision tool that tags and removes foreign peptides. Such peptides, also called antigens or epitopes, are identified through specific binding to elements of a library, or repertoire, of unique proteins called receptors (e.g. antibodies or T cell receptors). A repertoire must be large and diverse enough that at least one receptor will be able to recognize any pathogen epitope the organism is likely to encounter. This diversity is achieved by stochastic rearrangement of the germline DNA to create novel complementarity-determining region 3 (CDR3) sequences in a process called V(D)J recombination.
In this thesis, we use previously developed generative models of V(D)J recombination events and infer the model parameters from large datasets of DNA sequences. The generation probability (Pgen) of a nucleotide or amino acid CDR3 is the sum of the model probabilities of all V(D)J recombination events that generate the sequence. While previously it was only feasible to compute Pgen of nucleotide sequences, we introduce a novel dynamic programming algorithm that efficiently computes Pgen of amino acid sequences. We use this Pgen for several applications. First, we examine how the diversity of a repertoire, characterized by the model entropy, scales with the number of insertions in the V(D)J process. This scaling is used to describe the maturation of the T cell repertoire of mice from embryos to young adults. Next, we introduce a statistical model of hypermutation in B cells and infer its parameters from a human repertoire, providing one of the few quantifications of the biases in hypermutation rates. Lastly, we examine the statistics of the receptors shared among a cohort of more than 600 individual humans and show that the statistics and identities of so-called 'public' sequences are determined directly by Pgen.
We highlight possible clinical applications and attempt to place this work in the context of a full theory of the adaptive immune system.
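To illustrate why dynamic programming makes the amino acid Pgen tractable, the following is a minimal sketch under strong simplifying assumptions, not the algorithm developed in this thesis. It replaces the full V(D)J model (gene choice, deletions, insertions) with a hypothetical first-order Markov chain over nucleotides, with made-up parameters `P0` and `TRANS`, and sums the probability of every nucleotide sequence that translates to a given amino acid sequence. The brute-force version enumerates all codon combinations, while the dynamic programming version only tracks the last nucleotide of the prefix, so its cost grows linearly with sequence length.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODONS_FOR = {}  # amino acid -> list of codons encoding it
for codon, aa in zip(CODONS, AMINO):
    CODONS_FOR.setdefault(aa, []).append(codon)

# Toy generative model (assumption, illustrative values only):
# a first-order Markov chain over nucleotides.
P0 = {"T": 0.2, "C": 0.3, "A": 0.1, "G": 0.4}
TRANS = {p: {b: (0.4 if b == p else 0.2) for b in BASES} for p in BASES}


def pgen_nt(seq):
    """Probability of one non-empty nucleotide sequence under the toy model."""
    prob = P0[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= TRANS[prev][cur]
    return prob


def pgen_aa_bruteforce(aa_seq):
    """Sum pgen_nt over every nucleotide sequence coding for aa_seq."""
    choices = (CODONS_FOR[aa] for aa in aa_seq)
    return sum(pgen_nt("".join(codons)) for codons in product(*choices))


def pgen_aa_dp(aa_seq):
    """Same sum for a non-empty aa_seq, accumulated codon by codon."""
    # weight[b]: total probability of all nucleotide prefixes that code
    # for the amino acids processed so far and end in base b.
    weight = None
    for aa in aa_seq:
        new_weight = dict.fromkeys(BASES, 0.0)
        for c1, c2, c3 in CODONS_FOR[aa]:
            if weight is None:
                prob = P0[c1]
            else:
                prob = sum(weight[b] * TRANS[b][c1] for b in BASES)
            prob *= TRANS[c1][c2] * TRANS[c2][c3]
            new_weight[c3] += prob
        weight = new_weight
    return sum(weight.values())


if __name__ == "__main__":
    cdr3 = "CASSL"  # arbitrary example amino acid sequence
    print(pgen_aa_bruteforce(cdr3), pgen_aa_dp(cdr3))
```

In this toy setting the two functions agree up to floating-point error. The brute-force sum visits a number of nucleotide sequences that grows exponentially with the length of the amino acid sequence, whereas the dynamic programming loop performs a fixed amount of work per residue.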