Research Resources
IMMUNODEFICIENCY

Characterization of T and B cell repertoire diversity in patients with RAG deficiency


Science Immunology  16 Dec 2016:
Vol. 1, Issue 6, eaah6109
DOI: 10.1126/sciimmunol.aah6109

Taking SCID genetics to the clinic

Mutations that lead to deficiencies in the recombination-activating genes RAG1 and RAG2 result in a spectrum of immunodeficiencies ranging from loss of T and/or B cell repertoire diversity to a complete lack of T and B cells—severe combined immunodeficiency (SCID). Here, Lee et al. perform next-generation B and T cell repertoire sequencing on 12 patients with RAG mutations who have immunodeficiencies of varying severity. They found that the level of repertoire skewing was associated with the severity of disease and that specific repertoire deficiencies were associated with particular phenotypes. These data support a genotype-phenotype connection for primary immunodeficiencies.

Abstract

Recombination-activating genes 1 and 2 (RAG1 and RAG2) play a critical role in T and B cell development by initiating the recombination process that controls the expression of T cell receptor (TCR) and immunoglobulin genes. Mutations in the RAG1 and RAG2 genes in humans cause a broad spectrum of phenotypes, including severe combined immunodeficiency (SCID) with lack of T and B cells, Omenn syndrome, leaky SCID, and combined immunodeficiency with granulomas or autoimmunity (CID-G/AI). Using next-generation sequencing, we analyzed the TCR and B cell receptor (BCR) repertoire in 12 patients with RAG mutations presenting with Omenn syndrome (n = 5), leaky SCID (n = 3), or CID-G/AI (n = 4). Restriction of repertoire diversity, skewed usage of variable (V), diversity (D), and joining (J) segment genes, and abnormalities of CDR3 length distribution were progressively more prominent in patients with a more severe phenotype. Skewed usage of V, D, and J segment genes was present also within unique sequences, indicating a primary restriction of repertoire. Patients with Omenn syndrome had a high proportion of class-switched immunoglobulin heavy chain transcripts and increased somatic hypermutation rate, suggesting in vivo activation of these B cells. These data provide a framework to better understand the phenotypic heterogeneity of RAG deficiency.

INTRODUCTION

The recombination-activating gene 1 and 2 (RAG1 and RAG2) proteins are expressed in developing lymphocytes and play a critical role in the assembly of interspersed variable (V), diversity (D), and joining (J) gene elements at the immunoglobulin and T cell receptor (TCR) loci, thereby initiating the V(D)J recombination process that allows development of B and T cells and the establishment of adaptive immunity.

Patients with null mutations in the RAG1 or RAG2 genes manifest a block in the development of B and T cells, resulting in T−B− severe combined immunodeficiency (T−B− SCID). However, hypomorphic mutations in the RAG genes may allow development of a variable number of B and T cells, associated with various distinct clinical and immunological phenotypes. In particular, Omenn syndrome (OS) is characterized by generalized skin rash, lymphadenopathy, hepatosplenomegaly, eosinophilia, hypogammaglobulinemia but elevated serum immunoglobulin E (IgE), lack of circulating B cells, and the presence of oligoclonal, activated, autologous T cells (1). Atypical or leaky SCID (LS) is characterized by the presence of T (and in some cases, B) cells, with variably affected T cell function and without clinical features of OS (2). Another form of LS with expansion of T cells expressing the γδ form of the TCR occurs, especially in patients with cytomegalovirus (CMV) infection (3, 4). More recently, hypomorphic RAG mutations were identified in patients with delayed-onset combined immunodeficiency associated with granulomas and/or autoimmunity (CID-G/AI) (5, 6), or in other, more rare, milder, and atypical presentations, including CD4 lymphopenia (7), common variable immunodeficiency (8), selective deficiency of anti-polysaccharide antibody responses (9), and pyoderma gangrenosum (10). These heterogeneous clinical phenotypes are associated with a broad spectrum of nonsense, frameshift, in-frame deletion or insertion, and missense mutations of the RAG1 and RAG2 genes that affect various domains of the respective proteins (11).

By individually introducing a large number of human RAG1 and RAG2 genetic variants into Abelson virus–transformed Rag1−/− (or Rag2−/−) pro-B cells carrying an inverted green fluorescent protein cassette flanked by recombination signal sequences (RSSs), we previously demonstrated that the severity of the clinical presentation correlates with the level of residual recombination activity supported by the mutant RAG1 protein (12, 13). In this assay, mutations with low levels of recombination activity generated fewer rearrangements at the endogenous immunoglobulin heavy chain (Ighc) locus, as compared with mutations with higher residual activity (12), suggesting that individual RAG mutations may exert different effects on immune repertoire diversity and composition. Here, we report the results of the next-generation sequencing (NGS) of T and B cell repertoire composition and the diversity in 12 patients with RAG mutations, representative of the extended phenotypic spectrum of the disease. Our results demonstrate that abnormalities of T and B cell repertoires correlate with the severity of the clinical and immunological phenotype, thus further supporting genotype-phenotype correlation in this disease. Distinctive signatures of individual V, D, and J gene usage and of CDR3 composition and length distribution have been identified in patients with different phenotypes and may contribute to the generation of an immune repertoire enriched in self-reactive specificities.

RESULTS

Patient characteristics

The 12 patients included in this study were assigned to three distinct groups, based on the clinical and immunological phenotype (table S1) (2). Five patients (OS1, OS2, OS3, OS4, and OS5) presented with clinical and laboratory features of OS. Three patients (LS1, LS2, and LS3) presented with LS and severe CMV infection, and two of them had an increased proportion of TCRγδ+ T cells (γδT). Four patients (CID1, CID2, CID3, and CID4) were included in the CID-G/AI group based on a clinical history of autoimmunity and/or the presence of granulomas. Eleven patients carried RAG1 and 1 patient carried RAG2 biallelic mutations, for a total of 15 RAG1 and 1 RAG2 distinct mutant alleles. Recombination activity of the mutant alleles was tested using Abelson virus–immortalized Rag1−/− or Rag2−/− pro-B cells, as previously described (12, 13). Patients in the OS and LS subgroups carried RAG mutant alleles that supported only modest levels of recombination activity (<7% of wild type, with a mean of 2.29%). By contrast, patients in the CID-G/AI group carried at least one RAG mutant allele that conferred higher levels of recombination activity (table S2).

Progressive restriction of the immune repertoire correlates with the severity of the clinical phenotype

To analyze and compare IGH and TRB repertoire diversity in patients with various clinical phenotypes associated with RAG mutations (Fig. 1A and table S1), we performed NGS of the IGH and TRB transcripts expressed by circulating B and T cells, respectively. The number of total and unique sequences of rearranged IGH and TRB products for each of the RAG-deficient patients and healthy infant controls is reported in table S2. Notably, productive IGH rearrangements were detected in three of five OS patients, despite virtual lack of circulating B cells. A lower ratio of unique/total IGH and TRB sequences was detected in RAG-mutated patients versus healthy controls (table S2), and this difference reached statistical significance for the IGH repertoire (P < 0.05).

Fig. 1 Progressive IGH and TRB repertoire restriction with increased clonality in patients with RAG deficiency.

(A) Schematic representation of RAG1 and RAG2 protein with the mutations of the 12 patients according to the severity of clinical presentation from top to bottom. (B) Tree maps representing the diversity and clonality of IGH and TRB repertoires from healthy donor controls (representative data from two patients are presented) and patients with RAG mutations. Each dot represents a unique V-J joining, and the size of the dot represents the relative frequency of that rearrangement in the entire population. No amplification products were obtained for the IGH repertoire from patients CID4, LS2, OS2, and OS4 and for the TRB repertoire for patients CID1, LS1, OS1, OS3, and OS5. Quantification of the diversity (C and D) and unevenness (E and F) of the IGH (C and E) and TRB (D and F) repertoires using the Shannon’s H index of diversity and the Simpson index of unevenness in healthy controls (blue circles) and in patients with CID-G/AI (purple boxes), LS (green boxes), and OS (red boxes). The cumulative frequencies of unique versus total CDR3 clonotypes are shown for the IGH (G) and TRB (H) repertoires (CDR-H3 and CDR-B3, respectively). Means ± SE values are shown; t test was used for statistical analysis. Representation of the frequency of the top 100 most abundant clones for IGH (I) and TRB (J) sequences in RAG-mutated patients and healthy controls (mean ± SE; ANOVA with post hoc test of Dunnett’s multiple comparisons with ***P < 0.001, **P < 0.01, and *P < 0.05). Sample plots illustrating the segregation of the various patient groups from healthy controls based on principal component 1 (PC1) and PC2 determined by five variables (RAG recombination activity, Shannon’s H, Simpson, and number of total and unique sequences) for the IGH (K) and TRB (L) repertoires. aa, amino acid.

A graphical representation of repertoire diversity is conveyed by tree maps of the IGH and TRB (Fig. 1B) repertoires, where each dot represents a unique V-J pair and the size of each dot corresponds to the frequency of that rearrangement in the total population of sequences obtained. Marked reduction of both TRB and IGH repertoire diversity, associated with clonotypic expansions, was detected in samples from OS patients and, to a lesser extent, from patients with LS. By contrast, a more diversified IGH repertoire was present in samples from the CID-G/AI group. However, the TRB repertoire of CID-G/AI patients was characterized by restrictions and clonotypic expansions.

To provide more quantitative measures of repertoire diversity and complexity, we took advantage of commonly used ecological parameters. In particular, the Shannon’s H index measures repertoire diversity, taking into account both the number of total sequences and clonal size distribution in the overall repertoire. As compared with healthy donors, RAG-deficient patients had a lower Shannon’s H index for both IGH (Fig. 1C) and TRB (Fig. 1D) repertoires. When the same analysis was applied to each of the three subgroups of patients with RAG deficiency (CID-G/AI, LS, and OS), a significant reduction of the Shannon’s H index was observed only for the IGH repertoire in OS patients (fig. S1, A and B). Next, to assess more precisely clonal size distribution, we calculated the Simpson index of unevenness, which measures the inequality in the relative representation of species observed in a given sample so that the higher the Simpson index, the more unequal the distribution of individual clonotypes. An uneven distribution of both IGH and TRB clonotypes was observed in RAG-deficient patients versus controls (Fig. 1, E and F). This difference was statistically significant for OS patients, but a clear trend was observed for the TRB repertoire of CID-G/AI and LS patients (fig. S1, C and D).
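The two ecological indices described above can be illustrated with a minimal sketch. It assumes the standard textbook forms (Shannon's H as the negative sum of p ln p, and the Simpson index as the sum of squared clonotype frequencies); the logarithm base and the exact Simpson variant used in the analysis are not stated here, and the read counts below are hypothetical, not data from the study.

```python
import math

def shannon_h(counts):
    """Shannon's H = -sum(p_i * ln(p_i)) over clonotype frequencies p_i."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson_index(counts):
    """Simpson index = sum(p_i ** 2); higher values mean a more uneven repertoire."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

# Hypothetical clonotype read counts (illustrative only).
even_repertoire = [25, 25, 25, 25]   # four equally represented clonotypes
skewed_repertoire = [97, 1, 1, 1]    # one dominant clonotype

for name, counts in (("even", even_repertoire), ("skewed", skewed_repertoire)):
    print(name, round(shannon_h(counts), 3), round(simpson_index(counts), 3))
```

In this toy example, the repertoire dominated by a single clonotype has a lower Shannon's H and a higher Simpson index than the evenly distributed one, which is the pattern reported for the patient samples.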

To analyze further the presence of clonotypic expansions, we estimated the diversity 50 (D50) index (14), which corresponds to the percentage of unique CDR3 sequences that account for 50% of the total number of sequences observed. Less than 10% of the unique clonotypes accounted for 50% of the total number of IGH sequences in patients with OS (Fig. 1G). Clonotypic expansions, resulting in markedly reduced D50, were observed in the TRB of all RAG-mutated patients, irrespective of their clinical phenotype (Fig. 1H). The top 100 most abundant IGH and TRB CDR3 (CDR-H3 and CDR-B3) clonotypes accounted for less than 0.3% of all transcripts in healthy donors. With the exception of a single clonotype in patient CID3, all patients with CID-G/AI showed a similar representation of CDR-H3 clonality, whereas a significant expansion of CDR-H3 clonotypes was detected in patients with OS and in patient LS3 (Fig. 1I). A different pattern was observed for the top 100 CDR-B3 clonotypes. In particular, two CDR-B3 clonotypes accounted for more than 50% of all total sequences in patient OS4, and a significant expansion of CDR-B3 clonotypes was also observed in patients with CID and LS (Fig. 1J). Overall, these data demonstrate that restriction and clonotypic expansions characterize both the TRB and the IGH repertoires of patients with OS, whereas in patients with CID-G/AI and LS, abnormalities of repertoire diversity are comparatively subtle and largely confined to T cells.
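The D50 definition given above (the percentage of unique CDR3 sequences that account for 50% of all sequences) can be sketched as follows. The function name `d50` and the example counts are hypothetical, and ties or sampling-depth corrections that a production pipeline might apply are not considered.

```python
def d50(counts):
    """Percentage of unique clonotypes, taken from most to least abundant,
    needed to reach 50% of the total number of sequences."""
    if not counts or sum(counts) <= 0:
        raise ValueError("counts must contain at least one positive value")
    ordered = sorted(counts, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for k, c in enumerate(ordered, start=1):
        running += c
        if running >= half:
            return 100.0 * k / len(ordered)

# Hypothetical repertoires (illustrative only).
diverse = [10] * 10          # ten clonotypes of equal size
expanded = [91] + [1] * 9    # one expanded clonotype plus nine singletons

print(d50(diverse))    # half of the clonotypes are needed
print(d50(expanded))   # a single clonotype already exceeds 50%
```

A low D50 therefore indicates that a few expanded clonotypes dominate the repertoire.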

Last, to assess whether analysis of the T and B cell repertoire may distinguish RAG-deficient patients from healthy controls, we used principal components analysis (PCA) based on five variables: the number of total and unique sequences, Shannon’s H index, Simpson index, and recombination activity of the mutant RAG protein. PCA successfully segregated healthy donors from the patients (Fig. 1, K and L) and permitted discrimination among different groups (CID-G/AI, LS, and OS) of RAG-deficient patients, especially with respect to the IGH repertoire (Fig. 1K).

Nonstochastic restriction of IGH and TRB repertoires and skewed usage of V, D, and J genes in RAG-mutated patients

The analysis of repertoire diversity and composition among unique and total sequences permits distinguishing between constraints that occur during generation of the primary repertoire and secondary effects that occur in the periphery, such as clonotypic expansions in response to non–self-antigens or self-antigens. To determine whether RAG mutations alter targeting of individual V, D, and J genes, we generated heat maps comparing usage of these genes in unique IGH (Fig. 2A) and TRB (Fig. 2B) sequences from healthy controls and patients. In these panels, coding genes are ordered according to their location along the chromosome, making it possible to ascertain whether skewed gene usage could reflect topological constraints. The summary for the χ2 test for goodness of fit shows that in most RAG-mutated patients, the distribution of V, D, and J gene usage among unique IGH and TRB sequences was distinct from that observed in healthy controls (Fig. 2, A and B). Results were similar when the same analysis was applied to total sequences (fig. S2, A and B).
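As an illustration of the goodness-of-fit comparison, the sketch below computes a χ2 statistic for observed gene usage in one sample against reference frequencies taken from controls. The gene counts and reference frequencies are hypothetical, the function name is an assumption, and no P value is computed because that would require a χ2 distribution function outside the standard library.

```python
def chi_square_gof(observed_counts, reference_freq):
    """Chi-square goodness-of-fit statistic and degrees of freedom.

    observed_counts: gene -> number of unique sequences using that gene
    reference_freq:  gene -> expected usage frequency (should sum to 1)
    """
    total = sum(observed_counts.values())
    stat = 0.0
    for gene, freq in reference_freq.items():
        if freq <= 0:
            raise ValueError("reference frequencies must be positive")
        expected = freq * total
        stat += (observed_counts.get(gene, 0) - expected) ** 2 / expected
    return stat, len(reference_freq) - 1

# Hypothetical IGHJ usage (illustrative only).
patient = {"IGHJ3": 30, "IGHJ4": 50, "IGHJ6": 20}
controls = {"IGHJ3": 0.1, "IGHJ4": 0.5, "IGHJ6": 0.4}

stat, dof = chi_square_gof(patient, controls)
print(stat, dof)
```

A large statistic relative to the degrees of freedom indicates that gene usage in the sample departs from the reference distribution.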

Fig. 2 Differential usage of V, D, and J genes in the IGH and TRB repertoires of patients with RAG deficiency.

Heat map representing the frequency of V, D, and J gene usage among unique IGH (A) and TRB (B) sequences from healthy controls and RAG-mutated patients. (C) Relative frequency of usage of IGHV and IGHD gene families, and of individual IGHJ genes, in healthy controls and in patients (top) and in Abelson virus–transformed pro-B cell lines expressing various RAG1 mutations (bottom). (D) Relative frequency of usage of TRBV gene family and of TRBD and TRBJ genes in healthy controls and in patients (top) and in iPS-derived thymocytes (bottom). (E) Differential usage of IGHV, IGHD, and IGHJ genes, segregating control and patient samples and the various genes according to PC1 and PC2, is shown as sample plots (left) and variable plots (right). (F) Differential usage of TRBV, TRBD, and TRBJ genes, segregating control and patient samples and the various genes according to PC1 and PC2, is shown as sample plots (left) and variable plots (right).

The frequency of individual V, D, and J gene usage among unique sequences of the IGH repertoire was different in healthy controls and RAG-mutated patients (Fig. 2C, top). For example, the IGHJ3 gene was the second most frequently used IGHJ gene in patients with RAG deficiency, whereas it was only fourth in order in healthy controls. Conversely, IGHJ6 was the second most commonly used gene in controls but was only fourth in order in RAG-mutated patients. The IGHD6 gene was the third in frequency among healthy controls, but it was the most commonly used in patients LS1 and OS5. Similar abnormalities were also observed in the frequency of usage of TRBV, TRBD, and TRBJ genes in patients versus controls (Fig. 2D, top). Moreover, for both IGH and TRB repertoires, the distribution of gene usage varied among RAG-mutated patients. The observation that such differences were present when analyzing unique sequences suggested that hypomorphic RAG mutations may alter selection of genes involved in V(D)J recombination during generation of the primary immune repertoire. To test this hypothesis, we analyzed the pattern of usage of individual V, D, and J genes at early stages of B and T lymphocyte development. In particular, we compared the frequency of IGHV, IGHD, and IGHJ gene usage in IGH transcripts from immortalized Rag1−/− pro-B cells engineered to express wild-type or mutant human RAG1 (12). As shown in Fig. 2C (bottom), the frequency of usage of individual V, D, and J genes among rearranged IGH products was different when comparing cells reconstituted with wild-type or mutant RAG1. To assess whether RAG mutations affect the composition of T cell repertoire at early stages of T cell development, we analyzed the frequency of usage of TRBV, TRBD, and TRBJ genes among productive TRB rearrangements during in vitro T cell differentiation of induced pluripotent stem (iPS) cells (15). A different pattern of gene usage was observed in control versus RAG1 mutant cells (Fig. 2D, bottom). Together, these data confirm what was observed in vivo in the patients (Fig. 2, C and D, top) and indicate that hypomorphic RAG mutations affect not only the efficiency but also the quality of the V(D)J recombination process.

To further illustrate this, PCA of individual IGHV, IGHD, and IGHJ gene usage clearly segregated patients from controls and even distinguished among patients with different phenotypes (Fig. 2E, left). In the variable plot analysis (Fig. 2E, right), the distribution of the 10 most abundantly used IGHV, IGHD, and IGHJ genes along PC1 and PC2 is shown for the entire population of patients analyzed. By overlaying sample and variable plots, it was possible to define which genes are preferentially used in each subgroup of patients. Thus, the T and B cell repertoire of CID-G/AI patients included overrepresentation of IGHV3-9, IGHD2-2, IGHD3-9, IGHD4-11, IGHD4-17, IGHD7-27, and IGHJ3 genes. The IGHJ1, IGHJ2, and IGHJ4 genes were more abundantly used in patients with OS and LS, and the IGHD2-8 and IGHD6-25 genes were preferentially used in OS patients. Similarly, PCA segregated the patients from the controls based on usage of TRBV, TRBD, and TRBJ genes (Fig. 2F), with clear distinction among the various groups of RAG-mutated patients when the analysis was conducted on TRBV and TRBJ gene usage. Upon overlaying sample plots with variable plots, patients with CID-G/AI were found to have increased usage of TRBJ1-1, TRBJ1-3, and TRBJ1-5, whereas overutilization of TRBV5-8, TRBV11-2, TRBV4-1, and TRBV18 was detected in patients with LS, and preferential usage of TRBJ2-7 was observed in patients with OS. Similar results were obtained when PCA of IGH and TRB gene usage was conducted on total sequences (fig. S2, C to F). Together, these data demonstrate that usage of individual V, D, and J genes distinguishes RAG-mutated patients from controls, with a specific signature of gene expression among patients with distinct phenotypes.

Abnormalities of CDR3 length and amino acid composition

Abnormalities of the complementarity-determining region 3 (CDR3) of immunoglobulin and TCR molecules as well as of CDR3 length and composition often have a significant impact on the ability to mount effective adaptive immune responses to a wide range of non–self-antigens and may also contribute to increased recognition of self-antigens in patients with autoimmune diseases (16, 17). Analysis of CDR3 length distribution of both unique (Fig. 3, A and B) and total (fig. S3, A and B) IGH and TRB transcripts demonstrated progressive skewing of the CDR3 length profile from patients with less severe to patients with more severe clinical phenotype. To better define abnormalities of CDR3 length distribution, we calculated the CDR3 complexity score and measured CDR3 skewness and kurtosis (18–20). In particular, the complexity score takes into account the number of major peaks of CDR3 length (defined as those with amplitudes of at least 10% of the sum of all peak heights) and their height contribution to the sum of all peak heights. Skewness measures the asymmetry of CDR3 length distribution above and below the mean. Last, kurtosis measures the amount of events in the central part of the CDR3 distribution as opposed to the tails and therefore defines the degree of peakedness. Significant differences in the complexity score, skewness, and kurtosis of the CDR-H3 length profiles of IGH repertoire were observed for both unique and total sequences from OS patients, whereas the CDR-H3 profile of CID-G/AI and LS patients was similar to the profile observed in healthy controls (Fig. 3, C to E, and fig. S3, C to E). The CDR-H3 length was not significantly different in patients versus controls (Fig. 3F and fig. S3F). No significant differences were observed for the TRB repertoire (Fig. 3, G to J, and fig. S3, G to J), although there was a trend toward reduced kurtosis in all patient groups. Overall, the observation that a similar pattern was detected in unique and total sequences from RAG-mutated patients further indicates that abnormalities of IGH and TRB repertoires in these patients can be predominantly attributed to the effect of RAG mutations in shaping the primary repertoire.
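The three descriptors of the CDR3 length profile can be sketched as below. Skewness and kurtosis use the usual moment-based definitions (kurtosis is reported as excess kurtosis, an assumption). The text does not give the exact formula that combines the number of major peaks with their height contribution, so the product used in `complexity_score` is an assumed form, not necessarily the one applied in the analysis. All length histograms are hypothetical.

```python
def _central_moment(length_counts, mean, k):
    n = sum(length_counts.values())
    return sum(c * (length - mean) ** k for length, c in length_counts.items()) / n

def skewness(length_counts):
    """Third standardized moment: asymmetry of the length distribution."""
    n = sum(length_counts.values())
    mean = sum(length * c for length, c in length_counts.items()) / n
    m2 = _central_moment(length_counts, mean, 2)
    return _central_moment(length_counts, mean, 3) / m2 ** 1.5

def kurtosis(length_counts):
    """Excess kurtosis (fourth standardized moment minus 3): peakedness."""
    n = sum(length_counts.values())
    mean = sum(length * c for length, c in length_counts.items()) / n
    m2 = _central_moment(length_counts, mean, 2)
    return _central_moment(length_counts, mean, 4) / m2 ** 2 - 3.0

def complexity_score(length_counts, threshold=0.10):
    """ASSUMED form: (number of major peaks) x (their share of total height).
    A major peak holds at least `threshold` of the summed peak heights."""
    total = sum(length_counts.values())
    major = [c for c in length_counts.values() if c / total >= threshold]
    return len(major) * sum(major) / total

# Hypothetical CDR3 length histograms: length -> number of sequences.
symmetric = {12: 1, 13: 2, 14: 1}
right_skewed = {12: 8, 13: 1, 14: 1}

print(skewness(symmetric), kurtosis(symmetric), complexity_score(symmetric))
print(skewness(right_skewed) > 0)
```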

Fig. 3 Characteristics of the CDR3 region of IGH and TRB unique sequences in peripheral blood lymphocytes.

Distribution of the length of the CDR3 region of IGH (CDR-H3) (A) and TRB (CDR-B3) (B) unique sequences from peripheral blood of patients with RAG deficiency and healthy controls (C1 to C4 and C6 to C8). In (A) and (B), the distribution of the CDR3 length in healthy controls is depicted as a blue line (representing means ± SE). Complexity scores (C and G), skewness (D and H), kurtosis (E and I), and average length in nucleotides (F and J) of the IGH (C to F) and TRB (G to J) CDR3 unique sequences in patients with RAG deficiency and controls. In (C) to (J), for each group, mean values are shown, and statistical significance was assessed by ANOVA.

The CDR3 length is determined not only by the length of V, D, and J gene sequences that are part of it but also by the addition of palindromic (P) and “N” nucleotides that contribute to its junctional diversity. The germline index (GI) can be used to estimate the abundance of P and N nucleotides and is calculated by dividing the number of nucleotides in the CDR3 that are encoded by V, D, and J genes by the total number of nucleotides contained in the CDR3, generating a value between 0 and 1 (21). A GI value of 1 indicates lack of P and N nucleotide addition; thus, the higher the GI, the lower the junctional diversity. With the exception of a lower proportion of unique sequences containing P nucleotides in patients with OS, no significant differences were observed in P and N nucleotide addition (fig. S4) and in GI value (fig. S5, A and B) within IGH and TRB sequences from patients with RAG mutations and controls. Furthermore, CDR-B3 sequences from RAG-mutated patients had a GI value between 0.8 and 0.85, which is similar to the GI value observed in healthy controls in this study (fig. S5A) and in a previous report (21). Last, when comparing the proportion of unique (fig. S5C) and total (fig. S5D) CDR-B3 sequences with GI = 1, there was a trend toward a reduced proportion of these among total sequences from patients with RAG mutations, irrespective of the disease phenotype.
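The germline index calculation described above reduces to a single ratio, sketched here with hypothetical nucleotide counts. The function and parameter names are assumptions introduced for illustration.

```python
def germline_index(v_nt, d_nt, j_nt, cdr3_nt):
    """GI = nucleotides encoded by V, D, and J genes / total CDR3 nucleotides.

    A value of 1 means no P or N nucleotides were added at the junctions.
    """
    germline_nt = v_nt + d_nt + j_nt
    if cdr3_nt <= 0 or germline_nt > cdr3_nt:
        raise ValueError("germline-encoded nucleotides cannot exceed CDR3 length")
    return germline_nt / cdr3_nt

# Hypothetical CDR3 of 45 nucleotides, 36 of them germline encoded,
# leaving 9 P/N nucleotides at the junctions.
print(germline_index(v_nt=10, d_nt=12, j_nt=14, cdr3_nt=45))

# Hypothetical CDR3 with no junctional additions.
print(germline_index(v_nt=15, d_nt=12, j_nt=18, cdr3_nt=45))
```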

The average hydrophobicity of the amino acids within the CDR-H3 loop of circulating B cells forms a Gaussian distribution centering on neutrality to mild hydrophilicity (22, 23). In patients with RAG deficiency, skewed usage of V, D, and J genes may affect amino acid composition and the hydrophobicity profile of the CDR-H3 region, with potentially important consequences for antigen binding. We have observed increased usage of IGHJ3 in patients with CID-G/AI (Fig. 2A) and decreased usage of IGHJ6 in all the patients (Fig. 4A). The IGHJ6 gene encodes for five tyrosine residues (Y) in the CDR-H3, whereas IGHJ3 and IGHJ5 do not encode for any tyrosine residue (Fig. 4B). A low content of tyrosine residues was detected in the CDR-H3 region of immunoglobulin transcripts from patients with RAG mutations, reaching statistical significance in patients with CID-G/AI (Fig. 4C). This decreased presence of tyrosine residues was associated with abnormalities of the hydrophobicity profile of the CDR-H3 region in seven of eight RAG-mutated patients, as measured by the normalized Kyte-Doolittle index of hydrophobicity of unique (Fig. 4D) and total (Fig. 4E) CDR-H3 sequences, suggesting that enrichment for immunoglobulin transcripts with an altered hydrophobicity profile in patients with RAG mutations occurs at the level of primary repertoire generation and is not simply due to expansion of selected clonotypes in the periphery.
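To make the tyrosine content and hydrophobicity measures concrete, the sketch below scores two illustrative CDR-H3 sequences with the published Kyte-Doolittle hydropathy scale. It uses the raw scale averaged over the loop; the normalization applied in the analysis is not reproduced, and the two sequences are hypothetical examples chosen to resemble an IGHJ6-like (tyrosine-rich) and an IGHJ3-like junction.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(cdr3):
    """Average raw Kyte-Doolittle value over the CDR3 amino acids."""
    return sum(KYTE_DOOLITTLE[aa] for aa in cdr3) / len(cdr3)

def tyrosine_fraction(cdr3):
    """Fraction of CDR3 residues that are tyrosine (Y)."""
    return cdr3.count("Y") / len(cdr3)

# Hypothetical CDR-H3 sequences (illustrative only).
ighj6_like = "ARDYYYYYGMDV"   # tyrosine-rich junction
ighj3_like = "ARDAFDI"        # no tyrosine

for seq in (ighj6_like, ighj3_like):
    print(seq, round(tyrosine_fraction(seq), 3), round(mean_hydropathy(seq), 3))
```

Replacing tyrosine-rich junctions with tyrosine-free ones changes both the tyrosine fraction and the average hydropathy of the loop, which is the kind of shift the analysis above is designed to detect.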

Fig. 4 Abnormal amino acid composition of CDR3 in the IGH and TRB sequences.

(A) Frequency of usage of the IGHJ6 gene among unique CDR-H3 sequences. (B) Summary of the Y content in the IGHJ genes. (C) Percentage of tyrosine residues in the CDR-H3 of unique and total sequences. Summary of CDR-H3 hydrophobicity profile depicted as the average Kyte-Doolittle index of hydrophobicity (mean ± SE) in patients and healthy control blood samples for unique (D) and total (E) sequences. **P < 0.01, ***P < 0.001, ****P < 0.0001, one-tailed unpaired t test. Amino acid composition of CDR-B3 in patients and healthy controls for amino acid positions 6 (F) and 7 (G) of the 13–amino acid–long CDR-B3.

OS is characterized by infiltration of peripheral tissues by activated, possibly self-reactive T cells. Recent data have shown that the hydrophobicity of amino acids at positions 6 and 7 of the 13–amino acid–long CDR-B3 promotes the development of self-reactive T cells (24). Amino acid composition at positions 6 and 7 of the 13–amino acid–long CDR-B3 was highly conserved in healthy controls but not in RAG-mutated patients (Fig. 4, F and G, top). Furthermore, the CDR-B3 of patient OS4 was enriched for hydrophobic amino acids at position 6, whereas both CID3 and LS3 showed increased usage of hydrophobic amino acids at position 7 (Fig. 4, F and G, bottom).

Abnormalities of immunoglobulin class switching and somatic hypermutation in patients with OS

Abnormalities of T and B cell development in patients with RAG mutations compromise immune responses in peripheral lymphoid organs, including production of antibodies of various isotypes. In particular, low IgG and low IgA, but elevated IgE serum levels, are typically seen in patients with OS and may also be observed in LS (25). However, virtual lack of B cells in patients with OS has so far precluded analysis of the distribution of B cells expressing various isotypes in the periphery of these patients. NGS analysis of the B cell repertoire, with the use of RNA as a template and reverse primers in the IGHC region, permitted analysis of the relative abundance of immunoglobulin transcripts containing various heavy chain isotypes in the peripheral blood of patients with RAG mutations and healthy controls. Consistent with the notion that unswitched cells comprise the majority of circulating B cells in normal individuals, switched transcripts represented less than 5% of all productive IGH transcripts detected in healthy controls within both unique (Fig. 5, A and B) and total (fig. S6, A and B) sequences, as also previously reported (26). By contrast, an increased frequency of IGHG transcripts was observed within unique and total sequences of CID3, LS3, OS1, OS3, and OS5 patients, and increased frequency of IGHE transcripts was observed in OS1 and OS3 patients (Fig. 5A and fig. S6). Overall, patients with OS showed an increased frequency of IGHG and IGHE transcripts (Fig. 5B and fig. S6B). The most abundant CDR-H3 clonotypes accounted for a large proportion of total sequences in patients with OS (Fig. 5C) and were mainly represented by IGHE and/or IGHG transcripts (Fig. 5D). These data suggest that the few circulating B cells in patients with OS are represented by oligoclonal populations that have switched to IgE and IgG.

Fig. 5 Distribution of immunoglobulin heavy chain isotypes, SHM, and antigen-driven selection in peripheral blood B cells of patients with RAG deficiency.

(A and B) Frequency of immunoglobulin heavy chain constant gene usage among unique IGH sequences from peripheral blood lymphocytes of RAG-deficient patients and healthy controls. In (B), mean values ± SE are shown (one-tailed unpaired t test). (C) Contribution of the most abundant clonotype to the total number of IGH sequences in patients and controls. (D) Distribution of various isotypes among the most abundant IGH transcripts. (E) Rate of SHM in IGH transcripts (mean ± SE; unpaired t test). (F) Frequency of unique IGH transcripts displaying evidence of antigen-mediated selection based on the distribution of replacement and silent mutations. (G) Rate of SHM and antigen-mediated selection in IGHV3-9 for patients with CID-G/AI and healthy controls with line at the mean.

Somatic hypermutation (SHM) introduces additional diversity in the IGH repertoire of mature B cells and allows selection of high-affinity antibodies. The SHM rate per 1000 nucleotides for all combined isotypes was higher in RAG-deficient patients than in controls, and this increase was more pronounced in patients with OS (Fig. 5E). When SHM was analyzed separately for the various isotypes, healthy controls showed a lower rate of SHM in IGHM transcripts than in switched transcripts, as expected. As compared with healthy controls, patients CID1, CID2, LS1, and OS5 had a higher SHM rate among IGHM transcripts (Fig. 5E). As for IGHG transcripts, a lower rate of SHM was observed in most of the patients compared with controls, with the exception of patients CID2 and OS5 (Fig. 5E). In the latter, a higher rate of SHM was observed also among IGHE transcripts. To determine whether the SHM detected in immunoglobulin transcripts from patients with RAG deficiency reflects in vivo antigen-mediated selection, we have assessed the distribution of replacement and silent mutations based on the Lossos multinomial model (fig. S7) (27). Clear evidence for antigen-mediated selection was observed in switched transcripts from patients OS3 and OS5 (Fig. 5F). Furthermore, both in patients with CID-G/AI and in controls, the rate of SHM was slightly increased in the preferentially rearranged IGHV3-9 gene as compared with the mean in all other genes, and this phenomenon was associated with clear evidence of antigen-mediated selection (Fig. 5G). Overall, these data indicate that despite restriction of primary repertoire generation, B cell function in patients with RAG deficiency remains intact with respect to class switch recombination (CSR) and SHM.
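The SHM rate per 1000 nucleotides referred to above can be illustrated with a simplified sketch. It assumes that each transcript has already been aligned to its germline V gene without gaps, which is a simplification: a real pipeline would rely on a dedicated alignment tool and would handle insertions, deletions, and primer regions. The sequences and the function name are hypothetical.

```python
def shm_rate_per_1000_nt(germline, transcript):
    """Mismatches between an ungapped germline/transcript pair, per 1000 nt."""
    if not germline or len(germline) != len(transcript):
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    mismatches = sum(1 for g, t in zip(germline, transcript) if g != t)
    return 1000.0 * mismatches / len(germline)

# Hypothetical 10-nucleotide fragments (illustrative only).
germline_fragment = "ACGTACGTAC"
mutated_fragment = "ACGTACGTAA"   # one substitution at the last position

print(shm_rate_per_1000_nt(germline_fragment, mutated_fragment))
print(shm_rate_per_1000_nt(germline_fragment, germline_fragment))
```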

Mapping the disease-related mutations onto the synaptic RAG complex models

Recently, crystallography and cryo-electron microscopy have allowed high-resolution determination of the RAG1/RAG2 heterotetrameric complex (28, 29). We have mapped the 12 missense mutations from our patient cohort onto the cryo-electron microscopy structure in complex with synapsed RSS DNAs (28) and compared the predicted structural and functional effects (Fig. 6A) with analysis of recombination activity (12). The RAG1 mutations at positions R396, R404, R737, R841, and R973 resulted in less than 10% recombination activity (12). Whereas residues R396, R737, and R973 directly engage in RSS binding using their side chains, residue R841 plays a role in stabilizing the closed conformation of the synaptic RAG complex by forming a salt bridge with the symmetric RAG1 (Fig. 6B) (28). The RAG1 mutation at position R404 likely affects RSS binding and conformation of the nonamer binding domain dimer because it stacks with R443 of the other RAG1 monomer (Fig. 6B) (28, 29). In contrast, the RAG1 mutation H612R did not compromise the recombination activity (12), likely because of the compensation of RSS binding by the mutated arginine (Fig. 6B).

Fig. 6 Mapping the disease-related mutations onto the synaptic RAG complex models.

(A) Overview of the disease-related mutations shown as space-filling models mapped onto the ribbon diagram of the synaptic RAG complex structure [Protein Data Bank (PDB) ID: 3JBY; top and bottom view]. Residues in zebrafish rag1 and rag2 and the equivalent residues that have been mutated in patients are labeled. Only one RAG1-RAG2 subunit is labeled for explicitness on the side view. Labeled in purple and red are residues that are mutated in patients with CID-G/AI and OS, respectively. Residues affected by mutations that correspond to the allele with lower recombination activity in compound heterozygous patients are labeled in black. (B) Examples of the detailed interactions between the equivalent residues from patients and the RSS intermediates or partner residues (PDB IDs: 3GNA and 3JBY). Equivalent residues that have interaction with RSS intermediates are shown as sticks and highlighted in magenta. The nucleotides in the RSS intermediates that have interaction with protein residues are shown as sticks and highlighted in cyan. The partner residues are shown as sticks and highlighted in marine. Potential interactions are displayed as red dashed lines. z, zebrafish; m, mouse. All molecular representations were generated in PyMOL (www.pymol.org) (47).

DISCUSSION

Here, we have demonstrated that the T and B cell repertoire of patients with OS is characterized by markedly reduced diversity and a nonstochastic restriction of V, D, and J gene usage. Restriction of TRB repertoire diversity, with skewed V-J gene usage, has also been reported by Yu et al. (21) in four patients with OS. These data suggest that severe RAG mutations may impose constraints during generation of primary T and B cell repertoires and that peripheral expansion in response to self-antigens or non–self-antigens is not the only factor involved in the dominance of few T cell clonotypes.

In contrast to what is known about T and B cell abnormalities in patients with OS, limited information is available on the richness and complexity of T and B cell repertoire in patients with milder forms of RAG deficiency. Flow cytometric analysis of the expression of various TCR Vβ families and CDR3 spectratyping have revealed that patients with CID-G/AI and with CD4 lymphopenia often maintain a largely polyclonal T cell repertoire, although in some cases under- and overrepresentation of individual TCR Vβ families have been documented (5, 7, 30, 31). However, these methods have limitations and do not permit analysis of CDR3 composition and use of individual V, D, and J genes. Here, we have shown that patients with CID-G/AI have reduced T cell repertoire diversity, with clonotypic expansions, and maintain a largely diversified B cell repertoire, with even distribution of individual clonotypes, but skewed usage of IGHV, IGHD, and IGHJ genes. We have also demonstrated that analysis of repertoire diversity and composition not only may distinguish RAG-mutated patients from controls but also correctly identifies patients with distinct clinical phenotypes. However, genotype-phenotype correlation in RAG deficiency is not absolute. Ijspeert et al. (32) have demonstrated that patients carrying RAG mutations that affect the same region in the noncore domain of the RAG1 molecule, and allow similar levels of recombination activity, may present with distinct clinical and immunological features, thus emphasizing the role played by other genetic and epigenetic factors in determining the phenotype.

The catalytic core of RAG1 contains two coding flank–sensitive regions, at amino acid positions 609 to 614 (33) and 892 to 977 (34). Several RAG1 mutations associated with CID-G/AI fall within these coding flank–sensitive regions (11, 28), including the p.H612R mutation in patient CID1 and the p.F974L mutation in patient CID3. Studies in vitro had suggested that missense mutations in the coding flank–sensitive regions of RAG1 may perturb repertoire composition not only by affecting DNA cleavage but also by preferentially targeting some coding elements (35). Patients CID1 and CID3 showed a skewed usage of individual V, D, and J genes. In particular, we have demonstrated increased usage of IGHV3-9, IGHV4-31, and IGHV3-23 that are also expressed by autoantibody-secreting B cells in patients with tumors (36). These data suggest that perturbation of repertoire composition may contribute to the immune dysregulation observed in patients with CID-G/AI.

The length and amino acid composition of the immunoglobulin CDR3 region affect recognition of antigens. Progressive reduction of CDR3 length and increase of highly hydrophobic and hydrophilic sequences during differentiation from immature to naïve and memory B cells are paralleled by a progressive decrease in the proportion of self-reactive B cell specificities during B cell ontogeny (37). We identified abnormalities of the hydrophobicity profile of the CDR-H3 region in seven of eight patients with hypomorphic RAG mutations, including all three patients with CID-G/AI tested. Close examination of amino acid composition revealed decreased frequency of tyrosine residues, which are abundant in CDR-H3 sequences of peripheral blood B cells from healthy controls (38). These abnormalities reflected markedly decreased usage of the IGHJ6 gene, which encodes five tyrosine residues. Furthermore, we have reported that the CDR-B3 of some patients with hypomorphic RAG mutations is characterized by an increased frequency of hydrophobic amino acids at positions 6 and 7 of the CDR-B3, which has been previously associated with promotion of self-reactive T cells (24).
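
The two CDR-H3 properties discussed above, hydrophobicity and tyrosine content, can be computed along the following lines. This is a minimal Python sketch that uses the published Kyte-Doolittle hydropathy scale and averages it over the sequence. The exact windowing and normalization applied by IgAT are not described in the text, so the simple mean is an assumption, and the function names and the example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(cdr3: str) -> float:
    """Average Kyte-Doolittle value over a CDR3 amino acid sequence.

    Raises KeyError for non-standard residue codes.
    """
    if not cdr3:
        raise ValueError("empty CDR3 sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in cdr3.upper()) / len(cdr3)


def tyrosine_fraction(cdr3: str) -> float:
    """Fraction of residues in the CDR3 that are tyrosine (Y)."""
    if not cdr3:
        raise ValueError("empty CDR3 sequence")
    return cdr3.upper().count("Y") / len(cdr3)


# Hypothetical CDR-H3 with a tyrosine-rich, IGHJ6-like 3' end.
example_cdr3 = "ARDYYYYGMDV"
y_fraction = tyrosine_fraction(example_cdr3)
hydropathy = mean_hydropathy(example_cdr3)
```

Under this sketch, a repertoire with reduced IGHJ6 usage would be expected to show a lower average `tyrosine_fraction` across its CDR-H3 sequences.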

The analysis of the distribution of immunoglobulin heavy chain isotypes and SHM has revealed unexpected features in patients with hypomorphic RAG mutations. In particular, we observed that the majority of immunoglobulin heavy chain transcripts in patients with OS were represented by switched transcripts and IgE in particular. Direct μ to ε CSR has been previously reported in immature B cells from a mouse model with hypomorphic Rag1 mutations (39). We have also demonstrated the presence of SHM in immunoglobulin transcripts from patients carrying hypomorphic RAG mutations, including patients with OS, who have very low to undetectable circulating B cells and disorganized secondary lymphoid organs, with lack of follicles and germinal centers, where SHM is actively induced (40). Our observation of CSR and SHM in peripheral blood B cells from patients with OS is consistent with previous evidence of Blimp1+ CD138+ plasma cells in lymph nodes from patients with OS, and with homeostatic expansion of immunoglobulin-secreting cells and increased expression of activation-induced cytidine deaminase (AICDA) in mouse models of this disease (41, 42).

Although this study offers insights into the mechanisms underlying the immunopathology and phenotypic heterogeneity of human RAG deficiency, it has important limitations, including the small sample size of each phenotypic subgroup of patients analyzed and the inability to study both T and B cell repertoires in each patient. Despite these limitations, we have provided a detailed analysis of T and B cell repertoire in patients with hypomorphic RAG mutations that are illustrative of the entire phenotypic spectrum of the disease. After introduction of newborn screening for SCID and related conditions, RAG mutations have emerged as the most common genetic defect associated with OS and atypical SCID (43). Detailed analysis of the immune repertoire in RAG-mutated patients may have important predictive implications and may influence therapeutic interventions.

MATERIALS AND METHODS

Study design

EDTA blood samples (1 to 5 ml) were obtained upon written informed consent from patients diagnosed with RAG deficiency and a known clinical and immunological phenotype. The study was performed under the approval of the Institutional Review Board of Boston Children’s Hospital, Harvard Medical School. For healthy controls, deidentified leftover blood samples were used, which had been obtained from children at the age of 9 months to 4 years at the time of regular well-child visits.

Generation and analysis of TCR and BCR repertoire by NGS

Equal amounts of total RNA extracted from peripheral blood of patients with RAG deficiency (n = 5 for patients with OS, n = 3 for patients with LS, and n = 4 for patients with CID-G/AI) and from peripheral blood of healthy infants (n = 3 for TRB and n = 4 for IGH; age range, 9 months to 4 years) were used as template to semiquantitatively amplify the rearrangements at the endogenous T cell receptor β (TRB) and immunoglobulin heavy chain (IGH) loci according to the manufacturer’s protocol (iRepertoire Inc.) (44). Polymerase chain reaction (PCR) products were purified and sequenced using the GS Junior 454 platform (Roche Inc.).

Raw sequences were filtered for PCR errors, and the resulting FASTA sequences were submitted to IMGT/HighV-QUEST and analyzed for V, D, and J gene usage; composition; length of the CDR-3; Kyte-Doolittle index of hydrophobicity; SHM; and antigen-mediated selection using the IgAT software (45), as previously described (23). The sequences of TRB and IGH transcripts, upon processing through IMGT, are listed in table S3. Shannon’s H entropy and Simpson’s unevenness index were calculated using the VDJ statistics file from IgAT analysis and the PAST program, as described (23). D50 was calculated by determining the cumulative frequency of total sequences that constitute 50% of the cumulative unique sequence frequency (14). D50 is graphically represented by plotting the cumulative frequency of total sequences on the x axis against the cumulative frequency of unique sequences on the y axis and then finding the intercept on the y axis of values that correspond to 50% on the x axis. Graphical representation of V-J pairing and the relative distribution of distinct rearrangements, hierarchical tree maps, and isotype usages were generated using the iRepertoire software. All raw data used for the analyses represented in the various figures are listed in table S3.
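
The diversity measures named in this paragraph can be sketched in Python from a list of per-clonotype read counts. This is an illustration under stated assumptions, not the PAST or iRepertoire implementation. Shannon's H uses the natural logarithm. The Simpson index is computed here as the dominance form (sum of squared frequencies), and which variant the authors took from PAST is an assumption. D50 follows the graphical description: clonotypes are ranked from most to least abundant, and the function returns the percentage of unique sequences needed to reach 50% of total sequences, without interpolation. All function names are hypothetical.

```python
import math


def _positive(counts):
    """Keep positive clonotype counts; fail loudly if none remain."""
    kept = [c for c in counts if c > 0]
    if not kept:
        raise ValueError("counts must contain at least one positive value")
    return kept


def shannon_entropy(counts) -> float:
    """Shannon's H = -sum(p * ln p) over clonotype frequencies."""
    kept = _positive(counts)
    total = sum(kept)
    return -sum((c / total) * math.log(c / total) for c in kept)


def simpson_dominance(counts) -> float:
    """Sum of squared frequencies; near 1 when one clonotype dominates."""
    kept = _positive(counts)
    total = sum(kept)
    return sum((c / total) ** 2 for c in kept)


def d50(counts) -> float:
    """Percentage of unique clonotypes that make up 50% of total reads."""
    ordered = sorted(_positive(counts), reverse=True)
    total = sum(ordered)
    running = 0
    for rank, count in enumerate(ordered, start=1):
        running += count
        if 2 * running >= total:  # reached half of all reads
            return 100.0 * rank / len(ordered)
    raise AssertionError("unreachable: cumulative sum always reaches total")


# Hypothetical repertoires: an even one and one dominated by a single clone.
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]
```

With these toy inputs, the even repertoire gives a higher Shannon entropy and a higher D50 than the skewed one, which is the direction of change expected in an oligoclonal repertoire.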

The complexity score of the CDR3 length distribution was determined on the basis of the following calculation (46): [equation image not recovered], where N is the number of all the sequences studied (either unique or total), MP is the major peak (defined by constituting at least 10% of N), mp is the number of all the MP, nMP is the number of sequences in each MP, and the NMPs are the numbers of MP.

RAG activity

The activity of various RAG mutant proteins was determined by a flow cytometry–based assay, as previously described (12, 13), and expressed as percentage of the recombination activity of the wild-type protein.

Statistical analysis

Unpaired t test was used to compare the patient blood samples to infant controls for variables with normal distribution. For nonparametric variables, the Mann-Whitney test was used. The χ2 test was used for categorical values. For all multiple t tests, post hoc Bonferroni correction was applied. Analysis of variance (ANOVA) with Dunnett’s correction for multiple comparisons was used when comparing more than two groups. The analyses were performed using Prism version 6 (GraphPad). Non–hypothesis-driven statistical analysis of PCA was performed using the Excel add-in Multibase package (Numerical Dynamics).
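
The Bonferroni step mentioned above can be illustrated with a short Python sketch. The analyses in the study were run in Prism, so this shows only the arithmetic of the correction: each P value is multiplied by the number of tests and capped at 1. The function name, the example P values, and the 0.05 threshold are hypothetical.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted P values (p * m, capped at 1.0)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P values from three t tests.
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni(raw)

# A comparison is called significant only if its adjusted P stays below alpha.
alpha = 0.05
significant = [p < alpha for p in adjusted]
```

In this toy example, only the first comparison remains below the threshold after correction.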

SUPPLEMENTARY MATERIALS

immunology.sciencemag.org/cgi/content/full/1/6/eaah6109/DC1

Fig. S1. Diversity indices of IGH and TRB repertoire.

Fig. S2. Differential usage of V, D, and J genes in the IGH and TRB repertoires of patients with RAG deficiency.

Fig. S3. Characteristics of the CDR3 region of IGH and TRB total sequences in peripheral blood lymphocytes.

Fig. S4. P and N nucleotide addition in the CDR3 of unique IGH and TRB sequences from controls and patients with RAG mutations and various clinical phenotypes.

Fig. S5. GI and frequency of sequences without P and N nucleotides in the TRB repertoire of patients with RAG deficiency and controls.

Fig. S6. Distribution of immunoglobulin heavy chain isotypes among total IGH sequences from controls and patients with RAG deficiency.

Fig. S7. Inference of antigen-mediated selection in combined and individual IgM, IgG, and IgE isotype transcripts from healthy controls and patients with RAG deficiency.

Table S1. Clinical and laboratory features of patients with RAG deficiency.

Table S2. Summary of RAG mutations, recombination activity, and results of IGH and TRB sequencing in healthy donors and patients with RAG deficiency.

Table S3. Raw data for all the figures (named “data for Fig. 1-6” and “data for Suppl Fig. 2-6”) and sequence of IGH and TRB transcripts from healthy controls and patients with RAG deficiency.

References (48–50)

REFERENCES AND NOTES

Acknowledgments: We thank the patients and their families for participating in this study. Funding: This study was supported by grants from the NIH (2R01AI100887, 2R01AI100887, and 2U54AI082973 to L.D.N.) and the March of Dimes (1-FY13-500 to L.D.N.). Author contributions: Y.N.L. and L.D.N. designed the study, supervised analysis of the data, and wrote the manuscript; Y.N.L., F.F., K.D., I.T., L.D., F.A.V., H.R., and H.W. performed the experiments and analyzed the data; Y.N.L. performed statistical analysis; and M.A., J.H.B., D.B., M.J.B., C.C., K.C., S.C., R.A.E., A.F., R.L.F., A.R.G., D.H.E.-G., L.A.H., W.A.-H., E.H., R.P.N., S.-Y.P., N.C.P., S.M.R., P.S.-P., R.S., P.P., J.E.W., S.G., and L.D.N. provided valuable samples and gathered the data. H.R. and H.W. analyzed the potential effects of the mutations on the RAG1/RAG2 heterotetrameric structure. L.O.d.B. helped with the manuscript writing. Competing interests: The authors declare that they have no competing interests. Data and materials availability: Please see supplementary Excel file for the IGH and TRB sequences.