IMMUNOGLOBULIN ANNOTATION OPTIMIZING AND BENCHMARKING; NOVEL GERMLINE ALLELE DISCOVERY IN SELECTED AFRICAN BOVINE BREEDS
Abstract
Antibodies are critical molecules of the adaptive immune response of vertebrates. For an animal to neutralize the pathogens it will encounter in its lifetime, it must be able to produce an enormous diversity of antibodies. Vertebrates achieve this through genetic recombination of immunoglobulin genes and somatic hypermutation applied to so-called “germline” alleles. Bovine antibodies have distinct immunogenetics. Available annotation tools are human-centric and therefore not optimized to annotate bovine immunoglobulin sequences. The international ImMunoGeneTics information system (IMGT) database is a global reference in immunogenetics and immunoinformatics. However, its germline allele information is not up to date for most species, and germline gene databases remain incomplete. Identifying cattle germline alleles through novel allele discovery is therefore necessary to complete these databases. These discoveries are one step towards understanding the immunological complexity of these species. Using simulated bovine datasets, the performance of three annotation tools, IgBlast, IMGT/HighV-QUEST, and MiXCR, was benchmarked based on the frequencies and distribution of correctly and wrongly identified antibody sequences. Two germline allele discovery methods, IgDiscover and TIgGER, were evaluated to determine their suitability for bovine germline allele discovery. Immunoglobulin M sequences of three African bovine breeds, Ndama, Ankole, and Boran, were used in this analysis, while the Friesian cattle breed was used as a control. For annotation of the VH gene, IMGT/HighV-QUEST and IgBlast yielded a more accurate annotation with a 4% error rate, compared to MiXCR, which had a 13% error rate. MiXCR annotated JH genes with an error rate of 15%, compared to IgBlast and IMGT/HighV-QUEST, which had error rates of 40% and 43%, respectively.
IgDiscover identified 18 novel alleles in Boran, 6 novel alleles in Ndama, and 3 novel alleles in Ankole. TIgGER, on the other hand, identified 7 novel alleles in Boran, 18 novel alleles in Ndama, and 1 novel allele in Ankole. Using pairwise Hamming distances of these novel alleles, it was observed that African
novel germline alleles are more diverse than the Friesian novel germline alleles. The discovery of these novel alleles shows the need for further studies characterizing the immune system of African breeds.
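The pairwise Hamming distance comparison mentioned above can be illustrated with a minimal sketch. It assumes the allele sequences are already aligned to equal length and summarizes each group by its mean pairwise distance; the function names and toy sequences are hypothetical and are not data or code from this study.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_hamming(seqs: list[str]) -> float:
    """Average Hamming distance over all unordered pairs in a group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0  # fewer than two sequences: no pairs to compare
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alleles (illustrative only, not real germline sequences).
group_a = ["ACGTACGT", "ACGTTCGT", "TCGTACGA"]
group_b = ["ACGTACGT", "ACGTACGA"]

print(mean_pairwise_hamming(group_a))  # 2.0
print(mean_pairwise_hamming(group_b))  # 1.0
```

In this toy case the first group has the larger mean pairwise distance, which is the sense in which one set of alleles would be called more diverse than another.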