Contents

1 Introduction

The Immunoglobulin Basic Local Alignment Search Tool (IgBLAST) is a specialized bioinformatics tool developed by the National Center for Biotechnology Information (NCBI) for the analysis of B-cell receptor (BCR) and T-cell receptor (TCR) sequences (Ye et al., 2013). IgBLAST performs sequence alignment and annotation, with key outputs including germline V, D, and J gene assignments; detection of somatic hypermutations introduced during affinity maturation; identification and annotation of complementarity-determining regions (CDR1–CDR3) and framework regions (FR1–FR4); and both nucleotide and protein-level alignments.

igblastr is an R/Bioconductor package that provides functions to conveniently install and use a local IgBLAST installation from within R. The package is designed to make it as easy as possible to use IgBLAST in R by streamlining the installation of both IgBLAST and its associated germline databases. In particular, these installations can be performed with a single function call, do not require root access, and persist across R sessions.

The main function in the package is igblastn(), a wrapper to the igblastn standalone executable included in IgBLAST. In addition to igblastn(), the package provides:

IgBLAST is described at https://pubmed.ncbi.nlm.nih.gov/23671333/

Online IgBLAST: https://www.ncbi.nlm.nih.gov/igblast/

Please use https://github.com/HyrienLab/igblastr/issues to report bugs, provide feedback, request features (etc) about igblastr.

2 Install and load the package

Like any Bioconductor package, igblastr should be installed with BiocManager::install():

if (!require("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("igblastr")

BiocManager::install() will take care of installing the package dependencies that are missing.

Load the package:

library(igblastr)

3 Install IgBLAST

If IgBLAST is already installed on your system, you can tell igblastr to use it by setting the environment variable IGBLAST_ROOT to the path of your IgBLAST installation. See ?IGBLAST_ROOT for more information.

Otherwise, simply call install_igblast() to install the latest version of IgBLAST. As of March 2025, NCBI provides pre-compiled versions of IgBLAST for Linux, Windows, Intel Mac and Mac Silicon. install_igblast() will automatically download the appropriate pre-compiled version of IgBLAST for your platform from the NCBI FTP site, and install it in a location that will be remembered across R sessions.

if (!has_igblast())
    install_igblast()

See ?install_igblast for more information.

Note that we use has_igblast() to avoid reinstalling IgBLAST if igblastr already has access to a working IgBLAST installation.

Display basic information about the IgBLAST installation used by igblastr:

igblast_info()
## igblast_root: /home/biocbuild/.cache/R/igblastr/igblast_roots/1.22.0
## igblast_build: igblast 1.22.0, build Oct 11 2023 16:06:20
## igblastn_version: 1.22.0
## makeblastdb_version: 2.15.0+
## OS/arch: Linux/x86_64
## organisms: human, mouse, rabbit, rat, rhesus_monkey

4 Install and select a germline database

The igblastr package includes a FASTA file containing 8,437 paired heavy and light chain human antibody sequences (16,874 individual sequences) retrieved from OAS. These sequences will serve as our query sequences, that is, the immunoglobulin (Ig) sequences that we will analyze in this vignette.

Before we can do so, the igblastn standalone executable in IgBLAST (and, by extension, our igblastn() function) needs access to the germline V, D, and J gene sequences for humans. Before igblastn can use them, these germline sequences must first be stored in three separate BLAST databases. We will refer to these as the V-region, D-region, and J-region databases, respectively. When considered together, we will refer to them as the germline database.

4.1 Built-in germline databases

The igblastr package includes a set of built-in germline databases for human and mouse that were obtained from the OGRDB database (AIRR community). These can be listed with:

list_germline_dbs(builtin.only=TRUE)
## Warning in .warn_if_old_builtin_AIRR_human_db_exists(): In igblastr 0.99.23, the following built-in germline dbs were added:
##   _AIRR.human.IGH+IGK+IGL.202309, _AIRR.human.IGH+IGK+IGL.202309.src,
##   _AIRR.human.IGH+IGK+IGL.202410, _AIRR.human.IGH+IGK+IGL.202410.src.
## 
##   Note that _AIRR.human.IGH+IGK+IGL.202410.src is exactly the same as
##   _AIRR.human.IGH+IGK+IGL.202501, only the name of the db is different.
##   The new name is the result of a revisited naming scheme for the built-in AIRR
##   germline dbs for human. See the Value section in '?list_germline_dbs' for
##   more information.
## 
##   From now on, please make sure to always use
##   "_AIRR.human.IGH+IGK+IGL.202410.src" instead of
##   "_AIRR.human.IGH+IGK+IGL.202501" in your code.
## 
##   To get rid of this warning, remove germline db _AIRR.human.IGH+IGK+IGL.202501
##   with 'rm_germline_db("_AIRR.human.IGH+IGK+IGL.202501")'
##  db_name                                     V  D  J
##  _AIRR.human.IGH+IGK+IGL.202309            342 31 23
##  _AIRR.human.IGH+IGK+IGL.202309.src        354 33 24
##  _AIRR.human.IGH+IGK+IGL.202410            342 31 23
##  _AIRR.human.IGH+IGK+IGL.202410.src        354 33 24
##  _AIRR.human.IGH+IGK+IGL.202501            354 33 24
##  _AIRR.mouse.CAST_EiJ.IGH+IGK+IGL.202501   184  9 22
##  _AIRR.mouse.LEWES_EiJ.IGH+IGK+IGL.202501  169 11 22
##  _AIRR.mouse.MSM_MsJ.IGH+IGK+IGL.202501    172  9 22
##  _AIRR.mouse.NOD_ShiLtJ.IGH+IGK+IGL.202501 149  9 22
##  _AIRR.mouse.PWD_PhJ.IGH+IGK+IGL.202501    184 10 22

The last part of the database name indicates the date of the download in YYYYMM format.

If our query sequences are from humans or from one of the mouse strains listed above, we can already select the appropriate database with use_germline_db() and skip the subsection below.

4.2 Install additional germline databases

AIRR/OGRDB and IMGT/V-QUEST are two providers of germline databases that can be used with IgBLAST. If, for any reason, none of the built-in AIRR/OGRDB germline databases is suitable (e.g., if your query sequences are not from human or mouse), you can use install_IMGT_germline_db() to install additional germline databases. Below, we show how to install the latest human germline database from IMGT/V-QUEST.

First, we list the most recent IMGT/V-QUEST releases:

head(list_IMGT_releases())
## [1] "202531-1" "202530-1" "202518-3" "202506-1" "202449-1" "202430-2"

The organisms included in release 202518-3 are:

list_IMGT_organisms("202518-3")
##  [1] "Aotus_nancymaae"          "Bos_taurus"              
##  [3] "Camelus_dromedarius"      "Canis_lupus_familiaris"  
##  [5] "Capra_hircus"             "Chondrichthyes"          
##  [7] "Danio_rerio"              "Equus_caballus"          
##  [9] "Felis_catus"              "Gadus_morhua"            
## [11] "Gallus_gallus"            "Gorilla_gorilla_gorilla" 
## [13] "Heterocephalus_glaber"    "Homo_sapiens"            
## [15] "Ictalurus_punctatus"      "Lemur_catta"             
## [17] "Macaca_fascicularis"      "Macaca_mulatta"          
## [19] "Mus_musculus"             "Mus_musculus_C57BL6J"    
## [21] "Mustela_putorius_furo"    "Neogale_vison"           
## [23] "Nonhuman_primates"        "Oncorhynchus_mykiss"     
## [25] "Ornithorhynchus_anatinus" "Oryctolagus_cuniculus"   
## [27] "Ovis_aries"               "Pongo_abelii"            
## [29] "Pongo_pygmaeus"           "Rattus_norvegicus"       
## [31] "Salmo_salar"              "Sus_scrofa"              
## [33] "Teleostei"                "Tursiops_truncatus"      
## [35] "Vicugna_pacos"

Next, we install the human germline database from the latest IMGT/V-QUEST release:

install_IMGT_germline_db("202518-3", organism="Homo sapiens", force=TRUE)

See ?install_IMGT_germline_db for more information.

Finally, we select the newly installed germline database for use with igblastn():

use_germline_db("IMGT-202518-3.Homo_sapiens.IGH+IGK+IGL")

See ?use_germline_db for more information.

4.3 Display the full list of cached germline dbs

To see the full list of cached germline dbs:

list_germline_dbs()
## Warning in .warn_if_old_builtin_AIRR_human_db_exists(): In igblastr 0.99.23, the following built-in germline dbs were added:
##   _AIRR.human.IGH+IGK+IGL.202309, _AIRR.human.IGH+IGK+IGL.202309.src,
##   _AIRR.human.IGH+IGK+IGL.202410, _AIRR.human.IGH+IGK+IGL.202410.src.
## 
##   Note that _AIRR.human.IGH+IGK+IGL.202410.src is exactly the same as
##   _AIRR.human.IGH+IGK+IGL.202501, only the name of the db is different.
##   The new name is the result of a revisited naming scheme for the built-in AIRR
##   germline dbs for human. See the Value section in '?list_germline_dbs' for
##   more information.
## 
##   From now on, please make sure to always use
##   "_AIRR.human.IGH+IGK+IGL.202410.src" instead of
##   "_AIRR.human.IGH+IGK+IGL.202501" in your code.
## 
##   To get rid of this warning, remove germline db _AIRR.human.IGH+IGK+IGL.202501
##   with 'rm_germline_db("_AIRR.human.IGH+IGK+IGL.202501")'
##  db_name                                      V  D  J  
##  _AIRR.human.IGH+IGK+IGL.202309             342 31 23  
##  _AIRR.human.IGH+IGK+IGL.202309.src         354 33 24  
##  _AIRR.human.IGH+IGK+IGL.202410             342 31 23  
##  _AIRR.human.IGH+IGK+IGL.202410.src         354 33 24  
##  _AIRR.human.IGH+IGK+IGL.202501             354 33 24  
##  _AIRR.mouse.CAST_EiJ.IGH+IGK+IGL.202501    184  9 22  
##  _AIRR.mouse.LEWES_EiJ.IGH+IGK+IGL.202501   169 11 22  
##  _AIRR.mouse.MSM_MsJ.IGH+IGK+IGL.202501     172  9 22  
##  _AIRR.mouse.NOD_ShiLtJ.IGH+IGK+IGL.202501  149  9 22  
##  _AIRR.mouse.PWD_PhJ.IGH+IGK+IGL.202501     184 10 22  
##  IMGT-202518-3.Homo_sapiens.IGH+IGK+IGL     698 47 34 *
##  IMGT-202518-3.Homo_sapiens.TRA+TRB+TRG+TRD 306  6 97  
##  IMGT-202518-3.Mus_musculus.IGH+IGK+IGL     862 61 27  
##  IMGT-202518-3.Mus_musculus.TRA+TRB+TRG+TRD 429  5 96

Note that the asterisk (*) displayed at the far right of the output from list_germline_dbs() indicates the currently selected germline database (you may need to scroll horizontally to see the asterisk).

See ?list_germline_dbs for more information.

5 Select a constant region database (optional)

The igblastn standalone executable in IgBLAST can also use constant region (C region) sequences to improve the analysis. As with the germline V, D, and J gene sequences, the C-region sequences should typically come from the same organism as the query sequences, and they must also be formatted as a BLAST database. We will refer to this database as the C-region database.

The igblastr package includes a set of built-in C-region databases for human, mouse, and rabbit, obtained from IMGT/V-QUEST. These can be listed using:

list_c_region_dbs()
##  db_name                                 C
##  _IMGT.human.IGH+IGK+IGL.202412         76
##  _IMGT.human.TRA+TRB+TRG+TRD.202509     12
##  _IMGT.mouse.IGH.202509                 56
##  _IMGT.mouse.TRA+TRB+TRG+TRD.202509      9
##  _IMGT.rabbit.IGH.202412                28
##  _IMGT.rat.IGH.202508                   18
##  _IMGT.rhesus_monkey.IGH+IGK+IGL.202509 40

The last part of the database name indicates the date of the download in YYYYMM format.

If your query sequences are from human, mouse, or rabbit, you can select the appropriate database using use_c_region_db():

use_c_region_db("_IMGT.human.IGH+IGK+IGL.202412")

Calling list_c_region_dbs() again should display an asterisk (*) at the far right of the output, indicating the currently selected C-region database.

See ?list_c_region_dbs for more information.

6 Use igblastn()

Now that we have selected a germline and C-region database for use with igblastn(), we are almost ready to call igblastn() to perform the alignment.

As mentioned earlier, the igblastr package includes a FASTA file containing 8,437 paired heavy and light chain human antibody sequences retrieved from OAS. These serve as our query sequences, that is, the set of BCR sequences that we will analyse using igblastn().

To get the path to the query sequences, use:

query <- system.file(package="igblastr", "extdata",
                     "BCR", "1279067_1_Paired_sequences.fasta.gz")

The igblastr package also includes a JSON file containing metadata associated with the query BCR sequences:

json <- system.file(package="igblastr", "extdata",
                    "BCR", "1279067_1_Paired_All.json")
query_metadata <- jsonlite::fromJSON(json)
query_metadata
## $Run
## [1] 1279067
## 
## $Link
## [1] "https://doi.org/10.1038/s41586-022-05371-z"
## 
## $Author
## [1] "Jaffe et al., 2022"
## 
## $Species
## [1] "human"
## 
## $Age
## [1] 38
## 
## $BSource
## [1] "PBMC"
## 
## $BType
## [1] "Memory-B-Cells"
## 
## $Vaccine
## [1] "None"
## 
## $Disease
## [1] "None"
## 
## $Subject
## [1] "Donor-3"
## 
## $Longitudinal
## [1] "no"
## 
## $`Unique sequences`
## [1] 8437
## 
## $Isotype
## [1] "All"
## 
## $Chain
## [1] "Paired"

The 8,347 paired sequences come from memory B cells isolated from peripheral blood mononuclear cell (PBMC) samples of a single human donor (age 38) with no known disease or vaccination history. The source for these sequences is the Jaffe et al. (2022) study; the DOI link to the publication is provided above.

Before calling igblastn(), we first check the selected databases by calling use_germline_db() and use_c_region_db() with no arguments:

use_germline_db()
## [1] "IMGT-202518-3.Homo_sapiens.IGH+IGK+IGL"
use_c_region_db()
## [1] "_IMGT.human.IGH+IGK+IGL.202412"

Now, let’s call igblastn(). Since we are only interested in the best alignment for each query sequence, we set num_alignments_V to 1. This may take up to around 3 min on a standard laptop:

AIRR_df <- igblastn(query, num_alignments_V=1)

By default, the result is a tibble in AIRR format:

AIRR_df
## # A tibble: 16,874 × 111
##    sequence_id    sequence sequence_aa locus stop_codon vj_in_frame v_frameshift
##    <chr>          <chr>    <chr>       <chr> <lgl>      <lgl>       <lgl>       
##  1 heavy_chain_A… ACAACCA… QVQLVQSGSE… IGH   FALSE      TRUE        FALSE       
##  2 light_chain_A… TGAGCGC… QSVLTQPPSV… IGL   FALSE      TRUE        FALSE       
##  3 heavy_chain_A… ACTTTCT… QVQLQESGPG… IGH   FALSE      TRUE        FALSE       
##  4 light_chain_A… GAGGAGT… DIQMTQSPSS… IGK   FALSE      TRUE        FALSE       
##  5 heavy_chain_A… GGGATCA… QVQLVQSGAE… IGH   FALSE      TRUE        FALSE       
##  6 light_chain_A… GCTGGGG… QSALTQPPSA… IGL   FALSE      TRUE        FALSE       
##  7 heavy_chain_A… GGAGCCC… EVQLVESGGG… IGH   FALSE      TRUE        FALSE       
##  8 light_chain_A… GGGGGCT… QSALTQPASV… IGL   FALSE      TRUE        FALSE       
##  9 heavy_chain_A… ATACTTT… QVQLQESGPG… IGH   FALSE      TRUE        FALSE       
## 10 light_chain_A… GAGCTAC… DIVMTQSPDS… IGK   FALSE      TRUE        FALSE       
## # ℹ 16,864 more rows
## # ℹ 104 more variables: productive <lgl>, rev_comp <lgl>, complete_vdj <lgl>,
## #   d_frame <lgl>, v_call <chr>, d_call <chr>, j_call <chr>, c_call <chr>,
## #   sequence_alignment <chr>, germline_alignment <chr>,
## #   sequence_alignment_aa <chr>, germline_alignment_aa <chr>,
## #   v_alignment_start <int>, v_alignment_end <int>, d_alignment_start <int>,
## #   d_alignment_end <int>, j_alignment_start <int>, j_alignment_end <int>, …

You can call igbrowser() on AIRR_df to visualize the annotated sequences in a browser. For each sequence, the V, D, J, and C segments will be shown as well as the FWR1-4 and CDR1-3 regions.

See ?igblastn and ?igbrowser for more information.

7 Downstream analysis examples

library(ggplot2)
library(dplyr)
library(scales)
library(ggseqlogo)

One common analysis of AIRR format data is to examine the distribution of percent mutation across BCR sequences. Here we analyze the percent mutation in the V segments of each chain type (heavy, kappa, and lambda). Note that V percent mutation is 100 - v_identity.

AIRR_df |>
    ggplot(aes(locus, 100 - v_identity)) +
    theme_bw(base_size=14) +
    geom_point(position = position_jitter(width = 0.3), alpha = 0.1) +
    geom_boxplot(color = "blue", fill = NA, outliers = FALSE, alpha = 0.3) +
    ggtitle("Distribution of V percent mutation by locus") +
    xlab(NULL)

Another common analysis is to investigate the distribution of germline genes (e.g., V genes). In this case, we typically stratify the analysis by locus or chain type.

plot_gene_dist <- function(AIRR_df, loc) {
    df_v_gene <- AIRR_df |>
        filter(locus == loc) |>
        mutate(v_gene = sub("\\*[0-9]*", "", v_call)) |> # drop allele info
        group_by(v_gene) |>
        summarize(n = n(), .groups = "drop") |>
        mutate(frac = n / sum(n))
    df_v_gene |>
        ggplot(aes(frac, v_gene)) +
        theme_bw(base_size=13) +
        geom_col() +
        scale_x_continuous('Percent of sequences', labels = scales::percent) +
        ylab("Germline gene") +
        ggtitle(paste0(loc, "V gene prevalence"))
}
plot_gene_dist(AIRR_df, "IGH")

plot_gene_dist(AIRR_df, "IGK")

plot_gene_dist(AIRR_df, "IGL")

A third category of analysis focuses on CDR3 sequences, including their lengths and motifs, which are often visualized using sequence logo plots.

AIRR_df$cdr3_aa_length <- nchar(AIRR_df$cdr3_aa)

AIRR_df |>
    group_by(locus, cdr3_aa_length) |>
    summarize(n = n(), .groups = "drop") |>
    ggplot(aes(cdr3_aa_length, n)) +
    theme_bw(base_size=14) +
    facet_wrap(~locus) +
    geom_col() +
    ggtitle("Histograms of CDR3 length by locus")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_col()`).

AIRR_df |>
    filter(locus == "IGK", cdr3_aa_length == 9) |>
    pull(cdr3_aa) |>
    ggseqlogo(method = "probability") +
    theme_bw(base_size=14) +
    ggtitle("Logo plot of kappa chain CDR3 sequences that are 9 AA long")
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## ℹ The deprecated feature was likely used in the ggseqlogo package.
##   Please report the issue at <https://github.com/omarwagih/ggseqlogo/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

8 Advanced usage

8.1 Passing additional arguments to igblastn()

The igblastn standalone executable in IgBLAST supports many command line arguments. You can quickly list them with igblastn_help() (or with igblastn_help(TRUE) to get an expanded list with more details – see ?igblastn_help):

igblastn_help()
## USAGE
##   igblastn [-h] [-help] [-germline_db_V germline_database_name]
##     [-num_alignments_V int_value] [-germline_db_V_seqidlist filename]
##     [-germline_db_D germline_database_name] [-num_alignments_D int_value]
##     [-germline_db_D_seqidlist filename]
##     [-germline_db_J germline_database_name] [-num_alignments_J int_value]
##     [-germline_db_J_seqidlist filename] [-num_alignments_C int_value]
##     [-c_region_db constant_region_database_name]
##     [-custom_internal_data filename] [-d_frame_data filename]
##     [-auxiliary_data filename] [-min_D_match min_D_match]
##     [-V_penalty V_penalty] [-D_penalty D_penalty] [-J_penalty J_penalty]
##     [-num_clonotype num_clonotype] [-clonotype_out clonotype_out]
##     [-allow_vdj_overlap] [-organism germline_origin]
##     [-domain_system domain_system] [-ig_seqtype sequence_type]
##     [-focus_on_V_segment] [-extend_align5end] [-extend_align3end]
##     [-min_V_length Min_V_Length] [-min_J_length Min_J_Length]
##     [-show_translation] [-db database_name] [-dbsize num_letters]
##     [-entrez_query entrez_query] [-query input_file] [-sra accession]
##     [-out output_file] [-evalue evalue] [-word_size int_value]
##     [-gapopen open_penalty] [-gapextend extend_penalty] [-searchsp int_value]
##     [-sum_stats bool_value] [-ungapped] [-lcase_masking] [-query_loc range]
##     [-strand strand] [-parse_deflines] [-outfmt format] [-show_gis]
##     [-num_descriptions int_value] [-num_alignments int_value]
##     [-line_length line_length] [-num_threads int_value] [-remote] [-version]
## 
## DESCRIPTION
##    BLAST for Ig and TCR sequences
## 
## Use '-help' to print detailed descriptions of command line arguments

All these command line arguments can be passed to the igblastn() function using the usual argument_name=argument_value syntax. For example, to set the number of threads to 8 (-num_threads command line argument), do igblastn(query, num_threads=8).

8.2 Restrict the search to a subset of user-supplied gene alleles

Some arguments of particular interest are the germline_db_[VDJ]_seqidlist arguments. They allow restricting the search of the germline database to a list of gene alleles supplied by the user. This list can be provided as a character vector of gene allele identifiers (e.g. IGHV3-23*01, IGHV3-23*04, etc..), or as the path to a file containing the identifiers (one identifier per line). For example:

V_alleles <- c("IGHV3-23*01", "IGHV3-23*04")
igblastn(query, germline_db_V_seqidlist=V_alleles)

If the gene alleles are stored in a file, say in path/to/my_V_gene_alleles.txt, then use:

igblastn(query, germline_db_V_seqidlist=file("path/to/my_V_gene_alleles.txt"))

Note that in this case, the path to the file containing the gene alleles identifiers must be wrapped in a call to file().

See ?igblastn for more information.

9 A TCR analysis example

Even though NCBI IgBLAST primary use case is BCR analysis, it can also be used for TCR sequence analysis, and so does the igblastr package.

File SRR11341217.fasta.gz included in the package contains 10,875 human beta chain TCR transcripts running from 5’ of reverse transcription reaction to beginning of constant region. See https://www.ncbi.nlm.nih.gov/sra/SRX7944361 for more information about this dataset.

filename <- "SRR11341217.fasta.gz"
query <- system.file(package="igblastr", "extdata", "TCR", filename)

To analyze this dataset with igblastn(), we need to install and select a human TCR germline database:

db_name <- install_IMGT_germline_db("202518-3", organism="Homo sapiens",
                                    tcr.db=TRUE, force=TRUE)
use_germline_db(db_name)
list_germline_dbs()
## Warning in .warn_if_old_builtin_AIRR_human_db_exists(): In igblastr 0.99.23, the following built-in germline dbs were added:
##   _AIRR.human.IGH+IGK+IGL.202309, _AIRR.human.IGH+IGK+IGL.202309.src,
##   _AIRR.human.IGH+IGK+IGL.202410, _AIRR.human.IGH+IGK+IGL.202410.src.
## 
##   Note that _AIRR.human.IGH+IGK+IGL.202410.src is exactly the same as
##   _AIRR.human.IGH+IGK+IGL.202501, only the name of the db is different.
##   The new name is the result of a revisited naming scheme for the built-in AIRR
##   germline dbs for human. See the Value section in '?list_germline_dbs' for
##   more information.
## 
##   From now on, please make sure to always use
##   "_AIRR.human.IGH+IGK+IGL.202410.src" instead of
##   "_AIRR.human.IGH+IGK+IGL.202501" in your code.
## 
##   To get rid of this warning, remove germline db _AIRR.human.IGH+IGK+IGL.202501
##   with 'rm_germline_db("_AIRR.human.IGH+IGK+IGL.202501")'
##  db_name                                      V  D  J  
##  _AIRR.human.IGH+IGK+IGL.202309             342 31 23  
##  _AIRR.human.IGH+IGK+IGL.202309.src         354 33 24  
##  _AIRR.human.IGH+IGK+IGL.202410             342 31 23  
##  _AIRR.human.IGH+IGK+IGL.202410.src         354 33 24  
##  _AIRR.human.IGH+IGK+IGL.202501             354 33 24  
##  _AIRR.mouse.CAST_EiJ.IGH+IGK+IGL.202501    184  9 22  
##  _AIRR.mouse.LEWES_EiJ.IGH+IGK+IGL.202501   169 11 22  
##  _AIRR.mouse.MSM_MsJ.IGH+IGK+IGL.202501     172  9 22  
##  _AIRR.mouse.NOD_ShiLtJ.IGH+IGK+IGL.202501  149  9 22  
##  _AIRR.mouse.PWD_PhJ.IGH+IGK+IGL.202501     184 10 22  
##  IMGT-202518-3.Homo_sapiens.IGH+IGK+IGL     698 47 34  
##  IMGT-202518-3.Homo_sapiens.TRA+TRB+TRG+TRD 306  6 97 *
##  IMGT-202518-3.Mus_musculus.IGH+IGK+IGL     862 61 27  
##  IMGT-202518-3.Mus_musculus.TRA+TRB+TRG+TRD 429  5 96

See ?install_IMGT_germline_db and ?use_germline_db for more information.

When calling igblastn(), we don’t need to specify the ig_seqtype argument as it will be automatically set to "TCR" (see documentation of the ig_seqtype argument in ?igblastn for more information):

AIRR_df <- igblastn(query)
AIRR_df
## # A tibble: 10,875 × 111
##    sequence_id    sequence sequence_aa locus stop_codon vj_in_frame v_frameshift
##    <chr>          <chr>    <chr>       <chr> <lgl>      <lgl>       <lgl>       
##  1 SRR11341217.1  GTCTCAG… VSDGYSVSRS… TRB   FALSE      TRUE        FALSE       
##  2 SRR11341217.2  AATGGGG… DLKNVFPP*A… TRB   TRUE       TRUE        FALSE       
##  3 SRR11341217.3  GAGTGGA… SGFVIDKFPI… TRB   FALSE      TRUE        FALSE       
##  4 SRR11341217.4  TCTTGAA… LEGFSAQLFP… TRB   FALSE      TRUE        FALSE       
##  5 SRR11341217.5  TTACCGA… DLKNVFPP*A… TRB   TRUE       FALSE       FALSE       
##  6 SRR11341217.6  GGGGTAC… GYSVSREKKE… TRB   FALSE      TRUE        FALSE       
##  7 SRR11341217.7  GCGTGAT… SAEFPKEGPS… TRB   FALSE      TRUE        FALSE       
##  8 SRR11341217.8  TGGAGAT… GDIAEGYSVS… TRB   FALSE      TRUE        FALSE       
##  9 SRR11341217.9  CACCCAC… DLKNVFPP*A… TRB   TRUE       FALSE       FALSE       
## 10 SRR11341217.10 TGATCGG… DRFSAERPKG… TRB   FALSE      TRUE        FALSE       
## # ℹ 10,865 more rows
## # ℹ 104 more variables: productive <lgl>, rev_comp <lgl>, complete_vdj <lgl>,
## #   d_frame <lgl>, v_call <chr>, d_call <chr>, j_call <chr>, c_call <chr>,
## #   sequence_alignment <chr>, germline_alignment <chr>,
## #   sequence_alignment_aa <chr>, germline_alignment_aa <chr>,
## #   v_alignment_start <int>, v_alignment_end <int>, d_alignment_start <int>,
## #   d_alignment_end <int>, j_alignment_start <int>, j_alignment_end <int>, …

10 Future developments

At the moment, the igblastr package does not provide access to the full functionality of the IgBLAST software. Most notably, the igblastp standalone executable included in IgBLAST has no counterpart in igblastr.

Some future developments include:

11 Session information

Here is the output of sessionInfo() on the system on which this document was compiled:

sessionInfo()
## R version 4.5.1 Patched (2025-08-23 r88802)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggseqlogo_0.2       scales_1.4.0        dplyr_1.1.4        
##  [4] ggplot2_4.0.0       igblastr_0.99.23    Biostrings_2.77.2  
##  [7] Seqinfo_0.99.3      XVector_0.49.1      IRanges_2.43.5     
## [10] S4Vectors_0.47.4    BiocGenerics_0.55.4 generics_0.1.4     
## [13] tibble_3.3.0        BiocStyle_2.37.1   
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6         xfun_0.53            bslib_0.9.0         
##  [4] websocket_1.4.4      processx_3.8.6       vctrs_0.6.5         
##  [7] tools_4.5.1          ps_1.9.1             curl_7.0.0          
## [10] pkgconfig_2.0.3      R.oo_1.27.1          RColorBrewer_1.1-3  
## [13] S7_0.2.0             lifecycle_1.0.4      compiler_4.5.1      
## [16] farver_2.1.2         stringr_1.5.2        tinytex_0.57        
## [19] chromote_0.5.1       GenomeInfoDb_1.45.12 htmltools_0.5.8.1   
## [22] sass_0.4.10          yaml_2.3.10          later_1.4.4         
## [25] pillar_1.11.1        crayon_1.5.3         jquerylib_0.1.4     
## [28] R.utils_2.13.0       cachem_1.1.0         magick_2.9.0        
## [31] tidyselect_1.2.1     rvest_1.0.5          digest_0.6.37       
## [34] stringi_1.8.7        bookdown_0.45        labeling_0.4.3      
## [37] fastmap_1.2.0        grid_4.5.1           cli_3.6.5           
## [40] magrittr_2.0.4       dichromat_2.0-0.1    utf8_1.2.6          
## [43] withr_3.0.2          UCSC.utils_1.5.0     promises_1.4.0      
## [46] rmarkdown_2.30       httr_1.4.7           otel_0.2.0          
## [49] R.methodsS3_1.8.2    evaluate_1.0.5       knitr_1.50          
## [52] rlang_1.1.6          Rcpp_1.1.0           xtable_1.8-4        
## [55] glue_1.8.0           selectr_0.4-2        BiocManager_1.30.26 
## [58] xml2_1.4.0           jsonlite_2.0.0       R6_2.6.1