Introduction

This is an example Rscript that shows how to use the KMINE Applied Programming Interface for performing a gene set analysis. For the complete documentation of the methods that are used in this script, have a look at the official site with documentation at https://apimlqv2.tenwiseservice.nl/html/.

Set-up

Libraries

For running the script we need a few R packages. First we need the httr package, for data manipulation and visualisation we use the packagesgglot2 and dplyr from tidyverse. For making the gene-disease network at the end of the script, we use the igraph package.

library(httr) 
library(dplyr)
library(tidyr)
library(ggplot2)
library(openxlsx)
library(igraph)
library(visNetwork)

API setup

To start using the API, we first need to obtain a session cookie. This is done by running the start method on API via a GET request. This returns a token that needs to be supplied in all subsequent POST requests.

#Define the location of the webserver

base_url <- "https://apimlqv2.tenwiseservice.nl/api/mlquery/"
request  <- GET(url = paste0(base_url, 'start/'))

#Construct a payload that is sent to the server, each time you do an API call
payload <- list('apikey'            = APIKEY,
               'csrfmiddlewaretoken'= cookies(request)$value
               )
header <- add_headers(referer = 'https://apimlq.tenwiseservice.nl/')

Data loading

IDconversion

The API service works with concept_ids. In the case that you have HGNC_IDS, you can just load them. In case you have symbols, you need to convert them to HGNC IDs. This is done using the concept/search/ function. In the following lines of code we assume you have the following file available.

symbol
TDO2
IDO1
IDO2
AFMID
KMO
KYNU
# symbol file should be a file pointing to a list of gene symbols
symbols <- read.table(file = symbol_file, h=T, sep="\t")
payload[['terms']] <- paste0(symbols$symbol, collapse=",")

# Perform the POST request
request <- POST(header,
                url  = paste0(base_url, "concept/search/"),  
                body = payload
                )

converted <- content(request)
concept_ids <- tibble::enframe(unlist(converted[['result']][['hits']]))
concept_ids
## # A tibble: 46 x 2
##    name        value  
##    <chr>       <chr>  
##  1 BACG_1364   dlsT   
##  2 BACG_2992   kynU   
##  3 HGNC:11178  ACAT2  
##  4 HGNC:117081 TDO2   
##  5 HGNC:117082 TPH2   
##  6 HGNC:12008  TPH1   
##  7 HGNC:1516   CAT    
##  8 HGNC:15471  ALDH8A1
##  9 HGNC:1564   KYAT1  
## 10 HGNC:17929  AADAT  
## # … with 36 more rows

This yields a list of identifiers that can now be further used for an enrichment calculation. Note that the we have number identifiers that start with HGNC (denoting human genes) and a number of identifiers that start with BACG, denoting bacterial genes. Before continuing, we will filter away the BACG identifiers.

Enrichment analysis

API call

This section runs the API enrichment call. We only have to supply the gene ids for this. If we do not provide a background set, a background set of the same size is automatically sampled from the database.

concept_ids <- concept_ids %>% filter(grepl("HGNC", name))

payload[['concept_ids']] <- paste0(concept_ids$name, collapse=",")

request <- POST(header,
                url  = paste0(base_url, "conceptset/enrichment/"),  
                body = payload)

enrichment_results <- content(request)

# Also here we need to do some disentangling, but it is simpler because we have no nested lists.
enrichment <- tibble(results = enrichment_results[['result']][['enrichment']]) %>% 
  # Get the statistics, keeping the original name
  hoist(results, 
        id         = 'id',          # The ID of the enriched concept
        hitsfg     = 'hitsfg',      # The number of hits to the foreground set
        hitsbg     = 'hitsbg',      # The number of hits to the backround set
        enrichment = 'enrichment',  # The lodds score
        pval       = 'pval'         # The P-value based on the Fisher test
     ) %>% 
  # Remove the remainder
  select(-results) 
  
DT::datatable(enrichment %>%  
                filter(pval < 0.05, enrichment > 1) %>% 
                arrange(pval) ,
                rownames = FALSE
              )

Plot enriched terms

The enriched terms can now be plotted in order of importance. In order to create sensible biological names to the identifiers, we first run the annotation call, we attach names and other properties to the concept ids.

filtered_enrichment <- enrichment[grep("PATH",enrichment$id),] %>% 
                       filter (pval < 0.05, enrichment > 1) %>% 
                       arrange(pval) %>% 
                       head(n=150)

payload[['concept_ids']] <- paste0(filtered_enrichment$id, collapse=",")
        
request <- POST(url = paste0(base_url, "conceptset/annotation/"), header, body = payload)
result <- content(request)
annotation <- result[['result']][['annotation']]
pathway_names <- tibble(concepts = annotation) %>% 
    # Get the name and descriptions
    hoist(concepts,  symbol='name')  %>% 
    # Add the gene ids
    mutate(id = names(annotation)) %>% 
    # Remove the remainder
    select( -concepts ) %>% 
    # And unwrap and unnest
    pivot_longer(c(symbol), names_to = "type", values_to = "name") %>% 
    unnest_longer(name) %>% 
    filter(is.na(name)==FALSE)

filtered_enrichment <-  filtered_enrichment %>% inner_join(pathway_names %>% select(id, name))
tp <- filtered_enrichment %>% distinct(name,.keep_all=TRUE) %>% head(10)
#tp <- tp %>% filter(id %in% c('HP:0100614','HP:0025283','HP:0002206') == FALSE)
p <- ggplot(tp, aes(x = reorder(name,hitsfg), y = hitsfg, fill = as.numeric(pval)))
p <- p + geom_bar(stat="identity")
p <- p + coord_flip()
p <- p + theme_bw()
p <- p + theme(axis.text.y = element_text(size=14))
p <- p + guides(fill=guide_legend(title="Pvalue"))
p <- p + labs(x = "", y = 'Nr of hits with gene set')
print(p)

Network

Get the relations between the input genes and the enriched pathways. For this we use the relations method. This creates a data frame in which the genes are matched to their corresponding pathways.Using a a netwoerk visualisation method, we candisplay the results as an interactive, dragable network. By setting the title column in the data frame that describes the edges, we can create clickable links to PubMed.

payload[['concept_ids_subject']]<- paste0(concept_ids$name,    collapse=",")
payload[['concept_ids_object']] <- paste0(filtered_enrichment$id, collapse=",")
url <- paste0(base_url,'conceptset/relations/')
request <- POST(header,
                url  = url,  
                body = payload
                )
relations <- content(request)

abtable <-tibble(relations = relations[['result']][['relations']]) %>%
    hoist(relations,
          from ='subject',
          to   ='object',
          overlap = 'overlap',
          score   ='local_mi') %>%
    select(-relations) %>% 
  arrange(desc(score)) %>% 
  head(50)

#Create node and edge tables for displaying in a network

nodes <- pathway_names %>% select(-type) %>% 
  rename (label=name)%>% 
  bind_rows(concept_ids %>% rename(id=name,label=value )) %>% 
  mutate(color = ifelse(grepl("HGNC", id),"blue","red"))

edges <- abtable %>% 
  inner_join(nodes, by =c("from"="id")) %>% 
  inner_join(nodes, by =c("to"="id"), suffix=c("to","from")) %>% 
  # Add a simple link to PubMed
  mutate(title=paste0('<a href="https://pubmed.ncbi.nlm.nih.gov/?term=',labelto," AND ",labelfrom,'">PubMed</a>'),
         value=log10(score))

visNetwork(nodes %>% filter(id %in% c(edges$to,edges$from)), edges,width="800px",height = "1000px") %>% 
  visIgraphLayout() %>% 
  visNodes(font=list(size=30), size=30,shadow = list(enabled = TRUE, size = 10))  %>% 
  visEdges( 
    scaling = list(min=4,max=10),
    shadow = FALSE,
    color = list(color = "#0085AF", highlight = "#C62F4B")
  ) %>% 
  visInteraction(navigationButtons = TRUE)