The approach in this article works for various types of questions where you need to pull an answer from outside the Qwiery box.

NodeJS has a very simple mechanism to invoke external processes, and R or Python are no exception. Let’s assume you have some data representing a time series (say, the sales figures of the past year) and you want to ask Qwiery to forecast it:

  • fetching the data is a separate task and can involve accessing SharePoint, AWS, MySQL or some other data source.
  • forecasting can be done using built-in methods of the data store, but we’ll use R, which has a package for pretty much any statistical algorithm on the planet.
  • the question needs to be added to the oracle stack so Qwiery understands it and dispatches it to the correct handler
  • the computation will output an image
  • the image is fetched by the client and presented as-is
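The NodeJS mechanism for invoking an external process can be sketched in a few lines. This is a minimal illustration, not Qwiery code: the helper name runProcess is made up, and the node binary itself (process.execPath) stands in for Rscript so the sketch runs without R installed.

```javascript
// Minimal sketch: run an external program and capture its output.
// runProcess is a hypothetical helper; process.execPath (the node binary)
// stands in for Rscript so that no R installation is required.
const { spawnSync } = require("child_process");

function runProcess(command, args) {
  const result = spawnSync(command, args, { encoding: "utf8" });
  if (result.error) {
    // The process could not be started at all (for example, a wrong path).
    return { code: null, stdout: "", stderr: String(result.error) };
  }
  return { code: result.status, stdout: result.stdout, stderr: result.stderr };
}

const outcome = runProcess(process.execPath, [
  "-e",
  "process.stdout.write('hello from a child process')"
]);
console.log(outcome.code, outcome.stdout);
```

spawnSync blocks the event loop until the child finishes, which keeps the sketch short; inside a web service the asynchronous spawn is the better fit.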

Instead of rendering the forecast within R, one could also let R return the forecast as an array and render it on the client with a custom library (Kendo UI, Infragistics and so on). That’s merely a stylistic decision.
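If you went the array route, the NodeJS side would have to turn whatever R prints into points a charting library can consume. The sketch below assumes, purely for illustration, that R writes a JSON object with mean, lower and upper arrays to stdout; neither that format nor the function name toChartPoints comes from the R code in this article.

```javascript
// Sketch under an assumed output format: R prints JSON such as
// {"mean":[...],"lower":[...],"upper":[...]} (hypothetical, not part of the article's R code).
function toChartPoints(stdout, startIndex) {
  const { mean, lower, upper } = JSON.parse(stdout);
  const arrays = [mean, lower, upper];
  if (!arrays.every(Array.isArray) ||
      mean.length !== lower.length ||
      mean.length !== upper.length) {
    throw new Error("Unexpected forecast payload.");
  }
  // One point per forecast step, numbered from startIndex onwards.
  return mean.map((value, i) => ({
    x: startIndex + i,
    y: value,
    low: lower[i],
    high: upper[i]
  }));
}

const sample = '{"mean":[1.5,2.0],"lower":[1.0,1.2],"upper":[2.0,2.8]}';
console.log(toChartPoints(sample, 101));
```

The x values simply continue the index of the original series (100 observations, so the forecast starts at 101).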

We will assume that the data can be fetched somehow and focus on integrating R in the loop. So, we’ll create a synthetic time series in the shape of a noisy cosine:

series = 10*cos((1:100)/(2*pi)) + rnorm(100)

Note: it’s rather obvious that if you wish to use R or Python you need to have it installed beside Qwiery. See the R Project for Statistical Computing for details.

There are many ways to forecast a series, and this is a research field of its own. Below we include a function, suggestTimeSeriesModel, which helps you figure out the most appropriate algorithm: it fits the ARIMA, BATS, ETS and TBATS models for you and picks the one with the lowest AIC. As it happens, the noisy cosine series we’ll use is best forecast using the ARIMA model. Most of the magic sits in the forecast package, which you need to install (for instance with install.packages("forecast")) and load in R:

library(forecast)

and the custom code you need looks like so:

suggestTimeSeriesModel = function(timeseries){
  if(!is.ts(timeseries)){
    # estimate the frequency from the series passed in, not from a global variable
    timeseries = ts(timeseries, frequency = findfrequency(timeseries))
  }
  results = list()

  # arima
  a = auto.arima(timeseries)
  results$ARIMA = list(AIC = a$aic, BIC = a$bic, Order = arimaorder(a))

  #bats
  b = bats(timeseries)
  results$BATS = list(AIC = b$AIC, BIC = NA)

  #ets
  e = ets(timeseries)
  results$ETS = list(AIC = e$aic, BIC = e$bic)

  #tbats
  t = tbats(timeseries)
  results$TBATS = list(AIC = t$AIC, BIC = NA)

  f = data.frame(AIC = c(a$aic, b$AIC, e$aic, t$AIC), BIC = c(a$bic, NA, e$bic, NA))
  rownames(f) = c("ARIMA", "BATS", "ETS", "TBATS")

  results$All = f
  results$Best = rownames (f[which(f$AIC==min(f$AIC)),])

  return(results)
}

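# Stand-alone frequency estimate based on the spectral density. The other
# functions here call forecast::findfrequency, so this one is optional.
findFrequency =  function(x){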
  n <- length(x)
  spec <- spec.ar(c(x),plot=FALSE)
  if(max(spec$spec)>10) # Arbitrary threshold chosen by trial and error.
  {
    period <- round(1/spec$freq[which.max(spec$spec)])
    if(period==Inf) # Find next local maximum
    {
      j <- which(diff(spec$spec)>0)
      if(length(j)>0)
      {
        nextmax <- j[1] + which.max(spec$spec[j[1]:500])
        period <- round(1/spec$freq[nextmax])
      }
      else
        period <- 1
    }
  }
  else
    period <- 1
  return(period)
}

getArimaParameters <- function(timeseries){
  a = auto.arima(timeseries)
  results = list()
  results$Order = forecast::arimaorder(a)
  results$Frequency = forecast::findfrequency(timeseries)
  return(results)
}

getARIMAForecast <- function(series = NULL, howLong = 14){
  if(is.null(series)){
    stop("No data supplied to forecast.")
  }
  ar = auto.arima(series)
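  # forecast() dispatches to the Arima method; forecast.Arima itself is not
  # exported by recent versions of the forecast package
  f = forecast(ar, h = howLong)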
  results = list()

  results$Confidence80 = list("Upper" = f$upper[,1], "Lower" = f$lower[,1])
  results$Confidence95 = list("Upper" = f$upper[,2], "Lower" = f$lower[,2])
  results$Mean = f$mean
  results$Forecast = f;
  return(results)
}

getForecast <- function(series, frequency = NULL, type = "HoltWinters", howLong = 14){
  if(is.null(series)){
    stop("No data supplied to forecast.");
  }
  if(!is.ts(series)){
    if(!is.null(frequency)){
      series = ts(series, frequency = frequency);
    }else{
      series = ts(series)
    }
  }
  if(howLong <1){
    stop("The 'howLong' parameter cannot be less than 1.")
  }
  # note: getHoltWintersForecast and getBATSForecast are not shown in this article
  if(type == "HoltWinters"){
    return(getHoltWintersForecast(series, howLong))
  }else if(type == "BATS"){
    return(getBATSForecast(series, howLong))
  }else if(type == "ARIMA"){
    return(getARIMAForecast(series, howLong))
  }else{
    stop("The specified forecasting type is not supported.");
  }

}

plotARIMAForecast <- function(series = NULL, howLong = 14){

  f = getARIMAForecast(series, howLong)
  # 'filename' is a global set by the calling script from the command-line arguments
  png(filename)
  plot(f$Forecast)
  dev.off()
  invisible();
}

Don’t worry if all of this seems imposing: it is part of the typical machine learning mechanics and only indirectly related to Qwiery. In fact, you could use C# with Accord.Net or Python with Pandas to do the same.
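The selection rule inside suggestTimeSeriesModel boils down to "take the model with the smallest AIC". For readers more at home in NodeJS, here is the same rule as a sketch. The function name bestModelByAic and the scores are made up for illustration; they are not output of the R code.

```javascript
// Sketch of the "lowest AIC wins" rule. The scores below are invented
// example values, not real output of suggestTimeSeriesModel.
function bestModelByAic(scores) {
  let best = null;
  for (const [name, { AIC }] of Object.entries(scores)) {
    // Skip models without a usable AIC.
    if (typeof AIC !== "number" || Number.isNaN(AIC)) continue;
    if (best === null || AIC < scores[best].AIC) best = name;
  }
  return best;
}

const exampleScores = {
  ARIMA: { AIC: 310.2, BIC: 318.0 },
  BATS: { AIC: 402.7, BIC: null },
  ETS: { AIC: 355.1, BIC: 362.9 },
  TBATS: { AIC: 401.3, BIC: null }
};
console.log(bestModelByAic(exampleScores)); // prints "ARIMA"
```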

The final function, plotARIMAForecast, saves the forecast as a PNG file at the location given by filename, which the NodeJS method passes in as a command-line argument:

args=(commandArgs(TRUE))
filename = as.character(args[[1]])

series = 10*cos((1:100)/(2*pi)) + rnorm(100)
plotARIMAForecast(series)
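The script above only reads the output filename from its arguments and generates its own data. If you wanted NodeJS to hand real data to R, one option is to write the series to a temporary CSV file and pass its path as a second argument. The sketch below shows the NodeJS half of that idea. The helper buildRscriptArgs and the second argument are assumptions; the R script would need an extra line such as read.csv(args[[2]])$value to pick the data up.

```javascript
// Sketch: pass a data file to R as an extra command-line argument.
// buildRscriptArgs and the second argument are hypothetical; the R script
// shown in this article only reads the first argument (the image path).
const fs = require("fs");
const os = require("os");
const path = require("path");

function buildRscriptArgs(scriptPath, outputPath, series) {
  // Write a one-column CSV with a "value" header into a fresh temp directory.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "forecast-"));
  const dataPath = path.join(dir, "series.csv");
  fs.writeFileSync(dataPath, ["value", ...series].join("\n") + "\n");
  return [scriptPath, outputPath, dataPath];
}

const args = buildRscriptArgs("Forecast.R", "Forecast.png", [12.5, 13.1, 11.8]);
console.log(args);
```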

Now, how does Qwiery connect to this? You need to define some QTL which will pick up the question. In a real-world scenario you would use parameters like ‘What is the sales forecast between $1 and $2’ but we will keep it simple here:

{
  "Id": "FxRRRfBUX0",
  "Grab": ["Forecast the sales of next month", "Forecast"],
  "UserId": "Everyone",
  "Template": {
    "Answer": {
      "DataType": "Service",
      "Header": "This is a forecast of the noisy cosine function:",
      "URL": "/data/forecast",
      "Path": "Content"
    }
  },
  "Category": "Core"
}
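To give a feel for the parametrized form (‘What is the sales forecast between $1 and $2’), here is a sketch of how such a pattern could be matched against a question. It is not Qwiery’s actual matcher; the function matchTemplate is hypothetical and simply turns each $n placeholder into a capture group.

```javascript
// Hypothetical sketch of matching a "$1 ... $2" pattern; not Qwiery's own code.
function matchTemplate(template, question) {
  const order = [];
  // Escape regex metacharacters first; "$1" becomes "\$1" in the process.
  const escaped = template.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Then replace every escaped placeholder with a lazy capture group.
  const source = escaped.replace(/\\\$(\d+)/g, (_, n) => {
    order.push(n);
    return "(.+?)";
  });
  const match = new RegExp("^" + source + "$", "i").exec(question.trim());
  if (!match) return null;
  const params = {};
  order.forEach((n, i) => {
    params["$" + n] = match[i + 1];
  });
  return params;
}

console.log(matchTemplate(
  "What is the sales forecast between $1 and $2",
  "What is the sales forecast between January and March"
));
```

The extracted values could then be forwarded to the data service, for instance as query parameters.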

The Answer tells the client that it needs to call the /data/forecast service. So, we have a pull scenario, but you could equally well have a push scenario where NodeJS calls the R process and sends the answer. The choice depends on the kind of security you want and how much middleware you want to include.

In any case, the client will use the ReactJS ServiceComponent to call the service and wrap the answer. If you have another technology stack (say WPF) you will need to create a custom control here. With HTML things are really easy however.
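What a client has to do with a Service answer is small: call the URL, then read the property named by Path from the response. The sketch below shows only the second step as a pure function. It is not the ServiceComponent’s code; the names readPath and resolveServiceAnswer, and the support for dotted paths, are assumptions.

```javascript
// Hypothetical sketch of resolving a Service answer on the client.
// Reads a (possibly dotted) path such as "Content" from a service response.
function readPath(payload, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    payload
  );
}

function resolveServiceAnswer(answer, payload) {
  if (answer.DataType !== "Service") {
    throw new Error("Not a service answer.");
  }
  // The header comes from the template, the body from the service response.
  return { header: answer.Header, body: readPath(payload, answer.Path) };
}

const answer = {
  DataType: "Service",
  Header: "This is a forecast of the noisy cosine function:",
  URL: "/data/forecast",
  Path: "Content"
};
const payload = { Content: "<img src='/images/Forecast.png'/>" };
console.log(resolveServiceAnswer(answer, payload));
```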

The data service calling the R-process is surprisingly simple and straightforward:

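var path = require("path");
var spawn = require("child_process").spawn;

app.use("/data/forecast", function(req, res) {
  // Where R writes the image and where the R script lives.
  var outputPath = path.join(__dirname, '../images/Forecast.png');
  var rPath = path.join(__dirname, 'Data/Diverse/Forecast.R');
  var RCall = [rPath, outputPath];
  // Adjust this path to your own R installation.
  var R = spawn('/Library/Frameworks/R.framework/Versions/3.2/Resources/bin/Rscript', RCall);
  R.on('error', function(err) {
      // The process could not be started, e.g. because the path is wrong.
      console.log(err);
      if(!res.headersSent) {
          res.jsonp({Content: "An error occurred, sorry."});
      }
  });
  R.stdout.on('data', function(data) {
      console.log(data.toString());
  });
  R.on('exit', function(code) {
      // 'exit' may still fire after 'error'; do not answer twice.
      if(res.headersSent) {
          return;
      }
      // Any non-zero exit code means the R script failed.
      if(code !== 0) {
          return res.jsonp({Content: "An error occurred, sorry."});
      } else {
          return res.jsonp({Content: "<img src='/images/Forecast.png'/>"});
      }
  });
});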

Be sure to check the path to the R executable. The path used above corresponds to version 3.2 of the standard R deployment on macOS. On Windows you will likely find it under Program Files. If you have Jupyter installed, you potentially have R installed as part of the R kernel for IPython notebooks.
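Rather than hard-coding the location, you could make it configurable and fall back to whatever is on the PATH. A minimal sketch follows; the environment variable RSCRIPT_PATH and the function resolveRscript are made up for this example.

```javascript
// Sketch: pick the Rscript executable without hard-coding an install path.
// RSCRIPT_PATH is a hypothetical environment variable, not an R or Qwiery setting.
function resolveRscript(platform, env) {
  if (env.RSCRIPT_PATH) {
    return env.RSCRIPT_PATH;
  }
  // A bare executable name lets the operating system search the PATH.
  return platform === "win32" ? "Rscript.exe" : "Rscript";
}

console.log(resolveRscript(process.platform, process.env));
```

The returned value can be used as the first argument to spawn in place of the fixed path.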

That’s it. With this in place you can simply ask Qwiery to ‘forecast’, or ask whatever parametrized question you defined:

[Image: QwieryForecast]