NodeJS has plenty of packages you can use and integrate, but sometimes you need something that lives outside the Node ecosystem. If you wish to integrate neural networks, you will most likely want to look at Keras, Theano, Google’s TensorFlow or some combination of these. Fortunately, NodeJS makes it incredibly easy to call child processes, written not just in Python but in C#, C or Scala as well. The recipe below is therefore more generic than it looks, but let’s focus on integrating a word2vec implementation in Python as a Qwiery service.

Let’s assume that you have a QTL which allows you to ask which item does not belong in the list:

{
  "Grab": "What does not belong in this list: $1",
  "Template": {
    "Answer": {
      "DataType": "Service",
      "URL": "/api/whatdoesnotfit/$1",
      "Path": "answer"
    }
  },
  "UserId": "Everyone",
  "Category": "My stuff"
}

Obviously the free parameter $1 should be checked so it can be cast as a list of things, but we’ll leave that aside. The essence of this QTL is that the answer will be fetched from a local service called “whatdoesnotfit”. This service will make use of the word2vec model, which perfectly suits this task.
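If you do want to guard that parameter, a minimal sketch of such a check on the Python side could look like the following; the helper name to_word_list and the comma/whitespace splitting are assumptions of mine, not part of Qwiery:

import re

def to_word_list(series):
    # turn "cat, dog, table" or "cat dog table" into ["cat", "dog", "table"]
    words = [w for w in re.split(r'[,\s]+', series.strip()) if w]
    if len(words) < 3:
        raise ValueError("need at least three items to pick the odd one out")
    return words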

The service itself is as simple as this:

router.get('/whatdoesnotfit/:series', function(req, res, next) {
    var spawn = require('child_process').spawn;
    var py = spawn('python', ['WhatDoesNotFit.py']);
    var dataString = '';
    // collect whatever the Python script prints to stdout
    py.stdout.on('data', function(data) {
        dataString += data.toString();
    });
    // once the script is done, return the collected output
    py.stdout.on('end', function() {
        res.send(dataString);
    });
    // pass the QTL parameter $1 to the Python script via stdin
    py.stdin.write(JSON.stringify({"series": req.params.series}));

    py.stdin.end();
});

And the only thing you need next is to write the script itself. Note that you can spawn anything you like; it does not have to be Python. You can use ping, FTP, telnet or PHP if it suits your aim.

In our case, the Python script uses a word2vec implementation with a pre-trained dataset from Google (a roughly 2 GB matrix). Training your own model is easy but lengthy, as the short gensim sketch after the list below illustrates; see these articles for example:

  • deep learning with word2vec contains details on how you can train your own model from texts and sentences
  • word2vec-api contains links to downloadable models based on giga- and terabytes of real-world data.
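For a taste of what training involves, here is a minimal sketch using gensim with the older API that the script below also relies on (newer gensim versions rename size to vector_size and move the word-vector loading to KeyedVectors); the toy sentences and hyperparameter values are placeholders of mine, not recommendations:

from gensim.models import Word2Vec

# toy corpus: each sentence is a list of tokens; a real corpus has millions of these
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["dogs", "and", "cats", "are", "popular", "pets"],
    ["word2vec", "learns", "word", "vectors", "from", "context"]
]

# size, window and min_count are illustrative values only
model = Word2Vec(sentences, size=100, window=5, min_count=1, workers=4)
model.save("my_word2vec.model")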

So, assuming you have a trained model in place, the actual script is just:

import sys, json
from gensim.models import Word2Vec

def read_in():
    # the Node service writes one JSON object to stdin
    lines = sys.stdin.readlines()
    return json.loads(lines[0])

def main():
    j = read_in()
    # naive split of the $1 parameter into a list of words (proper validation was left aside above)
    series = [w.strip() for w in j["series"].split(",") if w.strip()]
    # note: in newer gensim versions this call lives on KeyedVectors.load_word2vec_format
    model = Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
    print("This item seems a little odd in the series according to me: %s" % model.doesnt_match(series))
    sys.stdout.flush()

if __name__ == '__main__':
    main()
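To test the script on its own, without going through Node, you can pipe the same JSON into it from a shell, for instance echo '{"series": "breakfast,cereal,dinner,lunch"}' | python WhatDoesNotFit.py, where the series value is just an illustrative example. The full route is then exercised by requesting /api/whatdoesnotfit/breakfast,cereal,dinner,lunch against your running Qwiery instance.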

You can see that you can integrate intelligence into Qwiery with very little effort. Of course, you should go beyond this and leverage the power of all these frameworks in such a way that they turn your bot into a business-intelligence tool or smart pal within a particular domain.