Workshop: Learning representations of text for NLP

A conference on deep learning and artificial intelligence

30 July 2017, Bangalore

Think of the NLP application you most want to build: sentiment analysis, named entity recognition, machine translation, information extraction, summarization, or a recommender system, to name a few. A key step in building it is choosing the right technique to represent text in a form that a machine can understand. In this workshop, we will work through the key concepts, maths, and code behind state-of-the-art techniques for text representation.

This workshop is meant for NLP enthusiasts, ML practitioners, and data science teams who often work with text data and wish to gain a deeper understanding of text representations for NLP. It will be a very hands-on workshop, with Jupyter notebooks to create the various representations, coupled with the key concepts and maths that form the basis of their theory.

Deep learning on images has been a phenomenal success story. One of the key reasons is the rich representation of the data: a raw image is a matrix of RGB values.

For images, using the pixel values directly is a very natural representation. For text, there is no such natural representation. No matter how good your ML algorithm is, it can do only so much unless there is a rich way to represent the underlying text data. Thus, whatever NLP task or application you are building, it is imperative to find a good representation for your text.

Motivated by this, the subfield of representation learning of text for NLP has attracted a lot of interest in the past few years. Various representation learning techniques have been proposed in the literature, but there is still a dearth of comprehensive tutorials that cover these algorithms, with their mathematical explanation and implementation details, to a satisfactory depth. This workshop aims to bridge this gap by demystifying both the theory (key concepts, maths) and the practice (code) behind these representation schemes. By the end of the workshop, participants will have gained a fundamental understanding of these schemes and will be able to implement embeddings on their own datasets.

We will cover the following topics:

  • Old ways of representing text
  • Introduction to Embedding spaces
  • Word-Vectors
  • Sentence2vec/Paragraph2vec/Doc2Vec
  • Character2Vec
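
As a taste of the first topic, here is a minimal sketch of one of the older ways of representing text, a bag-of-words count vector. It is an illustration only, not the workshop's own notebook code: the helper names (build_vocabulary, bag_of_words), the whitespace tokenizer, and the two example sentences are all assumptions made for this sketch.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index (sorted for determinism)."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(document, vocabulary):
    """Return a count vector with one entry per vocabulary token, in index order."""
    counts = Counter(document.lower().split())
    # Dicts preserve insertion order, so iterating the vocabulary follows its indices.
    return [counts.get(tok, 0) for tok in vocabulary]


# Hypothetical toy corpus, chosen only for illustration.
docs = ["the movie was good", "the movie was not good"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]

for doc, vec in zip(docs, vectors):
    print(vec, doc)
```

The two sentences carry opposite sentiment, yet their vectors differ in a single count. That gap between surface counts and meaning is the kind of limitation that motivates the embedding-based representations in the rest of the list.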


Anuj Gupta

ML researcher at Freshdesk

Satyam Saxena

ML researcher at Freshdesk