Think of an NLP application you would like to build: sentiment analysis, named entity recognition, machine translation, information extraction, summarization, or a recommender system, to name a few. A key step in any of these tasks is choosing the right techniques to represent text in a form that a machine can easily work with.

Unlike images, where pixel intensities are a natural way to represent the data, text has no such natural representation. No matter how good your ML algorithm is, it can do only so much unless the underlying text data is represented in a rich way. Whatever NLP application you are building, it is therefore imperative to find a good representation for your text data. Motivated by this, the subfield of representation learning for text has attracted a lot of research interest in the past few years.

In this bootcamp, we will cover the key concepts, maths, and code behind state-of-the-art techniques for text representation. Various representation learning techniques have been proposed in the literature, but there is still a dearth of comprehensive tutorials that cover both the mathematical explanations and the implementation details of these algorithms in satisfactory depth.

This bootcamp aims to bridge this gap. It aims to demystify both the theory (key concepts, maths) and the practice (code) that go into building these representation schemes. By the end of the bootcamp, participants will have gained a fundamental understanding of these schemes and the ability to implement them on datasets of their interest.

Target Audience

Machine learning practitioners
Anyone (researcher, student, professional) learning NLP
Corporates and Start-ups looking to add NLP to their product or service offerings

Pre-requisites

This is a hands-on course, so participants should be comfortable with programming. Familiarity with the Python data stack is ideal.
Prior knowledge of machine learning will be helpful. Participants should have some practice with basic NLP problems, e.g. sentiment analysis.
While the DL concepts will be taught in an intuitive way, some prior knowledge of linear algebra and probability theory would be helpful.

Resources

The material for the bootcamp is hosted on GitHub. You can find slides for this workshop here.

This bootcamp is part of the speakers' popular bootcamp series on NLP. Additional relevant material will be shared prior to the bootcamp.

Approach

This will be a two-day, instructor-led, hands-on bootcamp to learn and implement end-to-end deep learning models for natural language processing.

Day 1 will cover an introduction to text representation and older ways of representing text, followed by a deep dive into embedding spaces and word vectors.
Day 2 will cover more advanced techniques for representing text, such as Paragraph2vec/Doc2vec and various architectures for char2vec.

There will be four sessions of three hours each over the two days.

Session 1: Introduction to representation learning

What is representation learning?
Use cases in natural language processing.
Old ways of representing text
- One-hot encoding
- Tf-idf
- N-grams
How to use pre-trained word embeddings
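
As a rough illustration of the older representations listed above, here is a minimal sketch in plain Python. It is not the bootcamp's own material: the helper names (`tokenize`, `ngrams`, `one_hot`, `tf_idf`) and the toy documents are assumptions made for this example, and the tf-idf shown is only one common variant (relative term frequency times the log of the inverse document frequency).

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def one_hot(token, vocab):
    """Vector with a single 1 at the position of `token` in `vocab`."""
    return [1 if token == term else 0 for term in vocab]


def tf_idf(docs):
    """One tf-idf variant: (count / doc length) * log(N / document frequency)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for doc in tokenized for t in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        vectors.append(
            [(counts[t] / length) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Hypothetical toy corpus, for illustration only.
docs = ["the movie was great", "the movie was dull", "great acting"]
vocab, vectors = tf_idf(docs)
print(vocab)
print(one_hot("great", vocab))
print(ngrams(tokenize(docs[0]), 2))
```

All three representations are sparse and treat words as unrelated symbols, which is the limitation that the word vectors of the next session try to address.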

Session 2: Word-vectors

Introduction to word-vectors
Different techniques of generating word-vectors
- CBOW, Skip-gram model
- GloVe model
Detailed implementation of each of these models in TensorFlow
Negative sampling, hierarchical softmax, t-SNE
Fine-tuning pretrained embeddings
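
To make the difference between CBOW and skip-gram concrete, here is a minimal sketch of how training examples could be drawn from a token sequence. It only prepares data and trains nothing; the function names and the window size are assumptions made for this example. The negative sampler draws uniformly for simplicity, whereas word2vec samples negatives from a smoothed unigram distribution.

```python
import random


def skipgram_pairs(tokens, window=2):
    """Skip-gram: each center word predicts every word within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: the surrounding context words jointly predict the center word."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, target))
    return examples


def negative_samples(vocab, positive, k, rng):
    """Draw k 'noise' words that differ from the true context word.

    Simplification: uniform sampling instead of a frequency-based one.
    """
    candidates = [w for w in vocab if w != positive]
    return [rng.choice(candidates) for _ in range(k)]


# Hypothetical toy sentence, for illustration only.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
rng = random.Random(0)
for center, context in skipgram_pairs(tokens, window=1)[:3]:
    noise = negative_samples(sorted(set(tokens)), context, k=2, rng=rng)
    print(center, context, noise)
```

With negative sampling, each (center, context) pair is scored against a handful of noise words instead of the whole vocabulary, which avoids computing a full softmax at every step.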

Session 3: Sentence2vec/Paragraph2vec/Doc2vec

Extending word vectors to represent sentences/paragraphs/documents
Various techniques for training doc2vec
- Doc2vec: (i) DM, (ii) DBOW
- Skip-thoughts
Detailed implementation of each of these models in TensorFlow

Session 4: Char2vec

Building character embeddings
Tweet2vec - character embeddings from social data
CNNs for character vectors
fastText - character n-gram embeddings
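
As a small illustration of the character n-gram idea behind fastText, the sketch below extracts the n-grams of a word after wrapping it in boundary markers. The function name and defaults are assumptions made for this example; it covers only the n-gram extraction step, not the hashing or training that a real implementation performs.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return character n-grams of `word`, with '<' and '>' marking its boundaries."""
    marked = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


# Trigrams of "where": the boundary markers distinguish the prefix "<wh"
# and the suffix "re>" from the same letters occurring mid-word.
print(char_ngrams("where", min_n=3, max_n=3))
# → ['<wh', 'whe', 'her', 'ere', 're>']
```

Because a word is represented through its n-grams, rare or unseen words that share subword pieces with known words can still receive a meaningful vector.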

Software Requirements

We will be using the Python data stack for the bootcamp, with Keras and TensorFlow for the deep learning component. Please install Anaconda for Python 3 before the bootcamp. Additional requirements will be communicated to participants.

Instructors

Anuj Gupta

Director - Machine Learning, Huawei Technologies

Satyam Saxena

Applied Scientist - Machine Learning, Amazon

Venue

ThoughtFactory, Tower D, 2nd Floor, Diamond District, Bengaluru, Karnataka 560102

Bootcamp: Learning representations of text for NLP

Learn and implement end-to-end deep learning models for natural language processing.

19-20 May 2018, 09:00 AM - 05:00 PM, ThoughtFactory, Bangalore
