Why is Transfer Learning an essential tool in Deep Learning?

Axel Chenu
5 min read · Apr 1, 2021
Transfer Learning: What Is It & Why Is It Used?

Introduction

Nowadays in data science, Machine Learning, and more particularly Deep Learning, is achieving spectacular performance, unexpected just a few years ago, in areas such as computer vision and, more recently, Natural Language Processing.

Deep Learning relies on neural networks and achieves these feats only because the volume of data available for training is very high, and because the required computing power is now accessible thanks to GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units).

Neural networks are usually built by composing many layers with a large number of parameters, and depending on the use case they may need up to several hundred thousand, or even millions, of training examples.

The biggest constraints of Deep Learning are the volume of data required and the long computation time needed to reach a high-quality model. These neural networks often require massive and expensive computing power. Transfer Learning provides a solution to this problem.

What is Transfer Learning in Deep Learning?

Classic Machine Learning VS Transfer Learning

How can we visualize the difference between the learning process in classic Machine Learning and the one used with Transfer Learning?

Transfer Learning allows us to carry out a Deep Learning project without spending months on computation. The principle is to reuse the knowledge acquired by a neural network pre-trained on one problem in order to solve another, more or less similar, one. For example, it is easier to learn how to ride a motorcycle if you already know how to ride a bike.

In classical Machine Learning, a model is trained from scratch for every specific task. Transfer Learning lets us tackle a new task by reusing a model pre-trained on the labelled data of some related task.
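As a minimal sketch of that principle (assuming PyTorch and torchvision are available, and a hypothetical 5-class target task), transfer learning often amounts to loading a network pre-trained on a large dataset, freezing its layers, and re-training only a small task-specific head:

```python
import torch.nn as nn
from torchvision import models

# Load a network pre-trained on ImageNet (the "source" task).
model = models.resnet18(pretrained=True)

# Freeze the pre-trained layers so the knowledge they encode is kept as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new head for the "target" task
# (here a hypothetical 5-class classification problem).
model.fc = nn.Linear(model.fc.in_features, 5)

# Only the new head is then trained on the (much smaller) target dataset.
```

Because only the last layer is trained, the target dataset can be far smaller than what training the whole network from scratch would require.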

In general, Transfer Learning has several advantages over traditional Machine Learning:

  • It saves model-training time,
  • It usually improves performance,
  • It helps avoid overfitting,
  • It does not need a lot of training data in the target domain,
  • Not every model needs to be trained from scratch.

Transfer Learning has become an essential topic for NLP projects. A classic NLP model captures and learns a variety of linguistic phenomena from a large-scale corpus, such as long-term dependencies, grammatical context, and negation.

This knowledge can be transferred to initialize another model so that it performs well on a specific NLP task, for example (a minimal sketch of the sentiment-analysis case follows the list):

  • A classification model: sentiment analysis
  • A translation model: translation from French to Italian
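For the sentiment-analysis case, a minimal sketch assuming the Hugging Face transformers library is installed (the pipeline downloads a publicly released checkpoint already fine-tuned for English sentiment):

```python
from transformers import pipeline

# Reuse a model that was pre-trained on a large corpus and then
# fine-tuned for sentiment analysis; nothing is trained from scratch here.
classifier = pipeline("sentiment-analysis")
print(classifier("Transfer Learning saves a huge amount of training time."))
# Expected output along the lines of [{'label': 'POSITIVE', 'score': 0.99}]
```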

What are the different types of transfer learning?

The types of transfer learning from Pan and Yang (2010)

In 2010, Pan and Yang organized transfer learning into the different categories shown above.

First, there is inductive transfer learning, where the source and target tasks are always different. Inductive transfer learning can be divided into two categories. In sequential transfer learning, the general knowledge of the source data is transferred to only one task at a time. In multi-task transfer learning, several tasks are learned simultaneously and common knowledge is shared between them.
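To make the multi-task case concrete, here is a small hypothetical PyTorch sketch in which two text-classification tasks share a single encoder, so common knowledge is learned jointly (all sizes and task names are illustrative):

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """One shared encoder, one output head per task."""
    def __init__(self, vocab_size=10_000, hidden=128):
        super().__init__()
        self.encoder = nn.EmbeddingBag(vocab_size, hidden)    # shared knowledge
        self.sentiment_head = nn.Linear(hidden, 2)            # task A
        self.topic_head = nn.Linear(hidden, 10)               # task B

    def forward(self, token_ids):
        shared = self.encoder(token_ids)
        return self.sentiment_head(shared), self.topic_head(shared)

model = MultiTaskModel()
tokens = torch.randint(0, 10_000, (4, 20))        # fake batch of token ids
sentiment_logits, topic_logits = model(tokens)
# During training, the losses of both tasks are summed, so the shared
# encoder learns representations that are useful for both at once.
```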

Then there is transductive transfer learning, where the source and target tasks are the same. A further distinction is made between domain adaptation (data from different domains) and cross-lingual learning (data from different languages).
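As a hedged illustration of the cross-lingual side, a multilingual pre-trained encoder is the usual starting point. The checkpoint name below is assumed from the public Hugging Face model hub (a multilingual BERT fine-tuned for review sentiment), applied here to a French sentence:

```python
from transformers import pipeline

# Assumed multilingual BERT checkpoint fine-tuned for sentiment; the same task
# can be served across languages thanks to the shared multilingual pre-training.
classifier = pipeline("sentiment-analysis",
                      model="nlptown/bert-base-multilingual-uncased-sentiment")
print(classifier("Ce film était vraiment excellent."))  # e.g. [{'label': '5 stars', ...}]
```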

A quick overview of Natural Language Processing models

Today there are two categories of NLP model architectures. The first category is primarily based on Transfer Learning combined with LSTMs.

  • ELMo (Embeddings from Language Models):
    Produces word representations that take the surrounding sentence context into account
  • ULMFiT (Universal Language Model Fine-tuning for Text Classification) consists of three steps (a hedged sketch follows this list):
    1- Training of a general language model in a given language on a large body of text such as Wikipedia.
    2- Specialization, by Transfer Learning, of the pre-trained general language model on the corpus of texts to be classified.
    3- Training, on that same corpus, of a classifier whose first layers (those which encode a text into an activation vector) come from the specialized language model.
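A hedged sketch of that three-step workflow, assuming the fastai library (which provides an ULMFiT implementation) and a hypothetical reviews.csv file with text and label columns:

```python
import pandas as pd
from fastai.text.all import *

# Hypothetical corpus of texts to classify.
df = pd.read_csv('reviews.csv')

# Steps 1-2: a general AWD-LSTM language model, pre-trained on Wikipedia text,
# is specialized on the target corpus by Transfer Learning.
dls_lm = TextDataLoaders.from_df(df, text_col='text', is_lm=True)
lm_learn = language_model_learner(dls_lm, AWD_LSTM)
lm_learn.fit_one_cycle(1, 1e-2)
lm_learn.save_encoder('specialised_lm')

# Step 3: a classifier reuses the specialized encoder for its first layers.
dls_clf = TextDataLoaders.from_df(df, text_col='text', label_col='label')
clf_learn = text_classifier_learner(dls_clf, AWD_LSTM, metrics=accuracy)
clf_learn.load_encoder('specialised_lm')
clf_learn.fit_one_cycle(1, 1e-2)
```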

The second category is made up of Transfer Learning combined with self-attention models.

  • BERT (Bidirectional Encoder Representations from Transformers, Devlin et al. (2018)) was published by researchers in Google's AI Language group. BERT uses the Transformer encoder as the structure of its pre-trained model and addresses the unidirectionality constraint by proposing new pre-training objectives.
  • GPT-2 (Generative Pre-Training 2, Radford et al. (2019)) was proposed by researchers at OpenAI. GPT-2 is a multi-layer Transformer decoder, and its largest version has about 1.5 billion parameters (a minimal usage sketch of both models follows this list).
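Both families are distributed as pre-trained checkpoints; a minimal sketch with the Hugging Face transformers library ("bert-base-uncased" and "gpt2" are the standard public releases) shows their different pre-training objectives in action:

```python
from transformers import pipeline

# BERT's bidirectional objective: predict a masked token using context from both sides.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Transfer learning is [MASK] in modern NLP."))

# GPT-2's left-to-right objective makes it a natural text generator.
generator = pipeline("text-generation", model="gpt2")
print(generator("Transfer learning is", max_length=20))
```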

Summary

In recent years we have entered a new era for NLP models. Transfer Learning mainly drove the development of new techniques until the arrival of attention and self-attention models (2017), which are today among the most influential developments in NLP. The typical use case for Transfer Learning is when you have little data.

The main advances of these models are:

  • They require less time and less task-specific data,
  • ELMo adds contextualization to word embeddings,
  • ULMFiT introduced fine-tuning techniques, such as gradual unfreezing, that reduced error rates,
  • BERT uses the Transformer encoder as the structure of its pre-trained model,
  • GPT was among the first to use the Transformer architecture for pre-training, an approach that more advanced NLP models build on.

Sources

“The beautiful thing about learning is nobody can take it away from you.”
B. B. King
