[Google_Bootcamp_Day26]


Algorithms for learning word embeddings

Neural language model

  • Other contexts (ways to predict target words)
    • Last 4 words
    • 4 words on left & right
    • Last 1 word
    • Nearby 1 word
      If you really want to build a language model, it is natural to use the last few words as the context. But if your main goal is to learn word embeddings, then all of these other contexts work as well and will give you very meaningful word embeddings.

Word2vec algorithm

  • Skip-grams model
    • the model uses the current (context) word to predict the target words in a window around it
    • represents words as vectors and learns to place words that appear in similar contexts near one another

    ex. “The boy is going to school.”

    • Assume “is” is the current (context) word and the window size is 2
    • Input: “is”, Output: “The”, “boy”, “going”, “to” (see the sketch below)
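As a concrete illustration, here is a minimal sketch (the helper name and code are mine, not from the course) that generates (context, target) training pairs for the skip-gram model from a tokenized sentence:

```python
def skip_gram_pairs(tokens, window_size=2):
    """Generate (context, target) pairs for the skip-gram model.

    Each word is treated as the context word in turn, and every word
    within `window_size` positions of it becomes a target word.
    """
    pairs = []
    for i, context in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((context, tokens[j]))
    return pairs

# Example from the notes: context word "is" with window size 2
sentence = "The boy is going to school".split()
print([p for p in skip_gram_pairs(sentence) if p[0] == "is"])
# [('is', 'The'), ('is', 'boy'), ('is', 'going'), ('is', 'to')]
```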

(Figure: skip-gram model)

  • Input layer
    : input the one-hot vector of the current (context) word
  • Input -> Hidden
    : multiply the one-hot vector by the embedding matrix to get the embedding vector of the current word (embedding matrix * one-hot vector = embedding vector)
  • Hidden -> Output
    : multiply the embedding vector by the output word matrix
    (output word matrix : the learnable parameters Theta_t, one per target word t)
  • Output
    : apply softmax to get a probability vector y_hat over the vocabulary
  • Calculate the loss between y_hat and y
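A minimal NumPy sketch of this forward pass (the names E for the embedding matrix and theta for the output word matrix are my own labels for the quantities above):

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

vocab_size, emb_dim = 10000, 300
E = np.random.randn(emb_dim, vocab_size) * 0.01      # embedding matrix
theta = np.random.randn(vocab_size, emb_dim) * 0.01  # output word matrix

def skip_gram_forward(context_index):
    o_c = np.zeros(vocab_size)
    o_c[context_index] = 1.0          # one-hot input for the context word
    e_c = E @ o_c                     # hidden layer: embedding of the context word
    logits = theta @ e_c              # one score per word in the vocabulary
    y_hat = softmax(logits)           # probability of each word being the target
    return y_hat
```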

Loss function

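For reference, a sketch of the softmax model and its cross-entropy loss from the lecture, with context word c, target word t, output parameters Theta_t, embedding e_c, and vocabulary size V:

```latex
p(t \mid c) = \frac{e^{\theta_t^{\top} e_c}}{\sum_{j=1}^{V} e^{\theta_j^{\top} e_c}},
\qquad
\mathcal{L}(\hat{y}, y) = -\sum_{i=1}^{V} y_i \log \hat{y}_i
```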

Problems and solutions of the skip-gram model

  • Computational speed
    • Softmax unit : the denominator requires a sum over the entire vocabulary, which is slow when the vocabulary is large
  • Solution 1: Hierarchical softmax classifier (appropriate for rarely-used words)
  • Solution 2: Negative Sampling (appropriate for frequently-used words)

How to sample the context c

In practice, the context word c is not sampled uniformly at random from the training corpus, because that would be dominated by very frequent words (the, of, a, and, ...). Instead, different heuristics are used to balance sampling between common and less common words.

Negative Sampling


  • Positive example
    • pick a context word, then pick a target word within the window around it (label y = 1)
  • Negative example
    • keep the same context word, then pick a target word at random from the dictionary (label y = 0); repeat this k times per positive example

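The computational trick, sketched from the lecture, is to replace the V-way softmax with V independent binary (logistic) classifiers and, on each training example, update only the classifier for the one positive target word and the k sampled negative words (the course suggests k = 5 to 20 for smaller datasets and k = 2 to 5 for larger ones):

```latex
P(y = 1 \mid c, t) = \sigma\!\left(\theta_t^{\top} e_c\right)
```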

How to sample negative examples

  • sample according to word frequency in the corpus, but not the raw frequency: raw frequency over-samples very common words (the, of, and, ...), while sampling uniformly over the vocabulary under-represents them, so a heuristic in between the two is used

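The heuristic reported in the course (from the word2vec paper) samples word w_i in proportion to its observed corpus frequency f(w_i) raised to the 3/4 power:

```latex
P(w_i) = \frac{f(w_i)^{3/4}}{\sum_{j=1}^{V} f(w_j)^{3/4}}
```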

GloVe (Global Vectors for Word Representation)

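As a brief sketch of the model covered in the course: let X_ij be the number of times word j appears in the context of word i (counted over the whole corpus, hence "global"); GloVe learns vectors theta_i, e_j and biases b_i, b'_j by minimizing a weighted least-squares objective:

```latex
\min \sum_{i=1}^{V} \sum_{j=1}^{V} f(X_{ij}) \left( \theta_i^{\top} e_j + b_i + b'_j - \log X_{ij} \right)^2
```

Here f(X_ij) is a weighting term that is 0 when X_ij = 0 (so log 0 is never evaluated) and caps the influence of extremely frequent pairs; theta and e play symmetric roles, so the final embedding can be taken as the average (e_w + theta_w) / 2.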

Sentiment Classification

senti

  • Problem : it is hard to collect a huge labeled training set for sentiment
  • Solution : word embeddings learned from a much larger unlabeled text corpus can be transferred to this task, which helps a lot, especially when the labeled training set is small

Simple Sentiment Classification model

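The simple model shown in the course looks up the embedding of every word in the review, averages (or sums) them, and passes the result to a softmax classifier. A minimal NumPy sketch (the function and variable names are mine, for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_sentiment(review_words, E, word_to_index, W, b):
    """Average the embeddings of the words in the review, then apply softmax.

    E : (emb_dim, vocab_size) embedding matrix
    W : (num_classes, emb_dim) softmax weights, b : (num_classes,) bias
    """
    vectors = [E[:, word_to_index[w]] for w in review_words if w in word_to_index]
    avg = np.mean(vectors, axis=0)        # ignores word order entirely
    return softmax(W @ avg + b)           # probability of each rating class
```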

  • Problem : ignores word order
    • ex. “Completely lacking in good taste, good service, and good ambience” is actually a negative review, but the word ‘good’ appears 3 times, so averaging the word embeddings pushes the predicted rating up.
  • Solution : use a many-to-one RNN model for sentiment classification (sketched below)
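A minimal many-to-one RNN sketch in PyTorch (the framework, layer types, and sizes are my assumptions, not taken from the course): the word embeddings are fed through an LSTM and only the final hidden state is used to predict the sentiment class, so word order now matters.

```python
import torch
import torch.nn as nn

class ManyToOneSentimentRNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=128, num_classes=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)   # can be initialized with pretrained embeddings
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, word_indices):
        # word_indices: (batch, seq_len) integer word ids
        embedded = self.embedding(word_indices)          # (batch, seq_len, emb_dim)
        _, (h_n, _) = self.rnn(embedded)                 # h_n: (1, batch, hidden_dim)
        return self.classifier(h_n[-1])                  # logits for each sentiment class
```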

The problem of bias in word embeddings


  • Word embeddings can reflect gender, ethnicity, age, sexual orientation, and other biases of the text used to train the model.

Addressing bias in word embeddings

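As a sketch of the approach covered in the course (based on Bolukbasi et al.):

  • Identify the bias direction g, e.g. by averaging difference vectors such as e_he - e_she and e_male - e_female
  • Neutralize: for every word that is not definitional for the bias (e.g. doctor, babysitter), remove the component of its embedding along g
  • Equalize pairs such as grandmother/grandfather so they are equidistant from the neutralized, non-definitional words

The neutralization step projects out the bias component:

```latex
e^{\text{bias\_component}} = \frac{e \cdot g}{\lVert g \rVert_2^{2}} \, g,
\qquad
e^{\text{debiased}} = e - e^{\text{bias\_component}}
```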

[Source] https://www.coursera.org/learn/nlp-sequence-models
