[Google_Bootcamp_Day26]


Algorithms for learning word embeddings

Neural language model

  • Other contexts (ways to predict target words)
    • Last 4 words
    • 4 words on left & right
    • Last 1 word
    • Nearby 1 word
      If you really want to build a language model, it is natural to use the last few words as the context. But if your main goal is to learn word embeddings, then all of these other contexts work as well and will give you very meaningful word embeddings.

Word2vec algorithm

  • Skip-grams model
    • the model uses the current (context) word to predict the target words in a window around it
    • represents words as vectors and learns to place words that appear in similar contexts near one another

    ex. “The boy is going to school.”

    • Assume “is” is the current (context) word and the window size is 2
    • Input: “is”, Output: “The”, “boy”, “going”, “to” (see the sketch below)
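As a concrete illustration, here is a minimal sketch (the helper name and code are mine, not from the course) that generates (context, target) training pairs for the skip-gram model from a tokenized sentence:

```python
def skip_gram_pairs(tokens, window_size=2):
    """Generate (context, target) pairs for the skip-gram model.

    Each word is treated as the context word in turn, and every word
    within `window_size` positions of it becomes a target word.
    """
    pairs = []
    for i, context in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((context, tokens[j]))
    return pairs

# Example from the notes: context word "is" with window size 2
sentence = "The boy is going to school".split()
print([p for p in skip_gram_pairs(sentence) if p[0] == "is"])
# [('is', 'The'), ('is', 'boy'), ('is', 'going'), ('is', 'to')]
```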

(Figure: skip-gram model)

  • Input layer
    : input the one-hot vector of the current (context) word
  • Input -> Hidden
    : multiply the one-hot vector by the embedding matrix to get the embedding vector of the current word (embedding matrix * one-hot vector = embedding vector)
  • Hidden -> Output
    : multiply the embedding vector by the output word matrix
    (output word matrix : the learnable parameters Theta_t, one per target word t)
  • Output
    : apply softmax to get a probability vector y_hat over the vocabulary
  • Calculate the loss between y_hat and y
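A minimal NumPy sketch of this forward pass (the names E for the embedding matrix and theta for the output word matrix are my own labels for the quantities above):

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

vocab_size, emb_dim = 10000, 300
E = np.random.randn(emb_dim, vocab_size) * 0.01      # embedding matrix
theta = np.random.randn(vocab_size, emb_dim) * 0.01  # output word matrix

def skip_gram_forward(context_index):
    o_c = np.zeros(vocab_size)
    o_c[context_index] = 1.0          # one-hot input for the context word
    e_c = E @ o_c                     # hidden layer: embedding of the context word
    logits = theta @ e_c              # one score per word in the vocabulary
    y_hat = softmax(logits)           # probability of each word being the target
    return y_hat
```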

Loss function

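For reference, a sketch of the softmax model and its cross-entropy loss from the lecture, with context word c, target word t, output parameters Theta_t, embedding e_c, and vocabulary size V:

```latex
p(t \mid c) = \frac{e^{\theta_t^{\top} e_c}}{\sum_{j=1}^{V} e^{\theta_j^{\top} e_c}},
\qquad
\mathcal{L}(\hat{y}, y) = -\sum_{i=1}^{V} y_i \log \hat{y}_i
```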

Problems and solutions of the skip-gram model

  • Computational speed
    • Softmax unit : the denominator requires a sum over the entire vocabulary, which is slow when the vocabulary is large
  • Solution 1: Hierarchical softmax classifier (appropriate for rarely-used words)
  • Solution 2: Negative Sampling (appropriate for frequently-used words)

How to sample the context c

In practice, the context word c is not sampled uniformly at random from the training corpus, because that would be dominated by very frequent words (the, of, a, and, ...). Instead, different heuristics are used to balance sampling between common and less common words.

Negative Sampling


  • Positive example
    • pick a context word, then pick a target word within the window around it (label y = 1)
  • Negative example
    • keep the same context word, then pick a target word at random from the dictionary (label y = 0); repeat this k times per positive example

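The computational trick, sketched from the lecture, is to replace the V-way softmax with V independent binary (logistic) classifiers and, on each training example, update only the classifier for the one positive target word and the k sampled negative words (the course suggests k = 5 to 20 for smaller datasets and k = 2 to 5 for larger ones):

```latex
P(y = 1 \mid c, t) = \sigma\!\left(\theta_t^{\top} e_c\right)
```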

How to sample negative examples

  • sample according to word frequency in the corpus, but not the raw frequency: raw frequency over-samples very common words (the, of, and, ...), while sampling uniformly over the vocabulary under-represents them, so a heuristic in between the two is used

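The heuristic reported in the course (from the word2vec paper) samples word w_i in proportion to its observed corpus frequency f(w_i) raised to the 3/4 power:

```latex
P(w_i) = \frac{f(w_i)^{3/4}}{\sum_{j=1}^{V} f(w_j)^{3/4}}
```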

GloVe (Global Vectors for Word Representation)

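As a brief sketch of the model covered in the course: let X_ij be the number of times word j appears in the context of word i (counted over the whole corpus, hence "global"); GloVe learns vectors theta_i, e_j and biases b_i, b'_j by minimizing a weighted least-squares objective:

```latex
\min \sum_{i=1}^{V} \sum_{j=1}^{V} f(X_{ij}) \left( \theta_i^{\top} e_j + b_i + b'_j - \log X_{ij} \right)^2
```

Here f(X_ij) is a weighting term that is 0 when X_ij = 0 (so log 0 is never evaluated) and caps the influence of extremely frequent pairs; theta and e play symmetric roles, so the final embedding can be taken as the average (e_w + theta_w) / 2.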

Sentiment Classification

senti

  • Problem : it is hard to collect a huge labeled training set for sentiment
  • Solution : word embeddings learned from a much larger unlabeled text corpus can be transferred to this task, which helps a lot, especially when the labeled training set is small

Simple Sentiment Classification model

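The simple model shown in the course looks up the embedding of every word in the review, averages (or sums) them, and passes the result to a softmax classifier. A minimal NumPy sketch (the function and variable names are mine, for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_sentiment(review_words, E, word_to_index, W, b):
    """Average the embeddings of the words in the review, then apply softmax.

    E : (emb_dim, vocab_size) embedding matrix
    W : (num_classes, emb_dim) softmax weights, b : (num_classes,) bias
    """
    vectors = [E[:, word_to_index[w]] for w in review_words if w in word_to_index]
    avg = np.mean(vectors, axis=0)        # ignores word order entirely
    return softmax(W @ avg + b)           # probability of each rating class
```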

  • Problem : ignores word order
    • ex. “Completely lacking in good taste, good service, and good ambience” is actually a negative review, but the word ‘good’ appears 3 times, so averaging the word embeddings pushes the predicted rating up.
  • Solution : use a many-to-one RNN model for sentiment classification (sketched below)
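A minimal many-to-one RNN sketch in PyTorch (the framework, layer types, and sizes are my assumptions, not taken from the course): the word embeddings are fed through an LSTM and only the final hidden state is used to predict the sentiment class, so word order now matters.

```python
import torch
import torch.nn as nn

class ManyToOneSentimentRNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=128, num_classes=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)   # can be initialized with pretrained embeddings
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, word_indices):
        # word_indices: (batch, seq_len) integer word ids
        embedded = self.embedding(word_indices)          # (batch, seq_len, emb_dim)
        _, (h_n, _) = self.rnn(embedded)                 # h_n: (1, batch, hidden_dim)
        return self.classifier(h_n[-1])                  # logits for each sentiment class
```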

The problem of bias in word embeddings


  • Word embeddings can reflect gender, ethnicity, age, sexual orientation, and other biases of the text used to train the model.

Addressing bias in word embeddings

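As a sketch of the approach covered in the course (based on Bolukbasi et al.):

  • Identify the bias direction g, e.g. by averaging difference vectors such as e_he - e_she and e_male - e_female
  • Neutralize: for every word that is not definitional for the bias (e.g. doctor, babysitter), remove the component of its embedding along g
  • Equalize pairs such as grandmother/grandfather so they are equidistant from the neutralized, non-definitional words

The neutralization step projects out the bias component:

```latex
e^{\text{bias\_component}} = \frac{e \cdot g}{\lVert g \rVert_2^{2}} \, g,
\qquad
e^{\text{debiased}} = e - e^{\text{bias\_component}}
```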

[Source] https://www.coursera.org/learn/nlp-sequence-models
