[Google_Bootcamp_Day25]
Word representation
Problems with one-hot encoding
- it doesn’t allow an algorithm to easily generalize across words
- the inner product between any two different one-hot vectors is zero
- Example:
- sentence 1: I want a glass of orange ______ (blank : juice)
- sentence 2: I want a glass of apple ______
- With one-hot encoding there is no notion of similarity between “orange” and “apple”, so it is hard to infer that the blank in sentence 2 is also “juice” (see the sketch below)
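A minimal numpy sketch of this point, assuming a 10K vocabulary and made-up word indices: the dot product of any two distinct one-hot vectors is zero, so one-hot encoding carries no notion of word similarity.

```python
import numpy as np

vocab_size = 10_000

def one_hot(index, size=vocab_size):
    """Return a one-hot vector of length `size`."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

# Illustrative indices, not real vocabulary positions.
orange = one_hot(6257)
apple = one_hot(456)
juice = one_hot(4834)

# Any two distinct one-hot vectors are orthogonal, so "orange" looks no more
# related to "apple" than to any other word.
print(orange @ apple)   # 0.0
print(orange @ juice)   # 0.0
```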
Featurized representation
- ex. represent words as 300-dimensional vector
- representations for “orange” and “apple” are now quite similar
- allows the algorithm to generalize better across different words (see the sketch below)
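A toy sketch of a featurized representation, using made-up 4-dimensional feature values (gender, royal, age, food) in place of the learned 300-dimensional vectors; the numbers are illustrative only.

```python
import numpy as np

# Hand-picked toy feature values: [gender, royal, age, food].
emb = {
    "king":   np.array([-0.95, 0.93, 0.70, 0.02]),
    "apple":  np.array([ 0.00, -0.01, 0.03, 0.95]),
    "orange": np.array([ 0.01, 0.00, -0.02, 0.97]),
}

# "apple" and "orange" now have nearly identical vectors,
# while both are far from "king".
print(np.linalg.norm(emb["apple"] - emb["orange"]))  # small distance
print(np.linalg.norm(emb["apple"] - emb["king"]))    # much larger distance
```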
Visualizing word embeddings (ex. t-SNE)
300D -> 2D
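A sketch of the 300D -> 2D mapping using scikit-learn's TSNE; the embedding matrix here is random, standing in for real learned embeddings.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for a learned embedding matrix: 50 words x 300 dimensions.
embeddings = np.random.randn(50, 300)

# Non-linear 300D -> 2D mapping; perplexity must stay below the number of points.
coords_2d = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)
print(coords_2d.shape)  # (50, 2)
```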
Using word embeddings
- Learn word embeddings from a large text corpus (1-100B words), or download pre-trained embeddings online
- Transfer the embeddings to a new task with a smaller training set (ex. 100K words)
- Optional: continue to fine-tune the word embeddings with the new data, but only when the new dataset is big enough (see the sketch below)
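One common way to wire up this transfer step, sketched with tf.keras; the vocabulary size, embedding dimension, and the random `pretrained` matrix are placeholders for a real downloaded embedding, not values from the course.

```python
import numpy as np
import tensorflow as tf

vocab_size, emb_dim = 10_000, 300                                      # assumed sizes
pretrained = np.random.randn(vocab_size, emb_dim).astype("float32")   # stand-in for downloaded vectors

# Step 2: transfer - initialize the layer from the pre-trained matrix and freeze it.
# Step 3 (optional): set trainable=True to fine-tune when the new dataset is large enough.
embedding_layer = tf.keras.layers.Embedding(
    input_dim=vocab_size,
    output_dim=emb_dim,
    embeddings_initializer=tf.keras.initializers.Constant(pretrained),
    trainable=False,
)
```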
Relation to face encoding
- “Encoding” and “embedding” have almost the same meaning
- In the case of “face encoding”: train a neural network that can take any face picture as input, even a brand-new image, and compute an encoding for that picture
- In the case of “word embedding”: there is a fixed vocabulary, and the network just learns a fixed embedding for each word in that vocabulary
Analogy reasoning
- one of the remarkable results about word embeddings is the generality of analogy relationships they can learn (ex. man is to woman as king is to queen)
Cosine similarity
- sim(u, v) = (u · v) / (||u|| ||v||)
- if you learn a set of word embeddings and find the word w that maximizes sim(e_w, e_king - e_man + e_woman), you can actually get the exact right answer, “queen” (see the sketch below)
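A self-contained sketch of analogy reasoning with cosine similarity, reusing the same kind of toy 4-D feature vectors as the featurized-representation example above (values are illustrative, not learned embeddings).

```python
import numpy as np

def cosine_similarity(u, v):
    """sim(u, v) = (u . v) / (||u|| * ||v||)"""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 4-D vectors: [gender, royal, age, food].
emb = {
    "man":    np.array([-1.00, 0.01, 0.03, 0.09]),
    "woman":  np.array([ 1.00, 0.02, 0.02, 0.01]),
    "king":   np.array([-0.95, 0.93, 0.70, 0.02]),
    "queen":  np.array([ 0.97, 0.95, 0.69, 0.01]),
    "orange": np.array([ 0.01, 0.00, -0.02, 0.97]),
}

# "man is to woman as king is to ?":
# find the word w maximizing sim(e_w, e_king - e_man + e_woman).
target = emb["king"] - emb["man"] + emb["woman"]
candidates = [w for w in emb if w not in {"man", "woman", "king"}]
best = max(candidates, key=lambda w: cosine_similarity(emb[w], target))
print(best)  # "queen"
```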
How to learn word embedding (Embedding matrix)
- Assume the vocabulary size is 10,000 (10K) and that the word “orange” is the 6257th word
- Initialize E randomly and learn all the parameters of this 300 by 10,000 matrix; then E times the one-hot vector for “orange” gives you its embedding vector
- In practice it is not efficient to implement this as a full matrix-vector multiplication, because the one-hot vector is very high-dimensional and almost all of its elements are zero
- Instead, use a specialized function that just looks up the corresponding column of the matrix E rather than doing the matrix multiplication (see the sketch below)
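A small numpy sketch of this efficiency point, assuming a randomly initialized E and the “orange” index 6257: the matrix-vector product and the direct column lookup give the same embedding, but the lookup avoids all the multiply-by-zero work.

```python
import numpy as np

vocab_size, emb_dim = 10_000, 300
E = np.random.randn(emb_dim, vocab_size)   # 300 x 10,000 embedding matrix (learned in practice)

o_6257 = np.zeros(vocab_size)
o_6257[6257] = 1.0                          # one-hot vector for "orange"

e_matmul = E @ o_6257                       # full matrix-vector multiplication: wasteful
e_lookup = E[:, 6257]                       # direct column lookup: what frameworks actually do

print(np.allclose(e_matmul, e_lookup))      # True
```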
[Source] https://www.coursera.org/learn/nlp-sequence-models