Dimension Reduction
Word Embedding
- Machines learn the meaning of words by reading a lot of documents, without supervision.
 - Generating word vectors is unsupervised.
 - A word can be understood by its context.
 
How to exploit the context?
- count based: if two words $w_i$ and $w_j$ frequently co-occur, $V(w_i)$ and $V(w_j)$ should be close to each other (GloVe vectors).
 
$V(w_i) \cdot V(w_j) \to N_{i,j}$, where $N_{i,j}$ is the number of times $w_i$ and $w_j$ appear in the same document. A small sketch of this idea follows.
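A minimal sketch of the count-based idea, assuming a toy corpus and using a truncated SVD as a simple stand-in for GloVe's actual weighted least-squares objective on log counts:

```python
import numpy as np
from itertools import combinations

# Toy corpus: each "document" is a list of tokens (illustrative only).
docs = [["dog", "barks", "loud"],
        ["cat", "meows", "loud"],
        ["dog", "chases", "cat"]]

vocab = sorted({w for d in docs for w in d})
idx = {w: i for i, w in enumerate(vocab)}

# N[i, j] = number of documents in which w_i and w_j co-occur.
N = np.zeros((len(vocab), len(vocab)))
for d in docs:
    for a, b in combinations(sorted(set(d)), 2):
        N[idx[a], idx[b]] += 1
        N[idx[b], idx[a]] += 1

# Find vectors with V(w_i) . V(w_j) ~ N[i, j] by factorizing N.
# (GloVe fits a weighted objective on log counts; the SVD here
# is only a simple stand-in for that factorization.)
dim = 2
U, S, _ = np.linalg.svd(N)
V = U[:, :dim] * np.sqrt(S[:dim])  # one word vector per row

print(V[idx["dog"]] @ V[idx["cat"]], "vs count", N[idx["dog"], idx["cat"]])
```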
- prediction based: predict the next word from the previous words (a sketch follows this list).
 
- take out the input of the neurons in the first layer.
 - use it to represent a word $w$.
 - word vector / word embedding feature: $V(w)$.
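A numpy sketch of this setup (the vocabulary size, embedding dimension, and initialization are illustrative assumptions): the previous word enters as a 1-of-N vector $x$, and the input to the first-layer neurons, $z = W_1 x$, serves as the word vector $V(w)$.

```python
import numpy as np

vocab_size, embed_dim = 1000, 50  # illustrative sizes
rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (embed_dim, vocab_size))  # first layer
W2 = rng.normal(0, 0.1, (vocab_size, embed_dim))  # output layer

def one_hot(word_id):
    x = np.zeros(vocab_size)
    x[word_id] = 1.0
    return x

def predict_next(prev_word_id):
    z = W1 @ one_hot(prev_word_id)  # input of the first-layer neurons
    logits = W2 @ z
    p = np.exp(logits - logits.max())
    return p / p.sum()              # softmax distribution over the next word

# V(w) is exactly that first-layer input z; for a 1-of-N input
# it is simply the corresponding column of W1.
def V(word_id):
    return W1[:, word_id]
```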
Words with the same context have similar distributions.
How do we make the two weight matrices identical, and what is the benefit of doing so?
 - Given the same initialization, and the same update (the sum of both gradients) at every step, the two matrices stay identical; a word then has the same vector no matter where it appears in the context, and the parameter count does not grow with the context length (see the sketch below).
 - cross entropy: the loss is $C = -\sum_i \hat{y}_i \ln y_i$ between the predicted distribution $y$ over the next word and its 1-of-N encoding $\hat{y}$.
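A sketch of enforcing the tie during training (the shapes, learning rate, and random gradients are placeholder assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
W_init = rng.normal(0, 0.1, (50, 1000))  # illustrative shape
W_i, W_j = W_init.copy(), W_init.copy()  # same initialization

def tied_update(W_i, W_j, grad_i, grad_j, lr=0.1):
    """Apply the same step (the sum of both gradients) to both matrices."""
    step = lr * (grad_i + grad_j)
    W_i -= step
    W_j -= step
    return W_i, W_j  # W_i == W_j still holds after the update

g_i = rng.normal(size=W_init.shape)  # placeholder gradients
g_j = rng.normal(size=W_init.shape)
W_i, W_j = tied_update(W_i, W_j, g_i, g_j)
assert np.allclose(W_i, W_j)  # the tie is preserved
```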
 
Two classes of models (both shown in the gensim sketch below):
- CBOW: predict the middle word from the surrounding context words.
 - skip-gram: predict the surrounding context words from the middle word.
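Both variants are available through gensim's Word2Vec via its sg flag; a minimal usage sketch, assuming gensim >= 4 and toy placeholder sentences:

```python
from gensim.models import Word2Vec

sentences = [["the", "dog", "barks"], ["the", "cat", "meows"]]

# sg=0 -> CBOW: predict the centre word from its context window.
# sg=1 -> skip-gram: predict the context words from the centre word.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(skipgram.wv["dog"])  # the learned word vector V("dog")
```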
Structural information: the vectors also capture structure, inclusion relations, etc. (see the analogy sketch below).
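This structure shows up as roughly constant vector offsets between related words, e.g. the capital-of relation. A sketch using pretrained vectors from gensim's downloader (the model name "glove-wiki-gigaword-100" and the download step are assumptions about the gensim-data catalogue):

```python
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")  # pretrained GloVe vectors

# V("berlin") - V("germany") + V("italy") should land near V("rome").
print(wv.most_similar(positive=["berlin", "italy"],
                      negative=["germany"], topn=1))
```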