# Pointwise Mutual Information (PMI)

## 1. PMI (Pointwise Mutual Information)

$$PMI(x;y) = \log\frac{p(x,y)}{p(x)p(y)} = \log\frac{p(x|y)}{p(x)} = \log\frac{p(y|x)}{p(y)}$$

The logarithm comes from information theory, where the self-information of an event with probability $p$ is $-\log p$ (since $\log p \le 0$ for probabilities, the sign is flipped, or equivalently the absolute value is taken). PMI itself can be positive, zero, or negative: positive when $x$ and $y$ co-occur more often than independence would predict, zero when they are independent, and negative when they tend to exclude each other.
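Plugged in directly, the definition is a one-liner. A minimal sketch (log base 2, so PMI is measured in bits; the function name is my own):

```python
import math

def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information of the event pair (x, y).

    Positive when x and y co-occur more often than independence
    would predict, zero when independent, negative when they
    tend to exclude each other.
    """
    return math.log2(p_xy / (p_x * p_y))

# Independent events: p(x,y) = p(x)p(y), so PMI is 0.
print(pmi(0.25, 0.5, 0.5))  # 0.0
# Perfectly coupled events: p(x,y) = p(x) = p(y) = 0.5, PMI = log2(2) = 1 bit.
print(pmi(0.5, 0.5, 0.5))   # 1.0
```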

## 2. An example of PMI in natural language processing

$$PMI(like;good) = \log\frac{p(like,good)}{p(like)p(good)}$$

Here $p(like, good)$ is the probability that *like* and *good* appear together in the same sentence (the number of times *like* and *good* co-occur, divided by $N^2$).

The larger $PMI(like; good)$ is, the more pronounced the positive sentiment orientation of *like*, since *good* acts as a positive seed word.
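In practice the probabilities are estimated from corpus counts, and the normalizers cancel so that only counts are needed. A hypothetical sketch (the counts and the helper `pmi_from_counts` are invented for illustration):

```python
import math

def pmi_from_counts(n_xy, n_x, n_y, n):
    # p(x,y) ~ n_xy / n, p(x) ~ n_x / n, p(y) ~ n_y / n, so
    # PMI = log((n_xy / n) / ((n_x / n) * (n_y / n)))
    #     = log(n_xy * n / (n_x * n_y))
    return math.log2(n_xy * n / (n_x * n_y))

# Hypothetical counts: "like" appears in 3,000 sentences, "good" in
# 2,000, and they co-occur in 600, out of n = 100,000 sentences.
print(pmi_from_counts(600, 3000, 2000, 100_000))  # log2(10), about 3.32 bits
```

A strongly positive score like this means the pair co-occurs about ten times more often than chance would predict.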

## 3. Using PMI to predict keywords of a dialogue reply

$$PMI(w_{q};w_{r}) = \log\frac{p(w_{q},w_{r})}{p(w_{q})p(w_{r})} = \log\frac{p(w_{q}|w_{r})}{p(w_{q})}$$
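One way to use this score is to estimate $PMI(w_q; w_r)$ from a corpus of (query, reply) pairs and then, given a query word, rank candidate reply words by PMI. A sketch under that assumption (the toy corpus and the function `reply_keyword_scores` are hypothetical, not from any published system):

```python
import math
from collections import Counter

def reply_keyword_scores(pairs, query_word):
    """Rank reply words by PMI(w_q; w_r) estimated from (query, reply) pairs."""
    n = len(pairs)
    q_count = Counter()   # pairs whose query contains w_q
    r_count = Counter()   # pairs whose reply contains w_r
    qr_count = Counter()  # pairs where w_q (query) and w_r (reply) co-occur
    for q_toks, r_toks in pairs:
        q_set, r_set = set(q_toks), set(r_toks)
        q_count.update(q_set)
        r_count.update(r_set)
        for wq in q_set:
            for wr in r_set:
                qr_count[(wq, wr)] += 1
    scores = {}
    for (wq, wr), c in qr_count.items():
        if wq == query_word:
            # Plug-in estimate: PMI = log2(c * n / (count_q * count_r))
            scores[wr] = math.log2(c * n / (q_count[wq] * r_count[wr]))
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy dialogue corpus (hypothetical):
pairs = [
    (["where", "eat"], ["restaurant", "nearby"]),
    (["where", "eat"], ["restaurant", "cheap"]),
    (["how", "weather"], ["sunny", "today"]),
    (["where", "stay"], ["hotel", "nearby"]),
    (["how", "much"], ["cheap", "price"]),
]
print(reply_keyword_scores(pairs, "eat")[0][0])  # "restaurant" scores highest
```

Words like *nearby* and *cheap* also co-occur with *eat*, but PMI discounts them because they appear in many replies regardless of the query.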

### Example

The following table shows counts of the word pairs with the highest and lowest PMI scores in the first 50 million words of Wikipedia (dump of October 2015), filtered to pairs with 1,000 or more co-occurrences. The frequency of each count can be obtained by dividing its value by 50,000,952. (Note: natural log is used to calculate the PMI values in this example, instead of log base 2.)
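Since this example uses the natural log while the formulas above are often read in base 2, it is worth noting that changing the base only rescales PMI by a constant, so rankings are identical in either base. A small check:

```python
import math

# PMI in nats and in bits differ only by the factor ln(2):
# PMI_bits = PMI_nats / ln(2), so the ordering of pairs is unchanged.
pmi_nats = math.log(10.0)          # an example PMI value in nats
pmi_bits = pmi_nats / math.log(2)  # the same value expressed in bits
print(abs(pmi_bits - math.log2(10.0)) < 1e-12)  # True
```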