

Attention as matrix multiplication

2023-02-27 20:38 · Author: 學的很雜的一個人


Source: https://e2eml.school/transformers.html#softmax

A Chinese-English bilingual version, with Chinese annotations produced by various translation tools plus a little of my own interpretation.


Related articles are collected in the series: Transformers from Scratch (with Chinese annotations)

--------------------------------------------------------------------------------------------------------------------


Feature weights could be straightforward to build by counting how often each word pair/next word transition occurs in training, but attention masks are not.

Up to this point, we've pulled the mask vector out of thin air.

How transformers find the relevant mask matters.

It would be natural to use some sort of lookup table, but now we are focusing hard on expressing everything as matrix multiplications.

We can use the same lookup method we introduced above by stacking the mask vectors for every word into a matrix and using the one-hot representation of the most recent word to pull out the relevant mask.
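As a minimal sketch of what that lookup looks like in code (the vocabulary size, the mask values, and the choice of NumPy here are illustrative assumptions, not part of the original article):

import numpy as np

# Hypothetical tiny vocabulary and hand-made mask values, for illustration only.
vocab_size = 5
mask_matrix = np.array([
    [1, 0, 0, 0, 1],   # mask for word 0
    [0, 1, 0, 1, 0],   # mask for word 1
    [1, 1, 0, 0, 0],   # mask for word 2
    [0, 0, 1, 0, 1],   # mask for word 3
    [0, 0, 0, 1, 1],   # mask for word 4
], dtype=float)

# One-hot representation of the most recent word (say, word 2).
one_hot = np.zeros(vocab_size)
one_hot[2] = 1.0

# Multiplying the one-hot vector by the stacked mask matrix pulls out row 2,
# i.e. the mask associated with the most recent word.
mask = one_hot @ mask_matrix
print(mask)   # [1. 1. 0. 0. 0.]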

In the matrix showing the collection of mask vectors, we've only shown the one we're trying to pull out, for clarity.

We're finally getting to the point where we can start tying into the paper.

This mask lookup is represented by the QK^T term in the attention equation.
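For reference (this equation comes from the Attention Is All You Need paper and is not spelled out in this passage), the full attention calculation is

Attention(Q, K, V) = softmax(QK^T / √d_k) V

where d_k is the dimension of the keys; the QK^T term inside it is the lookup we are describing here.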

The query Q represents the feature of interest and the matrix K represents the collection of masks.

Because it's stored with masks in columns, rather than rows, it needs to be transposed (with the T operator) before multiplying.
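Here is a similarly toy-sized sketch of that QK^T lookup (again, the numbers and the NumPy framing are assumptions made only for illustration): K stores one mask per column, Q is the one-hot query row for the word of interest, and transposing K before the multiplication returns exactly that word's mask.

import numpy as np

num_words = 4
# K stores one mask per column: K[:, j] is the (invented) mask for word j.
K = np.array([
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
], dtype=float)

# Q is the one-hot query row for the feature of interest (word 2 here).
Q = np.zeros((1, num_words))
Q[0, 2] = 1.0

# Transposing K turns its columns into rows, so the one-hot query
# picks out word 2's mask.
mask = Q @ K.T
print(mask)   # [[1. 1. 0. 0.]]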

By the time we're all done, we'll make some important modifications to this, but at this level it captures the concept of a differentiable lookup table that transformers make use of.
