Physics of Language Models: Part 3.1 + 3.2, Knowledge Storage, Extraction and Manipulation

Timecodes
0:00 - Prelude
6:59 - Toy Example and Motivation
12:07 - Definitions
16:07 - Result 1: Mixed Training
21:38 - Result 2: Pretrain and Finetune
23:37 - Result 3: Knowledge Augmentation
28:21 - Result 4: P-Probing
33:29 - Result 5: Q-Probing
36:25 - Result 6: Celebrity Can Help Minority
41:00 - Result 7: Bidirectional Model + MLM
46:02 - Start of Knowledge Manipulation
46:57 - Result 8: Knowledge Partial/Dual Retrieval
51:47 - Result 9: Knowledge Classification and Comparison
1:04:44 - Result 10: Knowledge Inverse Search (Reversal Curse)
1:15:37 - Conclusion

This is an expanded version of the talk I gave about the following two papers.

(Results 1-7)
Even if an LLM losslessly memorizes the pretraining data, it may not be possible to finetune it to extract that knowledge. Probing techniques suggest that data augmentation is necessary at the pretraining level, regardless of model size, training time, or finetuning choices. https://arxiv.org/abs/2309.14316
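To make "data augmentation at the pretraining level" concrete, here is a minimal sketch in the spirit of the paper's synthetic-biography setting: each person's facts are rewritten many times with varied sentence templates and shuffled sentence order, so the same knowledge appears in diverse surface forms during pretraining. The attribute names, templates, and the augment helper below are illustrative placeholders, not the paper's actual data pipeline.

```python
import random

# Hypothetical biography record in the spirit of the paper's synthetic
# biography data; the person and attribute names are placeholders.
PERSON = {
    "name": "Anya Briar Forger",
    "birth_date": "October 2, 1996",
    "birth_city": "Princeton, NJ",
    "university": "MIT",
}

# A few sentence templates per attribute; a real pipeline would use many more.
TEMPLATES = {
    "birth_date": [
        "{name} was born on {birth_date}.",
        "{name} came into this world on {birth_date}.",
    ],
    "birth_city": [
        "{name} spent her early years in {birth_city}.",
        "{name} grew up in {birth_city}.",
    ],
    "university": [
        "{name} graduated from {university}.",
        "{name} received her education at {university}.",
    ],
}

def augment(person: dict, n_rewrites: int, seed: int = 0) -> list[str]:
    """Produce n_rewrites biography paragraphs for one person, varying
    both the wording (template choice) and the sentence order."""
    rng = random.Random(seed)
    paragraphs = []
    for _ in range(n_rewrites):
        sentences = [
            rng.choice(TEMPLATES[attr]).format(**person) for attr in TEMPLATES
        ]
        rng.shuffle(sentences)  # permutation augmentation
        paragraphs.append(" ".join(sentences))
    return paragraphs

for paragraph in augment(PERSON, n_rewrites=3):
    print(paragraph)
```

Result 3 in the talk (Knowledge Augmentation) discusses variants of exactly this idea, such as multiple rewrites and sentence permutation.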

(Results 8-10)
Why do LLMs need Chain of Thought (CoT) even for basic questions (e.g., was Biden born on an even day)? We show that LLMs cannot efficiently manipulate knowledge even when that knowledge is 100% extractable; moreover, inverse knowledge search is simply impossible (a.k.a. the reversal curse). https://arxiv.org/abs/2309.14402
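A loose analogy (mine, not the paper's experimental setup) for these two phenomena: treat stored knowledge as a forward-only map from person to attributes. Classification such as "even day?" becomes two easy steps once a CoT first verbalizes the retrieved fact, whereas inverse search ("who was born on this date?") has no reverse index to consult. The helper names in the Python sketch below are hypothetical; the birth dates are real public facts used only as placeholders.

```python
# Toy analogy: knowledge stored as a forward-only map, person -> birth date.
BIRTHDAYS = {
    "Joe Biden": (1942, 11, 20),
    "Barack Obama": (1961, 8, 4),
}

def retrieve(person: str) -> tuple[int, int, int]:
    """Forward retrieval: a direct lookup, analogous to a question the
    model answers easily once the knowledge is extractable."""
    return BIRTHDAYS[person]

def born_on_even_day_with_cot(person: str) -> bool:
    """Knowledge classification via Chain of Thought: first verbalize
    the retrieved fact, then classify it. Two steps, each one easy."""
    _, _, day = retrieve(person)  # step 1: retrieve the fact
    return day % 2 == 0           # step 2: classify it

def inverse_search(date: tuple[int, int, int]) -> list[str]:
    """Inverse search: who was born on this date? With forward-only
    storage this requires scanning every entry -- the dictionary
    analogue of the reversal curse."""
    return [p for p, d in BIRTHDAYS.items() if d == date]

print(born_on_even_day_with_cot("Joe Biden"))  # True (the 20th is even)
print(inverse_search((1961, 8, 4)))            # ['Barack Obama']
```

The point of the analogy: CoT turns an opaque one-shot classification into retrieval plus a trivial computation, while no amount of forward-direction training builds the reverse index.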
