[NLP] Diffusion-LM Improves Controllable Text Generation, Xiang Lisa Li et al., 2022

서졍 2023. 10. 8. 18:20

[2205.14217] Diffusion-LM Improves Controllable Text Generation (arxiv.org)


Abstract

Controlling the behavior of a language model without re-training is a major open problem in text generation.
Recent work has succeeded in controlling simple sentence attributes (e.g., sentiment), but there has been little progress on complex, fine-grained controls (e.g., syntactic structure).
The paper proposes Diffusion-LM, a new non-autoregressive language model based on continuous diffusions.
Diffusion-LM iteratively denoises a sequence of Gaussian vectors into word vectors.
- The continuous, hierarchical nature of the intermediate latent variables produced along the way is what lets a simple gradient-based algorithm carry out complex controllable generation tasks.

1. Introduction

Large Autoregressive Language Model

To deploy an LM in the real world, the text generation process must be controllable: the model has to be able to generate text that satisfies given requirements (e.g., topic, syntactic structure).

The natural way to control an LM is to fine-tune it on supervised data.

☞ However, updating the parameters for every control task is expensive, and fine-tuning does not allow composing multiple controls (e.g., generate text that is both positive sentiment and non-toxic).

To address these problems, light-weight and modular plug-and-play approaches were proposed.

These keep the LM frozen and steer generation with an external classifier that measures how well the generated text satisfies the control.

☞ Even so, steering a frozen autoregressive LM remains difficult, and successes so far are limited to simple, attribute-level controls.

Diffusion-LM

Diffusion-LM starts from a sequence of Gaussian noise vectors and iteratively denoises them into vectors that correspond to words (see the overview figure in the paper).

This process yields a hierarchy of continuous latent variable representations, which makes complex control tasks possible with a simple, gradient-based method.

(... 2. Related Work omitted)

3. Problem Statement and Background

3.1. Generative Models and Controllable Generation for Text

Text Generation: the task of sampling w from a trained language model's probability distribution p_lm(w), where w = [w_1, w_2, ..., w_n].
▶ Controllable Text Generation
Definition - the task of sampling w from the conditional distribution p(w | c), where c is a control variable.
(e.g.) For syntactic control, c can be the target syntax tree shown in the yellow box of the paper's overview figure.
Goal - generate w that satisfies the control target c.

In the plug-and-play controllable generation setting, we are given a language model p_lm(w) trained on a large amount of unlabeled text data. For each control task we are also given a classifier p(c | w), trained on a smaller amount of labeled text data.

The end goal is to combine the two models and sample from the posterior p(w | c), which by Bayes' rule satisfies p(w | c) ∝ p_lm(w) · p(c | w).
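The Bayes' rule combination above can be sketched on a toy discrete example. The three candidate "sentences" and all probability values below are hypothetical numbers chosen for illustration; a real system cannot enumerate every w, which is exactly why sampling from the posterior is the hard part.

```python
import numpy as np

def posterior(prior, likelihood):
    """Combine p_lm(w) and p(c | w) into p(w | c) via Bayes' rule."""
    unnormalized = prior * likelihood            # p_lm(w) * p(c | w)
    return unnormalized / unnormalized.sum()     # normalize by p(c)

# Hypothetical toy values for three candidate sentences.
p_lm = np.array([0.5, 0.3, 0.2])           # fluency prior p_lm(w)
p_c_given_w = np.array([0.1, 0.8, 0.5])    # classifier score p(c | w)

p_w_given_c = posterior(p_lm, p_c_given_w)
best = int(np.argmax(p_w_given_c))         # candidate favored once c is imposed
```

In this toy setup the most fluent candidate (index 0) loses to index 1 once the classifier is taken into account, which is the behavior plug-and-play control is after.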

3.2. Autoregressive Language Models

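An autoregressive LM computes the likelihood of the data as a product of conditional probabilities over variables in a fixed order: p_lm(w) = p(w_1) ∏_{i=2..n} p(w_i | w_{<i}).

The drawback is that generation must repeatedly predict the next token conditioned on the partial sequence generated so far, which slows decoding and makes the model less flexible across tasks.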
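A minimal sketch of the left-to-right factorization, assuming a hypothetical bigram table in place of a real LM (a real autoregressive model conditions on the whole prefix, not only the previous token). It shows both the chain-rule likelihood and the one-token-per-step decoding loop.

```python
import math

# Hypothetical bigram table: P(next | previous). "<s>" marks sentence start.
BIGRAM = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.7, "dog": 0.3},
    "a": {"cat": 0.3, "dog": 0.7},
    "cat": {"sleeps": 1.0},
    "dog": {"sleeps": 1.0},
}

def sequence_log_prob(tokens):
    """log p(w) = sum_i log p(w_i | w_{i-1}) under the bigram simplification."""
    log_p, prev = 0.0, "<s>"
    for tok in tokens:
        log_p += math.log(BIGRAM[prev][tok])
        prev = tok
    return log_p

def greedy_decode(max_len=3):
    """Generate left to right; each step needs the previous step's output."""
    tokens, prev = [], "<s>"
    for _ in range(max_len):
        if prev not in BIGRAM:        # no continuation known for this token
            break
        prev = max(BIGRAM[prev], key=BIGRAM[prev].get)
        tokens.append(prev)
    return tokens

decoded = greedy_decode()
```

Because step i cannot start before step i-1 has finished, decoding is inherently sequential, and a constraint on a later position cannot influence tokens that were already emitted.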

3.3. Diffusion Models for Continuous Domains

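There is a Markov chain x_0, ..., x_T starting from the data x_0, in which every variable is d-dimensional and x_T is a Gaussian random variable.

▶ A diffusion model is a latent variable model that denoises the latent variables x_T, ..., x_1 step by step to produce samples from the target data distribution.

Forward diffusion process: starting from the data x_0, noise is added at every time step (with a preset noise ratio β_t) through the transitions x_{t-1} → x_t, until x_T is approximately Gaussian. The process assumes the Markov property and involves no training (it has no learnable parameters).

Each noising step is itself a Gaussian distribution: q(x_t | x_{t-1}) = N(x_t; sqrt(1 - β_t) x_{t-1}, β_t I).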
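The forward process can be sketched in a few lines of NumPy. The linear β schedule, T = 1000, and the 16-dimensional stand-in vector are assumptions borrowed from common DDPM setups, not values taken from this post; the closed-form sampler relies on the standard identity q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 - ᾱ_t) I) with ᾱ_t = ∏_{s≤t} (1 - β_s).

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

def forward_closed_form(x0, betas, t, rng):
    """Sample x_t directly from x_0 using the cumulative product alpha_bar_t."""
    alpha_bar = np.prod(1.0 - betas[:t])
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
x0 = np.ones(16)                      # stand-in for a data vector

x = x0
for beta_t in betas:                  # no parameters are learned here
    x = forward_step(x, beta_t, rng)

alpha_bar_T = np.prod(1.0 - betas)    # how much of x_0 survives in x_T
```

With this schedule alpha_bar_T is close to zero, so x_T retains almost nothing of x_0 and is approximately standard Gaussian.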

Reverse diffusion process: the generative process that recovers the original data x_0 from the noise x_T, one denoising step x_t → x_{t-1} at a time.

▶ The training goal of a diffusion model is to maximize the marginal likelihood of the data, E[log p_θ(x_0)].

The canonical objective is the variational lower bound (L_vlb) on log p_θ(x_0).
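For reference, the variational lower bound is usually written as follows in the DDPM literature. The notation is a reconstruction under the definitions above (q is the forward process, p_θ the learned reverse process), not a copy of the paper's equation, so check it against the paper before relying on it.

```latex
\mathcal{L}_{\mathrm{vlb}}(x_0)
  = \mathbb{E}_{q(x_{1:T}\mid x_0)}\left[
      \log\frac{q(x_T\mid x_0)}{p_\theta(x_T)}
      + \sum_{t=2}^{T}\log\frac{q(x_{t-1}\mid x_0, x_t)}{p_\theta(x_{t-1}\mid x_t)}
      - \log p_\theta(x_0\mid x_1)
    \right]
```

Minimizing this quantity maximizes a lower bound on log p_θ(x_0).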

☞ However, this objective is unstable and needs many optimization tricks to train reliably, so a simpler objective can be used instead.

It is obtained by re-weighting the KL-divergence terms in L_vlb, which yields a mean-squared error loss (L_simple).
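A sketch of the resulting mean-squared error objective, as it is commonly written. Here μ_θ is the mean predicted by the reverse model and μ̂ is assumed to denote the mean of the forward posterior q(x_{t-1} | x_0, x_t); the symbols are a reconstruction rather than a quote from the paper.

```latex
\mathcal{L}_{\mathrm{simple}}(x_0)
  = \sum_{t=1}^{T} \mathbb{E}_{q(x_t\mid x_0)}
    \left\lVert \mu_\theta(x_t, t) - \hat{\mu}(x_t, x_0) \right\rVert^2
```

Each term compares two Gaussian means, which is what remains of a KL divergence between Gaussians once its weighting factor is dropped.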

4. Diffusion-LM : Continuous Diffusion Language Modeling

Building a diffusion language model requires a few modifications to the (standard) diffusion model described above.

Define an Embedding Function
- Discrete text must be mapped into a continuous space. For this, the embeddings are learned with an end-to-end training objective. (Section 4.1)
Rounding Method
- Vectors in the embedding space must be mapped back to words. For this, training-time and decoding-time methods are used. (Section 4.2)

4.1. End-to-End Training

To apply a continuous diffusion model to discrete text, we need an embedding function EMB(w_i) that maps each word to a vector in R^d.

EMB(w) = [EMB(w_1), ..., EMB(w_n)] ∈ R^{nd}

- Forward diffusion process: add a Markov transition from the discrete words w to x_0 ☜ the word-to-vector embedding step

- Reverse diffusion process: add a trainable rounding step
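The embedding step and the extra forward transition can be sketched as below. The tiny vocabulary, d = 4, and the random embedding table are hypothetical placeholders (in Diffusion-LM the table is learned jointly with the diffusion model), and the Gaussian form q_φ(x_0 | w) = N(EMB(w), σ_0 I) is how I understand the paper's transition, so treat it as an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<pad>", "the", "cat", "sleeps"]          # hypothetical vocabulary
word_to_id = {w: i for i, w in enumerate(vocab)}
d = 4                                               # assumed embedding size
emb_table = rng.standard_normal((len(vocab), d))    # learned in practice

def emb(words):
    """EMB(w): look up one d-dim vector per word, flatten to R^{n*d}."""
    ids = [word_to_id[w] for w in words]
    return emb_table[ids].reshape(-1)

def sample_x0(words, sigma0, rng):
    """Assumed transition q(x_0 | w) = N(EMB(w), sigma0 * I)."""
    mean = emb(words)
    return mean + np.sqrt(sigma0) * rng.standard_normal(mean.shape)

sentence = ["the", "cat", "sleeps"]
x0 = sample_x0(sentence, sigma0=0.01, rng=rng)      # shape (n * d,) = (12,)
```

From x_0 onward the usual continuous forward process applies unchanged; only this first hop and the final rounding step are specific to text.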

The training objective is also modified so that the diffusion model's parameters and the word embeddings are learned jointly.

Training objectives for End-to-End Learning (variational lower bound version & simple version)
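As far as I can reconstruct them, the two end-to-end objectives extend L_vlb and L_simple with terms for the embedding transition q_φ(x_0 | w) and the rounding step p_θ(w | x_0). The exact form below is recalled from the paper and should be verified against it.

```latex
\mathcal{L}^{\mathrm{e2e}}_{\mathrm{vlb}}(w)
  = \mathbb{E}_{q_\phi(x_0\mid w)}\left[
      \mathcal{L}_{\mathrm{vlb}}(x_0) + \log q_\phi(x_0\mid w) - \log p_\theta(w\mid x_0)
    \right]

\mathcal{L}^{\mathrm{e2e}}_{\mathrm{simple}}(w)
  = \mathbb{E}_{q_\phi(x_{0:T}\mid w)}\left[
      \mathcal{L}_{\mathrm{simple}}(x_0)
      + \left\lVert \mathrm{EMB}(w) - \mu_\theta(x_1, 1) \right\rVert^2
      - \log p_\theta(w\mid x_0)
    \right]
```

The last term in each line trains the rounding step, and because EMB appears inside the expectation, gradients also reach the embedding table.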

4.2. Reducing Rounding Errors

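This section looks at the rounding step of the reverse process in more detail.

Reverse process: the predicted x_0 goes through rounding to recover (reconstruct) the original discrete text.

Rounding picks the most probable word at each position i using argmax, i.e. argmax p_θ(w | x_0) with p_θ(w | x_0) = ∏_i p_θ(w_i | x_i).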
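A minimal sketch of per-position argmax rounding. It assumes p(w_i | x_i) is a softmax over negative squared distances to each word embedding, so the argmax reduces to a nearest-embedding lookup; the paper's actual rounding distribution is trained, and the one-hot embedding table here is purely illustrative.

```python
import numpy as np

def round_to_words(x0, emb_table, vocab):
    """Pick, at each position i, the word maximizing the assumed p(w_i | x_i)."""
    # x0: (n, d), emb_table: (V, d) -> squared distances of shape (n, V)
    diffs = x0[:, None, :] - emb_table[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=-1)
    logits = -sq_dist
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)      # softmax over the vocabulary
    ids = probs.argmax(axis=1)                     # independent argmax per position
    return [vocab[i] for i in ids], probs

vocab = ["the", "cat", "sleeps"]                   # hypothetical vocabulary
emb_table = np.eye(3)                              # toy one-hot embeddings
x0 = emb_table[[1, 0, 2]] + 0.1                    # slightly off the embeddings
words, probs = round_to_words(x0, emb_table, vocab)
```

When x_0 sits close to an embedding, as in this toy case, the argmax is unambiguous; the trouble described next arises when the predicted x_0 lands between embeddings.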

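If the denoising steps ended with x_0 lying exactly on some word's embedding, argmax rounding would be enough to map it back to discrete text. Empirically, however, the model fails to produce an x_0 that commits to a single word. The explanation given is that the simple objective does not put enough emphasis on modeling the structure of x_0.

☞ The fix is to re-parameterize the simple loss objective so that the network predicts x_0 directly.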
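One way to picture the re-parameterized loss is a Monte Carlo estimate of E_t E_{x_t} ||f_θ(x_t, t) - x_0||², sketched below. The predictor f_theta is a hypothetical callable, the linear β schedule is an assumption, and averaging over sampled t differs from the summed form by a constant factor.

```python
import numpy as np

def x0_simple_loss(f_theta, x0, betas, rng, num_samples=8):
    """Estimate E_t E_{x_t} || f_theta(x_t, t) - x0 ||^2 by sampling t and x_t."""
    alpha_bars = np.cumprod(1.0 - betas)
    total = 0.0
    for _ in range(num_samples):
        t = int(rng.integers(1, len(betas) + 1))     # t in {1, ..., T}
        a_bar = alpha_bars[t - 1]
        noise = rng.standard_normal(x0.shape)
        x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise
        total += float(np.sum((f_theta(x_t, t) - x0) ** 2))
    return total / num_samples

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 100)                 # assumed schedule
x0 = np.array([1.0, -1.0, 0.5, 0.0])

oracle_loss = x0_simple_loss(lambda x_t, t: x0, x0, betas, rng)     # perfect predictor
identity_loss = x0_simple_loss(lambda x_t, t: x_t, x0, betas, rng)  # no denoising
```

Because every term asks the network for x_0 itself, even the late, noisy time steps are pushed toward outputs that lie on a word embedding.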

5. Decoding and Controllable Generation with Diffusion-LM

(in progress)

5.1. Controllable Text Generation

5.2. Minimum Bayes Risk Decoding
