Llama Architecture Analysis

Printing the Model

Take the Hugging Face version of Llama-7B as an example:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

gpu = 0  # index of the GPU to load the model onto
model_name = "huggyllama/llama-7b"  # example hub id; substitute your own checkpoint path

device = torch.device("cuda:{}".format(gpu))
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).half().to(device)
print(model)
```
```
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaFlashAttention2(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
```
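The printed names are ordinary module attributes, so each block can be inspected directly. A minimal sketch, assuming the `model` loaded above:

```python
# Drill into the printed hierarchy: LlamaForCausalLM -> model -> layers[0] -> self_attn / mlp
first_layer = model.model.layers[0]
print(first_layer.self_attn.q_proj.weight.shape)  # torch.Size([4096, 4096])
print(first_layer.mlp.gate_proj.weight.shape)     # torch.Size([11008, 4096]); Linear stores (out, in)
print(model.model.embed_tokens.weight.shape)      # torch.Size([32000, 4096])
```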

From this structure we can read off the parameter count: the embedding contributes 32,000 × 4,096, each of the 32 decoder layers contributes 4,096 × 4,096 × 4 (the Q/K/V/O projections) plus 4,096 × 11,008 × 3 (the gate/up/down projections), and the LM head contributes another 4,096 × 32,000:

32,000 × 4,096 + 32 × (4,096 × 4,096 × 4 + 4,096 × 11,008 × 3) + 4,096 × 32,000 = 6,738,149,376

so roughly 7B parameters. (The RMSNorm weight vectors, at 4,096 parameters each, are negligible and omitted here.)
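The number can be checked against the loaded weights directly; a quick sanity check, assuming the `model` from above:

```python
# Sum the element counts of every parameter tensor in the model
total = sum(p.numel() for p in model.parameters())
print(total)  # 6,738,415,616 = 6,738,149,376 plus the 65 RMSNorm weight vectors (65 * 4,096)
```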

Model Diagram

Take an input of 10 tokens as an example:
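A minimal shape-tracing sketch, assuming the `model` and `device` loaded above (the token ids here are arbitrary placeholders):

```python
import torch

# Batch of 1 with 10 arbitrary token ids from the 32,000-entry vocabulary
input_ids = torch.randint(0, 32000, (1, 10)).to(device)

hidden = model.model.embed_tokens(input_ids)  # token ids -> embeddings
with torch.no_grad():
    out = model(input_ids)                    # full forward pass through all 32 layers

print(hidden.shape)      # torch.Size([1, 10, 4096])
print(out.logits.shape)  # torch.Size([1, 10, 32000]), one distribution per position
```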