# 共享預訓練模型

{#if fw === 'pt'}

{:else}

{/if}

在下面的步驟中，我們將看看將預訓練模型分享到 🤗 Hub 的最簡單方法。有可用的工具和實用程序可以讓直接在 Hub 上共享和更新模型變得簡單，我們將在下面進行探討。

我們鼓勵所有訓練模型的用戶通過與社區共享來做出貢獻——共享模型，即使是在非常特定的數據集上進行訓練，也將幫助他人，節省他們的時間和計算資源，並提供對有用的訓練工件的訪問。反過來，您可以從其他人所做的工作中受益！

創建新模型存儲庫的方法有以下三種：

- 使用 push_to_hub API 接口
- 使用 huggingface_hub Python 庫
- 使用 web 界面

創建存儲庫後，您可以通過 git 和 git-lfs 將文件上傳到其中。我們將在以下部分引導您創建模型存儲庫並將文件上傳到它們

## 使用 push_to_hub API

{#if fw === 'pt'}

{:else}

{/if}

將文件上傳到模型中心的最簡單方法是利用 **push_to_hub** API 接口。

在繼續之前，您需要生成一個身份驗證令牌，以便 **huggingface_hub** API 知道您是誰以及您對哪些名稱空間具有寫入權限。確保你在一個環境中 **transformers** 已安裝（見[Setup](/course/chapter0)）。如果您在筆記本中，可以使用以下功能登錄：

```python
from huggingface_hub import notebook_login

notebook_login()
```

在終端中，您可以運行：

```bash
huggingface-cli login
```

在這兩種情況下，系統都會提示您輸入用戶名和密碼，這與您用於登錄 Hub 的用戶名和密碼相同。如果您還沒有 Hub 配置文件，則應該創建一個[here](https://huggingface.co/join)。

好的！您現在已將身份驗證令牌存儲在緩存文件夾中。讓我們創建一些存儲庫！

{#if fw === 'pt'}

如果你玩過 **Trainer** 用於訓練模型的 API，將其上傳到 Hub 的最簡單方法是設置 **push_to_hub=True** 當你定義你的 **TrainingArguments** ：

```py
from transformers import TrainingArguments

training_args = TrainingArguments(
    "bert-finetuned-mrpc", save_strategy="epoch", push_to_hub=True
)
```

你聲明 **trainer.train()** 的時候， 這 **Trainer** 然後每次將您的模型保存到您的命名空間中的存儲庫中時（這裡是每個時代），它將上傳到模型中心。該存儲庫將命名為您選擇的輸出目錄（此處 **bert-finetuned-mrpc** ) 但您可以選擇不同的名稱 **hub_model_id = a_different_name** 。

要將您的模型上傳到您所屬的組織，只需將其傳遞給 **hub_model_id = my_organization/my_repo_name** 。

訓練結束後，你應該做最後的 **trainer.push_to_hub()** 上傳模型的最新版本。它還將生成包含所有相關元數據的模型卡，報告使用的超參數和評估結果！以下是您可能會在此類模型卡中找到的內容示例：

  

{:else}

如果您使用Keras來訓練您的模型，則將其上傳到Hub的最簡單方法是在調用 **model.fit()**  時傳遞**PushToHubCallback**：

```py
from transformers import PushToHubCallback

callback = PushToHubCallback(
    "bert-finetuned-mrpc", save_strategy="epoch", tokenizer=tokenizer
)
```

然後，您應該在對**model.fit()**的調用中添加**callbacks=[callback]**。然後，每次將模型保存在命名空間的存儲庫中（此處為每個 epoch）時，回調都會將模型上傳到 Hub。該存儲庫的名稱將類似於您選擇的輸出目錄（此處為**bert-finetuned-mrpc**），但您可以選擇另一個名稱，名稱為**hub_model_id = a_different_name**。

要將您的模型上傳到您所屬的組織，只需將其傳遞給 **hub_model_id = my_organization/my_repo_name** 。

{/if}

在較低級別，可以通過模型、標記器和配置對象直接訪問模型中心 **push_to_hub()** 方法。此方法負責創建存儲庫並將模型和標記器文件直接推送到存儲庫。與我們將在下面看到的 API 不同，不需要手動處理。

為了瞭解它是如何工作的，讓我們首先初始化一個模型和一個標記器：

{#if fw === 'pt'}

```py
from transformers import AutoModelForMaskedLM, AutoTokenizer

checkpoint = "camembert-base"

model = AutoModelForMaskedLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
```

{:else}

```py
from transformers import TFAutoModelForMaskedLM, AutoTokenizer

checkpoint = "camembert-base"

model = TFAutoModelForMaskedLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
```

{/if}

你可以自由地用這些做任何你想做的事情——向標記器添加標記，訓練模型，微調它。一旦您對生成的模型、權重和標記器感到滿意，您就可以利用 **push_to_hub()** 方法直接在 **model** 中：

```py
model.push_to_hub("dummy-model")
```

這將創建新的存儲庫 **dummy-model** 在您的個人資料中，並用您的模型文件填充它。
對標記器執行相同的操作，以便所有文件現在都可以在此存儲庫中使用：

```py
tokenizer.push_to_hub("dummy-model")
```

如果您屬於一個組織，只需指定 **organization** 上傳到該組織的命名空間的參數：

```py
tokenizer.push_to_hub("dummy-model", organization="huggingface")
```

如果您希望使用特定的 Hugging Face 令牌，您可以自由地將其指定給 **push_to_hub()** 方法也是：

```py
tokenizer.push_to_hub("dummy-model", organization="huggingface", use_auth_token="")
```

現在前往模型中心找到您新上傳的模型：*https://huggingface.co/user-or-organization/dummy-model*。

單擊“文件和版本”選項卡，您應該會在以下屏幕截圖中看到可見的文件：

{#if fw === 'pt'}

{:else}

{/if}

> [!TIP]
> ✏️ **試試看**！獲取與檢查點關聯的模型和標記器，並使用該方法將它們上傳到您的命名空間中的存儲庫。在刪除之前，請仔細檢查該存儲庫是否正確顯示在您的頁面上。

如您所見， **push_to_hub()** 方法接受多個參數，從而可以上傳到特定的存儲庫或組織命名空間，或使用不同的 API 令牌。我們建議您查看直接在[🤗 Transformers documentation](https://huggingface.co/transformers/model_sharing.html)瞭解什麼是可能的

這 **push_to_hub()** 方法由[huggingface_hub](https://github.com/huggingface/huggingface_hub)Python 包，為 Hugging Face Hub 提供直接 API。它集成在 🤗 Transformers 和其他幾個機器學習庫中，例如[allenlp](https://github.com/allenai/allennlp).雖然我們在本章中專注於 🤗 Transformers 集成，但將其集成到您自己的代碼或庫中很簡單。

跳到最後一部分，瞭解如何將文件上傳到新創建的存儲庫！

## 使用 huggingface_hub python庫

這 **huggingface_hub** Python 庫是一個包，它為模型和數據集中心提供了一組工具。它為常見任務提供了簡單的方法和類，例如
獲取有關模型中心上存儲庫的信息並對其進行管理。它提供了在 git 之上工作的簡單 API 來管理這些存儲庫的內容並集成 Hub
在您的項目和庫中。

類似於使用 **push_to_hub** API，這將要求您將 API 令牌保存在緩存中。為此，您需要使用 **login** 來自 CLI 的命令，如上一節所述（同樣，確保在這些命令前面加上 **!** 字符（如果在 Google Colab 中運行）：

```bash
huggingface-cli login
```

這 **huggingface_hub** 包提供了幾種對我們有用的方法和類。首先，有幾種方法可以管理存儲庫的創建、刪除等：

```python no-format
from huggingface_hub import (
    # User management
    login,
    logout,
    whoami,

    # Repository creation and management
    create_repo,
    delete_repo,
    update_repo_visibility,

    # And some methods to retrieve/change information about the content
    list_models,
    list_datasets,
    list_metrics,
    list_repo_files,
    upload_file,
    delete_file,
)
```

此外，它還提供了非常強大的 **Repository** 用於管理本地存儲庫的類。我們將在接下來的幾節中探討這些方法和該類，以瞭解如何利用它們。

這 **create_repo** 方法可用於在模型中心上創建新存儲庫：

```py
from huggingface_hub import create_repo

create_repo("dummy-model")
```

這將創建存儲庫 **dummy-model** 在您的命名空間中。如果願意，您可以使用 **organization** 爭論：

```py
from huggingface_hub import create_repo

create_repo("dummy-model", organization="huggingface")
```

這將創建 **dummy-model** 存儲庫中的 **huggingface** 命名空間，假設您屬於該組織。
其他可能有用的參數是：

- private 以指定存儲庫是否應對其他人可見。
- token 如果您想用給定的令牌覆蓋存儲在緩存中的令牌。
-  repo_type 如果你想創建一個或一個替代一個的而不是模型。接受的值和 datasetspace "dataset""space"。

創建存儲庫後，我們應該向其中添加文件！跳到下一部分以查看可以處理此問題的三種方法。

## 使用網絡界面

Web 界面提供了直接在 Hub 中管理存儲庫的工具。使用該界面，您可以輕鬆創建存儲庫、添加文件（甚至是大文件！）、探索模型、可視化差異等等。

要創建新的存儲庫，請訪問[huggingface.co/new](https://huggingface.co/new)：

首先，指定存儲庫的所有者：這可以是您或您所屬的任何組織。如果您選擇一個組織，該模型將出現在該組織的頁面上，並且該組織的每個成員都可以為存儲庫做出貢獻。

接下來，輸入您的模型名稱。這也將是存儲庫的名稱。最後，您可以指定您的模型是公開的還是私有的。私人模特要求您擁有付費 Hugging Face 帳戶，並允許您將模特隱藏在公眾視野之外。

創建模型存儲庫後，您應該看到如下頁面：

這是您的模型將被託管的地方。要開始填充它，您可以直接從 Web 界面添加 README 文件。

README 文件在 Markdown 中 - 隨意使用它！本章的第三部分致力於構建模型卡。這些對於為您的模型帶來價值至關重要，因為它們是您告訴其他人它可以做什麼的地方。

如果您查看“文件和版本”選項卡，您會發現那裡還沒有很多文件——只有自述文件你剛剛創建和.git 屬性跟蹤大文件的文件。

接下來我們將看看如何添加一些新文件。

## 上傳模型文件

Hugging Face Hub 上的文件管理系統基於用於常規文件的 git 和 git-lfs（代表[Git Large File Storage](https://git-lfs.github.com/)) 對於較大的文件。

在下一節中，我們將介紹將文件上傳到 Hub 的三種不同方式：通過 **huggingface_hub** 並通過 git 命令。

### The `upload_file` approach

使用 **upload_file** 不需要在您的系統上安裝 git 和 git-lfs。它使用 HTTP POST 請求將文件直接推送到 🤗 Hub。這種方法的一個限制是它不能處理大於 5GB 的文件。
如果您的文件大於 5GB，請按照下面詳述的另外兩種方法進行操作。API 可以按如下方式使用：

```py
from huggingface_hub import upload_file

upload_file(
    "/config.json",
    path_in_repo="config.json",
    repo_id="/dummy-model",
)
```

這將上傳文件 **config.json** 可在 **path_to_file** 到存儲庫的根目錄 **config.json** , 到 **dummy-model** 存儲庫。
其他可能有用的參數是：

- token，如果要通過給定的令牌覆蓋緩存中存儲的令牌。
- repo_type, 如果你想要上傳一個 `dataset` 或一個 `space` 而不是模型。 接受的值為 `"dataset"` 和 `"space"`.

### The `Repository` class

以類似 git 的方式管理本地存儲庫。它抽象了 git 可能遇到的大部分痛點，以提供我們需要的所有功能。

使用這個類需要安裝 git 和 git-lfs，所以確保你已經安裝了 git-lfs（參見[here](https://git-lfs.github.com/)安裝說明）並在開始之前進行設置。

為了開始使用我們剛剛創建的存儲庫，我們可以通過克隆遠程存儲庫將其初始化到本地文件夾開始：

```py
from huggingface_hub import Repository

repo = Repository("", clone_from="/dummy-model")
```

這創建了文件夾 **path_to_dummy_folder** 在我們的工作目錄中。該文件夾僅包含 **.gitattributes** 文件，因為這是通過實例化存儲庫時創建的唯一文件 **create_repo**。

從現在開始，我們可以利用幾種傳統的 git 方法：

```py
repo.git_pull()
repo.git_add()
repo.git_commit()
repo.git_push()
repo.git_tag()
```

另外！我們建議您查看 **Repository** 可用文件[here](https://github.com/huggingface/huggingface_hub/tree/main/src/huggingface_hub#advanced-programmatic-repository-management)有關所有可用方法的概述。

目前，我們有一個模型和一個標記器，我們希望將其推送到模型中心。我們已經成功克隆了存儲庫，因此我們可以將文件保存在該存儲庫中。

我們首先通過拉取最新更改來確保我們的本地克隆是最新的：

```py
repo.git_pull()
```

完成後，我們保存模型和標記器文件：

```py
model.save_pretrained("")
tokenizer.save_pretrained("")
```

這 **path_to_dummy_folder** 現在包含所有模型和標記器文件。我們遵循通常的 git 工作流程，將文件添加到暫存區，提交它們並將它們推送到模型中心：

```py
repo.git_add()
repo.git_commit("Add model and tokenizer files")
repo.git_push()
```

恭喜！您剛剛將第一個文件推送到hub上。

### The git-based approach

這是上傳文件的非常簡單的方法：我們將直接使用 git 和 git-lfs 來完成。大多數困難都被以前的方法抽象掉了，但是下面的方法有一些警告，所以我們將遵循一個更復雜的用例。

使用這個類需要安裝 git 和 git-lfs，所以請確保你有[git-lfs](https://git-lfs.github.com/)安裝（請參閱此處瞭解安裝說明）並在開始之前進行設置。

首先從初始化 git-lfs 開始：

```bash
git lfs install
```

```bash
Updated git hooks.
Git LFS initialized.
```

完成後，第一步是克隆您的模型存儲庫：

```bash
git clone https://huggingface.co//
```

我的用戶名是 **lysandre** 我使用了模型名稱 **dummy** ，所以對我來說，命令最終如下所示：

```
git clone https://huggingface.co/lysandre/dummy
```

我現在有一個名為的文件夾假在我的工作目錄中。我能 **cd** 進入文件夾並查看內容：

```bash
cd dummy && ls
```

```bash
README.md
```

如果您剛剛使用 Hugging Face Hub 創建了您的存儲庫 **create_repo** 方法，這個文件夾應該只包含一個隱藏的 **.gitattributes** 文件。如果您按照上一節中的說明使用 Web 界面創建存儲庫，則該文件夾應包含一個自述文件文件旁邊的隱藏 **.gitattributes** 文件，如圖所示。

添加一個常規大小的文件，例如配置文件、詞彙文件，或者基本上任何幾兆字節以下的文件，就像在任何基於 git 的系統中所做的一樣。但是，更大的文件必須通過 git-lfs 註冊才能將它們推送到Hugging Face。

讓我們回到 Python 來生成我們想要提交到我們的虛擬存儲庫的模型和標記器：

{#if fw === 'pt'}
```py
from transformers import AutoModelForMaskedLM, AutoTokenizer

checkpoint = "camembert-base"

model = AutoModelForMaskedLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Do whatever with the model, train it, fine-tune it...

model.save_pretrained("")
tokenizer.save_pretrained("")
```
{:else}
```py
from transformers import TFAutoModelForMaskedLM, AutoTokenizer

checkpoint = "camembert-base"

model = TFAutoModelForMaskedLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Do whatever with the model, train it, fine-tune it...

model.save_pretrained("")
tokenizer.save_pretrained("")
```
{/if}

現在我們已經保存了一些模型和標記器工件，讓我們再看看假文件夾：

```bash
ls
```

{#if fw === 'pt'}
```bash
config.json  pytorch_model.bin  README.md  sentencepiece.bpe.model  special_tokens_map.json tokenizer_config.json  tokenizer.json
```

If you look at the file sizes (for example, with `ls -lh`), you should see that the model state dict file (*pytorch_model.bin*) is the only outlier, at more than 400 MB.

{:else}
```bash
config.json  README.md  sentencepiece.bpe.model  special_tokens_map.json  tf_model.h5  tokenizer_config.json  tokenizer.json
```

如果您查看文件大小（例如， **ls -lh** )，您應該會看到模型狀態 dict 文件 (pytorch_model.bin) 是唯一的異常值，超過 400 MB。

{/if}

> [!TIP]
> ✏️ 從 web 界面創建存儲庫時，*.gitattributes* 文件會自動設置為將具有某些擴展名的文件，例如 *.bin* 和 *.h5* 視為大文件，git-lfs 會對其進行跟蹤您無需進行必要的設置。 

我們現在可以繼續進行，就像我們通常使用傳統 Git 存儲庫一樣。我們可以使用以下命令將所有文件添加到 Git 的暫存環境中 **git add** 命令：

```bash
git add .
```

然後我們可以查看當前暫存的文件：

```bash
git status
```

{#if fw === 'pt'}
```bash
On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged ..." to unstage)
  modified:   .gitattributes
	new file:   config.json
	new file:   pytorch_model.bin
	new file:   sentencepiece.bpe.model
	new file:   special_tokens_map.json
	new file:   tokenizer.json
	new file:   tokenizer_config.json
```
{:else}
```bash
On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged ..." to unstage)
  modified:   .gitattributes
  	new file:   config.json
	new file:   sentencepiece.bpe.model
	new file:   special_tokens_map.json
	new file:   tf_model.h5
	new file:   tokenizer.json
	new file:   tokenizer_config.json
```
{/if}

同樣，我們可以確保 git-lfs 使用其跟蹤正確的文件 **status** 命令：

```bash
git lfs status
```

{#if fw === 'pt'}
```bash
On branch main
Objects to be pushed to origin/main:

Objects to be committed:

	config.json (Git: bc20ff2)
	pytorch_model.bin (LFS: 35686c2)
	sentencepiece.bpe.model (LFS: 988bc5a)
	special_tokens_map.json (Git: cb23931)
	tokenizer.json (Git: 851ff3e)
	tokenizer_config.json (Git: f0f7783)

Objects not staged for commit:

```

我們可以看到所有文件都有 **Git** 作為處理程序，除了其中有 **LFS**的*pytorch_model.bin* 和 *sentencepiece.bpe.model*。

{:else}
```bash
On branch main
Objects to be pushed to origin/main:

Objects to be committed:

	config.json (Git: bc20ff2)
	sentencepiece.bpe.model (LFS: 988bc5a)
	special_tokens_map.json (Git: cb23931)
	tf_model.h5 (LFS: 86fce29)
	tokenizer.json (Git: 851ff3e)
	tokenizer_config.json (Git: f0f7783)

Objects not staged for commit:

```

我們可以看到所有文件都有 **Git** 作為處理程序，除了其中有 **LFS**的*t5_model.h5*。

{/if}

讓我們繼續最後的步驟，提交併推送到Hugging Face遠程倉庫：

```bash
git commit -m "First model version"
```

{#if fw === 'pt'}
```bash
[main b08aab1] First model version
 7 files changed, 29027 insertions(+)
  6 files changed, 36 insertions(+)
 create mode 100644 config.json
 create mode 100644 pytorch_model.bin
 create mode 100644 sentencepiece.bpe.model
 create mode 100644 special_tokens_map.json
 create mode 100644 tokenizer.json
 create mode 100644 tokenizer_config.json
```
{:else}
```bash
[main b08aab1] First model version
 6 files changed, 36 insertions(+)
 create mode 100644 config.json
 create mode 100644 sentencepiece.bpe.model
 create mode 100644 special_tokens_map.json
 create mode 100644 tf_model.h5
 create mode 100644 tokenizer.json
 create mode 100644 tokenizer_config.json
```
{/if}

推送可能需要一些時間，具體取決於您的互聯網連接速度和文件大小：

```bash
git push
```

```bash
Uploading LFS objects: 100% (1/1), 433 MB | 1.3 MB/s, done.
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 12 threads
Compressing objects: 100% (9/9), done.
Writing objects: 100% (9/9), 288.27 KiB | 6.27 MiB/s, done.
Total 9 (delta 1), reused 0 (delta 0), pack-reused 0
To https://huggingface.co/lysandre/dummy
   891b41d..b08aab1  main -> main
```

{#if fw === 'pt'}
If we take a look at the model repository when this is finished, we can see all the recently added files:

UI 允許您瀏覽模型文件和提交，並查看每個提交引入的差異：

{:else}

如果我們在完成後查看模型存儲庫，我們可以看到所有最近添加的文件：

UI 允許您瀏覽模型文件和提交，並查看每個提交引入的差異：

{/if}