Aug 18, 2021 5 min read NLP

How to Upload Models to Hugging Face's Model Distribution Network With Happy Transformer

Learn how to share your Transformer models with the world

This article will discuss how to upload a Transformer model created with Happy Transformer to Hugging Face's model distribution network. Hugging Face allows anyone to upload Transformer models to the cloud, where they can either be kept private or shared with the public. This means you can train a Transformer model with just a few lines of code using my very own Happy Transformer package and then upload the trained model to the network. Once the model has been uploaded, you can download and use it like any other model, with either Hugging Face's Transformers library or Happy Transformer.

The process described in this article applies to all Happy Transformer models. So whether you're performing text classification, question answering, or text generation, the same steps apply.

Example of Training a Model

First, we'll briefly cover how to train a Transformer model with Happy Transformer so that we have an example model to work with. We'll train a model to perform grammar correction using a single training case. Of course, this model will not actually be able to correct grammar; it is only intended for demonstration purposes. I've already uploaded a fully trained grammar correction model that you can access here, and you can read about how to replicate it in this article.

Let's start by installing Happy Transformer with pip.

pip install happytransformer

The code below downloads and instantiates a T5 model and then fine-tunes it on a single training case. Please read this article, which fully explains the process.

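from happytransformer import HappyTextToText
import csv

# Download and instantiate a pretrained T5 model
happy_tt = HappyTextToText("T5", "t5-base")

# A single training case: an input with a "grammar: " prefix and its corrected target
input_text = "grammar: this sentences has bad grammar."
expected_output_text = "this sentence has bad grammar."

# Write the training file with "input" and "target" columns
with open("train.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["input", "target"])
    writer.writerow([input_text, expected_output_text])

# Fine-tune the model on train.csv
happy_tt.train("train.csv")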
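The training file is just a CSV with an "input" column and a "target" column, so training on more examples simply means adding more rows. The sketch below is a minimal illustration that assumes only that two-column layout: it wraps writing and reading the file in two helpers. The names write_training_csv and read_training_csv are hypothetical and are not part of Happy Transformer.

```python
import csv


def write_training_csv(path, pairs):
    """Write (input, target) pairs to a CSV with an input,target header row.

    Hypothetical helper for illustration; not part of Happy Transformer.
    """
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["input", "target"])  # same column names as the training file above
        for input_text, target_text in pairs:
            writer.writerow([input_text, target_text])


def read_training_csv(path):
    """Read the pairs back, failing loudly if the header is not exactly input,target."""
    with open(path, newline="", encoding="utf-8") as csvfile:
        reader = csv.DictReader(csvfile)
        if reader.fieldnames != ["input", "target"]:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        return [(row["input"], row["target"]) for row in reader]
```

You would then pass the same path to happy_tt.train() as before. Using the csv module (rather than joining strings by hand) means commas and quotation marks inside sentences are escaped correctly.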

Upload

Let's now upload the model to Hugging Face's network! Some of the commands from here on are meant to be run in a terminal. If you're using Google Colab, you can run a terminal command from a code cell by prefixing it with an exclamation mark. For clarity, I'll add exclamation marks to the commands meant for the terminal.

Connect to Hugging Face

First, we'll pip install a package called huggingface_hub, which allows us to communicate with Hugging Face's model distribution network.

!pip install huggingface_hub

From here, we can log in with our Hugging Face credentials. You can create an account here if you don't already have one.

!huggingface-cli login

You'll now be prompted within the terminal to provide your login credentials.

Install Git Large File Storage

Next, we need to install git-lfs (Git Large File Storage), which allows us to use git with large files. Since we're working with Transformer models, we clearly need git-lfs: Hugging Face's network requires files larger than 10 MB to be stored with git-lfs rather than plain git.

!sudo apt-get install git-lfs
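To see which files the 10 MB limit actually affects, the sketch below walks a folder (for example, the saved model folder once it exists) and lists every file over the threshold, skipping the .git directory. The helper name files_needing_lfs is hypothetical, and reading "10 MB" as 10 * 1024 * 1024 bytes is an assumption; adjust the threshold if needed.

```python
import os

# Assumption: "10 MB" interpreted as 10 * 1024 * 1024 bytes.
LFS_THRESHOLD_BYTES = 10 * 1024 * 1024


def files_needing_lfs(folder, threshold=LFS_THRESHOLD_BYTES):
    """Return paths (relative to folder) of files larger than threshold bytes.

    Hypothetical helper; the .git directory is skipped.
    """
    large = []
    for root, dirs, files in os.walk(folder):
        dirs[:] = [d for d in dirs if d != ".git"]  # prune git's internal folder
        for name in files:
            full_path = os.path.join(root, name)
            if os.path.getsize(full_path) > threshold:
                large.append(os.path.relpath(full_path, folder))
    return sorted(large)
```

For a saved Transformer model, the weights file is typically the one that crosses the limit, while the small JSON config files do not.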

Create Git Repository

Now, we must create a git repository on Hugging Face's network. Replace model-name in the command below with the name you want for your model. In this case, I'll use the name "t5-example-upload".

!huggingface-cli repo create model-name

If you're creating the repository under an organization, as I am, add the --organization flag followed by the organization's name, as shown below.

!huggingface-cli repo create t5-example-upload --organization vennify

From here, we can clone the git repository directly into our environment.

!git clone https://huggingface.co/vennify/t5-example-upload
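The repository URL is simply huggingface.co followed by a namespace (your username, or the organization passed to --organization) and the model name, and git clone names the local folder after the last part of that URL. The sketch below spells out both rules as two small helpers; hub_repo_url and clone_folder_name are hypothetical names, and the folder-name rule is a simplification of what git actually does.

```python
def hub_repo_url(model_name, namespace):
    """Clone URL for a model repository: https://huggingface.co/<namespace>/<model_name>.

    namespace is your username, or the organization given to --organization.
    Hypothetical helper for illustration.
    """
    if not model_name or "/" in model_name:
        raise ValueError("model_name must be a single name such as 't5-example-upload'")
    return f"https://huggingface.co/{namespace}/{model_name}"


def clone_folder_name(url):
    """Approximate the folder git clone creates: the last URL segment minus a trailing .git."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[: -len(".git")] if name.endswith(".git") else name
```

For example, hub_repo_url("t5-example-upload", "vennify") gives the URL cloned above, and clone_folder_name of that URL gives "t5-example-upload".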

In my case, this creates a folder called "t5-example-upload", which we can now enter with the cd (change directory) command. If you're using Colab, do not put an exclamation mark before cd: each "!" command runs in its own subshell, so "!cd" would not change the notebook's working directory.

cd t5-example-upload

Initialize Git Large File Storage

We are now inside our git repository, so we can initialize git-lfs with the following command.

!git lfs install
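Note that git lfs install sets up Git LFS in your git configuration (and installs its hooks when run inside a repository); which files actually go through LFS is decided by patterns with filter=lfs in the repository's .gitattributes file, and repositories created on Hugging Face usually start with one. The sketch below collects those patterns and checks a file name against them. It is a simplified approximation using fnmatch on bare file names, not git's full matching rules, and lfs_patterns and tracked_by_lfs are hypothetical names.

```python
import fnmatch


def lfs_patterns(gitattributes_text):
    """Collect the patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attributes = line.split()
        if "filter=lfs" in attributes:
            patterns.append(pattern)
    return patterns


def tracked_by_lfs(filename, patterns):
    """Simplified check: does the bare file name match any LFS pattern?"""
    return any(fnmatch.fnmatch(filename, pattern) for pattern in patterns)
```

You could read the text with open(".gitattributes").read() from inside the repository and check, for instance, whether "pytorch_model.bin" would be stored with LFS.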

Let's go back up one level before moving on to saving the model.

cd ..
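Why do the cd commands above have no exclamation mark? In Colab, each "!" command runs in a separate child shell, and a child process cannot change its parent's working directory, whereas a bare cd is handled by the notebook itself. The sketch below demonstrates the difference in plain Python, assuming a POSIX shell as on Colab; cd_in_child_shell and cd_in_process are hypothetical names.

```python
import os
import shlex
import subprocess


def cd_in_child_shell(target):
    """Mimic `!cd target`: run cd in a child shell and report whether our cwd changed."""
    before = os.getcwd()
    subprocess.run("cd " + shlex.quote(target), shell=True, check=True)
    return os.getcwd() != before  # the child's cd never affects this process


def cd_in_process(target):
    """Mimic a bare `cd target` in the notebook: change this process's own cwd."""
    os.chdir(target)
    return os.getcwd()
```

cd_in_child_shell always reports False, while cd_in_process really moves the current process into the target folder.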

Save the Model

Next, we can save the model (including its tokenizer) with one line of code by calling happy_tt's save() method. This method is the same for all "Happy objects", such as instances of the HappyTextClassification and HappyGeneration classes. We'll save the model files inside the folder containing our git repository, "t5-example-upload/".

happy_tt.save("t5-example-upload/")
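Before committing, it can be worth confirming that save() wrote the files you expect. The sketch below checks a folder against the file names shown in this article's git status output for the T5 example; those names come from that output, and other model types or library versions may save a different set. missing_files is a hypothetical helper.

```python
import os

# Taken from the git status output in this article (T5 example); other models may differ.
EXPECTED_T5_FILES = (
    "config.json",
    "pytorch_model.bin",
    "special_tokens_map.json",
    "spiece.model",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_files(folder, expected=EXPECTED_T5_FILES):
    """Return the expected file names that are not present as regular files in folder."""
    return [name for name in expected if not os.path.isfile(os.path.join(folder, name))]
```

After a successful save, missing_files("t5-example-upload/") should return an empty list.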

Commit Files

Let's go back into our git repository.

cd t5-example-upload

Assuming you've done everything correctly, the output of "git status" should look something like the following.

!git status

Result:

config.json
pytorch_model.bin
special_tokens_map.json
spiece.model
tokenizer.json
tokenizer_config.json

nothing added to commit but untracked files present (use "git add" to track)
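If you'd rather check for untracked files from a script than read git status by eye, git status --porcelain prints a stable, machine-readable format in which each untracked path starts with "?? ". A minimal parser for that format is sketched below; parse_untracked and untracked_in_repo are hypothetical names, and the parser ignores the quoting git applies to unusual file names.

```python
import subprocess


def parse_untracked(porcelain_output):
    """Return the paths marked untracked ('?? path') in `git status --porcelain` output."""
    return [line[3:] for line in porcelain_output.splitlines() if line.startswith("?? ")]


def untracked_in_repo(repo_dir):
    """Run `git status --porcelain` in repo_dir (requires git on PATH) and parse the result."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_untracked(result.stdout)
```

For the repository above, untracked_in_repo("t5-example-upload") would be expected to list the same six files as the output shown.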

Let's add all of these files to our next commit.

!git add .

When using Colab, I had to add my email address to git's config before proceeding. You can do so with the following command.

!git config --global user.email "vennify@example.com"

Finally, we can commit our changes!

!git commit -m "Initial commit"
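The add and commit steps can also be scripted. The sketch below builds the same two commands as argument lists and runs them with subprocess; instead of changing the global config, it can pass the email for a single commit with git's -c option. commit_commands and run_commands are hypothetical helper names, and the email is the placeholder used above.

```python
import subprocess


def commit_commands(message, email=None):
    """Build `git add .` and `git commit -m message`, optionally with a one-off user.email."""
    commit = ["git"]
    if email:
        # -c sets a config value for this single git invocation only
        commit += ["-c", f"user.email={email}"]
    commit += ["commit", "-m", message]
    return [["git", "add", "."], commit]


def run_commands(commands, repo_dir):
    """Run each command inside repo_dir, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Usage: run_commands(commit_commands("Initial commit", "vennify@example.com"), "t5-example-upload"). Passing arguments as lists avoids shell quoting problems with the commit message.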

Push

If you're using your own device, you may be able to get away with a simple "git push" command, as shown below.

git push

However, if you're using Google Colab (or possibly a similar service), you'll have to add your credentials to the git push command. Substitute your own username and access token; you can find your access token on this webpage.

Below is the general form of the final command you'll use to upload your model. If you created the repository under an organization, replace the second username with the organization's name.

!git push https://username:token@huggingface.co/username/model-name

Here's the command for the model we're uploading. Of course, I won't include my actual token, so let's just assume my token is "api_123".

!git push https://ericfillon:api_123@huggingface.co/vennify/t5-example-upload
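The push URL follows a fixed pattern: username and token, an @, the huggingface.co host, then the namespace and model name. The sketch below builds it and percent-encodes the credentials so that characters such as @ or : inside them don't break the URL. authenticated_push_url is a hypothetical helper, and "api_123" is the same made-up token as above.

```python
from urllib.parse import quote


def authenticated_push_url(username, token, namespace, model_name):
    """Build https://USERNAME:TOKEN@huggingface.co/NAMESPACE/MODEL_NAME for git push.

    namespace is your username or organization; credentials are percent-encoded.
    """
    user = quote(username, safe="")
    secret = quote(token, safe="")
    return f"https://{user}:{secret}@huggingface.co/{namespace}/{model_name}"
```

Keep in mind that a token embedded in a command can end up in shell history or in saved notebook output, so avoid sharing notebooks that contain it.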

And that's it! We just uploaded a model to Hugging Face's model distribution network.

Download

We can now download and use the model like any other publicly available model on the network.

happy_t5_example = HappyTextToText("T5", "vennify/t5-example-upload")

And from here we can call the generate_text() method to produce text.
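As a minimal sketch, assuming the Happy Transformer API used earlier plus its TTSettings class for generation options, the function below prepends the "grammar: " prefix used in the training data and returns the generated text. correct_grammar and add_grammar_prefix are hypothetical names, and the beam-search settings are illustrative rather than tuned. Since the example model saw only one training case, don't expect real corrections from it.

```python
def add_grammar_prefix(text):
    """Prepend the "grammar: " task prefix used in the training data, if missing."""
    prefix = "grammar: "
    return text if text.startswith(prefix) else prefix + text


def correct_grammar(text, model_name="vennify/t5-example-upload"):
    """Download the uploaded model and generate text for one input (needs network access)."""
    # Imported lazily so add_grammar_prefix works without happytransformer installed.
    from happytransformer import HappyTextToText, TTSettings

    happy_tt = HappyTextToText("T5", model_name)
    args = TTSettings(num_beams=5, min_length=1)  # illustrative settings, not tuned values
    result = happy_tt.generate_text(add_grammar_prefix(text), args=args)
    return result.text
```

Usage: correct_grammar("this sentences has bad grammar.") returns whatever the model generates as a plain string.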

Conclusion

And there we go: we just uploaded a model to Hugging Face's model distribution network. I hope you use what you learned to publish models to the network and help advance the NLP community. By uploading a model, you could save hundreds of NLP developers the work of fine-tuning a model themselves, and reduce energy consumption by decreasing the need to fine-tune models.

Join the Happy Transformer Discord community and share any models you publish. People will love to learn about them.

Check out this article for an idea on a good first model to upload.