How to Upload Models to Hugging Face's Model Distribution Network With Happy Transformer
This article will discuss how to upload a Transformer model created with Happy Transformer to Hugging Face's model distribution network. Hugging Face allows anyone to upload Transformer models to the cloud, where they can either be kept private or shared with the public. This means you can train a Transformer model with just a few lines of code using my very own Happy Transformer package and then upload the trained model to the network. Once the model has been uploaded, you can download and use it as before with either Hugging Face's Transformers library or Happy Transformer.
The process described in this article applies to all Happy Transformer models. So whether you're performing text classification, question answering, or text generation, the same steps apply.
Example Of Training a Model
To start, we'll briefly discuss how to train a Transformer model with Happy Transformer so that we have an example model to work with. We'll train a model to perform grammar correction with a single training case. Of course, this model will not have true grammar correction abilities and is only intended for demonstration purposes. I've already uploaded a fully trained grammar correction model that you can access here, and you can read about how to replicate it in this article.
First, we'll pip install Happy Transformer.
pip install happytransformer
The code below downloads and instantiates a T5 model, and then fine-tunes it with a single case. Please read this article, which fully explains the process.
from happytransformer import HappyTextToText
import csv
# Download and instantiate a pretrained T5 model
happy_tt = HappyTextToText("T5", "t5-base")
input_text = "grammar: this sentences has bad grammar."
expected_output_text = "this sentence has bad grammar."
# Write a single training case to a CSV file with "input" and "target" columns
with open("train.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["input", "target"])
    writer.writerow([input_text, expected_output_text])
# Fine-tune the model on the CSV file
happy_tt.train("train.csv")
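The single-case CSV above extends naturally to more training data. Below is a minimal sketch of a pair of helpers that write any number of (input, target) pairs in the same two-column format and read them back while checking the header. The names write_training_csv and read_training_csv are hypothetical, not part of Happy Transformer; only the "input" and "target" column names come from the code above, and the second training pair in the demo is illustrative.

```python
import csv
from typing import Iterable, List, Tuple


def write_training_csv(path: str, pairs: Iterable[Tuple[str, str]]) -> None:
    """Write (input, target) pairs to a CSV with an "input"/"target" header.

    Hypothetical helper that mirrors the single-case example above.
    """
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["input", "target"])
        for input_text, target_text in pairs:
            writer.writerow([input_text, target_text])


def read_training_csv(path: str) -> List[Tuple[str, str]]:
    """Read the pairs back, raising ValueError if the header is not input/target."""
    with open(path, newline="") as csvfile:
        reader = csv.reader(csvfile)
        header = next(reader)
        if header != ["input", "target"]:
            raise ValueError(f"unexpected header: {header}")
        return [(row[0], row[1]) for row in reader]


if __name__ == "__main__":
    # Illustrative data; the second pair is made up for this sketch.
    cases = [
        ("grammar: this sentences has bad grammar.", "this sentence has bad grammar."),
        ("grammar: she go to school yesterday.", "she went to school yesterday."),
    ]
    write_training_csv("train.csv", cases)
    print(read_training_csv("train.csv"))
```

The csv module handles quoting, so inputs containing commas or quotation marks round-trip safely. The resulting train.csv should be usable with happy_tt.train() in the same way as the single-case file.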
Upload
Let's now upload the model to Hugging Face's network! Some of the commands going forward are intended to be run within a terminal. If you're using Google Colab, you can add an exclamation mark before a command within a cell to run it as a terminal command. For clarity, I'll be adding exclamation marks to commands meant for the terminal.
Connect to Hugging Face
Next, we're going to pip install a package called huggingface_hub, which will allow us to communicate with Hugging Face's model distribution network.
!pip install huggingface_hub
From here, we can log in with our Hugging Face credentials. You can create an account here if you do not already have one.
!huggingface-cli login
You'll now be prompted within the terminal to provide your credentials. (Recent versions of huggingface_hub ask for an access token here rather than a password.)
Install Git Large File Storage
From here, we need to install git-lfs (Git Large File Storage), which allows us to use git for large files. Since we're working with Transformer models, whose weight files are large, we clearly need git-lfs: Hugging Face's repositories only accept files larger than 10 MB when they are tracked with git-lfs.
!sudo apt-get install git-lfs
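To see which files in a folder actually cross that threshold, a small script can walk the folder and report them. Below is a minimal sketch; the helper name files_needing_lfs is hypothetical, and treating "10 MB" as 10,000,000 bytes is an assumption (a slightly conservative reading). Hugging Face repositories are typically created with a .gitattributes file that already marks common model-file extensions such as *.bin for git-lfs, so this is a sanity check rather than a required step.

```python
import os
from typing import List, Tuple

# Assumption: the 10 MB limit mentioned above, read as decimal megabytes.
LFS_THRESHOLD_BYTES = 10_000_000


def files_needing_lfs(folder: str, threshold: int = LFS_THRESHOLD_BYTES) -> List[Tuple[str, int]]:
    """Return sorted (relative_path, size_in_bytes) pairs for files larger than threshold.

    Hypothetical helper; git's own .git directory is skipped.
    """
    large = []
    for root, dirs, files in os.walk(folder):
        # Prune .git in place so os.walk does not descend into it.
        dirs[:] = [d for d in dirs if d != ".git"]
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size > threshold:
                large.append((os.path.relpath(path, folder), size))
    return sorted(large)


if __name__ == "__main__":
    # Prints nothing if the folder does not exist yet or has no large files.
    for rel_path, size in files_needing_lfs("t5-example-upload"):
        print(f"{rel_path}: {size / 1_000_000:.1f} MB")
```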
Create Git Repository
Now, we must create a git repository on Hugging Face's network. Replace model-name with the name you want to give your model. In this case, I'll be using the name "t5-example-upload".
!huggingface-cli repo create model-name
If you're creating the repository under an organization, like I am, then you can add the --organization flag as shown below.
!huggingface-cli repo create t5-example-upload --organization vennify
From here, we can clone the git repository directly into our environment.
!git clone https://huggingface.co/vennify/t5-example-upload
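The clone URL follows the pattern https://huggingface.co/namespace/model-name, where the namespace is your username or, as here, the organization name. Below is a minimal sketch of building the repository id and clone URL from those two parts; the helper names hf_repo_id and hf_clone_url are hypothetical.

```python
def hf_repo_id(namespace: str, model_name: str) -> str:
    """Combine a username or organization with a model name, e.g. "vennify/t5-example-upload"."""
    if "/" in namespace or "/" in model_name:
        raise ValueError("namespace and model_name must not contain '/'")
    return f"{namespace}/{model_name}"


def hf_clone_url(namespace: str, model_name: str) -> str:
    """URL passed to git clone for a model repository on huggingface.co."""
    return f"https://huggingface.co/{hf_repo_id(namespace, model_name)}"


if __name__ == "__main__":
    print(hf_clone_url("vennify", "t5-example-upload"))
    # https://huggingface.co/vennify/t5-example-upload
```

The repository id on its own ("vennify/t5-example-upload") is the same string we later pass to HappyTextToText to download the uploaded model.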
In my case, this will create a folder called "t5-example-upload", which we can now enter with the change directory ("cd") command. If you're within Colab, do not add an exclamation mark before this command: a "!" command runs in a temporary subshell, so the directory change would not persist.
cd t5-example-upload
Initialize Git Large File Storage
We are now within our git repository. So, from here, we can initialize git lfs with the following command.
!git lfs install
Let's head back up a level before we move on to saving the model.
cd ..
Save the Model
From here, we can save the model (including its tokenizer) with one line of code by calling happy_tt's save() method. This method is the same for all "Happy objects", such as instances of the HappyTextClassification and HappyGeneration classes. We'll save the model and its files within the folder containing our git repository, "t5-example-upload/".
happy_tt.save("t5-example-upload/")
Commit Files
Let's go back into our git repository.
cd t5-example-upload
Assuming you've done everything correctly, the output of "git status" should look something like the following.
!git status
Result:
config.json
pytorch_model.bin
special_tokens_map.json
spiece.model
tokenizer.json
tokenizer_config.json
nothing added to commit but untracked files present (use "git add" to track)
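If git status lists fewer files than expected, the save step may not have written into the repository folder. Below is a minimal sketch that checks for the files listed above before running git add. The file list is the one from this T5 example, so other architectures, or newer versions of Transformers (which may write model.safetensors instead of pytorch_model.bin), will differ; the helper name missing_model_files is hypothetical.

```python
import os
from typing import Iterable, List

# Files that git status showed for this T5 example; adjust for other models.
EXPECTED_T5_FILES = [
    "config.json",
    "pytorch_model.bin",
    "special_tokens_map.json",
    "spiece.model",
    "tokenizer.json",
    "tokenizer_config.json",
]


def missing_model_files(folder: str, expected: Iterable[str] = EXPECTED_T5_FILES) -> List[str]:
    """Return the expected file names that are not present as files in folder."""
    return [name for name in expected if not os.path.isfile(os.path.join(folder, name))]


if __name__ == "__main__":
    missing = missing_model_files("t5-example-upload")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```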
Let's add all of these files to our next commit.
!git add .
When using Colab, I had to add my email address to the git config before proceeding. This can be accomplished with the following command.
!git config --global user.email "vennify@example.com"
Finally, we can commit our changes!
!git commit -m "Initial commit"
Push
If you're using your own device, you may be able to get away with a simple "git push" command, as shown below.
git push
However, if you're using Google Colab (or possibly a similar service), you'll have to add your credentials to the git push command. Below is the general form of the command you'll use to upload your model; of course, substitute your own username, token, and model name. You can find your access token on this webpage. If you created the repository under an organization, replace the second username with the organization's name.
!git push https://username:token@huggingface.co/username/model-name
Here's the command for the model we're uploading. Of course, I won't include my actual token, so we'll assume my token is "api_123".
!git push https://ericfillon:api_123@huggingface.co/vennify/t5-example-upload
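Because a username or token could contain characters with special meaning in a URL (such as "@" or "/"), it can be safer to build the authenticated push URL programmatically. Below is a minimal sketch that percent-encodes the credentials with Python's urllib.parse.quote; the helper name hf_push_url is hypothetical. Keep in mind that a token embedded in a command can end up in your shell or notebook history, so avoid sharing notebooks that contain it.

```python
from urllib.parse import quote


def hf_push_url(username: str, token: str, namespace: str, model_name: str) -> str:
    """Build https://username:token@huggingface.co/namespace/model_name.

    namespace is your username, or the organization name if the repository
    was created under an organization. Credentials are percent-encoded so
    characters such as "@" or "/" cannot break the URL.
    """
    user = quote(username, safe="")
    secret = quote(token, safe="")
    return f"https://{user}:{secret}@huggingface.co/{namespace}/{model_name}"


if __name__ == "__main__":
    # Placeholder token from the example above; never hard-code a real one.
    print(hf_push_url("ericfillon", "api_123", "vennify", "t5-example-upload"))
    # https://ericfillon:api_123@huggingface.co/vennify/t5-example-upload
```

In Colab, IPython expands Python variables wrapped in braces inside "!" commands, so after url = hf_push_url(...) you could run !git push {url}.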
And that's it! We just uploaded a model to Hugging Face's model distribution network.
Download
We can now download and use the model like any other model that's publicly available on the network.
happy_t5_example = HappyTextToText("T5", "vennify/t5-example-upload")
And from here we can call the generate_text() method to produce text.
Conclusion
And there we go, we just uploaded a model to Hugging Face's model distribution network. I hope you use what you learned to publish models to the network and help advance the NLP community. By uploading a model, you may save hundreds of NLP developers the work of fine-tuning a model themselves, and you'll also reduce energy consumption by decreasing the need for others to fine-tune the same models.
JOIN THE HAPPY TRANSFORMER DISCORD COMMUNITY AND SHARE ANY MODELS YOU PUBLISH. PEOPLE WILL LOVE TO LEARN ABOUT THEM.
Check out this article for an idea on a good first model to upload.
Resources
Support Happy Transformer by giving it a star
Subscribe to my YouTube Channel for an upcoming video on grammar correction.