- How to do Tokenizer Batch processing? - HuggingFace
In the Tokenizer documentation from Hugging Face, the call function accepts List[List[str]] and says: text (str, List[str], List[List[str]], optional): The sequence or batch of sequences to be encoded. Each sequence can be a string or a list of strings (pretokenized string).
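The two batch forms can be sketched as follows, assuming the transformers and tokenizers packages are installed. To run offline, it builds a toy word-level tokenizer from a hypothetical five-entry vocabulary rather than downloading a real checkpoint; with a real model you would load one with AutoTokenizer.from_pretrained(...) and call it the same way. A List[str] is treated as a batch of raw strings, while a List[List[str]] is treated as a batch of pre-tokenized sequences when is_split_into_words=True is passed.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

# Hypothetical toy vocabulary so the example runs without downloading a model.
vocab = {"[PAD]": 0, "[UNK]": 1, "hello": 2, "world": 3, "again": 4}
backend = Tokenizer(WordLevel(vocab=vocab, unk_token="[UNK]"))
backend.pre_tokenizer = Whitespace()  # split raw strings into words

tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=backend, pad_token="[PAD]", unk_token="[UNK]"
)

# Batch of raw strings (List[str]); pad every row to the longest sequence.
raw_batch = tokenizer(["hello world again", "hello"], padding=True)

# Batch of pre-tokenized sequences (List[List[str]]); each inner list is one sequence.
pretok_batch = tokenizer(
    [["hello", "world", "again"], ["hello"]],
    is_split_into_words=True,
    padding=True,
)

print(raw_batch["input_ids"])       # [[2, 3, 4], [2, 0, 0]]
print(raw_batch["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
print(pretok_batch["input_ids"])    # [[2, 3, 4], [2, 0, 0]]
```

Both calls produce the same padded batch here; the attention mask marks which positions are real tokens and which are padding.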
- huggingface hub - ImportError: cannot import name cached_download . . .
- How to change huggingface transformers default cache directory?
@juanchito Maybe you were thinking of something different, but creating an empty directory on a different filesystem with more capacity and then making a symlink from ~/.cache/huggingface to that directory does work, at least until you need to clear the cache for some reason and forget it was a symlink ;-) Setting HF_HOME is a bit cleaner, though, and works equally well on all platforms.
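To illustrate why HF_HOME moves everything at once, here is a simplified, stdlib-only re-implementation of the lookup order that huggingface_hub documents for its cache folders: HF_HOME if set, else $XDG_CACHE_HOME/huggingface, else ~/.cache/huggingface, with downloaded repos under a hub/ subfolder unless HF_HUB_CACHE overrides it. The function name resolve_hf_cache_dirs is hypothetical; this is not the library's own code, and the library may also honor older legacy variables.

```python
import os
from typing import Mapping, Tuple


def resolve_hf_cache_dirs(env: Mapping[str, str]) -> Tuple[str, str]:
    """Hypothetical sketch of how the Hugging Face cache folders are chosen.

    Assumed order: HF_HOME, else $XDG_CACHE_HOME/huggingface,
    else ~/.cache/huggingface. Repos go to <HF_HOME>/hub unless
    HF_HUB_CACHE is set explicitly.
    """
    default_cache = os.path.join(os.path.expanduser("~"), ".cache")
    xdg_cache = env.get("XDG_CACHE_HOME", default_cache)
    hf_home = os.path.expanduser(
        env.get("HF_HOME", os.path.join(xdg_cache, "huggingface"))
    )
    hub_cache = os.path.expanduser(
        env.get("HF_HUB_CACHE", os.path.join(hf_home, "hub"))
    )
    return hf_home, hub_cache


# Pointing HF_HOME at a bigger disk relocates the whole cache, no symlink needed.
print(resolve_hf_cache_dirs({"HF_HOME": "/mnt/bigdisk/huggingface"}))
```

In practice you would simply run `export HF_HOME=/mnt/bigdisk/huggingface` in the shell, or set os.environ["HF_HOME"] before importing transformers or huggingface_hub, because the library reads these variables when it is imported.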
- How to download a model from huggingface? - Stack Overflow
    from huggingface_hub import snapshot_download
    snapshot_download(repo_id="bert-base-uncased")
These tools make model downloads from the Hugging Face Model Hub quick and easy. For more information and advanced usage, refer to the official Hugging Face documentation for huggingface-cli and snapshot_download.
- Load a pre-trained model from disk with Huggingface Transformers . . .
I went to this site here, which shows the directory tree for the specific Hugging Face model I wanted. I happened to want the uncased model, but these steps should be similar for your cased version. Also note that my link is to a very specific commit of this model, just for the sake of reproducibility; there will very likely be a more up-to-date
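Once the files from the repository tree sit in one folder, that folder's path can be passed to from_pretrained in place of the model name. The folder needs at least config.json and the weight file, plus the tokenizer files if you also load the tokenizer. A minimal sketch, assuming transformers and PyTorch are installed: so that it runs offline, it saves a hypothetical tiny BERT with random weights instead of using downloaded files.

```python
import os
import tempfile

from transformers import AutoModel, BertConfig, BertModel

# Hypothetical tiny config so the example runs offline; a real local copy
# would instead hold the files downloaded from the model's repository.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(config)

model_dir = tempfile.mkdtemp()
model.save_pretrained(model_dir)  # writes config.json plus the weight file

# from_pretrained accepts a local folder; local_files_only=True forbids Hub lookups.
reloaded = AutoModel.from_pretrained(model_dir, local_files_only=True)

print(sorted(os.listdir(model_dir)))
print(type(reloaded).__name__, reloaded.config.hidden_size)
```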
- How to load huggingface model resource from local disk?
I wanted to load a Hugging Face model resource from local disk:
    from sentence_transformers import SentenceTransformer

    sentences = ["This is an example sentence."]  # hypothetical example input

    # initialize sentence transformer model
    # How to load 'bert-base-nli-mean-tokens' from local disk?
    model = SentenceTransformer('bert-base-nli-mean-tokens')

    # create sentence embeddings
    sentence_embeddings = model.encode(sentences)
- Huggingface: How do I find the max length of a model?
Now I even remember that I noticed this in the past (around the Hugging Face 2.x versions) but forgot about it. But I would assume that this maximum sequence length information is always stored in config.json, just under different keys, as it is integral to the model setup.
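A minimal, stdlib-only sketch of that assumption: scan a parsed config.json for the first of several candidate keys. The key list and the helper name find_max_length are assumptions for illustration, not an official API, and the list is not exhaustive.

```python
import json
from typing import Any, Mapping, Optional

# Assumed, non-exhaustive list of keys that different architectures use
# for the maximum number of positions; check your model's config.json.
MAX_LENGTH_KEYS = (
    "max_position_embeddings",  # e.g. BERT, RoBERTa
    "n_positions",              # e.g. GPT-2
    "n_ctx",
    "max_sequence_length",
    "seq_length",
)


def find_max_length(config: Mapping[str, Any]) -> Optional[int]:
    """Return the first integer stored under a known max-length key, else None."""
    for key in MAX_LENGTH_KEYS:
        value = config.get(key)
        if isinstance(value, int):
            return value
    return None


bert_like = json.loads('{"model_type": "bert", "max_position_embeddings": 512}')
gpt2_like = json.loads('{"model_type": "gpt2", "n_positions": 1024, "n_ctx": 1024}')
print(find_max_length(bert_like))  # 512
print(find_max_length(gpt2_like))  # 1024
```

Two caveats: the stored value can exceed the usable input length (RoBERTa checkpoints, for example, store 514 in max_position_embeddings but accept 512 tokens because of a padding offset), and the tokenizer carries its own limit in tokenizer.model_max_length.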
- How to add new tokens to an existing Huggingface tokenizer?
Thanks for this very comprehensive response. Two comments: 1. For the two examples above, "Extending existing AutoTokenizer with new bpe-tokenized tokens" and "Direct Answer to OP", you did not resize the embeddings; is that an oversight or is it intended? 2. After the embeddings have been resized, am I right that the model + tokenizer thus made needs to be fine-tuned because the new embeddings have
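A minimal sketch of the add-then-resize sequence the comment asks about, assuming transformers, tokenizers, and PyTorch are installed. The toy vocabulary, the two new tokens, and the tiny BERT config are hypothetical stand-ins so the example runs offline; with a real checkpoint you would load the tokenizer and model with from_pretrained and follow the same steps.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import BertConfig, BertModel, PreTrainedTokenizerFast

# Hypothetical toy tokenizer and tiny model so the example runs offline.
vocab = {"[PAD]": 0, "[UNK]": 1, "the": 2, "model": 3}
backend = Tokenizer(WordLevel(vocab=vocab, unk_token="[UNK]"))
backend.pre_tokenizer = Whitespace()
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=backend, pad_token="[PAD]", unk_token="[UNK]"
)
model = BertModel(
    BertConfig(
        vocab_size=len(tokenizer),
        hidden_size=32,
        num_hidden_layers=1,
        num_attention_heads=2,
        intermediate_size=64,
    )
)

old_size = len(tokenizer)
num_added = tokenizer.add_tokens(["tokenizer", "embedding"])  # count of truly new tokens

# Without this step the new ids would index past the end of the embedding matrix.
model.resize_token_embeddings(len(tokenizer))

new_id = tokenizer.convert_tokens_to_ids("tokenizer")
print(num_added, len(tokenizer), model.get_input_embeddings().weight.shape[0])
```

resize_token_embeddings only makes room for the new ids: the added rows are freshly initialized rather than learned (the exact initialization depends on the transformers version), so the model generally needs fine-tuning before those tokens carry useful meaning.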