Step-by-Step Guide to Running Llama LLMs with Hugging Face and Python Locally
There are numerous ways to run large language models (LLMs) such as DeepSeek, Mistral, or Meta's Llama locally on your laptop. Options like Ollama and Modular's MAX platform provide convenient ways to deploy these models, but if you want full control over your LLM experience, working directly with Python and the Hugging Face libraries is the best approach.
How to Run Llama in a Python App
To run a large language model like Llama locally within a Python application, follow these steps:
- Set up a Python environment with PyTorch, Hugging Face, and Transformers dependencies.
- Find the LLM on Hugging Face and identify the required files.
- Download the model files programmatically from the Hugging Face repository.
- Create a processing pipeline to load and interact with the model.
- Query the pipeline to generate text and interact with the LLM.
Install Transformers, PyTorch, and Hugging Face Dependencies
First, create a virtual Python environment and install the necessary dependencies with the following commands:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install huggingface_hub
pip install transformers
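To confirm the environment is ready before moving on, a quick check like the one below (a minimal sketch; the exact version numbers will differ on your machine) prints the installed versions:

import torch
import transformers
import huggingface_hub

# Print installed versions to confirm the dependencies resolve correctly
print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
print("huggingface_hub:", huggingface_hub.__version__)
print("CUDA available:", torch.cuda.is_available())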
Find the LLM on Hugging Face
The files required to run your LLM locally are listed on the model's Hugging Face page. You will need the model's repository ID and the names of the necessary files, which typically include:
- vocab.json
- merges.txt
- config.json
- model.safetensors
- special_tokens_map.json
- tokenizer.json
- tokenizer_config.json
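If you prefer to confirm which files a repository actually contains before downloading anything, the huggingface_hub client can list them programmatically. This is a minimal sketch; the repository ID and token below are placeholders, and gated repositories such as Llama require a token that has been granted access:

from huggingface_hub import list_repo_files

# List every file in the model repository to verify the names above
files = list_repo_files("meta-llama/Llama-2-7b-chat-hf", token="hf_XYZABCDEFGHIJKLMNOPQRSTUV")
for name in files:
    print(name)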
Implement Llama in Python
Once the dependencies are installed and the required files are identified, use the following Python code to download and run the LLM:
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoModelForCausalLM, TextGenerationPipeline

# Hugging Face access token (placeholder); Llama repositories are gated, so the token must have been granted access
API_KEY = 'hf_XYZABCDEFGHIJKLMNOPQRSTUV'
model_id = "meta-llama/Llama-2-7b-chat-hf"  # Replace with the repo ID of the model you want

# Download the model files and keep the local directory they were saved to
model_path = snapshot_download(repo_id=model_id, token=API_KEY)

# Load the model and tokenizer from the downloaded snapshot
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Build a text-generation pipeline and query it
text_pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
output = text_pipeline("Provide a summary of AI's impact on healthcare.")
print(output)
Interacting with Llama in Python
The example above prompts Llama for a summary of AI's impact on healthcare. The pipeline returns a list of dictionaries, each containing a generated_text field:
[{ "generated_text": "AI has significantly transformed healthcare by improving diagnostics, personalizing treatment, and enhancing administrative efficiencies."}]
This guide demonstrates how easy it is to run LLMs locally and integrate them into your Python code. By following these steps, you can harness the power of Llama models while maintaining full control over your deployment environment.