Step-by-Step Guide to Running Llama LLMs with Hugging Face and Python Locally
There are numerous ways to run large language models (LLMs) such as DeepSeek, Mistral, or Meta's Llama locally on your laptop. Options like Ollama and Modular's MAX platform provide convenient ways to deploy these models, but if you want full control over your LLM experience, working directly with Python and the Hugging Face libraries is the best approach.
How to Run Llama in a Python App
To run a large language model like Llama locally within a Python application, follow these steps:
- Set up a Python environment with PyTorch, Hugging Face, and Transformers dependencies.
- Find the LLM on Hugging Face and identify the required files.
- Download the model files programmatically from the Hugging Face repository.
- Create a processing pipeline to load and interact with the model.
- Query the pipeline to generate text and interact with the LLM.
Install Transformers, PyTorch, and Hugging Face Dependencies
First, create a virtual Python environment and install the necessary dependencies with the following commands:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install huggingface_hub
pip install transformers
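To confirm the environment is ready before moving on, a quick check like the one below (a minimal sketch; the exact version numbers will differ on your machine) prints the installed versions:

import torch
import transformers
import huggingface_hub

# Print installed versions to confirm the dependencies resolve correctly
print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
print("huggingface_hub:", huggingface_hub.__version__)
print("CUDA available:", torch.cuda.is_available())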
Find the LLM on Hugging Face
The files required to run your LLM locally are listed on the model's Hugging Face page. You will need the model's repository ID and the names of the necessary files, which typically include:
- vocab.json
- merges.txt
- config.json
- model.safetensors
- special_tokens_map.json
- tokenizer.json
- tokenizer_config.json
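If you prefer to confirm which files a repository actually contains before downloading anything, the huggingface_hub client can list them programmatically. This is a minimal sketch; the repository ID and token below are placeholders, and gated repositories such as Llama require a token that has been granted access:

from huggingface_hub import list_repo_files

# List every file in the model repository to verify the names above
files = list_repo_files("meta-llama/Llama-2-7b-chat-hf", token="hf_XYZABCDEFGHIJKLMNOPQRSTUV")
for name in files:
    print(name)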
Implement Llama in Python
Once the dependencies are installed and the required files are identified, use the following Python code to download and run the LLM:
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoModelForCausalLM, TextGenerationPipeline

# Hugging Face access token (placeholder); Llama repositories are gated, so the token must have been granted access
API_KEY = 'hf_XYZABCDEFGHIJKLMNOPQRSTUV'
model_id = "meta-llama/Llama-2-7b-chat-hf"  # Replace with the repo ID of the model you want

# Download the model files and keep the local directory they were saved to
model_path = snapshot_download(repo_id=model_id, token=API_KEY)

# Load the model and tokenizer from the downloaded snapshot
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Build a text-generation pipeline and query it
text_pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
output = text_pipeline("Provide a summary of AI's impact on healthcare.")
print(output)
Interacting with Llama in Python
The example above prompts Llama for a summary of AI's impact on healthcare. The pipeline returns a list of dictionaries, each containing a generated_text field:
[{ "generated_text": "AI has significantly transformed healthcare by improving diagnostics, personalizing treatment, and enhancing administrative efficiencies."}]
This guide demonstrates how easy it is to run LLMs locally and integrate them into your Python code. By following these steps, you can harness the power of Llama models while maintaining full control over your deployment environment.