Unsloth: The Two-Person Library Behind Most Local LLM Fine-Tuning

License Apache 2.0

TL;DR

2x faster fine-tuning and 30-70% less VRAM via hand-tuned Triton kernels. 7B in 5 GB QLoRA, 70B in 41 GB QLoRA on a single A6000.
Apache 2.0 core, supports Llama 4/3.x, Qwen 3.6, Gemma 4, Mistral, Phi 4, DeepSeek-V3/R1, GLM. Exports to GGUF, vLLM, Ollama, safetensors.
Built and maintained by Daniel and Michael Han, two brothers from Sydney. 64k+ stars on GitHub, ~2k new stars per month.

System Requirements

RAM	16GB
GPU	RTX 3090 / 4090 / A6000
VRAM	5GB (7B QLoRA)

If you run fine-tuning on a single GPU, you have almost certainly used Unsloth and you probably did not realize it is a two-person operation. Daniel and Michael Han ship the library that quietly turned LLM fine-tuning from "rent eight H100s for a week" into "borrow an RTX 4090 for an afternoon," then went home and did it again the next day. 64,000+ GitHub stars, two thousand more every month, full Apache-2.0 core. If the local-LLM scene has unpaid infrastructure on the inference side (the GGUF quant-masters), Unsloth is the equivalent on the training side.

What Unsloth actually does

Unsloth is a drop-in replacement for HuggingFace Transformers + PEFT specifically for the fine-tuning step. The headline claim from the official benchmarks is roughly 2x faster training and 30 to 70 percent less VRAM, with the speedups jumping further on short-sequence datasets and Qwen3-class models. The savings come from hand-tuned Triton kernels, smarter memory layout for LoRA adapters, and a packing implementation that beats Flash Attention 3 on most realistic training mixes.
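To make the packing idea concrete, here is a minimal sketch of sequence packing in general. It is not Unsloth's implementation: it assumes a simple greedy first-fit-decreasing strategy, and the function names and sample lengths are hypothetical. The point is only to show why packing short samples into full-length rows cuts the share of padding tokens.

```python
from typing import List


def pack_sequences(lengths: List[int], max_seq_length: int) -> List[List[int]]:
    """Greedy first-fit-decreasing packing of sample indices into rows.

    Each row holds samples whose token counts sum to at most max_seq_length.
    Illustrative only; not taken from Unsloth's code.
    """
    # Place the longest samples first; this usually wastes less space.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    rows: List[List[int]] = []
    free: List[int] = []  # remaining token capacity of each row
    for i in order:
        n = lengths[i]
        if n > max_seq_length:
            raise ValueError(f"sample {i} has {n} tokens, above max_seq_length")
        for r, room in enumerate(free):
            if n <= room:          # first row with enough room wins
                rows[r].append(i)
                free[r] -= n
                break
        else:                      # no existing row fits: open a new one
            rows.append([i])
            free.append(max_seq_length - n)
    return rows


def padding_fraction(lengths: List[int], max_seq_length: int, n_rows: int) -> float:
    """Share of token slots that are padding when the data fills n_rows rows."""
    return 1 - sum(lengths) / (n_rows * max_seq_length)


lengths = [1800, 300, 250, 1200, 700, 90]   # hypothetical token counts
packed = pack_sequences(lengths, max_seq_length=2048)
print(packed)                                # [[0, 5], [3, 4], [1, 2]]
print(padding_fraction(lengths, 2048, len(lengths)))  # one sample per row
print(padding_fraction(lengths, 2048, len(packed)))   # after packing
```

In this toy example six rows shrink to three, so the same tokens are processed in half the forward passes. A real trainer also has to mask attention so that samples packed into the same row do not attend to each other.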

Concrete numbers from the docs: a 7B model needs about 5 GB of VRAM for QLoRA (4-bit) and about 19 GB for 16-bit LoRA. A 70B model fits in 41 GB of VRAM with QLoRA, which means a single 48 GB A6000 can fine-tune Llama 3.3 70B. None of that is achievable with a stock HuggingFace pipeline on the same hardware.
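A quick back-of-the-envelope check shows where most of that VRAM goes. The sketch below counts only the bytes needed to hold the base weights, using nominal parameter counts (7B, 70B) and decimal gigabytes; both are simplifying assumptions. It ignores quantization metadata, LoRA adapters, optimizer state, and activations, so treat the results as lower bounds rather than predictions.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the base weights alone, in decimal GB (lower bound)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


for name, params, bits in [
    ("7B, 4-bit (QLoRA)", 7, 4),
    ("7B, 16-bit (LoRA)", 7, 16),
    ("70B, 4-bit (QLoRA)", 70, 4),
]:
    print(f"{name}: {weight_gb(params, bits):.1f} GB for weights")
# 7B, 4-bit (QLoRA): 3.5 GB for weights
# 7B, 16-bit (LoRA): 14.0 GB for weights
# 70B, 4-bit (QLoRA): 35.0 GB for weights
```

Set against the quoted totals of 5 GB, 19 GB, and 41 GB, the weights account for most of the footprint, and the remaining few gigabytes are what everything else has to fit into.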

Install and first run

Install

pip install unsloth

Python 3.9 or newer, PyTorch 2.0 or newer, and an NVIDIA GPU with CUDA compute capability 7.0 or newer. That covers everything from a V100 through Blackwell, including consumer cards such as the RTX 3090, 4090, and 5090 alongside the standard datacenter parts. Apple Silicon support is in progress; MLX training is marked "coming soon" on the requirements page.

Fine-tune Llama 3.1 8B in a few lines

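from unsloth import FastLanguageModel
import torch

# Load the base model through Unsloth's patched loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = 2048,   # longest training sequence, in tokens
    dtype = None,            # None auto-selects float16 or bfloat16 for the GPU
    load_in_4bit = True,     # 4-bit quantized base weights (the QLoRA path)
)

# Attach LoRA adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                  # LoRA rank
    target_modules = ["q_proj","k_proj","v_proj","o_proj",
                      "gate_proj","up_proj","down_proj"],
    lora_alpha = 16,         # LoRA scaling factor
    use_gradient_checkpointing = "unsloth",  # trades recompute for lower VRAM
)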

From there you point an SFTTrainer from TRL at your dataset and let it run. The library ships notebooks for every supported model on the model catalog page, so the cold-start path is "click the notebook, swap your dataset path, run."
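For a sense of how small the trainable part is with r = 16 on those seven projections, the sketch below counts the LoRA parameters by hand. The layer dimensions are assumptions taken from the public Llama 3.1 8B model config (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128), not from the article. A rank-r adapter on a d_in by d_out projection adds r * (d_in + d_out) parameters.

```python
R = 16               # LoRA rank, matching r = 16 in the snippet above
HIDDEN = 4096        # assumed hidden size (Llama 3.1 8B config)
INTERMEDIATE = 14336 # assumed MLP width
KV_DIM = 8 * 128     # 8 key/value heads x head dimension 128
LAYERS = 32          # assumed number of transformer layers

# (d_in, d_out) for each targeted projection in one layer
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A is (r x d_in) and B is (d_out x r), so r * (d_in + d_out) in total."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in SHAPES.values())
total = per_layer * LAYERS
print(f"{per_layer:,} per layer, {total:,} trainable parameters in total")
# 1,310,720 per layer, 41,943,040 trainable parameters in total
```

Roughly 42 million trainable parameters against about 8 billion frozen ones is why the adapters, their gradients, and their optimizer state add so little on top of the quantized base weights.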

Model support

The catalog is wide. The major families currently maintained: Llama 4, 3.3, 3.2, 3.1; Mistral 3.2, 3.1, Ministral; Gemma 4, 3n, 2; Qwen 3.6, 3.5, 3, 2.5; Phi 4; DeepSeek-V3 and R1; GLM. Vision-language models, text-to-speech models, and Whisper-class speech-to-text fine-tuning are also supported via the vision and multimodal collection.

Output formats matter as much as input support. Unsloth ships exporters for 4-bit, 8-bit, and 16-bit safetensors, GGUF for llama.cpp and Ollama, and the vLLM format for production serving. The path from "I fine-tuned a model" to "I am serving it on my own hardware" is one CLI command, not an afternoon of conversion scripts.

Beyond plain SFT

Reward modeling and preference optimization are first-class. The RLHF docs cover DPO, ORPO, KTO, and GRPO with example notebooks. Recent releases added vision fine-tuning and audio model training, and the catalog now includes speech models like Sesame and Orpheus. The pace is unusual for a project this size: notable feature drops land monthly, and minor releases land weekly.
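As a reminder of what the simplest of these objectives optimizes, here is a minimal pure-Python sketch of the standard DPO loss for a single preference pair. It illustrates the published formula, not code from Unsloth or TRL; the argument names are hypothetical, and each argument is the summed log-probability of a whole response under the policy or the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """-log(sigmoid(beta * margin)) for one (chosen, rejected) pair."""
    # How much more the policy prefers "chosen" over "rejected" than the reference does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Numerically stable softplus(-x), which equals -log(sigmoid(x)).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Policy identical to the reference: margin is 0, loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy has shifted probability toward the chosen answer: loss drops below log(2).
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

The loss falls as the policy widens the gap between chosen and rejected responses relative to the reference, with beta controlling how hard it is pushed away from the reference model.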

The two-person team

Unsloth is built by Daniel Han and his brother Michael, who are from Sydney and went through Y Combinator's summer 2024 batch. Daniel previously worked on optimized t-SNE and SVD at NVIDIA, which is exactly the right background for the hand-written kernel work that drives Unsloth's speedups. The pair posts kernel-level deep dives on the Unsloth blog and Twitter, and most of the framework's heavyweight performance jumps have been written up there before they ship.

Who uses Unsloth in the wild

The fingerprint shows up across the open-weights ecosystem. Hugging Face used Unsloth for SmolLM3-3B, and the documentation pages call out direct collaborations with Qwen, Mistral, NVIDIA, and Microsoft. Beyond the named users, scanning HuggingFace model cards for "fine-tuned with Unsloth" returns thousands of community results across every base family the library supports.

Language-specific fine-tuning is now trivial

One underrated use case is language-specific instruction tuning on a small base. If you want a Polish-speaking Llama 3.2 3B that performs better on Polish dialogue than the multilingual base, Unsloth is the path of least resistance. If you would rather start from a broader multilingual base, the Qwen3.5 documentation covers 201 languages and Gemma 4 covers 140, so those models already speak Polish (or Czech, or Slovenian, or any of the 200-language tail). Adding a few thousand Polish instruction-output pairs is a single notebook away. The catch most teams hit is dataset quality, not training infrastructure.
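Since dataset quality is the usual bottleneck, most of the real work happens before training starts. The sketch below shows one way to turn raw instruction-output pairs into the single-string rows that supervised fine-tuning typically consumes, dropping empty and duplicate pairs on the way. The prompt template, the field names ("instruction", "output", "text"), and the "</s>" end-of-sequence marker are illustrative assumptions; in a real run the template and EOS token should match the chosen base model's tokenizer.

```python
from typing import Dict, List

# Hypothetical prompt layout; use the base model's own chat template in practice.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{output}{eos}"


def to_training_rows(pairs: List[Dict[str, str]], eos_token: str) -> List[Dict[str, str]]:
    """Clean instruction-output pairs and render each as one training string."""
    seen = set()
    rows = []
    for pair in pairs:
        instruction = pair.get("instruction", "").strip()
        output = pair.get("output", "").strip()
        if not instruction or not output:
            continue                      # skip incomplete pairs
        key = (instruction, output)
        if key in seen:
            continue                      # skip exact duplicates
        seen.add(key)
        text = PROMPT_TEMPLATE.format(instruction=instruction, output=output, eos=eos_token)
        rows.append({"text": text})
    return rows


pairs = [
    {"instruction": "Przetłumacz na polski: Good morning", "output": "Dzień dobry"},
    {"instruction": "Przetłumacz na polski: Good morning", "output": "Dzień dobry"},  # duplicate
    {"instruction": "Podaj stolicę Polski.", "output": ""},                            # empty answer
    {"instruction": "Podaj stolicę Polski.", "output": "Warszawa"},
]
rows = to_training_rows(pairs, eos_token="</s>")
print(len(rows))         # 2
print(rows[0]["text"])
```

Appending the end-of-sequence token to every row matters: without it, a fine-tuned model tends not to learn where a response should stop.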

Limitations and gotchas

Unsloth is still a single-GPU framework at heart. Multi-GPU support exists but is less battle-tested than the single-GPU path. The "2x speedup" headline depends on the baseline: against a naive HuggingFace pipeline the speedup is sometimes much larger, and against a hand-tuned Flash Attention 3 pipeline it is smaller. The published methodology is honest about that. Finally, the GGUF and vLLM export paths assume a recent minor release of llama.cpp or vLLM; older runtime versions occasionally choke on tokenizer additions.

Who should care

If you fine-tune on anything smaller than an 8-GPU H100 node, you should be using Unsloth. The library is free, the model coverage is comprehensive, the export path lands on every common inference runtime, and the maintainer cadence is healthy. The cost of trying it is one pip install and a sample notebook. Pick a base model, grab a thousand examples, and have your first fine-tune running in the time it takes to read this article.

Sources and further reading

Numbers community-reported from Unsloth's published methodology and HuggingFace model card metadata. Not independently re-run. Compiled 2026-05-19.