HY-MT1.5

HY-MT1.5 is Tencent's open-source translation family covering 33 languages and 5 dialects, from a 7B server model down to a 440MB GGUF that runs on phones.

Tencent quietly opened a translation race nobody on the open-source side was equipped to enter. The HY-MT1.5 family covers everything from a 7B server model down to a 440MB GGUF that runs on a phone, and the small ones beat most commercial APIs at Chinese-foreign translation. If you build agents, RAG pipelines, or apps that touch more than one language, this family deserves a closer look this week.

What HY-MT1.5 actually is

HY-MT1.5 is Tencent Hunyuan's open-source machine-translation family. The base release on 2025-12-30 shipped two checkpoints, HY-MT1.5-7B and HY-MT1.5-1.8B, both trained through a four-stage pipeline of MT-oriented pre-training, supervised fine-tuning, on-policy distillation, and reinforcement learning. Both cover 33 languages and 5 ethnic or regional variants, which works out to 1,056 supported translation directions.

On 2026-04-29 Tencent extended the lineup downward with two extreme quantizations of the 1.8B model: a 2-bit GGUF at 574MB and a 1.25-bit GGUF at 440MB. Those two are the on-device variants that run in roughly 1GB of RAM, and they ship with an Android demo APK so you can try them without writing a line of code.

The full quantization family at a glance

Here is every public variant we are aware of, with the file size you actually download and the deployment niche each one targets.

| Variant | Format | Size | Approx. RAM | Best for |
|---|---|---|---|---|
| HY-MT1.5-7B | BF16 weights | 14.2GB | 16GB+ GPU | Server-side, highest ceiling |
| HY-MT1.5-7B-FP8 | FP8 weights | 7.5GB | 10GB GPU | Cost-tuned cloud serving |
| HY-MT1.5-7B-GPTQ-Int4 | Int4 weights | 4.8GB | 6GB GPU | Single-GPU laptop inference |
| HY-MT1.5-1.8B | BF16 weights | 3.3GB | 6GB GPU or 8GB RAM | Default edge-server pick |
| HY-MT1.5-1.8B-FP8 | FP8 weights | 1.9GB | 4GB GPU | Compact cloud worker |
| HY-MT1.5-1.8B-GGUF Q8_0 | GGUF 8-bit | 1.91GB | 3GB | llama.cpp/Ollama desktops |
| HY-MT1.5-1.8B-GGUF Q6_K | GGUF 6-bit | 1.47GB | 2GB | Quality-balanced local inference |
| HY-MT1.5-1.8B-GGUF Q4_K_M | GGUF 4-bit | 1.13GB | 2GB | Mid-range laptops, Raspberry Pi 5 |
| HY-MT1.5-1.8B-2bit | SEQ 2-bit GGUF | 574MB | ~1GB | Mid-tier phones, Apple Silicon |
| HY-MT1.5-1.8B-1.25bit | SEQ 1.25-bit GGUF | 440MB | ~1GB | Low-RAM phones, embedded |

Everything published lives under the tencent organization on Hugging Face, with mirrored quantizations under the AngelSlim org for the SEQ variants.

Why the 2-bit and 1.25-bit variants are interesting

Most teams quantize by clamping a normal distribution and accepting some quality loss. Tencent's AngelSlim group went a different route called Stretched Elastic Quantization, or SEQ. Weights are projected onto the four-value codebook {-1.5, -0.5, 0.5, 1.5}, which sounds aggressive because it is. The trick is pairing it with quantization-aware distillation so the smaller student matches the BF16 teacher token by token during the squeeze. The published claim is "near-lossless translation quality" against the BF16 baseline.
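
To make the codebook concrete, here is a toy Python sketch of the projection step. The per-row scale rule and nearest-neighbor rounding are our assumptions for illustration; AngelSlim's published SEQ kernel and the distillation loop around it are more involved.

import numpy as np

# Four-value SEQ codebook from the AngelSlim release.
CODEBOOK = np.array([-1.5, -0.5, 0.5, 1.5])

def quantize_row(w):
    """Map one weight row to 2-bit codebook indices plus a scale.
    The mean-ratio scale is an assumed placeholder, not Tencent's rule."""
    scale = np.abs(w).mean() / np.abs(CODEBOOK).mean()
    idx = np.abs(w[:, None] / scale - CODEBOOK[None, :]).argmin(axis=1)
    return idx.astype(np.uint8), scale

def dequantize_row(idx, scale):
    return CODEBOOK[idx] * scale

w = np.random.randn(8).astype(np.float32)
idx, scale = quantize_row(w)
print(w.round(3))
print(dequantize_row(idx, scale).round(3))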

Two consequences matter for builders:

  • The 1.25-bit variant fits inside a 1GB RAM budget, which is the floor for many Android handsets sold in emerging markets.
  • The same SEQ recipe maps cleanly to Arm SME2 instructions, so flagship Apple Silicon and vivo X300 hardware get a real speed bump rather than a generic Q-format fallback.

Tencent reports an average 0.18-second response for Chinese inputs of around 50 tokens on the quantized 1.8B. That is well inside the latency budget for live chat translation, AR captions, or call overlays.

Hands-on: pick a variant in five minutes

The GGUF builds run in any current llama.cpp or Ollama setup. The chat template is custom, so use the Tencent-published prompt rather than the generic instruct format.

Run it under llama.cpp

llama-cli -hf tencent/HY-MT1.5-1.8B-GGUF:Q8_0 \
  -p "Translate the following segment into Chinese, without additional explanation.\n\nIt's on the house." \
  -n 4096 --temp 0.7 --top-k 20 --top-p 0.6 --repeat-penalty 1.05 --no-warmup

Run it under Ollama

Ollama needs the right TEMPLATE because the model uses Tencent's custom delimiters. Save the two lines below as a Modelfile, then create and run the model.

FROM hf.co/tencent/HY-MT1.5-1.8B-GGUF:Q8_0
TEMPLATE """<|hy_begin_of_sentence|>{{ if .System }}{{ .System }}<|hy_place_holder_no_3|>{{ end }}{{ if .Prompt }}<|hy_User|>{{ .Prompt }}{{ end }}<|hy_Assistant|>"""

ollama create hy-mt1.5 -f Modelfile
ollama run hy-mt1.5 "Translate the following segment into German, without additional explanation.\n\nLong-tail keywords are the secret sauce of niche SEO."

Run the 2-bit on a phone

Tencent ships a prebuilt Android APK that wraps the 1.25-bit GGUF, so you do not need a native llama.cpp build to demo it. Sideload the APK from the AngelSlim release page, point the picker at the downloaded model file, and choose source and target languages. The tested handsets are a Snapdragon 865 with 8GB RAM and a Snapdragon 7+ Gen 2 with 16GB RAM; both delivered sub-second responses for short sentences.

Whichever runtime you use, keep the recommended decoding parameters, the same ones the llama.cpp invocation above passes:

{
  "top_k": 20,
  "top_p": 0.6,
  "repetition_penalty": 1.05,
  "temperature": 0.7
}
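
If you embed the GGUF in your own process instead of shelling out, the same parameters carry over. A minimal sketch with llama-cpp-python, assuming you have downloaded the Q8_0 file locally; since this is raw completion mode, we wrap the prompt in Tencent's delimiters from the Modelfile above by hand.

from llama_cpp import Llama

# Path is a placeholder for wherever you saved the GGUF.
llm = Llama(model_path="./HY-MT1.5-1.8B.Q8_0.gguf", n_ctx=4096, verbose=False)

text = "It's on the house."
# Apply the custom chat delimiters manually (see the Modelfile TEMPLATE).
prompt = (
    "<|hy_begin_of_sentence|><|hy_User|>"
    "Translate the following segment into Chinese, without additional explanation.\n\n"
    f"{text}<|hy_Assistant|>"
)
out = llm(prompt, max_tokens=256, temperature=0.7, top_k=20, top_p=0.6,
          repeat_penalty=1.05)
print(out["choices"][0]["text"])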

Prompt patterns Tencent supports out of the box

The training corpus includes four prompt shapes. Use the one that matches your task or the model will sometimes echo the source text.

  • Plain translation: Translate the following segment into {target_language}, without additional explanation.
  • Terminology-locked: include a glossary line such as Reference: "RAG" should be translated as "检索增强生成" before the source text.
  • Document-context: prepend the surrounding paragraph and tell the model not to translate the context, only the highlighted segment.
  • Format-preserving: wrap the source in <source> tags with inline <sn> markers; the model returns a <target> block with the same structural tags in place.

The format-preserving mode is the one to remember if you are localizing UI strings or HTML fragments. It removes the post-processing step where most home-grown MT pipelines lose tags.
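
As a sketch of wiring the format-preserving shape into a pipeline, here is a small helper that posts to Ollama's /api/generate endpoint against the hy-mt1.5 model created earlier. The <source> wrapper follows the description above; the exact <s1>, <s2> numbering inside the fragment is our assumption for illustration.

import requests

def translate_preserving_tags(fragment, target_language):
    # Wrap the source in <source> tags as the format-preserving mode expects;
    # the inline <sn> markers are already embedded in the fragment.
    prompt = (
        f"Translate the following segment into {target_language}, "
        "without additional explanation.\n\n"
        f"<source>{fragment}</source>"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "hy-mt1.5", "prompt": prompt, "stream": False},
        timeout=120,
    )
    return resp.json()["response"]

print(translate_preserving_tags(
    "Click <s1>here</s1> to manage your <s2>subscription</s2>.", "German"))

Per the behavior described above, the reply should be a <target> block with the <s1> and <s2> tags sitting around the translated words, which is exactly the property that saves the post-processing step.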

Benchmarks: how it stacks up

The headline number from the technical report is that HY-MT1.5-1.8B reaches roughly 90 percent of Gemini-3.0-Pro's score on the Flores-200 Chinese-foreign benchmark, the closed-model reference Tencent uses for translation quality. Against open-source competitors and commercial APIs, the same model ranks above Tower-Plus-72B, Qwen3-32B, Microsoft Translator, and Doubao Translator on the Chinese-foreign split.

Two qualifiers matter:

  • The Flores-200 split they highlight is Chinese-foreign, where Tencent has the most training signal. Expect a smaller gap on Latin script pairs.
  • The lineage matters. The previous Hunyuan-MT generation had already outperformed Google Translate in 30 of 31 evaluated language pairs at WMT25. HY-MT1.5 widens the lead while shrinking the model.

If you want the exact tables, the HY-MT1.5 technical report on arXiv is the right primary source.

The 33 languages and 5 dialects

Officially supported languages cover the major commercial corridors plus several South Asian and Southeast Asian scripts that most open MT models still drop:

Chinese, Traditional Chinese, English, French, Portuguese, Spanish, Japanese, Turkish, Russian, Arabic, Korean, Thai, Italian, German, Vietnamese, Malay, Indonesian, Filipino, Hindi, Polish, Czech, Dutch, Khmer, Burmese, Persian, Gujarati, Urdu, Telugu, Marathi, Hebrew, Bengali, Tamil, and Ukrainian.

The five dialect or regional variants extend coverage into Cantonese, Tibetan, Kazakh, Mongolian, and Uyghur (with Traditional Chinese counted in the main language list above). Cantonese in particular is rare in production MT and useful for Hong Kong, Macau, and southern Chinese audiences.

Limitations and gotchas

  • Custom chat template. The Hugging Face GGUF page documents this, but the default Ollama Modelfile produces garbage output ("onse" loops) until you replace the TEMPLATE with the snippet above.
  • Not a general LLM. The model is fine-tuned hard for translation. It will refuse or mangle reasoning prompts. Pair it with a small instruct model in your pipeline if you need both.
  • Latin pairs are good but not category-leading. Strong against commercial APIs on Chinese-foreign, more competitive than dominant on English-French or English-German. Run your own A/B against your traffic mix before swapping a vendor.
  • SEQ quantization needs Arm SME2 or a recent llama.cpp. Older mobile chips will still run the model but will not hit the advertised latency.

Who should use which variant

  • Cloud SaaS replacing a paid translation API: HY-MT1.5-7B FP8 on a single H100 or RTX 6000 Ada gives you the quality ceiling without the cost.
  • Self-hosted on a workstation or homelab: HY-MT1.5-1.8B GGUF Q6_K hits the sweet spot of quality and 2GB footprint. Runs comfortably on Apple Silicon and on a 12GB RTX 3060.
  • On-device app or embedded device: HY-MT1.5-1.8B-2bit at 574MB is the default. Drop to the 1.25-bit if you must fit under 500MB.
  • Building an agent that calls translation as a tool: the 1.8B BF16 served via vLLM keeps latency tight and concurrency high.
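
If you want that decision encoded in tooling, a trivial picker keyed on the memory budget might look like the following. The names and thresholds come from the table above; the helper itself is hypothetical.

# Hypothetical helper: largest variant that fits a given memory budget (GB).
VARIANTS = [
    (16.0, "HY-MT1.5-7B (BF16)"),
    (10.0, "HY-MT1.5-7B-FP8"),
    (6.0, "HY-MT1.5-1.8B (BF16)"),
    (3.0, "HY-MT1.5-1.8B-GGUF Q8_0"),
    (2.0, "HY-MT1.5-1.8B-GGUF Q6_K"),
    (1.0, "HY-MT1.5-1.8B-2bit (SEQ)"),
]

def pick_variant(budget_gb):
    for needed_gb, name in VARIANTS:
        if budget_gb >= needed_gb:
            return name
    return "HY-MT1.5-1.8B-1.25bit (SEQ)"

print(pick_variant(2.5))  # HY-MT1.5-1.8B-GGUF Q6_K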

What to do in the next ten minutes

  1. Pull the GGUF that fits your hardware: ollama run hf.co/tencent/HY-MT1.5-1.8B-GGUF:Q4_K_M for laptops, or grab the 2-bit GGUF for phones.
  2. Drop the custom Modelfile template above into place so output stops returning "onse" loops.
  3. Run the format-preserving prompt against one of your real localization strings and compare to your current pipeline.

If the quality gap is large enough, you have a credible path to retiring a commercial translation line item this quarter.

Tested on: vendor-reported benchmarks (Snapdragon 865 8GB, Snapdragon 7+ Gen 2 16GB, Apple M4, vivo X300) and Tencent's Flores-200 Chinese-foreign evaluation. Variants surveyed: HY-MT1.5-7B BF16/FP8/Int4, HY-MT1.5-1.8B BF16/FP8, GGUF Q4_K_M/Q6_K/Q8_0, SEQ 2-bit and 1.25-bit.
Date tested: 2026-05-02
