ToxiCN Family
§ 1 About the Project
Online toxic language causes tangible harm to individuals and communities, and reliable detection underpins responsible content moderation as well as the safety alignment of language models. Progress in Chinese, however, has long been held back by the lack of large-scale, fine-grained, openly available resources, particularly for indirect phenomena (homophones, abbreviated slurs, sarcasm, dog-whistle references) and for harmful memes, where toxicity may sit in the text, in the image, or only in their combination.
The ToxiCN Family is our ongoing effort to close this gap along two complementary axes:
(i) Text — ToxiCN (ACL 2023): a hierarchical taxonomy (toxic / hate / targeted group / expression form) paired with a manually curated Chinese corpus covering both direct and indirect toxicity, and a knowledge-enhanced baseline (TKE).
(ii) Multimodal — ToxiCN_MM (NeurIPS 2024): a 12K image–text meme dataset annotated for harmful types and modality combinations, together with a Multimodal Knowledge-Enhancement Detector designed for Chinese cultural context.
Both resources are intended as reproducible reference points rather than final solutions, and have since been adopted by a growing line of follow-up work, summarised in the Cited By section below.
§ 2 ToxiCN (ACL 2023)
Paper: Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks
Hierarchical Taxonomy
- Level 1 — Toxic vs. Non-toxic — Binary judgement on whether a post is toxic in any form.
- Level 2 — Hate vs. General Offensive — Among toxic posts, separate group-targeted hate speech from general offensive language (insults without a protected-group target).
- Level 3 — Targeted Group — For hate speech, the protected group attacked: gender, race, region, LGBTQ, others.
- Level 4 — Expression Form — Direct expression vs. indirect expression (cloaked: homophones, abbreviations, irony, dog-whistles).
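The four levels are nested: Level 2 applies only to toxic posts, and Level 3 only to hate speech. A minimal Python sketch of one way to hold such a label and check that it is internally consistent is given here. The class and field names (`ToxiCNLabel`, `is_toxic`, `toxic_type`, and so on) are illustrative assumptions, not the field names of the released files, and treating the expression form as optional for toxic posts is likewise an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ToxicType(Enum):
    HATE = "hate"
    GENERAL_OFFENSIVE = "general_offensive"


class TargetGroup(Enum):
    GENDER = "gender"
    RACE = "race"
    REGION = "region"
    LGBTQ = "lgbtq"
    OTHERS = "others"


class ExpressionForm(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"


@dataclass(frozen=True)
class ToxiCNLabel:
    """Hypothetical container for one post's four-level annotation."""
    is_toxic: bool                                     # Level 1
    toxic_type: Optional[ToxicType] = None             # Level 2
    target_group: Optional[TargetGroup] = None         # Level 3
    expression_form: Optional[ExpressionForm] = None   # Level 4

    def validate(self) -> None:
        """Raise ValueError if a deeper level contradicts its parent level."""
        deeper = (self.toxic_type, self.target_group, self.expression_form)
        if not self.is_toxic:
            if any(value is not None for value in deeper):
                raise ValueError("non-toxic posts carry no deeper labels")
            return
        if self.toxic_type is None:
            raise ValueError("toxic posts need a Level 2 label")
        if self.toxic_type is ToxicType.HATE and self.target_group is None:
            raise ValueError("hate speech needs a Level 3 target group")
        if (self.toxic_type is ToxicType.GENERAL_OFFENSIVE
                and self.target_group is not None):
            raise ValueError("general offensive posts have no target group")

    def at_level(self, level: int) -> tuple:
        """Project the label onto its first `level` levels (1 to 4)."""
        if not 1 <= level <= 4:
            raise ValueError("level must be between 1 and 4")
        fields = (self.is_toxic, self.toxic_type,
                  self.target_group, self.expression_form)
        return fields[:level]
```

Calling `at_level(1)` on any label yields the binary toxic / non-toxic view, while `at_level(4)` keeps the full annotation, so the same records can serve evaluation at each granularity.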
Dataset
ToxiCN is a manually annotated Chinese corpus of online posts spanning multiple platforms (e.g. Zhihu, Tieba). Each post is labelled under a four-level hierarchical scheme so that downstream models can be trained or evaluated at any granularity, from binary toxic detection to fine-grained targeted-group / expression-form classification.
| Property | Value |
|---|---|
| Total samples | 12,011 online posts |
| Toxic samples | 6,461 (≈53.8%): hate or general offence |
| Annotation axes | 4-level hierarchical taxonomy |
| Target categories | gender · race · region · LGBTQ · others |
| Language | Simplified Chinese |
| License | released for academic research |
Insult Lexicon
Beyond direct slurs harvested from prior lexicons, ToxiCN systematically derives an extended Chinese insult lexicon by tracing how online users disguise toxic intent. We catalogue six recurring derivation patterns and treat each derived form as a first-class lexicon entry, so downstream models can resolve them back to their actual referents:
Derivation Patterns
- Homophonic substitution: Replace one or more characters with phonetically identical or similar ones to bypass keyword filters (e.g. 南满, nán mǎn, written for 南蛮, nán mán).
- Compositional decomposition: Exploit the sub-components of a character, so that the surface character stands for the words formed by its parts (e.g. 默 = 黑 + 犬).
- Cross-lingual abbreviation: Use Pinyin initials or English / mixed-script abbreviations to encode a slur (e.g. txl → 同性恋).
- Hybrid Chinese–English splicing: Fuse a Chinese morpheme with an English fragment to reconstruct a slur (e.g. ni + ger → ni哥).
- Historical / cultural allusion: Re-purpose a regional or historical name to demean a group (e.g. 南蛮, "southern barbarians").
- Conspiracy / meme reference: Invoke a memeified narrative as a coded slur (e.g. Kalergi → racial conspiracy meme).
Worked Examples
| Term | Literal Meaning | Composition | Actual Meaning | Category |
|---|---|---|---|---|
| 默(mò) | silence | 黑(hēi) 犬(quǎn) → black dog | n*gger | racial |
| 南(nán) 满(mǎn) | South Manchu | 南满 → 南蛮(mán) | southern barbarians | regional |
| 蠢驴 | silly donkey | — | foolish people | general |
| txl | txl | txl → 同(tóng) 性(xìng) 恋(liàn) | gay | lgbtq |
| ni 哥(gē) | ni brother | ni + ger → n*gger | n*gger | racial |
| 小(xiǎo) 仙(xiān) 女(nǚ) | fairy | — | shrew | sexual |
| 凯(kǎi) 勒(lè) 奇(qí) | Kalergi | — | Kalergi Plan | racial |
Sensitive English referents are masked (e.g. n*gger) for public display; full forms are kept inside the released lexicon files.
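To make the idea of resolving a derived form back to its referent concrete, the following Python sketch looks up the surface terms from the worked-examples table in a piece of text. It is only an illustration under stated assumptions: the `LexiconEntry` class, the `find_lexicon_terms` function, and the in-memory dictionary are hypothetical, the entries are limited to the rows of the table above (with masked referents), and nothing here reflects the format of the released lexicon files or how TKE uses them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LexiconEntry:
    term: str      # surface form as it appears in posts
    referent: str  # actual meaning (masked where sensitive)
    category: str  # category column of the worked-examples table


# Rows copied from the worked-examples table; not the full lexicon.
ENTRIES = [
    LexiconEntry("默", "n*gger", "racial"),
    LexiconEntry("南满", "southern barbarians", "regional"),
    LexiconEntry("蠢驴", "foolish people", "general"),
    LexiconEntry("txl", "gay", "lgbtq"),
    LexiconEntry("ni哥", "n*gger", "racial"),
    LexiconEntry("小仙女", "shrew", "sexual"),
    LexiconEntry("凯勒奇", "Kalergi Plan", "racial"),
]

# Keys are lower-cased so Latin-script terms match case-insensitively.
LEXICON: Dict[str, LexiconEntry] = {e.term.lower(): e for e in ENTRIES}


def find_lexicon_terms(text: str) -> List[LexiconEntry]:
    """Return the lexicon entries whose surface form occurs in `text`.

    Whitespace is stripped and Latin letters are lower-cased before
    matching, so "TXL" and "ni 哥" are still found. This is plain
    substring matching with no use of context, so every hit is a
    candidate for further checking rather than a verdict.
    """
    normalised = "".join(text.split()).lower()
    return [entry for key, entry in LEXICON.items() if key in normalised]
```

Context-free matching over-triggers by design: the single-character entry 默 also occurs in ordinary words such as 沉默 (silence), which is one reason a lexicon hit is better treated as an input feature for a classifier than as a decision in itself.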
§ 3 ToxiCN_MM (NeurIPS 2024)
Paper: Towards Comprehensive Detection of Chinese Harmful Memes
Harmful-Type Taxonomy
- Targeted Harmful — Memes that attack a specific individual, group, or social category — the most common harmful type on Chinese platforms.
- General Offence — Insults, profanity, or aggressive content not directed at a particular protected group.
- Sexual Innuendo — Implicit or explicit sexual content delivered through visual metaphor, character substitution, or suggestive composition.
- Dispirited Culture — Memes propagating nihilism, self-harm framing, or anti-aspirational narratives that erode social well-being.
Dataset
ToxiCN_MM is a Chinese harmful-meme dataset of 12,000 image–text pairs collected from public online sources. Memes are annotated for both harmful type and modality combination, enabling fine-grained study of where toxicity actually arises (text-only, image-only, or only after fusion). Version 2.0 (Dec. 2024) re-annotates the ambiguous samples, which make up fewer than 1% of the dataset, and additionally releases the specific attacked targets for targeted harmful memes.
| Property | Value |
|---|---|
| Total samples | 12,000 image–text meme pairs |
| Harmful samples | annotated along harmful type and modality |
| Annotation axes | harmful type × modality combination |
| Harmful types | targeted harmful · general offence · sexual innuendo · dispirited culture |
| Language | Simplified Chinese |
| License | CC BY-NC-ND 4.0 (academic only) |
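Because harmful memes are annotated along two axes, a natural first analysis is a cross-tabulation of harmful type against modality combination. The Python sketch below shows one way to compute it. The `MemeAnnotation` record, the `cross_tabulate` function, and the exact label strings are assumptions made for illustration (the strings are taken from the descriptions above, not from the released annotation files), and the records in the usage note are invented toy data, not dataset statistics.

```python
from collections import Counter
from typing import Iterable, NamedTuple

# Label vocabularies, spelled as in the descriptions above (assumed strings).
HARMFUL_TYPES = (
    "targeted harmful",
    "general offence",
    "sexual innuendo",
    "dispirited culture",
)
MODALITIES = ("text-only", "image-only", "fusion")


class MemeAnnotation(NamedTuple):
    """Hypothetical record for one harmful meme."""
    harmful_type: str  # one of HARMFUL_TYPES
    modality: str      # one of MODALITIES


def cross_tabulate(annotations: Iterable[MemeAnnotation]) -> Counter:
    """Count memes per (harmful type, modality combination) pair.

    Unknown labels raise ValueError so that typos in the input do not
    silently create extra cells in the table.
    """
    counts: Counter = Counter()
    for ann in annotations:
        if ann.harmful_type not in HARMFUL_TYPES:
            raise ValueError(f"unknown harmful type: {ann.harmful_type!r}")
        if ann.modality not in MODALITIES:
            raise ValueError(f"unknown modality: {ann.modality!r}")
        counts[(ann.harmful_type, ann.modality)] += 1
    return counts
```

For example, passing two toy records labelled `("sexual innuendo", "fusion")` and one labelled `("general offence", "text-only")` gives a count of 2 for the first pair and 1 for the second, and any pair that never occurs reads as 0 because `Counter` returns zero for missing keys.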
§ 4 Ethics Statement
The ToxiCN family is released solely to support research on Chinese toxic-language and harmful-meme detection, content moderation, and the safety alignment of language and vision-language models. The resources must not be used to generate, amplify, or weaponise toxic content, nor to surveil or profile individuals. All samples were collected from publicly accessible platforms with personally identifiable information removed; annotators were briefed about the disturbing nature of the material, paid fairly, and could withdraw at any time.
Because the corpora inevitably reflect the user bases and moderation policies of their sources, models trained on them should not be deployed as the sole arbiter of moderation decisions. The opinions and findings contained in the samples should not be interpreted as representing the views of the authors. We acknowledge the risk of malicious actors attempting to reverse-engineer cloaked slurs or memes, and sincerely hope users will employ the datasets responsibly. Any content that infringes copyright or other intellectual-property rights will be removed upon request.
The datasets and example outputs contain offensive and potentially traumatising language and imagery — reader discretion is advised.
§ 5 Cited By (curated, grouped by direction)
Traditional Toxicity Detection
- Unified Game Moderation: Soft-Prompting and LLM-Assisted Label Transfer for Resource-Efficient Toxicity Detection
- Implanting LLM's Knowledge via Reading Comprehension Tree for Toxicity Detection
- Giving Control Back to Models: Enabling Offensive Language Detection Models to Autonomously Identify and Mitigate Biases
Cloaked / Perturbed Toxicity Detection
- MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection under Cloaking Perturbations
- Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings
- Enhancing Chinese Offensive Language Detection with Homophonic Perturbation
- Lost in Pronunciation: Detecting Chinese Offensive Language Disguised by Phonetic Cloaking Replacement
- ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations
Span-level Toxicity Detection
Detoxification
- Redefining Experts: Interpretable Decomposition of Language Models for Toxicity Mitigation
- Overview of the Multilingual Text Detoxification Task at PAN 2025
- Multilingual and Explainable Text Detoxification with Parallel Corpora
- Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites
The list is curated by the authors and grouped by research direction.
Last update: 2026-05-29.
§ 6 Resources
§ 7 BibTeX
ToxiCN ACL 2023
@inproceedings{lu-etal-2023-facilitating,
title = "Facilitating Fine-grained Detection of {C}hinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks",
author = "Lu, Junyu and Xu, Bo and Zhang, Xiaokun and Min, Changrong and Yang, Liang and Lin, Hongfei",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-long.898",
pages = "16235--16250"
}
ToxiCN_MM NeurIPS 2024
@inproceedings{lu2024towards,
title = {Towards Comprehensive Detection of Chinese Harmful Memes},
author = {Junyu Lu and Bo Xu and Xiaokun Zhang and Hongbo Wang and Haohao Zhu and Dongyu Zhang and Liang Yang and Hongfei Lin},
booktitle = {The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year = {2024},
url = {https://openreview.net/forum?id=PSDXcYjrkO}
}