ToxiCN Family

Chinese Toxic Language Benchmarks — text & multimodal
Maintained by Junyu Lu · Dalian University of Technology
ToxiCN (ACL 2023): Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks
ToxiCN_MM (NeurIPS 2024): Towards Comprehensive Detection of Chinese Harmful Memes

§ 1 About the Project

Online toxic language causes tangible harm to individuals and communities, and reliable detection underpins both responsible content moderation and the safety alignment of language models. Progress in Chinese, however, has long been held back by the lack of large-scale, fine-grained, openly available resources, particularly for indirect phenomena (homophones, abbreviated slurs, sarcasm, dog-whistle references) and for harmful memes, where toxicity can sit in the text, in the image, or only in their combination.

The ToxiCN Family is our ongoing effort to close this gap along two complementary axes:

(i) Text — ToxiCN (ACL 2023): a hierarchical taxonomy (toxic / hate / targeted group / expression form) paired with a manually curated Chinese corpus covering both direct and indirect toxicity, and a knowledge-enhanced baseline (TKE).

(ii) Multimodal — ToxiCN_MM (NeurIPS 2024): a 12K image–text meme dataset annotated for harmful types and modality combinations, together with a Multimodal Knowledge-Enhancement Detector designed for Chinese cultural context.

Both resources are intended as reproducible reference points rather than final solutions, and have since been adopted by a growing line of follow-up work, summarised in the Cited By section below.

§ 2 ToxiCN — ACL 2023

Paper: Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks

Junyu Lu, Bo Xu, Xiaokun Zhang, Changrong Min, Liang Yang, Hongfei Lin

Abstract

The widespread dissemination of toxic online posts is increasingly damaging to society. However, research on detecting toxic language in Chinese has lagged significantly due to limited monolingual resources. In this work, we constructed ToxiCN, a comprehensive Chinese dataset that includes both direct and indirect toxic samples, annotated based on a fine-grained hierarchical taxonomy. Furthermore, we propose a benchmark model — Toxic Knowledge Enhancement (TKE) — that incorporates an insult lexicon to enhance toxic language detection. Extensive experiments demonstrate the dataset's quality and the strength of TKE.

Hierarchical Taxonomy

  1. Level 1 — Toxic vs. Non-toxic — Binary judgement on whether a post is toxic in any form.
  2. Level 2 — Hate vs. General Offensive — Among toxic posts, separate group-targeted hate speech from general offensive language (insults without a protected-group target).
  3. Level 3 — Targeted Group — For hate speech, the protected group attacked: gender, race, region, LGBTQ, others.
  4. Level 4 — Expression Form — Direct expression vs. indirect expression (cloaked: homophones, abbreviations, irony, dog-whistles).
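
The four levels above can be read as a small data structure. The following is a minimal sketch, not the released annotation format: the class and field names are hypothetical, a single targeted group per post is assumed, and the only constraints enforced are the ones stated in the list (lower levels apply to toxic posts only, and a targeted group applies to hate speech only).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Toxicity(Enum):        # Level 1
    NON_TOXIC = "non-toxic"
    TOXIC = "toxic"


class ToxicType(Enum):       # Level 2
    GENERAL_OFFENSIVE = "general offensive"
    HATE = "hate"


class TargetGroup(Enum):     # Level 3
    GENDER = "gender"
    RACE = "race"
    REGION = "region"
    LGBTQ = "LGBTQ"
    OTHERS = "others"


class ExpressionForm(Enum):  # Level 4
    DIRECT = "direct"
    INDIRECT = "indirect"


@dataclass(frozen=True)
class ToxiCNLabel:
    """Hypothetical container for one post's four-level label."""
    toxicity: Toxicity
    toxic_type: Optional[ToxicType] = None
    target: Optional[TargetGroup] = None
    expression: Optional[ExpressionForm] = None

    def __post_init__(self) -> None:
        lower_levels = (self.toxic_type, self.target, self.expression)
        if self.toxicity is Toxicity.NON_TOXIC:
            # Levels 2-4 only refine toxic posts.
            if any(value is not None for value in lower_levels):
                raise ValueError("non-toxic posts carry no lower-level labels")
            return
        if self.toxic_type is None:
            raise ValueError("toxic posts need a Level 2 label")
        if self.target is not None and self.toxic_type is not ToxicType.HATE:
            raise ValueError("a targeted group is only defined for hate speech")


# An indirectly expressed, region-targeted hate post.
example = ToxiCNLabel(
    Toxicity.TOXIC, ToxicType.HATE, TargetGroup.REGION, ExpressionForm.INDIRECT
)
```

Encoding the dependencies between levels in the constructor means an inconsistent label (for example, a non-toxic post with a targeted group) fails at load time rather than silently skewing a fine-grained evaluation.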

Dataset

ToxiCN is a manually annotated Chinese corpus of online posts spanning multiple platforms (e.g. Zhihu, Tieba). Each post is labelled under a four-level hierarchical scheme so that downstream models can be trained or evaluated at any granularity, from binary toxic detection to fine-grained targeted-group / expression-form classification.

Total samples: 12,011 online posts
Toxic samples: 6,461 (≈53.8%), hate or general offence
Annotation axes: 4-level hierarchical taxonomy
Target categories: gender · race · region · LGBTQ · others
Language: Simplified Chinese
License: released for academic research
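
Training or evaluating "at any granularity" amounts to truncating each hierarchical label at the desired level. The sketch below illustrates this under stated assumptions: the dictionary keys and the three sample labels are invented for illustration and do not reflect the released file schema. It also re-derives the toxic share from the two counts reported above.

```python
from collections import Counter

# Hypothetical field names, ordered from Level 1 to Level 4.
LEVELS = ("toxicity", "toxic_type", "target", "expression")


def project(label: dict, depth: int) -> tuple:
    """Keep only the first `depth` levels of a hierarchical label."""
    if not 1 <= depth <= len(LEVELS):
        raise ValueError(f"depth must be between 1 and {len(LEVELS)}")
    # Levels that do not apply to a post (or are absent) come back as None.
    return tuple(label.get(name) for name in LEVELS[:depth])


# Invented labels, for illustration only.
labels = [
    {"toxicity": "toxic", "toxic_type": "hate",
     "target": "region", "expression": "indirect"},
    {"toxicity": "toxic", "toxic_type": "general offensive",
     "target": None, "expression": "direct"},
    {"toxicity": "non-toxic"},
]

binary_counts = Counter(project(label, 1) for label in labels)  # Level 1 view
type_counts = Counter(project(label, 2) for label in labels)    # Level 1-2 view

# Toxic share from the reported counts: 6,461 toxic posts out of 12,011.
toxic_share = round(100 * 6461 / 12011, 1)
```

The same label set therefore serves a binary detector (`depth=1`) and a fine-grained classifier (`depth=4`) without re-annotation.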

Insult Lexicon

Beyond direct slurs harvested from prior lexicons, ToxiCN systematically derives an extended Chinese insult lexicon by tracing how online users disguise toxic intent. We catalogue six recurring derivation patterns and treat each derived form as a first-class lexicon entry, so downstream models can resolve them back to their actual referents:

Derivation Patterns

  1. Homophonic substitution — Replace one or more characters with phonetically identical / similar ones to bypass keyword filters (e.g. 满 mǎn for 蛮 mán in 南满 → 南蛮).
  2. Compositional decomposition — Split a target character into its sub-components, then re-assemble in surface text (e.g. 默 = 黑·犬).
  3. Cross-lingual abbreviation — Use Pinyin initials, English / mixed-script abbreviations to encode a slur (e.g. txl → 同性恋).
  4. Hybrid Chinese–English splicing — Fuse a Chinese morpheme with an English fragment to reconstruct a slur (e.g. ni + ger → ni哥).
  5. Historical / cultural allusion — Re-purpose a region or historical name to demean a group (e.g. 南满 → 南蛮).
  6. Conspiracy / meme reference — Invoke a memeified narrative as a coded slur (e.g. Kalergi → racial conspiracy meme).

Worked Examples

Term | Literal Meaning | Composition | Actual Meaning | Category
默 (mò) | silence | 黑 (hēi) 犬 (quǎn) → black dog | n*gger | racial
南 (nán) 满 (mǎn) | South Manchu | 南满 → 南蛮 (mán) | southern barbarians | regional
蠢驴 | silly donkey | - | foolish people | general
txl | txl | txl → 同 (tóng) 性 (xìng) 恋 (liàn) | gay | lgbtq
ni 哥 (gē) | ni brother | ni + ger → n*gger | n*gger | racial
小 (xiǎo) 仙 (xiān) 女 (nǚ) | fairy | - | shrew | sexual
凯 (kǎi) 勒 (lè) 奇 (qí) | Kalergi | - | Kalergi Plan | racial

Sensitive English referents are masked (e.g. n*gger) for public display; full forms are kept inside the released lexicon files.
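
Resolving a derived form back to its referent can be pictured as a dictionary lookup over the surface text. The sketch below is illustrative only: it is not the TKE model, the function name and the in-memory layout are hypothetical, and the three entries are copied from the worked examples above rather than loaded from the released lexicon files.

```python
# Surface form -> (actual meaning, category), taken from the worked examples.
LEXICON = {
    "txl": ("gay", "lgbtq"),
    "南满": ("southern barbarians", "regional"),
    "蠢驴": ("foolish people", "general"),
}


def find_cloaked_terms(text: str, lexicon: dict = LEXICON) -> list:
    """Return (offset, term, meaning, category) for each lexicon hit.

    Exact, case-sensitive matching; at each position the longest
    matching entry wins, and matches do not overlap.
    """
    terms = sorted(lexicon, key=len, reverse=True)
    hits = []
    position = 0
    while position < len(text):
        for term in terms:
            if text.startswith(term, position):
                meaning, category = lexicon[term]
                hits.append((position, term, meaning, category))
                position += len(term)
                break
        else:
            # No entry starts here; move on by one character.
            position += 1
    return hits


# "This word, txl, is an abbreviation."
hits = find_cloaked_terms("这个词txl是缩写")
```

Plain string matching of this kind will over-trigger on benign uses of a term, which is one reason to treat lexicon hits as a feature for a classifier rather than as a verdict.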

§ 3 ToxiCN_MM — NeurIPS 2024

Paper: Towards Comprehensive Detection of Chinese Harmful Memes

Junyu Lu, Bo Xu, Xiaokun Zhang, Hongbo Wang, Haohao Zhu, Dongyu Zhang, Liang Yang, Hongfei Lin

Abstract

We introduce the definition of Chinese harmful memes: multimodal units consisting of an image and Chinese inline text that have the potential to cause harm to an individual, an organisation, a community, a social group, or society as a whole. These memes range from overt offence to subtle stereotypes, often reflecting and reinforcing underlying negative values on the Chinese Internet. To support research on detecting them, we construct ToxiCN_MM, a 12,000-sample dataset annotated along two axes: harmful types (targeted harmful, general offence, sexual innuendo, dispirited culture) and modality combination (text–image fusion, harmful text only, harmful image only, both). We further propose a Multimodal Knowledge Enhancement Detector that incorporates LLM-generated contextual information about meme content to better understand Chinese memes.

Harmful-Type Taxonomy

  1. Targeted Harmful — Memes that attack a specific individual, group, or social category — the most common harmful type on Chinese platforms.
  2. General Offence — Insults, profanity, or aggressive content not directed at a particular protected group.
  3. Sexual Innuendo — Implicit or explicit sexual content delivered through visual metaphor, character substitution, or suggestive composition.
  4. Dispirited Culture — Memes propagating nihilism, self-harm framing, or anti-aspirational narratives that erode social well-being.

Dataset

ToxiCN_MM is a Chinese harmful-meme dataset of 12,000 image–text pairs collected from public online sources. Memes are annotated for both harmful type and modality combination, enabling fine-grained study of where toxicity actually arises (text-only, image-only, or only after fusion). Version 2.0 (Dec. 2024) re-annotates the ambiguous samples (<1% of the data) and additionally releases the specific attacked targets for targeted harmful memes.

Total samples: 12,000 image–text meme pairs
Harmful samples: annotated along harmful type and modality
Annotation axes: harmful type × modality combination
Harmful types: targeted harmful · general offence · sexual innuendo · dispirited culture
Language: Simplified Chinese
License: CC BY-NC-ND 4.0 (academic only)
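
The two annotation axes form a 4 × 4 grid that can be tabulated directly. The following sketch assumes a hypothetical record type (`MemeAnnotation`, with an invented `meme_id` field) and assumes that non-harmful memes carry neither a harmful type nor a modality label; neither assumption is taken from the released files.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class HarmfulType(Enum):
    TARGETED_HARMFUL = "targeted harmful"
    GENERAL_OFFENCE = "general offence"
    SEXUAL_INNUENDO = "sexual innuendo"
    DISPIRITED_CULTURE = "dispirited culture"


class Modality(Enum):
    FUSION = "text-image fusion"
    TEXT_ONLY = "harmful text only"
    IMAGE_ONLY = "harmful image only"
    BOTH = "both"


@dataclass(frozen=True)
class MemeAnnotation:
    """Hypothetical record: both axes set (harmful) or both empty (assumed non-harmful)."""
    meme_id: str
    harmful_type: Optional[HarmfulType] = None
    modality: Optional[Modality] = None

    def __post_init__(self) -> None:
        if (self.harmful_type is None) != (self.modality is None):
            raise ValueError("harmful type and modality must be set together")

    @property
    def is_harmful(self) -> bool:
        return self.harmful_type is not None


def cross_tab(annotations: Iterable[MemeAnnotation]) -> Counter:
    """Count harmful memes per (harmful type, modality) cell."""
    return Counter(
        (a.harmful_type, a.modality) for a in annotations if a.is_harmful
    )


# Invented annotations, for illustration only.
annotations = [
    MemeAnnotation("m1", HarmfulType.TARGETED_HARMFUL, Modality.FUSION),
    MemeAnnotation("m2", HarmfulType.TARGETED_HARMFUL, Modality.FUSION),
    MemeAnnotation("m3", HarmfulType.SEXUAL_INNUENDO, Modality.TEXT_ONLY),
    MemeAnnotation("m4"),
]
table = cross_tab(annotations)
```

A table of this shape makes it easy to see how much of each harmful type is visible to a text-only or image-only model and how much appears only after fusion.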

§ 4 Ethics Statement

The ToxiCN family is released solely to support research on Chinese toxic-language and harmful-meme detection, content moderation, and the safety alignment of language and vision-language models. The resources must not be used to generate, amplify, or weaponise toxic content, nor to surveil or profile individuals. All samples were collected from publicly accessible platforms with personally identifiable information removed; annotators were briefed about the disturbing nature of the material, paid fairly, and could withdraw at any time.

Because the corpora inevitably reflect the user bases and moderation policies of their sources, models trained on them should not be deployed as the sole arbiter of moderation decisions. The opinions and findings contained in the samples should not be interpreted as representing the views of the authors. We acknowledge the risk of malicious actors attempting to reverse-engineer cloaked slurs or memes, and sincerely hope users will employ the datasets responsibly. Any content that infringes copyright or other intellectual-property rights will be removed upon request.

The datasets and example outputs contain offensive and potentially traumatising language and imagery — reader discretion is advised.

§ 5 Cited By — curated, grouped by direction

Below we highlight a selection of representative follow-up works that build on the ToxiCN family, grouped by research direction. We are grateful to the many researchers whose attention and follow-up work have given the ToxiCN family its continued life.

Span-level Toxicity Detection

Token-/span-level localisation of the toxic component within a longer post, beyond document-level binary judgement.

The list is curated by the authors and grouped by research direction. Last update: 2026-05-29.

§ 6 Resources

§ 7 BibTeX

ToxiCN ACL 2023

@inproceedings{lu-etal-2023-facilitating,
    title = "Facilitating Fine-grained Detection of {C}hinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks",
    author = "Lu, Junyu  and  Xu, Bo  and  Zhang, Xiaokun  and  Min, Changrong  and  Yang, Liang  and  Lin, Hongfei",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.898",
    pages = "16235--16250"
}

ToxiCN_MM NeurIPS 2024

@inproceedings{lu2024towards,
    title     = {Towards Comprehensive Detection of Chinese Harmful Memes},
    author    = {Junyu Lu and Bo Xu and Xiaokun Zhang and Hongbo Wang and Haohao Zhu and Dongyu Zhang and Liang Yang and Hongfei Lin},
    booktitle = {The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
    year      = {2024},
    url       = {https://openreview.net/forum?id=PSDXcYjrkO}
}