[Chinese] 臺灣網路巨量語料庫 (Taiwan Web Mega Corpus, TWWaC)

TWWaC

Counts
Tokens: 1448066042
Words: 1131530841
General info
Corpus description: Document
Language: Chinese
Encoding: UTF-8
Compiled: 10/27/2018 23:17:05
Tagset: Description
Lexicon sizes
word: 6845099
tag: 47
Tags legend
adjective: A|VH
adverb: D.*
conjunction: C.*
determiner: Ne.*
noun: Na|Nb|Nc|Ncd|Nd|Nf|Nh|Nv
preposition: P
pronoun: Nhaa|Nhab|Nhb|Nhc
verb: V.*
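
The legend entries read like regular expressions over tag names. The following is a minimal sketch of how a tag could be looked up against them, assuming each entry is meant to match the whole tag (that anchoring is an assumption, not something the page states). The names `TAG_LEGEND` and `categories_for` are hypothetical.

```python
import re

# Patterns copied from the "Tags legend" listing.
# Assumption: each pattern must match the entire tag (full match).
TAG_LEGEND = {
    "adjective": r"A|VH",
    "adverb": r"D.*",
    "conjunction": r"C.*",
    "determiner": r"Ne.*",
    "noun": r"Na|Nb|Nc|Ncd|Nd|Nf|Nh|Nv",
    "preposition": r"P",
    "pronoun": r"Nhaa|Nhab|Nhb|Nhc",
    "verb": r"V.*",
}


def categories_for(tag: str) -> list[str]:
    """Return every legend category whose pattern matches the whole tag."""
    return [
        name
        for name, pattern in TAG_LEGEND.items()
        if re.fullmatch(pattern, tag)
    ]
```

Under this reading a tag can fall into more than one category: `VH` matches both the adjective pattern `A|VH` and the verb pattern `V.*`, which is why the helper returns a list rather than a single name.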
Lempos suffixes
adjective: -j
adverb: -a
conjunction: -c
noun: -n
preposition: -p
pronoun: -d
verb: -v
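
As an illustration of how these suffixes are typically used, the sketch below builds and splits a lempos string, assuming a lempos is simply the lemma followed by the category suffix (an assumption based on the suffix list, not confirmed by this page). `make_lempos` and `split_lempos` are hypothetical helper names.

```python
# Suffixes copied from the "Lempos suffixes" listing.
# Note: the listing gives no suffix for "determiner".
LEMPOS_SUFFIX = {
    "adjective": "-j",
    "adverb": "-a",
    "conjunction": "-c",
    "noun": "-n",
    "preposition": "-p",
    "pronoun": "-d",
    "verb": "-v",
}
SUFFIX_TO_CATEGORY = {suffix: name for name, suffix in LEMPOS_SUFFIX.items()}


def make_lempos(lemma: str, category: str) -> str:
    """Assumed format: lemma immediately followed by the category suffix."""
    return lemma + LEMPOS_SUFFIX[category]


def split_lempos(lempos: str):
    """Split on the last hyphen; return (lemma, category) or (lempos, None)."""
    lemma, sep, code = lempos.rpartition("-")
    category = SUFFIX_TO_CATEGORY.get(sep + code)
    if not lemma or category is None:
        return lempos, None
    return lemma, category
```

Splitting on the last hyphen keeps lemmas that themselves contain hyphens intact.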

Structures and attributes

Subcorpora statistics

Subcorpus  Tokens   Words (approx.)  % of corpus tokens
study      1018633  ~795968          0.0703443745282
野狗       6327196  ~4944123         0.436941121225
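
The percentage column appears to be each subcorpus's token count as a share of the corpus-wide token total listed under Counts. A small sketch to check that reading, using only figures from this page (`percent_of_corpus` is a hypothetical helper name):

```python
# Figures copied from the Counts and Subcorpora statistics listings.
TOTAL_TOKENS = 1448066042
SUBCORPUS_TOKENS = {"study": 1018633, "野狗": 6327196}


def percent_of_corpus(tokens: int, total: int = TOTAL_TOKENS) -> float:
    """Share of the whole corpus, in percent, measured in tokens."""
    return 100.0 * tokens / total


for name, tokens in SUBCORPUS_TOKENS.items():
    print(f"{name}: {percent_of_corpus(tokens):.6f}%")
```

Both computed values agree with the listed percentages to well within rounding, which supports the token-based interpretation.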