[Chinese] Taiwan Web Mega Corpus (TWWAC)
TWWaC
Counts |
Tokens | 1448066042 |
Words | 1131530841 |
General info |
Corpus description |
Document |
Language | Chinese |
Encoding | UTF-8 |
Compiled | 10/27/2018 23:17:05 |
Tagset |
Description |
Lexicon sizes |
word | 6845099 |
tag | 47 |
Tags legend |
adjective | A|VH |
adverb | D.* |
conjunction | C.* |
determiner | Ne.* |
noun | Na|Nb|Nc|Ncd|Nd|Nf|Nh|Nv |
preposition | P |
pronoun | Nhaa|Nhab|Nhb|Nhc |
verb | V.* |
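The legend patterns read as regular expressions over the tag attribute. Below is a minimal sketch of how a tag could be mapped to its categories, assuming each pattern must match the whole tag (the page does not state the anchoring). The names `TAG_LEGEND` and `categories_for` are hypothetical and not part of the corpus tooling.

```python
import re

# Patterns copied from the tags legend above.
# Assumption: a pattern has to match the entire tag, not just a prefix.
TAG_LEGEND = {
    "adjective": r"A|VH",
    "adverb": r"D.*",
    "conjunction": r"C.*",
    "determiner": r"Ne.*",
    "noun": r"Na|Nb|Nc|Ncd|Nd|Nf|Nh|Nv",
    "preposition": r"P",
    "pronoun": r"Nhaa|Nhab|Nhb|Nhc",
    "verb": r"V.*",
}


def categories_for(tag):
    """Return every legend category whose pattern fully matches the tag."""
    return [
        name
        for name, pattern in TAG_LEGEND.items()
        if re.fullmatch(pattern, tag)
    ]


print(categories_for("VH"))    # matched by both "A|VH" and "V.*"
print(categories_for("Nhaa"))  # matched by the pronoun pattern only
```

Under this reading the categories are not mutually exclusive: `VH` falls under both adjective and verb, so the function returns a list rather than a single label.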
Lempos suffixes |
adjective | -j |
adverb | -a |
conjunction | -c |
noun | -n |
preposition | -p |
pronoun | -d |
verb | -v |
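A lempos value is conventionally the lemma followed by one of the suffixes above. The sketch below shows that concatenation, assuming this convention holds for the corpus. `LEMPOS_SUFFIX`, `make_lempos`, and the sample lemma are illustrative names and data, not taken from the page.

```python
# Suffixes copied from the "Lempos suffixes" list above.
# Determiner has no suffix listed there, so it is deliberately absent.
LEMPOS_SUFFIX = {
    "adjective": "-j",
    "adverb": "-a",
    "conjunction": "-c",
    "noun": "-n",
    "preposition": "-p",
    "pronoun": "-d",
    "verb": "-v",
}


def make_lempos(lemma, category):
    """Build a lempos string (lemma + suffix) for a listed category."""
    if category not in LEMPOS_SUFFIX:
        raise ValueError(f"no lempos suffix listed for category: {category}")
    return lemma + LEMPOS_SUFFIX[category]


# Hypothetical example lemma, used only to show the resulting shape.
print(make_lempos("研究", "noun"))
```

Raising an error for unlisted categories avoids silently inventing a suffix the corpus does not define.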
Structures and attributes
Subcorpora statistics
Subcorpus | Tokens | Words | % |
study | 1018633 | ~ 795968 | 0.0703443745282 |
野狗 | 6327196 | ~ 4944123 | 0.436941121225 |
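The subcorpus figures can be related to the corpus totals with simple arithmetic. The sketch below assumes that the percentage is the subcorpus share of all tokens and that the "~" word counts are estimates scaled by the corpus-wide word/token ratio, rounded down. The page does not state either rule, and the function name `subcorpus_stats` is hypothetical.

```python
# Corpus totals from the "Counts" section above.
CORPUS_TOKENS = 1448066042
CORPUS_WORDS = 1131530841


def subcorpus_stats(sub_tokens):
    """Return (share of corpus tokens in percent, estimated word count).

    Assumption: words are estimated by scaling the subcorpus token count
    with the corpus-wide words/tokens ratio and truncating to an integer.
    """
    share_percent = 100 * sub_tokens / CORPUS_TOKENS
    estimated_words = sub_tokens * CORPUS_WORDS // CORPUS_TOKENS
    return share_percent, estimated_words


for name, tokens in [("study", 1018633), ("野狗", 6327196)]:
    share, words = subcorpus_stats(tokens)
    print(f"{name}: {share:.10f} % of tokens, ~ {words} words")
```

Integer floor division is used for the estimate so that the result does not depend on floating-point rounding.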