MA in Linguistics with a specialization in character-level language modeling.
Pinned
- corpus_toolkit (Public)
Python toolkit for corpus analysis: tokenization, lexical diversity, vocabulary growth prediction, entropy measures, and Zipf/Heaps visualizations.
Python 5
- shannon (Public)
Uses KenLM to analyze language entropy and redundancy in English and Linear B.
Python
- writing_direction (Public)
Script that predicts writing direction (LTR or RTL) using Gini and entropy calculations on character distributions from the Europarl and UDHR corpora.
Python