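Nouns in particular are basically unusable. For example:
women - 0:woman/1:s/s:womens/3:womens (what on earth is "womens"? The plural of a plural?)
children - 0:child/1:s/s:childrens (this is already plural; you would almost never add an s after children)
sheep - s:sheeps (singular and plural are the same form)
lives - 0:life/1:s (this clearly has two lemmas, life and live, so it should be 1:3s)
.......................
I have not tested this rigorously; a casual search was enough to turn up this many problems. Note that this really was just a casual search. The phonetic transcriptions and definitions also have many problems, and the result is painful to look at.
A reasonable guess is that careful testing would turn up several thousand errors.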
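I checked, and lemma.en.txt does contain womens. That entry comes from the BNC, and a search confirms that it does occur there.
Judging from the section "The History of the Possessive Apostrophe" in this article, it is probably a misuse of women's.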
lemma.en.txt contains frequency information derived from the BNC:
woman/60142 -> women
women/43 -> womens
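Across the corpus, the mapping from women to womens really does occur 43 times. I did consider whether to keep womens, and in the end I chose to preserve the BNC transformations faithfully, because:
1) Some corpus texts really do use womens. Keeping this information at least lets you look up women from womens, right?
2) I also provide the frequency: the 43 in "women/43" is the number of occurrences. With that count you can easily filter out noise below a chosen threshold. For example, you could ignore transformations with a count under 500 and treat them as incorrect usage.
I have kept the most complete corpus information for you and left the choice to you.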
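The threshold filtering described above could look roughly like the following. This is only a minimal sketch, not code from the project: the function names `parse_lemma_line` and `build_reverse_index` and the `min_freq` parameter are hypothetical. It assumes each line has the form `stem/frequency -> variant`, as in the two lines quoted above, and that multiple variants, if present, are comma-separated.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_lemma_line(line: str) -> Optional[Tuple[str, int, List[str]]]:
    """Parse 'stem/freq -> form1,form2' into (stem, freq, forms).

    Returns None for lines that do not match the assumed format.
    """
    if "->" not in line:
        return None
    left, right = line.split("->", 1)
    stem, sep, freq_text = left.strip().rpartition("/")
    if not sep or not freq_text.isdigit():
        return None
    # Assumption: several variants on one line are separated by commas.
    forms = [form.strip() for form in right.split(",") if form.strip()]
    return stem, int(freq_text), forms


def build_reverse_index(lines: Iterable[str], min_freq: int = 500) -> Dict[str, List[str]]:
    """Map each variant back to its stem(s), skipping entries below min_freq."""
    index: Dict[str, List[str]] = {}
    for line in lines:
        parsed = parse_lemma_line(line)
        if parsed is None:
            continue
        stem, freq, forms = parsed
        if freq < min_freq:
            continue  # treat low-frequency transformations as noise
        for form in forms:
            index.setdefault(form, []).append(stem)
    return index


sample = ["woman/60142 -> women", "women/43 -> womens"]
print(build_reverse_index(sample, min_freq=500))  # {'women': ['woman']}
print(build_reverse_index(sample, min_freq=0))    # {'women': ['woman'], 'womens': ['women']}
```

With a threshold of 500 the `women -> womens` entry is dropped, while a threshold of 0 keeps it so that womens can still be traced back to women.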
Another way to obtain inflections is to use this library: https://www.nodebox.net/code/index.php/Linguistics
However, its data is quite limited.