I've added a usertools/ folder with three starting scripts. Two of them sort our .json output files so that you get a consistent order between runs. The third is a word search that trawls through a wiktextract .json file, with a toggle for regex, filtering by language(s), and a max output count.
If you have anything that you would like to add there, just put up a pull request with a new file in usertools/, or, if you have specific requests, post them here.
These are meant to be small command-line scripts for the most part, but even that's not a must as long as it could be helpful to someone somewhere down the line. It would be very nice if they are simple and easy to understand even for people new to programming, so that they can be edited (and resubmitted as new variant scripts).
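To give an idea of the size and style I mean, here is a rough sketch of what a sorter and a word search could look like. This is not the code of the scripts in usertools/. It assumes the file has one JSON object per line and that entries carry `word` and `lang` fields; the function names, the exact-match behaviour when regex is off, and the defaults are all made up for illustration.

```python
import json
import re


def sort_json_lines(lines):
    """Return the JSON lines in a deterministic order."""
    objects = [json.loads(line) for line in lines if line.strip()]
    # Sorting the keys inside each object means the same data always
    # serialises to the same text, whatever order it was written in.
    canonical = [json.dumps(obj, sort_keys=True, ensure_ascii=False)
                 for obj in objects]
    return sorted(canonical)


def search_words(lines, pattern, use_regex=False, langs=None, max_count=10):
    """Collect entries whose "word" matches, up to max_count results."""
    results = []
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        # Assumed field names: "lang" for the language, "word" for the headword.
        if langs and entry.get("lang") not in langs:
            continue
        word = entry.get("word", "")
        if use_regex:
            matched = re.search(pattern, word) is not None
        else:
            matched = (word == pattern)
        if matched:
            results.append(entry)
            if len(results) >= max_count:
                break
    return results
```

Both functions take any iterable of lines, so you can pass an open file object directly.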
The sorting scripts aren't nearly enough to get a working diff, so I've committed json-compare-samples.py. It takes two files and indexes one of them, trying to give each json object its own key (which doesn't work when there isn't enough distinguishing info, e.g. many "Noun" "Noun" "Noun" sections inside the same etymology...). It then steps through the other file, where each line has a one in N chance (--one-in-a) of being chosen as a sample. Each sample goes through the same process to craft a key that should correspond to one in the index of the first file. If the two matching objects differ, they are compared using difflib, which outputs something like a diff for each pair of objects, comparing them as lines of strings.
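The general idea can be sketched roughly like this. It is a simplified illustration, not the actual contents of json-compare-samples.py: the key built from `word`, `lang` and `pos` is an assumption (and suffers from exactly the collision problem described above), and the function names and the `one_in` and `seed` parameters are hypothetical.

```python
import difflib
import json
import random


def make_key(entry):
    # Hypothetical key; entries that share all three fields collide.
    return (entry.get("word"), entry.get("lang"), entry.get("pos"))


def build_index(lines):
    """Map key -> object for the first file (one JSON object per line)."""
    index = {}
    for line in lines:
        if line.strip():
            entry = json.loads(line)
            # On a key collision the first object seen is kept.
            index.setdefault(make_key(entry), entry)
    return index


def pretty(entry):
    # Pretty-print with sorted keys so difflib can compare line by line.
    text = json.dumps(entry, indent=2, sort_keys=True, ensure_ascii=False)
    return text.splitlines()


def compare_samples(old_lines, new_lines, one_in=10, seed=None):
    """Sample roughly one in `one_in` lines of the second file and diff them."""
    rng = random.Random(seed)
    index = build_index(old_lines)
    diffs = []
    for line in new_lines:
        # randrange(one_in) is 0 with probability 1 / one_in.
        if not line.strip() or rng.randrange(one_in) != 0:
            continue
        sample = json.loads(line)
        key = make_key(sample)
        other = index.get(key)
        if other is None or other == sample:
            continue  # no counterpart found, or nothing changed
        diff = list(difflib.unified_diff(pretty(other), pretty(sample),
                                         "old", "new", lineterm=""))
        diffs.append((key, diff))
    return diffs
```

Passing `one_in=1` samples every line, which is handy for trying it out on small files.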