Samuel Sloniker
97c4eef086
Move deserialize to Model object
1 year ago
Samuel Sloniker
7b7ef39d0b
Merge compiler into model.py
1 year ago
Samuel Sloniker
a252a15e9d
Clean up code
1 year ago
Samuel Sloniker
9513025e60
Fix type annotations
1 year ago
Samuel Sloniker
2c3fc77ba6
Finish classification explanations
...
A couple things I missed in 7f68dc6fc6
1 year ago
Samuel Sloniker
d8f3d2e701
Bump model version
...
99ad07a876 broke the model format, although probably only in a few edge cases
Still enough of a change for a model version bump
1 year ago
Samuel Sloniker
7f68dc6fc6
Add classification explanations
...
Closes #17
1 year ago
Samuel Sloniker
99ad07a876
Casefold
...
Closes #14
1 year ago
Samuel Sloniker
56550ca457
Remove Classifier objects
...
Closes #16
1 year ago
Samuel Sloniker
75fdb5ba3c
Split compiler into two functions
1 year ago
Samuel Sloniker
aad590636a
Fix type annotations
1 year ago
Samuel Sloniker
099e810a18
Fix `check`
1 year ago
Samuel Sloniker
ec7f4116fc
Include file name of output in arguments
1 year ago
Samuel Sloniker
f8dbc78b82
Allow hash algorithm selection
...
Closes #9
1 year ago
Samuel Sloniker
6f21e0d4e9
Remove debug print lines from compiler
1 year ago
Samuel Sloniker
41bba61410
Remove `has_emoji` and bump model version
...
Closes #11
1 year ago
Samuel Sloniker
10668691ea
Normalize characters
...
Closes #3
1 year ago
Samuel Sloniker
295a1189de
Include numbers in tokenized output
...
Closes #12
1 year ago
Samuel Sloniker
74b2ba81b9
Deserialize from file
1 year ago
Samuel Sloniker
9916744801
New type annotation for serialize
1 year ago
Samuel Sloniker
7e7b5f3e9c
Performance improvements
1 year ago
Samuel Sloniker
c84758af56
list, not tuple
1 year ago
Samuel Sloniker
c754293d69
Compiler performance improvements
1 year ago
Samuel Sloniker
8d42a92848
Add type annotation to Model.get()
1 year ago
Samuel Sloniker
b1228edd9c
Add CLI for Model.get()
1 year ago
Samuel Sloniker
25192ffddf
Add ability to look up individual token
...
Closes #10
1 year ago
Samuel Sloniker
548d670960
Use Classifier for --category
1 year ago
Samuel Sloniker
b3a43150d8
Split hash function
1 year ago
Samuel Sloniker
08437a2696
Add normalize()
1 year ago
Samuel Sloniker
fc4665bb9e
Separate tokenization and hashing
1 year ago
Samuel Sloniker
448f200923
Add `confidence` to Model; deprecate Classifier
1 year ago
Samuel Sloniker
f1a1ed9e2a
Remove most emoji-optional code
...
Almost all of the code previously used to make the emoji module optional
is removed in this commit. It was always my intent to make the `emoji`
module a hard dependency in v3.0.0 and remove the code for making it
optional, but for some reason I remembered to do the former but not the
latter; in fact, I added emoji-optional code to the new model handling
code. I can't completely remove this code because 3.0.0 will not
successfully deserialize a model without the `has_emoji` field in the
JSON config options, but this commit removes as much as possible without
breaking the model format and API version.
See also issue #11
1 year ago
Samuel Sloniker
3340abbf15
Fix CLI tool
2 years ago
Samuel Sloniker
a10569b5ab
New model format
...
Use Model objects and binary serialization format
2 years ago
Samuel Sloniker
f4ae5f851d
Hash words and ngrams
2 years ago
Samuel Sloniker
1d1ccbb7cc
Add min_count
2 years ago
Samuel Sloniker
af1d1749d2
Refactor word count dict in compiler
...
This makes future changes to the algorithm much simpler.
2 years ago
Samuel Sloniker
aea35ad059
Switch to GPL
2 years ago
Samuel Sloniker
7d1cbcaee0
Make sure text is lowercase
2 years ago
Samuel Sloniker
c2cd6f62fb
Revert "Switch to `statistics.stdev`"
...
This reverts commit 76df1dc56d.
Fix major performance regression
2 years ago
Samuel Sloniker
76df1dc56d
Switch to `statistics.stdev`
2 years ago
Samuel Sloniker
3634a10aeb
Fix another emoji bug
2 years ago
Samuel Sloniker
9538cf8c22
Fix emoji handling
2 years ago
Samuel Sloniker
185692790f
Add emoji checks, improve docs
2 years ago
Samuel Sloniker
ff8cba84c7
format `pack.py`
2 years ago
Samuel Sloniker
8c6dd0bde9
Type checks for pack
2 years ago
Samuel Sloniker
5082c2226b
Move pack to main module; format code
2 years ago
Samuel Sloniker
e711767d24
Add type checks to all functions that need them
2 years ago
Samuel Sloniker
67ac3a4591
Working type checks
2 years ago
Samuel Sloniker
b36d8e6081
Fix annotations
2 years ago