Compare commits

..

No commits in common. "185692790f8670a4ec6c9ed41528f1ef170e7264" and "ff8cba84c76cbdecc05f04ecdcb3de3fbebebaf6" have entirely different histories.

5 changed files with 5 additions and 62 deletions

View File

@ -14,7 +14,7 @@ GPTC provides both a CLI tool and a Python library.
### Classifying text ### Classifying text
gptc classify [-n <max_ngram_length>] <compiled model file> python -m gptc classify [-n <max_ngram_length>] <compiled model file>
This will prompt for a string and classify it, then print (in JSON) a dict of This will prompt for a string and classify it, then print (in JSON) a dict of
the format `{category: probability, category:probability, ...}` to stdout. (For the format `{category: probability, category:probability, ...}` to stdout. (For
@ -22,24 +22,17 @@ information about `-n <max_ngram_length>`, see section "Ngrams.")
Alternatively, if you only need the most likely category, you can use this: Alternatively, if you only need the most likely category, you can use this:
gptc classify [-n <max_ngram_length>] <-c|--category> <compiled model file> python -m gptc classify [-n <max_ngram_length>] <-c|--category> <compiled model file>
This will prompt for a string and classify it, outputting the category on This will prompt for a string and classify it, outputting the category on
stdout (or "None" if it cannot determine anything). stdout (or "None" if it cannot determine anything).
### Compiling models ### Compiling models
gptc compile [-n <max_ngram_length>] <raw model file> python -m gptc compile [-n <max_ngram_length>] <raw model file>
This will print the compiled model in JSON to stdout. This will print the compiled model in JSON to stdout.
### Packing models
gptc pack <dir>
This will print the raw model in JSON to stdout. See `models/unpacked/` for an
example of the format. Any exceptions will be printed to stderr.
## Library ## Library
### `gptc.Classifier(model, max_ngram_length=1)` ### `gptc.Classifier(model, max_ngram_length=1)`
@ -59,44 +52,12 @@ category:probability, ...}`
Classify `text`. Returns the category into which the text is placed (as a Classify `text`. Returns the category into which the text is placed (as a
string), or `None` when it cannot classify the text. string), or `None` when it cannot classify the text.
#### `Classifier.model`
The classifier's model.
#### `Classifier.has_emoji`
Check whether emojis are supported by the `Classifier`. (See section "Emoji.")
Equivalent to `gptc.has_emoji and gptc.model_has_emoji(model)`.
### `gptc.compile(raw_model, max_ngram_length=1)` ### `gptc.compile(raw_model, max_ngram_length=1)`
Compile a raw model (as a list, not JSON) and return the compiled model (as a Compile a raw model (as a list, not JSON) and return the compiled model (as a
dict). dict).
For information about `max_ngram_length`, see section "Ngrams." For information about `max_ngram_length`, see section "Ngrams."
### `gptc.pack(directory, print_exceptions=False)
Pack the model in `directory` and return a tuple of the format:
(raw_model, [(exception,),(exception,)...])
Note that the exceptions are contained in single-item tuples. This is to allow
more information to be provided without breaking the API in future versions of
GPTC.
See `models/unpacked/` for an example of the format.
### `gptc.has_emoji`
`True` if the `emoji` package is installed (see section "Emoji"), `False`
otherwise.
### `gptc.model_has_emoji(compiled_model)`
Returns `True` if `compiled_model` was compiled with emoji support, `False`
otherwise.
## Ngrams ## Ngrams
GPTC optionally supports using ngrams to improve classification accuracy. They GPTC optionally supports using ngrams to improve classification accuracy. They
@ -123,10 +84,6 @@ If the [`emoji`](https://pypi.org/project/emoji/) package is installed, GPTC
will automatically handle emojis the same way as words. If it is not installed, will automatically handle emojis the same way as words. If it is not installed,
GPTC will still work but will ignore emojis. GPTC will still work but will ignore emojis.
`emoji` must be installed on both the system used to compile the model and the
system used to classify text. Emojis are ignored if it is missing on either
system.
## Model format ## Model format
This section explains the raw model format, which is how you should create and This section explains the raw model format, which is how you should create and

View File

@ -5,8 +5,6 @@
from gptc.compiler import compile as compile from gptc.compiler import compile as compile
from gptc.classifier import Classifier as Classifier from gptc.classifier import Classifier as Classifier
from gptc.pack import pack as pack from gptc.pack import pack as pack
from gptc.tokenizer import has_emoji as has_emoji
from gptc.model_info import model_has_emoji as model_has_emoji
from gptc.exceptions import ( from gptc.exceptions import (
GPTCError as GPTCError, GPTCError as GPTCError,
ModelError as ModelError, ModelError as ModelError,

View File

@ -1,6 +1,6 @@
# SPDX-License-Identifier: LGPL-3.0-or-later # SPDX-License-Identifier: LGPL-3.0-or-later
import gptc.tokenizer, gptc.compiler, gptc.exceptions, gptc.weighting, gptc.model_info import gptc.tokenizer, gptc.compiler, gptc.exceptions, gptc.weighting
import warnings import warnings
from typing import Dict, Union, cast, List from typing import Dict, Union, cast, List
@ -33,7 +33,6 @@ class Classifier:
self.model = model self.model = model
model_ngrams = cast(int, model.get("__ngrams__", 1)) model_ngrams = cast(int, model.get("__ngrams__", 1))
self.max_ngram_length = min(max_ngram_length, model_ngrams) self.max_ngram_length = min(max_ngram_length, model_ngrams)
self.has_emoji = gptc.tokenizer.has_emoji and gptc.model_info.model_has_emoji(model)
def confidence(self, text: str) -> Dict[str, float]: def confidence(self, text: str) -> Dict[str, float]:
"""Classify text with confidence. """Classify text with confidence.

View File

@ -1,8 +0,0 @@
# SPDX-License-Identifier: LGPL-3.0-or-later
import gptc.compiler
from typing import Dict, Union, cast, List
def model_has_emoji(model: gptc.compiler.MODEL) -> bool:
return cast(int, model.get("__emoji__]", 0)) == 1

View File

@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project] [project]
name = "gptc" name = "gptc"
version = "2.1.0" version = "2.0.1"
description = "General-purpose text classifier" description = "General-purpose text classifier"
readme = "README.md" readme = "README.md"
authors = [{ name = "Samuel Sloniker", email = "sam@kj7rrv.com"}] authors = [{ name = "Samuel Sloniker", email = "sam@kj7rrv.com"}]
@ -18,9 +18,6 @@ classifiers = [
dependencies = [] dependencies = []
requires-python = ">=3.7" requires-python = ">=3.7"
[project.optional-dependencies]
emoji = ["emoji"]
[project.urls] [project.urls]
Homepage = "https://git.kj7rrv.com/kj7rrv/gptc" Homepage = "https://git.kj7rrv.com/kj7rrv/gptc"