# GPTC

General-purpose text classifier in Python

GPTC provides both a CLI tool and a Python library.

## CLI Tool

### Classifying text

    python -m gptc <modelfile>

This prompts for a string and classifies it, printing the category to stdout
(or "None" if it cannot determine a category).

Alternatively, if you need confidence data, use:

    python -m gptc -j <modelfile>

This prints a JSON object of the format `{category: probability,
category: probability, ...}` to stdout.
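
As a minimal sketch of consuming that output from another Python program, the
snippet below parses a JSON string and picks the most probable category. The
sample string, its category names, and its probabilities are invented for
illustration, and treating an empty dict as "no result" is an assumption
rather than documented behavior.

```python
import json

# Hypothetical output captured from `python -m gptc -j <modelfile>`;
# the category names and probabilities are made up for this example.
sample_output = '{"twain": 0.25, "shakespeare": 0.75}'


def most_likely(json_text):
    """Return the (category, probability) pair with the highest probability.

    Returns None when the parsed dict is empty (an assumption about how
    "no result" might look, not something the tool is known to emit).
    """
    confidences = json.loads(json_text)
    if not confidences:
        return None
    return max(confidences.items(), key=lambda item: item[1])


best = most_likely(sample_output)
print(best)
```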

### Compiling models

    python -m gptc <raw model file> -c|--compile <compiled model file>

## Library

### `gptc.Classifier(model)`

Create a `Classifier` object using the given *compiled* model (as a dict, not
JSON).

#### `Classifier.confidence(text)`

Classify `text`. Returns a dict of the format `{category: probability,
category: probability, ...}`.

#### `Classifier.classify(text)`

Classify `text`. Returns the category into which the text is placed (as a
string), or `None` when it cannot classify the text.

### `gptc.compile(raw_model)`

Compile a raw model (as a list, not JSON) and return the compiled model (as a
dict).
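
The library calls described above might be combined as in the sketch below.
It uses only `gptc.compile`, `gptc.Classifier`, `Classifier.classify`, and
`Classifier.confidence` as documented here; the helper name `classify_text`
and the sample raw model are hypothetical.

```python
def classify_text(raw_model, text):
    """Compile raw_model, then return (category, confidences) for text."""
    # Imported inside the function so this sketch can be loaded even
    # where GPTC is not installed.
    import gptc

    compiled = gptc.compile(raw_model)      # raw list of dicts -> compiled dict
    classifier = gptc.Classifier(compiled)  # Classifier expects a compiled model
    return classifier.classify(text), classifier.confidence(text)


# Hypothetical raw model; the texts and category names are placeholders.
raw_model = [
    {"text": "a sample sentence in the first category", "category": "first"},
    {"text": "a sample sentence in the second category", "category": "second"},
]
```

With GPTC installed, `classify_text(raw_model, "some new text")` would return
the chosen category (or `None`) together with the confidence dict.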

## Model format

This section explains the raw model format, which is how you should create and
edit models.

A raw model is a list of dicts, each pairing a sample text with its category:

    [
        {
            "text": "<text in the category>",
            "category": "<the category>"
        }
    ]

GPTC handles models as Python `list`s of `dict`s of `str`s (for raw models) or
`dict`s of `str`s and `float`s (for compiled models), and they can be stored
in any way these Python objects can be. However, it is recommended to store
them in JSON format for compatibility with the command-line tool.
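
As an illustration of the JSON storage recommended above, the following sketch
writes a raw model to a file and reads it back using only the standard
library. The helper names `save_model` and `load_model`, the temporary file
location, and the sample entries are assumptions made for this example.

```python
import json
import os
import tempfile

# Hypothetical raw model in the list-of-dicts format; texts and category
# names are placeholders.
raw_model = [
    {"text": "a sample sentence in the first category", "category": "first"},
    {"text": "a sample sentence in the second category", "category": "second"},
]


def save_model(model, path):
    """Write a model (raw or compiled) to path as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f, indent=4)


def load_model(path):
    """Read a JSON model file back into plain Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Round-trip through a temporary directory to show nothing is lost.
path = os.path.join(tempfile.mkdtemp(), "raw.json")
save_model(raw_model, path)
restored = load_model(path)
print(restored == raw_model)
```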

## Example model

An example model, which is designed to distinguish between texts written by
Mark Twain and those written by William Shakespeare, is available in `models`.
The raw model is in `models/raw.json`; the compiled model is in
`models/compiled.json`.