redshred.api package

Submodules

redshred.api.client module

class redshred.api.client.RedShredClient(token: str | None = None, host: str | None = None, host_verify: bool | None = None, config_path: str | None = None, context_override: str | None = None, config: Configuration | None = None)[source]

Bases: object

Client for interacting with the RedShred Platform

collection(slug: str) Collection[source]

Retrieve a collection from the RedShred database.

Args:

slug: slug, short reference, or link field from the remote collection

Returns:

Collection object

collections(**client_params) CollectionIterator[source]

Fetch the collections that a user has access to and return them as a CollectionIterator.

Returns:

CollectionIterator: an iterator used to access collections; see the CollectionIterator class for interface details

file(storage_path: str, inline: bool = False, unconfined: bool = True, width: int = 800, **kwargs) bytes[source]

Fetch a file stored in RedShred by its relative path.

Many enrichments in RedShred can generate files (e.g. extracted images) and will serve these back with a relative path that can be passed to this method to retrieve.

If inline is True, this will display images directly in notebook context. In these cases, unconfined and width will be passed directly to Image().

Args:

storage_path (str): path to file as given in API response data

inline (bool, optional): Whether to attempt to display the file inline in a notebook, images only. Defaults to False.

unconfined (bool, optional): passed to IPython.core.display.Image, requires inline=True. Defaults to True.

width (int, optional): passed to IPython.core.display.Image, requires inline=True. Defaults to 800.

Returns:

bytes: file contents as bytes

get_text(api_object)[source]

stats(collection_name: str | Collection) dict[source]

Review current states of documents in a collection.

Read state is one of [‘unread’, ‘queued’, ‘reading’, ‘read’, ‘crashed’]:

  • unread - newly uploaded documents that are not yet fully enriched and indexed

  • queued - documents that are awaiting reading

  • reading - documents that are currently being enriched by the RedShred reader

  • read - documents that have been read and are “at rest” in RedShred

  • crashed - documents that could not be successfully processed by RedShred.

Documents in crashed states can be reported to RedShred through the chat window in the documentation. Our goal is that every document is read successfully, although the amount of enrichment may vary (e.g. encrypted PDFs shouldn’t crash, but will likely be sparsely enriched).

Args:

collection_name (str | Collection): name of the target collection, or a Collection object

Returns:

dict: Read statistics for collection
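The layout of the returned dict is not specified here. The sketch below assumes it maps each read state name to an integer document count; that shape is an assumption and should be checked against a real response. Under that assumption, it shows one way to tell whether a collection is still being processed. The function name summarize_read_stats is hypothetical.

```python
# Read states in which a document has not yet come to rest.
PENDING_STATES = ("unread", "queued", "reading")


def summarize_read_stats(stats: dict) -> dict:
    """Condense read statistics into pending / read / crashed counts.

    ASSUMPTION: `stats` maps read state names to integer counts, e.g.
    {"read": 10, "queued": 2}. Missing states are treated as zero.
    """
    pending = sum(stats.get(state, 0) for state in PENDING_STATES)
    return {
        "pending": pending,
        "read": stats.get("read", 0),
        "crashed": stats.get("crashed", 0),
        # True once nothing is unread, queued, or being read.
        "finished": pending == 0,
    }
```

A caller could poll stats() and stop once "finished" is True, then inspect "crashed" to decide whether anything needs reporting.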

property user

redshred.api.http module

class redshred.api.http.RedShredAPI(configuration: Configuration | None = None, host: str | None = None, token: str | None = None, verbosity: int = 0, verify: bool | None = None, session: Session | None = None)[source]

Bases: object

Creates an authenticated requests session for the RedShred Platform.

RedShredAPI exposes .get, .post, .put, .patch, .head, and .delete methods that can take an endpoint as an argument. /v2/ endpoints are the default and only supported API version.

For example:

```
from redshred import RedShredAPI

redshred = RedShredAPI()
redshred.get('/collections/').json()
```

ok()[source]

Check that authorization is configured correctly. Returns True if authorized.

paginate(endpoint: str, results_per_page: int = 100, params: Dict[str, Any] | None = None, raise_for_status: bool = False, **requests_kwargs: Dict[str, Any]) Iterator[Response][source]

Iterate over RedShred results pages as raw HTTP response objects.

Args:

endpoint (str): URL to yield paginated responses from

results_per_page (int): the number of results that should be included in each response. Defaults to 100.

params (dict): requests library parameters for the HTTP query

raise_for_status (bool): raise an exception for any HTTP error encountered. Defaults to False.

**requests_kwargs: anything else to pass to the requests library

Returns:

Iterator[Response]: raw HTTP response objects, one per results page

Module contents