redshred.api package

Submodules

redshred.api.client module

Bases: object

Client for interacting with the RedShred Platform.

This client provides easy access to the RedShred API, enabling users to interact with various RedShred Platform services such as retrieving user data, fetching contents of files, and accessing collection statistics.

The client can be configured via environmental variables, a .rsconfig file, or manual initialization arguments.

Attributes:

config (Configuration): The configuration for the RedShred API.
api (RedShredAPI): Interface to the RedShred API services.
_user (RedShredUser, optional): The authenticated user’s details. Default is None.

Args:

token (str, optional): The authentication token.
host (str, optional): The RedShred API host address.
host_verify (bool, optional): Flag to enable or disable SSL verification.
config_path (str, optional): Path to the .rsconfig file.
context_override (str, optional): Overrides the context specified in the configuration.
config (Configuration, optional): A pre-initialized Configuration object.

collection(slug: str) → Collection

Retrieve a specific collection by its slug.

Args:

slug (str): The slug, short reference, or link field of the collection.

Returns:

Collection: The retrieved collection object.

Raises:

HTTPError: If the requested collection cannot be found or another error occurs.

collections(**client_params) → CollectionIterator

Fetch collections accessible to the user.

Args:

**client_params: Additional parameters for the client request.

Returns:

CollectionIterator: An iterator over the user’s collections.

file(storage_path: str, inline: bool = False, unconfined: bool = True, width: int = 800, **kwargs) → bytes

Fetch a file stored in RedShred by its relative path.

Many enrichments in RedShred can generate files (e.g. extracted images) and return a relative path for each one. Pass that path to this method to retrieve the file.

If inline is True, images are displayed directly in the notebook. In that case, unconfined and width are passed directly to Image().

Args:

storage_path (str): Path to the file as given in API response data.
inline (bool, optional): Whether to attempt to display the file inline in a notebook (images only). Defaults to False.
unconfined (bool, optional): Passed to IPython.core.display.Image; requires inline=True. Defaults to True.
width (int, optional): Passed to IPython.core.display.Image; requires inline=True. Defaults to 800.
**kwargs: Passed to IPython.core.display.Image; requires inline=True.

Returns:

bytes: The file contents as bytes.

get_text(api_object: redshred.models.api.ApiObject)

Extract the text from a given API object.

Args:

api_object: Any API object that has a get_text method.

Returns:

str: The text extracted from the API object.

stats(collection_name: str | Collection) → dict

Review the current states of documents in a collection.

Read state is one of [‘unread’, ‘queued’, ‘reading’, ‘read’, ‘crashed’]:

unread - newly uploaded documents that are not yet fully enriched and indexed
queued - documents that are awaiting reading
reading - documents that are currently being enriched by the RedShred reader
read - documents that have been read and are “at rest” in RedShred
crashed - documents that could not be successfully processed by RedShred

Documents in crashed states can be reported to RedShred through the chat window in the documentation. Our goal is for every document to be read successfully, although the amount of enrichment may vary (e.g. encrypted PDFs shouldn’t crash, but will likely be sparsely enriched).

Args:

collection_name (str | Collection): Name of the target collection, or the Collection object itself.

Returns:

dict: Read statistics for the collection.
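The read states listed above can be tallied with a small helper. The following is a minimal sketch only: the documentation does not specify the shape of the returned dict, so the state-to-count mapping used here is an assumption, and summarize_read_states is a hypothetical name that is not part of the redshred package.

```python
# The five read states named in the stats() documentation.
READ_STATES = ("unread", "queued", "reading", "read", "crashed")

# States in which a document has not finished processing yet.
IN_PROGRESS_STATES = ("unread", "queued", "reading")


def summarize_read_states(counts):
    """Summarize a mapping of read state -> document count.

    ASSUMPTION: `counts` looks like {"read": 8, "queued": 1}; the real
    return value of stats() may be shaped differently.
    """
    unknown = set(counts) - set(READ_STATES)
    if unknown:
        raise ValueError(f"unknown read states: {sorted(unknown)}")
    return {
        "total": sum(counts.get(state, 0) for state in READ_STATES),
        "in_progress": sum(counts.get(state, 0) for state in IN_PROGRESS_STATES),
        "read": counts.get("read", 0),
        "crashed": counts.get("crashed", 0),
    }
```

A non-zero crashed count in the summary would be the cue to report the affected documents.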

property user

Retrieve the current authenticated user’s details.

Returns:

RedShredUser: An object representing the authenticated user.

Raises:

HTTPError: If authentication fails or an error occurs with the HTTP call.
TypeError: If the data returned is not in the expected format.
ConnectionError: If a connection error occurs.

redshred.api.http module

Bases: object

The RedShredAPI class is used to interact with the RedShred API over HTTP.

The class is initialized with a configuration object, host, token, verbosity level, verify option, and a session. If a configuration object is provided, the host, token, and verify attributes are set from the configuration.

The class exposes a method for each HTTP request method: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE, PATCH. Accessing an attribute that is not one of these raises an AttributeError.
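One common way to get this behavior in Python is to resolve verb names in __getattr__. The sketch below illustrates that pattern and is not the actual RedShredAPI implementation: VerbDispatcher and its stubbed request method are hypothetical, and accepting lowercase names such as get is an assumption.

```python
HTTP_METHODS = frozenset(
    {"GET", "HEAD", "POST", "PUT", "DELETE", "CONNECT", "OPTIONS", "TRACE", "PATCH"}
)


class VerbDispatcher:
    """Hypothetical stand-in that mimics verb dispatch; not the real class."""

    def __init__(self):
        self.calls = []

    def request(self, method, endpoint, **kwargs):
        # A real client would send an HTTP request here; this stub only
        # records the call so the dispatch logic can be observed.
        self.calls.append((method, endpoint))
        return f"{method} {endpoint}"

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails.
        verb = name.upper()  # ASSUMPTION: verb names are case-insensitive
        if verb not in HTTP_METHODS:
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}"
            )

        def call(endpoint, **kwargs):
            return self.request(verb, endpoint, **kwargs)

        return call
```

With this pattern, a call such as VerbDispatcher().get("collections") is forwarded to request("GET", "collections"), while any other unknown attribute raises AttributeError.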

The class also includes a paginate method for handling paginated responses from the API. It takes an endpoint, results per page, parameters, raise for status flag, and any other requests keyword arguments. It returns an iterator of requests.Response objects.

The class also includes a method for checking if the API is accessible and a method for making requests to the API. The request method takes a method, endpoint, parameters, raise for status flag, and any other requests keyword arguments. It returns a requests.Response object.

If the provided endpoint is a fully qualified URL, the class checks if it matches the base_url defined at creation and removes the base to match the format the request is expecting. If the provided endpoint does not match the base_url, a ValueError is raised.
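The endpoint check described above can be sketched as a standalone function. This is an illustration under stated assumptions rather than the library's code: normalize_endpoint is a hypothetical name, the example URLs are placeholders, and treating any string containing "://" as fully qualified is an assumption.

```python
def normalize_endpoint(endpoint, base_url):
    """Return `endpoint` relative to `base_url` (hypothetical helper)."""
    if "://" not in endpoint:
        # Already a relative endpoint: pass it through unchanged.
        return endpoint

    base = base_url.rstrip("/")
    # Require a path boundary after the base so that ".../v2beta" does not
    # count as matching a base of ".../v2".
    if endpoint != base and not endpoint.startswith(base + "/"):
        raise ValueError(f"{endpoint!r} does not match base URL {base_url!r}")

    # Strip the base and the separating slash.
    return endpoint[len(base):].lstrip("/")
```

For example, with a base of https://api.example.com/v2/, the URL https://api.example.com/v2/collections/demo becomes collections/demo, and a URL on a different host raises ValueError.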

ok() → bool

Checks if the API is accessible with the provided credentials.

This method makes a HEAD request to a known endpoint and evaluates the response to determine if the API is accessible. A 403 status code suggests bad authentication, while a 404 suggests valid credentials.

Returns:

bool: True if the API is accessible with the provided credentials, False otherwise.

Examples:

>>> accessible = api.ok()
>>> print(f"API Accessible: {accessible}")
API Accessible: True
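The status-code reasoning behind ok() can be sketched as a pure function. Only the 403 and 404 cases come from the description above. The handling of every other status code is an assumption, and credentials_accepted is a hypothetical name rather than part of the library.

```python
def credentials_accepted(status_code):
    """Interpret the status code of the HEAD probe (hypothetical helper)."""
    if status_code == 403:
        # Documented case: a 403 suggests bad authentication.
        return False
    if status_code == 404:
        # Documented case: a 404 suggests the credentials are valid.
        return True
    # ASSUMPTION: for any other status, treat only 2xx as accessible.
    return 200 <= status_code < 300
```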

paginate(endpoint: str, results_per_page: int = 100, params: Dict[str, Any] | None = None, raise_for_status: bool = False, **requests_kwargs: Dict[str, Any]) → Iterator[Response]

Generates pages of API responses from a paginated endpoint.

This function iterates over the pages of a paginated API endpoint, yielding each page as a requests.Response object. It continues requesting pages until no further pages are available.

Args:

endpoint (str): The API endpoint from which to retrieve paginated results.
results_per_page (int): The number of results to retrieve per page. Default is 100.
params (Dict[str, Any], optional): A dictionary of parameters to pass in the query string.
raise_for_status (bool): If True, HTTPError will be raised for HTTP responses with error status codes.
**requests_kwargs (Dict[str, Any]): Additional keyword arguments to pass to the requests methods.

Yields:

requests.Response: One response object per page of results.

Examples:

>>> paginator = api.paginate("items", params={"param1": "value1"})
>>> for page in paginator:
...     data = page.json()  # Parse each page once
...     print(data["results"][0])  # First result on this page
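Callers often want individual results rather than whole pages. The sketch below flattens an iterator of pages into a stream of items, assuming each page body is a JSON object with a "results" list as in the example above. iter_results and FakePage are hypothetical names. FakePage stands in for requests.Response so the snippet runs without network access.

```python
class FakePage:
    """Stand-in for requests.Response that exposes only .json()."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


def iter_results(pages):
    """Yield every item from every page, in order (hypothetical helper)."""
    for page in pages:
        # ASSUMPTION: each page body is a dict with a "results" list;
        # pages without that key contribute nothing.
        yield from page.json().get("results", [])


pages = [FakePage({"results": ["a", "b"]}), FakePage({"results": ["c"]})]
items = list(iter_results(pages))
```

Because iter_results is a generator, passing it the iterator returned by paginate would request further pages only as the caller consumes items.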
