redshred package

Subpackages

Submodules

redshred.configuration module

class redshred.configuration.Configuration(token=None, host=None, host_verify=None, config_path=None, context_override=None)[source]

Bases: object

Handles configuration for the RedShred API client.

The order in which the configuration sources are looked up is as follows:

  1. Explicit values given to the constructor.

  2. Environment variables: REDSHRED_TOKEN and REDSHRED_HOST.

  3. Environment variables: REDSHRED_CONTEXT and REDSHRED_CONFIG.

  4. REDSHRED_CONTEXT alone (uses the default config path).

  5. REDSHRED_CONFIG alone (defaults to currentContext).

  6. None given: uses the default config path with currentContext.
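The lookup order above can be read as a simple fall-through. The following is a minimal sketch of that reading, not the library's implementation: `resolve_source`, `DEFAULT_CONFIG_PATH`, `CURRENT_CONTEXT`, and the returned labels are hypothetical names, and it assumes explicit values take precedence only when both a token and a host are given.

```python
from typing import Mapping, Optional, Tuple

# Placeholders: the real default path is not stated in this reference, and
# CURRENT_CONTEXT only marks "use the config file's currentContext entry".
DEFAULT_CONFIG_PATH = "<default config path>"
CURRENT_CONTEXT = "currentContext"


def resolve_source(
    token: Optional[str],
    host: Optional[str],
    env: Mapping[str, str],
) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (source, config_path, context) following the documented order."""
    # 1. Explicit constructor values win (assumed: both must be present).
    if token and host:
        return ("explicit", None, None)
    # 2. Token and host taken directly from the environment.
    if env.get("REDSHRED_TOKEN") and env.get("REDSHRED_HOST"):
        return ("environment", None, None)
    # 3-6. Fall back to a config file, filling in whichever of the
    # path and context were not supplied.
    config_path = env.get("REDSHRED_CONFIG") or DEFAULT_CONFIG_PATH
    context = env.get("REDSHRED_CONTEXT") or CURRENT_CONTEXT
    return ("config file", config_path, context)
```

Steps 3 through 6 collapse into one branch here because each differs only in which of the two variables is missing.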

Attributes:

  • context (str): The current context name.
  • user (Optional[str]): The username, if available.
  • host (str): The RedShred API server URL.
  • token (str): The authentication token for RedShred API.
  • verify (bool): Flag indicating whether to verify the server’s SSL certificate.
  • options (Dict[str, Any]): Additional configuration options.
  • source (str): The source of the configuration, such as environment variables or config file path.

Raises:

  • ConfigurationError: If there is a general configuration error.
  • ConfigurationFileError: If there is a problem specifically with the configuration file format.

context: str
host: str
info()[source]

Provides a multiline string representation of the configuration details.

The returned string includes the configuration context, host, obfuscated token, username, and the source from which the configuration was obtained.

Returns:

str: A human-readable, formatted summary of the configuration settings.

options: Dict[str, Any]
source: str
token: str
user: str | None
verify: bool
exception redshred.configuration.ConfigurationError[source]

Bases: Exception

Exception raised for general configuration errors.

This error is used for broader configuration issues not necessarily related to the file format itself, such as missing environment variables or failure to locate a required configuration file.

exception redshred.configuration.ConfigurationFileError[source]

Bases: Exception

Exception raised for problems with the configuration file format.

This error is raised when the RedShred configuration file cannot be parsed, for example because of improper formatting or missing required fields.

redshred.exceptions module

exception redshred.exceptions.RedShredAPIError(*args, reason=None, **kwargs)[source]

Bases: Exception

reason: Any
exception redshred.exceptions.RedShredFileExistsError(*args, **kwargs)[source]

Bases: RedShredHTTPError

exception redshred.exceptions.RedShredHTTPError(*args, **kwargs)[source]

Bases: HTTPError

classmethod from_http_error(http_error)[source]

redshred.spatial module

class redshred.spatial.BoundingBox(initlist: Iterable | None = None)[source]

Bases: list, _BaseGeometricMixin

as_boundingbox() BoundingBox[source]

Get a BoundingBox object from a GeoJSON object

as_geojson(_type='Polygon') GeoJSON[source]
as_numpy(dtype=<class 'numpy.float64'>)[source]
as_shape() BaseGeometry[source]

Get a shapely object from the BoundingBox

classmethod from_shape(geometry) BoundingBox[source]
get_bounds() BoundingBox[source]

Provided for API consistency with GeoJSON.

get_offsets(numpy=False) tuple[int, int][source]

The x and y offsets for each coordinate of the GeoJSON object.

property height
json(**kwargs) str[source]

Dump current object to GeoJSON string

property min_x
property min_y
normalize_to_page() BoundingBox[source]
rotate(rotation, origin=(0.5, 0.5)) BoundingBox[source]
scale(xfact=1.0, yfact=1.0, origin=(0, 0)) BoundingBox[source]

shapely scaling transformation

translate(xoff=0.0, yoff=0.0) BoundingBox[source]

shapely offset transformation

property width
class redshred.spatial.GeoJSON[source]

Bases: dict, _BaseGeometricMixin

as_boundingbox() BoundingBox[source]

Get a BoundingBox object from a GeoJSON object

as_geojson() GeoJSON[source]
as_shape() BaseGeometry[source]

Get a shapely object from the GeoJSON dictionary.

Returns:

BaseGeometry

convex_hull()[source]

Return the convex hull of a geometry

classmethod from_bounds(x_min, y_min, x_max, y_max, base_class='polygon')[source]

Create a new object from x_min, y_min, x_max, y_max values. By default, a polygon is returned, but you can override this functionality by passing “multipolygon” as your base_class
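To illustrate what a polygon built from bounds looks like, here is a minimal sketch that constructs a plain GeoJSON Polygon dictionary for an axis-aligned rectangle. `polygon_from_bounds` is a hypothetical helper, not part of the library, and the vertex order that `from_bounds` itself produces is not stated in this reference.

```python
from typing import Any, Dict, List


def polygon_from_bounds(
    x_min: float, y_min: float, x_max: float, y_max: float
) -> Dict[str, Any]:
    """Build a GeoJSON Polygon dict for an axis-aligned rectangle."""
    # A GeoJSON linear ring is closed: the first and last positions match.
    ring: List[List[float]] = [
        [x_min, y_min],
        [x_max, y_min],
        [x_max, y_max],
        [x_min, y_max],
        [x_min, y_min],
    ]
    return {"type": "Polygon", "coordinates": [ring]}


box = polygon_from_bounds(0.0, 0.0, 2.0, 1.0)
```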

classmethod from_coords(coords)[source]

Create a new GeoJSON object from either a list of coordinates (for a Polygon) or a list of lists of coordinates (for a MultiPolygon).

classmethod from_shape(shapely_shape)[source]

Create a new GeoJSON object from a shapely shape

get_bounds() BoundingBox[source]

Get a BoundingBox object from a GeoJSON object

get_coordinates(canonical=True, numpy=False) List[List[float]][source]

Get a coordinates list object from a GeoJSON object

get_offsets(numpy=False) Tuple[int, int][source]

The x and y offsets for each coordinate of the GeoJSON object.

property height
json(**kwargs) str[source]

Dump current object to GeoJSON string

property min_x
property min_y
normalize_to_page() GeoJSON[source]

Normalize the given GeoJSON from the quiltspace or document space to a [0,1], [0,1] scale. This returns a new GeoJSON instance.
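The arithmetic behind this kind of normalization can be sketched as dividing each coordinate by the page dimensions. This is an illustration only: `normalize_ring`, `page_width`, and `page_height` are hypothetical, and where the library obtains the page size is not stated in this reference.

```python
from typing import List, Sequence


def normalize_ring(
    ring: Sequence[Sequence[float]], page_width: float, page_height: float
) -> List[List[float]]:
    """Scale absolute page coordinates into the unit square [0, 1] x [0, 1]."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    # Return a new list so the input ring is left unchanged.
    return [[x / page_width, y / page_height] for x, y in ring]


normalized = normalize_ring([[0, 0], [306, 0], [306, 396], [0, 396]], 612, 792)
```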

rotate(angle, origin='center') GeoJSON[source]

Rotate a GeoJSON object using affine transformations. This returns a new GeoJSON instance.

scale(xfact=1.0, yfact=1.0, origin=(0, 0)) GeoJSON[source]

Scale a GeoJSON object using affine transformations. This returns a new GeoJSON instance.

translate(xoff=0.0, yoff=0.0) GeoJSON[source]

Offset a GeoJSON object by the specified amount. This returns a new GeoJSON instance.
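For a single point, scaling about an origin and translating by an offset reduce to the arithmetic below. This is a sketch of the standard affine formulas under the parameter names used above; `scale_point` and `translate_point` are hypothetical helpers and do not call the library.

```python
from typing import Tuple

Point = Tuple[float, float]


def scale_point(
    p: Point, xfact: float = 1.0, yfact: float = 1.0, origin: Point = (0.0, 0.0)
) -> Point:
    """Scale a point about ``origin``: its distance from origin is multiplied."""
    ox, oy = origin
    return (ox + (p[0] - ox) * xfact, oy + (p[1] - oy) * yfact)


def translate_point(p: Point, xoff: float = 0.0, yoff: float = 0.0) -> Point:
    """Shift a point by the given offsets."""
    return (p[0] + xoff, p[1] + yoff)
```

Applying the same function to every vertex of a geometry gives the scaled or translated shape.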

property width

redshred.util module

Utility functions.

redshred.util.decode_datetime(string_repr)[source]
redshred.util.id_from_any(something)[source]

Module contents

This is the RedShred client library in Python.

(c) RedShred, LLC 2018-2023

class redshred.Collection(*, self_link: str = None, id: str = None, config: CollectionConfiguration | None = None, created_at: datetime = None, created_by: str = None, description: str | None = None, documents_link: str = None, marked_for_delete: bool | None = False, metadata: dict | None = None, name: str = None, owner: str = None, perspectives_link: str = None, segments_link: str = None, slug: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, client: Any = None, **data)[source]

Bases: ApiObject

A collection class.

This class is used to create collections. It provides methods to create, read, update, and delete the collection, get documents, perspectives, and segments from the collection, and upload CSVs, files, URLs, and text to the collection.

Attributes:

  • id: The ID of the collection.
  • config: The configuration of the collection.
  • created_at: The date the collection was created.
  • created_by: The user who created the collection.
  • description: The description of the collection.
  • documents_link: The link to the documents in the collection.
  • marked_for_delete: Whether the collection is marked for deletion.
  • metadata: The metadata of the collection.
  • name: The name of the collection.
  • owner: The owner of the collection.
  • perspectives_link: The link to the perspectives in the collection.
  • segments_link: The link to the segments in the collection.
  • self_link: The self link of the collection.
  • slug: The slug of the collection.
  • updated_at: The date the collection was last updated.
  • updated_by: The user who last updated the collection.
  • user_data: The user data of the collection.

client: Any
config: CollectionConfiguration | None
create(client: 'redshred.api.http.RedShredAPI' | 'redshred.api.client.RedShredClient') TApiObject[source]

Creates the local object on the remote server.

This method is used to create the local object on the remote server. It uses the provided client to make the API request. The method identifies the creatable fields, makes a POST request to the server, and updates the local object with the response.

Args:

client (Union[RedShredAPI, RedShredClient]): The client to use for the API request.

Returns:

TApiObject: The local object updated with the response from the server.

Raises:

RedShredAPIError: If the server response status code is not 201 (Created).

created_at: datetime.datetime
created_by: str
delete()[source]

Permanently delete a collection remotely

description: str | None
document(document_id) Document[source]

Retrieves a specific document from the collection.

This method uses the provided document ID to load a Document object from the collection. The document ID can be provided in various formats and is converted to a standard format using the id_from_any function.

Args:

document_id (str): The ID of the document to retrieve.

Returns:

Document: The loaded Document object.

documents(q: str | None = None, fields: List[str] | None = None, **url_params) DocumentIterator[Document][source]

Returns an iterator over the documents in the collection.

This method creates a DocumentIterator object that can be used to iterate over the documents in the collection. The documents can be filtered using a query string and specific fields can be included in the output. Additional URL parameters can be provided as keyword arguments.

Args:

  • q (str, optional): The query string to filter the documents. Defaults to None.
  • fields (List[str], optional): The fields to include in the output. Defaults to None.
  • **url_params: Additional URL parameters.

Returns:

DocumentIterator[Document]: An iterator over the documents in the collection.
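A short usage sketch for the iterator: the helper below collects document names from any object exposing `documents(q=..., fields=...)` as described above. `document_names` is a hypothetical helper, the query syntax is not covered in this reference, and passing `fields=["name"]` assumes field names match the attribute names.

```python
from typing import Any, List, Optional


def document_names(collection: Any, query: Optional[str] = None) -> List[str]:
    """Collect the names of the documents that match ``query``."""
    # Request only the field that is read below (assumed to be called "name").
    return [doc.name for doc in collection.documents(q=query, fields=["name"])]
```

Because the result is consumed lazily, the same loop can be stopped early instead of building a full list when only the first few documents are needed.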

id: str
classmethod load(client: 'redshred.api.http.RedShredAPI' | 'redshred.api.client.RedShredClient', slug: str = None, url: str = None) Collection[source]

See APIObject.load

marked_for_delete: bool | None
metadata: dict | None
name: str
owner: str
perspective(perspective_id) Perspective[source]

Retrieves a specific perspective from the collection.

This method uses the provided perspective ID to load a Perspective object from the collection. The perspective ID can be provided in various formats and is converted to a standard format using the id_from_any function.

Args:

perspective_id (str): The ID of the perspective to retrieve.

Returns:

Perspective: The loaded Perspective object.

perspectives(q: str | None = None, fields: List[str] | None = None, **url_params) PerspectiveIterator[Perspective][source]

Returns an iterator over the perspectives in the collection.

This method creates a PerspectiveIterator object that can be used to iterate over the perspectives in the collection. The perspectives can be filtered using a query string and specific fields can be included in the output. Additional URL parameters can be provided as keyword arguments.

Args:

  • q (str, optional): The query string to filter the perspectives. Defaults to None.
  • fields (List[str], optional): The fields to include in the output. Defaults to None.
  • **url_params: Additional URL parameters.

Returns:

PerspectiveIterator[Perspective]: An iterator over the perspectives in the collection.

segment(segment_id) Segment[source]

Retrieves a specific segment from the collection.

Args:

segment_id (str): The ID of the segment to retrieve.

Returns:

Segment: The loaded Segment object.

segments(q: str | None = None, fields: List[str] | None = None, **url_params) SegmentIterator[Segment][source]

Returns an iterator over the segments in the collection.

This method creates a SegmentIterator object that can be used to iterate over the segments in the collection. The segments can be filtered using a query string and specific fields can be included in the output. Additional URL parameters can be provided as keyword arguments.

Args:

  • q (str, optional): The query string to filter the segments. Defaults to None.
  • fields (List[str], optional): The fields to include in the output. Defaults to None.
  • **url_params: Additional URL parameters.

Returns:

SegmentIterator[Segment]: An iterator over the segments in the collection.

slug: str
updated_at: datetime.datetime
updated_by: str
upload_csv(file, content_columns: List[str], delimiter=',', rename: str | None = None, **user_data)[source]

Args:

  • file: source file to upload
  • content_columns: Used for multi-document upload via CSV. A list which specifies the column(s) that will be used for the document body.
  • delimiter: delimiter for csv, defaults to “,”
  • rename: a new name if desired
  • **user_data: any additional user data

Returns:

None

upload_file(file: Path | BufferedReader, rename: str | None = None, save_origin: bool | None = False, **user_data) Document[source]

Convenience method to upload a filelike into RedShred.

Args:

  • file (str, filelike): Either a filename, url of file, or open() filelike object
  • rename (str, optional): File name override. Defaults to existing filename
  • save_origin (bool, optional): Save the path to the file on disk. Defaults to False
  • user_data (dict): arbitrary dictionary to store with document on server

Raises:

ValueError: Name argument missing for URL upload

Returns:

Document: The document built from the payload returned by the API server

upload_text(text: str, name: str, **user_data) Document[source]

Convenience method to upload raw text into RedShred.

Given text and a name, upload that text into this collection.

Args:

  • text (str): Text to upload.
  • name (str): File name to save text as.
  • user_data (dict): arbitrary dictionary to store with document on server

Returns:

Document: The document built from the payload returned by the API server

upload_url(url: str, rename: str | None = None, save_origin: bool | None = True, **user_data) Document[source]

Convenience method to upload a URL into RedShred.

Given a URL, upload the file it points to into this collection.

Args:

  • url (str): Url of file to upload.
  • rename (str, optional): File name override. Defaults to existing filename
  • save_origin (bool, optional): Save the url to the file. Defaults to True
  • user_data (dict): arbitrary dictionary to store with document on server

Raises:

ValueError: Name argument missing for URL upload

Returns:

Document: The document built from the payload returned by the API server

user_data: dict | None
class redshred.CollectionConfiguration(*, tokenizer: List[Tokenizers | str | ConfiguredTokenizer | TesseractTokenizerConfig | AdvancedOCRTokenizerConfig] | Tokenizers | str | ConfiguredTokenizer | TesseractTokenizerConfig | AdvancedOCRTokenizerConfig = None, enrichments: List[DefinedAcronymsPerspective | ExternalAPIPerspective | GrouperPerspective | HuggingfacePerspective | IrisPerspective | PageImagesPerspective | PdftotextPerspective | PreprocessPerspective | RegexPerspective | SentencesPerspective | SpacyPerspective | TFIDFPerspective | TypographyPerspective | PerspectiveConfiguration] = None, notifications: List[NotificationConfiguration] = None, document_uniqueness: DocumentUniqueness = 'contents', allow_anonymous_downloads: bool = False)[source]

Bases: BaseModel

class Config[source]

Bases: _DefaultPydanticConfig

allow_anonymous_downloads: bool
dict(*, include: AbstractSetIntStr | MappingIntStrAny | None = None, exclude: AbstractSetIntStr | MappingIntStrAny | None = None, by_alias: bool = False, skip_defaults: bool | None = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) DictStrAny

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

document_uniqueness: DocumentUniqueness
enrichments: List[DefinedAcronymsPerspective | ExternalAPIPerspective | GrouperPerspective | HuggingfacePerspective | IrisPerspective | PageImagesPerspective | PdftotextPerspective | PreprocessPerspective | RegexPerspective | SentencesPerspective | SpacyPerspective | TFIDFPerspective | TypographyPerspective | PerspectiveConfiguration]
classmethod from_dict(config: Dict[str, Any])[source]
json(*, include: AbstractSetIntStr | MappingIntStrAny | None = None, exclude: AbstractSetIntStr | MappingIntStrAny | None = None, by_alias: bool = False, skip_defaults: bool | None = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Callable[[Any], Any] | None = None, models_as_dict: bool = True, **dumps_kwargs: Any) unicode

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

notifications: List[NotificationConfiguration]
tokenizer: List[Tokenizers | str | ConfiguredTokenizer | TesseractTokenizerConfig | AdvancedOCRTokenizerConfig] | Tokenizers | str | ConfiguredTokenizer | TesseractTokenizerConfig | AdvancedOCRTokenizerConfig
validate_remote_schema(client: redshred.api.client.RedShredClient)[source]
yaml(*, include: Set[str] | None = None, exclude: Set[str] | None = None, by_alias: bool = False, skip_defaults: bool | None = None, exclude_unset: bool = True, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Callable[[Any], Any] | None = None, models_as_dict: bool = True, **dumps_kwargs: Any)[source]

Generate a YAML representation of the model from the JSON representation, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

class redshred.Document(*, self_link: str = None, id: str = None, collection_link: str = None, collection_slug: str = None, config: CollectionConfiguration | None = None, content_hash: str = None, created_at: datetime = None, created_by: str = None, csv_metadata: dict | None = None, description: str | None = None, document_segment_link: str = None, errors: dict | str | None = None, file_link: str = None, file_size: int = None, index: int = None, metadata: dict | None = None, n_pages: int = None, name: str = None, original_name: str = None, pages_link: str = None, pdf_link: str = None, perspectives_link: str = None, read_state: str = None, read_state_updated_at: datetime = None, region: GeoJSON = None, segments_link: str = None, slug: str = None, source: str = None, summary: str = None, text: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, warnings: dict | None = None, uniqueness_id: str | None = None, **data)[source]

Bases: ApiObject

A document class.

This class is used to create documents. It provides methods to create, read, update, and delete the document, get pages, perspectives, and segments from the document, and download the document.

Attributes:

  • id: The ID of the document.
  • collection_link: The link to the collection the document belongs to.
  • collection_slug: The slug of the collection the document belongs to.
  • config: The configuration of the document.
  • content_hash: The content hash of the document.
  • created_at: The date the document was created.
  • created_by: The user who created the document.
  • csv_metadata: The CSV metadata of the document.
  • description: The description of the document.
  • document_segment_link: The link to the document segment.
  • errors: The errors of the document.
  • file_link: The link to the file of the document.
  • file_size: The size of the file of the document.
  • index: The index of the document.
  • metadata: The metadata of the document.
  • n_pages: The number of pages in the document.
  • name: The name of the document.
  • original_name: The original name of the document.
  • pages_link: The link to the pages in the document.
  • pdf_link: The link to the PDF of the document.
  • perspectives_link: The link to the perspectives in the document.
  • read_state: The read state of the document.
  • read_state_updated_at: The date the read state of the document was last updated.
  • region: The region of the document.
  • segments_link: The link to the segments in the document.
  • self_link: The self link of the document.
  • slug: The slug of the document.
  • source: The source of the document.
  • summary: The summary of the document.
  • text: The text of the document.
  • updated_at: The date the document was last updated.
  • updated_by: The user who last updated the document.
  • user_data: The user data of the document.
  • warnings: The warnings of the document.
  • uniqueness_id: The uniqueness ID of the document.

collection() Collection[source]
collection_slug: str
config: CollectionConfiguration | None
content_hash: str
create(*args, **kwargs)[source]
created_at: datetime.datetime
created_by: str
csv_metadata: dict | None
description: str | None
download(path: str | 'pathlib.Path') int[source]

Download the original_file uploaded to RedShred to the specified path, returning the total bytes written

Args:

path: a path to somewhere on the local filesystem

Returns: number of bytes written

download_bytes() bytes[source]

Download the original_file uploaded to RedShred and return its contents as bytes.

Returns: document as bytes

errors: dict | str | None
file_size: int
id: str
index: int
metadata: dict | None
n_pages: int
name: str
original_name: str
page(index) Page[source]

Retrieves a specific page from the document.

Args:

index (int): The index of the page to retrieve.

Returns:

Page: The loaded Page object.

pages(q: str | None = None, fields: List[str] | None = None, **url_params) PageIterator[Page][source]

Returns an iterator over the pages in the document.

Args:

  • q (str, optional): The query string to filter the pages. Defaults to None.
  • fields (List[str], optional): The fields to include in the output. Defaults to None.
  • **url_params: Additional URL parameters.

Returns:

PageIterator[Page]: An iterator over the pages in the document.

perspective(perspective_id) Perspective[source]
perspectives(q: str | None = None, fields: List[str] | None = None, **url_params) PerspectiveIterator[Perspective][source]
read_state: str
read_state_updated_at: datetime.datetime
region: GeoJSON
reread_document(force=False)[source]

Reread the document and generate any new or changed perspectives and retry any failed perspectives.

Args:

force: force a reread even if the document is not in a state that allows it to be read

segment(segment_id) Perspective[source]
segments(q: str | None = None, fields: List[str] | None = None, **url_params) SegmentIterator[Segment][source]
slug: str
source: str
summary: str
text: str
uniqueness_id: str | None
updated_at: datetime.datetime
updated_by: str
user_data: dict | None
wait_until_read(wait_time_seconds: int = 5)[source]

Synchronously wait until the document has been read

Args:

wait_time_seconds: time to wait between checks
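Waiting synchronously amounts to a polling loop. The sketch below shows one way such a loop could look; it is not the library's implementation. `poll_until_done` and `fetch_state` are hypothetical, and treating both 'read' and 'crashed' (two of the read states listed under RedShredClient.stats) as terminal is an assumption.

```python
import time
from typing import Callable

# Assumption of this sketch: these two read states end the wait.
TERMINAL_STATES = {"read", "crashed"}


def poll_until_done(
    fetch_state: Callable[[], str],
    wait_time_seconds: float = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call ``fetch_state`` repeatedly until it reports a terminal state."""
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        # Pause between checks so the server is not polled continuously.
        sleep(wait_time_seconds)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.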

warnings: dict | None
class redshred.Page(*, self_link: str = None, collection_link: str = None, collection_slug: str = None, content_hash: str = None, created_at: datetime = None, created_by: str = None, document_index: int = None, document_name: str = None, dpi: int = None, height: float = None, id: str = None, index: int = None, metadata: dict | None = None, name: str = None, page_segment_link: str = None, perspectives_link: str = None, region: GeoJSON = None, segments_link: str = None, summary: str = None, text: str = None, tokens_file_link: str = None, units: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, width: float = None, **data)[source]

Bases: ApiObject

A page class.

This class is used to create pages. It provides methods to set attributes.

Attributes:

  • collection_link: The link to the collection the page belongs to.
  • collection_slug: The slug of the collection the page belongs to.
  • content_hash: The content hash of the page.
  • created_at: The date the page was created.
  • created_by: The user who created the page.
  • document_index: The index of the document the page belongs to.
  • document_name: The name of the document the page belongs to.
  • dpi: The DPI of the page.
  • height: The height of the page.
  • id: The ID of the page.
  • index: The index of the page.
  • metadata: The metadata of the page.
  • name: The name of the page.
  • page_segment_link: The link to the page segment.
  • perspectives_link: The link to the perspectives in the page.
  • region: The region of the page.
  • segments_link: The link to the segments in the page.
  • self_link: The self link of the page.
  • summary: The summary of the page.
  • text: The text of the page.
  • tokens_file_link: The link to the tokens file of the page.
  • units: The units of the page.
  • updated_at: The date the page was last updated.
  • updated_by: The user who last updated the page.
  • user_data: The user data of the page.
  • width: The width of the page.

collection()[source]
collection_slug: str
content_hash: str
created_at: datetime.datetime
created_by: str
document()[source]
document_index: int
document_name: str
dpi: int
height: float
id: str
index: int
metadata: dict | None
name: str
next()[source]

Returns the next page in the document.

Returns:

Page: The next page in the document.

Raises:

ValueError: If there is no next page.
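Since next() signals the end of the document with ValueError rather than returning None, walking forward through pages needs a try/except. A minimal sketch follows; `following_pages` is a hypothetical helper that works with any object whose `next()` behaves as described above.

```python
from typing import Any, Iterator


def following_pages(page: Any) -> Iterator[Any]:
    """Yield each page after ``page`` until ``next()`` raises ValueError."""
    while True:
        try:
            page = page.next()
        except ValueError:
            # No next page: the end of the document has been reached.
            return
        yield page
```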

perspective(perspective_id) Perspective[source]
perspectives(q: str | None = None, fields: List[str] | None = None, **url_params) PerspectiveIterator[Perspective][source]
previous()[source]

Returns the previous page in the document.

Returns:

Page: The previous page in the document.

Raises:

ValueError: If there is no previous page.

region: GeoJSON
segment(segment_id) Segment[source]
segments(q: str | None = None, fields: List[str] | None = None, **url_params) SegmentIterator[Segment][source]
summary: str
text: str
tokens()[source]

Returns a list of tokens in the page.

Returns:

List[Token]: A list of tokens in the page.

units: str
updated_at: datetime.datetime
updated_by: str
user_data: dict | None
width: float
class redshred.Perspective(*, self_link: str = None, name: str = None, enrichment_name: str = None, collection_link: str = None, collection_slug: str = None, created_at: datetime = None, created_by: str = None, document_link: str = None, description: str | None = None, document_name: str = None, enrichment_config: dict = None, errors: dict | None = None, id: str = None, metadata: dict | None = None, segment_types: list = None, segments_link: str = None, slug: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, warnings: dict | None = None, cache_id: str | None = None, **data)[source]

Bases: ApiObject

A Perspective class.

This class is used to create Perspective objects. It provides methods to create, read, update, and delete the Perspective object, get segments from the Perspective object, and set the API.

Attributes:

  • name: The name of the Perspective.
  • enrichment_name: The name of the enrichment.
  • collection_link: The link to the collection the Perspective belongs to.
  • collection_slug: The slug of the collection the Perspective belongs to.
  • created_at: The date the Perspective was created.
  • created_by: The user who created the Perspective.
  • document_link: The link to the document the Perspective belongs to.
  • description: The description of the Perspective.
  • document_name: The name of the document the Perspective belongs to.
  • enrichment_config: The configuration of the enrichment.
  • errors: The errors of the Perspective.
  • id: The ID of the Perspective.
  • metadata: The metadata of the Perspective.
  • segment_types: The types of segments in the Perspective.
  • segments_link: The link to the segments in the Perspective.
  • self_link: The self link of the Perspective.
  • slug: The slug of the Perspective.
  • updated_at: The date the Perspective was last updated.
  • updated_by: The user who last updated the Perspective.
  • user_data: The user data of the Perspective.
  • warnings: The warnings of the Perspective.
  • cache_id: The cache ID of the Perspective.

bulk_create_segments(segments: List[dict | Segment], batch_size=128)[source]

Creates multiple segments in the perspective all at once, faster than individually creating them.

This method takes a list of segments and creates them in the perspective. The segments can be provided as dictionaries or Segment objects. The method uses the provided batch size to determine how many segments to create at a time. The default batch size is 128. If a segment is provided as a Segment object and it already has an ID, a ValueError is raised. The method returns a list of the created segments.

Args:

  • segments (List[Union[dict, Segment]]): The segments to create. Can be provided as dictionaries or Segment objects.
  • batch_size (int, optional): The number of segments to create at a time. Defaults to 128.

Returns:

List[Segment]: The created segments.

Raises:

ValueError: If a segment is provided as a Segment object and it already has an ID.
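The batching described above can be pictured as slicing the input list into chunks of at most batch_size items. The sketch below shows only that slicing step; `batches` is a hypothetical helper, and how the library actually submits each batch is not covered here.

```python
from typing import Any, Iterator, List, Sequence


def batches(items: Sequence[Any], batch_size: int = 128) -> Iterator[List[Any]]:
    """Yield consecutive slices of ``items`` with at most ``batch_size`` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 300 items with the default batch size give two full batches and a remainder.
sizes = [len(batch) for batch in batches(list(range(300)))]
```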

cache_id: str | None
collection()[source]
collection_slug: str
create(collection: str | Collection | None = None, document: Document | None = None, client: 'redshred.api.http.RedShredAPI' | 'redshred.api.client.RedShredClient' | None = None)[source]

Create the local object on the remote server

created_at: datetime.datetime
created_by: str
description: str | None
document()[source]
document_name: str
enrichment_config: dict
enrichment_name: str
errors: dict | None
id: str
metadata: dict | None
name: str
segment(segment_id) Segment[source]
segment_types: list
segments(q: str | None = None, fields: List[str] | None = None, **url_params) SegmentIterator[Segment][source]
slug: str
updated_at: datetime.datetime
updated_by: str
user_data: dict | None
warnings: dict | None
class redshred.PerspectiveConfiguration(*, name: str, perspective: str, segments: SegmentQuery | Dict = None, description: str = '', config: Dict[str, Any] = None, debug: bool = False)[source]

Bases: BaseModel

class Config[source]

Bases: _DefaultPydanticConfig

config: Dict[str, Any]
debug: bool
description: str
name: str
perspective: str
segments: SegmentQuery | Dict
exception redshred.RedShredAPIError(*args, reason=None, **kwargs)[source]

Bases: Exception

reason: Any
class redshred.RedShredClient(token: str | None = None, host: str | None = None, host_verify: bool | None = None, config_path: str | None = None, context_override: str | None = None, config: Configuration | None = None)[source]

Bases: object

Client for interacting with the RedShred Platform.

This client provides easy access to the RedShred API, enabling users to interact with various RedShred Platform services such as retrieving user data, fetching contents of files, and accessing collection statistics.

The client can be configured via environmental variables, a .rsconfig file, or manual initialization arguments.

Attributes:

  • config (Configuration): The configuration for the RedShred API.
  • api (RedShredAPI): Interface to the RedShred API services.
  • _user (RedShredUser, optional): The authenticated user’s details. Default is None.

Args:

  • token (str, optional): The authentication token.
  • host (str, optional): The RedShred API host address.
  • host_verify (bool, optional): Flag to enable or disable SSL verification.
  • config_path (str, optional): Path to the .rsconfig file.
  • context_override (str, optional): Overrides the context specified in the configuration.
  • config (Configuration, optional): A pre-initialized Configuration object.

collection(slug: str) Collection[source]

Retrieve a specific collection by its slug.

Args:

slug (str): The slug, short reference, or link field of the collection.

Returns:

Collection: The retrieved collection object.

Raises:

HTTPError: If the requested collection cannot be found or another error occurs.

collections(**client_params) CollectionIterator[source]

Fetch collections accessible to the user.

Args:

**client_params: Additional parameters for the client request.

Returns:

CollectionIterator: An iterator to access user’s collections.

file(storage_path: str, inline: bool = False, unconfined: bool = True, width: int = 800, **kwargs) bytes[source]

Fetch a file stored in RedShred by its relative path.

Many enrichments in RedShred can generate files (e.g. extracted images) and will serve these back with a relative path that can be passed to this method to retrieve.

If inline is True, this will display images directly in notebook context. In these cases, unconfined and width will be passed directly to Image().

Args:

storage_path (str): Path to the file as given in API response data. inline (bool, optional): Whether to attempt to display the file inline in a notebook (images only). Defaults to False. unconfined (bool, optional): Passed to IPython.core.display.Image; requires inline=True. Defaults to True. width (int, optional): Passed to IPython.core.display.Image; requires inline=True. Defaults to 800. **kwargs: Passed to IPython.core.display.Image; requires inline=True.

Returns:

bytes: file contents as bytes
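Because the method returns raw bytes, a common follow-up is to write them to disk. The sketch below is one possible way to do that; save_redshred_file is a hypothetical helper, the storage path "images/figure-1.png" is an invented example value, and _StubClient is a stand-in used only so the snippet runs without a server. The only assumption about the real client is that client.file(storage_path) returns bytes, as documented above.

```python
import os
import tempfile


def save_redshred_file(client, storage_path: str, dest_dir: str) -> str:
    """Fetch a file via client.file() and write it under dest_dir (hypothetical helper)."""
    data = client.file(storage_path)  # documented to return bytes
    # Use only the final path component as the local file name.
    local_path = os.path.join(dest_dir, os.path.basename(storage_path))
    with open(local_path, "wb") as handle:
        handle.write(data)
    return local_path


class _StubClient:
    """Stand-in for RedShredClient, used only to exercise the helper offline."""

    def file(self, storage_path: str, **kwargs) -> bytes:
        return b"example-bytes"


with tempfile.TemporaryDirectory() as tmp:
    saved = save_redshred_file(_StubClient(), "images/figure-1.png", tmp)
    with open(saved, "rb") as handle:
        saved_bytes = handle.read()
```

In real use, the stub would be replaced by a configured RedShredClient and the path by a value taken from API response data.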

get_text(api_object: redshred.models.api.ApiObject)[source]

Extract the text from a given api object.

Args:

api_object: Any API object that has a get_text method.

Returns:

str: The text extracted from the API object.

stats(collection_name: str | Collection) dict[source]

Review the current states of documents in a collection.

Read state is one of [‘unread’, ‘queued’, ‘reading’, ‘read’, ‘crashed’]:

  • unread - newly uploaded documents that are not yet fully enriched and indexed

  • queued - documents that are awaiting reading

  • reading - documents that are currently being enriched by the RedShred reader

  • read - documents that have been read and are “at rest” in RedShred

  • crashed - documents that could not be successfully processed by RedShred.

Documents in crashed states can be reported to RedShred through the chat window in the documentation. The goal is for every document to be read successfully, although the amount of enrichment may vary (e.g. encrypted PDFs shouldn’t crash, but will likely be sparsely enriched).

Args:

collection_name (str | Collection): Name of the target collection, or a Collection object.

Returns:

dict: Read statistics for collection
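The exact layout of the returned dict is not specified here. Purely as an illustration, the sketch below assumes a flat mapping from read-state name to document count and derives a short summary from it; summarize_read_states is a hypothetical helper and the sample counts are made up.

```python
from typing import Dict

# The five read states listed above.
READ_STATES = ("unread", "queued", "reading", "read", "crashed")


def summarize_read_states(counts: Dict[str, int]) -> Dict[str, object]:
    """Summarize per-state document counts (hypothetical helper).

    ASSUMPTION: counts maps a read-state name to a number of documents;
    the real stats() result may be shaped differently.
    """
    total = sum(counts.get(state, 0) for state in READ_STATES)
    # Documents that have not yet reached a final state.
    pending = sum(counts.get(state, 0) for state in ("unread", "queued", "reading"))
    return {
        "total": total,
        "pending": pending,
        "crashed": counts.get("crashed", 0),
        "finished": total > 0 and pending == 0,
    }


example = summarize_read_states({"read": 8, "reading": 1, "crashed": 1})
```

A summary like this makes it easy to poll until nothing is pending and then check whether any documents ended up in the crashed state.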

property user

Retrieve the current authenticated user’s details.

Returns:

RedShredUser: An object representing the authenticated user.

Raises:

HTTPError: If authentication fails or an error occurs with the HTTP call. TypeError: If the data returned is not in the expected format. ConnectionError: If a connection error occurs.

exception redshred.RedShredFileExistsError(*args, **kwargs)[source]

Bases: RedShredHTTPError

exception redshred.RedShredHTTPError(*args, **kwargs)[source]

Bases: HTTPError

classmethod from_http_error(http_error)[source]
class redshred.Segment(*, self_link: str = None, segment_type: str = None, regions: GeoJSON = None, bounding_box: BoundingBox | None = None, collection_link: str = None, collection_slug: str = None, created_at: datetime = None, created_by: str = None, document_link: str = None, document_name: str = None, enrichment_data: dict | None = None, enrichment_name: str = None, errors: dict | None = None, id: str = None, labels: list = None, metadata: dict | None = None, max_x: float | None = None, max_y: float | None = None, min_x: float | None = None, min_y: float | None = None, perspective_link: str = None, summary: str = None, text: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, warnings: dict | None = None, cache_id: str | None = None, **data)[source]

Bases: ApiObject

A Segment class.

This class is used to create Segment objects. It provides methods to create, read, update, and delete the Segment object, get segments from the Segment object, and set the API.

Attributes:

segment_type: The type of the Segment. regions: The regions of the Segment. bounding_box: The bounding box of the Segment. collection_link: The link to the collection the Segment belongs to. collection_slug: The slug of the collection the Segment belongs to. created_at: The date the Segment was created. created_by: The user who created the Segment. document_link: The link to the document the Segment belongs to. enrichment_data: The enrichment data of the Segment. enrichment_name: The name of the enrichment. errors: The errors of the Segment. id: The ID of the Segment. labels: The labels of the Segment. metadata: The metadata of the Segment. max_x: The maximum x-coordinate of the Segment. max_y: The maximum y-coordinate of the Segment. min_x: The minimum x-coordinate of the Segment. min_y: The minimum y-coordinate of the Segment. perspective_link: The link to the perspective of the Segment. self_link: The self link of the Segment. summary: The summary of the Segment. text: The text of the Segment. updated_at: The date the Segment was last updated. updated_by: The user who last updated the Segment. user_data: The user data of the Segment. warnings: The warnings of the Segment. cache_id: The cache ID of the Segment.

between(segment: Segment, strict: bool = False) BoundingBox[source]

Temporarily not implemented. Provides a helper function to generate the bounding box between two segments.

Args:

segment (Segment): A segment to define the space. strict (bool, optional): If True, the returned bounding box will be the area exactly between the two segments. If False, the returned bounding box will span the entire page width between the two segments. Defaults to False.

Returns:

list: Bounding box of the area between two segments.
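Since this method is not currently implemented, the following is only a sketch of how the two modes might be computed, not the library's code. It assumes that y grows downward, that both segments lie on the same page, that the page spans x from 0.0 to 1.0, and that "exactly between" means the combined horizontal extent of the two segments. Box and box_between are hypothetical names introduced for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Minimal stand-in for a bounding box (hypothetical, not the library's BoundingBox)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def box_between(a: Box, b: Box, strict: bool = False,
                page_min_x: float = 0.0, page_max_x: float = 1.0) -> Box:
    """Return the box covering the vertical gap between two boxes."""
    # ASSUMPTION: y grows downward, so the box with the smaller min_y is the upper one.
    upper, lower = sorted((a, b), key=lambda box: box.min_y)
    if lower.min_y < upper.max_y:
        raise ValueError("segments overlap vertically; there is no space between them")
    if strict:
        # ASSUMPTION: the strict area spans the combined horizontal extent of both boxes.
        left, right = min(a.min_x, b.min_x), max(a.max_x, b.max_x)
    else:
        # Non-strict mode spans the full page width.
        left, right = page_min_x, page_max_x
    return Box(left, upper.max_y, right, lower.min_y)


heading = Box(0.2, 0.1, 0.6, 0.2)
footer = Box(0.3, 0.5, 0.8, 0.6)
strict_gap = box_between(heading, footer, strict=True)
wide_gap = box_between(heading, footer)
```

The argument order does not matter in this sketch, because the two boxes are sorted by their top edge before the gap is computed.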

bounding_box: BoundingBox | None
cache_id: str | None
collection()[source]
collection_slug: str
create(perspective: Perspective | None = None)[source]

Create the local object on the remote server

created_at: datetime.datetime
created_by: str
document()[source]
document_name: str
enrichment_data: dict | None
enrichment_name: str
errors: dict | None
get_segment_image(path_to_save_folder: str | None = None, return_bytes=False, inline=False, **url_params) bytes | str[source]

Retrieves the image of the segment.

This method uses the SegmentCropper to get the cropped image of the segment. The image can be returned as bytes, opened inline using PIL, or saved to a specified folder.

Args:

path_to_save_folder (str, optional): The path to the folder where the image will be saved. Defaults to the current working directory. return_bytes (bool, optional): If True, the image will be returned as bytes. Defaults to False. inline (bool, optional): If True, the image will be opened inline using PIL. Defaults to False. **url_params: Additional URL parameters.

Returns:

Union[bytes, str]: The image of the segment, either as bytes or a path to the saved image.

get_segments_from_perspective(perspective_name: str, **params)[source]

Get all segments that are in the same perspective as this segment

get_text(**url_params)[source]

Retrieves the text of the segment.

This method uses the TokenLookup to get the text of the segment.

Args:

**url_params: Additional URL parameters.

Returns:

str: The text of the segment.

id: str
labels: list
max_x: float | None
max_y: float | None
metadata: dict | None
min_x: float | None
min_y: float | None
perspective()[source]
q(query: str, search_type: Literal['documents', 'pages', 'perspectives', 'segments'] = 'segments', **url_params)[source]

Executes a query on the API object.

This method uses the provided query and search type to execute a search on the API object. The search type can be one of “documents”, “pages”, “perspectives”, or “segments”. Additional URL parameters can be provided as keyword arguments.

Args:

query (str): The query to execute. search_type (str): The type of search to perform. Defaults to “segments”. **url_params: Additional URL parameters.

Returns:

An iterator over the results of the query.

regions: GeoJSON
segment_type: str
summary: str
text: str
updated_at: datetime.datetime
updated_by: str
user_data: dict | None
warnings: dict | None