redshred.models package

Submodules

redshred.models.api module

class redshred.models.api.APIObjectIterator(data: Iterable[ApiObject] = None, client: 'redshred.api.client.RedShredClient' | 'redshred.api.http.RedShredAPI' = None, path: str = None, q: str = None, fields: List[str] | Set[str] | str | None = None, **client_params)[source]

Bases: Iterator

An iterator class for API objects.

This class is used to iterate over API objects. It provides methods to set the API, add other iterators, get the length of the iterator, and get the next item in the iterator.

Attributes:: _api: The API to use for the iteration. _client: The client to use for the iteration. _paginator: The paginator to use for the iteration. _page_1: The first page of the iteration. _fields: The fields to include in the iteration. __alternate_iterator: An alternate iterator to use. __iterator: The iterator to use. path: The path to use for the iteration. count: The count of items in the iterator. query: The query to use for the iteration.

count: int | None

first(): Return the next item from the iterator. When exhausted, raise StopIteration

iter_dict(**dict_args)[source]

Returns an iterator that yields dictionary representations of items in the iterator.

This method uses the dict method of the items in the iterator to convert them to dictionaries. If fields were specified when initializing the iterator, only these fields are included in the dictionaries.

Args:: **dict_args: Additional arguments to pass to the dict method of the items.
Returns:: An iterator that yields dictionary representations of items in the iterator.
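
The field-limiting behaviour described for iter_dict can be pictured with the minimal sketch below. It is not the library's implementation: FakeItem and iter_dict_sketch are hypothetical stand-ins, and passing the fields through an "include" argument is an assumption about how the underlying dict method is told which fields to keep.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Set


class FakeItem:
    """Hypothetical stand-in for an API object; it only mimics a dict() method."""

    def __init__(self, **data: Any) -> None:
        self._data = data

    def dict(self, **dict_args: Any) -> Dict[str, Any]:
        # Assumed convention: an optional "include" set restricts the output keys.
        include: Optional[Set[str]] = dict_args.get("include")
        if include is None:
            return dict(self._data)
        return {k: v for k, v in self._data.items() if k in include}


def iter_dict_sketch(
    items: Iterable[FakeItem],
    fields: Optional[Iterable[str]] = None,
    **dict_args: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield one dictionary per item, limited to `fields` when they were given."""
    args = dict(dict_args)
    if fields is not None:
        # Caller-supplied "include" wins; otherwise fall back to the iterator's fields.
        args.setdefault("include", set(fields))
    for item in items:
        yield item.dict(**args)


items = [FakeItem(id="a1", name="report.pdf", n_pages=3)]
rows = list(iter_dict_sketch(items, fields={"id", "name"}))
```

Here rows holds a single dictionary with only the id and name keys, because those were the only fields requested.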

path: str | None

query: str

to_list()[source]: Returns a list of items, fully consuming the iterator.

class redshred.models.api.ApiObject(*, self_link: str, **data)[source]

Bases: SerializableModel

An API object class.

This class is used to create API objects. It provides methods to create, read, update, and delete the API object, get items from the API object, and set the API.

Attributes:: _api: The API to use for the object. _client: The client to use for the object. _last_refreshed: The last time the object was refreshed. self_link: The self link of the object.

create(client) → TApiObject[source]

dashboard(query=None) → str | None[source]

Returns a URL to the RedShred dashboard representation of the object.

This method generates a URL that points to the dashboard representation of the object. The URL is generated based on the type of the object (Collection, Document, Perspective, Segment, or Page). If the object type is not supported, a NotImplementedError is raised.

Args:: query (str, optional): An optional query string to append to the URL. Defaults to None.
Returns:: str: The URL to the dashboard representation of the object.
Raises:: NotImplementedError: If the object type is not supported.

delete() → TApiObject[source]: Delete the remote object

classmethod load(client: 'redshred.api.http.RedShredAPI' | 'redshred.api.client.RedShredClient', url: str = None, collection: str | Collection = None, object_id: str = None, **key_word_filters) → TApiObject[source]

Loads an API object.

This method is used to load an API object from a given URL, collection, or object ID. If a URL is provided, the API object is loaded directly from the URL. If a collection is provided, the API object is loaded from the collection. If an object ID is provided, the API object is loaded from the object ID. Additional keyword filters can be provided to filter the API objects.

Args:: client (Union[RedShredAPI, RedShredClient]): The client to use for the API request. url (str, optional): The URL to load the API object from. Defaults to None. collection (Union[str, Collection], optional): The collection to load the API object from. Defaults to None. object_id (str, optional): The object ID to load the API object from. Defaults to None. **key_word_filters: Additional keyword filters to apply when loading the API object.
Returns:: TApiObject: The loaded API object.
Raises:: ValueError: If neither a collection slug nor a Collection object is provided when the URL is not provided. RedShredAPIError: If more than one response is returned when a unique response is expected.
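
The argument handling described for load can be sketched roughly as follows. This is only an illustration of the documented rules: lookup_mode, expect_unique and RedShredAPIErrorSketch are hypothetical names, the precedence between url, collection and object_id is an assumption, and what the library does when nothing matches is not stated in this reference.

```python
from typing import Any, List, Optional


class RedShredAPIErrorSketch(Exception):
    """Hypothetical stand-in for the library's RedShredAPIError."""


def lookup_mode(
    url: Optional[str] = None,
    collection: Optional[Any] = None,
    object_id: Optional[str] = None,
) -> str:
    """Decide which kind of lookup applies; the precedence order is assumed."""
    if url is not None:
        return "url"
    if collection is None:
        # Mirrors the documented ValueError when no URL and no collection are given.
        raise ValueError("provide a URL, a collection slug, or a Collection object")
    return "object_id" if object_id is not None else "filters"


def expect_unique(results: List[Any]) -> Any:
    """Return the single match, raising when more than one result came back."""
    if len(results) > 1:
        raise RedShredAPIErrorSketch(f"expected one result, got {len(results)}")
    # Behaviour for zero matches is not documented; returning None is a guess.
    return results[0] if results else None
```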

q(query: str, search_type: Literal['documents', 'pages', 'perspectives', 'segments'] = 'segments', **url_params)[source]

Executes a query on the API object.

This method uses the provided query and search type to execute a search on the API object. The search type can be one of “documents”, “pages”, “perspectives”, or “segments”. Additional URL parameters can be provided as keyword arguments.

Args:: query (str): The query to execute. search_type (str): The type of search to perform. Defaults to “segments”. **url_params: Additional URL parameters.
Returns:: An iterator over the results of the query.

read() → TApiObject[source]: Update the object with the remote state

self_link: str

update(**kwargs) → TApiObject[source]: Update the remote object with the current local state.

class redshred.models.api.Collection(*, self_link: str = None, id: str = None, config: CollectionConfiguration | None = None, created_at: datetime = None, created_by: str = None, description: str | None = None, documents_link: str = None, marked_for_delete: bool | None = False, metadata: dict | None = None, name: str = None, owner: str = None, perspectives_link: str = None, segments_link: str = None, slug: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, client: Any = None, **data)[source]

Bases: ApiObject

A collection class.

This class is used to create collections. It provides methods to create, read, update, and delete the collection, get documents, perspectives, and segments from the collection, and upload CSVs, files, URLs, and text to the collection.

Attributes:: id: The ID of the collection. config: The configuration of the collection. created_at: The date the collection was created. created_by: The user who created the collection. description: The description of the collection. documents_link: The link to the documents in the collection. marked_for_delete: Whether the collection is marked for deletion. metadata: The metadata of the collection. name: The name of the collection. owner: The owner of the collection. perspectives_link: The link to the perspectives in the collection. segments_link: The link to the segments in the collection. self_link: The self link of the collection. slug: The slug of the collection. updated_at: The date the collection was last updated. updated_by: The user who last updated the collection. user_data: The user data of the collection.

client: Any

config: CollectionConfiguration | None

create(client: 'redshred.api.http.RedShredAPI' | 'redshred.api.client.RedShredClient') → TApiObject[source]

Creates the local object on the remote server.

This method is used to create the local object on the remote server. It uses the provided client to make the API request. The method identifies the creatable fields, makes a POST request to the server, and updates the local object with the response.

Args:: client (Union[RedShredAPI, RedShredClient]): The client to use for the API request.
Returns:: TApiObject: The local object updated with the response from the server.
Raises:: RedShredAPIError: If the server response status code is not 201 (Created).
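
The create flow described above (select creatable fields, POST them, check for 201, merge the response) might look roughly like the sketch below. FakeClient, its post method, the CREATABLE_FIELDS set and the "/collections" path are all hypothetical; the real client interface and field list are not given in this reference.

```python
from typing import Any, Dict, Tuple

# Assumed subset of fields a caller may set at creation time.
CREATABLE_FIELDS = {"name", "description", "metadata", "user_data"}


class FakeClient:
    """Hypothetical client whose post() returns (status_code, json_body)."""

    def post(self, path: str, payload: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
        body = dict(payload, id="abc123", self_link=f"{path}/abc123")
        return 201, body


def create_sketch(local: Dict[str, Any], client: Any, path: str = "/collections") -> Dict[str, Any]:
    """POST the creatable fields and merge the server response into `local`."""
    payload = {k: v for k, v in local.items() if k in CREATABLE_FIELDS and v is not None}
    status, body = client.post(path, payload)
    if status != 201:
        # The library raises RedShredAPIError here; RuntimeError is a stand-in.
        raise RuntimeError(f"create failed with status {status}")
    local.update(body)
    return local


created = create_sketch({"name": "Manuals", "slug": None}, FakeClient())
```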

created_at: datetime.datetime

created_by: str

delete()[source]: Permanently delete a collection remotely

description: str | None

document(document_id) → Document[source]

Retrieves a specific document from the collection.

This method uses the provided document ID to load a Document object from the collection. The document ID can be provided in various formats and is converted to a standard format using the id_from_any function.

Args:: document_id (str): The ID of the document to retrieve.
Returns:: Document: The loaded Document object.

documents(q: str | None = None, fields: List[str] | None = None, **url_params) → DocumentIterator[Document][source]

Returns an iterator over the documents in the collection.

This method creates a DocumentIterator object that can be used to iterate over the documents in the collection. The documents can be filtered using a query string and specific fields can be included in the output. Additional URL parameters can be provided as keyword arguments.

Args:: q (str, optional): The query string to filter the documents. Defaults to None. fields (List[str], optional): The fields to include in the output. Defaults to None. **url_params: Additional URL parameters.
Returns:: DocumentIterator[Document]: An iterator over the documents in the collection.
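
A short usage sketch follows. The helper document_names relies only on the documented call documents(q=..., fields=[...]) and on the name attribute of each result, so it should work with any object that exposes that interface. StubCollection and StubDocument are hypothetical test doubles, and the substring match in the stub is not RedShred's query syntax.

```python
from typing import Any, Iterator, List, Optional


def document_names(collection: Any, query: Optional[str] = None) -> List[str]:
    """Return the names of the documents matching `query`."""
    # Requesting only "name" keeps the returned objects small.
    return [doc.name for doc in collection.documents(q=query, fields=["name"])]


class StubDocument:
    """Hypothetical stand-in for a Document with just a name."""

    def __init__(self, name: str) -> None:
        self.name = name


class StubCollection:
    """Hypothetical stand-in for a Collection backed by a list of names."""

    def __init__(self, names: List[str]) -> None:
        self._names = names

    def documents(self, q: Optional[str] = None, fields=None, **url_params) -> Iterator[StubDocument]:
        # Placeholder filtering: a plain substring match, for illustration only.
        return (StubDocument(n) for n in self._names if q is None or q in n)


names = document_names(StubCollection(["pump-manual.pdf", "invoice.pdf"]), "manual")
```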

documents_link: str

id: str

classmethod load(client: 'redshred.api.http.RedShredAPI' | 'redshred.api.client.RedShredClient', slug: str = None, url: str = None) → Collection[source]: See ApiObject.load

marked_for_delete: bool | None

metadata: dict | None

name: str

owner: str

perspective(perspective_id) → Perspective[source]

Retrieves a specific perspective from the collection.

This method uses the provided perspective ID to load a Perspective object from the collection. The perspective ID can be provided in various formats and is converted to a standard format using the id_from_any function.

Args:: perspective_id (str): The ID of the perspective to retrieve.
Returns:: Perspective: The loaded Perspective object.

perspectives(q: str | None = None, fields: List[str] | None = None, **url_params) → PerspectiveIterator[Perspective][source]

Returns an iterator over the perspectives in the collection.

This method creates a PerspectiveIterator object that can be used to iterate over the perspectives in the collection. The perspectives can be filtered using a query string and specific fields can be included in the output. Additional URL parameters can be provided as keyword arguments.

Args:: q (str, optional): The query string to filter the perspectives. Defaults to None. fields (List[str], optional): The fields to include in the output. Defaults to None. **url_params: Additional URL parameters.
Returns:: PerspectiveIterator[Perspective]: An iterator over the perspectives in the collection.

perspectives_link: str

segment(segment_id) → Segment[source]

Retrieves a specific segment from the collection.

Args:: segment_id (str): The ID of the segment to retrieve.
Returns:: Segment: The loaded Segment object.

segments(q: str | None = None, fields: List[str] | None = None, **url_params) → SegmentIterator[Segment][source]

Returns an iterator over the segments in the collection.

This method creates a SegmentIterator object that can be used to iterate over the segments in the collection. The segments can be filtered using a query string and specific fields can be included in the output. Additional URL parameters can be provided as keyword arguments.

Args:: q (str, optional): The query string to filter the segments. Defaults to None. fields (List[str], optional): The fields to include in the output. Defaults to None. **url_params: Additional URL parameters.
Returns:: SegmentIterator[Segment]: An iterator over the segments in the collection.

segments_link: str

self_link: str

slug: str

updated_at: datetime.datetime

updated_by: str

upload_csv(file, content_columns: List[str], delimiter=',', rename: str | None = None, **user_data)[source]

Args:: file: Source file to upload. content_columns: Used for multi-document upload via CSV. A list that specifies the column(s) that will be used for the document body. delimiter: Delimiter for the CSV. Defaults to “,”. rename: A new name, if desired. **user_data: Any additional user data.
Returns:: None
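
To make the role of content_columns more concrete, the sketch below shows one plausible reading: each CSV row becomes one document whose body is built from the listed columns, while the remaining columns travel along as metadata. This interpretation, the newline used to join columns, and the helper name split_rows are assumptions; the actual processing happens on the RedShred side and is not described in this reference.

```python
import csv
import io
from typing import Dict, List, Tuple


def split_rows(
    csv_text: str, content_columns: List[str], delimiter: str = ","
) -> List[Tuple[str, Dict[str, str]]]:
    """Split each CSV row into (document body, remaining columns)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    result = []
    for row in reader:
        # Body: the chosen content columns, in the order the caller listed them.
        body = "\n".join(row[column] for column in content_columns)
        # Everything else is kept as per-row metadata (assumed behaviour).
        rest = {k: v for k, v in row.items() if k not in content_columns}
        result.append((body, rest))
    return result


sample = "title,body,author\nPump manual,Check the seals.,Ana\n"
parsed = split_rows(sample, content_columns=["title", "body"])
```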

upload_file(file: Path | BufferedReader, rename: str | None = None, save_origin: bool | None = False, **user_data) → Document[source]

Convenience method to upload a filelike into RedShred.

Args:: file (str, filelike): Either a filename, the URL of a file, or an open() file-like object. rename (str, optional): File name override. Defaults to the existing filename. save_origin (bool, optional): Save the path to the file on disk. Defaults to False. user_data (dict): Arbitrary dictionary to store with the document on the server.
Raises:: ValueError: Name argument missing for URL upload.
Returns:: Document: The document returned by the API server.

upload_text(text: str, name: str, **user_data) → Document[source]

Convenience method to upload raw text into RedShred.

Given text and a name, upload that text into RedShred as a document.

Args:: text (str): Text to upload. name (str): File name to save the text as. user_data (dict): Arbitrary dictionary to store with the document on the server.
Returns:: Document: The document returned by the API server.

upload_url(url: str, rename: str | None = None, save_origin: bool | None = True, **user_data) → Document[source]

Convenience method to upload a URL into RedShred.

Given a URL, upload the file it points to into RedShred.

Args:: url (str): URL of the file to upload. rename (str, optional): File name override. Defaults to the existing filename. save_origin (bool, optional): Save the URL of the file. Defaults to True. user_data (dict): Arbitrary dictionary to store with the document on the server.
Raises:: ValueError: Name argument missing for URL upload.
Returns:: Document: The document returned by the API server.

user_data: dict | None

class redshred.models.api.CollectionIterator(data: Iterable[ApiObject] = None, client: 'redshred.api.client.RedShredClient' | 'redshred.api.http.RedShredAPI' = None, path: str = None, q: str = None, fields: List[str] | Set[str] | str | None = None, **client_params)[source]

Bases: APIObjectIterator

A CollectionIterator class.

This class is used to create CollectionIterator objects. It inherits from the APIObjectIterator class.

Attributes:: see APIObjectIterator class for attributes

class redshred.models.api.Document(*, self_link: str = None, id: str = None, collection_link: str = None, collection_slug: str = None, config: CollectionConfiguration | None = None, content_hash: str = None, created_at: datetime = None, created_by: str = None, csv_metadata: dict | None = None, description: str | None = None, document_segment_link: str = None, errors: dict | str | None = None, file_link: str = None, file_size: int = None, index: int = None, metadata: dict | None = None, n_pages: int = None, name: str = None, original_name: str = None, pages_link: str = None, pdf_link: str = None, perspectives_link: str = None, read_state: str = None, read_state_updated_at: datetime = None, region: GeoJSON = None, segments_link: str = None, slug: str = None, source: str = None, summary: str = None, text: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, warnings: dict | None = None, uniqueness_id: str | None = None, **data)[source]

Bases: ApiObject

A document class.

This class is used to create documents. It provides methods to create, read, update, and delete the document, get pages, perspectives, and segments from the document, and download the document.

Attributes:: id: The ID of the document. collection_link: The link to the collection the document belongs to. collection_slug: The slug of the collection the document belongs to. config: The configuration of the document. content_hash: The content hash of the document. created_at: The date the document was created. created_by: The user who created the document. csv_metadata: The CSV metadata of the document. description: The description of the document. document_segment_link: The link to the document segment. errors: The errors of the document. file_link: The link to the file of the document. file_size: The size of the file of the document. index: The index of the document. metadata: The metadata of the document. n_pages: The number of pages in the document. name: The name of the document. original_name: The original name of the document. pages_link: The link to the pages in the document. pdf_link: The link to the PDF of the document. perspectives_link: The link to the perspectives in the document. read_state: The read state of the document. read_state_updated_at: The date the read state of the document was last updated. region: The region of the document. segments_link: The link to the segments in the document. self_link: The self link of the document. slug: The slug of the document. source: The source of the document. summary: The summary of the document. text: The text of the document. updated_at: The date the document was last updated. updated_by: The user who last updated the document. user_data: The user data of the document. warnings: The warnings of the document. uniqueness_id: The uniqueness ID of the document.

collection() → Collection[source]

collection_link: str

collection_slug: str

config: CollectionConfiguration | None

content_hash: str

create(*args, **kwargs)[source]

created_at: datetime.datetime

created_by: str

csv_metadata: dict | None

description: str | None

document_segment_link: str

download(path: str | 'pathlib.Path') → int[source]

Download the original file uploaded to RedShred to the specified path, returning the total number of bytes written.

Args:: path: A path on the local filesystem.
Returns:: int: The number of bytes written.

download_bytes() → bytes[source]

Download the original file uploaded to RedShred and return its contents as bytes.

Returns:: bytes: The document as bytes.

errors: dict | str | None

file_link: str

file_size: int

id: str

index: int

metadata: dict | None

n_pages: int

name: str

original_name: str

page(index) → Page[source]

Retrieves a specific page from the document.

Args:: index (int): The index of the page to retrieve.
Returns:: Page: The loaded Page object.

pages(q: str | None = None, fields: List[str] | None = None, **url_params) → PageIterator[Page][source]

Returns an iterator over the pages in the document.

Args:: q (str, optional): The query string to filter the pages. Defaults to None. fields (List[str], optional): The fields to include in the output. Defaults to None. **url_params: Additional URL parameters.
Returns:: PageIterator[Page]: An iterator over the pages in the document.

pages_link: str

pdf_link: str

perspective(perspective_id) → Perspective[source]

perspectives(q: str | None = None, fields: List[str] | None = None, **url_params) → PerspectiveIterator[Perspective][source]

perspectives_link: str

read_state: str

read_state_updated_at: datetime.datetime

region: GeoJSON

reread_document(force=False)[source]

Reread the document and generate any new or changed perspectives and retry any failed perspectives.

Args:: force: force a reread even if the document is not in a state that allows it to be read

segment(segment_id) → Perspective[source]

segments(q: str | None = None, fields: List[str] | None = None, **url_params) → SegmentIterator[Segment][source]

segments_link: str

self_link: str

slug: str

source: str

summary: str

text: str

uniqueness_id: str | None

updated_at: datetime.datetime

updated_by: str

user_data: dict | None

wait_until_read(wait_time_seconds: int = 5)[source]

Synchronously wait until the document has been read

Args:: wait_time_seconds: time to wait between checks
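
The waiting behaviour can be pictured as a simple polling loop: refresh the document with read(), check read_state, and sleep between checks. The sketch below is an illustration only. The state name "read", the max_checks guard and the injectable sleep function are assumptions added for the example, and StubDocument is a hypothetical test double.

```python
import time
from typing import Any, Callable, Iterable, Optional, Tuple


def wait_until_read_sketch(
    document: Any,
    wait_time_seconds: float = 5,
    done_states: Tuple[str, ...] = ("read",),  # assumed terminal state name
    sleep: Callable[[float], None] = time.sleep,
    max_checks: Optional[int] = None,
) -> int:
    """Poll until read_state is a terminal state; return the number of checks."""
    checks = 0
    while True:
        document.read()  # refresh the local object from the remote state
        checks += 1
        if document.read_state in done_states:
            return checks
        if max_checks is not None and checks >= max_checks:
            raise TimeoutError(f"document not read after {checks} checks")
        sleep(wait_time_seconds)


class StubDocument:
    """Hypothetical document that steps through a scripted list of states."""

    def __init__(self, states: Iterable[str]) -> None:
        self._states = list(states)
        self.read_state: Optional[str] = None

    def read(self) -> "StubDocument":
        if self._states:
            self.read_state = self._states.pop(0)
        return self


doc = StubDocument(["queued", "reading", "read"])
n_checks = wait_until_read_sketch(doc, wait_time_seconds=0, sleep=lambda _s: None)
```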

warnings: dict | None

class redshred.models.api.DocumentIterator(data: Iterable[ApiObject] = None, client: 'redshred.api.client.RedShredClient' | 'redshred.api.http.RedShredAPI' = None, path: str = None, q: str = None, fields: List[str] | Set[str] | str | None = None, **client_params)[source]

Bases: APIObjectIterator

A DocumentIterator class.

This class is used to create DocumentIterator objects. It inherits from the APIObjectIterator class.

Attributes:: see APIObjectIterator class for attributes

class redshred.models.api.Page(*, self_link: str = None, collection_link: str = None, collection_slug: str = None, content_hash: str = None, created_at: datetime = None, created_by: str = None, document_index: int = None, document_name: str = None, dpi: int = None, height: float = None, id: str = None, index: int = None, metadata: dict | None = None, name: str = None, page_segment_link: str = None, perspectives_link: str = None, region: GeoJSON = None, segments_link: str = None, summary: str = None, text: str = None, tokens_file_link: str = None, units: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, width: float = None, **data)[source]

Bases: ApiObject

A page class.

This class is used to create pages. It provides methods to set attributes.

Attributes:: collection_link: The link to the collection the page belongs to. collection_slug: The slug of the collection the page belongs to. content_hash: The content hash of the page. created_at: The date the page was created. created_by: The user who created the page. document_index: The index of the document the page belongs to. document_name: The name of the document the page belongs to. dpi: The DPI of the page. height: The height of the page. id: The ID of the page. index: The index of the page. metadata: The metadata of the page. name: The name of the page. page_segment_link: The link to the page segment. perspectives_link: The link to the perspectives in the page. region: The region of the page. segments_link: The link to the segments in the page. self_link: The self link of the page. summary: The summary of the page. text: The text of the page. tokens_file_link: The link to the tokens file of the page. units: The units of the page. updated_at: The date the page was last updated. updated_by: The user who last updated the page. user_data: The user data of the page. width: The width of the page.

collection()[source]

collection_link: str

collection_slug: str

content_hash: str

created_at: datetime.datetime

created_by: str

document()[source]

document_index: int

document_name: str

dpi: int

height: float

id: str

index: int

metadata: dict | None

name: str

next()[source]

Returns the next page in the document.

Returns:: Page: The next page in the document.
Raises:: ValueError: If there is no next page.
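
Because next() raises ValueError at the end of the document, walking forward from a page amounts to calling it until that error appears. The generator below is a small sketch of that pattern; it relies only on the documented next() behaviour, and StubPage is a hypothetical stand-in used to exercise it.

```python
from typing import Any, Iterator


def pages_from(page: Any) -> Iterator[Any]:
    """Yield `page` and every following page until next() raises ValueError."""
    current = page
    while True:
        yield current
        try:
            current = current.next()
        except ValueError:
            # Documented signal that there is no next page.
            return


class StubPage:
    """Hypothetical page in a document with `total` pages."""

    def __init__(self, index: int, total: int) -> None:
        self.index = index
        self.total = total

    def next(self) -> "StubPage":
        if self.index + 1 >= self.total:
            raise ValueError("no next page")
        return StubPage(self.index + 1, self.total)


indices = [p.index for p in pages_from(StubPage(0, 3))]
```

The same loop works backwards by swapping next() for previous().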

page_segment_link: str

perspective(perspective_id) → Perspective[source]

perspectives(q: str | None = None, fields: List[str] | None = None, **url_params) → PerspectiveIterator[Perspective][source]

perspectives_link: str

previous()[source]

Returns the previous page in the document.

Returns:: Page: The previous page in the document.
Raises:: ValueError: If there is no previous page.

region: GeoJSON

segment(segment_id) → Segment[source]

segments(q: str | None = None, fields: List[str] | None = None, **url_params) → SegmentIterator[Segment][source]

segments_link: str

self_link: str

summary: str

text: str

tokens()[source]

Returns a list of tokens in the page.

Returns:: List[Token]: A list of tokens in the page.

tokens_file_link: str

units: str

updated_at: datetime.datetime

updated_by: str

user_data: dict | None

width: float

class redshred.models.api.PageIterator(data: Iterable[ApiObject] = None, client: 'redshred.api.client.RedShredClient' | 'redshred.api.http.RedShredAPI' = None, path: str = None, q: str = None, fields: List[str] | Set[str] | str | None = None, **client_params)[source]

Bases: APIObjectIterator

A PageIterator class.

This class is used to create PageIterator objects. It inherits from the APIObjectIterator class.

Attributes:: see APIObjectIterator class for attributes

class redshred.models.api.Perspective(*, self_link: str = None, name: str = None, enrichment_name: str = None, collection_link: str = None, collection_slug: str = None, created_at: datetime = None, created_by: str = None, document_link: str = None, description: str | None = None, document_name: str = None, enrichment_config: dict = None, errors: dict | None = None, id: str = None, metadata: dict | None = None, segment_types: list = None, segments_link: str = None, slug: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, warnings: dict | None = None, cache_id: str | None = None, **data)[source]

Bases: ApiObject

A Perspective class.

This class is used to create Perspective objects. It provides methods to create, read, update, and delete the Perspective object, get segments from the Perspective object, and set the API.

Attributes:: name: The name of the Perspective. enrichment_name: The name of the enrichment. collection_link: The link to the collection the Perspective belongs to. collection_slug: The slug of the collection the Perspective belongs to. created_at: The date the Perspective was created. created_by: The user who created the Perspective. document_link: The link to the document the Perspective belongs to. description: The description of the Perspective. document_name: The name of the document the Perspective belongs to. enrichment_config: The configuration of the enrichment. errors: The errors of the Perspective. id: The ID of the Perspective. metadata: The metadata of the Perspective. segment_types: The types of segments in the Perspective. segments_link: The link to the segments in the Perspective. self_link: The self link of the Perspective. slug: The slug of the Perspective. updated_at: The date the Perspective was last updated. updated_by: The user who last updated the Perspective. user_data: The user data of the Perspective. warnings: The warnings of the Perspective. cache_id: The cache ID of the Perspective.

bulk_create_segments(segments: List[dict | Segment], batch_size=128)[source]

Creates multiple segments in the perspective all at once, faster than individually creating them.

This method takes a list of segments and creates them in the perspective. The segments can be provided as dictionaries or Segment objects. The method uses the provided batch size to determine how many segments to create at a time. The default batch size is 128. If a segment is provided as a Segment object and it already has an ID, a ValueError is raised. The method returns a list of the created segments.

Args:: segments (List[Union[dict, Segment]]): The segments to create. Can be provided as dictionaries or Segment objects. batch_size (int, optional): The number of segments to create at a time. Defaults to 128.
Returns:: List[Segment]: The created segments.
Raises:: ValueError: If a segment is provided as a Segment object and it already has an ID.
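
The batching described above can be sketched as follows. This is not the library's code: the helper name batched_segments is hypothetical, segments are simplified to plain dictionaries, and an "id" key stands in for a Segment object that already has an ID.

```python
from typing import Any, Dict, Iterator, List


def batched_segments(
    segments: List[Dict[str, Any]], batch_size: int = 128
) -> Iterator[List[Dict[str, Any]]]:
    """Validate the segments, then yield them in chunks of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for segment in segments:
        # Mirrors the documented rule: already-created segments are rejected.
        if segment.get("id") is not None:
            raise ValueError("segment already has an ID")
    for start in range(0, len(segments), batch_size):
        yield segments[start:start + batch_size]


sizes = [len(batch) for batch in batched_segments([{"segment_type": "note"}] * 300)]
```

With 300 segments and the default batch size, this produces two full batches of 128 and a final batch of 44, so three requests would be sent instead of 300.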

cache_id: str | None

collection()[source]

collection_link: str

collection_slug: str

create(collection: str | Collection | None = None, document: Document | None = None, client: 'redshred.api.http.RedShredAPI' | 'redshred.api.client.RedShredClient' | None = None)[source]: Create the local object on the remote server

created_at: datetime.datetime

created_by: str

description: str | None

document()[source]

document_link: str

document_name: str

enrichment_config: dict

enrichment_name: str

errors: dict | None

id: str

metadata: dict | None

name: str

segment(segment_id) → Segment[source]

segment_types: list

segments(q: str | None = None, fields: List[str] | None = None, **url_params) → SegmentIterator[Segment][source]

segments_link: str

self_link: str

slug: str

updated_at: datetime.datetime

updated_by: str

user_data: dict | None

warnings: dict | None

class redshred.models.api.PerspectiveIterator(data: Iterable[ApiObject] = None, client: 'redshred.api.client.RedShredClient' | 'redshred.api.http.RedShredAPI' = None, path: str = None, q: str = None, fields: List[str] | Set[str] | str | None = None, **client_params)[source]

Bases: APIObjectIterator

A PerspectiveIterator class.

This class is used to create PerspectiveIterator objects. It inherits from the APIObjectIterator class.

Attributes:: see APIObjectIterator class for attributes

class redshred.models.api.RedShredUser(*, active: bool, email: str, first_name: str, joined: datetime, last_login: datetime, last_name: str, staff: bool, super: bool, username: str, token: str, **data)[source]

Bases: SerializableModel

A RedShred user class.

This class is used to create RedShred users. It provides methods to set attributes.

Attributes:: active: Whether the user is active. email: The email of the user. first_name: The first name of the user. joined: The date the user joined. last_login: The date of the user’s last login. last_name: The last name of the user. staff: Whether the user is a staff member. super: Whether the user is a superuser. username: The username of the user. token: The token of the user.

active: bool

email: str

first_name: str

joined: datetime

last_login: datetime

last_name: str

staff: bool

super: bool

token: str

username: str

class redshred.models.api.Segment(*, self_link: str = None, segment_type: str = None, regions: GeoJSON = None, bounding_box: BoundingBox | None = None, collection_link: str = None, collection_slug: str = None, created_at: datetime = None, created_by: str = None, document_link: str = None, document_name: str = None, enrichment_data: dict | None = None, enrichment_name: str = None, errors: dict | None = None, id: str = None, labels: list = None, metadata: dict | None = None, max_x: float | None = None, max_y: float | None = None, min_x: float | None = None, min_y: float | None = None, perspective_link: str = None, summary: str = None, text: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, warnings: dict | None = None, cache_id: str | None = None, **data)[source]

Bases: ApiObject

A Segment class.

This class is used to create Segment objects. It provides methods to create, read, update, and delete the Segment object, get segments from the Segment object, and set the API.

Attributes:: segment_type: The type of the Segment. regions: The regions of the Segment. bounding_box: The bounding box of the Segment. collection_link: The link to the collection the Segment belongs to. collection_slug: The slug of the collection the Segment belongs to. created_at: The date the Segment was created. created_by: The user who created the Segment. document_link: The link to the document the Segment belongs to. enrichment_data: The enrichment data of the Segment. enrichment_name: The name of the enrichment. errors: The errors of the Segment. id: The ID of the Segment. labels: The labels of the Segment. metadata: The metadata of the Segment. max_x: The maximum x-coordinate of the Segment. max_y: The maximum y-coordinate of the Segment. min_x: The minimum x-coordinate of the Segment. min_y: The minimum y-coordinate of the Segment. perspective_link: The link to the perspective of the Segment. self_link: The self link of the Segment. summary: The summary of the Segment. text: The text of the Segment. updated_at: The date the Segment was last updated. updated_by: The user who last updated the Segment. user_data: The user data of the Segment. warnings: The warnings of the Segment. cache_id: The cache ID of the Segment.

between(segment: Segment, strict: bool = False) → BoundingBox[source]

Temporarily not implemented. Provides a helper function to generate the bounding box between two segments.

Args:: segment (Segment): A segment to define the space. strict (bool, optional): If True, the returned bounding box will be the area exactly between the two segments. If False, the returned bounding box will span the entire page width between the two segments. Defaults to False.
Returns:: BoundingBox: Bounding box of the area between the two segments.

bounding_box: BoundingBox | None

cache_id: str | None

collection()[source]

collection_link: str

collection_slug: str

create(perspective: Perspective | None = None)[source]: Create the local object on the remote server

created_at: datetime.datetime

created_by: str

document()[source]

document_link: str

document_name: str

enrichment_data: dict | None

enrichment_name: str

errors: dict | None

get_segment_image(path_to_save_folder: str | None = None, return_bytes=False, inline=False, **url_params) → bytes | str[source]

Retrieves the image of the segment.

This method uses the SegmentCropper to get the cropped image of the segment. The image can be returned as bytes, opened inline using PIL, or saved to a specified folder.

Args:: path_to_save_folder (str, optional): The path to the folder where the image will be saved. Defaults to the current working directory. return_bytes (bool, optional): If True, the image will be returned as bytes. Defaults to False. inline (bool, optional): If True, the image will be opened inline using PIL. Defaults to False. **url_params: Additional URL parameters.
Returns:: Union[bytes, str]: The image of the segment, either as bytes or as a path to the saved image.

get_segments_from_perspective(perspective_name: str, **params)[source]: Get all segments that are in the same perspective as this segment

get_text(**url_params)[source]

Retrieves the text of the segment.

This method uses the TokenLookup to get the text of the segment.

Args:: **url_params: Additional URL parameters.
Returns:: str: The text of the segment.

id: str

labels: list

max_x: float | None

max_y: float | None

metadata: dict | None

min_x: float | None

min_y: float | None

perspective()[source]

perspective_link: str

q(query: str, search_type: Literal['documents', 'pages', 'perspectives', 'segments'] = 'segments', **url_params)[source]

Executes a query on the API object.

This method uses the provided query and search type to execute a search on the API object. The search type can be one of “documents”, “pages”, “perspectives”, or “segments”. Additional URL parameters can be provided as keyword arguments.

Args:: query (str): The query to execute. search_type (str): The type of search to perform. Defaults to “segments”. **url_params: Additional URL parameters.
Returns:: An iterator over the results of the query.

regions: GeoJSON

segment_type: str

self_link: str

summary: str

text: str

updated_at: datetime.datetime

updated_by: str

user_data: dict | None

warnings: dict | None

class redshred.models.api.SegmentIterator(data: Iterable[ApiObject] = None, client: 'redshred.api.client.RedShredClient' | 'redshred.api.http.RedShredAPI' = None, path: str = None, q: str = None, fields: List[str] | Set[str] | str | None = None, **client_params)[source]

Bases: APIObjectIterator

A SegmentIterator class.

This class is used to create SegmentIterator objects. It inherits from the APIObjectIterator class.

Attributes:: see APIObjectIterator class for attributes

class redshred.models.api.SerializableModel(**data)[source]

Bases: BaseModel

A serializable model class.

This class is used to create serializable models. It provides methods to set attributes, convert datetimes, convert bounding boxes, convert GeoJSON, and generate a YAML representation of the model.

Attributes:: _private_fields: The private fields of the model. _large_fields: The large fields of the model. _read_only_fields: The read-only fields of the model. _validate_writeable: Whether to validate writeable fields.

class Config[source]

Bases: object

arbitrary_types_allowed = True

extra = 'allow'

json_encoders = {<class 'datetime.datetime'>: <function SerializableModel.Config.<lambda>>, <class 'redshred.spatial.BoundingBox'>: <class 'list'>, <class 'redshred.spatial.GeoJSON'>: <class 'dict'>}

underscore_attrs_are_private = True

use_enum_values = True
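The json_encoders mapping in this Config pairs each type with a callable that turns it into something JSON can represent. A rough stdlib-only sketch of the same idea, using json.dumps(default=...), might look like the following. FakeBoundingBox and FakeGeoJSON are hypothetical stand-ins for the redshred.spatial classes, and the datetime encoder is assumed to be isoformat() because the body of the documented lambda is not shown.

```python
import json
from datetime import datetime, timezone


class FakeBoundingBox:
    """Hypothetical stand-in: iterable of coordinates, so list() can encode it."""

    def __init__(self, *coords):
        self.coords = coords

    def __iter__(self):
        return iter(self.coords)


class FakeGeoJSON:
    """Hypothetical stand-in: mapping-like, so dict() can encode it."""

    def __init__(self, **members):
        self.members = members

    def keys(self):
        return self.members.keys()

    def __getitem__(self, key):
        return self.members[key]


# Same shape as the documented mapping: type -> encoder callable.
ENCODERS = {
    datetime: lambda value: value.isoformat(),  # assumption, see lead-in
    FakeBoundingBox: list,
    FakeGeoJSON: dict,
}


def encode(obj):
    """Fallback for json.dumps: use the first encoder whose type matches."""
    for cls, encoder in ENCODERS.items():
        if isinstance(obj, cls):
            return encoder(obj)
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


payload = {
    "created_at": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    "bounding_box": FakeBoundingBox(0.1, 0.2, 0.3, 0.4),
    "regions": FakeGeoJSON(type="Point", coordinates=[0.1, 0.2]),
}
encoded = json.dumps(payload, default=encode, sort_keys=True)
```

json.dumps only calls the fallback for values it cannot serialize itself, which is why plain strings and numbers never reach encode.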

classmethod dict_to_geojson(v)[source]

classmethod list_to_bbox(v)[source]

classmethod lists_to_bboxes(v)[source]

yaml(include=None, exclude=None, indent=None)[source]

Converts the object to a YAML representation.

This method first converts the object to JSON, then converts the JSON to YAML. The resulting YAML string is returned.

Args:: include: A list of fields to include in the output. If None, all fields are included. exclude: A list of fields to exclude from the output. If None, no fields are excluded. indent: The number of spaces to use for indentation in the output. If None, the default indentation is used.
Returns:: A string containing the YAML representation of the object.

class redshred.models.api.Token(*, index: int, text: str, text_with_ws: str | None = None, bboxes: List[BoundingBox], regions: GeoJSON, metadata: dict | None = None, rotation: int = 0)[source]

Bases: SerializableModel

A token class.

This class is used to create tokens. It provides methods to set attributes.

Attributes:: index: The index of the token. text: The text of the token. text_with_ws: The text of the token with whitespace. bboxes: The bounding boxes of the token. regions: The regions of the token. metadata: The metadata of the token. rotation: The rotation of the token.

class Config[source]

Bases: Config

extra = 'ignore'

bboxes: List[BoundingBox]

index: int

metadata: dict | None

regions: GeoJSON

rotation: int

text: str

text_with_ws: str | None
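As a hedged illustration of how text and text_with_ws might relate, the sketch below assumes that text_with_ws is the token text followed by any trailing whitespace (a convention used by some tokenizers; the documentation here does not confirm it) and rebuilds a line from tokens ordered by index. TokenSketch and join_tokens are hypothetical stand-ins, not redshred classes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TokenSketch:
    """Stand-in carrying only the text-related fields of Token."""
    index: int
    text: str
    text_with_ws: Optional[str] = None


def join_tokens(tokens: Iterable[TokenSketch]) -> str:
    """Concatenate tokens in index order, preferring text_with_ws when set."""
    ordered = sorted(tokens, key=lambda token: token.index)
    return "".join(
        token.text if token.text_with_ws is None else token.text_with_ws
        for token in ordered
    )


tokens = [
    TokenSketch(index=1, text="world"),
    TokenSketch(index=0, text="Hello", text_with_ws="Hello "),
]
line = join_tokens(tokens)
```

Falling back to text when text_with_ws is None mirrors the documented default of None for that field.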

redshred.models.api.get_type(name)[source]

redshred.models.configuration module

class redshred.models.configuration.AdvancedOCRTokenizerConfig(*, name: Literal[Tokenizers.advanced_ocr], config: AdvancedOCRTokenizerConfigOptions = None)[source]

Bases: ConfiguredTokenizer

class Config[source]: Bases: _DefaultPydanticConfig

config: AdvancedOCRTokenizerConfigOptions

name: Literal[Tokenizers.advanced_ocr]

class redshred.models.configuration.AdvancedOCRTokenizerConfigOptions(*, images_only: bool = False, dpi: int = 150, detection_model: str = 'dit', recognition_model: str = 'str', threshold: float = 0.5)[source]

Bases: BaseModel

class Config[source]: Bases: _DefaultPydanticConfig

detection_model: str

dpi: int

images_only: bool

recognition_model: str

threshold: float
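For orientation, the documented defaults of AdvancedOCRTokenizerConfigOptions can be written out as a plain dictionary. This is only a sketch: whether a raw dictionary of this shape is accepted wherever an AdvancedOCRTokenizerConfig is expected is an assumption, and with_overrides is a hypothetical helper, not part of redshred.

```python
import json

# Defaults copied from the AdvancedOCRTokenizerConfigOptions signature.
ADVANCED_OCR_DEFAULTS = {
    "images_only": False,
    "dpi": 150,
    "detection_model": "dit",
    "recognition_model": "str",
    "threshold": 0.5,
}


def with_overrides(**overrides):
    """Build a tokenizer entry, rejecting option names that are not documented."""
    unknown = set(overrides) - set(ADVANCED_OCR_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    return {
        "name": "advanced_ocr",  # value of Tokenizers.advanced_ocr
        "config": {**ADVANCED_OCR_DEFAULTS, **overrides},
    }


entry = with_overrides(dpi=300)
print(json.dumps(entry, indent=2))
```

Rejecting unknown keys up front is a design choice of this sketch; it catches typos such as "treshold" before the configuration is sent anywhere.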

Bases: BaseModel

class Config[source]: Bases: _DefaultPydanticConfig

allow_anonymous_downloads: bool

dict(*, include: AbstractSetIntStr | MappingIntStrAny | None = None, exclude: AbstractSetIntStr | MappingIntStrAny | None = None, by_alias: bool = False, skip_defaults: bool | None = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → DictStrAny: Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

document_uniqueness: DocumentUniqueness

enrichments: List[DefinedAcronymsPerspective | ExternalAPIPerspective | GrouperPerspective | HuggingfacePerspective | IrisPerspective | PageImagesPerspective | PdftotextPerspective | PreprocessPerspective | RegexPerspective | SentencesPerspective | SpacyPerspective | TFIDFPerspective | TypographyPerspective | PerspectiveConfiguration]

classmethod from_dict(config: Dict[str, Any])[source]

json(*, include: AbstractSetIntStr | MappingIntStrAny | None = None, exclude: AbstractSetIntStr | MappingIntStrAny | None = None, by_alias: bool = False, skip_defaults: bool | None = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Callable[[Any], Any] | None = None, models_as_dict: bool = True, **dumps_kwargs: Any) → unicode

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

notifications: List[NotificationConfiguration]

tokenizer: List[Tokenizers | str | ConfiguredTokenizer | TesseractTokenizerConfig | AdvancedOCRTokenizerConfig] | Tokenizers | str | ConfiguredTokenizer | TesseractTokenizerConfig | AdvancedOCRTokenizerConfig

validate_remote_schema(client: redshred.api.client.RedShredClient)[source]

yaml(*, include: Set[str] | None = None, exclude: Set[str] | None = None, by_alias: bool = False, skip_defaults: bool | None = None, exclude_unset: bool = True, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Callable[[Any], Any] | None = None, models_as_dict: bool = True, **dumps_kwargs: Any)[source]

Generate a YAML representation of the model from the JSON representation, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

class redshred.models.configuration.ConfiguredTokenizer(*, name: Literal[Tokenizers.pdftotext] | Literal[Tokenizers.pdfminer] | Literal[Tokenizers.tet], config: Dict[str, Any] = None)[source]

Bases: BaseModel

class Config[source]: Bases: _DefaultPydanticConfig

config: Dict[str, Any]

name: Literal[Tokenizers.pdftotext] | Literal[Tokenizers.pdfminer] | Literal[Tokenizers.tet]

class redshred.models.configuration.DocumentUniqueness(value)[source]

Bases: str, Enum

An enumeration.

always = 'always'

contents = 'contents'

filename = 'filename'
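Because DocumentUniqueness derives from both str and Enum, its members behave like ordinary strings when compared or serialized. The sketch below re-declares the enum locally with the three documented members so that it runs without redshred installed; the local class is a stand-in, and what each mode means for document handling is not covered here.

```python
import json
from enum import Enum


class DocumentUniqueness(str, Enum):
    """Local re-declaration mirroring the documented members."""
    always = "always"
    contents = "contents"
    filename = "filename"


# Look a member up by its string value, as a config loader might.
mode = DocumentUniqueness("contents")

# The str mixin makes members compare equal to plain strings...
same_as_string = mode == "contents"

# ...and lets the stdlib JSON encoder serialize them without a custom encoder.
serialized = json.dumps({"document_uniqueness": mode})
```

The same str-plus-Enum pattern is used by the Tokenizers enumeration in this module.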

class redshred.models.configuration.NotificationConfiguration(*, label: ConstrainedStrValue, recipients: ConstrainedListValue[str], query: str, condition: str = 'length(@) > `0`')[source]

Bases: BaseModel

A model representing a notification configuration.

This model is used to define a notification configuration, which includes the label, recipients, query, and condition.

Attributes:: label: Identifying label for the notification. recipients: A list of recipients. query: A string containing a query in the redshred query language. condition: JMESPath expression to evaluate against.

class Config[source]: Bases: _DefaultPydanticConfig

condition: str

label: str

query: str

recipients: List[str]
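The four fields of a notification and its default condition can be sketched without the library. In JMESPath, length(@) > `0` is true when the current node has at least one element. The sketch assumes that node is the list of query results, which the documentation here does not state. NotificationSketch and default_condition_holds are hypothetical names, and the query string is a placeholder rather than real redshred query syntax.

```python
from dataclasses import dataclass
from typing import List

# Default copied from the NotificationConfiguration signature.
DEFAULT_CONDITION = "length(@) > `0`"


@dataclass
class NotificationSketch:
    """Stand-in with the same field names as NotificationConfiguration."""
    label: str
    recipients: List[str]
    query: str
    condition: str = DEFAULT_CONDITION


def default_condition_holds(results: list) -> bool:
    """Plain-Python reading of the default condition: any results at all."""
    return len(results) > 0


notification = NotificationSketch(
    label="new-matches",
    recipients=["ops@example.com"],
    query="<redshred query goes here>",  # placeholder, not real query syntax
)
should_notify = default_condition_holds([{"id": "segment-1"}])
```

Evaluating a non-default condition would need a JMESPath implementation; this sketch deliberately covers only the documented default.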

class redshred.models.configuration.PerspectiveConfiguration(*, name: str, perspective: str, segments: SegmentQuery | Dict = None, description: str = '', config: Dict[str, Any] = None, debug: bool = False)[source]

Bases: BaseModel

class Config[source]: Bases: _DefaultPydanticConfig

config: Dict[str, Any]

debug: bool

description: str

name: str

perspective: str

segments: SegmentQuery | Dict

class redshred.models.configuration.TesseractTokenizerConfig(*, name: Literal[Tokenizers.tesseract], config: TesseractTokenizerConfigOptions = None)[source]

Bases: ConfiguredTokenizer

class Config[source]: Bases: _DefaultPydanticConfig

config: TesseractTokenizerConfigOptions

name: Literal[Tokenizers.tesseract]

class redshred.models.configuration.TesseractTokenizerConfigOptions(*, images_only: bool = False)[source]

Bases: BaseModel

class Config[source]: Bases: _DefaultPydanticConfig

images_only: bool

class redshred.models.configuration.Tokenizers(value)[source]

Bases: str, Enum

An enumeration.

advanced_ocr = 'advanced_ocr'

pdfminer = 'pdfminer'

pdftotext = 'pdftotext'

tesseract = 'tesseract'

tet = 'tet'
