redshred package
Subpackages
- redshred.api package
- redshred.cli package
- redshred.enrichments package
- Submodules
- redshred.enrichments.base module
- redshred.enrichments.defined_acronyms module
- redshred.enrichments.external_api module
- redshred.enrichments.grouper module
GrouperPerspective, GrouperPerspectiveConfig (Config, hull_method, hull_method_options, operation_labels, operations, root_label, whitespace_calculation_method, whitespace_method_options, x_gap, y_gap)
GrouperPerspectiveHullMethod, object_setattr()
- redshred.enrichments.huggingface module
HuggingfacePerspective, HuggingfacePerspectiveConfig (Config, model, model_class, model_source, pipeline_task, task_config, task_config_class, task_specific_template, tokenizer, tokenizer_class)
object_setattr()
- redshred.enrichments.iris module
- redshred.enrichments.page_images module
BackendOptions, PageImagesPerspective, PageImagesPerspectiveBackend, PageImagesPerspectiveConfig, object_setattr()
- redshred.enrichments.pdftotext module
- redshred.enrichments.preprocess module
- redshred.enrichments.regex module
- redshred.enrichments.sentences module
- redshred.enrichments.spacy module
- redshred.enrichments.tfidf module
TFIDFPerspective, TFIDFPerspectiveConfig, TFIDFPerspectiveNorm, object_setattr()
- redshred.enrichments.typography module
- Module contents
- redshred.microservices package
- redshred.models package
- Submodules
- redshred.models.api module
APIObjectIterator, ApiObject, Collection (client, config, create(), created_at, created_by, delete(), description, document(), documents(), documents_link, id, load(), marked_for_delete, metadata, name, owner, perspective(), perspectives(), perspectives_link, segment(), segments(), segments_link, self_link, slug, updated_at, updated_by, upload_csv(), upload_file(), upload_text(), upload_url(), user_data)
CollectionIterator, Document (collection(), collection_link, collection_slug, config, content_hash, create(), created_at, created_by, csv_metadata, description, document_segment_link, download(), download_bytes(), errors, file_link, file_size, id, index, metadata, n_pages, name, original_name, page(), pages(), pages_link, pdf_link, perspective(), perspectives(), perspectives_link, read_state, read_state_updated_at, region, reread_document(), segment(), segments(), segments_link, self_link, slug, source, summary, text, uniqueness_id, updated_at, updated_by, user_data, wait_until_read(), warnings)
DocumentIterator, Page (collection(), collection_link, collection_slug, content_hash, created_at, created_by, document(), document_index, document_name, dpi, height, id, index, metadata, name, next(), page_segment_link, perspective(), perspectives(), perspectives_link, previous(), region, segment(), segments(), segments_link, self_link, summary, text, tokens(), tokens_file_link, units, updated_at, updated_by, user_data, width)
PageIterator, Perspective (bulk_create_segments(), cache_id, collection(), collection_link, collection_slug, create(), created_at, created_by, description, document(), document_link, document_name, enrichment_config, enrichment_name, errors, id, metadata, name, segment(), segment_types, segments(), segments_link, self_link, slug, updated_at, updated_by, user_data, warnings)
PerspectiveIterator, RedShredUser, Segment (between(), bounding_box, cache_id, collection(), collection_link, collection_slug, create(), created_at, created_by, document(), document_link, document_name, enrichment_data, enrichment_name, errors, get_segment_image(), get_segments_from_perspective(), get_text(), id, labels, max_x, max_y, metadata, min_x, min_y, perspective(), perspective_link, q(), regions, segment_type, self_link, summary, text, updated_at, updated_by, user_data, warnings)
SegmentIterator, SerializableModel, Token, get_type()
- redshred.models.configuration module
AdvancedOCRTokenizerConfig, AdvancedOCRTokenizerConfigOptions, CollectionConfiguration (Config, allow_anonymous_downloads, dict(), document_uniqueness, enrichments, from_dict(), json(), notifications, tokenizer, validate_remote_schema(), yaml())
ConfiguredTokenizer, DocumentUniqueness, NotificationConfiguration, PerspectiveConfiguration, TesseractTokenizerConfig, TesseractTokenizerConfigOptions, Tokenizers
- Module contents
- redshred.visualize package
Submodules
redshred.configuration module
- class redshred.configuration.Configuration(token=None, host=None, host_verify=None, config_path=None, context_override=None)[source]
Bases: object
Handles configuration for the RedShred API client.
The order in which the configuration sources are looked up is as follows:
1. Explicit values given to the constructor.
2. Environment variables: REDSHRED_TOKEN and REDSHRED_HOST.
3. Environment variables: REDSHRED_CONTEXT and REDSHRED_CONFIG.
4. REDSHRED_CONTEXT alone (uses the default config path).
5. REDSHRED_CONFIG alone (defaults to currentContext).
6. None of the above: uses the default config path with currentContext.
- Attributes:
context (str): The current context name.
user (Optional[str]): The username, if available.
host (str): The RedShred API server URL.
token (str): The authentication token for the RedShred API.
verify (bool): Flag indicating whether to verify the server’s SSL certificate.
options (Dict[str, Any]): Additional configuration options.
source (str): The source of the configuration, such as environment variables or a config file path.
- Raises:
ConfigurationError: If there is a general configuration error.
ConfigurationFileError: If there is a problem specifically with the configuration file format.
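The documented lookup order can be sketched as a small precedence function. This is a hypothetical illustration, not the library's implementation: the helper name resolve_source and the labels it returns are invented, and it assumes that "explicit values" means both token and host were passed, and that the REDSHRED_TOKEN/REDSHRED_HOST rule needs both variables set. Only the environment variable names and the currentContext fallback come from the documentation above.

```python
import os
from typing import Mapping, Optional


def resolve_source(
    token: Optional[str] = None,
    host: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical sketch: report which configuration source would win.

    Follows the documented lookup order; the returned labels are illustrative only.
    """
    env = os.environ if env is None else env
    # 1. Explicit constructor values (assumed: both token and host given).
    if token is not None and host is not None:
        return "explicit"
    # 2. REDSHRED_TOKEN and REDSHRED_HOST (assumed: both must be set).
    if "REDSHRED_TOKEN" in env and "REDSHRED_HOST" in env:
        return "env:token+host"
    context = env.get("REDSHRED_CONTEXT")
    config = env.get("REDSHRED_CONFIG")
    # 3. REDSHRED_CONTEXT and REDSHRED_CONFIG together.
    if context and config:
        return f"config:{config}@{context}"
    # 4. REDSHRED_CONTEXT alone uses the default config path.
    if context:
        return f"config:<default>@{context}"
    # 5. REDSHRED_CONFIG alone falls back to currentContext.
    if config:
        return f"config:{config}@currentContext"
    # 6. Nothing set: default config path with currentContext.
    return "config:<default>@currentContext"
```

Passing env explicitly (for example env={} in tests) keeps the sketch independent of the real process environment.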
- context: str
- host: str
- info()[source]
Provides a multiline string representation of the configuration details.
The returned string includes the configuration context, host, obfuscated token, username, and the source from which the configuration was obtained.
- Returns:
str: A human-readable, formatted summary of the configuration settings.
- options: Dict[str, Any]
- source: str
- token: str
- user: str | None
- verify: bool
- exception redshred.configuration.ConfigurationError[source]
Bases: Exception
Exception raised for general configuration errors.
This error is used for broader configuration issues not necessarily related to the file format itself, such as missing environment variables or failure to locate a required configuration file.
- exception redshred.configuration.ConfigurationFileError[source]
Bases: Exception
Exception raised for problems with the configuration file format.
This error is raised when the RedShred configuration file cannot be parsed, for example because of improper formatting or missing required fields.
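Both exceptions derive directly from Exception rather than from each other, so catching one does not catch the other. The sketch below shows this with local stand-in classes that have the documented names and bases (they are not imported from redshred). The load_config function is hypothetical.

```python
class ConfigurationError(Exception):
    """Stand-in mirroring redshred.configuration.ConfigurationError (Bases: Exception)."""


class ConfigurationFileError(Exception):
    """Stand-in mirroring redshred.configuration.ConfigurationFileError (Bases: Exception)."""


def load_config(kind: str) -> None:
    """Hypothetical loader that fails in one of the two documented ways."""
    if kind == "missing":
        raise ConfigurationError("REDSHRED_HOST is not set")
    if kind == "malformed":
        raise ConfigurationFileError("could not parse config file")


def describe_failure(kind: str) -> str:
    # Handle both explicitly: neither class is a subclass of the other.
    try:
        load_config(kind)
    except ConfigurationFileError as exc:
        return f"file problem: {exc}"
    except ConfigurationError as exc:
        return f"general problem: {exc}"
    return "ok"
```

With the real library, the same pattern applies to the classes in redshred.configuration. If you do not need to tell the two cases apart, catch the tuple (ConfigurationError, ConfigurationFileError).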
redshred.exceptions module
- exception redshred.exceptions.RedShredAPIError(*args, reason=None, **kwargs)[source]
Bases: Exception
- reason: Any
- exception redshred.exceptions.RedShredFileExistsError(*args, **kwargs)[source]
Bases: RedShredHTTPError
redshred.spatial module
- class redshred.spatial.BoundingBox(initlist: Iterable | None = None)[source]
Bases: list, _BaseGeometricMixin
- as_boundingbox() BoundingBox[source]
Get a BoundingBox object from a GeoJSON object
- classmethod from_shape(geometry) BoundingBox[source]
- get_bounds() BoundingBox[source]
For API consistency with GeoJSON.
- get_offsets(numpy=False) tuple[int, int][source]
The x and y offsets for each coordinate of the GeoJSON object.
- property height
- property min_x
- property min_y
- normalize_to_page() BoundingBox[source]
- rotate(rotation, origin=(0.5, 0.5)) BoundingBox[source]
- scale(xfact=1.0, yfact=1.0, origin=(0, 0)) BoundingBox[source]
Shapely scaling transformation.
- translate(xoff=0.0, yoff=0.0) BoundingBox[source]
Shapely offset (translation) transformation.
- property width
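The entries above do not state BoundingBox's coordinate layout. The sketch below assumes [min_x, min_y, max_x, max_y], which matches the argument order of GeoJSON.from_bounds. The class name SimpleBox is hypothetical. It reimplements the documented properties and the translate/scale transformations in pure Python rather than calling Shapely, using Shapely's scaling rule x' = ox + xfact * (x - ox) about a point origin.

```python
from typing import Iterable, Tuple


class SimpleBox(list):
    """Hypothetical stand-in for BoundingBox, assuming [min_x, min_y, max_x, max_y]."""

    def __init__(self, initlist: Iterable[float] = (0.0, 0.0, 0.0, 0.0)):
        super().__init__(float(v) for v in initlist)
        if len(self) != 4:
            raise ValueError("expected 4 values: min_x, min_y, max_x, max_y")

    @property
    def min_x(self) -> float:
        return self[0]

    @property
    def min_y(self) -> float:
        return self[1]

    @property
    def width(self) -> float:
        return self[2] - self[0]

    @property
    def height(self) -> float:
        return self[3] - self[1]

    def translate(self, xoff: float = 0.0, yoff: float = 0.0) -> "SimpleBox":
        # Shift both corners by the same offset; returns a new box.
        return SimpleBox([self[0] + xoff, self[1] + yoff, self[2] + xoff, self[3] + yoff])

    def scale(self, xfact: float = 1.0, yfact: float = 1.0,
              origin: Tuple[float, float] = (0.0, 0.0)) -> "SimpleBox":
        # Scale each corner about `origin`: x' = ox + xfact * (x - ox).
        ox, oy = origin
        xs = [ox + xfact * (x - ox) for x in (self[0], self[2])]
        ys = [oy + yfact * (y - oy) for y in (self[1], self[3])]
        # min/max keep the corner order valid if a factor is negative.
        return SimpleBox([min(xs), min(ys), max(xs), max(ys)])
```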
- class redshred.spatial.GeoJSON[source]
Bases: dict, _BaseGeometricMixin
- as_boundingbox() BoundingBox[source]
Get a BoundingBox object from a GeoJSON object
- as_shape() BaseGeometry[source]
Get a Shapely geometry object from the GeoJSON dictionary.
- Returns:
BaseGeometry: The equivalent Shapely geometry.
- classmethod from_bounds(x_min, y_min, x_max, y_max, base_class='polygon')[source]
Create a new object from x_min, y_min, x_max, y_max values. By default a polygon is returned; pass base_class='multipolygon' to get a multipolygon instead.
- classmethod from_coords(coords)[source]
Create a new GeoJSON object from either a list of coordinates (for a Polygon) or a list of lists of coordinates (for a MultiPolygon).
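The difference between the polygon and multipolygon forms comes down to GeoJSON nesting, as defined in RFC 7946. A Polygon holds a list of closed rings, and a MultiPolygon holds a list of such polygons. The following is a hypothetical sketch of building either geometry from bounds. The function name and the corner order (counter-clockwise in a y-up frame, first point repeated to close the ring) are assumptions, not a description of what from_bounds emits.

```python
from typing import Any, Dict, List


def polygon_from_bounds(x_min: float, y_min: float, x_max: float, y_max: float,
                        base_class: str = "polygon") -> Dict[str, Any]:
    """Hypothetical sketch of a GeoJSON geometry built from bounds."""
    # One closed ring: four corners, with the first point repeated at the end.
    ring: List[List[float]] = [
        [x_min, y_min], [x_max, y_min], [x_max, y_max], [x_min, y_max], [x_min, y_min],
    ]
    if base_class == "polygon":
        # A Polygon's coordinates are a list of rings.
        return {"type": "Polygon", "coordinates": [ring]}
    if base_class == "multipolygon":
        # A MultiPolygon's coordinates are a list of polygons.
        return {"type": "MultiPolygon", "coordinates": [[ring]]}
    raise ValueError(f"unsupported base_class: {base_class!r}")
```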
- get_bounds() BoundingBox[source]
Get a BoundingBox object from a GeoJSON object
- get_coordinates(canonical=True, numpy=False) List[List[float]][source]
Get a coordinates list object from a GeoJSON object
- get_offsets(numpy=False) Tuple[int, int][source]
The x and y offsets for each coordinate of the GeoJSON object.
- property height
- property min_x
- property min_y
- normalize_to_page() GeoJSON[source]
Normalize the given GeoJSON from the quiltspace or document space to a [0,1], [0,1] scale. This returns a new GeoJSON instance.
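Normalizing to a [0,1], [0,1] scale amounts to dividing each coordinate by the page size. The sketch below assumes the page width and height (documented as Page attributes) are in the same units as the coordinates. The function name and the x_origin/y_origin parameters, which shift a page that sits at an offset inside a larger space, are hypothetical. The sketch does not model how quiltspace is actually laid out.

```python
from typing import List, Sequence


def normalize_ring(ring: Sequence[Sequence[float]], page_width: float,
                   page_height: float, x_origin: float = 0.0,
                   y_origin: float = 0.0) -> List[List[float]]:
    """Hypothetical sketch: map page coordinates onto a [0, 1] x [0, 1] scale."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    # Shift by the page origin, then divide by the page size; returns a new list.
    return [[(x - x_origin) / page_width, (y - y_origin) / page_height] for x, y in ring]
```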
- rotate(angle, origin='center') GeoJSON[source]
Rotate a GeoJSON object using affine transformations. This returns a new GeoJSON instance.
- scale(xfact=1.0, yfact=1.0, origin=(0, 0)) GeoJSON[source]
Scale a GeoJSON object using affine transformations. This returns a new GeoJSON instance.
- translate(xoff=0.0, yoff=0.0) GeoJSON[source]
Offset a GeoJSON object by the specified amount. This returns a new GeoJSON instance.
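The rotate entry says only that it uses affine transformations. The sketch below assumes Shapely's conventions for shapely.affinity.rotate: the angle is in degrees, positive angles rotate counter-clockwise, and origin='center' means the center of the bounding box. It is a pure-Python illustration on a bare list of points, and rotate_ring is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple, Union

Point = Tuple[float, float]


def rotate_ring(ring: Sequence[Sequence[float]], angle: float,
                origin: Union[str, Point] = "center") -> List[List[float]]:
    """Hypothetical sketch: rotate points counter-clockwise by `angle` degrees."""
    if origin == "center":
        # Center of the points' bounding box (Shapely's meaning of "center").
        xs = [p[0] for p in ring]
        ys = [p[1] for p in ring]
        ox, oy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    else:
        ox, oy = origin
    theta = math.radians(angle)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Affine rotation about (ox, oy): move to the origin, rotate, move back.
    return [
        [ox + cos_t * (x - ox) - sin_t * (y - oy),
         oy + sin_t * (x - ox) + cos_t * (y - oy)]
        for x, y in ring
    ]
```

Results carry floating-point noise (cos 90° is not exactly 0), so round before comparing.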
- property width
redshred.util module
Utility functions.
Module contents
This is the RedShred client library in Python.
RedShred, LLC 2018-2023
- class redshred.Collection(*, self_link: str = None, id: str = None, config: CollectionConfiguration | None = None, created_at: datetime = None, created_by: str = None, description: str | None = None, documents_link: str = None, marked_for_delete: bool | None = False, metadata: dict | None = None, name: str = None, owner: str = None, perspectives_link: str = None, segments_link: str = None, slug: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, client: Any = None, **data)[source]
Bases: ApiObject
A collection class.
This class is used to create collections. It provides methods to create, read, update, and delete the collection, get documents, perspectives, and segments from the collection, and upload CSVs, files, URLs, and text to the collection.
- Attributes:
id: The ID of the collection.
config: The configuration of the collection.
created_at: The date the collection was created.
created_by: The user who created the collection.
description: The description of the collection.
documents_link: The link to the documents in the collection.
marked_for_delete: Whether the collection is marked for deletion.
metadata: The metadata of the collection.
name: The name of the collection.
owner: The owner of the collection.
perspectives_link: The link to the perspectives in the collection.
segments_link: The link to the segments in the collection.
self_link: The self link of the collection.
slug: The slug of the collection.
updated_at: The date the collection was last updated.
updated_by: The user who last updated the collection.
user_data: The user data of the collection.
- client: Any
- config: CollectionConfiguration | None
- create(client: 'redshred.api.http.RedShredAPI' | 'redshred.api.client.RedShredClient') TApiObject[source]
Creates the local object on the remote server.
This method is used to create the local object on the remote server. It uses the provided client to make the API request. The method identifies the creatable fields, makes a POST request to the server, and updates the local object with the response.
- Args:
client (Union[RedShredAPI, RedShredClient]): The client to use for the API request.
- Returns:
TApiObject: The local object updated with the response from the server.
- Raises:
RedShredAPIError: If the server response status code is not 201 (Created).
- created_at: datetime.datetime
- created_by: str
- description: str | None
- document(document_id) Document[source]
Retrieves a specific document from the collection.
This method uses the provided document ID to load a Document object from the collection. The document ID can be provided in various formats and is converted to a standard format using the id_from_any function.
- Args:
document_id (str): The ID of the document to retrieve.
- Returns:
Document: The loaded Document object.
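The id_from_any function is not documented here, so the accepted "various formats" are unknown. The following hypothetical stand-in, normalize_id, shows one plausible normalization. It accepts a bare ID string, a URL whose last path segment is the ID (for example a self_link), or an object with an id attribute. Every accepted format is an assumption, and example.com is a placeholder host.

```python
from typing import Any
from urllib.parse import urlparse


def normalize_id(value: Any) -> str:
    """Hypothetical stand-in for id_from_any; the accepted formats are assumptions."""
    # Objects such as Document or Segment carry their ID on an `id` attribute.
    obj_id = getattr(value, "id", None)
    if isinstance(obj_id, str) and obj_id:
        return obj_id
    if not isinstance(value, str) or not value.strip():
        raise ValueError(f"cannot extract an ID from {value!r}")
    text = value.strip()
    # For URLs, take the last non-empty path segment.
    if "://" in text:
        segments = [s for s in urlparse(text).path.split("/") if s]
        if not segments:
            raise ValueError(f"no ID in URL {text!r}")
        return segments[-1]
    return text
```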
- documents(q: str | None = None, fields: List[str] | None = None, **url_params) DocumentIterator[Document][source]
Returns an iterator over the documents in the collection.
This method creates a DocumentIterator object that can be used to iterate over the documents in the collection. The documents can be filtered using a query string and specific fields can be included in the output. Additional URL parameters can be provided as keyword arguments.
- Args:
q (str, optional): The query string to filter the documents. Defaults to None.
fields (List[str], optional): The fields to include in the output. Defaults to None.
**url_params: Additional URL parameters.
- Returns:
DocumentIterator[Document]: An iterator over the documents in the collection.
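The wire format behind q, fields, and **url_params is not documented. The following hypothetical sketch shows one way such arguments could become a query string. It assumes fields is sent as a single comma-separated value and that None-valued keyword arguments are dropped; neither assumption is confirmed by the source. The function name build_query is invented.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode


def build_query(q: Optional[str] = None, fields: Optional[List[str]] = None,
                **url_params: Any) -> str:
    """Hypothetical sketch of turning documents() arguments into a query string."""
    params: Dict[str, Any] = {}
    if q is not None:
        params["q"] = q
    if fields:
        # Assumption: the server accepts a comma-separated field list.
        params["fields"] = ",".join(fields)
    # Extra keyword arguments pass through; None values are dropped.
    params.update({k: v for k, v in url_params.items() if v is not None})
    return urlencode(params)
```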
- documents_link: str
- id: str
- classmethod load(client: 'redshred.api.http.RedShredAPI' | 'redshred.api.client.RedShredClient', slug: str = None, url: str = None) Collection[source]
See ApiObject.load.
- marked_for_delete: bool | None
- metadata: dict | None
- name: str
- owner: str
- perspective(perspective_id) Perspective[source]
Retrieves a specific perspective from the collection.
This method uses the provided perspective ID to load a Perspective object from the collection. The perspective ID can be provided in various formats and is converted to a standard format using the id_from_any function.
- Args:
perspective_id (str): The ID of the perspective to retrieve.
- Returns:
Perspective: The loaded Perspective object.
- perspectives(q: str | None = None, fields: List[str] | None = None, **url_params) PerspectiveIterator[Perspective][source]
Returns an iterator over the perspectives in the collection.
This method creates a PerspectiveIterator object that can be used to iterate over the perspectives in the collection. The perspectives can be filtered using a query string and specific fields can be included in the output. Additional URL parameters can be provided as keyword arguments.
- Args:
q (str, optional): The query string to filter the perspectives. Defaults to None.
fields (List[str], optional): The fields to include in the output. Defaults to None.
**url_params: Additional URL parameters.
- Returns:
PerspectiveIterator[Perspective]: An iterator over the perspectives in the collection.
- perspectives_link: str
- segment(segment_id) Segment[source]
Retrieves a specific segment from the collection.
- Args:
segment_id (str): The ID of the segment to retrieve.
- Returns:
Segment: The loaded Segment object.
- segments(q: str | None = None, fields: List[str] | None = None, **url_params) SegmentIterator[Segment][source]
Returns an iterator over the segments in the collection.
This method creates a SegmentIterator object that can be used to iterate over the segments in the collection. The segments can be filtered using a query string and specific fields can be included in the output. Additional URL parameters can be provided as keyword arguments.
- Args:
q (str, optional): The query string to filter the segments. Defaults to None.
fields (List[str], optional): The fields to include in the output. Defaults to None.
**url_params: Additional URL parameters.
- Returns:
SegmentIterator[Segment]: An iterator over the segments in the collection.
- segments_link: str
- self_link: str
- slug: str
- updated_at: datetime.datetime
- updated_by: str
- upload_csv(file, content_columns: List[str], delimiter=',', rename: str | None = None, **user_data)[source]
- Args:
file: Source file to upload.
content_columns: Used for multi-document upload via CSV. A list which specifies the column(s) that will be used for the document body.
delimiter: Delimiter for the CSV. Defaults to “,”.
rename: A new name, if desired.
**user_data: Any additional user data.
- Returns:
None
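The content_columns description suggests that each CSV row becomes a document whose body comes from the listed columns. Document also has a csv_metadata attribute, which hints that the remaining columns are kept as metadata. The following hypothetical sketch splits a CSV on that basis. The row-per-document mapping, the newline join, and the output dict shape are all assumptions; split_csv_documents is not part of the library.

```python
import csv
import io
from typing import Dict, List


def split_csv_documents(text: str, content_columns: List[str],
                        delimiter: str = ",") -> List[Dict[str, object]]:
    """Hypothetical sketch: one document per CSV row, body built from content_columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [c for c in content_columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"content columns not in CSV header: {missing}")
    documents = []
    for row in reader:
        # Join the selected columns, in the order given, to form the body.
        body = "\n".join(row[c] for c in content_columns)
        # Assumption: every other column becomes per-document metadata.
        metadata = {k: v for k, v in row.items() if k not in content_columns}
        documents.append({"text": body, "csv_metadata": metadata})
    return documents
```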
- upload_file(file: Path | BufferedReader, rename: str | None = None, save_origin: bool | None = False, **user_data) Document[source]
Convenience method to upload a file-like object into RedShred.
- Args:
file (str, filelike): Either a filename, a URL of a file, or an open() file-like object.
rename (str, optional): File name override. Defaults to the existing filename.
save_origin (bool, optional): Save the path to the file on disk. Defaults to False.
**user_data: Arbitrary data to store with the document on the server.
- Raises:
ValueError: Name argument missing for URL upload.
- Returns:
Document: The uploaded document.
- upload_text(text: str, name: str, **user_data) Document[source]
Convenience method to upload raw text into RedShred.
Given some text and a name, upload that text into this collection as a new document.
- Args:
text (str): Text to upload.
name (str): File name to save the text as.
**user_data: Arbitrary data to store with the document on the server.
- Returns:
Document: The uploaded document.
- upload_url(url: str, rename: str | None = None, save_origin: bool | None = True, **user_data) Document[source]
Convenience method to upload a URL into RedShred.
Given a URL, upload the file it points to into this collection.
- Args:
url (str): URL of the file to upload.
rename (str, optional): File name override. Defaults to the existing filename.
save_origin (bool, optional): Save the URL of the file. Defaults to True.
**user_data: Arbitrary data to store with the document on the server.
- Raises:
ValueError: Name argument missing for URL upload.
- Returns:
Document: The uploaded document.
- user_data: dict | None
- class redshred.CollectionConfiguration(*, tokenizer: List[Tokenizers | str | ConfiguredTokenizer | TesseractTokenizerConfig | AdvancedOCRTokenizerConfig] | Tokenizers | str | ConfiguredTokenizer | TesseractTokenizerConfig | AdvancedOCRTokenizerConfig = None, enrichments: List[DefinedAcronymsPerspective | ExternalAPIPerspective | GrouperPerspective | HuggingfacePerspective | IrisPerspective | PageImagesPerspective | PdftotextPerspective | PreprocessPerspective | RegexPerspective | SentencesPerspective | SpacyPerspective | TFIDFPerspective | TypographyPerspective | PerspectiveConfiguration] = None, notifications: List[NotificationConfiguration] = None, document_uniqueness: DocumentUniqueness = 'contents', allow_anonymous_downloads: bool = False)[source]
Bases: BaseModel
- allow_anonymous_downloads: bool
- dict(*, include: AbstractSetIntStr | MappingIntStrAny | None = None, exclude: AbstractSetIntStr | MappingIntStrAny | None = None, by_alias: bool = False, skip_defaults: bool | None = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) DictStrAny
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
- document_uniqueness: DocumentUniqueness
- enrichments: List[DefinedAcronymsPerspective | ExternalAPIPerspective | GrouperPerspective | HuggingfacePerspective | IrisPerspective | PageImagesPerspective | PdftotextPerspective | PreprocessPerspective | RegexPerspective | SentencesPerspective | SpacyPerspective | TFIDFPerspective | TypographyPerspective | PerspectiveConfiguration]
- json(*, include: AbstractSetIntStr | MappingIntStrAny | None = None, exclude: AbstractSetIntStr | MappingIntStrAny | None = None, by_alias: bool = False, skip_defaults: bool | None = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Callable[[Any], Any] | None = None, models_as_dict: bool = True, **dumps_kwargs: Any) unicode
Generate a JSON representation of the model, include and exclude arguments as per dict().
encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().
- notifications: List[NotificationConfiguration]
- tokenizer: List[Tokenizers | str | ConfiguredTokenizer | TesseractTokenizerConfig | AdvancedOCRTokenizerConfig] | Tokenizers | str | ConfiguredTokenizer | TesseractTokenizerConfig | AdvancedOCRTokenizerConfig
- validate_remote_schema(client: redshred.api.client.RedShredClient)[source]
- yaml(*, include: Set[str] | None = None, exclude: Set[str] | None = None, by_alias: bool = False, skip_defaults: bool | None = None, exclude_unset: bool = True, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Callable[[Any], Any] | None = None, models_as_dict: bool = True, **dumps_kwargs: Any)[source]
Generate a YAML representation of the model from the JSON representation, include and exclude arguments as per dict().
encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().
- class redshred.Document(*, self_link: str = None, id: str = None, collection_link: str = None, collection_slug: str = None, config: CollectionConfiguration | None = None, content_hash: str = None, created_at: datetime = None, created_by: str = None, csv_metadata: dict | None = None, description: str | None = None, document_segment_link: str = None, errors: dict | str | None = None, file_link: str = None, file_size: int = None, index: int = None, metadata: dict | None = None, n_pages: int = None, name: str = None, original_name: str = None, pages_link: str = None, pdf_link: str = None, perspectives_link: str = None, read_state: str = None, read_state_updated_at: datetime = None, region: GeoJSON = None, segments_link: str = None, slug: str = None, source: str = None, summary: str = None, text: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, warnings: dict | None = None, uniqueness_id: str | None = None, **data)[source]
Bases: ApiObject
A document class.
This class is used to create documents. It provides methods to create, read, update, and delete the document, get pages, perspectives, and segments from the document, and download the document.
- Attributes:
id: The ID of the document.
collection_link: The link to the collection the document belongs to.
collection_slug: The slug of the collection the document belongs to.
config: The configuration of the document.
content_hash: The content hash of the document.
created_at: The date the document was created.
created_by: The user who created the document.
csv_metadata: The CSV metadata of the document.
description: The description of the document.
document_segment_link: The link to the document segment.
errors: The errors of the document.
file_link: The link to the file of the document.
file_size: The size of the file of the document.
index: The index of the document.
metadata: The metadata of the document.
n_pages: The number of pages in the document.
name: The name of the document.
original_name: The original name of the document.
pages_link: The link to the pages in the document.
pdf_link: The link to the PDF of the document.
perspectives_link: The link to the perspectives in the document.
read_state: The read state of the document.
read_state_updated_at: The date the read state of the document was last updated.
region: The region of the document.
segments_link: The link to the segments in the document.
self_link: The self link of the document.
slug: The slug of the document.
source: The source of the document.
summary: The summary of the document.
text: The text of the document.
updated_at: The date the document was last updated.
updated_by: The user who last updated the document.
user_data: The user data of the document.
warnings: The warnings of the document.
uniqueness_id: The uniqueness ID of the document.
- collection() Collection[source]
- collection_link: str
- collection_slug: str
- config: CollectionConfiguration | None
- content_hash: str
- created_at: datetime.datetime
- created_by: str
- csv_metadata: dict | None
- description: str | None
- document_segment_link: str
- download(path: str | 'pathlib.Path') int[source]
Download the original file uploaded to RedShred to the specified path, returning the total number of bytes written.
- Args:
path: A path on the local filesystem.
- Returns:
int: The number of bytes written.
- download_bytes() bytes[source]
Download the original file uploaded to RedShred and return its contents.
- Returns:
bytes: The document as bytes.
- errors: dict | str | None
- file_link: str
- file_size: int
- id: str
- index: int
- metadata: dict | None
- n_pages: int
- name: str
- original_name: str
- page(index) Page[source]
Retrieves a specific page from the document.
- Args:
index (int): The index of the page to retrieve.
- Returns:
Page: The loaded Page object.
- pages(q: str | None = None, fields: List[str] | None = None, **url_params) PageIterator[Page][source]
Returns an iterator over the pages in the document.
- Args:
q (str, optional): The query string to filter the pages. Defaults to None.
fields (List[str], optional): The fields to include in the output. Defaults to None.
**url_params: Additional URL parameters.
- Returns:
PageIterator[Page]: An iterator over the pages in the document.
- pages_link: str
- pdf_link: str
- perspective(perspective_id) Perspective[source]
- perspectives(q: str | None = None, fields: List[str] | None = None, **url_params) PerspectiveIterator[Perspective][source]
- perspectives_link: str
- read_state: str
- read_state_updated_at: datetime.datetime
- reread_document(force=False)[source]
Reread the document and generate any new or changed perspectives and retry any failed perspectives.
- Args:
force: Force a reread even if the document is not in a state that allows it to be read. Defaults to False.
- segment(segment_id) Perspective[source]
- segments(q: str | None = None, fields: List[str] | None = None, **url_params) SegmentIterator[Segment][source]
- segments_link: str
- self_link: str
- slug: str
- source: str
- summary: str
- text: str
- uniqueness_id: str | None
- updated_at: datetime.datetime
- updated_by: str
- user_data: dict | None
- wait_until_read(wait_time_seconds: int = 5)[source]
Synchronously wait until the document has been read.
- Args:
wait_time_seconds: Number of seconds to wait between checks. Defaults to 5.
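wait_until_read() is described only as a synchronous wait with a pause between checks. The following hypothetical polling loop, wait_until, shows the general shape. The state names used in the usage check ("read", "error") are invented, because the read_state values are not listed. The timeout_seconds guard is an addition of this sketch that the documented method does not have. The sleep function is injectable so that the loop can be exercised without actually waiting.

```python
import time
from typing import Callable, Iterable


def wait_until(get_state: Callable[[], str], done_states: Iterable[str],
               wait_time_seconds: float = 5, timeout_seconds: float = 300,
               sleep: Callable[[float], None] = time.sleep) -> str:
    """Hypothetical polling loop in the spirit of Document.wait_until_read()."""
    done = set(done_states)
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in done:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in state {state!r} after {timeout_seconds}s")
        # Pause between checks, like the wait_time_seconds argument.
        sleep(wait_time_seconds)
```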
- warnings: dict | None
- class redshred.Page(*, self_link: str = None, collection_link: str = None, collection_slug: str = None, content_hash: str = None, created_at: datetime = None, created_by: str = None, document_index: int = None, document_name: str = None, dpi: int = None, height: float = None, id: str = None, index: int = None, metadata: dict | None = None, name: str = None, page_segment_link: str = None, perspectives_link: str = None, region: GeoJSON = None, segments_link: str = None, summary: str = None, text: str = None, tokens_file_link: str = None, units: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, width: float = None, **data)[source]
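Bases: ApiObject
A page class.
This class represents a single page of a document. It provides methods to navigate to the next or previous page, list the page's tokens, and get perspectives and segments from the page.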
- Attributes:
collection_link: The link to the collection the page belongs to.
collection_slug: The slug of the collection the page belongs to.
content_hash: The content hash of the page.
created_at: The date the page was created.
created_by: The user who created the page.
document_index: The index of the document the page belongs to.
document_name: The name of the document the page belongs to.
dpi: The DPI of the page.
height: The height of the page.
id: The ID of the page.
index: The index of the page.
metadata: The metadata of the page.
name: The name of the page.
page_segment_link: The link to the page segment.
perspectives_link: The link to the perspectives in the page.
region: The region of the page.
segments_link: The link to the segments in the page.
self_link: The self link of the page.
summary: The summary of the page.
text: The text of the page.
tokens_file_link: The link to the tokens file of the page.
units: The units of the page.
updated_at: The date the page was last updated.
updated_by: The user who last updated the page.
user_data: The user data of the page.
width: The width of the page.
- collection_link: str
- collection_slug: str
- content_hash: str
- created_at: datetime.datetime
- created_by: str
- document_index: int
- document_name: str
- dpi: int
- height: float
- id: str
- index: int
- metadata: dict | None
- name: str
- next()[source]
Returns the next page in the document.
- Returns:
Page: The next page in the document.
- Raises:
ValueError: If there is no next page.
- page_segment_link: str
- perspective(perspective_id) Perspective[source]
- perspectives(q: str | None = None, fields: List[str] | None = None, **url_params) PerspectiveIterator[Perspective][source]
- perspectives_link: str
- previous()[source]
Returns the previous page in the document.
- Returns:
Page: The previous page in the document.
- Raises:
ValueError: If there is no previous page.
- segments(q: str | None = None, fields: List[str] | None = None, **url_params) SegmentIterator[Segment][source]
- segments_link: str
- self_link: str
- summary: str
- text: str
- tokens()[source]
Returns a list of tokens in the page.
- Returns:
List[Token]: A list of tokens in the page.
- tokens_file_link: str
- units: str
- updated_at: datetime.datetime
- updated_by: str
- user_data: dict | None
- width: float
- class redshred.Perspective(*, self_link: str = None, name: str = None, enrichment_name: str = None, collection_link: str = None, collection_slug: str = None, created_at: datetime = None, created_by: str = None, document_link: str = None, description: str | None = None, document_name: str = None, enrichment_config: dict = None, errors: dict | None = None, id: str = None, metadata: dict | None = None, segment_types: list = None, segments_link: str = None, slug: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, warnings: dict | None = None, cache_id: str | None = None, **data)[source]
Bases: ApiObject
A Perspective class.
This class is used to create Perspective objects. It provides methods to create, read, update, and delete the Perspective object, get segments from the Perspective object, and set the API.
- Attributes:
name: The name of the Perspective.
enrichment_name: The name of the enrichment.
collection_link: The link to the collection the Perspective belongs to.
collection_slug: The slug of the collection the Perspective belongs to.
created_at: The date the Perspective was created.
created_by: The user who created the Perspective.
document_link: The link to the document the Perspective belongs to.
description: The description of the Perspective.
document_name: The name of the document the Perspective belongs to.
enrichment_config: The configuration of the enrichment.
errors: The errors of the Perspective.
id: The ID of the Perspective.
metadata: The metadata of the Perspective.
segment_types: The types of segments in the Perspective.
segments_link: The link to the segments in the Perspective.
self_link: The self link of the Perspective.
slug: The slug of the Perspective.
updated_at: The date the Perspective was last updated.
updated_by: The user who last updated the Perspective.
user_data: The user data of the Perspective.
warnings: The warnings of the Perspective.
cache_id: The cache ID of the Perspective.
- bulk_create_segments(segments: List[dict | Segment], batch_size=128)[source]
Creates multiple segments in the perspective at once, which is faster than creating them individually.
This method takes a list of segments and creates them in the perspective. The segments can be provided as dictionaries or Segment objects. The method uses the provided batch size to determine how many segments to create at a time. The default batch size is 128. If a segment is provided as a Segment object and it already has an ID, a ValueError is raised. The method returns a list of the created segments.
- Args:
segments (List[Union[dict, Segment]]): The segments to create. Can be provided as dictionaries or Segment objects.
batch_size (int, optional): The number of segments to create at a time. Defaults to 128.
- Returns:
List[Segment]: The created segments.
- Raises:
ValueError: If a segment is provided as a Segment object and it already has an ID.
- cache_id: str | None
- collection_link: str
- collection_slug: str
- create(collection: str | Collection | None = None, document: Document | None = None, client: 'redshred.api.http.RedShredAPI' | 'redshred.api.client.RedShredClient' | None = None)[source]
Create the local object on the remote server.
- created_at: datetime.datetime
- created_by: str
- description: str | None
- document_link: str
- document_name: str
- enrichment_config: dict
- enrichment_name: str
- errors: dict | None
- id: str
- metadata: dict | None
- name: str
- segment_types: list
- segments(q: str | None = None, fields: List[str] | None = None, **url_params) SegmentIterator[Segment][source]
- segments_link: str
- self_link: str
- slug: str
- updated_at: datetime.datetime
- updated_by: str
- user_data: dict | None
- warnings: dict | None
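The batching and validation rules described for `bulk_create_segments` can be approximated as follows. This is a sketch under stated assumptions, not the library's implementation. `StubSegment`, `chunked`, and `validate_new_segments` are hypothetical. `StubSegment` stands in for `redshred.Segment` and models only its `id`. Dictionaries are passed through unchecked, because the docstring only describes the ID check for Segment objects.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Sequence, Union


@dataclass
class StubSegment:
    """Hypothetical stand-in for redshred.Segment; only the id field matters here."""
    text: str
    id: Optional[str] = None


def chunked(items: Sequence, batch_size: int = 128) -> Iterator[list]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def validate_new_segments(segments: Sequence[Union[dict, StubSegment]]) -> None:
    """Reject Segment objects that already have an ID, as the docstring describes."""
    for seg in segments:
        if isinstance(seg, StubSegment) and seg.id is not None:
            raise ValueError(f"segment already has an ID: {seg.id}")


segments: List[Union[dict, StubSegment]] = [{"text": f"s{i}"} for i in range(300)]
validate_new_segments(segments)
print([len(batch) for batch in chunked(segments)])  # [128, 128, 44]
```

In real use, `bulk_create_segments` does the sending itself. The sketch only shows how a 300-item list splits at the default batch size of 128.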
- class redshred.PerspectiveConfiguration(*, name: str, perspective: str, segments: SegmentQuery | Dict = None, description: str = '', config: Dict[str, Any] = None, debug: bool = False)[source]
Bases:
BaseModel
- config: Dict[str, Any]
- debug: bool
- description: str
- name: str
- perspective: str
- segments: SegmentQuery | Dict
- exception redshred.RedShredAPIError(*args, reason=None, **kwargs)[source]
Bases:
Exception
- reason: Any
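`RedShredAPIError` accepts an optional `reason` keyword and exposes it as the `reason` attribute. A handler can therefore prefer that structured detail over the message text. The sketch below is hypothetical. `RedShredAPIErrorStub` mirrors only the documented signature so the example runs without the package, and `describe_failure` is an illustrative helper, not a library function.

```python
from typing import Any


class RedShredAPIErrorStub(Exception):
    """Hypothetical stand-in mirroring the documented RedShredAPIError(*args, reason=None, **kwargs)."""

    def __init__(self, *args, reason: Any = None, **kwargs):
        super().__init__(*args)
        self.reason = reason  # extra kwargs are ignored in this stub


def describe_failure(exc: Exception) -> str:
    # Prefer the structured reason when present; fall back to the message text.
    reason = getattr(exc, "reason", None)
    return str(reason) if reason is not None else str(exc)


try:
    raise RedShredAPIErrorStub("request failed", reason={"status": 404})
except RedShredAPIErrorStub as err:
    print(describe_failure(err))  # {'status': 404}
```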
- class redshred.RedShredClient(token: str | None = None, host: str | None = None, host_verify: bool | None = None, config_path: str | None = None, context_override: str | None = None, config: Configuration | None = None)[source]
Bases:
object
Client for interacting with the RedShred Platform.
This client provides easy access to the RedShred API, enabling users to interact with various RedShred Platform services such as retrieving user data, fetching contents of files, and accessing collection statistics.
The client can be configured via environment variables, a .rsconfig file, or manual initialization arguments.
- Attributes:
config (Configuration): The configuration for the RedShred API.
api (RedShredAPI): Interface to the RedShred API services.
_user (RedShredUser, optional): The authenticated user’s details. Defaults to None.
- Args:
token (str, optional): The authentication token.
host (str, optional): The RedShred API host address.
host_verify (bool, optional): Flag to enable or disable SSL verification.
config_path (str, optional): Path to the .rsconfig file.
context_override (str, optional): Overrides the context specified in the configuration.
config (Configuration, optional): A pre-initialized Configuration object.
- collection(slug: str) Collection[source]
Retrieve a specific collection by its slug.
- Args:
slug (str): The slug, short reference, or link field of the collection.
- Returns:
Collection: The retrieved collection object.
- Raises:
HTTPError: If the requested collection cannot be found or another error occurs.
- collections(**client_params) CollectionIterator[source]
Fetch collections accessible to the user.
- Args:
**client_params: Additional parameters for the client request.
- Returns:
CollectionIterator: An iterator over the user’s collections.
- file(storage_path: str, inline: bool = False, unconfined: bool = True, width: int = 800, **kwargs) bytes[source]
Fetch a file stored in RedShred by its relative path.
Many enrichments in RedShred can generate files (e.g. extracted images) and will serve these back with a relative path that can be passed to this method to retrieve.
If inline is True, images are displayed directly in a notebook context. In these cases, unconfined and width are passed directly to Image().
- Args:
storage_path (str): Path to the file as given in the API response data.
inline (bool, optional): Whether to attempt to display the file inline in a notebook (images only). Defaults to False.
unconfined (bool, optional): Passed to IPython.core.display.Image; requires inline=True. Defaults to True.
width (int, optional): Passed to IPython.core.display.Image; requires inline=True. Defaults to 800.
**kwargs: Passed to IPython.core.display.Image; requires inline=True.
- Returns:
bytes: file contents as bytes
- get_text(api_object: redshred.models.api.ApiObject)[source]
Extract the text from a given API object.
- Args:
api_object: Any API object that has a get_text method.
- Returns:
str: The text extracted from the API object.
- stats(collection_name: str | Collection) dict[source]
Review the current states of documents in a collection.
Read state is one of [‘unread’, ‘queued’, ‘reading’, ‘read’, ‘crashed’]:
unread - newly uploaded documents that are not yet fully enriched and indexed
queued - documents that are awaiting reading
reading - documents that are currently being enriched by the RedShred reader
read - documents that have been read and are “at rest” in RedShred
crashed - documents that could not be successfully processed by RedShred.
Documents in a crashed state can be reported to RedShred through the chat window in the documentation. Our goal is for every document to be read successfully, although the amount of enrichment may vary (e.g. encrypted PDFs shouldn’t crash, but will likely be sparsely enriched).
- Args:
collection_name (str | Collection): Name of the target collection, or a Collection object.
- Returns:
dict: Read statistics for collection
- property user
Retrieve the current authenticated user’s details.
- Returns:
RedShredUser: An object representing the authenticated user.
- Raises:
HTTPError: If authentication fails or an error occurs with the HTTP call.
TypeError: If the data returned is not in the expected format.
ConnectionError: If a connection error occurs.
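This sketch assumes that the dictionary returned by `stats()` maps each read state to a document count. That shape is an assumption, since it is not documented here. Under that assumption, a small helper can report whether a collection has finished processing. `summarize_read_states` is hypothetical and operates on a plain dict, so it never calls the API. Treating `unread`, `queued`, and `reading` as "still in progress" is an interpretation of the state descriptions above.

```python
from typing import Dict, Mapping

# Read states as listed in the stats() docstring.
READ_STATES = ("unread", "queued", "reading", "read", "crashed")
# States treated here as "not finished yet" (an interpretation, not a documented rule).
IN_PROGRESS = ("unread", "queued", "reading")


def summarize_read_states(counts: Mapping[str, int]) -> Dict[str, object]:
    """Summarize an assumed {state: count} mapping; keys outside READ_STATES are ignored."""
    total = sum(counts.get(state, 0) for state in READ_STATES)
    pending = sum(counts.get(state, 0) for state in IN_PROGRESS)
    return {
        "total": total,
        "pending": pending,
        "crashed": counts.get("crashed", 0),
        "finished": pending == 0,
    }


print(summarize_read_states({"read": 8, "reading": 1, "crashed": 1}))
# {'total': 10, 'pending': 1, 'crashed': 1, 'finished': False}
```

A polling loop could call `stats()` periodically and stop once `finished` is true, then review any crashed documents.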
- exception redshred.RedShredFileExistsError(*args, **kwargs)[source]
Bases:
RedShredHTTPError
- class redshred.Segment(*, self_link: str = None, segment_type: str = None, regions: GeoJSON = None, bounding_box: BoundingBox | None = None, collection_link: str = None, collection_slug: str = None, created_at: datetime = None, created_by: str = None, document_link: str = None, document_name: str = None, enrichment_data: dict | None = None, enrichment_name: str = None, errors: dict | None = None, id: str = None, labels: list = None, metadata: dict | None = None, max_x: float | None = None, max_y: float | None = None, min_x: float | None = None, min_y: float | None = None, perspective_link: str = None, summary: str = None, text: str = None, updated_at: datetime = None, updated_by: str = None, user_data: dict | None = None, warnings: dict | None = None, cache_id: str | None = None, **data)[source]
Bases:
ApiObject
A Segment class.
This class is used to create Segment objects. It provides methods to create, read, update, and delete the Segment object, get segments from the Segment object, and set the API.
- Attributes:
segment_type: The type of the Segment.
regions: The regions of the Segment.
bounding_box: The bounding box of the Segment.
collection_link: The link to the collection the Segment belongs to.
collection_slug: The slug of the collection the Segment belongs to.
created_at: The date the Segment was created.
created_by: The user who created the Segment.
document_link: The link to the document the Segment belongs to.
enrichment_data: The enrichment data of the Segment.
enrichment_name: The name of the enrichment.
errors: The errors of the Segment.
id: The ID of the Segment.
labels: The labels of the Segment.
metadata: The metadata of the Segment.
max_x: The maximum x-coordinate of the Segment.
max_y: The maximum y-coordinate of the Segment.
min_x: The minimum x-coordinate of the Segment.
min_y: The minimum y-coordinate of the Segment.
perspective_link: The link to the perspective of the Segment.
self_link: The self link of the Segment.
summary: The summary of the Segment.
text: The text of the Segment.
updated_at: The date the Segment was last updated.
updated_by: The user who last updated the Segment.
user_data: The user data of the Segment.
warnings: The warnings of the Segment.
cache_id: The cache ID of the Segment.
- between(segment: Segment, strict: bool = False) BoundingBox[source]
Temporarily not implemented. Provides a helper function to generate the bounding box between two segments.
- Args:
segment (Segment): The other segment used to define the space.
strict (bool, optional): If True, the returned bounding box covers exactly the area between the two segments. If False, it spans the entire page width between the two segments. Defaults to False.
- Returns:
list: Bounding box of the area between two segments.
- bounding_box: BoundingBox | None
- cache_id: str | None
- collection_link: str
- collection_slug: str
- create(perspective: Perspective | None = None)[source]
Create the local object on the remote server.
- created_at: datetime.datetime
- created_by: str
- document_link: str
- document_name: str
- enrichment_data: dict | None
- enrichment_name: str
- errors: dict | None
- get_segment_image(path_to_save_folder: str | None = None, return_bytes=False, inline=False, **url_params) bytes | str[source]
Retrieves the image of the segment.
This method uses the SegmentCropper to get the cropped image of the segment. The image can be returned as bytes, opened inline using PIL, or saved to a specified folder.
- Args:
path_to_save_folder (str, optional): The path to the folder where the image will be saved. Defaults to the current working directory.
return_bytes (bool, optional): If True, the image will be returned as bytes. Defaults to False.
inline (bool, optional): If True, the image will be opened inline using PIL. Defaults to False.
**url_params: Additional URL parameters.
- Returns:
Union[bytes, str]: The image of the segment, either as bytes or a path to the saved image.
- get_segments_from_perspective(perspective_name: str, **params)[source]
Get all segments that are in the same perspective as this segment.
- get_text(**url_params)[source]
Retrieves the text of the segment.
This method uses the TokenLookup to get the text of the segment.
- Args:
**url_params: Additional URL parameters.
- Returns:
str: The text of the segment.
- id: str
- labels: list
- max_x: float | None
- max_y: float | None
- metadata: dict | None
- min_x: float | None
- min_y: float | None
- perspective_link: str
- q(query: str, search_type: Literal['documents', 'pages', 'perspectives', 'segments'] = 'segments', **url_params)[source]
Executes a query on the API object.
This method uses the provided query and search type to execute a search on the API object. The search type can be one of “documents”, “pages”, “perspectives”, or “segments”. Additional URL parameters can be provided as keyword arguments.
- Args:
query (str): The query to execute.
search_type (str): The type of search to perform. Defaults to “segments”.
**url_params: Additional URL parameters.
- Returns:
An iterator over the results of the query.
- segment_type: str
- self_link: str
- summary: str
- text: str
- updated_at: datetime.datetime
- updated_by: str
- user_data: dict | None
- warnings: dict | None
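`between()` is marked as temporarily not implemented. The geometry its docstring describes can instead be computed by hand from a segment's `min_x`, `max_x`, `min_y`, and `max_y` attributes. This is a sketch under several assumptions:

* Both segments are on the same page.
* y increases down the page.
* The first segment ends above where the second begins.
* The page width is known.

Reading "exactly between" as the horizontal span covered by the two segments is an interpretation. `Box` and `box_between` are hypothetical and not part of redshred.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box using the min_x/min_y/max_x/max_y attribute names."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def box_between(upper: Box, lower: Box, page_width: float, strict: bool = False) -> Box:
    """Return the gap between two vertically stacked boxes on one page.

    strict=True keeps the horizontal span covered by the two boxes;
    strict=False stretches the result across the full page width.
    """
    if upper.max_y > lower.min_y:
        raise ValueError("boxes overlap vertically or are out of order")
    if strict:
        left = min(upper.min_x, lower.min_x)
        right = max(upper.max_x, lower.max_x)
    else:
        left, right = 0.0, page_width
    # The gap runs from the bottom edge of `upper` to the top edge of `lower`.
    return Box(min_x=left, min_y=upper.max_y, max_x=right, max_y=lower.min_y)


heading = Box(72.0, 90.0, 300.0, 110.0)
paragraph = Box(80.0, 130.0, 540.0, 200.0)
print(box_between(heading, paragraph, page_width=612.0))
# Box(min_x=0.0, min_y=110.0, max_x=612.0, max_y=130.0)
print(box_between(heading, paragraph, page_width=612.0, strict=True))
# Box(min_x=72.0, min_y=110.0, max_x=540.0, max_y=130.0)
```

Coordinates are whatever the page's `units` specify. The sketch does no unit conversion.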