Welcome to RedShred API Client’s documentation!
Basic Usage
Authenticating the client
There are a few different options for authentication: explicitly specifying the server information, using a RedShred configuration file, or using environment variables. For the time being we will only discuss the first option, but you can refer to the API documentation for more information on the others.
Here is how you would authenticate with explicit credentials:
from redshred import RedShredClient

# Pass the API token and server address explicitly
client = RedShredClient(token="ae6daceef240103b2b9e9b562ff6784690cdd0fb", host="https://api.redshred.com")
print(client.user.json(indent=2))
{
"active": true,
"email": "johndoe@theinternet.com",
"first_name": "John",
"joined": "2022-11-15T15:40:56.928582+00:00",
"last_login": "2023-01-24T15:59:24.019915+00:00",
"last_name": "Doe",
"username": "johndoe@theinternet.com"
}
Congratulations, you are now authenticated!
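To avoid hard-coding a token in source files, the explicit credentials can be read from the environment first and then passed to the client. The following is only a minimal sketch: the variable names MY_REDSHRED_TOKEN and MY_REDSHRED_HOST and the helper load_credentials are made up for this illustration and are not necessarily the names the client's own environment-variable support uses.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names chosen for this sketch; they are not
# necessarily the names the RedShred client itself looks for.
TOKEN_VAR = "MY_REDSHRED_TOKEN"
HOST_VAR = "MY_REDSHRED_HOST"
DEFAULT_HOST = "https://api.redshred.com"


def load_credentials(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return keyword arguments suitable for RedShredClient(token=..., host=...)."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set {TOKEN_VAR} before creating the client")
    # Drop a trailing slash so the host has a single canonical form.
    host = env.get(HOST_VAR, DEFAULT_HOST).rstrip("/")
    return {"token": token, "host": host}


# Example with an explicit mapping so the sketch runs without real credentials.
credentials = load_credentials({TOKEN_VAR: "example-token"})
print(credentials["host"])

# With the library installed, the result could be passed straight through:
# from redshred import RedShredClient
# client = RedShredClient(**credentials)
```

Failing early with a clear error when the token is missing tends to be easier to debug than an authentication failure on the first request.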
Creating Collections and Documents
In order to process and upload documents, we need a collection to store them in.
from redshred import Collection

# Create the collection object locally
collection = Collection(slug="my-collection")

# Associate it with our client and create it remotely
collection.create(client)

print(collection.yaml())
config: {}
created_at: '2023-01-24T19:44:27.874546'
created_by: johndoe@theinternet.com
description: null
documents_link: https://api.dev.redshred.com/v2/collections/my-collection/documents
id: ETiN7a7EAsNYb2CwX4zkvY
marked_for_delete: null
metadata: null
name: null
owner: johndoe@theinternet.com
perspectives_link: https://api.dev.redshred.com/v2/collections/my-collection/perspectives
segments_link: https://api.dev.redshred.com/v2/collections/my-collection/segments
self_link: https://api.dev.redshred.com/v2/collections/my-collection
slug: my-collection
updated_at: '2023-01-24T19:44:27.874563'
updated_by: johndoe@theinternet.com
user_data: {}
Once we have created a collection we will want to upload a document:
# First, retrieve the collection we created above, using syntax similar to creating it
collection = client.collection("my-collection")
document = collection.upload_file("/home/johndoe/Documents/my-document.pdf")

print(document.json(indent=2))
{
"self_link": "https://api.dev.redshred.com/v2/collections/my-collection/documents/aE9J62ZR7RAhVYaCeSwftJ",
"id": "aE9J62ZR7RAhVYaCeSwftJ",
"collection_link": "https://api.dev.redshred.com/v2/collections/my-collection",
"collection_slug": "my-collection",
"config": null,
"content_hash": "22132ba64ec6bf79eabbba1b57ce9c8d8663bbc0e1252f17032f79b693d2edfa",
"created_at": "2023-01-24T19:50:26.839768+00:00",
"created_by": "johndoe@theinternet.com",
"csv_metadata": null,
"description": null,
"document_segment_link": null,
"errors": null,
"file_link": "https://api.dev.redshred.com/v2/files/my-collection/s22132ba64ec6bf79eabbba1b57ce9c8d8663bbc0e1252f17032f79b693d2edfa.pdf?name=my-document.pdf",
"file_size": 21367,
"index": 1,
"metadata": null,
"n_pages": null,
"name": "my-document.pdf",
"original_name": "my-document.pdf",
"pages_link": "https://api.dev.redshred.com/v2/collections/my-collection/documents/aE9J62ZR7RAhVYaCeSwftJ/pages",
"pdf_link": "https://api.dev.redshred.com/v2/files/my-collection/s22132ba64ec6bf79eabbba1b57ce9c8d8663bbc0e1252f17032f79b693d2edfa.pdf",
"perspectives_link": "https://api.dev.redshred.com/v2/collections/my-collection/documents/aE9J62ZR7RAhVYaCeSwftJ/perspectives",
"read_state": "queued",
"read_state_updated_at": "2023-01-24T19:50:26.961375+00:00",
"region": {
"coordinates": [
[
[0.0, 0.0],
[1.0, 0.0],
[1.0, 1.0],
[0.0, 1.0],
[0.0, 0.0]
]
],
"type": "Polygon"
},
"segments_link": "https://api.dev.redshred.com/v2/collections/my-collection/documents/aE9J62ZR7RAhVYaCeSwftJ/segments",
"slug": "my-documentpdf",
"source": "file",
"summary": null,
"text": null,
"updated_at": "2023-01-24T19:50:26.894566+00:00",
"updated_by": "johndoe@theinternet.com",
"user_data": null,
"warnings": null
}
Notice that most of the information we would expect to see above, like text or n_pages, is empty. This is because the document is still being read! We can wait for the document to finish reading and then see what has changed, like so:
# The interpreter will pause here until the document has finished reading remotely
document.wait_until_read()
print(document.yaml(include={"text", "n_pages"}))
n_pages: 2
text: The rain in Spain stays mainly in the plain...
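Conceptually, blocking until a document has been read amounts to polling its state until it changes. The sketch below shows that general pattern with a timeout. It is not the library's implementation of wait_until_read(); the wait_for helper, the FakeDocument class, and the "done" state value are all assumptions made so the example can run without a server.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Call `condition` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeDocument:
    """Stand-in for a document whose read_state changes after a few checks."""

    def __init__(self, checks_until_done: int) -> None:
        self._remaining = checks_until_done

    @property
    def read_state(self) -> str:
        # "queued" appears in the upload output earlier in this guide; "done"
        # is a made-up terminal state used only for this sketch.
        self._remaining -= 1
        return "done" if self._remaining <= 0 else "queued"


fake = FakeDocument(checks_until_done=3)
finished = wait_for(lambda: fake.read_state == "done", timeout=5.0, interval=0.01)
print(finished)
```

Returning False on timeout, rather than looping forever, lets the caller decide whether to retry or report the document as stuck.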
Accessing and Interacting with API Objects
for collection in client.collections():
    print(f"{collection.slug!r} created at {collection.created_at} by {collection.created_by}")
    if collection.slug == "my-collection":
        print("my-collection found!")
        break

# First we find our document. Since we know we only have one named "my-document.pdf", we can
# just take the first search result from querying the server by its attributes
document = collection.documents(name="my-document.pdf").first()

# We can then do a similar operation to find our text perspective, called "typography", and all of
# its paragraph segments
typography_perspective = document.perspectives(name="typography").first()
for segment in typography_perspective.segments(segment_type="paragraph"):
    print(f"paragraph id: {segment.id}")
    print(segment.text)
'my-other-collection' created at 2023-01-24 19:44:27.874546+00:00 by johndoe@theinternet.com
'my-collection' created at 2023-01-11 17:37:03.342783+00:00 by johndoe@theinternet.com
my-collection found!
paragraph id: WmXRmpJVseMgdGL5TbM8b6
The rain in Spain stays mainly in the plain
RSQL Call Query Creation
The RedShred API supports RSQL (RedShred Query Language) for advanced querying. The client library provides a safe way to construct these queries using string templates and automatic escaping of special characters.
Basic Query
from redshred import RedShredClient

rs = RedShredClient()
collections = rs.collections(q='text="rose" and segment_type="paragraph"')
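To illustrate why escaping matters when values are interpolated into a query string, here is a stand-alone sketch of template filling with escaped values. It does not use the client library's own query helpers. The functions escape_value and build_query are hypothetical, and the rule that double quotes and backslashes are escaped with a backslash is an assumption about RSQL rather than something confirmed by this guide.

```python
def escape_value(value: str) -> str:
    """Escape backslashes and double quotes (assumed escaping rule for this sketch)."""
    # Backslashes are handled first so the escapes added for quotes are not doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_query(template: str, **values: object) -> str:
    """Fill `template` placeholders with escaped values."""
    escaped = {name: escape_value(str(value)) for name, value in values.items()}
    return template.format(**escaped)


# A value containing quotes can no longer terminate the quoted string early.
query = build_query(
    'text="{word}" and segment_type="{kind}"',
    word='say "rose"',
    kind="paragraph",
)
print(query)
```

Keeping the template fixed and passing user-supplied text only as values means the structure of the query cannot be changed by the data placed in it.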
Template with Values
Special Character Handling
Common Use Cases
API Documentation
Contents:
- redshred package
- Subpackages
- redshred.api package
- redshred.cli package
- redshred.enrichments package
- Submodules
- redshred.enrichments.base module
- redshred.enrichments.defined_acronyms module
- redshred.enrichments.external_api module
- redshred.enrichments.grouper module
GrouperPerspective
GrouperPerspectiveConfig
GrouperPerspectiveConfig.Config
GrouperPerspectiveConfig.hull_method
GrouperPerspectiveConfig.hull_method_options
GrouperPerspectiveConfig.operation_labels
GrouperPerspectiveConfig.operations
GrouperPerspectiveConfig.root_label
GrouperPerspectiveConfig.whitespace_calculation_method
GrouperPerspectiveConfig.whitespace_method_options
GrouperPerspectiveConfig.x_gap
GrouperPerspectiveConfig.y_gap
GrouperPerspectiveHullMethod
object_setattr()
- redshred.enrichments.huggingface module
HuggingfacePerspective
HuggingfacePerspectiveConfig
HuggingfacePerspectiveConfig.Config
HuggingfacePerspectiveConfig.model
HuggingfacePerspectiveConfig.model_class
HuggingfacePerspectiveConfig.model_source
HuggingfacePerspectiveConfig.pipeline_task
HuggingfacePerspectiveConfig.task_config
HuggingfacePerspectiveConfig.task_config_class
HuggingfacePerspectiveConfig.task_specific_template
HuggingfacePerspectiveConfig.tokenizer
HuggingfacePerspectiveConfig.tokenizer_class
object_setattr()
- redshred.enrichments.iris module
- redshred.enrichments.page_images module
BackendOptions
PageImagesPerspective
PageImagesPerspectiveBackend
PageImagesPerspectiveConfig
object_setattr()
- redshred.enrichments.pdftotext module
- redshred.enrichments.preprocess module
- redshred.enrichments.regex module
- redshred.enrichments.sentences module
- redshred.enrichments.spacy module
- redshred.enrichments.tfidf module
TFIDFPerspective
TFIDFPerspectiveConfig
TFIDFPerspectiveNorm
object_setattr()
- redshred.enrichments.typography module
- Module contents
- redshred.microservices package
- redshred.models package
- Submodules
- redshred.models.api module
APIObjectIterator
ApiObject
Collection
Collection.client
Collection.config
Collection.create()
Collection.created_at
Collection.created_by
Collection.delete()
Collection.description
Collection.document()
Collection.documents()
Collection.documents_link
Collection.id
Collection.load()
Collection.marked_for_delete
Collection.metadata
Collection.name
Collection.owner
Collection.perspective()
Collection.perspectives()
Collection.perspectives_link
Collection.segment()
Collection.segments()
Collection.segments_link
Collection.self_link
Collection.slug
Collection.updated_at
Collection.updated_by
Collection.upload_csv()
Collection.upload_file()
Collection.upload_text()
Collection.upload_url()
Collection.user_data
CollectionIterator
Document
Document.collection()
Document.collection_link
Document.collection_slug
Document.config
Document.content_hash
Document.create()
Document.created_at
Document.created_by
Document.csv_metadata
Document.description
Document.document_segment_link
Document.download()
Document.download_bytes()
Document.errors
Document.file_link
Document.file_size
Document.id
Document.index
Document.metadata
Document.n_pages
Document.name
Document.original_name
Document.page()
Document.pages()
Document.pages_link
Document.pdf_link
Document.perspective()
Document.perspectives()
Document.perspectives_link
Document.read_state
Document.read_state_updated_at
Document.region
Document.reread_document()
Document.segment()
Document.segments()
Document.segments_link
Document.self_link
Document.slug
Document.source
Document.summary
Document.text
Document.uniqueness_id
Document.updated_at
Document.updated_by
Document.user_data
Document.wait_until_read()
Document.warnings
DocumentIterator
Page
Page.collection()
Page.collection_link
Page.collection_slug
Page.content_hash
Page.created_at
Page.created_by
Page.document()
Page.document_index
Page.document_name
Page.dpi
Page.height
Page.id
Page.index
Page.metadata
Page.name
Page.next()
Page.page_segment_link
Page.perspective()
Page.perspectives()
Page.perspectives_link
Page.previous()
Page.region
Page.segment()
Page.segments()
Page.segments_link
Page.self_link
Page.summary
Page.text
Page.tokens()
Page.tokens_file_link
Page.units
Page.updated_at
Page.updated_by
Page.user_data
Page.width
PageIterator
Perspective
Perspective.bulk_create_segments()
Perspective.cache_id
Perspective.collection()
Perspective.collection_link
Perspective.collection_slug
Perspective.create()
Perspective.created_at
Perspective.created_by
Perspective.description
Perspective.document()
Perspective.document_link
Perspective.document_name
Perspective.enrichment_config
Perspective.enrichment_name
Perspective.errors
Perspective.id
Perspective.metadata
Perspective.name
Perspective.segment()
Perspective.segment_types
Perspective.segments()
Perspective.segments_link
Perspective.self_link
Perspective.slug
Perspective.updated_at
Perspective.updated_by
Perspective.user_data
Perspective.warnings
PerspectiveIterator
RedShredUser
Segment
Segment.between()
Segment.bounding_box
Segment.cache_id
Segment.collection()
Segment.collection_link
Segment.collection_slug
Segment.create()
Segment.created_at
Segment.created_by
Segment.document()
Segment.document_link
Segment.document_name
Segment.enrichment_data
Segment.enrichment_name
Segment.errors
Segment.get_segment_image()
Segment.get_segments_from_perspective()
Segment.get_text()
Segment.id
Segment.labels
Segment.max_x
Segment.max_y
Segment.metadata
Segment.min_x
Segment.min_y
Segment.perspective()
Segment.perspective_link
Segment.q()
Segment.regions
Segment.segment_type
Segment.self_link
Segment.summary
Segment.text
Segment.updated_at
Segment.updated_by
Segment.user_data
Segment.warnings
SegmentIterator
SerializableModel
Token
get_type()
- redshred.models.configuration module
AdvancedOCRTokenizerConfig
AdvancedOCRTokenizerConfigOptions
CollectionConfiguration
CollectionConfiguration.Config
CollectionConfiguration.allow_anonymous_downloads
CollectionConfiguration.dict()
CollectionConfiguration.document_uniqueness
CollectionConfiguration.enrichments
CollectionConfiguration.from_dict()
CollectionConfiguration.json()
CollectionConfiguration.notifications
CollectionConfiguration.tokenizer
CollectionConfiguration.validate_remote_schema()
CollectionConfiguration.yaml()
ConfiguredTokenizer
DocumentUniqueness
NotificationConfiguration
PerspectiveConfiguration
TesseractTokenizerConfig
TesseractTokenizerConfigOptions
Tokenizers
- Module contents
- redshred.visualize package
- Submodules
- redshred.configuration module
- redshred.exceptions module
- redshred.spatial module
BoundingBox
BoundingBox.as_boundingbox()
BoundingBox.as_geojson()
BoundingBox.as_numpy()
BoundingBox.as_shape()
BoundingBox.from_shape()
BoundingBox.get_bounds()
BoundingBox.get_offsets()
BoundingBox.height
BoundingBox.json()
BoundingBox.min_x
BoundingBox.min_y
BoundingBox.normalize_to_page()
BoundingBox.rotate()
BoundingBox.scale()
BoundingBox.translate()
BoundingBox.width
GeoJSON
GeoJSON.as_boundingbox()
GeoJSON.as_geojson()
GeoJSON.as_shape()
GeoJSON.convex_hull()
GeoJSON.from_bounds()
GeoJSON.from_coords()
GeoJSON.from_shape()
GeoJSON.get_bounds()
GeoJSON.get_coordinates()
GeoJSON.get_offsets()
GeoJSON.height
GeoJSON.json()
GeoJSON.min_x
GeoJSON.min_y
GeoJSON.normalize_to_page()
GeoJSON.rotate()
GeoJSON.scale()
GeoJSON.translate()
GeoJSON.width
- redshred.util module
- Module contents
Collection
Collection.client
Collection.config
Collection.create()
Collection.created_at
Collection.created_by
Collection.delete()
Collection.description
Collection.document()
Collection.documents()
Collection.documents_link
Collection.id
Collection.load()
Collection.marked_for_delete
Collection.metadata
Collection.name
Collection.owner
Collection.perspective()
Collection.perspectives()
Collection.perspectives_link
Collection.segment()
Collection.segments()
Collection.segments_link
Collection.self_link
Collection.slug
Collection.updated_at
Collection.updated_by
Collection.upload_csv()
Collection.upload_file()
Collection.upload_text()
Collection.upload_url()
Collection.user_data
CollectionConfiguration
CollectionConfiguration.Config
CollectionConfiguration.allow_anonymous_downloads
CollectionConfiguration.dict()
CollectionConfiguration.document_uniqueness
CollectionConfiguration.enrichments
CollectionConfiguration.from_dict()
CollectionConfiguration.json()
CollectionConfiguration.notifications
CollectionConfiguration.tokenizer
CollectionConfiguration.validate_remote_schema()
CollectionConfiguration.yaml()
Document
Document.collection()
Document.collection_link
Document.collection_slug
Document.config
Document.content_hash
Document.create()
Document.created_at
Document.created_by
Document.csv_metadata
Document.description
Document.document_segment_link
Document.download()
Document.download_bytes()
Document.errors
Document.file_link
Document.file_size
Document.id
Document.index
Document.metadata
Document.n_pages
Document.name
Document.original_name
Document.page()
Document.pages()
Document.pages_link
Document.pdf_link
Document.perspective()
Document.perspectives()
Document.perspectives_link
Document.read_state
Document.read_state_updated_at
Document.region
Document.reread_document()
Document.segment()
Document.segments()
Document.segments_link
Document.self_link
Document.slug
Document.source
Document.summary
Document.text
Document.uniqueness_id
Document.updated_at
Document.updated_by
Document.user_data
Document.wait_until_read()
Document.warnings
Page
Page.collection()
Page.collection_link
Page.collection_slug
Page.content_hash
Page.created_at
Page.created_by
Page.document()
Page.document_index
Page.document_name
Page.dpi
Page.height
Page.id
Page.index
Page.metadata
Page.name
Page.next()
Page.page_segment_link
Page.perspective()
Page.perspectives()
Page.perspectives_link
Page.previous()
Page.region
Page.segment()
Page.segments()
Page.segments_link
Page.self_link
Page.summary
Page.text
Page.tokens()
Page.tokens_file_link
Page.units
Page.updated_at
Page.updated_by
Page.user_data
Page.width
Perspective
Perspective.bulk_create_segments()
Perspective.cache_id
Perspective.collection()
Perspective.collection_link
Perspective.collection_slug
Perspective.create()
Perspective.created_at
Perspective.created_by
Perspective.description
Perspective.document()
Perspective.document_link
Perspective.document_name
Perspective.enrichment_config
Perspective.enrichment_name
Perspective.errors
Perspective.id
Perspective.metadata
Perspective.name
Perspective.segment()
Perspective.segment_types
Perspective.segments()
Perspective.segments_link
Perspective.self_link
Perspective.slug
Perspective.updated_at
Perspective.updated_by
Perspective.user_data
Perspective.warnings
PerspectiveConfiguration
RedShredAPIError
RedShredClient
RedShredFileExistsError
RedShredHTTPError
Segment
Segment.between()
Segment.bounding_box
Segment.cache_id
Segment.collection()
Segment.collection_link
Segment.collection_slug
Segment.create()
Segment.created_at
Segment.created_by
Segment.document()
Segment.document_link
Segment.document_name
Segment.enrichment_data
Segment.enrichment_name
Segment.errors
Segment.get_segment_image()
Segment.get_segments_from_perspective()
Segment.get_text()
Segment.id
Segment.labels
Segment.max_x
Segment.max_y
Segment.metadata
Segment.min_x
Segment.min_y
Segment.perspective()
Segment.perspective_link
Segment.q()
Segment.regions
Segment.segment_type
Segment.self_link
Segment.summary
Segment.text
Segment.updated_at
Segment.updated_by
Segment.user_data
Segment.warnings
- Subpackages