Welcome to RedShred API Client’s documentation!
Basic Usage
Authenticating the client
There are a few different options for authentication: explicitly specifying the server information, using a RedShred configuration file, or using environment variables. For now we will only discuss the first option; refer to the API documentation for more information on the others.
Here is how you would authenticate with explicit credentials:
from redshred.api import RedShredClient

# Authenticate with an explicit API token and server URL
client = RedShredClient(token="ae6daceef240103b2b9e9b562ff6784690cdd0fb", host="https://api.redshred.com")
print(client.user.json(indent=2))
{
"active": true,
"email": "johndoe@theinternet.com",
"first_name": "John",
"joined": "2022-11-15T15:40:56.928582+00:00",
"last_login": "2023-01-24T15:59:24.019915+00:00",
"last_name": "Doe",
"username": "johndoe@theinternet.com"
}
Congratulations, you are now authenticated!
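The explicit form above hard-codes the token in the script. A minimal variation, still using only the token and host keyword arguments shown above, is to read those values from the environment yourself and pass them in. The variable names REDSHRED_TOKEN and REDSHRED_HOST below are hypothetical choices for this sketch; they are not necessarily the names the client's own environment-variable support reads (see the API documentation for those).

```python
import os
from typing import Mapping, Optional

# Default host taken from the explicit example above.
DEFAULT_HOST = "https://api.redshred.com"


def credentials_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for RedShredClient from environment variables.

    REDSHRED_TOKEN and REDSHRED_HOST are hypothetical variable names chosen
    for this sketch, not names defined by the redshred package.
    """
    env = os.environ if env is None else env
    token = env.get("REDSHRED_TOKEN")
    if not token:
        raise RuntimeError("REDSHRED_TOKEN is not set")
    # Fall back to the public API host when no override is given.
    host = env.get("REDSHRED_HOST", DEFAULT_HOST)
    return {"token": token, "host": host}
```

With the variables set, RedShredClient(**credentials_from_env()) makes the same call as the explicit example, without the token appearing in your source code.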
Creating Collections and Documents
Before we can upload and process documents, we need a collection to store them in.
from redshred.models.api import Collection

# Create the collection object locally
collection = Collection(slug="my-collection")

# Associate it with our client and create it remotely
collection.create(client)

print(collection.yaml())
config: {}
created_at: '2023-01-24T19:44:27.874546'
created_by: johndoe@theinternet.com
description: null
documents_link: https://api.dev.redshred.com/v2/collections/my-collection/documents
id: ETiN7a7EAsNYb2CwX4zkvY
marked_for_delete: null
metadata: null
name: null
owner: johndoe@theinternet.com
perspectives_link: https://api.dev.redshred.com/v2/collections/my-collection/perspectives
segments_link: https://api.dev.redshred.com/v2/collections/my-collection/segments
self_link: https://api.dev.redshred.com/v2/collections/my-collection
slug: my-collection
updated_at: '2023-01-24T19:44:27.874563'
updated_by: johndoe@theinternet.com
user_data: {}
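Each *_link field in the output is the collection's self_link with a sub-resource appended. The helper below reproduces that pattern. It is inferred only from the sample output above (including the /v2/collections/ path segment), so treat it as an illustration of how the fields relate rather than a documented URL scheme; in real code, prefer the link fields the API returns.

```python
from urllib.parse import quote


def collection_links(api_base: str, slug: str) -> dict:
    """Rebuild the link fields seen in the sample collection output.

    The "/v2/collections/<slug>" layout is an assumption inferred from that
    sample only, not from a published URL specification.
    """
    # Percent-encode the slug and avoid a double slash after the base URL.
    self_link = f"{api_base.rstrip('/')}/v2/collections/{quote(slug)}"
    return {
        "self_link": self_link,
        "documents_link": f"{self_link}/documents",
        "perspectives_link": f"{self_link}/perspectives",
        "segments_link": f"{self_link}/segments",
    }
```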
Once the collection exists, we can upload a document to it:
# First, retrieve the collection we created above by its slug
collection = client.collection("my-collection")
document = collection.upload_file("/home/johndoe/Documents/my-document.pdf")

print(document.json(indent=2))
{
"self_link": "https://api.dev.redshred.com/v2/collections/my-collection/documents/aE9J62ZR7RAhVYaCeSwftJ",
"id": "aE9J62ZR7RAhVYaCeSwftJ",
"collection_link": "https://api.dev.redshred.com/v2/collections/my-collection",
"collection_slug": "my-collection",
"config": null,
"content_hash": "22132ba64ec6bf79eabbba1b57ce9c8d8663bbc0e1252f17032f79b693d2edfa",
"created_at": "2023-01-24T19:50:26.839768+00:00",
"created_by": "johndoe@theinternet.com",
"csv_metadata": null,
"description": null,
"document_segment_link": null,
"errors": null,
"file_link": "https://api.dev.redshred.com/v2/files/my-collection/s22132ba64ec6bf79eabbba1b57ce9c8d8663bbc0e1252f17032f79b693d2edfa.pdf?name=my-document.pdf",
"file_size": 21367,
"index": 1,
"metadata": null,
"n_pages": null,
"name": "my-document.pdf",
"original_name": "my-document.pdf",
"pages_link": "https://api.dev.redshred.com/v2/collections/my-collection/documents/aE9J62ZR7RAhVYaCeSwftJ/pages",
"pdf_link": "https://api.dev.redshred.com/v2/files/my-collection/s22132ba64ec6bf79eabbba1b57ce9c8d8663bbc0e1252f17032f79b693d2edfa.pdf",
"perspectives_link": "https://api.dev.redshred.com/v2/collections/my-collection/documents/aE9J62ZR7RAhVYaCeSwftJ/perspectives",
"read_state": "queued",
"read_state_updated_at": "2023-01-24T19:50:26.961375+00:00",
"region": {
"coordinates": [
[
[0.0, 0.0],
[1.0, 0.0],
[1.0, 1.0],
[0.0, 1.0],
[0.0, 0.0]
]
],
"type": "Polygon"
},
"segments_link": "https://api.dev.redshred.com/v2/collections/my-collection/documents/aE9J62ZR7RAhVYaCeSwftJ/segments",
"slug": "my-documentpdf",
"source": "file",
"summary": null,
"text": null,
"updated_at": "2023-01-24T19:50:26.894566+00:00",
"updated_by": "johndoe@theinternet.com",
"user_data": null,
"warnings": null
}
Note that most of the fields we would expect to be populated, such as text and n_pages, are still empty. This is because the document is still being read (its read_state is "queued"). We can wait for reading to finish and then look at what changed:
document.wait_until_read()  # blocks until the document has finished reading remotely
print(document.yaml(include={"text", "n_pages"}))
n_pages: 2
text: The rain in Spain stays mainly in the plain...
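wait_until_read() blocks until reading finishes. If you would rather give up after a fixed time, a generic polling loop like the one below can wrap any function that returns the current read state. This is a sketch, not how wait_until_read() is implemented: how you refresh the state is up to you, and apart from "queued" (seen in the output above) the state names you pass as done_states are assumptions to check against real read_state values.

```python
import time
from typing import Callable, Container


def poll_until(
    get_state: Callable[[], str],
    done_states: Container[str],
    timeout: float = 300.0,
    interval: float = 5.0,
) -> str:
    """Poll get_state() until it returns one of done_states.

    Returns that final state, or raises TimeoutError once `timeout` seconds
    have passed without reaching one of done_states.
    """
    # time.monotonic() is unaffected by system clock adjustments.
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in done_states:
            return state
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"state still {state!r} after {timeout} seconds")
        # Sleep for one interval, but never past the deadline.
        time.sleep(min(interval, remaining))
```

Using a monotonic clock for the deadline means a system clock change during the wait cannot shorten or extend the timeout.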
Accessing and Interacting with API Objects
# List the collections we can access and stop once we find ours
for collection in client.collections():
    print(f"{collection.slug!r} created at {collection.created_at} by {collection.created_by}")
    if collection.slug == "my-collection":
        print("my-collection found!")
        break

# Next we find our document. Since we know there is only one named "my-document.pdf",
# we can take the first result of querying the server by that attribute
document = collection.documents(name="my-document.pdf").first()

# We can do the same to find the text perspective, called "typography",
# and iterate over all of its paragraph segments
typography_perspective = document.perspectives(name="typography").first()
for segment in typography_perspective.segments(segment_type="paragraph"):
    print(f"paragraph id: {segment.id}")
    print(segment.text)
'my-other-collection' created at 2023-01-24 19:44:27.874546+00:00 by johndoe@theinternet.com
'my-collection' created at 2023-01-11 17:37:03.342783+00:00 by johndoe@theinternet.com
my-collection found!
paragraph id: WmXRmpJVseMgdGL5TbM8b6
The rain in Spain stays mainly in the plain
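The for/break loop above scans client.collections() on the client side until it finds a matching slug. The same search can be written with next() and a generator expression. The helper name find_first is hypothetical; because it accepts any iterable, it also works on the iterators the client returns. For large result sets, filtering on the server (as collection.documents(name=...) does above) avoids fetching every object.

```python
from typing import Any, Iterable, Optional

# Sentinel so that a missing attribute never matches, even when the wanted value is None.
_MISSING = object()


def find_first(objects: Iterable[Any], **attrs: Any) -> Optional[Any]:
    """Return the first object whose attributes equal all of `attrs`, or None.

    Iteration stops at the first match, like the for/break loop above.
    """
    return next(
        (
            obj
            for obj in objects
            if all(getattr(obj, name, _MISSING) == value for name, value in attrs.items())
        ),
        None,
    )
```

For example, collection = find_first(client.collections(), slug="my-collection") replaces the loop, and returns None instead of leaving collection bound to the last item when nothing matches.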
API Documentation
Contents:
- redshred package
- Subpackages
- redshred.api package
- redshred.cli package
- redshred.enrichments package
- Submodules
- redshred.enrichments.base module
- redshred.enrichments.defined_acronyms module
- redshred.enrichments.external_api module
- redshred.enrichments.grouper module
GrouperPerspective, GrouperPerspectiveConfig (Config, hull_method, hull_method_options, operation_labels, operations, root_label, whitespace_calculation_method, whitespace_method_options, x_gap, y_gap), GrouperPerspectiveHullMethod, object_setattr()
- redshred.enrichments.huggingface module
HuggingfacePerspective, HuggingfacePerspectiveConfig (Config, model, model_class, model_source, pipeline_task, task_config, task_config_class, task_specific_template, tokenizer, tokenizer_class), object_setattr()
- redshred.enrichments.iris module
- redshred.enrichments.page_images module
BackendOptions, PageImagesPerspective, PageImagesPerspectiveBackend, PageImagesPerspectiveConfig, object_setattr()
- redshred.enrichments.pdftotext module
- redshred.enrichments.preprocess module
- redshred.enrichments.regex module
- redshred.enrichments.sentences module
- redshred.enrichments.spacy module
- redshred.enrichments.tfidf module
TFIDFPerspective, TFIDFPerspectiveConfig, TFIDFPerspectiveNorm, object_setattr()
- redshred.enrichments.typography module
- Module contents
- redshred.microservices package
- redshred.models package
- Submodules
- redshred.models.api module
APIObjectIterator, ApiObject, Collection (client, config, create(), created_at, created_by, delete(), description, document(), documents(), documents_link, id, load(), marked_for_delete, metadata, name, owner, perspective(), perspectives(), perspectives_link, segment(), segments(), segments_link, self_link, slug, updated_at, updated_by, upload_csv(), upload_file(), upload_text(), upload_url(), user_data)
CollectionIterator, Document (collection(), collection_link, collection_slug, config, content_hash, create(), created_at, created_by, csv_metadata, description, document_segment_link, download(), download_bytes(), errors, file_link, file_size, id, index, metadata, n_pages, name, original_name, page(), pages(), pages_link, pdf_link, perspective(), perspectives(), perspectives_link, read_state, read_state_updated_at, region, reread_document(), segment(), segments(), segments_link, self_link, slug, source, summary, text, uniqueness_id, updated_at, updated_by, user_data, wait_until_read(), warnings)
DocumentIterator, Page (collection(), collection_link, collection_slug, content_hash, created_at, created_by, document(), document_index, document_name, dpi, height, id, index, metadata, name, next(), page_segment_link, perspective(), perspectives(), perspectives_link, previous(), region, segment(), segments(), segments_link, self_link, summary, text, tokens(), tokens_file_link, units, updated_at, updated_by, user_data, width)
PageIterator, Perspective (bulk_create_segments(), cache_id, collection(), collection_link, collection_slug, create(), created_at, created_by, description, document(), document_link, document_name, enrichment_config, enrichment_name, errors, id, metadata, name, segment(), segment_types, segments(), segments_link, self_link, slug, updated_at, updated_by, user_data, warnings)
PerspectiveIterator, RedShredUser, Segment (between(), bounding_box, cache_id, collection(), collection_link, collection_slug, create(), created_at, created_by, document(), document_link, document_name, enrichment_data, enrichment_name, errors, get_segment_image(), get_segments_from_perspective(), get_text(), id, labels, max_x, max_y, metadata, min_x, min_y, perspective(), perspective_link, q(), regions, segment_type, self_link, summary, text, updated_at, updated_by, user_data, warnings)
SegmentIterator, SerializableModel, Token, get_type()
- redshred.models.configuration module
ChoiceSegmenter, ChoiceSentenceTokenizer, CollectionConfiguration (Config, allow_anonymous_downloads, dict(), document_uniqueness, enrichments, from_dict(), json(), notifications, tokenizer, validate_remote_schema(), yaml())
CustomSegmenter, DocumentUniqueness, NotificationConfiguration, PerspectiveConfiguration, PerspectiveSegmentQuery, Tokenizers, TypographyConfiguration
- redshred.models.enrichments module
- Module contents
- redshred.visualize package
- Submodules
- redshred.configuration module
- redshred.exceptions module
- redshred.spatial module
BoundingBox (as_boundingbox(), as_geojson(), as_numpy(), as_shape(), from_shape(), get_bounds(), get_offsets(), height, json(), min_x, min_y, normalize_to_page(), rotate(), scale(), translate(), width)
GeoJSON (as_boundingbox(), as_geojson(), as_shape(), convex_hull(), from_bounds(), from_coords(), from_shape(), get_bounds(), get_coordinates(), get_offsets(), height, json(), min_x, min_y, normalize_to_page(), rotate(), scale(), translate(), width)
- redshred.util module
- Module contents