Storage Type
As peers connected to PeerMR execute the stages of each job, they store the output of the map/reduce operations in one of three storage types: PMRFS, GCS or S3. GCS and S3 are faster and more reliable, but their IO and storage billing costs are typically higher than those of PMRFS. With PMRFS, workers store the data in the same browsers they run in, using IndexedDB, and transfer it to other peers over WebRTC, which makes PMRFS the least expensive storage option. PMRFS can be used in both public and private mode jobs. For public mode jobs, the data is distributed among your coordinator, your coordinator's workers, and public workers connected to PeerMR. For private mode jobs, the data is distributed only among your coordinator and your coordinator's workers.
Specify the storage type, either pmrfs, gcs or s3, when creating a JobExecution:
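// Storage type ('pmrfs', 'gcs' or 's3') is passed as the first argument.
// Writing `storageType = 'pmrfs'` inside the call would assign an implicit
// global variable (or throw a ReferenceError in strict mode).
const execution = new JobExecution('pmrfs');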
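If the storage type comes from configuration or user input, it can help to reject unsupported values before creating the execution. The following is a minimal sketch; the helper checkStorageType and the STORAGE_TYPES constant are hypothetical and not part of PeerMR's API, and the sketch accepts only the exact lowercase values listed above.

```javascript
// The three storage types described in this section.
const STORAGE_TYPES = ['pmrfs', 'gcs', 's3'];

// Hypothetical helper: returns the storage type unchanged if it is supported,
// otherwise throws an error listing the accepted values.
function checkStorageType(storageType) {
  if (!STORAGE_TYPES.includes(storageType)) {
    throw new Error(
      `Unsupported storage type "${storageType}"; expected one of: ${STORAGE_TYPES.join(', ')}`
    );
  }
  return storageType;
}

// Usage (JobExecution is provided by PeerMR and is not defined here):
// const execution = new JobExecution(checkStorageType('gcs'));
```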
GCS Setup
To use GCS, add your details on the Cloud Data page under the GCS section. PeerMR uses signed URLs to give workers temporary permission to write job output to your GCS bucket. Follow the instructions here to get the private key and service account email that PeerMR uses to generate signed URLs for your bucket.
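To illustrate what a signed URL gives a worker, here is a minimal sketch of an upload through one. It assumes PeerMR has already produced a signed URL for a PUT request; the function putWithSignedUrl and its parameters are hypothetical and not part of PeerMR's API. The signature in the URL's query string acts as the credential, so the request carries no Authorization header. Because PUT is not a CORS-safelisted method, a browser sends a preflight request before this upload, which is why the bucket also needs the CORS policy described in this section.

```javascript
// Hypothetical worker-side upload through a signed URL.
// `fetchImpl` defaults to the global fetch available in browsers and Node 18+.
async function putWithSignedUrl(signedUrl, body, contentType, fetchImpl = globalThis.fetch) {
  // No Authorization header: the signed query string authorizes the request.
  // If the URL was signed for a specific Content-Type, this header must
  // carry that same value or the storage service rejects the signature.
  const response = await fetchImpl(signedUrl, {
    method: 'PUT',
    headers: { 'Content-Type': contentType },
    body,
  });
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP ${response.status}`);
  }
  return response.status;
}
```

The fetchImpl parameter exists only so the function can be exercised without network access; in a browser worker the default global fetch is used.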
To allow workers to read from and write to your bucket, set a CORS policy on the bucket so that requests coming from workers succeed. Use the following policy to set CORS on your bucket:
[
  {
    "origin": [
      "https://www.peermr.com"
    ],
    "responseHeader": [
      "Content-Type",
      "Access-Control-Allow-Origin"
    ],
    "method": [
      "GET",
      "HEAD",
      "PUT"
    ],
    "maxAgeSeconds": 3600
  }
]
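Because a single stray comma makes a CORS file invalid JSON, it can be worth checking the policy before applying it. The following is a minimal sketch of such a check; the function checkGcsCors and its constants are hypothetical, not a PeerMR or Google Cloud tool. It parses the text with JSON.parse (which rejects trailing commas) and then looks for a rule that allows the PeerMR origin with GET, HEAD and PUT. It only accepts exact matches, so wildcard entries such as "*" are reported as missing even though GCS itself would accept them.

```javascript
// Values required by the policy in this section.
const REQUIRED_ORIGIN = 'https://www.peermr.com';
const REQUIRED_METHODS = ['GET', 'HEAD', 'PUT'];

// Returns a list of problems; an empty list means the policy looks usable.
function checkGcsCors(policyText) {
  let rules;
  try {
    // JSON.parse throws on syntax errors such as trailing commas.
    rules = JSON.parse(policyText);
  } catch (err) {
    return [`Invalid JSON: ${err.message}`];
  }
  if (!Array.isArray(rules)) {
    return ['The policy must be a JSON array of CORS rules'];
  }
  // Some rule must list the PeerMR origin together with every required method.
  const allowsPeerMR = rules.some(
    (rule) =>
      Array.isArray(rule.origin) &&
      rule.origin.includes(REQUIRED_ORIGIN) &&
      Array.isArray(rule.method) &&
      REQUIRED_METHODS.every((m) => rule.method.includes(m))
  );
  return allowsPeerMR
    ? []
    : [`No rule allows ${REQUIRED_ORIGIN} with methods ${REQUIRED_METHODS.join(', ')}`];
}
```

In Node, the policy could be checked from a file with checkGcsCors(require('fs').readFileSync('cors.json', 'utf8')), where cors.json is a hypothetical file name.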
S3 Setup
To use S3, add your details on the Cloud Data page under the S3 section. PeerMR uses signed URLs to give workers temporary permission to write job output to your S3 bucket.
To allow workers to read from and write to your bucket, set a CORS policy on the bucket so that requests coming from workers succeed. Use the following policy to set CORS on your bucket:
[
  {
    "AllowedHeaders": [
      "*"
    ],
    "AllowedMethods": [
      "GET",
      "HEAD",
      "PUT"
    ],
    "AllowedOrigins": [
      "https://www.peermr.com"
    ],
    "ExposeHeaders": [
      "Content-Type",
      "Access-Control-Allow-Origin"
    ],
    "MaxAgeSeconds": 3600
  }
]
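To see why this rule lets worker uploads through, the following is a simplified model of how S3 matches a browser's preflight request against CORS rules: the origin must be listed in AllowedOrigins, the method in AllowedMethods, and every requested header in AllowedHeaders. The function s3PreflightAllowed is hypothetical and only a sketch: it treats a "*" entry as matching anything and does not model S3's partial wildcards such as "https://*.example.com". ExposeHeaders does not take part in matching; it controls which response headers the worker's script may read.

```javascript
// Simplified model of S3 CORS preflight matching (hypothetical helper).
function s3PreflightAllowed(rules, { origin, method, requestHeaders = [] }) {
  // A "*" entry matches any value; otherwise the value must be listed exactly.
  const matches = (allowed, value) => allowed.includes('*') || allowed.includes(value);
  return rules.some((rule) => {
    // Header names are case-insensitive, so compare them in lowercase.
    const allowedHeaders = (rule.AllowedHeaders || []).map((h) => h.toLowerCase());
    return (
      matches(rule.AllowedOrigins || [], origin) &&
      (rule.AllowedMethods || []).includes(method) &&
      requestHeaders.every((h) => matches(allowedHeaders, h.toLowerCase()))
    );
  });
}

// The rule from the CORS policy above.
const peermrRules = [
  {
    AllowedHeaders: ['*'],
    AllowedMethods: ['GET', 'HEAD', 'PUT'],
    AllowedOrigins: ['https://www.peermr.com'],
    ExposeHeaders: ['Content-Type', 'Access-Control-Allow-Origin'],
    MaxAgeSeconds: 3600,
  },
];
```

With these rules, a PUT preflight from https://www.peermr.com that requests the content-type header is allowed, while the same request from any other origin, or a DELETE, is not.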
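Grant public read access to the objects in your bucket so that workers can read from it, using the following bucket policy (replace <your-bucket-name> with the name of your bucket):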
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::<your-bucket-name>/*"
      ]
    }
  ]
}
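If you manage several buckets, the policy above can be generated instead of edited by hand. The following is a minimal sketch; publicReadPolicy is a hypothetical helper, not part of PeerMR or the AWS SDK, and its name check is a loose approximation of S3's bucket naming rules meant only to catch obvious mistakes such as an "s3://" prefix. Note that if S3 Block Public Access is enabled for the bucket (the default for new buckets), AWS rejects a public bucket policy like this one until the BlockPublicPolicy setting is turned off.

```javascript
// Hypothetical helper that builds the public-read bucket policy shown above.
function publicReadPolicy(bucketName) {
  // Loose check: 3-63 characters of lowercase letters, digits, dots and hyphens,
  // starting and ending with a letter or digit.
  if (!/^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$/.test(bucketName)) {
    throw new Error(`Not a valid S3 bucket name: ${bucketName}`);
  }
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Sid: 'PublicReadGetObject',
        Effect: 'Allow',
        Principal: '*',
        Action: ['s3:GetObject'],
        // "/*" applies the grant to every object in the bucket, not the bucket itself.
        Resource: [`arn:aws:s3:::${bucketName}/*`],
      },
    ],
  };
}

// Print the policy for a hypothetical bucket name, ready to paste into the S3 console.
console.log(JSON.stringify(publicReadPolicy('my-peermr-output'), null, 2));
```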