Key Concepts

Compute

PeerMR provides compute to users in the form of jobs. Jobs are defined in JavaScript and executed entirely in the browser. Jobs are distributed and run in parallel across multiple browsers via map and reduce constructs. MPI-like primitives are also provided to allow direct communication between workers. A job consists of one or more stages; each stage is either a Map stage or a Reduce stage. Map stages are stateless: they send their output to the next stage as soon as input processing completes. Reduce stages are stateful: they can send their output to the next stage only once the entire set of known inputs has been processed.
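The difference between the two stage types can be sketched in a few lines of plain JavaScript. This is a single-process simulation, not the PeerMR job API: the names mapStage, reduceStage, and runJob are hypothetical, and the sketch assumes a Map stage emits output per input while a Reduce stage buffers everything until its inputs are exhausted.

```javascript
// Illustrative only: mapStage, reduceStage, and runJob are hypothetical
// names, not part of the PeerMR API.

// A Map stage is stateless: each input is transformed and emitted right away.
function mapStage(mapFn) {
  return {
    onInput(input, emit) {
      for (const pair of mapFn(input)) emit(pair);
    },
    onEnd() {}, // nothing was buffered, so there is nothing to flush
  };
}

// A Reduce stage is stateful: it groups values by key and emits only after
// every known input has been processed.
function reduceStage(reduceFn) {
  const groups = new Map();
  return {
    onInput([key, value]) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    },
    onEnd(emit) {
      for (const [key, values] of groups) emit([key, reduceFn(key, values)]);
    },
  };
}

// Runs the stages in order in one process, feeding each stage's output
// to the next stage as its input.
function runJob(inputs, stages) {
  let current = inputs;
  for (const stage of stages) {
    const next = [];
    const emit = (item) => next.push(item);
    for (const item of current) stage.onInput(item, emit);
    stage.onEnd(emit);
    current = next;
  }
  return current;
}

// Word count: one Map stage followed by one Reduce stage.
const wordCounts = runJob(
  ["a b a", "b c"],
  [
    mapStage((line) => line.split(" ").map((word) => [word, 1])),
    reduceStage((word, ones) => ones.reduce((sum, n) => sum + n, 0)),
  ]
);
console.log(wordCounts); // [ [ 'a', 2 ], [ 'b', 2 ], [ 'c', 1 ] ]
```

In a real deployment the stages would run in parallel across browsers; the sketch only shows why a Reduce stage has to wait while a Map stage does not.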

Storage

For storage, you can bring your own GCS or S3 storage or use PMRFS, a distributed file system provided by PeerMR. Architecturally, PMRFS is similar to HDFS: a single component coordinates file system operations among multiple browser nodes, which actually store and serve the data. Unlike HDFS, PMRFS persistence is ephemeral. Because clients can come and go at any time, a stored file is not guaranteed to remain available, since one or more of its parts may be missing. PMRFS is mainly meant for intermediate storage during compute: its core purpose is to provide a storage layer for the inputs and outputs of the compute layer. It is not meant to store objects long term, although it can be used for that purpose.

Buckets

As in S3 and GCS, PMRFS buckets are top-level containers. Bucket names are globally unique. Each user is limited to 20 buckets.

Objects

Objects belong to a bucket and are identified by keys that look like file paths. Key names are unique within a bucket. Unlike in HDFS, not all objects are immutable. Objects that are intermediate inputs or outputs generated during job processing may be deleted unless the saveIntermediateOutput flag is set to true.
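The effect of the saveIntermediateOutput flag can be pictured with a small in-memory model. Everything here except the flag name is an assumption made for illustration: the cleanUpAfterJob function, the intermediate marker, the example keys, and the idea that the flag lives on a job configuration object are all hypothetical. The sketch also deletes intermediate objects unconditionally, whereas the text above only says they may be deleted.

```javascript
// Hypothetical model: a bucket is a Map from object key to object metadata.
// Only the saveIntermediateOutput flag name comes from the documentation.
function cleanUpAfterJob(bucket, jobConfig) {
  // With the flag set to true, intermediate objects are kept.
  if (jobConfig.saveIntermediateOutput === true) return new Map(bucket);

  // Otherwise, keep only objects that are not intermediate inputs/outputs.
  const kept = new Map();
  for (const [key, object] of bucket) {
    if (!object.intermediate) kept.set(key, object);
  }
  return kept;
}

// Keys look like file paths and are unique within the bucket (made-up examples).
const bucket = new Map([
  ["wordcount/stage-0/part-0", { intermediate: true }],
  ["wordcount/output/part-0", { intermediate: false }],
]);

const defaultKeys = [...cleanUpAfterJob(bucket, {}).keys()];
const savedKeys = [
  ...cleanUpAfterJob(bucket, { saveIntermediateOutput: true }).keys(),
];
console.log(defaultKeys); // [ 'wordcount/output/part-0' ]
console.log(savedKeys); // [ 'wordcount/stage-0/part-0', 'wordcount/output/part-0' ]
```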

Blobs

Blobs are the pieces that make up an object. Each blob is a chunk of binary data that is replicated across browsers. A blob can become unavailable from PMRFS in more than one way:

  • A user deletes the blob through the UI or with browser dev tools.
  • Every browser replicating the blob disconnects from PeerMR.

If one or more blobs of an object are unavailable, the entire object is considered unavailable because the full object cannot be reconstructed. As with torrents, an unavailable object may become available again once all of its blobs are back online.
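The availability rule can be expressed as a short check: an object is available only if every one of its blobs has at least one replica on a connected browser. The data model below (an object with a blobIds list, a replicas map from blob id to browser ids, and a set of connected browsers) is a hypothetical simplification, not the PMRFS implementation.

```javascript
// Hypothetical data model, for illustration only.
// replicas: Map of blob id -> array of browser ids holding a copy.
// connectedBrowsers: Set of browser ids currently connected to PeerMR.

// A blob is available if at least one browser holding it is connected.
function isBlobAvailable(blobId, replicas, connectedBrowsers) {
  const holders = replicas.get(blobId) ?? [];
  return holders.some((browser) => connectedBrowsers.has(browser));
}

// An object is available only if all of its blobs are available.
function isObjectAvailable(object, replicas, connectedBrowsers) {
  return object.blobIds.every((id) =>
    isBlobAvailable(id, replicas, connectedBrowsers)
  );
}

const storedObject = { key: "logs/part-0", blobIds: ["b1", "b2"] };
const replicas = new Map([
  ["b1", ["browserA", "browserB"]],
  ["b2", ["browserC"]],
]);

// Every blob has a connected replica.
const allOnline = isObjectAvailable(storedObject, replicas, new Set(["browserA", "browserC"]));
// browserC, the only holder of b2, has disconnected.
const holderOffline = isObjectAvailable(storedObject, replicas, new Set(["browserA", "browserB"]));
// browserC reconnects, so the object can be reconstructed again.
const holderBack = isObjectAvailable(storedObject, replicas, new Set(["browserB", "browserC"]));

console.log(allOnline, holderOffline, holderBack); // true false true
```

Losing a single poorly replicated blob (b2 in the example) is enough to make the whole object unavailable, which is why PMRFS is best treated as intermediate storage.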