mRpostman: An IMAP Client for R

Internet Message Access Protocol (IMAP) clients are a common feature in several programming languages. Despite having some packages for electronic message retrieval, the R language, until recently, lacked a broader solution capable of coping with different IMAP servers and providing a wide spectrum of features. mRpostman covers most of the IMAP 4rev1 functionalities by implementing tools for message searching, selective fetching of message attributes, mailbox management, attachment extraction, and several other IMAP features that can be executed on virtually any mail provider. By doing so, it enables users to perform data analysis based on e-mail content. The goal of this article is to showcase the toolkit provided with the mRpostman package, to describe its key features, and to provide some application examples.

The recognition of the R programming language [1] for its remarkable statistical capabilities is largely due to the excellence of its statistical and data analysis packages. This reputation also rests on a myriad of utility packages, which extend the use of the language by facilitating the integration of the steps involved in data collection, analysis, and communication. With that in mind, and considering the amount of data transmitted daily through e-mail, mRpostman was conceived to fill the absence of an Internet Message Access Protocol (IMAP) client in the R statistical environment, thereby providing an appropriate toolkit for electronic message retrieval and paving the way for e-mail data analysis in R.
The Comprehensive R Archive Network (CRAN) has at least seven packages for sending e-mails (Table 1). Whereas some of these packages aim to provide a plain Simple Mail Transfer Protocol (SMTP) client for R (e.g., sendmailR and emayili), others focus on more sophisticated implementations, using application programming interfaces (APIs) or providing seamless integration between SMTP and other R features such as rmarkdown [2]. However, despite the abundance of available SMTP clients in R, SMTP is not suitable for receiving e-mails. It only allows clients to communicate with servers to deliver their messages.
For message retrieval, there are the Post Office Protocol 3 (POP3) and IMAP. In comparison with IMAP, POP3 is a very limited protocol, working as a simple interface for clients to download e-mails from servers. IMAP, on the other hand, is a much more complex protocol and can be considered the evolution of POP3, with a very different and broader set of functionalities. In contrast to POP3, all messages are kept on the IMAP server and not locally. This means that a user can access the same mail account using parallel connections from different clients [3]. Besides the mail folder structure and management, the capacity to issue sophisticated search queries also contributes to the complexity of IMAP.
Among CRAN packages for e-mail communication, only gmailr and edeR have IMAP capabilities (Table 1). However, those capabilities are restricted to Gmail accounts and to a few IMAP functionalities. Although gmailr supports both protocols, the package is more SMTP-focused, which explains its low number of IMAP features. Therefore, R was clearly lacking a broader IMAP client solution. It was in this context that mRpostman was conceived. In this article, we present a brief overview of the main functionalities of the package and its applications.

Software description
mRpostman is conceived as an easy-to-use, session-based IMAP client for R. The package implements intuitive methods for executing the majority of the IMAP commands described in Request for Comments (RFC) 3501, such as mailbox management and the selective search and fetch of message attributes. The package also implements complementary functions for decoding quoted-printable and base64 content, following the MIME specification.
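To make the quoted-printable decoding mentioned above more concrete, the following is a minimal base R sketch of what such a decoder does: it drops soft line breaks and turns each "=XX" escape back into the byte it encodes. The function name decode_quoted_printable is hypothetical and this is not mRpostman's own implementation; it only illustrates the MIME rule under the assumption that the decoded bytes are UTF-8 text.

```r
# Illustrative quoted-printable decoder (hypothetical helper, not part of mRpostman).
decode_quoted_printable <- function(x) {
  # A "=" at the end of a line is a soft line break: it only wraps long lines.
  x <- gsub("=\r?\n", "", x)
  bytes <- charToRaw(x)
  n <- length(bytes)
  out <- raw(0)
  i <- 1L
  while (i <= n) {
    # An escape is "=" followed by exactly two hexadecimal digits.
    is_escape <- bytes[i] == as.raw(0x3d) && i + 2L <= n &&
      grepl("^[0-9A-Fa-f]{2}$", rawToChar(bytes[(i + 1L):(i + 2L)]),
            useBytes = TRUE)
    if (is_escape) {
      hex <- rawToChar(bytes[(i + 1L):(i + 2L)])
      out <- c(out, as.raw(strtoi(hex, 16L)))
      i <- i + 3L
    } else {
      out <- c(out, bytes[i])
      i <- i + 1L
    }
  }
  decoded <- rawToChar(out)
  Encoding(decoded) <- "UTF-8"  # assumption: the payload is UTF-8 text
  decoded
}

# "=3D" encodes the equals sign itself; "=C3=A9" is the UTF-8 encoding of "é".
print(decode_quoted_printable("a=3Db"))
print(decode_quoted_printable("Caf=C3=A9"))
```

Base64 content follows the same idea (an ASCII-safe encoding that is reversed before analysis), but it is omitted here to keep the sketch short.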
All these methods and functions play an important role in facilitating e-mail data analysis, and the number of data analyses performed daily on e-mail content should not be overlooked. The package has proved useful in this workflow by, for instance, making it possible to automate the attachment retrieval step. Also, by fetching other message contents, users can apply statistical techniques to analyse the frequency of e-mails with regard to some message aspect, run sentiment analysis on e-mail content, and so on.
Since mRpostman works as a session-based IMAP client, the provided methods can be thought of as following the natural order of the steps in an IMAP session (Fig. 1). For instance, if the goal is to search for messages within a specific period of time and/or containing a specific word, we first need to configure the connection to the IMAP server; then choose a mail folder where the search is to be performed; and finally execute either the single-criterion search (left) or the custom multi-criteria search (right). If the user intends to fetch the matched message(s) or their parts, additional fetch steps can be chained to the described schema. mRpostman is flexible in the sense that these steps can be used either under the tidy framework, with pipes [14], or via the conventional base R approach.

Software architecture
The software was designed following the object-oriented framework of the R6 package [15]. A class called ImapCon is implemented to retain and organize the necessary IMAP connection parameters. All the methods of this class serve one of two purposes: to issue a request to the IMAP server (request methods) or to reconfigure an existing IMAP connection (reset methods).
In order to execute IMAP commands, the package makes extensive use of the curl [16] R package. All of mRpostman's request methods are built on top of so-called curl handles. Under the hood, a curl handle consists of a C pointer variable that gathers the parameters necessary to execute a request to the server. The handle itself does not issue any command; it is passed as a parameter to one of curl's fetch functions. That function is what actually triggers the request to the server, ranging from mail folder selection to search queries or message fetch requests.
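The split between a handle that only stores parameters and a fetch function that triggers the request can be sketched directly with the curl package. This is an illustration under stated assumptions, not mRpostman's actual internals: the helper run_request, the environment variable names IMAP_USER and IMAP_PASSWORD, and the raw SEARCH command passed through the customrequest option are all placeholders chosen for the example.

```r
# Hypothetical helper: the fetch function is what actually contacts the server.
run_request <- function(url, handle) {
  curl::curl_fetch_memory(url, handle = handle)
}

if (requireNamespace("curl", quietly = TRUE)) {
  # Creating and configuring the handle sends nothing over the network;
  # it only gathers the parameters of a future request.
  h <- curl::new_handle()
  curl::handle_setopt(
    h,
    username = Sys.getenv("IMAP_USER"),        # placeholder variable name
    password = Sys.getenv("IMAP_PASSWORD"),    # placeholder variable name
    customrequest = "SEARCH FROM \"@ksu.edu\"",
    timeout = 10L
  )

  # Only contact a server when credentials were actually supplied.
  if (nzchar(Sys.getenv("IMAP_PASSWORD"))) {
    res <- run_request("imaps://outlook.office365.com/INBOX", h)
    str(res, max.level = 1)
  }
}
```

Because the handle is an ordinary R object, it can be reused across several calls, which is the property that a session-based client relies on.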
The object-oriented framework, combined with the use of one curl handle per session, enables mRpostman to run as a session-based IMAP client without requiring the connection to be reconfigured between commands. For example, if a mail folder is selected in the current session, all requests using the same connection token will be performed on the selected folder, unless the user selects a different one.

Software functionalities

Configuring an IMAP connection
As shown in Fig. 1, the first step for using mRpostman is to configure an IMAP connection. This consists of creating a connection token object of class ImapCon that retains all the relevant information for issuing requests to the server.
configure_imap is the function used to configure and create a new IMAP connection. The mandatory arguments are three character strings: url, username, and password for plain authentication; or url, username, and xoauth2_bearer for OAuth 2.0 authentication.
The following example illustrates how to configure a connection to a Microsoft Exchange IMAP4 server; more specifically, to an Office 365 Outlook account using plain authentication.

    library("mRpostman")

    # create the connection token; the password is requested interactively
    con <- configure_imap(url = "imaps://outlook.office365.com",
                          username = "user@agency.gov",
                          password = rstudioapi::askForPassword())

We opted for an Outlook Office 365 account as an example in order to highlight the difference between mRpostman and the other two CRAN packages which, although also capable of receiving e-mails, are restricted to Gmail accounts and to fewer IMAP functionalities. mRpostman can, in principle, connect to any mail provider, and the Outlook Office 365 service is broadly used by universities and companies. This widens the range of data analysis applications of the package, thus justifying our choice.
In a hypothetical situation where the user needs to connect simultaneously to more than one e-mail account (with the same provider or not) in the same R session, this can easily be achieved by creating and configuring multiple connection tokens, such as con1, con2, and so on.

Selecting a mail folder
Mailboxes are structured as folders in the IMAP protocol. This allows us to replicate many of the operations done on a local folder, such as creating, renaming, or deleting folders. As messages are kept inside the mail folders, users need to select one of them whenever they intend to execute a search, fetch, or other message-related operation, as presented in Fig. 1.
In this sense, the select_folder method is one of the key features of this package. It selects a mail folder for the current IMAP session. The mandatory argument is a character string containing the name of the folder to be selected.
Supposing that we want to select the "INBOX" folder and considering that we are going to use the same connection object (con) that has been previously created, the command would be:

    con$select_folder(name = "INBOX")

Further details on other important mailbox management features are provided in [18].

Message search
The IMAP protocol is designed to allow the execution of single- or multi-criteria queries on mailboxes. This package implements a wide range of IMAP search commands, which are a critical feature for performing data analysis on e-mail content.
As of version 1.0.0, mRpostman implements five types of single-criterion search methods: by date, string, flag, size, and span of time (WITHIN extension) 6. The custom search, on the other hand, enables the execution of multi-criteria queries by allowing the combination of two or more types of search. In this article, however, we focus on the single-criterion search-by-string type.
The search_string method searches for messages that contain a specific string or expression. One or more specific sections of a message, such as the TEXT section or the TO header field, must be specified.
In the following code snippet, we search for messages from senders whose mail domain is "@ksu.edu".

    ids <- con$search_string(expr = "@ksu.edu", where = "FROM")

The resulting object is a vector containing the matched unique ids (UID) or the message sequence numbers 7. Further details on the other single-search methods and the custom-search method available in this package are provided in [18].

Message fetch
After executing a search query, users may be interested in fetching the full content or some part of the messages indicated in the search results. In this regard, mRpostman implements six types of fetch features:
fetch_body: Fetches the message body (the message's full content) or a specified MIME level, which can refer to the text or to the attachments, if there are any.
fetch_header: Fetches the message header, which comprises all the components of the HEADER section of a message. Besides the traditional ones (from, to, cc, subject), it may include several more fields.
fetch_metadata: Fetches the message metadata, which consists of some of the message's attributes, such as the internal date and the envelope (from, to, cc, and subject fields).
fetch_text: Fetches the message text section, which can comprise attachment MIME levels, if applicable.

6 The WITHIN extension is not supported by all IMAP servers. A call to the list_server_capabilities method will present all the IMAP extensions supported by the mail provider [18].
7 More details on the message identification methodology deployed by the IMAP protocol are provided in [19,12,18].

Each of these methods can be seamlessly integrated into a previous search operation so that the returned ids are used as input for the fetch method.
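As a rough sketch of this chaining in the base R style, the ids returned by a search can be passed as the first argument of a fetch method. The wrapper fetch_headers_from and the environment variables IMAP_URL, IMAP_USER, and IMAP_PASSWORD are hypothetical names introduced for this illustration, and the empty-result check is a defensive assumption about what a search may return; only configure_imap, select_folder, search_string, and fetch_header come from the package as described in this article.

```r
# Hypothetical wrapper: select a folder, search by sender, fetch the headers.
fetch_headers_from <- function(con, domain, folder = "INBOX") {
  con$select_folder(folder)
  ids <- con$search_string(expr = domain, where = "FROM")
  # Defensive assumption: stop early when the search matched nothing.
  if (length(ids) == 0 || all(is.na(ids))) {
    return(list())
  }
  # The search result is used directly as the input of the fetch step.
  con$fetch_header(ids)
}

# Run against a real server only if the package and credentials are available.
if (requireNamespace("mRpostman", quietly = TRUE) &&
    nzchar(Sys.getenv("IMAP_PASSWORD"))) {
  con <- mRpostman::configure_imap(
    url = Sys.getenv("IMAP_URL"),
    username = Sys.getenv("IMAP_USER"),
    password = Sys.getenv("IMAP_PASSWORD")
  )
  headers <- fetch_headers_from(con, "@ksu.edu")
  str(headers, max.level = 1)
}
```

Keeping the three steps in one function makes the session order explicit: the folder selection must happen before the search, and the search before the fetch.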

Attachment extraction
As an IMAP client for R, mRpostman provides methods that enable users to list and download message attachments. This feature can be particularly useful for automating the analysis of attachment data files, for instance.
Attachments can be downloaded using two different approaches in this package: extending the fetch_text/fetch_body operation by adding an attachment extraction step at the end of the workflow with get_attachments; or directly fetching attachment parts via the fetch_attachments method. In this article, we focus on the first approach, adding a step to our previous workflow.
The get_attachments method extracts attachment files from the fetched messages and saves these files to disk. In the following code excerpt, we extract attachments in a single pipeline that combines the search and fetch steps.

    con$search_string(expr = "@ksu.edu", where = "FROM") %>%
      con$fetch_text() %>%
      con$get_attachments()

During the execution, the software saves the extracted attachments locally into sub-folders inside the user's working directory. These sub-folders are named after the message ids. The attachments are placed into their respective messages' sub-folders, as demonstrated in Fig. 2. Note that the parent levels are named after the supplied username and the selected mail folder.
For more information on the other attachment-related methods, the reader should refer to the documentation in [18].
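Once attachments have been saved, a natural next step is to inventory them before analysis. The sketch below assumes the directory layout described above (username, then mail folder, then one sub-folder per message) and rebuilds a mock copy of it in a temporary directory; the folder names UID1 and UID2, the file names, and the helper list_saved_attachments are made up for illustration and are not produced by the package.

```r
# Hypothetical helper: one row per saved file, keyed by its message sub-folder.
list_saved_attachments <- function(root) {
  files <- list.files(root, recursive = TRUE, full.names = FALSE)
  data.frame(
    message_folder = basename(dirname(files)),
    file = basename(files),
    stringsAsFactors = FALSE
  )
}

# Mock tree standing in for <working directory>/<username>/<mail folder>/<message id>.
root <- file.path(tempdir(), "user@agency.gov", "INBOX")
dir.create(file.path(root, "UID1"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "UID2"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "UID1", "report.csv")))
invisible(file.create(file.path(root, "UID2", "data.xlsx")))

saved <- list_saved_attachments(root)
print(saved)
```

The resulting data frame can then drive the reading step, for example by filtering on the file extension before importing each file.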

Illustrative Examples
To demonstrate the capabilities of the proposed software, we explore two use cases of this package in support of data analysis tasks: a simple study of the frequency of e-mails grouped by sender, and a sentiment analysis run on a set of e-mails received during a period. The R scripts needed for reproducing these examples are provided in the appendices. Although the results cannot be reproduced exactly, since they reflect the author's mailbox contents, the scripts can easily be adapted to the reader's context.

Frequency analysis of e-mail data
In the first example, we run a simple analysis of the e-mail frequency with regard to senders. This can be especially useful in professional fields such as marketing and customer service. A period of analysis is defined, and a search by date is performed using the search_period method. Then, the sender information for the returned ids is fetched via fetch_metadata, using the ENVELOPE attribute. After some basic manipulation with regular expressions, the data is ready to be plotted, as shown in Fig. 3.
The same kind of analysis can be replicated for the messages' subjects with only a few modifications in the regular expressions code chunks. Considering that some companies/users deal with subject-standardized e-mails, this approach can be useful to analyze the frequency of e-mails with regard to different categories of subjects.
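The regular-expression and counting steps of this example can be sketched in base R as follows. The input strings are invented "From" values that stand in for the sender information; the actual ENVELOPE output returned by the package has its own structure and would need its own parsing, so both the sample data and the helper extract_address are assumptions made for illustration.

```r
# Made-up sender fields standing in for fetched sender information.
from_fields <- c(
  "Alice Example <alice@example.org>",
  "bob@example.com",
  "\"Carol\" <carol@example.org>",
  "Alice Example <ALICE@example.org>"
)

# Hypothetical helper: keep only the first e-mail address found in each string.
# Strings without an address are dropped by regmatches().
extract_address <- function(x) {
  pattern <- "[[:alnum:]._%+-]+@[[:alnum:].-]+"
  tolower(regmatches(x, regexpr(pattern, x)))
}

senders <- extract_address(from_fields)

# Frequency of e-mails per sender, most frequent first.
freq <- sort(table(senders), decreasing = TRUE)
print(freq)

barplot(freq, las = 2, main = "E-mails per sender")
```

Replacing the pattern with one that captures a subject prefix would give the per-subject variant of the same analysis.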

Sentiment analysis on e-mail data
For the sentiment analysis example, we also define a period of analysis and run a search_period query. Then, we retrieve the text part of the messages by fetching the first MIME level with fetch_body(..., mime_level = 1L). The texts go through a first cleaning step with a call to the clean_msg_text function. After further cleaning procedures, we use a lexicon [20] via the syuzhet package [21] to evaluate the sentiment of each e-mail. The result is a data frame with one row per message. The last two columns indicate, respectively, the counts of negative and positive words for each message. The other columns provide counts related to detailed emotions, which are not necessarily positive or negative.
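A minimal sketch of the scoring step is given below, using two invented message texts in place of fetched e-mail bodies. The helper tidy_text is a hypothetical stand-in for the "further cleaning procedures" and is not the package's clean_msg_text function. The call to syuzhet::get_nrc_sentiment is an assumption about which lexicon function applies here; it is chosen because its output ends with negative and positive columns, which matches the description above.

```r
# Invented texts standing in for fetched message bodies.
texts <- c(
  "I am very happy with the excellent service. See https://example.org/ticket",
  "This delay is   terrible and I am angry."
)

# Hypothetical cleaning helper: drop URLs and collapse repeated whitespace.
tidy_text <- function(x) {
  x <- gsub("https?://[^[:space:]]+", " ", x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

cleaned <- tidy_text(texts)
print(cleaned)

# Score each message only if the syuzhet package is installed.
if (requireNamespace("syuzhet", quietly = TRUE)) {
  scores <- syuzhet::get_nrc_sentiment(cleaned)
  print(scores[, c("negative", "positive")])
}
```

Each row of the score table corresponds to one message, so it can be bound to the message ids or dates for aggregation over the period of analysis.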

Impact
As we have demonstrated, mRpostman fills an existing gap: the lack of a broad, complete, and, at the same time, easy-to-use IMAP client for the R language. The package has established itself as an important tool for collecting large volumes of e-mail content, thus contributing to data analysis tasks in R.
Although all sorts of users have been taking advantage of this package, we are inclined to think that its use has been most prevalent among companies. We have received a considerable amount of feedback from enterprise users who deploy mRpostman as an additional feature for automatically producing daily reports based on attachment data files. Besides this, there are important applications for marketing and post-sales departments, for example. They can also deploy this package to collect e-mail data for analyzing e-mail frequency or performing sentiment analysis, as we have demonstrated in Section 4.

Conclusions
mRpostman aims to provide an easy-to-use IMAP client for R. Its design allows the efficient, elegant, and intuitive execution of several IMAP commands on a wide range of mail providers. Consequently, users can not only manage their mailboxes but also conduct e-mail data analysis from inside R. Finally, because IMAP is such a complex protocol, this package is in constant development, which means that new features are to be implemented in future versions.

Conflict of Interest
No conflict of interest exists: We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Fig. 1: Basic schema for fetching the full content of a message or its parts after a search query.

Fig. 2: Local directory tree for the extracted attachment files

Fig. 3: An example of e-mail frequency analysis grouped by sender

Table 1: Comparison of the currently available CRAN packages for e-mail communication. The following attributes are evaluated: protocol: the supported protocol (SMTP or IMAP); mail providers: if the IMAP protocol is supported, which mail providers are supported by the package; features: which types of IMAP features are available in the package; active development: whether the package is currently under active development. If the package does not provide IMAP support, the remaining fields do not apply.