
Inside the Censorship Machine

Roskomnadzor plans total surveillance of the entire Russian-speaking internet using artificial intelligence. Is that possible?

ILLUSTRATION: AI-ARTIST SPELIY ARBUZ WITH HELP FROM THE MIDJOURNEY NEURAL NETWORK

“We don’t have even a free finger” 

On July 12, 2022, Alexander Fedotov, head of the science and technology center of the General Radio Frequency Center (GRFC), was preparing for the next meeting of the Expert Council on Artificial Intelligence. The GRFC is part of Russia’s main censor, Roskomnadzor. The center is responsible for monitoring the internet, preparing information sheets and reports on the “prohibited information” it finds, and blocking that information.

For many years, Roskomnadzor employees have had to search for this so-called prohibited information largely by hand: the programs available to them can only filter materials by keywords, after which a person has to double-check everything, and the number of topics the programs cover is very limited. Management disliked the inefficiency of manual searching: Roskomnadzor was always a few steps behind, unable to keep up with the pace of publication on the internet, and there simply weren’t enough people for the volume of work. When management asked staff to come up with creative ways to improve the work, one employee replied in an email: “We love being creative, but right now we don’t just lack free hands; we don’t have even a free finger.”

That is why the GRFC was tasked with developing several automated systems that would constantly monitor social networks, media, messenger channels, image boards, and other sources of information. This was the subject of the meeting for which Fedotov was preparing: he had to draft opening remarks for himself and for his supervisor, Ruslan Nesterenko, interim CEO of the GRFC.

In his introduction, Nesterenko said that a year earlier the GRFC had already carried out research toward programs based on machine learning and neural networks. The goal was to give Roskomnadzor tools for global surveillance not only of individual oppositionists, activists, volunteers, and independent journalists deemed objectionable by the state, but of almost any Russian who dares to speak out on social networks.

IStories journalists discovered the content of Nesterenko’s speech and the internal correspondence of his employees thanks to the largest ever leak of internal documents from Roskomnadzor. IStories received exclusive access to over two million documents, images, and internal emails. The project is called #RussianCensorFiles.

Here’s what we found:

  • which automated systems Roskomnadzor is developing for total internet surveillance, and whether they can actually be implemented;
  • which topics these systems will track, and how;
  • what technologies Roskomnadzor already has on hand.

Boar goes hunting

Having received the floor after Ruslan Nesterenko, Alexander Fedotov, the head of projects for the development of automated systems, emphasized that “we need to fight not only current problems, but also predict what we’ll face in a few years.” For a year, this task has been pursued through a project that GRFC employees call “an automated system for the comprehensive analysis of media materials and the search for points of information tension on the global internet, ‘Vepr’,” or AS Vepr for short (“vepr” is Russian for “wild boar”).

Vepr’s main duties are to analyze material from social networks and the mass media; to identify, based on that analysis, the so-called points of information tension (by which the study’s authors mean the spread of publications that can provoke a public reaction); to build a forecast model of social and political dynamics; and to predict how information will spread and undergo “the conversion [of information] into an information threat,” so that the data can then be handed over to the “power structures.”

The GRFC commissioned a team of experts, researchers, and engineers from the Moscow Institute of Physics and Technology (MIPT), led by Konstantin Vorontsov, head of the Department of Machine Learning and Digital Humanities, to explore the possibility of creating such a system. According to a report prepared by Vorontsov’s team, before starting work they studied existing methods of internet censorship. The researchers were most interested in China’s experience, because “to date, China’s internet censorship program can be considered the most complex in the world. In this regard, the country has even begun to export its technology to other countries such as Cuba, Zimbabwe and Belarus.” Now Russian developers are trying to create an equally complex system for total surveillance and censorship of the internet in Russia.

As planned by Roskomnadzor, Vepr should first of all focus on:

  • protest moods and facts regarding the destabilization of Russian society (for example, on the topics of territorial integrity, ethnic hatred, migration policy, etc.);
  • negative attitude towards leading state figures, state structures and interstate organizations;
  • “fakes” about leading state figures, as well as about the state and the country as a whole;
  • manipulation of public opinion and polarization of society (for example, topics on the non-systemic opposition, sanctions pressure, etc.);
  • the undermining and discrediting of “traditional values”.

According to Vepr’s technical documents, the focus on precisely these areas stems from “the task of overtaking the information initiative. [...] The experience of the mid-1980s in the USSR (so-called perestroika) showed that ‘sleeping’ points of information tension tend to grow rapidly if they are activated and deliberately promoted. To respond to threats, complete information on each point of information tension is needed, in order to ensure rapid decision-making processes.”

To work on these topics, Vepr needs to know whom it’s protecting (for example, Vladimir Putin), who’s violating the prohibitions (for example, independent investigative journalists), and what specific threat the violators pose (for example, reporting socially significant information about the president that he’s trying to cover up). As stated in the technical documents, “when developing, the threat and violator model is subject to agreement with the FSB [Federal Security Service] and FSTEC [Federal Service for Technical and Export Control].”

Having received data on who is a friend (or enemy) of the regime and what agenda to follow, Vepr should produce a forecast of how journalists and social media users may react. To do this, Roskomnadzor wants Vepr to build “a complete picture of the involvement of society with the social characteristics of individuals,” along with psychological portraits, compiled from social networks, of those who distribute information. “If the source is the media, its funding needs to be checked for compliance with the activities of a foreign agent. It’s important to note that the main work on preventive counteraction should be carried out with the [distributor] of information, and not its consumers. It’s necessary to deal with the source of information tension,” Vepr’s technical documents say.

To receive the necessary flow of information for analysis, GRFC employees plan to create a bot farm: a mass of fake accounts through which they can gain access to closed communities on social networks.

“A sense of being part of something big”

According to the plan, Vepr should be operational by the end of 2024. However, due to the war in Ukraine, there may be delays. In one email, Denis Kasimov, head of the digital transformation department, wrote that it’s difficult to predict exact deadlines due to “sanctions pressure in the current economic environment.” According to Kasimov, there aren’t enough specialists for the task. “Experts who can perform these works are currently involved in fulfilling especially important requests from government agencies of the Russian Federation in the context of the ongoing special operation of the Russian Armed Forces in Ukraine,” the email says.


It was difficult to attract good IT specialists to cooperate with Roskomnadzor even before the war. In October 2020, Igor Ivanov, an employee of the GRFC, asked his colleague Ivan Zuev to “try to make contacts on a friendly, altruistic basis” with several experts in neural networks. To this message, Zuev replied that they “most likely will be told to fuck off,” because “there was no money allocated, we definitely don’t have technology solutions that are interesting for them, the image of Roskomnadzor among IT people plays against their interest in us.”

Zuev's deputy, Alexander Mitkin, offered his own suggestions on how to lure experts into censorship projects. For example, to promise them “participation in projects of a national scale, including behind the scenes ones — a ‘shared secret’ with a sense of being part of something big,” and “their name in reports for the head of Roskomnadzor, and higher,” “lobbying them for projects in Roskomnadzor and other enterprises we work with (E.Soft, Rostelecom, etc.),” “a chance to meet the ‘right’ people of power,” “roundtable invitations,” and most importantly — “our friendship.”

“MIR”: even more control

Vepr is only part of a complex censorship machine that Roskomnadzor is implementing.

In general terms, its architecture will look like this: a general crawler [a program that automatically collects information on the internet] gathers texts, audio, images, and videos from social networks, media, and search results, and these files then go to the Unified Analysis Module (UAM). Using neural networks, the UAM should, first, identify prohibited information and, second, produce forecasts and analytics (Vepr’s responsibility).
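The leak describes this pipeline only in diagrams and planning documents. Purely as an illustration of the architecture those documents sketch (a crawler feeding a central analysis module), here is a minimal mock-up in Python; every name, type, and rule in it is our hypothetical stand-in, not Roskomnadzor’s code.

```python
# Hypothetical sketch of the described crawler -> UAM pipeline.
# Nothing here is from the leaked documents; names and logic are illustrative.
from dataclasses import dataclass

@dataclass
class Item:
    source: str      # e.g. "social_network", "media", "search_results"
    media_type: str  # "text", "audio", "image", "video"
    payload: str     # the collected content (text here, for simplicity)

def crawl(sources: list[str]) -> list[Item]:
    """Stand-in for the 'general crawler' that collects material."""
    # A real crawler would fetch posts, articles, and search results here.
    return [Item(source=s, media_type="text", payload="...") for s in sources]

def classify_prohibited(text: str) -> bool:
    return False  # placeholder for a trained classifier

def estimate_tension(text: str) -> float:
    return 0.0    # placeholder for a forecasting model (Vepr's role)

def unified_analysis_module(item: Item) -> dict:
    """Stand-in for the UAM: (1) flag 'prohibited' content, (2) emit analytics."""
    return {
        "flagged": classify_prohibited(item.payload),
        "tension_score": estimate_tension(item.payload),
    }

for item in crawl(["social_network", "media", "search_results"]):
    print(unified_analysis_module(item))
```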

AS MIR, using neural networks, should search texts for information prohibited by the authorities; Vepr should predict “points of information tension” and threats of protests
SCREENSHOT FROM INTERNAL GRFC PRESENTATION

The information system for monitoring internet resources (MIR), based on natural language processing (NLP) technologies, should find prohibited information in texts. According to the developers’ plan, the system should be able to:

  • identify the names of people, places, and organizations, and the tone (negative, positive, or neutral) in which they are mentioned;
  • sort messages by story, topic, and heading;
  • look for mirrors of blocked sites and reprints of content;
  • track the spread of content from its original source;
  • predict the spread of content and its traffic;
  • identify instances of “opinion manipulation” and “stimulation of opinion polarization”;
  • predict the socio-demographic characteristics of a publication’s audience: its distribution by gender, age, education, and income level.
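The first item on this list, pulling out named entities and the tone around them, is a standard NLP task. A minimal sketch using the open-source spaCy library shows the idea; the toy tone lexicon and the example sentence are ours, and nothing here reflects the GRFC’s actual toolchain.

```python
# Minimal sketch: find named entities and attach a crude tone label.
# Uses the open-source spaCy library; NOT the GRFC's actual stack.
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

NEGATIVE = {"corrupt", "failed", "disastrous"}   # toy lexicon, for illustration
POSITIVE = {"successful", "heroic", "brilliant"}

def entities_with_tone(text: str) -> list[tuple[str, str, str]]:
    doc = nlp(text)
    words = {token.text.lower() for token in doc}
    if words & NEGATIVE:
        tone = "negative"
    elif words & POSITIVE:
        tone = "positive"
    else:
        tone = "neutral"
    # Keep people (PERSON), places (GPE), and organizations (ORG).
    return [
        (ent.text, ent.label_, tone)
        for ent in doc.ents
        if ent.label_ in ("PERSON", "GPE", "ORG")
    ]

print(entities_with_tone("The corrupt governor of Ryazan met with Gazprom."))
```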

It was planned that by 2023 the neural networks would be able to find texts containing “calls for the violent overthrow of power,” “insults to the president,” “fakes about the president and the state,” and “propaganda of non-traditional sexual relations.”

In the summer, developers began training the neural networks to search for opposition content. Specialists from the monitoring department labeled materials (for example, ones containing calls for “riots”) so that the neural network could later find such messages on its own.
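This markup is ordinary supervised learning: humans label examples, and a model generalizes from them. A generic sketch of the approach, with invented toy data rather than anything from the leak, might look like this:

```python
# Generic supervised text classification: human-labeled examples -> model.
# The toy texts and labels are invented for illustration; not from the leak.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "everyone come to the square tonight",  # labeled as a call to protest
    "great recipe for borscht",             # labeled as benign
    "we must take to the streets",          # call to protest
    "match review: 2-1 in extra time",      # benign
]
labels = [1, 0, 1, 0]  # 1 = flagged category, 0 = benign

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["join us on the square at noon"]))  # likely [1]
```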

However, nothing in the leaked materials shows that the Unified Analysis Module’s neural networks can already find those types of “violations.” It is only mentioned that the UAM finds prohibited information about drugs, suicide, child pornography, ISIS, and “Right Sector” in Yandex search results.

| Type of violation | Type of information | Accuracy of UAM (January 2022) | Accuracy of linguistic dictionaries on social networks (January 2022) | Expected accuracy of the combined system (December 2022) |
| --- | --- | --- | --- | --- |
| Narcotics | Text | 72% | 78% | |
| Suicide content | Text | 60% | 50% | |
| Child pornography | Text | 79% | 34% | 65%+ |
| ISIL | Text | 14% | 27% | |
| Right Sector | Text | 20% | 30% | |
| Hizb ut-Tahrir | Text | Analysis underway (as of 16.02) | 43% | |
The percentage of violations found automatically by the Unified Analysis Module (neural network) and by dictionaries (the traditional method) and then confirmed by a human. From an internal GRFC presentation.

So far, none of the other functions mentioned in MIR’s documentation (searching for mirror sites, tracking how information spreads, identifying examples of “opinion manipulation,” and other grandiose plans) have been implemented.

“Oculus”: recognizing photos of anti-government demonstrations, memes with Putin, and men wearing makeup

Calls for demonstrations, insults to the president, and other content dangerous to the authorities that takes the form of pictures and photos is currently monitored manually. To fix this, Roskomnadzor plans to add image and video recognition to the Unified Analysis Module: finding violations, extracting metadata (time and place of publication, author), and identifying people in photos and videos. The Oculus system is responsible for this; its development is supervised by Konstantin Zudov, head of the experimental work department of the scientific and technical center.

The research describing the capabilities of artificial intelligence for censoring images and videos was carried out by employees of MIPT’s laboratory of AI-based business solutions, led by Dmitry Velichkin.

The system must analyze 200,000 images per day, which works out to more than two images per second around the clock. In 2022–2024, 445 million rubles are planned to be spent on the development of Oculus.

In August 2022, the department commissioned the development of the system from the Russian company Eksikyushn RDC for 58 million rubles. Experts said at the time that it was impossible to implement a system of such complexity in so short a time (by December 2022), or at that cost.

The internal annex to the terms of reference for Oculus specifies which violations it should find in pictures and videos on the internet. In addition to information about terrorism, drugs, and suicide methods, the system should detect calls for demonstrations (and approval of them), “justification of, and calls for, the violent overthrow of power,” as well as insults to the president (“photoshops, demotivators, cartoons, caricatures, sexual insinuations”), obscene vocabulary directed at him, and “comparing the president to negative characters and condemning activities (e.g. Hitler, werewolf, dictator, racist, traitor).”

The document notes that the paragraphs on “justifying and calling for the violent overthrow of power,” on insulting the president, and on accusing him of extremism were all added on February 17, 2022, a week before the start of the full-scale Russian invasion of Ukraine.

Also on the list of violations is “demonstration of the attractiveness of the image of representatives of the LGBT culture” and “images of persons that don’t correspond to the traditional image of a man and a woman (for example, masculine female faces, men wearing make-up).”

In internal presentations dedicated to Oculus, recognition of protest activity is indicated as the main goal.

The goal of creating the Oculus image recognition system is to find protests in photos and videos and identify their participants
INTERNAL GRFC PRESENTATION (FEBRUARY 2022)

In September 2022, an employee of the monitoring department sent a folder called “Materials on Oculus” to a colleague. It contains examples of photoshopped images of Putin and notes the need to track pictures not only of him but of all members of the government. The folder also holds a dictionary that will be used to automatically recognize, for example, accusations of extremism against the president and support for the overthrow of the authorities.

Dictionary entry on the topic “accusing the president of extremism”
GRFC INTERNAL DOCUMENTS

The leak contains no information about Oculus being launched. Judging by GRFC employee correspondence, in the summer of 2022 employees were actively labeling data sets for training the Oculus neural network, even during holidays.

In February 2022, Alexander Fedotov, head of the scientific and technical center, and Roman Korostashov, head of the analysis department, demonstrated a mock-up of Oculus. According to their statements, the system recognized, for example, wrist cuts, prohibited symbols, and train surfing [riding on the outside of a train, clinging to the cars via stairs, footboards, etc.], and identified a masked person. They didn’t show any results related to identifying protest activity.

According to the GRFC’s plans, by 2024 Oculus must learn to classify actions not only in photos but also in videos. Again, it should recognize protests, as well as seriously life-threatening actions such as self-harm (cuts, strangulation), train surfing, school shootings, and fights. The leaked documents make no mention of advances in video recognition.

Roskomnadzor’s plans also include “recognition of complex multimodal media materials” (posters, comics, and memes), since they may contain prohibited information “both directly and indirectly.” The authors admit, however, that this is difficult, since “automated monitoring using AI [artificial intelligence] requires a contextual understanding of internet culture: recent events, political views and cultural beliefs, since memes often refer to other memes or other online events.” The GRFC plans to complete its research on finding violations in memes in 2024.

“100 violation cards per day minimum.” How existing monitoring systems work

Today, GRFC employees monitor social networks, media, and websites daily, both manually and with the help of software. Some are responsible for the media, others for social networks and websites.

For the mass media, an automatic system for monitoring means of mass communications (AS MSMK) is used. The list of monitored media comes from Roskomnadzor.

The leaked documents show that AS MSMK finds potential violations via keywords on various topics (suicide, extremism, calls for protests, “fakes” about the war in Ukraine, “foreign agents,” etc.). Every day the system compiles an array of cards with alleged violations. An operator reviews the content and comments and decides whether they contain violations. If so, the operator registers them; if not, he rejects the card. Cards with violations confirmed by the operator automatically go first to the GRFC’s examination department and then to Roskomnadzor.
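Functionally, this is a keyword filter feeding a human review queue. Here is a schematic sketch of the workflow as the documents describe it; the keyword list, data structures, and URL are invented for illustration.

```python
# Schematic of the described card workflow: keyword match -> card -> human review.
# Keyword lists, field names, and the example URL are invented for illustration.
from dataclasses import dataclass
from typing import Optional

KEYWORDS = {"protest", "rally"}  # hypothetical dictionary for one topic

@dataclass
class Card:
    url: str
    snippet: str
    topic: str
    confirmed: Optional[bool] = None  # None until an operator reviews it

def scan(url: str, text: str) -> Optional[Card]:
    """Create a card if any dictionary keyword appears in the text."""
    if any(word in text.lower() for word in KEYWORDS):
        return Card(url=url, snippet=text[:80], topic="demonstrations")
    return None

def operator_review(card: Card, is_violation: bool) -> Card:
    """A human either confirms (registers) or rejects the card."""
    card.confirmed = is_violation
    return card

card = scan("https://example.org/post/1", "They announced a rally downtown.")
if card:
    operator_review(card, is_violation=True)  # registered cards go up the chain
    print(card)
```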

Dictionary analysis is inaccurate and “requires high labor costs,” because operators have to manually cross-check a great deal of material, the GRFC acknowledges. The reports that new employees file at the end of their probationary period give a sense of the workload: one information analysis specialist reported in July 2022 that she had drawn up at least 100 cards of suspected violations per day, manually entered at least 40, and also managed to monitor the internet “to identify banned anime films.”

Since 2022, the system automatically receives not only text content, but also transcriptions of radio and television broadcasts.

For social network surveillance, an automated system for monitoring and analyzing social media (AS MASM) is used. Since 2022, it’s been merged with the Chisty [clean] Internet (AS CI) system, which censors Yandex search results.

As with the media, some violations on social networks are searched for manually, while others are found automatically and then verified by a human. For example, MASM automatically searches for materials related to “fake news” about the war in Ukraine and anti-war demonstrations.

Violations are automatically monitored only on the social networks VKontakte, Odnoklassniki, Moi Mir, Otvety.Mail.ru, LiveJournal, and YouTube. Other platforms (Instagram, Facebook, Twitter, TikTok, Telegram, Rutube) are monitored manually by GRFC staff; automation there is so far only planned.

For that purpose, starting in June 2022, Roskomnadzor planned to conclude a contract with the company Kribrum, owned by Natalia Kasperskaya and Igor Ashmanov, who cooperate with the Russian authorities and support both censorship and the war in Ukraine.

Ultimately, all the projects Roskomnadzor is developing for analyzing media materials, social networks, and search results are planned to be merged into a single system, centered on the AI-based Unified Analysis Module.

Planned operation scheme of the monitoring systems
SCREENSHOT FROM INTERNAL GRFC DOCUMENTS

The leaked documents show that Roskomnadzor’s plans for total, AI-driven censorship of the internet are still very far from realization. But it’s obvious that as new functions and systems are introduced, the scale of surveillance of those who dare to speak out against Putin’s regime will grow.

“A great excuse to steal from the budget”

The GRFC holds expert conferences on artificial intelligence several times a year. Representatives of the industry, scientists and officials gather and make presentations. We spoke with one of the council participants, an expert in the field of machine learning, on condition of anonymity.

He said that these conferences can be considered “somewhat educational for an internal audience”: industry experts give reports on technologies, government representatives “talk about how cool they are, making use of the most fashionable words of the season [like ‘artificial intelligence,’ ‘neural networks,’ ‘computer vision’ and others], and the management becomes inspired and allocates the budget.”

According to our source, the GRFC’s dream of introducing total censorship based on artificial intelligence is theoretically feasible but unreasonably expensive. “To do this, you need to build several teams: data collection and labeling, monitoring teams, engineering teams, managers, and many others. And of course, to provide it with its own data center with the latest video cards (expensive). It’s a great excuse to steal from the budget. This approach can’t compete with the alternative: hiring several hundred moderators to manually monitor social networks for peanuts.”

For example, even the task of finding offensive pictures of Putin would require a lot of resources. “It’s easy to develop a simple classifier inside VKontakte that determines that the president is in the picture and that the picture has a meme context (captions and so on), even with VKontakte’s internal tools,” the expert continues. “But in order for this to work constantly at the level of an entire social network, a significant part of the VKontakte team needs to be diverted to this task. And making this a solid technology that works across a large list of social networks, messenger apps, and websites is, rather, a reason to get an even larger budget. A budget that will be spent on who knows what.”
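In principle, the kind of “simple classifier” the expert describes could be assembled from off-the-shelf parts: face matching plus detection of overlaid caption text. Here is a rough sketch using the open-source face_recognition and pytesseract libraries; the file names are hypothetical, and this is our illustration of the general technique, not anyone’s production code.

```python
# Rough sketch of the expert's "simple classifier": does a known face appear,
# and does the image carry caption-like text (a crude "meme context" signal)?
# Built from open-source parts as an illustration; file names are hypothetical.
import face_recognition  # pip install face_recognition
import pytesseract       # pip install pytesseract (requires Tesseract installed)
from PIL import Image

def load_reference_encoding(path: str):
    """Encode the reference face once (e.g. from an official portrait)."""
    image = face_recognition.load_image_file(path)
    encodings = face_recognition.face_encodings(image)
    return encodings[0] if encodings else None

def looks_like_target_meme(image_path: str, reference_encoding) -> bool:
    image = face_recognition.load_image_file(image_path)
    faces = face_recognition.face_encodings(image)
    face_match = any(
        face_recognition.compare_faces([reference_encoding], face)[0]
        for face in faces
    )
    caption = pytesseract.image_to_string(Image.open(image_path)).strip()
    # Flag only when the target face appears AND there is caption-like text.
    return face_match and len(caption) > 10

ref = load_reference_encoding("reference_portrait.jpg")      # hypothetical file
if ref is not None:
    print(looks_like_target_meme("candidate_post.jpg", ref))  # hypothetical file
```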

Our source treats the Vepr project, which is supposed to predict future “information threats” and protest moods, with particular skepticism: “I wouldn’t worry too much that such a system will be implemented. Our industry has low-hanging fruit like online ad optimization. Multi-million dollar profits are promised to anyone who can even slightly optimize a mundane task like that. And here they want a forecast of social and political issues based on posts on social networks. It seems flipping a coin would be more truthful than the predictions of a system like that.”

Roskomnadzor, GRFC and Brand Analytics didn’t respond to a request from IStories and Süddeutsche Zeitung for comment on the leaked materials.

You can find out more about this leak in other IStories publications:

  • Who is in Roskomnadzor’s sights and why: potential “foreign agents” and opinion leaders, the media, IT giants, messenger apps, and people close to power.
  • How Roskomnadzor monitors negative publications about the Russian president and other topics dangerous to the authorities in order to send reports to the security forces.