Multimodal Architectures for Bias Classification in News
DSM150 Final Coursework
Author: Johannes Van Cauwenberghe
Affiliation: University of London
Published: February 22, 2025
Abstract
Deep learning is uniquely suited to multimodal classification, enabling the integration of distinct data types and the construction of robust representations of key characteristics from multiple modalities. While deep learning techniques for text analysis have advanced significantly, political bias classification in news remains predominantly text-based. Yet, as news is increasingly read online, readers increasingly select content by scrolling through images and titles; decisions on what to read are largely based on the combination of an image and a title. Meanwhile, news aggregators struggle with filter bubbles and over-personalisation, limiting exposure to an increasingly narrow political spectrum. We therefore propose a multimodal bias classification algorithm that learns joint representations of the image and text data. Extensive experiments on text-only and image-only branches inform the development of a unified multimodal algorithm. This model achieves an accuracy of 82% and an AUC of 90%, outperforming our single-mode models and showing superior performance overall.
1 Introduction
Central to this project is the objective to apply deep learning techniques to political bias classification. Like Coursework 1, this work follows the universal workflow of machine learning as outlined in the excellent textbook Deep Learning with Python by François Chollet, the creator of the popular Keras library.
Whilst this methodical approach to deep learning will feature front and center, the project also has a more substantive goal. The goal is to build the best possible classifier for political bias classification in news. Specifically, we will combine text and image features to train a dual-input neural classifier that distinguishes left-wing from right-wing political news content. As social media platforms and news aggregators struggle with ideological echo chambers (Helberger 2019), a systematic method for identifying political bias can improve transparency in journalism and enable more informed news consumption.
A news article’s textual content and its visual elements (like accompanying images) often carry signals of ideological slant or political bias. By political bias, we refer to differences in language use, framing, and the relative emphasis on specific ideological perspectives. The same topic might be framed differently in text and illustrated with divergent imagery by outlets of opposite leanings: a left-leaning source may highlight sympathetic visuals (e.g. migrants’ struggles) while a right-leaning source might choose more fear-inducing imagery (e.g. depicting crime). Several websites provide ratings and invite readers to read the news from multiple perspectives. Examples include: ground.news, Media Fact Check, and AllSides. Over the following pages, we will use the term political bias to describe these ideological leanings.
We will use a self-compiled dataset consisting of three parts:
Labels scraped from AllSides, an organisation that exposes polarisation and partisanship in news content by providing ratings and labels of political bias for all major publications.
20,000 articles fetched from the NewsCatcher API.
Images associated with each article, downloaded and converted.
Our experimental procedure consists of three steps. First, we identify a series of candidate model architectures. Next, we implement them. Finally, we refine and optimise them. This takes an empirical approach to model architectures. Chollet (2021) argues that model architecture defines the hypothesis space of the model as “the space of possible functions that gradient descent can search over, parameterized by the model’s weights” (Chollet 2021).
In this empirical analysis, we will attempt many of the techniques found in (Chollet 2018, chap. 5 and 6) and its later extensions (Chollet 2021, 2024). We aim to develop a unified model architecture that integrates the best-performing text and image models (see a diagram in Figure 8). However, to maintain the report’s coherence, we do not present an exhaustive account of every experiment conducted. Instead, we document key findings in the discussion sections of each branch and we focus on systematically refining architectural configurations and hyperparameters.
The structure of the report is as follows. We briefly discuss related work and the experiment design before moving on to an overview of the dataset with summary statistics in Section 4. Next, in Section 5, we establish a common-sense baseline and build a basic model that beats that baseline. We converge for each type on three architectural configurations with optimised model parameters. We apply this methodical approach for both of the branches individually, i.e. the text models in Section 5.1 and the image models in Section 5.2, as well as for the combined multimodal models in Section 5.3. There we will perform a hyperparameter search to converge on our best model. Finally, in Section 6 we evaluate our performance on the test set, and summarise the key contributions of the work, the challenges it faced, and further directions.
2 Related work
Previous research has explored the classification of political bias through textual data. For example, Hajare et al. (2021) used a dataset of American congressional speeches to predict ideological bias on social media platforms such as Twitter and Gab, achieving an accuracy of 70% with text-only classification. They significantly improved accuracy to 85% by incorporating network analysis through a method known as cascading. Similarly, our earlier work on a preliminary version of the dataset used here achieved 76% accuracy using logistic regression combined with a sophisticated feature selection technique called Odds-ratio adjusted Informative Dirichlet Prior (Van Cauwenberghe 2025).
Thomas and Kovashka (2019) developed an image classifier for political bias, showed that algorithms become more accurate at recognising visual bias when helped by complementary text information, and compared this performance with human annotators. In N24News, Wang et al. (2022) proposed a topic-based multimodal news classifier. They found that the image task performed significantly worse (52.80% F1) than the text classification. The authors note that conventional image classification tasks focus on identifying static objects such as cats or dogs. By contrast, news imagery typically represents events loaded with symbolic and contextual meaning (Wang et al. 2022). News images convey semiotic complexity, i.e. symbols, metaphors, and context-specific meaning, that standard image models fail to interpret effectively.
Given this prior research, we anticipate that our image model will have limited success in classifying political bias. Throughout the project, we will present visual examples and analyse the patterns our model learns (see Figure 4 and Appendix 8.4 and Appendix 8.5).
3 Experiment design
We train all models on the training set and evaluate on the test set. Batch sizes of 512 (text) and 64 (image/multimodal) are selected based on empirical tuning of GPU load. While subject to hyperparameter tuning, models are largely trained with a learning rate of 1e-4 and the Adam optimiser (Kingma and Ba 2017). In our code, we set an early-stopping condition that triggers when the validation loss stops decreasing, and we save the best model to disk. Each input image is resized to 180 \(\times\) 180 and the maximum length of each input text is set to 45 tokens. We train on an NVIDIA RTX GPU with 12 GB VRAM. Each model is trained on the training set, and we keep the model that performs best on the validation set. The validation set is not seen by the trained models other than to guide model selection. The test set is held back until evaluation, where each model is evaluated on accuracy and AUC. The best model is further evaluated on recall, precision, F1, and a confusion matrix showing the detailed false positive and false negative rates.
4 Dataset Overview and Preprocessing
The dataset contains the following key attributes:
index: Unique identifier for each article
title: The title of the piece
snippet: An excerpt of the article
summary: The main body of the article
country: The country of the publication
bias: A categorical string ‘left’ or ‘right’
numerical_rating: A continuous variable on a range from -5 to +5 indicating the bias (left to right)
source_api: The publication name
media: A hyperlink to the image associated with the content
4.1 Imports and Loading
In this section, we import the libraries and load the dataset from the data folder. The datasets are read with Pandas, allowing easy exploratory analysis of the data.
Imports
# Turn off (some) warnings
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# Import numerical packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from graphviz import Source
from sklearn.metrics import (classification_report, RocCurveDisplay,
                             ConfusionMatrixDisplay, roc_auc_score, accuracy_score)

# Standard library
from tqdm.notebook import tqdm
from typing import Literal
import json, re, shutil
from pathlib import Path

# Neural network libraries
import keras
print("Using keras version:", keras.version())  # 3.8.0
from keras.api import layers, Model, saving, Input
from keras.api.utils import (load_img, img_to_array, array_to_img,
                             image_dataset_from_directory, plot_model, model_to_dot)
import keras_tuner
import tensorflow as tf
print("Using tensorflow version:", tf.__version__)

# Global settings
pd.options.display.max_colwidth = 100
tqdm.pandas()
keras.utils.set_random_seed(153)
sns.set_theme('notebook', 'darkgrid')

# Path settings
data = Path(".data/")
imgs = data/"imgs"
articles = data/'articles'
models = data/'models'

# For better monitoring of VRAM memory use
physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:
    print("Running on", physical_devices[0].device_type)
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
Using keras version: 3.8.0
Using tensorflow version: 2.17.0
Running on GPU
Load the dataset
# Load the dataset
articles_df = pd.read_pickle(articles/'up_to_feb24.pkl')

# Inspect the key columns
articles_df[['title', 'snippet', 'bias', 'numerical_rating', 'source_api']].sample(5)
| | title | snippet | bias | numerical_rating | source_api |
|---|---|---|---|---|---|
| 8816 | The fight is on: Progressive groups gear up for a second Trump term | Donald Trump's presidential win still has the world reeling and processing the discouraging re... | left | -4.00 | dailykos.com |
| 6117 | Pelosi undergoes ‘successful' hip replacement surgery in Luxembourg after injury | The 84-year-old former House Speaker 'is well on the mend,' said spokesperson Ian Krager. | left | -1.20 | politico.com |
| 15814 | STEPHEN MOORE: Will Blacks And Hispanics Vote Their Pocketbooks? Trump Should Hope So | Trump's policies were far better for blacks and Hispanics than those of America's first black pr... | right | 3.80 | dailycaller.com |
| 20254 | Facing pressure at home, GOP lawmakers warn Johnson against ‘hatchet' spending cuts | On the eve of their first major vote to advance President Donald Trump's agenda, key House Repub... | left | -1.30 | cnn.com |
| 6186 | Rahm Emanuel: An alliance to counter's China's aggression | U.S. Amb. to Japan, Rahm Emanuel, joins Morning Joe to discuss his latest WSJ column 'An Allianc... | left | -3.71 | msnbc.com |
# Inspect the `media` urls
articles_df.media.sample(3).tolist()
The dataset contains 19,654 articles fetched from the NewsCatcher API, extended with images for this project.1
The API call specified:
A topic: politics.
A list of sources:
The sources are based on labels obtained from AllSides.com2. These include the categorical strings “left” and “right”, as well as a continuous numerical rating. These features were joined onto the articles data. To enable the binary classification objective, “centrist” sources were omitted. Furthermore, preference was given to sources with many records on AllSides.com, as this was taken as an indication of discriminativeness, and therefore usability.
The following table summarises the core attributes of the dataset:
Key Figures for the News Dataset
| Attribute | Value |
|---|---|
| Total Articles | 19,654 |
| Right-labelled articles | 9,358 |
| Left-labelled articles | 10,296 |
| Number of news sources | 64 |
| Rating range | -5 to +5 |
| Average numerical rating | 0.28 (center right) |
| Date range | 2/10/2024 - 14/2/2025 |
| Country Distribution | 77% US, 18% UK, 5% other |
4.2.1 Summary of the Sources
Originally scraped from AllSides, the numerical_rating represents political bias. Below, we show a quarter of the labelled sources. See Appendix 8.1 for an exhaustive overview.
fig, ax = plt.subplots()
source_viz = bias_viz.set_index('source')
source_viz.index.name = 'sources'
source_viz = source_viz.join(top_sources)
color = ['r' if num_rat < 0 else 'b' for num_rat in source_viz.numerical_rating]
ax.scatter(x=source_viz.numerical_rating, y=source_viz.source,
           s=source_viz.source * 0.05, c=color)
for x, y, s in zip(source_viz.numerical_rating, source_viz.source, source_viz.index):
    if y > 450 and 'dreams' not in s:
        ax.text(x - 1, y + y * 0.05, s)
ax.set_ylim(-100, 3000)
ax.set_xlim(-6, 6)
ax.set_xlabel('Political Bias')
plt.title("Two right-labelled publications feature heavily", y=1.02)
plt.show()
Figure 2: Two right-labelled publications feature heavily in the dataset.
ax = sns.histplot(data=articles_df[['source_api', 'numerical_rating']],
                  x='numerical_rating', hue=articles_df.numerical_rating < 0,
                  kde=True, legend='', bins=20)
ax.set_xlim(-6, 6)
ax.set_xlabel('Political Bias')
ax.set_ylabel('')
ax.set_title('Distribution of the labels')
plt.show()
Figure 3: The distribution of continuous labels has a clear gap in the centre
# Binarise the labels
articles_df['binary_bias'] = articles_df.bias.map(lambda bias: 0 if bias == 'left' else 1)
4.2.2 Summary of article lengths
Next, we consider the lengths to use for padding / truncating sequences. Shorter sequences speed up training, while longer sequences give the model a richer input representation. We propose to balance these trade-offs by appending the snippets to the titles and drawing a line at 45 words (a sketch of this concatenation follows the histograms of the separate lengths below). Concatenating these text features gives the model consistent lengths to work with.
fig, ax = plt.subplots(1, 2, figsize=(8, 4))
articles_df.title.str.split().apply(len).plot.hist(bins=100, ax=ax[0], xlim=(0, 30))
ax = articles_df.snippet.dropna().str.split().apply(len).plot.hist(bins=50, ax=ax[1], xlim=(0, 80))
ax.set_ylabel('')
plt.suptitle('Respective number of words in title and snippet')
plt.show()
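The combined title_snippet column used below is created outside this excerpt; a minimal sketch of how it might be built (the separator and fillna handling are assumptions, only the column name is taken from the rest of the notebook):

# Hypothetical reconstruction: combine title and snippet into one text field.
articles_df['title_snippet'] = (
    articles_df.title.fillna('').str.strip()
    + '. '
    + articles_df.snippet.fillna('').str.strip()
)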
ax = articles_df.title_snippet.str.split().apply(len).plot.hist(
    bins=100, xlim=(0, 75), title='Number of words in combined string', figsize=(6, 4))
ax.vlines(45, 0, 3000, 'r', '--')
ax.text(45, 2500, '<- Pad / truncate to here')
plt.tight_layout()
plt.show()
4.3 Preparing the Datasets
In this section we compute train, validation, and test indices, pass them into tf.data.Dataset objects, and batch the data. Next, we create a series of convenience functions so that model-building iterations can focus solely on the model graph itself.
We start with what is arguably the most important step in any machine learning project: setting aside a portion of the dataset for testing.
4.3.1 Splitting the data
We here set aside 3000 samples for testing. This includes images, which we move to a dedicated test folder on our file system.
Note that this dataset is shuffled, i.e. the final 3000 rows correspond to the full date range. We add all constituent parts and shuffle again. Crucially, before splitting the train and validation sets, we use cache() to freeze the shuffle. This is essential as we continue training these models as pre-trained branches in our multimodal model; it ensures the validation set has not been seen by any of the branches.
# Set aside 3000 images as test
def set_aside_test():
    """Set aside a portion of the dataset for evaluation."""
    os.mkdir(imgs/'test')
    for image in articles_df.iloc[-3000:].index:
        image_str = str(image)
        shutil.move((imgs/image_str).with_suffix('.jpg'),
                    (imgs/'test'/image_str).with_suffix('.jpg'))

# set_aside_test()
# Load saved data (without 3000 test samples)
articles_df = pd.read_pickle(articles/'up_to_feb24.pkl')
articles_df = articles_df.iloc[:-3000].sort_index()
articles_train_val = articles_df.title_snippet.values
y_true_train_val = articles_df.binary_bias.values
y_true_numerical = articles_df.numerical_rating.values
def load_train_val_ds():
    """
    Loads and shuffles the data.

    Returns:
        Tuple (tf.data.Dataset): text_ds, img_ds, multimodal_ds, multimodal_ds_num

    Note:
        It is imperative to run all models with one and only one shuffle to prevent
        validation data being seen in the combined model.
    """
    # Load training image dataset from folder
    img_ds_train_val = image_dataset_from_directory(
        directory=imgs/'train',
        labels=None,
        image_size=(180, 180),
        batch_size=None,
        shuffle=False)
    text_ds_train_val = tf.data.Dataset.from_tensor_slices(articles_train_val)
    y_true_ds_train_val = tf.data.Dataset.from_tensor_slices(y_true_train_val.astype("float32"))
    y_true_ds_numerical = tf.data.Dataset.from_tensor_slices(y_true_numerical.astype("float32"))

    # Combine using `zip`
    full_ds = tf.data.Dataset.zip((img_ds_train_val, text_ds_train_val,
                                   y_true_ds_train_val, y_true_ds_numerical))

    # Shuffle once and only once
    full_size = full_ds.cardinality().numpy()
    full_ds = full_ds.shuffle(buffer_size=full_size, seed=264).cache(data/'cache')

    # Create datasets
    text_ds = full_ds.map(lambda img, text, label, num: (text, label))
    img_ds = full_ds.map(lambda img, text, label, num: (img, label))
    multimodal_ds = full_ds.map(lambda img, text, label, num: ((img, text), label))
    multimodal_ds_num = full_ds.map(lambda img, text, label, num: ((img, text), num))
    return text_ds, img_ds, multimodal_ds, multimodal_ds_num

text_ds, img_ds, multimodal_ds, multimodal_ds_num = load_train_val_ds()
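The batching step that produces train_batch, val_batch, steps_per_epoch, and validation_steps (used by compile_fit below) is not reproduced in this excerpt. A minimal sketch of how it could look for the text branch, assuming a 20% validation fraction (the fraction is an assumption; the batch sizes follow Section 3):

# Assumed batching step (not shown in the original excerpt).
BATCH_SIZE = 512                      # 512 for text, 64 for image/multimodal (Section 3)
VAL_FRACTION = 0.2                    # assumed validation fraction

n_samples = len(articles_train_val)
n_val = int(n_samples * VAL_FRACTION)

# The datasets were already shuffled (once) and cached in load_train_val_ds,
# so a simple take/skip split keeps the same validation samples for every branch.
val_batch = text_ds.take(n_val).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
train_batch = text_ds.skip(n_val).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

steps_per_epoch = (n_samples - n_val) // BATCH_SIZE
validation_steps = n_val // BATCH_SIZE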
A function to wrap the compilation and training (fit) of the model
This will make the following sections easier to read.
Wrapping functionality
# Hyperparams
VOCAB_SIZE = 20000
MAX_LENGTH = 45

def get_vectoriser(
        output_sequence_length=MAX_LENGTH,
        output_mode: Literal["multi_hot", "int", 'count', 'tfidf'] = 'multi_hot'):
    """
    Create a TextVectorization layer with specified presets.

    Args:
        output_sequence_length (int): The maximum length of the output sequence.
            Defaults to MAX_LENGTH.
        output_mode (Literal): The mode in which the output should be represented.
            Can be 'multi_hot', 'int', 'count', or 'tfidf'. Defaults to 'multi_hot'.

    Returns:
        keras.layers.TextVectorization: A configured TextVectorization layer.
    """
    # Combine presets
    pad_to_max_tokens = True if output_sequence_length else False
    ngrams = 2 if output_mode == 'multi_hot' else None

    # Instantiate the vectoriser
    vectoriser = layers.TextVectorization(
        max_tokens=VOCAB_SIZE,  # Global variable
        output_mode=output_mode,
        output_sequence_length=output_sequence_length,
        pad_to_max_tokens=pad_to_max_tokens,
        ngrams=ngrams)

    # Learn the vocabulary
    vectoriser.adapt(articles_train_val)

    # Return the preconfigured layer
    return vectoriser


def save_results(history, name):
    """
    Save the training history results to a JSON file and display the last epoch's results.

    This function reads an existing 'results.json' file or creates a new one if it doesn't
    exist. It appends the latest training results to the file, including the model name and
    the metrics from the last epoch. The results are then displayed as a DataFrame.

    Args:
        history (keras.callbacks.History): The history object returned by the `fit` method
            of a Keras model.
        name (str): The name of the model, used to identify the results in the JSON file.

    Returns:
        None
    """
    # Open or create results
    try:
        with open('results.json', 'r') as fin:
            results = json.load(fin)
    except:
        results = []

    # Add the model name and the last results
    result = {'model_name': name}
    last_epoch = {k: round(history.history[k][-1], 4) for k in history.history.keys()}
    result.update(last_epoch)
    results.append(result)

    # Save
    with open('results.json', 'w') as fout:
        json.dump(results, fout, indent=4)
    display(pd.Series(last_epoch).to_frame(name))


def compile_fit(model, name, **kwargs):
    """
    Compile the model and fit it to the data.

    This function compiles the given model with the Adam optimizer (by default) and
    'binary_crossentropy' loss. It also sets up callbacks for early stopping, TensorBoard,
    and model checkpointing to save the best iteration. The model is then trained on the
    training data and validated on the validation data for a specified number of epochs.
    The training history is saved and the final iteration's results are printed.

    Args:
        model (keras.Model): The Keras model to be compiled and trained.
        name (str): The name used for saving the best model checkpoint and results.

    Returns:
        None
    """
    lr = kwargs.get('lr', 1e-4)
    model.compile(
        optimizer=kwargs.get('optimizer', keras.optimizers.Adam(learning_rate=lr)),
        loss="binary_crossentropy",
        metrics=["accuracy", "auc"])
    callbacks = [
        keras.callbacks.EarlyStopping(),
        keras.callbacks.TensorBoard(),
        keras.callbacks.ModelCheckpoint(models/f"{name}.keras", save_best_only=True)]

    # A brief summary
    print(f"Total params: {model.count_params()} (of which trainable: {sum(np.prod(w.shape) for w in model.trainable_weights)}).")
    history = model.fit(train_batch,
                        validation_data=val_batch,
                        epochs=kwargs.get('epochs', 50),
                        steps_per_epoch=steps_per_epoch,
                        validation_steps=validation_steps,
                        callbacks=callbacks,
                        verbose=kwargs.get('verbose', 0))
    print(f"Trained in {len(history.epoch)} epochs.")

    # Save results and show the last values
    save_results(history, name)
5 Building a Multimodal Classifier
In this section we will train two kinds of models, a text and an image classifier. Then we will combine the two in a joint architecture, using the Keras Functional API.
The objectives for this section are:
Training a text-based model
A simple bigram model
A Bidirectional LSTM with an embedding layer
A Bidirectional LSTM with Pre-trained Word Embeddings
Training an image-based model
A basic Convolutional Network
A more sophisticated ConvNet
A fine-tuned Pre-trained CNN
Training a multimodal model
A basic multimodal classifier
An optimised version using keras_tuner hyperparameter search
A multimodal regressor
In keeping with best practice, let’s first put down a purely analytical baseline. Here, we simply predict the majority label for each of our test records:
# Calculate the majority class
majority_class = np.mean(y_true_train_val)  # 0.52
majority_class_binary = 1 if majority_class > 0.5 else 0

# Calculate the accuracy of a majority-class predictor
y_true_test = pd.read_pickle(articles/'up_to_feb24.pkl').iloc[-3000:].binary_bias.values
np.mean(y_true_test == majority_class_binary)
0.5206666666666667
Note
The purely analytical baseline achieves an accuracy of 0.52. This is low yet unsurprising given our efforts to balance the dataset.
5.1 Text-based Models
We will experiment with three common architecture patterns. For each of these, we will try extensive variations in configuration, both of model architecture and of hyperparameters. We here present three consolidated attempts at finding the best model, each within its respective architectural typology. At the end of the section we will walk through the paths that have been explored.
5.1.1 The Bag-of-bigrams Model
We start by looking at a simple base model: the bigram model. We converged on a model made up of two Dense layers with 50% dropout in between. Linguistic features are represented as a bag of words: a sparse vector encoding the occurrence of words and bigrams in the article.
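For reference, a sketch of this architecture (the hidden-layer size of 20 is inferred from the parameter count reported below and should be treated as an assumption):

# Sketch of the bag-of-bigrams model (unit counts inferred, not confirmed by the source).
text_vectoriser = get_vectoriser(output_sequence_length=None, output_mode='multi_hot')

text_inputs = Input(shape=(), dtype=tf.string)
x = text_vectoriser(text_inputs)              # 20,000-dim multi-hot bigram vector
x = layers.Dense(20, activation='relu')(x)
x = layers.Dropout(0.5)(x)                    # 50% dropout between the Dense layers
x = layers.Dense(20, activation='relu')(x)
outputs = layers.Dense(1, activation='sigmoid')(x)

bigram_model = Model(text_inputs, outputs, name='bigram')
# compile_fit(bigram_model, 'bigram')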
Total params: 400461 (of which trainable: 400461).
Trained in 30 epochs.
| | bigram |
|---|---|
| accuracy | 0.9204 |
| auc | 0.9722 |
| loss | 0.2850 |
| val_accuracy | 0.8109 |
| val_auc | 0.8938 |
| val_loss | 0.4055 |
5.1.2 The Bidirectional LSTM Model
In this model, we will use Keras’ recurrent layers for sequence learning. More specifically, we will implement a Bidirectional LSTM. This is fed dense word vectors from an embedding layer, which it learns as part of its model training.
Here, we converged on 16-dimensional embedding vectors, making this the smallest model in our experiment. It uses one bidirectional LSTM layer with dropout and recurrent dropout both set at high levels; more dropout is added before and after a subsequent Dense layer. This may seem counterintuitive but leads to superior results.
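A sketch of this configuration (all layer sizes other than the 16-dimensional embedding are assumptions):

# Sketch of the bidirectional LSTM branch with a learned embedding layer.
EMBED_DIM_LEARNED = 16

int_vectoriser = layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_mode='int', output_sequence_length=MAX_LENGTH)
int_vectoriser.adapt(articles_train_val)

text_inputs = Input(shape=(), dtype=tf.string)
x = int_vectoriser(text_inputs)
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM_LEARNED, mask_zero=True)(x)
x = layers.Bidirectional(layers.LSTM(32, dropout=0.5, recurrent_dropout=0.5))(x)
x = layers.Dropout(0.5)(x)                            # dropout before the Dense layer
x = layers.Dense(64, activation='relu')(x)
x = layers.Dropout(0.5)(x)                            # and after it
outputs = layers.Dense(1, activation='sigmoid')(x)

bidir_lstm = Model(text_inputs, outputs, name='bidir_lstm')
# compile_fit(bidir_lstm, 'bidir_lstm')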
Total params: 331001 (of which trainable: 331001).
Trained in 46 epochs.
| | bidir_lstm |
|---|---|
| accuracy | 0.9023 |
| auc | 0.9543 |
| loss | 0.2880 |
| val_accuracy | 0.7891 |
| val_auc | 0.8689 |
| val_loss | 0.4465 |
5.1.3 Bidirectional LSTM with GloVe
Continuing with the bidirectional LSTM, here we add GloVe pre-trained word embeddings (Pennington, Socher, and Manning 2014). First, we create the embedding matrix (adapted from Chollet (2021)) mapping words to dense word vectors. Then we pass our vectoriser to the embedding layer, which contains the vocabulary mapping giving each word a unique id.
The model graph itself consists of one bidirectional LSTM layer with recurrent_dropout set to 0.5. The subsequent block starts and ends with dropout and has a residual connection providing a direct feedback path to the bidirectional LSTM layer. This allows better propagation of the error signal, with empirically validated results.
def get_embeddings(vectoriser):
    """
    Get the pre-trained embeddings and transform the vocabulary into an embedding_matrix.

    Args:
        vectoriser (keras.layers.TextVectorization): The TextVectorization layer used to
            vectorize the text data.

    Returns:
        np.ndarray: A mapping of word ids and vectors in the form of an embedding matrix.

    Note:
        Ensure a global variable EMBED_DIM = 100 is defined.
    """
    path_to_glove_file = data/"glove.6B/glove.6B.100d.txt"
    embeddings_index = {}
    with open(path_to_glove_file) as f:
        for line in f:
            word, coefs = line.split(maxsplit=1)
            coefs = np.fromstring(coefs, "f", sep=" ")
            embeddings_index[word] = coefs

    vocabulary = vectoriser.get_vocabulary()
    word_index = dict(zip(vocabulary, range(len(vocabulary))))
    embedding_matrix = np.zeros((VOCAB_SIZE, EMBED_DIM))
    for word, i in word_index.items():
        if i < VOCAB_SIZE:
            embedding_vector = embeddings_index.get(word)
            if embedding_vector is not None:
                embedding_matrix[i] = embedding_vector
    return embedding_matrix
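The model graph described above is not reproduced in this excerpt; a sketch of how it could be assembled from these pieces (layer sizes are assumptions; the pre-trained embedding setup, recurrent_dropout=0.5 and the residual connection follow the description):

# Sketch of the GloVe-based bidirectional LSTM branch.
EMBED_DIM = 100

int_vectoriser = layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_mode='int', output_sequence_length=MAX_LENGTH)
int_vectoriser.adapt(articles_train_val)
embedding_matrix = get_embeddings(int_vectoriser)

text_inputs = Input(shape=(), dtype=tf.string)
x = int_vectoriser(text_inputs)
x = layers.Embedding(
    VOCAB_SIZE, EMBED_DIM,
    embeddings_initializer=keras.initializers.Constant(embedding_matrix),
    embeddings_regularizer=keras.regularizers.l2(1e-5),
    trainable=True,          # or False to keep the GloVe vectors frozen
    mask_zero=True)(x)
lstm_out = layers.Bidirectional(layers.LSTM(32, recurrent_dropout=0.5))(x)

# Block with dropout at both ends and a residual connection back to the LSTM output
y = layers.Dropout(0.5)(lstm_out)
y = layers.Dense(64, activation='relu')(y)
y = layers.Dropout(0.5)(y)
y = layers.Dense(64)(y)                          # project back to the LSTM output width (2 x 32)
residual = layers.Add()([lstm_out, y])
outputs = layers.Dense(1, activation='sigmoid')(residual)

bidir_lstm_glove = Model(text_inputs, outputs, name='bidir_lstm_glove')
# compile_fit(bidir_lstm_glove, 'bidir_lstm_glove')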
acc_viz = text_results[['val_accuracy', 'val_auc']]
ax = acc_viz.plot.bar(title='Text Models Accuracy and AUC (validation)',
                      rot=0, xlabel='', ylabel='Percent')
for x, y in zip(ax.get_xticks(), acc_viz.val_auc.values):
    ax.text(x, y + 1, round(y, 2))
for x, y in zip(ax.get_xticks(), acc_viz.val_accuracy.values):
    ax.text(x - 0.28, y + 1, round(y, 2))
plt.ylim(0, 110)
plt.legend(['Accuracy', 'AUC'], bbox_to_anchor=(1, 1, 0, 0))
plt.show()
Figure 5: Text Models Accuracy and AUC (validation)
Discussion of Text Models
Key Insight
Moving from the simple Bag-of-bigrams model to a pre-trained embedding-based recurrent network, the expectation was to see a clear progression. However, performance decreased. This is an important result and we adjust our objective correspondingly. We will proceed in our multimodal architecture with the Bag-of-bigrams Model.
In a discussion on sequence learning and model complexity, Chollet (2021) suggests looking at the ratio between the number of samples in the training data and the mean number of words per sample. If that ratio is less than 1,500, the bag-of-bigrams model may perform better. In our dataset, we have just under 20,000 samples with 45 words per sample. With a ratio of under 450, the Bag-of-Bigrams model aligns with Chollet's heuristic, suggesting it as the more suitable approach for our dataset. According to this heuristic, we would need 67,500 articles (1,500 × 45) for sequence models to outperform.
The bigram model has the added benefit of being lightweight and fast. With its 2.3 million parameters, the LSTM-based model with GloVe embeddings is high in complexity, both in terms of memory and compute. Irrespective of whether the pre-trained embeddings are fine-tuned (by setting trainable=True), the memory load remains significant. Regularising the updates to the pre-trained embeddings with L2 regularisation (1e-5) helped reduce large updates to the embedding weights, boosting performance further.
Before continuing with the Computer Vision Branch of our classifier, we briefly reflect on our empirical approach to finding the best model configuration and share the key insights.
Bag-of-Bigram performs better without Tf-idf
While the Bigram Model performed very well out of the box, it proved hard to optimise further. Additional dense layers, additional units, different optimisers, and adjustments to the learning rate all had little effect. Even Tf-Idf had a (small) negative impact on performance.
Pre-Trained Embeddings Struggle with New Words
The two recurrent models follow a common architectural pattern in text processing: both use an embedding layer. The main difference is the use of pre-trained GloVe embeddings (Pennington, Socher, and Manning 2014). While the first model learns embedding weights as part of the model training, the latter either updates the pre-trained weights (when trainable=True in the embedding layer) or leaves them frozen. Adding regularisation to the pre-trained embeddings eventually gave the best performance, yet the gain is small, and including them in the training makes little difference overall. We suspect the culprit is the high specificity and timeliness of the words used in news stories, illustrated below:
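The original illustration is not reproduced in this excerpt; the following sketch shows the kind of check involved, assuming the int_vectoriser from the sketch above and that the embeddings_index dictionary built inside get_embeddings has been kept in scope (in the code above it is local to that function):

# Sketch: how many words in our news vocabulary have no GloVe vector?
vocabulary = int_vectoriser.get_vocabulary()
missing = [w for w in vocabulary if w not in embeddings_index]
print(f"{len(missing)} of {len(vocabulary)} vocabulary entries have no pre-trained vector")
print("Examples:", missing[:10])   # typically recent names, hashtags and outlet-specific terms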
Small is beautiful when it comes to learned Embeddings
Letting the model learn its embeddings through backpropagation overcomes this problem of new and unknown words. Starting off with 256 dimensions, the model was comparable in size to the second model. However, reducing the dimensionality of these embeddings had little impact on performance. With only 16-dimensional vectors per word, we significantly reduced the overall parameter count without sacrificing accuracy or AUC. This reduction also reduced our reliance on dropout and residual connections for regularisation.
Alongside dropout, we used recurrent dropout for the recurrent layers. This is a temporally constant dropout mask that is preserved at every timestep of the sequence (Chollet 2021, 301). Similarly, we found that adding a dense layer makes a significant difference. This is in line with Chollet's suggestions in “Going even further” (Chollet 2021, 307–308).
An overview of other variations to the embedding-based models
Embedding layer
Embedding dimensions from 32 to 256
Recurrent layer
Unidirectional LSTM
Uni- and bidirectional GRU
One and two bidirectional LSTM layers (with return_sequences=True)
Recurrent dropout
Without recurrent_dropout
With recurrent_dropout ranging from 0.2 to 0.5
With recurrent layer units ranging from 16 to 512
Dropout
Without dropout
With dropout ranging from 0.2 to 0.5
Dense layers
With Dense layer ranging from 64 to 512
Learning rate and optimiser
Changing the optimiser to keras.optimizers.Adam
Lowering the learning rate to 1e-4
Residual stream
With a residual stream around the dense and second dropout layer
Max Pooling
Setting return_sequences=True outputs the articles as sequences; GlobalMaxPool1D then aggregates these.
See Appendix 8.4 for an in-depth look at model outputs on various pieces of text.
Moving on to developing our Computer Vision branch, we will now load the image data and then compare a number of CNN-based models.
5.2 Image-based Models
For our Computer Vision branch, we will again experiment with three common architecture patterns. As for the text branch, we will try various configurations, both in terms of model architecture and hyperparameters, but here we present a consolidated effort at finding the best model. We discuss their respective merits, and the paths that led us to them, in the discussion at the end of the section.
5.2.1 Basic ConvNet
Total params: 308417 (of which trainable: 308417).
Trained in 3 epochs.
| | basic_cnn |
|---|---|
| accuracy | 0.7481 |
| auc | 0.8331 |
| loss | 0.4935 |
| val_accuracy | 0.5564 |
| val_auc | 0.5810 |
| val_loss | 0.7845 |
del basic_cnn_model
5.2.2 Better ConvNet
We continue our search for the best model by combining different layers and borrowing from Xception (see Chollet 2021, 259–260).
Table 1: Image Models Accuracy and AUC (validation)
| model_name | accuracy | auc | loss | val_accuracy | val_auc | val_loss |
|---|---|---|---|---|---|---|
| basic_cnn | 74.81 | 83.31 | 49.35 | 55.64 | 58.10 | 78.45 |
| better_cnn | 64.55 | 70.18 | 62.36 | 57.78 | 59.27 | 66.35 |
| pretrained_cnn | 55.55 | 56.27 | 67.73 | 52.96 | 54.64 | 68.17 |
acc_viz = cnn_results[['val_accuracy', 'val_auc']]
ax = acc_viz.plot.bar(title='Image Models Accuracy and AUC (validation)',
                      rot=0, xlabel='', ylabel='Percent')
for x, y in zip(ax.get_xticks(), acc_viz.val_auc.values):
    ax.text(x, y + 1, round(y, 2))
for x, y in zip(ax.get_xticks(), acc_viz.val_accuracy.values):
    ax.text(x - 0.28, y + 1, round(y, 2))
plt.ylim(0, 100)
plt.legend(['Accuracy', 'AUC'], bbox_to_anchor=(1, 1, 0, 0))
plt.show()
Figure 6: Image Models Accuracy and AUC (validation)
Discussion of Image Models
Key Insights
For the task of bias classification, pretrained models do not yield improvements over the basic CNN. In fact, a series of iterations on the typical ConvNet pattern yielded the greatest gains. The choice of optimiser was also significant: RMSprop with a learning rate of 1e-3 failed to reduce the loss at all, while Adam with 1e-4 performed very well.
Another central takeaway concerns kernels and filters. Larger kernels allow the network to learn more complex patterns (Chollet 2021), and adding filters means more expressive power. The typical pattern doubles the number of filters in each successive Conv2D block. However, too many parameters lead to overfitting, hence the need for BatchNormalization. We apply this in our better model right before the activation. We also omit the bias (use_bias=False) to prune some of the complexity. A rough sketch of this pattern follows.
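The exact better_cnn graph is not reproduced here; in the sketch below the filter counts, depth, and rescaling step are assumptions, while the BatchNormalization-before-activation placement and use_bias=False follow the description above:

# Sketch of the "better" ConvNet pattern (sizes are assumptions).
img_inputs = Input(shape=(180, 180, 3))
x = layers.Rescaling(1.0 / 255)(img_inputs)         # assumed rescaling of pixel values
for filters in [32, 64, 128]:                       # doubling filters per block
    x = layers.Conv2D(filters, 3, use_bias=False)(x)
    x = layers.BatchNormalization()(x)              # normalise before the activation
    x = layers.Activation('relu')(x)
    x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation='sigmoid')(x)

better_cnn = Model(img_inputs, outputs, name='better_cnn')
# compile_fit(better_cnn, 'better_cnn')   # Adam with lr=1e-4, as noted above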
Other paths of enquiry that have been investigated:
Increasing filters, both the number of layers and the step size (doubling, quadrupling of the number of filters) of successive layers
Using SeparableConv2D instead of Conv2D (in various configurations)
Using GlobalAveragePooling2D instead of a Flatten layer (as seen in Xception (Chollet 2021))
Adding up to 3 additional Dense layers between the Flatten and Sigmoid layer, for a more gradual dimensionality reduction (as seen in the VGG16 example (Chollet 2021))
Note: the model is likely to learn shortcuts
Given that some images feature logos of the publication, it is likely that our ConvNets are learning logos and other pictorial features that reveal the publication's affiliation (recall that labels are publication-based).
We can inspect this visually in three steps:
We get an image
We create an activation model by collecting the intermediary activations
We call predict on the activation model with the below image
Pick an image
#| fig-cap: "An image with a logo"
#| label: fig-logo
# 1. Pick an image
img_path = (imgs/'train'/str(19)).with_suffix(".jpg")
img = load_img(img_path, target_size=(180, 180), keep_aspect_ratio=True)
array = img_to_array(img)
plt.imshow(array.astype("uint16"))
plt.axis("off")
plt.show()
Create an activation model by collecting the intermediary activations
# 2. Create an activation model by collecting the intermediary activations
better_cnn_model = saving.load_model(models/'better_cnn.keras')
layer_outputs = []
layer_names = []
for layer in better_cnn_model.layers:
    if isinstance(layer, (layers.Conv2D, layers.MaxPooling2D)):
        layer_outputs.append(layer.output)
        layer_names.append(layer.name)
activation_model = Model(inputs=better_cnn_model.input, outputs=layer_outputs)
Call predict on the activation model with the image tensor
# 3. Call `predict` on the activation model with the image tensor
layer_num = 0
img_tensor = np.expand_dims(array, axis=0)
activations = activation_model.predict(img_tensor)
layer_activation = activations[layer_num]
print(f'The {layer_names[layer_num]} activation has {layer_activation.shape[-1]} filters')

filters = np.random.randint(0, 64, 20)
fig, axs = plt.subplots(4, 5, figsize=(10, 8))
axs = axs.flatten()
for i, f in enumerate(filters):
    axs[i].imshow(layer_activation[0, :, :, f], cmap="magma")
    axs[i].set_xticks([])
    axs[i].set_yticks([])
    axs[i].set_frame_on(False)
plt.suptitle("The Guardian logo is picked up by many filters", y=1.01)
plt.tight_layout()
plt.show()
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 25ms/step
The max_pooling2d_5 activation has 64 filters
Figure 7: The Guardian logo is picked up by many filters
Discussion of Image Models (continued)
The logo will aid the model to some extent. As this is an unwanted side effect, future efforts should include an attempt to identify patches with unnaturally low variation in RGB values (e.g. the rectangular frame around the logo) and mask them out.
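A minimal sketch of what such a masking step could look like (the patch size and variance threshold are arbitrary assumptions, not part of the current pipeline):

def mask_low_variance_patches(image, patch=30, threshold=20.0):
    """Grey out square patches whose RGB values barely vary (e.g. flat logo banners).

    A rough illustration only: `patch` and `threshold` are assumed values.
    """
    image = image.copy()
    h, w, _ = image.shape
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            window = image[y:y + patch, x:x + patch, :]
            if window.std() < threshold:                       # unnaturally flat region
                image[y:y + patch, x:x + patch, :] = image.mean()  # mask with the global mean
    return image

# Example: masked = mask_low_variance_patches(array)  # `array` from the logo figure above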
We now move on to part three of this project, where we combine the text and image features into a multimodal classifier.
5.3 Multimodal Models
5.3.1 A Basic Multimodal Classifier
To create a multi-input classifier, we reload our best models, remove the final layers, and concatenate their outputs. For efficiency, we freeze the weights and retrain only the classifier head itself. This reduces the number of trainable parameters by half.
# Load the pre-trained models
bigram_model = saving.load_model(models/'bigram.keras')
better_cnn_model = saving.load_model(models/'better_cnn.keras')

# Remove the final layers and load into a new model
bigram_model_ = Model(inputs=bigram_model.input,
                      outputs=bigram_model.layers[-2].output,
                      name='bigram_model_')
better_cnn_model_ = Model(inputs=better_cnn_model.input,
                          outputs=better_cnn_model.layers[-2].output,
                          name='better_cnn_model_')

# Before calling the above graph, we freeze the layers
bigram_model_.trainable = False
better_cnn_model_.trainable = False

# Run input through the image model
img_inputs = Input(shape=(180, 180, 3))
image_output = better_cnn_model_(img_inputs)

# Run input through the text model
text_inputs = Input(shape=(), dtype=tf.string)
text_output = bigram_model_(text_inputs)

combined = layers.Concatenate()([image_output, text_output])
outputs = layers.Dense(1, activation="sigmoid")(combined)
multimodal_model_sm = Model(inputs=[img_inputs, text_inputs], outputs=outputs)

compile_fit(multimodal_model_sm, 'multimodal_model_sm')
Total params: 874920 (of which trainable: 430357).
Trained in 22 epochs.
5.3.2 An Optimised Multimodal Model with Keras Tuner
Here we add two layers to the frozen model: the first between the large ConvNet output (roughly 430 thousand units) and the concatenation, and the second after the concatenation.
We also use Keras Tuner for further refinement, in the form of a random search over a range of hyperparameters (O'Malley et al. 2019).
# Load models without compile
bigram_model = keras.models.load_model(models/'bigram.keras', compile=False)
better_cnn_model = keras.models.load_model(models/'better_cnn.keras', compile=False)

# Remove the final layers
bigram_model_ = Model(inputs=bigram_model.input,
                      outputs=bigram_model.layers[-2].output,
                      name='bigram_model_')
better_cnn_model_ = Model(inputs=better_cnn_model.input,
                          outputs=better_cnn_model.layers[-2].output,
                          name='better_cnn_model_')

# Freeze the layers
bigram_model_.trainable = False
better_cnn_model_.trainable = False

def build_model(hp):
    """Build the model with automated hyperparameter search"""
    # Define search space hyperparams
    units = hp.Choice('units', [1, 4])
    lr = hp.Float("lr", min_value=1e-5, max_value=1e-4, sampling="log")
    dropout = hp.Choice('rate', [0.2, 0.5])

    # Run input through the text model
    text_inputs = Input(shape=(), dtype=tf.string)
    text_output = bigram_model_(text_inputs)

    # Run input through the image model
    img_inputs = Input(shape=(180, 180, 3))
    image_output = better_cnn_model_(img_inputs)
    image_output = layers.Dense(units, 'relu')(image_output)  # Added layer

    combined = layers.Concatenate()([text_output, image_output])
    combined = layers.Dense(units, activation='relu')(combined)  # Added layer
    if hp.Boolean("dropout"):
        combined = layers.Dropout(dropout)(combined)  # Added dropout
    outputs = layers.Dense(1, activation="sigmoid")(combined)
    better_multimodal_model = Model(inputs=[img_inputs, text_inputs], outputs=outputs)
    better_multimodal_model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss="binary_crossentropy",
        metrics=["accuracy", "auc"])
    return better_multimodal_model

tuner = keras_tuner.RandomSearch(
    build_model,
    objective='val_loss',
    max_trials=5,
    overwrite=True,
    directory=models,
    project_name="tuner",
)
tuner.search_space_summary()
tuner.search(train_batch,
             validation_data=val_batch,
             epochs=5,
             steps_per_epoch=steps_per_epoch,
             validation_steps=validation_steps)
tuner.results_summary()
# Run another iteration for logging
compile_fit(tuner.get_best_models()[0], 'best_multimodal_model')
Total params: 2166016 (of which trainable: 1721453).
Trained in 9 epochs.
| | best_multimodal_model |
|---|---|
| accuracy | 0.9485 |
| auc | 0.9880 |
| loss | 0.1899 |
| val_accuracy | 0.8217 |
| val_auc | 0.9055 |
| val_loss | 0.3836 |
5.3.3 A dual-input Regressor
Finally, we can use the numerical ratings (i.e. the continuous variables ranging from -5 to +5 representing political bias) instead of the categorical labels to create a more refined predictor of exactly how right- or left-leaning an article and image combination is.
# Re-use the pretrained model; simply drop the activation
multimodal_regressor_base = Model(inputs=multimodal_model_sm.input,
                                  outputs=multimodal_model_sm.layers[-2].output,
                                  name='regressor_base')

# Define the inputs and pass them to the base
img_inputs = Input(shape=(180, 180, 3))
text_inputs = Input(shape=(), dtype=tf.string)
combined = multimodal_regressor_base([img_inputs, text_inputs])

# Add a dense layer for regression
outputs = layers.Dense(1)(combined)
multimodal_regressor = Model(inputs=[img_inputs, text_inputs], outputs=outputs)
# multimodal_regressor.summary()
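The compilation and training of the regressor are not shown in this excerpt; a minimal sketch, assuming a mean-squared-error loss with MAE as the reported metric and batches built from multimodal_ds_num (the batch names are hypothetical):

# Assumed training setup for the regressor (not shown in the original excerpt).
multimodal_regressor.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    loss="mse",
    metrics=["mae"])

# num_train_batch / num_val_batch would be built from multimodal_ds_num in the same
# way as the classification batches (hypothetical names).
# multimodal_regressor.fit(num_train_batch, validation_data=num_val_batch, epochs=20,
#                          callbacks=[keras.callbacks.EarlyStopping()])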
The Mean Absolute Error indicates that predictions are off by about 1.36 points on average. This is a reasonable result for this dataset, considering our label range of 10 (-5 to +5) and the absence of values between -1 and 2, as seen in Figure 3.
5.3.4 Summary of Results of Multimodal Models
Leaving aside the regressor (see Appendix 8.5), let's inspect the multimodal models side by side.
acc_viz = mm_results[['val_accuracy', 'val_auc']]
ax = acc_viz.plot.bar(title='Multimodal Models Accuracy and AUC (validation)',
                      rot=45, xlabel='', ylabel='Percent')
for x, y in zip(ax.get_xticks(), acc_viz.val_auc.values):
    ax.text(x + 0.05, y + 1, round(y, 1))
for x, y in zip(ax.get_xticks(), acc_viz.val_accuracy.values):
    ax.text(x - 0.2, y + 1, round(y, 1))
plt.ylim(0, 110)
plt.legend(['Accuracy', 'AUC'], bbox_to_anchor=(1, 1, 0, 0))
plt.show()
Discussion of Multimodal Models
Key Insights
We found that the multimodal classifier performs well. Significant improvements were made by fine-tuning pretrained classifiers, both in terms of efficiency and convergence speed. While they still have high parameter complexity, as the many parameters take up space in memory, they have low optimisation complexity, as the models converge on a much reduced feature space. Additionally, a random search over a set of hyperparameters improved this model further.
6 Conclusions
In this section we will first present a detailed evaluation of our models. Next, we will reflect on the key contributions this work has made, and look ahead at areas for further investigation.
6.1 Evaluation
Let’s first look at all the models side by side. We will load the test dataset, run predict() on all of the test samples, and evaluate predictions on the held-out ground truth dataset.
articles_df = pd.read_pickle(articles/'up_to_feb24.pkl').iloc[-3000:].sort_index()  # Sort the index to find the images

# Load the held-out dataset
articles_test = articles_df.title_snippet.values
y_true_test = articles_df.binary_bias.values

# Create dataset objects
img_dataset_test = image_dataset_from_directory(
    directory=imgs/'test',
    labels=None,
    image_size=(180, 180),
    batch_size=None,
    shuffle=False)
text_ds_test = tf.data.Dataset.from_tensor_slices(articles_test)
y_true_data_test = tf.data.Dataset.from_tensor_slices(y_true_test.astype("float32"))

# Zip multimodal dataset
mm_test = tf.data.Dataset.zip((img_dataset_test, text_ds_test, y_true_data_test)).map(
    lambda img, text, label: ((img, text), label))

# Create batches and zip text-only and image-only dataset for predictions
test_batch = mm_test.padded_batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
img_test_batch = tf.data.Dataset.zip((img_dataset_test, y_true_data_test)).padded_batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
text_test_batch = tf.data.Dataset.zip((text_ds_test, y_true_data_test)).padded_batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
Found 3000 files.
label_dict = {}
for model in models.glob('*keras'):
    print(model.stem)
    if 'cnn' in model.stem:
        test_set = img_test_batch
    elif model.stem.startswith('bi'):
        test_set = text_test_batch
    else:
        test_set = test_batch
    saved_model = keras.models.load_model(model)

    # Call predict on the batched test set
    y_pred = saved_model.predict(test_set).flatten()

    # Create a dict
    label_dict[model.stem] = y_pred

    # Save to disk
    np.save((models/model.stem).with_suffix('.npy'), y_pred)
# Retrieve the numpy arrays from disk and construct a DataFrame
label_dict = {}
for model in models.glob('*.npy'):
    model_path = model
    modelname = Path(model_path).stem
    label_dict[modelname] = np.load(model_path).flatten()
all_predictions = pd.DataFrame(label_dict)
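The code producing the ROC plot below is not included in this excerpt; a rough sketch using the RocCurveDisplay import from the top of the notebook:

# Sketch: overlay ROC curves for every model's saved predictions.
fig, ax = plt.subplots(figsize=(6, 6))
for model_name, y_pred in all_predictions.items():
    RocCurveDisplay.from_predictions(y_true_test, y_pred, name=model_name, ax=ax)
ax.set_title('ROC curves on the test set')
plt.show()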
A ROC curve plot, showing the trade-offs between true positive and false positive rates. Note the overlapping curves: the multimodal model is not surpassing the bag-of-bigram model.
6.1.1 The Multimodal Classifier
In this section, we analyse the best-performing multimodal classifier, focusing on its classification errors. Specifically, we examine the proportion of false positives and false negatives, assessing their impact on model performance. We evaluate these errors through recall and precision, providing a clearer understanding of the classifier's strengths and weaknesses.
y_pred_binary_best = (all_predictions.best_multimodal_model > 0.5).astype(int)
conf_disp = ConfusionMatrixDisplay.from_predictions(y_true_test, y_pred_binary_best,
                                                    display_labels=['Left', 'Right'])
conf_disp.ax_.grid(visible=False)
conf_disp.ax_.set_title('Confusion matrix for the Best Multimodal Classifier')
plt.show()
The confusion matrix for the best model overall, showing a remarkable balance between False Positive and False Negative rates.
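The precision and recall breakdown below comes from scikit-learn's classification_report; the exact call is not shown in this excerpt, but it would look roughly like this:

# Sketch: precision/recall/F1 breakdown for the best multimodal model.
print(classification_report(y_true_test, y_pred_binary_best, target_names=['Left', 'Right']))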
precision recall f1-score support
Left 0.82 0.84 0.83 1562
Right 0.82 0.80 0.81 1438
accuracy 0.82 3000
macro avg 0.82 0.82 0.82 3000
weighted avg 0.82 0.82 0.82 3000
Discussion of evaluation
The ConvNet branch did not improve the performance of the multimodal model
Despite its non-trivial performance, what the better_cnn ConvNet learned did not translate into better performance for the multimodal model.
Recall and Precision are in near-perfect balance
Our class balancing efforts clearly paid off. We can see no trade-off between recall and precision, with both classes getting consistently good scores.
6.2 Summary and Conclusions
This project demonstrated the efficacy of multimodal architectures for political bias classification. Harnessing deep learning techniques to combine a Bag-of-bigrams text model and a ConvNet image model, we achieved a remarkable accuracy of 82% and AUC of 89.7% on our test set.
The multimodal classifier performs better than both the image classifier and the text classifier individually. However, as seen in Figure 9, there is a wide performance gap between the text and image models. Confirming findings by Wang et al. (2022), the best text classifier performs nearly as well as the best multimodal classifier (which is 1.5 percentage points better in accuracy and 1 percentage point better in AUC), while the image models are far behind (30 percentage points).
Our best image model achieves 60% AUC and 58.5% accuracy on the test set. This leaves room for improvement, yet the result is significant. A key takeaway from the process is the standardisation of the images. Simple resizing would have squeezed and deformed photos; the cropping logic employed in our data collection pipeline ensured that translation-invariant features could be learned effectively. For reproducibility, we attach the data collection pipeline in Appendix 8.2 (articles) and Appendix 8.3 (images), including details of how we downloaded images and converted them for ingestion by the models.
Our best text model scored significantly higher. With 88.7% AUC, it landed just one percentage point under the combined model. Illustrated in Appendix 8.4, our best text classifier excels at representing textual features, highlighted by easily identifiable words and word combinations that demarcate ideological leanings.
A final contribution of this work is its multimodal regressor, which effectively identifies ideologically radical content, as illustrated in Appendix 8.5.
Challenges included the many images with logos, allowing ConvNets to memorise these as shortcuts (see Figure 7; Appendix 8.4 illustrates this with two logos appearing amongst the most confident predictions). Additionally, the idiosyncratic word choices of the American President may have made classification easier, potentially inflating performance metrics. This is a broader challenge, as bigram models inevitably latch onto idioms and stylistic quirks rather than true bias indicators. For example, a classifier could learn that a certain publication always uses particular phrases, essentially learning an outlet “signature” instead of the ideological content. This leads to overfitting.
6.2.1 Future Directions
Future work should incorporate circulation and readership data to mitigate publication dominance (see Figure 2). Some authors, such as Peng et al. (2025), propose using attention mechanisms (hierarchical, self-attention, and cross-modal attention) to model nuanced interactions between text and images.
While binary classification provides a strong starting point, future work can explore multi-class classification to integrate centrist publications to capture a broader spectrum of political orientations. Furthermore, it could rebalance the dataset to reduce the dominance of certain publications. It could start from readership numbers and collect articles in proportions that reflect these numbers. It could also start from UK-only sources. Both of these approaches would be easy to implement with the code in Section 8.2 and Section 8.3. With a rebalanced and enlarged dataset, further advancements could include contextual embeddings such as BERT (SentenceTransformers) to enable a more fine-grained representation of text features.
Finally, this project’s methodology offers promising applications in media recommendation systems. By balancing news exposure to include diverse perspectives, such systems could reduce ideological echo chambers and promote exposure to alternative viewpoints. By addressing the challenges posed by modal imbalance, computational complexity, and shortcuts, future efforts could make a meaningful impact to the study of political bias and societal discourse.
Overall, multimodal deep learning for political bias detection is making news analysis more holistic by not only reading what is written but also seeing how it’s presented, which aligns closely with how human readers detect bias across media.
7 References
Peng, Liwen, Songlei Jian, Minne Li, Zhigang Kan, Linbo Qiao, and Dongsheng Li. 2025. “A Unified Multimodal Classification Framework Based on Deep Metric Learning.” Neural Networks 181 (January): 106747. https://doi.org/10.1016/j.neunet.2024.106747.
Pennington, Jeffrey, Richard Socher, and Christopher Manning. 2014. “GloVe: Global Vectors for Word Representation.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), edited by Alessandro Moschitti, Bo Pang, and Walter Daelemans, 1532–1543. Doha, Qatar: Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1162.
Thomas, Christopher, and Adriana Kovashka. 2019. “Predicting the Politics of an Image Using Webly Supervised Data.”CoRR abs/1911.00147. http://arxiv.org/abs/1911.00147.
Van Cauwenberghe, Johannes. 2025. “Political Bias in News: A Feature-Weighted Classifier.”
Wang, Zhen, Xu Shan, Xiangxie Zhang, and Jie Yang. 2022. “N24News: A New Dataset for Multimodal News Classification.” In Proceedings of the Thirteenth Language Resources and Evaluation Conference (LREC 2022), edited by Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, et al., 6768–6775. Marseille, France: European Language Resources Association. https://aclanthology.org/2022.lrec-1.729/.
8 Appendices
8.1 Appendix I: Exhaustive overview of the sources
8.2 Appendix II: Compiling the articles dataset
To facilitate future iterations, this is the code used to compile the dataset.
# Extend articles_df
import pandas as pd
from newscatcherapi import NewsCatcherApiClient

articles_df = pd.read_pickle('.data/articles.pkl')

def extend_articles_df(articles_df):
    """
    Extends the articles DataFrame by making 20 API calls to fetch additional articles
    from left and right biased sources using the NewsCatcher API.

    Args:
        articles_df (pd.DataFrame): The original DataFrame containing articles.

    Returns:
        pd.DataFrame: The extended DataFrame with additional articles.
    """
    newscatcherapi = NewsCatcherApiClient(
        x_api_key=os.getenv('NEWSCATCHER_API_KEY'))

    # Fetch articles from API (recursively)
    bias_articles = []
    for bias in ['left', 'right']:
        bias_sources = articles_df[articles_df.bias == bias].source_api.dropna().unique()
        all_articles = newscatcherapi.get_search_all_pages(
            q='*',
            from_='3 months ago',
            topic='politics',
            lang='en',
            sources=bias_sources,
        )
        bias_articles_df = pd.DataFrame(all_articles['articles'])
        bias_articles_df['bias'] = bias
        bias_articles.append(bias_articles_df)

    # Concatenate the dataframes
    df_comb = pd.concat(bias_articles, ignore_index=False)

    # Clean the combined dataframes
    df_comb.drop_duplicates(subset='title', inplace=True)
    df_comb.dropna(subset='title', inplace=True)
    df_comb.rename(columns={'excerpt': 'snippet', 'clean_url': 'source_api'}, inplace=True)

    # Create a new lookup table for adding numerical ratings
    lookup = articles_df[['numerical_rating', 'source_api']]
    lookup = lookup.drop_duplicates('source_api').dropna(subset='source_api')  # .set_index('source_api')

    # Join `numerical_rating` and save
    df_comb_ = df_comb.join(lookup.set_index('source_api'), on='source_api')
    return df_comb_

# Run and save
# articles_df = extend_articles_df(articles_df)
# articles_df[articles_df.columns].to_pickle('.data/articles/articles_ext.pkl')
8.3 Appendix III: Downloading images
To add nearly 20,000 images to the dataset, we followed the media url in each record and downloaded the images.
These are high-resolution images. Based on an estimate, they could easily require 10 GB of storage (\(20,000 \times 500\text{kB}\)). We therefore borrowed some cropping logic from the Keras package (adapted from keras.src.utils.image_utils) to resize the images in memory before saving them to disk. We also named the images after the DataFrame index.
While the vast majority of the downloads were successful, some were not. We also provide the code to check, retry, and reset the articles_df DataFrame to align with the newly compiled dataset.
# Save the images to a folder
import pandas as pd
from io import BytesIO
import requests
from PIL import Image
from fake_useragent import UserAgent

ua = UserAgent()

def crop(img_size, width_height_tuple=(180, 180)):
    """From `keras.src.utils.image_utils`"""
    width, height = img_size
    target_width, target_height = width_height_tuple
    crop_height = (width * target_height) // target_width
    crop_width = (height * target_width) // target_height
    crop_height = min(height, crop_height)
    crop_width = min(width, crop_width)
    crop_box_hstart = (height - crop_height) // 2
    crop_box_wstart = (width - crop_width) // 2
    crop_box_wend = crop_box_wstart + crop_width
    crop_box_hend = crop_box_hstart + crop_height
    crop_box = [
        crop_box_wstart,
        crop_box_hstart,
        crop_box_wend,
        crop_box_hend,
    ]
    return crop_box


def save_imgs(url, index):
    """
    Downloads and saves an image from a URL with the DataFrame index as the filename.

    Args:
        url (str): The image URL.
        index (int): The index from the DataFrame.

    Returns:
        str: The saved filename.
    """
    if not isinstance(url, str) or not url.startswith("http"):
        return None  # Skip invalid URLs
    try:
        # Get image content
        res = requests.get(url, timeout=1, headers={"user-agent": ua.random})
        if res.status_code != 200:
            return None

        # Open image and resize
        img = Image.open(BytesIO(res.content))
        if img.mode == "RGBA":
            img = img.convert("RGB")
        img = img.resize((180, 180), resample=Image.NEAREST,
                         box=crop(img_size=img.size))  # Nearest resampling

        # Save image with index as filename
        filename = (imgs/str(index)).with_suffix(".jpg")
        img.save(filename, format="JPEG")
        return filename
    except Exception as e:
        print(index, url, e.args)
        return None

# articles_df['imgs'] = articles_df.progress_apply(lambda row: save_imgs(row['media'], row.name), axis=1)
Check for missing items
# Check folder for missing items
all_downloaded = set([int(img.stem) for img in imgs.glob('*.jpg')])
all_idxes = set(range(articles_df.index.max() + 1))
remainder = all_idxes - all_downloaded
len(remainder)
1458
Retry
We retry downloading the 1458 failed downloads once, and proceed to remove the remainder.
# Try again on remainder
# articles_df.loc[list(remainder)].progress_apply(lambda row: save_imgs(row['media'], row.name), axis=1)
# Check if all files are accounted for
assert articles_df.index.max() + 1 - len(remainder) == len(all_downloaded)
Remove missing
# Reset the dataframe
remaining_idxs = articles_df.index.difference(remainder)
articles_df = articles_df.loc[remaining_idxs]
articles_df.shape
(19730, 9)
Check alignment
# Verify if random image (as it appears locally) matches url
imgs_in_folder = [int(img.stem) for img in imgs.glob('*.jpg')]
test_img = np.random.choice(imgs_in_folder, 1)
print("Index position:", test_img[0])

# This link should now correspond to this image
print("Click to check:", articles_df.media.loc[test_img].values)
Image.open((imgs/str(test_img[0])).with_suffix('.jpg'))
Index position: 13780
Click to check: ['https://assets.zerohedge.com/s3fs-public/styles/16_9_max_700/public/2024-10/241024kamala.jpg?itok=v8a5a1LL']
8.4 Appendix IV: Most confident predictions
A brief inspection of where the models are most confident in their predictions. We take the probabilities output by the sigmoid, sort them, and show the most extreme values. This is insightful as it shows what the models have learned. In the case of text, talk of “budget cuts” features among the most confident left predictions, while the “story of Jesus' birth” appears among the top right predictions. Our series of “most radical” image predictions is less revealing, however. The model does seem to pick up on the logos of The Telegraph and The Guardian, as seen in Figure 7.
from IPython.display import display, Markdown

test_snippets = pd.read_pickle(articles/'up_to_feb24.pkl').iloc[-3000:].title_snippet.values

def print_top_5(pred_path, bias: Literal['left', 'right'] = 'left'):
    # Load the numpy array with predictions
    preds = np.load(pred_path)

    # Get the indices of the sorted array
    order = preds.argsort()
    if bias == 'right':
        order = order[::-1]  # reverse them

    # Use the indices to order the preds
    preds_ordered = np.take(preds, order)[:5]

    # Idem for the snippets
    snippets = test_snippets[order][:5]

    # Construct a markdown table
    md_string = f'Most {bias}-biased snippet ({pred_path.stem} model) | Score \n -----|-'
    for p, s in zip(preds_ordered, snippets):
        md_string += f"\n{s} | {p:.4f} "
    display(Markdown(md_string))

pred_path = models/'bigram.npy'
# or: for pred_path in models.glob('bi*.npy'):
print_top_5(pred_path, 'left')
print_top_5(pred_path, 'right')
Table 2: Most left- and right-biased text, according to the Bag-of-bigrams model. Showing sigmoid probabilities on test set as scores.
| Most left-biased snippet (bigram model) | Score |
|---|---|
| Pentagon Official Linked To Iranian Influence Network Gets Promotion. ‘This official is not a subject of interest’ | 0.0009 |
| OBR error cut £18bn of headroom from Rachel Reeves’ Budget. The error may have contributed to the market jitters that were seen in the wake of the Budget | 0.0011 |
| How new bank transfer scam protections could help you. Banks must now refund up to £85,000 of losses from authorised push payment fraud | 0.0011 |
| Israeli Weapon Seen in Rare AP Photos of Beirut Airstrike Appears to be a Powerful Smart Bomb. In all but the blink of an eye, an Associated Press photographer’s camera captured the moments that a battleship-gray Israeli bomb plummeted toward a Beirut building before detonating to bring the tower down. The airstrike came 40 minutes after Israel warned people to… | 0.0011 |
| Editor Daily Rundown: Trump Takes Questions From Garbage Truck After Biden’s Comment. Calling all Patriots! | 0.0011 |
| Most right-biased snippet (bigram model) | Score |
|---|---|
| Trump and Harris make their final pitch to voters in last weekend before election day. Vice President Kamala Harris and former President Trump barnstorm battleground states in the last days before the election. | 0.9988 |
| On Christmas Eve, Pope Francis Appeals for Courage. Pope Francis said the story of Jesus’ birth as a poor carpenter’s son should instill hope that all people can make an impact on the world, as the pontiff on Tuesday led the world’s Roman Catholics into Christmas. | 0.9987 |
| Biden Admin’s Revolving Door With Left-Wing Green Groups Continues as Environmental Official Lands Cushy Gig With Anti-Fossil Fuel Radicals. Bureau of Land Management director Tracy Stone-Manning has already lined up her post-Biden administration job: a cushy six-figure gig leading the Wilderness Society, an influential Washington, D.C.… | 0.9985 |
| China Seeks Deeper Economic Ties with ASEAN at Summit Talks as South China Sea Disputes Lurk. Chinese Premier Li Qiang called for deeper market integration with Southeast Asia on Thursday during annual summit talks where territorial disputes in the South China Sea are likely to be high on the agenda. The 10-member Association of Southeast Asian Nations’ meeting with… | 0.9984 |
| House Republicans float rule change to grant power to interim speaker in case of future ousters. The proposal comes after the historic ouster of former Speaker Kevin McCarthy (R-CA) last year that resulted in a three-week period of inaction. | |