Stance-Based Fake News Identification on Social Media with Hybrid CNN and RNN-LSTM Models

Abstract: Today, fake news can be readily generated and disseminated via social media platforms. Misinformation and hoaxes propagated via online social media or traditional news media are commonly referred to as fake news. Stance-based fake news reflects the opinions of an audience rather than correct facts. This study presents a hybrid model combining a CNN model and an RNN-LSTM model to identify fake news. A balanced dataset of 216k news items called ‘SherLock-FakeNewsNet’ is explored throughout the study, resulting in the proposed hybrid model. The NLTK toolkit is used for noise reduction, after which the Keras tokenizer and pre-processing utilities perform the tokenization and pre-processing steps. The text data is then fitted and represented using pre-trained GloVe word embeddings. A CNN model with Conv1D and MaxPooling layers extracts higher-level features of the text, and LSTM units in an RNN model then capture the longer context of the text and the interdependencies among word sequences. Dropout layers are carefully chosen to reduce overfitting in the final hybrid model. Using the Adam optimizer and the Binary Cross-Entropy loss function, the proposed hybrid model achieves the highest accuracy, 92%, outperforming most of the conventional models today, with an average Precision, Recall and F1-Score of 91% each. In five further experiments, the proposed hybrid model is evaluated on five related datasets and the results are compared.


I. INTRODUCTION
Nowadays, almost half of the world's population uses social media news feeds rather than conventional news sites to check the news. The authors of [1] found that about 60% of Americans looked for news on social media news feeds, compared with 49% in 2012. The study in [2] found that social media has outpaced other news channels as the leading news source. Because of this shift, false news can easily be generated and disseminated across many social media platforms. There is no agreed-upon definition of false news; commonly, the term refers to intentionally created false information and stories meant to delude the audience.
According to [3], false news consists of incorrect information generated with malicious intent to deceive users. As reported by [5], false information is a serious threat to equality, freedom of speech and journalism. The authors of [6] demonstrated that social media sites and gossip sites are the largest sources of the escalating spread of false information and rumours.
The distorted spread of news on social media has led to many incidents. During the 2016 US presidential election, the "Pizzagate" false news story was extensively disseminated on Twitter, and it was estimated that almost one million tweets were linked to "Pizzagate" [7] after the presidential election was over. In [8], the authors examined the social bounce of the circulated story "Palestinians recognizing Texas as part of Mexico"; Figure 1 shows the social bounce of this story. During the 2019 Easter Sunday attacks in Sri Lanka, the government curtailed access to social media sites because false information propagated across these platforms had led to misunderstandings between religious communities [9]. This incident clearly showed how social media has become a crucial platform for inflaming sensitive issues among the public. Figure 2 shows false news circulated on social media in 2019. The study in [10] found that the top 100 fake news stories of 2019 recorded more than 150 million views on Facebook, and most of these stories were related to politics.
Table I lists the top 10 most viewed fake news stories shared on Facebook in 2019. These cases show that false news has emerged as one of the most serious dangers on social media platforms. An appropriate mechanism that can readily identify false information is therefore required.
Based on the action and the nature of the news, false news can be classified into various types. The typical forms of false news are as follows.
• Visual-based: The graphical representation of the information is depicted in this type of news.

This study focuses on the last type, identifying false news based on stance. The authors' own dataset, called 'SherLock-FakeNewsNet', is explored throughout the study together with the proposed hybrid model.

The objectives of this research study mainly focus on:
• Observing the drawbacks of existing research works and related tools.

The rest of the paper is organized as follows. Section II reviews related works and tools, and Section III presents the methodology of the research study. Section IV presents the results of five different experiments and discusses them. Finally, Section V presents the conclusion and future work of the research study.

II. LITERATURE REVIEW
A substantial number of researchers have investigated this topic and suggested several methods to address the problem. News content can be described along three aspects, namely style features, textual features and visual features [11]. Natural Language Processing methods [12] such as linguistic features, low-rank textual features and neural textual features are used to extract textual characteristics from news material. Linguistic features consist of lexical features and syntactic features. Lexical features include character-level and word-level elements such as letters per word, repetition of active words and unique words. Syntactic features consist of sentence-level elements such as N-grams and repetition of active words [13]. Matrix factorization and tensor factorization are examples of low-rank textual features, and clustering is frequently adopted in matrix factorization to study document representation [14]. To learn about the representation of … [15]. Long Short-Term Memory (LSTM) neural networks, a variant of RNNs, are also employed [16]. The authors of [17] proposed a false news identifier built with the Natural Language Toolkit (NLTK) and TextBlob, which achieved an accuracy of 63%. Sentiment analysis is one of the significant research areas in Natural Language Processing and comprises distinct research directions. In [18], the authors proposed an approach based on ensemble learning paradigms with a multi-objective weighted voting scheme to strengthen the predictive performance of different classification tasks, and evaluated it with different machine learning algorithms. In [19], the authors presented an ensemble approach to feature selection in sentiment classification and used a genetic algorithm to aggregate the individual feature lists. In [20], the author proposed a deep learning-based approach with a CNN-LSTM architecture for sentiment analysis of product reviews gathered from Twitter, and evaluated its predictive performance with different word embeddings and several weighting functions.
In [21], the authors examined a framework based on a neural language model and bidirectional LSTM for sarcasm identification on social media. The neural language model uses a term-weighted word embedding scheme with trigrams to represent text documents, and a three-layer stacked bidirectional LSTM architecture then identifies sarcastic text documents. In [22], the author presented a deep learning-based approach to sarcasm identification on social media and compared the performance of topic-enriched word embedding schemes with that of conventional word embedding schemes.
In [23], the author performed sentiment analysis of massive open online course (MOOC) reviews using text mining and a deep learning-based approach, and compared its efficiency with that of ensemble learning and supervised learning methods. In [24], the authors examined a machine learning-based approach to sentiment analysis of student evaluations and evaluated the results with different machine learning algorithms.
In [25], the authors explored the performance of fifteen benchmark instance selection methods in text sentiment classification, evaluating them with a decision tree classifier and radial basis function networks. Keyword extraction is a convenient method in text classification; the authors of [26] compared five statistical keyword extraction methods in combination with different machine learning algorithms and ensemble methods.
In [27], the author focused on different feature engineering schemes for text classification and presented an ensemble classification scheme based on language function analysis. In [28], the author presented a hybrid supervised clustering algorithm to obtain a diverse ensemble for text classification. In [29], the authors examined a hybrid ensemble pruning approach for sentiment classification and compared it with ensemble methods and evolutionary algorithms.
In [30], the authors proposed an ensemble scheme with a deep learning-based approach to identify satirical Turkish news articles, and used different feature sets, such as linguistic and psychological features, to extract features. The authors of [31] … Following that, the authors discussed the impact of combining different neural networks, including Convolutional Neural Networks with Gated Recurrent Units (CNN-GRU) and Convolutional Neural Networks with Long Short-Term Memory (CNN-LSTM). Furthermore, the authors examined the impact of word embedding characteristics in Deep Neural Networks (DNN). In [32], the authors proposed a technique called TRACEMINER that applies an LSTM-RNN model to social networks and reported good accuracy on classification datasets.
Table II lists selected studies and the tools that have been developed for this problem, with the technologies and algorithms applied in each tool and a short description of it. Almost every tool relies on web scraping and web crawling as the main technologies for data collection. Some tools also use word-embedding techniques such as GloVe and Word2vec, and a few are based on deep learning mechanisms such as CNNs, LSTMs and auto-encoders. Some tools use built-in APIs and algorithms to identify false information on social media.
Tool | Technologies and algorithms | Description
SherLock [39] | Web Scraping, CNN, LSTM, Word-Embeddings, News API | A fact-checking mobile platform.
As the literature review shows, researchers have suggested various approaches to this problem. After reviewing the existing mechanisms, techniques and related works, the idea emerged of combining Natural Language Processing techniques with several Deep Neural Networks into a hybrid neural network, because related works that combine several Deep Neural Networks have reported highly accurate results.
Natural Language Processing techniques such as tokenization were applied to represent words as tokens, stop word removal was used to remove stop words, and several other techniques were employed to clean the dataset. Word embeddings were applied to map words to vectors of real numbers. A hybrid approach was chosen, using CNNs to extract higher-level local characteristics from the input text and RNN-LSTMs to capture the long-term dependencies in the word sequences. Finally, Dropout layers were carefully selected to prevent overfitting, and Dense layers were introduced to enhance the overall efficiency of the hybrid model. Section III describes in detail how these methods are applied in each phase of the research study.

III. METHODOLOGY
To address the stance-based fake news problem, this research study follows a hybrid approach. In its first phase, Natural Language Processing techniques such as stop word removal, tokenization and word embeddings were applied, using the NLTK toolkit. In the second phase, a sequence of deep neural network layers was applied: a Convolutional Neural Network with Convolution and MaxPooling layers, followed by a Recurrent Neural Network with Long Short-Term Memory layers. Finally, Dropout layers were carefully selected to prevent overfitting, and Dense layers were introduced to enhance the overall efficiency of the hybrid model. The research study began by acquiring a dataset.

A. Finalizing a Method for Collecting a Dataset
Researchers [40] have used HTML parsers and web parsers to collect data from different sources. For this study, however, the free web-scraping framework Scrapy [41] was chosen to gather news articles from various sources.
1) Why Scrapy: Scrapy is a free, open-source framework developed by Scrapinghub for extracting data from websites. It provides a wide range of web-scraping and web-crawling mechanisms that are used for data mining, automated testing and monitoring. In this study, Scrapy served as the data collection framework, using the web scraping method; it also provides rich support for building scalable, large crawling projects. Scrapy uses spiders to traverse websites and selectors to extract data from them. It handles requests asynchronously, and its auto-throttling mechanism automatically adjusts the crawling speed, based on the load on Scrapy and on the target website, towards an optimum rate.
Global news articles from 2018 to 2020 were collected from websites such as bbc.com, politifact.com and snopes.com, and local news articles were collected from government news sources such as news.lk. Furthermore, fact-checking news sources and gossip sites such as AFP Sri Lanka, afpnews.com and gossipcop.com were investigated to gather articles about false news from 2018 to 2020. Together, these sources resulted in a dataset of 216,682 news articles.

2) Why Choose a Dataset with Exact Balance: Choosing a balanced dataset is critical in research because it helps eliminate bias in data processing and interpretation. A balanced dataset has an equal or nearly equal number of samples from each class or group under consideration. This helps ensure that the statistical models used are not biased towards one group over another and that the findings are representative of the whole population. Because the existing datasets in the stance-based fake news detection area are not balanced, it was decided to use a well-balanced dataset.
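The paper does not state how exact balance was reached. One simple option, assumed here purely for illustration, is random undersampling of the majority class. The sketch below reports the share of each class and downsamples records (dictionaries with a hypothetical `label` key) until every class has the same size.

```python
import random
from collections import Counter


def class_balance(labels):
    """Return the fraction of samples that falls in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def downsample_to_balance(records, label_key="label", seed=42):
    """Randomly drop records from larger classes until all classes are equal in size."""
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)
    smallest = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)  # avoid blocks of one class in the output
    return balanced
```

Undersampling discards data; with a corpus of this size that cost is usually acceptable, whereas oversampling the minority class would duplicate articles.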

3) How the Dataset Was Labelled:
News articles collected from credible news sources such as bbc.com, and verified as real news by fact-checking websites such as AFP Fact Check and Snopes, were assigned the reliability label 1, whereas news gathered from untrustworthy sources such as gossip sites, and verified as fake news by the same fact-checking websites, was assigned the reliability label 0. Figure 3 shows the two classes of the captured dataset and how the news sources map to the labels 0 and 1.
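The labelling rule above can be written as a small function. This is a sketch under stated assumptions: the source sets are illustrative (`gossip-site.example` is a made-up placeholder, not one of the study's sources), fact-check verdicts are assumed to be normalized to the strings `"real"` and `"fake"`, and articles whose source and verdict disagree are assumed to be left unlabelled, a case the text does not cover.

```python
# Illustrative source sets; the study's full source lists are not reproduced here.
CREDIBLE_SOURCES = {"bbc.com"}
UNRELIABLE_SOURCES = {"gossip-site.example"}  # hypothetical placeholder domain


def assign_label(source_domain, fact_check_verdict):
    """Return 1 (real), 0 (fake), or None when source and verdict do not agree."""
    if source_domain in CREDIBLE_SOURCES and fact_check_verdict == "real":
        return 1
    if source_domain in UNRELIABLE_SOURCES and fact_check_verdict == "fake":
        return 0
    return None  # ambiguous case: assumed to be excluded from the dataset
```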

B. Constructing SherLock-FakeNewsNet Dataset
Before applying Natural Language Processing techniques, the dataset was cleaned. The NLTK toolkit [42] was used to remove punctuation, special characters and stop words from the text, and the BeautifulSoup tool was then used to remove noisy text, square brackets and URLs.
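The cleaning step can be sketched with the Python standard library alone. This is a stand-in, not the study's code: regular expressions replace BeautifulSoup for stripping markup (with `bs4` installed, `BeautifulSoup(raw, "html.parser").get_text()` would handle that part), and lower-casing is an added assumption. Stop word removal is left to a separate step.

```python
import html
import re
import string

TAG_RE = re.compile(r"<[^>]+>")              # HTML tags
BRACKETS_RE = re.compile(r"\[[^\]]*\]")      # "[...]" segments
URL_RE = re.compile(r"https?://\S+|www\.\S+")  # links
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(raw):
    """Strip markup, bracketed text, URLs, punctuation and extra whitespace."""
    text = html.unescape(TAG_RE.sub(" ", raw))   # drop tags, decode entities such as &amp;
    text = BRACKETS_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)           # ASCII punctuation
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # any remaining special characters
    return " ".join(text.lower().split())        # lower-case and collapse whitespace
```

URLs are removed before punctuation so that the whole link disappears instead of leaving fragments such as "httpstcox".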

C. Visualizing the SherLock-FakeNewsNet Dataset
Subsequently, WordNet [43] was used to visualize the SherLock-FakeNewsNet dataset, with a word cloud showing the frequency of the words in the corpus. Figure 4 shows the word-cloud visualization for the real texts, which are labelled 1.
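A word cloud is drawn from token frequencies, so the data behind a figure such as Figure 4 can be computed with a `Counter`. The function below is an illustrative sketch that assumes the texts are already cleaned and can be split on whitespace.

```python
from collections import Counter


def top_words(texts, k=10):
    """Return the k most frequent tokens across texts as (token, count) pairs."""
    counts = Counter(token for text in texts for token in text.split())
    return counts.most_common(k)
```

If the third-party `wordcloud` package is available, `WordCloud().generate_from_frequencies(dict(top_words(texts, k=200)))` renders such counts as an image; whether the study used that package is not stated.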

D. Applying Natural Language Processing Techniques
The Keras tokenizer [44] was used to tokenize the dataset. The tokenized dataset was then pre-processed with Keras text and sequence preprocessing utilities. Finally, the text data was fitted and represented using GloVe (Global Vectors) [45] word embeddings.
1) Tokenization: Tokens are the fundamental building blocks of NLP, and word tokenization, in which each word is represented as a token, is the most commonly used approach. Here, the NLTK tokenizer and the built-in Keras tokenizer were used to tokenize the text. The NLTK tokenizer used was ToktokTokenizer, a basic tokenizer designed to be fast and efficient, which makes it well suited to large-scale NLP applications; it tokenizes text rapidly because it relies only on simple rules based on whitespace and punctuation.
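The Keras `Tokenizer` and `pad_sequences` utilities named above build a frequency-ranked word index and turn each text into a fixed-length list of integer ids. The pure-Python sketch below imitates their default behaviour (index 0 reserved for padding, an out-of-vocabulary token at index 1, ids at or above `max_features` mapped to that token, and padding and truncation on the left) so that it runs without TensorFlow. It is not the Keras implementation: ties in frequency are broken alphabetically here, and real `maxlen` and `max_features` values would come from Table V, which is not reproduced.

```python
from collections import Counter


def build_word_index(texts, oov_token="<OOV>"):
    """Rank words by frequency; 0 is reserved for padding and 1 for unknown words."""
    counts = Counter(tok for text in texts for tok in text.split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))  # ties broken alphabetically
    index = {oov_token: 1}
    for i, word in enumerate(ranked, start=2):
        index[word] = i
    return index


def texts_to_padded(texts, word_index, maxlen, max_features, oov_token="<OOV>"):
    """Map words to ids, cap the vocabulary, and left-pad or left-truncate to maxlen."""
    oov = word_index[oov_token]
    padded = []
    for text in texts:
        ids = []
        for tok in text.split():
            i = word_index.get(tok, oov)
            ids.append(i if i < max_features else oov)  # rare words become OOV
        ids = ids[-maxlen:]                              # keep the last maxlen ids
        padded.append([0] * (maxlen - len(ids)) + ids)   # zeros in front
    return padded
```

Left padding keeps the real tokens next to the end of the sequence, which is where an LSTM's final hidden state is read.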

2) Stop Word Removal:
Stop words such as "the", "is", "in" and "for" were removed from the dataset to reduce its size and the time needed to train the model. The same NLTK toolkit was used to remove stop words and punctuation from the text.
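Stop word filtering reduces to a set-membership test per token. In the sketch below the stop word set is a small illustrative subset; NLTK's full English list is obtained with `nltk.corpus.stopwords.words("english")` after running `nltk.download("stopwords")`.

```python
# Small illustrative subset of English stop words (NLTK's list is much longer).
STOP_WORDS = {"the", "is", "in", "for", "a", "an", "and", "of", "to"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Drop stop words from lower-cased, whitespace-separated text."""
    return " ".join(tok for tok in text.split() if tok not in stop_words)
```

A `set` gives constant-time lookups, which matters when the filter runs over roughly 216k articles.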

3) Word-Embeddings:
Word embeddings map words to vectors of real numbers by embedding them mathematically in a continuous vector space with many dimensions per word. Word2Vec and GloVe provide pre-trained word embeddings for many languages, including English. Here, GloVe word embeddings were used to map words to vectors.
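In Keras, pre-trained vectors are usually handed to the Embedding layer as a weight matrix whose row i holds the vector of the word with index i. The sketch below parses GloVe's plain-text format (a word followed by its space-separated components) and builds such a matrix with NumPy, leaving zeros for padding and for words that GloVe does not cover. The file name in the usage note is an assumption; the paper does not say which GloVe release or dimensionality was used.

```python
import numpy as np


def load_glove(lines):
    """Parse GloVe text-format lines into a {word: vector} dictionary."""
    vectors = {}
    for line in lines:
        word, *values = line.rstrip().split(" ")
        vectors[word] = np.asarray(values, dtype="float32")
    return vectors


def build_embedding_matrix(word_index, vectors, max_features, embed_size):
    """Row i holds the GloVe vector of the word with index i; other rows stay zero."""
    matrix = np.zeros((max_features, embed_size), dtype="float32")
    for word, i in word_index.items():
        if i < max_features and word in vectors:
            matrix[i] = vectors[word]
    return matrix
```

For example, `with open("glove.6B.100d.txt", encoding="utf-8") as f: vectors = load_glove(f)` would load the 100-dimensional vectors from the Stanford GloVe 6B release.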

4) Why GloVe Word-Embeddings:
GloVe stands for Global Vectors for Word Representation. It is based on factorizing a word co-occurrence count matrix: GloVe builds a large matrix of co-occurrence information from the overall statistics of word co-occurrence in a corpus. GloVe embeddings make it easy to represent words as vectors across a large number of contexts. Because training word vectors from scratch on real-world corpora takes a long time, pre-trained GloVe embeddings were used for this task.
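For reference, the original GloVe formulation fits word vectors to the co-occurrence matrix by minimizing the weighted least-squares objective below, where X_ij counts how often word j occurs in the context of word i, V is the vocabulary size, w_i and w̃_j are word and context vectors, and b_i and b̃_j are biases. The weighting function f gives zero weight to pairs that never co-occur and caps the influence of very frequent pairs; the original work used x_max = 100 and α = 3/4.

```latex
J = \sum_{i,j=1}^{V} f\!\left(X_{ij}\right)
    \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
  \left( x / x_{\max} \right)^{\alpha} & \text{if } x < x_{\max},\\
  1 & \text{otherwise.}
\end{cases}
```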

5) Why GloVe over Word2Vec:
GloVe has an advantage over Word2Vec in that it considers the global context of words in a corpus rather than only the local context around each word. It does this by first building a co-occurrence matrix, which records how frequently each pair of words co-occurs in the corpus, and then factorizing this matrix to generate the word embeddings. This global view may help GloVe capture more complex word associations. GloVe can also be computationally efficient: after one pass over the corpus to build the co-occurrence matrix, training iterates over the non-zero matrix entries rather than repeatedly over the full corpus, which can shorten training on very large corpora.
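The co-occurrence matrix described above can be built in a few lines. This NumPy sketch counts word pairs within a symmetric context window and, following the original GloVe formulation, weights a pair that is d words apart by 1/d; the window size of 2 is an arbitrary default for the example. It only builds the matrix X and does not fit the vectors.

```python
import numpy as np


def cooccurrence_matrix(tokenized_docs, vocab, window=2):
    """Symmetric co-occurrence counts, weighted by 1/distance within the window."""
    idx = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(vocab)), dtype=np.float64)
    for tokens in tokenized_docs:
        for pos, word in enumerate(tokens):
            if word not in idx:
                continue  # out-of-vocabulary words are skipped but keep their position
            for offset in range(1, window + 1):
                if pos + offset >= len(tokens):
                    break
                other = tokens[pos + offset]
                if other in idx:
                    i, j = idx[word], idx[other]
                    X[i, j] += 1.0 / offset  # count the pair in both directions
                    X[j, i] += 1.0 / offset
    return X
```

GloVe then fits word and context vectors to the logarithms of the non-zero entries of X.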

E. Dividing the Dataset into Training and Test Sets
The proposed hybrid neural network was trained by using 90% of the dataset and retaining 10% for testing the resulting model.
1) Why a 90:10 Train/Test Split Ratio: This ratio is widely used because it strikes a reasonable compromise between having enough data to train the model efficiently and having enough data to assess its performance reliably. With a larger training set (90%), the model can learn from more instances and develop a better understanding of the underlying patterns in the data, which can improve its performance on new, previously unseen data. Because the dataset is large, the smaller testing set (10%) still holds about 21,668 of the 216,682 articles, which is enough to test the model on a broad range of cases it has never seen. Comparing training and test performance in this way helps to identify overfitting, where the model performs well on the training data but poorly on new data.
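The paper does not say whether the 90:10 split was stratified. Assuming it was, so that both classes keep the same share in the training and test sets, the split can be sketched as below; with scikit-learn the equivalent call is `train_test_split(X, y, test_size=0.1, stratify=y, random_state=42)`, where the seed value is arbitrary.

```python
import random


def stratified_split(records, label_key="label", test_fraction=0.1, seed=42):
    """Split records so that each class keeps the same share in train and test."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)
    train, test = [], []
    for group in by_class.values():
        group = group[:]  # copy so the caller's list is not reordered
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```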

September 2023
International Journal on Advances in ICT for Emerging Regions

F. Proposed Hybrid Neural Network Architecture
The Keras Sequential model was chosen as the architecture for the proposed hybrid approach. Its first layer is an Embedding layer, whose embedding size and maximum number of features are defined in Table V. A Dropout layer is then applied to drop part of the embedded context. After that, Conv1D and MaxPooling layers are added to extract higher-level features, followed by two Long Short-Term Memory layers that capture the context of the input text and the interdependencies among word sequences. In the final stages, Dropout layers are selected to reduce overfitting, and Dense layers are selected to improve the efficacy of the hybrid model. Two activation functions are used: ReLU in the Conv1D layers and Sigmoid in the Dense layers. Various loss functions and optimizers were tried with the proposed hybrid model; after hyperparameter tuning, Adam was identified as the most suitable optimizer and Binary Cross-Entropy as the most suitable loss function, as stated in Table IX. Table IV presents the structure of the hybrid neural network, with the layer type, output shape and number of parameters of each layer.
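A Keras sketch of the layer sequence described above follows. Only the ordering, the activations, the optimizer and the loss come from the text; the layer widths, kernel and pool sizes, dropout rates and the three size constants are hypothetical stand-ins for the values in Tables IV and V. An optional GloVe matrix of shape `(MAX_FEATURES, EMBED_SIZE)` is copied into the Embedding layer and frozen, which is also an assumption.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical hyperparameters; the values in Table V are not reproduced here.
MAX_FEATURES = 10000  # vocabulary size ("max features") of the Embedding layer
EMBED_SIZE = 100      # embedding size, e.g. 100-dimensional GloVe vectors
MAXLEN = 300          # length of the padded input sequences


def build_hybrid_model(embedding_matrix=None):
    """Embedding -> Dropout -> Conv1D/MaxPooling1D -> 2 x LSTM -> Dropout -> Dense."""
    embedding = layers.Embedding(MAX_FEATURES, EMBED_SIZE)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(MAXLEN,), dtype="int32"),
        embedding,
        layers.Dropout(0.2),
        layers.Conv1D(64, 5, activation="relu"),  # local n-gram features
        layers.MaxPooling1D(4),
        layers.LSTM(64, return_sequences=True),   # passes the whole sequence on
        layers.LSTM(32),                          # summarizes it in one vector
        layers.Dropout(0.3),
        layers.Dense(16, activation="sigmoid"),   # the paper reports sigmoid in the Dense layers
        layers.Dense(1, activation="sigmoid"),    # probability of label 1 (real)
    ])
    if embedding_matrix is not None:
        # Copy pre-trained vectors into the (already built) layer and freeze them.
        embedding.set_weights([np.asarray(embedding_matrix, dtype="float32")])
        embedding.trainable = False
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Training as reported would then be along the lines of `model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))`, with padded id sequences as inputs and 0/1 labels as targets.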
1) Why Convolutional Neural Network: Convolutional Neural Networks (CNNs) were originally developed for computer vision tasks such as image classification and object recognition, but they have also been applied successfully to Natural Language Processing (NLP) tasks such as text classification, sentiment analysis and language modelling. A key strength of CNNs is their ability to capture local structure within an input; in NLP, this corresponds to capturing n-grams, or sequences of adjacent words, which can be important for understanding the meaning of a sentence or document. CNNs also share parameters across all positions of the input, which reduces the number of parameters to be trained and helps the model generalize to new data. Combined with pooling, CNNs can cope with variations in input length, which is advantageous for NLP tasks where the length of the input varies greatly. CNNs can also benefit from pre-training on large amounts of unlabelled data, which can improve their performance on downstream NLP tasks. Overall, while traditional NLP methods such as Bag-of-Words and n-gram models have been effective for many tasks, CNNs offer a powerful and flexible alternative that captures local structure, shares parameters and generalizes well to new data. Figure 6 shows the detailed process and the deep neural network architecture of the proposed hybrid approach.

2) Why Recurrent Neural Network: Recurrent Neural Networks (RNNs) have been widely used in NLP tasks such as language modelling, machine translation, sentiment analysis and speech recognition. RNNs are designed to analyze sequential data such as language, where the meaning of a word, sentence or document depends on the context of the words that came before it, which makes them well suited to NLP tasks. RNNs have a "memory" (a hidden state carried from one time step to the next) that in principle lets them recognize dependencies in the input over long spans. This is especially helpful for NLP tasks, because the meaning of a sentence may depend on its whole input history. RNNs can also handle variable-length inputs, which is crucial for NLP tasks where the input length varies significantly. Overall, RNNs are a powerful tool for NLP tasks involving sequential data: they handle variable-length inputs, capture long-term dependencies and can be pre-trained on large amounts of data, making them a versatile and effective choice for a wide range of NLP tasks.
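The two CNN properties discussed above, n-gram windows and parameter sharing, are visible in a direct NumPy sketch of what a Conv1D layer with ReLU followed by MaxPooling1D computes on a sequence of word vectors. It is illustrative only (Keras uses optimized kernels); the assumed shapes are `(seq_len, embed_dim)` for `x` and `(k, embed_dim, n_filters)` for `kernels`.

```python
import numpy as np


def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution with ReLU over a (seq_len, embed_dim) matrix.

    The same kernel weights slide over every window of k consecutive word
    vectors, which is the parameter sharing described in the text.
    """
    k = kernels.shape[0]
    steps = x.shape[0] - k + 1
    out = np.empty((steps, kernels.shape[2]))
    for t in range(steps):
        window = x[t:t + k]  # k adjacent words: one n-gram
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU


def max_pool1d(x, pool):
    """Non-overlapping max pooling along the time axis; any remainder is dropped."""
    steps = x.shape[0] // pool
    return x[: steps * pool].reshape(steps, pool, x.shape[1]).max(axis=1)
```

A sequence of L word vectors yields L - k + 1 convolution outputs, and pooling with size p then keeps floor((L - k + 1) / p) of them; since the same weights are applied at every position, the parameter count does not grow with L.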

3) Why Long Short-Term Memory Neural Network:
LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN) that has been widely used in Natural Language Processing (NLP) tasks such as language modelling, machine translation, sentiment analysis and speech recognition. In NLP applications where the meaning of a phrase may depend on the complete history of the input, the ability of LSTMs to manage long-term dependencies is crucial. LSTMs use a memory cell, controlled by gates that selectively add or remove information over time, to capture long-term dependencies while largely avoiding the vanishing gradient problem that affects plain RNNs. Like other RNNs, LSTMs can handle variable-length inputs, which is crucial for NLP applications where the input length can vary significantly. LSTMs are also considered fairly robust to noisy or incomplete input, which suits NLP tasks where the text may be noisy. Overall, LSTMs are a powerful tool for NLP tasks involving sequential data with long-term dependencies: they handle variable-length inputs, tolerate noise and can be pre-trained on large amounts of data, making them a versatile and effective choice for a wide range of NLP tasks.
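The gating described above can be made concrete with a NumPy sketch of a single LSTM cell using the standard equations (input gate i, forget gate f, candidate g, output gate o). It is an illustration rather than Keras's implementation; the weight layout and the i, f, g, o ordering are conventions chosen for this sketch. Because the forget gate multiplies the previous cell state, a forget gate near 1 and an input gate near 0 carry the memory forward almost unchanged, which is how the cell preserves long-range information.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (input_dim, 4 * units), U: (units, 4 * units), b: (4 * units,).
    """
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)               # candidate memory content
    c_t = f * c_prev + i * g     # forget part of the old memory, add new content
    h_t = o * np.tanh(c_t)       # expose a gated view of the memory as output
    return h_t, c_t


def lstm_forward(xs, W, U, b, units):
    """Run the cell over a sequence of any length and return the final hidden state."""
    h = np.zeros(units)
    c = np.zeros(units)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h
```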

IV. RESULTS AND EVALUATION
The proposed hybrid neural network is trained on a 90:10 train/test split of the dataset. After 10 epochs with the Adam optimizer and the Binary Cross-Entropy loss function, it reaches accuracies of 94% on the training set and 91.37% on the testing set. The model is trained on Google Colab and Kaggle Kernels, using Kaggle TPUs and TensorFlow TPUs to run it.

1) Why TensorFlow:
TensorFlow is an open-source machine learning library for a wide range of deep learning tasks. It can run on multiple CPUs, GPUs and TPUs, and its high-level APIs, such as Keras with eager execution, make it easy to build machine learning and deep learning models. It also supports training and deploying models in the cloud, on devices, on-premises and in the browser.

2) Why Google Colab and Kaggle Kernels:
Google Colab and Kaggle Kernels are used as the platforms for running the proposed deep learning model, with TensorFlow TPUs and Kaggle TPUs used for execution. The execution time differs by more than one hour between TPUs and GPUs, with TPUs being faster, because TPUs are more powerful than GPUs and CPUs for this kind of large matrix workload. Both Kaggle Kernels and Google Colab support Jupyter notebooks, which can be saved easily in their cloud-hosted environments. Kaggle and Google Colab also provide many built-in datasets for data science tasks, which can easily be integrated into projects.

A. Comparing Results with Different Models
First, two baseline models were evaluated: a Convolutional Neural Network (CNN) model and a Recurrent Neural Network (RNN) model with Long Short-Term Memory (LSTM) units. The proposed hybrid model was then trained on the same dataset. Of these three models, the proposed hybrid model achieved the highest accuracy, 92%.
Table VI lists the results of the different models, including the accuracy and loss values of each model.

B. Evaluation Results of the Proposed Hybrid Model
To evaluate the performance of the proposed hybrid model, Precision, Recall and F1-Score were computed. For the fake and not fake classes respectively, the model obtained Precision scores of 0.92 and 0.91, Recall scores of 0.90 and 0.93, and F1-Scores of 0.91 and 0.92.
Table VII displays the Precision, Recall and F1-Score results, together with the Accuracy, Macro Avg and Weighted Avg on the testing set. The F1-Score is the harmonic mean of Precision and Recall. The results in Table VII show that the Precision and Recall scores are relatively close, so the F1-Scores of 0.91 and 0.92 for the fake and not fake classes indicate that the proposed hybrid model performs well on both classes.
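Since the F1-Score is the harmonic mean of Precision and Recall, the reported values can be checked directly. The sketch below computes per-class Precision, Recall and F1 from confusion-matrix counts, plus the macro and weighted averages used in classification reports; the function names are illustrative. With the fake-class values above (P = 0.92, R = 0.90) it gives about 0.91, while P = 1.0 and R = 0.5 give 0.67 rather than the arithmetic mean of 0.75.

```python
def harmonic_f1(precision, recall):
    """F1-Score: the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def prf_from_counts(tp, fp, fn):
    """Per-class precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, harmonic_f1(precision, recall)


def macro_and_weighted(per_class_scores, supports):
    """Macro average (plain mean) and weighted average (weighted by class support)."""
    macro = sum(per_class_scores) / len(per_class_scores)
    weighted = sum(s * n for s, n in zip(per_class_scores, supports)) / sum(supports)
    return macro, weighted
```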
The confusion matrix [46] and the classification report were generated with scikit-learn.
Table VIII shows the confusion matrix of the testing dataset, which covers the four combinations of actual and predicted class: true positives, false positives, true negatives and false negatives.
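With scikit-learn, the confusion matrix and classification report for a sigmoid-output model could be produced as sketched below. The 0.5 decision threshold and the class names are assumptions; the label convention (0 = fake, 1 = real, i.e. not fake) follows the labelling described in Section III. For labels `[0, 1]`, `confusion_matrix(...).ravel()` returns the counts in the order TN, FP, FN, TP, treating class 1 as positive.

```python
from sklearn.metrics import classification_report, confusion_matrix


def evaluate_binary(y_true, y_prob, threshold=0.5):
    """Threshold sigmoid outputs, then build the confusion matrix and text report."""
    y_pred = [int(p >= threshold) for p in y_prob]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    report = classification_report(
        y_true,
        y_pred,
        labels=[0, 1],
        target_names=["fake (0)", "not fake (1)"],
        digits=2,
    )
    return (tn, fp, fn, tp), report
```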

C. Evaluation Results of Training and Testing Sets
After the neural network was executed, the Matplotlib [47] library was used to produce graphs and display the data, since Matplotlib is well suited to presenting the outcomes of neural networks. After executing the proposed hybrid model, figure 7 … Table IX, Adam is chosen as the best optimizer to fit with the other hyperparameters of the proposed hybrid neural network.
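Training curves of this kind can be drawn with Matplotlib from the `history` dictionary returned by Keras's `model.fit`. The sketch assumes the model was fitted with `validation_data` set to the test split, so that Keras records `val_accuracy` and `val_loss` next to `accuracy` and `loss`; the output file name is arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen, e.g. on a Colab or Kaggle worker
import matplotlib.pyplot as plt


def plot_history(history, path="training_curves.png"):
    """Plot per-epoch train/test accuracy and loss from a Keras-style history dict."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    epochs = range(1, len(history["accuracy"]) + 1)
    ax_acc.plot(epochs, history["accuracy"], label="train")
    ax_acc.plot(epochs, history["val_accuracy"], label="test")
    ax_acc.set(title="Accuracy", xlabel="Epoch", ylabel="Accuracy")
    ax_loss.plot(epochs, history["loss"], label="train")
    ax_loss.plot(epochs, history["val_loss"], label="test")
    ax_loss.set(title="Loss", xlabel="Epoch", ylabel="Loss")
    for ax in (ax_acc, ax_loss):
        ax.legend()
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)  # free the figure when plotting inside a loop or notebook
    return path
```

After `history = model.fit(...)`, calling `plot_history(history.history)` writes both panels to one image.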

E. Experimental Results of the Hybrid Model
Related datasets from the stance-based fake news detection area were chosen to evaluate the proposed hybrid model, using the same experimental setup as for the 'SherLock-FakeNewsNet' dataset.

1) Experiment 01 Results:
In the first experiment, the proposed hybrid model is evaluated on the "FakeNewsNet" [48] dataset, and Table X presents the results. The precision score of 0.86 indicates that, of all the news articles classified as fake, 86% were actually fake. The recall score of 0.88 indicates that the model correctly identified 88% of all the fake news articles in the dataset. The F1-Score of 0.87 is the harmonic mean of the precision and recall scores, so it takes both into account. The support value of 6363 is the number of samples used to calculate these metrics. The accuracy of 0.87 is the proportion of correctly classified samples out of all the samples in the dataset.

2) Experiment 02 Results:
In the second experiment, the proposed hybrid model is evaluated on the "FA-KES" [49] dataset, and Table XI presents the results. The precision score of 0.87 indicates that, of all the news articles classified as fake, 87% were actually fake. The recall score of 0.92 indicates that the model correctly identified 92% of all the fake news articles in the dataset. The F1-Score of 0.89 is the harmonic mean of the precision and recall scores, so it takes both into account. The support value of 1980 is the number of samples used to calculate these metrics. The accuracy of 0.88 is the proportion of correctly classified samples out of all the samples in the dataset.

3) Experiment 03 Results:
In the third experiment, the proposed hybrid model is evaluated on the "LIAR" [50] dataset. The precision score of 0.88 indicates that, of all the news articles classified as fake, 88% were actually fake. The recall score of 0.84 indicates that the model correctly identified 84% of all the fake news articles in the dataset. The F1-Score of 0.86 is the harmonic mean of the precision and recall scores, so it takes both into account. The support value of 1284 is the number of samples used to calculate these metrics.

4) Experiment 04 Results:
In the fourth experiment, the proposed hybrid model is evaluated on the "ISOT" [51] dataset, and Table XIII presents the results. The precision score of 0.89 indicates that, of all the news articles classified as fake, 89% were actually fake. The recall score of 0.85 indicates that the model correctly identified 85% of all the fake news articles in the dataset. The F1-Score of 0.87 is the harmonic mean of the precision and recall scores, so it takes both into account. The support value of 1869 is the number of samples used to calculate these metrics.

5) Experiment 05 Results:
As for the fifth experiment, the proposed hybrid model is evaluated on the "FNC-1" [52] dataset, and Table XIV presents the results. The precision score of 0.87 indicates that, of all the news articles classified as fake, 87% were actually fake. The recall score of 0.87 indicates that, of all the fake news articles in the dataset, the model correctly identified 87%. The F1-Score of 0.87 is the harmonic mean of the precision and recall scores, so it takes both into account. The support value of 40462 is the number of samples used to calculate these metrics.
Based on the results of the five experiments conducted on different datasets, the proposed hybrid CNN and RNN-LSTM model is effective for detecting fake news. The precision scores ranged from 0.86 to 0.89, meaning that a high percentage of the articles the model flagged as fake were actually fake. The recall scores ranged from 0.84 to 0.92, meaning that the model found a high percentage of the fake news articles present in each dataset. The F1-Scores ranged from 0.86 to 0.89, showing that the model balances precision and recall. The overall accuracy ranged from 0.87 to 0.88, meaning that the model correctly classified a high percentage of the samples. Therefore, the proposed hybrid CNN and RNN-LSTM model is a reliable and effective approach for detecting fake news across these datasets. However, further research is needed to explore the potential of this model on larger and more diverse datasets.
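Because the F1-Score is the harmonic mean of precision and recall, the reported values can be cross-checked. The short Python sketch below recomputes F1 from the precision and recall reported for Experiments 02 to 05 and rounds it to two decimals. The names `reported` and `harmonic_f1` are ours. Since the inputs are already rounded, a mismatch in the last digit would not by itself indicate an error.

```python
# Precision, recall, and F1-Score as reported for Experiments 02-05.
reported = {
    "FA-KES": (0.87, 0.92, 0.89),
    "LIAR": (0.88, 0.84, 0.86),
    "ISOT": (0.89, 0.85, 0.87),
    "FNC-1": (0.87, 0.87, 0.87),
}


def harmonic_f1(precision: float, recall: float) -> float:
    """F1-Score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for name, (p, r, f1) in reported.items():
    recomputed = round(harmonic_f1(p, r), 2)
    print(f"{name}: reported F1 = {f1:.2f}, recomputed = {recomputed:.2f}")
```

For all four datasets, the recomputed value matches the reported F1-Score to two decimals.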

V. CONCLUSION AND FUTURE WORK
Among the three models evaluated, namely the CNN model, the RNN model, and the Hybrid model, the proposed hybrid neural network architecture achieved 92% accuracy after 10 epochs with the Adam optimizer and the Binary Cross-Entropy loss function, when compared with the available datasets and research works on stance-based fake news detection. The proposed Hybrid approach, based on CNN and RNN-LSTM models, therefore achieves the highest accuracy rate of 92%, outperforming most of the conventional models it was compared with.
Based on the results of the five experiments conducted with the proposed hybrid model, we conclude that this hybrid approach yields the highest Precision, Recall, and F1-Score because the training dataset included different types of messages, such as long and short messages. The testing dataset also included a variety of messages. The hybrid approach relies mainly on CNNs to extract higher-level features from the input text. RNN-LSTMs then capture the long-term dependencies in the word sequences. Finally, Dropout layers are carefully selected to reduce overfitting, and Dense layers are carefully selected to enhance the efficiency of the hybrid model.
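The division of labour described here has two stages. Convolution first extracts local, n-gram-like features, and pooling then shortens the sequence that the LSTM receives. The minimal pure-Python sketch below shows one Conv1D filter ('valid' padding, stride 1) followed by non-overlapping max pooling over a sequence of word-embedding vectors. The toy embeddings, the kernel weights, the ReLU activation, and the helper names `conv1d_valid` and `max_pool1d` are all hypothetical. A real model would use Keras `Conv1D` and `MaxPooling1D` layers with many learned filters and a bias.

```python
from typing import List

Vector = List[float]


def conv1d_valid(seq: List[Vector], kernel: List[Vector]) -> List[float]:
    """Slide one filter over the sequence ('valid' padding, stride 1)."""
    k = len(kernel)
    out = []
    for start in range(len(seq) - k + 1):
        # Dot product between the kernel and a window of k consecutive word vectors.
        total = 0.0
        for offset in range(k):
            total += sum(w * x for w, x in zip(kernel[offset], seq[start + offset]))
        out.append(max(0.0, total))  # ReLU activation (an assumption for this sketch)
    return out


def max_pool1d(values: List[float], pool_size: int) -> List[float]:
    """Non-overlapping max pooling (stride = pool_size, incomplete windows dropped)."""
    return [max(values[i:i + pool_size])
            for i in range(0, len(values) - pool_size + 1, pool_size)]


# Hypothetical 2-dimensional embeddings for a 6-token sentence.
sentence = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]]
# Hypothetical bigram filter: fires on a [1, 0]-like word followed by a [0, 1]-like word.
bigram_kernel = [[1, 0], [0, 1]]

features = conv1d_valid(sentence, bigram_kernel)
pooled = max_pool1d(features, pool_size=2)
print(features)  # prints [2.0, 1.0, 1.0, 0.0, 2.0]
print(pooled)    # prints [2.0, 1.0]
```

The filter responds most strongly (2.0) wherever its bigram pattern occurs. Pooling keeps the strongest local response per window and halves the sequence length, so the LSTM processes a shorter, feature-rich sequence instead of raw embeddings.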
Tables X to XIV, which report the results of the five experiments conducted on different datasets, show that the proposed hybrid CNN and RNN-LSTM model performs well in detecting fake news articles. The precision scores ranging from 0.86 to 0.89 indicate that a high percentage of the articles classified as fake by the model were actually fake. The recall scores ranging from 0.84 to 0.92 indicate that the model correctly identified a high percentage of the fake news articles present in each dataset. The F1-Scores ranging from 0.86 to 0.89 are the harmonic mean of the precision and recall scores, taking both into account. Additionally, the accuracy scores ranging from 0.87 to 0.88 indicate that the model correctly classified a high proportion of the samples. Overall, the results suggest that the proposed hybrid model generalizes well across different datasets and detects fake news articles effectively.
Since this paper considered only Deep Learning and Natural Language Processing mechanisms to identify stance-based fake news, applying other techniques from supervised, semi-supervised, and unsupervised learning to this task remains an open area.
In terms of future developments, the proposed hybrid model will be extended with newer methods such as Transformers, Auto-Encoders, Federated Learning APIs, and Graph Neural Networks. Finally, this research study will be continued as a public Kaggle Notebook so that new experiments can be performed over time.

Fig. 1. Palestinians recognizing Texas as part of Mexico

• User-based: This news type contains user profiles and news aimed at their intended audience.
• Post-based: This news type covers news introduced through posts on social media sites.
• Network-based: This news type focuses on selected groups.
• Knowledge-based: This news type provides explanations for unsolved problems.
• Style-based: This news type focuses on delivering information to the audience in a misleading way.
• Stance-based: This news type concerns how the statement is made in a news article.

Fig. 4. Word cloud for real text (label 1)
Figure 5 shows the word cloud for fake text, which is labelled as 0.

Fig. 6: Detailed methodology architecture
Figure 7 and Figure 8 show Matplotlib plots of the training and testing results. The model achieves an accuracy of 0.9204 and a loss of 0.1889 on the training set, and an accuracy of 0.9137 and a loss of 0.2147 on the testing set.

Fig. 7. Matplotlib plot of training and testing accuracy

Fig. 8. Matplotlib plot of training and testing loss

D. Hyperparameter Tuning for the Hybrid Model
Hyperparameter tuning for the proposed hybrid model is performed over 10 epochs, with Binary Cross-Entropy as the loss function. This loss function was chosen because the 'SherLock-FakeNewsNet' dataset has two classes. Table IX presents the results for different optimizers.
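Binary Cross-Entropy suits a two-class dataset because it scores a single predicted probability per article against a 0/1 label. The following is a minimal standard-library sketch of the mean loss over a batch. The function name, the clipping constant `eps`, and the example labels and probabilities are our assumptions. Following the labelling used for the word clouds, 1 denotes real text and 0 denotes fake text.

```python
import math
from typing import Sequence


def binary_cross_entropy(y_true: Sequence[int], y_prob: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities away from 0 and 1 so that log() stays finite.
        p = min(max(p, eps), 1 - eps)
        # Penalise low probability on the true class: -[y*log(p) + (1-y)*log(1-p)].
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical labels (1 = real, 0 = fake) and model output probabilities of "real".
labels = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.8, 0.3]
print(round(binary_cross_entropy(labels, probs), 4))  # prints 0.1976
```

Confident correct predictions (0.9 for a real article, 0.1 for a fake one) contribute little to the loss, whereas the less confident 0.3 for a fake article contributes the most.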

TABLE I MOST VIEWED FAKE NEWS STORIES ON FACEBOOK IN 2019

TABLE II RELATED TOOLS
Title | Applied Technologies/Algorithm | Description
News Verify [36] | Sentiment Analysis, …

Table III shows the number of instances in each of the two classes.

TABLE IV STRUCTURE OF THE HYBRID NEURAL NETWORK

Table V lists the defined parameters of the hybrid neural network: batch size, embedding size, max features, max length, and number of epochs.
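The max features and max length parameters control how raw text becomes fixed-size model input. Only the most frequent words are kept in the vocabulary, and every sequence is padded or truncated to the same length. The Python sketch below imitates that behaviour using only the standard library. The helper names and the toy values `max_features=5` and `max_len=3` are hypothetical and are not the values in Table V. Padding and truncating at the front mirrors the default `'pre'` setting of Keras `pad_sequences`, but this is not the Keras implementation.

```python
from collections import Counter
from typing import Dict, List


def build_vocab(texts: List[str], max_features: int) -> Dict[str, int]:
    """Map the max_features most frequent words to indices 1..max_features (0 = padding)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # most_common() breaks ties by first occurrence, so the indices are deterministic.
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common(max_features))}


def texts_to_padded(texts: List[str], vocab: Dict[str, int], max_len: int) -> List[List[int]]:
    """Convert texts to index sequences of exactly max_len (assumes max_len > 0).

    Out-of-vocabulary words are dropped; long sequences keep their last max_len
    indices and short ones are left-padded with 0.
    """
    padded = []
    for text in texts:
        seq = [vocab[w] for w in text.lower().split() if w in vocab]
        seq = seq[-max_len:]                               # truncate from the front
        padded.append([0] * (max_len - len(seq)) + seq)    # pad at the front
    return padded


texts = ["fake news spreads fast", "real news is checked", "fake claims spread"]
vocab = build_vocab(texts, max_features=5)
print(vocab)  # prints {'fake': 1, 'news': 2, 'spreads': 3, 'fast': 4, 'real': 5}
print(texts_to_padded(texts, vocab, max_len=3))  # prints [[2, 3, 4], [0, 5, 2], [0, 0, 1]]
```

The first sentence loses its leading word to truncation. The third sentence keeps only "fake", because "claims" and "spread" fall outside the five-word vocabulary.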

TABLE V DETAILS OF THE DEFINED PARAMETERS

TABLE VII EVALUATION RESULTS OF THE PROPOSED HYBRID MODEL
According to Table VII, the proposed hybrid model achieves Recall scores of 0.90 and 0.93. Since recall is also known as sensitivity, these scores reflect the high sensitivity of the proposed hybrid model.

TABLE VIII TESTING
Table IX represents the results of different optimizers.

TABLE IX RESULTS FOR DIFFERENT OPTIMIZERS

TABLE X EXPERIMENT 01 RESULTS FROM THE PROPOSED HYBRID MODEL
Table XII represents the results of the experiment.
Table XIV represents the results of the experiment.
September 2023, International Journal on Advances in ICT for Emerging Regions