I.J. Information Technology and Computer Science, 2024, 4, 56-65
Published Online on August 8, 2024 by MECS Press (http://www.mecs-press.org/)
DOI: 10.5815/ijitcs.2024.04.04
This work is open access and licensed under the Creative Commons CC BY 4.0 License. Volume 16 (2024), Issue 4
Securing the Internet of Things: Evaluating
Machine Learning Algorithms for Detecting IoT
Cyberattacks Using CIC-IoT2023 Dataset
American International University-Bangladesh (AIUB), Dhaka, 1229, Bangladesh
E-mail: akinul@aiub.edu
ORCID iD: https://orcid.org/0000-0002-2942-6780
Arjun Kumar Bose Arnob
American International University-Bangladesh (AIUB), Dhaka, 1229, Bangladesh
E-mail: arjunkumarbosu@gmail.com
ORCID iD: https://orcid.org/0009-0003-2244-2328
Received: 06 November 2023; Revised: 02 January 2024; Accepted: 26 March 2024; Published: 08 August 2024
Abstract: The proliferation of the Internet of Things (IoT) has led to an increase in cyber threats directed at interconnected devices, which necessitates comprehensive defenses against evolving attack vectors. This research investigates the use of machine learning (ML) prediction models to identify and defend against cyberattacks targeting IoT networks. Central emphasis is placed on a thorough examination of the CIC-IoT2023 dataset, an extensive collection comprising a wide range of Distributed Denial of Service (DDoS) assaults on diverse IoT devices, which provides a practical and comprehensive benchmark for assessment. This study develops and compares four distinct machine learning models, Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF), to determine their effectiveness in detecting and preventing cyber threats to the IoT. The assessment incorporates a range of performance indicators: accuracy, precision, recall, and F1-score. The results emphasize the superior performance of DT and RF, which achieve accuracy rates of 0.9919 and 0.9916, respectively. Their high precision, recall, and F1-scores show that these models differentiate well between benign and malicious packets, and the precision-recall curves and confusion matrices provide additional evidence that DT and RF are strong contenders in the field of IoT intrusion detection. KNN also achieves a noteworthy accuracy of 0.9380. LR, on the other hand, has the lowest accuracy, 0.8275, underscoring its limited ability to classify these threats. Combined with the realistic and diverse characteristics of the CIC-IoT2023 dataset, the study's empirical assessments provide useful knowledge for determining the most effective machine learning algorithms and fortification strategies to protect IoT infrastructures. Furthermore, this study offers suggestions for subsequent inquiries, urging the examination of unsupervised learning approaches and the incorporation of deep learning models to decipher complex patterns within IoT networks. These developments have the potential to strengthen cybersecurity protocols for IoT ecosystems, reduce the impact of emergent risks, and promote robust defense systems against ever-changing cyber challenges.
Index Terms: Internet of Things, Cybersecurity, Machine Learning, DDoS Attacks, CIC-IoT2023 Dataset.
1. Introduction
The IoT has become a crucial aspect of our daily lives, and because of its expanding use, there has been a rising number of cyberattacks on IoT devices. Security professionals and academics are extremely concerned about the current situation of IoT cyberattacks. IoT device threats fall under several areas, including network assaults, software attacks, and physical attacks. Node cloning attacks are one type of physical assault that allows for node replication and network access [1]. Advanced Persistent Threat (APT) assaults are one type of software attack that allows attackers to access a system while going undetected for lengthy periods [2]. Attackers can overwhelm a network with traffic and bring it to a halt using DDoS assaults [3]. IoT device security and privacy are significant problems, and poor authorization and authentication can result in privacy issues at the device level [4]. IoT device vulnerabilities and threats are growing daily; therefore, it is critical to create strong defenses to keep these devices safe. Encryption, authentication, and access control are a few of the available countermeasures [5]. It is critical to keep up with the most recent security techniques and technological advancements to prevent assaults on IoT devices.
Cyberattacks on the IoT are becoming more common, and their frequency is rising. According to [6], the overall average number of weekly attacks on IoT devices per business increased by 41% in the first two months of 2023 compared to 2022. The most often targeted IoT devices are those found in European businesses, with APAC- and Latin American-based corporations following behind. On average, 54% of organizations experience attempted cyberattacks every week. IoT device threats can be divided into many types, including network assaults, software attacks, and physical attacks [7]. Node cloning attacks are one type of physical assault that allows for node replication and network access. APT assaults are one type of software attack in which an attacker can enter a system and remain undiscovered for a lengthy period [8]. DDoS assaults on networks can overwhelm the system with traffic and bring it to a halt. The creation of efficient defenses against cyberattacks, such as access control, authentication, and encryption, is crucial [9].
As a result of these increasing concerns, machine learning (ML) algorithms have surfaced as crucial instruments for proactively identifying and mitigating cyber threats in IoT ecosystems. By capitalizing on pre-existing datasets and conducting statistical analysis, machine learning techniques have demonstrated their capacity to detect threats early, identify network vulnerabilities, and decrease operational expenses [10, 11]. Despite these developments, a definitive benchmark of the most effective machine learning algorithms for detecting IoT cyber threats has yet to be established, leaving a significant gap in IoT cybersecurity research [6, 7]. A report on ML-based identification of malware in executable files states that ML techniques have been used worldwide to solve a variety of computer security issues, including intrusion detection, fraud detection, ransomware recognition, and malware detection [12]. To prevent cyberattacks on IoT devices, it is vital to stay up to speed with the most recent technologies and approaches, and ML algorithms are now used in cybersecurity to detect and mitigate cyberattacks [13].
This research endeavors to fill this critical gap by constructing and comparing machine learning prediction models that use the CIC-IoT2023 dataset to identify intrusions targeting IoT devices. The dataset presents a practical standard that includes a wide range of DDoS attacks on different IoT devices, thereby offering a broad spectrum for assessing the effectiveness of ML algorithms in the context of IoT cybersecurity [14]. The ML models under consideration are Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF). The principal aim is to determine the most efficient machine learning methodologies for IoT security, furnishing researchers and practitioners with the knowledge needed to strengthen their defenses against emergent cyber threats. As far as identifying and reducing evolving attack vectors is concerned, current methodologies and techniques have shown their limitations in the face of growing concerns regarding IoT cyber threats. Conventional methodologies frequently struggle with the complex and varied characteristics of IoT networks, resulting in intrinsic deficiencies when it comes to precisely detecting and averting advanced cyber threats.
The remaining sections of the paper are organized as follows: Section 2 describes the relevant literature, Section 3 outlines the methods and materials employed in this study, Section 4 analyzes the findings, and Section 5 offers a concluding summary of the study.
2. Related Works
Extensive research has been conducted in the past to investigate the application of Machine Learning (ML) and Deep Learning (DL) methods to IoT cybersecurity. Nevertheless, these methodologies frequently encounter obstacles when attempting to manage the intricate and ever-changing characteristics of cyber threats that specifically target interconnected IoT devices. Established methodologies are often flawed, particularly in their capacity to thoroughly detect and thwart novel attack vectors that exploit vulnerabilities across heterogeneous IoT ecosystems.
The IoT is a rapidly growing industry that permeates everyday life. Because IoT devices are networked, they are susceptible to cyberattacks. The number of cyberattacks on IoT systems has increased recently; thus, it is critical to recognize and prepare for these attacks. Nowadays, it is very common to apply DL- and ML-based algorithms as possible solutions to this problem. Consequently, this study investigates the findings of current research on the use of ML and DL methods for identifying and predicting cyberattacks on IoT devices. The research in [15] conducted a survey and a literature analysis of ML and DL methods for IoT security. To assess how well different ML-based algorithms performed, the authors used the KDD-99 dataset. The study found that cyberattack detection in IoT can be accomplished using both ML and DL techniques, and the authors emphasized the need for additional studies to enhance the precision and effectiveness of these approaches. In [16], the use of ML and data analytics for IoT security is covered using random forests, decision trees, and neural networks. The authors used the NSL-KDD dataset, and the accuracy rate of the RF technique was 99.6%.
ML algorithms have been suggested for automating the detection of cyberattacks as well as for the quick prediction and analysis of attack types [17]. Another study [18] proposes a deep learning methodology for anticipating cybersecurity assaults on the IoT. The study uses ML and DL methods to carefully extract important information from a BoT dataset, and it shows improved accuracy and dependability of cyber threat prediction in IoT scenarios, producing more precise and reliable forecasts and enhanced security. In the survey of ML and DL techniques for assessing IoT cybersecurity in [15], various ML techniques are explored for detecting anomalous activities and cyber threats using the KDD-99 dataset.
The Bot-IoT dataset [19] is made up of simulated IoT sensor data that includes both normal and attack traffic. Using ML and DL models, an intrusion detection system (IDS) was created to identify the class imbalance issue of the dataset. In a performance evaluation of different models employing three distinct feature sets for identifying DDoS and DoS assaults across IoT networks, the DT and multi-layer perceptron models outperformed all other models, with more than 99% accuracy on average. The study also showed that the Argus flow data generator is not required for future Bot-IoT dataset implementations. The authors of [20] used ML approaches to create security models for spotting IoT intrusions. They used the N-BaIoT dataset, which comprises botnet attacks injected into various IoT devices such as doorbells, baby monitors, security cameras, and webcams, and they primarily focused on botnet attacks targeting these devices. They used a variety of ML models, including deep learning models, in their botnet detection algorithms for each device. The effectiveness of the models was examined through multiclass and binary classification, with a focus on the models that attained a high detection F1-score. The findings demonstrated that ML-based models, in particular deep learning models, were successful in identifying botnet attacks on IoT devices, and they revealed how ML techniques enhance IoT security and address issues brought on by the proliferation of IoT devices and threats.
For IoT systems, [21] suggests a paradigm for next-generation cyberattack prediction that uses the CHAID decision tree and a multi-class SVM to predict cyberattacks with a 99.72% accuracy rate. To detect cyberattacks in IoT networks, [22] presents a DL-based detection method. The study uses LSTM to identify network intrusions, focuses on the detection of DDoS attacks, and achieves high accuracy rates in complex assault detection and prediction. The article covers the deep learning models, datasets, and distributed attack detection systems that were created. The research evaluates the distributed attack detection framework and demonstrates the efficacy of distributed DL models in enabling IoT networks to detect a wide range of assaults with high detection and accuracy rates.
3. Methods and Materials
When choosing the ML models for this research, we considered the inherent deficiencies of traditional methods in identifying and addressing emergent cyber threats in IoT networks. The inability of conventional approaches to handle the ever-changing, varied, and dynamic characteristics of attack vectors served as the impetus for our investigation into more resilient models that could discern complex patterns in IoT traffic. Similarly, the selection of evaluation metrics was influenced by the deficiencies identified in previous evaluations, with the intention of rectifying those issues and offering a holistic assessment of each model's efficacy. The dataset, ML models, and assessment measures that we employed in this study are covered in detail below. Fig 1 depicts the overall workflow of our technique. Working with the CIC-IoT2023 dataset requires following a prescribed procedure. Loading the dataset is the first step, followed by the essential stage of data preprocessing, which involves handling missing values, cleaning the data, and formatting modifications. Then, to make training and evaluating models easier, the dataset is divided into two subsets: training and testing. The selected ML techniques are trained on the training set and then assessed on the testing set to determine their performance. A detailed evaluation of the models' efficacy is conducted using relevant measures, including accuracy, precision, recall, and F1-score. The end goal is to choose the model that best fits the requirements of the ongoing project or to consider further optimization for improved accuracy. This well-organized methodology ensures a methodical approach to working with the CIC-IoT2023 dataset, leading to informed decisions and reliable ML outcomes.
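The split, train, evaluate, and select steps of this workflow can be sketched in a few lines of Python with scikit-learn, the library the study reports using. This is a minimal illustration under stated assumptions, not the authors' code: synthetic data from `make_classification` stands in for the preprocessed CIC-IoT2023 features, the helper name `run_workflow` is hypothetical, and selecting the winner by weighted-average F1-score is an assumption, since the averaging mode is not specified in the text.

```python
# Sketch of the Fig. 1 workflow (split -> train -> evaluate -> select).
# Assumptions: scikit-learn is installed; synthetic data replaces CIC-IoT2023;
# weighted F1 is used for model selection (not stated in the paper).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def run_workflow(X, y, models, test_size=0.30, seed=42):
    """Hypothetical helper: train every candidate and return the best by F1."""
    # Stratified 70/30 split so both subsets keep the same class proportions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)      # learn from the training subset only
        y_pred = model.predict(X_test)   # predict on held-out test data
        scores[name] = f1_score(y_test, y_pred, average="weighted")
    best = max(scores, key=scores.get)   # model with the highest test F1
    return best, scores


if __name__ == "__main__":
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    candidates = {
        "LR": LogisticRegression(max_iter=1000),
        "DT": DecisionTreeClassifier(random_state=0),
    }
    best, scores = run_workflow(X, y, candidates)
    print(best, {name: round(s, 4) for name, s in scores.items()})
```

Choosing the model on the same test subset that reports its score mirrors the procedure described above; a separate validation split would give a less optimistic estimate of the winner's performance.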
3.1. Dataset Overview
The dataset employed in this study is CIC-IoT2023 [14], a publicly available dataset that contains actual network traffic from various IoT devices under both normal and attack conditions. The Canadian Institute for Cybersecurity (CIC) and the Information Technology University of Copenhagen (ITU) collaborated to generate the CIC-IoT2023 dataset. A smart home environment with 20 IoT gadgets, including cameras, thermostats, smart TVs, and smart watches, was simulated to create the dataset. Wireshark and TCPdump were used to record the network traffic, while the Snort and Suricata intrusion detection systems were used to categorize it. The dataset is made up of ten days' worth of network traffic: five days of regular traffic and five days of attack traffic. The ten DDoS attack types included in the dataset are TCP SYN Flood, UDP Flood, HTTP Flood, HTTP Slow Post, Slowloris, MQTT Flood, CoAP Flood, WS-DDoS (WebSocket), Web Service Flood (SOAP), and Web Service Flood (RESTful). There are around 80 million packets in the dataset, of which 64 million are classified as malicious and 16 million as normal. Each packet in the dataset has 115 features, including the protocol, payload size, timestamp, and source and destination IP addresses.
Fig 2 shows how the different cyberattacks are distributed across the instances in the dataset. The chart groups less common attacks into an "Other" category while highlighting the frequency of the main attack types. An attack is placed in the "Other" category when its number of occurrences is less than a predetermined threshold. This method offers a concise summary of the most common attack types without overcrowding the chart with labels.
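The grouping rule behind the "Other" category amounts to a small counting step. The sketch below is illustrative only: the function name `group_rare_labels`, the threshold `min_count`, and the example counts are hypothetical, because the paper does not state the threshold it used. The attack names are taken from the list above.

```python
# Collapse infrequent attack labels into an "Other" bucket, as in Fig. 2.
# The threshold and the example counts are hypothetical.
from collections import Counter


def group_rare_labels(labels, min_count, other="Other"):
    """Count labels; any label seen fewer than `min_count` times goes to `other`."""
    grouped = Counter()
    for label, n in Counter(labels).items():
        grouped[label if n >= min_count else other] += n
    return dict(grouped)


if __name__ == "__main__":
    sample = (["UDP Flood"] * 50 + ["TCP SYN Flood"] * 30
              + ["Slowloris"] * 3 + ["CoAP Flood"] * 2)
    print(group_rare_labels(sample, min_count=10))
```

Here Slowloris and CoAP Flood fall below the threshold of 10, so their 5 occurrences are reported together as "Other", while the total count is preserved.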
Fig.1. Architecture model of the machine learning approach
This dataset differs from other IoT datasets used in network intrusion detection studies in that it possesses the following features:
Instead of simulating or emulating devices, it uses actual IoT devices as both attackers and victims.
Instead of a small number of devices from a single vendor or protocol, it encompasses a broad variety of IoT devices from several manufacturers and protocols.
Instead of a single type of attack that targets a particular layer or service, it contains various DDoS attack types that target different layers of the network stack.
Instead of a small amount of data with low diversity and complexity, it offers a vast amount of data with great diversity.
This dataset therefore offers a more complex and realistic environment for testing how well ML algorithms identify IoT cyberattacks.
3.2. Machine Learning Models
Using the CIC-IoT2023 dataset, we selected and compared four well-known machine learning algorithms: RF, DT, KNN, and LR. These algorithms were chosen based on how popular and effective they were in earlier research on network intrusion detection. We implemented them in Python with the scikit-learn library. Except for KNN, for which we set the number of neighbors to 5, we used the default settings for each algorithm's parameters.
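The model configuration described above corresponds roughly to the following scikit-learn setup. It is a sketch assuming a recent scikit-learn release, and `build_models` is a hypothetical helper. Note that `n_neighbors=5` is also scikit-learn's default for `KNeighborsClassifier`, so setting it explicitly leaves the library's out-of-the-box behavior unchanged. Other defaults have changed between releases (for example, `RandomForestClassifier` moved from 10 to 100 trees in version 0.22), so the exact settings depend on the installed version.

```python
# The four classifiers compared in the study, as a scikit-learn sketch.
# Defaults depend on the installed scikit-learn version.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def build_models():
    """Hypothetical helper returning the four models keyed by abbreviation."""
    return {
        "RF": RandomForestClassifier(),              # library defaults
        "DT": DecisionTreeClassifier(),              # library defaults
        "KNN": KNeighborsClassifier(n_neighbors=5),  # neighbor count stated in the text
        "LR": LogisticRegression(),                  # library defaults
    }


if __name__ == "__main__":
    for name, model in build_models().items():
        print(name, type(model).__name__)
```

With its default `max_iter=100`, `LogisticRegression` may emit a `ConvergenceWarning` on large datasets. Scaled inputs, such as those produced by the min-max step listed below, usually help.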
Before supplying the dataset to the ML models, we also performed certain preprocessing operations on it. These operations comprise:
Removing features such as packet ID, checksum, and other unused or superfluous components.
Converting categorical characteristics, such as protocol type and service type, into numerical values.
Normalizing numerical features into the range [0, 1] using min-max scaling.
Equalizing the class distribution with the random under-sampling technique, which lowers the number of malicious packets to the same level as the number of legitimate packets.
Dividing the dataset into a training set (70%) and a testing set (30%) using stratified sampling, which keeps the class proportions constant.
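The last three steps of the list can be sketched with NumPy and scikit-learn. This is an illustrative version built on several assumptions, not the study's pipeline. The helper names are hypothetical, random under-sampling is implemented by hand rather than with a third-party package, and feature removal and categorical encoding are omitted. The min-max scaler is also deliberately fitted on the training subset only, after the split, to keep test-set statistics out of training. That ordering differs from the order in which the steps are listed above.

```python
# Sketch of preprocessing steps 3-5 (scaling, random under-sampling, stratified split).
# Assumptions: NumPy + scikit-learn; feature removal and encoding are omitted;
# the scaler is fitted on the training subset only, to avoid test-set leakage.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def random_undersample(X, y, seed=0):
    """Randomly keep, for every class, as many rows as the smallest class has."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # restore the original row order
    return X[keep], y[keep]


def preprocess(X, y, test_size=0.30, seed=0):
    """Balance classes, split 70/30 with stratification, then min-max scale."""
    X_bal, y_bal = random_undersample(X, y, seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X_bal, y_bal, test_size=test_size, stratify=y_bal, random_state=seed
    )
    scaler = MinMaxScaler()                  # default feature_range=(0, 1)
    X_train = scaler.fit_transform(X_train)  # min/max learned from training rows
    X_test = scaler.transform(X_test)        # test values may fall slightly outside [0, 1]
    return X_train, X_test, y_train, y_test


if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(100, 3))
    y = np.array([1] * 80 + [0] * 20)        # 1 = malicious (majority), 0 = benign
    X_train, X_test, y_train, y_test = preprocess(X, y)
    print(X_train.shape, X_test.shape, np.bincount(y_train), np.bincount(y_test))
```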
Fig.2. Distribution of attacks
3.3. Evaluation Metrics
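We assessed the ML algorithms on the CIC-IoT2023 dataset using metrics that are frequently employed in classification tasks. The most common evaluation metrics are accuracy, precision, recall, and F1-score, which are briefly described below along with the equations used to calculate them. Here TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively, with malicious packets treated as the positive class.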
Accuracy: The proportion of correctly classified packets relative to all packets.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

Precision: The proportion of packets correctly identified as malicious relative to all packets predicted as malicious.

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (2)$$

Recall: The proportion of malicious packets that were correctly identified relative to all malicious packets.

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (3)$$

F1-Score: The harmonic mean of precision and recall.

$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)$$
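Eqs. (1) to (4) can be checked on a small worked example. The counts used here (TP = 90, TN = 80, FP = 10, FN = 20) are hypothetical, chosen only to make the arithmetic easy to follow. They are not taken from the study.

```python
# Eqs. (1)-(4) computed from confusion-matrix counts (malicious = positive class).
def binary_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1); zero-division cases return 0.0."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts chosen only for easy arithmetic.
    acc, prec, rec, f1 = binary_metrics(tp=90, tn=80, fp=10, fn=20)
    print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

With these counts, accuracy is 170/200 = 0.85, precision is 90/100 = 0.90, recall is 90/110 ≈ 0.818, and F1 is 180/210 ≈ 0.857. This matches the equivalent form F1 = 2TP / (2TP + FP + FN).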
4. Results and Discussion
To identify IoT cyberattacks, four ML models were developed using the RF, KNN, DT, and LR algorithms. The precision-recall curve for each of these models is displayed in Fig 3.
A precision-recall curve is a graph that shows the trade-off between precision and recall at different probability thresholds. Precision is the percentage of positive predictions that are correct, whereas recall is the proportion of positive instances that were correctly predicted. The curve of a perfect model would reach the top right corner, signifying 100% recall and 100% precision. The model's performance across all thresholds is gauged by the area under the curve (AUC). As the figure shows, DT and RF have the highest AUC, followed by KNN and LR. Because they distinguish between most hostile and legitimate packets, DT and RF are the most accurate and trustworthy models for detecting IoT intrusions. KNN also works well, but its precision is lower than that of DT and RF. LR, the algorithm with the lowest AUC, is unsuitable for this task due to its high rate of false positives and false negatives.
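A precision-recall curve and its area can be computed with scikit-learn's `precision_recall_curve`, `auc`, and `average_precision_score`. The sketch assumes a binary benign-versus-malicious setting in which a model exposes a malicious-class probability, for instance through `predict_proba`. The paper does not state whether its AUC was computed by trapezoidal integration or as average precision, so the sketch returns both. The helper name `pr_auc` and the example scores are hypothetical.

```python
# Precision-recall curve and its area for a binary benign/malicious task.
# Assumption: `scores` are malicious-class probabilities, e.g. predict_proba(X)[:, 1].
from sklearn.metrics import auc, average_precision_score, precision_recall_curve


def pr_auc(y_true, scores):
    """Hypothetical helper returning (trapezoidal PR-AUC, average precision)."""
    precision, recall, _thresholds = precision_recall_curve(y_true, scores)
    # `recall` is monotonic (non-increasing), which `auc` requires.
    return auc(recall, precision), average_precision_score(y_true, scores)


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 0, 1]
    scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
    trap, ap = pr_auc(y_true, scores)
    print(f"trapezoidal AUC={trap:.3f}  average precision={ap:.3f}")
```

For a fully grown decision tree, `predict_proba` typically returns values that are mostly 0 or 1, so its curve is built from very few thresholds. Keep this in mind when comparing its area with that of models that output smoother scores.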
Fig.3. Precision-recall curves
The confusion matrix for each of these ML-based models is displayed in Figs 4, 5, 6, and 7. The rows depict the actual classes, while the columns display the predicted classes. The diagonal elements show correct predictions, and the off-diagonal elements show incorrect ones. The confusion matrix can be used to calculate a variety of metrics, such as recall, precision, accuracy, and F1-score. For DT and RF, the proportions of true positives (TP) and true negatives (TN) are highest, while false positives (FP) and false negatives (FN) are at their lowest levels. This indicates that these models have a low error rate and correctly identify the majority of packets as malicious or legitimate. KNN also has many TP and TN, but it has more FP and FN than DT and RF, which indicates a greater error rate and that some packets may be misclassified as harmful or legitimate. LR has the lowest proportion of TP and TN and the largest proportion of FP and FN. This indicates a considerably higher error rate (its accuracy of 0.8275 implies that roughly 17% of test packets are misclassified) and a less reliable separation of malicious and legitimate traffic.
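The layout used in Figs 4 to 7, with rows for actual classes and columns for predicted classes, can be sketched in plain Python together with the error rate read off the off-diagonal cells. The labels and predictions in the example are hypothetical, and `confusion_counts` and `error_rate` are illustrative helper names. scikit-learn's `sklearn.metrics.confusion_matrix` uses the same row/column convention.

```python
# Confusion matrix with rows = actual class and columns = predicted class.
def confusion_counts(y_true, y_pred, labels):
    """Hypothetical helper; same layout as sklearn.metrics.confusion_matrix."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def error_rate(matrix):
    """Share of samples on the off-diagonal, i.e. misclassified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


if __name__ == "__main__":
    labels = ["benign", "malicious"]   # malicious treated as the positive class
    y_true = ["benign", "benign", "malicious", "malicious", "malicious"]
    y_pred = ["benign", "malicious", "malicious", "malicious", "benign"]
    matrix = confusion_counts(y_true, y_pred, labels)
    (tn, fp), (fn, tp) = matrix
    print(f"TP={tp} TN={tn} FP={fp} FN={fn} error rate={error_rate(matrix)}")
```

In this toy example, two of the five packets sit off the diagonal (one FP and one FN), giving an error rate of 2/5 = 0.4.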
The outcomes highlight how crucial it is to pick the best ML algorithm for IoT threat detection. Techniques such as feature selection, dimensionality reduction, parameter tuning, and ensemble methods can be used to improve the performance of the evaluated ML algorithms. These techniques can maximize the algorithms' potential and raise their effectiveness in spotting IoT assaults.
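As one example of combining feature selection with parameter tuning, the sketch below uses cross-validation to search over the number of selected features and the KNN neighborhood size. The grid values, the scoring choice (`f1_weighted`), the fold count, and the helper name `tune_knn` are assumptions for illustration. The study itself reports using default parameters apart from KNN's neighbor count.

```python
# One way to combine feature selection and hyperparameter tuning for KNN.
# Grid values, scoring and fold count are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def tune_knn(X, y, cv=3):
    """Hypothetical helper: grid-search feature count and neighbor count."""
    pipe = Pipeline([
        ("scale", MinMaxScaler()),                      # refit inside each fold
        ("select", SelectKBest(score_func=f_classif)),  # keep the k best features
        ("knn", KNeighborsClassifier()),
    ])
    grid = {
        "select__k": [5, 10, "all"],
        "knn__n_neighbors": [3, 5, 7],
    }
    search = GridSearchCV(pipe, grid, scoring="f1_weighted", cv=cv)
    search.fit(X, y)
    return search.best_params_, search.best_score_


if __name__ == "__main__":
    X, y = make_classification(n_samples=600, n_features=20, random_state=0)
    params, score = tune_knn(X, y)
    print(params, round(score, 4))
```

Wrapping scaling and selection in the pipeline means both are refitted on each training fold, so the cross-validated score is not inflated by information from the validation folds.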
Fig.4. Confusion matrix (random forest)
Fig.5. Confusion matrix (logistic regression)
Fig.6. Confusion matrix (decision tree)
Fig.7. Confusion matrix (k-nearest neighbors)
Table 1 presents the performance of each of these models on the CIC-IoT2023 dataset for detecting cyberattacks. The RF, KNN, DT, and LR models are compared using accuracy, precision, recall, and F1-score.
Based on the evaluation metrics, we can draw several significant conclusions about the performance of ML algorithms in identifying IoT cyberattacks. The DT and RF algorithms excelled, attaining the highest accuracy ratings of 0.9919 and 0.9916, respectively. They also earned the highest precision, recall, and F1-score, showing that they reliably categorize both valid and malicious packets. With an accuracy of 0.9380, KNN performed well. Although it is not as accurate as DT and RF, KNN shows proficiency with non-linear data; however, it can be computationally expensive and is sensitive to noise and outliers. With a score of 0.8275, Logistic Regression (LR) had the lowest accuracy of all the methods. This may be explained by the linear assumption made by LR, which leads to a high percentage of false positives. LR also has lower precision and F1-score values, which may be attributed to its sensitivity to noise and outliers. Decision Tree and Random Forest were the most successful algorithms for identifying IoT cyberattacks. These results offer important information for choosing the best ML algorithms and defense tactics to protect the IoT from online threats.
Table 1. Evaluation metrics of the ML models

| Algorithm | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| RF | 0.9916 | 0.9913 | 0.9916 | 0.9909 |
| KNN | 0.9380 | 0.9366 | 0.9380 | 0.9364 |
| DT | 0.9919 | 0.9920 | 0.9919 | 0.9919 |
| LR | 0.8275 | 0.8473 | 0.8275 | 0.8034 |
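In Table 1, the recall column equals the accuracy column for every model. This pattern is what one would expect if per-class scores were averaged with weights proportional to class support (the `average="weighted"` option in scikit-learn), because support-weighted recall reduces algebraically to accuracy: $\sum_c \frac{n_c}{N} \cdot \frac{TP_c}{n_c} = \frac{\sum_c TP_c}{N}$. The paper does not state which averaging it used, so this is an inference. The check below uses hypothetical labels to show the identity and contrasts it with unweighted (macro) recall.

```python
# Support-weighted recall equals accuracy; macro recall generally does not.
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_by_class(y_true, y_pred):
    """Per-class recall and per-class support (number of true samples)."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in support.items()}, support


def weighted_recall(y_true, y_pred):
    recalls, support = recall_by_class(y_true, y_pred)
    total = sum(support.values())
    return sum(support[c] / total * r for c, r in recalls.items())


def macro_recall(y_true, y_pred):
    recalls, _ = recall_by_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


if __name__ == "__main__":
    # Hypothetical labels: 6 benign, 3 DDoS, 1 DoS.
    y_true = ["benign"] * 6 + ["ddos"] * 3 + ["dos"]
    y_pred = ["benign"] * 5 + ["ddos"] + ["ddos"] * 2 + ["dos"] + ["benign"]
    print(accuracy(y_true, y_pred), weighted_recall(y_true, y_pred),
          macro_recall(y_true, y_pred))
```

Here accuracy and weighted recall both come to 0.7 (up to floating-point rounding), while macro recall is 0.5 because the single DoS sample is missed.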
Our analysis of the CIC-IoT2023 dataset provided a detailed evaluation of the effectiveness of machine learning algorithms in mitigating cyber threats to the Internet of Things. The evaluation metrics for each model show that DT and RF exhibited outstanding performance. RF achieved an F1-score of 99.08%, a recall of 99.15%, and a precision of 99.12%, in addition to an accuracy of 99.15%. Similarly, DT achieved an accuracy of 99.18%, an F1-score of 99.19%, a recall of 99.18%, and a precision of 99.19%. These metrics highlight the robustness of both models in differentiating benign from malicious packets in IoT networks. The KNN algorithm, on the other hand, achieved a noteworthy accuracy of 93.81%, supported by F1-score, recall, and precision values of 93.64%, 93.81%, and 93.66%, respectively. In contrast, LR performed less effectively, achieving an accuracy of 82.75%, with comparatively lower F1-score, recall, and precision values of 80.34%, 82.75%, and 84.73%, respectively. Consistent with our research goals, this detailed examination clarifies how each model performs and provides evidence for the superiority of DT and RF in strengthening IoT cybersecurity.
5. Conclusions
In this study, we analyzed four ML algorithms for detecting IoT cyberattacks using the CIC-IoT2023 dataset: RF, KNN, DT, and LR. The dataset used in this study provides a comprehensive and realistic benchmark containing multiple types of DDoS attacks on different IoT devices. We carried out data preparation, model training, and performance evaluation using relevant metrics such as accuracy, precision, recall, and F1-score. The results show that DT and RF are the most successful algorithms for identifying IoT cyberattacks, with accuracy rates of 0.9919 and 0.9916, respectively. These algorithms also have the best precision, recall, and F1-score values, indicating that they can reliably distinguish between malicious and normal packets. KNN also performs well, with an accuracy of 0.9380, while LR has the lowest accuracy at 0.8275.
This study provides a substantial critique of the inherent constraints of existing approaches to IoT cybersecurity. Through a comprehensive examination of the effectiveness of machine learning models in detecting cyber threats in the IoT using the CIC-IoT2023 dataset, this research sheds light on the limitations of conventional methods in managing the ever-changing and intricate nature of such threats. The superior performance of the DT and RF algorithms highlights their potential to rectify these shortcomings, which could result in a more dependable and efficient method of detecting and thwarting malicious activities in interconnected IoT environments. The results of this research have substantial implications for the practical use of machine learning to strengthen the security of the Internet of Things. DT and RF demonstrate exceptional levels of accuracy, precision, and recall, making them feasible candidates for prompt deployment in IoT defense systems. Their ability to differentiate between benign and malicious traffic provides a strong basis for developing strategies to detect and mitigate threats in real time, which offers concrete advantages for industry stakeholders interested in protecting IoT ecosystems.
Moreover, the study establishes a foundation for numerous extensions and future directions in the realm of IoT cybersecurity. Investigating federated learning techniques, incorporating unsupervised learning approaches, and integrating deep learning models are all potentially fruitful avenues for improving the scalability and adaptability of cyber threat detection mechanisms in IoT networks. In addition, to address the ever-changing cyber threat landscape, the security of IoT infrastructures could be enhanced by integrating machine learning algorithms that are continuously improved and by utilizing diverse datasets.
Dataset Availability Statement
The dataset used in this study can be found at https://www.unb.ca/cic/datasets/iotdataset-2023.html [accessed on 05 October 2023].
References
[1] U. Tariq, I. Ahmed, A. K. Bashir, K. Shaukat, A Critical Cybersecurity Analysis and Future Research Directions for the Internet of Things: A Comprehensive Review, Sensors, Vol. 23, No. 8, 2023. DOI: https://doi.org/10.3390/s23084117
[2] X. Cheng, J. Zhang, B. Chen, Cyber Situation Comprehension for IoT Systems based on APT Alerts and Logs Correlation, Sensors, Vol. 19, No. 18, 2019. DOI: https://doi.org/10.3390/s19184045
[3] P. K. Sadhu, V. P. Yanambaka, A. Abdelgawad, Internet of Things: Security and Solutions Survey, Sensors, Vol. 22, No. 19, 2022. DOI: https://doi.org/10.3390/s22197433
[4] S. Kumar, P. Tiwari, M. Zymbler, Internet of Things is a revolutionary approach for future technology enhancement: a review, Journal of Big Data, Vol. 6, No. 1, pp. 1-21, 2019. DOI: https://doi.org/10.1186/s40537-019-0268-2
[5] J. P. A. Yaacoub, H. N. Noura, O. Salman, A. Chehab, Robotics cyber security: Vulnerabilities, attacks, countermeasures, and recommendations, International Journal of Information Security, Vol. 21, pp. 115-158, 2022. DOI: https://doi.org/10.1007/s10207-021-00545-8
[6] Check Point Research, The Tipping Point: Exploring the Surge in IoT Cyberattacks Globally, 2023. Retrieved on October 12, 2023, from https://blog.checkpoint.com/security/the-tipping-point-exploring-the-surge-in-iot-cyberattacks-plaguing-the-education-sector/
[7] K. Tsiknas, D. Taketzis, K. Demertzis, C. Skianis, Cyber threats to industrial IoT: a survey on attacks and countermeasures, IoT, Vol. 2, No. 1, pp. 163-186, 2021. DOI: https://doi.org/10.3390/iot2010009
[8] M. Abdullahi, Y. Baashar, H. Alhussian, A. Alwadain, N. Aziz, L. F. Capretz, S. J. Abdulkadir, Detecting cybersecurity attacks in the internet of things using artificial intelligence methods: A systematic literature review, Electronics, Vol. 11, No. 2, 2022. DOI: https://doi.org/10.3390/electronics11020198
[9] Ani Petrosyan, Annual number of IoT attacks global 2022, 2023. Retrieved on October 12, 2023, from https://www.statista.com/statistics/1377569/worldwide-annual-internet-of-things-attacks/
[10] M. Ahsan, K. E. Nygard, R. Gomes, M. M. Chowdhury, N. Rifat, J. F. Connolly, Cybersecurity threats and their mitigation approaches using Machine Learning - A Review, Journal of Cybersecurity and Privacy, Vol. 2, No. 3, pp. 527-555, 2022.
[11] Matthew Urwin, Machine Learning in Cybersecurity: How It Works and Companies to Know, 2023. Retrieved on October 12, 2023, from https://builtin.com/artificial-intelligence/machine-learning-cybersecurity
[12] J. Singh, J. Singh, A survey on machine learning-based malware detection in executable files, Journal of Systems Architecture, Vol. 112, 2021.
[13] N. Vadivelan, K. Bhargavi, S. Kodati, M. Nalini, Detection of cyber-attacks using machine learning, In AIP Conference Proceedings, AIP Publishing, Vol. 2405, No. 1, 2022.
[14] E. C. P. Neto, S. Dadkhah, R. Ferreira, A. Zohourian, R. Lu, A. A. Ghorbani, CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment, Sensors, Vol. 23, No. 13, 2023. DOI: https://doi.org/10.3390/s23135941
[15] U. Inayat, M. F. Zia, S. Mahmood, H. M. Khalid, M. Benbouzid, Learning-based methods for cyber-attack detection in IoT systems: A survey on methods, analysis, and future prospects, Electronics, Vol. 11, No. 9, 2022.
[16] E. Adi, A. Anwar, Z. Baig, S. Zeadally, Machine learning and data analytics for the IoT, Neural Computing and Applications, Vol. 32, pp. 16205-16233, 2020.
[17] C. Malathi, I. N. Padmaja, Identification of cyber-attacks using machine learning in smart IoT networks, Materials Today: Proceedings, Vol. 80, pp. 2518-2523, 2023.
[18] O. A. Alkhudaydi, M. Krichen, A. D. Alghamdi, A Deep Learning Methodology for Predicting Cybersecurity Attacks on the Internet of Things, Information, Vol. 14, No. 10, pp. 550, 2023.
[19] J. G. Almaraz-Rivera, J. A. Perez-Diaz, J. A. Cantoral-Ceballos, Transport and application layer DDoS attacks detection to IoT devices by using machine learning and deep learning models, Sensors, Vol. 22, No. 9, 2022.
[20] J. Kim, M. Shim, S. Hong, Y. Shin, E. Choi, Intelligent detection of IoT botnets using machine learning and deep learning, Applied Sciences, Vol. 10, No. 19, 2023.
[21] S. Dalal, U. K. Lilhore, N. Foujdar, S. Simaiya, M. Ayadi, N. A. Almujally, A. Ksibi, Next-generation cyber-attack prediction for IoT systems: leveraging multi-class SVM and optimized CHAID decision tree, Journal of Cloud Computing, Vol. 12, No. 1, pp. 1-20, 2023.
[22] O. Jullian, B. Otero, E. Rodriguez, N. Gutierrez, H. Antona, R. Canal, Deep-Learning Based Detection for Cyber-Attacks in IoT Networks: A Distributed Attack Detection Framework, Journal of Network and Systems Management, Vol. 31, No. 2, pp. 33, 2023.
Authors Profiles
Dr. Akinul Islam Jony currently holds the position of Associate Professor and serves as the Head of the Undergraduate Program in Computer Science at American International University-Bangladesh (AIUB). His research interests encompass a wide range of topics, including cybersecurity, artificial intelligence, machine learning, e-learning, educational technology, and issues in software engineering.
Arjun Kumar Bose Arnob is a final semester student of BSc in Computer Science and Engineering, majoring in Software Engineering at the American International University-Bangladesh (AIUB). He is currently working as a Research Assistant at AIUB and is actively involved in research projects. He has a strong passion for and proficiency in Machine Learning and Deep Learning, which is reflected in his work. He has consistently performed well academically and is dedicated to his studies.
How to cite this paper: Akinul Islam Jony, Arjun Kumar Bose Arnob, "Securing the Internet of Things: Evaluating Machine Learning
Algorithms for Detecting IoT Cyberattacks Using CIC-IoT2023 Dataset", International Journal of Information Technology and
Computer Science (IJITCS), Vol.16, No.4, pp.56-65, 2024. DOI: 10.5815/ijitcs.2024.04.04

I.J. Information Technology and Computer Science, 2024, 4, 56-65
Published Online on August 8, 2024 by MECS Press (http://www.mecs-press.org/)
DOI: 10.5815/ijitcs.2024.04.04
Securing the Internet of Things: Evaluating
Machine Learning Algorithms for Detecting IoT
Cyberattacks Using CIC-IoT2023 Dataset
Akinul Islam Jony*
American International University-Bangladesh (AIUB), Dhaka, 1229, Bangladesh
E-mail: akinul@aiub.edu
ORCID iD: https://orcid.org/0000-0002-2942-6780
*Corresponding author
Arjun Kumar Bose Arnob
American International University-Bangladesh (AIUB), Dhaka, 1229, Bangladesh
E-mail: arjunkumarbosu@gmail.com
ORCID iD: https://orcid.org/0009-0003-2244-2328
Received: 06 November 2023; Revised: 02 January 2024; Accepted: 26 March 2024; Published: 08 August 2024
Abstract: The proliferation of the Internet of Things (IoT) has led to an increase in cyber threats directed at interconnected devices, which necessitates comprehensive defenses against evolving attack vectors. This research investigates the use of machine learning (ML) prediction models to identify and defend against cyberattacks targeting IoT networks. Central emphasis is placed on a thorough examination of the CIC-IoT2023 dataset, an extensive collection comprising a wide range of Distributed Denial of Service (DDoS) attacks on diverse IoT devices, which provides a practical and comprehensive benchmark for assessment. This study develops and compares four distinct machine learning models, Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF), to determine their effectiveness in detecting and preventing cyber threats to the IoT. The assessment incorporates a range of performance indicators, namely accuracy, precision, recall, and F1-score. Notably, the results emphasize the superior performance of DT and RF, which achieve accuracy rates of 0.9919 and 0.9916, respectively. Their high precision, recall, and F1-scores show that these models differentiate well between benign and malicious packets. The precision-recall curves and confusion matrices provide additional evidence that DT and RF are strong contenders in the field of IoT intrusion detection. KNN also achieves a noteworthy accuracy of 0.9380. LR, on the other hand, has the lowest accuracy at 0.8275, underscoring its limited ability to classify these threats. Combined with the realistic and diverse characteristics of the CIC-IoT2023 dataset, the study's empirical assessments provide valuable knowledge for selecting the most effective machine learning algorithms and fortification strategies to protect IoT infrastructures. Furthermore, this study proposes directions for subsequent inquiries, urging the examination of unsupervised learning approaches and the incorporation of deep learning models to decipher complex patterns within IoT networks. These developments have the potential to strengthen cybersecurity protocols for IoT ecosystems, reduce the impact of emergent risks, and promote robust defense systems against ever-changing cyber challenges.
Index Terms: Internet of Things, Cybersecurity, Machine Learning, DDoS Attacks, CIC-IoT2023 Dataset.
1. Introduction
The IoT has become a crucial part of our daily lives, and its expanding use has been accompanied by a rising number of cyberattacks on IoT devices. Security professionals and academics are extremely concerned about the current state of IoT cyberattacks. IoT device threats fall under several categories, including network attacks, software attacks, and physical attacks. Node cloning, one type of physical attack, allows an attacker to replicate a node and gain network access [1]. Advanced Persistent Threat (APT) attacks, one type of software attack, allow attackers to access a system while remaining undetected for lengthy periods [2]. Using DDoS attacks, attackers can overwhelm a network with traffic and bring it to a halt [3]. IoT device security and privacy are significant problems, and poor authorization and authentication can result in privacy issues at the device level [4]. IoT device vulnerabilities and threats are growing daily, so it is critical to build strong defenses to keep these devices safe; encryption, authentication, and access control are a few of the available countermeasures [5]. It is also critical to keep up with the most recent security techniques and technological advancements to prevent attacks on IoT devices. Cyberattacks on the IoT are becoming more common, and their frequency is rising. According to [6], the average number of weekly attacks on IoT devices per business increased by 41% in the first two months of 2023 compared to 2022. The most frequently targeted IoT devices are those found in European businesses, followed by those in APAC-based and Latin American-based corporations. On average, 54% of organizations experience attempted cyberattacks every week. IoT device threats can be divided into many types, including network attacks, software attacks, and physical attacks [7]. Node cloning attacks are a type of physical attack that allows for node replication and network access. APT attacks are a type of software attack in which an attacker can enter a system and remain undiscovered for a lengthy period [8]. DDoS attacks on networks can overwhelm the system with traffic and bring it to a halt. The creation of efficient defenses against cyberattacks, such as access control, authentication, and encryption, is crucial [9].
As a result of these increasing concerns, machine learning (ML) algorithms have emerged as crucial instruments for proactively identifying and mitigating cyber threats in IoT ecosystems. By capitalizing on existing datasets and statistical analysis, ML techniques have demonstrated their capacity to detect threats early, identify network vulnerabilities, and decrease operational expenses [10, 11]. Despite these developments, a definitive benchmark for the most effective ML algorithms to detect IoT cyber threats has yet to be established, which leaves a significant gap in IoT cybersecurity research [6, 7]. A survey on ML-based detection of malware in executable files reports that ML techniques have been used to solve a variety of computer security problems worldwide, including intrusion detection, fraud detection, ransomware recognition, and malware detection [12]. To prevent cyberattacks on IoT devices, it is vital to stay up to date with the most recent technologies and approaches, and ML algorithms are used in cybersecurity to detect and mitigate cyberattacks [13]. This research aims to fill this gap by building and comparing ML prediction models that use the CIC-IoT2023 dataset to identify intrusions targeting IoT devices. The dataset provides a practical standard that includes a wide range of DDoS attacks on different IoT devices, thereby offering a broad basis for assessing the effectiveness of ML algorithms for IoT cybersecurity [14]. The ML models under consideration are Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF). The principal aim is to determine the most effective ML methods for IoT security, which will furnish researchers and practitioners with the knowledge needed to strengthen their defenses against emergent cyber threats. As far as identifying and mitigating evolving attack vectors is concerned, current methodologies and techniques have shown their limitations in the face of growing IoT cyber threats. Conventional methodologies frequently struggle with the complex and varied characteristics of IoT networks, which leads to intrinsic deficiencies in precisely detecting and averting advanced cyber threats.
The remaining sections of the paper are organized as follows: Section 2 describes the relevant literature, Section 3 outlines the methods and materials employed in this study, Section 4 analyzes the findings, and Section 5 offers a concluding summary of the study.
2. Related Works
Extensive research has been conducted on the application of Machine Learning (ML) and Deep Learning (DL) methods to IoT cybersecurity. Nevertheless, these methodologies frequently encounter obstacles when attempting to manage the intricate and ever-changing characteristics of cyber threats that specifically target interconnected IoT devices. Established methodologies are often flawed, particularly in their capacity to thoroughly detect and thwart novel attack vectors that exploit vulnerabilities across heterogeneous IoT ecosystems. The IoT is a rapidly growing industry that permeates everyday life. Because IoT devices are networked, they are susceptible to cyberattacks. The number of cyberattacks on IoT systems has increased recently, so it is critical to recognize and prepare for these attacks. Nowadays, it is very common to apply DL- and ML-based algorithms as possible solutions to this problem. Consequently, this section reviews the findings of current research on the use of ML and DL methods for identifying and predicting cyberattacks on IoT devices. The research in [15] conducted a survey and literature analysis of ML and DL methods for IoT security. To assess how well different ML-based algorithms performed, the authors used the KDD-99 dataset. The study found that cyberattack detection in IoT systems can be accomplished using both ML and DL techniques, and the authors emphasized the need for additional studies to enhance the precision and effectiveness of these approaches. In [16], the use of ML and data analytics for IoT security is covered using random forests, decision trees, and neural networks; on the NSL-KDD dataset, the RF technique reached an accuracy of 99.6%.
ML algorithms have been suggested for automating the detection of cyberattacks as well as for the quick prediction and analysis of attack types [17]. Another study [18] proposes a deep learning methodology for anticipating cybersecurity attacks on the IoT. The study uses ML and DL methods to carefully extract important information from a BoT dataset and shows improved accuracy and dependability of cyber threat prediction in IoT scenarios, producing more precise and reliable forecasts and enhanced IoT security. In a survey [15] of ML and DL techniques for assessing cybersecurity in IoT, various ML techniques are explored for detecting anomalous activities and cyber threats using the KDD-99 dataset.
The Bot-IoT dataset [19] consists of simulated IoT sensor data that includes both normal and attack traffic. Using ML and DL models, an intrusion detection system (IDS) was created that takes the class imbalance of this dataset into account. In a performance evaluation of different models employing three distinct feature sets for identifying DDoS and DoS attacks across IoT networks, the DT and multi-layer perceptron models outperformed all other models, achieving more than 99% accuracy on average. The study also showed that the Argus flow data generator is not required for future Bot-IoT dataset implementations. The authors of [20] used ML approaches to create the best security models for spotting IoT intrusions. They used the N-BaIoT dataset, which comprises botnet attacks injected into various IoT devices such as doorbells, baby monitors, security cameras, and webcams, and they focused primarily on botnet attacks targeting these devices. They applied a variety of ML models, including deep learning models, in their botnet detection algorithms for each device. Focusing on the models that attained a high detection F1-score, they examined model effectiveness through both multiclass and binary classification. The findings demonstrated that ML-based models, in particular deep learning models, were successful in identifying botnet attacks on IoT devices, and revealed how ML techniques can enhance IoT security and address issues brought on by the proliferation of IoT devices and threats.
For IoT systems, [21] proposes a next-generation cyberattack prediction framework that uses a CHAID decision tree and a multi-class SVM to predict cyberattacks with a 99.72% accuracy rate. To detect cyberattacks in IoT networks, [22] presents a DL-based detection method. The study uses LSTM to identify network intrusions, focuses on the detection of DDoS attacks, and achieves high accuracy rates in detecting and predicting complex attacks. The article covers the deep learning models, datasets, and distributed attack detection systems that were created. The research evaluates the distributed attack detection framework and demonstrates the efficacy of distributed DL models in enabling IoT networks to detect a wide range of attacks with high detection and accuracy rates.
3. Methods and Materials
When choosing the ML models for this research, we considered the inherent deficiencies of traditional methods in identifying and addressing emergent cyber threats in IoT networks. The inability of conventional approaches to handle the ever-changing, varied, and dynamic characteristics of attack vectors motivated our investigation into more resilient models that could discern complex patterns in IoT traffic. Similarly, the selection of evaluation metrics was influenced by the deficiencies identified in previous evaluations, with the intention of offering a holistic assessment of model efficacy that extends beyond traditional metrics. This section covers in detail the dataset, ML models, and assessment measures employed in this study. Fig. 1 depicts the overall workflow of our technique. Working with the CIC-IoT2023 dataset follows a prescribed procedure. Loading the dataset is the first step, followed by the essential stage of data preprocessing, which involves handling missing values, cleaning the data, and making formatting modifications. Then, to make training and evaluating models easier, the dataset is divided into two subsets: training and testing. ML techniques are selected and trained on the training set and then assessed on the testing set to determine their performance. A detailed evaluation of the models' efficacy is conducted using relevant measures, including accuracy, precision, recall, and F1-score. The end goal is to choose the model that best fits the requirements of the project at hand, or to consider further optimization for improved accuracy. This well-organized methodology ensures a methodical approach to working with the CIC-IoT2023 dataset, leading to informed decisions and reliable ML outcomes.
3.1. Dataset Overview
In this study, we employed the CIC-IoT2023 dataset [14], a publicly available dataset that contains actual network traffic from various IoT devices under both normal and attack conditions. The Canadian Institute for Cybersecurity (CIC) and the Information Technology University of Copenhagen (ITU) collaborated to generate the CIC-IoT2023 dataset. The dataset was created by simulating a smart home environment with 20 IoT devices, including cameras, thermostats, smart TVs, and smartwatches. Wireshark and TCPdump were used to record the network traffic, while the Snort and Suricata intrusion detection systems were used to categorize it. The dataset comprises ten days' worth of network traffic: five days of regular traffic and five days of attack traffic. Ten DDoS attack types are included in the dataset: TCP SYN Flood, UDP Flood, HTTP Flood, HTTP Slow Post, Slowloris, MQTT Flood, CoAP Flood, WS-DDoS (WebSocket), Web Service Flood (SOAP), and Web Service Flood (RESTful). The dataset contains around 80 million packets, of which 64 million are classified as malicious and 16 million as normal. Each packet in the dataset has 115 features, including the protocol, payload size, timestamp, and source and destination IP addresses.
Fig. 2 shows how the different cyberattacks are distributed across instances in the dataset. The chart highlights the frequency of the various attack types while grouping less common attacks into an "Other" category, which is used when the number of occurrences of a specific attack is below a predetermined threshold. This method offers a concise summary of the most common attack types without overcrowding the chart with labels.
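The "Other" grouping described above can be sketched in a few lines of Python. This is an illustration only: the threshold value and the toy label list are assumptions, the paper does not state the threshold it used, and the counts below are not CIC-IoT2023 counts.

```python
from collections import Counter


def group_rare_labels(labels, threshold):
    """Count labels, merging every label seen fewer than `threshold` times
    into a single "Other" bucket (threshold is an assumed parameter)."""
    counts = Counter(labels)
    grouped = Counter()
    for label, n in counts.items():
        if n < threshold:
            grouped["Other"] += n  # rare attack type: fold into "Other"
        else:
            grouped[label] += n    # frequent attack type: keep its own bar
    return grouped


# Toy labels for illustration only; NOT the dataset's real distribution.
toy = ["DDoS-SYN"] * 50 + ["DDoS-UDP"] * 30 + ["Slowloris"] * 3 + ["MQTT-Flood"] * 2
summary = group_rare_labels(toy, threshold=5)
print(dict(summary))  # {'DDoS-SYN': 50, 'DDoS-UDP': 30, 'Other': 5}
```

The resulting counts could then be passed to any bar-chart routine; grouping before plotting keeps the chart readable without discarding any instances.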
Fig.1. Architecture model of machine learning approach
This dataset differs from other IoT datasets used in network intrusion detection studies in the following respects:
• Instead of simulating or emulating devices, it uses actual IoT devices as both attackers and victims.
• Instead of a small number of devices from a single vendor or protocol, it encompasses a broad variety of IoT devices from several manufacturers and protocols.
• Instead of a single type of attack that targets a particular layer or service, it consists of various DDoS attack types that target different layers of the network stack.
• Instead of a small amount of data with low diversity and complexity, it offers a vast amount of highly diverse data.
This dataset therefore offers a more complex and realistic environment for testing how well ML algorithms identify IoT cyberattacks.
3.2. Machine Learning Models
Using the CIC-IoT2023 dataset, we selected and compared four well-known machine learning algorithms: RF, DT, KNN, and LR. These algorithms were chosen based on their popularity and effectiveness in earlier research on network intrusion detection. We implemented them in Python with the scikit-learn library. We used the default parameter settings for each algorithm, except for KNN, for which we set the number of neighbors to 5.
Before supplying the dataset to the ML models, we also applied the following preprocessing operations:
• Removing features such as packet ID, checksum, and other unused or superfluous components.
• Converting categorical characteristics, such as protocol type and service type, into numerical values.
• Normalizing numerical features into the range [0, 1] using min-max scaling.
• Equalizing the class distribution with random under-sampling, which reduces the number of malicious packets to the number of legitimate packets.
• Dividing the dataset into a training set (70%) and a testing set (30%) using stratified sampling, which keeps the class proportions constant across both sets.
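The preprocessing steps and model set-up above can be sketched with NumPy and scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' code. Synthetic data from `make_classification` stands in for CIC-IoT2023, which is already numeric, so feature removal and categorical encoding are omitted. The helper `random_undersample` is hypothetical. The min-max scaler is fitted on the training split only, a reordering of the list above chosen here so that test-set statistics do not leak into the scaling.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for numeric packet features (NOT CIC-IoT2023):
# label 1 = malicious (majority, about 80%), label 0 = benign.
X, y = make_classification(n_samples=4000, n_features=20, weights=[0.2, 0.8], random_state=0)


def random_undersample(X, y, seed=0):
    """Hypothetical helper: keep a random subset of each class equal in size
    to the smallest class, i.e. drop surplus malicious rows."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate(
        [rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes]
    )
    return X[keep], y[keep]


X_bal, y_bal = random_undersample(X, y)

# Stratified 70/30 split: both subsets keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.30, stratify=y_bal, random_state=0
)

# Min-max scaling x' = (x - min) / (max - min), fitted on training rows only.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Library defaults (random_state only fixes the seed), n_neighbors=5 for KNN.
models = {
    "LR": LogisticRegression(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.4f}")
```

Note that `n_neighbors=5` is also scikit-learn's default for `KNeighborsClassifier`, so under this library the KNN setting coincides with the default configuration.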
Fig.2. Distribution of attacks
3.3. Evaluation Metrics
On the CIC-IoT2023 dataset, we evaluated the ML algorithms using metrics that are frequently employed in classification tasks: accuracy, precision, recall, and F1-score. Each is briefly described below together with the equation used to calculate it, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, with malicious packets as the positive class.
• Accuracy: the proportion of correctly classified packets among all packets.
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (1)$$
• Precision: the proportion of packets correctly identified as malicious among all packets predicted to be malicious.
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (2)$$
• Recall: the proportion of malicious packets correctly identified among all actual malicious packets.
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (3)$$
• F1-Score: the harmonic mean of precision and recall.
$$F1\text{-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)$$
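As a worked check of Equations (1)-(4), the sketch below computes all four metrics from hypothetical confusion-matrix counts. The counts are invented for illustration and are not values from this study. Substituting (2) and (3) into (4) simplifies F1 to 2TP / (2TP + FP + FN), which the final line uses as a cross-check.

```python
def classification_metrics(tp, fp, tn, fn):
    """Equations (1)-(4), treating 'malicious' as the positive class.
    Assumes tp + fp > 0 and tp + fn > 0 (otherwise precision/recall are undefined)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)          # Eq. (1)
    precision = tp / (tp + fp)                          # Eq. (2)
    recall = tp / (tp + fn)                             # Eq. (3)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (4)
    return accuracy, precision, recall, f1


# Hypothetical counts, not taken from Figs. 4-7.
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, tn=80, fn=20)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
# accuracy=0.8500 precision=0.9000 recall=0.8182 f1=0.8571

# Cross-check: the harmonic mean simplifies to 2TP / (2TP + FP + FN) = 180 / 210.
assert abs(f1 - 2 * 90 / (2 * 90 + 10 + 20)) < 1e-12
```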
4. Results and Discussion
To identify IoT cyberattacks, four ML models were developed using the RF, KNN, DT, and LR algorithms. Fig. 3 shows the precision-recall curve of each model.
A precision-recall curve shows the trade-off between precision and recall at different probability thresholds. Precision is the proportion of positive predictions that are correct, whereas recall is the proportion of actual positive instances that are correctly predicted. The curve of a perfect model would reach the top-right corner, signifying 100% recall and 100% precision. The area under the curve (AUC) summarizes a model's performance across all thresholds. As the curves show, DT and RF have the highest AUC, followed by KNN and LR. Because they separate most malicious packets from legitimate ones, DT and RF are the most accurate and trustworthy models for detecting IoT intrusions in this comparison. KNN also works well, but its precision is lower than that of DT and RF. LR, the algorithm with the lowest AUC, is unsuitable for this task because of its high rate of false positives and false negatives.
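A precision-recall curve and its area can be computed with scikit-learn as sketched below, assuming a binary benign/malicious labelling and synthetic data in place of CIC-IoT2023. The curve needs continuous scores, so the sketch uses `predict_proba`. The paper does not say which area summary it used, so the block reports both the trapezoidal `auc(recall, precision)` and `average_precision_score`; the two can differ slightly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic binary data (label 1 = "malicious"); not CIC-IoT2023.
X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1
)

clf = LogisticRegression().fit(X_train, y_train)
y_score = clf.predict_proba(X_test)[:, 1]  # predicted probability of class 1

# One (precision, recall) pair per decision threshold on y_score.
precision, recall, thresholds = precision_recall_curve(y_test, y_score)

pr_auc = auc(recall, precision)                # trapezoidal area under the curve
ap = average_precision_score(y_test, y_score)  # step-wise summary of the same curve
print(f"PR AUC (trapezoid) = {pr_auc:.4f}, average precision = {ap:.4f}")
```

Plotting `recall` on the x-axis against `precision` on the y-axis reproduces a chart of the kind shown in Fig. 3.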
Fig.3. Precision-recall curves
The confusion matrix for each ML model is displayed in Figs. 4-7. The rows depict the actual classes, while the columns display the predicted classes. The diagonal elements show correct predictions, whereas the off-diagonal elements show incorrect ones. The confusion matrix can be used to calculate a variety of metrics, such as accuracy, precision, recall, and F1-score. For DT and RF, the numbers of true positives (TP) and true negatives (TN) are highest, while false positives (FP) and false negatives (FN) are at their lowest levels. This indicates that they have a low error rate and can correctly identify most packets as malicious or legitimate. KNN also has many TP and TN, but it has more FP and FN than DT and RF, which indicates a greater error rate, with some packets misclassified as malicious or legitimate. LR has the lowest proportion of TP and TN and the largest proportion of FP and FN. This indicates that it has a markedly higher error rate and distinguishes less reliably between malicious and legitimate packets.
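The row/column convention described above can be checked with a small pure-Python sketch. The helper name `confusion_counts` and the toy labels (1 = malicious, 0 = benign) are hypothetical. It returns the `[[TN, FP], [FN, TP]]` layout, which matches what scikit-learn's `confusion_matrix` produces for labels `[0, 1]`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Hypothetical helper: 2x2 matrix [[TN, FP], [FN, TP]].
    Rows are actual classes (benign, malicious); columns are predicted classes."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


# Toy labels: 1 = malicious, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
cm = confusion_counts(y_true, y_pred)
print(cm)  # [[3, 1], [2, 2]]

correct = cm[0][0] + cm[1][1]  # diagonal = correct predictions
errors = cm[0][1] + cm[1][0]   # off-diagonal = misclassifications
print(f"accuracy = {correct / (correct + errors):.3f}")  # accuracy = 0.625
```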
The outcomes highlight how crucial it is to pick the right ML algorithm for IoT threat detection. Techniques such as feature selection, dimensionality reduction, parameter tuning, and ensemble methods can be used to improve the performance of ML algorithms, helping to realize their potential and raise their effectiveness in spotting IoT attacks.
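One of the techniques named above, parameter tuning, can be sketched with scikit-learn's `GridSearchCV`. The grid, the weighted-F1 scoring choice, and the synthetic data are assumptions for illustration; the study itself reports using KNN with 5 neighbors and default settings for the other models. In practice the search would be run on the training split only, so the test set stays untouched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for preprocessed packet features.
X, y = make_classification(n_samples=600, n_features=10, random_state=2)

# Hypothetical search space around the study's n_neighbors=5 setting.
param_grid = {"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]}

search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid,
    scoring="f1_weighted",  # support-weighted F1 across classes
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=2),
)
# Fits one model per grid point and fold, then refits the best one on all of X.
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```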
Fig.4. Confusion matrix (random forest)
Fig.5. Confusion matrix (logistic regression)
Fig.6. Confusion matrix (decision tree)
Fig.7. Confusion matrix (k-nearest neighbors)
Table 1 presents the performance of each model on the CIC-IoT2023 dataset for detecting cyberattacks. The evaluation metrics include accuracy, precision, recall, and F1-score, which are used to compare the RF, KNN, DT, and LR models.
Based on these evaluation metrics, we can draw several significant conclusions about the performance of ML algorithms in identifying IoT cyberattacks. DT and RF excelled, attaining the highest accuracies of 0.9919 and 0.9916, respectively. They also earned the highest precision, recall, and F1-scores, showing that they reliably categorize both benign and malicious packets. With an accuracy of 0.9380, KNN also performed well. Although it is not as accurate as DT and RF, KNN handles non-linear data proficiently; however, it can be computationally expensive and is sensitive to noise and outliers. Logistic Regression (LR) had the lowest accuracy of all the methods, with a score of 0.8275. This may be explained by LR's linear assumption, which leads to a high percentage of false positives. Additionally, it has low precision and F1-score values due to its sensitivity to noise and outliers. Decision Tree and Random Forest were thus the most successful and efficient algorithms for identifying IoT cyberattacks. These results offer important information for choosing the best ML algorithms and defense tactics to protect the IoT from online threats.
Table 1. Evaluation metrics of the ML models
Algorithm   Accuracy   Precision   Recall   F1-Score
RF          0.9916     0.9913      0.9916   0.9909
KNN         0.9380     0.9366      0.9380   0.9364
DT          0.9919     0.9920      0.9919   0.9919
LR          0.8275     0.8473      0.8275   0.8034
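In Table 1, recall equals accuracy for every model. That pattern is consistent with support-weighted averaging of per-class scores, because weighted recall is the sum over classes c of (n_c / N)(TP_c / n_c), which equals (sum of TP_c) / N, i.e. accuracy. The paper does not state its averaging mode, so this is an inference, not a reported fact. Under weighted averaging, F1 is also averaged per class, which would explain why the reported F1-scores are not the harmonic mean of the reported precision and recall. The sketch below demonstrates the identity with scikit-learn on toy labels and contrasts it with macro averaging.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy labels (0 = benign, 1 = malicious); not the study's predictions.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0, 1, 1, 1]

acc = accuracy_score(y_true, y_pred)
rec_w = recall_score(y_true, y_pred, average="weighted")   # weighted by class support
rec_macro = recall_score(y_true, y_pred, average="macro")  # unweighted mean over classes
prec_w = precision_score(y_true, y_pred, average="weighted")

print(f"accuracy={acc:.4f} weighted recall={rec_w:.4f} "
      f"macro recall={rec_macro:.4f} weighted precision={prec_w:.4f}")
# accuracy=0.7000 weighted recall=0.7000 macro recall=0.6667 weighted precision=0.6952
```

Weighted precision has no such identity, which matches Table 1, where precision differs from accuracy for every model.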
Our analysis of the CIC-IoT2023 dataset provides a detailed evaluation of the effectiveness of machine learning algorithms in mitigating cyber threats to the IoT. Examining the evaluation metrics of each model shows that DT and RF performed outstandingly. RF achieved an F1-score of 99.08%, a recall of 99.15%, a precision of 99.12%, and an accuracy of 99.15%. Similarly, DT achieved an accuracy of 99.18%, an F1-score of 99.19%, a recall of 99.18%, and a precision of 99.19%. These metrics highlight the robustness of both models in differentiating benign from malicious packets in IoT networks. The KNN algorithm, on the other hand, achieved a noteworthy accuracy of 93.81%, with F1-score, recall, and precision values of 93.64%, 93.81%, and 93.66%, respectively. In contrast, LR performed less effectively, with an accuracy of 82.75% and comparatively lower F1-score, recall, and precision values of 80.34%, 82.75%, and 84.73%, respectively. This detailed examination is consistent with our research goals, as it clarifies how each model functions and provides evidence for the superiority of DT and RF in strengthening IoT cybersecurity.
5. Conclusions
In this study, we analyzed four ML algorithms for detecting IoT cyberattacks using the CIC-IoT2023 dataset: RF, KNN, DT, and LR. The dataset provides a comprehensive and realistic benchmark containing multiple types of DDoS attacks on different IoT devices. We carried out data preparation, model training, and performance evaluation using accuracy, precision, recall, and F1-score. The results show that DT and RF are the most effective algorithms for identifying IoT cyberattacks, with accuracies of 0.9919 and 0.9916, respectively. These two algorithms also achieve the highest precision, recall, and F1-score values, indicating that they can reliably distinguish between malicious and normal packets. KNN also performs well, with an accuracy of 0.9380, while LR has the lowest accuracy, at 0.8275.
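The prepare, train, and evaluate workflow summarized above can be illustrated with a minimal standard-library Python sketch for KNN, the simplest of the four models to write from scratch. Everything here is an assumption for illustration: the two features (packets per second, mean packet size), the toy flows, k = 3, the 75/25 hold-out split, and min-max scaling are not taken from the paper, and min_max_fit, min_max_apply, knn_predict, and evaluate_knn are hypothetical helpers, not the authors' code. The design choice worth noting is that scaling statistics are learned from the training split only, so the test split does not leak into preprocessing.

```python
import math
import random
from collections import Counter


def min_max_fit(rows):
    """Learn per-feature minima and maxima from training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_apply(rows, lo, hi):
    """Scale features to roughly [0, 1]; constant features are mapped to 0."""
    return [
        [(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(row, lo, hi)]
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows closest to query (Euclidean)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def evaluate_knn(x, y, k=3, test_fraction=0.25, seed=0):
    """Shuffle, hold out a test split, scale with train statistics, score accuracy."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    lo, hi = min_max_fit([x[i] for i in train])  # fit on training data only
    train_x = min_max_apply([x[i] for i in train], lo, hi)
    test_x = min_max_apply([x[i] for i in test], lo, hi)
    train_y = [y[i] for i in train]
    hits = sum(knn_predict(train_x, train_y, q, k) == y[i]
               for q, i in zip(test_x, test))
    return hits / len(test)


if __name__ == "__main__":
    # Hypothetical flows: (packets per second, mean packet size in bytes).
    benign = [(10 + i, 400 + 20 * i) for i in range(40)]
    ddos = [(5000 + 100 * i, 60 + i) for i in range(40)]
    x = benign + ddos
    y = ["benign"] * len(benign) + ["ddos"] * len(ddos)
    print(f"hold-out accuracy: {evaluate_knn(x, y):.4f}")
```

The brute-force sort in knn_predict costs O(n log n) per query, which is fine for a toy example; at the scale of CIC-IoT2023, a library implementation such as scikit-learn's KNeighborsClassifier would be the practical choice.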
This study also offers a critique of the inherent limitations of existing approaches to IoT cybersecurity. By examining how effectively machine learning models detect cyber threats in the IoT context using the CIC-IoT2023 dataset, this research sheds light on how conventional methods struggle with the evolving and intricate nature of such threats. The superior performance of DT and RF suggests that these algorithms can help address these shortcomings, which could lead to more dependable and efficient detection and prevention of malicious activity in interconnected IoT environments. These results have substantial implications for the practical use of machine learning to secure the Internet of Things. DT and RF achieve high accuracy, precision, and recall, which makes them viable candidates for near-term deployment in IoT defense systems. Their ability to differentiate between benign and malicious traffic provides a strong basis for strategies that detect and mitigate threats in real time, offering concrete advantages to industry stakeholders who are interested in protecting IoT ecosystems.
Moreover, the study lays a foundation for several extensions and future directions in IoT cybersecurity. Investigating federated learning techniques, incorporating unsupervised learning approaches, and integrating deep learning models are all promising avenues for improving the scalability and adaptability of cyber threat detection mechanisms in IoT networks. In addition, to keep pace with the evolving cyber threat landscape, the security of IoT infrastructures could be enhanced by continuously improving the machine learning algorithms and training them on diverse datasets.
Dataset Availability Statement
The dataset used in this study is available at https://www.unb.ca/cic/datasets/iotdataset-2023.html (accessed on 05 October 2023).
References
[1] U. Tariq, I. Ahmed, A. K. Bashir, K. Shaukat, “A Critical Cybersecurity Analysis and Future Research Directions for the Internet of Things: A Comprehensive Review”, Sensors, Vol. 23, No. 8, 2023. DOI: https://doi.org/10.3390/s23084117
[2] X. Cheng, J. Zhang, B. Chen, “Cyber Situation Comprehension for IoT Systems based on APT Alerts and Logs Correlation”, Sensors, Vol. 19, No. 18, 2019. DOI: https://doi.org/10.3390/s19184045
[3] P. K. Sadhu, V. P. Yanambaka, A. Abdelgawad, “Internet of Things: Security and Solutions Survey”, Sensors, Vol. 22, No. 19, 2022. DOI: https://doi.org/10.3390/s22197433
[4] S. Kumar, P. Tiwari, M. Zymbler, “Internet of Things is a revolutionary approach for future technology enhancement: a review”, Journal of Big Data, Vol. 6, No. 1, pp. 1-21, 2019. DOI: https://doi.org/10.1186/s40537-019-0268-2
[5] J. P. A. Yaacoub, H. N. Noura, O. Salman, A. Chehab, “Robotics cyber security: Vulnerabilities, attacks, countermeasures, and recommendations”, International Journal of Information Security, Vol. 21, pp. 115-158, 2022. DOI: https://doi.org/10.1007/s10207-021-00545-8
[6] Check Point Research, “The Tipping Point: Exploring the Surge in IoT Cyberattacks Globally”, 2023. Retrieved on October 12, 2023, from https://blog.checkpoint.com/security/the-tipping-point-exploring-the-surge-in-iot-cyberattacks-plaguing-the-education-sector/
[7] K. Tsiknas, D. Taketzis, K. Demertzis, C. Skianis, “Cyber threats to industrial IoT: a survey on attacks and countermeasures”, IoT, Vol. 2, No. 1, pp. 163-186, 2021. DOI: https://doi.org/10.3390/iot2010009
[8] M. Abdullahi, Y. Baashar, H. Alhussian, A. Alwadain, N. Aziz, L. F. Capretz, S. J. Abdulkadir, “Detecting cybersecurity attacks in the internet of things using artificial intelligence methods: A systematic literature review”, Electronics, Vol. 11, No. 2, 2022. DOI: https://doi.org/10.3390/electronics11020198
[9] A. Petrosyan, “Annual number of IoT attacks global 2022”, 2023. Retrieved on October 12, 2023, from https://www.statista.com/statistics/1377569/worldwide-annual-internet-of-things-attacks/
[10] M. Ahsan, K. E. Nygard, R. Gomes, M. M. Chowdhury, N. Rifat, J. F. Connolly, “Cybersecurity threats and their mitigation
approaches using Machine Learning—A Review”, Journal of Cybersecurity and Privacy, Vol. 2, No. 3, pp. 527-555, 2022.
[11] M. Urwin, “Machine Learning in Cybersecurity: How It Works and Companies to Know”, 2023. Retrieved on October 12, 2023, from https://builtin.com/artificial-intelligence/machine-learning-cybersecurity
[12] J. Singh, J. Singh, “A survey on machine learning-based malware detection in executable files”, Journal of Systems Architecture, Vol. 112, 2021.
[13] N. Vadivelan, K. Bhargavi, S. Kodati, M. Nalini, “Detection of cyber-attacks using machine learning”, AIP Conference Proceedings, AIP Publishing, Vol. 2405, No. 1, 2022.
[14] E. C. P. Neto, S. Dadkhah, R. Ferreira, A. Zohourian, R. Lu, A. A. Ghorbani, “CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment”, Sensors, Vol. 23, No. 13, 2023. DOI: https://doi.org/10.3390/s23135941
[15] U. Inayat, M. F. Zia, S. Mahmood, H. M. Khalid, M. Benbouzid, “Learning-based methods for cyber-attack detection in IoT systems: A survey on methods, analysis, and future prospects”, Electronics, Vol. 11, No. 9, 2022.
[16] E. Adi, A. Anwar, Z. Baig, S. Zeadally, “Machine learning and data analytics for the IoT”, Neural Computing and Applications, Vol. 32, pp. 16205-16233, 2020.
[17] C. Malathi, I. N. Padmaja, “Identification of cyber-attacks using machine learning in smart IoT networks”, Materials Today: Proceedings, Vol. 80, pp. 2518-2523, 2023.
[18] O. A. Alkhudaydi, M. Krichen, A. D. Alghamdi, “A Deep Learning Methodology for Predicting Cybersecurity Attacks on the Internet of Things”, Information, Vol. 14, No. 10, pp. 550, 2023.
[19] J. G. Almaraz-Rivera, J. A. Perez-Diaz, J. A. Cantoral-Ceballos, “Transport and application layer DDoS attacks detection to IoT devices by using machine learning and deep learning models”, Sensors, Vol. 22, No. 9, 2022.
[20] J. Kim, M. Shim, S. Hong, Y. Shin, E. Choi, “Intelligent detection of IoT botnets using machine learning and deep learning”, Applied Sciences, Vol. 10, No. 19, 2020.
[21] S. Dalal, U. K. Lilhore, N. Foujdar, S. Simaiya, M. Ayadi, N. A. Almujally, A. Ksibi, “Next-generation cyber-attack prediction for IoT systems: leveraging multi-class SVM and optimized CHAID decision tree”, Journal of Cloud Computing, Vol. 12, No. 1, pp. 1-20, 2023.
[22] O. Jullian, B. Otero, E. Rodriguez, N. Gutierrez, H. Antona, R. Canal, “Deep-Learning Based Detection for Cyber-Attacks in IoT Networks: A Distributed Attack Detection Framework”, Journal of Network and Systems Management, Vol. 31, No. 2, pp. 33, 2023.
Authors’ Profiles
Dr. Akinul Islam Jony currently holds the position of Associate Professor and serves as the Head of the
Undergraduate Program in Computer Science at American International University-Bangladesh (AIUB). His
research interests encompass a wide range of topics, including cybersecurity, artificial intelligence, machine
learning, e-learning, educational technology, and issues in software engineering.
Arjun Kumar Bose Arnob is a final-semester BSc student in Computer Science and Engineering, majoring in Software Engineering, at American International University-Bangladesh (AIUB). He currently works as a Research Assistant at AIUB and is actively involved in research projects. He has a strong passion for, and proficiency in, Machine Learning and Deep Learning, which is reflected in his work. He has consistently performed well academically and is dedicated to his studies.
How to cite this paper:
Akinul Islam Jony, Arjun Kumar Bose Arnob, "Securing the Internet of Things: Evaluating Machine Learning
Algorithms for Detecting IoT Cyberattacks Using CIC-IoT2023 Dataset", International Journal of Information Technology and
Computer Science (IJITCS), Vol. 16, No. 4, pp. 56-65, 2024. DOI: 10.5815/ijitcs.2024.04.04