Measurement 229 (2024) 114443
Available online 5 March 2024
0263-2241/© 2024 Elsevier Ltd. All rights reserved.
Road surface crack detection method based on improved YOLOv5 and
vehicle-mounted images
Hongwei Hu a, Zirui Li a, Zhiyi He a,b,*, Lei Wang c, Su Cao d, Wenhua Du b
a College of Automotive and Mechanical Engineering, Changsha University of Science and Technology, Changsha, Hunan 410114, China
b Shanxi Key Laboratory of Advanced Manufacturing Technology, North University of China, Taiyuan 030051, China
c College of Civil Engineering, Changsha University of Science and Technology, Changsha, Hunan 410114, China
d College of Intelligence Science and Technology, National University of Defense Technology, Changsha, Hunan 410000, China
ARTICLE INFO
Keywords:
Road surface crack detection
YOLO series
Vehicle-mounted images
Attention mechanism
Lightweight design
ABSTRACT
Road surface crack detection methods using vehicle-mounted images have gained substantial attention recently.
Notably, YOLO-based techniques have exhibited effectiveness and real-time performance. However, current
YOLO-based approaches encounter challenges like blurriness of small cracks and incomplete information
extraction from vehicle-mounted images. Therefore, this paper proposes a novel detection method based on the
improved YOLOv5 and vehicle-mounted images. In this method, the Slim-Neck structure enhances crack focus
through a weighted attention mechanism while optimizing network efficiency. The integration of the C2f
structure and Decoupled Head better harnesses upper-layer output information. Moreover, the SPPCSPC
structure is bifurcated to augment model efficiency and accuracy. The training process is optimized by using the
Silu activation function and CIoU loss function. This approach is applied to vehicle-mounted images, with its
efficacy and feasibility affirmed through extensive comparative and ablation experiments. Importantly, compared
to five other advanced methods, notable enhancements are observed in various evaluation metrics.
1. Introduction
Road surface cracks [1–5] are common defects in road structures.
Their formation is primarily attributed to the aging and degradation of
road surface materials over time, coupled with the influence of natural
climatic factors such as rain and snow [6,7]. These factors contribute to
the emergence of various types of cracks on the road surface, which
gradually widen and lead to surface damage and structural loosening,
posing significant threats to both road safety and service life [8,9].
Therefore, it is meaningful to conduct research on road surface crack
detection.
Over the past few decades, numerous scholars and experts have
proposed various methods to detect cracks in road surfaces. These
methods include manual inspection [10], machine vision [11–13], and
infrared imaging [14], among others. However, these methods have
certain limitations. Manual inspections are time-consuming, labor-
intensive, and inefficient, and susceptible to environmental and human
factors [10]. Machine vision techniques enable automated detection, but
require high-quality image data and sophisticated algorithms [12].
Infrared imaging can detect temperature variations on road surfaces and
thus infer the location of cracks, but this entails high equipment costs
[14]. Nowadays, deep learning techniques have made remarkable
progress in the field of image processing. Due to their excellent
performance, deep learning algorithms have found widespread application in
road surface crack detection.
In general, deep learning-based object detection and recognition
algorithms can be categorized into two-stage detection methods based
on classication and one-stage detection methods based on regression.
For two-stage detection methods, Girshick et al. introduced the R-CNN
[15], which served as a pioneering work for deep learning-based object
detection. Building on this, they improved the method for region pro-
posal extraction and proposed Fast R-CNN [16] and Faster R-CNN [17].
He et al. introduced an additional output branch for predicting target masks
during candidate object generation, leading to the development of Mask
R-CNN [18]. Kim et al. [19] utilized Mask R-CNN and performed
morphological operations on the detected crack masks to quantify the
cracks. A et al. [20] utilized Transfer Learning and Dynamic-Opposite
Hunger Games Search to optimize feature selection. Additionally, they
employed an algorithm called Improved Articial Rabbits Optimizer
[21] to disregard unimportant features. While these methods have
* Corresponding author at: College of Automotive and Mechanical Engineering, Changsha University of Science and Technology, Changsha Hunan 410114, China.
E-mail address: hezhiyihnu@126.com (Z. He).
https://doi.org/10.1016/j.measurement.2024.114443
Received 10 August 2023; Received in revised form 19 February 2024; Accepted 4 March 2024
continuously enhanced and innovated two-stage detection techniques,
their hardware requirements constrain detection speed and real-time
capabilities, rendering them unsuitable for real-time detection.
To meet the real-time detection requirements, regression-based one-
stage object detection methods directly obtain the probability and po-
sition of the objects without the need for region proposal extraction. The
YOLO [22] series is a classic representative, with YOLOv5 [23] widely
applied in various domains, although it may encounter performance
bottlenecks in specic domains. Therefore, many researchers have
conducted further studies on road surface crack detection. For instance,
Li et al. [24] proposed a YOLO detection algorithm that improves
detection accuracy, but the model size is large and fails to meet real-time
requirements. To address this issue, Wu et al. [25] introduced an
improved YOLOv4 network with pruning techniques and the EvoNorm-S0
structure, which enhances detection accuracy and satisfies real-time
requirements. In other crack detection fields, such as bridge deck
crack detection, Zhang et al. [26] demonstrated the effectiveness of CR-
YOLO for sparsely distributed cracks but with limited performance on
complex cracks. For complex and multivariate cracks, Z et al. [27]
developed MI-YOLO, which exhibits stronger feature extraction
capabilities for light-colored and low-definition images but lower accuracy in
identifying small cracks.
Although the aforementioned object detection research has initially
improved the recognition accuracy and efficiency of the models, the
datasets used in these studies are based on manually processed tradi-
tional crack images, which are not entirely suitable for the new chal-
lenges posed by road images captured by vehicle-mounted smartphones
for crack detection. This primarily involves two issues. Firstly, the
vehicle-mounted image contains the surrounding environment of the
road, which makes the cracks appear smaller or less prominent in the
image, resulting in blurred tiny cracks. Secondly, the C3 structure in the
YOLO series, which is commonly studied, performs feature extraction at
earlier layers. This may lead to the loss of contextual information of tiny
cracks and incomplete extraction of crack information. Due to these
circumstances, detectors struggle to accurately distinguish the subtle
differences between cracks and the surrounding background. These two
issues are considered the primary challenges faced by current detection
models, particularly when aiming to maintain excellent real-time per-
formance while addressing these problems.
To address the challenges mentioned above, this paper proposes a
novel real-time detection algorithm named Improved YOLOv5. Initially,
a comprehensive analysis of the limitations of the current detection
model is conducted. Targeted structures are selected to ensure that the
choices made have a positive theoretical impact on the existing issues in
the model. Subsequently, the Silu activation function and CIoU loss
function are introduced, optimizing the training process, and all struc-
tures are integrated. Finally, during the structural fusion and adjustment
process, a layer of CBS between SPPCSPC [30] and the neck is removed
to further optimize the overall structure. Through these clever combi-
nations and adjustments, the challenges in crack detection in vehicle-
mounted images are effectively addressed. Validated on a reorganized
dataset, the algorithm exhibits outstanding performance in the field of
road surface crack detection in vehicle-mounted images, as substanti-
ated by a wealth of experimental results.
The following summarizes the signicant contributions of this study:
1. This paper introduces a novel real-time detection method in the eld
of road surface crack detection in vehicle-mounted images.
2. A new algorithm, named Improved YOLOv5, is proposed, effectively
addressing the challenges in crack detection in vehicle-mounted
images through clever combinations and adjustments.
3. Reorganized datasets are used to validate and compare the proposed
method to well-known methods.
The rest of this paper is organized as follows: In Section 2, a review of
the related work is provided, along with a detailed analysis of the
existing methods' limitations. Section 3 presents a detailed description
of the proposed method. In Section 4, comparative experiments and
ablation experiments are conducted to validate and analyze the pro-
posed approach. Finally, Section 5 summarizes the research findings of
this paper and suggests potential directions for future improvements.
2. Related work
In this section, the YOLOv5 algorithm, which has been proposed in
recent years, is rst introduced. Subsequently, the existing issues of
YOLOv5 in road surface crack detection are discussed in depth, and
approaches to address these issues are presented.
2.1. YOLOv5
The YOLOv5 object detector consists of three main components:
Backbone, Neck, and Head. The Backbone is responsible for extracting
the feature information from the input image. The Neck plays a crucial
role in further processing and fusing the features extracted by the
Backbone to enhance the accuracy and effectiveness of object detection.
Finally, after being processed by the Head, YOLOv5 outputs the category
and location information of each detected object.
YOLOv5 predicts for each grid on the feature map and compares the
predicted information with the ground truth to guide the model towards
convergence. The loss function aims to measure the discrepancy be-
tween the predicted information and the ground truth. A smaller loss
function value indicates that the predicted information is closer to the
ground truth, which greatly determines the performance of the model.
The loss function in YOLOv5 consists of three main components: the
classication loss, objectness loss, and localization loss. The formulation
of the loss function is as follows:
Loss = λ1 Lcls + λ2 Lobj + λ3 Lloc (1)
where λ1, λ2, and λ3 are the loss weights, with default values of 0.5, 1.0,
and 0.05, respectively.
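As a minimal illustration of Eq. (1) (the function name and plain-float inputs are ours, not the paper's), the weighted combination can be sketched as:

```python
# Default loss weights as stated in the text.
LAMBDA_CLS, LAMBDA_OBJ, LAMBDA_LOC = 0.5, 1.0, 0.05

def total_loss(l_cls, l_obj, l_loc):
    """Weighted sum of the classification, objectness and localization losses (Eq. 1)."""
    return LAMBDA_CLS * l_cls + LAMBDA_OBJ * l_obj + LAMBDA_LOC * l_loc
```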
For the classication loss and objectness loss, YOLOv5 utilizes the
binary cross-entropy function by default. The binary cross-entropy
function is dened as follows:
L = ylogp (1 y)log(1 p) =
{
logp, y = 1
log(1 p), y = 0
(2)
Where y represents the label of the input sample (1 for positive
sample, 0 for negative sample), and p represents the probability pre-
dicted by the model that the input sample is a positive sample.
As for the localization loss, YOLOv5 adopts the CIoU (Complete
Intersection over Union) loss, which is expressed by the following formula:
LCIoU = 1 − IoU + ρ²(b, b^gt)/c² + αv (3)
where IoU represents the Intersection over Union, which is used to
measure the overlap between the predicted bounding box and the
ground truth bounding box in object detection. Assuming the predicted
bounding box is represented as A and the ground truth bounding box is
represented as B, the expression for IoU is given by:
IoU = |A ∩ B| / |A ∪ B| (4)
Here b and b^gt represent the center points of the predicted bounding
box and the ground truth bounding box, respectively (gt is the
abbreviation for ground truth). ρ denotes the Euclidean distance between
the two center points, and c represents the diagonal distance of the
minimum enclosing region of the predicted and ground truth bounding boxes.
αv is a penalty term designed to consider the consistency of aspect
ratios between two bounding boxes, aiming to focus more on the shape
of the target. Let's delve into v. Here, v is a metric used to quantify the
consistency of aspect ratios, and its expression is as follows:
v = (4/π²) (arctan(w^gt/h^gt) − arctan(w/h))² (5)
In this equation, w^gt and h^gt represent the width and height of the
ground truth bounding box, while w and h represent the width and
height of the currently predicted bounding box. Then, the difference in
aspect ratio between the ground truth bounding box and the predicted
bounding box is calculated. When computing this ratio, the arctangent
function is used to convert each aspect ratio into an angle, which allows
for a more accurate representation of the difference between aspect
ratios. Since the range of the arctangent function (for positive ratios) is
between 0 and π/2, a scaling factor is needed to adjust the angle
difference. This factor ensures that the angle difference is compared
within the appropriate range.
The term 4/π² in the formula acts as a scaling factor, ensuring that the
value of v stays within a reasonable range. By comparing this ratio, we
gain insights into the consistency of the target's shape.
Another crucial parameter, α, is a positive weight designed to balance
the two factors of IoU and aspect ratio. Its mathematical expression is:
α = v / ((1 − IoU) + v) (6)
α nonlinearly combines v with 1 − IoU to adjust the weight. When the
IoU is low, α increases, directing the model's attention more towards the
consistency of aspect ratios. Conversely, when the IoU is high, α
decreases, emphasizing the accuracy of position and size. In regression,
α is given higher priority, especially when the two bounding boxes do not
overlap. This implies a greater emphasis on ensuring the precision of the
bounding box's position and size in object detection tasks. The
combination of these two formulas forms the calculation of CIoU,
comprehensively measuring the similarity between two bounding boxes in
terms of position, size, and shape. This integrated metric contributes
positively to enhancing the performance and robustness of object
detection models.
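To make Eqs. (3)–(6) concrete, the full CIoU computation can be sketched as a standalone function. This is our illustrative re-implementation for corner-format boxes; the paper itself relies on the CIoU loss built into YOLOv5.

```python
import math

def ciou_loss(box_pred, box_gt):
    """CIoU loss (Eqs. 3-6) for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_pred
    bx1, by1, bx2, by2 = box_gt

    # IoU (Eq. 4): intersection area over union area.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared center distance rho^2 and squared diagonal c^2 of the enclosing box.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2

    # Aspect-ratio consistency v (Eq. 5) and trade-off weight alpha (Eq. 6).
    v = (4 / math.pi ** 2) * (math.atan((bx2 - bx1) / (by2 - by1))
                              - math.atan((ax2 - ax1) / (ay2 - ay1))) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return 1 - iou + rho2 / c2 + alpha * v
```

For perfectly overlapping boxes the loss is 0; for disjoint boxes it exceeds 1 − IoU = 1, reflecting the center-distance penalty.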
2.2. Problems of YOLOv5 in road surface crack detection on vehicle-
mounted images
Currently, YOLOv5 can be utilized in various domains. However, due
to the combined influence of factors such as dataset characteristics,
target attributes, class balance, and model optimization, YOLOv5
exhibits significant differences when employed in different domains.
Through existing research and in-depth analysis of the YOLOv5 structure,
the following issues have been identified in its implementation for
road surface crack detection in vehicle-mounted images:
1. The dataset used in this paper consists of road images captured by a
vehicle-mounted smartphone. In these images, cracks are often small
in size or not visually prominent, making it challenging for the
network to directly capture their subtle features. This can result in
the loss of information related to small cracks. Moreover, YOLOv5
may exhibit different performance in detecting small and large ob-
jects. When the target is small, YOLOv5 may struggle to accurately
detect it.
2. The C3 structure in the YOLO series performs feature extraction at
earlier layers, which may result in the loss of contextual information
for small cracks. However, adding too many modules on top of
YOLOv5 to enhance feature extraction would significantly increase
the number of parameters, thereby sacrificing the real-time advantage
of the original YOLOv5.
For the rst issue, the addition of an attention mechanism module
was attempted to assist the model in better focusing on crack
information. To mitigate the parameter count, the consideration of
employing the Slim-Neck structure was made.
Regarding the second issue, in order to better utilize the information
from the previous layer and address the problem of incomplete feature
extraction in the C3 structure, a replacement with the C2f structure [50]
was carried out. Furthermore, the Decoupled Head from YOLOX was
introduced to further optimize the representation of multi-scale and
high-dimensional features. To further enhance the speed and accuracy of
the model, the SPPCSPC structure was also introduced. Finally, by
integrating the aforementioned methods and adjusting the parameter
size, the method referred to as improved YOLOv5 was proposed. The
details of this method will be elaborated in the following section.
3. The proposed road surface crack detection method
In this section, the specic steps of the proposed road surface crack
detection method is initially presented, followed by a detailed exposi-
tion of the improved YOLOv5 utilized in this method.
3.1. The specic steps of the proposed method
The road surface crack detection scheme, which is based on the
improved YOLOv5 and proposed in this paper, is outlined in the
following steps. For a detailed visual representation of the process,
readers are referred to Supplementary Fig. 1 in the supplementary
materials.
Step 1: Road images are captured using a vehicle-mounted smart-
phone to establish the dataset.
Step 2: The dataset is utilized to train the improved YOLOv5 model,
resulting in weight les.
Step 3: The trained improved YOLOv5 weight le is subsequently
deployed on the vehicle-mounted device to perform real-time crack
detection on the road surface.
Step 4: The detection results are generated as output.
A simplied representation of the scheme is provided in the main
text (see Fig. 1). A comprehensive flow chart is given in the
supplementary material.
Fig. 1. Road surface crack detection method based on improved YOLOv5.
3.2. The proposed improved YOLOv5
In terms of the algorithm, the proposed improved YOLOv5 in this
paper differs from the original YOLOv5 by incorporating six key mod-
ules: Slim-Neck structure, C2f structure, Decoupled Head, SPPCSPC
structure, Silu activation function, and CIoU loss function. Each module
plays a distinct role in the model, and they will be elaborated on in
detail. The overall architecture of the improved YOLOv5 is illustrated in
Fig. 2.
3.2.1. Slim-Neck structure
In this study, the Slim-Neck structure [26] was introduced, with
GSconv replacing Conv and VoVGSCSP replacing C3. Compared to the
original structure, the Slim-Neck structure incorporates lightweight
design, reducing computational complexity and model parameters,
thereby enhancing efficiency and reducing computation time and
hardware resource requirements. To meet the specific requirements of
road surface crack detection, this study integrates attention mechanisms
with convolution. Convolutional Neural Networks (CNNs) excel at
capturing local features of images, while attention mechanisms focus on
specic parts of the target, improving the accuracy of target detection.
Considering the use of vehicle-mounted images, where cracks often
appear against complex backgrounds, combining convolution and
attention mechanisms can better handle complex backgrounds, allevi-
ating excessive attention to them. Moreover, attention mechanisms can
be employed for multi-scale feature fusion. In road surface crack
detection, the size and shape of cracks may vary significantly. By
combining multi-scale features extracted by convolution with attention
mechanisms, the model can better capture target information at
different scales, thereby enhancing its adaptability to scale variations. In
summary, the introduction of the Slim-Neck structure can improve the
efciency, accuracy, and adaptability of the model, making it an effec-
tive model improvement method. The Slim-Neck structure is illustrated
in Fig. 3.
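As a rough sketch of the kind of lightweight convolution Slim-Neck builds on, the following is our re-implementation of the published GSConv idea: a dense convolution producing half the output channels, a cheap depthwise convolution producing the other half, and a channel shuffle to mix them. Parameter names and the depthwise kernel size are our assumptions.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Sketch of a GSConv block (dense conv + depthwise conv + channel shuffle)."""

    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        c_ = c2 // 2
        # Dense (standard) convolution produces half of the output channels.
        self.dense = nn.Sequential(nn.Conv2d(c1, c_, k, s, k // 2, bias=False),
                                   nn.BatchNorm2d(c_), nn.SiLU())
        # Depthwise convolution produces the other half cheaply.
        self.dw = nn.Sequential(nn.Conv2d(c_, c_, 5, 1, 2, groups=c_, bias=False),
                                nn.BatchNorm2d(c_), nn.SiLU())

    def forward(self, x):
        x1 = self.dense(x)
        x2 = torch.cat((x1, self.dw(x1)), dim=1)
        # Channel shuffle: interleave the dense and depthwise halves.
        b, c, h, w = x2.shape
        return x2.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```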
3.2.2. C2f structure
The proposed C2f structure in this paper is primarily used to replace
the C3 structure in the Backbone of YOLOv5. This structure adopts the
shufing idea of CSPNet and the residual structure concept to obtain
richer gradient ow information and better utilization of the informa-
tion from the upstream. The number of stacked C2f structures is
controlled by the parameter n, which can vary for models of different
scales. The specic structure of the C2f module is illustrated in Fig. 4,
where CBS represents the combination of Convolution, Batch Normali-
zation, and the Silu activation function. By replacing the C3 structure,
the C2f structure further enhances the network's representation
capability and detection performance.
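Based on the description above (a CSP-style split, n stacked residual bottlenecks, and concatenation of every intermediate output), a minimal C2f sketch might look like the following; the exact channel arithmetic is our assumption, not taken from the paper.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Residual pair of CBS blocks (Conv-BN-SiLU), as used inside C2f."""
    def __init__(self, c):
        super().__init__()
        self.cv1 = nn.Sequential(nn.Conv2d(c, c, 3, 1, 1, bias=False), nn.BatchNorm2d(c), nn.SiLU())
        self.cv2 = nn.Sequential(nn.Conv2d(c, c, 3, 1, 1, bias=False), nn.BatchNorm2d(c), nn.SiLU())

    def forward(self, x):
        return x + self.cv2(self.cv1(x))

class C2f(nn.Module):
    """Sketch of C2f: split, run n bottlenecks, concatenate every branch."""
    def __init__(self, c1, c2, n=1):
        super().__init__()
        self.c = c2 // 2
        self.cv1 = nn.Sequential(nn.Conv2d(c1, c2, 1, bias=False), nn.BatchNorm2d(c2), nn.SiLU())
        self.cv2 = nn.Sequential(nn.Conv2d((2 + n) * self.c, c2, 1, bias=False),
                                 nn.BatchNorm2d(c2), nn.SiLU())
        self.m = nn.ModuleList(Bottleneck(self.c) for _ in range(n))

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))   # split into two halves
        for m in self.m:
            y.append(m(y[-1]))                  # each bottleneck feeds the next
        return self.cv2(torch.cat(y, dim=1))    # concatenate all branches
```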
3.2.3. Decoupled Head
In this study, the Decoupled Head [29] is introduced, which
effectively addresses issues such as class imbalance and object size
variation compared to traditional detection head structures. The incor-
poration of the Decoupled Head leads to a substantial enhancement in
the model's detection performance, improving both accuracy and speed.
Additionally, the Decoupled Head enables flexible model design, such as
increasing the number of output classes or adjusting the detector's
receptive field size. The Decoupled Head structure is illustrated in Fig. 5.
3.2.4. SPPCSPC structure
In order to improve the speed and accuracy of the model, the original
SPPF structure is replaced by the SPPCSPC structure [30] in this study.
Similar to SPPF, SPPCSPC also includes pooling layers of sizes 1x1, 5x5,
9x9, and 13x13, but it introduces an additional 1x1 residual branch. The
structure is divided into two parts, where one part is processed with
traditional convolution and the other part is processed with the SPP
structure. Finally, these two parts are merged. This design reduces the
computational complexity by half, thereby improving speed, while also
enhancing accuracy. Additionally, the SPPCSPC structure can be com-
bined with other detection head structures, especially the previously
proposed C2f structure and Slim-Neck structure. When situated between
these two structures, it can better facilitate multi-scale feature fusion
and further enhance the network performance. The SPPCSPC structure is
illustrated in Fig. 6.
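The two-part design described above can be sketched as follows. This follows the YOLOv7 SPPCSPC layout only at a high level: one branch applies SPP-style max pooling, a parallel 1x1 branch serves as the residual path, and the two are merged. Channel widths and the exact number of convolutions are our assumptions.

```python
import torch
import torch.nn as nn

def cbs(c1, c2, k=1):
    """CBS block: Convolution + BatchNorm + SiLU."""
    return nn.Sequential(nn.Conv2d(c1, c2, k, 1, k // 2, bias=False),
                         nn.BatchNorm2d(c2), nn.SiLU())

class SPPCSPC(nn.Module):
    """Simplified sketch of SPPCSPC with an SPP branch and a 1x1 residual branch."""
    def __init__(self, c1, c2):
        super().__init__()
        c_ = c2 // 2
        self.pre = nn.Sequential(cbs(c1, c_), cbs(c_, c_, 3), cbs(c_, c_))
        self.pools = nn.ModuleList(nn.MaxPool2d(k, stride=1, padding=k // 2)
                                   for k in (5, 9, 13))
        self.post = nn.Sequential(cbs(4 * c_, c_), cbs(c_, c_, 3))
        self.shortcut = cbs(c1, c_)          # parallel 1x1 residual branch
        self.out = cbs(2 * c_, c2)

    def forward(self, x):
        y = self.pre(x)
        # Identity plus three pooled views, spatial-pyramid style.
        y = self.post(torch.cat([y] + [p(y) for p in self.pools], dim=1))
        return self.out(torch.cat((y, self.shortcut(x)), dim=1))
```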
Fig. 2. Improved YOLOv5 architecture.
Fig. 3. Slim-Neck structure.
Fig. 4. C2f structure.
3.2.5. Silu activation function and CIoU loss function
The choice of Silu activation function and CIoU loss function in this
study is motivated by the analysis of the dataset images for crack
detection task. It is crucial to capture the shape, texture, and edges of
crack images effectively. The smoothness and approximate linearity of
the Silu function allow it to preserve the linear relationships of input
features and exhibit a larger dynamic range. This property may aid in
capturing subtle features in crack images and improve the accuracy of
object detection. Regarding the loss function, different loss functions
have distinct advantages for various tasks. In the case of crack detection,
the CIoU loss function provides a more accurate measurement of the
match between predicted boxes and ground truth boxes, particularly for
handling small targets. This leads to improved detection performance
for small targets. Therefore, the selection of the Silu activation function
and CIoU loss function further optimizes the performance of the network.
Fig. 5. Decoupled Head.
Fig. 6. SPPCSPC structure.
Fig. 7. Improved YOLOv5 network.
3.2.6. Module integration optimization
To better integrate the modules and enhance their effectiveness, we
decided to remove a layer of CBS between the SPPCSPC structure and
the upsampling. This decision brings several advantages: firstly, by
adjusting the number of channels, we make it more adaptable to the
requirements of subsequent structures. Secondly, simplifying the overall
network structure helps reduce model complexity, alleviate computational
burden, and thus improve inference speed. However, the key
point is that the Slim-Neck structure, designed to capture information
about small cracks, is located within the neck. Moreover, removing the
CBS layer before the neck reduces information loss, ensuring that critical
features can be transmitted to the bottleneck stage of the network
without interference. This measure significantly contributes to
maintaining the model's sensitivity to details like small cracks, thereby
enhancing the accuracy of the detection task. The specific architecture of
the improved YOLOv5 network is illustrated in Fig. 7.
4. Experiments
4.1. Datasets
In the eld of road surface crack detection, datasets plays a crucial
role. The widely used dataset for complex environmental conditions in
this domain is provided by the IEEE Big Data Global Road Damage
Detection Challenge [31]. In this study, 13,508 images were selected
from the dataset as the refined dataset. The resolution of the images was
set to 600x600 pixels. We focused on five categories: longitudinal
cracks, lateral cracks, alligator cracks, potholes, and blurred white lines.
Due to the disorderly nature of the annotation information in the original
dataset, a Python script was employed to transform it into a standardized
txt format. Subsequently, among the 13,508 txt files, we sifted through
and retained information pertaining to these five categories, individually
labeled as D00, D10, D20, D40, and D50. Following this step, data from
other categories and instances of mislabeling were expunged. For images
with missing information, we employed the image annotation tool,
labelimg, to conduct additional annotations. The refined dataset was
then divided into training, validation, and testing sets, with ratios of
0.56, 0.22, and 0.22, respectively.
During the model training process, a random shufing technique was
employed. At each epoch, images from the training and validation sets
were randomly mixed and divided into new training and validation sets
in the same proportion. This data preparation method ensures that the
model is provided with high-quality and diverse data, thus improving its
accuracy and robustness. The rened dataset is illustrated in Fig. 8.
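The per-epoch shuffling described above can be sketched as a small helper. The file lists, helper name, and seeding are ours, for illustration only: pool the training and validation images, shuffle, and re-divide them in the same proportion.

```python
import random

def resplit(train_files, val_files, seed=None):
    """Pool train and val images, shuffle, and re-split in the same proportion."""
    pool = list(train_files) + list(val_files)
    random.Random(seed).shuffle(pool)
    n_train = len(train_files)
    return pool[:n_train], pool[n_train:]
```

Calling this at the start of each epoch yields a fresh train/validation partition of the same sizes while preserving every image exactly once.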
4.2. Platform construction and model training
The software environment and hardware environment configured for
model training and testing in this paper are as follows: Windows10
operating system, Pytorch deep learning framework, CUDAv11.1,
Pythonv3.8.10, torchv1.9.0, GPU NVIDIA GeForce RTX 3080(10 GB),
CPU Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50 GHz.
Due to the low resolution of the dataset used in this experiment,
which is 600x600 pixels, we selected a suitable batch size of 32 and
trained the model for 200 epochs. As for the optimizer, we chose SGD
and set other key parameters as follows: lr0 = 0.01, lrf = 0.01, mo-
mentum = 0.843, weight_decay = 0.00036.
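The reported optimizer settings can be expressed directly in PyTorch. The placeholder model and the linear decay toward lrf are our assumptions about how lr0/lrf were applied; only the numeric hyperparameters come from the text.

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in for the detector, for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,           # lr0
                            momentum=0.843, weight_decay=0.00036)
# Decay the learning rate linearly from lr0 to lr0 * lrf over 200 epochs.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda e: (1 - e / 200) * (1 - 0.01) + 0.01)
```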
The model adopts a strategy of random weight initialization instead
of utilizing a pre-trained model for the initial setup. Deliberately opting
for training from scratch, the model is designed to directly learn task-
specic features from the provided data, without relying on generic
features learned from other datasets. Throughout the experimental
process, thorough training and fine-tuning have been conducted to
enhance the model's adaptation to the specific task of road surface crack
detection. This approach allows for better control over the model's
structure and performance, enabling more effective addressing of
challenges such as the blurriness of small cracks and incomplete
information extraction from vehicle-mounted images.
4.3. Evaluation metric
The evaluation metrics used in this paper for object detection include
precision (P), recall (R), mean average precision (mAP), and F1 score as
the primary evaluation indicators for the model. Precision (P) and recall
(R) respectively represent the proportion of correctly predicted samples
among all detected objects and the proportion of correctly predicted
samples among all objects. The mAP is the average area under the
precision-recall curve for all classes, and the F1 score is the harmonic
mean of precision and recall. Additionally, frames per second (FPS) and
giga floating-point operations (GFLOPs) are also crucial metrics for
evaluating models. FPS represents the model's capability to process
frames per second, while GFLOPs indicates the number of floating-point
operations required for a single forward pass of the model.
In road damage detection, both precision and recall are equally
important as misclassications or false negatives can have an impact on
road maintenance and safety. Therefore, this study focuses on the
overall performance of the model and considers the mean average pre-
cision (mAP) and F1 score, which combine precision and recall, as the
most important evaluation metrics.
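The precision, recall, and F1 definitions used above reduce to a few lines given raw detection counts (true positives, false positives, false negatives); this illustrative helper is ours, not part of the paper's tooling.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 score from raw detection counts."""
    p = tp / (tp + fp) if tp + fp else 0.0   # fraction of detections that are correct
    r = tp / (tp + fn) if tp + fn else 0.0   # fraction of objects that are detected
    f1 = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of p and r
    return p, r, f1
```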
4.4. Comparisons with other state-of-the-art methods
To validate the superiority of improved YOLOv5 proposed in this
paper, a comparison is conducted with several commonly used classical
algorithms in the eld of road surface crack detection, including Faster
R-CNN [17], YOLOX [29], YOLOv5, u-YOLO (EM + EP) [32], YOLOv6
[49], YOLOv7 [30], and YOLOv8 [50]. Notably, u-YOLO (EM + EP) is
the method employed by the champion of the 2020 IEEE Big Data Global
Road Damage Detection Challenge. The evaluation metrics employed for
each algorithm include mAP, F1, and FPS. The models are trained and
Fig. 8. Some examples from the dataset.
tested on the same dataset with identical hyperparameters. The detec-
tion results are presented in Table 1 and Fig. 9.
According to the analysis of the detection results in Table 1, the
improved YOLOv5 achieved an mAP@0.5 of 59.17 %, surpassing Faster
R-CNN/ResNet50 + FPN (56.10 %), Faster R-CNN/MobileNetV2 (45.50
%), YOLOX (52.82 %), u-YOLO (EM + EP) (55.40 %), YOLOv5 (55.76
%), YOLOv6 (58.80 %), and YOLOv8 (56.90 %), and approached the
performance level of YOLOv7 (60.20 %). Similarly, it demonstrated a
similar trend in F1, being higher than other detectors and approaching
the performance level of YOLOv7. Additionally, the improved YOLOv5
performed well in mAP@0.5:0.95. In terms of inference speed, the
improved YOLOv5 reached 85 FPS; although lower than YOLOX (179),
YOLOv5 (114), and YOLOv8 (108) among one-stage detectors, it exceeded
all other detectors. Considering that 85 FPS is sufficient for real-time
detection requirements in most scenarios, sacrificing some FPS to
achieve higher performance is worthwhile without significantly affecting
overall performance. As for YOLOv7, although its mAP and F1 scores are
slightly higher than those of the improved YOLOv5, its FPS is only 43,
roughly half that of the model proposed in this paper. This trade-off is
not worthwhile, and it does not meet real-time requirements.
The proposed improved YOLOv5 algorithm in this paper is an
improvement upon YOLOv5. Compared to the original YOLOv5 algo-
rithm, the improved YOLOv5 algorithm achieved a 3.41 % improvement
in mAP and a 2.25 % improvement in F1, while maintaining an FPS of 85.
When compared to other leading algorithms in this field, the improved
YOLOv5 proposed in this paper has the highest cost-effectiveness in
overall evaluation.
4.5. Ablation study
Extensive ablation studies were also conducted on our reorganized
dataset to validate each module of the proposed improved YOLOv5 in
this paper. Table 2 presents the detailed roadmap from YOLOv5 to
improved YOLOv5.
The proposed YOLOv5 + Slim-Neck improved performance from 55.8 % mAP (Row 1) to 56.7 % mAP (Row 8), demonstrating the effectiveness of the introduced attention mechanism in enhancing accuracy and focusing on cracks. Furthermore, compared to the other attention mechanisms (Rows 2–7), Slim-Neck not only achieved the highest accuracy but also lightened the network, reducing the parameter count from 7,023,610 (Row 1) to 5,846,490 (Row 8), a reduction of approximately 16.8 %. GFLOPs were likewise reduced from 15.8 (Row 1) to 12.6 (Row 8), while accuracy simultaneously improved. This highlights the effectiveness of Slim-Neck for road surface crack detection within the model and confirms that the lightweighting goal was achieved, validating the earlier choice of Slim-Neck in the design.
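The quoted savings follow from the figures in Table 2; as a brief illustration (not the authors' code):

```python
# Relative parameter and GFLOPs savings from Slim-Neck
# (Table 2, Row 1 vs. Row 8).
base_params, slim_params = 7_023_610, 5_846_490
base_gflops, slim_gflops = 15.8, 12.6

param_cut = 1 - slim_params / base_params    # fraction of parameters removed
gflops_cut = 1 - slim_gflops / base_gflops   # fraction of GFLOPs removed
print(f"parameters: -{param_cut:.1%}, GFLOPs: -{gflops_cut:.1%}")
```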
Subsequently, further ablation experiments were performed on the basis of YOLOv5 + Slim-Neck. C2f was used as a replacement for C3; it preserves the superior receptive field of C3 while making more effective use of the information from the previous layer, further improving the feature extraction capability and raising performance from 56.7 % mAP (Row 8) to 57.3 % mAP (Row 9). Compared with the other candidate modules (Rows 8–12), C2f performed best. Only C3TR matched the performance improvement of C2f while having a smaller parameter count; however, it performed poorly when fused with the other structures (Row 14).
Next, as envisioned, the Decoupled Head from YOLOX was introduced to further optimize multi-scale and high-dimensional feature representation. The experimental results showed an improvement from 57.3 % mAP (Row 9) to 58.4 % mAP (Row 13). By contrast, the IDetect Head (Row 15), although having a smaller parameter count, led to a significant performance drop.
The SPPCSPC structure was then introduced (Row 16). It primarily adds a 1×1 residual branch at the outermost layer and effectively processes regular information together with the SPPF structure during training; this design improves accuracy while maintaining speed. Additionally, the SPPFCSPC [49] structure was introduced for comparison (Row 17). Experimental results show that the two structures have the same number of parameters, but the mAP of SPPFCSPC is slightly higher than that of SPPCSPC while its F1 score is lower.
Finally, one layer of CBS between SPPCSPC and the neck is removed (Row 19), as well as between SPPFCSPC and the neck (Row 18). The experimental results validate the earlier hypothesis that this measure not only simplifies the overall network structure and reduces model complexity but also achieves the best performance of 59.2 % mAP and 58.8 % F1 (Row 19), surpassing the 57.5 % mAP and 58.7 % F1 of Row 16. This further demonstrates that the measure, while improving real-time performance, maintains the model's sensitivity to details such as small cracks, thereby enhancing detection accuracy.
Compared to YOLOv5 (Row 1), although the model parameters and GFLOPs increase, this may be necessary for handling pavement crack detection tasks. This incremental change permits a more complex model, enhancing the model's expressive power to better address challenges such as the ambiguity of tiny cracks and incomplete information extraction from onboard images, which are the
Table 1
Comparisons with other object detection methods on our dataset. The FPS is tested on a single Nvidia GTX 3080 GPU.

Method | mAP@0.5 | mAP@0.5:0.95 | F1 | Time | FPS
Two-Stage Detector:
Faster R-CNN/ResNet50 + FPN [17] | 56.10 % | 25.90 % | \ | 31.8 ms | 31
Faster R-CNN/MobileNetV2 [33] | 45.50 % | 20.00 % | \ | 18.1 ms | 55
One-Stage Detector:
YOLOX [29] | 52.82 % | 27.47 % | 51.28 % | 5.58 ms | 179
u-YOLO (EM + EP) [32] | 55.40 % | 25.10 % | 45.43 % | 27.7 ms | 36
YOLOv5 [23] | 55.76 % | 26.01 % | 56.51 % | 8.7 ms | 114
YOLOv6 [49] | 58.80 % | 29.20 % | \ | 10.77 ms | 92
YOLOv7 [30] | 60.20 % | 28.80 % | 59.55 % | 23.1 ms | 43
YOLOv8 [50] | 56.90 % | 28.60 % | 56.94 % | 9.2 ms | 108
Proposed YOLOv5 | 59.17 % | 28.17 % | 58.76 % | 11.7 ms | 85
Fig. 9. Object Detection Performance Comparison.
issues addressed in this paper.
4.5.1. Ablation study of activation function
For the selection of activation functions, the experiments were conducted as shown in Table 3. In Section 3.5, the Silu function was proposed for capturing fine features in crack images in the context of crack detection tasks. To verify this proposition, five other commonly used activation functions were compared. Based on the results presented in Table 3, the Silu function achieved an mAP of 59.2 % and an F1 score of 58.8 %, significantly higher than those of Sigmoid (52.5 % and 53.8 %), Relu (57.5 % and 57.6 %), LeakyRelu (57.7 % and 57.7 %), Hardswish (58.1 % and 58.0 %), and Mish (58.7 % and 58.1 %). This further confirms the suitability of the Silu function for crack detection tasks.
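For reference, the compared activation functions have the following standard closed forms (textbook definitions, not code from this paper); the smooth, non-monotonic shape of Silu near zero is the property credited here with capturing fine crack features:

```python
import math

# Standard definitions of the activation functions compared in Table 3.
def sigmoid(x): return 1 / (1 + math.exp(-x))
def relu(x): return max(0.0, x)
def leaky_relu(x, slope=0.01): return x if x > 0 else slope * x
def silu(x): return x * sigmoid(x)                           # Silu, a.k.a. Swish
def mish(x): return x * math.tanh(math.log1p(math.exp(x)))   # x * tanh(softplus(x))
def hardswish(x): return x * min(max(x + 3.0, 0.0), 6.0) / 6.0

# Unlike Relu, Silu passes a small (bounded) negative response and is
# differentiable everywhere, instead of zeroing all negative inputs.
print(silu(0.0), silu(-1.0), relu(-1.0))
```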
4.5.2. Ablation study of loss function
For the selection of the loss function, the experiments were conducted as shown in Table 4. The experimental results demonstrated that the CIoU loss function used in this study achieved an mAP of 59.2 %, higher than that of GIoU (58.1 %), DIoU (58.1 %), and EIoU (58.9 %). Additionally, CIoU outperformed the other loss functions in precision (P) and F1 score, with only EIoU achieving the highest recall (R). As mentioned in Section 3.5, the CIoU loss function accurately measures the match between predicted and ground truth boxes, particularly for small objects; this ability explains its superiority over GIoU and DIoU. On the other hand, EIoU incorporates Focal Loss to address the issue of imbalanced difficulty levels among samples. However, since the dataset used in this study is relatively balanced, EIoU slightly underperformed compared to CIoU.
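For completeness, a minimal sketch of the CIoU loss for axis-aligned boxes, following the standard formulation of Zheng et al. [47] (an illustration under those definitions, not the paper's implementation):

```python
import math

# CIoU loss for boxes given as (x1, y1, x2, y2):
# 1 - IoU + normalized center distance + aspect-ratio consistency penalty.
def ciou_loss(b1, b2, eps=1e-9):
    x1, y1, x2, y2 = b1
    X1, Y1, X2, Y2 = b2
    # IoU term.
    iw = max(0.0, min(x2, X2) - max(x1, X1))
    ih = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (X2 - X1) * (Y2 - Y1) - inter
    iou = inter / (union + eps)
    # Normalized center-distance term (the DIoU part).
    cw, ch = max(x2, X2) - min(x1, X1), max(y2, Y2) - min(y1, Y1)
    rho2 = ((x1 + x2 - X1 - X2) ** 2 + (y1 + y2 - Y1 - Y2) ** 2) / 4
    diag2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term (the "C" in CIoU).
    v = (4 / math.pi ** 2) * (math.atan((X2 - X1) / (Y2 - Y1 + eps))
                              - math.atan((x2 - x1) / (y2 - y1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / diag2 + alpha * v

print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes: loss ~ 0
```

The extra distance and aspect-ratio penalties are what keep the gradient informative for small, poorly overlapping boxes, consistent with the small-object advantage noted above.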
4.6. Various crack detection results
Table 5 presents the detection results for the different classes of cracks. The category 'all' represents all types of cracks, where D00 represents longitudinal cracks, D10 represents lateral cracks, D20 represents alligator cracks, D40 represents potholes, and D50 represents blurred white lines. From Table 5, it can be observed that the proposed improved YOLOv5 model consistently outperforms YOLOv5 in the evaluation metrics for the 'all' category. Throughout the 200 epochs, the improved YOLOv5 consistently leads in precision (P), recall (R), mAP, and F1, as illustrated in Fig. 10. Additionally, when examining individual categories of cracks, the improved YOLOv5 shows varying degrees of improvement over YOLOv5, without any performance degradation. Notably, the most significant improvement is observed for D10, with a 5.9 % increase in precision, a 0.9 % increase in recall, a 4.8 % increase in mAP, and a 2.5 % increase in F1. This signifies that our model better captures and utilizes the unique features of lateral cracks, allowing more accurate differentiation between lateral cracks and the other classes and thereby enhancing overall precision. These findings further validate the effectiveness of the Slim-Neck architecture referenced in our study; this structure focuses on the tiny details of small cracks, helping to address the issues present in current detection models.
Compared to u-YOLO (EM + EP), it is evident from Table 5 and Fig. 10 that its precision is significantly lower than that of YOLOv5 and the improved YOLOv5, while its recall is much higher than both. This observation can be attributed to u-YOLO being an ensemble model that generates more prediction boxes with the primary aim of avoiding missed detections; however, this also results in a higher false positive rate. Additionally, the mAP value of u-YOLO is close to that of YOLOv5, while its F1 score is slightly lower. In conclusion, our proposed improved YOLOv5 exhibits the best overall performance.
Fig. 11 illustrates the P-R curves for the two algorithms, providing a
more intuitive comparison between them. It is known that when the P-R
curve is closer to the top-right corner, it indicates that the model can
Table 2
A detailed ablation study of YOLO-RCD (the unchanged parts are the original YOLOv5 parts).

Row | Method | P | R | mAP | F1 | Parameters | GFLOPs
1 | YOLOv5 | 57.1 | 55.9 | 55.8 | 56.5 | 7,023,610 | 15.8
2 | YOLOv5 + SE [34] | 57.7 | 55.4 | 55.8 | 56.5 | 7,066,618 | 15.8
3 | YOLOv5 + CBAM [35] | 56.2 | 55.8 | 55.2 | 56.0 | 7,090,380 | 16.0
4 | YOLOv5 + ECA [36] | 57.0 | 56.5 | 55.5 | 56.7 | 7,023,622 | 15.8
5 | YOLOv5 + GAM [37] | 57.4 | 57.5 | 56.8 | 57.4 | 8,762,874 | 17.2
6 | YOLOv5 + Shuffle [38] | 57.0 | 55.1 | 55.5 | 56.0 | 7,023,802 | 15.8
7 | YOLOv5 + GSConv [28] | 57.5 | 56.6 | 56.6 | 57.0 | 6,582,650 | 15.2
8 | YOLOv5 + Slim-Neck [28] | 59.2 | 55.8 | 56.7 | 57.4 | 5,846,490 | 12.6
9 | YOLOv5 + Slim-Neck + C2f | 56.4 | 58.4 | 57.3 | 57.4 | 7,085,530 | 16.3
10 | YOLOv5 + Slim-Neck + C3TR [39] | 59.0 | 56.6 | 57.3 | 57.8 | 5,847,258 | 12.4
11 | YOLOv5 + Slim-Neck + C3SPP | 57.0 | 56.6 | 56.6 | 56.8 | 5,354,842 | 12.2
12 | YOLOv5 + Slim-Neck + C3Ghost [40] | 57.1 | 56.9 | 56.3 | 57.0 | 5,228,570 | 12.1
13 | YOLOv5 + Slim-Neck + C2f + det [29] | 58.4 | 57.0 | 58.4 | 57.7 | 14,392,794 | 56.7
14 | YOLOv5 + Slim-Neck + C3TR + det | 58.2 | 56.7 | 57.1 | 57.4 | 13,154,522 | 52.8
15 | YOLOv5 + Slim-Neck + C2f + idet [30] | 52.3 | 56.8 | 49.1 | 54.5 | 7,086,516 | 16.3
16 | YOLOv5 + Slim-Neck + C2f + det + SPPCSPC | 61.6 | 56.2 | 57.5 | 58.7 | 15,584,122 | 57.9
17 | YOLOv5 + Slim-Neck + C2f + det + SPPFCSPC | 58.2 | 56.7 | 57.8 | 57.4 | 15,584,122 | 57.9
18 | Row 17 − CBS | 59.0 | 57.1 | 56.8 | 58.0 | 15,570,010 | 57.7
19 | Row 16 − CBS (improved YOLOv5) | 59.9 | 57.7 | 59.2 | 58.8 | 15,570,010 | 57.7
Table 3
Ablation studies of the activation function.

Activation function | P | R | mAP | F1
Sigmoid | 54.6 | 53.1 | 52.5 | 53.8
Relu [41] | 57.5 | 57.7 | 57.5 | 57.6
LeakyRelu [42] | 58.1 | 57.3 | 57.7 | 57.7
Hardswish [43] | 59.2 | 56.9 | 58.1 | 58.0
Mish [44] | 59.4 | 56.8 | 58.7 | 58.1
Silu [45] | 59.9 | 57.7 | 59.2 | 58.8
Table 4
Ablation studies of the loss function.

Loss function | P | R | mAP | F1
GIoU [46] | 57.9 | 57.8 | 58.1 | 57.8
DIoU [47] | 59.5 | 57.0 | 58.1 | 58.2
EIoU [48] | 58.1 | 58.3 | 58.9 | 58.2
CIoU [23] | 59.9 | 57.7 | 59.2 | 58.8
simultaneously maintain high precision and high recall during prediction. Specifically, the P-R curves for D00 and D10 in YOLOv5 exhibit a concave shape, while those for D00 and D10 in the improved YOLOv5 are relatively flat and convex, indicating the greater extent of improvement for D10. The curves of the other classes are all convex, but those of the improved YOLOv5 are more convex than those of YOLOv5. Moreover, the curves of YOLOv5 are more dispersed than those of the improved YOLOv5, implying that our detector achieves more consistent detection performance across the various types of cracks, making it more practical and valuable.
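To make the connection between curve shape and mAP concrete, average precision can be sketched as the interpolated area under a P-R curve (a common 101-point scheme; the values below are toy numbers, not the paper's data):

```python
# Average precision via 101-point interpolated precision, as commonly
# used for mAP@0.5. Inputs are paired (recall, precision) samples.
def average_precision(recalls, precisions):
    ap = 0.0
    for t in [i / 100 for i in range(101)]:
        # Interpolated precision: best precision at any recall >= t.
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        ap += max(candidates, default=0.0)
    return ap / 101

# A curve hugging the top-right corner scores higher than a concave one.
flat = average_precision([0.2, 0.5, 0.9], [0.9, 0.85, 0.8])
concave = average_precision([0.2, 0.5, 0.9], [0.9, 0.4, 0.3])
print(flat > concave)  # True
```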
4.7. Qualitative comparisons
Fig. 12 presents a qualitative comparison between the improved YOLOv5 and other state-of-the-art methods, including YOLOX [29], YOLOv5, and u-YOLO (EM + EP) [32], on our dataset. To ensure a fair comparison, we trained and tested the models with the same parameters and placed the confidence scores above the predicted boxes for better visualization.
YOLOX, as a more advanced algorithm from 2021, has the advantages of a smaller network model and very fast processing speed. From Fig. 12 (b), it can be observed that YOLOX demonstrates certain detection capabilities for road surface cracks, although it has some limitations in detecting small cracks. However, it accurately locates, sizes, and classifies the predicted boxes for other types of defects, resulting in overall good performance with slightly lower precision.
In contrast, u-YOLO (EM + EP) achieves a significant improvement in recall. However, as an ensemble model, it tends to generate more predicted boxes, as shown in Fig. 12 (c), which can affect its precision. YOLOv5 further enhances performance relative to u-YOLO (EM + EP), significantly improving speed while maintaining accuracy. Although Fig. 12 (d) demonstrates overall good performance, YOLOv5 tends to produce more false positives, marking cracks in blank areas, which affects the overall precision of the model.
Among the four algorithms considered, the proposed method based on the improved YOLOv5 performs best. Fig. 12 (e) illustrates that the predicted boxes of the improved YOLOv5 algorithm are closest to the actual position and size distribution of the defects, with the highest confidence scores. Compared to the other common algorithms, our proposed method shows more satisfactory results in defect recognition.
5. Conclusion
This paper is based on the study of road images captured by vehicle-
Table 5
Detection results of each class of crack.

Class | Labels | P | R | mAP | F1
u-YOLO (EM + EP):
all | 6020 | 33.1 | 72.4 | 55.4 | 45.4
D00 | 1140 | 27.5 | 64.6 | 46.6 | 38.6
D10 | 1139 | 25.6 | 61.4 | 37.8 | 36.1
D20 | 1768 | 38.9 | 76.2 | 61.4 | 51.5
D40 | 654 | 30.8 | 75.4 | 61.3 | 43.7
D50 | 1319 | 42.5 | 84.2 | 69.6 | 56.5
YOLOv5:
all | 6020 | 57.1 | 55.9 | 55.8 | 56.5
D00 | 1140 | 51.1 | 49.2 | 47.3 | 50.1
D10 | 1139 | 49.6 | 33.6 | 37.1 | 40.1
D20 | 1768 | 67.6 | 60.8 | 63.1 | 64.0
D40 | 654 | 55.1 | 60.6 | 60.5 | 57.7
D50 | 1319 | 62.0 | 75.4 | 70.8 | 68.0
Improved YOLOv5:
all | 6020 | 59.9 | 57.7 | 59.2 | 58.8
D00 | 1140 | 55.2 | 50.8 | 50.2 | 52.9
D10 | 1139 | 55.5 | 34.5 | 41.9 | 42.6
D20 | 1768 | 68.2 | 63.2 | 67.2 | 65.6
D40 | 654 | 57.4 | 62.1 | 62.2 | 59.7
D50 | 1319 | 63.1 | 78.0 | 74.3 | 69.8
Fig. 10. Comparison of three algorithms.
Fig. 11. P-R Curve of the two algorithms.
Fig. 12. Qualitative comparison results on our dataset.
mounted smartphones, where the widely studied YOLO series faces challenges such as the blurring of tiny cracks and incomplete information extraction. Therefore, a method based on improved YOLOv5 is proposed for road surface crack detection. First, a dataset is reorganized. Second, the specific issues encountered by the original YOLOv5 on this dataset are addressed by introducing the Slim-Neck structure, C2f structure, Decoupled Head, and SPPCSPC structure, along with the adoption of the Silu activation function and CIoU loss function. These improvements address the aforementioned two issues from multiple aspects, enhancing the accuracy and comprehensiveness of target feature extraction while achieving a lightweight design and maintaining superior model inference speed. Experimental results demonstrate that the proposed method exhibits high detection accuracy and real-time performance on the curated dataset. Specifically, the mAP value reaches 59.17 % and the F1 score reaches 58.76 %. In comparison to the original YOLOv5, the proposed method attains a 3.41 % increase in mAP and a 2.25 % increase in the F1 score. Additionally, the algorithm maintains 85 FPS, enabling fast and accurate detection of road surface cracks. Compared to other leading algorithms in this field, the proposed algorithm demonstrates significant advantages in various evaluation metrics. Although this method is optimized for the problem addressed in this paper, some issues remain, and future work can be pursued in the following two aspects: (1) During validation on various images, it was observed that accuracy is lower under low-light conditions. To further enhance the performance of the proposed detector, future research will focus on improving detection accuracy under low-light conditions. We plan to explore and optimize image enhancement techniques for environments with insufficient illumination and to study and integrate advanced low-light image processing algorithms to improve detection accuracy under visually challenging conditions. (2) To further enhance the overall performance of the model, we plan to introduce a new loss function that better aligns with the requirements of the experimental process. Specifically, we intend to incorporate a mechanism for dynamically adjusting weights within the loss function; this mechanism aims to adaptively adjust the weights of the various components of the loss function based on the characteristics of the input images, enabling more effective adaptation to different conditions. More details and code are available at https://github.com/dakehe/improved-YOLOv5-and-vehicle-mounted-images.
CRediT authorship contribution statement
Hongwei Hu: Writing – review & editing, Supervision. Zirui Li: Writing – review & editing, Writing – original draft. Zhiyi He: Supervision, Project administration. Lei Wang: Supervision. Su Cao: Software. Wenhua Du: Supervision.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
The authors do not have permission to share data.
Acknowledgments
This work was supported by Hunan Provincial Key Research and
Development Program (Grant No. 2022GK2058), the Natural Science
Foundation of Hunan Province (Grant Nos. 2023JJ60158,
2023JJ60546, 2023JJ50237, 2022JJ40477), and the Opening Project of
Shanxi Key Laboratory of Advanced Manufacturing Technology (No.
XJZZ202103).
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.measurement.2024.114443.
References
[1] G.H. Luo, J.T. Wang, J.W. Pan, A framework for concrete crack monitoring using surface wave transmission method, Measurement 218 (2023) 113211.
[2] G. Doğan, B. Ergen, A new mobile convolutional neural network-based approach for pixel-wise road surface crack detection, Measurement 195 (2022) 111119.
[3] L. Yang, J. Fan, B. Huo, E. Li, Y. Liu, A nondestructive automatic defect detection method with pixelwise segmentation, Knowl-Based Syst. 242 (2022) 108338.
[4] E.M. Thompson, A. Ranieri, S. Biasotti, M. Chicchon, I. Sipiran, M. Pham, T. Nguyen-Ho, H. Nguyen, M. Tran, SHREC 2022: Pothole and crack detection in the road pavement using images and RGB-D data, Comput. Graph-UK 107 (2022) 161–171.
[5] S. Liu, Y. Han, L. Xu, Recognition of road cracks based on multi-scale Retinex fused with wavelet transform, Array 15 (2022) 100193.
[6] H. Zhang, J. Li, F. Kang, J. Zhang, Monitoring depth and width of cracks in underwater concrete structures using embedded smart aggregates, Measurement 204 (2022) 112078.
[7] H. Bae, Y.K. An, Computer vision-based statistical crack quantification for concrete structures, Measurement 211 (2023) 112632.
[8] Y. Deng, J. Gui, H. Zhang, A. Taliercio, P. Zhang, S.H.F. Wong, A. Khan, L. Li, Y. Tang, X. Chen, Study on crack width and crack resistance of eccentrically tensioned steel-reinforced concrete members prestressed by CFRP tendons, Eng. Struct. 252 (2022) 113651.
[9] L. Song, H. Sun, J. Liu, Z. Yu, C. Cui, Automatic segmentation and quantification of global cracks in concrete structures based on deep learning, Measurement 199 (2022) 111550.
[10] H. Zhang, Y. Chen, B. Liu, X. Guan, X. Le, Soft matching network with application to defect inspection, Knowl-Based Syst. 225 (2021) 107045.
[11] M. Hu, Q. Hu, Design of basketball game image acquisition and processing system based on machine vision and image processor, Microprocess. Microsy. 82 (2021) 103904.
[12] D. Ireri, E. Belal, C. Okinda, N. Makange, C. Ji, A computer vision system for defect discrimination and grading in tomatoes using machine learning and image processing, Artif. Intell. Agric. 2 (2019) 28–37.
[13] Y. Tang, Z. Huang, Z. Chen, M. Chen, H. Zhou, H. Zhang, J. Sun, Novel visual crack width measurement based on backbone double-scale features for improved detection automation, Eng. Struct. 274 (2023) 115158.
[14] T. Yu, A. Zhu, Y. Chen, Efficient crack detection method for tunnel lining surface cracks based on infrared images, J. Comput. Civil. Eng. 31 (3) (2017) 04016067.
[15] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[16] R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[17] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, in: Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[18] K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961–2969.
[19] B. Kim, S. Cho, Image-based concrete crack assessment using mask and region-based convolutional neural network, Struct. Control. Hlth. 26 (8) (2019) e2381.
[20] A. Dahou, A.O. Aseeri, A. Mabrouk, R.A. Ibrahim, M.A. Al-Betar, M.A. Elaziz, Optimal skin cancer detection model using transfer learning and Dynamic-Opposite Hunger Games Search, Diagnostics 13 (2023) 1579.
[21] M.A. Elaziz, A. Dahou, A. Mabrouk, S. El-Sappagh, A.O. Aseeri, An efficient artificial rabbits optimization based on mutation strategy for skin cancer prediction, Comput. Biol. Med. 163 (2023) 107154.
[22] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[23] X. Zhu, S. Lyu, X. Wang, Q. Zhao, TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 2778–2788.
[24] Y. Li, H. Sun, Y. Hu, Y. Han, Electrode defect YOLO detection algorithm based on attention mechanism and multi-scale feature fusion, Control and Decision (2022).
[25] P. Wu, A. Liu, J. Fu, X. Ye, Y. Zhao, Autonomous surface crack identification of concrete structures based on an improved one-stage object detection algorithm, Eng. Struct. 272 (2022) 114962.
[26] J. Zhang, S. Qian, C. Tan, Automated bridge surface crack detection and segmentation using computer vision-based deep learning model, Eng. Appl. Artif. Intel. 115 (2022) 105225.
[27] Z. Xiaoxun, H. Xinyu, G. Xiaoxia, Y. Xing, X. Zixu, W. Yu, L. Huaxin, Research on crack detection method of wind turbine blade based on a deep learning method, Appl. Energ. 328 (2022) 120241.
[28] H. Li, J. Li, H. Wei, Z. Liu, Z. Zhan, Q. Ren, Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles, arXiv:2206.02424. Available: https://arxiv.org/abs/2206.02424.
[29] Z. Ge, S. Liu, F. Wang, Z. Li, J. Sun, YOLOX: Exceeding YOLO series in 2021, arXiv:2107.08430. Available: https://arxiv.org/abs/2107.08430.
[30] C.Y. Wang, A. Bochkovskiy, H.Y.M. Liao, YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7464–7475.
[31] D. Arya, H. Maeda, S.K. Ghosh, D. Toshniwal, Y. Sekimoto, RDD2022: A multi-national image dataset for automatic road damage detection, arXiv:2209.08538. Available: https://arxiv.org/abs/2209.08538.
[32] V. Hegde, D. Trivedi, A. Alfarrarjeh, A. Deepak, S.H. Kim, C. Shahabi, Yet another deep learning approach for road damage detection using ensemble learning, in: 2020 IEEE International Conference on Big Data (Big Data), 2020, pp. 5553–5558.
[33] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.C. Chen, MobileNetV2: Inverted residuals and linear bottlenecks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
[34] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[35] S. Woo, J. Park, J.Y. Lee, I.S. Kweon, CBAM: Convolutional block attention module, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3–19.
[36] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, Q. Hu, ECA-Net: Efficient channel attention for deep convolutional neural networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11534–11542.
[37] D. Guo, Y. Shao, Y. Cui, Z. Wang, L. Zhang, C. Shen, Graph attention tracking, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9543–9552.
[38] Q.L. Zhang, Y.B. Yang, SA-Net: Shuffle attention for deep convolutional neural networks, in: ICASSP 2021 – 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 2235–2239.
[39] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[40] X. Dong, S. Yan, C. Duan, A lightweight vehicles detection network model based on YOLOv5, Eng. Appl. Artif. Intel. 113 (2022) 104914.
[41] B. Xu, N. Wang, T. Chen, M. Li, Empirical evaluation of rectified activations in convolutional network, arXiv:1505.00853. Available: https://arxiv.org/abs/1505.00853.
[42] A.L. Maas, A.Y. Hannun, A.Y. Ng, Rectifier nonlinearities improve neural network acoustic models, in: Proc. ICML, 2013.
[43] A. Howard, M. Sandler, G. Chu, L.C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, Q.V. Le, H. Adam, Searching for MobileNetV3, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1314–1324.
[44] D. Misra, Mish: A self regularized non-monotonic activation function, arXiv:1908.08681. Available: https://arxiv.org/abs/1908.08681.
[45] S. Elfwing, E. Uchibe, K. Doya, Sigmoid-weighted linear units for neural network function approximation in reinforcement learning, Neural Networks 107 (2018) 3–11.
[46] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, S. Savarese, Generalized intersection over union: A metric and a loss for bounding box regression, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 658–666.
[47] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, D. Ren, Distance-IoU loss: Faster and better learning for bounding box regression, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 12993–13000.
[48] Y.F. Zhang, W. Ren, Z. Zhang, Z. Jia, L. Wang, T. Tan, Focal and efficient IOU loss for accurate bounding box regression, Neurocomputing 506 (2022) 146–157.
[49] C. Li, L. Li, H. Jiang, K. Wen, Y. Geng, L. Li, Z. Ke, Q. Li, M. Cheng, W. Nie, YOLOv6: A single-stage object detection framework for industrial applications, arXiv:2209.02976. Available: https://doi.org/10.48550/arXiv.2209.02976.
[50] F.M. Talaat, H. ZainEldin, An improved fire detection approach based on YOLO-v8 for smart cities, Neural Comput. Appl. 35 (28) (2023) 20939–20954.
H. Hu et al.

Preview text:

Measurement 229 (2024) 114443
Contents lists available at ScienceDirect Measurement
journal homepage: www.elsevier.com/locate/measurement
Road surface crack detection method based on improved YOLOv5 and vehicle-mounted images
Hongwei Hu a, Zirui Li a, Zhiyi He a,b,*, Lei Wang c, Su Cao d, Wenhua Du b
a College of Automotive and Mechanical Engineering, Changsha University of Science and Technology, Changsha Hunan 410114, China
b Shanxi Key Laboratory of Advanced Manufacturing Technology, North University of China, Taiyuan 030051, China
c College of Civil Engineering, Changsha University of Science and Technology, Changsha, Hunan 410114, China
d College of Intelligence Science and Technology, National University of Defense Technology, Changsha, Hunan,410000, China A R T I C L E I N F O A B S T R A C T Keywords:
Road surface crack detection methods using vehicle-mounted images have gained substantial attention recently. Road surface crack detection
Notably, YOLO-based techniques have exhibited effectiveness and real-time performance. However, current YOLO series
YOLO-based approaches encounter challenges like blurriness of small cracks and incomplete information Vehicle-mounted images
extraction from vehicle-mounted images. Therefore, this paper proposes a novel detection method based on the Attention mechanism Lightweight design
improved YOLOv5 and vehicle-mounted images. In this method, the Slim-Neck structure enhances crack focus
through a weighted attention mechanism while optimizing network efficiency. The integration of the C2f
structure and Decoupled Head better harnesses upper layer output information. Moreover, the SPPCSPC struc-
ture is bifurcated to augment model efficiency and accuracy. The training process is optimized by using the Silu
activation function and CIoU loss function. This approach is applied to vehicle-mounted images, with its efficacy
and feasibility affirmed through extensive comparative and ablation experiments. Importantly, compared to five
other advanced methods, notable enhancements are observed in various evaluation metrics. 1. Introduction
thus infer the location of cracks, but this entails high equipment costs
[14]. Nowadays, deep learning techniques have made remarkable
Road surface cracks [1–5]are common defects in road structures.
progress in the field of image processing. Due to their excellent perfor-
Their formation is primarily attributed to the aging and degradation of
mance, deep learning algorithms have found widespread application in
road surface materials over time, coupled with the influence of natural road surface crack detection.
climatic factors such as rain and snow [6,7]. These factors contribute to
In general, deep learning-based object detection and recognition
the emergence of various types of cracks on the road surface, which
algorithms can be categorized into two-stage detection methods based
gradually widen and lead to surface damage and structural loosening,
on classification and one-stage detection methods based on regression.
posing significant threats to both road safety and service life [8,9].
For two-stage detection methods, Girshick et al. introduced the R-CNN
Therefore, it is meaningful to conduct research on road surface crack
[15], which served as a pioneering work for deep learning-based object detection.
detection. Building on this, they improved the method for region pro-
Over the past few decades, numerous scholars and experts have
posal extraction and proposed Fast R-CNN [16] and Faster R-CNN [17].
proposed various methods to detect cracks in road surfaces. These
HE introduced an additional output branch for predicting target masks
methods include manual inspection [10], machine vision [11–13], and
during candidate object generation, leading to the development of Mask
infrared imaging [14], among others. However, these methods have
R-CNN [18]. Kim et al. [19] utilized Mask R-CNN and performed
certain limitations. Manual inspections are time-consuming, labor-
morphological operations on the detected crack masks to quantify the
intensive, and inefficient, susceptible to environmental and human
cracks. A et al. [20] utilized Transfer Learning and Dynamic-Opposite
factors [10]. Machine vision techniques enable automated detection, but
Hunger Games Search to optimize feature selection. Additionally, they
require high-quality image data and sophisticated algorithms [12].
employed an algorithm called Improved Artificial Rabbits Optimizer
Infrared imaging can detect temperature variations on road surfaces and
[21] to disregard unimportant features. While these methods have
* Corresponding author at: College of Automotive and Mechanical Engineering, Changsha University of Science and Technology, Changsha Hunan 410114, China.
E-mail address: hezhiyihnu@126.com (Z. He).
https://doi.org/10.1016/j.measurement.2024.114443
Received 10 August 2023; Received in revised form 19 February 2024; Accepted 4 March 2024 Available online 5 March 2024
0263-2241/© 2024 Elsevier Ltd. All rights reserved. H. Hu et Measurement al. 229 (2024) 114443
continuously enhanced and innovated two-stage detection techniques, their hardware requirements constrain detection speed and real-time capabilities, rendering them unsuitable for real-time detection.

To meet real-time detection requirements, regression-based one-stage object detection methods directly obtain the probability and position of objects without region proposal extraction. The YOLO [22] series is a classic representative, with YOLOv5 [23] widely applied in various domains, although it may encounter performance bottlenecks in specific domains. Therefore, many researchers have conducted further studies on road surface crack detection. For instance, Li et al. [24] proposed a YOLO detection algorithm that improves detection accuracy, but the model size is large and fails to meet real-time requirements. To address this issue, Wu et al. [25] introduced an improved YOLOv4 network with pruning techniques and the EvoNorm-S0 structure, which enhances detection accuracy and satisfies real-time requirements. In other crack detection fields, such as bridge deck crack detection, Zhang et al. [26] demonstrated the effectiveness of CR-YOLO for sparsely distributed cracks but with limited performance on complex cracks. For complex and multivariate cracks, Z et al. [27] developed MI-YOLO, which exhibits stronger feature extraction capabilities for light-colored and low-definition images but lower accuracy in identifying small cracks.

Although the aforementioned object detection research has initially improved the recognition accuracy and efficiency of the models, the datasets used in these studies are based on manually processed traditional crack images, which are not entirely suitable for the new challenges posed by road images captured by vehicle-mounted smartphones for crack detection. This primarily involves two issues. Firstly, the vehicle-mounted image contains the surrounding environment of the road, which makes the cracks appear smaller or less prominent in the image, resulting in blurred tiny cracks. Secondly, the C3 structure in the YOLO series, which is commonly studied, performs feature extraction at earlier layers. This may lead to the loss of contextual information of tiny cracks and incomplete extraction of crack information. Under these circumstances, detectors struggle to accurately distinguish the subtle differences between cracks and the surrounding background. These two issues are considered the primary challenges faced by current detection models, particularly when aiming to maintain excellent real-time performance while addressing them.

To address the challenges mentioned above, this paper proposes a novel real-time detection algorithm named Improved YOLOv5. Initially, a comprehensive analysis of the limitations of the current detection model is conducted, and targeted structures are selected to ensure that the choices made have a positive theoretical impact on the existing issues in the model. Subsequently, the Silu activation function and CIoU loss function are introduced to optimize the training process, and all structures are integrated. Finally, during the structural fusion and adjustment process, a layer of CBS between SPPCSPC [30] and the neck is removed to further optimize the overall structure. Through these combinations and adjustments, the challenges in crack detection in vehicle-mounted images are effectively addressed. Validated on a reorganized dataset, the algorithm exhibits outstanding performance in road surface crack detection in vehicle-mounted images, as substantiated by a wealth of experimental results.

The following summarizes the significant contributions of this study:

1. This paper introduces a novel real-time detection method in the field of road surface crack detection in vehicle-mounted images.
2. A new algorithm, named Improved YOLOv5, is proposed, effectively addressing the challenges in crack detection in vehicle-mounted images through clever combinations and adjustments.
3. Reorganized datasets are used to validate and compare the proposed method against well-known methods.

The rest of this paper is organized as follows: In Section 2, a review of the related work is provided, along with a detailed analysis of the existing methods' limitations. Section 3 presents a detailed description of the proposed method. In Section 4, comparative experiments and ablation experiments are conducted to validate and analyze the proposed approach. Finally, Section 5 summarizes the research findings of this paper and suggests potential directions for future improvements.

2. Related work

In this section, the YOLOv5 algorithm, which has been proposed in recent years, is first introduced. Subsequently, the existing issues of YOLOv5 in road surface crack detection are discussed in depth, and approaches to address these issues are presented.

2.1. YOLOv5

The YOLOv5 object detector consists of three main components: Backbone, Neck, and Head. The Backbone is responsible for extracting feature information from the input image. The Neck plays a crucial role in further processing and fusing the features extracted by the Backbone to enhance the accuracy and effectiveness of object detection. Finally, after being processed by the Head, YOLOv5 outputs the category and location information of each detected object.

YOLOv5 predicts for each grid on the feature map and compares the predicted information with the ground truth to guide the model towards convergence. The loss function measures the discrepancy between the predicted information and the ground truth; a smaller loss value indicates that the prediction is closer to the ground truth, which greatly determines the performance of the model. The loss function in YOLOv5 consists of three main components: the classification loss, objectness loss, and localization loss. Its formulation is as follows:

Loss = λ1·Lcls + λ2·Lobj + λ3·Lloc    (1)

where λ1, λ2, and λ3 are the corresponding loss weights, with default values of 0.5, 1.0, and 0.05, respectively.

For the classification loss and objectness loss, YOLOv5 utilizes the binary cross-entropy function by default, defined as follows:

L = −y·log(p) − (1 − y)·log(1 − p) = −log(p) if y = 1, −log(1 − p) if y = 0    (2)

where y is the label of the input sample (1 for a positive sample, 0 for a negative sample), and p is the probability predicted by the model that the input sample is positive.

As for the localization loss, YOLOv5 adopts the CIoU (Complete Intersection over Union) loss, expressed by the following formula:

LCIoU = 1 − IoU + ρ²(b, bgt)/c² + αv    (3)

where IoU (Intersection over Union) measures the overlap between the predicted bounding box and the ground truth bounding box. Assuming the predicted bounding box is represented as A and the ground truth bounding box as B, the expression for IoU is:

IoU = |A ∩ B| / |A ∪ B|    (4)

Here b and bgt represent the center points of the predicted and ground truth bounding boxes, respectively, ρ denotes the Euclidean distance between the two center points, and c represents the diagonal length of the minimum enclosing region of the two boxes ("gt" is the abbreviation for ground truth).

αv is a penalty term designed to consider the consistency of aspect ratios between the two bounding boxes, aiming to focus more on the shape
of the target. Let's delve into v. Here, v is a metric used to quantify the consistency of aspect ratios, and its expression is as follows:

v = (4/π²)·(arctan(wgt/hgt) − arctan(w/h))²    (5)

In this equation, wgt and hgt represent the width and height of the ground truth bounding box, while w and h represent the width and height of the currently predicted bounding box. The difference in aspect ratio between the two boxes is computed with the arctangent function, arctan, which converts each aspect ratio into an angle; this allows a more accurate representation of the difference between aspect ratios. Since the range of the arctangent function here is between 0 and π/2, a scaling factor is needed so that the angle difference is compared within the appropriate range: the term 4/π² in the formula acts as this scaling factor, ensuring that the value of v stays within a reasonable range. By comparing this ratio, we gain insights into the consistency of the target's shape.

Another crucial parameter, α, is a positive weight designed to balance the two factors of IoU and aspect ratio. Its mathematical expression is

α = v / ((1 − IoU) + v)    (6)

α nonlinearly combines v with 1 − IoU to adjust the weight. When the IoU is low, α increases, directing the model's attention more towards the consistency of aspect ratios. Conversely, when the IoU is high, α decreases, emphasizing the accuracy of position and size. In regression, α is given higher priority, especially when the two bounding boxes do not overlap, implying a greater emphasis on ensuring the precision of the bounding box's position and size. Together, these formulas form the calculation of CIoU, comprehensively measuring the similarity between two bounding boxes in terms of position, size, and shape. This integrated metric contributes positively to the performance and robustness of object detection models.

2.2. Problems of YOLOv5 in road surface crack detection on vehicle-mounted images

Currently, YOLOv5 can be utilized in various domains. However, due to the combined influence of factors such as dataset characteristics, target attributes, class balance, and model optimization, YOLOv5 exhibits significant performance differences across domains. Through existing research and in-depth analysis of the YOLOv5 structure, the following issues have been identified in its application to road surface crack detection in vehicle-mounted images:

1. The dataset used in this paper consists of road images captured by a vehicle-mounted smartphone. In these images, cracks are often small in size or not visually prominent, making it challenging for the network to directly capture their subtle features. This can result in the loss of information related to small cracks. Moreover, YOLOv5 may exhibit different performance in detecting small and large objects; when the target is small, YOLOv5 may struggle to detect it accurately.
2. The C3 structure in the YOLO series performs feature extraction at earlier layers, which may result in the loss of contextual information for small cracks. However, adding too many modules on top of YOLOv5 to enhance feature extraction would significantly increase the number of parameters, thereby sacrificing the real-time advantage of the original YOLOv5.

For the first issue, the addition of an attention mechanism module was attempted to assist the model in better focusing on crack information. To mitigate the parameter count, the Slim-Neck structure was considered. Regarding the second issue, in order to better utilize the information from the previous layer and address the problem of incomplete feature extraction in the C3 structure, a replacement with the C2f structure [50] was carried out. Furthermore, the Decoupled Head from YOLOX was introduced to further optimize the representation of multi-scale and high-dimensional features. To further enhance the speed and accuracy of the model, the SPPCSPC structure was also introduced. Finally, by integrating the aforementioned methods and adjusting the parameter size, the method referred to as improved YOLOv5 was proposed. The details of this method are elaborated in the following section.

3. The proposed road surface crack detection method

In this section, the specific steps of the proposed road surface crack detection method are initially presented, followed by a detailed exposition of the improved YOLOv5 utilized in this method.

3.1. The specific steps of the proposed method

The road surface crack detection scheme, which is based on the improved YOLOv5 and proposed in this paper, is outlined in the following steps. For a detailed visual representation of the process, readers are referred to Supplementary Fig. 1 in the supplementary materials.

Step 1: Road images are captured using a vehicle-mounted smartphone to establish the dataset.
Step 2: The dataset is utilized to train the improved YOLOv5 model, resulting in weight files.
Step 3: The trained improved YOLOv5 weight file is deployed on the vehicle-mounted device to perform real-time crack detection on the road surface.
Step 4: The detection results are generated as output.

A simplified representation of the scheme is provided in the main text (see Fig. 1); a comprehensive flow chart is given in the supplementary material.

Fig. 1. Road surface crack detection method based on improved YOLOv5.
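The training in Step 2 minimizes, among other terms, the CIoU localization loss of Eqs. (3)–(6) from Section 2.1. A minimal, framework-free sketch of that computation, assuming boxes in (x1, y1, x2, y2) corner format with positive width and height (an illustration choice, not the paper's implementation):

```python
import math

def iou(a, b):
    """IoU of axis-aligned boxes given as (x1, y1, x2, y2) -- Eq. (4)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def ciou_loss(pred, gt):
    """CIoU loss of Eqs. (3)-(6): overlap + center distance + aspect-ratio term."""
    i = iou(pred, gt)
    # squared distance between the two box centers (rho^2 in Eq. (3))
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_g, cy_g = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2
    # squared diagonal of the minimum enclosing box (c^2 in Eq. (3))
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency v (Eq. (5)) and its weight alpha (Eq. (6))
    w_p, h_p = pred[2] - pred[0], pred[3] - pred[1]
    w_g, h_g = gt[2] - gt[0], gt[3] - gt[1]
    v = 4 / math.pi ** 2 * (math.atan(w_g / h_g) - math.atan(w_p / h_p)) ** 2
    alpha = v / ((1 - i) + v + 1e-9)  # small epsilon guards against 0/0
    return 1 - i + rho2 / c2 + alpha * v
```

For identical boxes the loss is 0; for non-overlapping boxes it exceeds 1, since the distance and shape penalties add to 1 − IoU.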
3.2. The proposed improved YOLOv5

In terms of the algorithm, the improved YOLOv5 proposed in this paper differs from the original YOLOv5 by incorporating six key modules: the Slim-Neck structure, C2f structure, Decoupled Head, SPPCSPC structure, Silu activation function, and CIoU loss function. Each module plays a distinct role in the model, and they are elaborated on in detail below. The overall architecture of the improved YOLOv5 is illustrated in Fig. 2.
3.2.1. Slim-Neck structure

In this study, the Slim-Neck structure [28] was introduced, with GSconv replacing Conv and VoVGSCSP replacing C3. Compared to the original structure, the Slim-Neck structure incorporates a lightweight design, reducing computational complexity and model parameters, thereby enhancing efficiency and reducing computation time and hardware resource requirements. To meet the specific requirements of road surface crack detection, this study integrates attention mechanisms with convolution. Convolutional Neural Networks (CNNs) excel at capturing local features of images, while attention mechanisms focus on specific parts of the target, improving the accuracy of target detection. Considering the use of vehicle-mounted images, where cracks often appear against complex backgrounds, combining convolution and attention mechanisms can better handle complex backgrounds, alleviating excessive attention to them. Moreover, attention mechanisms can be employed for multi-scale feature fusion. In road surface crack detection, the size and shape of cracks may vary significantly; by combining multi-scale features extracted by convolution with attention mechanisms, the model can better capture target information at different scales, thereby enhancing its adaptability to scale variations. In summary, the introduction of the Slim-Neck structure can improve the efficiency, accuracy, and adaptability of the model, making it an effective model improvement method. The Slim-Neck structure is illustrated in Fig. 3.

Fig. 3. Slim-Neck structure.

3.2.2. C2f structure

The C2f structure in this paper is primarily used to replace the C3 structure in the Backbone of YOLOv5. This structure adopts the shuffling idea of CSPNet and the residual structure concept to obtain richer gradient flow information and better utilization of the information from upstream layers. The number of stacked C2f structures is controlled by the parameter "n", which can vary for models of different scales. The specific structure of the C2f module is illustrated in Fig. 4, where CBS represents the combination of Convolution, Batch Normalization, and the Silu activation function. By replacing the C3 structure, the C2f structure further enhances the network's representation capability and detection performance.

Fig. 4. C2f structure.

3.2.3. Decoupled Head

In this study, the Decoupled Head [29] is introduced, which effectively addresses issues such as class imbalance and object size variation compared to traditional detection head structures. The incorporation of the Decoupled Head leads to a substantial enhancement in the model's detection performance, improving both accuracy and speed. Additionally, the Decoupled Head enables flexible model design, such as increasing the number of output classes or adjusting the detector's receptive field size. The Decoupled Head structure is illustrated in Fig. 5.

3.2.4. SPPCSPC structure

In order to improve the speed and accuracy of the model, the original SPPF structure is replaced by the SPPCSPC structure [30] in this study. Similar to SPPF, SPPCSPC also includes pooling layers of sizes 1x1, 5x5, 9x9, and 13x13, but it introduces an additional 1x1 residual branch. The structure is divided into two parts: one part is processed with traditional convolution, and the other part is processed with the SPP structure; finally, the two parts are merged. This design halves the computational complexity, thereby improving speed, while also enhancing accuracy. Additionally, the SPPCSPC structure can be combined with other structures, especially the previously proposed C2f structure and Slim-Neck structure. When situated between these two structures, it can better facilitate multi-scale feature fusion and further enhance the network performance. The SPPCSPC structure is illustrated in Fig. 6.

Fig. 2. Improved YOLOv5 architecture.
Fig. 5. Decoupled Head.
Fig. 6. SPPCSPC structure.
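The parallel-branch idea behind SPPCSPC can be illustrated with a deliberately simplified, framework-free 1-D sketch (this is a toy model of the data flow, not the paper's PyTorch implementation, and it omits the convolutions): one branch applies stride-1 max-pools of several kernel sizes with "same" padding and keeps its input, the other is a plain residual branch, and the branches are merged channel-wise.

```python
def maxpool1d_same(x, k):
    """Stride-1 max pooling with 'same' padding over a 1-D feature list."""
    r = k // 2
    n = len(x)
    return [max(x[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]

def sppcspc_like(x, kernels=(5, 9, 13)):
    """Toy SPPCSPC-style block: an SPP branch (input plus multi-kernel pooled
    copies) alongside a residual branch, merged by channel-wise concatenation."""
    spp_branch = [x] + [maxpool1d_same(x, k) for k in kernels]  # SPP side
    residual_branch = [x]                                       # residual side
    return spp_branch + residual_branch
```

Each pooled copy widens the receptive field without changing spatial size, while the residual branch passes the original features through untouched; the merge yields 1 + len(kernels) + 1 feature channels.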
3.2.5. Silu activation function and CIoU loss function

The choice of the Silu activation function and CIoU loss function in this study is motivated by an analysis of the dataset images for the crack detection task: it is crucial to capture the shape, texture, and edges of crack images effectively. The smoothness and approximate linearity of the Silu function allow it to preserve the linear relationships of input features and exhibit a larger dynamic range. This property may aid in capturing subtle features in crack images and improve the accuracy of object detection. Regarding the loss function, different loss functions have distinct advantages for different tasks. In the case of crack detection, the CIoU loss function provides a more accurate measurement of the match between predicted boxes and ground truth boxes, particularly for small targets, leading to improved detection performance on them. Therefore, the selection of the Silu activation function and CIoU loss function further optimizes the performance of the network.

3.2.6. Module integration optimization

To better integrate the modules and enhance their effectiveness, we decided to remove a layer of CBS between the SPPCSPC structure and the upsampling. This decision brings several advantages. Firstly, by adjusting the number of channels, it makes the output more adaptable to the requirements of subsequent structures. Secondly, simplifying the overall network structure helps reduce model complexity, alleviate the computational burden, and thus improve inference speed. Most importantly, the Slim-Neck structure, designed to capture information about small cracks, is located within the neck, and removing the CBS layer before the neck reduces information loss, ensuring that critical features can be transmitted to the bottleneck stage of the network without interference. This measure significantly contributes to maintaining the model's sensitivity to details such as small cracks, thereby enhancing the accuracy of the detection task. The specific architecture of the improved YOLOv5 network is illustrated in Fig. 7.

Fig. 7. Improved YOLOv5 network.

4. Experiments

4.1. Datasets

In the field of road surface crack detection, datasets play a crucial role. The widely used dataset for complex environmental conditions in this domain is provided by the IEEE Big Data Global Road Damage Detection Challenge [31]. In this study, 13,508 images were selected from the dataset as the refined dataset, and the resolution of the images was set to 600x600 pixels. We focused on five categories: longitudinal cracks, lateral cracks, alligator cracks, potholes, and blurred white lines. Due to the disorderly nature of the annotation information in the original dataset, a Python script was employed to transform it into a standardized txt format. Subsequently, among the 13,508 txt files, we sifted through and retained information pertaining to the five categories, individually labeled as D00, D10, D20, D40, and D50. Following this step, data from other categories and instances of mislabeling were expunged. For images with missing information, we employed the image annotation tool labelimg to conduct additional annotations. The refined dataset was then divided into training, validation, and testing sets, with ratios of 0.56, 0.22, and 0.22, respectively. During the model training process, a random shuffling technique was employed: at each epoch, images from the training and validation sets were randomly mixed and re-divided into new training and validation sets in the same proportion. This data preparation method ensures that the model is provided with high-quality and diverse data, thus improving its accuracy and robustness. The refined dataset is illustrated in Fig. 8.

Fig. 8. Some examples from the dataset.

4.2. Platform construction and model training

The software and hardware environment configured for model training and testing in this paper is as follows: Windows 10 operating system, the Pytorch deep learning framework, CUDA v11.1, Python v3.8.10, torch v1.9.0, an NVIDIA GeForce RTX 3080 GPU (10 GB), and an Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50 GHz.

Given the low resolution of the dataset used in this experiment (600*600 pixels), we selected a batch size of 32 and trained the model for 200 epochs. As for the optimizer, we chose SGD and set the other key parameters as follows: lr0 = 0.01, lrf = 0.01, momentum = 0.843, weight_decay = 0.00036.

The model adopts a strategy of random weight initialization instead of utilizing a pre-trained model for the initial setup. By deliberately opting for training from scratch, the model is designed to directly learn task-specific features from the provided data, without relying on generic features learned from other datasets. Throughout the experimental process, thorough training and fine-tuning have been conducted to enhance the model's adaptation to the specific task of road surface crack detection. This approach allows for better control over the model's structure and performance, enabling more effective handling of challenges such as the blurriness of small cracks and incomplete information extraction from vehicle-mounted images.

4.3. Evaluation metric

The evaluation metrics used in this paper for object detection include precision (P), recall (R), mean average precision (mAP), and F1 score as the primary evaluation indicators for the model. Precision (P) and recall (R) respectively represent the proportion of correctly predicted samples among all detected objects and the proportion of correctly predicted samples among all ground truth objects. The mAP is the average area under the precision-recall curve across all classes, and the F1 score is the harmonic mean of precision and recall. Additionally, frames per second (FPS) and giga floating-point operations (GFLOPs) are also crucial metrics for evaluating models: FPS represents the number of frames the model can process per second, while GFLOPs measure the model's computational cost in billions of floating-point operations.

In road damage detection, both precision and recall are equally important, as misclassifications and false negatives alike can affect road maintenance and safety. Therefore, this study focuses on the overall performance of the model and considers the mean average precision (mAP) and F1 score, which combine precision and recall, as the most important evaluation metrics.

4.4. Comparisons with other state-of-the-art methods

To validate the superiority of the improved YOLOv5 proposed in this paper, a comparison is conducted with several commonly used classical algorithms in the field of road surface crack detection, including Faster R-CNN [17], YOLOX [29], YOLOv5, u-YOLO (EM + EP) [32], YOLOv6 [49], YOLOv7 [30], and YOLOv8 [50]. Notably, u-YOLO (EM + EP) is the method employed by the champion of the 2020 IEEE Big Data Global Road Damage Detection Challenge. The evaluation metrics employed for each algorithm include mAP, F1, and FPS. The models are trained and tested on the same dataset with identical hyperparameters. The detection results are presented in Table 1 and Fig. 9.
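The Section 4.3 definitions of precision, recall, and F1 reduce to a few lines of arithmetic over detection counts. A minimal sketch (the counts in the usage line are purely illustrative, not taken from the paper's results):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from raw detection counts (Section 4.3 definitions)."""
    p = tp / (tp + fp) if tp + fp else 0.0      # correct detections / all detections
    r = tp / (tp + fn) if tp + fn else 0.0      # correct detections / all ground-truth objects
    f1 = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of P and R
    return p, r, f1

# Hypothetical detector: 80 true positives, 20 false positives, 20 missed objects
p, r, f1 = precision_recall_f1(80, 20, 20)
```

Because F1 is a harmonic mean, it penalizes an imbalance between precision and recall, which is why the paper treats it (together with mAP) as a primary indicator.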
According to the analysis of the detection results in Table 1, the improved YOLOv5 achieved an mAP@0.5 of 59.17 %, surpassing Faster R-CNN/ResNet50 + FPN (56.10 %), Faster R-CNN/MobileNetV2 (45.50 %), YOLOX (52.82 %), u-YOLO (EM + EP) (55.40 %), YOLOv5 (55.76 %), YOLOv6 (58.80 %), and YOLOv8 (56.90 %), and approaching the performance level of YOLOv7 (60.20 %). It demonstrated a similar trend in F1, scoring higher than the other detectors and approaching the level of YOLOv7. Additionally, the improved YOLOv5 performed well in mAP@0.5:0.95. In terms of inference speed, the improved YOLOv5 reached 85 FPS; although lower than YOLOX (179), YOLOv5 (114), and YOLOv8 (108) among the one-stage detectors, it exceeded all other detectors. Considering that 85 FPS is sufficient for real-time detection in most scenarios, sacrificing some FPS to achieve higher accuracy is worthwhile and does not significantly affect overall performance. As for YOLOv7, although its mAP and F1 scores are slightly higher than those of the improved YOLOv5, its FPS is only 43, about half that of the model proposed in this paper, which is a poor trade-off and does not meet real-time requirements.
Fig. 9. Object Detection Performance Comparison.
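The time and FPS columns of Table 1 are mutually consistent: FPS is the reciprocal of the per-image latency, and truncating 1000/time (ms) reproduces every reported FPS value. A quick arithmetic check (the pairs below are copied from Table 1; this is a reader's sanity check, not part of the paper's code):

```python
# (per-image time in ms, reported FPS) pairs from Table 1
table1 = [(31.8, 31), (18.1, 55), (5.58, 179), (27.7, 36), (8.7, 114),
          (10.77, 92), (23.1, 43), (9.2, 108), (11.7, 85)]

for ms, fps in table1:
    # FPS = floor(1000 / latency_ms): e.g. 1000 / 11.7 ms ~ 85.5 -> 85 FPS
    assert int(1000 / ms) == fps
```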
The proposed improved YOLOv5 algorithm in this paper is an improvement upon YOLOv5. Compared to the original YOLOv5 algorithm, it achieved a 3.41 % improvement in mAP and a 2.25 % improvement in F1, while maintaining an FPS of 85. When compared to other leading algorithms in this field, the improved YOLOv5 proposed in this paper offers the best overall cost-effectiveness.

Table 1
Comparisons with other object detection methods on our dataset. The FPS is tested on a single Nvidia GTX 3080 GPU.

Method | mAP@0.5 | mAP@0.5:0.95 | F1 | Time | FPS
Two-Stage Detector:
Faster R-CNN/ResNet50 + FPN [17] | 56.10 % | 25.90 % | \ | 31.8 ms | 31
Faster R-CNN/MobileNetV2 [33] | 45.50 % | 20.00 % | \ | 18.1 ms | 55
One-Stage Detector:
YOLOX [29] | 52.82 % | 27.47 % | 51.28 % | 5.58 ms | 179
u-YOLO (EM + EP) [32] | 55.40 % | 25.1 % | 45.43 % | 27.7 ms | 36
YOLOv5 [23] | 55.76 % | 26.01 % | 56.51 % | 8.7 ms | 114
YOLOv6 [49] | 58.80 % | 29.20 % | \ | 10.77 ms | 92
YOLOv7 [30] | 60.20 % | 28.80 % | 59.55 % | 23.1 ms | 43
YOLOv8 [50] | 56.90 % | 28.60 % | 56.94 % | 9.2 ms | 108
Proposed YOLOv5 | 59.17 % | 28.17 % | 58.76 % | 11.7 ms | 85

4.5. Ablation study

Extensive ablation studies were also conducted on our reorganized dataset to validate each module of the proposed improved YOLOv5. Table 2 presents the detailed roadmap from YOLOv5 to improved YOLOv5.

The proposed YOLOv5 + Slim-Neck improved the performance from 55.8 % mAP (Row 1) to 56.7 % mAP (Row 8), demonstrating the effectiveness of the introduced attention mechanism in enhancing accuracy and focusing on cracks. Furthermore, compared to other attention mechanisms (Rows 2–7), Slim-Neck not only achieved the highest accuracy but also lightweighted the network, reducing the parameter count from 7,023,610 (Row 1) to 5,846,490 (Row 8), a reduction of approximately 16.8 %. In terms of GFLOPs, the cost was successfully reduced from 15.8 (Row 1) to 12.6 (Row 8), while accuracy simultaneously improved. This highlights the effectiveness of Slim-Neck for road surface crack detection within the model, successfully achieving the goal of lightweighting, and validates the earlier choice of Slim-Neck in the design.

Subsequently, further ablation experiments based on YOLOv5 + Slim-Neck were performed. C2f was used as a replacement for C3; it maintains the superior receptive field of C3 while making more effective use of the information from the previous layer, thereby further improving the feature extraction capability and raising the performance from 56.7 % mAP (Row 8) to 57.3 % mAP (Row 9). When compared with other modules (Rows 8–12), C2f outperformed the other methods. Only C3TR achieved a performance improvement consistent with C2f, with a smaller parameter count; however, it showed poor performance when fused with other structures (Row 14).

Next, as envisioned, the Decoupled Head from YOLOX was introduced to further optimize multi-scale and high-dimensional feature representation. The experimental results showed an improvement from 57.3 % mAP (Row 9) to 58.4 % mAP (Row 13). On the other hand, the introduced IDetect Head (Row 15), although having a smaller parameter count, led to a significant performance drop.

Following this, the SPPCSPC structure was introduced (Row 16), which primarily adds a 1x1 residual branch at the outermost layer and effectively processes both regular information and the SPPF structure during training; this design not only improves accuracy but also maintains speed. Additionally, the SPPFCSPC [49] structure was introduced for comparison (Row 17). Experimental results show that both structures have the same number of parameters, but the mAP of SPPFCSPC is slightly higher than that of SPPCSPC, while its F1 score is lower.

Finally, one layer of CBS was removed between SPPCSPC and the neck (Row 19), as well as between SPPFCSPC and the neck (Row 18). Experimental results validate the earlier hypothesis: this measure not only simplifies the overall network structure and reduces model complexity, but also achieves the best performance of 59.2 % mAP and 58.8 % F1 (Row 19), surpassing 57.5 % mAP and 58.7 % F1 (Row 16). This further demonstrates that the measure, while improving real-time performance, maintains the model's sensitivity to details such as small cracks, thereby enhancing the accuracy of detection tasks. Compared to YOLOv5 (Row 1), there is an increase in model parameters and GFLOPs, but this may be necessary for handling pavement crack detection: the added capacity enhances the model's expressive power to better address challenges such as the ambiguity of tiny cracks and incomplete information extraction from onboard images, which are the
the Silu function achieved an mAP of 59.2 % and an F1 score of 58.8 %,
A detailed ablation study of YOLO-RCD (The unchanged parts are the original
which were significantly higher than those of Sigmoid (52.5 % and 53.8 YOLOv5 parts).
%), Relu (57.5 % and 57.6 %), LeakyRelu (57.7 % and 57.7 %), Row method P R mAP F1 Parameters GFLOPs
Hardwish (58.1 % and 58.0 %), and Mish (58.7 % and 58.1 %). This
further confirms the suitability of the Silu function for crack detection 1 YOLOv5 57.1 55.9 55.8 56.5 7,023,610 15.8 2 YOLOv5 + SE 57.7 55.4 55.8 56.5 7,066,618 15.8 tasks. [34] 3 YOLOv5 + 56.2 55.8 55.2 56.0 7,090,380 16.0
4.5.2. Ablation study of loss function

For the selection of the loss function, the experiments were conducted as shown in Table 4. The experimental results demonstrated that the CIoU loss function used in this study achieved an mAP of 59.2 %, which was higher than that of GIoU (58.1 %), DIoU (58.1 %), and EIoU (58.9 %). Additionally, CIoU outperformed the other loss functions in precision (P) and F1 score, with only EIoU achieving the highest recall (R). As mentioned in Section 3.5, the CIoU loss function accurately measures the match between predicted and ground-truth boxes, particularly for small objects; its additional aspect-ratio consistency term explains its superiority over GIoU and DIoU. EIoU, on the other hand, incorporates Focal Loss to address imbalanced difficulty levels among samples; since the dataset used in this study is relatively balanced, EIoU slightly underperformed CIoU.

Table 4
Ablation studies of the loss function.

Loss function  P     R     mAP   F1
GIoU [46]      57.9  57.8  58.1  57.8
DIoU [47]      59.5  57.0  58.1  58.2
EIoU [48]      58.1  58.3  58.9  58.2
CIoU [23]      59.9  57.7  59.2  58.8
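The geometric cores of the box-regression losses in Table 4 can be sketched for a single pair of axis-aligned boxes (x1, y1, x2, y2). This is a simplified illustration of the GIoU/DIoU/CIoU family [46,47], not the authors' training implementation:

```python
import math

def iou(a, b):
    # a, b: (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union

def giou_loss(a, b):
    # IoU penalised by the empty fraction of the smallest enclosing box
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    c_area = cw * ch
    return 1.0 - (inter / union - (c_area - union) / c_area)

def diou_loss(a, b):
    # IoU penalised by the normalised centre distance
    cx = ((a[0]+a[2]) - (b[0]+b[2])) / 2.0
    cy = ((a[1]+a[3]) - (b[1]+b[3])) / 2.0
    cw = max(a[2], b[2]) - min(a[0], b[0])  # enclosing-box width
    ch = max(a[3], b[3]) - min(a[1], b[1])  # enclosing-box height
    return 1.0 - iou(a, b) + (cx*cx + cy*cy) / (cw*cw + ch*ch)

def ciou_loss(a, b):
    # DIoU plus an aspect-ratio consistency term v, weighted by alpha
    wa, ha = a[2]-a[0], a[3]-a[1]
    wb, hb = b[2]-b[0], b[3]-b[1]
    v = (4.0 / math.pi**2) * (math.atan(wb/hb) - math.atan(wa/ha))**2
    i = iou(a, b)
    alpha = v / (1.0 - i + v + 1e-9)
    return diou_loss(a, b) + alpha * v
```

The extra term in `ciou_loss` is what penalises a prediction whose shape disagrees with the ground truth even when its centre is close, which matters for thin, elongated cracks.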
4.6. Various crack detection results

Table 5 presents the detection results for different classes of cracks. The category "all" represents all types of cracks, where D00 represents longitudinal cracks, D10 lateral cracks, D20 alligator cracks, D40 potholes, and D50 blurred white lines. From Table 5, it can be observed that the proposed improved YOLOv5 model consistently outperforms YOLOv5 on the evaluation metrics for the "all" category. Throughout the 200 epochs, improved YOLOv5 consistently leads in precision (P), recall (R), mAP, and F1, as illustrated in Fig. 10. When examining individual crack categories, improved YOLOv5 shows varying degrees of improvement over YOLOv5, without any performance degradation. Notably, the most significant improvement is observed for D10, with a 5.9 % increase in precision, a 0.9 % increase in recall, a 4.8 % increase in mAP, and a 2.5 % increase in F1. This signifies that our model better captures and utilizes the distinctive features of lateral cracks, allowing more accurate differentiation between lateral cracks and the other classes and thereby enhancing overall precision. These findings further validate the effectiveness of the Slim-Neck architecture referenced in our study: it focuses on the tiny details of small cracks, helping to address the issues present in current detection models.

Compared to u-YOLO (EM + EP), it is evident from Table 5 and Fig. 10 that its precision is significantly lower than that of YOLOv5 and improved YOLOv5, while its recall is much higher than both. This observation can be attributed to u-YOLO being an ensemble model that generates more prediction boxes with the primary aim of avoiding missed detections; however, this also results in a higher false positive rate. Additionally, the mAP value of u-YOLO is close to that of YOLOv5, while its F1 score is slightly lower. In conclusion, our proposed improved YOLOv5 exhibits the best overall performance.

Fig. 11 illustrates the P-R curves of the two algorithms, providing a more intuitive comparison between them. A P-R curve closer to the top-right corner indicates that the model can simultaneously maintain high precision and high recall during prediction. Specifically, the P-R curves for D00 and D10 in YOLOv5 exhibit a concave shape, while those for D00 and D10 in improved YOLOv5 are relatively flat and convex, indicating the greater extent of improvement for D10. The curves of the other classes are all convex, but those of improved YOLOv5 are more convex than those of YOLOv5. Moreover, the curves of YOLOv5 are relatively more dispersed than those of improved YOLOv5, implying that our detector achieves more consistent detection performance across the various types of cracks, making it more practical and valuable.
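The per-class metrics in Table 5 and the P-R curves in Fig. 11 reduce to a few formulas. Below is a minimal sketch; the TP/FP/FN counts are back-calculated from the reported "all" row of improved YOLOv5 purely for illustration and are not taken from the paper:

```python
def precision_recall_f1(tp, fp, fn):
    # per-class counts of true positives, false positives, false negatives
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def average_precision(points):
    # points: (recall, precision) pairs sorted by increasing recall;
    # AP is the area under the P-R curve (trapezoidal approximation here),
    # and mAP averages AP over the crack classes
    ap = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        ap += (r1 - r0) * (p0 + p1) / 2.0
    return ap

# Illustrative counts consistent with P = 59.9 %, R = 57.7 % over 6020 labels
p, r, f1 = precision_recall_f1(tp=3474, fp=2326, fn=2546)
print(round(100 * f1, 1))  # -> 58.8, matching the reported F1
```

The curve-shape argument in the text follows directly from `average_precision`: a curve that stays flat and convex toward the top-right corner encloses more area (higher AP) than a concave one with the same endpoints.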
Fig. 10. Comparison of three algorithms.

Fig. 11. P-R curve of the two algorithms.

Table 5
Detection results of each class of crack.

Class  Labels  P     R     mAP   F1
u-YOLO (EM + EP):
all    6020    33.1  72.4  55.4  45.4
D00    1140    27.5  64.6  46.6  38.6
D10    1139    25.6  61.4  37.8  36.1
D20    1768    38.9  76.2  61.4  51.5
D40    654     30.8  75.4  61.3  43.7
D50    1319    42.5  84.2  69.6  56.5
YOLOv5:
all    6020    57.1  55.9  55.8  56.5
D00    1140    51.1  49.2  47.3  50.1
D10    1139    49.6  33.6  37.1  40.1
D20    1768    67.6  60.8  63.1  64.0
D40    654     55.1  60.6  60.5  57.7
D50    1319    62.0  75.4  70.8  68.0
Improved YOLOv5:
all    6020    59.9  57.7  59.2  58.8
D00    1140    55.2  50.8  50.2  52.9
D10    1139    55.5  34.5  41.9  42.6
D20    1768    68.2  63.2  67.2  65.6
D40    654     57.4  62.1  62.2  59.7
D50    1319    63.1  78.0  74.3  69.8

4.7. Qualitative comparisons

Fig. 12 presents a qualitative comparison between improved YOLOv5 and other state-of-the-art methods, including YOLOX [29], YOLOv5, and u-YOLO (EM + EP) [32], on our dataset. To ensure a fair comparison, we trained and tested the models with the same parameters and placed the confidence scores above the predicted boxes for better visualization.

YOLOX, an advanced algorithm released in 2021, has the advantage of a smaller network model and very fast processing speed. From Fig. 12(b), it can be observed that YOLOX demonstrates certain detection capabilities for road surface cracks, although it has limitations in detecting small cracks. It accurately locates, sizes, and classifies the predicted boxes for the other types of defects, resulting in overall good performance with slightly lower precision.

In contrast, u-YOLO (EM + EP) achieves a significant improvement in recall. However, as an ensemble model, it tends to generate more predicted boxes, as shown in Fig. 12(c), which lowers its precision. YOLOv5 further improves on u-YOLO (EM + EP), significantly increasing speed while maintaining accuracy. Although Fig. 12(d) demonstrates overall good performance, YOLOv5 tends to produce more false positives, marking cracks in blank areas, which lowers the overall precision of the model.

Among the four algorithms, the proposed method based on improved YOLOv5 performs the best. Fig. 12(e) illustrates that its predicted boxes are closest to the actual positions and size distributions of the defects, with the highest confidence scores. Compared to the other common algorithms, our proposed method shows more satisfactory results in defect recognition.

Fig. 12. Qualitative comparison results on our dataset.
5. Conclusion

This paper is based on the study of road images captured by vehicle-mounted smartphones, on which the widely studied YOLO series faces challenges such as the blurring of tiny cracks and incomplete information extraction. Therefore, a method based on improved YOLOv5 is proposed for road surface crack detection. First, a dataset is reorganized. Second, the specific issues encountered by the original YOLOv5 on this dataset are addressed by introducing the Slim-Neck structure, C2f structure, Decoupled Head, and SPPCSPC structure, along with the adoption of the Silu activation function and CIoU loss function. These improvements tackle the two issues above from several aspects, enhancing the accuracy and comprehensiveness of target feature extraction while achieving a lightweight design and maintaining superior inference speed. Experimental results demonstrate that the proposed method exhibits high detection accuracy and real-time performance on the curated dataset. Specifically, the mAP reaches 59.17 % and the F1 score reaches 58.76 %. In comparison to the original YOLOv5, the proposed method attains a 3.41 % increase in mAP and a 2.25 % increase in F1. Additionally, the algorithm maintains 85 FPS, enabling fast and accurate detection of road surface cracks. Compared to other leading algorithms in this field, the proposed algorithm demonstrates significant advantages across various evaluation metrics.

Although this method is optimized for the problem addressed in this paper, some issues remain, and future work can be pursued in two aspects. (1) During validation on various images, it was observed that accuracy is lower under low-light conditions. To further enhance the proposed detector, future research will focus on improving detection accuracy under low light: image enhancement techniques will be explored and optimized for environments with insufficient illumination, and advanced low-light image processing algorithms will be studied and integrated to improve accuracy under visually challenging conditions. (2) To further enhance the overall performance of the model, we plan to introduce a new loss function that better aligns with the requirements of the experimental process. Specifically, we intend to incorporate a mechanism for dynamically adjusting weights within the loss function, designed to adaptively adjust the weights of its various components based on the characteristics of the input images, enabling more effective adaptation to different conditions. More details and code are available at https://github.com/dakehe/improved-YOLOv5-and-vehicle-mounted-images.
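As a purely illustrative sketch of future-work item (2), a dynamically weighted loss might take the following form; the image statistics used (batch brightness and an edge-density proxy) and all constants are assumptions, not an implemented or validated design:

```python
def dynamic_weights(brightness, edge_density):
    # brightness, edge_density: per-batch scalars in [0, 1]
    # (hypothetical image statistics). The idea: darker images lean more
    # on the classification term, fine-textured images lean more on box
    # regression. Base values and slopes here are illustrative only.
    w_box = 0.05 + 0.10 * edge_density
    w_cls = 0.30 + 0.20 * (1.0 - brightness)
    w_obj = 1.0  # objectness weight kept fixed in this sketch
    return w_box, w_cls, w_obj

def total_loss(l_box, l_cls, l_obj, brightness, edge_density):
    # combine the three YOLO-style loss components with adaptive weights
    w_box, w_cls, w_obj = dynamic_weights(brightness, edge_density)
    return w_box * l_box + w_cls * l_cls + w_obj * l_obj

# Example: a dark, fine-textured batch shifts weight toward cls and box
print(dynamic_weights(brightness=0.2, edge_density=0.8))
```

Any real design along these lines would need the weighting schedule to be learned or tuned on a validation set rather than fixed by hand as above.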
CRediT authorship contribution statement

Hongwei Hu: Writing – review & editing, Supervision. Zirui Li: Writing – review & editing, Writing – original draft. Zhiyi He: Supervision, Project administration. Lei Wang: Supervision. Su Cao: Software. Wenhua Du: Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Acknowledgments

This work was supported by the Hunan Provincial Key Research and Development Program (Grant No. 2022GK2058), the Natural Science Foundation of Hunan Province (Grant Nos. 2023JJ60158, 2023JJ60546, 2023JJ50237, 2022JJ40477), and the Opening Project of Shanxi Key Laboratory of Advanced Manufacturing Technology (No. XJZZ202103).

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.measurement.2024.114443.

References

[1] G.H. Luo, J.T. Wang, J.W. Pan, A framework for concrete crack monitoring using surface wave transmission method, Measurement 218 (2023) 113211.
[2] G. Doğan, B. Ergen, A new mobile convolutional neural network-based approach for pixel-wise road surface crack detection, Measurement 195 (2022) 111119.
[3] L. Yang, J. Fan, B. Huo, E. Li, Y. Liu, A nondestructive automatic defect detection method with pixelwise segmentation, Knowl.-Based Syst. 242 (2022) 108338.
[4] E.M. Thompson, A. Ranieri, S. Biasotti, M. Chicchon, I. Sipiran, M. Pham, T. Nguyen-Ho, H. Nguyen, M. Tran, SHREC 2022: Pothole and crack detection in the road pavement using images and RGB-D data, Comput. Graph. 107 (2022) 161–171.
[5] S. Liu, Y. Han, L. Xu, Recognition of road cracks based on multi-scale Retinex fused with wavelet transform, Array 15 (2022) 100193.
[6] H. Zhang, J. Li, F. Kang, J. Zhang, Monitoring depth and width of cracks in underwater concrete structures using embedded smart aggregates, Measurement 204 (2022) 112078.
[7] H. Bae, Y.K. An, Computer vision-based statistical crack quantification for concrete structures, Measurement 211 (2023) 112632.
[8] Y. Deng, J. Gui, H. Zhang, A. Taliercio, P. Zhang, S.H.F. Wong, A. Khan, L. Li, Y. Tang, X. Chen, Study on crack width and crack resistance of eccentrically tensioned steel-reinforced concrete members prestressed by CFRP tendons, Eng. Struct. 252 (2022) 113651.
[9] L. Song, H. Sun, J. Liu, Z. Yu, C. Cui, Automatic segmentation and quantification of global cracks in concrete structures based on deep learning, Measurement 199 (2022) 111550.
[10] H. Zhang, Y. Chen, B. Liu, X. Guan, X. Le, Soft matching network with application to defect inspection, Knowl.-Based Syst. 225 (2021) 107045.
[11] M. Hu, Q. Hu, Design of basketball game image acquisition and processing system based on machine vision and image processor, Microprocess. Microsyst. 82 (2021) 103904.
[12] D. Ireri, E. Belal, C. Okinda, N. Makange, C. Ji, A computer vision system for defect discrimination and grading in tomatoes using machine learning and image processing, Artif. Intell. Agric. 2 (2019) 28–37.
[13] Y. Tang, Z. Huang, Z. Chen, M. Chen, H. Zhou, H. Zhang, J. Sun, Novel visual crack width measurement based on backbone double-scale features for improved detection automation, Eng. Struct. 274 (2023) 115158.
[14] T. Yu, A. Zhu, Y. Chen, Efficient crack detection method for tunnel lining surface cracks based on infrared images, J. Comput. Civil Eng. 31 (3) (2017) 04016067.
[15] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[16] R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[17] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, in: Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[18] K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961–2969.
[19] B. Kim, S. Cho, Image-based concrete crack assessment using mask and region-based convolutional neural network, Struct. Control Health Monit. 26 (8) (2019) e2381.
[20] A. Dahou, A.O. Aseeri, A. Mabrouk, R.A. Ibrahim, M.A. Al-Betar, M.A. Elaziz, Optimal skin cancer detection model using transfer learning and dynamic-opposite hunger games search, Diagnostics 13 (2023) 1579.
[21] M.A. Elaziz, A. Dahou, A. Mabrouk, S. El-Sappagh, A.O. Aseeri, An efficient artificial rabbits optimization based on mutation strategy for skin cancer prediction, Comput. Biol. Med. 163 (2023) 107154.
[22] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[23] X. Zhu, S. Lyu, X. Wang, Q. Zhao, TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 2778–2788.
[24] Y. Li, H. Sun, Y. Hu, Y. Han, Electrode defect YOLO detection algorithm based on attention mechanism and multi-scale feature fusion, Control and Decision (2022).
[25] P. Wu, A. Liu, J. Fu, X. Ye, Y. Zhao, Autonomous surface crack identification of concrete structures based on an improved one-stage object detection algorithm, Eng. Struct. 272 (2022) 114962.
[26] J. Zhang, S. Qian, C. Tan, Automated bridge surface crack detection and segmentation using computer vision-based deep learning model, Eng. Appl. Artif. Intell. 115 (2022) 105225.
[27] Z. Xiaoxun, H. Xinyu, G. Xiaoxia, Y. Xing, X. Zixu, W. Yu, L. Huaxin, Research on crack detection method of wind turbine blade based on a deep learning method, Appl. Energy 328 (2022) 120241.
[28] H. Li, J. Li, H. Wei, Z. Liu, Z. Zhan, Q. Ren, Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles, arXiv:2206.02424. Available: https://arxiv.org/abs/2206.02424.
[29] Z. Ge, S. Liu, F. Wang, Z. Li, J. Sun, YOLOX: Exceeding YOLO series in 2021, arXiv:2107.08430. Available: https://arxiv.org/abs/2107.08430.
[30] C.Y. Wang, A. Bochkovskiy, H.Y.M. Liao, YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7464–7475.
[31] D. Arya, H. Maeda, S.K. Ghosh, D. Toshniwal, Y. Sekimoto, RDD2022: A multi-national image dataset for automatic road damage detection, arXiv:2209.08538. Available: https://arxiv.org/abs/2209.08538.
[32] V. Hegde, D. Trivedi, A. Alfarrarjeh, A. Deepak, S.H. Kim, C. Shahabi, Yet another deep learning approach for road damage detection using ensemble learning, in: 2020 IEEE International Conference on Big Data (Big Data), 2020, pp. 5553–5558.
[33] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.C. Chen, MobileNetV2: Inverted residuals and linear bottlenecks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
[34] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[35] S. Woo, J. Park, J.Y. Lee, I.S. Kweon, CBAM: Convolutional block attention module, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3–19.
[36] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, Q. Hu, ECA-Net: Efficient channel attention for deep convolutional neural networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11534–11542.
[37] D. Guo, Y. Shao, Y. Cui, Z. Wang, L. Zhang, C. Shen, Graph attention tracking, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9543–9552.
[38] Q.L. Zhang, Y.B. Yang, SA-Net: Shuffle attention for deep convolutional neural networks, in: ICASSP 2021 – 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 2235–2239.
[39] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[40] X. Dong, S. Yan, C. Duan, A lightweight vehicles detection network model based on YOLOv5, Eng. Appl. Artif. Intell. 113 (2022) 104914.
[41] B. Xu, N. Wang, T. Chen, M. Li, Empirical evaluation of rectified activations in convolutional network, arXiv:1505.00853. Available: https://arxiv.org/abs/1505.00853.
[42] A.L. Maas, A.Y. Hannun, A.Y. Ng, Rectifier nonlinearities improve neural network acoustic models, in: Proc. ICML, 2013.
[43] A. Howard, M. Sandler, G. Chu, L.C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, Q.V. Le, H. Adam, Searching for MobileNetV3, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1314–1324.
[44] D. Misra, Mish: A self regularized non-monotonic activation function, arXiv:1908.08681. Available: https://arxiv.org/abs/1908.08681.
[45] S. Elfwing, E. Uchibe, K. Doya, Sigmoid-weighted linear units for neural network function approximation in reinforcement learning, Neural Networks 107 (2018) 3–11.
[46] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, S. Savarese, Generalized intersection over union: A metric and a loss for bounding box regression, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 658–666.
[47] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, D. Ren, Distance-IoU loss: Faster and better learning for bounding box regression, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 12993–13000.
[48] Y.F. Zhang, W. Ren, Z. Zhang, Z. Jia, L. Wang, T. Tan, Focal and efficient IOU loss for accurate bounding box regression, Neurocomputing 506 (2022) 146–157.
[49] C. Li, L. Li, H. Jiang, K. Wen, Y. Geng, L. Li, Z. Ke, Q. Li, M. Cheng, W. Nie, YOLOv6: A single-stage object detection framework for industrial applications, arXiv:2209.02976. Available: https://doi.org/10.48550/arXiv.2209.02976.
[50] F.M. Talaat, H. ZainEldin, An improved fire detection approach based on YOLO-v8 for smart cities, Neural Comput. Appl. 35 (28) (2023) 20939–20954.