CN112101169B - Attention mechanism-based road image target detection method and related equipment - Google Patents
- Publication number
- CN112101169B (application number CN202010936332.6A)
- Authority
- CN
- China
- Prior art keywords
- target detection
- detection
- frame
- road image
- loss value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The embodiment of the application belongs to the technical field of machine learning and is applied to the field of smart cities. It relates to a road image target detection method based on an attention mechanism, which comprises: obtaining a road image to be detected; inputting the road image to be detected into a trained detection integration framework, the framework comprising at least two target detection frameworks and an attention weight value corresponding to each target detection framework; performing target detection processing on the road image to be detected through each target detection framework and outputting a pedestrian detection result; and synthesizing the pedestrian detection results according to the attention weight values to obtain a target detection result of the detection integration framework. The application also provides a road image target detection device based on the attention mechanism, a computer device and a storage medium. In addition, the application relates to blockchain technology. The method can solve the technical problem that target detection frameworks in the prior art struggle to perform target detection against purposefully set indexes.
Description
Technical Field
The present disclosure relates to the field of machine learning technologies, and in particular, to a method and apparatus for detecting a road image target based on an attention mechanism, a computer device, and a storage medium.
Background
With the rapid development of deep learning in recent years, the field of computer vision has also made great breakthroughs. Target detection is one of the core tasks in computer vision, and with the help of deep learning it has improved markedly over traditional methods. The task of object detection is to find objects of interest in an image or video and to detect their location, size and class. The YOLO series and SSD series models in the prior art adopt a one-stage algorithm to realize classification and regression: a single CNN network directly predicts the categories and positions of different targets, which improves detection speed but at a considerable cost in precision. The performance evaluation indexes of target detection also differ from task to task; different tasks often require different indexes, including accuracy, precision, recall, mean Average Precision (mAP), Intersection over Union (IoU) and so on. Meanwhile, AutoML technology has also been applied in the field of object detection; for example, MnasNet and NetAdapt are used in MobileNetV3 to coarsely search and then fine-tune the network. In short, there are many algorithms and indexes in the target detection field, the performance of each model may differ greatly between tasks, and different indexes are needed to characterize model performance on different tasks, so it is difficult for a target detection framework to perform target detection against purposefully set indexes on a specific task.
Disclosure of Invention
Based on the above, the present application provides an attention mechanism based road image target detection method, apparatus, computer device and storage medium, so as to solve the technical problem that target detection frameworks in the prior art struggle to perform target detection against purposefully set indexes.
A method for road image target detection based on an attention mechanism, the method comprising:
acquiring a road image to be detected;
inputting the road image to be detected into a trained detection integrated frame, wherein the detection integrated frame comprises at least two target detection frames and attention weight values corresponding to the target detection frames;
performing target detection processing on the road image to be detected through each target detection frame, and outputting to obtain a pedestrian detection result;
and synthesizing the pedestrian detection results according to the attention weight value to obtain a target detection result of the detection integration framework.
A road image object detection apparatus based on an attention mechanism, the apparatus comprising:
the data module is used for acquiring a road image to be detected;
the input module is used for inputting the road image to be detected into a trained detection integration frame, wherein the detection integration frame comprises at least two target detection frames and attention weight values corresponding to the target detection frames;
the detection module is used for carrying out target detection processing on the road image to be detected through each target detection frame and outputting a pedestrian detection result;
and the comprehensive module is used for integrating the pedestrian detection results according to the attention weight value to obtain a target detection result of the detection integrated frame.
A computer device comprising a memory and a processor, and computer readable instructions stored in the memory and executable on the processor, which when executed by the processor implement the steps of the attention mechanism based road image object detection method described above.
A computer readable storage medium storing computer readable instructions which when executed by a processor implement the steps of the attention mechanism based road image target detection method described above.
According to the road image target detection method, the device, the computer equipment and the storage medium based on the attention mechanism, attention weight values are set for the selected target detection frames, then each target detection frame respectively carries out target detection processing on the input road image to be detected to obtain a detection result composed of a plurality of detection indexes, then the detection results output by each target detection frame are synthesized through the attention weight values to obtain the target detection result.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of an application environment of a road image object detection method based on an attention mechanism;
FIG. 2 is a flow chart of a road image object detection method based on an attention mechanism;
FIG. 3 is another flow chart of a road image object detection method based on an attention mechanism;
FIG. 4 is a schematic diagram of a road image object detection apparatus based on an attention mechanism;
FIG. 5 is a schematic diagram of a computer device in one embodiment.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the description of the figures above are intended to cover non-exclusive inclusions. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The road image target detection method based on the attention mechanism provided by the embodiment of the invention can be applied to the application environment shown in fig. 1. The application environment may include a terminal 102, a server 104, and a network providing a communication link medium between the terminal 102 and the server 104; the network may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may interact with the server 104 through a network using the terminal 102 to receive or send messages, etc. The terminal 102 may have installed thereon various communication client applications such as web browser applications, shopping class applications, search class applications, instant messaging tools, mailbox clients, social platform software, and the like.
The terminal 102 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 104 may be a server that provides various services, such as a background server that provides support for pages displayed on the terminal 102.
It should be noted that, the road image object detection method based on the attention mechanism provided in the embodiments of the present application is generally executed by a server/terminal, and accordingly, the road image object detection device based on the attention mechanism is generally disposed in the server/terminal device.
The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The application can be applied to the field of smart cities, thereby promoting the construction of smart cities, for example in unmanned driving and traffic violation detection.
It should be understood that the number of terminals, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The terminal 102 communicates with the server 104 through a network. The server receives the road image to be detected sent by the terminal 102 and inputs it into the trained detection integration framework; the road image to be detected is processed by each target detection framework in the detection integration framework to obtain pedestrian detection results corresponding to each target detection framework, and the pedestrian detection results are then synthesized according to the attention weight values corresponding to each target detection framework to obtain the target detection result of the detection integration framework. The terminal 102 and the server 104 are connected through a network, which may be a wired network or a wireless network; the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a road image object detection method based on an attention mechanism is provided, and the method is applied to the server in fig. 1 for illustration, and includes the following steps:
step 202, obtaining a road image to be detected.
In some embodiments, the technical scheme of the application can be applied to scenes such as unmanned driving, traffic violation monitoring and the like in a smart city; the road image to be detected can be a frame image in a video acquired by a camera, or can be a road image directly shot.
It should be noted that the above application scenarios are merely examples. Besides them, the technical solution of the present application can also be applied to other scenarios in which objects of interest are detected from images together with their positions, sizes and categories, such as detecting the positions, sizes and genders of pedestrians in an image, or detecting the number, positions and sizes of animals in an image.
And 204, inputting the road image to be detected into a trained detection integrated frame, wherein the detection integrated frame comprises at least two target detection frames and attention weight values corresponding to the target detection frames.
The detection integration framework comprises at least two selected target detection frameworks as sub-frameworks, together with the attention weight values corresponding to the sub-frameworks. Target detection frameworks include, for example, YoloV5, the EfficientDet algorithm and the SSD algorithm.
The attention weight values of the target detection frameworks sum to 1. Each attention weight value is an N-dimensional vector, where N is the number of indexes in the result output by a target detection framework; that is, each index corresponds to one dimension.
And 206, performing target detection processing on the road image to be detected through each target detection frame, and outputting to obtain a pedestrian detection result.
In some embodiments, each target detection framework performs target detection on the road image to be detected. Taking the SSD algorithm as an example, its backbone is a traditional image classification network such as VGG or ResNet; the road image to be detected is first processed by several convolution layers and pooling layers to obtain a feature map, and regression is then performed on the feature map to obtain positions and categories. The final output includes the object position detection box, the center coordinate deviation of the detection box, and the intersection over union (IoU); if it is further necessary to judge whether the detected object is a pedestrian or a vehicle, indexes such as accuracy and recall rate are also output. IoU is the ratio of the intersection to the union of the predicted box and the ground-truth box, and is also called the Jaccard index. For example, if pedestrians are detected, the output result may be the indexes of the pedestrian detection result.
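As a concrete illustration of the IoU index mentioned above, the sketch below computes the intersection over union (Jaccard index) of a predicted box and a ground-truth box; the (x1, y1, x2, y2) box format and the function name are illustrative assumptions rather than part of the patent.

```python
def iou(box_a, box_b):
    """Intersection over union (Jaccard index) of two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A predicted box against a ground-truth box: intersection 400, union 2800.
print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # ~0.143
```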
Correspondingly, each target detection result has a corresponding attention weight value, and the dimension of the attention weight value corresponds to the dimension of the output index.
And step 208, synthesizing the pedestrian detection results according to the attention weight values to obtain a target detection result of the detection integration framework.
The final target detection result of the detection integration framework can be obtained from the attention weight values and the output pedestrian detection results; specifically, the target detection result can be expressed by formula (1):

O_total = Σ_j S_j ⊙ O_j    (1)

where O_total is the target detection result, O_j is the pedestrian detection result of the j-th target detection framework, and S_j is the corresponding softmax-processed attention weight value. A single target detection framework may be accurate for predictions on a certain type of picture and inaccurate on others, whereas the detection integration framework can aggregate the advantages of multiple sub-frameworks. For example, the SSD algorithm runs faster than the YOLO algorithm, but the parameters of its candidate boxes cannot be obtained through learning and must be set manually, and the sizes and shapes of the candidate boxes used by each feature layer of the network are not uniform, which makes the debugging process heavily dependent on experience. Combining the advantages of other target detection frameworks can therefore compensate for the shortcomings of the SSD algorithm.
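The synthesis of formula (1) can be sketched as below, assuming each framework outputs an N-dimensional index vector O_j and carries an N-dimensional softmax-processed weight vector S_j; all names and numbers are illustrative.

```python
import numpy as np

def synthesize(results, weights):
    """Formula (1): O_total = sum_j S_j * O_j, with the product taken element-wise per index."""
    return sum(np.asarray(s) * np.asarray(o) for s, o in zip(weights, results))

# Two frameworks, three indexes each (e.g. IoU, recall, center deviation).
O = [[0.80, 0.70, 0.10], [0.60, 0.90, 0.05]]   # per-framework detection results O_j
S = [[0.5, 0.3, 0.6], [0.5, 0.7, 0.4]]         # attention weights S_j; each index sums to 1 over frameworks
print(synthesize(O, S))                         # [0.70 0.84 0.08]
```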
In the road image target detection method based on the attention mechanism, attention weight values are set for the selected target detection frames, then each target detection frame respectively carries out target detection processing on the input road image to be detected to obtain a detection result composed of a plurality of detection indexes, and then the detection results output by each target detection frame are synthesized through the attention weight values to obtain the target detection result.
In some embodiments, as shown in fig. 3, prior to step 204, further comprising:
step 302, a road sample image is acquired, wherein the road sample image comprises pedestrian labeling information.
The pedestrian labeling information can be index information such as the position bounding box, size and category (for example, pedestrian) of the labeled object in the image.
Step 304, selecting at least two target detection frameworks according to the target detection task.
In some embodiments, if the task is pedestrian detection, algorithms such as YoloV5, EfficientDet and SSD512 may be selected as the target detection frameworks.
And 306, comprehensively obtaining the integrated framework to be trained according to the training weight values of the target detection frameworks which are randomly set.
At the first iteration, the training weight values may be set randomly; for example, a Gaussian distribution may be used to randomly assign attention weight values to the N selected target detection frameworks. If the output index of a target detection framework is a three-dimensional vector [A, B, C], the corresponding attention weight value can be, for example, [0.1, 0.5, 0.4]; its purpose is to control the proportion of each index in the output.
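A minimal sketch of this random initialization, assuming Gaussian sampling of one N-dimensional training weight vector per framework; the function name and the seed are illustrative.

```python
import numpy as np

def init_training_weights(num_frameworks, num_indexes, seed=0):
    """Draw one N-dimensional training weight vector per target detection framework from a Gaussian."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=1.0, size=(num_frameworks, num_indexes))

# Three frameworks (e.g. YoloV5, EfficientDet, SSD512), three output indexes each.
print(init_training_weights(3, 3))
```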
In step 308, sample detection processing is performed on the road sample image through each target detection frame, so as to obtain a first sample detection result of each target detection frame.
The first sample detection result is the output of each target detection framework before any attention weight calculation has been applied, such as the pedestrian's position bounding box, size and center box deviation value.
Step 310, calculating a first loss value of the integrated frame to be trained according to the pedestrian labeling information, the first sample detection result and the training weight value.
In some embodiments, specifically, the loss value of each index in the pedestrian labeling information may be calculated for each target detection framework from the first sample detection result and the pedestrian labeling information, giving a second loss value, which is an N-dimensional vector denoted L_j. The second loss value and the training weight value are then synthesized to obtain the first loss value of the integrated framework to be trained.
Specifically, the second loss value may be directly multiplied by the training weight value to obtain a first loss value of the integrated frame to be trained.
Optionally, the element-wise product of the second loss value and a preset index weight is taken, and the result is then multiplied by the training weight value to obtain the first loss value.
The preset index weight is an index weight customized by the user for the task scenario. For example, in an automatic driving scenario an index weight can be set for each index; the index weight of the intersection over union can be set higher than the index weights of the recall rate and the center box deviation value, where the recall rate is computed over the sample images and indicates how many of the positive examples in the sample are predicted correctly.
For example, if the second loss value is [1, 2, 3] and the user-defined index weight is [0.6, 0.3, 0.1], their element-wise product is [1, 2, 3] ⊙ [0.6, 0.3, 0.1] = [0.6, 0.6, 0.3]; the result is then multiplied by the training weight vector corresponding to each target detection framework and summed to obtain the first loss value.
Specifically, it can be expressed by formula (2):

E = Σ_j S_j · (W ⊙ L_j)    (2)

where E is the first loss value, W is the user-defined index weight, L_j is the second loss value of the j-th target detection framework, ⊙ denotes the element-wise product, and S_j is the training weight value after softmax processing.
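Under the reading of formula (2) above, the first loss value can be sketched as follows; the function and variable names are illustrative, and the single-framework example reuses the numbers from the text.

```python
import numpy as np

def first_loss(second_losses, index_weight, softmax_weights):
    """Formula (2) as read above: E = sum_j S_j . (W ⊙ L_j)."""
    W = np.asarray(index_weight)
    return float(sum(np.dot(np.asarray(S_j), W * np.asarray(L_j))
                     for L_j, S_j in zip(second_losses, softmax_weights)))

# Numbers from the text: L = [1, 2, 3], user-defined index weight W = [0.6, 0.3, 0.1].
L = [[1.0, 2.0, 3.0]]            # one framework for illustration
S = [[1.0, 1.0, 1.0]]            # unit training weights so only the element product matters
print(first_loss(L, [0.6, 0.3, 0.1], S))   # 0.6 + 0.6 + 0.3 = 1.5
```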
In some embodiments, calculating the loss value of each target detection framework relative to the real sample data through custom index weights for different application scenarios makes the calculated loss value more targeted to the scenario and the predicted result more accurate.
And step 312, when the first loss value is greater than the preset threshold, adjusting the training weight value of each target detection frame according to the first loss value, and repeating the operations of sample detection processing and calculating the first loss value of the integrated frame to be trained on the road sample image until the first loss value is not greater than the preset threshold or the number of times of adjusting the training weight value exceeds the preset number of times, outputting the current training weight value, thereby obtaining the trained detection integrated frame.
Specifically, the training weight values can be adjusted according to formula (3), where w_i' on the left of the equals sign is the training weight value of the i-th target detection framework after the current adjustment, and w_i and E on the right of the equals sign are the training weight value of the i-th target detection framework before the current adjustment and the first loss value, respectively.
After updating the training weight values, they can be normalized to facilitate subsequent calculation. For example, this can be done with the softmax function:

S_i = exp(w_i) / Σ_k exp(w_k)

where S_i is the training weight value after softmax processing and w_i is the training weight value of the i-th target detection framework before softmax processing.
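The training loop of step 312, together with the softmax normalization above, can be sketched as below; the exact adjustment rule of formula (3) is not reproduced in this text, so the adjustment is passed in as a caller-supplied placeholder, and all names and the dummy loss are illustrative.

```python
import numpy as np

def softmax(w):
    """S_i = exp(w_i) / sum_k exp(w_k), applied per framework's weight vector."""
    e = np.exp(w - np.max(w, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def fit_weights(w, compute_first_loss, adjust, threshold=0.5, max_adjustments=100):
    """Step 312: stop once E is not greater than the threshold or the number of
    adjustments exceeds the preset count; otherwise adjust and repeat."""
    for _ in range(max_adjustments):
        E = compute_first_loss(softmax(w))
        if E <= threshold:
            break
        w = adjust(w, E)          # placeholder for formula (3)
    return softmax(w)

# Normalizing a raw training weight vector for one framework.
print(softmax(np.array([[0.2, 1.0, -0.3]])))   # approx. [[0.26, 0.58, 0.16]]

# Tiny demo of the loop with a dummy, pre-scripted loss sequence.
losses = iter([0.9, 0.7, 0.4])
print(fit_weights(np.array([[0.2, 1.0, -0.3]]),
                  compute_first_loss=lambda S: next(losses),
                  adjust=lambda w, E: w * 0.5))
```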
Further, in some embodiments, the detection integration framework may be optimized based on AutoML neural architecture search. To reduce the amount of calculation, the AutoFusion algorithm may be used to search the form of the convolution kernels in the network (typically a convolutional neural network) used for extracting image features in each target detection framework; that is, possible combinations of convolution forms (no connection, skip connection, 3×3 convolution, 5×5 convolution) are tried in the search space, and the combination with the highest mAP is selected to obtain an optimized target detection framework, where mAP is an evaluation index of target detection models. In this way, the target detection frameworks can be initially optimized, reducing the number of iterations and the amount of calculation.
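The search described here can be pictured as brute-force enumeration over a small set of candidate convolution forms, keeping the combination with the highest mAP; the candidate list, the evaluate_map callback and the dummy scorer below are hypothetical illustrations, not the actual AutoFusion algorithm.

```python
from itertools import product

# Candidate convolution forms for each searched position in the feature extractor.
CANDIDATES = ["no_connection", "skip_connection", "conv3x3", "conv5x5"]

def search_conv_forms(num_positions, evaluate_map):
    """Try every combination of candidate forms; keep the one with the highest mAP."""
    best_combo, best_map = None, float("-inf")
    for combo in product(CANDIDATES, repeat=num_positions):
        score = evaluate_map(combo)   # assumed callback: build the model and measure mAP
        if score > best_map:
            best_combo, best_map = combo, score
    return best_combo, best_map

# Dummy evaluator so the sketch runs; a real one would train and evaluate the detector.
dummy = lambda combo: sum(1.0 if c == "conv3x3" else 0.2 for c in combo)
print(search_conv_forms(2, dummy))    # (('conv3x3', 'conv3x3'), 2.0)
```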
It should be emphasized that, to further ensure the privacy and safety of the road image information to be detected, the road image to be detected may also be stored in a node of a blockchain.
In some embodiments, the technical solution of the present application applies the attention mechanism and automatic machine learning to the target detection frameworks through an attention-based framework integration scheme. The integration framework can combine the advantages of multiple frameworks and update the framework parameters according to the user-defined index weights and the training weight values, so that the detection integration framework can be optimized in a targeted manner for a specific application scenario while retaining strong basic performance. This effectively alleviates the problems that there are many frameworks in the target detection field and that selecting and tuning a framework is time-consuming and complicated, so that a task based on target detection can inherit a number of excellent advanced models or algorithms in the industry and be adjusted in a targeted manner to fit the business scenario.
It should be understood that, although the steps in the flowcharts of fig. 2-3 are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 2-3 may include multiple sub-steps or phases that are not necessarily performed at the same time, but may be performed at different times, nor does the order in which the sub-steps or phases are performed need to be sequential, but may be performed in turn or alternately with at least a portion of the sub-steps or phases of other steps or steps.
In one embodiment, as shown in fig. 4, there is provided a road image object detection apparatus based on an attention mechanism, which corresponds one-to-one to the road image object detection method based on an attention mechanism in the above embodiment. The road image object detection device based on the attention mechanism comprises:
a data module 402, configured to acquire an image of a road to be detected.
The input module 404 is configured to input the road image to be detected into a trained detection integration frame, where the detection integration frame includes at least two target detection frames and attention weight values corresponding to the target detection frames.
And the detection module 406 is configured to perform target detection processing on the road image to be detected through each target detection frame, and output a pedestrian detection result.
And a synthesis module 408, configured to synthesize each pedestrian detection result according to the attention weight value, so as to obtain a target detection result of the detection integration framework.
Further, before the input module 404, the apparatus further includes:
the system comprises a sample module, a detection module and a detection module, wherein the sample module is used for acquiring a road sample image, and the road sample image comprises pedestrian annotation information;
the selection module is used for selecting at least two target detection frames according to the target detection task;
the comprehensive module is used for comprehensively obtaining an integrated frame to be trained according to the training weight values of the target detection frames which are randomly set;
the calculation module is used for carrying out sample detection processing on the road sample image through each target detection frame to obtain a first sample detection result of each target detection frame; and is combined with
The loss module is used for calculating a first loss value of the integrated frame to be trained according to the pedestrian labeling information, the first sample detection result and the training weight value;
and the iteration module is used for adjusting the training weight value of each target detection frame according to the first loss value when the first loss value is larger than a preset threshold value, repeating the operations of carrying out sample detection processing on the road sample image and calculating the first loss value of the integrated frame to be trained until the first loss value is not larger than the preset threshold value or the frequency of adjusting the training weight value exceeds the preset frequency, and outputting the current training weight value to obtain the trained detection integrated frame.
Further, the loss module includes:
the loss submodule is used for calculating a second loss value of each target detection frame according to the pedestrian labeling information and the first sample detection result;
and the synthesis sub-module is used for synthesizing the second loss value and the training weight value to obtain a first loss value of the integrated framework to be trained.
It should be emphasized that, to further ensure the privacy and safety of the road image information to be detected, the road image to be detected may also be stored in a node of a blockchain.
In the road image target detection apparatus based on the attention mechanism, attention weight values are set for the selected target detection frameworks; each target detection framework then performs target detection processing on the input road image to be detected to obtain a detection result composed of a plurality of detection indexes, and the detection results output by the target detection frameworks are synthesized through the attention weight values to obtain the target detection result. The advantages of the plurality of target detection frameworks can thus be integrated, and the attention weight values can be adaptively updated under different application scenarios, so that the detection integration framework achieves targeted detection on specific tasks. This effectively alleviates the problems that there are many frameworks in the target detection field and that selecting and tuning a framework is time-consuming and complicated, and solves the technical problem that target detection frameworks in the prior art struggle to perform target detection against purposefully set indexes.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer readable instructions, and a database. The internal memory provides an environment for executing the operating system and the computer readable instructions in the non-volatile storage medium. The database of the computer device is used for storing road images to be detected. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer readable instructions, when executed by the processor, implement the road image object detection method based on an attention mechanism. Attention weight values are set for the selected target detection frameworks; each target detection framework then performs target detection processing on the input road image to be detected to obtain a detection result composed of a plurality of detection indexes, and the detection results output by the target detection frameworks are synthesized through the attention weight values to obtain the target detection result. The advantages of the plurality of target detection frameworks can thus be integrated, and the attention weight values can be adaptively updated in different application scenarios, so that the detection integration framework achieves targeted detection on specific tasks. This effectively alleviates the problems that there are many frameworks in the target detection field and that selecting and tuning a framework is time-consuming and complicated, and solves the technical problem that target detection frameworks in the prior art struggle to perform target detection against purposefully set indexes.
It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculations and/or information processing in accordance with predetermined or stored instructions, the hardware of which includes, but is not limited to, microprocessors, Application Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs), embedded devices, and the like.
In one embodiment, a computer readable storage medium is provided, on which computer readable instructions are stored; when executed by a processor, the instructions implement the steps of the attention mechanism based road image object detection method in the above embodiment, such as steps 202 to 208 shown in fig. 2, or implement the functions of the modules/units of the attention mechanism based road image object detection apparatus in the above embodiment, such as the functions of modules 402 to 408 shown in fig. 4. According to the method, attention weight values are set for the selected target detection frameworks; each target detection framework then performs target detection processing on the input road image to be detected to obtain a detection result composed of a plurality of detection indexes, and the detection results output by the target detection frameworks are synthesized through the attention weight values to obtain the target detection result. The advantages of the plurality of target detection frameworks can thus be integrated, and the attention weight values can be adaptively updated in different application scenarios, so that the detection integration framework achieves targeted detection on specific tasks. This effectively alleviates the problems that there are many frameworks in the target detection field and that selecting and tuning a framework is time-consuming and complicated, and solves the technical problem that target detection frameworks in the prior art struggle to perform target detection against purposefully set indexes.
Those skilled in the art will appreciate that implementing all or part of the processes of the methods of the embodiments described above may be accomplished by instructing the associated hardware through computer readable instructions stored on a non-transitory computer readable storage medium, which, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The blockchain referred to in the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm and the like. The Blockchain (Blockchain), which is essentially a decentralised database, is a string of data blocks that are generated by cryptographic means in association, each data block containing a batch of information of network transactions for verifying the validity of the information (anti-counterfeiting) and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples merely represent a few embodiments of the present application; although they are described in considerable detail, they are not to be construed as limiting the scope of the invention. It should be noted that, for those skilled in the art, several modifications, improvements or equivalent substitutions of some technical features may be made without departing from the concept of the present application, and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.
Claims (7)
1. A method for detecting a road image target based on an attention mechanism, the method comprising:
acquiring a road image to be detected;
inputting the road image to be detected into a trained detection integrated frame, wherein the detection integrated frame comprises at least two target detection frames and attention weight values corresponding to the target detection frames;
performing target detection processing on the road image to be detected through each target detection frame, and outputting to obtain a pedestrian detection result;
synthesizing each pedestrian detection result according to the attention weight value to obtain a target detection result of the detection integration framework;
wherein, before the target detection processing is performed on the road image to be detected through each target detection frame and the pedestrian detection result is output, the method further comprises:
searching the form of the convolution kernels in the network used for extracting image features in each target detection frame by using the AutoFusion algorithm, namely, trying possible combinations of convolution forms in the search space and selecting the combination with the highest mAP, to obtain an optimized target detection frame; wherein mAP is an evaluation index of the target detection model;
before the road image to be detected is input into the trained detection integration frame, the method further comprises the following steps:
acquiring a road sample image, wherein the road sample image comprises pedestrian labeling information;
selecting at least two target detection frames according to the target detection task;
according to the training weight values of the target detection frames which are randomly set, comprehensively obtaining an integrated frame to be trained;
sample detection processing is carried out on the road sample image through each target detection frame, and a first sample detection result of each target detection frame is obtained; and
Calculating a first loss value of the integrated frame to be trained according to the pedestrian labeling information, the first sample detection result and the training weight value;
when the first loss value is larger than a preset threshold value, adjusting the training weight value of each target detection frame according to the first loss value, repeating the operations of carrying out sample detection processing on the road sample image and calculating the first loss value of the integrated frame to be trained until the first loss value is not larger than the preset threshold value or the frequency of adjusting the training weight value exceeds the preset frequency, and outputting the current training weight value to obtain a trained detection integrated frame;
wherein said adjusting training weight values of each of said target detection frames according to said first loss value comprises:
according to the formula:
adjusting the training weight value, wherein w_i' is the training weight value of the i-th target detection frame after the current adjustment, w_i is the training weight value of the i-th target detection frame before the current adjustment, and E is the first loss value;
wherein the step of synthesizing each pedestrian detection result according to the attention weight value to obtain a target detection result of the detection integration frame includes:
according to the formula:
obtaining the target detection result, wherein O_total is the target detection result, O_j is the pedestrian detection result of the j-th target detection frame, and S_j is the training weight value after softmax processing.
2. The method of claim 1, wherein the calculating the first loss value of the integrated frame to be trained based on the pedestrian labeling information, the first sample detection result, and the training weight value comprises:
calculating a second loss value of each target detection frame according to the pedestrian labeling information and the first sample detection result;
and synthesizing the second loss value and the training weight value to obtain a first loss value of the integrated framework to be trained.
3. The method of claim 2, wherein the synthesizing the second loss value and the training weight value to obtain the first loss value comprises:
multiplying the second loss value by the training weight value to obtain a first loss value of the integrated framework to be trained.
4. The method according to claim 2, wherein said synthesizing the second loss value and the training weight value to obtain the first loss value of the integrated frame to be trained comprises:
and multiplying the element product of the second loss value and a preset index weight by the training weight value to obtain the first loss value.
5. A road image object detection apparatus based on an attention mechanism, which, when operating, performs the attention mechanism based road image object detection method according to any one of claims 1 to 4, characterized in that the road image object detection apparatus based on an attention mechanism comprises:
the data module is used for acquiring a road image to be detected;
the input module is used for inputting the road image to be detected into a trained detection integration frame, wherein the detection integration frame comprises at least two target detection frames and attention weight values corresponding to the target detection frames;
the detection module is used for carrying out target detection processing on the road image to be detected through each target detection frame and outputting a pedestrian detection result;
the comprehensive module is used for integrating the pedestrian detection results according to the attention weight value to obtain a target detection result of the detection integrated frame;
wherein, the road image target detection device based on the attention mechanism further comprises:
searching the form of the convolution kernels in the network used for extracting image features in each target detection frame by using the AutoFusion algorithm, namely, trying possible combinations of convolution forms in the search space and selecting the combination with the highest mAP, to obtain an optimized target detection frame; wherein mAP is an evaluation index of the target detection model.
6. A computer device comprising a memory storing computer readable instructions and a processor, wherein the processor when executing the computer readable instructions performs the steps of the method of any one of claims 1 to 4.
7. A computer readable storage medium having stored thereon computer readable instructions, which when executed by a processor, implement the steps of the method of any of claims 1 to 4.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010936332.6A CN112101169B (en) | 2020-09-08 | 2020-09-08 | Attention mechanism-based road image target detection method and related equipment |
PCT/CN2020/125412 WO2021151336A1 (en) | 2020-09-08 | 2020-10-30 | Road image target detection method based on attentional mechanism and related device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010936332.6A CN112101169B (en) | 2020-09-08 | 2020-09-08 | Attention mechanism-based road image target detection method and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112101169A CN112101169A (en) | 2020-12-18 |
CN112101169B (en) | 2024-04-05
Family
ID=73751664
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010936332.6A Active CN112101169B (en) | 2020-09-08 | 2020-09-08 | Attention mechanism-based road image target detection method and related equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112101169B (en) |
WO (1) | WO2021151336A1 (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112699848B (en) * | 2021-01-15 | 2022-05-31 | 上海交通大学 | Counting method and system for dense crowd of image |
CN113808200B (en) * | 2021-08-03 | 2023-04-07 | 嘉洋智慧安全科技(北京)股份有限公司 | Method and device for detecting moving speed of target object and electronic equipment |
CN113657382B (en) * | 2021-08-24 | 2024-03-01 | 凌云光技术股份有限公司 | Method and device for selecting optimal detection model in target detection task |
CN113752255B (en) * | 2021-08-24 | 2022-12-09 | 浙江工业大学 | Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning |
CN113822185A (en) * | 2021-09-09 | 2021-12-21 | 安徽农业大学 | Method for detecting daily behavior of group health pigs |
CN113850770B (en) * | 2021-09-17 | 2024-07-02 | 华中科技大学 | Image AI digital detection method and device for shield tunnel and shock insulation tunnel |
CN113989487A (en) * | 2021-10-20 | 2022-01-28 | 国网山东省电力公司信息通信公司 | Fault defect detection method and system for live-action scheduling |
CN114037684B (en) * | 2021-11-08 | 2024-06-14 | 南京信息工程大学 | Defect detection method based on yolov and attention mechanism model |
CN114494815B (en) * | 2022-01-27 | 2024-04-09 | 北京百度网讯科技有限公司 | Neural network training method, target detection method, device, equipment and medium |
CN114663706A (en) * | 2022-03-25 | 2022-06-24 | 北京易华录信息技术股份有限公司 | Road asset detection model construction and road asset detection method |
CN114821575B (en) * | 2022-05-19 | 2024-08-13 | 湖北工业大学 | Refrigerator food material detection method and device based on attention mechanism and ensemble learning |
CN114677597B (en) * | 2022-05-26 | 2022-10-11 | 武汉理工大学 | Gear defect visual inspection method and system based on improved YOLOv5 network |
CN115049878B (en) * | 2022-06-17 | 2024-05-03 | 平安科技(深圳)有限公司 | Target detection optimization method, device, equipment and medium based on artificial intelligence |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018089158A1 (en) * | 2016-11-10 | 2018-05-17 | Qualcomm Incorporated | Natural language object tracking |
CN108171141A (en) * | 2017-12-25 | 2018-06-15 | 淮阴工学院 | The video target tracking method of cascade multi-pattern Fusion based on attention model |
WO2018121690A1 (en) * | 2016-12-29 | 2018-07-05 | 北京市商汤科技开发有限公司 | Object attribute detection method and device, neural network training method and device, and regional detection method and device |
CN110276269A (en) * | 2019-05-29 | 2019-09-24 | 西安交通大学 | A kind of Remote Sensing Target detection method based on attention mechanism |
CN110765886A (en) * | 2019-09-29 | 2020-02-07 | 深圳大学 | Road target detection method and device based on convolutional neural network |
CN111062413A (en) * | 2019-11-08 | 2020-04-24 | 深兰科技(上海)有限公司 | Road target detection method and device, electronic equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109726771B (en) * | 2019-02-27 | 2023-05-02 | 锦图计算技术(深圳)有限公司 | Abnormal driving detection model building method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112101169A (en) | 2020-12-18 |
WO2021151336A1 (en) | 2021-08-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112101169B (en) | Attention mechanism-based road image target detection method and related equipment | |
CN109522942B (en) | Image classification method and device, terminal equipment and storage medium | |
CN111950638B (en) | Image classification method and device based on model distillation and electronic equipment | |
EP3893125A1 (en) | Method and apparatus for searching video segment, device, medium and computer program product | |
CN110162993B (en) | Desensitization processing method, model training device and computer equipment | |
CN110008397B (en) | Recommendation model training method and device | |
CN116249991A (en) | Neural network distillation method and device | |
CN113095346A (en) | Data labeling method and data labeling device | |
CN114550053A (en) | Traffic accident responsibility determination method, device, computer equipment and storage medium | |
CN113095475A (en) | Neural network training method, image processing method and related equipment | |
CN113807399A (en) | Neural network training method, neural network detection method and neural network detection device | |
CN113240071B (en) | Method and device for processing graph neural network, computer equipment and storage medium | |
WO2024041483A1 (en) | Recommendation method and related device | |
CN112016502B (en) | Safety belt detection method, safety belt detection device, computer equipment and storage medium | |
CN111652245B (en) | Vehicle contour detection method, device, computer equipment and storage medium | |
CN112329762A (en) | Image processing method, model training method, device, computer device and medium | |
CN116684330A (en) | Traffic prediction method, device, equipment and storage medium based on artificial intelligence | |
CN110555861B (en) | Optical flow calculation method and device and electronic equipment | |
CN117217284A (en) | Data processing method and device | |
CN113627421B (en) | Image processing method, training method of model and related equipment | |
CN116894802B (en) | Image enhancement method, device, computer equipment and storage medium | |
CN114241411A (en) | Counting model processing method and device based on target detection and computer equipment | |
US20230410465A1 (en) | Real time salient object detection in images and videos | |
CN110378936B (en) | Optical flow calculation method and device and electronic equipment | |
CN115392361A (en) | Intelligent sorting method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
- PB01 | Publication ||
- SE01 | Entry into force of request for substantive examination ||
- GR01 | Patent grant ||