CN115620022A - Object detection method, device, equipment and storage medium - Google Patents

Object detection method, device, equipment and storage medium

Info

Publication number
CN115620022A
CN115620022A (Application CN202211131694.3A)
Authority
CN
China
Prior art keywords
mapping
image
target
detection frame
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211131694.3A
Other languages
Chinese (zh)
Inventor
吴肖
汪浩
黄文涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Automotive Innovation Co Ltd
Original Assignee
China Automotive Innovation Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Automotive Innovation Co Ltd
Priority to CN202211131694.3A
Publication of CN115620022A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an object detection method, device, equipment and storage medium. The method comprises: performing target detection processing on an image to be processed to obtain an initial detection frame corresponding to a target object; performing area expansion on the initial area image based on preset proportion information to obtain an initial expanded area image corresponding to the initial detection frame; performing area mapping processing on the initial expanded area image to obtain a mapping area image containing a mapping detection frame; obtaining target offset information of the target object relative to the mapping detection frame based on the mapping area image; and performing offset processing on the mapping detection frame based on the target offset information to determine the target detection frame containing the target object. The method and device improve the precision of the target object detection frame so that it fits the edge of the target object, thereby improving target object detection accuracy.

Description

Object detection method, device, equipment and storage medium
Technical Field
The invention relates to the field of image detection, in particular to an object detection method, device, equipment and storage medium.
Background
To further improve the accuracy of target detection, higher-definition cameras (e.g., 8 megapixels) are typically used. At present, the cameras commonly used in the autonomous-driving field capture images at resolutions such as 1920 × 1080 or 3840 × 2160. When a deep-learning-based target detection network is used for vehicle detection, the input image usually has to be scaled down (e.g., to 640 × 640) to reduce the computational load and guarantee the network's inference speed. After scaling, distant targets become very small; for a far-away vehicle, the target detection frame may therefore deviate noticeably, cannot fit the vehicle edge well, and is prone to false detection, and other information such as the vehicle type and license plate cannot be recognized accurately during detection. A method for improving the precision of the vehicle detection frame is therefore needed.
Disclosure of Invention
To overcome the defects and shortcomings of the prior art, the invention discloses an object detection method, device, equipment and storage medium that improve the precision of the target object detection frame, so that the frame fits the edge of the target object and the target object detection accuracy is improved. The method comprises the following steps:
carrying out target detection processing on the image to be processed to obtain an initial detection frame corresponding to a target object;
performing area expansion on the initial area image based on preset proportion information to obtain an initial expanded area image containing the initial detection frame; the initial region image is an image region corresponding to the initial detection frame in the image to be processed;
performing area mapping processing on the initial expansion area image to obtain a mapping area image containing a mapping detection frame; the size of the mapping area image is a preset image size, and the size of the mapping detection frame is a preset detection frame size; the ratio information of the preset image size and the preset detection frame size meets the preset ratio information;
obtaining target offset information of the target object compared with the mapping detection frame based on the mapping area image;
performing offset processing on the mapping detection frame based on the target offset information, and determining the target detection frame containing the target object; the target detection box fits the target object edge better than the mapping detection box.
Further, the method further comprises:
determining an offset interval based on the size information of the mapping detection frame and the preset proportion information; the preset proportion information is determined based on the ratio information of the size of the initial detection frame and the size of the initial area image;
the obtaining of the target offset information of the target object compared with the mapping detection frame based on the mapping region image includes:
obtaining the target offset information based on the mapping area image and the offset interval of the mapping detection frame; the target offset information is located within the offset interval.
Further, the obtaining the target offset information based on the mapping region image and the offset section of the mapping detection frame includes:
acquiring an object detection network; the object detection network comprises a feature extraction layer and an output layer;
performing feature extraction on the mapping region image based on the feature extraction layer to obtain image feature information;
and inputting the image characteristic information and preset proportion information into the output layer, and performing data processing based on the output layer to obtain the target offset information.
Further, the feature extraction layer comprises a plurality of levels of feature extraction layers, wherein a low level of feature extraction layer is used for extracting feature information corresponding to a first identification type, and a high level of feature extraction layer is used for extracting feature information corresponding to a second identification type; the scale of the first identification type in the image to be processed is larger than that of the second identification type in the image to be processed;
the extracting the feature of the mapping region image based on the feature extraction layer to obtain image feature information includes:
determining a target hierarchy based on the type of the target object;
and performing feature extraction on the mapping region image based on the feature extraction layer of the target level to obtain the image feature information.
Further, the target hierarchy comprises a first hierarchy corresponding to the first identification type, and a second hierarchy corresponding to the second identification type; the image feature information comprises first image feature information output by the first hierarchy and second image feature information output by the second hierarchy;
the feature extraction layer based on the target hierarchy performs feature extraction on the mapping region image to obtain the image feature information, and the method further includes:
performing data processing on the first image characteristic information based on an output layer corresponding to the first level to obtain the target offset information;
and performing object recognition on the second image characteristic information based on the output layer corresponding to the second hierarchy to obtain a sub-object contained in the target object.
In another aspect, the present application further provides an object detecting apparatus, including:
the processing module is used for carrying out target detection processing on the image to be processed to obtain an initial detection frame corresponding to a target object;
the area expansion module is used for carrying out area expansion on the initial area image based on preset proportion information to obtain an initial expansion area image containing the initial detection frame; the initial region image is an image region corresponding to the initial detection frame in the image to be processed;
the area mapping module is used for performing area mapping processing on the initial expansion area image to obtain a mapping area image containing a mapping detection frame; the size of the mapping area image is a preset image size, and the size of the mapping detection frame is a preset detection frame size; the ratio of the preset image size to the preset detection frame size satisfies the preset proportion information;
a target offset information generation module, configured to obtain target offset information of the target object compared to the mapping detection frame based on the mapping region image;
the offset processing module is used for carrying out offset processing on the mapping detection frame based on the target offset information and determining the target detection frame containing the target object; the target detection box fits the target object edge more closely than the mapping detection box.
In a third aspect, the present application further provides an electronic device, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement an object detection method as described above.
In a fourth aspect, the present application further provides a computer-readable storage medium having at least one instruction, at least one program, code set, or instruction set stored therein, the at least one instruction, at least one program, code set, or instruction set being loaded by a processor and executing an object detection method as described above.
The implementation of the invention has the following beneficial effects:
the method comprises the steps of carrying out object detection processing on an image to be processed to obtain an initial detection frame corresponding to a target object, and carrying out region expansion on an initial region image to obtain an initial expanded image in order to enable the initial detection frame to have a larger correction space; the method comprises the steps of carrying out area mapping processing on an initial detection frame comprising an initial image to obtain a mapping area image comprising the mapping detection frame, mapping the size of the initial detection frame in different images to be processed into a preset detection frame size through area mapping processing, mapping the size of the initial area image in the different images to be processed into a preset image size, obtaining target offset information compared with the mapping detection frame based on the mapping detection frame with the uniform size and the mapping area image with the uniform size by taking the size of the preset detection frame as a reference, reducing the complexity of data processing, carrying out offset processing on the mapping detection frame based on the target offset information to obtain a target detection frame, and enabling the target detection frame to be more fit with the edge of a target object compared with the initial detection frame so as to improve the detection precision of the target object.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an object detection method according to an embodiment of the present invention;
FIG. 2 is a diagram of an image to be processed according to an embodiment of the present invention;
FIG. 3 is a mapping region image provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a target detection box according to an embodiment of the present invention;
fig. 5 is a flowchart of a method for determining target offset information according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of feature extraction of a deep learning model according to an embodiment of the present invention;
fig. 7 is a block diagram of an object detection apparatus according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In this embodiment, the technical problem to be solved by the present invention is to improve the accuracy of the target object detection frame, so that the target object detection frame fits the edge of the target object, thereby improving the target object detection accuracy.
Referring to fig. 1, the method includes:
s110: carrying out target detection processing on the image to be processed to obtain an initial detection frame corresponding to a target object;
the execution main body of the embodiment is a vehicle body processor or a server, the to-be-processed image is shot in real time through the vehicle-mounted camera, the to-be-processed image can also be an image stored in a cloud, the to-be-processed image comprises various target detection objects, such as pedestrians, vehicles, traffic signs and the like, and the to-be-processed image is shown in fig. 2, wherein the target object enclosed by the frame is a vehicle.
In this embodiment, the target object is identified in the image to be processed by a target detection method; a YOLOv5-based target detection algorithm is adopted for the target detection processing, because YOLOv5 trains quickly and the YOLOv5s variant has a small model size, which facilitates rapid deployment. As shown in fig. 2, a vehicle is detected from the image to be processed by the YOLOv5 target detection algorithm, and the vehicle target object is marked with an initial detection frame so that feature information of the target object, such as the vehicle type and license plate, can subsequently be extracted from the initial area image corresponding to the initial detection frame.
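As an illustration of step S110, the sketch below obtains initial detection frames with a YOLOv5s model. The torch.hub entry point and the results.xyxy accessor are those of the public ultralytics/yolov5 repository and are used here as an assumption; the embodiment does not prescribe a particular implementation.

```python
# Minimal sketch of S110: initial detection frames from YOLOv5s.
# Assumes the public ultralytics/yolov5 torch.hub interface.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

def initial_detections(image):
    """Return one (x1, y1, x2, y2, confidence, class) row per detection."""
    results = model(image)                # the model letterboxes internally
    return results.xyxy[0].cpu().numpy()  # initial detection frames
```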
As is known in the prior art, the initial detection frame has a certain precision, but when the target object is far from the detection point the initial detection frame may deviate: it cannot fit the edge of the target object well, and the target object may not even be completely contained in it, causing incomplete feature extraction of the target object. Conversely, if the initial detection frame extends too far beyond the edge of the target object, too much interference data unrelated to the target object's feature data is included, which increases the difficulty of feature extraction and in turn reduces the detection precision of the target object.
Therefore, the detection frame of the target object should fit the edge of the target object, which both ensures complete feature extraction of the target object and reduces the difficulty of feature extraction.
S120: performing area expansion on the initial area image based on preset proportion information to obtain an initial expanded area image containing an initial detection frame; the initial area image is an image area corresponding to the initial detection frame in the image to be processed;
In the image to be processed, the image inside the initial detection frame is the initial area image. There are generally two situations: either the target object lies completely inside the initial detection frame, or part of the target object extends beyond it. If the initial area image were cut directly from the image to be processed, the initial detection frame could only be shifted inward toward the initial area image; there would be no room for it to shift outward in the opposite direction. The initial area image is therefore expanded based on the preset proportion to obtain the initial expanded area image: the initial detection frame lies inside the initial expanded area image and, within it, can be shifted both inward and outward, ensuring that the detection frame of the target object can be fitted to the edge of the target object. If the initial expanded area image exceeds the bounds of the image to be processed, the excess area is filled with the value 0.
In this embodiment, the preset proportion information is 0.125 or 0.25 times the height and width of the initial detection frame; in a preferred implementation it is 0.125. With the center point of the initial detection frame as the reference, in the vertical and horizontal directions of the initial area image in the image to be processed, the upper edge of the initial area image is extended upward by 0.125 times the height of the initial detection frame, the lower edge downward by 0.125 times the height, the left edge leftward by 0.125 times the width, and the right edge rightward by 0.125 times the width; the extended image region is the initial expanded area image. The height and width of the initial expanded area image are therefore 1.25 times those of the initial area image, where 1.25 = 1 + 0.125 × 2.
In order to extract the feature information of the target object from the image to be processed, the initial expanded area image is cut out of the image to be processed, as shown in fig. 3; object recognition is performed only on the initial expanded area image in order to reduce the amount of calculation and guarantee the processing speed.
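The following sketch, under the 0.125-ratio example above, expands the initial detection frame by the preset proportion on each side and crops the initial expanded area image, zero-filling any part that falls outside the image to be processed. The function name and the rounding convention are illustrative assumptions.

```python
import numpy as np

def expand_and_crop(image, box, ratio=0.125):
    """S120: expand `box` (x1, y1, x2, y2) by `ratio` of its own width/height
    on each side, then crop; out-of-image areas are filled with 0."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    ex1 = int(round(x1 - ratio * w)); ey1 = int(round(y1 - ratio * h))
    ex2 = int(round(x2 + ratio * w)); ey2 = int(round(y2 + ratio * h))

    out = np.zeros((ey2 - ey1, ex2 - ex1, image.shape[2]), dtype=image.dtype)
    ix1, iy1 = max(ex1, 0), max(ey1, 0)                  # intersection of the
    ix2 = min(ex2, image.shape[1])                       # expanded region with
    iy2 = min(ey2, image.shape[0])                       # the image bounds
    out[iy1 - ey1:iy2 - ey1, ix1 - ex1:ix2 - ex1] = image[iy1:iy2, ix1:ix2]
    return out   # initial expanded area image, 1.25x the box per dimension
```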
S130: carrying out area mapping processing on the initial extended area image to obtain a mapping area image containing a mapping detection frame; the size of the mapping area image is the preset image size, and the size of the mapping detection frame is the preset detection frame size; the ratio information of the preset image size and the preset detection frame size meets the preset ratio information;
Referring to the mapping area image shown in fig. 3, the detection frame in fig. 3 is the mapping detection frame, and the region outside it is the expanded area. After the area mapping process, the sizes of initial detection frames in different images to be processed are all mapped to the preset detection frame size, and the sizes of the corresponding initial area images are mapped to the preset image size, such that the ratio between the preset image size and the preset detection frame size satisfies the preset proportion information. Concretely, the ratio between the margin separating the mapping detection frame from the edge of the mapping area image and the preset detection frame size equals the preset proportion information. For example, with a preset detection frame size of 128 × 128 and a preset image size of 160 × 160, the margin between the mapping detection frame and the edge of the mapping area image is 16 pixels, and 16/128 = 0.125 equals the preset proportion information; the preset expansion parameter is thus 16 pixels.
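Under the example sizes above (160 × 160 preset image size, 128 × 128 preset detection frame size, proportion 0.125), the area mapping reduces to a single resize, sketched below; OpenCV's resize is an assumed implementation detail.

```python
import cv2

PRESET_IMAGE_SIZE = 160  # example values from the text
PRESET_BOX_SIZE = 128    # 16/128 = 0.125 = preset proportion information

def map_region(expanded_region):
    """S130: resize the expanded crop to the preset image size. Because the
    expansion used 0.125 of the frame per side, the initial detection frame
    lands exactly on the mapping detection frame, 16 px in from each edge."""
    mapped = cv2.resize(expanded_region, (PRESET_IMAGE_SIZE, PRESET_IMAGE_SIZE))
    pad = (PRESET_IMAGE_SIZE - PRESET_BOX_SIZE) // 2   # preset expansion: 16
    box = (pad, pad, PRESET_IMAGE_SIZE - pad, PRESET_IMAGE_SIZE - pad)
    return mapped, box                                 # box == (16, 16, 144, 144)
```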
S140: obtaining target offset information of the target object compared with the mapping detection frame based on the mapping area image;
The mapping area image is input into a deep learning model, which performs feature extraction on it to obtain the target offset information relative to the mapping detection frame. The deep learning model must be trained in advance: training samples whose size matches the mapping area image are prepared, target detection frames are calibrated manually, and target offset information is determined from those frames. Each training sample is input into the model to be trained, which generates offset information; loss information is determined from the target offset information and the generated offset information, the parameters of the model are adjusted based on the loss information, and iterative training over many training samples continues until the loss information is smaller than a loss threshold. After training, the parameter-adjusted deep learning model outputs the target offset information when the mapping area image is input.
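A compressed sketch of the training procedure just described follows. The smooth-L1 loss and Adam optimizer are illustrative assumptions; the embodiment only requires a loss between generated and calibrated target offsets that is minimized until it falls below a threshold.

```python
import torch
import torch.nn as nn

def train_offset_model(model, loader, epochs=50, loss_threshold=1e-3):
    """Fit the deep learning model so that, given a mapping area image, it
    outputs offsets close to the manually calibrated target offsets;
    training stops once the loss drops below the threshold."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.SmoothL1Loss()
    for _ in range(epochs):
        for mapped_images, target_offsets in loader:
            predicted = model(mapped_images)          # (N, 4): t, b, l, r
            loss = criterion(predicted, target_offsets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss.item() < loss_threshold:              # convergence criterion
            break
    return model
```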
S150: performing offset processing on the mapping detection frame based on the target offset information, and determining a target detection frame containing a target object; the target detection box fits the target object edge better than the mapping detection box.
And under the condition that the mapping area image completely contains the target object, carrying out offset processing on the mapping detection frame based on different target offset information to obtain a target detection frame corresponding to the target object.
In one embodiment, the method further comprises:
determining an offset interval based on the mapping detection frame and the preset proportion information; the preset proportion information is determined based on the ratio information of the size of the initial detection frame and the size of the initial area image;
Consider a large vehicle directly behind a small one: it cannot be determined whether the output target offset information corresponds to the large vehicle or the small vehicle in front of it, and if the range of the target offset information is not constrained, the output could tend toward positive or negative infinity, making the deep learning model difficult to converge. An offset interval therefore needs to be determined before the target offset information, and the target offset information is determined within that interval. The offset interval is determined from the mapping detection frame and the preset proportion information. For example, if the top-left and bottom-right coordinates of the mapping detection frame in the mapping area image are [16, 16, 144, 144], the detection frame size is 128 × 128, the preset image size is 160 × 160 and the preset proportion information is 0.125, then the preset expansion parameter is 16 (128 × 0.125). The mapping detection frame can therefore shift outward by at most 16 and inward by at most 16, giving an offset interval of 0–32. At its maximum size in the mapping area image, the target detection frame has coordinates [0, 0, 160, 160]; this frame is determined as the first detection frame. At its minimum size the coordinates are [32, 32, 128, 128]; this frame is determined as the second detection frame.
Obtaining target offset information of the target detection frame compared with the mapping detection frame based on the mapping area image, including:
obtaining target offset information based on the mapping area image and the offset interval of the mapping detection frame; the target offset information is located within an offset interval.
Referring to fig. 4, the detection frames from outside to inside are, in order: the first detection frame, the mapping detection frame, the target detection frame and the second detection frame. The distance d between the first and second detection frames is twice the preset expansion parameter, i.e. 2 × 16 = 32 pixels. The target offset information is determined within the offset interval, so the target detection frame lies between the first and second detection frames. Determining the offset interval before the target offset information ensures that, when multiple objects appear in the mapping area image, the target offset information corresponding to the target object can be output accurately, improving the precision of the target detection frame.
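The worked example above can be reproduced directly; the sketch below derives the offset interval and the first and second detection frames from the mapping detection frame and the preset proportion information.

```python
def offset_interval(mapping_box, ratio=0.125):
    """Return the offset interval plus the first (largest) and second
    (smallest) detection frames for a mapping frame such as (16,16,144,144)."""
    x1, y1, x2, y2 = mapping_box
    pad = int((x2 - x1) * ratio)                      # 128 * 0.125 = 16
    first = (x1 - pad, y1 - pad, x2 + pad, y2 + pad)  # (0, 0, 160, 160)
    second = (x1 + pad, y1 + pad, x2 - pad, y2 - pad) # (32, 32, 128, 128)
    return (0, 2 * pad), first, second                # offset interval 0..32
```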
In one embodiment, obtaining target offset information of the target detection frame compared to the mapping detection frame based on the mapping region image, with reference to fig. 5, includes:
s510: acquiring an object detection network; the object detection network comprises a feature extraction layer and an output layer;
the deep learning model comprises an object detection Network, the object detection Network comprises a Backbone Network for extracting basic features, an FPN (Feature Pyramid Network), an FAN (Path Aggregation Network), and the Backbone, FPN and FAN respectively comprise a plurality of Feature extraction layers.
S520: carrying out feature extraction on the mapping area image based on the feature extraction layer to obtain image feature information;
The mapping region image is input into the feature extraction layers; as shown in fig. 6, the Backbone-based network extracts the basic features. The FPN serves as a top-down feature pyramid that passes strong high-level semantic features downward to enhance the whole pyramid. The PAN adds a bottom-up pyramid on the basis of the FPN, supplementing the FPN's semantic information and passing the low-level localization features upward, so that the object detection network fuses both semantic information and localization information, yielding the image feature information corresponding to the mapping region image.
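The sketch below shows only the structural flow of the Backbone → FPN → PAN fusion described above; the channel widths and block types are not specified by the text and are assumptions here.

```python
import torch.nn as nn

class FPNPANNeck(nn.Module):
    """Top-down FPN pass (strong semantics flow down) followed by a
    bottom-up PAN pass (localization features flow back up)."""
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, 64, 1) for c in channels)
        self.up = nn.Upsample(scale_factor=2)
        self.down = nn.ModuleList(
            nn.Conv2d(64, 64, 3, stride=2, padding=1) for _ in channels[:-1])

    def forward(self, c3, c4, c5):      # Backbone features, fine -> coarse
        p5 = self.lateral[2](c5)
        p4 = self.lateral[1](c4) + self.up(p5)   # FPN: top-down semantics
        p3 = self.lateral[0](c3) + self.up(p4)
        n3 = p3                                  # PAN: bottom-up localization
        n4 = p4 + self.down[0](n3)
        n5 = p5 + self.down[1](n4)
        return n3, n4, n5   # low-level (n3) to high-level (n5) features
```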
S530: and inputting the image characteristic information and the preset proportion information into an output layer, and performing data processing based on the output layer to obtain target offset information.
Further, in this embodiment, the target offset information is calculated as:

y_{t,b,l,r} = d * (sigmoid(x_{t,b,l,r}) - 0.5)

where x_{t,b,l,r} is the image feature information (the feature vector output by the last feature extraction layer) and d is twice the preset expansion parameter. Based on the offset range obtainable from the preset expansion parameter, the image feature information and the preset expansion parameter are input into the output layer to obtain the target offset information. In fig. 6, Head is the detection head, i.e. the output layer; its output contains the four offsets corresponding to the detection frame, which constitute the target offset information, as well as C, the number of object classes, which may be two classes (non-motor vehicle and motor vehicle) or several classes (non-motor vehicle, truck, SUV, car, etc.). N in the figure is the number of images inferred at once. Extracting features from the mapping area image through multiple feature extraction layers improves the accuracy and comprehensiveness of the image feature information, and in turn the precision of the subsequent target detection frame.
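The formula above maps the unbounded feature output into the constrained offset range; a direct implementation follows, with pad = 16 taken from the worked example.

```python
import torch

def decode_offsets(x_tblr, pad=16):
    """y = d * (sigmoid(x) - 0.5) with d = 2 * pad: each of the four offsets
    (top, bottom, left, right) is bounded to (-pad, +pad), i.e. the mapping
    detection frame can move at most `pad` pixels outward or inward."""
    d = 2 * pad
    return d * (torch.sigmoid(x_tblr) - 0.5)
```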
In one embodiment, the feature extraction layer comprises feature extraction layers at a plurality of levels; specifically, each of the Backbone, the FPN and the PAN comprises a low-level feature extraction layer and a high-level feature extraction layer, wherein the low-level feature extraction layer is used for extracting feature information corresponding to a first identification type, and the high-level feature extraction layer is used for extracting feature information corresponding to a second identification type. The scale of the first identification type in the image to be processed is larger than that of the second identification type. Correspondingly, the image feature information comprises first image feature information, output by the high-level feature extraction layer, and second image feature information, output by the low-level feature extraction layer.
Carrying out feature extraction on the mapping area image based on the feature extraction layer to obtain image feature information, wherein the feature extraction comprises the following steps:
determining a target hierarchy based on the type of the target object;
and performing feature extraction on the mapping area image based on the feature extraction layer of the target level to obtain image feature information.
The type of the target object is determined based on the image feature information and comprises a first recognition type and a second recognition type: the first recognition type covers recognition of the whole vehicle, while the second recognition type covers the license plate, other vehicle features, and other small-scale target objects. If the target object is of the first recognition type, the target level is determined to be a high-level feature extraction layer (i.e. the output of the PAN top layer in fig. 6); high-level layers carry rich semantic information and are suited to extracting vehicle contours and the like, ensuring complete recognition of the vehicle contour and improving the accuracy of the target offset information. The first image feature information and the preset expansion parameter are input into the output layer to obtain the target offset information corresponding to the first recognition type; no multi-scale output is needed, which also reduces the complexity of the deep learning model. If the target object is of the second recognition type, the target level is determined to be a low-level feature extraction layer (i.e. the output of the PAN bottom layer in fig. 6); low-level layers extract features such as contour, edge, color, texture and shape, but cover a narrower recognizable area than high-level layers, making them suited to detecting small-size vehicle identification information and accurately extracting license plate numbers. The second image feature information and the preset expansion parameter are input into the output layer, and the license plate information can be obtained accurately. Determining the target level according to the recognition type thus ensures the comprehensiveness and accuracy of the image feature information while reducing model complexity.
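The level-selection rule just described amounts to a small dispatch, sketched below; the type labels are illustrative placeholders rather than names from the embodiment.

```python
def select_target_level(object_type):
    """First recognition type (large scale, e.g. whole vehicle) -> high-level
    PAN output; second recognition type (small scale, e.g. license plate) ->
    low-level PAN output."""
    if object_type in {"vehicle"}:            # first recognition type
        return "pan_top"                      # rich semantics for contours
    if object_type in {"license_plate"}:      # second recognition type
        return "pan_bottom"                   # fine edges/texture for plates
    raise ValueError(f"unknown object type: {object_type}")
```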
In one embodiment, the target hierarchy includes a first hierarchy corresponding to a first identified type, and a second hierarchy corresponding to a second identified type; the image characteristic information comprises first image characteristic information output by a first level and second image characteristic information output by a second level;
based on the feature extraction layer of the target hierarchy, after the feature extraction is performed on the mapping area image to obtain the image feature information, the method further includes:
performing data processing on the first image characteristic information based on an output layer corresponding to the first layer level to obtain target offset information;
and carrying out object recognition on the second image characteristic information based on the output layer corresponding to the second hierarchy to obtain the sub-objects contained in the target object.
The first level corresponds to a high-level feature extraction layer and the second level to a low-level one, and a Head is set at each of the first and second levels. Feature extraction is performed on the mapping area image through the first-level feature extraction layer to obtain the first image feature information, and through the second-level feature extraction layer to obtain the second image feature information. When the type of the target object is the first recognition type, the Head corresponding to the first level processes the first image feature information and outputs the target offset information; when the type is the second recognition type, the Head corresponding to the second level performs object recognition on the second image feature information. Output layers at different levels may differ, with different data-processing functions: some are used for offset calculation and some for object recognition and classification. The sub-objects contained in the target object are obtained this way. For example, the target object is a vehicle carrying license plate information: the vehicle contour is of the first recognition type, so the mapping area image passes through the first-level feature extraction layer, the Head processes the first image feature information output by that level to obtain the target offset information of the mapping detection frame relative to the vehicle, and the target detection frame of the vehicle is then obtained through the target offset information. The license plate information, a sub-object of the vehicle, is of the second recognition type, so the mapping area image passes through the second-level feature extraction layer, whose higher recognition precision suits license plate extraction; the Head recognizes the license plate based on the second image feature information output by the second level, yielding the vehicle's license plate information.
In one embodiment, the shifting the mapping detection frame based on the target shift information to determine the target detection frame containing the target object includes:
and adding the target offset information to the coordinate information of the mapping detection frame in the mapping area image to obtain the coordinate information of the target detection frame in the mapping area image.
Referring to fig. 6, the coordinate information of the mapping detection frame in the mapping area image is determined; the target offset information consists of four offsets of that coordinate information, and adding the target offset information to the coordinate information of the mapping detection frame in the mapping area image yields the coordinate information of the target detection frame in the mapping area image.
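Applying the decoded offsets is then a coordinate sum, as sketched below; the sign convention (positive offsets move an edge outward) is an assumption, since the text only states that the coordinates and offsets are added.

```python
def apply_offsets(mapping_box, offsets):
    """S150: combine the four offsets (top, bottom, left, right) with the
    mapping detection frame coordinates to obtain the target detection frame
    in the mapping area image; e.g. (16, 16, 144, 144) with offsets in
    (-16, +16) yields frames between (0,0,160,160) and (32,32,128,128)."""
    x1, y1, x2, y2 = mapping_box
    t, b, l, r = offsets
    return (x1 - l, y1 - t, x2 + r, y2 + b)   # target detection frame
```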
This embodiment also provides an object detection apparatus, which can implement all the above method steps, as shown in fig. 7, the apparatus includes:
the processing module 710 is configured to perform target detection processing on the image to be processed to obtain an initial detection frame corresponding to the target object;
the region expansion module 720 is configured to perform region expansion on the initial region image based on the preset proportion information to obtain an initial expanded region image containing the initial detection frame; the initial region image is the image region corresponding to the initial detection frame in the image to be processed;
the region mapping module 730 is configured to perform region mapping processing on the initial expanded region image to obtain a mapping region image containing a mapping detection frame; the size of the mapping region image is a preset image size, and the size of the mapping detection frame is a preset detection frame size; the ratio of the preset image size to the preset detection frame size satisfies the preset proportion information;
a target offset information generating module 740, configured to obtain target offset information of the target object compared to the mapping detection frame based on the mapping region image;
an offset processing module 750, configured to perform offset processing on the mapping detection frame based on the target offset information, and determine a target detection frame including the target object; the target detection box fits the edge of the target object better than the mapping detection box.
Still further, the apparatus further comprises:
the first determining module is used for determining an offset interval based on the mapping detection frame and a preset expansion parameter; the preset expansion parameter is determined based on the ratio information of the size of the initial detection frame and the size of the initial area image;
the second processing module is used for obtaining target offset information based on the mapping area image and the offset interval of the mapping detection frame; the target offset information is located within an offset interval.
An acquisition module for acquiring an object detection network; the object detection network comprises a feature extraction layer and an output layer;
the characteristic extraction module is used for extracting the characteristics of the mapping area image based on the characteristic extraction layer to obtain image characteristic information;
and the input module is used for inputting the image characteristic information and the preset proportion information into the output layer, and performing data processing based on the output layer to obtain target offset information.
The second determination module is used for determining a feature extraction layer of a target level based on the type of the target object; the feature extraction layer comprises a plurality of levels of feature extraction layers, wherein the lower level of feature extraction layer is used for extracting feature information corresponding to a first identification type, and the higher level of feature extraction layer is used for extracting feature information corresponding to a second identification type; the scale of the first identification type in the image to be processed is larger than that of the second identification type in the image to be processed;
and the third determining module is used for determining the characteristic information of the characteristic extraction layer of the target level as the image characteristic information.
The third processing module is used for carrying out data processing on the first image characteristic information based on the output layer corresponding to the first layer level to obtain target offset information;
and the fourth processing module is used for carrying out object identification on the second image characteristic information based on the output layer corresponding to the second level to obtain the sub-objects contained in the target object.
And the fifth processing module is used for obtaining the coordinate information of the target detection frame in the mapping area image based on the sum of the coordinate information of the mapping detection frame in the mapping area image and the target offset information.
The embodiment has the following effects:
1. Target detection processing is performed on the image to be processed to obtain an initial detection frame corresponding to a target object, and, to give the initial detection frame a larger correction space, area expansion is performed on the initial area image to obtain an initial expanded area image. Area mapping processing is performed on the initial expanded area image containing the initial detection frame to obtain a mapping area image containing the mapping detection frame; through the area mapping processing, the sizes of initial detection frames in different images to be processed are fixed to a uniform size, as are the sizes of the initial area images. Target offset information relative to the mapping detection frame is obtained based on the uniformly sized mapping detection frame and mapping area image, and offset processing is performed on the mapping detection frame based on the target offset information to obtain the target detection frame, which fits the edge of the target object more closely than the initial detection frame, thereby improving the detection precision of the target object.
2. The sizes of different initial detection frames are enlarged/reduced by preset factors and mapped to a uniform size, namely the mapping detection frame, and the sizes of different initial area images are likewise mapped to a uniform size, namely the mapping area image. The target detection frame that best fits the edge of the target object can then be determined simply by determining the target offset information relative to the mapping detection frame, which reduces processing complexity.
Embodiments of the present invention also provide an electronic device, which includes a processor and a memory, where at least one instruction, at least one program, a code set, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement an object detection method as in the method embodiments.
Embodiments of the present invention also provide a storage medium, which may be disposed in a server to store at least one instruction, at least one program, a code set, or a set of instructions for implementing an object detection method in the method embodiments, where the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement an object detection method provided in the method embodiments.
Alternatively, in this embodiment, the storage medium may be located in at least one network server of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances, such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The foregoing fully describes preferred embodiments of the present invention. It should be noted that those skilled in the art can modify these embodiments without departing from the scope of the appended claims. Accordingly, the scope of the appended claims is not limited to the specific embodiments described above.

Claims (9)

1. An object detection method, comprising:
carrying out object detection processing on an image to be processed to obtain an initial detection frame corresponding to a target object;
performing area expansion on the initial area image based on preset proportion information to obtain an initial expanded area image containing the initial detection frame; the initial region image is an image region corresponding to the initial detection frame in the image to be processed;
performing area mapping processing on the initial expansion area image to obtain a mapping area image containing a mapping detection frame; the size of the mapping area image is a preset image size, and the size of the mapping detection frame is a preset detection frame size; the ratio information of the preset image size and the preset detection frame size meets the preset ratio information;
obtaining target offset information of a target detection frame compared with the mapping detection frame based on the mapping area image;
performing offset processing on the mapping detection frame based on the target offset information, and determining the target detection frame containing the target object; the target detection box fits the target object edge more closely than the mapping detection box.
2. An object detection method as claimed in claim 1, characterized in that the method further comprises:
determining an offset interval based on the size information of the mapping detection frame and the preset proportion information; the preset proportion information is determined based on ratio information of the size of the initial detection frame and the size of the initial area image;
the obtaining target offset information of the target object compared with the mapping detection frame based on the mapping region image includes:
obtaining the target offset information based on the mapping area image and the offset interval of the mapping detection frame; the target offset information is located within the offset interval.
3. The object detection method according to claim 2, wherein obtaining the target offset information based on the mapping region image and an offset section of the mapping detection frame comprises:
acquiring an object detection network; the object detection network comprises a feature extraction layer and an output layer;
performing feature extraction on the mapping region image based on the feature extraction layer to obtain image feature information;
and inputting the image characteristic information and preset proportion information into the output layer, and performing data processing based on the output layer to obtain the target offset information.
4. The object detection method according to claim 3, wherein the feature extraction layer comprises a plurality of levels of feature extraction layers, wherein a lower level is used for extracting feature information corresponding to a first identification type, and a higher level is used for extracting feature information corresponding to a second identification type; the image scale corresponding to the first identification type is larger than the image scale corresponding to the second identification type;
the extracting the feature of the mapping region image based on the feature extraction layer to obtain image feature information includes:
determining a target hierarchy based on the type of the target object;
and performing feature extraction on the mapping region image based on the feature extraction layer of the target level to obtain the image feature information.
5. The object detection method according to claim 4, wherein the target hierarchy comprises a first hierarchy corresponding to the first recognition type and a second hierarchy corresponding to the second recognition type; the image characteristic information comprises first image characteristic information output by the first level and second image characteristic information output by the second level;
the feature extraction layer based on the target hierarchy performs feature extraction on the mapping region image to obtain the image feature information, and the method further includes:
performing data processing on the first image characteristic information based on an output layer corresponding to the first level to obtain the target offset information;
and performing object recognition on the second image characteristic information based on the output layer corresponding to the second hierarchy to obtain a sub-object contained in the target object.
6. The object detection method of claim 1, wherein the shifting the mapping detection box based on the target shift information to determine the target detection box containing the target object comprises:
and obtaining the coordinate information of the target detection frame in the mapping area image based on the sum of the coordinate information of the mapping detection frame in the mapping area image and the target offset information.
7. An object detecting apparatus, characterized by comprising:
the processing module is used for carrying out target detection processing on the image to be processed to obtain an initial detection frame corresponding to a target object;
the area expansion module is used for carrying out area expansion on the initial area image based on preset proportion information to obtain an initial expansion area image containing the initial detection frame; the initial region image is an image region corresponding to the initial detection frame in the image to be processed;
the area mapping module is used for performing area mapping processing on the initial expansion area image to obtain a mapping area image containing a mapping detection frame; the size of the mapping area image is a preset image size, and the size of the mapping detection frame is a preset detection frame size; the ratio information of the preset image size and the preset detection frame size satisfies the preset proportion information;
a target offset information generation module, configured to obtain target offset information of the target object compared to the mapping detection frame based on the mapping region image;
the offset processing module is used for carrying out offset processing on the mapping detection frame based on the target offset information and determining the target detection frame containing the target object; the target detection box fits the target object edge more closely than the mapping detection box.
8. An electronic device, comprising a processor and a memory, wherein at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and wherein the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement an object detection method according to any one of claims 1 to 6.
9. A computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions for being loaded by a processor and for performing a method of object detection as claimed in any one of claims 1 to 6.
CN202211131694.3A 2022-09-15 2022-09-15 Object detection method, device, equipment and storage medium Pending CN115620022A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211131694.3A CN115620022A (en) 2022-09-15 2022-09-15 Object detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211131694.3A CN115620022A (en) 2022-09-15 2022-09-15 Object detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115620022A true CN115620022A (en) 2023-01-17

Family

ID=84858863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211131694.3A Pending CN115620022A (en) 2022-09-15 2022-09-15 Object detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115620022A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237614A (en) * 2023-11-10 2023-12-15 江西啄木蜂科技有限公司 Deep learning-based lake surface floater small target detection method
CN117237614B (en) * 2023-11-10 2024-02-06 江西啄木蜂科技有限公司 Deep learning-based lake surface floater small target detection method
CN117474903A (en) * 2023-12-26 2024-01-30 浪潮电子信息产业股份有限公司 Image infringement detection method, device, equipment and readable storage medium
CN117474903B (en) * 2023-12-26 2024-03-22 浪潮电子信息产业股份有限公司 Image infringement detection method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN109583483B (en) Target detection method and system based on convolutional neural network
CN111161349B (en) Object posture estimation method, device and equipment
CN115620022A (en) Object detection method, device, equipment and storage medium
CN109214403B (en) Image recognition method, device and equipment and readable medium
CN110889464B (en) Neural network training method for detecting target object, and target object detection method and device
CN109948637B (en) Object detection device, object detection method, and computer-readable medium
CN107944403B (en) Method and device for detecting pedestrian attribute in image
JP2016194925A (en) Method and device of detecting road boundary object
CN112036400B (en) Method for constructing network for target detection and target detection method and system
CN112906794A (en) Target detection method, device, storage medium and terminal
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN114359665A (en) Training method and device of full-task face recognition model and face recognition method
CN111191482B (en) Brake lamp identification method and device and electronic equipment
CN112926667B (en) Method and device for detecting saliency target of depth fusion edge and high-level feature
CN112949453A (en) Training method of smoke and fire detection model, smoke and fire detection method and smoke and fire detection equipment
CN114898306B (en) Method and device for detecting target orientation and electronic equipment
CN110751163B (en) Target positioning method and device, computer readable storage medium and electronic equipment
CN114219757B (en) Intelligent damage assessment method for vehicle based on improved Mask R-CNN
CN112733741A (en) Traffic signboard identification method and device and electronic equipment
CN112364693A (en) Barrier identification method, device and equipment based on binocular vision and storage medium
CN113657234B (en) Image recognition method and device, storage medium and electronic equipment
CN112825141B (en) Method and device for recognizing text, recognition equipment and storage medium
CN111476821B (en) Target tracking method based on online learning
CN107545268B (en) Color feature extraction method and system
CN114445657A (en) Target detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination