CN112101361B - Target detection method, device and equipment for fisheye image and storage medium - Google Patents


Publication number: CN112101361B
Application number: CN202011306070.1A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN112101361A
Inventors: 董颖, 刘国清, 郑伟, 杨广
Current assignee: Shenzhen Youjia Innovation Technology Co.,Ltd.
Original assignee: Shenzhen Minieye Innovation Technology Co Ltd (application filed by Shenzhen Minieye Innovation Technology Co Ltd)
Priority: CN202011306070.1A
Prior art keywords: image, detection, area, training, detection frame
Legal status: Active (granted; the legal status is an assumption and is not a legal conclusion)

Classifications

    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V2201/07 Target detection

Abstract

The application relates to the technical field of computer vision and provides a target detection method, apparatus, device and storage medium for fisheye images. The method comprises the following steps: obtaining a fisheye image to be processed; extracting a to-be-detected region image corresponding to each preset image region on the fisheye image, where different preset image regions correspond to different degrees of imaging deformation; inputting each to-be-detected region image into the corresponding target detection model, where different target detection models perform target detection on images corresponding to different preset image regions using detection frames with different angles, and the angle of the detection frame is adapted to the degree of imaging deformation; acquiring the target detection results output by each target detection model; and obtaining the position of the detected target in the fisheye image according to the target detection results output by the target detection models. No distortion-removal correction of the fisheye image is needed, which reduces the computational cost of data set production.

Description

Target detection method, device and equipment for fisheye image and storage medium
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a method and an apparatus for detecting a target of a fisheye image, a computer device, and a storage medium.
Background
Computer-vision-based target detection is widely applied in many fields, particularly driving safety: by means of computer vision technology, the position information of detected early-warning targets (such as vehicles and pedestrians) is sent to the driver, so that the driver can learn about nearby conditions in time during driving, driving safety problems are avoided, and driving accidents are reduced.
Computer-vision-based target detection can be divided, according to the type of camera, into methods based on ordinary images and methods based on fisheye images. The fisheye image has a wider field of view, so it is suitable for target detection over a wider detection range.
In current fisheye-image target detection, because targets deform very differently at different positions in the image, a distortion-removal correction is usually applied to the fisheye image first, and a detection algorithm is then designed to determine the target positions. However, solving the problem of large differences in target shape at different positions in the fisheye image through distortion-removal correction increases the computational cost of data set production.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus, a computer device and a storage medium for detecting an object of a fisheye image.
A method of object detection for fisheye images, the method comprising:
obtaining a fisheye image to be processed;
extracting the images of the areas to be detected corresponding to the preset image areas on the fisheye images; the imaging deformation degrees corresponding to different preset image areas are different;
respectively inputting each area image to be detected to a corresponding target detection model; the different target detection models are used for carrying out target detection on images corresponding to different preset image areas by using detection frames with different angles; the angle of the detection frame is adaptive to the imaging deformation degree;
acquiring target detection results output by each target detection model;
and obtaining the position of the detected target in the fisheye image according to the target detection result output by each target detection model.
In one of the embodiments,
the method further comprises the following steps:
acquiring a sample image to be marked corresponding to any preset image area from a sample fisheye image based on any preset image area selected from each preset image area; wherein the sample image to be annotated is horizontal relative to the sample fisheye image;
rotating the sample image to be marked according to the angle which is adaptive to the imaging deformation degree corresponding to any preset image area to obtain a rotating sample image to be marked; the target to be marked in the sample image to be marked is rotated to be horizontal relative to the sample fisheye image, and the sample image to be marked is rotated relative to the sample fisheye image;
obtaining a size expansion image based on the rotating sample image to be marked; the image size of the size-expanded image is larger than that of the sample image to be marked in the rotating mode and is horizontal relative to the sample fisheye image; the size expansion image comprises the rotating sample image to be marked and a black edge image surrounding the rotating sample image to be marked, the rotating angle of the rotating sample image to be marked relative to the size expansion image is the adaptive angle, and the target to be marked is horizontal relative to the size expansion image;
marking the target to be marked in the sample image to be marked in the rotation mode by using the detection frame which is horizontal relative to the size-expanded image to obtain a corresponding horizontal marking detection frame, and taking the sample image to be marked in the rotation mode, which comprises the horizontal marking detection frame, as a first marking sample image;
according to the adaptive angle, performing rotation processing on the first labeling sample image and the horizontal labeling detection frame aiming at the rotation processing to respectively obtain a second labeling sample image which is horizontal relative to the size-expanded image and a rotation labeling detection frame which rotates relative to the size-expanded image; the rotation angle of the rotation labeling detection frame relative to the size expansion image is the adaptive angle, and a black edge image of the size expansion image is changed from a sample image to be labeled around the rotation to the second labeled sample image;
acquiring a second labeling sample image except for a black edge image in the size-expanded image, and converting a coordinate system where the rotating labeling detection frame is located from the coordinate system of the size-expanded image into the coordinate system of the second labeling sample image;
taking a second labeled sample image of the rotating labeled detection frame in the coordinate system of the second labeled sample image as a training sample image;
and carrying out model training based on the training sample image to obtain a target detection model aiming at any preset image area.
In one of the embodiments,
the model training based on the training sample image to obtain a target detection model for any preset image area comprises:
determining a first corner point and a second corner point which have a diagonal relation and are of a rotating labeling detection frame of the training sample image;
determining a position training label of the training sample image according to the offset of each pixel point in the rotation labeling detection frame of the training sample image relative to the first corner point and the second corner point respectively;
and performing model training based on the training sample image and the position training label to obtain a target detection model for any preset image area.
In one embodiment, the determining the position training label of the training sample image according to offsets of pixel points in the rotation labeling detection frame of the training sample image with respect to the first corner point and the second corner point respectively includes:
acquiring a heat map of which the image size corresponds to the image size of the training sample image;
determining a heat area corresponding to a rotation marking detection frame of the training sample image in the heat map; the heat degree area comprises a first area point corresponding to the first corner point and a second area point corresponding to the second corner point;
determining a corresponding predicted value of each pixel point in the heat region in model training based on the offset of each pixel point in the heat region relative to the first region point or the second region point;
and obtaining the position training label based on a heat map of the heat area with the predicted value.
In one embodiment, the offsets correspond to different position training labels, and the offsets include an offset of a pixel point relative to the first corner point along a first image edge direction of the sample fisheye image, an offset of a pixel point relative to the first corner point along a second image edge direction of the fisheye image orthogonal to the first image edge, an offset of a pixel point relative to the second corner point along the first image edge direction, and an offset of a pixel point relative to the second corner point along the second image edge.
In one embodiment, each target detection model comprises a category detection head for determining the category of the target and a position detection head for determining the position of the target; the target detection result further comprises the category of the detected target determined by the target detection model based on the category detection head;
the position training labels are for the position detection head;
the heat map as a class training label for the class detection head includes: the method comprises the following steps that a heat area with pixel values of pixel points all being first preset values and a background area with pixel values of pixel points all being second preset values; the heat area corresponds to a rotation mark detection frame of the training sample image, the background area corresponds to a background area in the training sample image except the rotation mark detection frame, and the first preset value is different from the second preset value.
In one embodiment, the heat region of the class training label is obtained by inward contraction of the original region of the class training label corresponding to the rotation labeling detection frame of the training sample image; the pixel value of the area between the original region and the heat region is a third preset value; pixel values of pixels in the heat region and in the background region participate in the model training and return the corresponding loss, while the area between the original region and the heat region does not participate in the model training and does not return a loss.
In one embodiment, the performing model training based on the training sample image to obtain a target detection model for any one of the preset image regions includes:
carrying out rotation transformation on a rotation prediction detection frame obtained in model training and a rotation labeling detection frame of the training sample image to respectively obtain a horizontal prediction detection frame and a horizontal labeling detection frame;
calculating the intersection ratio loss of the horizontal prediction detection frame and the horizontal marking detection frame by using an intersection ratio loss function aiming at the horizontal detection frame;
and transmitting the calculation result based on the intersection ratio loss calculation back to the trained model, completing model training and obtaining the target detection model.
In one embodiment, the intersection-over-union loss function for the horizontal detection frame is computed based on the offsets of a pixel relative to each side of the horizontal detection frame; and the rotation transformation converts the offset of a pixel in the rotation detection frame relative to one corner point of the rotation detection frame into the offsets of the pixel relative to the edges forming that corner point, thereby obtaining the corresponding horizontal detection frame.
In one embodiment, the rotation transformation is performed by using an offset conversion formula constructed based on an angle adapted to the imaging deformation degree of each preset image area.
An object detection apparatus for fisheye images, the apparatus comprising:
the fisheye image acquisition module is used for acquiring a fisheye image to be processed;
the image extraction module of the area to be detected is used for extracting the image of the area to be detected corresponding to each preset image area on the fisheye image; the imaging deformation degrees corresponding to different preset image areas are different;
the model detection module is used for respectively inputting the images of the areas to be detected to the corresponding target detection models; the different target detection models are used for carrying out target detection on images corresponding to different preset image areas by using detection frames with different angles; the angle of the detection frame is adaptive to the imaging deformation degree;
the detection result acquisition module is used for acquiring target detection results output by each target detection model;
and the target positioning module is used for obtaining the position of the detected target in the fisheye image according to the target detection result output by each target detection model.
A computer device comprising a memory storing a computer program and a processor implementing the method when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the above-mentioned method.
The target detection method, the device, the computer equipment and the storage medium for the fisheye image acquire the fisheye image to be processed; extracting the images of the areas to be detected corresponding to the preset image areas on the fisheye images; the imaging deformation degrees corresponding to different preset image areas are different; respectively inputting each area image to be detected to a corresponding target detection model; the different target detection models are used for carrying out target detection on images corresponding to different preset image areas by using detection frames with different angles; the angle of the detection frame is adaptive to the imaging deformation degree; acquiring target detection results output by each target detection model; and obtaining the position of the detected target in the fisheye image according to the target detection result output by each target detection model. In the method, different preset image areas correspond to different imaging deformation degrees in the fisheye image, the images of the areas to be detected corresponding to the preset image areas are extracted from the fisheye image and are respectively input into corresponding target detection models, the different target detection models detect the corresponding images of the areas to be detected by using detection frames with angles adaptive to the imaging deformation degrees, distortion removal correction of the fisheye image is not needed, and the calculation cost brought by data set manufacturing is reduced.
Drawings
FIG. 1 is a flow diagram illustrating a method for detecting a target in a fisheye image according to an embodiment;
FIG. 2 is a schematic diagram of an image acquired by a fisheye camera, ROI area setting and detection frame setting;
FIG. 3 is a flow chart of the labeling process for the rotating label detection box;
FIG. 4 is a model architecture diagram of one of the object detection models;
FIG. 5 is an illustration of a training label for a horizontal label detection box and a rotational label detection box;
FIG. 6 is a schematic diagram of a detection result obtained by the target detection method of the present application in one embodiment;
FIG. 7 is a block diagram of an apparatus for detecting objects in a fisheye image according to an embodiment;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Some technical words related to the present application are introduced first:
deep Neural Networks (DNNs): the method is a framework of deep learning, and is a neural network with at least one hidden layer. Similar to the shallow neural network, the deep neural network can also provide modeling for a complex nonlinear system, but the extra levels provide higher abstraction levels for the model, thereby improving the capability of the model. Deep neural networks are a discriminant model that can be trained using back-propagation algorithms.
Object Detection (Object Detection): and extracting the position information of the target from the image based on a computer vision algorithm.
Anchor point (Anchor): a method for setting a reference frame commonly used in an object detection algorithm can be generally classified into an Anchor-based (Anchor-based) detection algorithm and an Anchor-free (Anchor-free) detection algorithm according to whether an Anchor point is used in the algorithm.
Computer-vision-based target detection is widely applied in many fields, particularly driving safety: by means of computer vision technology, the position information of detected early-warning targets (such as vehicles and pedestrians) is sent to the driver, so that the driver can learn about nearby conditions in time during driving, driving safety problems are avoided, and driving accidents are reduced.
Target detection in vehicle blind areas is taken as an example: blind-area target detection is an important technology of automotive active safety systems. It assists the driver in monitoring regions with a limited field of view; when early-warning targets such as vehicles and pedestrians appear in a blind area, blind-area target detection identifies the category and position information of the early-warning target based on computer vision technology and sends alarm signals of the corresponding level to the driver, reminding the driver to stay alert and avoiding traffic accidents.
Target detection includes methods based on the fusion of multiple sensors, such as radar and cameras, and methods based on vision. Because multi-sensor fusion schemes are more expensive, and because the accuracy of vision-based target detection and recognition has improved greatly with the rapid development of deep learning, vision-based target detection has also developed rapidly.
Computer-vision-based target detection can be divided, according to the type of camera, into methods based on ordinary images and methods based on fisheye images. The fisheye image has a wider field of view, so it is suitable for target detection over a wider detection range.
In conventional fisheye-image target detection, because targets deform very differently at different positions in the image, a distortion-removal correction is usually applied to the fisheye image first, and a detection algorithm is then designed to determine the target positions. However, solving the problem of large differences in target shape at different positions in the fisheye image through distortion-removal correction increases the computational cost of data set production. Moreover, the distortion-removal process loses pixels, which adds uncertainty to subsequent detection. In addition, conventional target detection adopts a horizontal detection frame labeling scheme; if the fisheye image is not corrected for distortion and the full image is detected directly, the large deformation of early-warning targets at different positions increases the complexity of the feature patterns.
Therefore, the traditional fish-eye image target detection method has the following problems:
firstly, a fisheye image needs to be subjected to a distortion removal step, so that the data set manufacturing cost is high, and the calculation is time-consuming;
secondly, the same horizontal detection frame is adopted at different positions in the fisheye image, so that the shape difference of the early warning target at different positions in the image is large, the feature complexity is increased, and meanwhile, the distance measurement of the early warning target with large deformation under the horizontal detection frame is not accurate enough.
Thirdly, detection with a rotation detection frame is currently usually based on Anchor-based methods, and an appropriate Anchor scale needs to be set according to the data set, which makes detection more complicated.
In the method for detecting the target of the fisheye image, the fisheye image is divided into a plurality of preset image regions (also called as ROI, region of interest) for target detection, and different target detection models are used for target detection of images corresponding to different image regions, and each target detection model uses a detection frame adapted to the imaging deformation degree for target detection, for example, for an image region with a large imaging deformation degree, a detection frame with a larger rotation angle is used, and for an image region with a small imaging deformation, a horizontal detection frame is used; in addition, a detection backbone (backbone) of each target detection model is a multi-scale fusion convolutional neural network, and a detection head (detection head) is a rotation detection frame representation method without an anchor point.
The application provides a target detection method for fisheye images that can achieve the following: firstly, target detection is carried out without a fisheye-image distortion-removal step, which reduces data set production and computation cost; secondly, the anchor-free detection method avoids the parameter setting of anchor-based methods and is simpler and more convenient to implement; thirdly, compared with the traditional target detection scheme that uses horizontal detection frames, different image areas are separated and rotation-detection-frame detection is designed for targets with a rotation angle, so the position of the detected target in the fisheye image is determined accurately and the distance between detected targets can further be determined accurately; and fourthly, different target detection models are used for different image areas, which alleviates the problem of complex feature patterns in full-image training.
As shown in fig. 1, the present application provides a target detection method for a fisheye image, which may be applied to a computer device, and specifically may include the following steps:
step S101, obtaining a fisheye image to be processed;
step S102, extracting to-be-detected area images corresponding to all preset image areas on the fisheye image;
step S103, inputting each area image to be detected to a corresponding target detection model; the different target detection models are used for carrying out target detection on images corresponding to different preset image areas by using detection frames with different angles; the angle of the detection frame is adaptive to the imaging deformation degree.
As shown in fig. 2, in the single-side fisheye image the imaging deformation of the vehicle in the close-view region near the fisheye camera is small, while the imaging deformation in the distant-view region far from the fisheye camera is large. The fisheye image can therefore be divided into a plurality of preset image regions, such as ROI-1 and ROI-2, according to the degree of imaging deformation, where different preset image regions correspond to different degrees of imaging deformation.
In addition, because the imaging deformation degree of the vehicle in the ROI-1 is large, the rotation angle of the detection frame is large, and the corresponding detection frame can be called as a rotation detection frame; the imaging deformation of the vehicle in ROI-2 is small, and therefore, a horizontal detection frame for no imaging deformation can be adopted. The rotation detection box for the labeling process may be referred to as a rotation labeling detection box 201, and the horizontal detection box may be referred to as a horizontal labeling detection box 202; for the model prediction process, the rotation detection block may be referred to as a rotation prediction detection block and the horizontal detection block may be referred to as a horizontal prediction detection block.
That is to say, after obtaining the image to be processed, the computer device extracts the to-be-detected region images corresponding to ROI-1 and ROI-2 from the fisheye image to be processed, and inputs the to-be-detected region image corresponding to ROI-1 into the target detection model corresponding to ROI-1, which performs target detection on the input image using the rotation detection frame; the processing of the to-be-detected region image corresponding to ROI-2 is the same as that for ROI-1 and is not repeated here.
Step S104, acquiring target detection results output by each target detection model;
and step S105, obtaining the position of the detected target in the fisheye image according to the target detection result output by each target detection model.
The target detection result output by the target detection model may include coordinate information of a detection frame for the detected target in the fisheye image, and therefore, the computer device may determine, based on the obtained coordinate information, the corresponding detection frame, determine that the detected target is located in the detection frame, and then complete target position detection of the fisheye image.
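To make the flow of steps S101 to S105 concrete, the following is a minimal sketch assuming two preset image regions and two already-trained models; the ROI coordinates, the predict interface and the result format are hypothetical placeholders rather than the patent's actual implementation.

```python
import numpy as np

# Hypothetical ROI layout: (x, y, width, height) in fisheye-image coordinates.
PRESET_ROIS = {
    "ROI-1": (0, 0, 640, 360),    # far region, large imaging deformation
    "ROI-2": (0, 360, 640, 360),  # near region, small imaging deformation
}

def detect_targets(fisheye_image, models):
    """Run each ROI through its own detector and map boxes back to fisheye coordinates."""
    detections = []
    for roi_name, (x, y, w, h) in PRESET_ROIS.items():
        roi_image = fisheye_image[y:y + h, x:x + w]      # step S102: crop the preset region
        results = models[roi_name].predict(roi_image)    # steps S103/S104: per-ROI model
        for box_corners, label, score in results:        # corners are in ROI coordinates
            box_in_fisheye = [(px + x, py + y) for (px, py) in box_corners]  # step S105
            detections.append({"roi": roi_name, "class": label,
                               "score": score, "box": box_in_fisheye})
    return detections
```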
In the target detection method for the fisheye image, the fisheye image to be processed is obtained; extracting a to-be-detected region image corresponding to each preset image region on the fisheye image; the imaging deformation degrees corresponding to different preset image areas are different; respectively inputting each area image to be detected to a corresponding target detection model; the different target detection models are used for carrying out target detection on images corresponding to different preset image areas by using detection frames with different angles; the angle of the detection frame is adaptive to the imaging deformation degree; acquiring target detection results output by each target detection model; and obtaining the position of the detected target in the fisheye image according to the target detection result output by each target detection model. In the method, different preset image areas correspond to different imaging deformation degrees in the fisheye image, the images of the areas to be detected corresponding to the preset image areas are extracted from the fisheye image and are respectively input into corresponding target detection models, the different target detection models detect the corresponding images of the areas to be detected by using detection frames with angles adaptive to the imaging deformation degrees, distortion removal correction of the fisheye image is not needed, and the calculation cost brought by data set manufacturing is reduced. And when the target detection model is trained, if the corresponding labeling detection frame is tightly attached to the target, the target detection model can be used for detecting the target by using the detection frame with the angle which is suitable for the imaging deformation degree in the prediction stage, so that the detection frame can be more tightly attached to the detected target in the fisheye image, the position of the detected target is more accurate based on the target detection result output by each target detection model, and the accuracy of the subsequent calculation of the distance between the detected targets can be improved.
In one embodiment, since different target detection models respectively detect images of regions to be detected corresponding to different preset image regions, when training each target detection model, training is performed by using training sample images corresponding to different preset image regions, for example, for the target detection model of ROI-1, a training sample image corresponding to ROI-1 is used, and for the target detection model of ROI-2, a training sample image corresponding to ROI-2 is used, thereby reducing the difficulty in model training caused by imaging deformation difference.
In the process of labeling the sample image, the following steps can be included:
acquiring a sample image to be marked corresponding to any preset image area from the sample fisheye image based on any preset image area selected from each preset image area; wherein, the sample image to be marked is horizontal relative to the sample fisheye image;
rotating the sample image to be marked according to the angle which is adaptive to the imaging deformation degree corresponding to any preset image area to obtain a rotating sample image to be marked; rotating the target to be marked in the sample image to be marked to be horizontal relative to the sample fisheye image, and rotating the sample image to be marked to be rotary relative to the sample fisheye image;
obtaining a size expansion image based on the rotation of the sample image to be marked; the image size of the size expansion image is larger than the image size of the sample image to be marked in a rotating mode, and the image size is horizontal to the sample fisheye image; the size expansion image comprises a sample image to be marked and a black edge image which rotates the sample image to be marked, the rotation angle of the sample image to be marked relative to the size expansion image is a suitable angle, and the target to be marked is horizontal relative to the size expansion image;
marking a target to be marked in a sample image to be marked by rotation by using a detection frame which is horizontal relative to the size-expanded image to obtain a corresponding horizontal marking detection frame, and taking the sample image to be marked which comprises the horizontal marking detection frame and rotates as a first marking sample image;
according to the adaptive angle, performing rotation processing on the first labeled sample image and the horizontal label detection frame aiming at rotation processing to respectively obtain a second labeled sample image horizontal to the size expansion image and a rotation label detection frame rotating relative to the size expansion image; the rotation angle of the rotation labeling detection frame relative to the size expansion image is a suitable angle, and a black edge image of the size expansion image is changed from a sample image to be labeled around the rotation to a second labeled sample image;
acquiring a second labeling sample image except for the black edge image in the size expansion image, and converting a coordinate system where the rotary labeling detection frame is positioned from the coordinate system of the size expansion image into the coordinate system of the second labeling sample image;
and taking the second labeling sample image comprising the rotating labeling detection frame in the coordinate system of the second labeling sample image as a training sample image.
The above steps will be described with reference to fig. 3 (a) to 3 (d) by taking the example of labeling the sample image corresponding to ROI-1:
fig. 3 (a) shows a sample image to be annotated corresponding to ROI-1 obtained from a sample fisheye image, and it can be seen that the sample image to be annotated is horizontal to the sample fisheye image of fig. 2.
Because, under the fixed mounting position of the fisheye camera, the imaging deformation degree of the fisheye image has a corresponding rotation angle θ, the angle of the detection frame is correspondingly this rotation angle θ. The rotation angle θ is the angle by which the early-warning target rotates from its normal imaging state to the rotated state, and the values of θ in the fisheye images on the two sides are opposite in sign.
In step S301, after fig. 3 (a) is obtained, the vehicle of fig. 3 (a) is rotated clockwise according to the absolute value of the above rotation angle (which can be noted as |θ|) until the vehicle of fig. 3 (a) reaches a horizontal state. During this rotation the image size of fig. 3 (a) remains unchanged, and a black-edge image appears around fig. 3 (a), giving fig. 3 (b); since fig. 3 (b) has a larger image size than fig. 3 (a), fig. 3 (b) can be referred to as the size-expanded image. At this point, the vehicle is labeled with a horizontal labeling detection frame in the xO2y coordinate system of fig. 3 (b), that is, the top-left vertex 2011 and the bottom-right vertex 2012 are marked, and the coordinate values of the four vertices of the horizontal labeling detection frame are obtained accordingly. It can be seen that the rotation angle |θ| is adapted to the degree of imaging deformation, i.e. the greater the degree of imaging deformation, the larger the rotation angle |θ|.
In step S302, after the horizontal labeling detection frame is obtained, the ROI-1 image in fig. 3 (b) is rotated counterclockwise by the above rotation angle back to the horizontal state consistent with fig. 3 (a); after this rotation, the horizontal labeling detection frame becomes a rotating labeling detection frame with rotation angle |θ|.
In step S303, since the above labeling was performed in the xO2y coordinate system of fig. 3 (b), the rotating labeling detection frame is also expressed in the xO2y coordinate system of fig. 3 (b); its coordinate system therefore needs to be changed from the xO2y coordinate system of fig. 3 (b) to the xO1y coordinate system of fig. 3 (d). In this coordinate system conversion, the ROI-1 image extracted by removing the black-edge image from the size-expanded image is used, and fig. 3 (d) is obtained as the training sample image.
After the computer device obtains the training sample images labeled according to the above steps, the training sample images can be divided into a training set, a test set and a validation set for model training, yielding a target detection model for the given preset image area.
For the sample image labeling of ROI-2, because the imaging deformation corresponding to ROI-2 is small, labeling is performed with the horizontal detection frame used for a normal image, that is, the top-left vertex 2021 and the bottom-right vertex 2022 of the horizontal labeling detection frame are marked.
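The labeling flow of fig. 3 can be sketched roughly as below, assuming OpenCV is available; the rotate-with-expanded-canvas arithmetic is the standard recipe and the function names are illustrative, not the patent's exact formulas.

```python
import cv2
import numpy as np

def rotate_with_padding(image, angle_deg):
    """Rotate about the image centre and enlarge the canvas so nothing is cropped
    (the new border pixels are the black-edge image of the size-expanded image)."""
    h, w = image.shape[:2]
    # OpenCV treats positive angles as counterclockwise, so pass -abs(theta) for clockwise.
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, 1.0)
    cos, sin = abs(M[0, 0]), abs(M[0, 1])
    new_w, new_h = int(h * sin + w * cos), int(h * cos + w * sin)
    M[0, 2] += new_w / 2.0 - w / 2.0   # shift so the rotated image stays centred
    M[1, 2] += new_h / 2.0 - h / 2.0
    return cv2.warpAffine(image, M, (new_w, new_h)), M

def transform_points(points, M):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    pts = np.hstack([points, np.ones((len(points), 1))])
    return pts @ M.T

# Step S301: rotate the ROI-1 crop clockwise by |theta| and label a horizontal box
# in the size-expanded image; step S302: map the box corners back with the inverse
# transform so the box becomes a rotating labeling box; step S303: subtract the
# padding offset to express the corners in the ROI-1 image's own coordinate system.
```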
Further, the location training label generation process of the training sample image may include the steps of: determining a first corner point and a second corner point which have a diagonal relation and are of a rotary labeling detection frame of a training sample image; determining a position training label of the training sample image according to the offset of each pixel point in the rotation labeling detection frame of the training sample image relative to the first corner point and the second corner point respectively; and performing model training based on the training sample images and the position training labels to obtain a target detection model aiming at any preset image area.
For example, as shown in fig. 2, the offset of each pixel point in the rotation labeling detection box 201 with respect to the top-left vertex 2011 (equivalent to a first corner point) and the bottom-right vertex 2012 (equivalent to a second corner point) is obtained to obtain a corresponding position training label. Model training is carried out by utilizing the offset of the pixel points relative to the corner points of the detection frame, so that the model can learn the pixel distribution condition of each pixel point relative to the corner points in the detection frame, and the accuracy of subsequent target detection is favorably improved.
Further, if the position training tag is in a heat map (heatmap) manner, the step of specifically generating the position training tag in the form of a heat map may include: acquiring a heat map corresponding to the image size of the training sample image; determining a heat area corresponding to a rotation marking detection frame of a training sample image in a heat map; the heat degree area comprises a first area point corresponding to the first corner point and a second area point corresponding to the second corner point; determining a corresponding predicted value of each pixel point in the heat region in model training based on the offset of each pixel point in the heat region relative to the first region point or the second region point; and obtaining the position training label based on the heat map of the heat area with the predicted value.
Furthermore, different offsets correspond to different position training labels, and the different offsets include an offset of a pixel point relative to a first corner point along a first image edge direction of the sample fisheye image, an offset of the pixel point relative to the first corner point along a second image edge direction of the fisheye image orthogonal to the first image edge, an offset of the pixel point relative to a second corner point along the first image edge direction, and an offset of the pixel point relative to the second corner point along the second image edge.
FIG. 5 shows the position training labels corresponding to the rotating labeling detection box: dt2, dl2, db2 and dr2. Let (xi, yj) be the coordinates of a pixel in the feature map of the position training label (the heat region corresponding to the rotating labeling detection box), with i ∈ [0, w) and j ∈ [0, h), where w and h are the width and height of the output feature map respectively; the top-left and bottom-right vertex coordinates of the rotating labeling detection box may be expressed as (x2011, y2011) and (x2012, y2012). Then:
dt2 represents the offset of each pixel in the rotating labeling detection frame relative to the top-left vertex 2011 along the y direction (corresponding to the first image edge direction), i.e. dt2,j = yj - y2011; the gradient from black to white in the figure indicates that the offset changes from small to large;
dl2 represents the offset of each pixel in the rotating labeling detection frame relative to the top-left vertex 2011 along the x direction (corresponding to the second image edge direction), i.e. dl2,i = xi - x2011;
db2 represents the offset of each pixel in the rotating labeling detection frame relative to the bottom-right vertex 2012 along the y direction (corresponding to the first image edge direction), i.e. db2,j = yj - y2012;
dr2 represents the offset of each pixel in the rotating labeling detection frame relative to the bottom-right vertex 2012 along the x direction (corresponding to the second image edge direction), i.e. dr2,i = xi - x2012.
It should be noted that the top left corner vertex and the bottom right corner vertex of the above-described rotation label detection box correspond to fig. 2.
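As a rough sketch of how the four offset maps described above could be built (the mask-based layout and variable names are illustrative assumptions, not the patent's code):

```python
import numpy as np

def position_labels(heat_region_mask, p1, p2):
    """heat_region_mask: (h, w) boolean map of pixels inside the rotating labeling box.
    p1, p2: (x, y) of the top-left and bottom-right corners in feature-map coordinates.
    Returns four (h, w) offset maps dt, dl, db, dr (zero outside the heat region)."""
    h, w = heat_region_mask.shape
    xs = np.tile(np.arange(w, dtype=np.float32), (h, 1))
    ys = np.tile(np.arange(h, dtype=np.float32)[:, None], (1, w))
    dt = (ys - p1[1]) * heat_region_mask   # offset to top-left corner along y
    dl = (xs - p1[0]) * heat_region_mask   # offset to top-left corner along x
    db = (ys - p2[1]) * heat_region_mask   # offset to bottom-right corner along y
    dr = (xs - p2[0]) * heat_region_mask   # offset to bottom-right corner along x
    return dt, dl, db, dr
```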
In one embodiment, each target detection model comprises a category detection head for determining the category of the target and a position detection head for determining the position of the target; the target detection result also comprises the category of the detected target determined by the target detection model based on the category detection head.
As shown in fig. 4, the two detection heads output by the object detection model are a category detection head (which may be referred to as a category branch) and a position detection head branch (which may be referred to as a detection frame branch); the backbone network of the target detection model can be a multi-scale feature fusion deep neural network.
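A minimal PyTorch-style sketch of the two output branches of fig. 4, assuming the multi-scale fused feature map is produced upstream by the backbone; channel counts and layer depths are illustrative assumptions only.

```python
import torch.nn as nn

class DetectionHeads(nn.Module):
    """Anchor-free heads: a class branch (heat map) and a box branch (corner offsets)."""
    def __init__(self, in_channels=64, num_classes=1):
        super().__init__()
        self.cls_head = nn.Sequential(                       # category branch
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, num_classes, 1))
        self.box_head = nn.Sequential(                       # detection-frame branch
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, 4, 1))                    # dt, dl, db, dr offset maps

    def forward(self, fused_features):
        return self.cls_head(fused_features), self.box_head(fused_features)
```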
Correspondingly, the position training labels described above are for the position detection head.
The heat map as the class training label for the class detection head includes: the method comprises the following steps that a heat area with pixel values of pixel points all being first preset values and a background area with pixel values of pixel points all being second preset values; the heat area corresponds to a rotation mark detection frame of the training sample image, the background area corresponds to a background area except the rotation mark detection frame in the training sample image, and the first preset value is different from the second preset value.
Further, the heat region of the class training label is obtained by shrinking inward the original region of the class training label that corresponds to the rotating labeling detection frame of the training sample image; the pixel value of the area between the original region and the heat region is a third preset value; the pixel values of the pixels in the heat region and in the background region participate in model training and return the corresponding loss, while the area between the original region and the heat region does not participate in model training and does not return a loss.
Fig. 5 shows the class training label of a rotating labeling detection frame. Because a small amount of background image still exists inside the rotating labeling detection frame, in order to avoid interference of this background with model training, it is treated as an ignorable part that does not participate in training and does not return a loss; correspondingly, its pixel value can be set to -1, and it appears gray. The image area inside the rotating labeling detection frame that needs to participate in model training and return a loss is the valid part; its pixel value can be set to 1, and it appears white. In the class training label, the area formed by the gray area and the white area (corresponding to the original region) corresponds to the rotating labeling detection frame, and the white area can accordingly be regarded as the area obtained by shrinking the original region; further, the original region may be shrunk inward according to a preset scaling factor to obtain the white area, for example, taking the center of the original region as the coordinate origin and scaling the corner coordinates of the rotating labeling detection frame by a factor of 0.5 to obtain the corner coordinates of the white area. In addition, the background image outside the rotating labeling detection frame needs to participate in training and return a loss, and its pixel value may be set to 0.
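A sketch of how such a class-label heat map could be generated, assuming OpenCV polygon filling (mentioned later in the application example) and the 0.5 shrink factor from the example above; an illustration only, not the patent's implementation.

```python
import cv2
import numpy as np

def class_label_heatmap(label_h, label_w, box_corners, shrink=0.5):
    """box_corners: (4, 2) array of the rotating labeling box corners in label coordinates.
    Background = 0 (trained), ignored ring = -1 (no loss), shrunk core = 1 (trained)."""
    heatmap = np.zeros((label_h, label_w), dtype=np.float32)       # background area
    cv2.fillPoly(heatmap, [box_corners.astype(np.int32)], -1.0)    # whole original region
    center = box_corners.mean(axis=0)
    core = center + shrink * (box_corners - center)                # shrink toward the centre
    cv2.fillPoly(heatmap, [core.astype(np.int32)], 1.0)            # valid heat region
    return heatmap
```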
In the embodiment, because the heat map belongs to a processing mode without anchor points, the anchor point dimension does not need to be set additionally during model training, and the processing process is simplified.
In one embodiment, for the target detection model of ROI-1, since the corresponding detection frame has a certain rotation angle, the intersection-over-union (IoU) loss function of the horizontal detection frame cannot be used directly for the IoU loss calculation. Therefore, for the target detection model of ROI-1, before the IoU loss is calculated with the IoU loss function for the horizontal detection frame, the following step may additionally be performed: the rotation prediction detection frame obtained in model training and the rotating labeling detection frame of the training sample image are rotation-transformed to obtain a horizontal prediction detection frame and a horizontal labeling detection frame respectively. The IoU loss of the horizontal prediction detection frame and the horizontal labeling detection frame is then calculated with the IoU loss function for the horizontal detection frame, and model training is completed and the target detection model obtained based on the loss processing of the IoU loss calculation.
Further, the IoU loss function for the horizontal detection frame operates on the offsets of a pixel relative to each side of the horizontal detection frame; the rotation transformation converts the offset of a pixel in the rotation detection frame relative to one corner point of the rotation detection frame into the offsets of the pixel relative to the edges that form that corner point, thereby obtaining the corresponding horizontal detection frame.
Further, the rotation transformation is performed using an offset conversion formula constructed based on the angle to which the degree of imaging deformation of each preset image region is adapted.
Illustratively, the intersection-over-union loss (IoU loss) is adopted for both the position detection head of the horizontal labeling detection frame and the position detection head of the rotating labeling detection frame. Since the IoU loss requires calculating the areas of the output detection frame, the labeling detection frame and their overlapping part, the rotation detection frames (including the rotating labeling detection frame and the rotation prediction detection frame) cannot be computed directly with the IoU loss function of the horizontal detection frame the way horizontal detection frames are. Therefore, the rotating labeling detection frame and the rotation prediction detection frame can first be rotation-transformed into a horizontal labeling detection frame and a horizontal prediction detection frame, after which the IoU loss can be calculated. The rotation transformation takes the form of a standard two-dimensional rotation:
dx′ = dx·cos θ + dy·sin θ
dy′ = -dx·sin θ + dy·cos θ
wherein (dx, dy) is the offset of a pixel in the heat map of the rotation prediction detection frame from a vertex of that frame, the rotation center can be set as the center coordinates of the output heat map (for offsets between a pixel and a vertex the choice of center cancels out), θ is the rotation angle, and (dx′, dy′) is the offset, obtained through the rotation transformation, of the corresponding pixel in the heat map of the horizontal detection frame from the corresponding vertex of that frame. After this transformation, the output rotation detection frame can be converted into the corresponding horizontal detection frame to calculate the IoU loss.
In order to better understand the above method, an application example of the object detection method for the fisheye image in the present application is described in detail below. The application example is applied to a target detection scene of a vehicle blind area.
The application example does not need to carry out distortion removal correction on the fisheye image, so that the calculation cost is reduced; different ROI are divided through observation and statistics of imaging deformation of early warning targets at different positions in an image, a target detection model of a general horizontal detection frame is adopted for a region with smaller deformation of the early warning targets in the image, and a target detection model of a rotary detection frame with an angle adaptive to the imaging deformation degree is adopted for the early warning target rotation problem which usually occurs at the image boundary. Wherein, the target detection model to rotatory detection frame mainly includes: the method comprises a labeling method of a rotation detection frame, a generation method of a training label and a loss calculation method of the rotation detection frame.
The application example can effectively detect early-warning targets in the different ROIs and reduces the problem of complex feature patterns in full-image training; at the same time, the rotating detection frame fits the target more tightly, so non-maximum suppression is more robust, which greatly helps subsequent distance measurement.
The application example mainly comprises the following steps:
1. Install and calibrate the lateral (side-view) camera.
2. Collect a large amount of driving image data (video frame segments) under different lighting conditions and in different scenes.
3. Design the detection ROI and detection frame scheme according to the imaging differences of early-warning targets at different positions in the image.
4. According to the different detection ROIs and detection frame schemes, label the collected images by category, e.g. horizontal detection frame labeling and rotating detection frame labeling, and generate the training labels of the rotating detection frame.
5. Divide the labeled images into a training set, a test set and a validation set.
6. Design a deep neural network for the hardware platform that needs to be adapted.
7. Design corresponding loss functions for the different detection frame schemes.
8. Train the deep neural network on the different training sets respectively.
9. Test the trained deep neural network.
Specifically, the method comprises the following steps:
1. firstly, a fisheye camera installed on a vehicle needs to be calibrated, and internal and external parameters of the camera are calibrated.
2. Road driving scene images (continuous video segments) of different lighting conditions and different scenes are collected in a large quantity. The driving data includes driving data in various scenes such as sunny days, rainy days, daytime, nighttime, high speed, urban areas and the like.
3. The acquired fisheye image data are observed, and the ROIs and detection frame types are set. As shown in fig. 2, given the installation position of the left-side fisheye camera, the early warning vehicle targets in the obtained fisheye image show large imaging differences at different positions: vehicles nearby have a small degree of rotational deformation, while vehicles far away have a large degree of rotational deformation. Two different ROIs (ROI-1 and ROI-2) are therefore designed to reduce the training difficulty caused by these imaging differences. Meanwhile, a rotary detection frame scheme is designed for the early warning targets in ROI-1: because the imaging of the early warning target has a fixed rotation angle under the fixed installation position of the fisheye camera, a unified rotation angle θ is set, where θ is the angle by which the early warning target is rotated from its normal imaging state, and θ takes opposite signs in the fisheye images on the two sides. A normal horizontal detection frame scheme is adopted for the early warning targets in ROI-2.
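By way of illustration only, a minimal Python sketch of such a region division is given below; the ROI rectangles and the unified rotation angle are placeholder assumptions rather than values prescribed by this application.

```python
import cv2

# Hypothetical ROI layout for a left-side fisheye camera; the rectangles and
# the unified rotation angle are placeholder assumptions for illustration.
ROI_CONFIG = {
    "ROI-1": {"rect": (0, 0, 400, 720), "angle_deg": 30.0},   # far region, rotated targets
    "ROI-2": {"rect": (400, 0, 880, 720), "angle_deg": 0.0},  # near region, horizontal boxes
}

def extract_rois(fisheye_bgr):
    """Crop the image of each preset image region from the raw fisheye frame."""
    rois = {}
    for name, cfg in ROI_CONFIG.items():
        x, y, w, h = cfg["rect"]
        rois[name] = fisheye_bgr[y:y + h, x:x + w]
    return rois

frame = cv2.imread("fisheye_left.png")   # placeholder file name
roi_images = extract_rois(frame)
```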
4. The acquired data are labeled manually according to the ROI and detection frame settings. Early warning targets in ROI-2 can be labeled in the normal horizontal detection frame manner, i.e. by determining the top-left and bottom-right vertex coordinates of the horizontal labeling detection frame. For early warning targets in ROI-1, as shown in FIG. 3 and taking the left fisheye image as an example, step S301 first rotates ROI-1 clockwise by the angle θ so that the early warning targets are rotated to a horizontal state, and labeling is then performed with horizontal detection frames to obtain the coordinates of the four vertices of each horizontal labeling detection frame; step S302 then rotates the obtained horizontal labeling detection frames counterclockwise by the rotation angle θ; finally, step S303 performs a translation conversion, i.e. converts the labels of the rotating labeling detection frames into the coordinate system of the ROI-1 image. After labeling is completed, the labeled sample images are divided into a training set, a test set and a validation set.
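The labeling procedure of steps S301 to S303 can be sketched in Python with OpenCV as follows; this simplified illustration rotates about the ROI center instead of using a size-expanded canvas, and the angle, file name and example box coordinates are assumptions.

```python
import cv2
import numpy as np

def rotate_roi_for_labeling(roi_bgr, angle_deg):
    """Rotate the ROI clockwise by angle_deg so that the warning targets appear
    horizontal; OpenCV treats positive angles as counter-clockwise, hence the sign flip."""
    h, w = roi_bgr.shape[:2]
    center = (w / 2.0, h / 2.0)
    M = cv2.getRotationMatrix2D(center, -angle_deg, 1.0)
    return cv2.warpAffine(roi_bgr, M, (w, h), borderValue=(0, 0, 0)), center

def horizontal_box_to_rotated(box_xyxy, center, angle_deg):
    """Rotate the four vertices of a horizontal box labeled on the rotated ROI
    counter-clockwise back into the original ROI coordinate system."""
    x1, y1, x2, y2 = box_xyxy
    corners = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], dtype=np.float32)
    M_inv = cv2.getRotationMatrix2D(center, angle_deg, 1.0)
    ones = np.ones((4, 1), dtype=np.float32)
    return np.hstack([corners, ones]) @ M_inv.T   # 4 x 2 rotated vertices

roi = cv2.imread("roi1_left.png")                  # placeholder crop of ROI-1
rotated_roi, c = rotate_roi_for_labeling(roi, 30.0)
vertices = horizontal_box_to_rotated((120, 200, 260, 280), c, 30.0)
```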
5. A deep neural network is designed for the hardware platform to be adapted. The backbone adopts a deep neural network with multi-scale feature fusion, and two branches are output, serving as a category branch and a detection frame branch respectively; the structure is shown in fig. 4.
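A minimal two-branch, anchor-free network of this kind might look as sketched below in PyTorch; the channel widths, depth and head design are illustrative assumptions and do not reproduce the network of fig. 4.

```python
import torch
import torch.nn as nn

class FisheyeDetector(nn.Module):
    """Sketch of an anchor-free detector: a small backbone followed by a
    category branch (heat map) and a detection frame branch (per-pixel offsets)."""
    def __init__(self, num_classes=1):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.cls_head = nn.Conv2d(128, num_classes, 1)  # category heat map
        self.box_head = nn.Conv2d(128, 4, 1)            # offsets to two box vertices

    def forward(self, x):
        features = self.backbone(x)
        return self.cls_head(features), self.box_head(features)

model = FisheyeDetector()
cls_map, box_map = model(torch.randn(1, 3, 256, 256))
```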
6. Training labels are designed for the horizontal labeling detection frame and the rotary labeling detection frame. The labels are of two types: category training labels and position training labels.
As shown in fig. 5, the category training labels take the form of an anchor-free heat map. The pixel value of the black area may be set to 0, indicating the background portion, which participates in training and returns loss; the pixel value of the gray area may be set to -1, indicating an ignorable portion, which does not participate in training and returns no loss; and the pixel value of the white area may be set to 1, indicating the valid box portion, which participates in training and returns loss. For a horizontal labeling detection frame, the heat map can be generated by simple array operations; for a rotary labeling detection frame, the heat map is generated by polygon filling in OpenCV.
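An illustrative sketch of generating such a category heat map is given below; treating both horizontal and rotary frames as polygons and shrinking toward the centroid by a fixed ratio are simplifying assumptions.

```python
import cv2
import numpy as np

def category_heatmap(map_h, map_w, box_vertices, shrink=0.8):
    """Sketch of the category training label: 0 = background, -1 = ignored band,
    1 = valid box area. box_vertices is a 4 x 2 array of the labeling detection
    frame; a horizontal frame is simply the axis-aligned special case."""
    heat = np.zeros((map_h, map_w), dtype=np.float32)
    poly = np.asarray(box_vertices, dtype=np.float32)
    # Full box region -> ignored (-1): no loss is returned for these pixels.
    cv2.fillPoly(heat, [poly.astype(np.int32).reshape(-1, 1, 2)], -1.0)
    # Shrink the vertices toward the centroid -> valid region (1).
    centroid = poly.mean(axis=0, keepdims=True)
    inner = centroid + shrink * (poly - centroid)
    cv2.fillPoly(heat, [inner.astype(np.int32).reshape(-1, 1, 2)], 1.0)
    return heat

hm = category_heatmap(64, 64, [(10, 20), (40, 20), (40, 50), (10, 50)])
```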
The position training label is the offset of each pixel position in the white area of the heat map relative to the coordinates of two vertices of the labeled detection frame; for example, dt denotes the offset of a pixel in the white area to the y coordinate of the top-left vertex of the detection frame.
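One possible sketch of building such position training labels is shown below; the four-channel layout and the sign convention of the offsets are assumptions for illustration.

```python
import numpy as np

def position_labels(map_h, map_w, valid_mask, top_left, bottom_right):
    """For every pixel in the valid (white) area, store its offsets to the
    top-left and bottom-right vertices of the labeled detection frame."""
    grid = np.mgrid[0:map_h, 0:map_w].astype(np.float32)
    ys, xs = grid[0], grid[1]
    x1, y1 = top_left
    x2, y2 = bottom_right
    offsets = np.stack([xs - x1, ys - y1, x2 - xs, y2 - ys], axis=0)  # 4 x H x W
    return offsets * valid_mask[None]   # only valid-area pixels carry regression targets

mask = np.zeros((64, 64), dtype=np.float32)
mask[20:50, 10:40] = 1.0
labels = position_labels(64, 64, mask, (10, 20), (40, 50))
```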
7. The loss functions are designed. For the category detection head, the same focal loss is used for early warning targets labeled with horizontal detection frames and for those labeled with rotary detection frames. For the position detection head, both the horizontal detection frame and the rotary detection frame use the intersection-over-union (IoU) loss. Because the IoU loss requires computing the areas of the output detection frame and the labeled detection frame as well as their overlapping region, it cannot be computed directly for rotary detection frames as it is for horizontal ones. Therefore, a rotational transformation is first applied to the output rotary detection frame and the labeled rotary detection frame to convert them into horizontal detection frames, and the IoU loss is then calculated. The rotational transformation is as follows:
Δx' = (Δx − x₀)·cos θ − (Δy − y₀)·sin θ + x₀
Δy' = (Δx − x₀)·sin θ + (Δy − y₀)·cos θ + y₀

where (Δx, Δy) is the deviation of the coordinate of a pixel point in the output rotary detection frame heat map from a vertex of the detection frame, (x₀, y₀) is the rotation center coordinate, which may be set to the center coordinate of the output heat map, θ is the rotation angle, and (Δx', Δy') is the offset of the corresponding pixel point in the horizontal detection frame heat map from the coordinate of the corresponding vertex of the detection frame. After this transformation, the output rotary detection frame can be converted into a horizontal detection frame and the intersection-over-union loss can be calculated.
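The following Python sketch illustrates one way such a rotational transformation and the subsequent IoU loss could be computed; it rotates the vertex implied by the per-pixel offsets about the heat-map center while keeping the pixel fixed, which is a simplification of the transformation above.

```python
import numpy as np

def rotate_offsets(dx, dy, px, py, cx, cy, angle_rad):
    """Rotate the vertex implied by the offsets (dx, dy) of pixel (px, py) about
    the heat-map center (cx, cy) and return the offsets to the rotated vertex."""
    vx, vy = px + dx, py + dy
    cos_t, sin_t = np.cos(angle_rad), np.sin(angle_rad)
    rx = cos_t * (vx - cx) - sin_t * (vy - cy) + cx
    ry = sin_t * (vx - cx) + cos_t * (vy - cy) + cy
    return rx - px, ry - py

def iou_loss(pred_box, gt_box, eps=1e-6):
    """IoU loss between two horizontal boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(pred_box[0], gt_box[0]), max(pred_box[1], gt_box[1])
    ix2, iy2 = min(pred_box[2], gt_box[2]), min(pred_box[3], gt_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    area_g = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    return 1.0 - inter / (area_p + area_g - inter + eps)

new_dx, new_dy = rotate_offsets(10.0, -5.0, 20.0, 40.0, 32.0, 32.0, np.deg2rad(30.0))
loss = iou_loss((10, 10, 50, 60), (12, 8, 48, 58))
```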
8. As described in part 6, many redundant pixel points in the heat map yield category information and detection frame information. In the testing stage, non-maximum suppression is used to filter out the redundant results, which involves computing the intersection over union. Therefore, the same rotational transformation step as in part 7 is applied to the outputs of the rotary detection frame, and redundant rotary detection frames are then filtered out.
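A standard greedy non-maximum suppression over horizontal boxes, as it could be applied after the rotational conversion, is sketched below.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression on (x1, y1, x2, y2) boxes; rotary outputs
    are assumed to have been converted to horizontal boxes beforehand."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = boxes[order[1:]]
        xx1 = np.maximum(boxes[i, 0], rest[:, 0])
        yy1 = np.maximum(boxes[i, 1], rest[:, 1])
        xx2 = np.minimum(boxes[i, 2], rest[:, 2])
        yy2 = np.minimum(boxes[i, 3], rest[:, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        iou = inter / (area_i + area_r - inter + 1e-6)
        order = order[1:][iou < iou_thresh]
    return keep

boxes = np.array([[10, 10, 50, 60], [12, 8, 48, 58], [100, 100, 140, 150]], dtype=np.float32)
kept = nms(boxes, np.array([0.9, 0.8, 0.7]))
```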
9. The ROI-1 deep neural network and the ROI-2 deep neural network are trained with the training sets of the different ROIs and labeling detection frames, and the two resulting deep neural network models are then tested. A test result is shown in FIG. 6. The rotary prediction detection frames 601, 602 and 604 are vehicle position recognition results predicted by the target detection model for the ROI-1 image; they share the same angle, which is consistent with the rotation angle used by the labeling detection frames when that target detection model was trained. The horizontal prediction detection frames 603 and 605 are vehicle position recognition results predicted by the target detection model for the ROI-2 image; they share the same angle, which is consistent with the angle of the horizontal labeling detection frames used when that target detection model was trained. FIG. 6 illustrates the effectiveness of the target detection method for fisheye images in this application.
The beneficial effects of this application example are as follows: the fisheye image does not need to be de-distorted; it is only necessary to divide different detection ROIs according to the imaging effect and then set the corresponding detection frame types. For early warning targets with rotational deformation, an anchor-free rotary detection frame detection method is provided, which on the one hand avoids anchor point settings and simplifies the pipeline, and on the other hand yields more compact rotary detection frames, so that the subsequent non-maximum suppression post-processing is more robust and the distance measurement is more accurate.
It should be understood that, although the steps in the flowcharts of figs. 1 to 6 are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least some of the steps in figs. 1 to 6 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; the order in which these sub-steps or stages are performed is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 7, there is provided an object detection apparatus for a fisheye image, including:
a fisheye image obtaining module 701, configured to obtain a fisheye image to be processed;
an image extraction module 702 of the region to be detected, configured to extract the image of the region to be detected corresponding to each preset image region in the fisheye image; the imaging deformation degrees corresponding to different preset image areas are different;
the model detection module 703 is configured to input each to-be-detected region image to a corresponding target detection model; the different target detection models are used for carrying out target detection on images corresponding to different preset image areas by using detection frames with different angles; the angle of the detection frame is adaptive to the imaging deformation degree;
a detection result obtaining module 704, configured to obtain target detection results output by each target detection model;
and the target positioning module 705 is configured to obtain a position of the detected target in the fisheye image according to the target detection result output by each target detection model.
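For illustration, a minimal sketch of how these modules could fit together at inference time is given below; the detector callables and ROI configuration are placeholders standing in for the trained target detection models.

```python
import numpy as np

def detect_fisheye_targets(fisheye_bgr, roi_config):
    """Run each preset image region through its own detector and map the
    resulting boxes back into the coordinate system of the fisheye image."""
    results = []
    for name, cfg in roi_config.items():
        x, y, w, h = cfg["rect"]
        roi = fisheye_bgr[y:y + h, x:x + w]
        boxes = np.asarray(cfg["detector"](roi), dtype=np.float32)  # (N, 4) ROI coords
        if boxes.size:
            boxes[:, [0, 2]] += x          # translate x coordinates back
            boxes[:, [1, 3]] += y          # translate y coordinates back
        results.append((name, boxes))
    return results

# Example with a dummy detector that returns no boxes.
config = {"ROI-2": {"rect": (400, 0, 880, 720), "detector": lambda roi: np.empty((0, 4))}}
detections = detect_fisheye_targets(np.zeros((720, 1280, 3), dtype=np.uint8), config)
```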
In one embodiment, the above apparatus further comprises: the model training module is used for acquiring a sample image to be marked corresponding to any preset image area from a sample fisheye image based on any preset image area selected from each preset image area; wherein the sample image to be annotated is horizontal relative to the sample fisheye image; rotating the sample image to be marked according to the angle which is adaptive to the imaging deformation degree corresponding to any preset image area to obtain a rotating sample image to be marked; the target to be marked in the sample image to be marked is rotated to be horizontal relative to the sample fisheye image, and the sample image to be marked is rotated relative to the sample fisheye image; obtaining a size expansion image based on the rotating sample image to be marked; the image size of the size-expanded image is larger than that of the sample image to be marked in the rotating mode and is horizontal relative to the sample fisheye image; the size expansion image comprises the rotating sample image to be marked and a black edge image surrounding the rotating sample image to be marked, the rotating angle of the rotating sample image to be marked relative to the size expansion image is the adaptive angle, and the target to be marked is horizontal relative to the size expansion image; marking the target to be marked in the sample image to be marked in the rotation mode by using the detection frame which is horizontal relative to the size-expanded image to obtain a corresponding horizontal marking detection frame, and taking the sample image to be marked in the rotation mode, which comprises the horizontal marking detection frame, as a first marking sample image; according to the adaptive angle, performing rotation processing on the first labeling sample image and the horizontal labeling detection frame aiming at the rotation processing to respectively obtain a second labeling sample image which is horizontal relative to the size-expanded image and a rotation labeling detection frame which rotates relative to the size-expanded image; the rotation angle of the rotation labeling detection frame relative to the size expansion image is the adaptive angle, and a black edge image of the size expansion image is changed from a sample image to be labeled around the rotation to the second labeled sample image; acquiring a second labeling sample image except for a black edge image in the size-expanded image, and converting a coordinate system where the rotating labeling detection frame is located from the coordinate system of the size-expanded image into the coordinate system of the second labeling sample image; taking a second labeled sample image of the rotating labeled detection frame in the coordinate system of the second labeled sample image as a training sample image; and carrying out model training based on the training sample image to obtain a target detection model aiming at any preset image area.
In an embodiment, the model training module is further configured to determine a first corner and a second corner of the rotation labeling detection box of the training sample image, which have a diagonal relationship; determining a position training label of the training sample image according to the offset of each pixel point in the rotation labeling detection frame of the training sample image relative to the first corner point and the second corner point respectively; and performing model training based on the training sample image and the training label to obtain a target detection model for any preset image area.
In an embodiment, the model training module is further configured to obtain a heat map with an image size corresponding to an image size of the training sample image; determining a heat area corresponding to a rotation marking detection frame of the training sample image in the heat map; the heat degree area comprises a first area point corresponding to the first corner point and a second area point corresponding to the second corner point; determining a corresponding predicted value of each pixel point in the heat region in model training based on the offset of each pixel point in the heat region relative to the first region point or the second region point; and obtaining the position training label based on the heat map of the heat area with the predicted value.
In one embodiment, different offsets correspond to different position training labels, where the different offsets include an offset of a pixel point relative to the first corner along a first image edge direction of the sample fisheye image, an offset of a pixel point relative to the first corner along a second image edge direction of the fisheye image orthogonal to the first image edge, an offset of a pixel point relative to the second corner along the first image edge direction, and an offset of a pixel point relative to the second corner along the second image edge.
In one embodiment, each target detection model comprises a category detection head for determining the category of the target and a position detection head for determining the position of the target; the target detection result further comprises the category of the detected target determined by the target detection model based on the category detection head; the position training labels are for the position detection head; the heat map serving as the category training label for the category detection head includes a heat area in which the pixel values of all pixel points are a first preset value and a background area in which the pixel values of all pixel points are a second preset value; the heat area corresponds to the rotation labeling detection frame of the training sample image, the background area corresponds to the background area in the training sample image other than the rotation labeling detection frame, and the first preset value is different from the second preset value.
In one embodiment, the heat area of the category training label is obtained by inwardly shrinking the original area of the category training label corresponding to the rotation labeling detection frame of the training sample image; the pixel value of the area between the original area and the heat area is a third preset value; the pixel values of all pixel points in the heat area and the pixel values of all pixel points in the background area participate in the model training and return corresponding losses, while the pixel values of the area between the original area and the heat area do not participate in the model training and return no corresponding loss.
In an embodiment, the model training module is further configured to perform rotation transformation on a rotation prediction detection frame obtained in the model training and a rotation labeling detection frame of the training sample image to obtain a horizontal prediction detection frame and a horizontal labeling detection frame, respectively; calculating the intersection ratio loss of the horizontal prediction detection frame and the horizontal marking detection frame by using an intersection ratio loss function aiming at the horizontal detection frame; and transmitting the calculation result based on the intersection ratio loss calculation back to the trained model, finishing the model training and obtaining the target detection model.
In one embodiment, the intersection-to-parallel ratio loss function for the horizontal detection frame is performed based on the offset of the pixel point relative to each side of the horizontal detection frame; and the rotation transformation is to convert the offset of the pixel point in the rotation detection frame relative to one corner point of the rotation detection frame into the offset of the pixel point relative to the edge forming the one corner point, and obtain the corresponding horizontal detection frame.
In one embodiment, the rotation transformation is performed by using an offset conversion formula constructed based on an angle adapted to the imaging deformation degree of each preset image area.
For specific definition of the object detection apparatus for a fisheye image, reference may be made to the above definition of the object detection method for a fisheye image, and details are not repeated here. All or part of the modules in the target detection device for the fisheye image can be realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, the internal structure of which may be as shown in FIG. 8. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of object detection for fisheye images. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the above-described method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the respective method embodiment as described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database or another medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include random access memory (RAM) or an external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples only express several embodiments of the present application, and the descriptions thereof are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that, for a person of ordinary skill in the art, several variations and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method of object detection for fisheye images, the method comprising:
obtaining a fisheye image to be processed;
extracting the images of the areas to be detected corresponding to the preset image areas on the fisheye images; the imaging deformation degrees corresponding to different preset image areas are different;
respectively inputting each area image to be detected to a corresponding target detection model; the different target detection models are used for carrying out target detection on images corresponding to different preset image areas by using detection frames with different angles; the angle of the detection frame is adaptive to the imaging deformation degree; each target detection model is detected based on a detection frame which has no anchor point and has an angle adaptive to the imaging deformation degree;
acquiring target detection results output by each target detection model;
and obtaining the position of the detected target in the fisheye image according to the target detection result output by each target detection model.
2. The method of claim 1,
the method further comprises the following steps:
acquiring a sample image to be marked corresponding to any preset image area from a sample fisheye image based on any preset image area selected from each preset image area; wherein the sample image to be annotated is horizontal relative to the sample fisheye image;
rotating the sample image to be marked according to the angle which is adaptive to the imaging deformation degree corresponding to any preset image area to obtain a rotating sample image to be marked; the target to be marked in the sample image to be marked is rotated to be horizontal relative to the sample fisheye image, and the sample image to be marked is rotated relative to the sample fisheye image;
obtaining a size expansion image based on the rotating sample image to be marked; the image size of the size-expanded image is larger than that of the sample image to be marked in the rotating mode and is horizontal relative to the sample fisheye image; the size expansion image comprises the rotating sample image to be marked and a black edge image surrounding the rotating sample image to be marked, the rotating angle of the rotating sample image to be marked relative to the size expansion image is the adaptive angle, and the target to be marked is horizontal relative to the size expansion image;
marking the target to be marked in the sample image to be marked in the rotation mode by using the detection frame which is horizontal relative to the size-expanded image to obtain a corresponding horizontal marking detection frame, and taking the sample image to be marked in the rotation mode, which comprises the horizontal marking detection frame, as a first marking sample image;
according to the adaptive angle, performing rotation processing on the first labeling sample image and the horizontal labeling detection frame aiming at the rotation processing to respectively obtain a second labeling sample image which is horizontal relative to the size-expanded image and a rotation labeling detection frame which rotates relative to the size-expanded image; the rotation angle of the rotation labeling detection frame relative to the size expansion image is the adaptive angle, and a black edge image of the size expansion image is changed from a sample image to be labeled around the rotation to the second labeled sample image;
acquiring a second labeling sample image except for a black edge image in the size-expanded image, and converting a coordinate system where the rotating labeling detection frame is located from the coordinate system of the size-expanded image into the coordinate system of the second labeling sample image;
taking a second labeled sample image of the rotating labeled detection frame in the coordinate system of the second labeled sample image as a training sample image;
and carrying out model training based on the training sample image to obtain a target detection model aiming at any preset image area.
3. The method of claim 2,
the model training based on the training sample image to obtain a target detection model for any preset image area comprises:
determining a first corner point and a second corner point which have a diagonal relation and are of a rotating labeling detection frame of the training sample image;
determining a position training label of the training sample image according to the offset of each pixel point in the rotation labeling detection frame of the training sample image relative to the first corner point and the second corner point respectively;
and performing model training based on the training sample image and the position training label to obtain a target detection model for any preset image area.
4. The method of claim 3, wherein determining the position training label of the training sample image according to offsets of pixel points in the rotation labeling detection frame of the training sample image relative to the first corner point and the second corner point respectively comprises:
acquiring a heat map of which the image size corresponds to the image size of the training sample image;
determining a heat area corresponding to a rotation marking detection frame of the training sample image in the heat map; the heat degree area comprises a first area point corresponding to the first corner point and a second area point corresponding to the second corner point;
determining a corresponding predicted value of each pixel point in the heat region in model training based on the offset of each pixel point in the heat region relative to the first region point or the second region point;
and obtaining the position training label based on a heat map of the heat area with the predicted value.
5. The method of claim 4, wherein different offsets correspond to different position training labels, and the different offsets include an offset of a pixel point relative to the first corner along a first image edge direction of the sample fisheye image, an offset of a pixel point relative to the first corner along a second image edge direction of the fisheye image orthogonal to the first image edge, an offset of a pixel point relative to the second corner along the first image edge direction, and an offset of a pixel point relative to the second corner along the second image edge direction.
6. The method according to claim 4, wherein each object detection model comprises a category detection head for determining a category to which an object belongs and a position detection head for determining a position where the object is located; the target detection result further comprises the category of the detected target determined by the target detection model based on the category detection head;
the position training labels are for the position detection head;
the heat map as a class training label for the class detection head includes: the method comprises the following steps that a heat area with pixel values of pixel points all being first preset values and a background area with pixel values of pixel points all being second preset values; the heat area corresponds to a rotation mark detection frame of the training sample image, the background area corresponds to a background area in the training sample image except the rotation mark detection frame, and the first preset value is different from the second preset value; the hot degree area of the class training label is obtained by inwards shrinking the class training label and an original edition area corresponding to the rotating label detection frame of the training sample image; the pixel value of the area between the original plate area and the heat area is a third preset value; and pixel values of all pixel points in the heat area and pixel values of all pixel points in the background area participate in the model training and return corresponding loss, and pixel values of an area between the original edition area and the heat area do not participate in the model training and return corresponding loss.
7. The method according to claim 2, wherein the performing model training based on the training sample image to obtain a target detection model for any one of the preset image regions comprises:
carrying out rotation transformation on a rotation prediction detection frame obtained in model training and a rotation labeling detection frame of the training sample image to respectively obtain a horizontal prediction detection frame and a horizontal labeling detection frame;
calculating the intersection ratio loss of the horizontal prediction detection frame and the horizontal marking detection frame by using an intersection ratio loss function aiming at the horizontal detection frame;
and transmitting the calculation result based on the intersection ratio loss calculation back to the trained model, completing model training and obtaining the target detection model.
8. An object detection apparatus for a fisheye image, the apparatus comprising:
the fisheye image acquisition module is used for acquiring a fisheye image to be processed;
the image extraction module of the area to be detected is used for extracting the image of the area to be detected corresponding to each preset image area on the fisheye image; the imaging deformation degrees corresponding to different preset image areas are different;
the model detection module is used for respectively inputting the images of the areas to be detected to the corresponding target detection models; the different target detection models are used for carrying out target detection on images corresponding to different preset image areas by using detection frames with different angles; the angle of the detection frame is adaptive to the imaging deformation degree; each target detection model is detected based on a detection frame which has no anchor point and has an angle adaptive to the imaging deformation degree;
the detection result acquisition module is used for acquiring target detection results output by each target detection model;
and the target positioning module is used for obtaining the position of the detected target in the fisheye image according to the target detection result output by each target detection model.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
CN202011306070.1A 2020-11-20 2020-11-20 Target detection method, device and equipment for fisheye image and storage medium Active CN112101361B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011306070.1A CN112101361B (en) 2020-11-20 2020-11-20 Target detection method, device and equipment for fisheye image and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011306070.1A CN112101361B (en) 2020-11-20 2020-11-20 Target detection method, device and equipment for fisheye image and storage medium

Publications (2)

Publication Number Publication Date
CN112101361A CN112101361A (en) 2020-12-18
CN112101361B true CN112101361B (en) 2021-04-23

Family

ID=73785273

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011306070.1A Active CN112101361B (en) 2020-11-20 2020-11-20 Target detection method, device and equipment for fisheye image and storage medium

Country Status (1)

Country Link
CN (1) CN112101361B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112799055A (en) * 2020-12-28 2021-05-14 深圳承泰科技有限公司 Method and device for detecting detected vehicle and electronic equipment
CN112733677B (en) * 2020-12-31 2021-11-30 桂林海威科技股份有限公司 People flow rate statistical system and method
CN112863183B (en) * 2021-01-14 2022-04-08 深圳尚桥信息技术有限公司 Traffic flow data fusion method and system
CN112906691B (en) * 2021-01-29 2023-11-24 深圳安智杰科技有限公司 Distance measurement method and device, storage medium and electronic equipment
CN114445794A (en) * 2021-12-21 2022-05-06 北京罗克维尔斯科技有限公司 Parking space detection model training method, parking space detection method and device
CN114363522A (en) * 2022-01-17 2022-04-15 Oppo广东移动通信有限公司 Photographing method and related device
CN117671229A (en) * 2022-09-07 2024-03-08 影石创新科技股份有限公司 Image correction method, apparatus, computer device, and computer-readable storage medium
CN117011474B (en) * 2023-09-26 2024-01-30 深圳魔视智能科技有限公司 Fisheye image sample generation method, device, computer equipment and storage medium
CN117541761A (en) * 2023-11-14 2024-02-09 珠海安联锐视科技股份有限公司 Deep learning-based fisheye lens parcel detection method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5739722B2 (en) * 2011-04-26 2015-06-24 株式会社日立情報通信エンジニアリング Object recognition method and recognition apparatus
CN106846415B (en) * 2017-01-24 2019-09-20 长沙全度影像科技有限公司 A kind of multichannel fisheye camera binocular calibration device and method
US10796402B2 (en) * 2018-10-19 2020-10-06 Tusimple, Inc. System and method for fisheye image processing
CN111754394B (en) * 2020-06-29 2022-06-10 苏州科达科技股份有限公司 Method and device for detecting object in fisheye image and storage medium

Also Published As

Publication number Publication date
CN112101361A (en) 2020-12-18

Similar Documents

Publication Publication Date Title
CN112101361B (en) Target detection method, device and equipment for fisheye image and storage medium
CN107424116B (en) Parking space detection method based on side surround view camera
CN111652097B (en) Image millimeter wave radar fusion target detection method
TWI722355B (en) Systems and methods for correcting a high-definition map based on detection of obstructing objects
Goldbeck et al. Lane detection and tracking by video sensors
Gupta et al. Detection and localization of potholes in thermal images using deep neural networks
CN113989305B (en) Target semantic segmentation method and street target abnormity detection method applying same
CN112488083B (en) Identification method, device and medium of traffic signal lamp based on key point extraction of hetmap
CN112037249A (en) Method and device for tracking object in image of camera device
CN111768332A (en) Splicing method of vehicle-mounted all-around real-time 3D panoramic image and image acquisition device
CN110659548B (en) Vehicle and target detection method and device thereof
Bu et al. Mask-CDNet: A mask based pixel change detection network
CN113111722A (en) Automatic driving target identification method based on improved Mask R-CNN
CN115327524A (en) Road side end target detection method and device based on millimeter wave radar and vision fusion
WO2020199057A1 (en) Self-piloting simulation system, method and device, and storage medium
CN114332644A (en) Large-view-field traffic density acquisition method based on video satellite data
CN110751598B (en) Vehicle hinge point coordinate calibration method and device, computer equipment and storage medium
CN112785653A (en) Vehicle-mounted camera attitude angle calibration method
CN113942503B (en) Lane keeping method and device
CN114255450A (en) Near-field vehicle jamming behavior prediction method based on forward panoramic image
CN115565134B (en) Diagnostic method, system, equipment and storage medium for monitoring blind area of ball machine
CN113255405A (en) Parking space line identification method and system, parking space line identification device and storage medium
Geda et al. Automatic Top-View Transformation and Image Stitching of In-Vehicle Smartphone Camera for Road Crack Evaluation
Saponara Real-time color/shape-based traffic signs acquisition and recognition system
Li et al. Automatic Multi-Camera Calibration and Refinement Method in Road Scene for Self-driving Car

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: Floor 25, Block A, Zhongzhou Binhai Commercial Center Phase II, No. 9285, Binhe Boulevard, Shangsha Community, Shatou Street, Futian District, Shenzhen, Guangdong 518000

Patentee after: Shenzhen Youjia Innovation Technology Co.,Ltd.

Address before: 518051 1101, west block, Skyworth semiconductor design building, 18 Gaoxin South 4th Road, Gaoxin community, Yuehai street, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: SHENZHEN MINIEYE INNOVATION TECHNOLOGY Co.,Ltd.