CN112528907B - Anchor frame generation and label frame adaptation method and device and computer storage medium - Google Patents

Anchor frame generation and label frame adaptation method and device and computer storage medium

Info

Publication number
CN112528907B
Authority
CN
China
Prior art keywords
frame
anchor frame
grid
anchor
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011507519.0A
Other languages
Chinese (zh)
Other versions
CN112528907A (en)
Inventor
易长渝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Yuncong Tianfu Artificial Intelligence Technology Co ltd
Original Assignee
Sichuan Yuncong Tianfu Artificial Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Yuncong Tianfu Artificial Intelligence Technology Co ltd filed Critical Sichuan Yuncong Tianfu Artificial Intelligence Technology Co ltd
Priority to CN202011507519.0A
Publication of CN112528907A
Application granted
Publication of CN112528907B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 — Scenes; Scene-specific elements
    • G06V20/50 — Context or environment of the image
    • G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 — Recognition of crowd images, e.g. recognition of crowd congestion
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 — Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 — Target detection

Abstract

An anchor frame generation method, an annotation frame adaptation method, corresponding devices and a computer storage medium are disclosed. The anchor frame generation method comprises: converting a target image comprising grid cells into a target feature sub-image comprising target positions; dividing each grid cell into a plurality of grid sub-units according to an anchor frame fineness parameter; and generating, in each grid sub-unit, anchor frames satisfying an anchor frame unit number parameter. The annotation frame adaptation method comprises: converting an image to be detected carrying an annotation frame into a feature sub-image to be detected; determining, according to the center position of the annotation frame and a preset step length, the grid sub-unit matched with the annotation frame as the grid sub-unit to be adapted; and calculating, for each anchor frame generated in the grid sub-unit to be adapted, the adaptation degree between that anchor frame and the annotation frame, so as to determine the anchor frame adapted to the annotation frame. In this way, the detection rate of small targets can be increased and the positioning accuracy of annotation frames can be improved.

Description

Anchor frame generation and label frame adaptation method and device and computer storage medium
Technical Field
Embodiments of the present application relate to image detection technology, and in particular to an anchor frame generation method, an annotation frame adaptation method, corresponding devices and a computer storage medium.
Background
In the fully-structured pedestrian detection task, the detection model is usually deployed at elevated camera positions such as traffic lanes and building entrances, so the pedestrian targets to be detected are often small; pedestrians with a height below 60 pixels are especially common, and their faces and heads are smaller still. In a pedestrian detection test set, human bodies with a height below 60 pixels account for 63.46% of all human bodies, while faces and heads with a height below 30 pixels account for 73.81% and 70.71% of all faces and heads, respectively.
Therefore, in the fully-structured pedestrian detection task, improving the recall rate of small targets (that is, the ratio of correctly detected targets to all targets) is one of the most important factors for improving the overall performance of the model.
In addition, the inference speed of the pedestrian detection model (for example, achieving real-time inference) is one of its most fundamental requirements. How to increase the recall rate of small targets while keeping the model's inference speed and video memory footprint acceptable is therefore the technical problem to be solved by the present application.
Disclosure of Invention
In view of the foregoing, the present application provides an anchor frame generation method, an annotation frame adaptation method, corresponding devices and a computer storage medium, so as to overcome or at least partially solve the foregoing problems.
The first aspect of the present application provides an anchor frame generation method, which includes: converting a target image comprising a plurality of grid cells based on a first preset feature conversion rule to obtain a target feature sub-image comprising a plurality of target positions, wherein each target position in the target feature sub-image corresponds to a grid cell in the target image; dividing each grid cell corresponding to each target position into a plurality of grid sub-units according to an anchor frame fineness parameter; and generating, in each grid sub-unit, anchor frames satisfying an anchor frame unit number parameter.
A second aspect of the present application provides a computer storage medium having stored therein instructions for performing each of the steps of the anchor frame generation method described in the first aspect above.
The third aspect of the present application provides an annotation frame adaptation method, which includes: converting an image to be detected carrying an annotation frame based on a second preset feature conversion rule to obtain a feature sub-image to be detected containing the annotation frame; determining, from the grid sub-units, the grid sub-unit matched with the annotation frame as the grid sub-unit to be adapted, according to the center position of the annotation frame in the feature sub-image to be detected and a preset step length; and calculating, for each anchor frame generated in the grid sub-unit to be adapted, the adaptation degree between that anchor frame and the annotation frame to obtain an adaptation value corresponding to each anchor frame, and determining the anchor frame adapted to the annotation frame according to the adaptation values corresponding to the anchor frames.
A fourth aspect of the present application provides a computer storage medium having stored therein instructions for performing each of the steps of the annotation frame adaptation method described in the third aspect above.
A fifth aspect of the present application provides an anchor frame generation device, which includes: a first conversion module, configured to convert a target image containing a plurality of grid cells based on a first preset feature conversion rule to obtain a target feature sub-image containing a plurality of target positions, wherein each target position in the target feature sub-image corresponds to a grid cell in the target image; a dividing module, configured to divide each grid cell corresponding to each target position into a plurality of grid sub-units according to an anchor frame fineness parameter; and an anchor frame generation module, configured to generate, in each grid sub-unit, anchor frames satisfying an anchor frame unit number parameter.
A sixth aspect of the present application provides an annotation frame adaptation device, which includes: a second conversion module, configured to convert an image to be detected carrying an annotation frame based on a second preset feature conversion rule to obtain a feature sub-image to be detected containing the annotation frame; and an adaptation module, configured to determine, from the grid sub-units, the grid sub-unit matched with the annotation frame as the grid sub-unit to be adapted according to the center position of the annotation frame in the feature sub-image to be detected and a preset step length, to calculate, for each anchor frame generated in the grid sub-unit to be adapted, the adaptation degree between that anchor frame and the annotation frame to obtain an adaptation value corresponding to each anchor frame, and to determine the anchor frame adapted to the annotation frame according to the adaptation values corresponding to the anchor frames.
According to the above technical solutions, the anchor frame generation method and device, the annotation frame adaptation method and device, and the computer storage medium provided by the present application can increase the detection rate of small targets and improve the positioning accuracy of annotation frames. Furthermore, the anchor frame generation and annotation frame adaptation techniques provided by the embodiments of the present application offer a high processing speed and a low device operation load.
In addition, the present application can meet different detection precision requirements by adjusting the anchor frame fineness parameter, and can be adapted to various anchor-frame-based target detection algorithms, thereby offering a wide application range and high flexibility.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an anchor frame generation method according to a first embodiment of the present application.
Fig. 2 is a schematic flow chart of an annotation frame adaptation method according to a third embodiment of the present application.
Fig. 3 is a schematic architecture diagram of an anchor frame generation device according to a fifth embodiment of the present application.
Fig. 4 is a schematic architecture diagram of an annotation frame adaptation device according to a sixth embodiment of the present application.
Element labels
300: anchor frame generation device; 301: first conversion module; 302: dividing module; 303: anchor frame generation module; 400: annotation frame adaptation device; 401: second conversion module; 402: adaptation module
Detailed Description
In order to better understand the technical solutions in the embodiments of the present application, these technical solutions are described clearly and completely below with reference to the drawings of the embodiments. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application shall fall within the protection scope of the embodiments of the present application.
Currently, for conventional single-stage detectors, the following two approaches are generally adopted to increase the small-target detection rate:
The first idea is to enlarge the input image. Experiments show that enlarging the input image by the same factor in both the training and testing stages brings an obvious gain to the detection model; when the input image size is doubled, mAP can rise by more than 5 points. However, doubling the image width and height also doubles the size of the feature map at every stage, so the amount of computation becomes, in theory, about four times the original (roughly four times in actual speed measurements as well), and the inference time increases greatly.
The second idea is to use a large-resolution feature map for detection. Current pedestrian detectors use the feature layers of the 2nd, 3rd and 4th stages of the backbone network (corresponding to 1/8, 1/16 and 1/32 of the input image resolution, respectively). For example, when the input image size is 640×384, the largest feature map used for detection is only 80×48. To improve the small-target detection rate, a 1/4-resolution feature map can be fed to the detection head, but because the semantic information of such a low-stage feature map is weak, the effect is not ideal and the gain for the detection task is limited.
In view of the above technical problems in the fully-structured pedestrian detection task, the present application provides an anchor frame generation method, an annotation frame adaptation method, corresponding devices and a computer storage medium, which can effectively improve the recall rate of small targets while keeping the model's inference speed and video memory footprint acceptable. Embodiments of the present application are further described below with reference to the accompanying drawings.
First embodiment
Fig. 1 is a flow chart illustrating an anchor frame generating method according to a first embodiment of the present application.
The anchor frame generation method of this embodiment is applicable to various anchor-frame-based target detection algorithms, and is particularly suitable for the third-generation YOLO algorithm (YOLOv3).
As shown in the figure, the anchor frame generating method in the embodiment of the present application mainly includes:
step S101, performing conversion on a target image including a plurality of grid cells based on a first preset feature conversion rule, to obtain a target feature sub-image including a plurality of target positions, where each target position in the target feature sub-image corresponds to each grid cell in the target image.
Optionally, the first preset feature transformation rule is, for example, a 1/8 resolution transformation rule, a 1/16 resolution transformation rule, a 1/32 resolution transformation rule, and the like, and the target feature sub-image generated based on the first preset feature transformation rule is, for example, a 1/8 resolution feature map, a 1/16 resolution feature map, a 1/32 resolution feature map, and the like.
Step S102, dividing each grid unit corresponding to each target position into a plurality of grid sub-units according to the anchor frame fineness parameters.
In this embodiment, the anchor frame fineness parameter is between 2 and 4, i.e. any one of 2, 3 and 4.
Specifically, the number of grid sub-units to be generated may be determined as the square of the anchor frame fineness parameter, and each grid cell may be equally divided according to this number to obtain the grid sub-units.
For example, when the anchor frame fineness parameter is 2, each grid cell is equally divided into 4 grid sub-units; when the anchor frame fineness parameter is 3, each grid cell is equally divided into 9 grid sub-units; and when the anchor frame fineness parameter is 4, each grid cell is equally divided into 16 grid sub-units, as sketched below.
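For illustration only (the patent provides no code), the following minimal Python sketch shows this equal division and the resulting sub-unit center positions, assuming the top-left vertex of a grid cell as the origin and a cell side length of 1; the helper name is an assumption.

```python
# Illustrative helper (assumed name): centers of the grid sub-units inside one
# grid cell, taking the cell's top-left vertex as the origin and the cell side as 1.
def grid_subunit_centers(fineness: int):
    """fineness is the anchor frame fineness parameter (2, 3 or 4);
    each grid cell is split into fineness ** 2 equal grid sub-units."""
    side = 1.0 / fineness
    return [((col + 0.5) * side, (row + 0.5) * side)
            for row in range(fineness)
            for col in range(fineness)]

# fineness = 2 -> [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
print(grid_subunit_centers(2))
```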
And step S103, generating anchor frames meeting the anchor frame unit number parameters in each grid subunit according to the anchor frame unit number parameters.
Optionally, the anchor frame unit number parameter is at least one.
In this embodiment, the center position of at least one anchor frame generated in the grid sub-unit may be determined according to the center position of the grid sub-unit, and the aspect ratio of at least one anchor frame generated in the grid sub-unit may be determined based on a clustering algorithm.
Alternatively, when the anchor frame unit number parameter is at least two, the plurality of anchor frames generated in each grid sub-unit have the same center position but different aspect ratios.
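The clustering algorithm is not specified further here; a common choice for anchor-based detectors, and the one used by the YOLO family, is k-means over the annotated box sizes with a 1 − IoU distance. The sketch below assumes that approach and is illustrative only; the function name is not from the patent.

```python
import numpy as np

# Illustrative sketch (assumed approach): k-means over annotation-box
# (width, height) pairs with a 1 - IoU distance, as commonly used by
# YOLO-style detectors to pick anchor shapes.
def kmeans_anchor_shapes(box_wh: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """box_wh: (N, 2) array of annotation-box widths and heights; returns k (w, h) shapes."""
    def iou_wh(wh, centers):
        inter = (np.minimum(wh[:, None, 0], centers[None, :, 0]) *
                 np.minimum(wh[:, None, 1], centers[None, :, 1]))
        union = (wh[:, 0] * wh[:, 1])[:, None] + centers[:, 0] * centers[:, 1] - inter
        return inter / union

    centers = box_wh[np.random.choice(len(box_wh), k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmax(iou_wh(box_wh, centers), axis=1)  # nearest = highest IoU
        for j in range(k):
            if np.any(assign == j):
                centers[j] = box_wh[assign == j].mean(axis=0)
    return centers
```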
The anchor frame generation method of the embodiment of the present application will be exemplarily described below by taking a third generation YOLO algorithm as an example.
In the current third-generation YOLO algorithm, 3 anchor frames with different height-to-width ratios are used at each target position of the target feature sub-image (e.g. the 1/8-resolution, 1/16-resolution or 1/32-resolution feature map), i.e. at each grid cell of the target image. The anchor frame unit number parameter can therefore be set to 3, but this is not a limitation: the anchor frame unit number parameter can be adjusted arbitrarily according to the actual detection requirements, which the present application does not restrict.
For example, when the anchor frame fineness parameter is set to 2, the number of grid sub-units to be generated is determined as its square, 4, and each grid cell in the target image is equally divided into 4 grid sub-units. Taking the upper-left vertex of each grid cell as the coordinate origin, the center positions of the 4 grid sub-units generated in each grid cell are (0.25, 0.25), (0.25, 0.75), (0.75, 0.25) and (0.75, 0.75), respectively. Then, according to the anchor frame unit number parameter, 3 anchor frames with the same center position and different height-to-width ratios are generated in each of the 4 grid sub-units; the height-to-width ratios may follow those of the third-generation YOLO algorithm or be set as required. Accordingly, with the anchor frame generation method of the present application, 12 anchor frames can be generated at each target position of the target feature sub-image (i.e. at each grid cell of the target image), as sketched below.
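Continuing the worked example, a minimal sketch that reuses the helper above; the three (width, height) shapes are placeholder values (e.g. as produced by a clustering step such as kmeans_anchor_shapes above), not values prescribed by the patent.

```python
# Illustrative sketch of the worked example: fineness = 2, unit number = 3,
# i.e. 3 anchor frames per grid sub-unit and 12 per grid cell. The (width, height)
# shapes below are placeholders, not values prescribed by the patent.
def anchors_for_cell(fineness=2, shapes=((0.4, 0.8), (0.6, 0.6), (0.3, 1.2))):
    anchors = []
    for cx, cy in grid_subunit_centers(fineness):
        for w, h in shapes:
            anchors.append((cx, cy, w, h))  # same center, different height-to-width ratio
    return anchors

print(len(anchors_for_cell()))  # 12 anchor frames per grid cell
```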
Optionally, the anchor frame generation method of this embodiment is performed by means of a CNN model.
In this embodiment, the convolution output channel of the CNN model is determined based on the anchor frame fineness parameter and the anchor frame unit number parameter.
Specifically, taking an anchor frame fineness parameter of 2 and an anchor frame unit number parameter of 3 as an example, the number of convolution output channels of the CNN model is a × (1 + n² + C), where a corresponds to the anchor frame unit number parameter; n² is the number of coordinate-information channels (n being the anchor frame fineness parameter), used to identify the coordinate information of each predicted grid sub-unit; 1 is a confidence channel, used to identify the confidence score that a detected object is present at the target position of the target feature sub-image (i.e. the grid cell of the target image); and C is a category channel, typically set to 1, used to identify the category of the detected object.
For example, taking the structured pedestrian detection model as an example: since a multi-branch detection head is adopted, C of each individual branch is set to 1. When the anchor frame unit number parameter a is set to 3 and the anchor frame fineness parameter n is set to 2, the number of convolution output channels of the CNN model is 18; when a is set to 12 and n is set to 2, the number of convolution output channels is 72, as computed in the sketch below.
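As an illustrative check of this channel arithmetic (the formula a × (1 + n² + C) is reconstructed here from the example figures 18 and 72 above):

```python
# Illustrative check of the convolution output-channel count a * (1 + n**2 + C):
# a = anchor frame unit number parameter, n = anchor frame fineness parameter
# (giving n**2 coordinate-information channels), C = category channels per branch.
def conv_output_channels(a: int, n: int, c: int = 1) -> int:
    return a * (1 + n ** 2 + c)

print(conv_output_channels(a=3, n=2))   # 18
print(conv_output_channels(a=12, n=2))  # 72
```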
Compared with existing anchor-frame-based target detection algorithms, the anchor frame generation method provided by this embodiment achieves a finer anchor frame generation scheme while adding only a small number of channels and a small amount of computation, and thereby helps improve the detection rate of small targets (especially faces and heads).
Second embodiment
A second embodiment of the present application provides a computer storage medium having stored therein instructions for performing each of the steps of the anchor frame generating method described in the first embodiment.
Third embodiment
Fig. 2 is a schematic flow chart of an annotation frame adaptation method according to a third embodiment of the present application. The annotation frame adaptation method of this embodiment builds on the anchor frame generation method described in the first embodiment, and mainly includes the following steps:
step S201, performing conversion on the image to be detected with the labeling frame based on a second preset feature conversion rule to obtain a sub-image of the feature to be detected containing the labeling frame.
Optionally, the second preset feature transformation rule is, for example, a 1/8 resolution transformation rule, a 1/16 resolution transformation rule, a 1/32 resolution transformation rule, and the like, and the corresponding feature sub-image to be detected is, for example, a 1/8 resolution feature map, a 1/16 resolution feature map, a 1/32 resolution feature map, and the like.
Step S202, according to the central position of the labeling frame of the characteristic sub-image to be detected and the preset step length, determining one grid sub-unit matched with the labeling frame from the grid sub-units as the grid sub-unit to be matched.
In this embodiment, the preset step length is determined based on the image size of the image to be detected and the size of the feature sub-image to be detected.
For example, when the size of the image to be detected is 640×640 and the size of the feature sub-image to be detected is 80×80, the preset step length is 8.
In this embodiment, the coordinate values of the center position of the annotation frame may be divided by the preset step length to obtain the corresponding quotients; the grid sub-unit to be adapted, which is matched with the annotation frame, is then determined according to the fractional parts of the quotients and the anchor frame fineness parameter, as in the examples below and the sketch that follows them.
For example, when each grid cell in the target image is equally divided into 4 grid sub-units, after the coordinate values of the center position of the annotation frame are divided by the preset step length: if the fractional parts of both X and Y are smaller than 0.5, the grid sub-unit at the upper-left corner of the grid cell is determined as the grid sub-unit to be adapted; if the fractional part of X is smaller than 0.5 and that of Y is larger than 0.5, the grid sub-unit at the upper-right corner of the grid cell is determined as the grid sub-unit to be adapted; if the fractional part of X is larger than 0.5 and that of Y is smaller than 0.5, the grid sub-unit at the lower-left corner of the grid cell is determined as the grid sub-unit to be adapted; and if the fractional parts of both X and Y are larger than 0.5, the grid sub-unit at the lower-right corner of the grid cell is determined as the grid sub-unit to be adapted.
For another example, when each grid cell in the target image is equally divided into 9 grid sub-units, after the coordinate values of the center position of the annotation frame are divided by the preset step length, the fractional parts of X and Y are each assigned to one of the intervals [0, 1/3), [1/3, 2/3) and [2/3, 1), which determines the grid sub-unit to be adapted; the details are not repeated here.
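A convention-neutral Python sketch of this lookup (the helper name and return format are assumptions; it reports grid-cell and sub-unit indices rather than corner names, since the corner naming above follows the patent's own axis convention):

```python
# Illustrative sketch (assumed helper): map an annotation-frame center to its grid
# cell and matched grid sub-unit, using the fractional part of (center / step)
# together with the anchor frame fineness parameter.
def match_subunit(center_x: float, center_y: float, step: float, fineness: int):
    qx, qy = center_x / step, center_y / step
    cell = (int(qx), int(qy))                       # grid cell containing the center
    fx, fy = qx - int(qx), qy - int(qy)             # fractional parts in [0, 1)
    sub = (min(int(fx * fineness), fineness - 1),   # sub-unit index along each axis
           min(int(fy * fineness), fineness - 1))
    return cell, sub

# 640x640 image with an 80x80 feature sub-image gives step = 640 / 80 = 8:
print(match_subunit(100.0, 52.0, step=8, fineness=2))  # ((12, 6), (1, 1))
```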
Step S203, calculating, for each anchor frame generated in the grid sub-unit to be adapted, the adaptation degree between that anchor frame and the annotation frame to obtain an adaptation value corresponding to each anchor frame, and determining the anchor frame adapted to the annotation frame according to the adaptation values corresponding to the anchor frames.
In this embodiment, the IoU (intersection over union) between each anchor frame and the annotation frame is calculated from the diagonal-corner coordinate values of that anchor frame and of the annotation frame, so as to obtain an IoU value corresponding to each anchor frame; the anchor frame with the maximum of these IoU values is determined as the anchor frame adapted to the annotation frame, as sketched below.
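A minimal sketch of this IoU-based adaptation step, with each box given by its diagonal corners (x1, y1, x2, y2); the function names are assumptions for illustration.

```python
# Illustrative sketch: IoU of two boxes given by diagonal corners (x1, y1, x2, y2),
# and selection of the anchor frame with the maximum IoU against the annotation frame.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def best_anchor(anchor_boxes, annotation_box):
    """Return the index and IoU value of the anchor frame adapted to the annotation frame."""
    scores = [iou(a, annotation_box) for a in anchor_boxes]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```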
It should be noted that, besides the third-generation YOLO algorithm described above, the present application can also be adapted to other common anchor-frame-based target detection algorithms, including but not limited to RetinaNet.
In summary, this embodiment designs an annotation frame adaptation method corresponding to the anchor frame generation method of the first embodiment, thereby implementing a finer annotation-frame-to-anchor-frame adaptation scheme, which increases the positioning accuracy of annotation frames and thus improves the detection rate of small targets.
Furthermore, the annotation frame adaptation method of this embodiment adds only a small amount of computation time and memory consumption, and can meet the technical requirement of real-time inference.
Fourth embodiment
A fourth embodiment of the present application provides a computer storage medium storing instructions for executing each of the steps of the annotation frame adaptation method described in the third embodiment.
Fifth embodiment
Fig. 3 is a schematic architecture diagram of an anchor frame generation device according to a fifth embodiment of the present application. As shown in the figure, the anchor frame generation device 300 of the present application mainly includes a first conversion module 301, a dividing module 302 and an anchor frame generation module 303.
The first conversion module 301 is configured to perform conversion on a target image including a plurality of grid cells based on a first preset feature conversion rule, and obtain a target feature sub-image including a plurality of target positions, where each target position in the target feature sub-image corresponds to each grid cell in the target image.
Optionally, the first preset feature transformation rule includes at least one of a 1/8 resolution transformation rule, a 1/16 resolution transformation rule, and a 1/32 resolution transformation rule, and the target feature sub-image includes at least one of a 1/8 resolution feature map, a 1/16 resolution feature map, and a 1/32 resolution feature map.
The dividing module 302 is configured to divide each grid unit corresponding to each target position into a plurality of grid sub-units according to the anchor frame fineness parameter.
Optionally, the dividing module 302 further determines the generation number of grid sub-units as the square of the anchor frame fineness parameter, and equally divides each grid cell according to the generation number of grid sub-units to obtain the grid sub-units.
Optionally, the anchor frame fineness parameter is between 2 and 4, and the anchor frame unit number parameter is at least one.
Optionally, the dividing module 302 is further configured to determine the center position of at least one anchor frame generated in a grid sub-unit according to the center position of that grid sub-unit, and to determine the aspect ratio of the at least one anchor frame generated in the grid sub-unit based on a clustering algorithm.
Optionally, the dividing module 302 is further configured to generate, according to the anchor frame unit number parameter, at least two anchor frames with the same center position and different height-to-width ratios in each grid sub-unit.
The anchor frame generating module 303 is configured to generate, in each of the grid sub-units, an anchor frame that meets the anchor frame unit number parameter according to the anchor frame unit number parameter.
Optionally, the apparatus includes a CNN model, wherein a convolution output channel of the CNN model is determined based on the anchor frame fineness parameter and the anchor frame unit number parameter.
In addition, the anchor frame generation device 300 of this embodiment of the present application may also be used to implement other steps in the foregoing anchor frame generation method embodiments, and has the beneficial effects of the corresponding method steps, which are not repeated here.
Sixth embodiment
Fig. 4 is a schematic architecture diagram of an annotation frame adaptation device according to a sixth embodiment of the present application. As shown in the figure, the annotation frame adaptation device 400 mainly includes a second conversion module 401 and an adaptation module 402.
The second conversion module 401 is configured to convert the image to be detected carrying the annotation frame based on a second preset feature conversion rule, so as to obtain a feature sub-image to be detected containing the annotation frame.
Optionally, the second preset feature conversion rule at least includes one of a 1/8 resolution conversion rule, a 1/16 resolution conversion rule, and a 1/32 resolution conversion rule, and the feature sub-image to be detected at least includes one of a 1/8 resolution feature map, a 1/16 resolution feature map, and a 1/32 resolution feature map.
The adaptation module 402 is configured to determine, from the grid sub-units, the grid sub-unit matched with the annotation frame as the grid sub-unit to be adapted, according to the center position of the annotation frame in the feature sub-image to be detected and a preset step length; to calculate, for each anchor frame generated in the grid sub-unit to be adapted, the adaptation degree between that anchor frame and the annotation frame, so as to obtain an adaptation value corresponding to each anchor frame; and to determine the anchor frame adapted to the annotation frame according to the adaptation values corresponding to the anchor frames.
Optionally, the adaptation module 402 is further configured to determine the preset step length based on the image size of the image to be detected and the size of the feature sub-image to be detected.
Optionally, the adaptation module 402 is further configured to divide the coordinate values of the center position of the annotation frame by the preset step length to obtain the corresponding quotients, and to determine the grid sub-unit to be adapted, which is matched with the annotation frame, according to the fractional parts of the quotients and the anchor frame fineness parameter.
Optionally, the adaptation module 402 is further configured to calculate the IoU between each anchor frame and the annotation frame from the diagonal-corner coordinate values of that anchor frame and of the annotation frame, so as to obtain an IoU value corresponding to each anchor frame.
Optionally, the adaptation module 402 is further configured to determine the anchor frame adapted to the annotation frame according to the maximum of the IoU values corresponding to the anchor frames.
In addition, the annotation frame adaptation device 400 of this embodiment of the present application may also be used to implement other steps in the foregoing annotation frame adaptation method embodiments, and has the beneficial effects of the corresponding method steps, which are not repeated here.
In summary, the anchor frame generation and annotation frame adaptation techniques provided in the embodiments of the present application can be applied to various anchor-frame-based target detection algorithms and improve the positioning accuracy of the annotation-frame-to-anchor-frame matching, so as to increase the detection rate of small targets (especially heads and faces).
Furthermore, the anchor frame generation and annotation frame adaptation techniques provided by the embodiments of the present application add only a small amount of computation time and memory consumption, and can meet the technical requirements of real-time inference.
In addition, the anchor frame fineness parameter and the anchor frame unit number parameter can be flexibly adjusted according to the actual detection precision requirements, giving the scheme a wide application range and high operational flexibility.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the embodiments of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (17)

1. An anchor frame generation method, comprising:
performing conversion on a target image comprising a plurality of grid cells based on a first preset feature conversion rule to obtain a target feature sub-image comprising a plurality of target positions, wherein each target position in the target feature sub-image corresponds to each grid cell in the target image; the first preset feature conversion rule at least comprises one of a 1/8 resolution conversion rule, a 1/16 resolution conversion rule and a 1/32 resolution conversion rule;
dividing each grid unit corresponding to each target position into a plurality of grid sub-units according to the anchor frame fineness parameters; and
and generating anchor frames meeting the anchor frame unit number parameters in each grid subunit according to the anchor frame unit number parameters.
2. The anchor frame generation method according to claim 1, wherein the dividing each of the grid cells corresponding to each of the target positions into a plurality of grid sub-cells according to an anchor frame fineness parameter comprises:
determining the generation quantity of the grid sub-units according to the square value of the anchor frame fineness parameter;
and equally dividing each grid cell according to the generated number of the grid sub-cells to obtain each grid sub-cell.
3. The anchor frame generation method according to claim 2, wherein the anchor frame fineness parameter is between 2 and 4, and the anchor frame unit number parameter is at least one.
4. The anchor frame generation method according to claim 3, wherein the generating anchor frames satisfying the anchor frame unit number parameter in each of the mesh sub-units according to the anchor frame unit number parameter comprises:
determining the central position of at least one anchor frame generated in the grid sub-unit according to the central position of the grid sub-unit;
an aspect ratio of at least one of the anchor boxes generated in the grid sub-unit is determined based on a clustering algorithm.
5. The anchor frame generation method according to claim 4, wherein the anchor frame unit number parameter is at least two, and generating anchor frames satisfying the anchor frame unit number parameter in each of the grid sub-units according to the anchor frame unit number parameter specifically comprises:
generating, according to the anchor frame unit number parameter, at least two anchor frames with the same center position and different height-to-width ratios in each of the grid sub-units.
6. The anchor frame generation method according to claim 1, wherein the target feature sub-image includes at least one of a 1/8 resolution feature map, a 1/16 resolution feature map, and a 1/32 resolution feature map.
7. The anchor frame generation method according to claim 1, wherein the method is applied to an anchor frame-based object detection algorithm.
8. The anchor frame generation method according to claim 1, wherein the method is performed by a CNN model, wherein a convolution output channel of the CNN model is determined based on the anchor frame fineness parameter and the anchor frame unit number parameter.
9. A method of annotation frame adaptation, characterized in that the method is applied to the anchor frame generation method according to any one of claims 1 to 8, and comprises:
performing conversion on the image to be detected carrying the annotation frame based on a second preset feature conversion rule to obtain a feature sub-image to be detected containing the annotation frame; the second preset feature conversion rule at least comprises one of a 1/8 resolution conversion rule, a 1/16 resolution conversion rule and a 1/32 resolution conversion rule;
determining, from the grid sub-units, one grid sub-unit matched with the annotation frame as the grid sub-unit to be adapted, according to the center position of the annotation frame in the feature sub-image to be detected and a preset step length; and
calculating, for each anchor frame generated in the grid sub-unit to be adapted, the adaptation degree between that anchor frame and the annotation frame to obtain an adaptation value corresponding to each anchor frame, and determining one anchor frame adapted to the annotation frame according to the adaptation values corresponding to the anchor frames.
10. The method of claim 9, wherein the feature sub-images to be detected comprise at least one of a 1/8 resolution feature map, a 1/16 resolution feature map, and a 1/32 resolution feature map.
11. The method of annotation frame adaptation as claimed in claim 9, further comprising:
and determining the preset step length based on the image size of the image to be detected and the size of the characteristic sub-image to be detected.
12. The annotation frame adaptation method according to claim 11, wherein determining, from the grid sub-units, one grid sub-unit matched with the annotation frame as the grid sub-unit to be adapted according to the center position of the annotation frame and the preset step length comprises:
dividing the coordinate values of the center position of the annotation frame by the preset step length to obtain the corresponding quotients;
and determining the grid sub-unit to be adapted, which is matched with the annotation frame, according to the fractional parts of the quotients and the anchor frame fineness parameter.
13. The annotation frame adaptation method according to claim 9, wherein calculating the adaptation degree between each anchor frame and the annotation frame and obtaining an adaptation value corresponding to each anchor frame comprises:
calculating the IoU between each anchor frame and the annotation frame according to the diagonal-corner coordinate values of that anchor frame and the diagonal-corner coordinate values of the annotation frame, so as to obtain an IoU value corresponding to each anchor frame.
14. The annotation frame adaptation method according to claim 13, wherein determining one anchor frame adapted to the annotation frame according to the adaptation values corresponding to the anchor frames comprises:
determining one anchor frame matched with the annotation frame according to the maximum of the IoU values corresponding to the anchor frames.
15. A computer storage medium having stored therein instructions for performing the steps of the anchor frame generation method according to any one of claims 1 to 8 or instructions for performing the steps of the annotation frame adaptation method according to any one of claims 9 to 14.
16. An anchor frame generation device, the device comprising:
a first conversion module, which performs conversion on a target image containing a plurality of grid cells based on a first preset feature conversion rule, and obtains a target feature sub-image containing a plurality of target positions, wherein each target position in the target feature sub-image corresponds to each grid cell in the target image; the first preset feature conversion rule at least comprises one of a 1/8 resolution conversion rule, a 1/16 resolution conversion rule and a 1/32 resolution conversion rule;
the dividing module divides each grid unit corresponding to each target position into a plurality of grid sub-units according to the anchor frame fineness parameters;
and the anchor frame generation module is used for generating anchor frames meeting the anchor frame unit number parameters in each grid subunit according to the anchor frame unit number parameters.
17. An annotation frame adaptation device for use with the anchor frame generation device of claim 16, comprising:
a second conversion module, configured to perform conversion on the image to be detected carrying the annotation frame based on a second preset feature conversion rule to obtain a feature sub-image to be detected containing the annotation frame, wherein the second preset feature conversion rule at least comprises one of a 1/8 resolution conversion rule, a 1/16 resolution conversion rule and a 1/32 resolution conversion rule; and
an adaptation module, configured to determine, from the grid sub-units, one grid sub-unit matched with the annotation frame as the grid sub-unit to be adapted according to the center position of the annotation frame in the feature sub-image to be detected and a preset step length, to calculate, for each anchor frame generated in the grid sub-unit to be adapted, the adaptation degree between that anchor frame and the annotation frame so as to obtain an adaptation value corresponding to each anchor frame, and to determine one anchor frame adapted to the annotation frame according to the adaptation values corresponding to the anchor frames.
CN202011507519.0A 2020-12-18 2020-12-18 Anchor frame generation and label frame adaptation method and device and computer storage medium Active CN112528907B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011507519.0A CN112528907B (en) 2020-12-18 2020-12-18 Anchor frame generation and label frame adaptation method and device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011507519.0A CN112528907B (en) 2020-12-18 2020-12-18 Anchor frame generation and label frame adaptation method and device and computer storage medium

Publications (2)

Publication Number Publication Date
CN112528907A CN112528907A (en) 2021-03-19
CN112528907B (en) 2024-04-09

Family

ID=75001664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011507519.0A Active CN112528907B (en) 2020-12-18 2020-12-18 Anchor frame generation and label frame adaptation method and device and computer storage medium

Country Status (1)

Country Link
CN (1) CN112528907B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114648685A (en) * 2022-03-23 2022-06-21 成都臻识科技发展有限公司 Method and system for converting anchor-free algorithm into anchor-based algorithm

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107613502A (en) * 2017-09-07 2018-01-19 广东工业大学 A kind of sensor network irregular codes node positioning method and its device
CN108875577A (en) * 2018-05-11 2018-11-23 深圳市易成自动驾驶技术有限公司 Object detection method, device and computer readable storage medium
US10438082B1 (en) * 2018-10-26 2019-10-08 StradVision, Inc. Learning method, learning device for detecting ROI on the basis of bottom lines of obstacles and testing method, testing device using the same
CN110796141A (en) * 2019-10-21 2020-02-14 腾讯科技(深圳)有限公司 Target detection method and related equipment
CN110807385A (en) * 2019-10-24 2020-02-18 腾讯科技(深圳)有限公司 Target detection method and device, electronic equipment and storage medium
CN111291637A (en) * 2020-01-19 2020-06-16 中国科学院上海微系统与信息技术研究所 Face detection method, device and equipment based on convolutional neural network
CN111553337A (en) * 2020-04-27 2020-08-18 南通智能感知研究院 Hyperspectral multi-target detection method based on improved anchor frame

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200175384A1 (en) * 2018-11-30 2020-06-04 Samsung Electronics Co., Ltd. System and method for incremental learning

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107613502A (en) * 2017-09-07 2018-01-19 广东工业大学 A kind of sensor network irregular codes node positioning method and its device
CN108875577A (en) * 2018-05-11 2018-11-23 深圳市易成自动驾驶技术有限公司 Object detection method, device and computer readable storage medium
US10438082B1 (en) * 2018-10-26 2019-10-08 StradVision, Inc. Learning method, learning device for detecting ROI on the basis of bottom lines of obstacles and testing method, testing device using the same
CN110796141A (en) * 2019-10-21 2020-02-14 腾讯科技(深圳)有限公司 Target detection method and related equipment
CN110807385A (en) * 2019-10-24 2020-02-18 腾讯科技(深圳)有限公司 Target detection method and device, electronic equipment and storage medium
CN111291637A (en) * 2020-01-19 2020-06-16 中国科学院上海微系统与信息技术研究所 Face detection method, device and equipment based on convolutional neural network
CN111553337A (en) * 2020-04-27 2020-08-18 南通智能感知研究院 Hyperspectral multi-target detection method based on improved anchor frame

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
People Counting Based on Head Detection and Reidentification in Overlapping Cameras System; Shengke Wang et al.; 2018 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC); 2020-01-23; pp. 47-51 *
Research on Object Detection Methods Based on Multi-Scale Refinement Fusion Networks; 王楚轶; China Master's Theses Full-text Database, Information Science and Technology; 2020-08-15 (No. 08); I138-470 *
Research on Object Detection Methods for Road Scenes Based on Deep Learning; 吕致萍; China Master's Theses Full-text Database, Engineering Science and Technology II; 2019-12-15 (No. 12); C035-212 *

Also Published As

Publication number Publication date
CN112528907A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
CN109902677B (en) Vehicle detection method based on deep learning
CN111401376B (en) Target detection method, target detection device, electronic equipment and storage medium
CN109378052B (en) The preprocess method and system of image labeling
CN110991311A (en) Target detection method based on dense connection deep network
CN106845383A (en) People's head inspecting method and device
CN110659664B (en) SSD-based high-precision small object identification method
CN110633610A (en) Student state detection algorithm based on YOLO
WO2019109793A1 (en) Human head region recognition method, device and apparatus
CN110363211A (en) Detect network model and object detection method
CN111310746B (en) Text line detection method, model training method, device, server and medium
CN112528907B (en) Anchor frame generation and label frame adaptation method and device and computer storage medium
CN111126459A (en) Method and device for identifying fine granularity of vehicle
CN107784281A (en) Method for detecting human face, device, equipment and computer-readable medium
CN112800955A (en) Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN112561801A (en) Target detection model training method based on SE-FPN, target detection method and device
CN107818563A (en) A kind of transmission line of electricity bundle spacing space measurement and localization method
CN111046746A (en) License plate detection method and device
CN115240119A (en) Pedestrian small target detection method in video monitoring based on deep learning
CN110866453B (en) Real-time crowd steady state identification method and device based on convolutional neural network
CN115393635A (en) Infrared small target detection method based on super-pixel segmentation and data enhancement
CN111177811A (en) Automatic fire point location layout method applied to cloud platform
CN107423912A (en) A kind of method and system for becoming dynamic mesh dynamic division based on personnel
CN109492697A (en) Picture detects network training method and picture detects network training device
CN114241425A (en) Training method and device of garbage detection model, storage medium and equipment
CN111027551B (en) Image processing method, apparatus and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant