CN116645499A - Method, apparatus, device, storage medium, and program product for determining a bounding box - Google Patents

Method, apparatus, device, storage medium, and program product for determining a bounding box

Info

Publication number
CN116645499A
CN116645499A (Application CN202310691574.7A)
Authority
CN
China
Prior art keywords
target
candidate
region
bounding box
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310691574.7A
Other languages
Chinese (zh)
Inventor
张伟俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Insta360 Innovation Technology Co Ltd
Original Assignee
Insta360 Innovation Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Insta360 Innovation Technology Co Ltd filed Critical Insta360 Innovation Technology Co Ltd
Priority to CN202310691574.7A
Publication of CN116645499A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/13 - Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20021 - Dividing image into blocks, subimages or windows
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to a method, an apparatus, a device, a storage medium and a program product for determining a bounding box. The method comprises the following steps: acquiring a candidate bounding box of an object to be detected in a current picture image and a corresponding candidate region; segmenting the candidate region to obtain a target region in the candidate region; and correcting the candidate bounding box based on the target region to obtain a target bounding box corresponding to the object to be detected. By adopting the method, the target bounding box of the object to be detected can be accurately obtained.

Description

Method, apparatus, device, storage medium, and program product for determining a bounding box
Technical Field
The present application relates to the field of object detection technology, and in particular, to a method, an apparatus, a device, a storage medium, and a program product for determining a bounding box.
Background
With the continuous development of object detection technology, object detection is increasingly applied in various fields. By performing target detection on a target image, a bounding box of an object to be detected in the image can be obtained, and the bounding box can then be used in subsequent processes such as image editing, object tracking, and target recognition.
In the related art, when a target image is displayed through a terminal interactive interface, a user can manually draw a bounding box of an object to be detected in the target image in the terminal interactive interface, and the drawn bounding box is used as a target bounding box of the object to be detected in the target image.
However, the related art method cannot accurately acquire the target bounding box of the object to be detected.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a determination method, apparatus, device, storage medium, and program product for a bounding box that can accurately acquire a target bounding box of an object to be detected.
In a first aspect, the present application provides a method for determining a bounding box, the method comprising:
acquiring a candidate boundary frame and a corresponding candidate region of an object to be detected in a current picture image;
dividing the candidate region to obtain a target region in the candidate region;
and correcting the candidate boundary frames based on the target area to obtain target boundary frames corresponding to the object to be detected.
In one embodiment, segmenting the candidate region to obtain a target region in the candidate region includes:
acquiring reference features of the candidate region;
And dividing the candidate region according to the reference characteristics to obtain a target region in the candidate region.
In one embodiment, obtaining the reference feature of the candidate region of the candidate bounding box includes:
acquiring depth features and gray features of the candidate region;
determining a depth difference value between a target region in the candidate region and a background region outside the candidate bounding box based on the depth features; and determining a gray difference value between the target region in the candidate region and the background region outside the candidate boundary box based on the gray features;
and determining the reference characteristic of the candidate region of the candidate boundary box according to the depth difference value and the gray level difference value.
In one embodiment, determining the reference feature of the candidate region of the candidate bounding box based on the depth difference and the gray scale difference comprises:
obtaining a depth ratio between the depth difference and a background depth average value of a background area; the gray ratio between the gray difference value and the background gray average value of the background area is obtained;
if the depth ratio is larger than the gray ratio, determining the depth characteristic as a reference characteristic of the candidate region; and if the gray scale ratio is larger than the depth ratio, determining the gray scale characteristic as the reference characteristic of the candidate region.
In one embodiment, the dividing the candidate region according to the reference feature to obtain the target region in the candidate region includes:
acquiring a depth value of each pixel point in the candidate region under the condition that the reference feature is a depth feature;
and determining a candidate region corresponding to the pixel point with the depth value larger than the preset depth threshold as a target region.
In one embodiment, the preset depth threshold is determined according to the peak value of the target area and the peak value of other areas except the target area in the candidate area; the peak value of the target area and the peak value of other areas are determined according to the number of pixels corresponding to the depth value.
In one embodiment, the dividing the candidate region according to the reference feature to obtain the target region in the candidate region includes:
under the condition that the reference characteristic is a gray characteristic, analyzing the gray characteristic of the candidate region, and determining gray thresholds of the target region characteristic and other region characteristics except the target region in the candidate region;
and dividing the candidate region by using a gray threshold value to obtain a target region corresponding to the candidate region.
In one embodiment, correcting the candidate bounding box based on the target area to obtain a target bounding box corresponding to the object to be detected includes:
acquiring the centroid of the target area and the area of the target area in the candidate region;
and correcting the candidate boundary frames according to the mass center of the target area and the area of the target area to obtain the target boundary frames corresponding to the object to be detected.
In one embodiment, correcting the candidate bounding box according to the centroid of the target area and the area of the target area to obtain a target bounding box corresponding to the object to be detected includes:
correcting the size and the rotation angle of the candidate bounding box according to the mass center of the target area and the area of the target area to obtain a target bounding box corresponding to the object to be detected; the difference between the internal area of the target bounding box and the target area is less than a preset threshold.
In one embodiment, correcting the size and the rotation angle of the candidate bounding box according to the centroid of the target area and the area of the target area to obtain a target bounding box corresponding to the object to be detected, including:
performing primary correction on the size and the rotation angle of the candidate bounding box according to the mass center of the target area and the area of the target area to obtain a primary corrected candidate bounding box;
acquiring the intersection area of the internal area of the candidate boundary box subjected to primary correction and the target area;
And carrying out iterative correction on the size and the rotation angle of the candidate boundary frame subjected to the primary correction based on the intersection area, and taking the corrected candidate boundary frame as a target boundary frame corresponding to the object to be detected.
In one embodiment, correcting the size and the rotation angle of the candidate bounding box according to the centroid of the target area and the area of the target area to obtain a target bounding box corresponding to the object to be detected, including:
based on the candidate bounding boxes and the mass centers of the target areas, acquiring a plurality of correction bounding boxes corresponding to the target areas in the current correction process;
correcting the size and the rotation angle of each correction boundary frame, and acquiring the intersection area of the internal area of each corrected correction boundary frame and the target area;
and taking the correction boundary frame corresponding to the maximum intersection area in each intersection area as a candidate boundary frame in the next correction process until the corrected candidate boundary frame meets a preset convergence condition, and taking the corrected candidate boundary frame as a target boundary frame corresponding to the object to be detected.
In one embodiment, correcting the size and the rotation angle of the candidate bounding box according to the centroid of the target area and the area of the target area to obtain a target bounding box corresponding to the object to be detected, including:
Acquiring a region boundary frame of a target region;
and carrying out iterative correction on the size and the rotation angle of the candidate bounding box according to the area bounding box, the mass center of the target area and the area of the target area to obtain a target bounding box corresponding to the object to be detected.
In one embodiment, obtaining a candidate bounding box of an object to be detected in a current frame image includes:
responding to a target detection instruction of a current picture image, and acquiring an initial boundary frame of an object to be detected in the current picture image according to target point coordinates carried in the target detection instruction;
performing target detection on the current picture image by using a target detection model to obtain a plurality of detection frames;
and determining candidate boundary frames of the object to be detected according to each detection frame and the initial boundary frame.
In one embodiment, obtaining an initial bounding box of the object to be detected in the current picture image according to the target point coordinates carried in the target detection instruction includes:
dividing the current picture image to obtain a plurality of super-pixel areas;
determining a target super-pixel region according to each super-pixel region and the coordinates of the target point;
an initial bounding box of the object to be detected is determined based on the target superpixel region.
In one embodiment, determining the target superpixel region based on each superpixel region and the target point coordinates includes:
screening out a super-pixel area corresponding to the coordinate of the target point from each super-pixel area;
clustering each super-pixel region according to a preset clustering rule to obtain super-pixel regions of the object to be detected;
post-processing is carried out on the super-pixel area of the object to be detected, so that a target super-pixel area is obtained; the post-processing operations include at least one of a filling operation and a filtering operation.
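Purely as an illustration of how the superpixel steps above might be realized, the following sketch uses SLIC from scikit-image and simple morphological post-processing; SLIC, the segment count, and keeping only the superpixel under the clicked target point (rather than clustering several superpixels of the object) are assumptions, not requirements of this application.

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import slic

def target_superpixel_mask(image, target_point, n_segments=200):
    """Return a binary mask for the superpixel region containing the clicked point.

    image: HxWx3 RGB array; target_point: (x, y) coordinates carried in the
    target detection instruction. n_segments is a hypothetical tuning value.
    """
    labels = slic(image, n_segments=n_segments, compactness=10, start_label=0)
    x, y = target_point
    mask = labels == labels[int(y), int(x)]              # superpixel under the click
    mask = ndimage.binary_fill_holes(mask)               # filling operation
    mask = ndimage.binary_opening(mask, iterations=1)    # filtering operation
    return mask
```

The bounding rectangle of the non-zero pixels of this mask could then serve as the initial bounding box of the object to be detected.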
In one embodiment, determining a candidate bounding box of the object to be detected based on each detection box and the initial bounding box includes:
obtaining the distance between each detection frame and the initial boundary frame;
if all of the distances are larger than a preset distance threshold, determining the initial bounding box as the candidate bounding box; and if the minimum distance among the distances is smaller than or equal to the preset distance threshold, determining the detection frame corresponding to the minimum distance as the candidate bounding box.
In one embodiment, obtaining the distance between each detection box and the initial bounding box includes:
for any detection frame, acquiring the intersection ratio of the detection frame and the initial boundary frame;
Based on the intersection ratio, a distance between the detection box and the initial bounding box is determined.
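A sketch of one common realization of the intersection ratio (IoU) and the derived distance; mapping the distance to 1 - IoU is an assumption, since the text only states that the distance is determined based on the intersection ratio.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def box_distance(detection_box, initial_box):
    # Assumed mapping: the larger the overlap, the smaller the distance.
    return 1.0 - iou(detection_box, initial_box)
```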
In one embodiment, obtaining a candidate region corresponding to a candidate bounding box includes:
obtaining the expansion value of the candidate boundary frame;
performing outer expansion processing on the candidate boundary frames based on the outer expansion values to obtain outer expansion boundary frames;
the inner region of the expanded bounding box is determined as a candidate region.
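A minimal sketch of the expansion step, assuming axis-aligned boxes given as (x1, y1, x2, y2); the exact form of the expansion value is not fixed by the text, so the ratio-based interpretation below is an assumption.

```python
def expand_bounding_box(box, expand_ratio, image_width, image_height):
    """Expand a candidate bounding box outward and clip it to the image.

    expand_ratio: assumed to be the fraction of the box width/height added on each side.
    The interior of the returned box is used as the candidate region.
    """
    x1, y1, x2, y2 = box
    dw = (x2 - x1) * expand_ratio
    dh = (y2 - y1) * expand_ratio
    return (max(0, x1 - dw), max(0, y1 - dh),
            min(image_width, x2 + dw), min(image_height, y2 + dh))
```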
In one embodiment, the method further comprises: based on the target bounding box, the photographing device is controlled to track the target in the target bounding box.
In a second aspect, the present application also provides a device for determining a bounding box, the device comprising:
the acquisition module is used for acquiring candidate boundary frames and corresponding candidate areas of the object to be detected in the current picture image;
the segmentation module is used for segmenting the candidate areas of the candidate boundary boxes to obtain target areas in the candidate areas;
and the correction module is used for correcting the candidate boundary frames based on the target area to obtain target boundary frames corresponding to the object to be detected.
In a third aspect, the present application also provides a computer device. The computer device includes a memory storing a computer program and a processor that, when executing the computer program, implements the steps of the method for determining a bounding box of any one of the above first aspects.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method for determining a bounding box of any one of the above first aspects.
In a fifth aspect, the present application also provides a computer program product. The computer program product includes a computer program which, when executed by a processor, implements the steps of the method for determining a bounding box of any one of the above first aspects.
The method, the device, the equipment, the storage medium and the program product for determining the boundary frames acquire candidate boundary frames of the object to be detected in the current picture image and corresponding candidate areas, the candidate areas are segmented to obtain target areas in the candidate areas, and the candidate boundary frames are corrected based on the target areas to obtain target boundary frames corresponding to the object to be detected. According to the method, the candidate region of the candidate boundary frame in the current picture image is obtained, the target region in the candidate region is segmented, the target region is the region corresponding to the object to be detected in the candidate region, and the candidate boundary frame is corrected based on the target region, so that the obtained target boundary frame can completely cover the object to be detected, does not contain excessive irrelevant contents, and can accurately obtain the target boundary frame of the object to be detected.
Drawings
FIG. 1 is an application environment diagram of a method of determining bounding boxes in one embodiment;
FIG. 2 is a flow diagram of a method of determining a bounding box in one embodiment;
FIG. 3 is a schematic diagram of candidate bounding boxes in a current picture in one embodiment;
FIG. 4 is a schematic diagram of a target region in a candidate region in one embodiment;
FIG. 5 is a schematic diagram of a target bounding box of an object to be inspected in one embodiment;
FIG. 6 is a flow diagram of determining a target area in one embodiment;
FIG. 7 is a flow diagram of determining a baseline characteristic of a candidate region in one embodiment;
FIG. 8 is a schematic diagram of a current picture image in one embodiment;
FIG. 9 is a flow chart of determining a baseline characteristic of a candidate region in another embodiment;
FIG. 10 is a flow diagram of determining a target region in one embodiment;
FIG. 11 is a flowchart of determining a target area according to another embodiment;
FIG. 12 is a flow diagram of determining a target bounding box in one embodiment;
FIG. 13 is a flow chart of determining a target bounding box in another embodiment;
FIG. 14 is a flow chart of determining a target bounding box in yet another embodiment;
FIG. 15 is a flow chart of determining a target bounding box in yet another embodiment;
FIG. 16 is a flow diagram of determining candidate bounding boxes in one embodiment;
FIG. 17 is a flow diagram of determining an initial bounding box in one embodiment;
FIG. 18 is a flow diagram of determining a target superpixel region in one embodiment;
FIG. 19 is a flow chart of another embodiment of determining candidate bounding boxes;
FIG. 20 is a flow diagram of determining a distance in one embodiment;
FIG. 21 is a flow diagram of determining candidate regions in one embodiment;
FIG. 22 is a flow chart of a method for determining a bounding box in another embodiment;
FIG. 23a is a schematic diagram of candidate bounding boxes of an object under test in one embodiment;
FIG. 23b is a schematic diagram of a target bounding box of an object under test in one embodiment;
FIG. 24 is a block diagram showing the structure of a bounding box determination device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The method for determining the bounding box provided by the embodiment of the application can be applied to an application environment shown in fig. 1. The computer device may be a server including a processor, memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used to store data in determining the bounding box. The network interface of the computer device is for communicating with an external terminal via a network connection, the computer program being executed by the processor to implement a method of determining a bounding box. The computer device may be implemented by a stand-alone computer device or a computer device cluster formed by a plurality of computer devices.
In one embodiment, as shown in fig. 2, a method for determining a bounding box is provided, and the method is applied to the computer device in fig. 1 for illustration, and includes the following steps:
s201, obtaining a candidate boundary frame and a corresponding candidate region of an object to be detected in the current picture image.
The candidate bounding box may contain redundant information in addition to the object to be detected, or may contain only a portion of the object to be detected. The candidate bounding box may be rectangular, circular, triangular, etc., or may be adaptive in shape. The current picture image may include at least one object to be detected, and each object to be detected corresponds to a candidate bounding box. FIG. 3 is a schematic diagram of candidate bounding boxes in a current picture; the objects to be detected in FIG. 3 may be, for example, an electric vehicle, an automobile, a green plant in a flower bed, or a pedestrian, the shape of each candidate bounding box is illustrated as a rectangle, and each object to be detected corresponds to one candidate bounding box. Since the candidate bounding box may include only a part of the object to be detected, in order to ensure the accuracy of the bounding box determination process, it is necessary to perform expansion processing on the candidate bounding box, determine the interior of the expanded bounding box as the candidate region, and correct the candidate bounding box based on the candidate region, so that the obtained target bounding box is more accurate. The area of the candidate region is larger than the area of the interior of the candidate bounding box.
In this embodiment, when a plurality of objects to be detected are included in the current frame image, the computer device may acquire a bounding box generated by clicking the objects to be detected in the current frame image by the user, and use the generated bounding box as a candidate bounding box of the objects to be detected. Or, the computer device may further detect a plurality of objects to be detected in the current picture image by using an image detection algorithm, so as to obtain a plurality of detection frames, and use the detection frame clicked by the user as a candidate boundary frame of the objects to be detected. Alternatively, the computer device may compare the generated bounding box with the detection box, and determine the candidate bounding box of the object to be detected according to the comparison result.
S202, dividing the candidate region to obtain a target region in the candidate region.
The target region is the part of the candidate region required by the user. For example, if the candidate region contains a wall and a portrait hung in the middle of the wall, and the region required by the user is the portrait, then the target region is the portrait in the current picture image. FIG. 4 is a schematic diagram of a target region in a candidate region. When the object to be detected is a green plant in a flower bed, the candidate region is the green plant together with a surrounding area of a preset range, namely the No. 1 rectangular frame region, and the target region in the candidate region is the green plant itself, namely the No. 2 rectangular frame region. The No. 3 rectangular frame in the figure is a candidate bounding box that includes redundant information besides the object to be detected: it contains not only the green plant but also roadside background such as an electric vehicle, a pedestrian, the flower bed, and a smoke extinguishing pile. The No. 4 rectangular frame in the figure is a candidate bounding box that includes only a part of the object to be detected.
In this embodiment, the computer device may obtain the depth feature of the candidate region, and segment the candidate region based on the depth feature, to obtain the target region in the candidate region. Alternatively, the computer device may segment the candidate region using an image segmentation algorithm to obtain a target region for the candidate region. For example, the image segmentation algorithm may be an adaptive threshold segmentation algorithm, a Canny algorithm, a watershed algorithm, or the like. The present embodiment does not limit the division manner of the candidate region.
And S203, correcting the candidate boundary frames based on the target area to obtain target boundary frames corresponding to the object to be detected.
In this embodiment, the computer device may correct the relevant parameters of the candidate bounding box by using the target area, so that the corrected candidate bounding box can accurately cover the object to be detected, and meanwhile, does not include the content of other objects to be detected, and at this time, the corrected candidate bounding box is determined to be the target bounding box corresponding to the object to be detected. The relevant parameters of the candidate bounding box may be the shape, size, rotation angle, etc. of the candidate bounding box. Fig. 5 is a schematic diagram of a target bounding box of an object to be detected, where the target bounding box only includes green plants, and an area of an inner region of the target bounding box is the same as an area of a target region.
In the method for determining the boundary frame, the candidate boundary frame of the object to be detected in the current picture image and the corresponding candidate region are obtained, the candidate region is segmented to obtain the target region in the candidate region, and the candidate boundary frame is corrected based on the target region to obtain the target boundary frame corresponding to the object to be detected. According to the method, the candidate region of the candidate boundary frame in the current picture image is obtained, the target region in the candidate region is segmented, the target region is the region corresponding to the object to be detected in the candidate region, and the candidate boundary frame is corrected based on the target region, so that the obtained target boundary frame can completely cover the object to be detected, does not contain excessive irrelevant contents, and can accurately obtain the target boundary frame of the object to be detected.
On the basis of the above embodiment, the present embodiment describes the content related to "dividing the candidate region to obtain the target region in the candidate region" in step S202 in fig. 2. As shown in fig. 6, the step S202 may include the following:
s301, acquiring reference features of the candidate region.
The reference feature is a reference feature when dividing the candidate region, and may be, for example, a gray feature, a depth feature, or the like.
In this embodiment, the purpose of determining the reference feature of the candidate region is to accommodate the situation that the feature difference between the object to be detected and the surrounding objects is not significant, for example, when the user selects one object to be detected in the wall painting, the depth information of the wall and the depth information of the object to be detected in the wall painting are the same, and it is difficult to segment the object to be detected from the background environment only by using the depth map. For this problem, it is necessary to select one of various feature maps (for example, a gray-scale map and a depth map) in which the difference in the features of the foreground and background regions is more remarkable, so as to more accurately segment the object to be detected from the current picture image. The computer device may extract gray features of the candidate region using a gray feature extraction algorithm, and extract depth features of the candidate region using a depth feature extraction algorithm, determine a depth feature difference value and a gray feature difference value corresponding to the depth features, compare the depth feature difference value and the gray feature difference value, and screen reference features of the candidate region from the depth features and the gray features.
S302, dividing the candidate region according to the reference features to obtain a target region corresponding to the candidate region.
In this embodiment, after the reference feature of the candidate region is acquired, the computer device may determine a corresponding segmentation threshold according to the reference feature of the candidate region, where the segmentation threshold may be a depth threshold or a gray threshold, and segment the candidate region based on the segmentation threshold, taking the pixel points on one side of the threshold as the target region and the remaining pixel points as the non-target region of the candidate region. Taking FIG. 4 as an example, the green plant area in FIG. 4 is the target region, and the other areas except the green plant area are non-target regions.
In the determination method of the bounding box, the reference feature of the candidate region is obtained, and the candidate region is segmented according to the reference feature, so that the target region corresponding to the candidate region is obtained. The method can more accurately segment the target area of the candidate area by utilizing the reference characteristics of the candidate area, so that the obtained target area reflects the object to be detected in the candidate area more truly.
On the basis of the above-described embodiment, the present embodiment is described with respect to the content of "acquiring the reference feature of the candidate region" in step S301 in fig. 6. As shown in fig. 7, the step S301 may include the following:
S401, obtaining depth features and gray features of the candidate areas.
In this embodiment, the computer device may convert the RGB values in the candidate region into gray values, taking the average of the three channels as the gray value of each pixel. The gray features may be represented by means of histograms, gray-level co-occurrence matrices, etc. The computer device may also analyze the candidate region through a monocular depth estimation method, generate a depth map corresponding to the candidate region, and obtain the depth value of each pixel point based on the depth map. The larger the depth value of a pixel point, the farther away it is; the smaller the depth value, the closer it is. The monocular depth estimation method may be a self-supervised depth estimation method, an unsupervised depth estimation method, or the like.
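As a sketch of this feature extraction, assuming the candidate region is available as an RGB array and that some monocular depth estimator is available as a callable (a placeholder, since no specific model is named here):

```python
import numpy as np

def candidate_region_features(region_rgb, depth_model):
    """Return (gray, depth) feature maps for a candidate region.

    region_rgb: HxWx3 uint8 array; depth_model: hypothetical callable that
    returns an HxW depth map for an RGB image.
    """
    gray = region_rgb.astype(np.float32).mean(axis=2)   # channel average as gray value
    depth = depth_model(region_rgb)                     # larger value = farther away
    return gray, depth
```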
S402, determining a depth difference value between a target area in the candidate area and a background area outside the candidate boundary box based on the depth characteristics; and determining a gray difference value of the target region in the candidate region and the background region outside the candidate bounding box based on the gray features.
The depth difference is the difference between depth features between a target region in the candidate region and a background region outside the candidate bounding box, that is, the absolute value of the difference between the foreground depth average value and the background depth average value, and the gray difference is the absolute value of the difference between the foreground gray average value and the background gray average value. Fig. 8 is a schematic diagram of a current picture image, and region 1 in fig. 8 is a target region of a candidate region, and region 2 is a background region outside the candidate bounding box.
In this embodiment, the candidate region includes a target region and a non-target region, and the region corresponding to the object to be detected is the target region, so the area of the target region is necessarily smaller than that of the candidate region. Therefore, when determining the target region in the candidate region, the computer device may perform reduction processing on the candidate region according to a preset reduction rule, use the reduced candidate region as the target region, and use the other regions outside the candidate bounding box in the current picture image as the non-target region. Assume that the candidate bounding box corresponding to the candidate region is represented as (x1, y1, x2, y2), where (x1, y1) refers to the coordinates of the upper-left corner of the candidate bounding box and (x2, y2) refers to the coordinates of the lower-right corner. For the candidate bounding box, the coordinates (cx, cy) of its center point can be expressed as:
cx = (x1 + x2) / 2
cy = (y1 + y2) / 2
Assuming that the target region is scaled to 0.5 times the candidate region, i.e., the size of the target region is 0.5 times the size of the candidate region, the size of the target region in the candidate region can be expressed as:
width of target region = 0.5 * (x2 - x1)
height of target region = 0.5 * (y2 - y1)
Correspondingly, the coordinates (fx1, fy1, fx2, fy2) of the target region can be expressed as:
fx1 = cx - width of target region / 2
fy1 = cy - height of target region / 2
fx2 = cx + width of target region / 2
fy2 = cy + height of target region / 2
For the depth difference value, the computer equipment can acquire the depth values of all the pixel points in the target area, calculate the average value of the depth values of all the pixel points, and take the average value as the depth average value of the target area; and obtaining depth values of all pixel points in the non-target area, calculating an average value of all pixel points, taking the average value as the non-target area depth average value, and calculating an absolute value of a difference value between the target area depth average value and the non-target area depth average value to obtain a depth difference value between the target area and the non-target area. The depth difference may be expressed as:
depth difference = |target region depth average-non-target region depth average |
For the gray difference value, the computer device can acquire the gray values of all pixel points in the target area, calculate their average value, and take it as the target area gray average value; acquire the gray values of all pixel points in the non-target area (the background area), calculate their average value, and take it as the non-target area gray average value; and calculate the absolute value of the difference between the target area gray average value and the non-target area gray average value to obtain the gray difference value between the target area and the background area. The gray difference value can be expressed as:
Gray difference = |target area gray average-non-target area gray average |
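The shrink-and-compare computation above can be sketched as follows; the 0.5 scale follows the example in the text, and representing the target and background regions as boolean masks over the feature maps is an assumed implementation detail.

```python
import numpy as np

def shrunken_region(box, scale=0.5):
    """Shrink a candidate bounding box (x1, y1, x2, y2) about its center point."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = scale * (x2 - x1), scale * (y2 - y1)
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

def feature_difference(feature_map, target_mask, background_mask):
    """|mean over the target region - mean over the background region| for one feature map."""
    return abs(float(feature_map[target_mask].mean()) - float(feature_map[background_mask].mean()))
```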
S403, determining the reference characteristics of the candidate areas of the candidate boundary boxes according to the depth difference value and the gray level difference value.
In this embodiment, the computer device may compare the depth difference value with the gray difference value to determine which feature distinguishes the target region from the background region more significantly in the current picture image. If the depth difference value is larger, the depth feature is the more discriminative one and is used as the reference feature of the candidate region; if the gray difference value is larger, the gray feature is the more discriminative one and is used as the reference feature of the candidate region.
In the method for determining the bounding box, the depth features and the gray features of the candidate region are obtained; the depth difference value between the target region in the candidate region and the background region outside the candidate bounding box is determined based on the depth features; the gray difference value between the target region in the candidate region and the background region outside the candidate bounding box is determined based on the gray features; and the reference feature of the candidate region of the candidate bounding box is determined according to the depth difference value and the gray difference value. The method starts from the two angles of the gray feature and the depth feature of the candidate region and considers the feature information of the candidate region more comprehensively, so that the depth difference value and the gray difference value between the target region in the candidate region and the background region outside the candidate bounding box can be accurately calculated, and the reference feature of the candidate region can be accurately determined based on the gray difference value and the depth difference value.
On the basis of the above-described embodiment, the present embodiment describes the content of the "determining the reference feature of the candidate region of the candidate bounding box from the depth difference value and the gray-scale difference value" in step S403 in fig. 7. As shown in fig. 9, the step S403 may include the following:
s501, obtaining a depth ratio between a depth difference value and a background depth average value of a background area; and acquiring a gray ratio between the gray difference value and a background gray average value of the background area.
In this embodiment, after the computer device obtains the depth difference value and the gray level difference value, a ratio between the depth difference value and the background depth average value may be calculated, where the ratio represents a score of the significance degree of the depth feature, and the higher the score, the more obvious the depth feature difference between the target area and the background area is represented; the lower the score, the less pronounced the depth feature difference between the target region and the background region. The score for the significance level of the depth feature can be expressed as:
score of degree of significance of depth feature = depth difference/background depth average
Meanwhile, the computer equipment can also calculate the ratio between the gray level difference value and the background gray level average value, the ratio represents the score of the significance degree of the gray level characteristic, and the higher the score is, the more obvious the gray level characteristic difference between the target area and the background area is; the lower the score, the less pronounced the gray feature difference between the target region and the background region.
Score of the degree of saliency of a gray feature = gray difference/background gray average
S502, if the depth ratio is larger than the gray ratio, determining the depth characteristic as a reference characteristic of the candidate region; and if the gray scale ratio is larger than the depth ratio, determining the gray scale characteristic as the reference characteristic of the candidate region.
In this embodiment, after obtaining the scores of the significant degrees of the depth feature and the gray feature, the computer device may compare the two scores, that is, the depth ratio is compared with the gray ratio, if the depth ratio is greater than the gray ratio, it indicates that the feature difference of the depth feature is more significant, and the depth feature is used as the reference feature of the candidate region; if the gray scale ratio is larger than the depth ratio, the characteristic difference of the gray scale characteristics is more obvious, and the gray scale characteristics are used as the reference characteristics of the candidate region; if the depth ratio is the same as the gray ratio, it is explained that the difference between the gray feature and the depth feature is the same to a significant extent, and at this time, the gray feature or the depth feature may be used as a reference feature of the candidate region.
In the method for determining the bounding box, a depth ratio between a depth difference value and a background depth average value of a background area is obtained; the gray ratio between the gray difference value and the background gray average value of the background area is obtained, and if the depth ratio is larger than the gray ratio, the depth characteristic is determined as the reference characteristic of the candidate area; and if the gray scale ratio is larger than the depth ratio, determining the gray scale characteristic as the reference characteristic of the candidate region. According to the method, the characteristic difference corresponding to the depth characteristic can be accurately quantized by acquiring the depth ratio corresponding to the depth difference, the characteristic difference corresponding to the gray level characteristic can also be accurately quantized by acquiring the gray level ratio corresponding to the gray level difference, and the reference characteristic of the candidate region can be accurately determined from the gray level characteristic and the depth characteristic according to comparison of the two characteristic difference values.
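A sketch of the comparison of the two saliency scores described above; returning a string naming the chosen feature is only for illustration, and breaking a tie in favor of the depth feature is an arbitrary assumption.

```python
def select_reference_feature(depth_diff, gray_diff, bg_depth_mean, bg_gray_mean):
    """Pick the more discriminative feature for segmenting the candidate region."""
    depth_score = depth_diff / bg_depth_mean   # saliency score of the depth feature
    gray_score = gray_diff / bg_gray_mean      # saliency score of the gray feature
    return "depth" if depth_score >= gray_score else "gray"
```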
On the basis of the above embodiment, the present embodiment describes the content related to "dividing the candidate region according to the reference feature to obtain the target region corresponding to the candidate region" in step S302 in fig. 6. As shown in fig. 10, the step S302 may include the following:
s601, when the reference feature is a depth feature, a depth value of each pixel point in the candidate region is acquired.
In this embodiment, when the reference feature is determined to be a depth feature, the computer device may analyze the candidate region by using a monocular depth estimation method, generate a depth map corresponding to the candidate region, and obtain the depth value of each pixel point based on the depth map. The larger the depth value of a pixel point, the farther away it is; the smaller the depth value, the closer it is.
S602, determining a candidate region corresponding to the pixel point with the depth value larger than a preset depth threshold as the target region.
On the basis of the embodiment, the candidate region corresponding to the pixel point with the depth value larger than the preset depth threshold value is determined as the target region.
Optionally, the preset depth threshold is determined according to the peak value of the target area and the peak value of the other areas except the target area in the candidate area; the peak value of the target area and the peak value of the other areas are determined according to the number of pixels corresponding to each depth value. A feature value histogram N is calculated in the candidate region with a feature value span of r, where the bin N(x, x+r) counts the pixel points whose depth values fall in the interval [x, x+r]. Since the feature value distributions of the target area and the non-target area have obvious peaks, the first feature peak N(xi, xi+r) found when scanning from large feature values to small ones is taken as the peak of the other areas. After all peaks are found starting from the feature value xi, the maximum peak N(xj, xj+r) is selected as the target area peak. The minimum bin (x, x+r) is then searched for within the range (xj, xi), and its median x + r/2 is taken as the segmentation threshold between the target region and the non-target region, that is, the preset depth threshold. For example, the computer device may screen out the first depth peak from the depth values in order from large to small, the first depth peak being the depth value with the largest number of pixels at this stage. For example, if the depth values are in the range of 0 to 30, the number of pixels with depth value 30 is 2, the number with depth value 29 is 5, the number with depth value 28 is 10, and the number with depth value 27 is 7, then the depth value 28 is taken as the peak of the other areas. The depth value with the largest number of pixels is then selected from the depth values 0 to 28 as the target area peak: for example, among the depth values 0 to 28, the depth value 15 has 50 corresponding pixels, so the depth value 15 is taken as the peak of the target area. Further, the depth value 21, which has the smallest number of pixels (8) between the depth values 15 and 28, is selected and taken as the preset depth threshold; alternatively, if the depth value 22 is the depth value with the smallest number of pixels other than the depth value 21, the average of the depth value 21 and the depth value 22 is set as the preset depth threshold.
In this embodiment, the computer device may screen the maximum depth value from the depth values corresponding to the pixel points, determine the maximum depth value as another area peak value, re-screen the maximum depth value from the other area peak values, determine the maximum depth value as a target area peak value, and determine an average value of the other area peak values and the target area peak value as a preset depth threshold value. Further, the computer device may compare the depth value with a preset depth threshold, and if the depth value is greater than the preset depth threshold, classify the pixel as a target pixel, and determine, after comparing the depth values of all the pixels, an area composed of all the target pixels as a target area in the candidate area.
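A sketch of the histogram-based threshold search, assuming per-pixel depth values binned with span r; using the global histogram maximum as the target-area peak is a simplification of the peak search described above.

```python
import numpy as np

def depth_segmentation_threshold(depth_values, r=1):
    """Estimate the depth threshold separating the target region from the other regions."""
    d = np.asarray(depth_values, dtype=float)
    edges = np.arange(d.min(), d.max() + r, r)
    if edges.size < 2:
        return float(d.min())
    hist, _ = np.histogram(d, bins=edges)

    # Scan bins from large depth values to small: the first local peak is taken
    # as the peak of the other areas; the global maximum as the target-area peak.
    other_idx = int(np.argmax(hist))
    for i in range(len(hist) - 1, -1, -1):
        left = hist[i - 1] if i > 0 else -1
        right = hist[i + 1] if i < len(hist) - 1 else -1
        if hist[i] > 0 and hist[i] >= left and hist[i] >= right:
            other_idx = i
            break
    target_idx = int(np.argmax(hist))

    lo, hi = sorted((target_idx, other_idx))
    valley = lo + int(np.argmin(hist[lo:hi + 1]))   # minimum bin between the two peaks
    return float(edges[valley]) + r / 2.0           # median of the valley bin
```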
In the above-mentioned determination method of a bounding box, when the reference feature is a depth feature, a depth value of each pixel point in the candidate region is obtained, and the candidate region corresponding to the pixel point whose depth value is greater than the preset depth threshold is determined as the target region. The preset depth threshold value is determined according to the peak value of the target area and the peak value of other areas except the target area in the candidate area; the peak value of other areas is the maximum depth value in the preset depth range; the target area peak is the maximum depth value of the depth range other than the preset depth range. According to the method, the target area peak value and other area peak values in the candidate area can be accurately obtained through the depth characteristics, a more accurate preset depth threshold value can be obtained based on the target area peak value and other area peak values, the depth value of each pixel point is compared with the preset depth threshold value, the target area in the candidate area can be strictly screened, and the obtained target area is more accurate.
In one embodiment, the description is given to the content related to "dividing the candidate region according to the reference feature to obtain the target region corresponding to the candidate region" in step S302 in fig. 6. As shown in fig. 11, the step S302 may include the following:
s701, when the reference feature is a gray feature, the gray feature of the candidate region is analyzed, and the gray threshold values of the target region feature and the other region features except the target region in the candidate region are determined.
In this embodiment, the computer device may determine the histogram corresponding to the candidate region according to the gray value of the candidate region, and may obtain the number of pixels corresponding to each gray level, calculate the total number of pixels N of the candidate region and the total weight W of the gray level, where the total weight represents the gray level multiplied by the number of pixels of the gray level. Initializing the pixel quantity, weight and variance of the target area and other areas, traversing all possible threshold values one by one, and dividing the candidate area into the target area and other areas. For each threshold t, calculating the pixel number, weight and average gray value of the target area and other areas, determining an inter-class Variance (Between-class Variance) according to the pixel number, weight and average gray value of the target area and other areas, and taking the maximum threshold of the inter-class Variance as the gray threshold of the target area characteristic and other area characteristics.
S702, dividing the candidate region by using the gray threshold value to obtain a target region corresponding to the candidate region.
In this embodiment, the computer device divides the candidate region according to the gray threshold, taking the region with gray values larger than the gray threshold as the other region of the candidate region and the region with gray values smaller than or equal to the gray threshold as the target region of the candidate region. The shape of the target region may be regular or irregular, depending on the shape of the object to be detected.
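The between-class-variance search described above is essentially Otsu's method; a compact sketch over 8-bit gray values follows (the 256-level quantization is an assumption).

```python
import numpy as np

def otsu_gray_threshold(gray_region):
    """Return the gray threshold t that maximizes the between-class variance."""
    hist, _ = np.histogram(gray_region.ravel(), bins=256, range=(0, 256))
    total = float(hist.sum())
    sum_all = float(np.dot(np.arange(256), hist))
    w0 = sum0 = 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]                 # pixels at or below t (one class)
        if w0 == 0:
            continue
        w1 = total - w0               # pixels above t (the other class)
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```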
In the above method for determining a bounding box, when the reference feature is a gray feature, the gray feature of the candidate region is analyzed, the gray threshold of the target region feature and the gray threshold of the other region features except the target region in the candidate region are determined, and the candidate region is segmented by using the gray threshold, so as to obtain the target region corresponding to the candidate region. According to the method, the gray level threshold value of the foreground characteristic and the background characteristic can be accurately obtained by utilizing the gray level characteristic, so that the foreground characteristic and the background characteristic can be accurately segmented based on the gray level threshold value, and a target region corresponding to a more accurate candidate region is obtained.
On the basis of the above embodiment, the present embodiment describes the related content of "correcting the candidate bounding box based on the target area to obtain the target bounding box corresponding to the object to be detected" in step S203 in fig. 2. As shown in fig. 12, the step S203 may include the following:
S801, the centroid of the target area and the area of the target area are obtained.
The target area may be an irregular area, the centroid of the target area represents the centroid of the irregular target area, and the area of the target area is the area of the irregular target area.
In this embodiment, the computer device may obtain the coordinates of all the pixel points of the irregular target area, calculate the average of these coordinates, and use it as the centroid of the target area. For example, when the number of pixel points in the irregular target area is N and the coordinates of the respective pixel points are (xi, yi), the target area centroid (xc, yc) can be expressed as:
xc = (Σxi) / N
yc = (Σyi) / N
In addition, the computer device may traverse all the pixel points of the irregular target area and take the total number of pixel points as the area of the irregular region.
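On a binary mask of the (possibly irregular) target region, the centroid and area above reduce to a few lines:

```python
import numpy as np

def region_centroid_and_area(target_mask):
    """target_mask: HxW boolean array marking the target region."""
    ys, xs = np.nonzero(target_mask)
    area = xs.size                        # number of pixels in the region
    centroid = (xs.mean(), ys.mean())     # (xc, yc): per-axis average of pixel coordinates
    return centroid, area
```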
S802, correcting the candidate boundary frames according to the mass center of the target area and the area of the target area to obtain target boundary frames corresponding to the object to be detected.
In this embodiment, after the centroid of the target area is obtained, the computer device may move the center position of the candidate bounding box according to the centroid of the target area, so that the center position of the candidate bounding box and the centroid of the area of the target area coincide with each other, and then adjust the size of the candidate bounding box according to the area of the target area, so that the adjusted candidate bounding box can completely cover the object to be detected, and does not contain too much irrelevant content, and the adjusted candidate bounding box is determined to be the target bounding box corresponding to the object to be detected.
In the method for determining the bounding box, the mass center of the target area and the area of the target area in the candidate area are obtained, and the candidate bounding box is corrected according to the mass center of the area and the area of the area, so that the target bounding box corresponding to the object to be detected is obtained. The method can quantify the correction process of the candidate boundary frames based on the mass center of the target area and the area of the target area, so that the candidate boundary frames can be corrected more efficiently, the time required by the correction process is reduced, and the efficiency of the correction process can be improved; meanwhile, the candidate bounding boxes can be corrected more accurately according to specific parameters of the target area, and the obtained target bounding boxes are more accurate.
Based on the above embodiment, the present embodiment describes the content related to "correcting the candidate bounding box according to the centroid of the target area and the area of the target area" in step S802 in fig. 12 to obtain the target bounding box corresponding to the object to be detected. The step S802 may include the following: correcting the size and the rotation angle of the candidate bounding box according to the mass center of the target area and the area of the target area to obtain a target bounding box corresponding to the object to be detected; the difference between the internal area of the target bounding box and the target area is less than a preset threshold.
In this embodiment, the computer device may move the center point of the candidate bounding box to the position of the centroid of the target area, and adjust the aspect ratio and the overall rotation angle of the candidate bounding box so as to minimize the error between the area of the internal area of the obtained target bounding box and the area of the target area, and at the same time, the adjusted target bounding box may also completely cover the size of the object to be detected, and does not contain too much extraneous content.
In the method for determining the boundary frame, the size and the rotation angle of the candidate boundary frame are corrected according to the mass center of the target area and the area of the target area, so that the target boundary frame corresponding to the object to be detected is obtained; the difference between the internal area of the target bounding box and the target area is less than a preset threshold. The method is based on the mass center of the target area and the area of the target area, and can quantify the correction process of the size and the rotation angle of the candidate boundary frame, so that the candidate boundary frame can be corrected more efficiently, the time required by the correction process is reduced, and the efficiency of the correction process can be improved; meanwhile, the size and the rotation angle of the candidate boundary frames are corrected according to the specific parameters of the target area, so that the candidate boundary frames can be corrected more accurately, and the obtained target boundary frames are more accurate.
Based on the above embodiments, in this embodiment, the description is presented on the related content of "correcting the size and rotation angle of the candidate bounding box according to the centroid of the target area and the area of the target area" in the above bounding box determination method, so as to obtain the target bounding box corresponding to the object to be detected. As shown in fig. 13, the above method may include the following:
and S901, performing primary correction on the size and the rotation angle of the candidate bounding box according to the mass center of the target area and the area of the target area to obtain the primary corrected candidate bounding box.
In this embodiment, the candidate bounding box is iteratively corrected mainly using an optimization algorithm, which may be, for example, a genetic algorithm or a particle swarm algorithm. When the optimization method is used for primary correction, the aspect ratio and the initial angle of the candidate bounding box need to be set: the center point of the candidate bounding box is moved to the position of the centroid of the target area, and the size of the candidate bounding box is adjusted so that the area of its internal region equals the area A of the target area. Assuming the candidate bounding box has length L and width W, an initial aspect ratio, e.g. 1:1, may be set based on the area A, giving adjusted length and width L = W = sqrt(A); meanwhile, the initial angle of the candidate bounding box is set to 0°, i.e., the candidate bounding box is parallel to the X-axis.
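A minimal sketch of this primary correction, assuming the box is represented as (cx, cy, w, h, angle) and that a 1:1 initial aspect ratio is used; the representation and parameter names are illustrative, not taken from the source.

```python
import numpy as np

def initial_correction(centroid, target_area, init_aspect=1.0, init_angle=0.0):
    """Primary correction of the candidate bounding box (a sketch).

    The box is re-centred on the target-region centroid, its area is set
    equal to the target-region area A, the aspect ratio starts at 1:1
    (so L = W = sqrt(A)) and the rotation angle starts at 0 degrees, as
    in step S901 above. Returned as (cx, cy, w, h, angle).
    """
    cx, cy = centroid
    w = np.sqrt(target_area * init_aspect)   # width from area and aspect ratio
    h = target_area / w                       # height keeps the area equal to A
    return (cx, cy, w, h, init_angle)
```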
S902, acquiring the intersection area of the internal region of the initially corrected candidate bounding box and the target region.
In this embodiment, the computer device may acquire the pixel points of the internal region of the initially corrected candidate bounding box whose coordinates coincide with those of the target region, and use the number of these coincident pixel points as the intersection area of the internal region of the initially corrected candidate bounding box and the target region.
S903, carrying out iterative correction on the size and the rotation angle of the candidate boundary frames subjected to the primary correction based on the intersection area, and taking the corrected candidate boundary frames which are larger than the preset area threshold as target boundary frames corresponding to the object to be detected.
In this embodiment, after the intersection area is obtained, the candidate bounding box for the first correction is subjected to subsequent iterative correction according to the intersection area, and each correction process is performed according to the intersection area obtained last time, that is, the intersection area is smaller during the first correction, and as the number of correction times increases, the intersection area is also larger and larger, the candidate bounding box obtained by correction is closer to the target bounding box, and finally, when the iteration reaches the convergence condition or reaches the maximum iteration number, the candidate bounding box for the last correction is taken as the target bounding box corresponding to the object to be detected.
In the above method for determining a bounding box, according to the centroid of the target area and the area of the target area, the size and the rotation angle of the candidate bounding box are primarily corrected to obtain the primarily corrected candidate bounding box, the intersection area of the primarily corrected candidate bounding box and the target area is obtained, the size and the rotation angle of the primarily corrected candidate bounding box are iteratively corrected based on the intersection area, and the corrected candidate bounding box which is larger than the threshold value of the preset area is used as the target bounding box corresponding to the object to be detected. In the method, the size and the rotation angle of the candidate bounding box can be corrected more accurately by utilizing the mass center of the target area and the area of the target area in the primary correction process, the subsequent correction process is guided according to the intersection area obtained by the primary correction, and each correction process is corrected based on the intersection area obtained by the previous correction, so that the correction process gradually approaches to the real area corresponding to the object to be detected of the object to be detected, and the obtained target bounding box is more accurate.
Based on the above embodiments, in this embodiment, the description is presented on the related content of "correcting the size and rotation angle of the candidate bounding box according to the centroid of the target area and the area of the target area" in the above bounding box determination method, so as to obtain the target bounding box corresponding to the object to be detected. As shown in fig. 14, the above method may include the following:
S1001, based on the candidate bounding box, the target area and the target area centroid, acquiring a plurality of correction bounding boxes corresponding to the target area in the current correction process.
In this embodiment, a Monte Carlo method (Monte Carlo Method) based on random sampling is mainly used to solve for the target bounding box corresponding to the object to be detected. The Monte Carlo method has some global search capability, but may require more iterations to obtain better results, so limiting the maximum number of iterations is recommended. If the problem is not highly complex, a near-optimal result can typically be reached within a few iterations. Limiting the maximum number of iterations helps output results quickly and suits scenes with real-time requirements.
The method comprises the following steps: (1) Firstly, the aspect ratio and the initial angle of the candidate bounding box are set, which is the same as the mode of the step S901, and the specific content is referred to the above mode; (2) Setting parameters of a Monte Carlo method, setting iteration times N, and generating the number of correction boundary frames as M; (3) In each iteration process, M correction boundary boxes are randomly generated, namely, the correction boundary boxes corresponding to the target area in the current correction process are generated.
S1002, the size and the rotation angle of each correction bounding box are corrected, and the intersection area of the internal area of each corrected correction bounding box and the target area is obtained.
In this embodiment, after obtaining a plurality of correction bounding boxes corresponding to the target area in the current correction process, the computer device adjusts the size and the rotation angle of each correction bounding box, obtains pixel points with the same positions of the internal area of the adjusted correction bounding box and the target area, and determines the intersection area of each correction bounding box and the target area based on the pixel points with the same positions.
And S1003, taking the corrected boundary frame corresponding to the maximum intersection area in the intersection areas as a candidate boundary frame in the next correction process until the corrected candidate boundary frame meets a preset convergence condition, and taking the corrected candidate boundary frame as a target boundary frame corresponding to the object to be detected.
In this embodiment, the larger the intersection area, the more accurately the bounding box corresponding to the intersection area reflects the region corresponding to the object to be detected. The computer equipment screens out the largest intersection area from the intersection areas, the correction boundary frame corresponding to the largest intersection area is the optimal boundary frame in the current correction process, the correction boundary frame corresponding to the largest intersection area is used as a candidate boundary frame in the next correction process, and the iteration is continued for a plurality of times according to the mode until the preset convergence condition is met or the maximum iteration number is reached; and generating a target boundary box corresponding to the object to be detected according to the aspect ratio, the rotation angle and the regional centroid of the candidate boundary box obtained by the last correction.
In the method for determining the bounding box, based on the candidate region, the target region area and the target region centroid, a plurality of correction bounding boxes corresponding to the target region in the current correction process are obtained, the size and the rotation angle of each correction bounding box are corrected, the intersection area of each corrected correction bounding box and the internal region is obtained, the correction bounding box corresponding to the largest intersection area in each intersection area is used as the candidate bounding box in the next correction process until the candidate bounding box obtained by correction meets the preset convergence condition, and the candidate bounding box obtained by correction is used as the target bounding box corresponding to the object to be detected. According to the method, in each iteration process, a plurality of correction boundary frames are generated, candidate boundary frames in the next iteration process can be accurately screened out from the plurality of correction boundary frames according to the intersection area of the inner area of the plurality of correction boundary frames and the target area, and the target boundary frame corresponding to the object to be detected can be accurately obtained through a plurality of iteration processes.
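A possible sketch of the random-sampling correction described in this embodiment, assuming OpenCV and NumPy are available. The perturbation ranges, sample counts, and the representation of a rotated box as (cx, cy, w, h, angle) are illustrative assumptions rather than values given in the source.

```python
import cv2
import numpy as np

def intersection_area(box, target_mask):
    """Pixels shared by the rotated box interior and the target mask."""
    cx, cy, w, h, angle = box
    corners = cv2.boxPoints(((cx, cy), (w, h), angle)).astype(np.int32)
    box_mask = np.zeros(target_mask.shape, dtype=np.uint8)
    cv2.fillPoly(box_mask, [corners], 1)
    return int(np.logical_and(box_mask > 0, target_mask).sum())

def monte_carlo_refine(init_box, target_mask, n_iter=20, n_samples=30, seed=0):
    """Random-sampling (Monte Carlo style) refinement of a candidate box.

    Each iteration draws n_samples perturbed boxes around the current best
    box, keeps the one with the largest intersection with the target mask,
    and stops after n_iter iterations (a stand-in for the convergence
    condition / maximum iteration count mentioned above).
    """
    rng = np.random.default_rng(seed)
    best = init_box
    best_score = intersection_area(best, target_mask)
    for _ in range(n_iter):
        cx, cy, w, h, angle = best
        for _ in range(n_samples):
            cand = (cx, cy,
                    w * rng.uniform(0.9, 1.1),       # perturb width
                    h * rng.uniform(0.9, 1.1),       # perturb height
                    angle + rng.uniform(-10, 10))    # perturb rotation angle
            score = intersection_area(cand, target_mask)
            if score > best_score:
                best, best_score = cand, score
    return best
```

Keeping the best-scoring box as the starting point of the next iteration is what makes the intersection area non-decreasing from one correction round to the next, as described above.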
Based on the above embodiments, in this embodiment, the description is presented on the related content of "correcting the size and rotation angle of the candidate bounding box according to the centroid of the target area and the area of the target area" in the above bounding box determination method, so as to obtain the target bounding box corresponding to the object to be detected. As shown in fig. 15, the above method may include the following:
S1101, a region bounding box of the target region is acquired.
In this embodiment, the computer device may obtain boundary points of the irregular target area in four different directions, where the four boundary points include a highest point and a lowest point in an up-down direction, a leftmost point and a rightmost point in a left-right direction, and a rectangular frame formed by the four boundary points is used as an area boundary frame of the irregular target area.
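As an illustration, the region bounding box can be read directly off a binary mask of the target region; a short sketch assuming NumPy, with an (x_min, y_min, x_max, y_max) return convention chosen for illustration.

```python
import numpy as np

def region_bounding_box(target_mask):
    """Axis-aligned region bounding box of an irregular target mask.

    Uses the highest/lowest and leftmost/rightmost target pixels described
    above and returns (x_min, y_min, x_max, y_max).
    """
    ys, xs = np.nonzero(target_mask)
    return xs.min(), ys.min(), xs.max(), ys.max()
```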
And S1102, carrying out iterative correction on the size and the rotation angle of the candidate bounding box according to the area bounding box, the mass center of the target area and the area of the target area to obtain a target bounding box corresponding to the object to be detected.
In this embodiment, the computer device may iteratively correct the size and rotation angle of the candidate bounding box by random sampling. In the iterative correction process, multiple random samples are drawn: a width or a height is randomly selected for the candidate bounding box, the length of the other side is calculated so that the box area stays equal to the area of the target region, and the size of the candidate bounding box is adjusted accordingly so that its internal area is the same as the area of the target region. The intersection area between the adjusted candidate bounding box and the region bounding box is then calculated and compared with a preset intersection area; if the intersection area is larger than the preset intersection area, the iterative correction is stopped, and the last corrected candidate bounding box is taken as the target bounding box corresponding to the object to be detected.
In the above method for determining the bounding box, the region bounding box of the target region is obtained, and the size and the rotation angle of the candidate bounding box are iteratively corrected according to the region bounding box, the mass center of the target region and the area of the target region, so as to obtain the target bounding box corresponding to the object to be detected. According to the method, the regional boundary box of the target region is used as a boundary limiting condition, so that the candidate boundary box can be corrected more accurately, and the target boundary box corresponding to the obtained object to be detected is more accurate.
On the basis of the above-described embodiment, the present embodiment describes the content related to the "candidate bounding box for obtaining the object to be detected in the current screen image" in step S201 in fig. 2. As shown in fig. 16, the above step S201 may include the following:
s1201, responding to a target detection instruction of the current picture image, and acquiring an initial boundary box of an object to be detected in the current picture image according to target point coordinates carried in the target detection instruction.
The initial bounding box refers to an area of the object to be detected, which corresponds to the coordinates of the target point, and the shape of the initial bounding box is the same as that of the candidate bounding box, and may be square, rectangle, or circle.
In this embodiment, the target detection instruction of the current picture image may be input by a user, for example, the user may click, double click, or press a certain position of the screen where the current picture image is located. Or may be object detection instructions sent by other computer devices. After receiving a target detection instruction from a current picture image sent by a user or other computer equipment, the computer equipment can analyze the target detection instruction, determine target point coordinates carried in the target detection instruction, and determine an initial boundary box corresponding to an object to be detected from the current picture image based on the target point coordinates and a determination rule of the boundary box, wherein the initial boundary box comprises an area where the object to be detected is located. For example, when the lower left corner coordinates of the current screen image are (0, 0), the right direction is the positive X-axis direction, the upward direction is the positive Y-axis direction, the initial bounding box is square, the length is 4, and if the coordinates of the target point are (2, 3), the coordinates of the four points of the corresponding initial bounding box are (0, 1), (0, 5), (4, 5), and (4, 1), respectively.
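The mapping from a clicked target point to a square initial bounding box can be sketched as below; the side length is an illustrative parameter rather than a value fixed by the method, and the corner ordering follows the worked example above.

```python
def initial_box_from_click(x, y, side=4):
    """Square initial bounding box centred on the clicked target point.

    Reproduces the worked example above: for a click at (2, 3) and side 4,
    the corners are (0, 1), (0, 5), (4, 5), (4, 1) with a lower-left
    image origin and the Y-axis pointing up.
    """
    half = side / 2
    return [(x - half, y - half), (x - half, y + half),
            (x + half, y + half), (x + half, y - half)]
```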
S1202, performing target detection on the current picture image by using a target detection model to obtain a plurality of detection frames.
The object detection model may be obtained through training of a large number of sample images, for example, the object detection model may be a YOLO algorithm, a single-stage object detection algorithm (Single Shot MultiBox Detector, SSD), a region-based convolutional neural network algorithm (Regions with CNN features, R-CNN), or the like. The target detection model may also be a detection method based on manual features, such as a template matching method, a key point matching method, or a key feature method.
In this embodiment, when the target detection model is a model obtained by training a neural network model with a sample image, the computer device may input the current picture image into the target detection model, and perform target detection on the current picture image with the target detection model, so as to obtain a plurality of detection frames corresponding to the current picture image. Or when the target detection model can also be a detection method based on manual characteristics, the computer equipment can analyze the current picture image by using the detection method based on manual characteristics, determine the area where the object to be detected in the current picture image is located, and obtain a plurality of detection frames in the current picture image.
S1203, determining a candidate bounding box of the object to be detected according to each detection box and the initial bounding box.
In this embodiment, the computer device may compare the initial bounding box with each detection frame, determine an overlapping area between the initial bounding box and each detection frame, and if the largest overlapping area is greater than a preset area threshold, determine the detection frame corresponding to the largest overlapping area as a candidate bounding box of the object to be detected; and if the maximum overlapping area is smaller than or equal to the preset area threshold value, the detection result is inaccurate, and the initial bounding box is taken as a candidate bounding box of the object to be detected.
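A sketch of this selection rule follows; overlap_fn is a placeholder for whatever overlap-area measure is used between two boxes, and the area threshold is assumed to be supplied by the caller.

```python
def choose_candidate_box(initial_box, detection_boxes, area_threshold, overlap_fn):
    """Pick the candidate bounding box from detector outputs (a sketch).

    The detection box with the largest overlap with the initial bounding
    box is kept only when that overlap exceeds the preset area threshold;
    otherwise the initial bounding box itself is used, as described above.
    """
    if not detection_boxes:
        return initial_box
    best = max(detection_boxes, key=lambda b: overlap_fn(initial_box, b))
    if overlap_fn(initial_box, best) > area_threshold:
        return best
    return initial_box
```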
In the method for determining the bounding box, in response to a target detection instruction of the current picture image, an initial bounding box of an object to be detected in the current picture image is obtained according to target point coordinates carried in the target detection instruction, target detection is performed on the current picture image by using a target detection model, a plurality of detection frames are obtained, and candidate bounding boxes of the object to be detected are determined according to each detection frame and the initial bounding box. According to the method, the initial boundary frame of the object to be detected in the current picture image can be obtained through the coordinates of the target point clicked by the user, meanwhile, the detection frame in the current picture image is detected by utilizing the target detection model, and one-time verification of the initial boundary frame is completed based on the initial boundary frame and the detection frame, so that more accurate candidate boundary frames of the object to be detected are obtained.
On the basis of the above embodiment, the present embodiment describes the content related to "acquiring the initial bounding box of the object to be detected in the current frame image according to the target point coordinates carried in the target detection instruction" in step S1201 in fig. 16. As shown in fig. 17, the step S1201 may include the following:
s1301, dividing the current picture image to obtain a plurality of super-pixel areas.
Wherein, super-pixels are graphs dividing a graph at a pixel level corresponding to a current picture image into regions. The super-pixel region refers to a region formed by a series of pixel points which are adjacent in position and have similar characteristics such as color, brightness, texture and the like in the current picture image.
In this embodiment, the computer device may divide the current picture image by using a super-pixel division algorithm, and divide the pixel points that are adjacent in position and have similar features such as color, brightness, texture, and the like into one super-pixel area, so as to obtain multiple super-pixel areas corresponding to the current picture image. Among them, the super-pixel segmentation algorithm may be a cluster-based pixel segmentation method (Simple Linear Iterative Clustering, SLIC), a gradient-descent-based segmentation method, or the like. The SLIC algorithm is a cluster-based pixel segmentation method that groups similar pixels into superpixels.
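Where the SLIC algorithm is used, the segmentation step might look like the following sketch, which assumes scikit-image is available; n_segments and compactness are illustrative defaults, not values prescribed by the method.

```python
from skimage.segmentation import slic

def superpixel_regions(image_rgb, n_segments=200, compactness=10.0):
    """Split the current frame image into superpixel regions with SLIC.

    Returns a label map in which pixels sharing a label form one
    superpixel region; adjacent pixels with similar color end up in the
    same region, as described above.
    """
    return slic(image_rgb, n_segments=n_segments, compactness=compactness)
```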
S1302, determining a target super-pixel area according to each super-pixel area and the coordinates of the target point.
In this embodiment, the computer device may search for the super-pixel region where the target point coordinate is located from among the super-pixel regions according to the target point coordinate, and determine the super-pixel region where the target point coordinate is located as the target super-pixel region. When the target point coordinates are at the boundary positions of the several super-pixel regions, the region composed of the several super-pixel regions is taken as the target super-pixel region. Alternatively, the computer device may set an area composed of the super-pixel area where the target point coordinates are located and the surrounding adjacent super-pixel area as the target super-pixel area.
S1303, determining an initial bounding box of the object to be detected based on the target super-pixel area.
In this embodiment, the target super-pixel region is a region generated according to a user instruction, and therefore, the target super-pixel region corresponding to the target point coordinates is taken as an initial bounding box of the object to be detected.
In the above method for determining a bounding box, the current frame image is divided to obtain a plurality of super-pixel regions, a target super-pixel region is determined according to the respective super-pixel regions and the coordinates of the target point, and an initial bounding box of the object to be detected is determined based on the target super-pixel region. According to the method, the current picture image is divided, the pixel points with the same characteristics are divided into one super-pixel area, and the target super-pixel area corresponding to the object to be detected is rapidly screened from the super-pixel areas based on the coordinates of the target points, so that the determination efficiency of the initial boundary box of the object to be detected is higher.
On the basis of the above-described embodiment, the present embodiment describes the content related to "determining the target super-pixel region from each super-pixel region and the target point coordinates" in step S1302 in fig. 17. As shown in fig. 18, the step S1302 may include the following:
s1401, screening out the super-pixel areas corresponding to the coordinates of the target points from the super-pixel areas.
In this embodiment, the computer device may obtain the centroid position coordinates of each superpixel region, calculate the distance between each centroid position coordinate and the target point coordinates, and use the superpixel region corresponding to the minimum distance among all the distances as the superpixel region corresponding to the target point coordinates. For example, if the straight-line distance between the centroid position coordinates of superpixel region A and the target point coordinates is 10, and the straight-line distance between the centroid position coordinates of superpixel region B and the target point coordinates is 15, superpixel region A is determined as the superpixel region corresponding to the target point coordinates.
S1402, clustering the super pixel areas according to a preset clustering rule to obtain the super pixel areas of the object to be detected.
The preset clustering rule may be to merge regions whose feature similarity in color, texture and shape is greater than a preset similarity threshold.
In this embodiment, one object to be detected may be split into a plurality of superpixel regions during superpixel segmentation. Therefore, in order to obtain a more accurate superpixel region corresponding to the object to be detected, after obtaining the superpixel region corresponding to the target point coordinates, the computer device may use a clustering algorithm to obtain the feature similarity between the superpixel region corresponding to the target point coordinates and the other superpixel regions, compare the feature similarity with a preset similarity threshold, and fuse the other superpixel regions whose similarity is greater than the preset similarity threshold with the superpixel region corresponding to the target point coordinates; the fused superpixel region is the superpixel region of the object to be detected. The clustering algorithm may be a K-means clustering algorithm (K-means Clustering Algorithm, K-means), a density-based clustering algorithm (Density-Based Spatial Clustering of Applications with Noise, DBSCAN), a mean shift algorithm (Mean Shift), or the like.
S1403, performing post-processing operation on the super-pixel region of the object to be detected to obtain a target super-pixel region; the post-processing operations include at least one of a filling operation and a filtering operation.
In this embodiment, since there may be a small hole area or a small edge area between the fused superpixel areas, post-processing is required to be performed on the fused superpixel areas, that is, filling operation is performed on the small hole area, and filtering operation is performed on the small edge area, so as to optimize the shape of the superpixel area of the object to be detected, so that the superpixel area is smoother, and the target superpixel area of the object to be detected is obtained.
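The filling and filtering operations can be sketched with standard morphological tools; this assumes SciPy is available, and the structuring-element size is an illustrative choice rather than a value from the source.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes, binary_opening

def postprocess_region(region_mask, opening_size=3):
    """Fill small holes and filter small edge fragments in the fused mask.

    binary_fill_holes performs the filling operation and binary_opening
    the filtering operation mentioned above, yielding a smoother target
    superpixel region for the object to be detected.
    """
    filled = binary_fill_holes(region_mask)
    structure = np.ones((opening_size, opening_size), dtype=bool)
    return binary_opening(filled, structure=structure)
```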
In the method for determining the bounding box, the super-pixel areas corresponding to the coordinates of the target points are screened out from the super-pixel areas, the super-pixel areas of the object to be detected are clustered according to a preset clustering rule, the super-pixel areas of the object to be detected are obtained, and post-processing operation is carried out on the super-pixel areas of the object to be detected, so that the target super-pixel areas are obtained; the post-processing operations include at least one of a filling operation and a filtering operation. The method can accurately screen the super-pixel area corresponding to the target point coordinates from the super-pixel areas based on the target point coordinates, determine the super-pixel area corresponding to the object to be detected based on the clustering rule, and perform post-processing on the super-pixel area, so that the shape of the super-pixel area of the object to be detected can be optimized, and the target super-pixel area of the object to be detected is more accurate.
On the basis of the above-described embodiment, the present embodiment describes the description of the contents related to "determining the candidate bounding box of the object to be detected from each detection box and the initial bounding box" in step S1203 in fig. 16. As shown in fig. 19, the above step S1203 may include the following:
s1501, a distance between each detection frame and the initial bounding box is acquired.
In this embodiment, the computer device may obtain the centroid positions of each detection frame and the initial bounding box, respectively, and calculate the distance between each detection frame and the initial bounding box according to the centroid positions of each detection frame and the centroid positions of the initial bounding box. Alternatively, the computer device may also obtain a cross-over ratio between each detection frame and the initial bounding box, and determine a distance between each detection frame and the initial bounding box based on the cross-over ratio. It should be noted that, the cross ratio may be represented by the IOU or the GIOU, where the IOU or the GIOU has a negative correlation with the distance, that is, the greater the IOU or the GIOU, the smaller the distance between the detection frame and the initial bounding box; the smaller the IOU or GIOU, the greater the distance between the detection box and the initial bounding box.
S1502, if each distance is greater than a preset distance threshold, determining the initial bounding box as a candidate bounding box; and if the minimum distance in the distances is smaller than or equal to the preset distance threshold value, determining the detection frame corresponding to the minimum distance as a candidate boundary frame.
In this embodiment, for any one detection frame, the computer device may compare the distance between the detection frame and the initial bounding box with a preset distance threshold, and if each distance is greater than the preset distance threshold, it is indicated that the distances between the detection frames and the areas corresponding to the objects to be detected are both greater, and there is an inaccurate problem in the detection process, where the initial bounding box needs to be used as a candidate bounding box. The computer device may further obtain a minimum distance from the distances, compare the minimum distance with a preset distance threshold, and if the minimum distance is less than or equal to the preset distance threshold, indicate that the detection frame corresponding to the minimum distance is similar to the initial bounding box, and the detection frame obtained by the target detection algorithm is more accurate than the initial detection frame, so that the detection frame corresponding to the minimum distance is determined as the candidate bounding box.
In the above method for determining the bounding box, the distances between each detection frame and the initial bounding box are obtained, and if each distance is greater than a preset distance threshold, the initial bounding box is determined as a candidate bounding box; and if the minimum distance in the distances is smaller than or equal to the preset distance threshold value, determining the detection frame corresponding to the minimum distance as a candidate boundary frame. According to the method, the distances between the plurality of detection frames and the initial boundary frame are calculated, each distance is compared with the preset distance, and according to different results, the detection frames and the initial boundary frame can be judged more accurately, so that the candidate boundary frame of the object to be detected can be determined accurately.
On the basis of the above-described embodiment, the present embodiment is described with respect to the content of "obtaining the distance between each detection frame and the initial bounding box" in step S1501 in fig. 19. As shown in fig. 20, the above step S1501 may include the following:
s1601, for any one detection frame, an intersection ratio of the detection frame and the initial bounding box is acquired.
In this embodiment, for any one detection frame, the computer device may obtain the pixel points whose positions in the detection frame and the initial bounding box coincide, calculate the ratio of the number of these coincident pixel points (the intersection) to the total number of pixel points covered by the two boxes (the union), and determine this ratio as the intersection ratio of the detection frame and the initial bounding box. For example, when the detection frame is A and the initial bounding box is B, the intersection ratio of the detection frame and the initial bounding box may be expressed as IoU(A, B) = |A ∩ B| / |A ∪ B|.
It should be noted that the larger the intersection ratio of the detection frame and the initial bounding box, the higher the degree of coincidence between them; the smaller the intersection ratio, the lower the degree of coincidence between the detection frame and the initial bounding box.
S1602, based on the intersection ratio, determines a distance between the detection frame and the initial bounding box.
In the present embodiment, the value of the IOU is always 0 when the detection frame and the initial bounding box do not overlap, so the distance between a non-overlapping detection frame and the initial bounding box cannot be distinguished. Compared with the IOU, the GIOU can characterize not only two regions that overlap each other but also two regions that do not overlap. The computer device may therefore determine a generalized distance between the detection box and the initial bounding box based on the value of the GIOU. For example, the value of the GIOU lies in the range [-1, 1]; when the GIOU is close to 1, the distance between the detection frame and the initial bounding box is small; when the GIOU is close to -1, the distance between the detection box and the initial bounding box is large.
In the above method for determining a bounding box, for any one of the detection boxes, an intersection ratio of the detection box and the initial bounding box is obtained, and a distance between the detection box and the initial bounding box is determined based on the intersection ratio. According to the method, the intersection between the detection frame and the initial boundary frame can be represented in a quantized mode through calculating the intersection ratio of the detection frame and the initial boundary frame, and the distance between the detection frame and the initial boundary frame can be obtained more accurately based on the intersection ratio.
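For axis-aligned boxes, the IOU and GIOU mentioned above can be computed as in the following sketch; the (x1, y1, x2, y2) box representation is an assumption made for illustration.

```python
def iou_and_giou(box_a, box_b):
    """IoU and GIoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    GIoU subtracts the fraction of the smallest enclosing box not covered
    by the union, so it still varies when the boxes do not overlap; its
    value lies in [-1, 1], as noted above.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # smallest enclosing box of the two boxes
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = iou - (enclose - union) / enclose if enclose > 0 else iou
    return iou, giou
```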
On the basis of the above-described embodiment, the present embodiment describes the content related to "acquiring the candidate region corresponding to the candidate bounding box" in step S201 in fig. 2. As shown in fig. 21, this step may include the following:
s1701, obtaining the expansion value of the candidate boundary box.
In this embodiment, since the candidate bounding box may only include a part of the content of the object to be detected, in order to enable the corrected target bounding box to include the whole content of the object to be detected, the candidate bounding box needs to be subjected to the expansion processing, and the computer device may use the historical expansion value as the expansion value of the candidate bounding box, or may determine the expansion value corresponding to the candidate bounding box according to the mapping relationship between the candidate bounding box and the expansion value.
And S1702, performing the expansion processing on the candidate bounding boxes based on the expansion value to obtain the expansion bounding boxes.
In this embodiment, after the computer device obtains the expansion value of the candidate bounding box, the candidate bounding box may be enlarged according to the expansion value, or the user may scroll the mouse wheel to control the candidate bounding box to be enlarged, so as to obtain the expansion bounding box. For example, if the expansion value is 2, the candidate bounding box is enlarged twice, and the obtained expansion bounding box is 2 times that of the candidate bounding box.
And S1703, determining the inner area of the external expansion boundary box as a candidate area.
In this embodiment, after performing the expansion processing on the candidate bounding box, the internal area of the expanded bounding box after the expansion processing is taken as the candidate area corresponding to the candidate bounding box.
In the method for determining the boundary frame, the expansion value of the candidate boundary frame is obtained, the candidate boundary frame is subjected to expansion processing based on the expansion value, the expansion boundary frame is obtained, and the inner area of the expansion boundary frame is determined as the candidate area. The method is based on the expansion value of the candidate boundary frame, and can accurately perform expansion processing on the candidate boundary frame, so that the obtained candidate region is more accurate.
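A sketch of the expansion processing for an axis-aligned candidate bounding box follows; the optional clamping to the image border is an added safeguard for illustration, not part of the source description.

```python
def expand_bounding_box(box, expansion=2.0, image_w=None, image_h=None):
    """Expand a candidate box (x1, y1, x2, y2) about its centre.

    With an expansion value of 2 the returned box is twice as large as the
    candidate box, as in the example above.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) / 2 * expansion
    half_h = (y2 - y1) / 2 * expansion
    nx1, ny1, nx2, ny2 = cx - half_w, cy - half_h, cx + half_w, cy + half_h
    if image_w is not None:
        nx1, nx2 = max(0, nx1), min(image_w, nx2)
    if image_h is not None:
        ny1, ny2 = max(0, ny1), min(image_h, ny2)
    return nx1, ny1, nx2, ny2
```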
Further, after the target bounding box corresponding to the object to be detected is obtained, the specific content of tracking the object to be detected is described in this embodiment, and the process includes the following contents: and controlling shooting equipment to track the object to be detected in the target boundary box based on the target boundary box.
In this embodiment, after the target bounding box of the object to be detected is obtained, in the case that the object to be detected moves, the target bounding box moves along with the object to be detected, and the computer device may send a control instruction to the photographing device, where the control instruction carries identification information of the target bounding box, and the photographing device tracks the object to be detected in the target bounding box according to the control instruction, so as to ensure that the object to be detected is in a picture of the photographing device, and obtain video information of the object to be detected in the moving process.
In the method for determining the bounding box, the shooting device is controlled, based on the target bounding box, to track the object to be detected within the target bounding box. The region inside the target bounding box matches the region of the object to be detected: the target bounding box contains all the information of the object to be detected without too much irrelevant information, so the object to be detected can be tracked more accurately through the target bounding box.
In one embodiment, the method for determining the bounding box is described in detail below, and as shown in fig. 22, the method may include:
s1801, responding to a target detection instruction of a current picture image, and dividing the current picture image to obtain a plurality of super-pixel areas;
S1802, screening out super pixel areas corresponding to the coordinates of the target points from the super pixel areas;
s1803, clustering the super pixel areas according to a preset clustering rule to obtain super pixel areas of the object to be detected;
s1804, performing post-processing operation on the super-pixel area of the object to be detected to obtain a target super-pixel area;
s1805, taking the target super-pixel area as an initial boundary box of the object to be detected;
s1806, performing target detection on the current picture image by using a target detection model to obtain a plurality of detection frames;
s1807, for any detection frame, acquiring the intersection ratio of the detection frame and the initial boundary frame;
s1808, determining the distance between the detection frame and the initial boundary frame based on the cross-over ratio;
s1809, if each distance is larger than a preset distance threshold, determining the initial bounding box as a candidate bounding box; if the minimum distance in the distances is smaller than or equal to a preset distance threshold value, determining a detection frame corresponding to the minimum distance as a candidate boundary frame;
s1810, obtaining the expansion value of the candidate boundary box; performing outer expansion processing on the candidate boundary frames based on the outer expansion values to obtain outer expansion boundary frames, and determining the inner areas of the outer expansion boundary frames as candidate areas;
S1811, acquiring depth features and gray features of a candidate region;
s1812, determining a depth difference value between a target area in the candidate area and a background area outside the candidate boundary box based on the depth characteristics; and determining a gray difference value between the target region in the candidate region and the background region outside the candidate boundary box based on the gray features;
s1813, obtaining a depth ratio between the depth difference value and a background depth average value of the background area; the gray ratio between the gray difference value and the background gray average value of the background area is obtained;
s1814, if the depth ratio is larger than the gray ratio, determining the depth characteristic as a reference characteristic of the candidate region;
s1815, obtaining depth values of all pixel points in the candidate region;
s1816, determining a candidate region corresponding to the pixel point with the depth value larger than the preset depth threshold as a target region;
s1817, if the gray scale ratio is larger than the depth ratio, determining the gray scale characteristic as the reference characteristic of the candidate region;
s1818, analyzing the gray scale characteristics of the candidate region, and determining the gray scale threshold values of the target region characteristics and other regions except the target region in the candidate region;
s1819, dividing the candidate region by using a gray threshold value to obtain a target region corresponding to the candidate region;
S1820, obtaining a target area centroid and a target area of the target area;
s1821, correcting the size and the rotation angle of the candidate bounding box according to the mass center of the target area and the area of the target area to obtain a target bounding box corresponding to the object to be detected;
s1822, based on the target boundary box, controlling the shooting equipment to track the object to be detected in the target boundary box.
Fig. 23a is a schematic diagram of a candidate bounding box of an object to be detected in an embodiment, in which the object to be detected is a tricycle and the box outside the tricycle is the candidate bounding box. Fig. 23b is a schematic diagram of a target bounding box of the object to be detected in an embodiment, in which the box outside the tricycle is the target bounding box. Comparing fig. 23a with fig. 23b shows that the internal area of the candidate bounding box is larger than the internal area of the target bounding box; this is only one case, and the internal area of the candidate bounding box may also be smaller than that of the target bounding box.
It should be understood that, although the steps in the flowcharts related to the above embodiments are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides a determination device for the bounding box for realizing the determination method of the bounding box. The implementation of the solution provided by the apparatus is similar to the implementation described in the above method, so the specific limitation in the embodiments of the determination apparatus for one or more bounding boxes provided below may be referred to the limitation of the determination method for a bounding box hereinabove, and will not be repeated here.
In one embodiment, as shown in fig. 24, there is provided a determination device of a bounding box, including: an acquisition module 11, a segmentation module 12 and a correction module 13, wherein:
the acquiring module 11 is configured to acquire a candidate bounding box of an object to be detected in the current frame image and a corresponding candidate region;
a segmentation module 12, configured to segment the candidate region to obtain a target region in the candidate region;
and the correction module 13 is used for correcting the candidate bounding boxes based on the target area to obtain target bounding boxes corresponding to the object to be detected.
In one embodiment, the above-mentioned segmentation module includes: a first acquisition unit and a segmentation unit, wherein:
a first acquisition unit configured to acquire a reference feature of a candidate region of a candidate bounding box;
And the segmentation unit is used for segmenting the candidate region according to the reference characteristic to obtain a target region in the candidate region.
In one embodiment, the first obtaining unit is further configured to obtain a depth feature and a gray feature of the candidate region; determine a depth difference value between the target region in the candidate region and the background region outside the candidate bounding box based on the depth feature; determine a gray difference value between the target region in the candidate region and the background region outside the candidate bounding box based on the gray feature; and determine the reference feature of the candidate region of the candidate bounding box according to the depth difference value and the gray difference value.
In one embodiment, the first obtaining unit is further configured to obtain a depth ratio between the depth difference value and a background depth average value of the background area; the gray ratio between the gray difference value and the background gray average value of the background area is obtained; if the depth ratio is larger than the gray ratio, determining the depth characteristic as a reference characteristic of the candidate region; and if the gray scale ratio is larger than the depth ratio, determining the gray scale characteristic as the reference characteristic of the candidate region.
In one embodiment, the above-mentioned segmentation unit is further configured to obtain a depth value of each pixel point in the candidate region if the reference feature is a depth feature; and determining a candidate region corresponding to the pixel point with the depth value larger than the preset depth threshold as a target region. The preset depth threshold value is determined according to the peak value of the target area and the peak value of other areas except the target area in the candidate area; the peak value of the target area and the peak value of other areas are determined according to the number of pixels corresponding to the depth value.
In one embodiment, the above segmentation unit is further configured to, in the case that the reference feature is a gray feature, analyze the gray feature of the candidate region, determine the gray threshold between the target region feature and the features of the other regions except the target region in the candidate region, and divide the candidate region by using the gray threshold to obtain the target region corresponding to the candidate region.
In one embodiment, the correction module includes: a second acquisition unit and a correction unit, wherein:
the second acquisition unit is used for acquiring a target area centroid and a target area of a target area in the candidate area;
and the correction unit is used for correcting the candidate boundary frames according to the mass center of the target area and the area of the target area to obtain the target boundary frames corresponding to the object to be detected.
In one embodiment, the correcting unit is further configured to correct the size and the rotation angle of the candidate bounding box according to the centroid of the target area and the area of the target area, so as to obtain a target bounding box corresponding to the object to be detected; the difference between the internal area of the target bounding box and the target area is less than a preset threshold.
In one embodiment, the correcting unit is further configured to perform primary correction on the size and the rotation angle of the candidate bounding box according to the centroid of the target area and the area of the target area, so as to obtain a primary corrected candidate bounding box; acquiring the intersection area of the internal area of the candidate boundary box subjected to primary correction and the target area; and carrying out iterative correction on the size and the rotation angle of the candidate boundary frame subjected to the primary correction based on the intersection area, and taking the corrected candidate boundary frame as a target boundary frame corresponding to the object to be detected.
In one embodiment, the correction unit is further configured to obtain a plurality of correction bounding boxes corresponding to the target area in the current correction process based on the candidate bounding box, the target area and the target area centroid; correcting the size and the rotation angle of each correction boundary frame, and acquiring the intersection area of the internal area of each corrected correction boundary frame and the target area; and taking the correction boundary frame corresponding to the maximum intersection area in each intersection area as a candidate boundary frame in the next correction process until the corrected candidate boundary frame meets a preset convergence condition, and taking the corrected candidate boundary frame as a target boundary frame corresponding to the object to be detected.
In one embodiment, the correction unit is further configured to obtain a region bounding box of the target region; and carrying out iterative correction on the size and the rotation angle of the candidate bounding box according to the area bounding box, the mass center of the target area and the area of the target area to obtain a target bounding box corresponding to the object to be detected.
In one embodiment, the acquiring module includes: a third acquisition unit, a detection unit, and a first determination unit, wherein:
the third acquisition unit is used for responding to the target detection instruction of the current picture image and acquiring an initial boundary frame of an object to be detected in the current picture image according to the target point coordinates carried in the target detection instruction;
The detection unit is used for carrying out target detection on the current picture image by utilizing the target detection model to obtain a plurality of detection frames;
and the first determining unit is used for determining candidate bounding boxes of the object to be detected according to the detection boxes and the initial bounding boxes.
In one embodiment, the third obtaining unit is further configured to segment the current picture image to obtain a plurality of super-pixel areas; determining a target super-pixel region according to each super-pixel region and the coordinates of the target point; an initial bounding box of the object to be detected is determined based on the target superpixel region.
In one embodiment, the third obtaining unit is further configured to screen a superpixel area corresponding to the coordinates of the target point from each superpixel area; clustering each super-pixel region according to a preset clustering rule to obtain super-pixel regions of the object to be detected; post-processing is carried out on the super-pixel area of the object to be detected, so that a target super-pixel area is obtained; the post-processing operations include at least one of a filling operation and a filtering operation.
In one embodiment, the first determining unit is further configured to obtain a distance between each detection frame and the initial bounding box; if the distances are larger than the preset distance threshold value, determining the initial boundary frame as a candidate boundary frame; and if the minimum distance in the distances is smaller than or equal to the preset distance threshold value, determining the detection frame corresponding to the minimum distance as a candidate boundary frame.
In one embodiment, the first determining unit is further configured to obtain, for any one of the detection frames, an intersection ratio of the detection frame and the initial bounding box; based on the intersection ratio, a distance between the detection box and the initial bounding box is determined.
In one embodiment, the acquiring module further includes: a fourth acquisition unit, an expansion unit and a second determination unit, wherein:
a fourth obtaining unit, configured to obtain an expansion value of the candidate bounding box;
the expansion unit is used for carrying out expansion processing on the candidate boundary frames based on the expansion value to obtain expansion boundary frames;
and a second determination unit configured to determine an inner region of the expanded bounding box as a candidate region.
In one embodiment, the determining device of the bounding box further includes a control module, wherein:
and the control module is used for controlling the shooting equipment to track the object to be detected in the target boundary box based on the target boundary box.
The respective modules in the determination means of the above-described bounding box may be implemented in whole or in part by software, hardware, and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In an embodiment, a computer device is provided, including a memory and a processor, where the memory stores a computer program, and the processor implements the contents of any one of the embodiments of the determination method of the bounding box described above when the computer program is executed.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the contents of any one of the embodiments of the determination method of a bounding box described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the contents of any one of the embodiments of the method of determining a bounding box described above.
The user information (including but not limited to user device information, user personal information, and the like) and the data (including but not limited to data used for analysis, stored data, displayed data, and the like) involved in the present application are information and data authorized by the user or fully authorized by all parties.
Those skilled in the art will appreciate that all or part of the procedures in the methods of the above embodiments may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and the computer program, when executed, may include the procedures of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided in the present application may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. The volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, the RAM may take various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided in the present application may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processor referred to in the embodiments provided in the present application may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, or a data processing logic unit based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered to be within the scope of this specification.
The foregoing embodiments illustrate only a few implementations of the present application and are described in relative detail, but they should not be construed as limiting the scope of the present application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the present application, and all of these fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (23)

1. A method of determining a bounding box, the method comprising:
acquiring a candidate bounding box and a corresponding candidate region of an object to be detected in a current picture image;
segmenting the candidate region to obtain a target region in the candidate region;
and correcting the candidate bounding box based on the target region to obtain a target bounding box corresponding to the object to be detected.
2. The method of claim 1, wherein the segmenting the candidate region to obtain the target region in the candidate region comprises:
acquiring reference features of the candidate region;
and segmenting the candidate region according to the reference feature to obtain the target region in the candidate region.
3. The method of claim 2, wherein the obtaining the reference feature of the candidate region comprises:
acquiring a depth feature and a gray feature of the candidate region;
determining a depth difference value between a target region in the candidate region and a background region outside the candidate bounding box based on the depth feature, and determining a gray difference value between the target region in the candidate region and the background region outside the candidate bounding box based on the gray feature;
and determining the reference feature of the candidate region according to the depth difference value and the gray difference value.
4. The method according to claim 3, wherein the determining the reference feature of the candidate region according to the depth difference value and the gray difference value comprises:
acquiring a depth ratio between the depth difference value and a background depth average value of the background region, and acquiring a gray ratio between the gray difference value and a background gray average value of the background region;
and if the depth ratio is greater than the gray ratio, determining the depth feature as the reference feature of the candidate region; or if the gray ratio is greater than the depth ratio, determining the gray feature as the reference feature of the candidate region.
5. The method according to any one of claims 2-4, wherein the segmenting the candidate region according to the reference feature to obtain the target region in the candidate region comprises:
in a case where the reference feature is the depth feature, acquiring depth values of the pixel points in the candidate region;
and determining the region in the candidate region corresponding to pixel points whose depth values are greater than a preset depth threshold as the target region.
6. The method of claim 5, wherein the preset depth threshold is determined based on a peak value of the target region and a peak value of other regions in the candidate region than the target region; and the peak value of the target region and the peak value of the other regions are determined according to the number of pixel points corresponding to each depth value.
7. The method according to any one of claims 2-4, wherein the segmenting the candidate region according to the reference feature to obtain the target region in the candidate region comprises:
in a case where the reference feature is the gray feature, analyzing the gray feature of the candidate region, and determining a gray threshold between the features of the target region and the features of the other regions in the candidate region except the target region;
and segmenting the candidate region by using the gray threshold to obtain the target region corresponding to the candidate region.
8. The method according to any one of claims 1 to 4, wherein the correcting the candidate bounding box based on the target region to obtain a target bounding box corresponding to the object to be detected includes:
acquiring a centroid of the target region and an area of the target region;
and correcting the candidate bounding box according to the centroid of the target region and the area of the target region to obtain the target bounding box corresponding to the object to be detected.
9. The method according to claim 8, wherein the correcting the candidate bounding box according to the centroid of the target region and the area of the target region to obtain the target bounding box corresponding to the object to be detected includes:
correcting a size and a rotation angle of the candidate bounding box according to the centroid of the target region and the area of the target region to obtain the target bounding box corresponding to the object to be detected, wherein a difference between an area of an inner region of the target bounding box and the area of the target region is less than a preset threshold.
10. The method according to claim 9, wherein the correcting the size and the rotation angle of the candidate bounding box according to the centroid of the target region and the area of the target region to obtain the target bounding box corresponding to the object to be detected includes:
performing a primary correction on the size and the rotation angle of the candidate bounding box according to the centroid of the target region and the area of the target region to obtain a primarily corrected candidate bounding box;
acquiring an intersection area between an inner region of the primarily corrected candidate bounding box and the target region;
and iteratively correcting the size and the rotation angle of the primarily corrected candidate bounding box based on the intersection area, and taking the corrected candidate bounding box as the target bounding box corresponding to the object to be detected.
11. The method according to claim 9, wherein the correcting the size and the rotation angle of the candidate bounding box according to the centroid of the target region and the area of the target region to obtain the target bounding box corresponding to the object to be detected includes:
based on the candidate bounding box and the centroid of the target region, acquiring a plurality of correction bounding boxes corresponding to the target region in a current correction process;
correcting the size and the rotation angle of each correction bounding box, and acquiring an intersection area between an inner region of each corrected correction bounding box and the target region;
and taking the corrected bounding box corresponding to the maximum intersection area among the intersection areas as the candidate bounding box in a next correction process, until the corrected candidate bounding box meets a preset convergence condition, and taking the corrected candidate bounding box as the target bounding box corresponding to the object to be detected.
12. The method according to claim 9, wherein the correcting the size and the rotation angle of the candidate bounding box according to the centroid of the target region and the area of the target region to obtain the target bounding box corresponding to the object to be detected includes:
acquiring a region bounding box of the target region;
and iteratively correcting the size and the rotation angle of the candidate bounding box according to the region bounding box, the centroid of the target region, and the area of the target region to obtain the target bounding box corresponding to the object to be detected.
13. The method according to any one of claims 1-4, wherein the obtaining a candidate bounding box of the object to be detected in the current picture image comprises:
in response to a target detection instruction for the current picture image, acquiring an initial bounding box of the object to be detected in the current picture image according to target point coordinates carried in the target detection instruction;
performing target detection on the current picture image by using a target detection model to obtain a plurality of detection boxes;
and determining the candidate bounding box of the object to be detected according to each detection box and the initial bounding box.
14. The method according to claim 13, wherein the acquiring an initial bounding box of the object to be detected in the current picture image according to the target point coordinates carried in the target detection instruction includes:
segmenting the current picture image to obtain a plurality of superpixel regions;
determining a target superpixel region according to each superpixel region and the target point coordinates;
and determining the initial bounding box of the object to be detected based on the target superpixel region.
15. The method of claim 14, wherein the determining a target superpixel region according to each superpixel region and the target point coordinates comprises:
screening out the superpixel region corresponding to the target point coordinates from the superpixel regions;
clustering each superpixel region according to a preset clustering rule to obtain a superpixel region of the object to be detected;
and post-processing the superpixel region of the object to be detected to obtain the target superpixel region, wherein the post-processing operation includes at least one of a filling operation and a filtering operation.
16. The method of claim 13, wherein the determining the candidate bounding box of the object to be detected according to each detection box and the initial bounding box comprises:
acquiring a distance between each detection box and the initial bounding box;
and if all of the distances are greater than a preset distance threshold, determining the initial bounding box as the candidate bounding box; or if the minimum distance among the distances is less than or equal to the preset distance threshold, determining the detection box corresponding to the minimum distance as the candidate bounding box.
17. The method of claim 16, wherein the acquiring a distance between each detection box and the initial bounding box comprises:
for any one of the detection boxes, acquiring an intersection ratio between the detection box and the initial bounding box;
and determining the distance between the detection box and the initial bounding box based on the intersection ratio.
18. The method according to any one of claims 1-4, wherein the obtaining the candidate region corresponding to the candidate bounding box includes:
acquiring an expansion value of the candidate bounding box;
performing outward expansion processing on the candidate bounding box based on the expansion value to obtain an expanded bounding box;
and determining an inner region of the expanded bounding box as the candidate region.
19. The method according to any one of claims 1-4, further comprising:
controlling a shooting device, based on the target bounding box, to track the object to be detected within the target bounding box.
20. A bounding box determination apparatus, the apparatus comprising:
an acquiring module, configured to acquire a candidate bounding box and a corresponding candidate region of an object to be detected in a current picture image;
a segmentation module, configured to segment the candidate region to obtain a target region in the candidate region;
and a correction module, configured to correct the candidate bounding box based on the target region to obtain a target bounding box corresponding to the object to be detected.
21. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 19 when the computer program is executed.
22. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 19.
23. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 19.
CN202310691574.7A 2023-06-12 2023-06-12 Determination method, determination device, determination apparatus, determination device, determination program storage medium, and determination program product Pending CN116645499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310691574.7A CN116645499A (en) 2023-06-12 2023-06-12 Determination method, determination device, determination apparatus, determination device, determination program storage medium, and determination program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310691574.7A CN116645499A (en) 2023-06-12 2023-06-12 Determination method, determination device, determination apparatus, determination device, determination program storage medium, and determination program product

Publications (1)

Publication Number Publication Date
CN116645499A true CN116645499A (en) 2023-08-25

Family

ID=87622895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310691574.7A Pending CN116645499A (en) 2023-06-12 2023-06-12 Determination method, determination device, determination apparatus, determination device, determination program storage medium, and determination program product

Country Status (1)

Country Link
CN (1) CN116645499A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117557784A (en) * 2024-01-09 2024-02-13 腾讯科技(深圳)有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN117557784B (en) * 2024-01-09 2024-04-26 腾讯科技(深圳)有限公司 Target detection method, target detection device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US9483703B2 (en) Online coupled camera pose estimation and dense reconstruction from video
CN107067405B (en) Remote sensing image segmentation method based on scale optimization
Ma et al. Computer vision for road imaging and pothole detection: a state-of-the-art review of systems and algorithms
Dai et al. Building segmentation and outline extraction from UAV image-derived point clouds by a line growing algorithm
CN114424250A (en) Structural modeling
Son et al. A multi-vision sensor-based fast localization system with image matching for challenging outdoor environments
Wei et al. Automatic coarse registration of point clouds using plane contour shape descriptor and topological graph voting
Yun et al. Supervoxel-based saliency detection for large-scale colored 3D point clouds
Green et al. Normal distribution transform graph-based point cloud segmentation
CN116645499A (en) Determination method, determination device, determination apparatus, determination device, determination program storage medium, and determination program product
Chen et al. A local tangent plane distance-based approach to 3D point cloud segmentation via clustering
CN116645500A (en) Determination method, determination device, determination apparatus, determination device, determination program storage medium, and determination program product
Rangel et al. Object recognition in noisy RGB-D data using GNG
Kukolj et al. Road edge detection based on combined deep learning and spatial statistics of LiDAR data
Yalic et al. Automatic Object Segmentation on RGB-D Data using Surface Normals and Region Similarity.
Hesami et al. Range segmentation of large building exteriors: A hierarchical robust approach
CN113192174A (en) Mapping method and device and computer storage medium
Wilson et al. Image and Object Geo-Localization
Abdel-Wahab et al. Efficient reconstruction of large unordered image datasets for high accuracy photogrammetric applications
Budianti et al. Background blurring and removal for 3d modelling of cultural heritage objects
Dadgostar et al. Gesture-based human–machine interfaces: a novel approach for robust hand and face tracking
WO2020197495A1 (en) Method and system for feature matching
CN113536839B (en) Data processing method and device, positioning method and device and intelligent equipment
De Geyter et al. Review of Window and Door Type Detection Approaches
Liu et al. Traditional Point Cloud Analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination