CN116740398A - Target detection and matching method, device and readable storage medium


Info

Publication number: CN116740398A
Application number: CN202310678789.5A
Authority: CN (China)
Prior art keywords: target, frame, anchor, image, detection
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 王驰宇, 蔡科
Current and original assignee: Glodon Co Ltd
Application filed by Glodon Co Ltd
Priority to CN202310678789.5A
Publication of CN116740398A


Classifications

    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; coarse-fine approaches, e.g. multi-scale approaches; using context analysis; selection of dictionaries
    • G06N3/042 Knowledge-based neural networks; logical representations of neural networks
    • G06N3/08 Learning methods
    • G06N5/04 Inference or reasoning models
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/764 Image or video recognition or understanding using classification, e.g. of video objects
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V2201/07 Target detection
    • G06V2201/08 Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection and matching method, a device, and a readable storage medium. The method comprises the following steps: acquiring an image to be processed and inputting it into a trained target detection model; calculating anchor frame information for each anchor frame through the target detection model; screening target anchor frames from all anchor frames according to the object detection information of each anchor frame; drawing, on the image to be processed, an object prediction frame for marking the target object according to the object detection information of the target anchor frame; and drawing, on the image to be processed, an element prediction frame which marks the target element and has a matching relationship with the object prediction frame, according to the element detection information of the target anchor frame. Through a trained target detection model with dual inference heads, the invention can compute the prediction frames of each category and the matching relationships between them from an image in a single pass, achieving both high detection accuracy and high detection efficiency.

Description

Target detection and matching method, device and readable storage medium
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a method and apparatus for detecting and matching a target, and a readable storage medium.
Background
In image-based target detection scenarios, when two target objects must be detected in one image, two target detection models typically have to be trained even if a matching relationship exists between the two objects: each model detects one object in the image, and the matching relationship between the detected objects is then computed afterwards. For example, when the two target objects are a human body and a human head, a commonly used detection method is to detect the bounding boxes of the human bodies and of the human heads in the image simultaneously, and then determine the matching relationship between bodies and heads by computing the degree of overlap between the bounding boxes. However, when the human-body bounding boxes overlap heavily, simply computing bounding-box overlap between multiple human bodies and multiple human heads may wrongly match one person's head to another person's body. How to detect two target objects with a matching relationship from an image more accurately is therefore a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a target detection and matching method, a target detection and matching device and a readable storage medium, which can calculate target detection frames of all classes and matching relations among the target detection frames from images at one time through a trained target detection model with double inference heads, and have high detection accuracy and high detection efficiency.
According to one aspect of the present invention, there is provided a target detection and matching method, the method comprising:
acquiring an image to be processed, and inputting the image to be processed into a trained target detection model;
calculating anchor frame information of each anchor frame through the target detection model; wherein the anchor frame information includes: object detection information for describing a target object and element detection information for describing a target element having a matching relationship;
screening target anchor frames from all anchor frames according to the object detection information of each anchor frame;
drawing an object prediction frame for marking the target object on the image to be processed according to the object detection information of the target anchor frame;
and drawing an element prediction frame which has a matching relation with the object prediction frame and is used for marking the target element on the image to be processed according to the element detection information of the target anchor frame.
Optionally, before the acquiring the image to be processed and inputting the image to be processed into the trained target detection model, the method further includes:
acquiring an original image on which an object real frame for marking the target object has been drawn;
generating N sub-original images based on the original image according to the number N of object real frames in the original image; wherein each sub-original image retains, without repetition across sub-original images, the pixels inside one object real frame, and the pixels outside the retained object real frame are set to black;
drawing an element real frame for marking the target element in each sub-original image in sequence, and establishing a matching relation between the object real frame and the element real frame in each sub-original image to obtain a sample image for model training;
training a preset model according to the sample image to obtain a target detection model for detecting the target object and the target element with the matching relation in the image.
Optionally, the training the preset model according to the sample image to obtain a target detection model for detecting the target object and the target element with a matching relationship in the image includes:
forming sample data according to the position information of the object real frame, the position information of the element real frame and the matching relation between the object real frame and the element real frame in the sample image;
training a YOLOv5 model with double inference heads according to the sample data to obtain the target detection model;
wherein the sample data comprises: object marking information describing the target object and element marking information describing the target element having a matching relationship;
the object marking information includes: the target object category ID, the abscissa of the center point of the object real frame, the ordinate of the center point of the object real frame, the width of the object real frame and the height of the object real frame;
the element marking information includes: the target element category ID, the abscissa of the element real frame center point, the ordinate of the element real frame center point, the width of the element real frame, and the height of the element real frame.
Optionally, the object detection information includes: object location information and object confidence for characterizing the probability of the target object appearing within an anchor frame; wherein the object position information includes: a deviation value of an abscissa of the center point of the object prediction frame and an abscissa of the center point of the anchor frame, a deviation value of an ordinate of the center point of the object prediction frame and an ordinate of the center point of the anchor frame, a deviation value of a width of the object prediction frame and a width of the anchor frame, and a deviation value of a height of the object prediction frame and a height of the anchor frame;
the element detection information includes: element position information and element confidence for characterizing the probability of the target element occurring within an anchor frame; wherein the element position information includes: the deviation value of the abscissa of the element prediction frame center point and the abscissa of the anchor frame center point, the deviation value of the ordinate of the element prediction frame center point and the ordinate of the anchor frame center point, the deviation value of the width of the element prediction frame and the width of the anchor frame, and the deviation value of the height of the element prediction frame and the height of the anchor frame.
Optionally, the screening the target anchor frame from all anchor frames according to the object detection information of each anchor frame includes:
screening, from all anchor frames and according to the object confidence of each anchor frame, one or more mutually adjacent candidate anchor frames whose object confidence is greater than a first preset threshold by a non-maximum suppression (NMS) algorithm, and merging all screened candidate anchor frames into the target anchor frame;
and forming the object detection information and the element detection information of the target anchor frame according to the object detection information and the element detection information of all the candidate anchor frames.
Optionally, the drawing, on the image to be processed, an object detection frame for marking the target object according to the object detection information of the target anchor frame includes:
and drawing the object prediction frame on the image to be processed according to the object position information in the object detection information of the target anchor frame and the center point position information of the target anchor frame.
Optionally, the drawing, on the image to be processed, an element prediction frame for marking the target element, where the element prediction frame has a matching relationship with the object prediction frame, according to the element detection information of the target anchor frame includes:
judging whether the element confidence in the element detection information of the target anchor frame is greater than a second preset threshold;
if so, drawing the element prediction frame on the image to be processed according to the element position information in the element detection information of the target anchor frame and the center point position information of the target anchor frame;
if not, the element prediction frame is not drawn on the image to be processed.
Optionally, the target object is a human body and the target element is a human face; or the target object is a human body and the target element is a safety helmet; or the target object is a vehicle body and the target element is a vehicle hopper.
In order to achieve the above purpose, the present invention further provides a target detection and matching device, which specifically includes the following components:
the acquisition module is used for acquiring an image to be processed and inputting the image to be processed into a trained target detection model;
the calculation module is used for calculating anchor frame information of each anchor frame through the target detection model; wherein the anchor frame information includes: object detection information for describing a target object and element detection information for describing a target element having a matching relationship;
the screening module is used for screening target anchor frames from all anchor frames according to the object detection information of each anchor frame;
the marking module is used for drawing an object prediction frame for marking the target object on the image to be processed according to the object detection information of the target anchor frame;
and the matching module is used for drawing an element prediction frame which has a matching relation with the object prediction frame and is used for marking the target element on the image to be processed according to the element detection information of the target anchor frame.
In order to achieve the above object, the present invention further provides a computer device, which specifically includes: the system comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the steps of the target detection and matching method when executing the computer program.
In order to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described object detection and matching method.
According to the target detection and matching method, device and readable storage medium, a target object and a target element having a matching relationship can be identified from an image in a single pass through the trained target detection model, achieving high detection accuracy and high detection efficiency. Compared with a conventional target detection model, which outputs only one inference head, the target detection model here outputs two inference heads, namely the object detection information and the element detection information, through which an object prediction frame and an element prediction frame having a matching relationship can be marked on the image to be processed; the matching relationship between the detected objects is thus established at inference time.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic flow chart of an alternative method for detecting and matching targets according to the first embodiment;
FIG. 2 is a schematic diagram of an original image for model training provided in accordance with the first embodiment;
FIGS. 3 (a), 3 (b), and 3 (c) are schematic diagrams of the three sub-original images provided in embodiment one;
FIG. 4 is a schematic diagram of a model structure of improved YOLOv5 with two inference heads according to the first embodiment;
FIG. 5 is a schematic diagram of an alternative structure of a target detection and matching device according to the second embodiment;
fig. 6 is a schematic diagram of an alternative hardware architecture of a computer device according to the third embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Embodiment 1
The embodiment of the invention provides a target detection and matching method, as shown in fig. 1, which specifically comprises the following steps:
step S101: and acquiring an image to be processed, and inputting the image to be processed into a trained target detection model.
Preferably, the target detection model is trained based on a YOLOv5 (You Only Look Once Version 5) model with a double inference head.
Specifically, before the step S101, the method further includes:
Step A1: acquiring an original image on which an object real frame for marking the target object has been drawn;
Step A2: generating N sub-original images based on the original image according to the number N of object real frames in the original image; wherein each sub-original image retains, without repetition across sub-original images, the pixels inside one object real frame, and the pixels outside the retained object real frame are set to black;
for example, in the scene where the target object is a human body, if three human bodies are included in the original image as shown in fig. 2, three sub-original images need to be generated according to the original image, as shown in fig. 3 (a), 3 (b), and 3 (c), schematic diagrams of the three sub-original images are shown, each sub-original image includes a human body BBOX (Bounding Box), and in addition, in each sub-original image, except for the pixels in the BBOX, the remaining pixels are set to be black;
step A3: drawing an element real frame for marking a target element in each sub-original image in sequence, and establishing a matching relation between the object real frame and the element real frame in each sub-original image to obtain a sample image for model training;
in this embodiment, the marking information of the target object (i.e., the information of the object real frame) and the marking information of the target element (i.e., the information of the element real frame) in the sample image are combined (i.e., a matching relationship is established) to be used as training marks for model training;
step A4: training a preset model according to the sample image to obtain a target detection model for detecting the target object and the target element with the matching relation in the image.
It should be noted that a conventional YOLO labelling tool can only mark independent BBOXes in an image and cannot record that two BBOXes match each other; by means of the above steps A1 to A4, two matched BBOXes (i.e., the object real frame and the element real frame) can be marked for one original image.
Further, the step A4 specifically includes:
step A41: forming sample data according to the position information of the object real frame, the position information of the element real frame and the matching relation between the object real frame and the element real frame in the sample image;
wherein the sample data comprises: object marking information describing the target object and element marking information describing the target element having a matching relationship;
the object marking information includes: the target object category ID, the abscissa of the center point of the object real frame, the ordinate of the center point of the object real frame, the width of the object real frame and the height of the object real frame;
the element marking information includes: the target element category ID, the abscissa of the element real frame center point, the ordinate of the element real frame center point, the width of the element real frame, and the height of the element real frame.
One training label (i.e., one sample datum) in a conventional YOLOv5 training dataset contains only five fields: the class ID, the abscissa x of the BBOX centre point, the ordinate y of the BBOX centre point, the width w of the BBOX, and the height h of the BBOX. In this embodiment, one sample datum may contain ten fields, for example: class_person, x, y, w, h, class_head, x_head, y_head, w_head, h_head. Here, class_person is the target object category ID, which may be set to 0; x, y, w and h are the abscissa of the object real frame centre point, the ordinate of the object real frame centre point, the width of the object real frame, and the height of the object real frame; class_head is the target element category ID, which may be set to 1; and x_head, y_head, w_head and h_head are the abscissa of the element real frame centre point, the ordinate of the element real frame centre point, the width of the element real frame, and the height of the element real frame. Also, if a sample image contains only the target object and no target element, x_head, y_head, w_head and h_head may all be set to 0.
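The ten-field label described above can be serialised like a standard YOLO label line with five extra fields. A sketch under stated assumptions: the field order follows the example in the text, and when no element is present the element class field is also written as zero (the patent only specifies zeros for the four element position fields).

```python
def format_label(obj_cls, obj_box, elem_cls=None, elem_box=None):
    """Serialise one training label in the ten-field format:
    class_person x y w h class_head x_head y_head w_head h_head.
    obj_box and elem_box are (x_center, y_center, width, height)."""
    fields = [obj_cls, *obj_box]
    if elem_box is None:
        # Sample contains a target object but no matching target element.
        fields += [0, 0.0, 0.0, 0.0, 0.0]
    else:
        fields += [elem_cls, *elem_box]
    return " ".join(str(f) for f in fields)
```

For example, format_label(0, (0.5, 0.5, 0.2, 0.6), 1, (0.5, 0.3, 0.1, 0.1)) produces a ten-field line pairing a human-body box with its head box.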
Step A42: and training a YOLOv5 model with double inference heads according to the sample data to obtain the target detection model.
In this embodiment, an existing YOLOv5 model is modified to detect a target object and a target element that have a matching relationship. In the prior art, a trained YOLOv5 model has only one inference head, through which only a target object (e.g., a human body) or a target element (e.g., a human head) can be identified from an image; the matching relationship between the identified target object and target element cannot be obtained. In contrast, this embodiment uses an improved YOLOv5 model with two inference heads, shown schematically in fig. 4, in which one inference head is used to detect the target object and the other is used to detect the target element. In addition, since sample images centred on the target object are used to train the inference head of the target element in this embodiment, GIoU is selected as the loss function for the BBOX of the human-head target.
Further, the step S101 specifically includes:
Step B1: performing a data enhancement operation on the image to be processed; wherein the data enhancement operation includes at least one of: mosaic stitching (Mosaic), scale transformation (scale), flipping (flip), and HSV (Hue, Saturation, Value) enhancement;
Step B2: inputting the image subjected to the enhancement operation into the target detection model.
Preprocessing the image to be processed in this way improves the robustness and generality of the target detection model.
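One of the enhancement operations in step B1, horizontal flipping, must also adjust the label coordinates. A minimal sketch, assuming boxes are (x_center, y_center, w, h) normalised to [0, 1]; the mosaic, scale and HSV operations are omitted here, and the function name is illustrative.

```python
import numpy as np

def flip_horizontal(image, boxes):
    """Mirror the image left-right and mirror the x coordinate of each box
    centre; widths, heights and y coordinates are unchanged."""
    flipped = image[:, ::-1].copy()
    out = boxes.copy()
    out[:, 0] = 1.0 - out[:, 0]
    return flipped, out
```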
Step S102: calculating anchor frame information of each anchor frame through the target detection model; wherein the anchor frame information includes: object detection information for describing a target object and element detection information for describing a target element having a matching relationship.
It should be noted that the object detection information and the element detection information in this embodiment correspond to the two inference heads; that is, after the image to be processed is input into the target detection model, the outputs of two inference heads having a matching relationship are produced.
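Consistent with the ten-field training label, each anchor's raw prediction can be read as ten channels, five per inference head. The 10-channel layout and the names below are assumptions for illustration; the patent does not fix the channel order.

```python
import numpy as np

def split_dual_head_output(pred):
    """Split one anchor's raw 10-channel prediction into object detection
    information (four offsets plus object confidence) and element detection
    information (four offsets plus element confidence)."""
    obj = {"offsets": pred[0:4], "conf": float(pred[4])}
    elem = {"offsets": pred[5:9], "conf": float(pred[9])}
    return obj, elem
```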
Specifically, the object detection information includes: object location information and object confidence for characterizing the probability of the target object appearing within an anchor frame; wherein the object position information includes: a deviation value of an abscissa of the center point of the object prediction frame and an abscissa of the center point of the anchor frame, a deviation value of an ordinate of the center point of the object prediction frame and an ordinate of the center point of the anchor frame, a deviation value of a width of the object prediction frame and a width of the anchor frame, and a deviation value of a height of the object prediction frame and a height of the anchor frame;
the element detection information includes: element position information and element confidence for characterizing the probability of the target element occurring within an anchor frame; wherein the element position information includes: the deviation value of the abscissa of the element prediction frame center point and the abscissa of the anchor frame center point, the deviation value of the ordinate of the element prediction frame center point and the ordinate of the anchor frame center point, the deviation value of the width of the element prediction frame and the width of the anchor frame, and the deviation value of the height of the element prediction frame and the height of the anchor frame.
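Turning the deviation values above into an absolute prediction frame can be sketched as follows. A simple additive decoding is assumed for clarity; YOLOv5's actual decoding applies sigmoid activations and anchor scaling, which the patent does not specify.

```python
def decode_box(anchor, offsets):
    """anchor is (cx, cy, w, h) of the anchor frame; offsets is the
    (dx, dy, dw, dh) deviation values predicted by one head. Returns the
    absolute (cx, cy, w, h) of the prediction frame."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (cx + dx, cy + dy, w + dw, h + dh)
```

The same decoding applies to both heads, which is what ties the element prediction frame to the same anchor as the object prediction frame.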
Step S103: and screening target anchor frames from all anchor frames according to the object detection information of each anchor frame.
Specifically, the step S103 includes:
Step C1: screening, from all anchor frames and according to the object confidence of each anchor frame, one or more mutually adjacent candidate anchor frames whose object confidence is greater than a first preset threshold by a non-maximum suppression (NMS) algorithm; merging all screened candidate anchor frames into the target anchor frame; and forming the object detection information and element detection information of the target anchor frame from the object detection information and element detection information of all screened candidate anchor frames;
It should be noted that, since the target object may appear in multiple anchor frames, the multiple anchor frames in which the target object appears need to be combined by the NMS algorithm to obtain the target anchor frame.
If only one candidate anchor frame is screened out, its object detection information and element detection information are used directly as the object detection information and element detection information of the target anchor frame. If multiple candidate anchor frames are screened out, their object detection information is merged into the object detection information of the target anchor frame according to a preset rule, and their element detection information is likewise merged into the element detection information of the target anchor frame. The object detection information of the target anchor frame also comprises object position information and an object confidence, and its element detection information also comprises element position information and an element confidence.
Step S104: and drawing an object detection frame for marking the target object on the image to be processed according to the object detection information of the target anchor frame.
Specifically, the step S104 includes:
and drawing the object prediction frame on the image to be processed according to the object position information in the object detection information of the target anchor frame and the center point position information of the target anchor frame.
Step S105: and drawing an element prediction frame which has a matching relation with the object prediction frame and is used for marking the target element on the image to be processed according to the element detection information of the target anchor frame.
Specifically, step S105 includes:
judging whether the element confidence coefficient in the element detection information of the target anchor frame is larger than a second preset threshold value or not;
if so, drawing the element prediction frame on the image to be processed according to the element position information in the element detection information of the target anchor frame and the center point position information of the target anchor frame;
if not, the element prediction frame is not drawn on the image to be processed.
It should be noted that, if the element confidence is not greater than the second preset threshold, this indicates that only the target object exists on the image to be processed and no target element exists; in this case only the object prediction frame is drawn, and the element prediction frame is not drawn.
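Steps S104 and S105 together can be sketched as follows. The dict keys (`obj_box`, `elem_box`, `elem_conf`) and the returned list of frames to draw are illustrative assumptions, not the patent's API; any drawing routine could consume the result.

```python
def prediction_frames_to_draw(target_anchor, second_threshold=0.5):
    """S104/S105 sketch: the object prediction frame is always drawn;
    the matched element prediction frame is drawn only when the element
    confidence exceeds the second preset threshold."""
    frames = [("object", target_anchor["obj_box"])]
    if target_anchor["elem_conf"] > second_threshold:
        frames.append(("element", target_anchor["elem_box"]))
    return frames
```

When the element confidence fails the threshold, only the object frame survives, matching the note above that an image may contain the target object without the target element.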
In the target detection mode provided by this embodiment, because the training data pairs each target object with its target element, the model learns the matching relationship between them, and the target detection model can therefore output two inference heads: an object prediction frame is obtained from the output of the object inference head, and the matched element prediction frame from the corresponding element inference head, so that object prediction frames and element prediction frames matched one to one are obtained.
Further, the target object is a human body and the target element is a human face; or the target object is a human body and the target element is a safety helmet; or the target object is a vehicle body and the target element is a vehicle hopper.
The target detection mode provided by this embodiment is applicable to scenes combining early-warning event detection with identity recognition; for example, after the target detection model detects the head of a person not wearing a safety helmet, the worker identity information corresponding to that head can be matched.
This embodiment provides a target detection model that can output multiple inference heads and can establish the matching relationship between detected objects during inference. The method is applicable to matching detection between the head and the body, and can be extended to matching detection between the body and any body part. It can also be applied to matching detection between other objects, such as the relationship between a muck truck and its truck head.
In this embodiment, the target object and the target element having a matching relationship can be identified from an image in one pass by the trained target detection model, so both detection accuracy and detection efficiency are high. Compared with a conventional target detection model that outputs only one inference head, the target detection model in this embodiment outputs two inference heads, namely object detection information and element detection information, which allows an object prediction frame and an element prediction frame having a matching relationship to be marked on the image to be processed, so the matching relationship between detected objects is established during inference.
Embodiment Two
The embodiment of the invention provides a target detection and matching device, as shown in fig. 5, which specifically comprises the following components:
the acquisition module 501 is configured to acquire an image to be processed, and input the image to be processed into a trained target detection model;
the calculating module 502 is configured to calculate anchor frame information of each anchor frame according to the target detection model; wherein the anchor frame information includes: object detection information for describing a target object and element detection information for describing a target element having a matching relationship;
a screening module 503, configured to screen out target anchor frames from all anchor frames according to the object detection information of each anchor frame;
a marking module 504, configured to draw an object detection frame for marking the target object on the image to be processed according to the object detection information of the target anchor frame;
and the matching module 505 is configured to draw an element prediction frame for marking the target element, which has a matching relationship with the object prediction frame, on the image to be processed according to the element detection information of the target anchor frame.
Specifically, the device further comprises a training module configured to:
acquiring an original image on which an object real frame for marking the target object is drawn;
generating N sub-original images from the original image according to the number N of object real frames in the original image; wherein each sub-original image retains, without repetition, the pixels within one object real frame, and the pixels outside the retained object real frame are set to black;
drawing an element real frame for marking the target element in each sub-original image in sequence, and establishing a matching relation between the object real frame and the element real frame in each sub-original image to obtain a sample image for model training;
training a preset model according to the sample image to obtain a target detection model for detecting the target object and the target element with the matching relation in the image.
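The sub-original-image generation described above can be sketched as follows. The (x1, y1, x2, y2) pixel-coordinate box format is an assumption for illustration.

```python
import numpy as np

def make_sub_originals(image, object_boxes):
    """Training-data sketch: an original image with N object real frames
    yields N sub-original images; each keeps the pixels inside exactly
    one object real frame and sets every other pixel to black."""
    subs = []
    for x1, y1, x2, y2 in object_boxes:
        sub = np.zeros_like(image)               # all pixels black
        sub[y1:y2, x1:x2] = image[y1:y2, x1:x2]  # restore one real frame
        subs.append(sub)
    return subs
```

Each returned sub-image would then be annotated with the element real frame matched to its single visible object real frame, yielding one unambiguous object/element pair per sample image.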
Further, the training module is specifically configured to:
forming sample data according to the position information of the object real frame, the position information of the element real frame and the matching relation between the object real frame and the element real frame in the sample image;
training a YOLOv5 model with double inference heads according to the sample data to obtain the target detection model;
wherein the sample data comprises: object tag information describing the target object and element tag information describing the target element having a matching relationship;
the object marking information includes: the target object category ID, the abscissa of the center point of the object real frame, the ordinate of the center point of the object real frame, the width of the object real frame and the height of the object real frame;
the element marking information includes: the target element category ID, the abscissa of the element real frame center point, the ordinate of the element real frame center point, the width of the element real frame, and the height of the element real frame.
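One sample-data record combining the object marking information with its matched element marking information might look like the sketch below. The field order follows the description above; the normalized [0, 1] coordinate convention and the single-line flattening are assumptions borrowed from the YOLO label-file format, not mandated by the patent.

```python
# A hypothetical matched pair: class IDs are examples only.
sample = {
    "object":  {"class_id": 0, "cx": 0.48, "cy": 0.55, "w": 0.20, "h": 0.60},
    "element": {"class_id": 1, "cx": 0.48, "cy": 0.28, "w": 0.08, "h": 0.10},
}

def to_annotation_line(record):
    """Flatten one matched object/element pair into a single label line:
    obj_class cx cy w h  elem_class cx cy w h."""
    fields = []
    for part in ("object", "element"):
        p = record[part]
        fields += [p["class_id"], p["cx"], p["cy"], p["w"], p["h"]]
    return " ".join(str(v) for v in fields)
```

Keeping both halves on one line preserves the matching relationship explicitly, which is what lets the dual-head YOLOv5 model learn paired predictions.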
Specifically, the object detection information includes: object location information and object confidence for characterizing the probability of the target object appearing within an anchor frame; wherein the object position information includes: a deviation value of an abscissa of the center point of the object prediction frame and an abscissa of the center point of the anchor frame, a deviation value of an ordinate of the center point of the object prediction frame and an ordinate of the center point of the anchor frame, a deviation value of a width of the object prediction frame and a width of the anchor frame, and a deviation value of a height of the object prediction frame and a height of the anchor frame;
the element detection information includes: element position information and element confidence for characterizing the probability of the target element occurring within an anchor frame; wherein the element position information includes: the deviation value of the abscissa of the element prediction frame center point and the abscissa of the anchor frame center point, the deviation value of the ordinate of the element prediction frame center point and the ordinate of the anchor frame center point, the deviation value of the width of the element prediction frame and the width of the anchor frame, and the deviation value of the height of the element prediction frame and the height of the anchor frame.
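Recovering a prediction frame from an anchor frame plus the four deviation values listed above can be sketched as follows. A purely additive offset scheme is assumed here for clarity; YOLOv5's actual decoding applies sigmoid and scaling to the raw head outputs.

```python
def decode_prediction_frame(anchor, deviations):
    """Combine an anchor frame (cx, cy, w, h) with the four deviation
    values (d_cx, d_cy, d_w, d_h) to obtain the prediction frame, again
    as (cx, cy, w, h).  The same routine serves both the object and the
    element prediction frames, since their position information has the
    same structure."""
    acx, acy, aw, ah = anchor
    dcx, dcy, dw, dh = deviations
    return (acx + dcx, acy + dcy, aw + dw, ah + dh)
```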
Further, the screening module 503 is specifically configured to:
screening one or more candidate anchor frames which have the object confidence coefficient larger than a first preset threshold value and are mutually adjacent from all anchor frames according to the object confidence coefficient of each anchor frame by a non-maximum suppression algorithm NMS, and merging all screened candidate anchor frames into the target anchor frame;
and forming the object detection information and the element detection information of the target anchor frame according to the object detection information and the element detection information of all the candidate anchor frames.
Further, the marking module 504 is specifically configured to:
and drawing the object prediction frame on the image to be processed according to the object position information in the object detection information of the target anchor frame and the center point position information of the target anchor frame.
Further, the matching module 505 is specifically configured to:
judging whether the element confidence coefficient in the element detection information of the target anchor frame is larger than a second preset threshold value or not;
if so, drawing the element prediction frame on the image to be processed according to the element position information in the element detection information of the target anchor frame and the center point position information of the target anchor frame;
if not, the element prediction frame is not drawn on the image to be processed.
Further, the target object is a human body and the target element is a human face; or the target object is a human body and the target element is a safety helmet; or the target object is a vehicle body and the target element is a vehicle hopper.
In this embodiment, the target object and the target element having a matching relationship can be identified from an image in one pass by the trained target detection model, so both detection accuracy and detection efficiency are high. Compared with a conventional target detection model that outputs only one inference head, the target detection model in this embodiment outputs two inference heads, namely object detection information and element detection information, which allows an object prediction frame and an element prediction frame having a matching relationship to be marked on the image to be processed, so the matching relationship between detected objects is established during inference.
Embodiment Three
This embodiment also provides a computer device, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster formed by a plurality of servers) that can execute a program. As shown in Fig. 6, the computer device 60 of this embodiment includes at least, but is not limited to: a memory 601 and a processor 602, which are communicably connected to each other via a system bus. It should be noted that Fig. 6 only shows the computer device 60 with components 601-602, but it should be understood that not all of the illustrated components are required to be implemented, and that more or fewer components may be implemented instead.
In this embodiment, the memory 601 (i.e., a readable storage medium) includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 601 may be an internal storage unit of the computer device 60, such as a hard disk or memory of the computer device 60. In other embodiments, the memory 601 may also be an external storage device of the computer device 60, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device 60. Of course, the memory 601 may also include both an internal storage unit and an external storage device of the computer device 60. In this embodiment, the memory 601 is typically used to store the operating system and various types of application software installed on the computer device 60. In addition, the memory 601 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 602 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 602 is generally used to control the overall operation of the computer device 60.
Specifically, in the present embodiment, the processor 602 is configured to execute a program of an object detection and matching method stored in the memory 601, where the program of the object detection and matching method is executed to implement the following steps:
acquiring an image to be processed, and inputting the image to be processed into a trained target detection model;
calculating anchor frame information of each anchor frame through the target detection model; wherein the anchor frame information includes: object detection information for describing a target object and element detection information for describing a target element having a matching relationship;
screening target anchor frames from all anchor frames according to the object detection information of each anchor frame;
drawing an object detection frame for marking the target object on the image to be processed according to the object detection information of the target anchor frame;
and drawing an element prediction frame which has a matching relation with the object prediction frame and is used for marking the target element on the image to be processed according to the element detection information of the target anchor frame.
For the specific implementation of the above method steps, reference may be made to Embodiment One; the description is not repeated here.
Embodiment Four
The present embodiment also provides a computer readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application store, etc., having stored thereon a computer program that when executed by a processor performs the following method steps:
acquiring an image to be processed, and inputting the image to be processed into a trained target detection model;
calculating anchor frame information of each anchor frame through the target detection model; wherein the anchor frame information includes: object detection information for describing a target object and element detection information for describing a target element having a matching relationship;
screening target anchor frames from all anchor frames according to the object detection information of each anchor frame;
drawing an object detection frame for marking the target object on the image to be processed according to the object detection information of the target anchor frame;
and drawing an element prediction frame which has a matching relation with the object prediction frame and is used for marking the target element on the image to be processed according to the element detection information of the target anchor frame.
For the specific implementation of the above method steps, reference may be made to Embodiment One; the description is not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, and of course may also be implemented by means of hardware; in many cases, however, the former is the preferred implementation.
The foregoing description covers only preferred embodiments of the present invention and is not intended to limit the scope of the invention; any equivalent structural or process transformation made using the contents of this description, applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims (10)

1. A method for target detection and matching, the method comprising:
acquiring an image to be processed, and inputting the image to be processed into a trained target detection model;
calculating anchor frame information of each anchor frame through the target detection model; wherein the anchor frame information includes: object detection information for describing a target object and element detection information for describing a target element having a matching relationship;
screening target anchor frames from all anchor frames according to the object detection information of each anchor frame;
drawing an object detection frame for marking the target object on the image to be processed according to the object detection information of the target anchor frame;
and drawing an element prediction frame which has a matching relation with the object prediction frame and is used for marking the target element on the image to be processed according to the element detection information of the target anchor frame.
2. The target detection and matching method according to claim 1, wherein before the capturing the image to be processed and inputting the image to be processed into the trained target detection model, the method further comprises:
acquiring an original image on which an object real frame for marking the target object is drawn;
generating N sub-original images from the original image according to the number N of object real frames in the original image; wherein each sub-original image retains, without repetition, the pixels within one object real frame, and the pixels outside the retained object real frame are set to black;
drawing an element real frame for marking the target element in each sub-original image in sequence, and establishing a matching relation between the object real frame and the element real frame in each sub-original image to obtain a sample image for model training;
training a preset model according to the sample image to obtain a target detection model for detecting the target object and the target element with the matching relation in the image.
3. The method according to claim 2, wherein training a preset model according to the sample image to obtain a target detection model for detecting the target object and the target element having a matching relationship in an image comprises:
forming sample data according to the position information of the object real frame, the position information of the element real frame and the matching relation between the object real frame and the element real frame in the sample image;
training a YOLOv5 model with double inference heads according to the sample data to obtain the target detection model;
wherein the sample data comprises: object tag information describing the target object and element tag information describing the target element having a matching relationship;
the object marking information includes: the target object category ID, the abscissa of the center point of the object real frame, the ordinate of the center point of the object real frame, the width of the object real frame and the height of the object real frame;
the element marking information includes: the target element category ID, the abscissa of the element real frame center point, the ordinate of the element real frame center point, the width of the element real frame, and the height of the element real frame.
4. The target detection and matching method according to claim 1, wherein the object detection information includes: object location information and object confidence for characterizing the probability of the target object appearing within an anchor frame; wherein the object position information includes: a deviation value of an abscissa of the center point of the object prediction frame and an abscissa of the center point of the anchor frame, a deviation value of an ordinate of the center point of the object prediction frame and an ordinate of the center point of the anchor frame, a deviation value of a width of the object prediction frame and a width of the anchor frame, and a deviation value of a height of the object prediction frame and a height of the anchor frame;
the element detection information includes: element position information and element confidence for characterizing the probability of the target element occurring within an anchor frame; wherein the element position information includes: the deviation value of the abscissa of the element prediction frame center point and the abscissa of the anchor frame center point, the deviation value of the ordinate of the element prediction frame center point and the ordinate of the anchor frame center point, the deviation value of the width of the element prediction frame and the width of the anchor frame, and the deviation value of the height of the element prediction frame and the height of the anchor frame.
5. The method for detecting and matching objects as claimed in claim 4, wherein the step of screening out the object anchor frames from all anchor frames according to the object detection information of each anchor frame comprises:
screening one or more candidate anchor frames which have the object confidence coefficient larger than a first preset threshold value and are mutually adjacent from all anchor frames according to the object confidence coefficient of each anchor frame by a non-maximum suppression algorithm NMS, and merging all screened candidate anchor frames into the target anchor frame;
and forming the object detection information and the element detection information of the target anchor frame according to the object detection information and the element detection information of all the candidate anchor frames.
6. The method for detecting and matching a target according to claim 5, wherein drawing an object detection frame for marking the target object on the image to be processed according to the object detection information of the target anchor frame, comprises:
and drawing the object prediction frame on the image to be processed according to the object position information in the object detection information of the target anchor frame and the center point position information of the target anchor frame.
7. The method according to claim 5, wherein drawing an element prediction frame for marking the target element on the image to be processed, which has a matching relationship with the object prediction frame, based on the element detection information of the target anchor frame, comprises:
judging whether the element confidence coefficient in the element detection information of the target anchor frame is larger than a second preset threshold value or not;
if so, drawing the element prediction frame on the image to be processed according to the element position information in the element detection information of the target anchor frame and the center point position information of the target anchor frame;
if not, the element prediction frame is not drawn on the image to be processed.
8. The target detection and matching method according to any one of claims 1 to 7, wherein the target object is a human body and the target element is a human face; or the target object is a human body and the target element is a safety helmet; or the target object is a vehicle body and the target element is a vehicle hopper.
9. A target detection and matching device, the device comprising:
the acquisition module is used for acquiring an image to be processed and inputting the image to be processed into a trained target detection model;
the calculation module is used for calculating anchor frame information of each anchor frame through the target detection model; wherein the anchor frame information includes: object detection information for describing a target object and element detection information for describing a target element having a matching relationship;
the screening module is used for screening target anchor frames from all anchor frames according to the object detection information of each anchor frame;
the marking module is used for drawing an object detection frame for marking the target object on the image to be processed according to the object detection information of the target anchor frame;
and the matching module is used for drawing an element prediction frame which has a matching relation with the object prediction frame and is used for marking the target element on the image to be processed according to the element detection information of the target anchor frame.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
CN202310678789.5A 2023-06-08 2023-06-08 Target detection and matching method, device and readable storage medium Pending CN116740398A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310678789.5A CN116740398A (en) 2023-06-08 2023-06-08 Target detection and matching method, device and readable storage medium


Publications (1)

Publication Number Publication Date
CN116740398A true CN116740398A (en) 2023-09-12

Family

ID=87909093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310678789.5A Pending CN116740398A (en) 2023-06-08 2023-06-08 Target detection and matching method, device and readable storage medium

Country Status (1)

Country Link
CN (1) CN116740398A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination