CN113076955A - Target detection method, system, computer equipment and machine readable medium - Google Patents


Info

Publication number
CN113076955A
CN113076955A (application CN202110398900.6A)
Authority
CN
China
Prior art keywords
target
structured
image
target image
human
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110398900.6A
Other languages
Chinese (zh)
Inventor
张婷 (Zhang Ting)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yuncong Enterprise Development Co ltd
Original Assignee
Shanghai Yuncong Enterprise Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yuncong Enterprise Development Co ltd filed Critical Shanghai Yuncong Enterprise Development Co ltd
Priority to CN202110398900.6A priority Critical patent/CN113076955A/en
Publication of CN113076955A publication Critical patent/CN113076955A/en
Pending legal-status Critical Current


Classifications

    • G06V10/40 Extraction of image or video features
    • G06F18/23 Clustering techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V2201/07 Target detection

Abstract

The invention provides a target detection method, system, computer device and machine-readable medium. A neural network extracts a plurality of target features from a target image; a plurality of targets are associated based on the extracted features to form a structured target; the classification confidence and regression values of the structured target are then predicted, and the structured target is displayed in the target image according to the prediction result. The method improves target-association accuracy and thereby improves subsequent tracking and recognition. By merging individually detected targets such as the human body, human face and human head into a single structured target, the invention preserves the correlation among the body, head and face targets of the same person, reduces target-association errors in the tracking and recognition stage, and improves the tracking and recognition effect.

Description

Target detection method, system, computer equipment and machine readable medium
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a target detection method, a target detection system, a computer device, and a machine-readable medium.
Background
Structured detection of persons comprises human-body detection, face detection, head detection and so on, and is important for face recognition, crowd counting, and behavior analysis and recognition in the surveillance and security field. At present, structured person detection mainly relies on deep-learning-based detection of each task individually. This approach is easily affected by lighting, weather and occlusion in surveillance scenes with complex backgrounds; moreover, because people are flexible and their posture angles vary, it often happens that a detected head has no corresponding body box, or only one of the detection boxes exists. Because such approaches ignore the structured information of a person, namely the correlation among the face, head and body, the detection effect is degraded to some extent.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, it is an object of the present invention to provide a target detection method, system, computer device and machine-readable medium that solve the above technical problems in the prior art.
To achieve the above and other related objects, the present invention provides a target detection method, comprising the steps of:
extracting a plurality of target features from the target image by using a neural network, and associating a plurality of targets based on the extracted plurality of target features to form a structured target; wherein the plurality of targets belong to the same object;
and predicting the classification confidence of the structured target and the regression value of the structured target, and displaying the structured target in the target image according to the prediction result.
Optionally, the displaying the structured target in the target image according to the prediction result includes:
extracting a regression position of the structured target from the prediction result;
mapping the regression location of the structured target into the target image, and locating and displaying the structured target in the target image.
Optionally, the process of predicting the regression value of the structured target comprises: and combining a plurality of target features extracted from the target image with anchor frame information or anchor point information which is associated in advance in the target image, and predicting a regression value of the structured target according to a combination result.
Optionally, before extracting a plurality of target features from the target image by using the neural network, normalizing the target image is further included.
Optionally, the method further includes adding an FPN (Feature Pyramid Network) structure to the network structure of the neural network, and extracting the plurality of target features from the target image by using the neural network with the added FPN structure.
Optionally, the plurality of objects includes at least a human body, a human head, and a human face.
The invention also provides a target detection system, comprising:
the characteristic extraction module is used for extracting a plurality of target characteristics from the target image by utilizing a neural network;
the association module is used for associating a plurality of targets according to the extracted target features to form a structured target; wherein the plurality of targets belong to the same object;
a structured prediction module for predicting a classification confidence of the structured target and a regression value of the structured target;
and the display module is used for displaying the structured target in the target image according to the prediction result.
Optionally, the displaying module displays the structured target in the target image according to the prediction result, including:
extracting a regression position of the structured target from the prediction result;
mapping the regression location of the structured target into the target image, and locating and displaying the structured target in the target image.
Optionally, the process of predicting the regression value of the structured target by the structured prediction module comprises: combining the plurality of target features extracted from the target image with anchor frame information or anchor point information associated in advance in the target image, and predicting the regression value of the structured target according to the combination result.
Optionally, an FPN (Feature Pyramid Network) structure is added to the network structure of the neural network, and the plurality of target features are extracted from the target image by the neural network with the added FPN structure.
The present invention also provides a computer apparatus comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform a method as in any one of the above.
The invention also provides one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the method as described in any one of the above.
As described above, the present invention provides a target detection method, system, computer device and machine-readable medium with the following advantages. A neural network extracts a plurality of target features from a target image; a plurality of targets are associated based on the extracted features to form a structured target; the classification confidence and regression values of the structured target are predicted, and the structured target is displayed in the target image according to the prediction result. The multiple targets belong to the same object and comprise the body, head and face targets of that object. In the prior art, when each target of a person is detected independently, the correlation among the body, face and head of the same person is ignored, so the structured information of the person is lost. To address this, the invention designs a structured person-detection scheme: a detection neural network is built with deep learning; multiple targets are integrated and associated by improving the target prediction scheme; individual target-label prediction is changed into holistic structured prediction; and anchor (anchor point or anchor frame) association clustering is performed on the combined labels of the multiple targets to build a structured model. Because the structured model incorporates the structural correlation of person targets, a single body target is associated with its corresponding face and head targets, which reduces both missed detections of the several targets of one person and false detections of unrelated targets.
In addition, before a person is tracked and recognized, the body, face and head targets of that person can first be associated, which improves association accuracy and therefore the subsequent tracking and recognition. The embodiment directly merges multiple targets such as body, face and head detection into one structured target for detection, so the correlation among the body, head and face targets of the same person is preserved. This not only eliminates false detections but also simplifies the association operations of the subsequent tracking and recognition stages: for a person under real-time monitoring, once the person's identity is determined, the body, head, face and other information is obtained together, without judging again whether the face and head information belongs to the same person. Person-target association is thereby strengthened, association errors in the tracking and recognition stage are reduced, and the tracking and recognition effect is improved.
Drawings
Fig. 1 is a schematic flowchart of a target detection method according to an embodiment;
Fig. 2 is a schematic flowchart of a target detection method according to another embodiment;
Fig. 3 is a schematic hardware structure diagram of a target detection system according to an embodiment;
Fig. 4 is a diagram of a structured prediction module according to an embodiment;
Fig. 5 is a schematic hardware structure diagram of a terminal device according to an embodiment;
Fig. 6 is a schematic hardware structure diagram of a terminal device according to another embodiment.
Description of the element reference numerals
M10 feature extraction module
M20 association module
M30 structured prediction module
M40 display module
1100 input device
1101 first processor
1102 output device
1103 first memory
1104 communication bus
1200 processing component
1201 second processor
1202 second memory
1203 communication component
1204 power supply component
1205 multimedia component
1206 audio component
1207 input/output interface
1208 sensor component
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the invention; they show only the components related to the invention rather than the actual number, shape and size of components in implementation. In practice, the type, quantity and proportion of components may change freely, and the layout may be more complicated.
Referring to fig. 1, the present invention provides a target detection method, including the following steps:
s100, extracting a plurality of target features from a target image by using a neural network; the method and the device for extracting the target features from the target image can use one neural network to extract a plurality of target features from the target image, and can also use a plurality of neural networks to extract a plurality of target features from the target image; the target image may be a single-frame image or a multi-frame image.
S200, associating a plurality of targets based on the extracted target features to form a structured target; the multiple targets belong to the same object, and the multiple targets at least comprise a human body target, a human head target and a human face target.
S300, predicting the classification confidence of the structured target and the regression value of the structured target, and displaying the structured target in the target image according to the prediction result.
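Step S200 above associates the separate body, head and face detections of one person into a structured target. As an illustrative sketch only (the patent does not specify the association rule), the following snippet groups boxes with a simple center-containment heuristic; the helper names `contains_center` and `associate` are hypothetical:

```python
def contains_center(body, part):
    """True when the center of `part` (x, y, w, h) lies inside `body`."""
    cx, cy = part[0] + part[2] / 2.0, part[1] + part[3] / 2.0
    return (body[0] <= cx <= body[0] + body[2]
            and body[1] <= cy <= body[1] + body[3])

def associate(bodies, heads, faces):
    """Group body/head/face boxes of the same person into structured targets.

    A head or face box is attached to the first body box that contains its
    center; unmatched slots stay None. Returns (body, head, face) tuples.
    """
    structured = []
    for body in bodies:
        head = next((h for h in heads if contains_center(body, h)), None)
        face = next((f for f in faces if contains_center(body, f)), None)
        structured.append((body, head, face))
    return structured
```

In a trained network this grouping would instead emerge from the joint prediction head, but the heuristic shows the intended body-head-face binding.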
Aiming at the above problems, the embodiment of the application designs a structured person-detection scheme: a detection neural network is built with deep learning; multiple targets are integrated and associated by improving the target prediction scheme; prediction of individual target labels is changed into holistic structured prediction; and anchor (anchor point or anchor frame) association clustering is performed on the combined labels of the multiple targets to build a structured model. Because the structured model incorporates the structural correlation of person targets, a single body target is associated with its corresponding face and head targets, which reduces missed detections of the several targets of one person as well as false detections of unrelated targets. In addition, before a person is tracked and recognized, the body, face and head targets of that person can first be associated, which improves association accuracy and therefore the subsequent tracking and recognition.
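The anchor association clustering mentioned above clusters the combined labels of the multiple targets rather than individual boxes. A minimal numpy sketch of that idea, assuming the joint label is represented as the six concatenated width/height values of the body, head and face boxes (the patent does not fix the representation, and `kmeans_anchors` is a hypothetical name):

```python
import numpy as np

def kmeans_anchors(sizes, k, iters=20, seed=0):
    """Naive k-means over structured box sizes to derive associated anchors.

    sizes: (N, 6) array of (w_body, h_body, w_head, h_head, w_face, h_face)
    for each annotated person; returns k cluster centers.
    """
    sizes = np.asarray(sizes, dtype=float)
    rng = np.random.default_rng(seed)
    centers = sizes[rng.choice(len(sizes), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign every sample to its nearest center (Euclidean distance)
        dist = np.linalg.norm(sizes[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep empty clusters unchanged
                centers[j] = sizes[labels == j].mean(axis=0)
    return centers
```

Each resulting center is a joint body/head/face size prior, so one anchor slot always carries all three sub-boxes together.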
The embodiment of the application directly merges multiple targets such as body, face and head detection into one structured target for detection, so the correlation among the body, head and face targets of the same person is preserved. This not only eliminates false detections but also simplifies the association operations of the subsequent tracking and recognition stages: for a person under real-time monitoring, once the person's identity is determined, the body, head, face and other information is obtained together, without judging again whether the face and head information belongs to the same person. Person-target association is thereby strengthened, association errors in the tracking and recognition stage are reduced, and the tracking and recognition effect is improved. As an example, take single-person detection: body, face and head detection are different target tasks and can be performed separately, but the three targets belong to, and describe, the same person. The body, head and face of one person can therefore be detected and/or predicted as a single structured target, which is exactly what this embodiment does.
In accordance with the above, in an exemplary embodiment, the process of displaying the structured target in the target image according to the prediction result includes: and extracting regression positions of the multiple targets according to the prediction result, mapping the regression positions of the multiple targets to the target image, and positioning and displaying the structured target in the target image. As an example, for example, regression positions of the human body target, the human head target, and the human face target are extracted according to the prediction result, and then the regression positions of the human body target, the human head target, and the human face target are mapped to the position of the original input image (i.e., the target image), so as to obtain the actual positions of the human body, the human head, and the human face in the original input image, and complete the positioning and displaying of the structured target on the original input image.
According to the above description, the process of predicting the regression value of the structured target comprises: after the structured target is formed, combining the plurality of target features extracted from the target image with the associated anchor frame information or anchor point information in the target image, and then predicting the regression value of the structured target according to the combination result. As an example, for a person A in a single-frame target image, the head, face and body features of person A are first extracted from the image, and the head, face and body targets of person A are associated according to these features to form a structured target. The head frame, face frame and body frame of person A in that frame are then associated to obtain the associated anchor corresponding to person A. Finally, the extracted head, face and body features are combined with the associated anchors in the image, and the regression value of the structured target is predicted from the combination result.
According to the above description, before extracting a plurality of target features from a target image with the neural network, the method further includes normalizing the target image to eliminate its average characteristics. As an example, the target image is acquired by an image acquisition device and normalized, i.e. the mean is subtracted and the result is divided by the standard deviation, so that the average characteristics of the image are removed while its distinguishing characteristics are retained, making the input to the feature extraction module more representative.
According to the above description, the neural network that extracts the target features may use a classical network structure or a custom fully convolutional network as the basic feature extraction layer, which then performs feature extraction on the acquired target image. As an example, the basic feature extraction network may be a VGG network, a network of the ResNet series, or a custom fully convolutional network. The embodiment of the application may also add an FPN (Feature Pyramid Network) structure on top of the neural network and extract the plurality of target features with the FPN-augmented network.
In one embodiment, the method provides a structured person-detection procedure, as shown in fig. 2, comprising:
step S101, image preprocessing. Firstly, a target image is obtained through an image acquisition device, normalization, namely, mean value reduction and variance removal, is carried out on the obtained target image, the average characteristic of the target image is eliminated, the difference characteristic of the target image is reserved, and the input entering an extraction feature module is more representative.
Step S102, image feature extraction. A classical network structure or a custom fully convolutional network is selected as the basic feature extraction layer to extract features from the input target image. The basic feature extraction network may be a VGG network, a network of the ResNet series, or a custom network, and an FPN (Feature Pyramid Network) structure is added on top of the basic network to enhance the composition of the target features. This embodiment extracts image features with a basic fully convolutional framework and retains the depth information of the target image.
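A minimal numpy illustration of the FPN top-down merge mentioned above, with the lateral 1x1 convolutions omitted and channel counts assumed equal across levels (a simplification of the real structure, for intuition only):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x spatial upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(features):
    """Minimal top-down FPN merge.

    `features` is ordered shallow -> deep, each (C, H, W) with H and W
    halving at every level; returns merged maps in the same order.
    """
    merged = [features[-1]]                        # start from deepest map
    for f in reversed(features[:-1]):
        merged.append(f + upsample2x(merged[-1]))  # add upsampled deeper map
    return merged[::-1]
```

The merged shallow maps then carry both fine spatial detail and deep semantic information, which helps detect the small head and face boxes alongside the larger body box.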
Step S103, structured image prediction. The classification confidence of the structured target and the regression value of the structured target are obtained from the extracted features. The regression value of the structured target is generated jointly with the associated anchor information (anchor point or anchor frame information): for a person A in an image, after the structured target of person A is formed, the associated anchor frame or anchor point information corresponding to person A in the image is obtained, the target features extracted from the image are combined with that associated anchor information, and the regression value of the structured target is predicted from the combination result. Compared with the anchor of an individual target, the associated anchor in this embodiment targets a structured multi-target label instead of an individual target-frame label. As an example, the target-frame label of a single body, head or face target is (x, y, w, h), while the multi-target label of a structured target concatenates the three boxes, e.g. (x_b, y_b, w_b, h_b, x_h, y_h, w_h, h_h, x_f, y_f, w_f, h_f). A reasonable threshold is then set, and the regression positions of the body, head and face are extracted at one time according to the threshold and the predicted regression values; that is, predicting the structured target is equivalent to predicting three classes of targets (body, head and face), so the position information of each class is obtained at once, with the result again in the concatenated (x, y, w, h) form for the three boxes.
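The structured regression can be decoded roughly as below. The center-offset/log-size parameterization is an assumption borrowed from common anchor-based detectors; the patent only states that the features are combined with the associated anchor and a confidence threshold is applied:

```python
import numpy as np

def decode_structured(anchors, deltas, scores, thresh=0.5):
    """Decode structured (body, head, face) boxes from associated anchors.

    anchors: (N, 12) three (x, y, w, h) anchor boxes per person slot
    deltas:  (N, 12) predicted offsets (dx, dy, dw, dh) per sub-box, using
             a standard center-offset / log-size encoding (an assumption)
    scores:  (N,)    structured-target classification confidence
    """
    keep = scores >= thresh                       # confidence threshold
    a = anchors[keep].reshape(-1, 3, 4)
    d = deltas[keep].reshape(-1, 3, 4)
    cx = a[..., 0] + a[..., 2] / 2 + d[..., 0] * a[..., 2]
    cy = a[..., 1] + a[..., 3] / 2 + d[..., 1] * a[..., 3]
    w = a[..., 2] * np.exp(d[..., 2])
    h = a[..., 3] * np.exp(d[..., 3])
    boxes = np.stack([cx - w / 2, cy - h / 2, w, h], axis=-1)
    return boxes.reshape(-1, 12), scores[keep]
```

One pass over the 12-value vector thus yields the body, head and face positions together, which is the "extracted at one time" behavior described above.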
Therefore, all regression positions of the output structured target can be directly predicted by combining the associated anchors, and the person structured prediction modeling is facilitated, so that the detection of each part can be improved, and the false detection rate can be reduced.
Step S104, target display. The predicted target position information is mapped back to the original input image (i.e. the target image) to obtain the actual positions of the structured target, completing the positioning and display of the structured target on the original input image.
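Mapping predicted positions back to the original image amounts to undoing the preprocessing resize; a sketch assuming a plain resize with no padding (the patent does not detail the mapping):

```python
import numpy as np

def map_to_original(boxes, net_size, orig_size):
    """Map (x, y, w, h) boxes from network-input coordinates back to the
    original image, assuming a plain resize during preprocessing.

    net_size / orig_size: (width, height) of the network input and the
    original image respectively.
    """
    sx = orig_size[0] / net_size[0]   # width scale factor
    sy = orig_size[1] / net_size[1]   # height scale factor
    boxes = np.asarray(boxes, dtype=float).copy()
    boxes[..., 0::2] *= sx            # scale x and w
    boxes[..., 1::2] *= sy            # scale y and h
    return boxes
```

The same call works for the concatenated 12-value structured box, since the x/w and y/h components alternate throughout the vector.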
As shown in fig. 3 and 4, the present invention also provides a target detection system, including:
a feature extraction module M10, configured to extract a plurality of target features from the target image by using a neural network; one neural network may be used to extract the plurality of target features, or several neural networks may be used, and the target image may be a single-frame or multi-frame image.
The association module M20 is used for associating a plurality of targets according to the extracted target features to form a structured target; wherein the plurality of targets belong to the same object;
a structural prediction module M30, configured to predict the classification confidence of the structural target and the regression value of the structural target;
a display module M40, configured to display the structured target in the target image according to the prediction result.
Aiming at the existing problems, the embodiment of the application designs a personnel structured detection mode, a detection neural network is constructed by adopting deep learning, a plurality of targets are integrated and associated by improving the prediction mode of the targets, the prediction of an independent target label is changed into integral structured prediction, the anchor (namely anchor point or anchor frame) associated clustering is carried out by combining the labels of the targets, and a structured model is established. Due to the fact that the structural model is combined with the structural correlation of the human target, the single human target can be correlated with the human face target and the human head target which are correlated with the human body target, missing detection of a plurality of targets in the same person is reduced, and false detection of irrelevant targets is also reduced. In addition, before tracking and identifying a certain person, the human body target, the human face target and the human head target of the person can be associated firstly, so that the target association accuracy is improved, and the subsequent tracking and identifying effects can be improved. 
The embodiments of the present application directly integrate multiple targets, such as body detection, face detection, and head detection, into one structured target for detection, so that the correlation among the body, head, and face targets of the same person is preserved. This not only eliminates false detections, but the binding of the structured target also simplifies the association operations on the targets in the subsequent tracking and identification stages: for a person under real-time monitoring, the system only needs to determine that person's identity once to obtain the body, head, face, and other information of that person, without re-checking whether the face and head information correspond to the same person. The association of person targets is thereby strengthened, target-association errors in the tracking and identification stages are reduced, and the tracking and identification results are improved. As an example, take structured detection of a single person: detection of the body, face, and head are different target tasks and could be performed separately, but the three targets belong to the same person and carry information about the same person. The body, head, and face of the same person can therefore be detected and/or predicted as one structured target, and the embodiments of the present application detect them as one structured target.
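As a minimal sketch of associating the body, head, and face targets of one person into a single structured target, one simple heuristic, an assumption for illustration and not the patent's association method, is to attach each head and face box to the body box that geometrically contains it:

```python
def box_contains(outer, inner):
    """True if box `inner` (x, y, w, h) lies entirely inside box `outer`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

def associate(bodies, heads, faces):
    """Hypothetical containment heuristic: group each head/face box with
    the body box that contains it, forming one structured target per body."""
    structured = []
    for body in bodies:
        head = next((h for h in heads if box_contains(body, h)), None)
        face = next((f for f in faces if box_contains(body, f)), None)
        structured.append({"body": body, "head": head, "face": face})
    return structured
```

A production system would likely use learned features rather than pure geometry, as the surrounding text describes, but the output shape is the same: one record binding all three parts of the same person.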
In accordance with the above, in an exemplary embodiment, displaying the structured target in the target image according to the prediction result includes: extracting the regression positions of the multiple targets from the prediction result, mapping these regression positions into the target image, and locating and displaying the structured target in the target image. As an example, the regression positions of the body target, head target, and face target are extracted from the prediction result and then mapped to positions in the original input image (i.e., the target image), yielding the actual positions of the body, head, and face in the original input image and completing the locating and display of the structured target on it.
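The mapping step can be sketched as a simple rescaling from the network's input coordinates back to the original image; the function name and the (x, y, w, h) box convention are illustrative assumptions:

```python
def map_to_original(boxes, net_size, orig_size):
    """Scale (x, y, w, h) boxes from network-input coordinates back to the
    original image, so the structured target can be located and drawn on it.
    net_size and orig_size are (width, height) pairs."""
    sx = orig_size[0] / net_size[0]
    sy = orig_size[1] / net_size[1]
    return [(x * sx, y * sy, w * sx, h * sy) for (x, y, w, h) in boxes]
```

For a network fed 100x100 crops of a 200x400 frame, every predicted box is simply stretched by the per-axis ratio before display.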
According to the above description, predicting the regression value of the structured target includes: after the structured target is formed, combining the plurality of target features extracted from the target image with the associated anchors in the target image, i.e., with the associated anchor-frame or anchor-point information in the target image, and then predicting the regression value of the structured target from the combination result. As an example, for a person A in a single-frame target image, the head features, face features, and body features of person A are first extracted from the target image, and the head target, face target, and body target of person A are associated according to these extracted features to form a structured target. The head frame, face frame, and body frame of person A in that frame are then associated to obtain the associated anchor corresponding to person A. Finally, the head, face, and body features extracted from the target image are combined with the associated anchors in the target image, and the regression value of the structured target is predicted from the combination result.
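One common way that anchor-based detectors turn regression outputs into boxes, shown here as a hedged sketch of the idea rather than the patent's exact parameterisation, is to decode the network's (dx, dy, dw, dh) offsets against each associated anchor:

```python
import math

def decode_box(anchor, offsets):
    """Decode one (dx, dy, dw, dh) regression output against its anchor
    (cx, cy, w, h), using the standard anchor-based parameterisation."""
    acx, acy, aw, ah = anchor
    dx, dy, dw, dh = offsets
    cx = acx + dx * aw          # shift centre by a fraction of anchor size
    cy = acy + dy * ah
    w = aw * math.exp(dw)       # scale width/height multiplicatively
    h = ah * math.exp(dh)
    return (cx, cy, w, h)

def decode_structured(assoc_anchors, offsets_per_part):
    """Decode the body/head/face boxes of one structured target against the
    person's associated anchors in a single pass."""
    return {part: decode_box(assoc_anchors[part], off)
            for part, off in offsets_per_part.items()}
```

Zero offsets reproduce the anchor itself, which makes the role of the associated anchor explicit: the regression value refines the person's pre-associated frames rather than free-floating boxes.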
According to the above description, before the plurality of target features are extracted from the target image with a neural network, the method further includes normalizing the target image to eliminate its average characteristics. As an example, the target image is acquired by an image acquisition device and normalized, i.e., the mean is subtracted and the variance is divided out, so that the average characteristics of the target image are eliminated, its difference characteristics are retained, and the input to the feature extraction module is more representative.
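The mean-subtraction and variance-removal step can be written in a few lines; per-channel statistics are an assumption here (per-image or dataset-wide statistics are equally common choices):

```python
import numpy as np

def normalize(image):
    """Subtract the per-channel mean and divide by the per-channel std,
    removing the image's average characteristics while keeping its
    difference characteristics. `image` is an (H, W, C) float array."""
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True)
    return (image - mean) / (std + 1e-8)  # epsilon guards against flat channels
```

After this step every channel has approximately zero mean and unit variance, which is the "more representative" input the text refers to.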
According to the above description, the neural network used to extract the plurality of target features from the target image may use a classical network structure or a custom fully convolutional network as the basic feature extraction layer, which then performs feature extraction on the acquired target image. As an example, the basic feature extraction network may be a VGG network, a ResNet-series network, or the like, or a custom fully convolutional network. The embodiments of the present application may also add an FPN (Feature Pyramid Network) structure on top of the neural network and extract the plurality of target features with the FPN-augmented network.
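The core idea of an FPN, fusing a coarse backbone level into the next finer one via a top-down pathway, can be illustrated with a toy numpy version (a sketch of the mechanism only; real FPNs use learned lateral 1x1 convolutions, which are omitted here):

```python
import numpy as np

def fpn_merge(c_levels):
    """Toy FPN top-down pathway: starting from the coarsest backbone level,
    2x nearest-neighbour upsample each merged level and add it to the next
    finer backbone level. `c_levels` is ordered fine -> coarse."""
    p_levels = [c_levels[-1]]                       # coarsest level passes through
    for c in reversed(c_levels[:-1]):
        up = np.kron(p_levels[0], np.ones((2, 2)))  # 2x nearest upsampling
        p_levels.insert(0, c + up[: c.shape[0], : c.shape[1]])
    return p_levels
```

The finer output levels thus carry both high-resolution detail and coarse semantic context, which is why the FPN structure enriches the extracted target features.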
In one embodiment, the present system provides a person-structured detection method, as shown in fig. 2, comprising:
Step S101, image preprocessing. A target image is first acquired by an image acquisition device and normalized, i.e., the mean is subtracted and the variance is divided out, so that the average characteristics of the target image are eliminated, its difference characteristics are retained, and the input to the feature extraction module is more representative.
Step S102, image feature extraction. A classical network structure or a custom fully convolutional network is selected as the basic feature extraction layer to perform feature extraction on the input target image. The basic feature extraction network may be a VGG (Visual Geometry Group) network, a ResNet-series network, or the like, or a custom network, and an FPN (Feature Pyramid Network) structure is added on top of the basic network structure to enrich the composition of the target features. The embodiments of the present application use this basic fully convolutional framework to extract image features while preserving the depth information of the target image.
Step S103, structured image prediction. The classification confidence of the structured target and the regression value of the structured target are obtained from the extracted features. The regression value of the structured target is generated jointly with the associated anchor information (i.e., anchor-point or anchor-frame information): for a person A in an image, after the structured target of person A is formed, the associated anchor-frame or anchor-point information corresponding to person A in the image is obtained, the plurality of target features extracted from the image is combined with that information, and the regression value of the structured target is predicted from the combination result. In the embodiments of the present application, the associated anchor differs from the anchor of an individual target in that its target becomes a structured multi-target label rather than an individual target-frame label. As an example, in the embodiments of the present application, the target-frame label of a single body, head, or face target is (x, y, w, h), while the multi-target label of a structured target is (x1, y1, w1, h1, x2, y2, w2, h2, x3, y3, w3, h3). A reasonable threshold is then set, and the regression positions of the body, head, and face are extracted in one pass from the set threshold and the predicted regression values; that is, predicting the structured target is equivalent to predicting three types of targets, namely body, head, and face, so the position information of all three types of targets is obtained at once as (x1, y1, w1, h1, x2, y2, w2, h2, x3, y3, w3, h3).
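Splitting the 12-value structured label into its three boxes and applying the confidence threshold can be sketched as follows; the part ordering (body, head, face) within the label is an assumption for illustration:

```python
def split_structured_label(label):
    """Split a 12-value structured label
    (x1, y1, w1, h1, x2, y2, w2, h2, x3, y3, w3, h3)
    into its three component boxes in one pass.
    The body/head/face ordering is a hypothetical convention."""
    assert len(label) == 12
    parts = ("body", "head", "face")
    return {p: tuple(label[4 * i: 4 * i + 4]) for i, p in enumerate(parts)}

def filter_structured(predictions, threshold=0.5):
    """Keep structured targets whose classification confidence clears the
    threshold, yielding all three regression positions at once.
    `predictions` is a list of (confidence, 12-value label) pairs."""
    return [split_structured_label(label)
            for conf, label in predictions if conf >= threshold]
```

One threshold check thus accepts or rejects a whole person, in contrast to per-part detectors where the body, head, and face would each be thresholded independently.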
Therefore, all regression positions of the output structured target can be predicted directly by combining the associated anchors, which facilitates the modeling of person-structured prediction, improves the detection of each part, and reduces the false-detection rate.
Step S104, target display. The predicted target position information is mapped to positions in the original input image (i.e., the target image) to obtain the actual position of the structured target, completing the locating and display of the structured target on the original input image.
The system extracts a plurality of target features from a target image using a neural network; associates a plurality of targets based on the extracted target features to form a structured target; predicts the classification confidence and the regression value of the structured target; and displays the structured target in the target image according to the prediction result. The plurality of targets belong to the same object and comprise the body target, head target, and face target of that object. In the prior art, when person targets are detected independently, the correlation among the body, face, and head of the same person may be ignored, so the structural information of the person is lost. To address this, the system designs a person-structured detection scheme: a detection neural network is built with deep learning, multiple targets are integrated and associated by improving the target prediction scheme, single-target-label prediction is replaced by holistic structured prediction, and anchor (i.e., anchor-point or anchor-frame) association clustering is performed on the combined target labels to establish a structured model. Because the structured model incorporates the structural correlation of person targets, a single person target can be associated with the face target and head target that belong to its body target, which reduces missed detections among the multiple targets of the same person as well as false detections of irrelevant targets.
In addition, before a person is tracked and identified, the body target, face target, and head target of that person can first be associated, which improves target-association accuracy and thus the subsequent tracking and identification results. The embodiments of the present application directly integrate multiple targets, such as body detection, face detection, and head detection, into one structured target for detection, so that the correlation among the body, head, and face targets of the same person is preserved. This not only eliminates false detections, but the binding of the structured target also simplifies the association operations on the targets in the subsequent tracking and identification stages: for a person under real-time monitoring, the system only needs to determine that person's identity once to obtain the body, head, face, and other information of that person, without re-checking whether the face and head information correspond to the same person. The association of person targets is thereby strengthened, target-association errors in the tracking and identification stages are reduced, and the tracking and identification results are improved.
An embodiment of the present application further provides a computer device, which may include: one or more processors; and one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the device to perform the method of fig. 1. In practical applications, the device may serve as a terminal device or as a server. Examples of the terminal device include: a smart phone, a tablet computer, an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, a vehicle-mounted computer, a desktop computer, a set-top box, a smart television, a wearable device, and the like.
The present embodiment also provides a non-volatile readable storage medium, in which one or more modules (programs) are stored; when the one or more modules are applied to a device, the device may be caused to execute the instructions of the steps included in the method of fig. 1 according to the present embodiment.
Fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application. As shown, the terminal device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.
Alternatively, the first processor 1101 may be, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the processor 1101 is coupled to the input device 1100 and the output device 1102 through a wired or wireless connection.
Optionally, the input device 1100 may include a variety of input devices, such as at least one of a user-oriented user interface, a device-oriented device interface, a software programmable interface, a camera, and a sensor. Optionally, the device interface facing the device may be a wired interface for data transmission between devices, or may be a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-facing user interface may be, for example, a user-facing control key, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen with a touch sensing function, a touch pad, etc.) for receiving user touch input; optionally, the programmable interface of the software may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip; the output devices 1102 may include output devices such as a display, audio, and the like.
In this embodiment, the processor of the terminal device includes functions for executing each module of the apparatus in the above embodiments; for specific functions and technical effects, reference may be made to the above embodiments, which are not repeated here.
Fig. 6 is a schematic hardware structure diagram of a terminal device according to another embodiment of the present application. FIG. 6 is a specific embodiment of the implementation of FIG. 5. As shown, the terminal device of the present embodiment may include a second processor 1201 and a second memory 1202.
The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the method described in fig. 1 in the above embodiment.
The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
Optionally, a second processor 1201 is provided in the processing assembly 1200. The terminal device may further include: communication components 1203, power components 1204, multimedia components 1205, audio components 1206, input/output interfaces 1207, and/or sensor components 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.
The processing component 1200 generally controls the overall operation of the terminal device. The processing assembly 1200 may include one or more second processors 1201 to execute instructions to perform all or part of the steps of the method illustrated in fig. 1 described above. Further, the processing component 1200 can include one or more modules that facilitate interaction between the processing component 1200 and other components. For example, the processing component 1200 can include a multimedia module to facilitate interaction between the multimedia component 1205 and the processing component 1200.
The power supply component 1204 provides power to the various components of the terminal device. The power components 1204 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal device.
The multimedia components 1205 include a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The audio component 1206 is configured to output and/or input speech signals. For example, the audio component 1206 includes a Microphone (MIC) configured to receive external voice signals when the terminal device is in an operational mode, such as a voice recognition mode. The received speech signal may further be stored in the second memory 1202 or transmitted via the communication component 1203. In some embodiments, audio component 1206 also includes a speaker for outputting voice signals.
The input/output interface 1207 provides an interface between the processing component 1200 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.
The sensor component 1208 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor component 1208 may detect an open/closed state of the terminal device, relative positioning of the components, presence or absence of user contact with the terminal device. The sensor assembly 1208 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor assembly 1208 may also include a camera or the like.
The communication component 1203 is configured to facilitate communications between the terminal device and other devices in a wired or wireless manner. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot therein for inserting a SIM card therein, so that the terminal device may log onto a GPRS network to establish communication with the server via the internet.
As can be seen from the above, the communication component 1203, the audio component 1206, the input/output interface 1207 and the sensor component 1208 in the embodiment of fig. 6 may be implemented as the input device in the embodiment of fig. 5.
The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present invention.

Claims (11)

1. A method of target detection, comprising the steps of:
extracting a plurality of target features from the target image by using a neural network, and associating a plurality of targets based on the extracted plurality of target features to form a structured target; wherein the plurality of targets belong to the same object;
and predicting the classification confidence of the structured target and the regression value of the structured target, and displaying the structured target in the target image according to the prediction result.
2. The object detection method of claim 1, wherein displaying the structured object in the object image according to the prediction result comprises:
extracting a regression position of the structured target from the prediction result;
mapping the regression location of the structured target into the target image, and locating and displaying the structured target in the target image.
3. The method of claim 1 or 2, wherein predicting the regression value of the structured target comprises: and combining a plurality of target features extracted from the target image with anchor frame information or anchor point information which is associated in advance in the target image, and predicting a regression value of the structured target according to a combination result.
4. The method of claim 1, further comprising normalizing the target image prior to extracting the plurality of target features from the target image using the neural network.
5. The object detection method according to claim 1, further comprising adding an FPN structure to the network structure of the neural network, and extracting a plurality of object features from the object image using the neural network to which the FPN structure is added.
6. The object detection method according to any one of claims 1 to 5, wherein the plurality of objects includes at least a human body, a human head, and a human face.
7. An object detection system, comprising:
the characteristic extraction module is used for extracting a plurality of target characteristics from the target image by utilizing a neural network;
the association module is used for associating a plurality of targets according to the extracted target features to form a structured target; wherein the plurality of targets belong to the same object;
a structured prediction module for predicting a classification confidence of the structured target and a regression value of the structured target;
and the display module is used for displaying the structured target in the target image according to the prediction result.
8. The object detection system of claim 7, wherein the display module displays the structured object in the object image according to the prediction result, comprising:
extracting a regression position of the structured target from the prediction result;
mapping the regression location of the structured target into the target image, and locating and displaying the structured target in the target image.
9. The object detection system of claim 7 or 8, wherein the process of the structuring module predicting regression values of the structured object comprises: and combining a plurality of target features extracted from the target image with anchor frame information or anchor point information associated in advance in the target image, and predicting a regression value of the structured target according to a combination result.
10. A computer device, comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of any of claims 1-6.
11. One or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the method of any of claims 1-6.
CN202110398900.6A 2021-04-14 2021-04-14 Target detection method, system, computer equipment and machine readable medium Pending CN113076955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110398900.6A CN113076955A (en) 2021-04-14 2021-04-14 Target detection method, system, computer equipment and machine readable medium


Publications (1)

Publication Number Publication Date
CN113076955A true CN113076955A (en) 2021-07-06

Family

ID=76618703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110398900.6A Pending CN113076955A (en) 2021-04-14 2021-04-14 Target detection method, system, computer equipment and machine readable medium

Country Status (1)

Country Link
CN (1) CN113076955A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160217319A1 (en) * 2012-10-01 2016-07-28 The Regents Of The University Of California Unified face representation for individual recognition in surveillance videos and vehicle logo super-resolution system
CN108875577A (en) * 2018-05-11 2018-11-23 深圳市易成自动驾驶技术有限公司 Object detection method, device and computer readable storage medium
CN109886208A (en) * 2019-02-25 2019-06-14 北京达佳互联信息技术有限公司 Method, apparatus, computer equipment and the storage medium of object detection
CN111444850A (en) * 2020-03-27 2020-07-24 北京爱笔科技有限公司 Picture detection method and related device
CN111882582A (en) * 2020-07-24 2020-11-03 广州云从博衍智能科技有限公司 Image tracking correlation method, system, device and medium
CN112200187A (en) * 2020-10-16 2021-01-08 广州云从凯风科技有限公司 Target detection method, device, machine readable medium and equipment
CN112560705A (en) * 2020-12-17 2021-03-26 北京捷通华声科技股份有限公司 Face detection method and device and electronic equipment


Similar Documents

Publication Publication Date Title
CN112200187A (en) Target detection method, device, machine readable medium and equipment
CN111898495B (en) Dynamic threshold management method, system, device and medium
CN111723746A (en) Scene recognition model generation method, system, platform, device and medium
CN111310725A (en) Object identification method, system, machine readable medium and device
CN111340848A (en) Object tracking method, system, device and medium for target area
CN107562356B (en) Fingerprint identification positioning method and device, storage medium and electronic equipment
CN114581998A (en) Deployment and control method, system, equipment and medium based on target object association feature fusion
CN112529939A (en) Target track matching method and device, machine readable medium and equipment
CN111291638A (en) Object comparison method, system, equipment and medium
CN112989210A (en) Insurance recommendation method, system, equipment and medium based on health portrait
CN108960213A (en) Method for tracking target, device, storage medium and terminal
CN111260697A (en) Target object identification method, system, device and medium
CN115623336B (en) Image tracking method and device for hundred million-level camera equipment
CN113076955A (en) Target detection method, system, computer equipment and machine readable medium
CN115409869A (en) Snow field trajectory analysis method and device based on MAC tracking
CN111626369B (en) Face recognition algorithm effect evaluation method and device, machine readable medium and equipment
CN112417197B (en) Sorting method, sorting device, machine readable medium and equipment
CN112347982A (en) Video-based unsupervised difficult case data mining method, device, medium and equipment
CN111639705B (en) Batch picture marking method, system, machine readable medium and equipment
CN112596846A (en) Method and device for determining interface display content, terminal equipment and storage medium
CN112150685A (en) Vehicle management method, system, machine readable medium and equipment
CN112580472A (en) Rapid and lightweight face recognition method and device, machine readable medium and equipment
CN111753852A (en) Tea leaf identification method, recommendation method, tea leaf identification device, equipment and medium
CN111079662A (en) Figure identification method and device, machine readable medium and equipment
CN116468883B (en) High-precision image data volume fog recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination