CN112528977B - Target detection method, target detection device, electronic equipment and storage medium - Google Patents
- Publication number
- CN112528977B CN202110180875.4A
- Authority
- CN
- China
- Prior art keywords
- feature
- fusion
- picture
- detection
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Image Analysis (AREA)
Abstract
The embodiment of the invention provides a target detection method, a target detection device, electronic equipment and a storage medium. The target detection method comprises the following steps: acquiring a feature extraction network, a feature fusion network and a multi-task detector obtained by model training with a fused picture set, wherein the fused picture set is obtained by using single-task detection models to detect and mark the position, type, attribute and key points of a preset target in each picture of a preset picture set; extracting features of a picture to be detected through the feature extraction network to obtain a plurality of original feature maps of different sizes; performing feature fusion on the original feature maps through the feature fusion network to obtain a first preset number of fused feature maps of different sizes; and performing feature detection on the fused feature maps through the multi-task detector to obtain the position, type, attribute and key points of the target to be detected. The embodiment of the invention can reduce the workload of model training, realize one-stop multi-task detection and improve detection efficiency.
Description
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a target detection method, a target detection device, electronic equipment and a storage medium.
Background
In traditional target detection, a detector can only perform a single task, for example, detecting the position of a target. To perform several detection tasks at the same time, multiple detectors need to be trained and run serially one after another, so the model training workload is large and the overall detection efficiency is low.
Disclosure of Invention
The embodiment of the invention provides a target detection method, a target detection device, electronic equipment and a storage medium, which can reduce the workload of model training, realize multi-task one-stop detection and improve the detection efficiency.
In a first aspect, an embodiment of the present invention provides a target detection method, including:
acquiring a feature extraction network, a feature fusion network and a multi-task detector which are obtained by utilizing a fusion picture set to perform model training, wherein the fusion picture set is obtained by utilizing each single-task detection model to detect and mark the position, the type, the attribute and the key point of a preset target in each picture in a preset picture set;
extracting the features of the picture to be detected through the feature extraction network to obtain a plurality of original feature maps with different sizes;
performing feature fusion on the original feature maps through the feature fusion network to obtain a first preset number of fusion feature maps with different sizes;
and performing feature detection on the fusion feature map through the multitask detector to obtain the position, type, attribute and key point of the target to be detected.
In a second aspect, an embodiment of the present invention provides an object detection apparatus, including:
the system comprises an acquisition module, a feature extraction network, a feature fusion network and a multi-task detector, wherein the acquisition module is used for acquiring the feature extraction network, the feature fusion network and the multi-task detector which are obtained by utilizing a fusion picture set to carry out model training, and the fusion picture set is obtained by utilizing each single-task detection model to detect and mark the position, the type, the attribute and a key point of a preset target in each picture in the preset picture set;
the extraction module is used for extracting the features of the picture to be detected through the feature extraction network to obtain a plurality of original feature maps with different sizes;
the fusion module is used for carrying out feature fusion on the original feature maps through the feature fusion network to obtain a first preset number of fusion feature maps with different sizes;
and the detection module is used for carrying out feature detection on the fusion feature map through the multitask detector to obtain the position, the type, the attribute and the key point of the target to be detected.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the object detection method according to the embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements an object detection method according to an embodiment of the present invention.
In the embodiment of the invention, a model for multi-task detection trained on a fused picture set can be obtained, the fused picture set being obtained by using single-task detection models to detect and mark the position, type, attribute and key points of a preset target in each picture of a preset picture set; one-stop multi-task detection is realized with the trained model, which improves detection efficiency. In addition, model training in the embodiment of the invention trains the multi-task detector in one pass on the fused picture set, without separately training multiple detectors for multiple tasks, which reduces the training workload. Furthermore, during detection the multiple tasks share the feature extraction network and the feature fusion network, which improves network utilization, reduces the amount of computation and improves overall detection efficiency. Finally, target detection is performed on fused feature maps of different sizes, so targets of different sizes can be detected and detection accuracy is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic flow chart of a target detection method according to an embodiment of the present invention.
Fig. 2 is a schematic sub-flow chart of a target detection method according to an embodiment of the present invention.
Fig. 3 is a schematic diagram illustrating an effect of the target detection method according to the embodiment of the present invention.
Fig. 4 is a flowchart illustrating an obtaining method of a fused picture set according to an embodiment of the present invention.
Fig. 5 is a schematic diagram illustrating an effect of a training picture according to an embodiment of the present invention.
Fig. 6 is a schematic flowchart of a model training method according to an embodiment of the present invention.
Fig. 7 is a network diagram of a model training process according to an embodiment of the present invention.
Fig. 8 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present invention.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Fig. 1 is a schematic flowchart of an object detection method according to an embodiment of the present invention, which may be implemented by an object detection apparatus according to an embodiment of the present invention, where the apparatus may be implemented in software and/or hardware. In a specific embodiment, the apparatus may be integrated into an electronic device, and the electronic device may be a mobile phone, a Personal Computer (PC), a tablet Computer, a notebook Computer, a desktop Computer, or the like. The following embodiments will be described taking as an example the integration of the device in an electronic apparatus. Referring to fig. 1, the method may specifically include the following steps:
Step 101: acquire a feature extraction network, a feature fusion network and a multi-task detector obtained by model training with a fused picture set.
For example, a single-task detection model is a model for single-task detection; it may be an existing model selected according to the actual detection requirements, or a model trained for them. The preset target may also be set according to the actual detection requirements; for example, if a model for target detection in a speech scene is to be trained, the preset target may include a human face and a hand. Illustratively, the single-task detection models may include: a model for detecting and locating the human face, a model for detecting and locating gestures, a model for detecting face attributes and a model for detecting face key points. Face attributes are, for example, expression, gaze, gender and age; face key points are, for example, the eyes, mouth and nose.
Specifically, the preset picture set may be a set formed by a large number of pictures collected from an open platform and related to a scene to be detected, and after each picture in the preset picture set is detected and labeled by each single task detection model, each picture in the fused picture set is a picture labeled with a position, a type, an attribute and a key point of a target.
And 102, performing feature extraction on the picture to be detected through a feature extraction network to obtain a plurality of original feature maps with different sizes.
For example, the picture to be detected may be an independent picture, or may be a series of pictures taken from a video, for example, the picture to be detected may be from a video generated by real-time recording, or may be from a pre-generated video.
Specifically, the pictures to be detected can be input into the feature extraction network, so that pooling and sampling operations are performed on each picture to obtain a plurality of original feature maps of different sizes. In a specific embodiment, the feature maps extracted by the feature extraction network decrease in size layer by layer; to improve processing efficiency, a first preset number of the mid- and lower-layer feature maps can be selected as the original feature maps. The first preset number can be set according to the actual situation and may be, for example, 2 or 3.
And 103, performing feature fusion on the original feature maps through a feature fusion network to obtain a first preset number of fusion feature maps with different sizes.
Performing feature fusion on original feature maps of different sizes enhances the semantics of the feature maps and improves prediction accuracy. A specific fusion method may, for example, first unify the number of channels of each original feature map and then, starting from the lowest-level original feature map, upsample the lower-level map and perform feature addition, convolution fusion and similar operations with the adjacent higher-level map to obtain a fused feature map.
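As an illustration of the fusion step just described, the following is a minimal pure-Python sketch, assuming nearest-neighbor 2x upsampling and element-wise addition (a real implementation would run in a deep-learning framework and follow the addition with a convolution):

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2D feature map (list of lists)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out

def fuse(lower, upper):
    """Upsample the smaller (deeper) map and add it to the adjacent larger map."""
    up = upsample2x(lower)
    assert len(up) == len(upper) and len(up[0]) == len(upper[0])
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(up, upper)]

small = [[1.0, 2.0],
         [3.0, 4.0]]                     # e.g. a 2x2 map from a deeper layer
big = [[0.5] * 4 for _ in range(4)]      # the adjacent 4x4 map
fused = fuse(small, big)
```

In a framework the same idea would be an `Upsample` (or interpolation) layer followed by a tensor addition and a fusion convolution.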
And 104, performing feature detection on the fusion feature map through a multi-task detector to obtain the position, type, attribute and key point of the target to be detected.
Specifically, the method for detecting features with the multi-task detector can be as shown in fig. 2 and includes the following steps.
Step 1041: input the fused feature map into the multi-task detector to obtain prediction output data.
And 1042, decoding the prediction output data to obtain prediction frame data.
The prediction box data may include position data (which may include the center point coordinates and the width and height of the prediction box), type data (which may include a type name, such as face or hand, and a type confidence), key point data (which may include the coordinates of the respective key points) and attribute data (which may include an attribute name, such as smile, anger or embarrassment, and an attribute confidence).
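For instance, decoding the center-point/width/height representation mentioned above into corner coordinates can be sketched as follows (the function name is illustrative; the patent does not specify the exact decoding scheme):

```python
def decode_box(cx, cy, w, h):
    """Convert a (center_x, center_y, width, height) box to corner coordinates."""
    x1 = cx - w / 2.0
    y1 = cy - h / 2.0
    x2 = cx + w / 2.0
    y2 = cy + h / 2.0
    return (x1, y1, x2, y2)

box = decode_box(50.0, 40.0, 20.0, 10.0)  # -> (40.0, 35.0, 60.0, 45.0)
```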
Step 1043: screen the prediction box data to obtain target frame data.
For example, prediction box data with a confidence lower than a preset confidence threshold may be filtered out first, and a non-maximum suppression (NMS) algorithm may then be applied to the remaining prediction box data to remove boxes with a large degree of overlap, so as to finally obtain the target frame data.
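A minimal pure-Python sketch of this screening step, assuming illustrative confidence and IoU thresholds (the patent does not fix their values):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, conf_thr=0.5, iou_thr=0.45):
    """Drop low-confidence boxes, then greedily suppress overlapping ones."""
    cand = sorted((i for i, s in enumerate(scores) if s >= conf_thr),
                  key=lambda i: scores[i], reverse=True)
    keep = []
    for i in cand:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is suppressed
```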
And step 1044, marking the original characteristic diagram according to the target frame data to obtain the position, type, attribute and key point of the target to be detected.
When the picture to be detected comes from the video, the target detection method is sequentially executed for each image in the video, and dynamic and continuous detection results can be seen in the video.
For example, for a speech video, the detection results for various speech indexes can be displayed dynamically in the video through the detection method provided by the embodiment of the invention; in a real-time speech scene, the speaker can adjust his or her performance according to the detection results presented in the video so as to achieve a good speech effect. Furthermore, the speaker can be scored according to the detection results.
In a specific embodiment, as shown in fig. 3, the detection result presented by using the target detection method provided by the embodiment of the present invention may be represented by a picture in which a detection frame of a Face and a type thereof (Face), a detection frame of a gesture and a type thereof (Hand) are presented, and attributes (e.g., expression) and key points (e.g., eyes, nose, mouth) of the Face are labeled in the detection frame of the Face, and numbers in the figure represent confidence.
In the embodiment of the invention, the fused picture set can be obtained by using multiple single-task detection models to detect and label the preset picture set; a model for multi-task detection is trained on the fused picture set, one-stop multi-task detection is realized with the trained model, and detection efficiency is improved. In addition, the multi-task detector is trained in one pass on the fused picture set, without separately training multiple detectors for multiple tasks, which reduces the training workload and facilitates model conversion and deployment. Furthermore, during detection the multiple tasks share the feature extraction network and the feature fusion network, which improves network utilization, reduces the amount of computation and improves overall detection efficiency. Finally, target detection is performed on fused feature maps of different sizes, so targets of different sizes can be detected and detection accuracy is improved.
In a specific embodiment, as shown in fig. 4, the fused picture set can be obtained as follows:
Step 201: detect and mark the position of the human face in each picture in the preset picture set by using a face detection and positioning model.
For example, the face detection and positioning model may be used to detect the face in each picture in the preset picture set; after a face is detected, its position may be framed and its type labeled.
Step 202: detect and mark the position of the hand in each picture in the preset picture set by using a gesture detection and positioning model.
For example, the gesture detection and positioning model may be used to detect the hand in each picture in the preset picture set; after a hand is detected, a frame may be added at its position and its type labeled.
And 203, detecting and marking the face attribute in each picture in the preset picture set by using the face attribute detection model.
For example, the face attribute detection model may be used to detect the face attributes in each picture in the preset picture set, such as expression (e.g., smile, laugh, embarrassment), gaze (e.g., point view, virtual view, and circular view), gender, age, and the like, and mark the attributes at the corresponding positions of the pictures according to the detection results.
And 204, detecting the human face key points in each picture in the preset picture set by using a human face key point detection model.
For example, a face key point detection model may be used to detect face key points, such as eyes, mouth, nose, etc., in each picture in a preset picture set, and mark key points at corresponding positions of the pictures according to the detection result.
After the position, type, attribute and key points of the preset target in each picture of the preset picture set are detected and marked, the fused picture set is obtained. For example, fig. 5 shows one picture in the fused picture set: the labeled types include Face and Hand, the labeled attribute is the expression No-smile, and the labeled key points are the eyes.
In a specific embodiment, as shown in fig. 6, the method for obtaining the feature extraction network, the feature fusion network, and the multi-task detector by using the fusion image set to perform model training may be as follows:
Step 301: determine the real frame (ground truth) of each preset target in each picture of the fused picture set.
In a specific embodiment, each preset target corresponds to one real frame, and the determined real frame may include a position tag (which may include the center point coordinates and the width and height of the real frame), a type tag, an attribute tag and a key point tag.
In a specific implementation, the initial feature extraction network may be a lightweight feature extraction network, including but not limited to networks such as MobileNet, ShuffleNet, SqueezeNet, and the like, and various iterative versions thereof. In order to meet the requirements of multiple aspects such as detection speed, model size, detection precision and the like, in the embodiment of the invention, the MobileNet-V2 can be selected as an initial feature extraction network, and the width scaling factor of the MobileNet-V2 is set to be 0.35, so that the parameter quantity of the initial feature extraction network is reduced to about 30M, and a foundation is laid for realizing the lightweight and low-delay performance of a mobile terminal model.
Step 302: perform feature extraction on each picture in the fused picture set through the initial feature extraction network to obtain a plurality of original training feature maps of different sizes.
Exemplarily, each picture in the fused picture set is input into the initial feature extraction network, so that pooling and sampling operations are performed on each picture to obtain a plurality of original training feature maps of different sizes corresponding to each picture.
For example, the size of the picture input into the initial feature extraction network is (224, 224, 3), and after feature extraction by the initial feature extraction network, feature maps of 1/2, 1/4, 1/8, 1/16, and 1/32, i.e., feature maps of (112, 112, 16), (56, 56, 24), (28, 28, 32), (14, 14, 96), (7, 7, 160) of the original image, can be obtained, where a large-size feature map can be used to detect a small-size object, and a small-size feature map can be used to detect a large-size object. In order to improve the processing efficiency, the feature map of the middle and lower layers extracted by the initial feature extraction network may be selected as the original training feature map, and for example, the feature maps with the sizes of (28, 28, 32), (14, 14, 96), (7, 7, 160) may be used as the original training feature map.
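The size bookkeeping in the example above can be reproduced with a small helper (the shapes are taken directly from the text; the helper itself is illustrative):

```python
def pyramid_shapes(input_size, strides, channels):
    """Spatial sizes of the feature maps after each downsampling stage."""
    return [(input_size // s, input_size // s, c)
            for s, c in zip(strides, channels)]

shapes = pyramid_shapes(224,
                        strides=(2, 4, 8, 16, 32),
                        channels=(16, 24, 32, 96, 160))
# -> [(112, 112, 16), (56, 56, 24), (28, 28, 32), (14, 14, 96), (7, 7, 160)]
originals = shapes[-3:]  # the mid/lower-layer maps kept as original training maps
```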
Step 303: perform feature fusion on the original training feature maps through the initial feature fusion network to obtain a first preset number of fused training feature maps of different sizes.
For example, the initial feature fusion network may be a feature pyramid network; that is, the feature pyramid network may be used to perform feature fusion on original training feature maps of different sizes so as to enhance their semantics and obtain a first preset number of fused training feature maps of different sizes.
For example, as shown in fig. 7, the original training feature maps are (28, 28, 32), (14, 14, 96) and (7, 7, 160), the original training feature maps with the scales of (28, 28, 32), (14, 14, 96) and (7, 7, 160) may be first convolved by 1 × 1, the number of channels may be uniformly adjusted to 64, feature maps with the scales of (28, 28, 64), (14, 14, 64) and (7, 7, 64) may be obtained, and the feature map with the scale of (7, 7, 64) may be directly used as one fused training feature map.
And performing up-sampling operation on the original training feature map with the size of (7, 7, 64) to obtain a feature map with the size of (14, 14, 64), and performing feature addition and convolution fusion on the feature map with the size of (14, 14, 64) and the feature map with the size of (14, 14, 64) obtained after channel adjustment to obtain a fused training feature map with the size of (14, 14, 64).
Similarly, the feature map with the size of (28, 28, 64) obtained after the up-sampling operation is performed on the feature map with the size of (14, 14, 64) after the fusion, and the feature map with the size of (28, 28, 64) obtained after the channel adjustment is subjected to feature addition and convolution fusion to obtain the fusion training feature map with the size of (28, 28, 64). That is, in the example of fig. 7, there are 3 output fused training feature maps.
It should be noted that the above-described feature fusion method is only an example, and in practical applications, other feature fusion methods may also be adopted, and are not specifically limited herein.
And step 304, setting a second preset number of different-size prior frames at each pixel point of the fusion training feature map.
In one possible implementation, the prior box may be a rectangular box, and the prior box may be set as follows: determining the size of the fusion training feature maps, determining the prior frame size for each fusion training feature map according to the preset relation between the prior frame size and the feature map size, and then setting the prior frames with different sizes in a second preset number on each pixel point of the corresponding fusion training feature maps according to the prior frame size determined for each fusion training feature map and the preset prior frame number (namely, the second preset number). The second preset number can be a user-defined value according to an actual situation, and can be, for example, 2 or 3. In the example shown in fig. 7, the number of the prior frames set on each pixel point of the fused training feature map is 2.
In some embodiments, the prior frame may also be set in other manners, such as according to the size of a preset target, which is not specifically limited herein.
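A sketch of placing a second preset number of prior boxes at every pixel of a fused training feature map; the box sizes below are illustrative, since the patent leaves the exact sizes to a preset relation between prior-box size and feature-map size:

```python
def make_priors(fm_size, img_size, box_sizes):
    """One (cx, cy, w, h) prior per configured size at every feature-map pixel."""
    stride = img_size / fm_size
    priors = []
    for y in range(fm_size):
        for x in range(fm_size):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # pixel center on the image
            for s in box_sizes:
                priors.append((cx, cy, s, s))
    return priors

# two priors per pixel on the smallest (7x7) fused map of a 224x224 input
priors = make_priors(fm_size=7, img_size=224, box_sizes=(128, 160))
# 7 * 7 * 2 = 98 priors
```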
Step 305: calculate the intersection ratio between each prior frame and the real frame.
For example, the intersection ratio (intersection over union, IoU) may be calculated according to the following formula:
\[ \mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|} \]
where \(\mathrm{IoU}\) denotes the intersection ratio, \(A\) denotes the region covered by the prior box, and \(B\) denotes the region covered by the real box.
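The intersection-ratio formula can be computed for axis-aligned boxes as follows (corner-coordinate representation assumed):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # |A ∩ B|
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)   # |A ∪ B|
    return inter / union if union > 0 else 0.0

v = iou((0, 0, 10, 10), (5, 5, 15, 15))  # intersection 25, union 175
```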
And step 306, selecting suggestion boxes from the prior boxes according to the intersection ratio.
Because each pixel point of the fusion training characteristic diagram is provided with a second preset number of prior frames with different sizes, the number of the prior frames is large, in order to improve the processing efficiency, a suggested frame can be selected from the prior frames according to the intersection ratio, and the specific selection method can be as follows:
(1) and selecting a prior frame with the intersection ratio larger than a preset threshold as a suggestion frame, wherein the preset threshold can be preset according to the actual situation, for example, the preset threshold can be 0.6, 0.7 and the like.
(2) And selecting the prior frame with the largest intersection ratio as a suggestion frame.
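Given the intersection ratios of all prior boxes against one real box, the two selection rules above can be sketched as (threshold value illustrative):

```python
def select_suggestions(ious, thr=0.6):
    """Keep priors whose IoU exceeds the threshold, plus the best-matching prior."""
    chosen = {i for i, v in enumerate(ious) if v > thr}
    chosen.add(max(range(len(ious)), key=lambda i: ious[i]))  # rule (2): best match
    return sorted(chosen)

picked = select_suggestions([0.10, 0.65, 0.70, 0.30])  # -> [1, 2]
```

Rule (2) guarantees that every real box gets at least one suggestion box even when no prior clears the threshold.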
Specifically, each preset target may correspond to one real box and at least one suggestion box, and in the training stage, target detection may be performed based on the suggestion box corresponding to each preset target.
And 307, inputting the fused training feature map into the initial multitask detector, so that the initial multitask detector performs feature detection based on the suggestion box to obtain training output data.
Specifically, the real frame may be understood as a frame on the original picture; the real frame may be encoded onto the fused training feature map, and then a type confidence loss \(L_{conf}\), a target position offset loss \(L_{loc}\), a key point position offset loss \(L_{pts}\) and an attribute confidence loss \(L_{attr}\) may be calculated according to the training output data and the encoded real frame; the total loss \(L\) is then calculated from \(L_{conf}\), \(L_{loc}\), \(L_{pts}\) and \(L_{attr}\).
The type confidence loss is a cross-entropy loss:
\[ L_{conf} = -\sum_{i=1}^{C} y_i \log p_i \]
where \(C\) is the number of types (in the embodiment of the invention \(C = 3\), i.e. human face, hand and background), \(y_i \in \{0, 1\}\) indicates whether the target belongs to the \(i\)-th type, and \(p_i\) is the predicted probability of the \(i\)-th type.
The target position offset loss is
\[ L_{loc} = \sum_{j=1}^{m} \mathrm{smooth}_{L1}(l_j - g_j) \]
where \(m\) is the number of coordinate values corresponding to the upper-left and lower-right corners of the frame (usually \(m = 4\)), \(\mathrm{smooth}_{L1}\) denotes the smooth L1 loss function, \(l_j\) is the prediction offset corresponding to the suggestion box and \(g_j\) is the true offset corresponding to the suggestion box, i.e. the difference between the real and predicted coordinates of the preset target; \(l_j\) can be obtained from the position data of the output frame corresponding to the suggestion box and the position data of the real frame, and \(g_j\) from the position data of the suggestion box and the position data of the real frame.
The key point position offset loss is
\[ L_{pts} = \sum_{k=1}^{K} \mathrm{smooth}_{L1}\!\left(l^{pts}_k - g^{pts}_k\right) \]
where \(K\) is the number of coordinate values of the key points, \(l^{pts}_k\) is the key point prediction offset corresponding to the suggestion box and \(g^{pts}_k\) is the true key point offset, i.e. the difference between the real and predicted coordinates of the key points.
The attribute confidence loss is likewise a cross-entropy loss:
\[ L_{attr} = -\sum_{i} y^{attr}_i \log p^{attr}_i \]
where \(y^{attr}_i \in \{0, 1\}\) indicates whether the target has the \(i\)-th attribute and \(p^{attr}_i\) is the predicted probability of the \(i\)-th attribute.
The total loss is
\[ L = \frac{1}{N}\left(\lambda_1 L_{conf} + \lambda_2 L_{loc} + \lambda_3 L_{pts} + \lambda_4 L_{attr}\right) \]
where \(N\) is the number of suggestion boxes and \(\lambda_1\), \(\lambda_2\), \(\lambda_3\), \(\lambda_4\) are weights whose values can be set according to the actual situation.
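A pure-Python sketch of combining the four task losses, assuming the weighted-sum form described above (the smooth L1 definition shown is the common one; the patent does not spell it out):

```python
def smooth_l1(x):
    """Common smooth L1: quadratic near zero, linear further out (assumed form)."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def total_loss(l_conf, l_loc, l_pts, l_attr, n, w=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four task losses, averaged over N suggestion boxes."""
    return (w[0] * l_conf + w[1] * l_loc + w[2] * l_pts + w[3] * l_attr) / n

loss = total_loss(0.8, 0.4, 0.2, 0.6, n=2)
```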
For example, a preset optimization algorithm, such as stochastic gradient descent, may be used to iteratively optimize the loss function, thereby optimizing the model parameters to obtain the feature extraction network, the feature fusion network and the multi-task detector.
Through the design of the multitask detector and the corresponding loss function, the multitask detection process of multiple targets can be fused, and the recall rate and the positioning accuracy of target detection are improved.
In a specific embodiment, the model training process shown in fig. 6 may be executed on a server or on the electronic device. When it is executed on a server, distributed training may be performed with multiple servers; the trained model is then converted with a conversion tool into a format supported by the electronic device, a performance test is run locally on the server with the corresponding interpreter, the model that passes the test with good performance is deployed in a preset application program, and the application program is installed on the electronic device so that target detection is achieved with the deployed model. The whole training process can be finished end to end in one pass to obtain a multi-task detection model, which realizes multi-task detection at once, reduces the training workload, and reduces the difficulty and workload of converting and deploying the model on a mobile terminal.
Experiments show that, when the model trained according to the embodiment of the present invention is used for target detection on an electronic device, real-time multi-task detection and evaluation of indexes such as faces, gestures, expressions, and gaze in video can be achieved while meeting the requirements of a lightweight model, low latency, and high precision. Specifically, with detection accuracy above 90%, the application-side model can be compressed to within 1 MB and the detection speed reaches 20-25 FPS.
Fig. 8 is a schematic structural diagram of an object detection apparatus provided in an embodiment of the present disclosure, and as shown in fig. 8, the apparatus includes:
an obtaining module 401, configured to obtain a feature extraction network, a feature fusion network, and a multi-task detector, where the feature extraction network, the feature fusion network, and the multi-task detector are obtained by performing model training using a fusion picture set, and the fusion picture set is obtained by using each single-task detection model to detect and mark the position, type, attribute, and key points of a preset target in each picture in a preset picture set;
an extraction module 402, configured to perform feature extraction on a picture to be detected through the feature extraction network to obtain a plurality of original feature maps of different sizes;
a fusion module 403, configured to perform feature fusion on the original feature maps through the feature fusion network to obtain a first preset number of fusion feature maps with different sizes;
and the detection module 404 is configured to perform feature detection on the fused feature map through the multitask detector to obtain a position, a type, an attribute, and a key point of the target to be detected.
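The four modules chain into a one-stop pipeline. A minimal sketch, with the three trained networks injected as plain callables (hypothetical stand-ins whose internals are not specified by the patent):

```python
from typing import Callable

class TargetDetector:
    """One-stop pipeline mirroring modules 401-404; the three trained networks
    are injected as callables rather than implemented here."""

    def __init__(self, extract: Callable, fuse: Callable, detect: Callable):
        self.extract = extract   # feature extraction network
        self.fuse = fuse         # feature fusion network
        self.detect = detect     # multi-task detector

    def run(self, picture):
        originals = self.extract(picture)   # several original feature maps of different sizes
        fused = self.fuse(originals)        # first preset number of fused feature maps
        return self.detect(fused)           # position, type, attribute and key points
```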
In an embodiment, the fused picture set is obtained as follows:
detecting and marking the face and the position in each picture in the preset picture set by using a face detection and positioning model;
detecting and marking the gesture and the position in each picture in the preset picture set by using a gesture detection and positioning model;
detecting and marking the face attribute in each picture in the preset picture set by using a face attribute detection model; and
and detecting and marking the face key points in each picture in the preset picture set by using a face key point detection model.
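The four labeling steps above amount to one merge pass over the preset picture set; in this sketch the four single-task models are hypothetical callables standing in for the trained detection models:

```python
def build_fused_set(pictures, face_model, gesture_model, attr_model, keypoint_model):
    """Run the four single-task detection models over every picture in the
    preset picture set and merge their labels into one fused record per picture."""
    fused_set = []
    for picture in pictures:
        fused_set.append({
            "picture": picture,
            "faces": face_model(picture),          # face boxes and positions
            "gestures": gesture_model(picture),    # gesture boxes and positions
            "attributes": attr_model(picture),     # face attributes
            "keypoints": keypoint_model(picture),  # face key points
        })
    return fused_set
```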
In an embodiment, the method for obtaining the feature extraction network, the feature fusion network, and the multi-task detector by performing model training using the fusion image set includes:
determining a real frame of the preset target according to the label in the fusion picture set;
inputting each picture in the fused picture set into an initial feature extraction network to obtain a plurality of original training feature graphs with different sizes corresponding to each picture;
inputting the original training feature maps into an initial feature fusion network to obtain a first preset number of fusion training feature maps with different sizes corresponding to each picture;
setting a second preset number of prior frames with different sizes at each pixel point of the fusion training feature map;
inputting the fusion training feature map into an initial multi-task detector so that the initial multi-task detector performs feature detection based on the prior frame to obtain training output data;
calculating a loss function according to the training output data and the real frame;
and performing back propagation on the loss function to optimize model parameters to obtain the feature extraction network, the feature fusion network and the multitask detector.
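One iteration over the training steps above might look as follows; `loss_and_grads` stands in for a framework's automatic differentiation, which the patent does not fix, so this is a sketch rather than the actual implementation:

```python
def train_step(picture, real_boxes, extract, fuse, detect, loss_and_grads, params, lr=0.01):
    """One training iteration: forward through the three stages, compute the
    joint loss against the real boxes, then apply an SGD update to the parameters."""
    features = extract(picture, params)      # original training feature maps
    fused = fuse(features, params)           # fused training feature maps
    outputs = detect(fused, params)          # training output data
    loss, grads = loss_and_grads(outputs, real_boxes, params)
    for name, grad in grads.items():         # back-propagation: gradient descent step
        params[name] = params[name] - lr * grad
    return loss, params
```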
In one embodiment, before inputting the fused training feature map into the initial multitask detector, the method further comprises:
calculating the intersection ratio of each prior frame and the real frame;
selecting a suggestion box from the prior boxes according to the intersection ratio;
inputting the fused training feature map into an initial multi-task detector to enable the initial multi-task detector to perform feature detection based on the prior frame to obtain training output data, wherein the method comprises the following steps:
and inputting the fused training feature map into the initial multitask detector, so that the initial multitask detector performs feature detection based on the suggestion box to obtain the training output data.
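The intersection-over-union computation and the selection of suggestion boxes from the prior boxes can be sketched as follows; the 0.5 threshold is an assumed value, not fixed by the patent:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def select_suggestion_boxes(prior_boxes, real_boxes, threshold=0.5):
    """Keep each prior box whose best IoU against any real box reaches the
    threshold (the threshold value is an assumption)."""
    return [p for p in prior_boxes
            if max(iou(p, r) for r in real_boxes) >= threshold]
```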
In one embodiment, the calculating a loss function from the training output data and the real box includes:
calculating a type confidence loss function $L_{conf}$, a target position offset loss function $L_{loc}$, a key point position offset loss function $L_{landm}$, and an attribute confidence loss function $L_{attr}$ from the training output data and the real box;
wherein $L_{conf} = -\sum_{i=1}^{C} y_i \log p_i$, where $C$ represents the number of types, $y_i$ (0 or 1) indicates whether the sample belongs to the $i$-th type, and $p_i$ represents the predicted probability of belonging to the $i$-th type;
wherein $L_{loc} = \sum_{j=1}^{M} \mathrm{smooth}_{L1}(g_j - \hat{g}_j)$, where $M$ represents the number of frame coordinate values, $\mathrm{smooth}_{L1}$ represents the smooth L1 loss function, $\hat{g}_j$ represents the prediction offset corresponding to the suggestion box, $g_j$ represents the true offset corresponding to the suggestion box, and $g_j - \hat{g}_j$ is the difference between the real and predicted coordinates of the preset target;
wherein $L_{landm} = \sum_{k=1}^{K} \mathrm{smooth}_{L1}(l_k - \hat{l}_k)$, where $K$ represents the number of key point coordinate values, $\hat{l}_k$ represents the key point prediction offset corresponding to the suggestion box, $l_k$ represents the true key point offset corresponding to the suggestion box, and $l_k - \hat{l}_k$ is the difference between the real and predicted coordinates of the key points;
wherein $L_{attr} = -\sum_{i=1}^{A}\left[a_i \log q_i + (1 - a_i)\log(1 - q_i)\right]$, where $a_i$ (0 or 1) indicates whether the sample belongs to the $i$-th attribute and $q_i$ represents the predicted probability of belonging to the $i$-th attribute.
In one embodiment, the feature extraction network comprises MobileNet-V2 and the feature fusion network comprises a feature pyramid network.
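A minimal sketch of top-down fusion in the spirit of a feature pyramid network, assuming each coarser map is exactly half the size of the next finer one and omitting the 1x1 lateral convolutions for brevity:

```python
import numpy as np

def fpn_fuse(feature_maps):
    """Top-down fusion: each coarser map is 2x nearest-neighbour upsampled and
    added element-wise to the next finer map, yielding one fused map per level.
    feature_maps is ordered finest to coarsest; lateral convolutions are omitted."""
    fused = [feature_maps[-1]]                          # start from the smallest map
    for finer in reversed(feature_maps[:-1]):
        upsampled = np.kron(fused[0], np.ones((2, 2)))  # 2x nearest-neighbour upsample
        fused.insert(0, finer + upsampled)              # element-wise merge
    return fused
```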
In an embodiment, the detecting module 404 performs feature detection on the fused feature map through the multi-task detector to obtain the position, the type, the attribute, and the key point of the target to be detected, including:
inputting the fusion characteristic diagram into the multitask detector to obtain prediction output data;
decoding the prediction output data to obtain prediction frame data;
filtering the prediction frame data to obtain target frame data;
and marking the original characteristic diagram according to the target frame data to obtain the position, the type, the attribute and the key point of the target to be detected.
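The "filtering" step is typically non-maximum suppression; since the patent does not name the filtering algorithm, the greedy NMS below is an assumption:

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, w) * max(0.0, h)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def filter_boxes(scored_boxes, iou_threshold=0.5):
    """Greedy non-maximum suppression over ((x1, y1, x2, y2), score) pairs:
    keep boxes in descending score order, dropping any box that overlaps an
    already-kept box beyond the threshold."""
    kept = []
    for box, score in sorted(scored_boxes, key=lambda t: -t[1]):
        if all(iou(box, k) < iou_threshold for k, _ in kept):
            kept.append((box, score))
    return kept
```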
It will be apparent to those skilled in the art that, for convenience and brevity of description, the above division into functional modules is merely an example; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. For the specific working process of the functional modules, reference may be made to the corresponding process in the foregoing method embodiment, which is not repeated here.
The device of the embodiment of the present disclosure can acquire a model for multi-task detection trained on the fused picture set, where the fused picture set is obtained by using each single-task detection model to detect and mark the position, type, attribute, and key points of the preset target in each picture of the preset picture set; one-stop multi-task detection is realized based on the trained model, improving detection efficiency. In addition, the model training of the embodiment of the present invention trains the multi-task detector in one step based on the fused picture set, without separately training multiple detectors for multiple tasks, which reduces the workload of model training. Furthermore, during detection the multiple tasks share the feature extraction network and the feature fusion network, which improves network utilization, reduces the amount of computation, and improves overall detection efficiency. In addition, target detection is performed on fused feature maps of different sizes, so that targets of different sizes can be detected, improving detection accuracy.
The embodiment of the present invention further provides a target detection system, which includes an electronic device and a server, where the electronic device may obtain a trained feature extraction network, a feature fusion network, and a multitask detector from the server, and detect a to-be-detected picture based on the feature extraction network, the feature fusion network, and the multitask detector, and a specific detection process may refer to the foregoing embodiments, and details are not described here.
Referring now to FIG. 9, shown is a block diagram of a computer system 500 suitable for use in implementing an electronic device of an embodiment of the present invention. The electronic device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 9, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. The driver 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is mounted into the storage section 508 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules and/or units described in the embodiments of the present invention may be implemented by software, and may also be implemented by hardware. The described modules and/or units may also be provided in a processor, and may be described as: a processor includes an acquisition module, an extraction module, a fusion module, and a detection module. Wherein the names of the modules do not in some cases constitute a limitation of the module itself.
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise: acquiring a feature extraction network, a feature fusion network and a multi-task detector which are obtained by utilizing a fusion picture set to perform model training, wherein the fusion picture set is obtained by utilizing each single-task detection model to detect and mark the position, the type, the attribute and the key point of a preset target in each picture in a preset picture set; extracting the features of the picture to be detected through the feature extraction network to obtain a plurality of original feature maps with different sizes; performing feature fusion on the original feature maps through the feature fusion network to obtain a first preset number of fusion feature maps with different sizes; and performing feature detection on the fusion feature map through the multitask detector to obtain the position, type, attribute and key point of the target to be detected.
According to the technical solution of the embodiment of the present invention, a model for multi-task detection trained on the fused picture set can be obtained, where the fused picture set is obtained by using each single-task detection model to detect and mark the position, type, attribute, and key points of the preset target in each picture of the preset picture set; one-stop multi-task detection is realized based on the trained model, improving detection efficiency. In addition, the model training of the embodiment of the present invention trains the multi-task detector in one step based on the fused picture set, without separately training multiple detectors for multiple tasks, which reduces the workload of model training. Furthermore, during detection the multiple tasks share the feature extraction network and the feature fusion network, which improves network utilization, reduces the amount of computation, and improves overall detection efficiency. In addition, target detection is performed on fused feature maps of different sizes, so that targets of different sizes can be detected, improving detection accuracy.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. An object detection method for a mobile terminal, comprising:
the method comprises the steps of obtaining a lightweight feature extraction network, a feature fusion network and a multi-task detector which are obtained by utilizing a fusion picture set to conduct model training, wherein the fusion picture set is obtained by utilizing each single-task detection model to conduct position, type, attribute and key point detection and marking on a preset target in each picture in a preset picture set, and the fusion picture set is obtained by the following specific method: detecting and marking the face and the position in each picture in the preset picture set by using a face detection and positioning model; detecting and marking the gesture and the position in each picture in the preset picture set by using a gesture detection and positioning model; detecting and marking the face attribute in each picture in the preset picture set by using a face attribute detection model; detecting and marking the face key points in each picture in the preset picture set by using a face key point detection model;
extracting the features of the picture to be detected through the lightweight feature extraction network to obtain a plurality of original feature maps with different sizes;
performing feature fusion on the original feature maps through the feature fusion network to obtain a first preset number of fusion feature maps with different sizes;
performing one-stop feature detection on the fusion feature map through the multi-task detector, and realizing multi-task detection once to obtain the position, type, attribute and key point of the target to be detected;
the method for obtaining the lightweight feature extraction network, the feature fusion network and the multitask detector by utilizing the fusion picture set to carry out model training comprises the following steps:
determining a real frame of the preset target according to the label in the fusion picture set;
inputting each picture in the fused picture set into an initial lightweight feature extraction network to obtain a plurality of original training feature maps with different sizes corresponding to each picture;
inputting the original training feature maps into an initial feature fusion network to obtain a first preset number of fusion training feature maps with different sizes corresponding to each picture;
setting a second preset number of prior frames with different sizes at each pixel point of the fusion training feature map;
inputting the fusion training feature map into an initial multi-task detector so that the initial multi-task detector performs feature detection based on the prior frame to obtain training output data;
calculating a loss function according to the training output data and the real frame;
performing back propagation on the loss function to optimize model parameters to obtain the lightweight feature extraction network, the feature fusion network and the multitask detector;
said calculating a loss function from said training output data and said real box comprises:
calculating a type confidence loss function $L_{conf}$, a target position offset loss function $L_{loc}$, a key point position offset loss function $L_{landm}$, and an attribute confidence loss function $L_{attr}$ from the training output data and the real box;
Wherein $L_{conf} = -\sum_{i=1}^{C} y_i \log p_i$, where $C$ represents the number of types, $y_i$ (0 or 1) indicates whether the sample belongs to the $i$-th type, and $p_i$ represents the predicted probability of belonging to the $i$-th type;
wherein $L_{loc} = \sum_{j=1}^{M} \mathrm{smooth}_{L1}(g_j - \hat{g}_j)$, where $M$ represents the number of frame coordinate values, $\mathrm{smooth}_{L1}$ represents the smooth L1 loss function, $\hat{g}_j$ represents the prediction offset corresponding to the suggestion box, $g_j$ represents the true offset corresponding to the suggestion box, and $g_j - \hat{g}_j$ is the difference between the real and predicted coordinates of the preset target;
wherein $L_{landm} = \sum_{k=1}^{K} \mathrm{smooth}_{L1}(l_k - \hat{l}_k)$, where $K$ represents the number of key point coordinate values, $\hat{l}_k$ represents the key point prediction offset corresponding to the suggestion box, $l_k$ represents the true key point offset corresponding to the suggestion box, and $l_k - \hat{l}_k$ is the difference between the real and predicted coordinates of the key points;
2. The method of claim 1, further comprising, prior to inputting the fused training feature map into an initial multitasking detector:
calculating the intersection ratio of each prior frame and the real frame;
selecting a suggestion box from the prior boxes according to the intersection ratio;
inputting the fused training feature map into an initial multi-task detector to enable the initial multi-task detector to perform feature detection based on the prior frame to obtain training output data, wherein the method comprises the following steps:
and inputting the fused training feature map into the initial multitask detector, so that the initial multitask detector performs feature detection based on the suggestion box to obtain the training output data.
4. The object detection method of claim 1, wherein the feature extraction network comprises MobileNet-V2 and the feature fusion network comprises a feature pyramid network.
5. The target detection method according to claim 1, wherein the performing the feature detection on the fused feature map through the multitask detector to obtain the position, the type, the attribute and the key point of the target to be detected comprises:
inputting the fusion characteristic diagram into the multitask detector to obtain prediction output data;
decoding the prediction output data to obtain prediction frame data;
filtering the prediction frame data to obtain target frame data;
and marking the original characteristic diagram according to the target frame data to obtain the position, the type, the attribute and the key point of the target to be detected.
6. An object detection apparatus for a mobile terminal, comprising:
the system comprises an acquisition module, a light-weight feature extraction network, a feature fusion network and a multi-task detector, wherein the acquisition module is used for acquiring the light-weight feature extraction network, the feature fusion network and the multi-task detector which are obtained by utilizing a fusion picture set to perform model training, the fusion picture set is obtained by utilizing each single-task detection model to detect and mark the position, the type, the attribute and the key point of a preset target in each picture in a preset picture set, and the fusion picture set is obtained by the following specific method: detecting and marking the face and the position in each picture in the preset picture set by using a face detection and positioning model; detecting and marking the gesture and the position in each picture in the preset picture set by using a gesture detection and positioning model; detecting and marking the face attribute in each picture in the preset picture set by using a face attribute detection model; detecting and marking the face key points in each picture in the preset picture set by using a face key point detection model;
the extraction module is used for extracting the features of the picture to be detected through the lightweight feature extraction network to obtain a plurality of original feature maps with different sizes;
the fusion module is used for carrying out feature fusion on the original feature maps through the feature fusion network to obtain a first preset number of fusion feature maps with different sizes;
the detection module is used for carrying out one-stop feature detection on the fusion feature map through the multi-task detector, realizing multi-task detection at one time and obtaining the position, type, attribute and key point of a target to be detected;
the method for obtaining the lightweight feature extraction network, the feature fusion network and the multitask detector by utilizing the fusion picture set to carry out model training comprises the following steps:
determining a real frame of the preset target according to the label in the fusion picture set;
inputting each picture in the fused picture set into an initial lightweight feature extraction network to obtain a plurality of original training feature maps with different sizes corresponding to each picture;
inputting the original training feature maps into an initial feature fusion network to obtain a first preset number of fusion training feature maps with different sizes corresponding to each picture;
setting a second preset number of prior frames with different sizes at each pixel point of the fusion training feature map;
inputting the fusion training feature map into an initial multi-task detector so that the initial multi-task detector performs feature detection based on the prior frame to obtain training output data;
calculating a loss function according to the training output data and the real frame;
performing back propagation on the loss function to optimize model parameters to obtain the lightweight feature extraction network, the feature fusion network and the multitask detector;
said calculating a loss function from said training output data and said real box comprises:
calculating a type confidence loss function $L_{conf}$, a target position offset loss function $L_{loc}$, a key point position offset loss function $L_{landm}$, and an attribute confidence loss function $L_{attr}$ from the training output data and the real box;
Wherein $L_{conf} = -\sum_{i=1}^{C} y_i \log p_i$, where $C$ represents the number of types, $y_i$ (0 or 1) indicates whether the sample belongs to the $i$-th type, and $p_i$ represents the predicted probability of belonging to the $i$-th type;
wherein $L_{loc} = \sum_{j=1}^{M} \mathrm{smooth}_{L1}(g_j - \hat{g}_j)$, where $M$ represents the number of frame coordinate values, $\mathrm{smooth}_{L1}$ represents the smooth L1 loss function, $\hat{g}_j$ represents the prediction offset corresponding to the suggestion box, $g_j$ represents the true offset corresponding to the suggestion box, and $g_j - \hat{g}_j$ is the difference between the real and predicted coordinates of the preset target;
wherein $L_{landm} = \sum_{k=1}^{K} \mathrm{smooth}_{L1}(l_k - \hat{l}_k)$, where $K$ represents the number of key point coordinate values, $\hat{l}_k$ represents the key point prediction offset corresponding to the suggestion box, $l_k$ represents the true key point offset corresponding to the suggestion box, and $l_k - \hat{l}_k$ is the difference between the real and predicted coordinates of the key points;
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the object detection method as claimed in any one of claims 1 to 5 when executing the program.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the object detection method according to any one of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110180875.4A CN112528977B (en) | 2021-02-10 | 2021-02-10 | Target detection method, target detection device, electronic equipment and storage medium |
PCT/CN2021/111385 WO2022170742A1 (en) | 2021-02-10 | 2021-08-09 | Target detection method and apparatus, electronic device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110180875.4A CN112528977B (en) | 2021-02-10 | 2021-02-10 | Target detection method, target detection device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112528977A CN112528977A (en) | 2021-03-19 |
CN112528977B true CN112528977B (en) | 2021-07-02 |
Family
ID=74975739
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110180875.4A Active CN112528977B (en) | 2021-02-10 | 2021-02-10 | Target detection method, target detection device, electronic equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112528977B (en) |
WO (1) | WO2022170742A1 (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112528977B (en) * | 2021-02-10 | 2021-07-02 | 北京优幕科技有限责任公司 | Target detection method, target detection device, electronic equipment and storage medium |
CN112766244B (en) * | 2021-04-07 | 2021-06-08 | 腾讯科技(深圳)有限公司 | Target object detection method and device, computer equipment and storage medium |
CN113065568A (en) * | 2021-04-09 | 2021-07-02 | 神思电子技术股份有限公司 | Target detection, attribute identification and tracking method and system |
CN113591567A (en) * | 2021-06-28 | 2021-11-02 | 北京百度网讯科技有限公司 | Target detection method, training method of target detection model and device thereof |
CN113408502B (en) * | 2021-08-19 | 2021-12-21 | 深圳市信润富联数字科技有限公司 | Gesture recognition method and device, storage medium and electronic equipment |
CN113963167B (en) * | 2021-10-29 | 2022-05-27 | 北京百度网讯科技有限公司 | Method, device and computer program product applied to target detection |
CN114418901B (en) * | 2022-03-30 | 2022-08-09 | 江西中业智能科技有限公司 | Image beautifying processing method, system, storage medium and equipment based on Retinaface algorithm |
CN115376093A (en) * | 2022-10-25 | 2022-11-22 | 苏州挚途科技有限公司 | Object prediction method and device in intelligent driving and electronic equipment |
CN115880717B (en) * | 2022-10-28 | 2023-11-17 | 北京此刻启动科技有限公司 | Heat map key point prediction method and device, electronic equipment and storage medium |
CN115661577B (en) * | 2022-11-01 | 2024-04-16 | 吉咖智能机器人有限公司 | Method, apparatus and computer readable storage medium for object detection |
CN118053136A (en) * | 2022-11-16 | 2024-05-17 | 华为技术有限公司 | Target detection method, device and storage medium |
CN115512188A (en) * | 2022-11-24 | 2022-12-23 | 苏州挚途科技有限公司 | Multi-target detection method, device, equipment and medium |
CN115861839B (en) * | 2022-12-06 | 2023-08-29 | 平湖空间感知实验室科技有限公司 | Weak and small target detection method and system for geostationary orbit and electronic equipment |
CN116246128B (en) * | 2023-02-28 | 2023-10-27 | 深圳市锐明像素科技有限公司 | Training method and device of detection model crossing data sets and electronic equipment |
CN117029673B (en) * | 2023-07-12 | 2024-05-10 | 中国科学院水生生物研究所 | Fish body surface multi-size measurement method based on artificial intelligence |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107066932A (en) * | 2017-01-16 | 2017-08-18 | 北京龙杯信息技术有限公司 | The detection of key feature points and localization method in recognition of face |
CN110163187A (en) * | 2019-06-02 | 2019-08-23 | 东北石油大学 | Long-distance road traffic sign detection and recognition method based on F-RCNN |
CN110363124A (en) * | 2019-07-03 | 2019-10-22 | 广州多益网络股份有限公司 | Rapid expression recognition and application method based on face key points and geometric deformation |
CN110647834A (en) * | 2019-09-18 | 2020-01-03 | 北京市商汤科技开发有限公司 | Human face and human hand correlation detection method and device, electronic equipment and storage medium |
CN111626200A (en) * | 2020-05-26 | 2020-09-04 | 北京联合大学 | Multi-scale target detection network and traffic identification detection method based on Libra R-CNN |
CN111666839A (en) * | 2020-05-25 | 2020-09-15 | 东华大学 | Road pedestrian detection system based on improved Faster RCNN |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5849558B2 (en) * | 2011-09-15 | 2016-01-27 | オムロン株式会社 | Image processing apparatus, image processing method, control program, and recording medium |
CN111914861A (en) * | 2019-05-08 | 2020-11-10 | 北京字节跳动网络技术有限公司 | Target detection method and device |
CN110674748B (en) * | 2019-09-24 | 2024-02-13 | 腾讯科技(深圳)有限公司 | Image data processing method, apparatus, computer device, and readable storage medium |
CN112084860A (en) * | 2020-08-06 | 2020-12-15 | 中国科学院空天信息创新研究院 | Target object detection method and device and thermal power plant detection method and device |
CN112528977B (en) * | 2021-02-10 | 2021-07-02 | 北京优幕科技有限责任公司 | Target detection method, target detection device, electronic equipment and storage medium |
- 2021-02-10 CN: CN202110180875.4A patent/CN112528977B/en, active (granted)
- 2021-08-09 WO: PCT/CN2021/111385 patent/WO2022170742A1/en, active (application filing)
Also Published As
Publication number | Publication date |
---|---|
WO2022170742A1 (en) | 2022-08-18 |
CN112528977A (en) | 2021-03-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112528977B (en) | Target detection method, target detection device, electronic equipment and storage medium | |
CN111368685B (en) | Method and device for identifying key points, readable medium and electronic equipment | |
CN109508681A (en) | The method and apparatus for generating human body critical point detection model | |
CN109858333B (en) | Image processing method, image processing device, electronic equipment and computer readable medium | |
US11704357B2 (en) | Shape-based graphics search | |
CN111369427A (en) | Image processing method, image processing device, readable medium and electronic equipment | |
CN113704531A (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium | |
CN109325996B (en) | Method and device for generating information | |
CN113177472A (en) | Dynamic gesture recognition method, device, equipment and storage medium | |
CN112232311B (en) | Face tracking method and device and electronic equipment | |
CN110349161A (en) | Image partition method, device, electronic equipment and storage medium | |
CN114511661A (en) | Image rendering method and device, electronic equipment and storage medium | |
CN110110666A (en) | Object detection method and device | |
CN111209856B (en) | Invoice information identification method and device, electronic equipment and storage medium | |
CN113762109B (en) | Training method of character positioning model and character positioning method | |
CN114332590A (en) | Joint perception model training method, joint perception device, joint perception equipment and medium | |
CN110110696A (en) | Method and apparatus for handling information | |
CN110288691B (en) | Method, apparatus, electronic device and computer-readable storage medium for rendering image | |
CN111741329A (en) | Video processing method, device, equipment and storage medium | |
CN113610856B (en) | Method and device for training image segmentation model and image segmentation | |
CN111353470B (en) | Image processing method and device, readable medium and electronic equipment | |
CN115424060A (en) | Model training method, image classification method and device | |
CN111968030B (en) | Information generation method, apparatus, electronic device and computer readable medium | |
CN114022658A (en) | Target detection method, device, storage medium and terminal | |
CN113762260A (en) | Method, device and equipment for processing layout picture and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||