CN114627519A - Data processing method and device, electronic equipment and storage medium


Info

Publication number: CN114627519A
Authority: CN (China)
Prior art keywords: target, frame, frame image, image, training
Legal status: Pending
Application number: CN202011469640.9A
Other languages: Chinese (zh)
Inventors: 王建国, 汪彪
Current Assignee: Alibaba Group Holding Ltd
Original Assignee: Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN202011469640.9A
Publication of CN114627519A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present application provides a data processing method and apparatus, an electronic device, and a storage medium. The method includes: acquiring video data and determining frame images in the video data; identifying a frame image and extracting the detection image corresponding to a detection box, where the detection image contains the target object located by the detection box; determining first position information and second position information of key points of the target object in a target frame image, where the first position information is determined from the target detection image of the target frame image, and the second position information is determined by tracking and predicting the positions of the key points of the target object in historical frame images, the historical frame images comprising at least one frame image preceding the target frame image; and determining attribute information of the key points of the target object in the target frame image according to the first position information and the second position information. The accuracy of key point localization can thereby be improved.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, a data processing apparatus, an electronic device, and a storage medium.
Background
With the growing popularity of applications such as live video streaming and video shopping, functions such as beautification, makeup, and special effects have become important components of many video applications. Most face beautification and makeup effects first require the key points of the face to be located accurately; the face is then processed using the positions of those key points.
An existing key point localization model performs image recognition on a single face frame image, determines the position information of each facial key point in that single frame, and then applies beautification and makeup processing to the face using those key point positions.
However, locating key points by analyzing single frames with such a model yields low localization accuracy.
Disclosure of Invention
Embodiments of the present application provide a data processing method for improving the accuracy of key point localization.
Correspondingly, embodiments of the present application also provide a data processing apparatus, an electronic device, and a storage medium to guarantee the implementation and application of the above method.
To solve the above problem, an embodiment of the present application discloses a data processing method, the method including: acquiring video data and determining frame images in the video data; identifying a frame image and extracting the detection image corresponding to a detection box, where the detection image contains the target object located by the detection box; determining first position information and second position information of key points of the target object in a target frame image, where the first position information is determined from the target detection image of the target frame image, and the second position information is determined by tracking and predicting the positions of the key points of the target object in historical frame images, the historical frame images comprising at least one frame image preceding the target frame image; and determining attribute information of the key points of the target object in the target frame image according to the first position information and the second position information.
To solve the above problem, an embodiment of the present application discloses a data processing method, including: performing single-frame image training on a key point recognition model using training input images and training input labels; inputting training images from training video data into the key point recognition model that has completed single-frame image training, to obtain training labels representing the positions of the key points of the target object in the training images; performing tracking prediction on the historical training images of a target training image and determining the predicted training label of the training image at the target moment; verifying the reliability of the training label of the target training image according to the difference between the predicted training label and the training label of the training image at the target moment; and further training the key point recognition model that has completed single-frame image training using the verified training labels and the corresponding training images.
To solve the above problem, an embodiment of the present application discloses a data processing method, including: acquiring video data and determining frame images in the video data; identifying a frame image and extracting the detection image corresponding to a detection box, where the detection image contains the target object located by the detection box; determining first position information and second position information of key points of the target object in a target frame image, where the first position information is determined from the target detection image of the target frame image, and the second position information is determined by tracking and predicting the positions of the key points of the target object in historical frame images, the historical frame images comprising at least one frame image preceding the target frame image; determining attribute information of the key points of the target object in the target frame image according to the first position information and the second position information; and performing special effect processing on the frame images in the video data according to the attribute information of the key points of the target object.
To solve the above problem, an embodiment of the present application discloses a data processing method, including: acquiring video data and determining frame images in the video data; identifying a frame image and extracting the detection image corresponding to a detection box, where the detection image contains the face data located by the detection box; determining first position information and second position information of key points of the face data in a target frame image, where the first position information is determined from the target detection image of the target frame image, and the second position information is determined by tracking and predicting the positions of the key points of the face data in historical frame images, the historical frame images comprising at least one frame image preceding the target frame image; determining attribute information of the key points of the face data in the target frame image according to the first position information and the second position information; and performing special effect processing on the frame images in the video data according to the attribute information of the key points of the face data, where the special effect processing comprises at least one of face beautification processing, special effect addition processing, makeup processing, and image beautification processing.
To solve the above problem, an embodiment of the present application discloses a data processing method, including: acquiring live video data, identifying frame images in the live video data, and extracting the detection image corresponding to a detection box, where the detection image contains the face data located by the detection box; determining first position information and second position information of key points of the face data in a target frame image, where the first position information is determined from the target detection image of the target frame image, and the second position information is determined by tracking and predicting the positions of the key points of the face data in historical frame images, the historical frame images comprising at least one frame image preceding the target frame image; determining attribute information of the key points of the face data in the target frame image according to the first position information and the second position information; and performing special effect processing on the frame images in the live video data according to the attribute information of the key points of the face data, where the special effect processing comprises at least one of face beautification processing, special effect addition processing, makeup processing, and image beautification processing.
To solve the above problem, an embodiment of the present application discloses a data processing method, including: providing a first interface, acquiring the relevant video data through the first interface, and determining frame images in the video data; identifying a frame image and extracting the detection image corresponding to a detection box, where the detection image contains the target object located by the detection box; determining first position information and second position information of key points of the target object in a target frame image, where the first position information is determined from the target detection image of the target frame image, and the second position information is determined by tracking and predicting the positions of the key points of the target object in historical frame images, the historical frame images comprising at least one frame image preceding the target frame image; determining attribute information of the key points of the target object in the target frame image according to the first position information and the second position information; and feeding back the attribute information through a second interface.
To solve the above problem, an embodiment of the present application discloses a data processing apparatus, including: a video data acquisition module configured to acquire video data and determine frame images in the video data; a detection image acquisition module configured to identify a frame image and extract the detection image corresponding to a detection box, where the detection image contains the target object located by the detection box; a position information acquisition module configured to determine first position information and second position information of key points of the target object in a target frame image, where the first position information is determined from the target detection image of the target frame image, and the second position information is determined by tracking and predicting the positions of the key points of the target object in historical frame images, the historical frame images comprising at least one frame image preceding the target frame image; and an attribute information acquisition module configured to determine attribute information of the key points of the target object in the target frame image according to the first position information and the second position information.
To solve the above problem, an embodiment of the present application discloses a data processing apparatus, including: a model single-frame training module configured to perform single-frame image training on a key point recognition model using training input images and training input labels; a training label acquisition module configured to input training images from training video data into the key point recognition model that has completed single-frame image training, to obtain training labels representing the positions of the key points of the target object in the training images; a predicted label acquisition module configured to perform tracking prediction on the historical training images of a target training image and determine the predicted training label of the training image at the target moment; a training label verification module configured to verify the reliability of the training label of the target training image according to the difference between the predicted training label and the training label of the training image at the target moment; and a model secondary training module configured to further train, using the verified training labels and the corresponding training images, the key point recognition model that has completed single-frame image training.
To solve the above problem, an embodiment of the present application discloses an electronic device, including: a processor; and a memory having executable code stored thereon which, when executed, causes the processor to perform a method as described in one or more of the above method embodiments.
To solve the above problem, embodiments of the present application disclose one or more machine-readable media having executable code stored thereon which, when executed, causes a processor to perform a method as described in one or more of the above method embodiments.
Compared with the prior art, the embodiments of the present application have the following advantages:
In the embodiments of the present application, frame images in video data can be identified and a detection image containing the target object extracted. Then, on the one hand, the target detection image of the target frame image can be input into a key point recognition model to determine first position information of the key points of the target object in the target frame image; on the other hand, based on the motion continuity of the target object in the video data, the positions of the key points of the target object in the historical frame images can be tracked and analyzed to obtain second position information of the key points in the target frame image. Attribute information of the key points of the target object in the target frame image is then derived from the first and second position information. Compared with determining the first position information with the key point recognition model alone, this embodiment additionally analyzes the second position information from the continuous motion of the key points in the video data and determines the attribute information from both, so the accuracy of key point localization for the target object can be improved.
Drawings
FIG. 1 is a schematic flow chart diagram of a data processing method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of a data processing method according to another embodiment of the present application;
FIG. 3 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 4 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 5 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 6 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 7 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 8 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 9 is a block diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic block diagram of a data processing apparatus according to another embodiment of the present application;
FIG. 11 is a schematic block diagram of a data processing apparatus according to yet another embodiment of the present application;
FIG. 12 is a schematic block diagram of a data processing apparatus according to yet another embodiment of the present application;
FIG. 13 is a schematic block diagram of a data processing apparatus according to yet another embodiment of the present application;
FIG. 14 is a schematic block diagram of a data processing apparatus according to yet another embodiment of the present application;
FIG. 15 is a schematic structural diagram of an apparatus according to an embodiment of the present application.
Detailed Description
To make the above objects, features, and advantages of the present application easier to understand, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiments of the present application can be applied to identifying key points of a target object in video data, determining the position information of those key points in the frame images of the video data, and then processing the images in the video data according to the key point positions, for example applying beautification, makeup, and special-effect processing to face images.
In the method, on the one hand, a key point recognition model can be used to recognize a target frame image in the video data and determine first position information of the key points of the target object in the target frame image. On the other hand, based on the motion continuity of the target object in the video data, tracking analysis can be performed with the historical frame image(s) of one (or more) frames preceding the target frame image to predict second position information of the key points in the target frame image. The offset between the first and second position information is then analyzed: when the offset is large (it exceeds an offset threshold), the first position information is taken as the attribute information of the key points of the target object in the target frame image; when the offset is small, the second position information is taken as the attribute information. The image can then be processed according to the key point attribute information, for example applying beautification and makeup to a face image.
In the embodiments of the present application, first position information of the key points of the target object in the target frame image can be determined with a key point recognition model, and the positions of the target object in the historical frame images preceding the target frame image can be tracked and analyzed according to the motion continuity of the target object in the video data to obtain second position information of the key points in the target frame image. Attribute information of the key points in the target frame image is then derived from the first and second position information. Compared with determining the first position information with the key point recognition model alone, this approach additionally analyzes the second position information from the continuous behavior of the key points in the video data, which improves the accuracy of key point localization, improves the stability of the key points across the video, and provides more stable key point positions for subsequent image processing, improving the image processing effect.
Specifically, as shown in FIG. 1, after the video data is acquired, the frame images in the video data can be determined. A first detection box and a second detection box of the target object in the target frame image are then determined separately, and the target detection box of the target object is determined from them so that the detection image can be extracted. On the one hand, an object description box locating the target object in the target frame image can be determined, and the image corresponding to the object description box is input into an object box recognition model. The object box recognition model can recognize the key points of the target object in the image of the object description box, determine the angle of the target object from those key points, rotate and align the target frame image according to that angle, and obtain the first detection box of the target object in the target frame image from the located key points. On the other hand, the positions of the key points of the target object in the historical frame image(s) of one (or more) frames preceding the target frame image can be tracked to predict the second detection box of the target object in the target frame image. Once the first and second detection boxes are determined, the deviation amount between them can be analyzed; according to that deviation, the first detection box (when the deviation exceeds the deviation threshold) or the second detection box (when it does not) is taken as the target detection box of the target object, and the detection image corresponding to the target detection box is extracted.
After a detection image containing the target object is extracted from the target frame image, on the one hand the target detection image can be input into a key point recognition model, which recognizes the key points of the target object and yields first position information of the key points in the target frame image; on the other hand, based on the motion continuity of the target object in the video data, the position of the target object in the historical frame images (the preceding frame or frames) can be tracked and analyzed to predict second position information of the key points in the target frame image. Then, according to the offset between the first and second position information, the first position information (when the offset exceeds the offset threshold) or the second position information (when it does not) is determined as the attribute information of the key points of the target object.
When recognizing video data, a key point recognition model is usually applied to single frames to obtain key point positions, and the image is processed according to those positions. Recognizing a target object (such as a face) in the frame images of video data this way, however, ignores the motion of the target object across consecutive images. While the target object is moving (for example, a face turning from frontal to profile), some frame images may have key points that cannot be recognized or are recognized inaccurately, which easily causes interruptions or inconsistencies in the processing of subsequent frames (for example, a misplaced beautification effect in some frames distorting the background), lowering image quality. In the method of the embodiments of the present application, the first position information can be obtained with the key point recognition model, the second position information can be obtained by tracking analysis over the historical frame images, and the attribute information of the key points in the image can then be determined from both, so that the frame image is processed according to that attribute information. In this embodiment, a comprehensive analysis of the first and second position information yields more stable key points, and special effect processing can then be applied to the frame images with more accurate and temporally continuous attribute information, improving the image processing effect and the data quality of the video.
The method of the embodiments of the present application can be applied to recognizing the key points of an anchor's face in live video: first position information of the key points is located with the key point recognition model, tracking analysis over the historical frame images of the live video predicts second position information, and the attribute information of the key points is then determined from both. Continuous, stable beautification and makeup processing can thus be provided for the anchor according to the attribute information, improving the beautification and makeup effects on the anchor's face. In addition, virtual three-dimensional structure data of wearable products (such as glasses) can be provided and fitted to the anchor according to the key point attribute information. In this embodiment, the attribute information can include the key point positions and the spatial angle information of the anchor's face, so the virtual three-dimensional structure data can be rotated according to the face's spatial angles and the rotated data fitted to the anchor's face, improving the fitting effect.
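As an illustrative aside (not part of the disclosure), rotating virtual three-dimensional structure data by the face's spatial angles could be sketched as below; the yaw-pitch-roll Euler convention and all function names are assumptions, since the text only states that the data is rotated according to the face's spatial angle information.

```python
import numpy as np

def euler_to_matrix(yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Build a 3x3 rotation matrix from face spatial angles in radians.

    The Z-Y-X (yaw-pitch-roll) convention is an assumption made for
    this sketch."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return Rz @ Ry @ Rx

def rotate_virtual_object(points: np.ndarray, yaw, pitch, roll) -> np.ndarray:
    """Rotate the (N, 3) point cloud of a virtual product (e.g. glasses)
    so it can be fitted to the anchor's face at its current pose."""
    return points @ euler_to_matrix(yaw, pitch, roll).T
```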
The approach of the embodiments of the present application can also be applied to recognizing the key points of products in e-commerce live streams: first position information of the key points is located with the key point recognition model, tracking analysis over the historical frame images of the live video predicts second position information, and the attribute information of the key points is determined from both. The position of a product in the image can then be located accurately according to the attribute information, and the corresponding image processed accordingly, for example enlarging the product image, or obtaining the product's three-dimensional structure data and using it to improve the clarity of the product in the live video so that viewers can see the product's details.
The method of the embodiments of the present application can also be applied to recognizing the key points of a target object (such as a player) in a game or event video: first position information of the key points (such as the player's head, arm, and leg key points) is located with the key point recognition model, tracking analysis over the historical frame images of the video predicts second position information, and the attribute information is determined from both. The position of the target object can then be located more accurately, so that special effect processing (such as enlarging the target object) can be applied to the event video to produce clearer and more accurate playback data, improving the data quality of the playback.
The method can also be applied to recognizing vehicles in road surveillance video: first position information of the key points is located with the key point recognition model, tracking analysis over the historical frame images of the surveillance video predicts second position information, and the attribute information is determined from both, so that clearer and more accurate related information (such as the license plate, the driver's state, and the vehicle speed) can be extracted according to the attribute information.
An embodiment of the present application provides a data processing method that can be executed by a processing terminal. It can perform image recognition on video data to locate the position of the target object in the frame images, and process the images in the video data according to the key point position information, for example applying beautification to a face. The processing terminal can be understood as a device that collects, stores, and forwards data, such as a device that collects, stores, or forwards live video or surveillance video. As shown in FIG. 2, the method includes:
Step 202: acquire video data and determine the frame images in the video data. The video data may be a live event video, a live anchor video, a traffic surveillance video, a residential surveillance video, or the like. The video data consists of frame images, and in this embodiment the corresponding frame images can be extracted from the video data.

Step 204: identify a frame image and extract the detection image corresponding to the detection box, the detection image containing the target object located by the detection box. In this step, the frame image can be recognized to distinguish the target object from the background; the detection box of the target object in the frame image can then be identified and the corresponding detection image extracted for subsequent analysis. Locating the detection box first and then locating key points within its image reduces the size of the image analyzed later and speeds up data processing. In this embodiment, a target object recognition technique can be used to determine the positions of the key points of the target object in the frame image so as to distinguish the target object from the background, determine the detection box from those key point positions, and extract the corresponding detection image in order to obtain the key point position information. The target object recognition technique can recognize one or more features of the target object to decide whether the target object is present in the frame image and to determine its detection box. For example, when the target object is a face, a face recognition technique can determine the positions of the facial features (such as the mouth, eyes, and nose) in the frame image to distinguish the face from the background, determine the face's detection box, and extract the corresponding image so that the facial key points can be located within the detection image.
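A minimal sketch of steps 202 and 204 (not part of the disclosure) is shown below. OpenCV's bundled Haar-cascade face detector stands in for whatever target object recognition technique an implementation actually uses; all names are illustrative assumptions.

```python
import cv2

def extract_detection_images(video_path: str):
    """Read frame images from video data and yield one detection image
    per detected face (the target object). The Haar cascade is only a
    stand-in detector for illustration."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:  # end of the video data
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
            # The detection image contains the target object located
            # by the detection box (x, y, w, h).
            yield frame, (x, y, w, h), frame[y:y + h, x:x + w]
    cap.release()
```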
In this embodiment, the detection box obtained by recognition can be further adjusted to obtain a more accurate detection box and hence more accurate key point localization. Specifically, as an optional embodiment, identifying the frame image and extracting the detection image corresponding to the detection box includes: analyzing the target frame image and determining the object description box of the target object in the target frame image; analyzing the image within the object description box with an object box recognition model and determining the first detection box of the target object in the target frame image; performing tracking prediction over the historical frame image(s) of the target frame image, which comprise at least one frame image preceding the target frame image, and predicting the second detection box of the target object in the target frame image; and determining the target detection box of the target object in the target frame image from the first and second detection boxes and extracting the corresponding detection image. This embodiment can use a target object recognition technique to recognize the features of the target object in the target frame image and determine the object description box. After the object description box is determined, on the one hand the image corresponding to it can be analyzed and adjusted with a pre-trained object box recognition model to obtain the first detection box; on the other hand, based on the motion continuity of the target object across consecutive frame images, tracking prediction over the historical frame images yields the second detection box of the target object in the target frame image. The target detection box is then determined from the difference between the first and second detection boxes, and the corresponding detection image is extracted for key point localization.
Specifically, on the one hand, the image corresponding to the object description box can be input into a pre-trained object box recognition model, which recognizes the key points of the target object and determines their positions; these key points represent the features of the target object, such as the nose, chin, forehead, and eye key points of a face. After the key point positions are determined, the target frame image can be adjusted to obtain the first detection box. Specifically, as an optional embodiment, analyzing the image within the object description box with the object box recognition model to determine the first detection box of the target object in the target frame image includes: determining, with the object box recognition model, the positioning information of the key points of the target object in the image of the object description box; determining the object angle information of the target object in the target frame image from the positioning information; and rotation-aligning the target frame image according to the object angle information and determining the first detection box in combination with the positioning information. Here the object box recognition model locates the key points of the target object within the object description box to obtain positioning information, the object angle of the target object in the target frame image is determined from that information, the target frame image is adjusted accordingly, and the first detection box circumscribing the key points of the target object is determined from the positioning information. Using the object box recognition model, this embodiment locates the first detection box of the target object in the frame image more accurately, improving the accuracy of subsequent key point localization.
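A hedged sketch of the rotation alignment just described follows. Deriving the roll angle from the line between the two eye key points is an assumption for illustration (the text only states that the object angle is determined from the located key points), and the key names are hypothetical.

```python
import cv2
import numpy as np

def rotate_align(image, keypoints):
    """Rotation-align a frame image so the target object is upright,
    then take the bounding rectangle of the rotated key points as the
    first detection box. `keypoints` is a dict of (x, y) positions."""
    (lx, ly), (rx, ry) = keypoints["left_eye"], keypoints["right_eye"]
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))  # assumed roll angle
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    aligned = cv2.warpAffine(image, M, (w, h))
    # Rotate the key points with the same matrix, then circumscribe them.
    pts = np.array(list(keypoints.values()), dtype=np.float32)
    pts = cv2.transform(pts[None], M)[0]
    x, y, bw, bh = cv2.boundingRect(pts.astype(np.int32))
    return aligned, (x, y, bw, bh)
```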
On the other hand, based on the continuity of the target object's motion across consecutive frame images, tracking prediction can be performed with the detection box of the historical frame image(s) of one (or more) frames preceding the target frame image to predict the second detection box of the target object in the target frame image. The first or second detection box is then taken as the target detection box of the target object in the target frame image in combination with the deviation between them. Specifically, as an optional embodiment, determining the target detection box of the target object in the target frame image from the first and second detection boxes includes: determining the deviation amount between the first detection box and the second detection box; taking the first detection box as the target detection box when the deviation amount exceeds a deviation threshold; and taking the second detection box as the target detection box when the deviation amount does not exceed the deviation threshold. In this embodiment, the deviation amount between the two boxes may include the deviation between each corresponding side; for example, the first and second detection boxes are boxes comprising four sides (top, bottom, left, and right), and the deviation of each side can be compared.
When training the object box recognition model, the detection boxes of the training images are usually annotated manually, and manual annotations may not be very accurate, so taking the first detection box obtained by model analysis alone as the target detection box may make the resulting box jitter noticeably (i.e., be inaccurate). Therefore, in the embodiments of the present application, the deviation between the first and second detection boxes can be exploited: when the deviation is small, the second detection box, which is continuous across consecutive frame images, is taken as the target detection box of the target object; when the deviation is large, the first detection box obtained from the object box recognition model is taken as the target detection box. This improves the accuracy of the target detection box.
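The selection between the two boxes can be sketched directly from the text above. Representing boxes as (left, top, right, bottom) tuples and aggregating the per-side deviations with a maximum are assumptions of this sketch.

```python
def select_target_box(first_box, second_box, deviation_threshold: float):
    """Choose the target detection box per the embodiment: a large
    per-side deviation means the tracked box has drifted, so trust the
    model's box; otherwise keep the temporally smooth tracked box.
    Boxes are (left, top, right, bottom)."""
    deviation = max(abs(a - b) for a, b in zip(first_box, second_box))
    return first_box if deviation > deviation_threshold else second_box
```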
After the detection box is determined, first position information and second position information of the key points of the target object in the target frame image can be determined in step 206.
Specifically, as an optional embodiment, determining the first and second position information of the key points of the target object in the target frame image includes: inputting the target detection image corresponding to the target frame image into a key point recognition model and determining the first position information of the key points of the target object in the target frame image; and tracking the positions of the key points of the target object in the historical frame images and predicting the second position information of the key points in the target frame image. On the one hand, a key point recognition model can be trained in advance; it recognizes the input (detection) image, determines the first position information of each key point of the target object, and outputs it, where the first position information may be the coordinates of the key points in the target frame image. On the other hand, based on the continuity of the target object's actions across consecutive frame images, this embodiment can track the key point positions in the historical frame images and predict the second position information, which can likewise be understood as the coordinates of the key points of the target object in the target frame image.
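The disclosure does not specify the tracking predictor; a constant-velocity extrapolation over the key point coordinates of the last two historical frames is one simple stand-in, shown here under that stated assumption.

```python
import numpy as np

def predict_second_positions(history: list) -> np.ndarray:
    """Predict key point coordinates in the target frame image from
    the historical frame images. `history` is a list of (N, 2) arrays
    of key point coordinates, oldest first; constant velocity is an
    assumed motion model."""
    if len(history) == 1:
        return np.asarray(history[-1])
    prev, last = np.asarray(history[-2]), np.asarray(history[-1])
    return last + (last - prev)  # x_{t+1} ~ x_t + (x_t - x_{t-1})
```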
Specifically, as an optional embodiment, the method further includes a training step for the key point recognition model: performing single-frame image training on the key point recognition model using training input images and training input labels; inputting training images from training video data into the key point recognition model that has completed single-frame image training, to obtain training labels representing the positions of the key points of the target object in the training images; performing tracking prediction on the historical training images of a target training image, which comprise at least one training image preceding the target training image, and determining the predicted training label of the target training image; verifying the reliability of the training label of the target training image according to the difference between the predicted training label and the training label of the target training image; and further training the key point recognition model that has completed single-frame image training using the verified training labels and the corresponding training images.
Existing key point recognition models are usually trained with single-frame images (and their labels). However, training with a large number of annotated single frames makes manual annotation too costly, while training with only a few easily leaves the key point recognition model inaccurate. This embodiment therefore performs preliminary training (also called single-frame image training) with a small number of annotated single-frame images, and then uses the preliminarily trained key point recognition model to recognize consecutive training images in training video data and determine their training labels. Once the training labels corresponding to the training images are determined, tracking prediction can be performed with the training labels of the historical training images of the frame (or frames) preceding the target training image to predict its predicted training label; the predicted training label is matched against the model's training label to determine the difference (which can be computed from the distances between key point positions), and the difference is compared with a preset difference threshold to verify the reliability of the training label of the target training image. For example, the training label can be deemed to pass verification when the difference does not exceed the difference threshold, and to fail when it does. After verification, the training labels that pass and their corresponding training images can be used to further train the key point recognition model that has completed the preliminary training.
In the embodiments of the present application, the key point recognition model can be preliminarily trained with a small number of annotated single-frame images; the preliminarily trained model then determines the training labels of the training images in the training video data, the continuity of the target object's key points across the training video is used to verify whether those labels are accurate and reliable, and the model is further trained with the training images and labels that pass verification. This enlarges the amount of data available for training the key point recognition model and improves its accuracy.
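Putting the verification step together, a hedged sketch of the pseudo-label filtering loop might look like this; the mean point-distance metric, `predict_fn`, and all other names are assumptions of the sketch.

```python
import numpy as np

def filter_reliable_labels(training_images, model_labels, predict_fn,
                           difference_threshold: float):
    """Keep only training labels whose distance to the tracking-based
    prediction is small. `model_labels` is a list of (N, 2) key point
    arrays produced by the preliminarily trained model, one per
    consecutive training image; `predict_fn(history)` is the tracking
    predictor (e.g. predict_second_positions above)."""
    kept = []
    for t in range(1, len(training_images)):
        predicted = predict_fn(model_labels[:t])
        difference = np.linalg.norm(
            np.asarray(model_labels[t]) - predicted, axis=1).mean()
        if difference <= difference_threshold:  # label passes verification
            kept.append((training_images[t], model_labels[t]))
    return kept  # used to further train the single-frame-trained model
```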
After the first and second position information of the key points of the target object is determined, in step 208 the attribute information of the key points in the target frame image can be determined from the first and second position information. In an optional embodiment, the first and second position information can be jointly analyzed, and third position information corresponding to both can be determined as the attribute information of the key points of the target object in the target frame image. In another optional embodiment, one of the first and second position information can be taken as the attribute information. Specifically, determining the attribute information of the key points of the target object in the target frame image from the first and second position information includes: determining the offset between the first position information and the second position information; determining the attribute information of the key points from the first position information when the offset exceeds an offset threshold; and determining it from the second position information when the offset does not exceed the offset threshold. The attribute information of the key points in the target frame image may include at least one of the position information (such as coordinates) of the key points in the target frame image, spatial structure information (such as spatial associations between key points), and the depth information of the key points. This embodiment can determine the offset between the first and second position information and compare it with the offset threshold to judge the confidence of the second position information: when the offset is small (does not exceed the threshold), the second position information, determined from the continuity of the target object, is adopted as the key point attribute information; when the offset is large (exceeds the threshold), the first position information determined by the key point recognition model is adopted.
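The offset-threshold selection can be sketched directly from the text above; using the mean Euclidean distance as the aggregate offset is an assumption of the sketch.

```python
import numpy as np

def select_keypoint_attributes(first_pos, second_pos, offset_threshold):
    """Per the embodiment: a large offset means the tracked prediction
    has drifted, so trust the model's positions (first position
    information); otherwise keep the temporally stable prediction
    (second position information). Inputs are (N, 2) coordinates."""
    first_pos, second_pos = np.asarray(first_pos), np.asarray(second_pos)
    offset = np.linalg.norm(first_pos - second_pos, axis=1).mean()
    return first_pos if offset > offset_threshold else second_pos
```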
After the attribute information of the key points of the target object is determined, special effect processing can be applied to the frame image according to that attribute information. Specifically, as an optional embodiment, the method further includes: performing special effect processing on the frame images in the video data according to the attribute information of the key points of the target object. Different special effect processing can be applied depending on the target object; for example, when the target object is a face, beautification, makeup, and similar special effects can be applied to the face according to the attribute information of its key points.
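As a toy stand-in for special effect processing (the disclosure does not specify any particular effect), the following pastes a sticker image centered on a key point; boundary handling is deliberately simplified and all names are illustrative.

```python
import cv2  # frames and stickers are BGR numpy arrays

def apply_sticker(frame, keypoint, sticker):
    """Paste a special-effect sticker centered on a key point. Edge
    clipping is simplified for illustration (left/top overflow is not
    compensated on the sticker side)."""
    x, y = int(keypoint[0]), int(keypoint[1])
    sh, sw = sticker.shape[:2]
    x0, y0 = max(x - sw // 2, 0), max(y - sh // 2, 0)
    x1 = min(x0 + sw, frame.shape[1])
    y1 = min(y0 + sh, frame.shape[0])
    frame[y0:y1, x0:x1] = sticker[: y1 - y0, : x1 - x0]
    return frame
```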
In the embodiments of the present application, a frame image in video data can be recognized and the detection image of the target object in the frame image determined. Then, on the one hand, the target detection image of the target frame image can be input into the key point recognition model to determine first position information of the key points of the target object in the target frame image; on the other hand, based on the motion continuity of the target object in the video data, the key point positions in the historical frame images preceding the target frame image can be tracked and analyzed to obtain second position information of the key points in the target frame image. Attribute information of the key points of the target object in the target frame image is then obtained from the first and second position information. Compared with determining the first position information with the key point recognition model alone, this embodiment additionally analyzes the second position information from the continuous motion of the key points in the video data and determines the attribute information from both, so the accuracy of key point localization for the target object can be improved.
On the basis of the foregoing embodiments, an embodiment of the present application further provides a data processing method, as shown in FIG. 3, including:
Step 302: acquire video data and determine the frame images in the video data.
Step 304: analyze the target frame image and determine the object description box of the target object in the target frame image.
Step 306: determine, with the object box recognition model, the positioning information of the key points of the target object in the image of the object description box.
Step 308: determine the object angle information of the target object in the target frame image from the positioning information.
Step 310: rotation-align the target frame image according to the object angle information, and determine the first detection box in combination with the positioning information.
Step 312: perform tracking prediction on the historical frame image(s) of the target frame image, which comprise at least one frame image preceding the target frame image, and predict the second detection box of the target object in the target frame image.
Step 314: determine the deviation amount between the first detection box and the second detection box, and judge whether it exceeds the deviation threshold.
Step 316: when the deviation amount exceeds the deviation threshold, take the first detection box as the target detection box and extract the detection image.
Step 318: when the deviation amount does not exceed the deviation threshold, take the second detection box as the target detection box and extract the detection image.
Step 320: input the target detection image corresponding to the target frame image into the key point recognition model, and determine the first position information of the key points of the target object in the target frame image. As an optional embodiment, the method further includes a training step for the key point recognition model: performing single-frame image training on the key point recognition model using training input images and training input labels; inputting training images from training video data into the key point recognition model that has completed single-frame image training, to obtain training labels representing the positions of the key points of the target object in the training images; performing tracking prediction on the historical training images of a target training image, which comprise at least one training image preceding the target training image, and determining the predicted training label of the target training image; verifying the reliability of the training label of the target training image according to the difference between the predicted training label and the training label of the target training image; and further training the key point recognition model that has completed single-frame image training using the verified training labels and the corresponding training images.
Step 322: track the positions of the key points of the target object in the historical frame image(s), which comprise at least one frame image preceding the target frame image, and predict the second position information of the key points in the target frame image.
Step 324: determine the offset between the first position information and the second position information, and judge whether it exceeds the offset threshold.
Step 326: when the offset exceeds the offset threshold, determine the attribute information of the key points of the target object in the target frame image from the first position information.
Step 328: when the offset does not exceed the offset threshold, determine the attribute information of the key points of the target object in the target frame image from the second position information.
Step 330: perform special effect processing on the frame images in the video data according to the attribute information of the key points of the target object.
In the embodiment of the application, an object description frame of the target object in the target frame image may be determined according to a frame image in the video data; a first detection frame may then be determined according to the object frame identification model, and tracking prediction may be performed on the historical frame image to determine a second detection frame. The first detection frame or the second detection frame is then determined as the target detection frame of the target object, and the detection image corresponding to the target detection frame is extracted, so that subsequent key point positioning can be performed according to the detection image. After the detection image containing the target object is determined, on one hand, the target detection image may be input into the key point identification model to determine the first position information of the key point of the target object in the target frame image; on the other hand, according to the motion continuity of the target object in the video data, the historical frame images before the target frame image are tracked and analyzed to obtain the second position information of the key point of the target object in the target frame image. Then, the first position information or the second position information is determined as the attribute information of the key point of the target object, and special effect processing is performed on the frame image according to the attribute information.
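To make the detection-frame selection of steps 314 to 318 concrete, the following minimal Python sketch compares the detected (first) frame with the tracked (second) frame. The (x1, y1, x2, y2) box format, the use of 1 - IoU as the deviation amount, and the threshold value are illustrative assumptions, not details fixed by this embodiment.

```python
def box_deviation(box_a, box_b):
    """Assumed deviation measure between two (x1, y1, x2, y2) boxes: 1 - IoU."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return 1.0 - inter / union if union > 0 else 1.0

def select_target_detection_frame(first_frame, second_frame, deviation_threshold=0.5):
    """Steps 314-318: use the detected (first) frame when the two frames deviate
    strongly, otherwise keep the tracked (second) frame for temporal smoothness."""
    if box_deviation(first_frame, second_frame) > deviation_threshold:
        return first_frame
    return second_frame
```

The intuition behind this rule is that a small deviation indicates the track is still following the object, so the smoother tracked frame is preferred; a large deviation suggests tracking drift, so the fresh detection is trusted instead.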
On the basis of the foregoing embodiments, the present application further provides a data processing method, by which a key point identification model with high identification accuracy can be trained using only a small number of labeled single-frame images together with video data. Specifically, as shown in fig. 4, the method includes:
Step 402, performing single-frame image training on the key point identification model by using a training input image and a training input label.
Step 404, inputting a training image in training video data into the key point identification model which has completed the single-frame image training, to obtain a training label representing the position of the key point of the target object in the training image.
Step 406, performing tracking prediction on a historical training image of a target training image, and determining a predicted training label of the target training image.
Step 408, verifying the reliability of the training label of the target training image according to the difference between the predicted training label and the training label of the target training image.
Step 410, further training the key point identification model which has completed the single-frame image training, by using the training labels that pass the verification and the corresponding training images.
The implementation of this embodiment is similar to that of the embodiments described above; for specific implementation details, reference may be made to the embodiments described above, and details are not repeated here.
An existing key point identification model is usually trained with single-frame images (and their labels). However, if a large number of labeled single-frame images are used for training, the cost of manual labeling is too high; if only a small number of labeled single-frame images are used, the accuracy of the key point identification model tends to be low. Therefore, this embodiment may use a small number of labeled single-frame images for preliminary training (also called single-frame image training), and then use the preliminarily trained key point identification model to recognize continuous training images in the training video data and determine the corresponding training labels. After the training labels corresponding to the training images are determined, tracking prediction may be performed by using the training labels of the historical training images, that is, the previous frame (or previous frames) of the target training image, to predict the predicted training label of the target training image. The predicted training label is then matched against the training label of the target training image to determine the difference amount (which may be determined according to the difference between key point positions), and the difference amount is compared with a preset difference threshold to verify the reliability of the training label of the target training image; for example, the training label may be determined to pass the verification when the difference amount does not exceed the difference threshold, and to fail the verification when the difference amount exceeds the difference threshold. After the training labels are verified, the training labels that pass the verification and the corresponding training images may be used to further train the key point identification model that has completed the preliminary training.
In the embodiment of the application, a small number of labeled single-frame images may be used to preliminarily train the key point identification model; the preliminarily trained model may then be used to determine the training labels of the training images in the training video data, and the continuity of the key points of the target object across the training video data may be used to verify whether these training labels are accurate and reliable. The key point identification model is then further trained with the training images and training labels that pass the verification, which enlarges the amount of training data and improves the accuracy of the key point identification model.
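A non-authoritative sketch of the label-verification loop described above is given below. The `model.predict` and `track_predict` callables, the (N, 2) keypoint array shape, the mean point distance as the difference amount, and the threshold value are all assumptions made for illustration.

```python
import numpy as np

def collect_verified_labels(model, frames, track_predict, diff_threshold=5.0):
    """Steps 404-408: keep only pseudo-labels that agree with tracking prediction."""
    verified = []
    prev_labels = None
    for frame in frames:
        labels = model.predict(frame)  # pseudo-label from the preliminarily trained model, shape (N, 2)
        if prev_labels is not None:
            predicted = track_predict(prev_labels, frame)  # predicted training label from history
            diff = float(np.linalg.norm(labels - predicted, axis=1).mean())
            if diff <= diff_threshold:  # check passes: the training label is considered reliable
                verified.append((frame, labels))
        prev_labels = labels
    return verified  # later used to further train the model (step 410)
```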
On the basis of the foregoing embodiments, the embodiments of the present application further provide a data processing method, by which key points of a target object can be located more accurately and stably according to consecutive frame images in video data, so that special effect processing can be performed on the frame images by using the positions of the key points. For example, the method can be applied to a scene of positioning the key points of players in a sports event video, so as to extract more accurate player images and perform special effect processing on them to form playback data. Specifically, as shown in fig. 5, the method includes:
Step 502, acquiring video data, and determining a frame image in the video data.
Step 504, identifying the frame image, and extracting a detection image corresponding to the detection frame, wherein the detection image comprises the target object positioned by the detection frame.
Step 506, determining first position information and second position information of the key point of the target object in the target frame image, wherein the first position information is determined according to the target detection image of the target frame image, the second position information is determined after tracking prediction is performed on the position of the key point of the target object in a historical frame image, and the historical frame image comprises at least one frame image before the target frame image.
Step 508, determining attribute information of the key point of the target object in the target frame image according to the first position information and the second position information.
Step 510, performing special effect processing on the frame image in the video data according to the attribute information of the key point of the target object.
The implementation of this embodiment is similar to that of the embodiments described above; for specific implementation details, reference may be made to the embodiments described above, and details are not repeated here.
In the embodiment of the application, the video data may be, for example, road surveillance video, sports event video, or live-streaming video. The embodiment can identify a frame image in the video data, determine a detection frame of the target object in the frame image, and extract a detection image corresponding to the detection frame. Then, on one hand, the target detection image of the target frame image can be input into a key point identification model to determine first position information of a key point of the target object in the target frame image; on the other hand, the positions of the key points of the target object in the historical frame images before the target frame image can be tracked and analyzed according to the motion continuity of the target object in the video data, so that second position information of the key points of the target object in the target frame image is obtained. The attribute information of the key point of the target object in the target frame image is then obtained according to the first position information and the second position information. Finally, the attribute information can be used to perform special effect processing on the frame image in the video data, and different special effect processing modes can be adopted according to different target objects; for example, when the target object is a human face, special effect processing such as beauty processing and makeup processing can be performed on the face according to the attribute information of the key points of the face.
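A minimal sketch of how the first and second position information might be combined into the final key point attribute information, mirroring the offset rule of steps 324 to 328; the (N, 2) arrays, the mean-distance offset measure, and the offset threshold are assumed for illustration and are not prescribed by the embodiment.

```python
import numpy as np

def fuse_keypoint_positions(first_pos, second_pos, offset_threshold=8.0):
    """Select key point attribute information from detection-based (first)
    and tracking-based (second) positions via an offset threshold."""
    offset = float(np.linalg.norm(first_pos - second_pos, axis=1).mean())
    if offset > offset_threshold:
        return first_pos   # large offset: the track may have drifted, trust the detection
    return second_pos      # small offset: the tracked result is temporally smoother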
On the basis of the above embodiment, the embodiment of the present application further provides a data processing method, which can perform key point positioning on a frame image containing face data in video data, and can improve accuracy of key points of the face data, so that attribute information of the key points of the face data is used to perform beauty processing, makeup processing, and the like on the face data in the frame image, thereby improving beauty and makeup effects and improving image effect of the video data. Specifically, as shown in fig. 6, the method includes:
Step 602, acquiring video data, and determining a frame image in the video data.
Step 604, identifying the frame image, and extracting a detection image corresponding to the detection frame, wherein the detection image comprises the face data positioned by the detection frame.
Step 606, determining first position information and second position information of the key point of the face data in the target frame image, wherein the first position information is determined according to the target detection image of the target frame image, the second position information is determined after tracking prediction is performed on the position of the key point of the face data in a historical frame image, and the historical frame image comprises at least one frame image before the target frame image.
Step 608, determining attribute information of the key point of the face data in the target frame image according to the first position information and the second position information.
Step 610, performing special effect processing on the frame image in the video data according to the attribute information of the key points of the face data, wherein the special effect processing comprises at least one of beautification processing, special effect adding processing, makeup processing and beauty processing.
The implementation of this embodiment is similar to that of the embodiments described above; for specific implementation details, reference may be made to the embodiments described above, and details are not repeated here.
In the embodiment of the application, frame images in the video data can be identified, detection frames of the face data in the frame images can be determined, and detection images corresponding to the detection frames can be extracted. Then, on one hand, the target detection image of the target frame image may be input into the key point identification model to determine first position information of the key points of the face data in the target frame image; on the other hand, the positions of the key points of the face data in the historical frame images before the target frame image can be tracked and analyzed according to the motion continuity of the face data in the video data, so as to obtain second position information of the key points of the face data in the target frame image. The attribute information of the key points of the face data in the target frame image is then obtained according to the first position information and the second position information. Finally, the attribute information can be used to perform special effect processing on the face, such as beautification processing, beauty processing, makeup processing, and special effect adding processing.
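Purely as a toy illustration of key-point-driven makeup processing (the embodiment does not prescribe any concrete rendering method), the OpenCV sketch below tints a lip region defined by an assumed subset of face key points:

```python
import cv2
import numpy as np

def apply_lip_tint(frame, lip_points, color=(0, 0, 200), alpha=0.4):
    """Blend a color over the polygon spanned by the lip key points (a makeup-style effect).
    frame: BGR image; lip_points: (N, 2) array of key point coordinates."""
    overlay = frame.copy()
    cv2.fillPoly(overlay, [lip_points.astype(np.int32)], color)
    return cv2.addWeighted(overlay, alpha, frame, 1 - alpha, 0)
```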
On the basis of the foregoing embodiment, the present application further provides a data processing method, by which key points of anchor face data can be more accurately and stably located according to continuous frame images in live video data, so that special effect processing is performed on the face data in the frame image by using positions of the key points of the face data, for example, the anchor face data in the frame image is subjected to beauty processing, makeup processing, and the like, thereby improving beauty and makeup effects on the anchor face, and improving an image effect of the live video data. Specifically, as shown in fig. 7, the method includes:
Step 702, acquiring live video data, identifying frame images in the live video data, and extracting detection images corresponding to the detection frames, wherein the detection images contain the face data positioned by the detection frames.
Step 704, determining first position information and second position information of the key point of the face data in the target frame image, wherein the first position information is determined according to the target detection image of the target frame image, the second position information is determined after tracking prediction is performed on the position of the key point of the face data in a historical frame image, and the historical frame image comprises at least one frame image before the target frame image.
Step 706, determining attribute information of the key point of the face data in the target frame image according to the first position information and the second position information.
Step 708, performing special effect processing on the frame image in the live video data according to the attribute information of the key points of the face data, wherein the special effect processing comprises at least one of beautification processing, special effect adding processing, makeup processing and beauty processing.
The implementation of this embodiment is similar to that of the embodiments described above; for specific implementation details, reference may be made to the embodiments described above, and details are not repeated here.
In the embodiment of the application, frame images in the live video data can be identified, detection frames of the anchor's face data in the frame images can be determined, and detection images corresponding to the detection frames can be extracted. Then, on one hand, the target detection image of the target frame image can be input into the key point identification model to determine first position information of the key points of the anchor's face data in the target frame image; on the other hand, the positions of the key points of the face data in the historical frame images before the target frame image can be tracked and analyzed according to the motion continuity of the face data in the video data, so as to obtain second position information of the key points of the face data in the target frame image. The attribute information of the key points of the face data in the target frame image is then obtained according to the first position information and the second position information. Finally, the attribute information can be used to perform special effect processing on the anchor, such as beautification processing, beauty processing, makeup processing, and special effect adding processing.
On the basis of the foregoing embodiments, the present application further provides a data processing method, which may be applied in a Software-as-a-Service (SaaS) scenario, in which the above process is packaged as a key point identification service so that a key point identification service for video images can be provided to users (such as enterprise users).
Specifically, the processing end (or service end) can provide the user with an uploading interface for video data and a result issuing interface. An enterprise user can upload corresponding video data through the uploading interface; the processing end can then locate the key points of the target object more accurately and stably according to the continuous frame images in the video data to obtain an identification result (the attribute information of the key points in the images), and issue the identification result of the video data to the enterprise user through the issuing interface, so that the enterprise user can optimize the video data according to the identification result. In addition, the processing end can also perform corresponding processing on the video data according to the identification result (such as adding special effects), and then issue the processed video data to the user through the issuing interface. Specifically, as shown in fig. 8, the method includes:
Step 802, providing a first interface, so as to acquire relevant video data through the first interface and determine a frame image in the video data.
Step 804, identifying the frame image, and extracting a detection image corresponding to the detection frame, wherein the detection image comprises the target object positioned by the detection frame.
Step 806, determining first position information and second position information of the key point of the target object in the target frame image, wherein the first position information is determined according to the target detection image of the target frame image, the second position information is determined after tracking prediction is performed on the position of the key point of the target object in a historical frame image, and the historical frame image comprises at least one frame image before the target frame image.
Step 808, determining attribute information of the key point of the target object in the target frame image according to the first position information and the second position information.
Step 810, feeding back the attribute information through a second interface.
The implementation of this embodiment is similar to that of the embodiments described above; for specific implementation details, reference may be made to the embodiments described above, and details are not repeated here.
The method in this embodiment of the present application may be applied to a processing end, which may be understood as a platform that provides multiple interfaces for corresponding services. A user (e.g., an enterprise) may upload video data through a first interface; it should be noted that the first interface may receive video data transferred by the enterprise, or may be directly connected to a video data acquisition device (e.g., a camera) to acquire the video data, and an interface in this embodiment represents a connection (e.g., a hardware connection, a network connection, and the like) between two devices. After the video data is acquired through the first interface, this embodiment can identify the frame image in the video data, determine the detection frame of the target object in the frame image, and extract the detection image corresponding to the detection frame. Then, on one hand, the target detection image of the target frame image can be input into the key point identification model to determine the first position information of the key point of the target object in the target frame image; on the other hand, the positions of the key points of the target object in the historical frame images before the target frame image can be tracked and analyzed according to the motion continuity of the target object in the video data, so as to obtain the second position information of the key points of the target object in the target frame image. The attribute information of the key point of the target object in the target frame image is then obtained according to the first position information and the second position information. On one hand, the processing end can issue the attribute information of the key points to the user through the second interface, and the video data can be optimized on the user side according to the identification result, for example, beauty processing and makeup processing can be performed on the video data. On the other hand, the processing end may also perform corresponding processing (e.g., adding a special effect) on the video data according to the attribute information of the key points, and then issue the processed video data to the user through the second interface.
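A hypothetical sketch of the two interfaces in such a SaaS deployment is given below; Flask, the route paths, and the `submit_keypoint_job` / `load_job_result` helpers are all illustrative assumptions rather than part of this disclosure.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def submit_keypoint_job(video_bytes):
    """Hypothetical helper: run the key point pipeline and return a job id."""
    raise NotImplementedError

def load_job_result(job_id):
    """Hypothetical helper: load the stored attribute information for a job."""
    raise NotImplementedError

@app.route("/videos", methods=["POST"])  # first interface: upload video data
def upload_video():
    job_id = submit_keypoint_job(request.get_data())
    return jsonify({"job_id": job_id})

@app.route("/videos/<job_id>/keypoints", methods=["GET"])  # second interface: feed back attribute info
def fetch_result(job_id):
    return jsonify({"keypoints": load_job_result(job_id)})
```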
It should be noted that, for simplicity of description, the method embodiments are described as a series of action combinations, but those skilled in the art will recognize that the embodiments of the application are not limited by the described order of actions, as some steps may be performed in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the application.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, and with reference to fig. 9, the data processing apparatus may specifically include the following modules:
the video data obtaining module 902 is configured to obtain video data and determine a frame image in the video data.
And a detection image obtaining module 904, configured to identify the frame image and extract a detection image corresponding to the detection frame, where the detection image includes the target object located by the detection frame.
A position information obtaining module 906, configured to determine first position information and second position information of a key point of a target object in a target frame image, where the first position information is determined according to a target detection image of the target frame image, and the second position information is determined after tracking and predicting a position of the key point of the target object in a historical frame image, where the historical frame image includes a frame image of at least one frame before the target frame image.
An attribute information obtaining module 908, configured to determine attribute information of a key point of the target object in the target frame image according to the first location information and the second location information.
In summary, in the embodiment of the present application, frame images in video data may be identified, and a detection image including the target object may be extracted. Then, on one hand, the target detection image of the target frame image may be input into a key point identification model to determine first position information of a key point of the target object in the target frame image; on the other hand, the positions of the key points of the target object in the historical frame image may be tracked and analyzed according to the motion continuity of the target object in the video data, so as to obtain second position information of the key points of the target object in the target frame image. The attribute information of the key points of the target object in the target frame image is then obtained according to the first position information and the second position information. Compared with determining the first position information of the key points only by using a key point identification model, this embodiment also analyzes the second position information based on the continuous motion of the key points of the target object in the video data, and determines the attribute information of the key points according to both, so that the accuracy of key point positioning of the target object can be improved.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, which may specifically include the following modules:
and the video data acquisition processing module is used for acquiring the video data and determining the frame image in the video data.
And the object description frame acquisition processing module is used for analyzing the target frame image and determining the object description frame of the target object in the target frame image.
And the positioning information acquisition processing module is used for determining the positioning information of the key point of the target object in the image of the object description frame through the object frame identification model.
And the object angle acquisition processing module is used for determining the object angle information of the target object in the target frame image according to the positioning information.
And the first detection frame acquisition processing module is used for performing rotation alignment on the target frame image according to the object angle information and determining a first detection frame by combining positioning information.
And the second detection frame acquisition processing module is used for tracking and predicting a historical frame image of the target frame image and predicting a second detection frame of the target object in the target frame image, wherein the historical frame image comprises at least one frame of frame image before the target frame image.
And the deviation amount acquisition processing module is used for determining the deviation amount between the first detection frame and the second detection frame and determining whether the deviation amount exceeds a deviation threshold value.
And the first deviation processing module is used for taking the first detection frame as the target detection frame when the deviation amount exceeds a deviation threshold value.
And the second deviation processing module is used for taking the second detection frame as the target detection frame when the deviation amount does not exceed the deviation threshold value.
And the first position acquisition processing module is used for inputting a target detection image corresponding to the target frame image into the key point identification model and determining first position information of the key point of the target object in the target frame image.
And the second position acquisition processing module is used for tracking and predicting the positions of the key points of the target object in a historical frame image, predicting second position information of the key points of the target object in the target frame image, and the historical frame image comprises at least one frame of frame image before the target frame image.
And the offset acquisition processing module is used for determining the offset between the first position information and the second position information and determining whether the offset exceeds an offset threshold value.
And the first offset processing module is used for determining the attribute information of the key point of the target object in the target frame image according to the first position information under the condition that the offset exceeds an offset threshold.
And the second offset processing module is used for determining the attribute information of the key point of the target object in the target frame image according to the second position information under the condition that the offset does not exceed the offset threshold.
And the special effect image acquisition processing module is used for carrying out special effect processing on the frame image in the video data according to the attribute information of the key point of the target object.
In the embodiment of the application, an object description frame of the target object in the target frame image may be determined according to a frame image in the video data; a first detection frame may then be determined according to the object frame identification model, and tracking prediction may be performed on the historical frame image to determine a second detection frame. The first detection frame or the second detection frame is then determined as the target detection frame of the target object, and the detection image corresponding to the target detection frame is extracted, so that subsequent key point positioning can be performed according to the detection image. After the detection image containing the target object is determined, on one hand, the target detection image may be input into the key point identification model to determine the first position information of the key point of the target object in the target frame image; on the other hand, according to the motion continuity of the target object in the video data, the historical frame images before the target frame image are tracked and analyzed to obtain the second position information of the key point of the target object in the target frame image. Then, the first position information or the second position information is determined as the attribute information of the key point of the target object, and special effect processing is performed on the frame image according to the attribute information.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, and with reference to fig. 10, the data processing apparatus may specifically include the following modules:
and a model single-frame training module 1002, configured to perform single-frame image training on the keypoint identification model by using the training input image and the training input label.
The training label obtaining module 1004 is configured to input a training image in the training video data into the key point recognition model that completes the single frame image training, and obtain a training label representing a position of a key point of the target object in the training image.
The prediction label obtaining module 1006 is configured to perform tracking prediction on a historical training image of the target training image, and determine a predicted training label of the target training image.
The training label checking module 1008 is configured to verify the reliability of the training label of the target training image according to the difference between the predicted training label and the training label of the target training image.
And the model secondary training module 1010 is used for training the key point identification model which completes the single-frame image training by using the training labels passing the verification and the corresponding training images.
In summary, an existing key point identification model is usually trained with single-frame images (and their labels). However, if a large number of labeled single-frame images are used for training, the cost of manual labeling is too high; if only a small number of labeled single-frame images are used, the accuracy of the key point identification model tends to be low. Therefore, this embodiment may use a small number of labeled single-frame images for preliminary training (also called single-frame image training), and then use the preliminarily trained key point identification model to recognize continuous training images in the training video data and determine the corresponding training labels. After the training labels corresponding to the training images are determined, tracking prediction may be performed by using the training labels of the historical training images, that is, the previous frame (or previous frames) of the target training image, to predict the predicted training label of the target training image. The predicted training label is then matched against the training label of the target training image to determine the difference amount (which may be determined according to the difference between key point positions), and the difference amount is compared with a preset difference threshold to verify the reliability of the training label of the target training image; for example, the training label may be determined to pass the verification when the difference amount does not exceed the difference threshold, and to fail the verification when the difference amount exceeds the difference threshold. After the training labels are verified, the training labels that pass the verification and the corresponding training images may be used to further train the key point identification model that has completed the preliminary training.
In the embodiment of the application, a small number of labeled single-frame images may be used to preliminarily train the key point identification model; the preliminarily trained model may then be used to determine the training labels of the training images in the training video data, and the continuity of the key points of the target object across the training video data may be used to verify whether these training labels are accurate and reliable. The key point identification model is then further trained with the training images and training labels that pass the verification, which enlarges the amount of training data and improves the accuracy of the key point identification model.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, and with reference to fig. 11, the data processing apparatus may specifically include the following modules:
the video data determining module 1102 is configured to obtain video data and determine a frame image in the video data.
And the detection image determining module 1104 is configured to identify the frame image and extract a detection image corresponding to the detection frame, where the detection image includes the target object located by the detection frame.
A position information determining module 1106, configured to determine first position information and second position information of a key point of a target object in a target frame image, where the first position information is determined according to a target detection image of the target frame image, and the second position information is determined after tracking and predicting a position of the key point of the target object in a historical frame image, where the historical frame image includes a frame image of at least one frame before the target frame image.
An attribute information determining module 1108, configured to determine attribute information of a key point of the target object in the target frame image according to the first location information and the second location information.
The special effect image determining module 1110 is configured to perform special effect processing on a frame image in the video data according to the attribute information of the key point of the target object.
In summary, in the embodiment of the present application, a frame image in video data may be identified, a detection frame of a target object in the frame image may be determined, and a detection image corresponding to the detection frame may be extracted, and then, on one hand, a target detection image of the target frame image may be input into a keypoint identification model, and first position information of a keypoint of the target object in the target frame image may be determined; on the other hand, the positions of the key points of the target object in the historical frame images before the target frame image can be tracked and analyzed according to the motion continuity of the target object in the video data, so that second position information of the key points of the target object in the target frame image is obtained. And then, obtaining attribute information of the key point of the target object in the target frame image according to the first position information and the second position information. Then, the attribute information can be utilized to perform special effect processing on the frame image in the video data, and different special effect processing modes can be adopted to process the frame image according to different target objects, for example, when the target object is a human face, special effect processing such as beauty treatment, makeup treatment and the like can be performed on the human face according to the attribute information of key points of the human face.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, and with reference to fig. 12, the data processing apparatus may specifically include the following modules:
the video data obtaining module 1202 is configured to obtain video data and determine a frame image in the video data.
A detection image obtaining module 1204, configured to identify a frame image and extract a detection image corresponding to the detection frame, where the detection image includes face data located by the detection frame.
A position information obtaining module 1206, configured to determine first position information and second position information of a key point of the face data in the target frame image, where the first position information is determined according to a target detection image of the target frame image, and the second position information is determined after tracking prediction is performed on a position of the key point of the face data in a history frame image, where the history frame image includes a frame image of at least one frame before the target frame image.
An attribute information obtaining module 1208, configured to determine attribute information of a key point of the face data in the target frame image according to the first location information and the second location information.
The special effect image obtaining module 1210 is configured to perform special effect processing on a frame image in the video data according to attribute information of key points of the face data, where the special effect processing includes at least one of beautification processing, special effect adding processing, makeup processing, and beauty processing.
In summary, in the embodiment of the present application, frame images in video data may be identified, detection frames of the face data in the frame images may be determined, and detection images corresponding to the detection frames may be extracted. Then, on one hand, the target detection image of the target frame image may be input into the key point identification model to determine first position information of the key points of the face data in the target frame image; on the other hand, the positions of the key points of the face data in the historical frame images before the target frame image may be tracked and analyzed according to the motion continuity of the face data in the video data, so as to obtain second position information of the key points of the face data in the target frame image. The attribute information of the key points of the face data in the target frame image is then obtained according to the first position information and the second position information. Finally, the attribute information may be used to perform special effect processing on the face, such as beautification processing, beauty processing, makeup processing, and special effect adding processing.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, and with reference to fig. 13, the data processing apparatus may specifically include the following modules:
the video data obtaining module 1302 is configured to obtain live video data, identify a frame image in the live video data, and extract a detection image corresponding to the detection frame, where the detection image includes face data located by the detection frame.
A position information obtaining module 1304, configured to determine first position information and second position information of a key point of the face data in the target frame image, where the first position information is determined according to a target detection image of the target frame image, and the second position information is determined after tracking and predicting a position of the key point of the face data in a history frame image, where the history frame image includes a frame image of at least one frame before the target frame image.
An attribute information obtaining module 1306, configured to determine attribute information of a key point of the face data in the target frame image according to the first location information and the second location information.
A special effect image obtaining module 1308, configured to perform special effect processing on a frame image in the live video data according to the attribute information of the key point of the face data, where the special effect processing includes at least one of beautification processing, special effect adding processing, makeup processing, and facial beautification processing.
In summary, in the embodiment of the present application, frame images in the live video data may be identified, detection frames of the anchor's face data in the frame images may be determined, and detection images corresponding to the detection frames may be extracted. Then, on one hand, the target detection image of the target frame image may be input into the key point identification model to determine first position information of the key points of the anchor's face data in the target frame image; on the other hand, the positions of the key points of the face data in the historical frame images before the target frame image may be tracked and analyzed according to the motion continuity of the face data in the video data, so as to obtain second position information of the key points of the face data in the target frame image. The attribute information of the key points of the face data in the target frame image is then obtained according to the first position information and the second position information. Finally, the attribute information may be used to perform special effect processing on the anchor, such as beautification processing, beauty processing, makeup processing, and special effect adding processing.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, and with reference to fig. 14, the data processing apparatus may specifically include the following modules:
the service providing module 1402 is configured to provide a first interface, so as to obtain the relevant video data through the first interface and determine a frame image in the video data.
The service processing module 1404 is configured to identify a frame image, and extract a detection image corresponding to a detection frame, where the detection image includes a target object located by the detection frame; determining first position information and second position information of a key point of a target object in a target frame image, wherein the first position information is determined according to a target detection image of the target frame image, the second position information is determined after tracking and predicting the position of the key point of the target object in a historical frame image, and the historical frame image comprises at least one frame of frame image before the target frame image; and determining attribute information of the key points of the target object in the target frame image according to the first position information and the second position information.
And a result feedback module 1406, configured to feed back the attribute information through the second interface.
The embodiment of the application can be applied to a processing end, which can be understood as a platform that provides a plurality of interfaces for corresponding services; a user (such as an enterprise) can upload video data through a first interface. After the video data is acquired through the first interface, this embodiment can identify the frame image in the video data, determine the detection frame of the target object in the frame image, and extract the detection image corresponding to the detection frame. Then, on one hand, the target detection image of the target frame image can be input into the key point identification model to determine the first position information of the key point of the target object in the target frame image; on the other hand, the positions of the key points of the target object in the historical frame images before the target frame image can be tracked and analyzed according to the motion continuity of the target object in the video data, so as to obtain the second position information of the key points of the target object in the target frame image. The attribute information of the key point of the target object in the target frame image is then obtained according to the first position information and the second position information. On one hand, the processing end can issue the attribute information of the key points to the user through the second interface, and the video data can be optimized on the user side according to the identification result, for example, beauty processing and makeup processing can be performed on the video data. On the other hand, the processing end may also perform corresponding processing (e.g., adding a special effect) on the video data according to the attribute information of the key points, and then issue the processed video data to the user through the second interface.
The present application further provides a non-transitory readable storage medium, where one or more modules (programs) are stored; when the one or more modules are applied to a device, the device may be caused to execute instructions of the method steps in this application.
Embodiments of the present application provide one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an electronic device to perform the methods as described in one or more of the above embodiments. In the embodiment of the application, the electronic device includes a server, a terminal device and other devices.
Embodiments of the present disclosure may be implemented as an apparatus using any suitable hardware, firmware, software, or any combination thereof, in a desired configuration; the apparatus may comprise an electronic device such as a server (or server cluster), a terminal, and the like. Fig. 15 schematically illustrates an example apparatus 1500 that may be used to implement various embodiments described herein.
For one embodiment, fig. 15 illustrates an example apparatus 1500 having one or more processors 1502, a control module (chipset) 1504 coupled to at least one of the processor(s) 1502, a memory 1506 coupled to the control module 1504, a non-volatile memory (NVM)/storage 1508 coupled to the control module 1504, one or more input/output devices 1510 coupled to the control module 1504, and a network interface 1512 coupled to the control module 1504.
The processor 1502 may include one or more single-core or multi-core processors, and the processor 1502 may include any combination of general-purpose or special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In some embodiments, the apparatus 1500 can be used as a server, a terminal, or the like in the embodiments of the present application.
In some embodiments, the apparatus 1500 may include one or more computer-readable media (e.g., the memory 1506 or the NVM/storage 1508) having instructions 1514 and one or more processors 1502 configured to execute the instructions 1514 to implement modules to perform the actions described in this disclosure.
For one embodiment, the control module 1504 may include any suitable interface controller to provide any suitable interface to at least one of the processor(s) 1502 and/or any suitable device or component in communication with the control module 1504.
The control module 1504 may include a memory controller module to provide an interface to the memory 1506. The memory controller module may be a hardware module, a software module, and/or a firmware module.
The memory 1506 may be used, for example, to load and store data and/or instructions 1514 for the apparatus 1500. For one embodiment, memory 1506 may comprise any suitable volatile memory, such as suitable DRAM. In some embodiments, the memory 1506 may comprise a double data rate type four synchronous dynamic random access memory (DDR4 SDRAM).
For one embodiment, the control module 1504 may include one or more input/output controllers to provide an interface to the NVM/storage 1508 and the input/output device(s) 1510.
For example, NVM/storage 1508 may be used to store data and/or instructions 1514. NVM/storage 1508 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 1508 may include storage resources that are part of the device on which apparatus 1500 is installed, or it may be accessible by the device and may not necessarily be part of the device. For example, the NVM/storage 1508 may be accessible over a network via the input/output device(s) 1510.
The input/output device(s) 1510 may provide an interface for the apparatus 1500 to communicate with any other suitable device; the input/output device(s) 1510 may include communication components, audio components, sensor components, and the like. The network interface 1512 may provide an interface for the apparatus 1500 to communicate over one or more networks; the apparatus 1500 may communicate wirelessly with one or more components of a wireless network in accordance with any of one or more wireless network standards and/or protocols, for example by accessing a wireless network based on a communication standard such as WiFi, 2G, 3G, 4G, or 5G, or a combination thereof.
For one embodiment, at least one of the processor(s) 1502 may be packaged together with logic for one or more controller(s) (e.g., memory controller module) of the control module 1504. For one embodiment, at least one of the processor(s) 1502 may be packaged together with logic for one or more controller(s) of control module 1504 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 1502 may be integrated on the same die with the logic of one or more controllers of the control module 1504. For one embodiment, at least one of the processor(s) 1502 may be integrated on the same die with logic for one or more controller(s) of control module 1504 to form a system on a chip (SoC).
In various embodiments, the apparatus 1500 may be, but is not limited to being: a server, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.), among other terminal devices. In various embodiments, apparatus 1500 may have more or fewer components and/or different architectures. For example, in some embodiments, device 1500 includes one or more cameras, a keyboard, a Liquid Crystal Display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an Application Specific Integrated Circuit (ASIC), and speakers.
The detection device can adopt a main control chip as a processor or a control module, sensor data, position information and the like are stored in a memory or an NVM/storage device, a sensor group can be used as an input/output device, and a communication interface can comprise a network interface.
An embodiment of the present application further provides an electronic device, including: a processor; and a memory having executable code stored thereon that, when executed, causes the processor to perform a method as described in one or more of the embodiments of the application.
Embodiments of the present application also provide one or more machine-readable media having executable code stored thereon that, when executed, cause a processor to perform a method as described in one or more of the embodiments of the present application.
For the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or terminal that comprises the element.
The foregoing has described in detail a data processing method and apparatus, an electronic device, and a storage medium provided by the present application. Specific examples have been used herein to explain the principles and embodiments of the present application, and the above descriptions of the embodiments are intended only to help in understanding the method and core ideas of the present application. Meanwhile, a person skilled in the art may, following the ideas of the present application, make changes to the specific embodiments and the application scope. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (17)

1. A method of data processing, the method comprising:
acquiring video data and determining a frame image in the video data;
identifying a frame image, and extracting a detection image corresponding to a detection frame, wherein the detection image comprises a target object positioned by the detection frame;
determining first position information and second position information of a key point of a target object in a target frame image, wherein the first position information is determined according to a target detection image of the target frame image, the second position information is determined after tracking and predicting the position of the key point of the target object in a historical frame image, and the historical frame image comprises at least one frame of frame image before the target frame image;
and determining attribute information of the key points of the target object in the target frame image according to the first position information and the second position information.
2. The method of claim 1, wherein determining the first position information and the second position information of the key point of the target object in the target frame image comprises:
inputting a target detection image corresponding to a target frame image into a key point identification model, and determining first position information of a key point of a target object in the target frame image;
and tracking and predicting the positions of the key points of the target object in the historical frame images, and predicting second position information of the key points of the target object in the target frame images.
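Illustrative sketch (editorial addition, not part of the claims): the second position information of claim 2 can come from any point tracker over the historical frames; pyramidal Lucas-Kanade optical flow is one plausible, assumed choice, since the claim names no specific algorithm.

```python
import cv2
import numpy as np

def predict_keypoints_by_tracking(prev_gray, curr_gray, prev_keypoints):
    # Track keypoints from a historical frame into the target frame to
    # obtain the "second position information" of claim 2. Lucas-Kanade
    # optical flow is an assumed tracker; the claim fixes no algorithm.
    prev_pts = np.asarray(prev_keypoints, dtype=np.float32).reshape(-1, 1, 2)
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None)
    # status marks which points were tracked successfully.
    return next_pts.reshape(-1, 2), status.ravel().astype(bool)
```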
3. The method according to claim 1, wherein the determining attribute information of the key point of the target object in the target frame image according to the first position information and the second position information comprises:
determining an offset between the first location information and the second location information;
determining attribute information of key points of the target object in the target frame image according to the first position information under the condition that the offset exceeds an offset threshold;
and determining attribute information of the key point of the target object in the target frame image according to the second position information under the condition that the offset does not exceed the offset threshold.
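A minimal sketch of the selection rule in claim 3, assuming the offset is the mean Euclidean distance between corresponding keypoints (the claim itself does not fix a metric):

```python
import numpy as np

def select_keypoint_positions(first_pos, second_pos, offset_threshold):
    # first_pos: (K, 2) keypoints from the detection model on the target frame.
    # second_pos: (K, 2) keypoints predicted by tracking the historical frames.
    # A large offset suggests the track has drifted, so the fresh detection
    # is trusted; otherwise the temporally smoother prediction is kept.
    offset = np.linalg.norm(first_pos - second_pos, axis=1).mean()
    return first_pos if offset > offset_threshold else second_pos
```

The same detect-versus-track arbitration keeps the keypoints stable across frames while letting the detector correct accumulated tracking drift.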
4. The method according to claim 1, wherein the identifying the frame image and extracting the detection image corresponding to the detection frame comprises:
analyzing the target frame image, and determining an object description frame of the target object in the target frame image;
analyzing the image in the object description frame through an object frame identification model, and determining a first detection frame of the target object in the target frame image;
tracking and predicting a historical frame image of a target frame image, and predicting a second detection frame of a target object in the target frame image, wherein the historical frame image comprises at least one frame of frame image before the target frame image;
and determining a target detection frame of the target object in the target frame image according to the first detection frame and the second detection frame, and extracting a detection image corresponding to the target detection frame.
5. The method of claim 4, wherein analyzing the image in the object description frame by the object frame recognition model to determine a first detection frame of the target object in the target frame image comprises:
determining the positioning information of the key points of the target object in the image of the object description frame through the object frame identification model;
determining object angle information of the target object in the target frame image according to the positioning information;
and performing rotation alignment on the target frame image according to the object angle information, and determining the first detection frame in combination with the positioning information.
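Illustrative sketch of claim 5 (editorial, with assumptions): estimating the object angle from the line between two reference keypoints, e.g. the eye corners of a face, is an assumed choice, and taking the bounding box of the aligned keypoints as the first detection frame is a simplification; the claim only requires deriving angle information from the keypoint positioning information.

```python
import cv2
import numpy as np

def rotate_align(frame, keypoints, left_idx=0, right_idx=1):
    # Estimate the object angle from two reference keypoints (assumption).
    dx, dy = keypoints[right_idx] - keypoints[left_idx]
    angle = np.degrees(np.arctan2(dy, dx))
    h, w = frame.shape[:2]
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    aligned = cv2.warpAffine(frame, rot, (w, h))
    # Map the keypoints into the aligned frame and take their bounding box
    # as a simple stand-in for the first detection frame.
    pts = cv2.transform(keypoints.reshape(-1, 1, 2).astype(np.float32), rot)
    pts = pts.reshape(-1, 2)
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    return aligned, (float(x0), float(y0), float(x1), float(y1))
```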
6. The method according to claim 4, wherein the determining a target detection frame of the target object in the target frame image according to the first detection frame and the second detection frame comprises:
determining a deviation amount between the first detection frame and the second detection frame;
taking the first detection frame as the target detection frame when the deviation amount exceeds a deviation threshold;
and taking the second detection frame as the target detection frame when the deviation amount does not exceed the deviation threshold.
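Sketch of the box arbitration in claim 6, assuming the deviation amount is 1 minus the IoU of the two boxes (any box-distance metric would fit the claim language):

```python
def select_detection_frame(first_box, second_box, deviation_threshold):
    # Boxes are (x0, y0, x1, y1). Deviation = 1 - IoU is an assumption.
    ax0, ay0, ax1, ay1 = first_box
    bx0, by0, bx1, by1 = second_box
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((ax1 - ax0) * (ay1 - ay0)
             + (bx1 - bx0) * (by1 - by0) - inter)
    deviation = 1.0 - inter / union if union > 0 else 1.0
    # Large deviation: trust the fresh detector output (first frame);
    # small deviation: keep the temporally smoother tracked box (second).
    return first_box if deviation > deviation_threshold else second_box
```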
7. The method of claim 2, further comprising the step of training the keypoint recognition model:
performing single-frame image training on the key point recognition model by using the training input image and the training input label;
inputting a training image in the training video data into the key point recognition model that has completed the single-frame image training, to obtain a training label representing the position of a key point of the target object in the training image;
tracking and predicting a historical training image of a target training image, and determining a predicted training label of the target training image, wherein the historical training image comprises at least one frame of training image before the target training image;
verifying the reliability of the training label of the target training image according to a difference between the predicted training label and the training label of the target training image;
and training, by using the training labels that pass the verification and the corresponding training images, the key point recognition model that has completed the single-frame image training.
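Sketch of the verification step in claim 7, assuming the difference is the mean keypoint distance between model-emitted pseudo-labels and tracking-based predictions (the claim does not fix the measure):

```python
import numpy as np

def verify_training_labels(model_labels, predicted_labels, max_diff):
    # model_labels:     (N, K, 2) keypoints from the single-frame-trained model.
    # predicted_labels: (N, K, 2) keypoints predicted by tracking the
    #                   historical training images.
    # A pseudo-label is kept only when it agrees with the tracking prediction
    # to within max_diff (threshold choice is an assumption).
    diffs = np.linalg.norm(model_labels - predicted_labels, axis=-1).mean(axis=-1)
    return diffs <= max_diff  # boolean mask over the (image, label) pairs
```

The (image, label) pairs selected by the mask then feed the second training round of the single-frame-trained model.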
8. The method of claim 1, further comprising:
and carrying out special effect processing on the frame image in the video data according to the attribute information of the key points of the target object.
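Minimal special-effect sketch for claim 8 (editorial addition): a real beautification or makeup effect would warp or blend textures anchored at the keypoint positions; a dot overlay merely illustrates where the attribute information enters the effect pipeline.

```python
import cv2

def draw_keypoint_effect(frame, keypoints):
    # Mark each keypoint on the frame in place; stands in for an effect
    # renderer driven by the keypoint attribute information.
    for x, y in keypoints:
        cv2.circle(frame, (int(round(x)), int(round(y))), 3, (0, 255, 0), -1)
    return frame
```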
9. A data processing method, comprising:
performing single-frame image training on the key point identification model by using the training input image and the training input label;
inputting a training image in the training video data into the key point identification model that has completed the single-frame image training, to obtain a training label representing the position of a key point of a target object in the training image;
tracking and predicting a historical training image of a target training image, and determining a predicted training label of the training image at the target time;
verifying the reliability of the training label of the target training image according to a difference between the predicted training label and the training label of the training image at the target time;
and training, by using the training labels that pass the verification and the corresponding training images, the key point identification model that has completed the single-frame image training.
10. A data processing method, comprising:
acquiring video data and determining a frame image in the video data;
identifying a frame image, and extracting a detection image corresponding to a detection frame, wherein the detection image comprises a target object positioned by the detection frame;
determining first position information and second position information of a key point of a target object in a target frame image, wherein the first position information is determined according to a target detection image of the target frame image, the second position information is determined after tracking and predicting the position of the key point of the target object in a historical frame image, and the historical frame image comprises at least one frame of frame image before the target frame image;
determining attribute information of key points of the target object in the target frame image according to the first position information and the second position information;
and carrying out special effect processing on the frame image in the video data according to the attribute information of the key point of the target object.
11. A data processing method, comprising:
acquiring video data and determining a frame image in the video data;
identifying a frame image, and extracting a detection image corresponding to a detection frame, wherein the detection image comprises face data positioned by the detection frame;
determining first position information and second position information of key points of face data in a target frame image, wherein the first position information is determined according to a target detection image of the target frame image, the second position information is determined after tracking and predicting the positions of the key points of the face data in a historical frame image, and the historical frame image comprises at least one frame of frame image before the target frame image;
determining attribute information of key points of the face data in the target frame image according to the first position information and the second position information;
and performing special effect processing on the frame image in the video data according to the attribute information of the key points of the face data, wherein the special effect processing comprises at least one of beautification processing, special effect addition processing, and makeup processing.
12. A data processing method, comprising:
acquiring live video data, identifying frame images in the live video data, and extracting detection images corresponding to detection frames, wherein the detection images comprise face data positioned by the detection frames;
determining first position information and second position information of key points of face data in a target frame image, wherein the first position information is determined according to a target detection image of the target frame image, the second position information is determined after tracking and predicting the positions of the key points of the face data in a historical frame image, and the historical frame image comprises at least one frame of frame image before the target frame image;
determining attribute information of key points of the face data in the target frame image according to the first position information and the second position information;
and performing special effect processing on the frame image in the live video data according to the attribute information of the key points of the face data, wherein the special effect processing comprises at least one of beautification processing, special effect addition processing, and makeup processing.
13. A data processing method, comprising:
providing a first interface, acquiring video data through the first interface, and determining a frame image in the video data;
identifying a frame image, and extracting a detection image corresponding to a detection frame, wherein the detection image comprises a target object positioned by the detection frame;
determining first position information and second position information of a key point of a target object in a target frame image, wherein the first position information is determined according to a target detection image of the target frame image, the second position information is determined after tracking and predicting the position of the key point of the target object in a historical frame image, and the historical frame image comprises at least one frame of frame image before the target frame image;
determining attribute information of key points of the target object in the target frame image according to the first position information and the second position information;
and feeding back the attribute information through a second interface.
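Control-flow sketch of the two-interface pattern in claim 13 (editorial; all three callables are assumed shapes, not a real API):

```python
from typing import Any, Callable

def process_video_via_interfaces(
    first_interface: Callable[[], Any],
    pipeline: Callable[[Any], Any],
    second_interface: Callable[[Any], None],
) -> None:
    # Acquire video data through a first interface, compute keypoint
    # attribute information with the pipeline of claims 1-8, and feed
    # the result back through a second interface.
    video_data = first_interface()
    attribute_info = pipeline(video_data)
    second_interface(attribute_info)
```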
14. A data processing apparatus, comprising:
the video data acquisition module is used for acquiring video data and determining a frame image in the video data;
the detection image acquisition module is used for identifying the frame image and extracting a detection image corresponding to the detection frame, wherein the detection image comprises a target object positioned by the detection frame;
the position information acquisition module is used for determining first position information and second position information of a key point of a target object in a target frame image, wherein the first position information is determined according to a target detection image of the target frame image, the second position information is determined after tracking and predicting the position of the key point of the target object in a historical frame image, and the historical frame image comprises at least one frame of frame image before the target frame image;
and the attribute information acquisition module is used for determining the attribute information of the key point of the target object in the target frame image according to the first position information and the second position information.
15. A data processing apparatus, comprising:
the model single-frame training module is used for performing single-frame image training on the key point identification model by using the training input image and the training input label;
the training label acquisition module is used for inputting a training image in the training video data into the key point identification model that has completed the single-frame image training, to obtain a training label representing the position of a key point of a target object in the training image;
the predicted label acquisition module is used for tracking and predicting a historical training image of a target training image, and determining a predicted training label of the training image at the target time;
the training label verification module is used for verifying the reliability of the training label of the target training image according to a difference between the predicted training label and the training label of the training image at the target time;
and the model secondary training module is used for training, by using the training labels that pass the verification and the corresponding training images, the key point identification model that has completed the single-frame image training.
16. An electronic device, comprising: a processor; and
a memory having stored thereon executable code which, when executed, causes the processor to perform the method of one or more of claims 1-13.
17. One or more machine-readable media having executable code stored thereon that, when executed, causes a processor to perform the method of one or more of claims 1-13.
CN202011469640.9A 2020-12-14 2020-12-14 Data processing method and device, electronic equipment and storage medium Pending CN114627519A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011469640.9A CN114627519A (en) 2020-12-14 2020-12-14 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011469640.9A CN114627519A (en) 2020-12-14 2020-12-14 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114627519A true CN114627519A (en) 2022-06-14

Family

ID=81896543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011469640.9A Pending CN114627519A (en) 2020-12-14 2020-12-14 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114627519A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710697A (en) * 2023-08-09 2024-03-15 荣耀终端有限公司 Object detection method, electronic device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination