CN113128448A - Video matching method, device and equipment based on limb identification and storage medium


Info

Publication number
CN113128448A
CN113128448A (application number CN202110473266.8A)
Authority
CN
China
Prior art keywords: video, matching, matched, sequence, limb
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110473266.8A
Other languages
Chinese (zh)
Inventor
刘静 (Liu Jing)
Current Assignee
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd
Priority to CN202110473266.8A
Publication of CN113128448A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/48 - Matching video sequences
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/243 - Classification techniques relating to the number of classes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition

Abstract

The invention discloses a video matching method, apparatus, computer device and storage medium based on limb identification. The method comprises: obtaining a video to be matched and a standard video; inputting the video to be matched into a preset limb classification model to determine whether it is a normal matching video; if the video to be matched is a normal matching video, respectively inputting the standard video and the video to be matched into a preset gesture recognition model to obtain each standard limb sequence and each limb sequence to be matched; determining a key frame index sequence corresponding to the standard video by an extremum key frame identification method according to the standard limb sequences; determining, by a maximum mean value video segment identification method, a plurality of matching index sequences matched with the key frame index sequence in the limb sequences to be matched, together with a sequence matching value corresponding to each matching index sequence; and determining a video matching result of the video to be matched according to the sequence matching values. The invention improves the efficiency and accuracy of video matching based on limb identification.

Description

Video matching method, device and equipment based on limb identification and storage medium
Technical Field
The invention relates to the technical field of image matching, in particular to a video matching method and device based on limb identification, computer equipment and a storage medium.
Background
In the prior art, a gesture recognition model is introduced to judge limb action matching; for example, in a dance video scoring system, a dance score is evaluated by computing the approximate similarity of the recognized limb point positions. This approach has the following disadvantages: firstly, when a video contains only unmanned scenery, human limb points may still be wrongly identified; secondly, calculating the approximate similarity of limb positions weakens the difference information between the two videos, so the finally evaluated dance scores differ little from one another; thirdly, the initial frames of the two videos cannot be aligned, that is, the starting times of the actions in the two videos are not uniformly aligned, so the accuracy of the final limb action matching is low.
Disclosure of Invention
The embodiment of the invention provides a video matching method and device based on limb identification, computer equipment and a storage medium, and aims to solve the problem of low accuracy of limb action matching.
A video matching method based on limb identification comprises the following steps:
acquiring a video to be matched and a standard video corresponding to the video to be matched; the video to be matched comprises at least one frame of image to be matched; the standard video comprises at least one frame of standard image;
inputting the video to be matched into a preset limb classification model to determine whether the video to be matched is a normal matching video;
if the video to be matched is a normal matching video, the standard video and the video to be matched are respectively input into a preset gesture recognition model, so that a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched are obtained;
determining a key frame index sequence corresponding to the standard video by an extremum key frame identification method according to each standard limb sequence;
determining a plurality of matching index sequences matched with the key frame index sequence in each limb sequence to be matched and a sequence matching value corresponding to the matching index sequences by a maximum mean value video segment identification method;
and determining a video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence.
A limb recognition-based video matching device, comprising:
the standard video acquisition module is used for acquiring a video to be matched and a standard video corresponding to the video to be matched; the video to be matched comprises at least one frame of image to be matched; the standard video comprises at least one frame of standard image;
the matching video judging module is used for inputting the video to be matched to a preset limb classification model so as to determine whether the video to be matched is a normal matching video;
the gesture recognition module is used for respectively inputting the standard video and the video to be matched into a preset gesture recognition model if the video to be matched is a normal matching video, so as to obtain a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched;
a key frame identification module, configured to determine, according to each of the standard limb sequences, a key frame index sequence corresponding to the standard video by an extremum key frame identification method;
the sequence matching value determining module is used for determining a plurality of matching index sequences matched with the key frame index sequence in each limb sequence to be matched and a sequence matching value corresponding to the matching index sequences through a maximum mean value video segment identification method;
and the video matching result determining module is used for determining the video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the limb identification-based video matching method when executing the computer program.
A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements the above-mentioned limb recognition-based video matching method.
According to the video matching method based on limb identification, the device, the computer equipment and the storage medium, the method comprises the steps of obtaining a video to be matched and a standard video corresponding to the video to be matched; the video to be matched comprises at least one frame of image to be matched; the standard video comprises at least one frame of standard image; inputting the video to be matched into a preset limb classification model to determine whether the video to be matched is a normal matching video; if the video to be matched is a normal matching video, the standard video and the video to be matched are respectively input into a preset gesture recognition model, so that a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched are obtained; determining a key frame index sequence corresponding to the standard video by an extremum key frame identification method according to each standard limb sequence; determining a plurality of matching index sequences matched with the key frame index sequence in each limb sequence to be matched and a sequence matching value corresponding to the matching index sequences by a maximum mean value video segment identification method; and determining a video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence.
The invention aligns the two videos by combining an extremum key frame identification method with a maximum mean value video segment identification method: the aligned segments are first located through the maximum limb elements and minimum limb elements, and the low-dimensional similarity of the corresponding frames is then calculated, which improves the accuracy of video matching. The invention also introduces a preset limb classification model: when the video to be matched is not a normal matching video, the subsequent steps need not be executed to determine a video matching result, which reduces the computational load of the computer and improves both the efficiency and the accuracy of video matching.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a schematic diagram of an application environment of a video matching method based on limb identification according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for limb identification based video matching in an embodiment of the invention;
FIG. 3 is a flowchart of step S40 of the limb identification-based video matching method according to an embodiment of the present invention;
FIG. 4 is a flowchart of step S50 of the limb identification-based video matching method according to an embodiment of the present invention;
FIG. 5 is a flowchart of step S503 of the video matching method based on limb identification according to an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a video matching apparatus based on limb recognition according to an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a key frame identification module in the video matching apparatus based on limb identification according to an embodiment of the present invention;
FIG. 8 is a schematic block diagram of the sequence matching value determination module in the video matching apparatus based on limb recognition according to an embodiment of the present invention;
FIG. 9 is a schematic block diagram of a sliding matching unit in the video matching device based on limb recognition according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The limb identification-based video matching method provided by the embodiment of the invention can be applied to the application environment shown in fig. 1. Specifically, the method is applied to a video matching system based on limb identification, which comprises the client and the server shown in fig. 1; the client and the server communicate through a network and are used for solving the problem of low accuracy of limb action matching. The client, also called the user side, refers to a program that corresponds to the server and provides local services for the user. The client may be installed on, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
In an embodiment, as shown in fig. 2, a video matching method based on limb identification is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
s10: acquiring a video to be matched and a standard video corresponding to the video to be matched; the video to be matched comprises at least one frame of image to be matched; the standard video comprises at least one frame of standard image;
as will be appreciated, standard video refers to video that is verified to be error free; the video to be matched refers to the video to be matched; for example, in a dance video scoring scene, a standard video may be a demonstration video of a dance motion without errors; the video to be matched can be the video shot by the dancer in the standard video for different students to learn. The to-be-matched image refers to an image containing limb movement in the to-be-matched video, and in the embodiment, the to-be-matched video is obtained by at least one frame of to-be-matched image containing limb movement, that is, the to-be-matched video not containing any limb movement is, for example, manually screened, or removed in advance by using a preset limb classification model in step S20. The standard image refers to an image containing the motion of the limb in a standard video.
S20: inputting the video to be matched into a preset limb classification model to determine whether the video to be matched is a normal matching video;
the preset limb classification model is a two-classification model based on a neural network, and can identify whether the image to be matched contains a human body or a limb action; exemplarily, assuming that the image to be matched includes a human body or a limb motion, the preset limb classification model labels the image to be matched as 1, that is, classifies the image to be matched into a category including the human body or the limb motion; if the image to be matched does not contain the human body or the limb movement, the preset limb classification model labels the image to be matched as 0, that is, classifies the image to be matched into the category which does not contain the human body or the limb movement.
Further, the preset limb classification model in this embodiment may be trained with the help of the preset gesture recognition model in step S30: after a standard video is input to the preset gesture recognition model, the model outputs the standard limb sequence corresponding to each standard image containing limb motion; each standard limb sequence is then input into the preset limb classification model so that the preset limb classification model can recognize and label it.
In one embodiment, step S20 includes:
determining whether each frame of image to be matched in the video to be matched contains limb actions or not through the preset limb classification model;
acquiring a first total number of images to be matched containing limb actions and a second total number of all images to be matched in the video to be matched;
as will be understood, the first total number refers to the total number of images to be matched containing the movement of the limb in the video to be matched; the second total number refers to the total number of all images to be matched in the video to be matched.
And when the ratio of the first total number to the second total number is greater than or equal to a preset number ratio, determining that the video to be matched is a normal matching video.
The preset number ratio may be determined according to the accuracy requirement of the application scenario, and for example, the preset number ratio may be set to 80%, 90%, or the like.
Specifically, after determining through the preset limb classification model whether each frame of image to be matched in the video to be matched contains limb actions, the first total number of images to be matched containing limb actions and the second total number of all images to be matched are recorded. The ratio of the first total number to the second total number is then compared with the preset number ratio: if the ratio is greater than or equal to the preset number ratio, enough of the images to be matched contain limb actions, and the video to be matched is determined to be a normal matching video; if the ratio is smaller than the preset number ratio, too few images to be matched contain limb actions, and the video to be matched is determined not to be a normal matching video.
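The ratio test described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name, the per-frame boolean labels (standing in for the preset limb classification model's 1/0 output), and the 90% default threshold are all assumptions.

```python
def is_normal_matching_video(frame_has_limb, min_ratio=0.9):
    """Decide whether a video counts as a normal matching video.

    frame_has_limb: one boolean per frame; True means the (hypothetical)
    limb classification model labeled the frame as containing a human
    body or limb action.
    min_ratio: the preset number ratio (e.g. 0.8 or 0.9 per the text).
    """
    if not frame_has_limb:
        return False
    first_total = sum(frame_has_limb)    # frames containing limb actions
    second_total = len(frame_has_limb)   # all frames in the video
    return first_total / second_total >= min_ratio
```

With a 90% threshold, a 10-frame video needs at least 9 frames with limb actions to pass.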
Further, if the video to be matched is not the normally matched video, a video uploading error instruction is sent to the preset receiving party, so that the preset receiving party updates the video to be matched.
If the ratio of the first total number to the second total number is smaller than the preset number ratio, the video to be matched is determined to be an abnormal matching video, that is, it contains too few image frames with limb actions; a video uploading error instruction is then sent to the preset receiver so that the preset receiver updates the video to be matched, for example by uploading a new video to be matched. The preset receiver may be the party that sent the video to be matched.
S30: if the video to be matched is a normal matching video, the standard video and the video to be matched are respectively input into a preset gesture recognition model, so that a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched are obtained;
as can be understood, the preset gesture recognition model is used for recognizing the limb movement of the image to be matched in the video to be matched and the limb movement of the standard image in the standard video. The standard limb sequence refers to a sequence combination of coordinate positions of all limb elements in the standard image, and the limb sequence to be matched refers to a sequence combination of coordinate positions of all limb elements in the image to be matched.
Further, the limb elements in this embodiment include, but are not limited to, a head limb element, a left shoulder limb element, a right shoulder limb element, a left elbow limb element, a right elbow limb element, a left wrist limb element, a right wrist limb element, a left hip limb element, a right hip limb element, a left knee limb element, a right knee limb element, a left ankle limb element, and a right ankle limb element. Each frame of standard image thus corresponds to a standard limb sequence containing the coordinate positions of these 13 limb elements; such a standard limb sequence may be, for example: [(x0, y0), (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5), …, (x12, y12)].
Specifically, if the video to be matched is a normal matching video, the standard video and the video to be matched are respectively input into the preset gesture recognition model, so that the preset gesture recognition model recognizes the image to be matched and the standard image containing limb actions, and limb element coordinate position labeling is performed on each standard image containing limb actions and the image to be matched, so as to output a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched.
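A limb sequence as described above (13 coordinate pairs per frame) can be sketched as a simple data structure. The element names and their ordering below are illustrative assumptions; the patent only lists which 13 elements exist, not their order.

```python
# Names of the 13 limb elements listed in the description; this ordering
# is an assumption for illustration only.
LIMB_ELEMENTS = [
    "head", "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def make_limb_sequence(coords):
    """Wrap one frame's 13 (x, y) keypoint coordinates as a limb sequence.

    coords: iterable of 13 (x, y) pairs, one per limb element, as a
    gesture recognition model might emit for a single person in a frame.
    """
    coords = list(coords)
    if len(coords) != len(LIMB_ELEMENTS):
        raise ValueError("expected one (x, y) pair per limb element")
    return [(float(x), float(y)) for x, y in coords]
```

A whole video then becomes a list of such sequences, one per frame containing limb movement.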
S40: determining a key frame index sequence corresponding to the standard video by an extremum key frame identification method according to each standard limb sequence;
it is understood that the extreme keyframe identification method refers to a method of determining the maximum and minimum values of each limb element from all standard limb sequences.
In one embodiment, as shown in fig. 3, in step S40, the standard limb sequence is associated with an image frame tag; that is, the determining, according to each standard limb sequence, a key frame index sequence corresponding to the standard video by an extremum key frame identification method includes:
s401: acquiring standard limb elements in all the standard limb sequences, wherein each standard limb sequence comprises a preset number of types of standard limb elements, and each type of standard limb element in each standard limb sequence is only one;
it is to be understood that the limb elements indicated in step S30 in the present embodiment include, but are not limited to, a head limb element, a left shoulder limb element, a right shoulder limb element, a left elbow limb element, a right elbow limb element, a left wrist limb element, a right wrist limb element, a left hip limb element, a right hip limb element, a left knee limb element, a right knee limb element, a left ankle limb element, and a right ankle limb element. Thus, the standard limb elements in the standard limb sequence are also all of the limb elements described above. In this embodiment, the preset number category is 13 categories; further, for one frame of standard image containing limb movement (only one frame of standard image containing limb movement is discussed in this embodiment to contain only one individual limb movement), one standard limb sequence corresponds to one standard limb sequence, so that each standard limb element of one standard limb sequence is only one, that is, the coordinate information of each standard limb element is one and only one in one frame of standard image.
S402: extracting the maximum limb element and the minimum limb element in each type of standard limb elements;
it is understood that the maximum limb element refers to the maximum value of the two-dimensional coordinate information in each type of standard limb element, and the minimum limb element refers to the minimum value of the coordinate information in each type of standard limb element. Further, for different types of standard videos and videos to be matched, different methods for extracting the maximum limb element and the minimum limb element may exist, for example, when it is assumed that there is a little motion of squatting and jumping in the change of the limb motion in the standard videos and the videos to be matched, such as square dance, the horizontal coordinate information in the two-dimensional coordinate information in each standard limb element may be only considered, that is, the maximum value in the horizontal coordinate information is extracted as the maximum limb element, and the minimum value in the horizontal coordinate information is extracted as the minimum limb element; if the standard video and the video to be matched have a lot of squat and jump actions in the limb action change, such as a martial art action video, the transverse coordinate information and the longitudinal coordinate information in the two-dimensional coordinate information in each standard limb element can be comprehensively considered, so that the maximum limb element and the minimum limb element in each type of standard limb elements are determined according to the transverse coordinate information and the longitudinal coordinate information in the two-dimensional coordinate information in each standard limb element.
S403: generating a maximum frame index sequence according to all the extracted maximum limb elements and the image frame labels corresponding to the maximum limb elements, and simultaneously generating a minimum frame index sequence according to all the extracted minimum limb elements and the image frame labels corresponding to the minimum limb elements;
it is understood that the maximum limb element is determined from the standard limb elements in all the standard limb sequences, and each standard limb sequence corresponds to one frame of standard image, so that the maximum limb element is associated with one frame of standard image, and therefore the image frame tag corresponding to each maximum limb element refers to the frame number of the standard image corresponding to each maximum limb element, namely the frame ordering of the standard image in the standard video. Illustratively, assuming that one of the largest limb elements of a class is one limb element from the 16 th frame standard image of the standard video, the image frame corresponding to the largest limb element is labeled as 16 frames. Similarly, the image frame label corresponding to each minimum limb element refers to the number of frames of the standard image corresponding to each minimum limb element. Further, since the extreme value key frame identification method is adopted in this embodiment, each type of standard limb element may appear in different standard images, but at the same time, it may also exist that image frame tags corresponding to a plurality of standard limb elements are the same, that is, a plurality of types of maximum limb elements or a plurality of types of minimum limb elements may appear in one frame of standard image at the same time.
S404: and carrying out sequence merging processing on the maximum frame index sequence and the minimum frame index sequence to obtain the key frame index sequence.
Specifically, after the maximum frame index sequence is generated from all extracted maximum limb elements and their corresponding image frame tags, and the minimum frame index sequence is generated from all extracted minimum limb elements and their corresponding image frame tags, the two sequences are merged to obtain the key frame index sequence. That is, the key frame index sequence contains all the maximum limb elements and minimum limb elements together with their image frame tags, and these image frame tags serve as frame indexes so that the matching index sequences can be determined in step S50.
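The merge in step S404 might be sketched as follows, under the assumption (suggested by the text's note that several limb elements can share one frame) that duplicate image frame tags are collapsed and the merged sequence is kept in frame order:

```python
def build_key_frame_index_sequence(max_frames, min_frames):
    """Merge the maximum and minimum frame index sequences.

    max_frames / min_frames: image frame tags (frame numbers in the
    standard video) of the extracted maximum / minimum limb elements.
    Several limb element classes may share one frame tag, so duplicates
    are collapsed; sorting keeps the tags usable as frame indexes.
    """
    return sorted(set(max_frames) | set(min_frames))
```

In a fuller implementation each index would also carry its associated limb elements; only the frame-tag merging is shown here.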
S50: determining a plurality of matching index sequences matched with the key frame index sequence in each limb sequence to be matched and a sequence matching value corresponding to the matching index sequences by a maximum mean value video segment identification method;
it can be understood that the matching index sequence refers to a sequence that is matched with each maximum limb element and each minimum limb element in the key frame index sequence in the limb sequence to be matched; the sequence matching value refers to a matching degree between the matching index sequence and the element video segment of the corresponding maximum limb element, and the sequence matching value can be determined by a cosine similarity algorithm.
In one embodiment, as shown in fig. 4, step S50 includes:
s501: selecting a first limb element video segment corresponding to each maximum limb element from the standard video according to the image frame label corresponding to each maximum limb element; the first limb element video segment comprises a standard image of a first preset number of frames;
for example, assuming that the image frame tag corresponding to one type of the largest body element is the 16 th frame standard image in the standard video, the selected first body element video segment may be a video segment composed of 20 frames of standard images in the standard video starting from the 16 th frame standard image, that is, a video segment composed of standard images from the 16 th frame to the 35 th frame in the standard video is the first body element video segment.
S502: selecting a second limb element video segment corresponding to each maximum limb element from the video to be matched according to the image frame label corresponding to each maximum limb element; the second limb element video segment comprises images to be matched of a second preset number of frames; the second preset number of frames is greater than the first preset number of frames;
Exemplarily, assuming that the image frame tag corresponding to one type of maximum limb element is the 16th frame standard image in the standard video, the selected second limb element video segment may start earlier, at the 6th frame image to be matched, and span two 20-frame runs (the 6th to 25th and the 26th to 45th frames); that is, the video segment composed of the 6th to 45th frame images to be matched in the video to be matched is the second limb element video segment.
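The segment selection of steps S501 and S502 can be sketched as follows, using the frame counts from the worked examples above (20 standard frames from the tag onward; 40 frames to be matched starting 10 frames earlier). The function name and parameter defaults are illustrative assumptions.

```python
def select_segments(tag_frame, first_len=20, second_len=40, lead=10):
    """Select frame ranges for the two video segments of one limb element.

    tag_frame: image frame tag of the maximum limb element.
    first_len: frames in the first (standard-video) segment, starting at
    the tag frame.
    second_len: frames in the second (to-be-matched) segment, which begins
    `lead` frames before the tag so that misaligned action start times
    can still fall inside the window (second_len > first_len per S502).
    Returns inclusive (start, end) frame numbers for each segment.
    """
    first = (tag_frame, tag_frame + first_len - 1)
    start2 = max(0, tag_frame - lead)  # clamp at the video's first frame
    second = (start2, start2 + second_len - 1)
    return first, second
```

With the defaults, tag frame 16 reproduces the example's ranges: standard frames 16 to 35 and to-be-matched frames 6 to 45.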
S503: sliding the first limb element video segment over the second limb element video segment to determine the sliding matching sequences in the second limb element video segment corresponding to each first limb element video segment;
It is to be understood that sliding matching matches the first limb element video segment against the second limb element video segment so as to determine, within the second limb element video segment, the sliding matching sequences corresponding to each first limb element video segment.
In one embodiment, as shown in fig. 5, step S503 includes:
S5031: displaying each frame of image to be matched in the second limb element video segment on a preset video time axis in time order;
S5032: taking the first limb element video segment as a matching sliding window, aligning the matching sliding window with the starting frame of the second limb element video segment, and recording the frames of the second limb element video segment covered by the window (equal in number to the first preset number of frames) as a first matching sequence;
It can be understood that, once the first limb element video segment is taken as the matching sliding window, the length of the matching sliding window equals the first preset number of frames of the first limb element video segment. All images to be matched in the second limb element video segment are displayed on a preset video time axis in time order (namely, the acquisition order of the images to be matched), from earliest to latest. The first limb element video segment is then used as the matching sliding window and aligned with the starting frame of the second limb element video segment, and the covered frames of the second limb element video segment are recorded as the first matching sequence; that is, the first matching sequence is the image sequence that takes the image to be matched of the starting frame of the second limb element video segment as its starting image and contains the first preset number of images to be matched.
S5033: moving the matching sliding window on the second limb element video segment to a direction far away from the starting frame by a preset frame number step length;
Optionally, the preset frame number step may be determined according to the number of images to be matched in the second limb element video segment. Illustratively, when the number of images to be matched in the second limb element video segment is small, the preset frame number step may be set to 1; when the number is large, the preset frame number step may be set to 3, and so on.
Specifically, the first limb element video segment is taken as the matching sliding window and aligned with the starting frame of the second limb element video segment; the covered frames of the second limb element video segment, equal in number to the first preset number of frames, are recorded as the first matching sequence; the matching sliding window is then moved over the second limb element video segment by the preset frame number step in the direction away from the starting frame.
S5034: adding the images to be matched within the preset frame number step after the last image to be matched in the first matching sequence to the first matching sequence, and deleting the same number of images to be matched from the start of the first matching sequence, to obtain a second matching sequence;
Specifically, after the matching sliding window is moved over the second limb element video segment by the preset frame number step in the direction away from the starting frame, the images to be matched within the preset frame number step after the last image of the first matching sequence are added to the first matching sequence, and the same number of images to be matched are deleted from its start, yielding a second matching sequence whose number of images to be matched still equals the first preset number of frames. Illustratively, assuming the preset frame number step is 1, after the first matching sequence is obtained the sliding window is moved backward by one frame, so that the window is now aligned starting from the second frame image to be matched of the second limb element video segment.
S5035: detecting whether the number of frames of images to be matched remaining after the last image to be matched in the second matching sequence is greater than or equal to the preset frame number step;
S5036: if the number of frames of images to be matched remaining after the last image to be matched in the second matching sequence is smaller than the preset frame number step, recording the first matching sequence and the second matching sequence as the sliding matching sequences.
Specifically, after the second matching sequence is obtained, it is detected whether the number of frames of images to be matched remaining after the last image to be matched of the second matching sequence is greater than or equal to the preset frame number step. Illustratively, assuming the preset frame number step is 2, the sliding matching window would next be moved backward by two frames over the second limb element video segment; if only 0 or 1 images to be matched remain after that move, a third matching sequence covered by the sliding window cannot be generated. Therefore, if the number of remaining frames is smaller than the preset frame number step, no new matching sequence can be generated, and the first matching sequence and the second matching sequence are directly recorded as the sliding matching sequences.
In an embodiment, after step S5035, the method further includes:
if the number of frames of images to be matched remaining after the last image to be matched in the second matching sequence is greater than or equal to the preset frame number step, adding the images to be matched within the preset frame number step after the last image to be matched in the second matching sequence to the second matching sequence, and deleting the same number of images to be matched from the start of the second matching sequence, to obtain a third matching sequence;
Specifically, after detecting whether the number of frames of images to be matched remaining after the last image to be matched in the second matching sequence is greater than or equal to the preset frame number step, if it is, the images to be matched within the preset frame number step after the last image to be matched in the second matching sequence are added to the second matching sequence, and the same number of images to be matched are deleted from the start of the second matching sequence, yielding a third matching sequence.
Detecting whether the number of frames of images to be matched remaining after the last image to be matched in the third matching sequence is greater than or equal to the preset frame number step;
and if this number is smaller than the preset frame number step, recording the first matching sequence, the second matching sequence and the third matching sequence as the sliding matching sequences.
Specifically, after the third matching sequence is obtained, it is detected whether the number of frames of images to be matched remaining after the last image to be matched in the third matching sequence is greater than or equal to the preset frame number step; if it is smaller than the preset frame number step, the first matching sequence, the second matching sequence and the third matching sequence are recorded as the sliding matching sequences.
Further, if the number of frames of images to be matched remaining after the last image to be matched in the third matching sequence is greater than or equal to the preset frame number step, the sliding matching window continues to move to obtain a fourth matching sequence; a fifth matching sequence, a sixth matching sequence, and so on may follow. The method for determining each matching sequence is the same as described above and is therefore not repeated here.
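The window-sliding loop of steps S5032-S5036 (building the first matching sequence, stepping the window, and stopping once fewer than a step's worth of frames remain) can be sketched as follows; representing a video segment as a Python list of frames and the function name are assumptions:

```python
def sliding_matching_sequences(second_segment, window_len, step=1):
    """Enumerate the sliding matching sequences of a second limb element
    video segment (a sketch of steps S5032-S5036).

    The first sequence is aligned with the starting frame; each later
    sequence drops `step` frames at the front and appends the next
    `step` frames, stopping when fewer than `step` frames remain after
    the current window.
    """
    sequences = []
    start = 0
    while start + window_len <= len(second_segment):
        sequences.append(second_segment[start:start + window_len])
        # Stop once fewer than `step` frames follow the current window.
        if len(second_segment) - (start + window_len) < step:
            break
        start += step
    return sequences
```

With the worked example above (a 40-frame second segment, a 20-frame window, step 1), this yields 21 sliding matching sequences, the last ending at the 45th frame of the video to be matched.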
S504: determining sequence similarity scores between each sliding matching sequence and the video segments of the first limb elements corresponding to the sliding matching sequence through a preset similarity algorithm;
Specifically, after the first limb element video segment is slid over the second limb element video segment to determine the sliding matching sequences corresponding to each first limb element video segment, a sequence similarity score between each sliding matching sequence and its corresponding first limb element video segment is determined by a preset similarity algorithm (e.g., a cosine similarity algorithm). The sequence similarity score represents the similarity between a sliding matching sequence and the first limb element video segment: the higher the score, the higher the similarity; the lower the score, the lower the similarity.
S505: and recording the maximum sequence similarity value corresponding to the same maximum limb element as the sequence matching value, and recording the sliding matching sequence corresponding to the sequence matching value as the matching index sequence.
It can be understood that one maximum limb element yields a plurality of sliding matching sequences, each with its own sequence similarity score; therefore only the maximum sequence similarity score corresponding to the same maximum limb element is recorded as the sequence matching value, and the sliding matching sequence corresponding to that value is recorded as the matching index sequence.
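A minimal sketch of steps S504 and S505, assuming each frame is represented by a limb feature vector and taking the mean per-frame cosine similarity as the preset similarity algorithm (one possible choice the text mentions; the function names are illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sequence_match_value(first_segment, sliding_sequences):
    """Score every sliding matching sequence against the first limb
    element video segment and keep the best (steps S504-S505)."""
    best_score, best_sequence = -1.0, None
    for seq in sliding_sequences:
        # Mean per-frame cosine similarity, an assumed aggregation.
        frame_scores = [cosine(a, b) for a, b in zip(first_segment, seq)]
        score = sum(frame_scores) / len(frame_scores)
        if score > best_score:
            best_score, best_sequence = score, seq
    return best_score, best_sequence
```

The returned pair corresponds to the sequence matching value and the matching index sequence of step S505.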
S60: and determining a video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence.
Specifically, after determining a plurality of matching index sequences matched with the key frame index sequence and sequence matching values corresponding to the matching index sequences in each to-be-matched limb sequence through a maximum mean value video segment identification method, determining a video matching result of the to-be-matched video according to the sequence matching values corresponding to the matching index sequences.
In one embodiment, step S60 includes:
acquiring the total number of matching index sequences;
and determining the video matching result through an average algorithm according to the sequence matching value corresponding to each matching index sequence and the total number of the index sequences.
It is to be understood that the total number of index sequences is the total number of matching index sequences. Specifically, after the matching index sequences matching the key frame index sequence and their sequence matching values are determined in each limb sequence to be matched through the maximum mean video segment identification method, the total number of matching index sequences is obtained, and the ratio of the sum of the sequence matching values corresponding to the matching index sequences to the total number of index sequences (i.e., their average) is recorded as the video matching result. The video matching result represents the degree of matching between the video to be matched and the standard video; for example, in a dance video scoring scenario, the video matching result may be the dance score of the video to be matched.
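The average algorithm of step S60 then reduces to the mean of the sequence matching values; a sketch (the function name and the 0.0 fallback for an empty input are assumptions):

```python
def video_matching_result(sequence_match_values):
    """Average the sequence matching values to obtain the overall video
    matching result of step S60 (e.g., a dance scoring score)."""
    if not sequence_match_values:
        return 0.0  # assumed fallback when no matching index sequence exists
    return sum(sequence_match_values) / len(sequence_match_values)
```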
In this embodiment, the two videos are aligned by combining the extremum key frame identification method with the maximum mean video segment identification method: after the aligned segments are found through the maximum limb elements and minimum limb elements, low-dimensional corresponding-frame similarity calculation is performed, which improves the accuracy of video matching. This embodiment also introduces the preset limb classification model: when the video to be matched is not a normal matching video, the subsequent steps need not be executed to determine a video matching result, which reduces the computational load and improves the efficiency and accuracy of video matching.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, a video matching device based on limb identification is provided, and the video matching device based on limb identification is in one-to-one correspondence with the video matching method based on limb identification in the above embodiment. As shown in fig. 6, the video matching apparatus based on limb recognition includes a standard video acquisition module 10, a matching video judgment module 20, a gesture recognition module 30, a key frame recognition module 40, a sequence matching value determination module 50, and a video matching result determination module 60. The functional modules are explained in detail as follows:
the standard video acquiring module 10 is used for acquiring a video to be matched and a standard video corresponding to the video to be matched; the video to be matched comprises at least one frame of image to be matched; the standard video comprises at least one frame of standard image;
the matching video judging module 20 is configured to input the video to be matched to a preset limb classification model to determine whether the video to be matched is a normal matching video;
the gesture recognition module 30 is configured to, if the video to be matched is a normal matching video, respectively input the standard video and the video to be matched into a preset gesture recognition model, so as to obtain a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched;
a key frame identification module 40, configured to determine, according to each standard limb sequence, a key frame index sequence corresponding to the standard video by an extremum key frame identification method;
a sequence matching value determining module 50, configured to determine, by using a maximum mean video segment identification method, a plurality of matching index sequences that match the key frame index sequence in each to-be-matched limb sequence and a sequence matching value corresponding to the matching index sequences;
and a video matching result determining module 60, configured to determine a video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence.
Preferably, the matching video determination module 20 includes:
the limb action recognition unit is used for determining whether each frame of image to be matched in the video to be matched contains a limb action or not through the preset limb classification model;
the image quantity acquiring unit is used for acquiring a first total quantity of images to be matched containing limb actions and a second total quantity of all images to be matched in the video to be matched;
and the image quantity comparison unit is used for determining that the video to be matched is a normal matching video when the ratio of the first total quantity to the second total quantity is greater than or equal to a preset quantity ratio.
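The judgment performed by the matching video judgment module 20 can be sketched as follows, assuming the preset limb classification model has already produced one boolean per frame and taking 0.8 as an illustrative value for the preset number ratio:

```python
def is_normal_matching_video(contains_limb_action, ratio_threshold=0.8):
    """Judge whether a video to be matched is a normal matching video.

    `contains_limb_action` holds one boolean per frame (whether the
    preset limb classification model found a limb action in that frame);
    `ratio_threshold` is the preset number ratio, an assumed value.
    """
    # First total: images to be matched containing limb actions.
    first_total = sum(1 for flag in contains_limb_action if flag)
    # Second total: all images to be matched in the video.
    second_total = len(contains_limb_action)
    return second_total > 0 and first_total / second_total >= ratio_threshold
```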
Preferably, as shown in fig. 7, the key frame identification module 40 includes:
a limb element obtaining unit 401, configured to obtain the standard limb elements in all the standard limb sequences, where each standard limb sequence includes a preset number of types of standard limb elements and contains only one standard limb element of each type;
a limb element extracting unit 402, configured to extract a largest limb element and a smallest limb element in each type of standard limb elements;
a frame index sequence generating unit 403, configured to generate a maximum frame index sequence according to all extracted maximum limb elements and image frame tags corresponding to the maximum limb elements, and generate a minimum frame index sequence according to all extracted minimum limb elements and image frame tags corresponding to the minimum limb elements;
a sequence merging unit 404, configured to perform sequence merging processing on the maximum frame index sequence and the minimum frame index sequence to obtain the key frame index sequence.
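A sketch of the extremum key frame identification performed by module 40; representing each standard limb sequence as a dict of limb element values per frame, returning (frame tag, value, element type) triples, and merging by sorting on the image frame tag are all assumptions:

```python
def key_frame_index_sequence(standard_limb_sequences):
    """Build the key frame index sequence (extremum key frame
    identification, a sketch). Each entry of `standard_limb_sequences`
    is a dict mapping a limb element type to its value for one frame;
    the result holds (frame_tag, value, element_type) triples,
    de-duplicated and sorted by frame tag.
    """
    if not standard_limb_sequences:
        return []
    max_entries, min_entries = [], []
    for element_type in standard_limb_sequences[0]:
        values = [(frame_tag, seq[element_type])
                  for frame_tag, seq in enumerate(standard_limb_sequences)]
        # One maximum and one minimum limb element per type (units 402/403).
        max_entries.append(max(values, key=lambda fv: fv[1]) + (element_type,))
        min_entries.append(min(values, key=lambda fv: fv[1]) + (element_type,))
    # Sequence merging (unit 404): combine both index sequences by frame tag.
    return sorted(set(max_entries + min_entries))
```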
Preferably, as shown in fig. 8, the sequence matching value determining module 50 includes:
a first video segment selecting unit 501, configured to select, from the standard video, a first limb element video segment corresponding to each maximum limb element according to an image frame tag corresponding to each maximum limb element; the first limb element video segment comprises a standard image of a first preset number of frames;
a second video segment selecting unit 502, configured to select, according to the image frame tag corresponding to each maximum limb element, a second limb element video segment corresponding to each maximum limb element from the video to be matched; the second limb element video segment comprises images to be matched of a second preset number of frames; the second preset number of frames is greater than the first preset number of frames;
a sliding matching unit 503, configured to slide the first limb element video segment over the second limb element video segment, so as to determine the sliding matching sequences corresponding to each first limb element video segment in the second limb element video segment;
a similarity matching unit 504, configured to determine, through a preset similarity algorithm, a sequence similarity score between each sliding matching sequence and the video segment of the first limb element corresponding to the sliding matching sequence;
a matching index sequence determining unit 505, configured to record the maximum sequence similarity value corresponding to the same maximum limb element as the sequence matching value, and record the sliding matching sequence corresponding to the sequence matching value as the matching index sequence.
Preferably, the sliding matching unit 503 includes:
a matching image display subunit 5031, configured to display, on a preset video time axis, each frame of images to be matched in the second limb element video segment according to a time sequence;
a first matching sequence recording subunit 5032, configured to take the first limb element video segment as a matching sliding window, align the matching sliding window with a starting frame of the second limb element video segment, and record the second limb element video segment that is the same as the first preset number of frames as a first matching sequence;
a window moving subunit 5033, configured to move the matching sliding window on the second limb element video segment by a preset frame number step toward a direction away from the starting frame;
a second matching sequence determining subunit 5034, configured to add the image to be matched with the preset frame number step length after the last image to be matched in the first matching sequence to the first matching sequence, and delete the image to be matched with the preset frame number step length in the first matching sequence from the start frame to obtain a second matching sequence;
a first image frame number detecting subunit 5035, configured to detect whether a frame number of an image to be matched after a preset frame number step after a last image to be matched in the second matching sequence is greater than or equal to the preset frame number step;
a first sliding matching sequence recording subunit 5036, configured to record the first matching sequence and the second matching sequence as the sliding matching sequence if the frame number of the to-be-matched image after a preset frame number step after the last to-be-matched image in the second matching sequence is smaller than the preset frame number step.
Preferably, the sliding matching unit 503 further includes:
a third matching sequence determining subunit, configured to, if a frame number of an image to be matched after a preset frame number step length after a last image to be matched in the second matching sequence is greater than or equal to the preset frame number step length, add an image to be matched after the preset frame number step length after the last image to be matched in the second matching sequence to the second matching sequence, and delete the image to be matched after the preset frame number step length in the second matching sequence from the start frame, to obtain a third matching sequence;
a second image frame number detection subunit, configured to detect whether a frame number of an image to be matched after a preset frame number step after a last image to be matched in the third matching sequence is greater than or equal to the preset frame number step;
a second sliding matching sequence recording subunit, configured to record the first matching sequence, the second matching sequence, and the third matching sequence as the sliding matching sequence if a frame number of the to-be-matched image after a preset frame number step length after a last to-be-matched image in the third matching sequence is smaller than the preset frame number step length.
Preferably, the video matching result determining module 60 includes:
an index sequence total number obtaining unit, configured to obtain an index sequence total number of the matching index sequences;
and the video matching result determining unit is used for determining the video matching result through an average algorithm according to the sequence matching value corresponding to each matching index sequence and the total number of the index sequences.
For specific definition of the video matching device based on limb identification, refer to the above definition of the video matching method based on limb identification, which is not described herein again. The modules in the video matching device based on limb identification can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing the data used in the video matching method based on limb identification in the above embodiment. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a limb recognition based video matching method.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the limb identification-based video matching method in the above embodiments is implemented.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, implements the limb recognition-based video matching method in the above-described embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructed by a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A video matching method based on limb identification is characterized by comprising the following steps:
acquiring a video to be matched and a standard video corresponding to the video to be matched; the video to be matched comprises at least one frame of image to be matched; the standard video comprises at least one frame of standard image;
inputting the video to be matched into a preset limb classification model to determine whether the video to be matched is a normal matching video;
if the video to be matched is a normal matching video, the standard video and the video to be matched are respectively input into a preset gesture recognition model, so that a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched are obtained;
determining a key frame index sequence corresponding to the standard video by an extremum key frame identification method according to each standard limb sequence;
determining a plurality of matching index sequences matched with the key frame index sequence in each limb sequence to be matched and a sequence matching value corresponding to the matching index sequences by a maximum mean value video segment identification method;
and determining a video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence.
2. The limb identification-based video matching method according to claim 1, wherein the inputting the video to be matched to a preset limb classification model to determine whether the video to be matched is a normal matching video comprises:
determining whether each frame of image to be matched in the video to be matched contains limb actions or not through the preset limb classification model;
acquiring a first total number of images to be matched containing limb actions and a second total number of all images to be matched in the video to be matched;
and when the ratio of the first total number to the second total number is greater than or equal to a preset number ratio, determining that the video to be matched is a normal matching video.
3. The method for matching video based on limb identification according to claim 1, wherein the determining the key frame index sequence corresponding to the standard video by the extremum key frame identification method according to each of the standard limb sequences comprises:
acquiring standard limb elements in all the standard limb sequences, wherein each standard limb sequence comprises a preset number of types of standard limb elements and contains only one standard limb element of each type;
extracting the maximum limb element and the minimum limb element in each type of standard limb elements;
generating a maximum frame index sequence according to all the extracted maximum limb elements and the image frame labels corresponding to the maximum limb elements, and simultaneously generating a minimum frame index sequence according to all the extracted minimum limb elements and the image frame labels corresponding to the minimum limb elements;
and carrying out sequence merging processing on the maximum frame index sequence and the minimum frame index sequence to obtain the key frame index sequence.
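A sketch of the extremum key frame identification described above, assuming each frame's standard limb sequence is a dict mapping a limb-element type to a scalar value (the data layout is illustrative; the patent does not specify one):

```python
def extremum_key_frames(standard_limb_sequences):
    """standard_limb_sequences: list indexed by frame; each entry maps a
    limb-element type to its scalar value (exactly one element per type
    per frame, as claim 3 requires). Returns the merged key frame index
    sequence."""
    types = standard_limb_sequences[0].keys()
    max_frames, min_frames = [], []
    for t in types:
        values = [seq[t] for seq in standard_limb_sequences]
        max_frames.append(values.index(max(values)))  # frame of maximum limb element
        min_frames.append(values.index(min(values)))  # frame of minimum limb element
    # Sequence merging: union of both index sequences, in frame order
    return sorted(set(max_frames) | set(min_frames))
```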
4. The limb identification-based video matching method according to claim 1, wherein determining, by the maximum mean value video segment identification method, the plurality of matching index sequences that match the key frame index sequence in each limb sequence to be matched and the corresponding sequence matching values comprises:
selecting, from the standard video, a first limb element video segment corresponding to each maximum limb element according to the image frame label corresponding to that maximum limb element, wherein the first limb element video segment comprises a first preset number of frames of standard images;
selecting, from the video to be matched, a second limb element video segment corresponding to each maximum limb element according to the image frame label corresponding to that maximum limb element, wherein the second limb element video segment comprises a second preset number of frames of images to be matched, and the second preset number of frames is greater than the first preset number of frames;
performing sliding matching of each first limb element video segment over the corresponding second limb element video segment to determine the sliding matching sequences in the second limb element video segment corresponding to that first limb element video segment;
determining, through a preset similarity algorithm, a sequence similarity value between each sliding matching sequence and the first limb element video segment corresponding to that sliding matching sequence;
and recording the maximum sequence similarity value corresponding to the same maximum limb element as the sequence matching value, and recording the sliding matching sequence corresponding to the sequence matching value as the matching index sequence.
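The sliding match of claim 4 can be sketched as follows. Segments are lists of scalar limb features, and the similarity used here (negative mean absolute difference) is an illustrative stand-in for the patent's preset similarity algorithm:

```python
def best_matching_window(standard_segment, candidate_segment):
    """Slide the shorter first (standard) limb element video segment over
    the longer second (to-be-matched) segment; score each window and keep
    the best one. Returns (start index of the matching index sequence,
    sequence matching value)."""
    w = len(standard_segment)
    assert len(candidate_segment) >= w, "second segment must be longer"
    best_score, best_start = float("-inf"), 0
    for start in range(len(candidate_segment) - w + 1):
        window = candidate_segment[start:start + w]
        # Illustrative similarity: negative mean absolute difference
        score = -sum(abs(a - b) for a, b in zip(standard_segment, window)) / w
        if score > best_score:  # keep the maximum sequence similarity value
            best_score, best_start = score, start
    return best_start, best_score
```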
5. The limb identification-based video matching method according to claim 4, wherein performing sliding matching of the first limb element video segment over the second limb element video segment to determine the sliding matching sequences in the second limb element video segment corresponding to each first limb element video segment comprises:
arranging each frame of image to be matched in the second limb element video segment on a preset video time axis in chronological order;
taking the first limb element video segment as a matching sliding window, aligning the matching sliding window with the starting frame of the second limb element video segment, and recording the covered first preset number of frames of the second limb element video segment as a first matching sequence;
moving the matching sliding window over the second limb element video segment by a preset frame number step in the direction away from the starting frame;
adding, to the first matching sequence, the preset frame number step of images to be matched that follow the last image to be matched in the first matching sequence, and deleting the same number of images to be matched from the starting frame of the first matching sequence, so as to obtain a second matching sequence;
detecting whether the number of frames of images to be matched remaining after the last image to be matched in the second matching sequence is greater than or equal to the preset frame number step;
and if the number of remaining frames is smaller than the preset frame number step, recording the first matching sequence and the second matching sequence as the sliding matching sequences.
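The window-advancing loop of claims 5 and 6 can be sketched as follows, with the segment represented as a list of frame labels; `window` and `step` correspond to the first preset number of frames and the preset frame number step (both values are illustrative, not fixed by the patent):

```python
def sliding_matching_sequences(second_segment, window, step):
    """Starting from the window aligned with the segment's starting frame,
    repeatedly advance by `step` frames, emitting each `window`-frame
    matching sequence, and stop once fewer than `step` frames remain past
    the current window (claims 5 and 6)."""
    sequences = []
    start = 0
    while start + window <= len(second_segment):
        sequences.append(second_segment[start:start + window])
        # Stop when fewer than `step` frames follow the current window
        if len(second_segment) - (start + window) < step:
            break
        start += step
    return sequences
```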
6. The limb identification-based video matching method according to claim 5, wherein, after detecting whether the number of frames of images to be matched remaining after the last image to be matched in the second matching sequence is greater than or equal to the preset frame number step, the method comprises:
if the number of remaining frames is greater than or equal to the preset frame number step, adding, to the second matching sequence, the preset frame number step of images to be matched that follow the last image to be matched in the second matching sequence, and deleting the same number of images to be matched from the starting frame of the second matching sequence, so as to obtain a third matching sequence;
detecting whether the number of frames of images to be matched remaining after the last image to be matched in the third matching sequence is greater than or equal to the preset frame number step;
and if the number of remaining frames is smaller than the preset frame number step, recording the first matching sequence, the second matching sequence and the third matching sequence as the sliding matching sequences.
7. The limb identification-based video matching method according to claim 1, wherein determining the video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence comprises:
acquiring the total number of matching index sequences;
and determining the video matching result through an averaging algorithm according to the sequence matching value corresponding to each matching index sequence and the total number of matching index sequences.
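The averaging step of this claim reduces to the mean of the per-key-frame sequence matching values, sketched below (the empty-input fallback of 0.0 is an assumption, not stated in the patent):

```python
def video_matching_result(sequence_matching_values):
    """Claim 7: the video matching result is the mean of the sequence
    matching values over the total number of matching index sequences."""
    total = len(sequence_matching_values)  # total number of index sequences
    return sum(sequence_matching_values) / total if total else 0.0
```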
8. A limb identification-based video matching device, comprising:
a standard video acquisition module, configured to acquire a video to be matched and a standard video corresponding to the video to be matched, wherein the video to be matched comprises at least one frame of image to be matched and the standard video comprises at least one frame of standard image;
a matching video judgment module, configured to input the video to be matched into a preset limb classification model to determine whether the video to be matched is a normal matching video;
a gesture recognition module, configured to, if the video to be matched is a normal matching video, input the standard video and the video to be matched into a preset gesture recognition model respectively, so as to obtain a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched;
a key frame identification module, configured to determine, according to each standard limb sequence, a key frame index sequence corresponding to the standard video by an extremum key frame identification method;
a sequence matching value determination module, configured to determine, by a maximum mean value video segment identification method, a plurality of matching index sequences that match the key frame index sequence in each limb sequence to be matched and a sequence matching value corresponding to each matching index sequence;
and a video matching result determination module, configured to determine a video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the limb identification-based video matching method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements the limb identification-based video matching method according to any one of claims 1 to 7.
CN202110473266.8A 2021-04-29 2021-04-29 Video matching method, device and equipment based on limb identification and storage medium Pending CN113128448A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110473266.8A CN113128448A (en) 2021-04-29 2021-04-29 Video matching method, device and equipment based on limb identification and storage medium


Publications (1)

Publication Number Publication Date
CN113128448A true CN113128448A (en) 2021-07-16

Family

ID=76780598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110473266.8A Pending CN113128448A (en) 2021-04-29 2021-04-29 Video matching method, device and equipment based on limb identification and storage medium

Country Status (1)

Country Link
CN (1) CN113128448A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170277989A1 (en) * 2013-02-06 2017-09-28 Alibaba Group Holding Limited Information processing method and system
CN107707839A (en) * 2017-09-11 2018-02-16 广东欧珀移动通信有限公司 Image processing method and device
US20180107281A1 (en) * 2008-04-24 2018-04-19 Oblong Industries, Inc. Operating environment with gestural control and multiple client devices, displays, and users
CN109035145A (en) * 2018-08-02 2018-12-18 广州市鑫广飞信息科技有限公司 Video frequency image self adaption joining method and device based on video frame match information
US10361802B1 (en) * 1999-02-01 2019-07-23 Blanding Hovenweep, Llc Adaptive pattern recognition based control system and method
CN110321754A (en) * 2018-03-28 2019-10-11 西安铭宇信息科技有限公司 A kind of human motion posture correcting method based on computer vision and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHU Zhaosong; ZHANG Shengwei; YANG Kai; HU Zuojin: "Design of a classification algorithm flow for a sign language video library based on hand-shape recognition", Industrial Innovation Research, no. 12 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926910A (en) * 2022-07-18 2022-08-19 科大讯飞(苏州)科技有限公司 Action matching method and related equipment thereof
CN116055684A (en) * 2023-01-18 2023-05-02 訸和文化科技(苏州)有限公司 Online physical education system based on picture monitoring
CN116055684B (en) * 2023-01-18 2023-12-12 广州乐体科技有限公司 Online physical education system based on picture monitoring

Similar Documents

Publication Publication Date Title
CN109389030B (en) Face characteristic point detection method and device, computer equipment and storage medium
CN108399367B (en) Hand motion recognition method and device, computer equipment and readable storage medium
CN110070030B (en) Image recognition and neural network model training method, device and system
CN109472213B (en) Palm print recognition method and device, computer equipment and storage medium
WO2020248581A1 (en) Graph data identification method and apparatus, computer device, and storage medium
CN108563782B (en) Commodity information format processing method and device, computer equipment and storage medium
CN109767261A (en) Products Show method, apparatus, computer equipment and storage medium
CN113128448A (en) Video matching method, device and equipment based on limb identification and storage medium
CN104635920A (en) Gesture recognition device and control method for the same
CN111310705A (en) Image recognition method and device, computer equipment and storage medium
CN110660078B (en) Object tracking method, device, computer equipment and storage medium
CN111178128B (en) Image recognition method, device, computer equipment and storage medium
CN110750523A (en) Data annotation method, system, computer equipment and storage medium
CN112446302A (en) Human body posture detection method and system, electronic equipment and storage medium
CN111008621A (en) Object tracking method and device, computer equipment and storage medium
CN113192175A (en) Model training method and device, computer equipment and readable storage medium
CN110457361B (en) Feature data acquisition method, device, computer equipment and storage medium
Kan et al. Self-constrained inference optimization on structural groups for human pose estimation
CN111523387A (en) Method and device for detecting hand key points and computer device
CN110659892A (en) Method and device for acquiring total price of article, computer equipment and storage medium
CN113420203A (en) Object recommendation method and device, electronic equipment and storage medium
CN112836682A (en) Method and device for identifying object in video, computer equipment and storage medium
CN110298684B (en) Vehicle type matching method and device, computer equipment and storage medium
CN115424001A (en) Scene similarity estimation method and device, computer equipment and storage medium
CN112749723A (en) Sample labeling method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination