CN112418153A - Image processing method, image processing device, electronic equipment and computer storage medium


Info

Publication number
CN112418153A
CN112418153A
Authority
CN
China
Prior art keywords
video data
image
frame
feature vector
coordinate information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011417021.5A
Other languages
Chinese (zh)
Inventor
吴天行
张研
吴玉东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Technology Development Co Ltd
Original Assignee
Shanghai Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Technology Development Co Ltd filed Critical Shanghai Sensetime Technology Development Co Ltd
Priority to CN202011417021.5A priority Critical patent/CN112418153A/en
Publication of CN112418153A publication Critical patent/CN112418153A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the disclosure provides an image processing method, an image processing device, an electronic device and a computer storage medium, wherein the method comprises the following steps: acquiring first video data and second video data; carrying out normalization processing on coordinates of human key points of each frame of image in the first video data to obtain normalized coordinate information of each frame of image in the first video data; normalizing the coordinates of the human key points of each frame of image in the second video data to obtain normalized coordinate information of each frame of image in the second video data; and determining the human body action similarity of the first video data and the second video data according to the normalized coordinate information of each frame of image in the first video data and the normalized coordinate information of each frame of image in the second video data.

Description

Image processing method, image processing device, electronic equipment and computer storage medium
Technical Field
The present disclosure relates to computer vision processing technologies, and in particular, to an image processing method and apparatus, an electronic device, and a computer storage medium.
Background
At present, human motion and posture recognition has become a popular research topic in many fields, and with the emergence of various human posture recognition algorithms it has made considerable progress. The measurement of human motion-posture similarity is widely applied in motion learning, fitness games, human-computer interaction, virtual reality, and the like; for example, in an online video teaching scenario, the comparison of motion-posture similarity is particularly important.
However, in the related art, the similarity of the human motions in videos is calculated on the basis of human joint angles, and a joint angle cannot intuitively and accurately reflect how standard a motion is; for example, a clockwise included angle and a counterclockwise included angle of the same magnitude correspond to different motions.
Disclosure of Invention
The embodiment of the disclosure is expected to provide a technical scheme of image processing, which can accurately obtain human motion similarity in different videos.
The embodiment of the present disclosure provides an image processing method, including:
acquiring first video data and second video data;
carrying out normalization processing on coordinates of human key points of each frame of image in the first video data to obtain normalized coordinate information of each frame of image in the first video data; normalizing the coordinates of the human key points of each frame of image in the second video data to obtain normalized coordinate information of each frame of image in the second video data;
and determining the human body action similarity of the first video data and the second video data according to the normalized coordinate information of each frame of image in the first video data and the normalized coordinate information of each frame of image in the second video data.
In some embodiments, the determining the similarity of the human body motions of the first video data and the second video data according to the normalized coordinate information of each frame of image in the first video data and the normalized coordinate information of each frame of image in the second video data includes:
merging the normalized coordinate information of each frame of image in the first video data to obtain the key point feature vector of each frame of image in the first video data; merging the normalized coordinate information of each frame of image in the second video data to obtain the key point feature vector of each frame of image in the second video data;
and determining the human body motion similarity of the first video data and the second video data according to the key point feature vector of each frame of image in the first video data and the key point feature vector of each frame of image in the second video data.
In some embodiments, the determining the similarity of human body actions of the first video data and the second video data according to the key point feature vector of each frame of image in the first video data and the key point feature vector of each frame of image in the second video data includes:
merging the key point feature vectors of each frame of image in the first video data to obtain a feature vector time sequence of the first video data;
merging the key point feature vectors of each frame of image in the second video data to obtain a feature vector time sequence of the second video data;
and determining the human body motion similarity of the first video data and the second video data according to the feature vector time sequence of the first video data and the feature vector time sequence of the second video data.
In some embodiments, the merging the feature vectors of the key points of each frame of image in the first video data to obtain the time series of feature vectors of the first video data includes:
normalizing the feature vectors of the key points of each frame of image in the first video data to obtain normalized feature vectors of each frame of image in the first video data, and merging the normalized feature vectors of each frame of image in the first video data to obtain a feature vector time sequence of the first video data;
the merging the feature vectors of the key points of each frame of image in the second video data to obtain the feature vector time sequence of the second video data includes:
and normalizing the feature vectors of the key points of the images in the second video data to obtain normalized feature vectors of the images in the second video data, and merging the normalized feature vectors of the images in the second video data to obtain a feature vector time sequence of the second video data.
In some embodiments, the determining the similarity of human body actions of the first video data and the second video data according to the feature vector time sequence of the first video data and the feature vector time sequence of the second video data includes:
determining the distance of the normalized feature vector of each corresponding frame image in the first video data and the second video data by adopting a dynamic time warping (DTW) method according to the feature vector time sequence of the first video data and the feature vector time sequence of the second video data;
and determining the human body motion similarity of the first video data and the second video data according to the distance of the normalized feature vectors of the corresponding frame images in the first video data and the second video data.
In some embodiments, the determining the human motion similarity of the first video data and the second video data according to the distance of the normalized feature vector of each corresponding frame image in the first video data and the second video data includes:
determining similarity scoring values of each corresponding frame image in the first video data and the second video data according to the distance of the normalized feature vector of each corresponding frame image in the first video data and the second video data;
and determining the human body action similarity of the first video data and the second video data according to the similarity scoring values of the corresponding frame images in the first video data and the second video data.
In some embodiments, the determining the human motion similarity of the first video data and the second video data according to the similarity score values of the corresponding frame images in the first video data and the second video data includes:
and determining the human body action similarity of the first video data and the second video data according to the average value of the similarity scoring values of the corresponding frame images in the first video data and the second video data.
In some embodiments, the obtaining the first video data and the second video data comprises:
acquiring first initial video data and second initial video data to be compared;
and preprocessing the first initial video data and the second initial video data to obtain the first video data and the second video data with the same frame number.
In some embodiments, the normalizing the coordinates of the human key points of each frame of image in the first video data to obtain normalized coordinate information of each frame of image in the first video data includes:
carrying out noise reduction smoothing processing on the coordinates of the human key points of each frame of image in the first video data, and carrying out normalization processing on the coordinates of the human key points after the noise reduction smoothing processing corresponding to the first video data to obtain normalized coordinate information of each frame of image in the first video data;
the normalizing the coordinates of the key points of the human body of each frame of image in the second video data to obtain normalized coordinate information of each frame of image in the second video data includes:
and performing noise reduction smoothing processing on the coordinates of the human key points of each frame of image in the second video data, and performing normalization processing on the coordinates of the human key points after the noise reduction smoothing processing corresponding to the second video data to obtain normalized coordinate information of each frame of image in the second video data.
An embodiment of the present disclosure further provides an image processing apparatus, including:
the acquisition module is used for acquiring first video data and second video data;
the first processing module is used for carrying out normalization processing on the coordinates of the human key points of each frame of image in the first video data to obtain normalized coordinate information of each frame of image in the first video data; normalizing the coordinates of the human key points of each frame of image in the second video data to obtain normalized coordinate information of each frame of image in the second video data;
and the second processing module is used for determining the human body action similarity of the first video data and the second video data according to the normalized coordinate information of each frame of image in the first video data and the normalized coordinate information of each frame of image in the second video data.
In some embodiments, the second processing module is configured to determine human motion similarity of the first video data and the second video data according to the normalized coordinate information of each frame of image in the first video data and the normalized coordinate information of each frame of image in the second video data, and includes:
merging the normalized coordinate information of each frame of image in the first video data to obtain the key point feature vector of each frame of image in the first video data; merging the normalized coordinate information of each frame of image in the second video data to obtain the key point feature vector of each frame of image in the second video data;
and determining the human body motion similarity of the first video data and the second video data according to the key point feature vector of each frame of image in the first video data and the key point feature vector of each frame of image in the second video data.
In some embodiments, the second processing module is configured to determine human motion similarity of the first video data and the second video data according to the key point feature vector of each frame of image in the first video data and the key point feature vector of each frame of image in the second video data, and includes:
merging the key point feature vectors of each frame of image in the first video data to obtain a feature vector time sequence of the first video data;
merging the key point feature vectors of each frame of image in the second video data to obtain a feature vector time sequence of the second video data;
and determining the human body motion similarity of the first video data and the second video data according to the feature vector time sequence of the first video data and the feature vector time sequence of the second video data.
In some embodiments, the second processing module is configured to combine the feature vectors of the key points of each frame of image in the first video data to obtain the feature vector time sequence of the first video data, and includes:
normalizing the feature vectors of the key points of each frame of image in the first video data to obtain normalized feature vectors of each frame of image in the first video data, and merging the normalized feature vectors of each frame of image in the first video data to obtain a feature vector time sequence of the first video data;
the second processing module is configured to merge the feature vectors of the key points of each frame of image in the second video data to obtain a feature vector time sequence of the second video data, and includes:
and normalizing the feature vectors of the key points of the images in the second video data to obtain normalized feature vectors of the images in the second video data, and merging the normalized feature vectors of the images in the second video data to obtain a feature vector time sequence of the second video data.
In some embodiments, the second processing module is configured to determine human motion similarity of the first video data and the second video data according to the feature vector time series of the first video data and the feature vector time series of the second video data, and includes:
determining the distance of the normalized feature vector of each corresponding frame image in the first video data and the second video data by adopting a DTW (dynamic time warping) method according to the feature vector time sequence of the first video data and the feature vector time sequence of the second video data;
and determining the human body motion similarity of the first video data and the second video data according to the distance of the normalized feature vectors of the corresponding frame images in the first video data and the second video data.
In some embodiments, the second processing module is configured to determine human motion similarity of the first video data and the second video data according to a distance between normalized feature vectors of corresponding frame images in the first video data and the second video data, and includes:
determining similarity scoring values of each corresponding frame image in the first video data and the second video data according to the distance of the normalized feature vector of each corresponding frame image in the first video data and the second video data;
and determining the human body action similarity of the first video data and the second video data according to the similarity scoring values of the corresponding frame images in the first video data and the second video data.
In some embodiments, the second processing module is configured to determine human motion similarity of the first video data and the second video data according to similarity score values of corresponding frame images in the first video data and the second video data, and includes:
and determining the human body action similarity of the first video data and the second video data according to the average value of the similarity scoring values of the corresponding frame images in the first video data and the second video data.
In some embodiments, the obtaining module is configured to obtain the first video data and the second video data, and includes:
acquiring first initial video data and second initial video data to be compared;
and preprocessing the first initial video data and the second initial video data to obtain the first video data and the second video data with the same frame number.
In some embodiments, the first processing module is configured to perform normalization processing on coordinates of human body key points of each frame of image in the first video data to obtain normalized coordinate information of each frame of image in the first video data, and includes:
carrying out noise reduction smoothing processing on the coordinates of the human key points of each frame of image in the first video data, and carrying out normalization processing on the coordinates of the human key points after the noise reduction smoothing processing corresponding to the first video data to obtain normalized coordinate information of each frame of image in the first video data;
the first processing module is configured to perform normalization processing on coordinates of human key points of each frame of image in the second video data to obtain normalized coordinate information of each frame of image in the second video data, and includes:
and performing noise reduction smoothing processing on the coordinates of the human key points of each frame of image in the second video data, and performing normalization processing on the coordinates of the human key points after the noise reduction smoothing processing corresponding to the second video data to obtain normalized coordinate information of each frame of image in the second video data.
The disclosed embodiments also provide an electronic device comprising a processor and a memory for storing a computer program capable of running on the processor; wherein
the processor is configured to run the computer program to perform any one of the image processing methods described above.
The disclosed embodiments also provide a computer storage medium having a computer program stored thereon, which when executed by a processor implements any of the image processing methods described above.
In the image processing method, the image processing device, the electronic equipment and the computer storage medium provided by the embodiment of the disclosure, first video data and second video data are acquired; carrying out normalization processing on coordinates of human key points of each frame of image in the first video data to obtain normalized coordinate information of each frame of image in the first video data; normalizing the coordinates of the human key points of each frame of image in the second video data to obtain normalized coordinate information of each frame of image in the second video data; the normalized coordinate information represents the normalized coordinates of the human body key points; and determining the human body action similarity of the first video data and the second video data according to the normalized coordinate information of each frame of image in the first video data and the normalized coordinate information of each frame of image in the second video data.
It can be seen that, in the embodiment of the present disclosure, the coordinates of the human body key points of each frame of image in the video data can be normalized, and then, the comparison of the human body motion similarity in the video data can be performed based on the normalized coordinate information of each frame of image in the video data; the normalization processing is carried out on the coordinates of the human key points of each frame of image in the video data, so that the influence caused by different factors such as background, human position and scale in the video can be eliminated, and further, the human action similarity of different video data can be visually and accurately compared.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of an application scenario according to an embodiment of the present disclosure;
Fig. 3A is a frame of image in the first initial video data according to an embodiment of the present disclosure;
Fig. 3B is a frame of image in the second initial video data according to an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of the composition structure of an image processing apparatus according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The present disclosure will be described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the examples provided herein are merely illustrative of the present disclosure and are not intended to limit the present disclosure. In addition, the embodiments provided below are some embodiments for implementing the disclosure, not all embodiments for implementing the disclosure, and the technical solutions described in the embodiments of the disclosure may be implemented in any combination without conflict.
It should be noted that, in the embodiments of the present disclosure, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a method or apparatus including a series of elements includes not only the explicitly recited elements but also other elements not explicitly listed or inherent to the method or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other elements (e.g., steps in a method, or units in a device such as part of a circuit, a processor, a program, or software) in the method or device that includes the element.
For example, the image processing method provided by the embodiment of the present disclosure includes a series of steps, but the image processing method provided by the embodiment of the present disclosure is not limited to the described steps, and similarly, the image processing apparatus provided by the embodiment of the present disclosure includes a series of modules, but the apparatus provided by the embodiment of the present disclosure is not limited to include the explicitly described modules, and may also include modules that are required to be configured to acquire related information or perform processing based on the information.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
The disclosed embodiments may be implemented in computer systems comprising terminals and/or servers and may be operational with numerous other general purpose or special purpose computing system environments or configurations. Here, the terminal may be a thin client, a thick client, a hand-held or laptop device, a microprocessor-based system, a set-top box, a programmable consumer electronics, a network personal computer, a small computer system, etc., and the server may be a server computer system, a small computer system, a mainframe computer system, a distributed cloud computing environment including any of the above, etc.
The electronic devices of the terminal, server, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
In the related art, owing to the limitations of shooting conditions, most videos are captured from a single angle, so the captured videos are planar two-dimensional video data, and the motions in them are unlikely to be completely synchronized with the standard motions; therefore, the human posture-motion similarity needs to be computed for video data captured from different angles.
The human motion similarity across different video data can be computed based on human joint angles and the DTW (dynamic time warping) method. In some embodiments, the human joint angles serve as the comparison standard to eliminate the interference of the video background and the influence of different human body sizes and positions; the joint angle sequences are then processed with an exponential smoothing method, each video data sequence (i.e., the frames of images arranged in temporal order) is segmented using angle differences, and finally the distances between the sequences of different video data are obtained with DTW.
The method for calculating the human body motion similarity has the following problems:
1) The human joint angle is used as the comparison standard; however, the joint angle cannot sufficiently and reliably reflect how standard a motion is, so the resulting human motion similarity is not accurate enough. Moreover, the DTW algorithm determines the distance based on the shortest path between the sequences of different video data, which may give low discrimination between the human motions of different video data; for example, two completely different human motions may be recognized as highly similar.
2) Each video data sequence is segmented using angle differences; however, since each person's range of motion differs, it is difficult to determine a general segmentation standard for different human bodies. This results in different numbers of frames for the two video data to be compared, which is not conducive to comparing the human motion similarity in the video data.
In view of the above technical problems, in some embodiments of the present disclosure, an image processing method is provided, and embodiments of the present disclosure may be applied to scenes such as human motion and gesture recognition.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the disclosure, and as shown in fig. 1, the flowchart may include:
step 101: first video data and second video data are acquired.
Here, the first video data and the second video data represent two video data that need to be compared; the first video data and the second video data each include a plurality of frames of human body images.
In some embodiments, the first video data or the second video data may be acquired by an image acquisition device such as a camera, or may be video data acquired from a network side or a local storage space; in other embodiments, the initial video data acquired by the image acquisition device may be acquired first, or the initial video data may be acquired from a network side, or the initial video data may be acquired from a local storage space, and then the initial video data may be preprocessed to obtain the first video data or the second video data.
In some embodiments, the number of frames of the first video data and the second video data may be the same or different, and the embodiments of the present disclosure are not limited thereto.
Step 102: carrying out normalization processing on coordinates of human key points of each frame of image in the first video data to obtain normalized coordinate information of each frame of image in the first video data; and carrying out normalization processing on the coordinates of the human key points of each frame of image in the second video data to obtain the normalized coordinate information of each frame of image in the second video data.
In the embodiment of the disclosure, the human key point of each frame of image is at least one human key point which is specified in advance; in practical application, at least one human body key point can be specified from all human body key points according to practical requirements.
In some embodiments, the human body key points of each frame of image and their sequence numbers can be expressed as: {0, "nose (Nose)"}, {1, "neck (Neck)"}, {2, "right shoulder (RShoulder)"}, {3, "right elbow (RElbow)"}, {4, "right wrist (RWrist)"}, {5, "left shoulder (LShoulder)"}, {6, "left elbow (LElbow)"}, {7, "left wrist (LWrist)"}, {8, "hip middle (MidHip)"}, {9, "right hip (RHip)"}, {10, "right knee (RKnee)"}, {11, "right ankle (RAnkle)"}, {12, "left hip (LHip)"}, {13, "left knee (LKnee)"}, {14, "left ankle (LAnkle)"}, where 0 to 14 represent the sequence numbers of the respective human body key points.
In some embodiments, the human body key points and sequence numbers can further be expressed as: {15, "right eye (REye)"}, {16, "left eye (LEye)"}, {17, "right ear (REar)"}, {18, "left ear (LEar)"}, {19, "left big toe (LBigToe)"}, {20, "left small toe (LSmallToe)"}, {21, "left heel (LHeel)"}, {22, "right big toe (RBigToe)"}, {23, "right small toe (RSmallToe)"}, {24, "right heel (RHeel)"}, where 15 to 24 represent the sequence numbers of the respective human body key points.
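For illustration, the key point indices listed above can be collected into a single lookup table. The following minimal Python sketch simply transcribes the list; note that this layout matches the OpenPose BODY_25 convention, which is an assumption rather than something stated in the description:

```python
# Human body key point indices as listed above. The assumption that this
# is the OpenPose BODY_25 layout is based only on the names and ordering.
KEYPOINT_NAMES = {
    0: "Nose", 1: "Neck", 2: "RShoulder", 3: "RElbow", 4: "RWrist",
    5: "LShoulder", 6: "LElbow", 7: "LWrist", 8: "MidHip",
    9: "RHip", 10: "RKnee", 11: "RAnkle", 12: "LHip", 13: "LKnee",
    14: "LAnkle", 15: "REye", 16: "LEye", 17: "REar", 18: "LEar",
    19: "LBigToe", 20: "LSmallToe", 21: "LHeel",
    22: "RBigToe", 23: "RSmallToe", 24: "RHeel",
}
```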
In the embodiment of the disclosure, after the first video data and the second video data are obtained, each frame of image in the first video data and the second video data can be processed by adopting a human key point identification algorithm to obtain coordinates of human key points of each frame of image in the first video data and coordinates of human key points of each frame of image in the second video data; here, the human body key point recognition algorithm may be implemented using a neural network.
In some embodiments, after obtaining the coordinates of the key points of the human body of each frame of image in the first video data and the second video data, the noise reduction smoothing processing may be performed on the coordinates of the key points of the human body of each frame of image in the first video data and the second video data, and the normalization processing may be performed on the coordinates of the key points of the human body after the noise reduction smoothing processing corresponding to the first video data, so as to obtain normalized coordinate information of each frame of image in the first video data; and normalizing the coordinates of the key points of the human body after the noise reduction smoothing processing corresponding to the second video data to obtain normalized coordinate information of each frame of image in the second video data.
In the embodiment of the present disclosure, the principle of performing noise reduction smoothing processing on the coordinates of the human body key points of each frame of image in the video data is as follows: correcting the coordinates of the human body key points according to the coordinates of the pixel points in the neighborhood of the human body key points, for example, taking the average value of the coordinates of the pixel points in the neighborhood as the coordinates of the human body key points after noise reduction and smoothing treatment; it can be understood that, by performing noise reduction smoothing processing on the coordinates of the human body key points of each frame of image in the video data, inaccurate coordinates (i.e., noise data) of the human body key points can be corrected to a certain extent, so as to achieve the purpose of noise reduction. Illustratively, the method of the noise reduction smoothing process includes, but is not limited to, exponential smoothing, mean filtering, kalman filtering, and the like.
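As a concrete illustration of one of the listed options, a minimal exponential-smoothing sketch over a per-frame key point coordinate sequence is shown below; the array shape and the smoothing factor alpha are assumptions, not values fixed by the disclosure:

```python
import numpy as np

def exponential_smooth(coords, alpha=0.5):
    """Exponentially smooth key point coordinates over time.

    coords: array of shape (num_frames, num_keypoints, 2) holding the
    (x, y) coordinates of each key point in each frame.
    alpha: smoothing factor in (0, 1]; 0.5 is a hypothetical default.
    """
    smoothed = np.empty_like(coords, dtype=float)
    smoothed[0] = coords[0]
    for t in range(1, len(coords)):
        # Blend the current observation with the previous smoothed value,
        # which damps jitter (noise) in the detected key point positions.
        smoothed[t] = alpha * coords[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed
```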
In some embodiments, the normalized coordinate information represents normalized coordinates of the human body key points, and for example, the coordinates of the human body key points of each frame of image in the first video data or the second video data may be normalized based on the following formulas (1) and (2):
$$x_i' = \frac{x_i - \min\{x_i\}}{\max\{x_i\} - \min\{x_i\}} \qquad (1)$$

$$y_i' = \frac{y_i - \min\{y_i\}}{\max\{y_i\} - \min\{y_i\}} \qquad (2)$$

wherein $x_i$ denotes the abscissa of the $i$-th human body key point of each frame of image in the first video data or the second video data, and $x_i'$ denotes the abscissa of the $i$-th human body key point after normalization. When $x_i$ denotes the abscissa of the $i$-th human body key point of each frame of image in the first video data, $\min\{x_i\}$ and $\max\{x_i\}$ denote the minimum and maximum values of the abscissas of the human body key points in the first video data; when $x_i$ denotes the abscissa of the $i$-th human body key point of each frame of image in the second video data, $\min\{x_i\}$ and $\max\{x_i\}$ denote the corresponding minimum and maximum values in the second video data. Likewise, $y_i$ denotes the ordinate of the $i$-th human body key point of each frame of image in the first video data or the second video data, and $y_i'$ denotes the ordinate of the $i$-th human body key point after normalization; $\min\{y_i\}$ and $\max\{y_i\}$ denote the minimum and maximum values of the ordinates of the human body key points in the corresponding video data. Here, $i$ is an integer greater than or equal to 1.
It can be understood that, based on the above formula (1) and formula (2), the coordinates of the human body key points of each frame of image in the first video data or the second video data may be subjected to translation and scaling processing, so that the ranges of the abscissa and the ordinate of each human body key point after normalization processing are unified to [0,1 ].
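A minimal sketch of this normalization, under the assumption (one reading of equations (1) and (2)) that the minima and maxima are taken per axis over all human body key points of the whole video:

```python
import numpy as np

def normalize_keypoints(coords):
    """Min-max normalize key point coordinates into [0, 1], per (1) and (2).

    coords: array of shape (num_frames, num_keypoints, 2). Assumes the
    coordinates are not all identical along an axis (non-zero range).
    """
    flat = coords.reshape(-1, 2)
    mins = flat.min(axis=0)   # [min{x_i}, min{y_i}]
    maxs = flat.max(axis=0)   # [max{x_i}, max{y_i}]
    # Translate by the minima and scale by the ranges, per axis.
    return (coords - mins) / (maxs - mins)
```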
Step 103: and determining the human body action similarity of the first video data and the second video data according to the normalized coordinate information of each frame of image in the first video data and the normalized coordinate information of each frame of image in the second video data.
Fig. 2 is a schematic view of an application scenario of the embodiment of the disclosure, as shown in fig. 2, first video data 201 and second video data 202 may be input to an image processing apparatus 203, and the image processing apparatus 203 may perform processing by the image processing method described in the foregoing embodiment, so as to obtain human motion similarity of the first video data and the second video data.
In practical applications, the steps 101 to 103 may be implemented by a Processor in an electronic Device, where the Processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor.
It can be seen that, in the embodiment of the present disclosure, the coordinates of the human body key points of each frame of image in the video data can be normalized, and then, the comparison of the human body motion similarity in the video data can be performed based on the normalized coordinate information of each frame of image in the video data; the normalization processing is carried out on the coordinates of the human key points of each frame of image in the video data, so that the influence caused by different factors such as background, human position and scale in the video can be eliminated, and further, the human action similarity of different video data can be visually and accurately compared.
In some embodiments, the obtaining of the first video data and the second video data may be implemented by obtaining first initial video data and second initial video data to be compared; and preprocessing the first initial video data and the second initial video data to obtain the first video data and the second video data with the same frame number.
In one embodiment, the first initial video data is video data acquired from an image acquisition device, a network terminal or a local storage space, and the second initial video data is video data which is acquired in advance and contains standard actions of a human body; for example, the first initial video data is used to represent the tai chi boxing action to be compared, and the second initial video data represents the standard tai chi boxing action.
In an embodiment, the first initial video data and the second initial video data may be preprocessed by performing frame dropping on the first initial video data and the second initial video data to obtain the first video data and the second video data with the same number of frames.
In the embodiment of the disclosure, the first video data and the second video data with the same number of frames can be obtained by preprocessing the first initial video data and the second initial video data; obtaining two videos with the same number of frames facilitates comparing their human body motion similarity.
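One possible form of this preprocessing is sketched below; the even-subsampling rule used to drop frames is an assumption, since the description does not fix the exact rule:

```python
import numpy as np

def equalize_frame_counts(frames_a, frames_b):
    """Drop frames from the longer video so both have the same frame count."""
    target = min(len(frames_a), len(frames_b))

    def subsample(frames):
        if len(frames) == target:
            return list(frames)
        # Evenly spaced indices from the first to the last frame.
        idx = np.linspace(0, len(frames) - 1, target).round().astype(int)
        return [frames[i] for i in idx]

    return subsample(frames_a), subsample(frames_b)
```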
In the embodiment of the present disclosure, as for the implementation manner of step 103, exemplarily, normalized coordinate information of each frame of image in the first video data may be merged to obtain a feature vector of a key point of each frame of image in the first video data; and combining the normalized coordinate information of each frame of image in the second video data to obtain the key point feature vector of each frame of image in the second video data.
The human body motion similarity of the first video data and the second video data can then be determined according to the key point feature vector of each frame of image in the first video data and the key point feature vector of each frame of image in the second video data.
In some embodiments, after obtaining the normalized coordinate information of each frame of image in the first video data and the second video data, the keypoint feature vector of each frame of image in the first video data or the second video data may be obtained according to formula (3):
$$v = (x_1', y_1', \ldots, x_n', y_n') \qquad (3)$$

where $v$ denotes the key point feature vector of each frame of image in the first video data or the second video data, $x_1'$ to $x_n'$ denote the normalized abscissas of the 1st to $n$-th human body key points of each frame of image, $y_1'$ to $y_n'$ denote the corresponding normalized ordinates, and $n$ denotes the number of pre-designated human body key points.
In the embodiment of the present disclosure, the normalized coordinate information of different frame images can be combined in the same order, so that in the resulting key point feature vectors of different frame images, elements at the same position have the same meaning; for example, in the key point feature vectors of different frame images, the 3rd element always represents a coordinate of the right elbow key point of the human body.
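Concretely, building the per-frame key point feature vector of formula (3) amounts to flattening the normalized coordinates in the fixed key point order; a short sketch:

```python
import numpy as np

def frame_feature_vector(norm_frame):
    """Flatten one frame's normalized key points into v = (x1', y1', ..., xn', yn').

    norm_frame: array of shape (num_keypoints, 2) in the pre-designated key
    point order, so a given vector position means the same thing in every frame.
    """
    return np.asarray(norm_frame, dtype=float).reshape(-1)
```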
For determining an implementation manner of human motion similarity of first video data and second video data according to a key point feature vector of each frame of image in the first video data and a key point feature vector of each frame of image in the second video data, in one example, the key point feature vectors of each frame of image in the first video data may be merged to obtain a feature vector time sequence of the first video data; merging the key point feature vectors of each frame of image in the second video data to obtain a feature vector time sequence of the second video data; and determining the human body motion similarity of the first video data and the second video data according to the feature vector time sequence of the first video data and the feature vector time sequence of the second video data.
Here, the feature vector time series of the first video data represents a plurality of key point feature vectors arranged in time series; the feature vector time series of the second video data represents a plurality of keypoint feature vectors arranged in a temporal order.
In the embodiment of the disclosure, the first video data and the second video data both include multi-frame data, and after the key point feature vectors of each frame of image in the first video data or the second video data are obtained, the key point feature vectors of each frame of image in the first video data or the second video data can be spliced in a time dimension to obtain a feature vector time sequence of the first video data or a feature vector time sequence of the second video data; in some embodiments, the feature vector time-series of the first video data or the feature vector time-series of the second video data may be a matrix.
It can be understood that obtaining the feature vector time sequence of the first video data and that of the second video data facilitates aligning the two in the time dimension, i.e., aligning the actions at different time points in the two video data, which in turn helps to accurately obtain the human body motion similarity of the first video data and the second video data.
For the implementation manner in which the feature vectors of the key points of each frame of image in the first video data are merged to obtain the feature vector time sequence of the first video data, for example, the feature vectors of the key points of each frame of image in the first video data may be normalized to obtain the normalized feature vectors of each frame of image in the first video data, and the normalized feature vectors of each frame of image in the first video data are merged to obtain the feature vector time sequence of the first video data.
For the implementation manner in which the feature vectors of the key points of each frame of image in the second video data are merged to obtain the feature vector time sequence of the second video data, for example, the feature vectors of the key points of each frame of image in the second video data may be normalized to obtain the normalized feature vectors of each frame of image in the second video data, and the normalized feature vectors of each frame of image in the second video data are merged to obtain the feature vector time sequence of the second video data.
In some embodiments, the key point feature vectors of each frame of image in the first video data or the second video data may be subjected to L2 normalization according to the following formula (4):

$$v' = \frac{v}{\|v\|_2} \qquad (4)$$

where $v'$ denotes the normalized feature vector of each frame of image in the first video data or the second video data.
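A one-line sketch of this L2 normalization; the small epsilon guarding against an all-zero vector is an added assumption:

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Return v' = v / ||v||_2, per formula (4)."""
    return v / (np.linalg.norm(v) + eps)  # eps avoids division by zero
```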
In some embodiments of the present application, after obtaining the feature vector time sequence of the first video data and the feature vector time sequence of the second video data, the distance between the normalized feature vectors of each corresponding frame image in the first video data and the second video data may be determined according to the feature vector time sequence of the first video data and the feature vector time sequence of the second video data by using a DTW method; and then, determining the human body motion similarity of the first video data and the second video data according to the distance of the normalized feature vectors of the corresponding frame images in the first video data and the second video data.
In some embodiments, each corresponding frame image in the first video data and the second video data represents a pair of images with the same frame sequence number in the two videos. When the numbers of frames of the first video data and the second video data are the same, the distances of the normalized feature vectors of images with the same frame number can be compared directly, where the frame number indicates the position of a frame in the video data; for example, if the first video data and the second video data both include m frames of images, the frame number of the j-th frame in each is j, where j = 1, ..., m and m is an integer greater than 1.
In other embodiments, when the numbers of frames of the first video data and the second video data differ, the video data with the smaller number of frames may be linearly mapped, based on the DTW method, using a linear expansion approach, so that its number of frames after mapping reaches a target number of frames, where the target number of frames is the larger of the two frame counts. Once the numbers of frames of the two video data are the same, the steps described in the foregoing embodiments are performed on them: coordinate normalization, merging of the normalized coordinate information, normalization of the key point feature vectors, merging of the normalized feature vectors, and calculation of the distances between the normalized feature vectors, thereby determining the distances of the normalized feature vectors of each corresponding frame image in the two video data with the same number of frames. Here, the distances of the normalized feature vectors of each corresponding frame image in the two video data with the same number of frames are the distances of the normalized feature vectors of each corresponding frame image in the first video data and the second video data.
In some embodiments, the distance between the normalized feature vectors of each corresponding frame image in the first video data and the second video data is the cosine similarity d.
It can be seen that the DTW method can dynamically compare the motions of two video data, that is, the normalized feature vectors of each corresponding frame image are compared in the context of the whole feature vector time sequence; compared with a static frame-by-frame comparison by frame number, this helps to accurately obtain the human motion similarity of different video data. In addition, the embodiments of the present disclosure do not need to segment each video data sequence by angle differences; instead, a feature vector time sequence that more accurately distinguishes the similarity of different human bodies is obtained from the normalized coordinate information of each frame of image in the video data, and the DTW method is then used to compare the normalized feature vectors of each corresponding frame image in different video data, which facilitates an accurate comparison of the human motion similarity in the video data.
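To make the alignment step concrete, below is a textbook DTW sketch over two feature vector time sequences. It uses 1 minus the cosine similarity as the per-pair step cost; the exact cost function and path constraints of the disclosure are not specified, so treat this as an assumed instantiation:

```python
import numpy as np

def dtw_align(seq_a, seq_b):
    """Align two feature vector time sequences with dynamic time warping.

    seq_a, seq_b: arrays of shape (frames, dim) of L2-normalized feature
    vectors. Returns the aligned (i, j) frame pairs and the cosine
    similarity d of each aligned pair.
    """
    n, m = len(seq_a), len(seq_b)
    sim = seq_a @ seq_b.T          # cosine similarities (unit-norm vectors)
    step_cost = 1.0 - sim          # turn similarity into a step cost
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = step_cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack the cheapest accumulated path to recover frame pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        move = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if move == 0:
            i, j = i - 1, j - 1
        elif move == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path, [float(sim[i, j]) for i, j in path]
```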
For an implementation of determining human motion similarity of the first video data and the second video data according to the distance of the normalized feature vector of each corresponding frame image in the first video data and the second video data, for example, the similarity score value of each corresponding frame image in the first video data and the second video data may be determined according to the distance of the normalized feature vector of each corresponding frame image in the first video data and the second video data; and determining the human body action similarity of the first video data and the second video data according to the similarity grade values of the corresponding frame images in the first video data and the second video data.
In some embodiments, the human motion similarity of the first video data and the second video data may be determined according to an average value of the similarity score values of each corresponding frame image in the first video data and the second video data.
In some embodiments, after the distances of the normalized feature vectors of each corresponding frame image in the first video data and the second video data are obtained, the similarity score values of each corresponding frame image may be derived from those distances according to a preset scoring mode.
In some embodiments, the similarity score values of the respective corresponding frame images in the first video data and the second video data, derived on the basis of the preset scoring mode, lie in the range [0, 100].
In some embodiments, the predetermined scoring manner may be described according to the following equation (5).
$$s = 100 \times d \qquad (5)$$
Wherein $s$ denotes the similarity score value of each corresponding frame image in the first video data and the second video data; it can be seen that, based on equation (5), the distance of the normalized feature vectors of each corresponding frame image in the first video data and the second video data can be mapped to a score value from 0 to 100, i.e., a static pose score is derived.
It should be noted that the above-mentioned contents are merely exemplary illustrations of preset scoring manners, and the embodiments of the present disclosure are not limited thereto.
In some embodiments, the average value of the similarity score values of each corresponding frame image in the first video data and the second video data may be denoted as s_mean, and the human body motion similarity of the first video data and the second video data may be determined as s_mean; in this way, the average value s_mean visually reflects the similarity of the human body actions in the first video data and the second video data.
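Putting the scoring step together: the sketch below maps each aligned pair's similarity d to a score and averages the scores into s_mean; the mapping s = 100 × d is the assumed reading of equation (5) noted above, since the original formula is available only as an image:

```python
import numpy as np

def motion_similarity(pair_similarities):
    """Average per-frame-pair scores into the overall similarity s_mean.

    pair_similarities: cosine similarities d of each corresponding
    (DTW-aligned) frame pair; each d in [0, 1] is mapped to a score
    s = 100 * d, matching the stated [0, 100] score range (an assumption).
    """
    scores = [100.0 * float(d) for d in pair_similarities]
    return float(np.mean(scores))  # s_mean
```

For example, feeding in the pair similarities returned by dtw_align above yields s_mean directly.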
Fig. 3A is a frame of image in the first video data according to an embodiment of the disclosure, and Fig. 3B is a frame of image in the second video data according to an embodiment of the disclosure. Figs. 3A and 3B show different human body images: Fig. 3A shows a tai chi action to be compared, and Fig. 3B shows the standard tai chi action. In one example, the average value of the similarity score values of the corresponding frame images in the first video data and the second video data is 95.6, and the similarity score value of the pair of corresponding frames shown in Figs. 3A and 3B is 95.6.
The embodiment of the disclosure can be applied to scenes such as human body action comparison and the like; in some embodiments, a user may record a motion of the user into a video, and input the video together with a standard video, and then based on the image processing method of the embodiments of the present disclosure, a similarity between the motion of the user and the motion in the standard video may be obtained, and a similarity score value of the motion of the user and the motion in the standard video in each corresponding frame may be determined, so that a portion of the motion of the user that is not in accordance with the motion of the standard video may be obtained.
In some embodiments, a user may capture his or her own motion with a camera or similar device, compare it in real time with the motion in the standard video, determine the similarity score value of the user's motion against the standard video's motion for each corresponding frame, and display these per-frame similarity score values on a screen in real time.
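As a sketch of this real-time use, the following assumes OpenCV for capture and display, with a caller-supplied scoring function standing in for the keypoint-extraction and per-frame comparison pipeline described above; all names here are illustrative.

import cv2

def run_realtime(score_fn):
    # score_fn(frame_index, frame) -> similarity score in [0, 100];
    # stands in for keypoint extraction plus per-frame comparison.
    cap = cv2.VideoCapture(0)  # default camera
    idx = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        score = score_fn(idx, frame)
        cv2.putText(frame, "score: %.1f" % score, (20, 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        cv2.imshow("motion comparison", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break
        idx += 1
    cap.release()
    cv2.destroyAllWindows()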
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible inherent logic.
On the basis of the image processing method proposed by the foregoing embodiment, an embodiment of the present disclosure proposes an image processing apparatus.
Fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the disclosure; as shown in fig. 4, the apparatus may include:
an obtaining module 400, configured to obtain first video data and second video data;
a first processing module 401, configured to perform normalization processing on coordinates of human key points of each frame of image in the first video data to obtain normalized coordinate information of each frame of image in the first video data; normalizing the coordinates of the human key points of each frame of image in the second video data to obtain normalized coordinate information of each frame of image in the second video data;
a second processing module 402, configured to determine human motion similarity between the first video data and the second video data according to the normalized coordinate information of each frame of image in the first video data and the normalized coordinate information of each frame of image in the second video data.
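As a minimal sketch of how these three modules might compose (the module names follow Fig. 4; the callable signatures are illustrative assumptions, not the patent's interfaces):

class ImageProcessor:
    # Sketch of the apparatus of Fig. 4: acquisition, per-video
    # normalization, and cross-video similarity computation.
    def __init__(self, obtain, normalize, compare):
        self.obtain = obtain        # obtaining module 400
        self.normalize = normalize  # first processing module 401
        self.compare = compare      # second processing module 402

    def run(self, source_a, source_b):
        video_a, video_b = self.obtain(source_a, source_b)
        coords_a = [self.normalize(frame) for frame in video_a]
        coords_b = [self.normalize(frame) for frame in video_b]
        return self.compare(coords_a, coords_b)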
In some embodiments, the second processing module 402, configured to determine the human motion similarity of the first video data and the second video data according to the normalized coordinate information of each frame of image in the first video data and the normalized coordinate information of each frame of image in the second video data, includes:
merging the normalized coordinate information of each frame of image in the first video data to obtain the key point feature vector of each frame of image in the first video data; merging the normalized coordinate information of each frame of image in the second video data to obtain the key point feature vector of each frame of image in the second video data;
and determining the human body motion similarity of the first video data and the second video data according to the key point feature vector of each frame of image in the first video data and the key point feature vector of each frame of image in the second video data.
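A minimal numpy sketch of this merging step, assuming each frame yields K keypoints with normalized (x, y) coordinates (the concatenation order is an assumption):

import numpy as np

def keypoint_feature_vector(normalized_coords):
    # Flatten K normalized (x, y) pairs of one frame into a single
    # 2K-dimensional key point feature vector.
    coords = np.asarray(normalized_coords, dtype=np.float64)  # shape (K, 2)
    return coords.reshape(-1)                                 # shape (2K,)

# Example: 3 keypoints -> 6-dimensional feature vector
print(keypoint_feature_vector([[0.1, 0.2], [0.5, 0.5], [0.9, 0.8]]))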
In some embodiments, the second processing module 402 is configured to combine the feature vectors of the key points of each frame of image in the first video data to obtain the feature vector time sequence of the first video data, and includes:
normalizing the feature vectors of the key points of each frame of image in the first video data to obtain normalized feature vectors of each frame of image in the first video data, and merging the normalized feature vectors of each frame of image in the first video data to obtain a feature vector time sequence of the first video data;
the second processing module 402 is configured to merge the feature vectors of the key points of each frame of image in the second video data to obtain a feature vector time sequence of the second video data, and includes:
and normalizing the feature vectors of the key points of the images in the second video data to obtain normalized feature vectors of the images in the second video data, and merging the normalized feature vectors of the images in the second video data to obtain a feature vector time sequence of the second video data.
In some embodiments, the second processing module 402 is configured to determine human motion similarity of the first video data and the second video data according to the key point feature vector of each frame of image in the first video data and the key point feature vector of each frame of image in the second video data, and includes:
normalizing the feature vectors of the key points of each frame of image in the first video data to obtain normalized feature vectors of each frame of image in the first video data, and merging the normalized feature vectors of each frame of image in the first video data to obtain a feature vector time sequence of the first video data;
normalizing the feature vectors of the key points of each frame of image in the second video data to obtain normalized feature vectors of each frame of image in the second video data, and merging the normalized feature vectors of each frame of image in the second video data to obtain a feature vector time sequence of the second video data;
and determining the human body motion similarity of the first video data and the second video data according to the feature vector time sequence of the first video data and the feature vector time sequence of the second video data.
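A minimal sketch of these two steps, assuming L2 normalization (the patent states only that the vectors are normalized, not which norm is used):

import numpy as np

def normalized_feature_vector(vector, eps=1e-12):
    # L2-normalize one frame's key point feature vector (norm assumed).
    vector = np.asarray(vector, dtype=np.float64)
    return vector / (np.linalg.norm(vector) + eps)

def feature_vector_time_series(per_frame_vectors):
    # Stack per-frame normalized vectors into a (num_frames, dim) series.
    return np.stack([normalized_feature_vector(v) for v in per_frame_vectors])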
In some embodiments, the second processing module 402, configured to determine human motion similarity of the first video data and the second video data according to the feature vector time series of the first video data and the feature vector time series of the second video data, includes:
determining the distance of the normalized feature vector of each corresponding frame image in the first video data and the second video data by adopting a dynamic time warping (DTW) method according to the feature vector time sequence of the first video data and the feature vector time sequence of the second video data;
and determining the human body motion similarity of the first video data and the second video data according to the distance of the normalized feature vectors of the corresponding frame images in the first video data and the second video data.
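A textbook DTW sketch over two feature-vector time series follows; it returns the per-pair distances along the optimal warping path, i.e., the distances of the normalized feature vectors of corresponding frame images. This is the classic O(n*m) algorithm, not necessarily the exact variant used in the patent.

import numpy as np

def dtw_frame_distances(series_a, series_b):
    # Classic dynamic time warping with Euclidean frame distances.
    # series_a, series_b: arrays of shape (num_frames, dim).
    n, m = len(series_a), len(series_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(series_a[i - 1] - series_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    # Backtrack from (n, m) to recover the optimal warping path.
    path, i, j = [], n, m
    while i > 0 or j > 0:
        path.append((i - 1, j - 1))
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((cost[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
    path.reverse()
    # Distance of the normalized feature vectors for each aligned pair.
    return [float(np.linalg.norm(series_a[a] - series_b[b])) for a, b in path]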
In some embodiments, the second processing module 402 is configured to determine human motion similarity of the first video data and the second video data according to a distance between normalized feature vectors of corresponding frame images in the first video data and the second video data, and includes:
determining similarity scoring values of each corresponding frame image in the first video data and the second video data according to the distance of the normalized feature vector of each corresponding frame image in the first video data and the second video data;
and determining the human body action similarity of the first video data and the second video data according to the similarity scoring values of the corresponding frame images in the first video data and the second video data.
In some embodiments, the second processing module 402 is configured to determine human motion similarity of the first video data and the second video data according to similarity score values of corresponding frame images in the first video data and the second video data, and includes:
and determining the human body action similarity of the first video data and the second video data according to the average value of the similarity scoring values of the corresponding frame images in the first video data and the second video data.
In some embodiments, the obtaining module 400 is configured to obtain the first video data and the second video data, and includes:
acquiring first initial video data and second initial video data to be compared;
and preprocessing the first initial video data and the second initial video data to obtain the first video data and the second video data with the same frame number.
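A minimal sketch of one such preprocessing, assuming uniform resampling of the longer sequence so both videos end up with the same frame count (the patent leaves the specific preprocessing open):

import numpy as np

def equalize_frame_counts(frames_a, frames_b):
    # Uniformly subsample the longer video to the shorter one's length.
    target = min(len(frames_a), len(frames_b))

    def subsample(frames):
        idx = np.linspace(0, len(frames) - 1, num=target).round().astype(int)
        return [frames[i] for i in idx]

    return subsample(frames_a), subsample(frames_b)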
In some embodiments, the first processing module 401 is configured to perform normalization processing on coordinates of a human body key point of each frame of image in the first video data to obtain normalized coordinate information of each frame of image in the first video data, and includes:
carrying out noise reduction smoothing processing on the coordinates of the human key points of each frame of image in the first video data, and carrying out normalization processing on the coordinates of the human key points after the noise reduction smoothing processing corresponding to the first video data to obtain normalized coordinate information of each frame of image in the first video data;
the first processing module 401 is configured to perform normalization processing on coordinates of key points of a human body of each frame of image in the second video data to obtain normalized coordinate information of each frame of image in the second video data, and includes:
and performing noise reduction smoothing processing on the coordinates of the human key points of each frame of image in the second video data, and performing normalization processing on the coordinates of the human key points after the noise reduction smoothing processing corresponding to the second video data to obtain normalized coordinate information of each frame of image in the second video data.
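A minimal sketch of the noise-reduction smoothing, assuming a simple temporal moving-average filter over each keypoint trajectory (the patent does not commit to a particular filter):

import numpy as np

def smooth_keypoints(coords, window=5):
    # coords: (num_frames, K, 2) raw (x, y) keypoint coordinates.
    # Applies a moving average along the time axis, per keypoint and axis.
    # Note: mode="same" zero-pads at the sequence edges.
    coords = np.asarray(coords, dtype=np.float64)
    kernel = np.ones(window) / window
    out = np.empty_like(coords)
    for k in range(coords.shape[1]):
        for axis in range(2):
            out[:, k, axis] = np.convolve(coords[:, k, axis], kernel, mode="same")
    return out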
In practical applications, the obtaining module 400, the first processing module 401, and the second processing module 402 may all be implemented by a processor in an electronic device, where the processor may be at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a CPU, a controller, a microcontroller, and a microprocessor.
In addition, the functional modules in this embodiment may be integrated into one processing unit, each module may exist physically on its own, or two or more modules may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional module.
Based on this understanding, the technical solution of this embodiment, in essence or in the part contributing over the prior art, or in whole or in part, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of this embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Specifically, the computer program instructions corresponding to the image processing method of this embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive; when the computer program instructions corresponding to the image processing method in the storage medium are read and executed by an electronic device, any one of the image processing methods of the foregoing embodiments is implemented.
Based on the same technical concept as the foregoing embodiments, and referring to fig. 5, an embodiment of the present disclosure provides an electronic device 5, which may include a memory 501 and a processor 502, wherein:
the memory 501 is used for storing computer programs and data;
the processor 502 is configured to execute the computer program stored in the memory to implement any one of the image processing methods of the foregoing embodiments.
In practical applications, the memory 501 may be a volatile memory such as a RAM; a non-volatile memory such as a ROM, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the above types of memories. It provides instructions and data to the processor 502.
The processor 502 may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor. It is understood that other electronic components may also implement the above processor functions; the embodiments of the present disclosure are not particularly limited in this regard.
In some embodiments, the functions of, or the modules included in, the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments; for their specific implementation, reference may be made to the description of the above method embodiments, which is not repeated here for brevity.
The foregoing description of the various embodiments tends to emphasize the differences between them; for their common or similar aspects, the embodiments may be referred to one another. These are not repeated here for brevity.
The methods disclosed in the method embodiments provided by the present application can be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in various product embodiments provided by the application can be combined arbitrarily to obtain new product embodiments without conflict.
The features disclosed in the various method or apparatus embodiments provided herein may be combined in any combination to arrive at new method or apparatus embodiments without conflict.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present disclosure.
While the present disclosure has been described with reference to the embodiments shown in the drawings, it is not limited to those embodiments, which are illustrative rather than restrictive; it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the disclosure as defined in the appended claims.

Claims (12)

1. An image processing method, characterized in that the method comprises:
acquiring first video data and second video data;
carrying out normalization processing on coordinates of human key points of each frame of image in the first video data to obtain normalized coordinate information of each frame of image in the first video data; normalizing the coordinates of the human key points of each frame of image in the second video data to obtain normalized coordinate information of each frame of image in the second video data;
and determining the human body action similarity of the first video data and the second video data according to the normalized coordinate information of each frame of image in the first video data and the normalized coordinate information of each frame of image in the second video data.
2. The method of claim 1, wherein determining the similarity of the human body actions of the first video data and the second video data according to the normalized coordinate information of each frame of image in the first video data and the normalized coordinate information of each frame of image in the second video data comprises:
merging the normalized coordinate information of each frame of image in the first video data to obtain the key point feature vector of each frame of image in the first video data; merging the normalized coordinate information of each frame of image in the second video data to obtain the key point feature vector of each frame of image in the second video data;
and determining the human body motion similarity of the first video data and the second video data according to the key point feature vector of each frame of image in the first video data and the key point feature vector of each frame of image in the second video data.
3. The method of claim 2, wherein determining the similarity of human body actions of the first video data and the second video data according to the key point feature vector of each frame of image in the first video data and the key point feature vector of each frame of image in the second video data comprises:
merging the key point feature vectors of each frame of image in the first video data to obtain a feature vector time sequence of the first video data;
merging the key point feature vectors of each frame of image in the second video data to obtain a feature vector time sequence of the second video data;
and determining the human body motion similarity of the first video data and the second video data according to the feature vector time sequence of the first video data and the feature vector time sequence of the second video data.
4. The method according to claim 3, wherein the merging the feature vectors of the keypoints of the images in the first video data to obtain the time series of feature vectors of the first video data comprises:
normalizing the feature vectors of the key points of each frame of image in the first video data to obtain normalized feature vectors of each frame of image in the first video data, and merging the normalized feature vectors of each frame of image in the first video data to obtain a feature vector time sequence of the first video data;
the merging the feature vectors of the key points of each frame of image in the second video data to obtain the feature vector time sequence of the second video data includes:
and normalizing the feature vectors of the key points of the images in the second video data to obtain normalized feature vectors of the images in the second video data, and merging the normalized feature vectors of the images in the second video data to obtain a feature vector time sequence of the second video data.
5. The method according to claim 3 or 4, wherein the determining the similarity of the human body motions of the first video data and the second video data according to the feature vector time sequence of the first video data and the feature vector time sequence of the second video data comprises:
determining the distance of the normalized feature vector of each corresponding frame image in the first video data and the second video data by adopting a Dynamic Time Warping (DTW) method according to the feature vector time sequence of the first video data and the feature vector time sequence of the second video data;
and determining the human body motion similarity of the first video data and the second video data according to the distance of the normalized feature vectors of the corresponding frame images in the first video data and the second video data.
6. The method of claim 5, wherein determining the human motion similarity of the first video data and the second video data according to the distance of the normalized feature vector of each corresponding frame image in the first video data and the second video data comprises:
determining similarity scoring values of each corresponding frame image in the first video data and the second video data according to the distance of the normalized feature vector of each corresponding frame image in the first video data and the second video data;
and determining the human body action similarity of the first video data and the second video data according to the similarity scoring values of the corresponding frame images in the first video data and the second video data.
7. The method according to claim 6, wherein the determining human motion similarity of the first video data and the second video data according to the similarity score values of the corresponding frame images in the first video data and the second video data comprises:
and determining the human body action similarity of the first video data and the second video data according to the average value of the similarity scoring values of the corresponding frame images in the first video data and the second video data.
8. The method of claim 1, wherein the obtaining the first video data and the second video data comprises:
acquiring first initial video data and second initial video data to be compared;
and preprocessing the first initial video data and the second initial video data to obtain the first video data and the second video data with the same frame number.
9. The method according to claim 1, wherein the normalizing the coordinates of the key points of the human body of each frame of image in the first video data to obtain normalized coordinate information of each frame of image in the first video data comprises:
carrying out noise reduction smoothing processing on the coordinates of the human key points of each frame of image in the first video data, and carrying out normalization processing on the coordinates of the human key points after the noise reduction smoothing processing corresponding to the first video data to obtain normalized coordinate information of each frame of image in the first video data;
the normalizing the coordinates of the key points of the human body of each frame of image in the second video data to obtain normalized coordinate information of each frame of image in the second video data includes:
and performing noise reduction smoothing processing on the coordinates of the human key points of each frame of image in the second video data, and performing normalization processing on the coordinates of the human key points after the noise reduction smoothing processing corresponding to the second video data to obtain normalized coordinate information of each frame of image in the second video data.
10. An image processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring first video data and second video data;
the first processing module is used for carrying out normalization processing on the coordinates of the human key points of each frame of image in the first video data to obtain normalized coordinate information of each frame of image in the first video data; normalizing the coordinates of the human key points of each frame of image in the second video data to obtain normalized coordinate information of each frame of image in the second video data;
and the second processing module is used for determining the human body action similarity of the first video data and the second video data according to the normalized coordinate information of each frame of image in the first video data and the normalized coordinate information of each frame of image in the second video data.
11. An electronic device comprising a processor and a memory for storing a computer program operable on the processor, wherein
the processor is configured to run the computer program to perform the method of any one of claims 1 to 9.
12. A computer storage medium on which a computer program is stored, characterized in that the computer program realizes the method of any one of claims 1 to 9 when executed by a processor.
CN202011417021.5A 2020-12-04 2020-12-04 Image processing method, image processing device, electronic equipment and computer storage medium Pending CN112418153A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011417021.5A CN112418153A (en) 2020-12-04 2020-12-04 Image processing method, image processing device, electronic equipment and computer storage medium


Publications (1)

Publication Number Publication Date
CN112418153A true CN112418153A (en) 2021-02-26

Family

ID=74775768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011417021.5A Pending CN112418153A (en) 2020-12-04 2020-12-04 Image processing method, image processing device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN112418153A (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110305394A1 (en) * 2010-06-15 2011-12-15 David William Singer Object Detection Metadata
CN105930767A (en) * 2016-04-06 2016-09-07 南京华捷艾米软件科技有限公司 Human body skeleton-based action recognition method
WO2018202089A1 (en) * 2017-05-05 2018-11-08 商汤集团有限公司 Key point detection method and device, storage medium and electronic device
CN107920257A (en) * 2017-12-01 2018-04-17 北京奇虎科技有限公司 Video Key point real-time processing method, device and computing device
CN107967693A (en) * 2017-12-01 2018-04-27 北京奇虎科技有限公司 Video Key point processing method, device, computing device and computer-readable storage medium
CN108615055A (en) * 2018-04-19 2018-10-02 咪咕动漫有限公司 A kind of similarity calculating method, device and computer readable storage medium
CN110210284A (en) * 2019-04-12 2019-09-06 哈工大机器人义乌人工智能研究院 A kind of human body attitude behavior intelligent Evaluation method
CN110711374A (en) * 2019-10-15 2020-01-21 石家庄铁道大学 Multi-modal dance action evaluation method
CN110738192A (en) * 2019-10-29 2020-01-31 腾讯科技(深圳)有限公司 Human motion function auxiliary evaluation method, device, equipment, system and medium
CN111476097A (en) * 2020-03-06 2020-07-31 平安科技(深圳)有限公司 Human body posture assessment method and device, computer equipment and storage medium
CN111626137A (en) * 2020-04-29 2020-09-04 平安国际智慧城市科技股份有限公司 Video-based motion evaluation method and device, computer equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906630A (en) * 2021-03-17 2021-06-04 北京市商汤科技开发有限公司 Video processing method and device, computer readable storage medium and computer equipment

Similar Documents

Publication Publication Date Title
CN108230383B (en) Hand three-dimensional data determination method and device and electronic equipment
US8644551B2 (en) Systems and methods for tracking natural planar shapes for augmented reality applications
RU2617557C1 (en) Method of exposure to virtual objects of additional reality
US9418480B2 (en) Systems and methods for 3D pose estimation
CN110909651A (en) Video subject person identification method, device, equipment and readable storage medium
CN111310705A (en) Image recognition method and device, computer equipment and storage medium
JP2013012190A (en) Method of approximating gabor filter as block-gabor filter, and memory to store data structure for access by application program running on processor
US10254831B2 (en) System and method for detecting a gaze of a viewer
CN110147708B (en) Image data processing method and related device
CN109063776B (en) Image re-recognition network training method and device and image re-recognition method and device
CN111104925A (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN112651380A (en) Face recognition method, face recognition device, terminal equipment and storage medium
CN111815768B (en) Three-dimensional face reconstruction method and device
CN110032941B (en) Face image detection method, face image detection device and terminal equipment
CN112418153A (en) Image processing method, image processing device, electronic equipment and computer storage medium
WO2024022301A1 (en) Visual angle path acquisition method and apparatus, and electronic device and medium
CN109598201B (en) Action detection method and device, electronic equipment and readable storage medium
CN115223240B (en) Motion real-time counting method and system based on dynamic time warping algorithm
CN111104911A (en) Pedestrian re-identification method and device based on big data training
CN111611941B (en) Special effect processing method and related equipment
CN112613457B (en) Image acquisition mode detection method, device, computer equipment and storage medium
CN114627542A (en) Eye movement position determination method and device, storage medium and electronic equipment
CN113724176A (en) Multi-camera motion capture seamless connection method, device, terminal and medium
CN108446653B (en) Method and apparatus for processing face image
CN108446737B (en) Method and device for identifying objects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination