CN115914741A - Baby video collection and capture method, device and equipment based on motion classification - Google Patents


Info

Publication number
CN115914741A
Authority
CN
China
Prior art keywords
video
image
baby
frame
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211565460.XA
Other languages
Chinese (zh)
Inventor
陈辉
熊章
杜沛力
张智
雷奇文
艾伟
胡国湖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Xingxun Intelligent Technology Co ltd
Original Assignee
Wuhan Xingxun Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Xingxun Intelligent Technology Co ltd filed Critical Wuhan Xingxun Intelligent Technology Co ltd
Priority to CN202211565460.XA priority Critical patent/CN115914741A/en
Publication of CN115914741A publication Critical patent/CN115914741A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/32Indexing scheme for image data processing or generation, in general involving image mosaicing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention belongs to the technical field of video editing and solves the technical problem that conventional editing methods cannot clip videos of specific infant actions in a targeted manner, which leads to a poor user experience. It provides a method, a device and equipment for capturing infant video highlights based on action classification. An initial infant video is acquired, its initial color space is converted into the HSV color space, the converted video is recorded as the infant video, and each frame of target image containing infant limb key points is extracted from the infant video; the target images are classified with a preset action classification model to determine an action image set; and a target video is determined according to the infant limb key points of each frame of target image in the action image set. By this method, video corresponding to specific actions of the baby can be captured, which facilitates storage and viewing by the user, saves storage space, and improves the interest of the video.

Description

Baby video collection and capture method, device and equipment based on motion classification
The present application is a divisional application of the invention patent application with application number 2021104651800, filed on April 27, 2021 and entitled 'Method, device, equipment and medium for automatically grabbing baby wonderful video highlights'.
Technical Field
The invention relates to the technical field of video editing, in particular to a baby video collection and capture method, device and equipment based on motion classification.
Background
With the development of computer and network technologies, the functions of electronic devices are becoming increasingly diverse. Splicing video segments of interest into a new video by means of video editing is becoming more and more popular with users.
In the prior art, there are several methods for capturing baby video highlights based on motion classification. One method adopts background modeling to eliminate the invalid background and combines the dynamic scenes into a video; this method is fast, but the clipping effect is poor and ghosting artifacts readily appear. Another method captures video by detecting the target object in the scene and taking pictures containing a person as key frames; the captured video does not yield meaningful information, and the information is relatively unclear. Both methods therefore result in a poor user experience.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, and a device for capturing baby video highlights based on motion classification, so as to solve the technical problem in the prior art that baby videos cannot be captured effectively.
The technical scheme adopted by the invention is as follows:
the invention provides a baby video collection grabbing method based on motion classification, which is characterized by comprising the following steps of:
acquiring an infant initial video, converting the initial color space of the infant initial video into an HSV color space, recording the converted video as an infant video, and extracting each frame of target image containing infant limb key points in the infant video;
classifying the target images of each frame by using a preset action classification model to determine an action image set;
and determining a target video according to the infant limb key points of each frame of target image in the action image set.
Preferably, the obtaining of the infant initial video, converting the initial color space of the infant initial video into an HSV color space, recording the converted video as the infant video, and extracting each frame of target image containing the infant limb key point in the infant video includes:
acquiring a conversion relation between an initial color space of an infant initial video and an HSV color space;
converting the infant initial video into the infant video of an HSV color space according to the conversion relation;
acquiring the frame rate of the baby video, and decomposing the baby video into corresponding frame images according to the frame rate;
and detecting the key points of the limbs of the baby on the images of the frames, and outputting target images of the frames containing the key points of the limbs of the baby.
Preferably, the classifying the target images of each frame by using a preset motion classification model, and determining the motion image set includes:
acquiring the image size of each frame of target image and the key point coordinates corresponding to the key points of each infant limb in the target image;
according to the image sizes, carrying out standardization processing on the coordinates of the key points in the target image corresponding to the image sizes to obtain the standardized coordinate values of the key points of the limbs of the babies;
and classifying the target images of each frame according to the standardized coordinate values of the key points of the limbs of each baby in the target images of each frame, and outputting the action image set.
Preferably, the classifying the target images of each frame according to the normalized coordinate values of the key points of the limbs of the infant in the target images of each frame, and outputting the motion image set includes:
taking the standard coordinate values of the key points of the limbs of the babies in each frame of target image as a group to obtain a target coordinate group of the key points of the limbs of the babies corresponding to each frame of target image;
inputting the target coordinate group into a preset classifier to classify the action of the target image and outputting an action image set; the preset classifier comprises an SVM classifier, the SVM classifier identifies whether the target image comprises the face of the infant or not according to the input target image, when the target image comprises the face of the infant, the target image is input into a classification sub-network of the SVM classifier, the probability of the action category of the infant is determined through the classification sub-network, and an action image set is output.
Preferably, the determining a target video according to the infant limb key point of each frame of target image in the motion image set includes:
calculating the dispersion of the infant limb key points of each frame of target image in the motion image set according to the positions of the infant limb key points in the target image, and outputting reference images corresponding to various motions;
and editing videos corresponding to various types of actions in the baby video based on the reference images, and outputting a target video.
Preferably, the calculating the dispersion of the baby limb key points of each frame of target image in the motion image set according to the position of each baby limb key point in the target image, and outputting the reference image corresponding to each type of motion includes:
dividing each frame of target image to obtain a plurality of sub-images with the same size;
acquiring the center coordinates of each sub-image, and determining a weight value according to the distance from the center position of each sub-image to the center position of the target image;
obtaining a discrete value of a key point of the body of the baby of each frame of target image according to the central coordinate and the weight value of each sub-image of each frame of target image;
and comparing the discrete values of the target images of the frames corresponding to the various actions, and outputting the reference images corresponding to the various actions.
Preferably, the clipping videos corresponding to various types of motions in the baby video based on the reference images, and outputting a target video includes:
acquiring the time length of each action and the time sequence of each reference image;
capturing the action image set before and/or after the reference image to obtain each action sub-video of the time length;
and splicing the action sub-videos according to the time sequence, and outputting the target video.
The invention also provides a baby video collection grabbing device based on motion classification, which comprises:
the image data extraction module is used for acquiring an infant initial video, converting the initial color space of the infant initial video into HSV color space, recording the converted video as the infant video, and extracting each frame of target image containing the infant limb key points in the infant video;
the image data classification module is used for classifying the target images of each frame by using a preset action classification model to determine an action image set;
and the video data synthesis module is used for determining a target video according to the key points of the limbs of the baby of each frame of target image in the action image set.
The present invention also provides an electronic device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory that, when executed by the processor, implement the method of any of the above.
The invention also provides a medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any of the above.
In conclusion, the beneficial effects of the invention are as follows:
the invention provides a baby video collection capture method, a device and equipment based on motion classification, wherein each frame of target image containing a baby limb key point is extracted from a baby video, and the motion in each frame of image is classified to obtain a motion image set; then calculating the dispersion of the key points of the limbs of the baby in each frame of image, thereby determining a reference image corresponding to each action; capturing various actions in the baby video on the basis of the reference images to obtain a target video consisting of image frames corresponding to the actions; by the method, the video corresponding to the specific action of the baby can be accurately captured, and then the video is stored and checked, so that the storage space of equipment can be saved, and the interestingness and the user experience effect of the video can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments are briefly described below; those skilled in the art may obtain other drawings from these drawings without creative effort, and such drawings are all within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a baby video collection and capture method based on motion classification in embodiment 1 of the present invention;
FIG. 2 is a schematic flowchart of acquiring a motion image set according to embodiment 1 of the present invention;
fig. 3 is a schematic flowchart of a process of normalizing coordinates of a limb key point of an infant in embodiment 1 of the present invention;
fig. 4 is a schematic flow chart illustrating a process of acquiring target coordinate values of key points of limbs of a baby in embodiment 1 of the present invention;
FIG. 5 is a schematic flow chart of acquiring reference images of various actions in embodiment 1 of the present invention;
fig. 6 is a schematic flowchart of a process of acquiring a target video according to embodiment 1 of the present invention;
fig. 7 is a schematic flowchart of a process of acquiring an infant video corresponding to HSV color space in embodiment 1 of the present invention;
fig. 8 is a schematic structural diagram of a baby video collection and capture device based on motion classification in embodiment 2 of the present invention;
fig. 9 is a schematic structural diagram of an electronic device in embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. It is noted that relational terms such as first and second may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. In the description of the present invention, terms such as "center", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer" indicate orientations or positional relationships based on those shown in the drawings, merely for convenience and simplicity of description; they do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and thus are not to be construed as limiting the present invention. Also, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element. In case of conflict, the embodiments of the present invention and the individual features of the embodiments may be combined with each other within the scope of the present invention.
Implementation mode one
Example 1
Referring to fig. 1, fig. 1 is a schematic flow chart of a baby video collection capture method based on motion classification in embodiment 1 of the present invention; the method comprises the following steps:
s1: extracting target images of frames containing key points of the limbs of the baby in the baby video, and outputting a motion image set for dividing motion categories;
specifically, the infant video collected by the camera is divided into frames according to the frame rate of the video, converting the infant video into a multi-frame image sequence; the image frames containing infant limb key points are extracted, and the extracted frames are then divided by action category to form an action image set that distinguishes the action categories. The action image set contains at least one image corresponding to an action, and any category of action contains at least one image. Integrating and classifying the key frames of all actions facilitates comparison between key frames of the same action, so that key frames meeting the requirements can be screened out, interference between images of different actions is reduced, and the amount of data processing is reduced.
In an embodiment, referring to fig. 2, the S1 includes:
s11: acquiring a frame rate of a video;
specifically, the frame rate of a video stream is the number of image frames transmitted per unit time; for example, at a frame rate of 20 frames per second, one image is transmitted every 0.05 seconds.
S12: decomposing the infant video into corresponding frame images according to the frame rate;
specifically, the infant video is converted into frame images according to the frame rate, where the frame images include frames with infant motion and frames without infant motion; for example, as the baby leaves or re-enters the camera's monitoring area, some image frames contain the baby and some do not.
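As an illustrative sketch of this framing step (the function names are hypothetical, not part of the patent), the frame interval and per-frame timestamps follow directly from the frame rate:

```python
def frame_interval(fps: float) -> float:
    """Seconds between two consecutive frames at the given frame rate."""
    return 1.0 / fps

def frame_timestamps(fps: float, duration_s: float) -> list:
    """Timestamps (in seconds) of each frame in a clip of the given duration."""
    n = int(duration_s * fps)
    return [i * frame_interval(fps) for i in range(n)]

print(frame_interval(20))  # 0.05 -> one image every 0.05 seconds
```

At 20 frames per second this reproduces the 0.05-second interval of the example above; decomposing an actual video file would use a decoder such as OpenCV's VideoCapture.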
S13: detecting the key points of the limbs of the baby on each frame of the image, and outputting each frame of target image containing the key points of the limbs of the baby;
specifically, each frame image is detected to judge whether it contains the human body key points corresponding to a specific action of the infant; if so, the frame is extracted, and in this way each frame conforming to a specific action is obtained. The specific action includes at least one of the typical actions of an infant aged 0-3 years: being held, sucking, laughing, jumping, climbing, standing, covering, raising the head, turning over, and the like.
S14: and classifying the target images of each frame by using a preset action classification model, and outputting the action image set.
Specifically, the information and positional relationships of the infant limb key points corresponding to each action are stored in the action classification model; the collected positional information of the infant limb key points is compared with the preset historical data of the key points corresponding to each action, so that the images of each action are obtained and an action image set is formed, completing the classification of the actions.
In an embodiment, referring to fig. 3, the S14 includes:
s141: acquiring the image size of each frame of target image and the key point coordinates corresponding to the key points of each infant limb in the target image;
specifically, the length dimension L_a and the width dimension W_a of each frame image are acquired, together with the key point coordinates P_a(X_i, Y_i) of each infant limb key point in each target image, where a denotes the a-th target image, i denotes the i-th limb key point of the target image, X is the abscissa, Y is the ordinate, W is the width of the target image, L is the length of the target image, and P_a(X_i, Y_i) is the coordinate of the i-th limb key point of the a-th target image.
S142: according to the image sizes, the coordinates of the key points in the corresponding target image are standardized to obtain standardized coordinate values of the infant limb key points;
specifically, the image size of each target image and the key point coordinates of each infant limb key point in each target image are obtained, and the coordinates of each key point are then standardized, i.e. normalized, yielding the standard coordinate value of each infant limb key point.
In an embodiment, referring to fig. 4, the S142 includes:
S1421: using the formula

P_i(X') = X_i / W,  P_i(Y') = Y_i / L

converting each of the keypoint coordinates P(X, Y) into a corresponding floating point value;
specifically, the abscissa and ordinate of each key point in the obtained target image are converted into corresponding floating point values; for example, the abscissa X_i of the i-th key point in image a is converted into the corresponding floating point value P_i(X'), and the ordinate Y_i of the i-th key point in image a is converted into the corresponding floating point value P_i(Y').
S1422: using the formula

P_i(X'') = P_i(X') - (1/q) Σ_{j=1..q} P_j(X'),  P_i(Y'') = P_i(Y') - (1/q) Σ_{j=1..q} P_j(Y')

converting each floating point value into a corresponding standard coordinate value;
specifically, after the key point coordinates are converted into corresponding floating point values, each floating point value is converted into a corresponding standard coordinate value, thereby completing the normalization processing of the key point coordinates. For example, the abscissa X_i of the i-th key point in image a yields the coordinate standard value P_i(X'') after standardization, and the ordinate Y_i yields the coordinate standard value P_i(Y'').
Here W is the width of the target image, L is the length of the target image, X is the abscissa of a limb key point, Y is the ordinate of a limb key point, X' is the floating point value corresponding to the abscissa, Y' is the floating point value corresponding to the ordinate, X'' is the standard coordinate value corresponding to the abscissa, Y'' is the standard coordinate value corresponding to the ordinate, and q is the total number of limb key points included in the corresponding action in each target image.
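A minimal Python sketch of the two-step normalization, assuming the first formula scales each coordinate by the image dimensions and the second centres the scaled values on their mean over the q key points (the exact formulas are reconstructed from the symbol definitions above, so both forms are assumptions):

```python
def to_floating(points, width, length):
    # step S1421 (assumed form): X' = X / W, Y' = Y / L, scaling into [0, 1]
    return [(x / width, y / length) for (x, y) in points]

def standardize(points_f):
    # step S1422 (assumed form): centre each floating value on the mean
    # over the q limb key points of the image
    q = len(points_f)
    mean_x = sum(x for x, _ in points_f) / q
    mean_y = sum(y for _, y in points_f) / q
    return [(x - mean_x, y - mean_y) for (x, y) in points_f]

pts = [(100, 50), (300, 150)]          # two key points in a 400x200 image
flo = to_floating(pts, 400, 200)       # [(0.25, 0.25), (0.75, 0.75)]
std = standardize(flo)                 # [(-0.25, -0.25), (0.25, 0.25)]
```

The scaled values are resolution-independent, which lets frames of different sizes be compared by one classifier.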
S143: and classifying the target images of each frame according to the standardized coordinate values of the key points of the limbs of each baby in the target images of each frame, and outputting the action image set.
Specifically, the action type of each target image is obtained according to the standardized coordinate values of all limb key points in the target image, and the target images are classified by action type and output as the action image set. The standardized coordinate values of all limb key points in each target image are taken as a group, yielding a target coordinate group corresponding one-to-one to each target image; each target coordinate group is input into an SVM classifier to classify the action in the corresponding target image, and the action image set is output. The SVM classifier is a linear classifier mainly used for binary classification; based on an input feature map, it can determine whether the image includes the face or not. The feature map of the image frame is input into a classification sub-network, which outputs the probability that each target detection frame contains each action category, thereby realizing the action classification of the target image; the detection target is not limited to the face and may also be an arm, the trunk, and the like.
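The classification step can be sketched with scikit-learn's SVC (an assumption for illustration; the patent does not name a library), treating each frame's standardized key point coordinates, flattened into one vector, as the classifier input:

```python
from sklearn.svm import SVC

# toy flattened (x, y) key point vectors for two hypothetical action classes
lying    = [[0.20, 0.80, 0.80, 0.80], [0.25, 0.78, 0.78, 0.82]]
standing = [[0.50, 0.10, 0.50, 0.90], [0.52, 0.12, 0.48, 0.88]]

clf = SVC(kernel="linear")  # a linear SVM, as described above
clf.fit(lying + standing, ["lying", "lying", "standing", "standing"])

# a new frame whose key points resemble the "lying" cluster
print(clf.predict([[0.22, 0.79, 0.79, 0.81]])[0])
```

A real system would train on many labelled frames, with one target coordinate group per detected infant.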
S2: calculating the dispersion of the infant limb key points of each frame of target image in the action image set, and outputting reference images corresponding to various actions;
specifically, the dispersion of the infant limb key points is calculated from the positions of the key points in the image; the image whose dispersion meets the requirement is taken as the reference image, and each frame of image of the corresponding action is subsequently captured with the reference image as the reference. For example, the reference image of action A_1 is image a, and the capture operation is performed in a predetermined capture mode based on image a to obtain the image frames corresponding to action A_1, completing the capture of action A_1; likewise, the reference image of action A_2 is image b, and the capture operation is performed in the predetermined capture mode based on image b to obtain the image frames corresponding to action A_2, completing the capture of action A_2.
It should be noted that the preset classifier includes but is not limited to SVM, softmax and other common classifiers.
In an embodiment, referring to fig. 5, the S2 includes:
s21: dividing each frame of target image to obtain a plurality of sub-images with the same size;
specifically, each target image is divided into a plurality of sub-images of the same size according to a predetermined rule; for example, each target image is divided into 9 sub-images in a nine-square (3×3) grid.
S22: acquiring the center coordinate and the weight value of each sub-image;
specifically, the center coordinates of each sub-image are obtained and a weight value is assigned to each sub-image, the weight being determined by the distance from the center of the sub-image to the center of the target image. For example, each target image is divided in the nine-square-grid manner into 9 sub-images: the 1st, 3rd, 7th and 9th sub-images are farthest from the center point, so their weight value is set to 2; the 2nd, 4th, 6th and 8th sub-images are closer to the center point, so their weight value is set to 1; the center point of the 5th sub-image coincides with the center point of the target image, so its weight value is set to 0.
It should be noted that the setting of the weights includes but is not limited to presetting, setting by the user according to usage habits, and correction by the system according to the user's preference for target images; for example, some users prefer images or videos in which the subject is centered, while others prefer off-center compositions.
It should be noted that the center coordinates of the image may be preset by the system or may be a point specified later by the user.
S23: obtaining a discrete value of a key point of the body of the baby of each frame of target image according to the central coordinate and the weighted value of each sub-image of each frame of target image;
specifically, according to the center coordinates of each sub-image and the corresponding weight values, the discrete value of each target image is calculated by the formula S = W_1 + W_2 + … + W_q, where S is the discrete value of the infant limb key points of each target image, W is a weight value, and W_q denotes the weight value of the q-th limb key point. Taking the nine-square-grid division of the target image as an example, if limb key points 1 and 2 are both located in the 1st sub-image, then W_1 and W_2 both take the value 2.
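The nine-square-grid weighting and the sum S = W_1 + … + W_q can be sketched as follows (the grid-lookup helper names are illustrative, not from the patent):

```python
# nine-grid weights from the example above: corners 2, edges 1, centre 0
WEIGHTS = [2, 1, 2,
           1, 0, 1,
           2, 1, 2]

def cell_index(x, y, width, length):
    """Index (0-8) of the 3x3 sub-image containing the point (x, y)."""
    col = min(int(3 * x / width), 2)
    row = min(int(3 * y / length), 2)
    return 3 * row + col

def dispersion(keypoints, width, length):
    """Discrete value S = W_1 + W_2 + ... + W_q of one target image."""
    return sum(WEIGHTS[cell_index(x, y, width, length)] for x, y in keypoints)

# key points 1 and 2 both fall in the 1st sub-image, so S = 2 + 2 = 4
print(dispersion([(10, 10), (20, 20)], 300, 300))  # 4
```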
S24: and comparing the discrete values of the target images of the frames corresponding to the various actions, and outputting the reference images corresponding to the various actions.
Specifically, the larger the discrete value, the better the image effect of the captured infant action; the discrete values of the target image frames of the same action type are therefore compared, and the target image with the largest discrete value is taken as the reference image.
S3: and editing videos corresponding to various types of actions in the baby video based on the reference images, and outputting a target video.
Specifically, after the reference image of each action is determined, the actions are captured according to action capture rules to obtain highlight video streams corresponding to the actions one by one, and then the highlight video streams are spliced to obtain a target video.
In an embodiment, referring to fig. 6, the S3 includes:
s31: acquiring the time length of each action and the time sequence of each reference image;
Specifically, a time length T corresponding to one action is set, which determines the total number of images contained in one action; the time of each frame of reference image is determined to obtain the time sequence of all the reference images.
S32: capturing the action image set before and/or after the reference image to obtain each action sub-video of the time length;
Specifically, according to the time length required by the action, a plurality of images including the corresponding reference image are screened out from the action image set; the time length corresponding to these images is less than or equal to T. For example, if a images are captured before the reference image, corresponding to a time length T1, and b images are captured after it, corresponding to a time length T2, then T1 + T2 ≤ T.
S33: and splicing the action sub-videos according to the time sequence, and outputting the target video.
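Steps S31–S33 can be sketched as index arithmetic over the frame sequence; the even split of the clip around the reference frame below is an assumption, since the patent only requires T1 + T2 ≤ T:

```python
# Sketch of S31-S33: cut a clip of at most T seconds around each reference
# frame, then splice the clips in time order. Frames are modelled as plain
# integers; a real implementation would slice a decoded frame sequence.

def clip_around(ref_idx, n_frames, fps, t_max, before=0.5):
    """Frame indices of a clip of at most t_max seconds containing ref_idx,
    taking roughly `before` of the duration ahead of it (T1 + T2 <= T)."""
    total = int(t_max * fps)
    a = min(int(total * before), ref_idx)           # frames before the reference
    b = min(total - a - 1, n_frames - ref_idx - 1)  # frames after it
    return list(range(ref_idx - a, ref_idx + b + 1))

def splice(reference_indices, n_frames, fps, t_max):
    """Concatenate the action sub-videos in ascending time order."""
    out = []
    for ref in sorted(reference_indices):
        out.extend(clip_around(ref, n_frames, fps, t_max))
    return out

clip = clip_around(ref_idx=100, n_frames=1000, fps=10, t_max=2.0)
print(len(clip), clip[0], clip[-1])  # -> 20 90 109
```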
In an embodiment, referring to fig. 7, the S1 further includes:
s01: acquiring a conversion relation between an initial color space of an infant initial video and an HSV color space;
s02: and converting the infant initial video into the infant video in the HSV color space according to the conversion relation.
Specifically, a video in a non-HSV color space is converted into a video in the HSV color space, where H is hue, S is saturation and V is brightness (value). This improves the accuracy of extracting the hue, saturation and brightness parameters of the image and reduces false color detection.
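The RGB-to-HSV mapping itself is the standard one; as a sketch (the patent does not name a library), Python's standard-library colorsys converts a single pixel, and a real pipeline would apply the same mapping to every pixel of every frame:

```python
import colorsys

# Convert one RGB pixel (0-255 per channel) to HSV, where H is hue,
# S saturation and V brightness, as in steps S01/S02 above.
def pixel_to_hsv(r, g, b):
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v  # hue in degrees, S and V in [0, 1]

print(pixel_to_hsv(255, 0, 0))  # pure red -> (0.0, 1.0, 1.0)
```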
By adopting the motion-classification-based baby video collection and capture method of this embodiment, each frame of target image containing baby limb keypoints is extracted from the baby video, and the action in each frame is classified to obtain an action image set; the dispersion of the baby limb keypoints in each frame is then calculated to determine the reference image corresponding to each action; finally, the actions in the baby video are captured on the basis of the reference images to obtain a target video composed of the image frames corresponding to those actions. In this way, the video corresponding to a specific action of the baby can be captured accurately and then stored and viewed, saving device storage space and improving both the interest of the video and the user experience.
Example 2
The invention also provides a baby video collection and capture apparatus based on motion classification; referring to fig. 8, it includes:
an image data detection module: used for extracting each frame of target image containing baby limb keypoints from the baby video and outputting an action image set classified by action category;
an image data processing module: used for calculating the dispersion of the baby limb keypoints of each frame of target image in the action image set and outputting the reference images corresponding to the various actions;
a video data synthesis module: used for editing the videos corresponding to the various actions in the baby video based on the reference images and outputting a target video.
By adopting the motion-classification-based baby video collection and capture apparatus of this embodiment, each frame of target image containing baby limb keypoints is extracted from the baby video, and the action in each frame is classified to obtain an action image set; the dispersion of the baby limb keypoints in each frame is then calculated to determine the reference image corresponding to each action; finally, the actions in the baby video are captured on the basis of the reference images to obtain a target video composed of the image frames corresponding to those actions. In this way, the video corresponding to a specific action of the baby can be captured accurately and then stored and viewed, saving device storage space and improving both the interest of the video and the user experience.
In one embodiment, the image data detection module includes:
a frame rate unit: acquiring the frame rate of the baby video;
a video splitting unit: decomposing the infant video into corresponding frame images according to the frame rate;
a keypoint detection unit: detecting baby limb keypoints in each frame image, and outputting each frame of target image containing the baby limb keypoints;
an image motion classification unit: and classifying the target images of each frame by using a preset action classification model, and outputting the action image set.
In one embodiment, the image motion classification unit includes:
an image parameter acquisition unit: acquiring the image size of each frame of target image and the key point coordinates corresponding to the key points of each infant limb in the target image;
a normalization processing unit: according to the image sizes, carrying out standardization processing on the coordinates of the key points in the target image corresponding to the image sizes to obtain the standardized coordinate values of the key points of the limbs of the babies;
an action image classification unit: and classifying the target images of each frame according to the standardized coordinate values of the key points of the limbs of each baby in the target images of each frame, and outputting the action image set.
In one embodiment, the action image classification unit includes:
a coordinate floating value unit: converting each of the keypoint coordinates P(X, Y) into a corresponding floating point value relative to the image dimensions;
a target coordinate value unit: converting each floating point value into a corresponding standard coordinate value;
an action classification unit: inputting the standard coordinate values of the baby limb keypoints in each frame of target image into an SVM classifier to classify each frame of target image, and outputting the action image set;
where W is the width of the target image, L is the length of the target image, X is the abscissa of a limb keypoint, Y is its ordinate, X′ and Y′ are the floating point values corresponding to the abscissa and ordinate, X″ and Y″ are the standard coordinate values corresponding to the abscissa and ordinate, and q is the total number of limb keypoints included in the corresponding action in each target image.
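The exact conversion formulas appear in the patent as figures not reproduced in this text; a common choice, shown here purely as an assumed sketch, is to divide each coordinate by the image width and length so the floating values become resolution-independent before they reach the SVM classifier:

```python
# Assumed normalisation sketch (not the patent's exact formulas): map each
# keypoint (X, Y) in a W x L image to size-independent values in [0, 1].

def normalize_keypoints(keypoints, width, length):
    """Map each (X, Y) to floating values (X', Y') = (X / W, Y / L)."""
    return [(x / width, y / length) for x, y in keypoints]

kps = [(320, 120), (160, 240)]
print(normalize_keypoints(kps, 640, 480))  # -> [(0.5, 0.25), (0.25, 0.5)]
```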
In one embodiment, the image data processing module comprises:
an image segmentation unit: dividing each frame of target image to obtain a plurality of sub-images with the same size;
sub-image parameter unit: acquiring the center coordinate and the weight value of each sub-image;
image discrete value unit: obtaining a discrete value of a key point of the body of the baby of each frame of target image according to the central coordinate and the weight value of each sub-image of each frame of target image;
a reference image calculation unit: and comparing the discrete values of the target images of the frames corresponding to the various actions, and outputting the reference images corresponding to the various actions.
In an embodiment, the video data composition module comprises:
a time sequence unit: acquiring the time length of each action and the time sequence of each reference image;
a sub-video capture unit: capturing the action image set before and/or after the reference image to obtain each action sub-video of the time length;
a video splicing unit: and splicing the action sub-videos according to the time sequence, and outputting the target video.
In an embodiment, before the image data detection module is executed, the apparatus further comprises:
color space mapping relationship unit: acquiring a conversion relation between an initial color space of an infant initial video and an HSV color space;
a video color space conversion unit: and converting the infant initial video into the infant video in the HSV color space according to the conversion relation.
By adopting the motion-classification-based baby video collection and capture apparatus of this embodiment, each frame of target image containing baby limb keypoints is extracted from the baby video, and the action in each frame is classified to obtain an action image set; the dispersion of the baby limb keypoints in each frame is then calculated to determine the reference image corresponding to each action; finally, the actions in the baby video are captured on the basis of the reference images to obtain a target video composed of the image frames corresponding to those actions. In this way, the video corresponding to a specific action of the baby can be captured accurately and then stored and viewed, saving device storage space and improving both the interest of the video and the user experience.
Example 3
The invention also provides an electronic device and a medium; as shown in fig. 9, the device comprises at least one processor, at least one memory, and computer program instructions stored in the memory.
Specifically, the processor may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The electronic device includes at least one of the following: a camera, a mobile device with a camera, and a wearable device with a camera.
The memory may include mass storage for data or instructions. By way of example and not limitation, the memory may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory may include removable or non-removable (or fixed) media, where appropriate. The memory may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory is non-volatile solid-state memory. In a particular embodiment, the memory includes read-only memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
The processor reads and executes the computer program instructions stored in the memory to implement the motion-classification-based baby video collection and capture method of any of the above embodiments.
In one example, the electronic device may also include a communication interface and a bus. The processor, the memory and the communication interface are connected through a bus and complete mutual communication.
The communication interface is mainly used for realizing communication among modules, devices, units and/or equipment in the embodiment of the invention.
A bus comprises hardware, software, or both that couple the components of the electronic device to one another. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a low-pin-count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), another suitable bus, or a combination of two or more of these. A bus may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the invention, any suitable buses or interconnects are contemplated by the invention.
In summary, embodiments of the present invention provide a motion-classification-based baby video collection and capture method, apparatus and device, in which each frame of target image containing baby limb keypoints is extracted from the baby video, and the action in each frame is classified to obtain an action image set; the dispersion of the baby limb keypoints in each frame is then calculated to determine the reference image corresponding to each action; finally, the actions in the baby video are captured on the basis of the reference images to obtain a target video composed of the image frames corresponding to those actions. In this way, the video corresponding to a specific action of the baby can be captured accurately and then stored and viewed, saving device storage space and improving both the interest of the video and the user experience.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments can be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory, an Erasable ROM (EROM), a floppy disk, a CD-ROM, an optical disk, a hard disk, an optical fiber medium, a Radio Frequency (RF) link, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A baby video collection grabbing method based on motion classification is characterized by comprising the following steps:
acquiring an infant initial video, converting the initial color space of the infant initial video into an HSV color space, recording the converted video as an infant video, and extracting each frame of target image containing infant limb key points in the infant video;
classifying the target images of each frame by using a preset action classification model to determine an action image set;
and determining a target video according to the key points of the limbs of the baby of each frame of target image in the action image set.
2. The motion classification-based baby video collection and grabbing method according to claim 1, wherein the obtaining of the baby initial video, converting the initial color space of the baby initial video into HSV color space, recording the converted video as the baby video, and extracting each frame of target image containing the baby limb key point in the baby video comprises:
acquiring a conversion relation between an initial color space of an infant initial video and an HSV color space;
converting the infant initial video into the infant video of an HSV color space according to the conversion relation;
acquiring the frame rate of the baby video, and decomposing the baby video into corresponding frame images according to the frame rate;
and detecting the key points of the limbs of the baby on the images of the frames, and outputting target images of the frames containing the key points of the limbs of the baby.
3. The motion classification-based baby video collection capture method of claim 1, wherein the classifying each frame of target image by using a preset motion classification model, and determining the motion image set comprises:
acquiring the image size of each frame of target image and the key point coordinates corresponding to the key points of each infant limb in the target image;
according to the image sizes, carrying out standardization processing on the coordinates of the key points in the target image corresponding to the image sizes to obtain the standardized coordinate values of the key points of the limbs of the babies;
and classifying the target images of each frame according to the standardized coordinate values of the key points of the limbs of each baby in the target images of each frame, and outputting the action image set.
4. The baby video collection and grabbing method based on motion classification as claimed in claim 3, wherein the classifying each frame of target image according to the normalized coordinate value of each baby limb key point in each frame of target image, and outputting the motion image set comprises:
taking the standard coordinate values of the key points of the limbs of the babies in each frame of target image as a group to obtain a target coordinate group of the key points of the limbs of the babies corresponding to each frame of target image;
inputting the target coordinate group into a preset classifier to perform action classification on the target image, and outputting an action image set; the preset classifier comprises an SVM classifier, the SVM classifier identifies whether the target image comprises the infant face or not according to the input target image, when the target image comprises the infant face, the target image is input into a classification sub-network of the SVM classifier, the probability of the occurrence of each infant action category is determined through the classification sub-network, and an action image set is output.
5. The baby video collection and grabbing method based on motion classification as claimed in claim 4, wherein the determining a target video according to the baby limb key point of each frame of target image in the motion image set comprises:
calculating the dispersion of the baby limb key points of each frame of target image in the motion image set according to the position of each baby limb key point in the target image, and outputting reference images corresponding to various motions;
and editing videos corresponding to various types of actions in the baby video based on the reference images, and outputting a target video.
6. The motion classification-based baby video collection and capture method according to claim 5, wherein the calculating the dispersion of the baby limb key points of each frame of target image in the motion image set according to the position of each baby limb key point in the target image, and outputting the reference images corresponding to each type of motion comprises:
dividing each frame of target image to obtain a plurality of sub-images with the same size;
acquiring the center coordinates of each sub-image, and determining a weight value according to the distance from the center position of each sub-image to the center position of the target image;
obtaining a discrete value of a key point of the body of the baby of each frame of target image according to the central coordinate and the weighted value of each sub-image of each frame of target image;
and comparing the discrete values of the target images of the frames corresponding to the various actions, and outputting the reference images corresponding to the various actions.
7. The motion-classification-based baby video collection and capture method according to claim 6, wherein the editing videos of the baby video corresponding to various types of motions based on the reference images, and outputting a target video comprises:
acquiring the time length of each action and the time sequence of each reference image;
capturing the action image set before and/or after the reference image to obtain each action sub-video of the time length;
and splicing the action sub-videos according to the time sequence, and outputting the target video.
8. A baby video collection and capture apparatus based on motion classification, characterized in that the apparatus comprises:
the image data extraction module is used for acquiring an infant initial video, converting an initial color space of the infant initial video into an HSV color space, recording the converted video as an infant video, and extracting each frame of target image containing infant limb key points in the infant video;
the image data classification module is used for classifying the target images of each frame by using a preset action classification model to determine an action image set;
and the video data synthesis module is used for determining a target video according to the key points of the limbs of the baby of each frame of target image in the action image set.
9. An electronic device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory that, when executed by the processor, implement the method of any of claims 1-7.
10. A storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any one of claims 1-7.
CN202211565460.XA 2021-04-27 2021-04-27 Baby video collection and capture method, device and equipment based on motion classification Pending CN115914741A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211565460.XA CN115914741A (en) 2021-04-27 2021-04-27 Baby video collection and capture method, device and equipment based on motion classification

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110465180.0A CN113194359B (en) 2021-04-27 2021-04-27 Method, device, equipment and medium for automatically grabbing baby wonderful video highlights
CN202211565460.XA CN115914741A (en) 2021-04-27 2021-04-27 Baby video collection and capture method, device and equipment based on motion classification

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202110465180.0A Division CN113194359B (en) 2021-04-27 2021-04-27 Method, device, equipment and medium for automatically grabbing baby wonderful video highlights

Publications (1)

Publication Number Publication Date
CN115914741A true CN115914741A (en) 2023-04-04

Family

ID=76979679

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110465180.0A Active CN113194359B (en) 2021-04-27 2021-04-27 Method, device, equipment and medium for automatically grabbing baby wonderful video highlights
CN202211565460.XA Pending CN115914741A (en) 2021-04-27 2021-04-27 Baby video collection and capture method, device and equipment based on motion classification

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202110465180.0A Active CN113194359B (en) 2021-04-27 2021-04-27 Method, device, equipment and medium for automatically grabbing baby wonderful video highlights

Country Status (1)

Country Link
CN (2) CN113194359B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116800976A (en) * 2023-07-17 2023-09-22 武汉星巡智能科技有限公司 Audio and video compression and restoration method, device and equipment for infant with sleep

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115412765B (en) * 2022-08-31 2024-03-26 北京奇艺世纪科技有限公司 Video highlight determination method and device, electronic equipment and storage medium
CN116386671B (en) * 2023-03-16 2024-05-07 宁波星巡智能科技有限公司 Infant crying type identification method, device, equipment and storage medium
CN116761035B (en) * 2023-05-26 2024-05-07 武汉星巡智能科技有限公司 Video intelligent editing method, device and equipment based on maternal and infant feeding behavior recognition

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013056311A1 (en) * 2011-10-20 2013-04-25 The University Of Sydney Keypoint based keyframe selection
CN107220597A (en) * 2017-05-11 2017-09-29 北京化工大学 A kind of key frame extraction method based on local feature and bag of words human action identification process
CN107566907A (en) * 2017-09-20 2018-01-09 广东欧珀移动通信有限公司 video clipping method, device, storage medium and terminal
WO2018108047A1 (en) * 2016-12-15 2018-06-21 腾讯科技(深圳)有限公司 Method and device for generating information displaying image
CN108900896A (en) * 2018-05-29 2018-11-27 深圳天珑无线科技有限公司 Video clipping method and device
CN111507137A (en) * 2019-01-31 2020-08-07 北京奇虎科技有限公司 Action understanding method and device, computer equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王晗 等;: "针对用户兴趣的视频精彩片段提取", 中国图象图形学报, no. 05, 16 May 2018 (2018-05-16) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116800976A (en) * 2023-07-17 2023-09-22 武汉星巡智能科技有限公司 Audio and video compression and restoration method, device and equipment for infant with sleep
CN116800976B (en) * 2023-07-17 2024-03-12 武汉星巡智能科技有限公司 Audio and video compression and restoration method, device and equipment for infant with sleep

Also Published As

Publication number Publication date
CN113194359B (en) 2022-12-27
CN113194359A (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN113194359B (en) Method, device, equipment and medium for automatically grabbing baby wonderful video highlights
CN109145803B (en) Gesture recognition method and device, electronic equipment and computer readable storage medium
CN110991506B (en) Vehicle brand identification method, device, equipment and storage medium
CN110502962B (en) Method, device, equipment and medium for detecting target in video stream
CN110795595A (en) Video structured storage method, device, equipment and medium based on edge calculation
CN111179302A (en) Moving target detection method and device, storage medium and terminal equipment
CN108345251B (en) Method, system, device and medium for processing robot sensing data
CN114724131A (en) Vehicle tracking method and device, electronic equipment and storage medium
CN116092119A (en) Human behavior recognition system based on multidimensional feature fusion and working method thereof
CN113038272B (en) Method, device and equipment for automatically editing baby video and storage medium
CN110472561B (en) Football goal type identification method, device, system and storage medium
CN113837006A (en) Face recognition method and device, storage medium and electronic equipment
CN116012949B (en) People flow statistics and identification method and system under complex scene
CN115862115B (en) Infant respiration detection area positioning method, device and equipment based on vision
CN101562700A (en) Identification method through fingerprint identification of digital camera
CN109034171B (en) Method and device for detecting unlicensed vehicles in video stream
CN116110129A (en) Intelligent evaluation method, device, equipment and storage medium for dining quality of infants
CN114494321A (en) Infant sleep breath real-time monitoring method, device, equipment and storage medium
CN113780083A (en) Gesture recognition method, device, equipment and storage medium
CN110800313B (en) Information processing apparatus, information processing method, and computer program
CN113378762A (en) Sitting posture intelligent monitoring method, device, equipment and storage medium
CN110443244A (en) A kind of method and relevant apparatus of graphics process
CN109359562A (en) Target identification method, device, target identification equipment and storage medium
CN116386671B (en) Infant crying type identification method, device, equipment and storage medium
CN116761035B (en) Video intelligent editing method, device and equipment based on maternal and infant feeding behavior recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination