WO2022104637A1 - Video editing apparatus and method, movable platform, gimbal, and hardware device - Google Patents

Video editing apparatus and method, movable platform, gimbal, and hardware device

Info

Publication number
WO2022104637A1
WO2022104637A1 PCT/CN2020/130074 CN2020130074W WO2022104637A1 WO 2022104637 A1 WO2022104637 A1 WO 2022104637A1 CN 2020130074 W CN2020130074 W CN 2020130074W WO 2022104637 A1 WO2022104637 A1 WO 2022104637A1
Authority
WO
WIPO (PCT)
Prior art keywords
category
sample frame
character
frame
sample
Prior art date
Application number
PCT/CN2020/130074
Other languages
English (en)
French (fr)
Inventor
董双
刘志鹏
朱高
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to PCT/CN2020/130074 priority Critical patent/WO2022104637A1/zh
Publication of WO2022104637A1 publication Critical patent/WO2022104637A1/zh

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs

Definitions

  • the present application relates to the technical field of video editing, and in particular, to a video editing apparatus, a video editing method, a movable platform, an intelligent gimbal, a readable storage medium, and a hardware device.
  • video editing software in the related art is highly specialized, requires the user to perform the editing work manually, and carries a steep learning cost, which makes it difficult for ordinary users to use.
  • the embodiments of the present application aim to solve at least one of the technical problems existing in the prior art or related technologies.
  • a first aspect of the embodiments of the present application provides a video editing apparatus.
  • a second aspect of the embodiments of the present application provides a video editing method.
  • a third aspect of the embodiments of the present application provides a movable platform.
  • a fourth aspect of the embodiments of the present application provides an intelligent gimbal.
  • a fifth aspect of the embodiments of the present application provides a readable storage medium.
  • a sixth aspect of the embodiments of the present application provides a hardware device.
  • an embodiment of the present application provides a video editing device
  • the video editing device includes a memory, a processor, and a program or instruction stored in the memory and executable by the processor; when the processor executes the program or instruction, the following can be achieved: obtaining sample frames from the video material and performing image recognition on the sample frames to obtain the character age category and character action category corresponding to each sample frame; obtaining a matching result of the character age category and the character action category, and when the matching result satisfies a preset first condition, adding the character age category and the character action category as the frame label of the sample frame; and editing the video material according to the frame label.
  • an embodiment of the present application provides a video editing method, including:
  • obtaining sample frames from the video material, performing image recognition on the sample frames, and obtaining the character age category and character action category corresponding to each sample frame;
  • obtaining a matching result of the character age category and the character action category, and when the matching result satisfies a preset first condition, adding the character age category and the character action category as the frame label of the sample frame;
  • editing the video material according to the frame label.
  • an embodiment of the present application provides a movable platform, including:
  • a first image sensor for capturing video material; a memory; a processor; and a program or instruction stored in the memory and executable by the processor, where the video editing method of the second aspect is implemented when the processor executes the program or instruction.
  • an embodiment of the present application provides an intelligent gimbal, including:
  • a second image sensor for capturing video material; a memory; a processor; and a program or instruction stored in the memory and executable by the processor, where the video editing method of the second aspect is implemented when the processor executes the program or instruction.
  • an embodiment of the present application provides a readable storage medium, in which a program or an instruction is stored, and when the program or instruction is executed by a processor, the steps of the video editing method in the second aspect are implemented.
  • an embodiment of the present application provides a hardware device, where the hardware device includes:
  • a user input module for receiving user input; a network module for accessing a network and exchanging data and commands with a terminal device and/or a server; an output module for outputting one or more of sound information, image information, vibration information, and network signals; a power supply module for supplying power to the hardware device; and a memory, a processor, and a program or instruction stored in the memory and executable by the processor, where the video editing method of the second aspect is implemented when the processor executes the program or instruction.
  • the video editing apparatus can automatically extract sample frames from the video material after the user shoots the video material, and obtain the age category and the action category of the person in the sample frame through image recognition technology.
  • the age categories of characters may include infants, childhood, youth, and old age, etc.
  • actions may include general actions such as walking, standing, and bending over, as well as sports actions such as running, jumping, and rolling.
  • since the recognition accuracy of the character age category and the character action category cannot be guaranteed, the matching result of the character age category and the character action category is verified, thereby reducing misrecognition.
  • for example, if the age category of the current character is recognized as "elderly" while the action category of the same character is recognized as "backflip", then, according to general common sense, the elderly generally do not and cannot perform such a difficult and strenuous sports action; it is therefore determined that the character age category does not match the character action category, and the character action category or character age category is re-identified.
  • the video material can be automatically edited according to the content of the frame labels.
  • the system can automatically determine the frame label of each sample frame and automatically complete the editing according to the user's needs or a template selected by the user, without requiring the user to master professional skills or pay an additional learning cost, thereby implementing video editing that is easy for ordinary users to use.
  • FIG. 1 shows a structural block diagram of a video editing apparatus according to an embodiment of the present application
  • FIG. 2 shows one of the flowcharts of a video editing method according to an embodiment of the present application
  • FIG. 3 shows the second flowchart of the video editing method according to the embodiment of the present application
  • FIG. 4 shows the third flowchart of the video editing method according to the embodiment of the present application.
  • FIG. 5 shows the fourth flowchart of the video editing method according to the embodiment of the present application.
  • FIG. 6 shows the fifth flowchart of the video editing method according to the embodiment of the present application.
  • FIG. 7 shows the sixth flowchart of the video editing method according to the embodiment of the present application.
  • FIG. 8 shows the seventh flowchart of the video editing method according to the embodiment of the present application.
  • FIG. 9 shows the eighth flowchart of the video editing method according to the embodiment of the present application.
  • FIG. 10 shows the ninth flowchart of the video editing method according to the embodiment of the present application.
  • FIG. 11 shows a tenth flow chart of a video editing method according to an embodiment of the present application.
  • FIG. 12 shows the eleventh flowchart of the video editing method according to the embodiment of the present application.
  • FIG. 13 shows a twelfth flowchart of a video editing method according to an embodiment of the present application
  • FIG. 14 shows a structural block diagram of a movable platform according to an embodiment of the present application.
  • FIG. 15 shows a structural block diagram of an intelligent gimbal according to an embodiment of the present application.
  • FIG. 16 shows a structural block diagram of a hardware device according to an embodiment of the present application.
  • the following describes a video editing apparatus, a video editing method, a movable platform, an intelligent gimbal, a readable storage medium, and a hardware device according to some embodiments of the present application with reference to FIG. 1 to FIG. 16.
  • FIG. 1 shows a structural block diagram of a video editing apparatus according to an embodiment of the present application.
  • the video editing apparatus 100 includes a memory 102, a processor 104, and a program or instruction stored in the memory 102 and executable by the processor 104; when the processor 104 executes the program or instruction, the following is implemented:
  • obtaining sample frames from the video material, performing image recognition on the sample frames, and obtaining the character age category and character action category corresponding to each sample frame;
  • obtaining a matching result of the character age category and the character action category, and when the matching result satisfies a preset first condition, adding the character age category and the character action category as the frame label of the sample frame;
  • editing the video material according to the frame label.
  • the video editing apparatus may automatically extract sample frames from the video material after the user shoots the video material, and obtain the age category and the action category of the person in the sample frame through image recognition technology.
  • the age categories of characters may include infants, childhood, youth, and old age, etc., and correspondingly, actions may include general actions such as walking, standing, and bending over, as well as sports actions such as running, jumping, and rolling.
  • since the recognition accuracy of the character age category and the character action category cannot be guaranteed, the matching result of the character age category and the character action category is verified, thereby reducing misrecognition. Specifically, when the matching result satisfies the preset first condition, the character action category is considered to match the character age category.
  • for example, if the age category of the current character is identified as "elderly" while the action category of the same character is identified as "backflip", the two are determined not to match, since the elderly generally do not and cannot perform a "backflip". Conversely, when the character age category and the character action category do match, the identified character age category and character action category are added as the frame label of the current sample frame.
  • the video material can be automatically edited according to the content of the frame labels.
  • the system can automatically determine the frame label of each sample frame and automatically complete the editing according to the user's needs or a template selected by the user, without requiring the user to master professional skills or pay an additional learning cost, thereby implementing video editing that is easy for ordinary users to use.
  • the process of obtaining sample frames in the video material includes:
  • one or more video materials are determined according to a user's selection operation.
  • the video material may be a video captured by the user with a mobile shooting device or other equipment, a video saved in the local storage space of the device, or a video stored in the cloud; this embodiment of the present application does not limit the source of the video material.
  • frame extraction processing is performed on the video material according to a preset frame extraction rule, and a sequence of multiple sample frames distributed at a certain interval is obtained.
  • the frame extraction rules can be set according to the video frame rate, the device performance of the current editing device, and user settings. For example, when the frame rate of the original video material is 24 frames per second, the frame extraction rule may be 4 frames per second, that is, one frame is kept out of every six frames. When the frame rate of the original video material is 60 frames per second, the frame extraction rule may be 10 frames per second or 12 frames per second.
  • the above frame extraction rule can be set freely in a background program, or set in the foreground upon receiving a user's setting instruction; the frame extraction rule is not limited in this embodiment of the present application. It should be understood that the frame rate of the extracted frames is generally not greater than the original frame rate of the video.
  • a time stamp can be added to each sample frame: on the one hand, this ensures the continuity of the sample frame sequence; on the other hand, it provides a basis for ordering the composited video during editing.
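  • As an illustration of the frame extraction and timestamping described above, the following minimal Python sketch keeps roughly one frame out of every few frames and derives a timestamp from the original frame rate. The use of OpenCV, the file name, and the 4 frames-per-second target are assumptions for the example, not part of the claimed implementation.

```python
# Minimal sketch (assumption: OpenCV is available and "video.mp4" is a local file).
import cv2

def extract_sample_frames(path, target_rate=4.0):
    """Extract sample frames at roughly `target_rate` frames/second and timestamp them."""
    cap = cv2.VideoCapture(path)
    source_fps = cap.get(cv2.CAP_PROP_FPS) or 24.0   # e.g. 24 frames/second
    step = max(1, round(source_fps / target_rate))   # e.g. keep 1 frame out of every 6
    samples = []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            timestamp = index / source_fps            # seconds on the original time axis
            samples.append((timestamp, frame))
        index += 1
    cap.release()
    return samples

if __name__ == "__main__":
    for ts, _frame in extract_sample_frames("video.mp4"):
        print(f"sample frame at {ts:.2f}s")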
  • the process of performing image recognition on the sample frame further includes:
  • the character expression category includes, for example, smiling, laughing, grimace, etc., and may also include anger, crying, and the like.
  • after the character expression category is added to the frame label of the sample frame, the video editing apparatus allows the user to automatically edit the video material by "expression". For example, if a user wants to obtain a video of a child laughing, the segments in the video material whose character age category is "infant" and whose character expression category is "smiling" or "laughing" can be extracted and re-synthesized, thereby obtaining an edited video that collects the child's laughter and satisfying the user's editing needs.
  • the process of editing the video material according to the frame tag includes:
  • the target category includes at least one of a target expression category, a target age category, and a target action category.
  • in order to reduce computational pressure while ensuring that the edited video is more complete and coherent, the corresponding (video) segments in the video material are first determined through the sample frames, and each segment is marked according to the frame labels of its sample frames to form a segment label corresponding to that segment.
  • the video clip is edited according to the clip label and the preset target category.
  • the target categories include target expression categories, target age categories and/or target action categories.
  • for example, a segment label may record that the character age category is "youth", the character action category is "standing", and the character expression category is "smile".
  • editing only the segments that contain the target category can improve the completeness and coherence of the edited video while reducing the computational pressure.
  • the process of editing the video material according to the frame label of the sample frame and the target category includes:
  • when the segment label of a segment includes the target category, the segment is determined as a segment to be edited;
  • the to-be-edited segments are combined according to the time sequence of the time stamps of the sample frames corresponding to the to-be-edited segments to complete the editing.
  • the corresponding target category is determined.
  • for example, if the segment labels of a segment include "smile", "childhood", and "jump", and the target category is "jump", it is determined that the segment labels of that segment include the target category.
  • all segments whose labels include the target category are determined as segments to be edited. Each segment to be edited contains at least one sample frame, and each sample frame carries a timestamp, so the order of all the segments to be edited can be determined according to the timestamps of their sample frames; the segments are then combined in that order to obtain the final edited video and complete the editing work.
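  • The following is a minimal sketch of selecting and ordering the segments to be edited as just described. The `Segment` structure and its field names are illustrative assumptions, not the patent's data model.

```python
# Select segments whose labels include the target category, then merge in timestamp order.
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Segment:
    start_ts: float          # timestamp of the segment's first sample frame
    end_ts: float            # timestamp of the segment's last sample frame
    labels: Set[str]         # segment labels, e.g. {"smile", "childhood", "jump"}

def clips_to_edit(segments: List[Segment], target_category: str) -> List[Segment]:
    # A segment is kept when its segment labels include the target category.
    kept = [s for s in segments if target_category in s.labels]
    # Segments are combined in the time order of their sample-frame timestamps.
    return sorted(kept, key=lambda s: s.start_ts)

if __name__ == "__main__":
    segments = [
        Segment(12.0, 15.5, {"smile", "childhood", "jump"}),
        Segment(3.0, 6.0, {"standing", "youth"}),
        Segment(20.0, 22.0, {"jump", "youth"}),
    ]
    for s in clips_to_edit(segments, "jump"):
        print(f"clip {s.start_ts:.1f}s to {s.end_ts:.1f}s with labels {sorted(s.labels)}")
```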
  • the process of marking the segment corresponding to the sample frame in the video material by using the frame label of the sample frame includes:
  • when the sample frames include a first sample frame and a second sample frame, and the first sample frame and the second sample frame correspond to the same category in any one of the character expression category, the character age category, and the character action category, determining the set of all frames between the first sample frame and the second sample frame as a segment;
  • after the frame label of each sample frame is determined, if two adjacent sample frames, denoted as the first sample frame and the second sample frame, have frame labels in which at least one of the character expression category, the character age category, and the character action category is the same, then the first sample frame, the second sample frame, and the non-sample frames between them are determined as a segment, and the shared "category" is marked as the segment label of that segment.
  • for example, the frame labels of the first sample frame include "youth", "smile", and "standing", and the frame labels of the second sample frame include "toddler", "smile", and "rolling"; the category shared by the first sample frame and the second sample frame is "smile", so the first sample frame, the second sample frame, and the non-sample frames between them are determined as one segment, and the segment label of that segment is "smile".
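  • A minimal sketch of forming a segment from adjacent sample frames that share a label category follows; frame indices and label sets are illustrative assumptions.

```python
# Group each pair of adjacent sample frames that share at least one label category.
from typing import List, Set, Tuple

def segments_from_adjacent_samples(
    sample_frames: List[Tuple[int, Set[str]]]  # (frame index in the video, frame labels)
) -> List[Tuple[int, int, Set[str]]]:
    """Return (start_frame, end_frame, shared_labels) for each qualifying adjacent pair."""
    segments = []
    for (idx_a, labels_a), (idx_b, labels_b) in zip(sample_frames, sample_frames[1:]):
        shared = labels_a & labels_b           # e.g. {"smile"}
        if shared:
            # The two sample frames plus all non-sample frames between them form one segment,
            # and the shared category becomes the segment label.
            segments.append((idx_a, idx_b, shared))
    return segments

if __name__ == "__main__":
    samples = [
        (0,  {"youth", "smile", "standing"}),
        (6,  {"toddler", "smile", "rolling"}),
        (12, {"toddler", "crying", "rolling"}),
    ]
    print(segments_from_adjacent_samples(samples))
```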
  • the process of marking the segment corresponding to the sample frame in the video material by using the frame label of the sample frame includes:
  • when the sample frames include a third sample frame and a fourth sample frame, the third sample frame and the fourth sample frame correspond to the same category in any one of the character expression category, the character age category, and the character action category, and the number of frames in the interval between the third sample frame and the fourth sample frame is less than a preset frame number threshold, the set of all frames between the third sample frame and the fourth sample frame is determined as a segment; the frame label corresponding to that category is marked as the segment label.
  • after the frame label of each sample frame is determined, if there are two sample frames, denoted as the third sample frame and the fourth sample frame, in which at least one of the character expression category, the character age category, and the character action category is the same, and at the same time the number of frames between the third sample frame and the fourth sample frame (which may include non-sample frames, or both sample frames and non-sample frames) is less than the preset frame number threshold, then the third sample frame, the fourth sample frame, and the frames between them are determined as a segment, and the shared "category" is marked as the segment label of that segment.
  • the frame number threshold can be set based on big data analysis, or can be freely adjusted according to the user's usage habits, such as 25, 50, 60, etc. This embodiment of the present application does not limit the specific value of the frame number threshold.
  • for example, the frame labels of the third sample frame include "youth" and "standing", the frame labels of the fourth sample frame include "toddler", "smile", and "standing", the category shared by the third sample frame and the fourth sample frame is "standing", and the number of frames between the third sample frame and the fourth sample frame is 14, which is less than the current preset frame number threshold of 25; the third sample frame, the fourth sample frame, and all frames in between are therefore determined as a segment, and the segment label of that segment is "standing".
  • the process of performing image recognition on the sample frame further includes:
  • the frame label of the sample frame is used, and the process of marking the segment corresponding to the sample frame in the video material includes:
  • when the sample frames include a fifth sample frame and a sixth sample frame, and the characters corresponding to the fifth sample frame and the sixth sample frame are the same person and the objects are the same object, the set of all frames between the fifth sample frame and the sixth sample frame is determined as a segment.
  • an object that interacts with the character can be further acquired.
  • for example, the chair the character sits on, the racket, ball, skipping rope, or other sports equipment the character touches, or the clothes the character wears.
  • if there are two sample frames, denoted as the fifth sample frame and the sixth sample frame, in which the characters are the same character and the objects interacting with the characters are the same object, then the set of all frames between the fifth sample frame and the sixth sample frame is determined as a segment.
  • the segment label corresponding to the determined segment may include the entire content of the frame label of the fifth sample frame, and simultaneously include the entire content of the frame label of the sixth sample frame.
  • whether the characters in the two sample frames are the same character can be determined by means of face and human body recognition. For example, after face recognition it is determined that the characters in the fifth sample frame and the sixth sample frame are both "Xiao Ming"; in the fifth sample frame the object that "Xiao Ming" interacts with is a basketball, and in the sixth sample frame the object that "Xiao Ming" interacts with is also a basketball, so the fifth sample frame, the sixth sample frame, and all frames in between are determined as one segment. The frame label content of the fifth sample frame and the frame label content of the sixth sample frame are both set as the segment label of that segment.
  • the above matching result includes a matching degree.
  • the processor 104 further implements the following steps when executing the program or instruction:
  • when the matching degree is greater than the preset matching degree threshold, it is determined that the matching result satisfies the first condition.
  • a preset matching database can be used to obtain the matching degree between the currently recognized character age category and character action category, and the comparison result between the matching degree and the matching degree threshold is used to determine whether the current matching result satisfies the above first condition.
  • comparison data between each character age category and each character action category are pre-stored. For example, for sports actions, the "youth" category may have the highest matching degree, the "childhood" category a lower one, and the "elderly" category the lowest. Depending on the action category, each age group can also have a different matching degree.
  • the setting of the above matching degree can be defined by the designer, or can be set according to big data analysis.
  • the embodiments of the present application do not limit the specific form and content of the matching database.
  • the matching degree threshold may be set according to the matching database, and the specific numerical range of the matching degree threshold is not limited in this embodiment of the present application.
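  • A minimal sketch of looking up the age/action matching degree and comparing it against the threshold follows; the table values and the 0.6 threshold are illustrative assumptions, not values given in this application.

```python
# Check whether a recognized (age, action) pair satisfies the first condition by matching degree.
MATCHING_DB = {
    # (character age category, character action category) -> matching degree
    ("youth", "backflip"): 0.9,
    ("childhood", "backflip"): 0.7,
    ("elderly", "backflip"): 0.1,
    ("elderly", "walking"): 0.95,
}

MATCH_THRESHOLD = 0.6

def matching_result_ok(age_category: str, action_category: str) -> bool:
    degree = MATCHING_DB.get((age_category, action_category), 0.0)
    # The first condition is satisfied when the matching degree exceeds the threshold.
    return degree > MATCH_THRESHOLD

if __name__ == "__main__":
    print(matching_result_ok("youth", "backflip"))    # True
    print(matching_result_ok("elderly", "backflip"))  # False
```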
  • the matching result includes a corresponding relationship between a character age category and a character action category
  • the character age category includes an infancy category, a childhood category, a youth category, a middle-aged category, and an elderly category
  • the first condition includes: the character age category is the childhood category, the youth category, or the middle-aged category; or the character age category is the elderly category and the character action category is the first action category; or the character age category is the infancy category and the character action category is the second action category.
  • when the character age category is the childhood category or the youth category, the first condition is met. Specifically, since people have the strongest athletic ability in childhood and youth, it is reasonable for them to perform almost any action at those ages; therefore, when the character age category is the childhood category or the youth category, it can be determined that the matching result satisfies the first condition.
  • when the character age category is the elderly category, considering that the elderly, due to their relatively advanced age, are generally unable or unsuited to perform exaggerated, large-amplitude actions, the actions the elderly may plausibly perform are set as the first action category. When the character age category is the elderly category and the character action category is the first action category, it is determined that the matching result satisfies the first condition; otherwise, it is determined that the matching result does not satisfy the first condition.
  • when the character age category is the infancy category, young children have good physical flexibility, but because they are not fully developed and are untrained, they are generally unable to perform some skilled movements such as rope skipping and handstands. Therefore, regular, non-skilled actions are classified into the second action category. If the character age category is the infancy category, it is further judged whether the current action belongs to the second action category; if so, it can be determined that the matching result satisfies the first condition.
  • the action categories include non-exercise actions and sports actions, wherein the first action category includes: non-exercise actions, and sports actions whose motion speed is less than a preset speed threshold and whose motion amplitude is less than a preset amplitude threshold.
  • the second action category includes: non-exercise actions and sports actions with a movement speed less than the speed threshold.
  • the motion amplitude and motion speed of a sports action can be identified to determine whether it belongs to the first action category or the second action category, thereby reducing the amount of computation and improving the identification speed.
  • both the first action category and the second action category include non-exercise categories.
  • for sports actions, the two categories are distinguished according to motion speed and motion amplitude.
  • for the first action category, which corresponds to the elderly category, it can be understood that the movements of the elderly are generally slower and the flexibility of their joints is reduced, so their range of motion is generally relatively small. Therefore, an action whose motion speed is lower than the preset speed threshold and whose motion amplitude is smaller than the preset amplitude threshold is set as the first action category; that is, if the motion speed is greater than the speed threshold, or the motion amplitude is greater than the amplitude threshold, it is determined that the action does not belong to the first action category.
  • the range of motion may include the maximum distance traveled by a limb from one position to another, and may also include the amplitude of specific actions such as raising the hands, bending over, or bending the knees; the specific content of the range of motion is not limited in this embodiment of the present application.
  • the movement speed may include the movement speed of the human body, and may also include the swing speed of the limbs, and the specific content of the movement speed is not limited in this embodiment of the present application.
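  • The following sketch classifies an action into the first or second action category by its motion speed and motion amplitude, following the rules above; the threshold values and units are illustrative assumptions.

```python
# Split actions into the first and second action categories by speed and amplitude.
SPEED_THRESHOLD = 1.5       # assumed unit, e.g. metres per second
AMPLITUDE_THRESHOLD = 0.5   # assumed unit, e.g. normalised joint displacement

def in_first_action_category(is_sports_action: bool, speed: float, amplitude: float) -> bool:
    # Non-exercise actions always qualify; sports actions must be both slow and small in amplitude.
    if not is_sports_action:
        return True
    return speed < SPEED_THRESHOLD and amplitude < AMPLITUDE_THRESHOLD

def in_second_action_category(is_sports_action: bool, speed: float) -> bool:
    # Non-exercise actions always qualify; sports actions only need to be slow.
    if not is_sports_action:
        return True
    return speed < SPEED_THRESHOLD

if __name__ == "__main__":
    print(in_first_action_category(True, speed=0.8, amplitude=0.3))  # True: slow, small action
    print(in_first_action_category(True, speed=2.4, amplitude=0.3))  # False: too fast
    print(in_second_action_category(True, speed=0.8))                # True
```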
  • when the processor 104 executes the program or instruction, image recognition is performed on the sample frame, and the process of determining the character expression category, the character age category, and the character action category corresponding to the sample frame includes:
  • the sample frame is input into the image recognition model, and the face information and human body information included in the sample frame are identified by the image recognition model; the face information is detected through the first neural network model to obtain the character expression category and the character age category; the human body information is detected through the second neural network model to obtain the original character action category;
  • according to the time stamps, the continuity of the original character action categories corresponding to a consecutive preset number of sample frames is verified to obtain a verification result;
  • when the verification result satisfies a preset second condition, the original character action category is determined as the character action category.
  • a neural network model obtained by machine learning can be used to identify the character expression category, the character age category, and the character action category in the sample frame.
  • the sample frame is the input of the image recognition model.
  • the image recognition model first recognizes the face and crops out the face information, and recognizes the human body and crops out the human body information.
  • the human image is "intercepted” from the complete frame image, and refined into face information (face image) and human body information (limb image).
  • the face information is input into the first neural network model to perform face detection and recognition, and based on artificial intelligence analysis, the corresponding facial expression category and the corresponding age category of the person are obtained.
  • the human body information is input into the second neural network model to obtain the corresponding original character action category.
  • according to the time stamps of a consecutive preset number of sample frames, the continuity of the character age category and the character action category identified by the first neural network model and the second neural network model is verified, and when the verification result satisfies the second condition, it is determined that the original character action category obtained by the action recognition is accurate.
  • continuity verification is performed on consecutive sample frame 1, sample frame 2, and sample frame 3.
  • the character actions in the three consecutive sample frames should be coherent and without jumps; for example, the character action categories of the three consecutive sample frames are the same, or form a continuous motion.
  • for example, these three sample frames may capture the process of the photographed person taking off; this process is continuous, so it is judged that the second condition is satisfied.
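  • The following structural sketch mirrors the recognition pipeline described above. The model objects (`face_detector`, `expression_age_model`, `action_model`) and their methods are hypothetical placeholders; this application does not specify particular networks or APIs, and the coarse continuity check is only one possible reading of the verification step.

```python
# Structural sketch: crop face/body, run the two models, then check action continuity.
from typing import Any, Dict, List

def recognise_sample_frame(frame: Any,
                           face_detector: Any,
                           expression_age_model: Any,
                           action_model: Any) -> Dict[str, str]:
    # 1) The image recognition model crops face information and human body information.
    face_crop, body_crop = face_detector.crop_face_and_body(frame)   # assumed interface
    # 2) The first neural network model yields the expression and age categories.
    expression, age = expression_age_model.predict(face_crop)        # assumed interface
    # 3) The second neural network model yields the original action category.
    original_action = action_model.predict(body_crop)                # assumed interface
    return {"expression": expression, "age": age, "original_action": original_action}

def actions_are_continuous(actions: List[str], allowed: Dict[tuple, bool]) -> bool:
    # Coarse continuity check: each consecutive pair of original action categories
    # must be identical or an allowed transition.
    for a, b in zip(actions, actions[1:]):
        if a != b and not allowed.get((a, b), False):
            return False
    return True

if __name__ == "__main__":
    transitions = {("knee bending", "standing"): True, ("standing", "jumping"): True}
    print(actions_are_continuous(["knee bending", "standing", "jumping"], transitions))  # True
    print(actions_are_continuous(["lying down", "jumping"], transitions))                # False
```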
  • the verification result includes a continuity score; and when the continuity score is greater than or equal to a preset continuity threshold, it is determined that the verification result satisfies the second condition.
  • a continuity score can be set to judge whether the actions are continuous.
  • a continuity score value can be set for different pairs of consecutive action categories.
  • the continuity score may be determined according to a preset look-up table. For example, the continuity score between "lying down" and "jumping" is low, for example set to 1 point, while the continuity scores between "knee bending" and "standing", and between "standing" and "jumping", are higher, for example set to 10 points.
  • the continuity threshold can be specifically determined according to the above-mentioned setting of the continuity score and the number of sample frames selected in the continuity verification. For example, in the case of sampling 3 frames according to the above score setting method, the continuity threshold may be set to 8 to 12.
  • for example, in a first case, the original character action category of sample frame 1 is knee bending, that of sample frame 2 is standing, and that of sample frame 3 is jumping; in a second case, the actions jump abruptly between unrelated categories, for example from lying down directly to jumping. The total continuity score of the first case is greater than the continuity threshold, so the first case satisfies the second condition, while the second case does not satisfy the second condition.
  • the processor 104 further implements the following steps when executing the program or instruction:
  • the verification result includes: in the preset number of sample frames, the position change degree between the positions of the human limbs corresponding to any two sample frames;
  • image recognition may also be performed on a continuous preset number of sample frames, so as to determine the position of a person's limb corresponding to each sample frame.
  • the position of the person's limbs may represent the position of the person being photographed and the pose of the person.
  • in general, the photographed person's posture and position will not change abruptly between adjacent sample frames. For example, if the character in one frame is standing naturally with arms hanging down and in the next sampled frame is suddenly in a handstand, this situation is obviously not in line with common sense.
  • when the matching result does not satisfy the first condition, the character age category and/or the character action category corresponding to the sample frame is re-acquired.
  • when the matching result does not satisfy the first condition, it means that at least one of the obtained character age category and character action category does not conform to the actual situation. Therefore, the character age category corresponding to the current sample frame can be re-acquired, and the character action category corresponding to the current sample frame can be re-acquired simultaneously or separately, thereby avoiding inaccurate editing results caused by misrecognition and effectively improving the experience of using the video editing apparatus.
  • the step of performing image recognition on the sample frame is performed again.
  • when the matching result does not satisfy the first condition and the verification result also does not satisfy the second condition, it means that there is a relatively large deviation in the image recognition result of the sample frame.
  • Action recognition results may be unreliable. Therefore, re-executing the step of performing image recognition on the sample frame can effectively improve the accuracy of image recognition, thereby improving the accuracy of automatic editing.
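  • A minimal sketch of the verify-and-retry control flow described above follows. The `recognise` and `matches` callables are hypothetical stand-ins for the recognition and first-condition steps; the retry limit is an assumption.

```python
# Re-acquire the age/action categories when the matching result fails the first condition.
from typing import Callable, Optional, Tuple

def labelled_categories(frame,
                        recognise: Callable[[object], Tuple[str, str]],
                        matches: Callable[[str, str], bool],
                        max_attempts: int = 3) -> Optional[Tuple[str, str]]:
    for _ in range(max_attempts):
        age_category, action_category = recognise(frame)   # re-acquire age and/or action
        if matches(age_category, action_category):
            # Matching result satisfies the first condition: use both as the frame label.
            return age_category, action_category
    # Still inconsistent after several attempts: leave the frame unlabelled.
    return None
```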
  • FIG. 2 shows one of the flowcharts of a video editing method according to an embodiment of the present application.
  • the video editing method may include the following steps:
  • Step 202 Obtain sample frames from the video material, perform image recognition on the sample frames, and obtain a character age category and a character action category corresponding to the sample frames;
  • Step 204 obtaining the matching result of the character age category and the character action category, when the matching result satisfies the preset first condition, adding the character age category and the character action category as the frame label of the sample frame;
  • Step 206 Edit the video material according to the frame tag.
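  • As a high-level sketch, steps 202 to 206 can be read as one pipeline. In the following sketch the concrete recognisers and combiners are passed in as callables; their names and signatures are placeholders, not an API defined by this application.

```python
# High-level pipeline over already-extracted sample frames: recognise, check, label, edit.
from typing import Callable, Iterable, List, Set, Tuple

def edit_video(sample_frames: Iterable[object],
               recognise: Callable[[object], Tuple[str, str]],      # -> (age category, action category)
               matches: Callable[[str, str], bool],                 # first-condition check (step 204)
               build_segments: Callable[[List[Tuple[object, Set[str]]]], List[object]],
               combine: Callable[[List[object], str], object],
               target_category: str) -> object:
    labelled = []
    for frame in sample_frames:                       # step 202: image recognition per sample frame
        age, action = recognise(frame)
        if matches(age, action):                      # step 204: matching result vs. first condition
            labelled.append((frame, {age, action}))   # frame label
    segments = build_segments(labelled)               # segment labels derived from frame labels
    return combine(segments, target_category)         # step 206: edit according to the labels

if __name__ == "__main__":
    result = edit_video(
        sample_frames=["f1", "f2"],
        recognise=lambda f: ("youth", "jumping"),
        matches=lambda age, action: True,
        build_segments=lambda labelled: labelled,
        combine=lambda segs, cat: [frame for frame, labels in segs if cat in labels],
        target_category="jumping",
    )
    print(result)
```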
  • the video editing apparatus may automatically extract sample frames from the video material after the user shoots the video material, and obtain the age category and the action category of the person in the sample frame through image recognition technology.
  • the age categories of characters may include infants, childhood, youth, and old age, etc., and correspondingly, actions may include general actions such as walking, standing, and bending over, as well as sports actions such as running, jumping, and rolling.
  • since the recognition accuracy of the character age category and the character action category cannot be guaranteed, the matching result of the character age category and the character action category is verified, thereby reducing misrecognition. Specifically, when the matching result satisfies the preset first condition, the character action category is considered to match the character age category.
  • for example, if the age category of the current character is identified as "elderly" while the action category of the same character is identified as "backflip", the two are determined not to match, since the elderly generally do not and cannot perform a "backflip". Conversely, when the character age category and the character action category do match, the identified character age category and character action category are added as the frame label of the current sample frame.
  • the video material can be automatically edited according to the content of the frame labels.
  • the system can automatically determine the frame label of each sample frame and automatically complete the editing according to the user's needs or a template selected by the user, without requiring the user to master professional skills or pay an additional learning cost, thereby implementing video editing that is easy for ordinary users to use.
  • FIG. 3 shows the second flowchart of a video editing method according to an embodiment of the present application.
  • the process of acquiring sample frames in a video material includes the following steps:
  • Step 302 in response to the selection operation, determine at least one video material
  • Step 304 performing frame extraction processing on the video material to obtain corresponding sample frames
  • Step 306 adding a timestamp to the sample frame according to the time axis of the video material.
  • one or more video materials are determined according to a user's selection operation.
  • the video material may be a video captured by the user with a mobile shooting device or other equipment, a video saved in the local storage space of the device, or a video stored in the cloud; this embodiment of the present application does not limit the source of the video material.
  • frame extraction processing is performed on the video material according to a preset frame extraction rule, and a sequence of multiple sample frames distributed at a certain interval is obtained.
  • the frame extraction rules can be set according to the video frame rate, the device performance of the current editing device, and user settings. For example, when the frame rate of the original video material is 24 frames per second, the frame extraction rule may be 4 frames per second, that is, one frame is kept out of every six frames. When the frame rate of the original video material is 60 frames per second, the frame extraction rule may be 10 frames per second or 12 frames per second.
  • the above frame extraction rule can be set freely in a background program, or set in the foreground upon receiving a user's setting instruction; the frame extraction rule is not limited in this embodiment of the present application. It should be understood that the frame rate of the extracted frames is generally not greater than the original frame rate of the video.
  • FIG. 4 shows the third flowchart of a video editing method according to an embodiment of the present application. Specifically, the process of performing image recognition on the sample frame further includes the following steps:
  • Step 402 perform image recognition on the sample frame, and obtain the character expression category corresponding to the sample frame;
  • Step 404 adding the character expression category to the frame label.
  • the character expression category includes, for example, smiling, laughing, grimace, etc., and may also include anger, crying, and the like.
  • after the character expression category is added to the frame label of the sample frame, the video editing apparatus allows the user to automatically edit the video material by "expression". For example, if a user wants to obtain a video of a child laughing, the segments in the video material whose character age category is "infant" and whose character expression category is "smiling" or "laughing" can be extracted and re-synthesized, thereby obtaining an edited video that collects the child's laughter and satisfying the user's editing needs.
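  • A minimal sketch of the "child laughing" selection just described follows; the segment dictionary structure and the label names ("infant", "smiling", "laughing") are illustrative assumptions.

```python
# Keep segments whose labels mark an infant/child who is smiling or laughing.
def laughter_segments(segments):
    wanted_ages = {"infant", "child"}
    wanted_expressions = {"smiling", "laughing"}
    return [
        seg for seg in segments
        if seg["labels"] & wanted_ages and seg["labels"] & wanted_expressions
    ]

if __name__ == "__main__":
    segments = [
        {"start": 3.0, "labels": {"infant", "laughing", "rolling"}},
        {"start": 9.0, "labels": {"youth", "smiling", "standing"}},
    ]
    print(laughter_segments(segments))   # only the first segment is kept
```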
  • FIG. 5 shows the fourth flowchart of the video editing method according to the embodiment of the present application.
  • the process of editing video material according to the frame tag includes the following steps:
  • Step 502 Mark the segment corresponding to the sample frame in the video material by using the frame label of the sample frame to obtain the corresponding segment label;
  • Step 504 Edit the video clip according to the clip label and the preset target category.
  • the target category includes at least one of a target expression category, a target age category, and a target action category.
  • in order to reduce computational pressure while ensuring that the edited video is more complete and coherent, the corresponding (video) segments in the video material are first determined through the sample frames, and each segment is marked according to the frame labels of its sample frames to form a segment label corresponding to that segment.
  • the video clip is edited according to the clip label and the preset target category.
  • the target categories include target expression categories, target age categories and/or target action categories.
  • for example, a segment label may record that the character age category is "youth", the character action category is "standing", and the character expression category is "smile".
  • editing only the segments that contain the target category can improve the completeness and coherence of the edited video while reducing the computational pressure.
  • FIG. 6 shows the fifth flowchart of the video editing method according to the embodiment of the present application.
  • the process of editing the video material according to the frame label of the sample frame and the target category includes the following steps:
  • Step 602 when the segment label of the segment includes the target category, the segment is determined as the segment to be edited;
  • Step 604 Combine the segments to be edited according to the time sequence of the timestamps of the sample frames corresponding to the segments to be edited to complete the editing.
  • the corresponding target category is determined.
  • for example, if the segment labels of a segment include "smile", "childhood", and "jump", and the target category is "jump", it is determined that the segment labels of that segment include the target category.
  • all segments whose labels include the target category are determined as segments to be edited. Each segment to be edited contains at least one sample frame, and each sample frame carries a timestamp, so the order of all the segments to be edited can be determined according to the timestamps of their sample frames; the segments are then combined in that order to obtain the final edited video and complete the editing work.
  • FIG. 7 shows the sixth flowchart of a video editing method according to an embodiment of the present application. Specifically, the process of marking the segment corresponding to the sample frame in the video material by using the frame label of the sample frame includes the following steps:
  • Step 702 when the sample frames include a first sample frame and a second sample frame, and the first sample frame and the second sample frame correspond to the same category in any one of the character expression category, the character age category, and the character action category, determine the set of all frames between the first sample frame and the second sample frame as a segment;
  • Step 704 Mark the frame label corresponding to any category as a segment label.
  • after the frame label of each sample frame is determined, if two adjacent sample frames, denoted as the first sample frame and the second sample frame, have frame labels in which at least one of the character expression category, the character age category, and the character action category is the same, then the first sample frame, the second sample frame, and the non-sample frames between them are determined as a segment, and the shared "category" is marked as the segment label of that segment.
  • for example, the frame labels of the first sample frame include "youth", "smile", and "standing", and the frame labels of the second sample frame include "toddler", "smile", and "rolling"; the category shared by the first sample frame and the second sample frame is "smile", so the first sample frame, the second sample frame, and the non-sample frames between them are determined as one segment, and the segment label of that segment is "smile".
  • FIG. 8 shows the seventh flowchart of a video editing method according to an embodiment of the present application. Specifically, the process of marking the segment corresponding to the sample frame in the video material by using the frame label of the sample frame includes the following steps:
  • Step 802 when the sample frames include a third sample frame and a fourth sample frame, the third sample frame and the fourth sample frame correspond to the same category in any one of the character expression category, the character age category, and the character action category, and the number of frames in the interval between the third sample frame and the fourth sample frame is less than the preset frame number threshold, determine the set of all frames between the third sample frame and the fourth sample frame as a segment;
  • Step 804 Mark the frame label corresponding to any category as a segment label.
  • after the frame label of each sample frame is determined, if there are two sample frames, denoted as the third sample frame and the fourth sample frame, in which at least one of the character expression category, the character age category, and the character action category is the same, and at the same time the number of frames between the third sample frame and the fourth sample frame (which may include non-sample frames, or both sample frames and non-sample frames) is less than the preset frame number threshold, then the third sample frame, the fourth sample frame, and the frames between them are determined as a segment, and the shared "category" is marked as the segment label of that segment.
  • the frame number threshold can be set based on big data analysis, or can be freely adjusted according to the user's usage habits, such as 25, 50, 60, etc. This embodiment of the present application does not limit the specific value of the frame number threshold.
  • for example, the frame labels of the third sample frame include "youth" and "standing", the frame labels of the fourth sample frame include "toddler", "smile", and "standing", the category shared by the third sample frame and the fourth sample frame is "standing", and the number of frames between the third sample frame and the fourth sample frame is 14, which is less than the current preset frame number threshold of 25; the third sample frame, the fourth sample frame, and all frames in between are therefore determined as a segment, and the segment label of that segment is "standing".
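  • A minimal sketch of merging two sample frames into one segment when they share a category and are separated by fewer frames than the threshold follows; the frame indices and the threshold value of 25 are taken from the example above, the rest are assumptions.

```python
# Merge two sample frames into a segment if they share a label and the gap is small enough.
FRAME_GAP_THRESHOLD = 25

def maybe_merge(sample_a, sample_b):
    """sample_* = (frame index, set of frame labels); returns (start, end, shared labels) or None."""
    idx_a, labels_a = sample_a
    idx_b, labels_b = sample_b
    shared = labels_a & labels_b
    gap = abs(idx_b - idx_a)
    if shared and gap < FRAME_GAP_THRESHOLD:
        return (min(idx_a, idx_b), max(idx_a, idx_b), shared)
    return None

if __name__ == "__main__":
    third = (100, {"youth", "standing"})
    fourth = (114, {"toddler", "smile", "standing"})   # 14 frames apart, both "standing"
    print(maybe_merge(third, fourth))                  # (100, 114, {'standing'})
```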
  • FIG. 9 shows the eighth flow chart of the video editing method according to the embodiment of the present application. Specifically, the video editing method further includes the following steps:
  • Step 902 performing image recognition on the sample frame, and obtaining the character corresponding to the sample frame and the object interacting with the character;
  • Step 904 when the sample frames include a fifth sample frame and a sixth sample frame, and the characters corresponding to the fifth sample frame and the sixth sample frame are the same person and the objects are the same object, determine the set of all frames between the fifth sample frame and the sixth sample frame as a segment.
  • an object that interacts with the character can be further acquired.
  • for example, the chair the character sits on, the racket, ball, skipping rope, or other sports equipment the character touches, or the clothes the character wears.
  • if there are two sample frames, denoted as the fifth sample frame and the sixth sample frame, in which the characters are the same character and the objects interacting with the characters are the same object, then the set of all frames between the fifth sample frame and the sixth sample frame is determined as a segment.
  • the segment label corresponding to the determined segment may include the entire content of the frame label of the fifth sample frame, and simultaneously include the entire content of the frame label of the sixth sample frame.
  • whether the characters in the two sample frames are the same character can be determined by means of face and human body recognition. For example, after face recognition it is determined that the characters in the fifth sample frame and the sixth sample frame are both "Xiao Ming"; in the fifth sample frame the object that "Xiao Ming" interacts with is a basketball, and in the sixth sample frame the object that "Xiao Ming" interacts with is also a basketball, so the fifth sample frame, the sixth sample frame, and all frames in between are determined as one segment. The frame label content of the fifth sample frame and the frame label content of the sixth sample frame are both set as the segment label of that segment.
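  • A minimal sketch of grouping frames into a segment when two sample frames show the same identified person interacting with the same object follows; the field names and the "Xiao Ming"/basketball values are illustrative, mirroring the example above.

```python
# Merge two sample frames into a segment when person identity and interacting object match.
from dataclasses import dataclass
from typing import Optional, Set, Tuple

@dataclass
class SampleFrame:
    index: int                 # frame index on the video time axis
    person_id: str             # identity obtained via face/body recognition, e.g. "Xiao Ming"
    object_name: str           # object the person interacts with, e.g. "basketball"
    labels: Set[str]           # frame labels of this sample frame

def merge_by_person_and_object(a: SampleFrame, b: SampleFrame) -> Optional[Tuple[int, int, Set[str]]]:
    if a.person_id == b.person_id and a.object_name == b.object_name:
        # The segment label carries the full frame-label content of both sample frames.
        return (a.index, b.index, a.labels | b.labels)
    return None

if __name__ == "__main__":
    fifth = SampleFrame(40, "Xiao Ming", "basketball", {"youth", "jump"})
    sixth = SampleFrame(64, "Xiao Ming", "basketball", {"youth", "standing"})
    print(merge_by_person_and_object(fifth, sixth))
```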
  • the above-mentioned matching result includes a matching degree
  • FIG. 10 shows the ninth flow chart of the video editing method according to the embodiment of the present application. Specifically, the video editing method further includes the following steps:
  • Step 1002 according to the character age category and the character action category, obtain the corresponding matching degree in the preset matching database;
  • Step 1004 when the matching degree is greater than a preset matching degree threshold, determine that the matching result satisfies the first condition.
  • a preset matching database can be used to obtain the matching degree between the currently recognized character age category and character action category, and the comparison result between the matching degree and the matching degree threshold is used to determine whether the current matching result satisfies the above first condition.
  • comparison data between each character age category and each character action category are pre-stored. For example, for sports actions, the "youth" category may have the highest matching degree, the "childhood" category a lower one, and the "elderly" category the lowest. Depending on the action category, each age group can also have a different matching degree.
  • the setting of the above matching degree can be defined by the designer, or can be set according to big data analysis.
  • the embodiments of the present application do not limit the specific form and content of the matching database.
  • the matching degree threshold may be set according to the matching database, and the specific numerical range of the matching degree threshold is not limited in this embodiment of the present application.
  • the matching result includes a correspondence between the character age category and the character action category, and the character age category includes an infancy category, a childhood category, a youth category, a middle-aged category, and an elderly category; and the first condition includes: the character age category is the childhood category, the youth category, or the middle-aged category; or the character age category is the elderly category and the character action category is the first action category; or the character age category is the infancy category and the character action category is the second action category.
  • when the character age category is the childhood category or the youth category, the first condition is met. Specifically, since people have the strongest athletic ability in childhood and youth, it is reasonable for them to perform almost any action at those ages; therefore, when the character age category is the childhood category or the youth category, it can be determined that the matching result satisfies the first condition.
  • when the character age category is the elderly category, considering that the elderly, due to their relatively advanced age, are generally unable or unsuited to perform exaggerated, large-amplitude actions, the actions the elderly may plausibly perform are set as the first action category. When the character age category is the elderly category and the character action category is the first action category, it is determined that the matching result satisfies the first condition; otherwise, it is determined that the matching result does not satisfy the first condition.
  • when the character age category is the infancy category, young children have good physical flexibility, but because they are not fully developed and are untrained, they are generally unable to perform some skilled movements such as rope skipping and handstands. Therefore, regular, non-skilled actions are classified into the second action category. If the character age category is the infancy category, it is further judged whether the current action belongs to the second action category; if so, it can be determined that the matching result satisfies the first condition.
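  • The following sketch expresses the first condition as categorical rules. The age-category names follow the description above, while the membership of the first and second action sets is an assumption for illustration.

```python
# Categorical form of the first condition: free-matching ages, plus constrained elderly/infancy cases.
FREE_MATCH_AGES = {"childhood", "youth", "middle-aged"}

def first_condition(age_category: str,
                    action_category: str,
                    first_action_set: set,
                    second_action_set: set) -> bool:
    if age_category in FREE_MATCH_AGES:
        return True
    if age_category == "elderly":
        return action_category in first_action_set    # slow, small-amplitude or non-exercise actions
    if age_category == "infancy":
        return action_category in second_action_set   # regular, non-skilled actions
    return False

if __name__ == "__main__":
    first_set = {"walking", "standing", "bending over"}
    second_set = {"walking", "standing", "running"}
    print(first_condition("elderly", "backflip", first_set, second_set))  # False
    print(first_condition("infancy", "running", first_set, second_set))   # True
```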
  • the action categories include non-exercise actions and sports actions, wherein the first action category includes: non-exercise actions, and sports actions whose motion speed is less than a preset speed threshold and whose motion amplitude is less than a preset amplitude threshold.
  • the second action category includes: non-exercise actions and sports actions with a movement speed less than the speed threshold.
  • the motion amplitude and motion speed of a sports action can be identified to determine whether it belongs to the first action category or the second action category, thereby reducing the amount of computation and improving the identification speed.
  • both the first action category and the second action category include non-exercise categories.
  • for sports actions, the two categories are distinguished according to motion speed and motion amplitude.
  • for the first action category, which corresponds to the elderly category, it can be understood that the movements of the elderly are generally slower and the flexibility of their joints is reduced, so their range of motion is generally relatively small. Therefore, an action whose motion speed is lower than the preset speed threshold and whose motion amplitude is smaller than the preset amplitude threshold is set as the first action category; that is, if the motion speed is greater than the speed threshold, or the motion amplitude is greater than the amplitude threshold, it is determined that the action does not belong to the first action category.
  • the range of motion may include the maximum distance traveled by a limb from one position to another, and may also include the amplitude of specific actions such as raising the hands, bending over, or bending the knees; the specific content of the range of motion is not limited in this embodiment of the present application.
  • the movement speed may include the movement speed of the human body, and may also include the swing speed of the limbs, and the specific content of the movement speed is not limited in this embodiment of the present application.
  • FIG. 11 shows the tenth flowchart of a video editing method according to an embodiment of the present application. Specifically, the process of performing image recognition on the sample frame to determine the character expression category, the character age category, and the character action category corresponding to the sample frame includes the following steps:
  • Step 1102 input the sample frame into the image recognition model, and identify the face information and human body information included in the sample frame through the image recognition model;
  • Step 1104 detect the face information through the first neural network model to obtain the character expression category and the character age category;
  • Step 1106 detect the human body information through the second neural network model to obtain the original character action category
  • Step 1108 according to the time stamp, verify the continuity of the original character action categories corresponding to the continuous preset number of sample frames to obtain the verification result;
  • Step 1110 when the verification result satisfies the preset second condition, determine the original character action category as the character action category.
  • a neural network model obtained by machine learning can be used to identify the character expression category, the character age category, and the character action category in the sample frame.
  • The sample frame is the input of the image recognition model. After the sample frame is input into the image recognition model, the model first recognizes the face and crops out the face information, and then recognizes the human body and crops out the human body information.
  • the human image is "intercepted” from the complete frame image, and refined into face information (face image) and human body information (limb image).
  • the face information is input into the first neural network model to perform face detection and recognition, and based on artificial intelligence analysis, the corresponding facial expression category and the corresponding age category of the person are obtained.
  • the human body information is input into the second neural network model to obtain the corresponding original character action category.
  • Further, according to the timestamps, the continuity of the character age categories and character action categories identified by the first neural network model and the second neural network model over a continuous preset number of sample frames is verified; when the verification result satisfies the second condition, the original character action category obtained by the action recognition is determined to be accurate.
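  • A minimal sketch of this two-branch pipeline is given below, assuming three placeholder model objects (a detector, a face network and a body network) whose interfaces are illustrative rather than an existing library API.

```python
# Sketch of the two-branch recognition pipeline described above. The three
# model objects are placeholders standing in for the detection model, the
# first (face) neural network and the second (body) neural network.

from dataclasses import dataclass
from typing import Any

@dataclass
class FrameLabels:
    timestamp: float
    expression: str
    age_category: str
    raw_action: str  # "original character action category", before continuity checks

def recognise_sample_frame(frame: Any, timestamp: float,
                           detector, face_model, body_model) -> FrameLabels:
    # Step 1102: detect and crop the face region and the body region.
    face_crop, body_crop = detector.detect(frame)
    # Step 1104: the first neural network returns expression and age category.
    expression, age_category = face_model.predict(face_crop)
    # Step 1106: the second neural network returns the raw action category.
    raw_action = body_model.predict(body_crop)
    return FrameLabels(timestamp, expression, age_category, raw_action)
```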
  • continuity verification is performed on consecutive sample frame 1, sample frame 2, and sample frame 3.
  • Generally speaking, the character actions in three consecutive sample frames should be coherent and non-jumping; for example, the character action categories of the three consecutive sample frames should be the same, or should follow on from one another.
  • For instance, if the consecutive frames show knee bending, then standing, then jumping, these three sample frames may capture the process of the photographed person taking off. This process is continuous, so it is judged that the second condition is satisfied.
  • the verification result includes a continuity score; and when the continuity score is greater than or equal to a preset continuity threshold, it is determined that the verification result satisfies the second condition.
  • a continuity score can be set to judge whether the actions are continuous.
  • Specifically, a continuity score value can be set for each pair of connected action categories; the continuity score of a preset number of sample frames can then be determined by summing the pairwise continuity scores of each two adjacent sample frames.
  • The continuity score may be determined according to a preset look-up table. For example, the continuity score between "lying down" and "jumping" is low, for instance set to 1 point, while the continuity scores between "knee bending" and "standing" and between "standing" and "jumping" are higher, for instance set to 10 points.
  • the continuity threshold can be specifically determined according to the above-mentioned setting of the continuity score and the number of sample frames selected in the continuity verification. For example, in the case of sampling 3 frames according to the above score setting method, the continuity threshold may be set to 8 to 12.
  • Following this logic, in the first case, where the original character action category of sample frame 1 is knee bending, that of sample frame 2 is standing, and that of sample frame 3 is jumping, the continuity score is 10 + 10 = 20 points.
  • In the second case, where the original character action category of sample frame 1 is lying down, that of sample frame 2 is jumping, and that of sample frame 3 is again lying down, the continuity score is 1 + 1 = 2 points.
  • The continuity score of the first case is greater than the continuity threshold, so the first case satisfies the second condition, while the second case does not satisfy the second condition.
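  • The following sketch reproduces this worked example, using the illustrative 1-point and 10-point look-up values and a threshold within the 8-12 range mentioned above; the fallback score for unlisted action pairs is an additional assumption.

```python
# Sketch of the continuity check over a window of consecutive sample frames.
# The score table only covers the actions named in the example above.

CONTINUITY_SCORES = {
    ("knee_bending", "standing"): 10,
    ("standing", "jumping"): 10,
    ("lying_down", "jumping"): 1,
    ("jumping", "lying_down"): 1,
}
DEFAULT_SCORE = 5          # assumed fallback for pairs not in the table
CONTINUITY_THRESHOLD = 10  # assumed value within the 8-12 range above

def continuity_score(actions):
    """Sum the pairwise scores of each two adjacent sample frames."""
    total = 0
    for prev, curr in zip(actions, actions[1:]):
        total += CONTINUITY_SCORES.get((prev, curr),
                 CONTINUITY_SCORES.get((curr, prev), DEFAULT_SCORE))
    return total

def satisfies_second_condition(actions):
    return continuity_score(actions) >= CONTINUITY_THRESHOLD

if __name__ == "__main__":
    print(continuity_score(["knee_bending", "standing", "jumping"]))  # 20 -> satisfied
    print(continuity_score(["lying_down", "jumping", "lying_down"]))  # 2  -> not satisfied
```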
  • FIG. 12 shows an eleventh flowchart of a video editing method according to an embodiment of the present application. Specifically, the video editing method further includes the following steps:
  • Step 1202 performing image recognition on a preset number of sample frames, and determining the position of the person's limbs corresponding to the preset number of sample frames;
  • Step 1204 in a preset number of sample frames, determine the position change degree between the positions of the human limbs corresponding to any two sample frames;
  • Step 1206 when the position change range is less than a preset change degree threshold, determine that the verification result satisfies the second condition.
  • image recognition may also be performed on a continuous preset number of sample frames, so as to determine the position of a person's limb corresponding to each sample frame.
  • the position of the person's body can indicate the position of the person being photographed and the pose of the person.
  • Generally speaking, within the (short) period covered by the preset number of sample frames, the subject's posture and position will not change abruptly. For example, if the character in one frame is standing naturally with arms hanging down and in the next frame has suddenly switched to a handstand, this situation obviously does not accord with common sense. Therefore, by comparing the limb positions of any two of the preset number of sample frames and determining the degree of change between them, if the degree of change is less than the preset change threshold, the person's position and posture have not mutated, the verification result can be judged to satisfy the second condition, and the current character action category is considered reasonable.
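  • A minimal sketch of this check is given below, assuming each limb position is represented as a list of (x, y) keypoints in normalised image coordinates; the threshold value is an assumption.

```python
# Sketch of the limb-position verification in steps 1202-1206: within a window
# of consecutive sample frames, the largest change in limb position between any
# two frames must stay below a threshold.

from itertools import combinations
import math

CHANGE_THRESHOLD = 0.2  # assumed, in normalised image coordinates

def position_change(pose_a, pose_b):
    """Mean displacement between two sets of corresponding keypoints."""
    dists = [math.dist(p, q) for p, q in zip(pose_a, pose_b)]
    return sum(dists) / len(dists)

def satisfies_second_condition(poses):
    """poses: one keypoint list per sample frame in the window."""
    return all(position_change(a, b) < CHANGE_THRESHOLD
               for a, b in combinations(poses, 2))

if __name__ == "__main__":
    standing  = [(0.50, 0.2), (0.50, 0.5), (0.45, 0.8), (0.55, 0.8)]
    shifted   = [(0.52, 0.2), (0.52, 0.5), (0.47, 0.8), (0.57, 0.8)]
    handstand = [(0.50, 0.8), (0.50, 0.5), (0.45, 0.2), (0.55, 0.2)]
    print(satisfies_second_condition([standing, shifted]))    # True: small change
    print(satisfies_second_condition([standing, handstand]))  # False: abrupt change
```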
  • When the matching result does not satisfy the first condition, the character age category and/or the character action category corresponding to the sample frame is re-acquired.
  • If the matching result does not satisfy the first condition, it means that either the obtained character age category or the obtained character action category does not conform to the actual situation. Therefore, the character age category corresponding to the current sample frame can be re-acquired, and the character action category corresponding to the current sample frame can be re-acquired simultaneously or separately, thereby avoiding inaccurate editing results caused by misrecognition and effectively improving the user experience of the video editing device.
  • When the matching result does not satisfy the first condition and the verification result does not satisfy the second condition, the step of performing image recognition on the sample frame is performed again.
  • If the matching result does not satisfy the first condition and, at the same time, the verification result does not satisfy the second condition, it means that there is a relatively large deviation in the image recognition result for the sample frame, and the recognition results for the character's age, expression and action may all be unreliable. Therefore, re-executing the step of performing image recognition on the sample frame can effectively improve the accuracy of image recognition, thereby improving the accuracy of automatic editing.
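  • A rough sketch of this fallback logic is shown below; recognise(), matches_first_condition() and continuity_ok() stand in for the steps sketched earlier, and the retry limit is an assumption rather than part of the embodiments.

```python
# Sketch of the fallback logic: when the age/action match fails, the categories
# are re-acquired; when the continuity check also fails, recognition is re-run
# on the sample frame from the start. All callables here are placeholders.

def label_sample_frame(frame, recognise, matches_first_condition, continuity_ok,
                       max_retries: int = 2):
    labels = recognise(frame)
    for _ in range(max_retries):
        if matches_first_condition(labels):
            return labels                          # add as the frame label
        if not continuity_ok(labels):
            labels = recognise(frame)              # redo image recognition entirely
        else:
            labels = recognise(frame, redo="age_and_action")  # re-acquire categories only
    return None  # give up labelling this frame after repeated failures
```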
  • FIG. 13 shows a twelfth flowchart of the video editing method according to an embodiment of the present application, illustrating the complete flow of the method. Specifically, the editing method further includes the following steps:
  • Step 1302 acquiring video material
  • Step 1304 extract frames from the video material at 4 fps to obtain a picture sequence (a frame-extraction sketch follows this list)
  • Step 1306A obtain the face sequence from the picture sequence through the detection module
  • Step 1306B obtain the human body sequence from the picture sequence through the detection module
  • Step 1308A through CNN classification, obtain the age label and expression label of the character
  • Step 1308B through CNN classification, obtain the posture of the current character
  • Step 1310 perform cross-validation according to the continuity of human motion
  • Step 1312 perform cross-validation according to the correlation between human actions and age
  • Step 1314 incorporate character expression, age, posture information and time stamp into the video
  • Step 1316 perform editing according to user settings.
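  • As referenced at step 1304, a frame-extraction sketch using OpenCV is given below; the 4 fps target follows the flow above, while the function name and the fallback source frame rate are assumptions.

```python
# Sketch of steps 1302-1304: extract frames from the video material at roughly
# 4 fps and keep each frame's timestamp so that tags can later be merged back
# into the video.

import cv2

def extract_frames(path: str, target_fps: float = 4.0):
    cap = cv2.VideoCapture(path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if fps is unavailable
    step = max(1, round(src_fps / target_fps))   # keep every `step`-th frame
    samples, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            timestamp = index / src_fps           # seconds on the video timeline
            samples.append((timestamp, frame))
        index += 1
    cap.release()
    return samples
```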
  • Step1 The user selects all the material that they want to include in the edit;
  • Step2 The algorithm system extracts frames from the video selected by the user to obtain a picture sequence; each picture obtained by frame extraction is given a timestamp, which makes it convenient to re-integrate the character tags into the video in post-processing;
  • Step3 Input the obtained image sequence into the algorithm module of detection, and you can get a face sequence and a human body sequence;
  • Step4 The upstream output face sequence will go through an expression and age recognition module, and output the character's expression and age label at the same time;
  • Step5 The human body sequence output from the upstream will go through a human body gesture recognition module to output the action of the current character;
  • Step6 According to the principle of continuity of human movements and the correlation between age and human movements, cross-validate the output results to improve the accuracy of algorithm recognition;
  • Step7 Re-integrate the information of the character's expression, age and human posture into the video
  • Step8 According to the user's preference or the selected template, edit according to the tag obtained in Step7.
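  • The sketch below illustrates the tag-driven assembly of Step7 and Step8: frames whose tags contain the user-selected target category are grouped into time ranges and ordered by timestamp. The tag structure, the fixed segment length and the returned time ranges are simplifying assumptions; an actual implementation would cut and concatenate the source video (for example with ffmpeg) rather than return ranges.

```python
# Sketch of selecting and ordering clips whose tags contain the target category.

def select_clips(tagged_samples, target_category, segment_seconds=0.25):
    """tagged_samples: list of (timestamp, set_of_tags), possibly unordered."""
    selected = [(ts, ts + segment_seconds)
                for ts, tags in tagged_samples if target_category in tags]
    selected.sort()                      # assemble in timeline order
    merged = []
    for start, end in selected:          # merge adjacent/overlapping ranges
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

if __name__ == "__main__":
    samples = [(0.00, {"infant", "smile"}), (0.25, {"infant", "laugh"}),
               (0.50, {"youth", "standing"}), (0.75, {"infant", "smile"})]
    print(select_clips(samples, "infant"))  # [(0.0, 0.5), (0.75, 1.0)]
```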
  • a movable platform 1400 is provided, and FIG. 14 shows a structural block diagram of a movable platform according to an embodiment of the present application.
  • the movable platform 1400 specifically includes:
  • a first image sensor 1402 used for capturing video material
  • the movable platform 1400 is configured to implement each process of the above-mentioned video editing method embodiment when executing the program or instruction, and can achieve the same technical effect, and in order to avoid repetition, it is not repeated here.
  • a smart pan-tilt 1500 is provided, and FIG. 15 shows a structural block diagram of the smart pan-tilt according to an embodiment of the present application.
  • the smart pan-tilt 1500 specifically includes:
  • the second image sensor 1502 is used for capturing video material
  • the intelligent pan-tilt 1500 is configured to implement each process of the above-mentioned video editing method embodiments when executing programs or instructions, and can achieve the same technical effect, which is not repeated here to avoid repetition.
  • A computer-readable storage medium is provided; a program or instruction is stored in the readable storage medium, and when the program or instruction is executed by a processor, the steps of the video editing method described in FIG. 2 to FIG. 13 are realized.
  • the readable storage medium is configured to implement each process of the above video editing method embodiments when executing the program or instruction, and can achieve the same technical effect, which is not repeated here to avoid repetition.
  • a hardware device 1600 is provided, and FIG. 16 shows a structural block diagram of a hardware device according to an embodiment of the present application.
  • the hardware device 1600 includes:
  • the network module 1604 is used to access the network and perform data instruction interaction with the terminal device and/or the server;
  • an output module 1606, configured to output one or more of sound information, image information, vibration information, and network signals
  • the hardware device is configured to implement each process of the above-mentioned video editing method embodiment when executing the program or instruction, and can achieve the same technical effect, which is not repeated here to avoid repetition.
  • the hardware devices in the embodiments of the present application include the aforementioned mobile electronic devices and non-mobile electronic devices.
  • Hardware devices include: mobile phones, tablets, drones, remote controls, smart pan-tilts, laptops, PDAs, in-vehicle electronics, wearables, personal digital assistants, personal computers, and televisions.
  • the network module 1604 provides the user with wireless broadband Internet access, such as helping the user to send and receive emails, browse the web, access streaming media, and the like.
  • the user input module 1602 can be used to receive input numerical or character information, and generate key signal input related to user settings and function control of the electronic device.
  • the output module can be an audio output module, such as a speaker, etc., or a video output module, such as a liquid crystal display, an organic light-emitting diode, etc., or a vibration motor or a signal transmitting device, such as Bluetooth, radio frequency, etc.
  • the power module 1608 can be a power source (such as a battery) that supplies power to various components, and the power source can be logically connected to the processor 1612 through a power management system, so as to manage charging, discharging, and power consumption management functions through the power management system.
  • The term "connection" should be understood broadly: a connection can be a fixed connection, a detachable connection, or an integral connection; it can be a direct connection, or an indirect connection through an intermediate medium.
  • The terms "one embodiment," "some embodiments," "specific embodiments," and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of this specification.
  • schematic representations of the above terms do not necessarily refer to the same embodiment or instance.
  • the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

本申请实施例提供了一种视频剪辑装置、方法、可移动平台、云台和硬件设备,其中,视频剪辑装置包括存储器、处理器及存储在存储器上并可被处理器执行的程序或指令,处理器执行程序或指令时能够实现:在视频素材中获取样本帧,对样本帧进行图像识别,获取样本帧对应的人物年龄类别和人物动作类别;获取人物年龄类别和人物动作类别的匹配结果,当匹配结果满足预设的第一条件时,将人物年龄类别和人物动作类别添加为样本帧的帧标签;根据帧标签对视频素材进行剪辑。本申请实施例可自动对样本帧的帧标签进行确定,根据用户选择的模版自动完成剪辑,无需用户掌握专业技能,也无需付出额外的学习成本,实现了易于一般用户使用的视频剪辑。

Description

视频剪辑装置、方法、可移动平台、云台和硬件设备
版权申明
本专利文件披露的内容包含受版权保护的材料。该版权为版权所有人所有。版权所有人不反对任何人复制专利与商标局的官方记录和档案中所存在的该专利文件或者该专利披露。
技术领域
本申请涉及视频剪辑技术领域,具体而言,涉及一种视频剪辑装置、视频剪辑方法、一种可移动平台、一种智能云台、一种可读存储介质和一种硬件设备。
背景技术
移动端进行视频拍摄后,由于视频拍摄中会出现多种场景、人物的变化,内容较为复杂,因此为了得到一个符合需要的视频,用户往往对拍摄好的视频有剪辑需求。
相关技术中的视频剪辑软件专业性强,需要用户手动进行剪辑工作,且学习成本大,不易于普通用户上手使用。
因此,如何提供一种易于普通用户上手的视频剪辑方案,是目前亟待解决的技术问题。
公开内容
本申请实施例旨在至少解决现有技术或相关技术中存在的技术问题之一。
为此,本申请实施例的第一方面提出一种视频剪辑装置。
本申请实施例的第二方面提出一种视频剪辑方法。
本申请实施例的第三方面提出一种可移动平台。
本申请实施例的第四方面提出一种智能云台。
本申请实施例的第五方面提出一种可读存储介质。
本申请实施例的第六方面提出一种硬件设备。
有鉴于此,第一方面,本申请实施例提供了一种视频剪辑装置,该视频剪辑装置包括存储器、处理器及存储在存储器上并可被处理器执行的程序或指令,处理器执行程序或指令时能够实现:在视频素材中获取样本帧,对样本帧进行图像识别,获取样本帧对应的人物年龄类别和人物动作类别;获取人物年龄类别和人物动作类别的匹配结果,当匹配结果满足预设的第一条件时,将人物年龄类别和人物动作类别添加为样本帧的帧标签;根据帧标签对视频素材进行剪辑。
第二方面,本申请实施例提供了一种视频剪辑方法,包括:
在视频素材中获取样本帧,对样本帧进行图像识别,获取样本帧对应的人物年龄类别和人物动作类别;
获取人物年龄类别和人物动作类别的匹配结果,当匹配结果满足预设的第一条件时,将人物年龄类别和人物动作类别添加为样本帧的帧标签;
根据帧标签对视频素材进行剪辑。
第三方面,本申请实施例提供了一种可移动平台,包括:
第一图像传感器,用于采集视频素材;存储器、处理器及存储在存储器上并可被处理器执行的程序或指令,处理器执行程序或指令时实现如第二方面中的视频剪辑方法。
第四方面,本申请实施例提供了一种智能云台,包括:
第二图像传感器,用于采集视频素材;存储器、处理器及存储在存储器上并可被处理器执行的程序或指令,处理器执行程序或指令时实现如第二方面中的视频剪辑方法。
第五方面,本申请实施例提供了一种可读存储介质,可读存储介质中存储有程序或指令,程序或指令被处理器执行时实现如第二方面中的视频剪辑方法的步骤。
第六方面,本申请实施例提供了一种硬件设备,硬件设备包括:
用户输入模块,用于接收用户输入;网络模块,用于访问网络,以及与终端设备和/或服务器之间进行数据指令交互;输出模块,用于输出声音信息、图像信息、振动信息、网络信号中的一种或多种;电源模块,用于 为硬件设备供电;存储器、处理器及存储在存储器上并可被处理器执行的程序或指令,处理器执行程序或指令时实现如第二方面的视频剪辑方法。
在本申请实施中,视频剪辑装置可以在用户拍摄视频素材后,自动在视频素材中抽取样本帧,并通过图像识别技术获取样本帧中的人物年龄类别和人物动作类别。
具体地,人物年龄类别可以包括如幼儿、童年、青年和老年等,而相应的,动作可以包括走动、站立、弯腰等一般性动作,还包括跑、跳、翻滚等运动动作。未保证对人物年龄类别和人物动作类别的识别准确度,可对人物年龄类别和人物动作类别的匹配结果进行验证,从而减少误识别的情况。
举例来说,识别到当前人物年龄类别为“老年”,同时识别到同一人物的动作类别是“后空翻”,但是按照一般性的常识来说,老年人不会,也无法做出“后空翻”这种高难度且剧烈的运动类动作,此时判定人物年龄类别和人物动作类别不匹配,重新对人物动作类别或人物年龄类别进行识别。
而如果人物年龄类别和人物动作类别匹配,如“老人”和“走路”、“青年”和“跳跃”、“幼儿”和“翻滚”等,则将识别到的人物年龄类别和人物动作类别添加为当前样本帧的帧标签。
按照上述方法,将全部的样本帧均打上对应的帧标签后,则可以依据帧标签的内容,对视频素材进行自动剪辑。
具体地,当用户想得到“孩子”的视频时,则可将视频素材中,包含帧标签中人物年龄类别为“幼儿”的帧片段进行抽取和再合成,进而得到仅包含“幼儿”出现的剪辑视频,从而满足了用户的剪辑需求。且该过程无需用户手动操作专业软件,系统可自动对样本帧的帧标签进行确定,并按照用户需求,或根据用户选择的模版自动完成剪辑,无需用户掌握专业技能,也无需付出额外的学习成本,实现了易于一般用户使用的视频剪辑。
附图说明
图1示出了根据本申请实施例的视频剪辑装置的结构框图;
图2示出了根据本申请实施例的视频剪辑方法的流程图之一;
图3示出了根据本申请实施例的视频剪辑方法的流程图之二;
图4示出了根据本申请实施例的视频剪辑方法的流程图之三;
图5示出了根据本申请实施例的视频剪辑方法的流程图之四;
图6示出了根据本申请实施例的视频剪辑方法的流程图之五;
图7示出了根据本申请实施例的视频剪辑方法的流程图之六;
图8示出了根据本申请实施例的视频剪辑方法的流程图之七;
图9示出了根据本申请实施例的视频剪辑方法的流程图之八;
图10示出了根据本申请实施例的视频剪辑方法的流程图之九;
图11示出了根据本申请实施例的视频剪辑方法的流程图之十;
图12示出了根据本申请实施例的视频剪辑方法的流程图之十一;
图13示出了根据本申请实施例的视频剪辑方法的流程图之十二;
图14示出了根据本申请实施例的可移动平台的结构框图;
图15示出了根据本申请实施例的智能云台的结构框图;
图16示出了根据本申请实施例的硬件设备的结构框图。
具体实施方式
为了能够更清楚地理解本说明书的上述目的、特征和优点,下面结合附图和具体实施方式对本说明书进行进一步的详细描述。需要说明的是,在不冲突的情况下,本说明书的实施例及实施例中的特征可以相互组合。
在下面的描述中阐述了很多具体细节以便于充分理解本说明书,但是,本说明书还可以采用其他不同于在此描述的其他方式来实施,因此,本说明书的保护范围并不受下面公开的具体实施例的限制。
下面参照图1至图16描述根据本说明书一些实施例的视频剪辑装置、视频剪辑方法、可移动平台、智能云台、可读存储介质和硬件设备。
在本申请的一些实施例中,图1示出了根据本申请实施例的视频剪辑装置的结构框图,具体地,视频剪辑装置100包括存储器102、处理器104及存储在存储器102上并可被处理器104执行的程序或指令,处理器104执行程序或指令时实现:
在视频素材中获取样本帧,对样本帧进行图像识别,获取样本帧对应的人物年龄类别和人物动作类别;
获取人物年龄类别和人物动作类别的匹配结果,当匹配结果满足预设的第一条件时,将人物年龄类别和人物动作类别添加为样本帧的帧标签;
根据帧标签对视频素材进行剪辑。
在本申请实施例中,视频剪辑装置可以在用户拍摄视频素材后,自动在视频素材中抽取样本帧,并通过图像识别技术获取样本帧中的人物年龄类别和人物动作类别。
具体地,人物年龄类别可以包括如幼儿、童年、青年和老年等,而相应的,动作可以包括走动、站立、弯腰等一般性动作,还包括跑、跳、翻滚等运动动作。未保证对人物年龄类别和人物动作类别的识别准确度,可对人物年龄类别和人物动作类别的匹配结果进行验证,从而减少误识别的情况。具体地,当验证结果满足预设的第一条件时,则认为人物动作类别和人物年龄类别相匹配。
如果不满足第一条件,则认为其两者不匹配。举例来说,识别到当前人物年龄类别为“老年”,同时识别到同一人物的动作类别是“后空翻”,显然一般来说,老年人不会或无法做到“后空翻”的动作,此时判定人物年龄类别和人物动作类别不匹配,重新对人物动作类别或人物年龄类别进行识别。
而如果人物年龄类别和人物动作类别匹配,如“老人”和“走路”、“青年”和“跳跃”、“幼儿”和“翻滚”等,则可以将识别到的人物年龄类别和人物动作类别添加为当前样本帧的帧标签。
按照上述方法,将全部的样本帧均打上对应的帧标签后,则可以依据帧标签的内容,对视频素材进行自动剪辑。
具体地,当用户想得到“孩子”的视频时,则可将视频素材中,包含帧标签中人物年龄类别为“幼儿”的帧片段进行抽取和再合成,进而得到仅包含“幼儿”出现的剪辑视频,从而满足了用户的剪辑需求。且该过程无需用户手动操作专业软件,系统可自动对样本帧的帧标签进行确定,并按照用户需求,或根据用户选择的模版自动完成剪辑,无需用户掌握专业 技能,也无需付出额外的学习成本,实现了易于一般用户使用的视频剪辑。
在本申请的一些实施例中,图1所示的视频剪辑装置100中,处理器104执行程序或指令时实现在视频素材中获取样本帧的过程包括:
响应于选取操作,确定至少一个视频素材;
对视频素材进行抽帧处理,得到对应的样本帧;以及
根据视频素材的时间轴,为样本帧添加时间戳。
在本申请实施例中,首先根据用户的选取操作,确定一个或多个视频素材。其中,该视频素材可以是用户通过移动拍摄设备或其他设备拍摄得到的视频,也可以是本体存储空间内保存的视频,还可以是存储于云端的视频,本申请实施例对视频素材的来源不做限定。
在根据用户的选取操作确定待剪辑的视频素材后,按照预先设定的抽帧规则,对视频素材进行抽帧处理,并得到按照一定间隔分布的多个样本帧的序列。
其中,可根据视频帧率、当前剪辑设备的设备性能以及用户设定等方式,对抽帧规则进行设置。如原始的视频素材的帧率为24帧/秒时,抽帧规则可以为4帧/秒,即每间隔5帧后抽取1帧。如原始的视频素材的帧率为60帧/秒时,抽帧规则还可以为10帧/秒或12帧每秒。
上述抽帧规则可以在后台程序,或前台接收用户的设置指令来自由设置,本申请实施例对抽帧规则不做限定。应该理解的时,抽帧的帧率一般不大于视频原始帧率。
在得到对应的样本帧之后,可根据视频素材的时间轴,按照每个样本帧在视频素材中出现的时间点,在每个样本帧上添加时间戳,一方面保证样本帧序列的连续性,另一方面在剪辑时为合成视频的顺序提供依据。
在本申请的一些实施例中,图1所示的视频剪辑装置100中,处理器104执行程序或指令时实现对样本帧进行图像识别的过程还包括:
对样本帧进行图像识别,获取样本帧对应的人物表情类别;以及
将人物表情类别添加至帧标签。
在本申请实施例中,通过图像识别技术获取样本帧中的人物年龄类别和人物动作类别的同时,还可以进一步获取人物表情类别,并将人物表情 类别添加到样本帧的帧标签中,从而丰富自动剪辑的方式。
具体地,人物表情类别包括如:微笑、大笑、鬼脸等,还可以包括愤怒、哭泣等。
当为样本帧的帧标签中添加人物表情类别后,视频剪辑装置允许用户通过“表情”来对视频素材进行自动剪辑。举例来说,用户希望得到孩童欢笑时的视频,则可以将视频素材中,人物年龄类别为“幼儿”,且同时人物表情来别为“微笑”或“大笑”的帧片段进行抽取和再合成,进而得到孩童欢笑合集的剪辑视频,从而满足了用户的剪辑需求。
在本申请的一些实施例中,图1所示的视频剪辑装置100中,处理器104执行程序或指令时实现根据帧标签对视频素材进行剪辑的过程包括:
通过样本帧的帧标签,对视频素材中样本帧对应的片段进行标记,得到对应的片段标签;
根据片段标签和预设的目标类别对视频片段进行剪辑;
其中,目标类别包括目标表情类别、目标年龄类别和目标动作类别中的至少一种。
在本申请实施例中,为减小运算压力,同时保证剪辑得到的视频更加完整和连贯,首先通过样本帧在视频素材中确定对应的(视频)片段,并根据该样本帧的帧标签,对上述片段进行标记,形成为该片段对应的片段标签。
在对视频素材进行剪辑时,根据片段标签,和预设的目标类别对视频片段进行剪辑。与帧标签和片段标签相对应的,目标类别包括目标表情类别、目标年龄类别和/或目标动作类别。
举例来说,某样本帧的帧标签中,包括的人物年龄类别为“青年”、人物动作类别为“站立”、人物表情类别为“微笑”,则对应的,该样本帧对应的片段,其片段标签也对应为“青年”、“站立”和“微笑”。
通过片段的片段标签和目标类别做比对,从而对包含目标类别的片段进行剪辑,能够提高剪辑视频的完成度和连贯性,同时降低运算压力。
在本申请的一些实施例中,图1所示的视频剪辑装置100中,处理器104执行程序或指令时实现根据样本帧的帧标签和目标类别对视频素材进 行剪辑的过程包括:
当片段的片段标签中包括目标类别时,将片段确定为待剪辑片段;
按照待剪辑片段对应的样本帧的时间戳的时间顺序,对待剪辑片段进行组合,以完成剪辑。
在本申请实施例中,首先根据用户的剪辑操作,或根据当前选定的剪辑模版,确定对应的目标类别。依次按照时间轴的顺序选取视频素材中的多个片段,并依次确定每个片段的片段标签中是否包含有上述目标类别。
举例来说,如果一个片段的片段标签包括“微笑”、“童年”和“跳跃”,而目标类别为“跳跃”,则确定该片段的片段标签包括上述目标类别。
将所有包括目标类别的片段均确定为待剪辑片段,其中每个待剪辑片段均包含至少一个样本帧,且该样本帧上标记有时间戳,因此根据样本帧的时间戳,可以确定全部待剪辑片段的先后顺序,并按照先后顺序对待剪辑片段进行组合,得到最终的剪辑视频以完成剪辑工作。
在本申请的一些实施例中,图1所示的视频剪辑装置100中,处理器104执行程序或指令时实现通过样本帧的帧标签,对视频素材中样本帧对应的片段进行标记的过程包括:
当样本帧中包括第一样本帧和第二样本帧,满足第一样本帧和第二样本帧对应的人物表情类别、人物年龄类别和人物动作类别中的任一类别的类别相同时,将第一样本帧和第二样本帧之间的全部帧的集合确定为片段;
将任一类别对应的帧标签标记为片段标签。
在本申请实施例中,在确定好每个样本帧的帧标签后,如果有相邻的两个样本帧,记为第一样本帧和第二样本帧,且第一样本帧和第二样本帧的帧标签中,人物表情类别、人物年龄类别和人物动作类别中的至少一个类别相同,则将该第一样本帧和第二样本帧,以及上述两个样本帧之间的非样本帧的集合,确定为一个片段,并将相同的“类别”标记为该片段的片段标签。
举例来说,第一样本帧的帧标签包括“青年”、“微笑”和“站立”,第二样本帧的帧标签包括“幼儿”、“微笑”和“翻滚”,其中,第一样 本帧和第二样本帧中相同的类别为“微笑”,则将第一样本帧、第二样本帧和两者之间的多个非样本帧确定为一个片段,且该片段的片段标签为“微笑”。
在本申请的一些实施例中,图1所示的视频剪辑装置100中,处理器104执行程序或指令时实现通过样本帧的帧标签,对视频素材中样本帧对应的片段进行标记的过程包括:
当样本帧中包括第三样本帧和第四样本帧,满足第三样本帧和第四样本帧对应的人物表情类别、人物年龄类别和人物动作类别中的任一类别的类别相同,且满足第三样本帧和第四样本帧之间间隔的帧数小于预设的帧数阈值时,将第三样本帧和第四样本帧之间的全部帧的集合确定为片段;将任一类别对应的帧标签标记为片段标签。
在本申请实施例中,在确定好每个样本帧的帧标签后,如果存在两个样本帧,记为第三样本帧和第四样本帧,在第三样本帧和第四样本帧的帧标签中,人物表情类别、人物年龄类别和人物动作类别中的至少一个类别相同,且同时满足第三样本帧和第四样本帧之间间隔的帧数(可以包括非样本帧,也可以同时包括样本帧和非样本帧)小于预设的帧数阈值,则将该第三样本帧和第四样本帧,以及上述两个样本帧之间的非样本帧的集合,确定为一个片段,并将相同的“类别”标记为该片段的片段标签。
其中,帧数阈值可基于大数据分析进行设定,也可以根据用户使用习惯进行自由调整,如设置为25、50、60等。本申请实施例对帧数阈值的具体值不做限定。
举例来说,第三样本帧的帧标签包括“青年”和“站立”,第四样本帧的帧标签包括“幼儿”、“微笑”和“站立”,其中,第三样本帧和第四样本帧中相同的类别为“站立”,且第三样本帧和第四样本帧之间间隔的帧数为14,小于当前预设的帧数阈值25,则将第三样本帧、第四样本帧和两者之间的全部帧确定为一个片段,且该片段的片段标签为“站立”。
在本申请的一些实施例中,图1所示的视频剪辑装置100中,处理器104执行程序或指令时实现对样本帧进行图像识别的过程还包括:
对样本帧进行图像识别,获取样本帧对应的人物和人物互动的物体; 以及
处理器执行程序或指令时实现通过样本帧的帧标签,对视频素材中样本帧对应的片段进行标记的过程包括:
当样本帧中包括第五样本帧和第六样本帧,满足第五样本帧与第六样本帧中对应的人物为相同人物,且物体为相同物体时,将第五样本帧和第六样本帧之间的全部帧的集合确定为片段。
在本申请实施例中,在对样本帧进行图像识别时,可以进一步获取与人物相互动的物体。如:人物坐的椅子,人物触及的球拍、球、跳绳等运动器材,又或是人物穿戴的服饰。
如果存在两个样本帧,记为第五样本帧和第六样本帧,其中包括的人物为相同的人物,且与该人物互动的物体为相同的物体,则将第五样本帧和第六样本帧之间的全部帧的集合,确定为一个片段。
对于这种情况,所确定的片段对应的片段标签可以包括第五样本帧的帧标签的全部内容,同时包括第六样本帧的帧标签的全部内容。
具体举例来说,可通过人脸、人体识别的方式,来确定两个样本帧中的人物是否为统一人物。经人脸识别,确定第五样本帧和第六样本帧中包括人物均是“小明”,且在第五样本帧中“小明”互动的物体为篮球,第六样本帧中“小明”互动的物体也是篮球,则将第五样本帧、第六样本帧和两者之间的全部帧确定为一个片段。并将第五样本帧的帧标签内容、第六样本帧的帧标签内容一并设置为该片段的片段标签。
在本申请的一些实施例中,上述匹配结果包括匹配度,图1所示的视频剪辑装置100中,处理器104执行程序或指令时还实现如下步骤:
根据人物年龄类别和人物动作类别,在预设的匹配数据库中获取对应的匹配度;
当匹配度大于预设的匹配度阈值时,确定匹配结果满足第一条件。
在本申请实施例中,在判断人物年龄类别和人物动作类别的匹配度时,可以通过预设的匹配数据库,来获取当前识别到的人物年龄类别和人物动作类别的匹配度,并根据该匹配度与匹配度阈值的比较结果,确定当前匹配结果是否满足上述第一条件。
具体地,在上述匹配数据库中,预存储有每种人物年龄类别和每种人物动类别的对照数据。举例来说,对于运动类动作,“青年”的匹配度最高,“童年”次之,“幼年”再次,“老年”的匹配度最低。根据动作类别的不同,每个年龄段也可以有不同的匹配度。
上述匹配度的设置可以由设计人员进行定义,也可以根据大数据分析进行设置。本申请实施例对匹配数据库的具体形式和内容不做限定。
进一步地,如果确定当前人物年龄类别,与当前人物动作类别的匹配度大于上述匹配度阈值,则可以确定匹配结果满足第一条件。其中,匹配度阈值可根据匹配数据库进行设定,本申请实施例对匹配度阈值的具体数值范围不做限定。
在本申请的一些实施例中,匹配结果包括人物年龄类别和人物动作类别的对应关系,人物年龄类别包括幼年类别、童年类别、青年类别、中年类别和老年类别;以及第一条件包括:人物年龄类别为童年类别、青年类别或中年类别;或人物年龄类别为老年类别,且人物动作类别为第一动作类别;或人物年龄类别为幼年类别,且人物动作类别为第二动作类别。
在本申请实施例中,如果人物年龄类别为童年类别或青年类别的话,则无论人物动作类别具体为什么类别,则均认定符合第一条件。具体地,由于处于童年或青年阶段时,人的运动能力最强,因此在当前年龄段做出什么样的动作都是合理的,因此当人物年龄类别为童年类别或青年类别时,即可认定匹配结果符合第一条件。
对于人物年龄类别是老年类别的情况,由于年纪较高,老年人一般不会也不适于做出一些夸张的、幅度较大的动作。因此将可能被老年人做出的动作设置为第一动作类别,当人物年龄类别是老年类别,且同时人物动作类别为第一动作类别时,认定匹配结果符合第一条件,否则认定匹配结果不符合第一条件。
对于人物类别为幼年类别的情况,幼儿具有良好的身体柔软性,但由于发育不完全且未经训练,一般无法做出一些技巧性的运动动作,如跳绳、倒立等。因此将一些常规动作和非技巧性动作划分为第二动作类别,如果人物年龄类别是幼年类别,则进一步判断当前动作是否属于第二动作类别, 如果判断结果为时,可认定匹配结果符合第一条件。
通过判断人物年龄类别与人物动作类别是否匹配,从而判定识别结果是否准确,能够有效地提高对人物年龄类别和人物动作类别的识别准确度,有效减少误判。
在本申请的一些实施例中,动作类别包括非运动类动作和运动类动作,其中第一动作类别包括:非运动类动作、运动速度小于预设的速度阈值的运动类动作和运动幅度小于预设的幅度阈值的运动类动作;第二动作类别包括:非运动类动作、运动速度小于速度阈值的运动类动作。
在本申请实施例中,可以对运动类动作的运动幅度和运动速度进行识别,从而判断是第一动作类别或第二类动作类别,进而减少计算量,提高识别速度。
具体地,对于行走、站立等非运动类别,该类动作与全部年龄均匹配,因此第一动作类别和第二动作类别中均包含有非运动类别。
而对于运动类别,则根据运动速度和运动幅值来进行区分。对于老年类别对应的第一动作类别,能够理解的是,老年人的动作一般较为缓慢,且老人关节的柔韧性会降低,因此一般运动幅度会比较小。因此,将运动速度低于预设的速度阈值,同时运动幅度小于预设的幅度阈值的动作,设定为第一动作类别,也就是说,如果运动速度大于上述速度阈值,或运动幅度大于上述幅度阈值被满足时,则判定该动作不属于第一动作类别。
而由于幼儿的身高、四肢长度较小,且肌肉发育不完全,因此幼儿的动作同样较为缓慢。但由于幼儿身体具有极佳的柔韧性,因此运动幅度上,对于幼年类别不做限制。
其中,运动幅度可以包括肢体由一个位置运动到另一个位置所经过的最大距离,还可以包括如举手、弯腰、屈膝等特定动作的幅度,本申请实施例对运动幅度的具体内容不做限定。
运动速度可以包括人体移动的速度,也可以包括肢体摆动的速度,本申请实施例对运动速度的具体内容不做限定。
在本申请的一些实施例中,图1所示的视频剪辑装置100中,处理器104执行程序或指令时实现对样本帧进行图像识别,确定样本帧对应的人 物表情类别、人物年龄类别和人物动作类别的过程包括:
将样本帧输入至图像识别模型中,通过图像识别模型识别样本帧中包括的人脸信息和人体信息;
通过第一神经网络模型对人脸信息进行检测,以得到人物表情类别和人物年龄类别;
通过第二神经网络模型对人体信息进行检测,以得到原始人物动作类别;
根据时间戳,对连续的预设数量的样本帧对应的原始人物动作类别的连续性进行验证,以获得验证结果;
当验证结果满足预设的第二条件时,将原始人物动作类别确定为人物动作类别。
在本申请实施例中,可以通过机器学习得到的神经网络模型,来对样本帧中的人物表情类别、人物年龄类别和人物动作类别进行识别。
具体地,样本帧为图像识别模型的输入,在将样本帧输入到图像识别模型后,图像识别模型首先对人脸进行识别,截取到到人脸信息,并对人体进行识别,截取到人体信息。
在这个过程中,通过图像识别模型,将人物图像从完整的帧图像中“截取”出来,并将其细化为人脸信息(人脸图像)和人体信息(肢体图像)。
进一步地,将人脸信息输入值第一神经网络模型中,进行面部检测识别,基于人工智能分析,得到对应的人物表情类别,以及对应的人物年龄类别。
同时,将人体信息输入值第二神经网络模型,得到对应的原始人物动作类别。
进一步地,按照时间戳,对连续的预设数量的样本帧中,通过上述第一神经网络模型和第二神经网络模型识别得到的人物年龄类别和人物动作类别的连续性进行验证,并基于验证结果满足第二条件时,确定动作识别得到的原始人物动作类别准确。
具体地,举例来说,对连续的样本帧1、样本帧2和样本帧3进行连续性验证。一般来说,连续3个样本帧中的人物动作应该是连贯的,非跳 跃性的,如上述连续3个样本帧的人物动作类别相同,或相连续。
如果样本帧1的原始人物动作类别为屈膝,样本帧2的原始人物动作类别也为站立,而样本帧3的原始人物动作类别为跳跃,这3个样本帧可能是被拍摄人起跳的过程,上述过程是连续的,因此判断满足第二条件。
而如果样本帧1的原始人物动作类别为躺倒,样本帧2的原始人物动作类别为跳跃,样本帧3的原始人物动作类别也为躺倒,一般来讲,连续的3个样本帧所对应的时间很短,被拍摄的人物一般无法在很短的事件内由躺倒的姿势直接跳起,又瞬间返回躺倒的姿势,因此认为上述过程不连续,不满足第二条件。
通过验证动作类别的连续性,能够进一步保证识别的人物动作类别的准确性,提高视频剪辑的准确度。
在本申请的一些实施例中,验证结果包括连续性得分;以及当连续性得分大于或等于预设的连续性阈值时,确定验证结果满足第二条件。
在本申请实施例中,可设置连续性得分,对动作是否连续进行判断。具体地,可针对相连接的不同动作类别设置连续性得分值,在确定预设数量的样本帧是的连续性得分时,可通过依次将每两个相邻的两个样本帧之间的连续性分值的综合,确定为连续性得分。
其中,连续性得分可按照预设的表格查表确定。比如说,“躺倒”与“跳跃”之间的连续性分值较低,如设置为1分。而“屈膝”和“站立”之间的连续性分值,以及“站立”和“跳跃”之间的连续性分值就较高,如均设置为10分。
而连续性阈值则可以根据上述连续性分值的设定,以及在连续性验证中选取的样本帧数量而具体确定。比如按照上述分值设置方式,采样3帧的情况,可以将连续性阈值设置为8至12。
按照上述逻辑,对于样本帧1的原始人物动作类别为屈膝,样本帧2的原始人物动作类别也为站立的第一种情况,而样本帧3的原始人物动作类别为跳跃的情况,其连续性得分为10+10=20分。
而对于样本帧1的原始人物动作类别为躺倒,样本帧2的原始人物动作类别为跳跃,样本帧3的原始人物动作类别也为躺倒的第二种情况,其 连续性得分为1+1=2分。
由此,第一种情况的连续性分值大于连续性阈值,第一种情况满足第二条件,第二种情况不满足第二条件。
在本申请的一些实施例中,图1所示的视频剪辑装置100中,处理器104执行程序或指令时还实现以下步骤:
对预设数量的样本帧进行图像识别,确定预设数量的样本帧对应的人物肢体位置;以及
验证结果包括:预设数量的样本帧中,任两个样本帧对应的人物肢体位置之间的位置变化度;
当位置变化幅度小于预设的变化度阈值时,确定验证结果满足第二条件。
在本申请实施例中,还可以对连续的预设数量的样本帧进行图像识别,从而确定每个样本帧之间对应的人物肢体位置。
具体地,人物肢体位置可以表示被拍摄人物所处的位置,以及人物姿势。一般来说,在预设数量的样本帧的期间(较短时间)内,用户的姿势和位置不会发生“突变”,如上一帧中人物还是双手下垂的自然站立,下一帧立刻突变成倒立状态了,这种情况明显不符合一般常理。
因此,通过比较预设数量样本帧中任两个样本帧的人物肢体位置,并确定该两个样本帧事件人物肢体位置的变化度,如果变化度小于预设的变化度阈值,则说明人物位置和姿势没有发生突变,则可以判断验证结果满足第二条件,当前人物动作类别合理。
在本申请的一些实施例中,当匹配结果不满足第一条件时,重新获取样本帧对应的人物年龄类别和/或人物动作类别。
在本申请实施例中,如果匹配结果不满足第一条件,则说明获取到的人物年龄类别,或获取到的人物动作类别中的一个,与实际情况不符合。因此,可以重新获取当前样本帧对应的人物年龄类别,也可以同时或单独重新获取当前样本帧对应的人物动作类别,从而避免因误识别导致的剪辑结果不准确,有效地提高视频剪辑装置的使用体验。
在本申请的一些实施例中,当匹配结果不满足第一条件,且验证结果 不满足第二条件时,重新执行对样本帧进行图像识别的步骤。
在本申请实施例中,如果匹配结果不满足第一条件,且同时也不满足第二条件,则说明对样本帧的图像识别结果出现了比较大的偏差,此时人物年龄、人物表情和人物动作的识别结果可能都不可靠。因此,重新执行对样本帧进行图像识别的步骤,能够有效地提高图像识别的准确率,从而提高自动剪辑的准确率。
在本申请的一些实施例中,图2示出了根据本申请实施例的视频剪辑方法的流程图之一,具体地,视频剪辑方法可以包括以下步骤:
步骤202,在视频素材中获取样本帧,对样本帧进行图像识别,获取样本帧对应的人物年龄类别和人物动作类别;
步骤204,获取人物年龄类别和人物动作类别的匹配结果,当匹配结果满足预设的第一条件时,将人物年龄类别和人物动作类别添加为样本帧的帧标签;
步骤206,根据帧标签对视频素材进行剪辑。
在本申请实施例中,视频剪辑装置可以在用户拍摄视频素材后,自动在视频素材中抽取样本帧,并通过图像识别技术获取样本帧中的人物年龄类别和人物动作类别。
具体地,人物年龄类别可以包括如幼儿、童年、青年和老年等,而相应的,动作可以包括走动、站立、弯腰等一般性动作,还包括跑、跳、翻滚等运动动作。未保证对人物年龄类别和人物动作类别的识别准确度,可对人物年龄类别和人物动作类别的匹配结果进行验证,从而减少误识别的情况。具体地,当验证结果满足预设的第一条件时,则认为人物动作类别和人物年龄类别相匹配。
如果不满足第一条件,则认为其两者不匹配。举例来说,识别到当前人物年龄类别为“老年”,同时识别到同一人物的动作类别是“后空翻”,显然一般来说,老年人不会或无法做到“后空翻”的动作,此时判定人物年龄类别和人物动作类别不匹配,重新对人物动作类别或人物年龄类别进行识别。
而如果人物年龄类别和人物动作类别匹配,如“老人”和“走路”、 “青年”和“跳跃”、“幼儿”和“翻滚”等,则可以将识别到的人物年龄类别和人物动作类别添加为当前样本帧的帧标签。
按照上述方法,将全部的样本帧均打上对应的帧标签后,则可以依据帧标签的内容,对视频素材进行自动剪辑。
具体地,当用户想得到“孩子”的视频时,则可将视频素材中,包含帧标签中人物年龄类别为“幼儿”的帧片段进行抽取和再合成,进而得到仅包含“幼儿”出现的剪辑视频,从而满足了用户的剪辑需求。且该过程无需用户手动操作专业软件,系统可自动对样本帧的帧标签进行确定,并按照用户需求,或根据用户选择的模版自动完成剪辑,无需用户掌握专业技能,也无需付出额外的学习成本,实现了易于一般用户使用的视频剪辑。
在本申请的一些实施例中,图3示出了根据本申请实施例的视频剪辑方法的流程图之二,具体地,在视频素材中获取样本帧的过程包括以下步骤:
步骤302,响应于选取操作,确定至少一个视频素材;
步骤304,对视频素材进行抽帧处理,得到对应的样本帧;
步骤306,根据视频素材的时间轴,为样本帧添加时间戳。
在本申请实施例中,首先根据用户的选取操作,确定一个或多个视频素材。其中,该视频素材可以是用户通过移动拍摄设备或其他设备拍摄得到的视频,也可以是本体存储空间内保存的视频,还可以是存储于云端的视频,本申请实施例对视频素材的来源不做限定。
在根据用户的选取操作确定待剪辑的视频素材后,按照预先设定的抽帧规则,对视频素材进行抽帧处理,并得到按照一定间隔分布的多个样本帧的序列。
其中,可根据视频帧率、当前剪辑设备的设备性能以及用户设定等方式,对抽帧规则进行设置。如原始的视频素材的帧率为24帧/秒时,抽帧规则可以为4帧/秒,即每间隔5帧后抽取1帧。如原始的视频素材的帧率为60帧/秒时,抽帧规则还可以为10帧/秒或12帧每秒。
上述抽帧规则可以在后台程序,或前台接收用户的设置指令来自由设置,本申请实施例对抽帧规则不做限定。应该理解的时,抽帧的帧率一般 不大于视频原始帧率。
在得到对应的样本帧之后,可根据视频素材的时间轴,按照每个样本帧在视频素材中出现的时间点,在每个样本帧桑添加时间戳,一方面保证样本帧序列的连续性,另一方面在剪辑时为合成视频的顺序提供依据。
在本申请的一些实施例中,图4示出了根据本申请实施例的视频剪辑方法的流程图之三,具体地,对样本帧进行图像识别的过程还包括以下步骤:
步骤402,对样本帧进行图像识别,获取样本帧对应的人物表情类别;
步骤404,将人物表情类别添加至帧标签。
在本申请实施例中,通过图像识别技术获取样本帧中的人物年龄类别和人物动作类别的同时,还可以进一步获取人物表情类别,并将人物表情类别添加到样本帧的帧标签中,从而丰富自动剪辑的方式。
具体地,人物表情类别包括如:微笑、大笑、鬼脸等,还可以包括愤怒、哭泣等。
当为样本帧的帧标签中添加人物表情类别后,视频剪辑装置允许用户通过“表情”来对视频素材进行自动剪辑。举例来说,用户希望得到孩童欢笑时的视频,则可以将视频素材中,人物年龄类别为“幼儿”,且同时人物表情来别为“微笑”或“大笑”的帧片段进行抽取和再合成,进而得到孩童欢笑合集的剪辑视频,从而满足了用户的剪辑需求。
在本申请的一些实施例中,图5示出了根据本申请实施例的视频剪辑方法的流程图之四,具体地,根据帧标签对视频素材进行剪辑的过程包括以下步骤:
步骤502,通过样本帧的帧标签,对视频素材中样本帧对应的片段进行标记,得到对应的片段标签;
步骤504,根据片段标签和预设的目标类别对视频片段进行剪辑。
其中,目标类别包括目标表情类别、目标年龄类别和目标动作类别中的至少一种。
在本申请实施例中,为减小运算压力,同时保证剪辑得到的视频更加完整和连贯,首先通过样本帧在视频素材中确定对应的(视频)片段,并 根据该样本帧的帧标签,对上述片段进行标记,形成为该片段对应的片段标签。
在对视频素材进行剪辑时,根据片段标签,和预设的目标类别对视频片段进行剪辑。与帧标签和片段标签相对应的,目标类别包括目标表情类别、目标年龄类别和/或目标动作类别。
举例来说,某样本帧的帧标签中,包括的人物年龄类别为“青年”、人物动作类别为“站立”、人物表情类别为“微笑”,则对应的,该样本帧对应的片段,其片段标签也对应为“青年”、“站立”和“微笑”。
通过片段的片段标签和目标类别做比对,从而对包含目标类别的片段进行剪辑,能够提高剪辑视频的完成度和连贯性,同时降低运算压力。
在本申请的一些实施例中,图6示出了根据本申请实施例的视频剪辑方法的流程图之五,具体地,根据样本帧的帧标签和目标类别对视频素材进行剪辑的过程包括以下步骤:
步骤602,当片段的片段标签中包括目标类别时,将片段确定为待剪辑片段;
步骤604,按照待剪辑片段对应的样本帧的时间戳的时间顺序,对待剪辑片段进行组合,以完成剪辑。
在本申请实施例中,首先根据用户的剪辑操作,或根据当前选定的剪辑模版,确定对应的目标类别。依次按照时间轴的顺序选取视频素材中的多个片段,并依次确定每个片段的片段标签中是否包含有上述目标类别。
举例来说,如果一个片段的片段标签包括“微笑”、“童年”和“跳跃”,而目标类别为“跳跃”,则确定该片段的片段标签包括上述目标类别。
将所有包括目标类别的片段均确定为待剪辑片段,其中每个待剪辑片段均包含至少一个样本帧,且该样本帧上标记有时间戳,因此根据样本帧的时间戳,可以确定全部待剪辑片段的先后顺序,并按照先后顺序对待剪辑片段进行组合,得到最终的剪辑视频以完成剪辑工作。
在本申请的一些实施例中,图7示出了根据本申请实施例的视频剪辑方法的流程图之六,具体地,通过样本帧的帧标签,对视频素材中样本帧 对应的片段进行标记的过程包括以下步骤:
步骤702,当样本帧中包括第一样本帧和第二样本帧,满足第一样本帧和第二样本帧对应的人物表情类别、人物年龄类别和人物动作类别中的任一类别的类别相同时,将第一样本帧和第二样本帧之间的全部帧的集合确定为片段;
步骤704,将任一类别对应的帧标签标记为片段标签。
在本申请实施例中,在确定好每个样本帧的帧标签后,如果有相邻的两个样本帧,记为第一样本帧和第二样本帧,且第一样本帧和第二样本帧的帧标签中,人物表情类别、人物年龄类别和人物动作类别中的至少一个类别相同,则将该第一样本帧和第二样本帧,以及上述两个样本帧之间的非样本帧的集合,确定为一个片段,并将相同的“类别”标记为该片段的片段标签。
举例来说,第一样本帧的帧标签包括“青年”、“微笑”和“站立”,第二样本帧的帧标签包括“幼儿”、“微笑”和“翻滚”,其中,第一样本帧和第二样本帧中相同的类别为“微笑”,则将第一样本帧、第二样本帧和两者之间的多个非样本帧确定为一个片段,且该片段的片段标签为“微笑”。
在本申请的一些实施例中,图8示出了根据本申请实施例的视频剪辑方法的流程图之七,具体地,通过样本帧的帧标签,对视频素材中样本帧对应的片段进行标记的过程包括以下步骤:
步骤802,当样本帧中包括第三样本帧和第四样本帧,满足第三样本帧和第四样本帧对应的人物表情类别、人物年龄类别和人物动作类别中的任一类别的类别相同,且满足第三样本帧和第四样本帧之间间隔的帧数小于预设的帧数阈值时,将第三样本帧和第四样本帧之间的全部帧的集合确定为片段;
步骤804,将任一类别对应的帧标签标记为片段标签。
在本申请实施例中,在确定好每个样本帧的帧标签后,如果存在两个样本帧,记为第三样本帧和第四样本帧,在第三样本帧和第四样本帧的帧标签中,人物表情类别、人物年龄类别和人物动作类别中的至少一个类别 相同,且同时满足第三样本帧和第四样本帧之间间隔的帧数(可以包括非样本帧,也可以同时包括样本帧和非样本帧)小于预设的帧数阈值,则将该第三样本帧和第四样本帧,以及上述两个样本帧之间的非样本帧的集合,确定为一个片段,并将相同的“类别”标记为该片段的片段标签。
其中,帧数阈值可基于大数据分析进行设定,也可以根据用户使用习惯进行自由调整,如设置为25、50、60等。本申请实施例对帧数阈值的具体值不做限定。
举例来说,第三样本帧的帧标签包括“青年”和“站立”,第四样本帧的帧标签包括“幼儿”、“微笑”和“站立”,其中,第三样本帧和第四样本帧中相同的类别为“站立”,且第三样本帧和第四样本帧之间间隔的帧数为14,小于当前预设的帧数阈值25,则将第三样本帧、第四样本帧和两者之间的全部帧确定为一个片段,且该片段的片段标签为“站立”。
在本申请的一些实施例中,图9示出了根据本申请实施例的视频剪辑方法的流程图之八,具体地,视频剪辑方法还包括以下步骤:
步骤902,对样本帧进行图像识别,获取样本帧对应的人物和人物互动的物体;
步骤904,当样本帧中包括第五样本帧和第六样本帧,满足第五样本帧与第六样本帧中对应的人物为相同人物,且物体为相同物体时,将第五样本帧和第六样本帧之间的全部帧的集合确定为片段。
在本申请实施例中,在对样本帧进行图像识别时,可以进一步获取与人物相互动的物体。如:人物坐的椅子,人物触及的球拍、球、跳绳等运动器材,又或是人物穿戴的服饰。
如果存在两个样本帧,记为第五样本帧和第六样本帧,其中包括的人物为相同的人物,且与该人物互动的物体为相同的物体,则将第五样本帧和第六样本帧之间的全部帧的集合,确定为一个片段。
对于这种情况,所确定的片段对应的片段标签可以包括第五样本帧的帧标签的全部内容,同时包括第六样本帧的帧标签的全部内容。
具体举例来说,可通过人脸、人体识别的方式,来确定两个样本帧中的人物是否为统一人物。经人脸识别,确定第五样本帧和第六样本帧中包 括人物均是“小明”,且在第五样本帧中“小明”互动的物体为篮球,第六样本帧中“小明”互动的物体也是篮球,则将第五样本帧、第六样本帧和两者之间的全部帧确定为一个片段。并将第五样本帧的帧标签内容、第六样本帧的帧标签内容一并设置为该片段的片段标签。
在本申请的一些实施例中,上述匹配结果包括匹配度,图10示出了根据本申请实施例的视频剪辑方法的流程图之九,具体地,视频剪辑方法还包括以下步骤:
步骤1002,根据人物年龄类别和人物动作类别,在预设的匹配数据库中获取对应的匹配度;
步骤1004,当匹配度大于预设的匹配度阈值时,确定匹配结果满足第一条件。
在本申请实施例中,在判断人物年龄类别和人物动作类别的匹配度时,可以通过预设的匹配数据库,来获取当前识别到的人物年龄类别和人物动作类别的匹配度,并根据该匹配度与匹配度阈值的比较结果,确定当前匹配结果是否满足上述第一条件。
具体地,在上述匹配数据库中,预存储有每种人物年龄类别和每种人物动类别的对照数据。举例来说,对于运动类动作,“青年”的匹配度最高,“童年”次之,“幼年”再次,“老年”的匹配度最低。根据动作类别的不同,每个年龄段也可以有不同的匹配度。
上述匹配度的设置可以由设计人员进行定义,也可以根据大数据分析进行设置。本申请实施例对匹配数据库的具体形式和内容不做限定。
进一步地,如果确定当前人物年龄类别,与当前人物动作类别的匹配度大于上述匹配度阈值,则可以确定匹配结果满足第一条件。其中,匹配度阈值可根据匹配数据库进行设定,本申请实施例对匹配度阈值的具体数值范围不做限定。
在本申请的一些实施例中,匹配结果包括人物年龄类别和人物动作类别的对应关系,人物年龄类别包括幼年类别、童年类别、青年类别、中年类别和老年类别;以及第一条件包括:人物年龄类别为童年类别、青年类别或中年类别;或人物年龄类别为老年类别,且人物动作类别为第一动作 类别;或人物年龄类别为幼年类别,且人物动作类别为第二动作类别。
在本申请实施例中,如果人物年龄类别为童年类别或青年类别的话,则无论人物动作类别具体为什么类别,则均认定符合第一条件。具体地,由于处于童年或青年阶段时,人的运动能力最强,因此在当前年龄段做出什么样的动作都是合理的,因此当人物年龄类别为童年类别或青年类别时,即可认定匹配结果符合第一条件。
对于人物年龄类别是老年类别的情况,由于年纪较高,老年人一般不会也不适于做出一些夸张的、幅度较大的动作。因此将可能被老年人做出的动作设置为第一动作类别,当人物年龄类别是老年类别,且同时人物动作类别为第一动作类别时,认定匹配结果符合第一条件,否则认定匹配结果不符合第一条件。
对于人物类别为幼年类别的情况,幼儿具有良好的身体柔软性,但由于发育不完全且未经训练,一般无法做出一些技巧性的运动动作,如跳绳、倒立等。因此将一些常规动作和非技巧性动作划分为第二动作类别,如果人物年龄类别是幼年类别,则进一步判断当前动作是否属于第二动作类别,如果判断结果为时,可认定匹配结果符合第一条件。
通过判断人物年龄类别与人物动作类别是否匹配,从而判定识别结果是否准确,能够有效地提高对人物年龄类别和人物动作类别的识别准确度,有效减少误判。
在本申请的一些实施例中,动作类别包括非运动类动作和运动类动作,其中第一动作类别包括:非运动类动作、运动速度小于预设的速度阈值的运动类动作和运动幅度小于预设的幅度阈值的运动类动作;第二动作类别包括:非运动类动作、运动速度小于速度阈值的运动类动作。
在本申请实施例中,可以对运动类动作的运动幅度和运动速度进行识别,从而判断是第一动作类别或第二类动作类别,进而减少计算量,提高识别速度。
具体地,对于行走、站立等非运动类别,该类动作与全部年龄均匹配,因此第一动作类别和第二动作类别中均包含有非运动类别。
而对于运动类别,则根据运动速度和运动幅值来进行区分。对于老年 类别对应的第一动作类别,能够理解的是,老年人的动作一般较为缓慢,且老人关节的柔韧性会降低,因此一般运动幅度会比较小。因此,将运动速度低于预设的速度阈值,同时运动幅度小于预设的幅度阈值的动作,设定为第一动作类别,也就是说,如果运动速度大于上述速度阈值,或运动幅度大于上述幅度阈值被满足时,则判定该动作不属于第一动作类别。
而由于幼儿的身高、四肢长度较小,且肌肉发育不完全,因此幼儿的动作同样较为缓慢。但由于幼儿身体具有极佳的柔韧性,因此运动幅度上,对于幼年类别不做限制。
其中,运动幅度可以包括肢体由一个位置运动到另一个位置所经过的最大距离,还可以包括如举手、弯腰、屈膝等特定动作的幅度,本申请实施例对运动幅度的具体内容不做限定。
运动速度可以包括人体移动的速度,也可以包括肢体摆动的速度,本申请实施例对运动速度的具体内容不做限定。
在本申请的一些实施例中,图11示出了根据本申请实施例的视频剪辑方法的流程图之十,具体地,对样本帧进行图像识别,确定样本帧对应的人物表情类别、人物年龄类别和人物动作类别的过程包括以下步骤:
步骤1102,将样本帧输入至图像识别模型中,通过图像识别模型识别样本帧中包括的人脸信息和人体信息;
步骤1104,通过第一神经网络模型对人脸信息进行检测,以得到人物表情类别和人物年龄类别;
步骤1106,通过第二神经网络模型对人体信息进行检测,以得到原始人物动作类别;
步骤1108,根据时间戳,对连续的预设数量的样本帧对应的原始人物动作类别的连续性进行验证,以获得验证结果;
步骤1110,当验证结果满足预设的第二条件时,将原始人物动作类别确定为人物动作类别。
在本申请实施例中,可以通过机器学习得到的神经网络模型,来对样本帧中的人物表情类别、人物年龄类别和人物动作类别进行识别。
具体地,样本帧为图像识别模型的输入,在将样本帧输入到图像识别 模型后,图像识别模型首先对人脸进行识别,截取到到人脸信息,并对人体进行识别,截取到人体信息。
在这个过程中,通过图像识别模型,将人物图像从完整的帧图像中“截取”出来,并将其细化为人脸信息(人脸图像)和人体信息(肢体图像)。
进一步地,将人脸信息输入值第一神经网络模型中,进行面部检测识别,基于人工智能分析,得到对应的人物表情类别,以及对应的人物年龄类别。
同时,将人体信息输入值第二神经网络模型,得到对应的原始人物动作类别。
进一步地,按照时间戳,对连续的预设数量的样本帧中,通过上述第一神经网络模型和第二神经网络模型识别得到的人物年龄类别和人物动作类别的连续性进行验证,并基于验证结果满足第二条件时,确定动作识别得到的原始人物动作类别准确。
具体地,举例来说,对连续的样本帧1、样本帧2和样本帧3进行连续性验证。一般来说,连续3个样本帧中的人物动作应该是连贯的,非跳跃性的,如上述连续3个样本帧的人物动作类别相同,或相连续。
如果样本帧1的原始人物动作类别为屈膝,样本帧2的原始人物动作类别也为站立,而样本帧3的原始人物动作类别为跳跃,这3个样本帧可能是被拍摄人起跳的过程,上述过程是连续的,因此判断满足第二条件。
而如果样本帧1的原始人物动作类别为躺倒,样本帧2的原始人物动作类别为跳跃,样本帧3的原始人物动作类别也为躺倒,一般来讲,连续的3个样本帧所对应的时间很短,被拍摄的人物一般无法在很短的事件内由躺倒的姿势直接跳起,又瞬间返回躺倒的姿势,因此认为上述过程不连续,不满足第二条件。
通过验证动作类别的连续性,能够进一步保证识别的人物动作类别的准确性,提高视频剪辑的准确度。
在本申请的一些实施例中,验证结果包括连续性得分;以及当连续性得分大于或等于预设的连续性阈值时,确定验证结果满足第二条件。
在本申请实施例中,可设置连续性得分,对动作是否连续进行判断。 具体地,可针对相连接的不同动作类别设置连续性得分值,在确定预设数量的样本帧是的连续性得分时,可通过依次将每两个相邻的两个样本帧之间的连续性分值的综合,确定为连续性得分。
其中,连续性得分可按照预设的表格查表确定。比如说,“躺倒”与“跳跃”之间的连续性分值较低,如设置为1分。而“屈膝”和“站立”之间的连续性分值,以及“站立”和“跳跃”之间的连续性分值就较高,如均设置为10分。
而连续性阈值则可以根据上述连续性分值的设定,以及在连续性验证中选取的样本帧数量而具体确定。比如按照上述分值设置方式,采样3帧的情况,可以将连续性阈值设置为8至12。
按照上述逻辑,对于样本帧1的原始人物动作类别为屈膝,样本帧2的原始人物动作类别也为站立的第一种情况,而样本帧3的原始人物动作类别为跳跃的情况,其连续性得分为10+10=20分。
而对于样本帧1的原始人物动作类别为躺倒,样本帧2的原始人物动作类别为跳跃,样本帧3的原始人物动作类别也为躺倒的第二种情况,其连续性得分为1+1=2分。
由此,第一种情况的连续性分值大于连续性阈值,第一种情况满足第二条件,第二种情况不满足第二条件。
在本申请的一些实施例中,图12示出了根据本申请实施例的视频剪辑方法的流程图之十一,具体地,视频剪辑方法还包括以下步骤:
步骤1202,对预设数量的样本帧进行图像识别,确定预设数量的样本帧对应的人物肢体位置;
步骤1204,在预设数量的样本帧中,确定任两个样本帧对应的人物肢体位置之间的位置变化度;
步骤1206,当位置变化幅度小于预设的变化度阈值时,确定验证结果满足第二条件。
在本申请实施例中,还可以对连续的预设数量的样本帧进行图像识别,从而确定每个样本帧之间对应的人物肢体位置。
具体地,人物肢体位置可以表示被拍摄人物所处的位置,以及人物姿 势。一般来说,在预设数量的样本帧的期间(较短时间)内,用户的姿势和位置不会发生“突变”,如上一帧中人物还是双手下垂的自然站立,下一帧立刻突变成倒立状态了,这种情况明显不符合一般常理。
因此,通过比较预设数量样本帧中任两个样本帧的人物肢体位置,并确定该两个样本帧事件人物肢体位置的变化度,如果变化度小于预设的变化度阈值,则说明人物位置和姿势没有发生突变,则可以判断验证结果满足第二条件,当前人物动作类别合理。
在本申请的一些实施例中,当匹配结果不满足第一条件时,重新获取样本帧对应的人物年龄类别和/或人物动作类别。
在本申请实施例中,如果匹配结果不满足第一条件,则说明获取到的人物年龄类别,或获取到的人物动作类别中的一个,与实际情况不符合。因此,可以重新获取当前样本帧对应的人物年龄类别,也可以同时或单独重新获取当前样本帧对应的人物动作类别,从而避免因误识别导致的剪辑结果不准确,有效地提高视频剪辑装置的使用体验。
在本申请的一些实施例中,当匹配结果不满足第一条件,且验证结果不满足第二条件时,重新执行对样本帧进行图像识别的步骤。
在本申请实施例中,如果匹配结果不满足第一条件,且同时也不满足第二条件,则说明对样本帧的图像识别结果出现了比较大的偏差,此时人物年龄、人物表情和人物动作的识别结果可能都不可靠。因此,重新执行对样本帧进行图像识别的步骤,能够有效地提高图像识别的准确率,从而提高自动剪辑的准确率。
在本申请的一些实施例中,视频剪辑方法的完成流程图如图13所示,具体地,图13示出了根据本申请实施例的视频剪辑方法的流程图之十二,具体地,视频剪辑方法还包括以下步骤:
步骤1302,获取视频素材;
步骤1304,按照4fps/s对视频素材进行抽帧,得到图片序列;
步骤1306A,通过detection的模块,从图片序列中得到人脸序列;
步骤1306B,通过detection的模块,从图片序列中得到人体序列;
步骤1308A,通过cnn分类,得到人物的年龄标签和表情标签;
步骤1308B,通过cnn分类,得到当前人物的姿态;
步骤1310,根据人体动作连续性进行交叉验证;
步骤1312,根据人体动作和年龄相关性进行交叉验证;
步骤1314,将人物表情、年龄、姿态信息和时间戳合入视频;
步骤1316,根据用户设置进行剪辑。
具体地,包括8个大体步骤:
Step1:用户自己选取希望进入剪辑的全部素材;
Step2:算法系统会对用户选取的视频进行抽帧,以得到一些图片序列,抽帧得到的图片序列会被加上一个时间戳,便于在后处理中人物的tag重新合入视频;
Step3:将得到的图片序列输入到detection的算法模块中,可以得到一个人脸序列和人体序列;
Step4:上游输出的人脸序列会经过一个表情和年龄识别的模块,同时输出人物的表情和年龄标签;
Step5:上游输出的人体序列会经过一个人体姿态识别的模块,输出当前人物的动作;
Step6:根据人体动作的连续性原则以及年龄和人体动作的相关性,对输出结果进行交叉验证,提高算法识别的精度;
Step7:将人物的表情,年龄和人体姿态的信息重新合入视频中;
Step8:按照用户的偏好或者选择的模板,根据Step7得到的tag进行剪辑。
在本申请的一些实施例中,提供了一种可移动平台1400,图14示出了根据本申请实施例的可移动平台的结构框图,具体地,可移动平台1400具体包括:
第一图像传感器1402,用于采集视频素材;
存储器1404、处理器1406及存储在存储器1404上并可被处理器1406执行的程序或指令,处理器执行程序或指令时实现如图2至图13所描述的视频剪辑方法的步骤。
因此,该可移动平台1400被配置为执行程序或指令时实现上述视频剪 辑方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
在本申请的一些实施例中,提供了一种智能云台1500,图15示出了根据本申请实施例的智能云台的结构框图,具体地,智能云台1500具体包括:
第二图像传感器1502,用于采集视频素材;
存储器1504、处理器1506及存储在存储器1504上并可被处理器1506执行的程序或指令,处理器执行程序或指令时实现如图2至图13所描述的视频剪辑方法的步骤。
因此,该智能云台1500被配置为执行程序或指令时实现上述视频剪辑方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
在本申请的一些实施例中,提供了一种计算机可读存储介质,可读存储介质中存储有程序或指令,程序或指令被处理器执行时实现如图2至图13所描述的视频剪辑方法的步骤。
因此,该可读存储介质被配置为执行程序或指令时实现上述视频剪辑方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
在本申请的一些实施例中,提供了一种硬件设备1600,图16示出了根据本申请实施例的硬件设备的结构框图,具体地,硬件设备1600包括:
用户输入模块1602,用于接收用户输入;
网络模块1604,用于访问网络,以及与终端设备和/或服务器之间进行数据指令交互;
输出模块1606,用于输出声音信息、图像信息、振动信息、网络信号中的一种或多种;
电源模块1608,用于为硬件设备供电;
存储器1610、处理器1612及存储在存储器1610上并可被处理器1612执行的程序或指令,处理器执行程序或指令时实现如图2至图13所描述的视频剪辑方法的步骤。
因此,该硬件设备被配置为执行程序或指令时实现上述视频剪辑方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
需要注意的是,本申请实施例中的硬件设备包括上述的移动电子设备和非移动电子设备。
硬件设备包括:手机、平板电脑、无人机、遥控器、智能云台、笔记本电脑、掌上电脑、车载电子设备、可穿戴设备、个人数字助理、个人计算机和电视机。
应理解的是,网络模块1604为用户提供了无线的宽带互联网访问,如帮助用户收发电子邮件、浏览网页和访问流式媒体等。
用户输入模块1602可用于接收输入的数字或字符信息,以及产生与电子设备的用户设置以及功能控制有关的键信号输入。
输出模块可以是音频输出模块,如扬声器等,还可以是视频输出模块,如液晶显示屏、有机发光二极管等形式吗,还可以是振动马达或信号发射装置,如蓝牙、射频等。
电源模块1608可以是给各个部件供电的电源(比如电池),电源可以通过电源管理系统与处理器1612逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。
本说明书的描述中,术语“多个”则指两个或两个以上,除非另有明确的限定,术语“上”、“下”等指示的方位或位置关系为基于附图所述的方位或位置关系,仅是为了便于描述本说明书和简化描述,而不是指示或暗示所指的设备或元件必须具有特定的方位、以特定的方位构造和操作,因此不能理解为对本说明书的限制;术语“连接”、“安装”、“固定”等均应做广义理解,例如,“连接”可以是固定连接,也可以是可拆卸连接,或一体地连接;可以是直接相连,也可以通过中间媒介间接相连。对于本领域的普通技术人员而言,可以根据具体情况理解上述术语在本说明书中的具体含义。
在本说明书的描述中,术语“一个实施例”、“一些实施例”、“具体实施例”等的描述意指结合该实施例或示例描述的具体特征、结构、材 料或特点包含于本说明书的至少一个实施例或示例中。在本说明书中,对上述术语的示意性表述不一定指的是相同的实施例或实例。而且,描述的具体特征、结构、材料或特点可以在任何的一个或多个实施例或示例中以合适的方式结合。
以上所述仅为本说明书的优选实施例而已,并不用于限制本说明书,对于本领域的技术人员来说,本说明书可以有各种更改和变化。凡在本说明书的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本说明书的保护范围之内。

Claims (37)

  1. 一种视频剪辑装置,其特征在于,所述视频剪辑装置包括存储器、处理器及存储在所述存储器上并可被所述处理器执行的程序或指令,所述处理器执行所述程序或指令时实现:
    在视频素材中获取样本帧,对所述样本帧进行图像识别,获取所述样本帧对应的人物年龄类别和人物动作类别;
    获取所述人物年龄类别和所述人物动作类别的匹配结果,当所述匹配结果满足预设的第一条件时,将所述人物年龄类别和所述人物动作类别添加为所述样本帧的帧标签;
    根据所述帧标签对所述视频素材进行剪辑。
  2. 根据权利要求1所述的视频剪辑装置,其特征在于,所述处理器执行所述程序或指令时实现在视频素材中获取样本帧的过程包括:
    响应于选取操作,确定至少一个所述视频素材;
    对所述视频素材进行抽帧处理,得到对应的所述样本帧;以及
    根据所述视频素材的时间轴,为所述样本帧添加时间戳。
  3. 根据权利要求2所述的视频剪辑装置,其特征在于,所述处理器执行所述程序或指令时实现对所述样本帧进行图像识别的过程还包括:
    对所述样本帧进行图像识别,获取所述样本帧对应的人物表情类别;以及
    将所述人物表情类别添加至所述帧标签。
  4. 根据权利要求3所述的视频剪辑装置,其特征在于,所述处理器执行所述程序或指令时实现根据所述帧标签对所述视频素材进行剪辑的过程包括:
    通过所述样本帧的帧标签,对所述视频素材中所述样本帧对应的片段进行标记,得到对应的片段标签;
    根据所述片段标签和预设的目标类别对所述视频片段进行剪辑;
    其中,所述目标类别包括目标表情类别、目标年龄类别和目标动作类别中的至少一种。
  5. 根据权利要求4所述的视频剪辑装置,其特征在于,所述处理器执行所述程序或指令时实现根据所述样本帧的帧标签和所述目标类别对所述视频素材进行剪辑的过程包括:
    当所述片段的所述片段标签中包括所述目标类别时,将所述片段确定为待剪辑片段;
    按照所述待剪辑片段对应的所述样本帧的所述时间戳的时间顺序,对所述待剪辑片段进行组合,以完成所述剪辑。
  6. 根据权利要求4或5所述的视频剪辑装置,其特征在于,所述处理器执行所述程序或指令时实现通过所述样本帧的帧标签,对所述视频素材中所述样本帧对应的片段进行标记的过程包括:
    当所述样本帧中包括第一样本帧和第二样本帧,满足所述第一样本帧和所述第二样本帧对应的所述人物表情类别、所述人物年龄类别和所述人物动作类别中的任一类别的类别相同时,将所述第一样本帧和所述第二样本帧之间的全部帧的集合确定为所述片段;
    将所述任一类别对应的所述帧标签标记为所述片段标签。
  7. 根据权利要求4或5所述的视频剪辑装置,其特征在于,所述处理器执行所述程序或指令时实现通过所述样本帧的帧标签,对所述视频素材中所述样本帧对应的片段进行标记的过程包括:
    当所述样本帧中包括第三样本帧和第四样本帧,满足所述第三样本帧和所述第四样本帧对应的所述人物表情类别、所述人物年龄类别和所述人物动作类别中的任一类别的类别相同,且满足所述第三样本帧和所述第四样本帧之间间隔的帧数小于预设的帧数阈值时,将所述第三样本帧和所述第四样本帧之间的全部帧的集合确定为所述片段;
    将所述任一类别对应的所述帧标签标记为所述片段标签。
  8. 根据权利要求4或5所述的视频剪辑装置,其特征在于,所述处理器执行所述程序或指令时实现对所述样本帧进行图像识别的过程还包括:
    对所述样本帧进行图像识别,获取所述样本帧对应的人物和所述人物互动的物体;以及
    所述处理器执行所述程序或指令时实现通过所述样本帧的帧标签,对 所述视频素材中所述样本帧对应的片段进行标记的过程包括:
    当所述样本帧中包括第五样本帧和第六样本帧,满足所述第五样本帧与所述第六样本帧中对应的所述人物为相同人物,且所述物体为相同物体时,将所述第五样本帧和所述第六样本帧之间的全部帧的集合确定为所述片段。
  9. 根据权利要求1至8中任一项所述的视频剪辑装置,其特征在于,所述匹配结果包括匹配度,所述处理器执行所述程序或指令时实现:
    根据所述人物年龄类别和所述人物动作类别,在预设的匹配数据库中获取对应的所述匹配度;以及
    当所述匹配度大于预设的匹配度阈值时,确定所述匹配结果满足所述第一条件。
  10. 根据权利要求1至8中任一项所述的视频剪辑装置,其特征在于,所述匹配结果包括所述人物年龄类别和所述人物动作类别的对应关系,所述人物年龄类别包括幼年类别、童年类别、青年类别、中年类别和老年类别;以及
    所述第一条件包括:
    所述人物年龄类别为所述童年类别、所述青年类别或所述中年类别;或
    所述人物年龄类别为所述老年类别,且所述人物动作类别为第一动作类别;或
    所述人物年龄类别为所述幼年类别,且所述人物动作类别为第二动作类别。
  11. 根据权利要求10所述的视频剪辑装置,其特征在于,所述动作类别包括非运动类动作和运动类动作,其中所述第一动作类别包括:
    所述非运动类动作、运动速度小于预设的速度阈值的所述运动类动作和运动幅度小于预设的幅度阈值的所述运动类动作;
    所述第二动作类别包括:
    所述非运动类动作、运动速度小于所述速度阈值的所述运动类动作。
  12. 根据权利要求3至8中任一项所述的视频剪辑装置,其特征在于, 所述处理器执行所述程序或指令时实现对所述样本帧进行图像识别,确定所述样本帧对应的人物表情类别、人物年龄类别和人物动作类别的过程包括:
    将所述样本帧输入至图像识别模型中,通过所述图像识别模型识别所述样本帧中包括的人脸信息和人体信息;
    通过第一神经网络模型对所述人脸信息进行检测,以得到所述人物表情类别和所述人物年龄类别;
    通过第二神经网络模型对所述人体信息进行检测,以得到原始人物动作类别;
    根据所述时间戳,对连续的预设数量的所述样本帧对应的所述原始人物动作类别的连续性进行验证,以获得验证结果;
    当所述验证结果满足预设的第二条件时,将所述原始人物动作类别确定为所述人物动作类别。
  13. 根据权利要求12所述的视频剪辑装置,其特征在于,所述验证结果包括连续性得分;以及
    当所述连续性得分大于或等于预设的连续性阈值时,确定所述验证结果满足所述第二条件。
  14. 根据权利要求12所述的视频剪辑装置,其特征在于,所述处理器执行和所述程序或指令时实现:
    对所述预设数量的所述样本帧进行图像识别,确定所述预设数量的所述样本帧对应的人物肢体位置;以及
    所述验证结果包括:所述预设数量的所述样本帧中,任两个所述样本帧对应的人物肢体位置之间的位置变化度;
    当所述位置变化幅度小于预设的变化度阈值时,确定所述验证结果满足所述第二条件。
  15. 根据权利要求1至14中任一项所述的视频剪辑装置,其特征在于,当所述匹配结果不满足所述第一条件时,重新获取所述样本帧对应的所述人物年龄类别和/或所述人物动作类别。
  16. 根据权利要求12至14中任一项所述的视频剪辑装置,其特征在 于,当所述匹配结果不满足所述第一条件,且所述验证结果不满足所述第二条件时,重新执行所述对所述样本帧进行图像识别的步骤。
  17. 一种视频剪辑方法,其特征在于,包括:
    在视频素材中获取样本帧,对所述样本帧进行图像识别,获取所述样本帧对应的人物年龄类别和人物动作类别;
    获取所述人物年龄类别和所述人物动作类别的匹配结果,当所述匹配结果满足预设的第一条件时,将所述人物年龄类别和所述人物动作类别添加为所述样本帧的帧标签;
    根据所述帧标签对所述视频素材进行剪辑。
  18. 根据权利要求17所述的视频剪辑方法,其特征在于,所述在视频素材中获取样本帧的步骤包括:
    响应于选取操作,确定至少一个所述视频素材;
    对所述视频素材进行抽帧处理,得到对应的所述样本帧;以及
    根据所述视频素材的时间轴,为所述样本帧添加时间戳。
  19. 根据权利要求18所述的视频剪辑方法,其特征在于,所述对所述样本帧进行图像识别的步骤还包括:
    对所述样本帧进行图像识别,获取所述样本帧对应的人物表情类别;以及
    将所述人物表情类别添加至所述帧标签。
  20. 根据权利要求19所述的视频剪辑方法,其特征在于,所述根据所述帧标签对所述视频素材进行剪辑的步骤包括:
    通过所述样本帧的帧标签,对所述视频素材中所述样本帧对应的片段进行标记,得到对应的片段标签;
    根据所述片段标签和预设的目标类别对所述视频片段进行剪辑;
    其中,所述目标类别包括目标表情类别、目标年龄类别和目标动作类别中的至少一种。
  21. 根据权利要求20所述的视频剪辑方法,其特征在于,所述根据所述样本帧的帧标签和所述目标类别对所述视频素材进行剪辑的步骤包括:
    当所述片段的所述片段标签中包括所述目标类别时,将所述片段确定 为待剪辑片段;
    按照所述待剪辑片段对应的所述样本帧的所述时间戳的时间顺序,对所述待剪辑片段进行组合,以完成所述剪辑。
  22. 根据权利要求20或21所述的视频剪辑方法,其特征在于,所述通过所述样本帧的帧标签,对所述视频素材中所述样本帧对应的片段进行标记的步骤包括:
    当所述样本帧中包括第一样本帧和第二样本帧,满足所述第一样本帧和所述第二样本帧对应的所述人物表情类别、所述人物年龄类别和所述人物动作类别中的任一类别的类别相同时,将所述第一样本帧和所述第二样本帧之间的全部帧的集合确定为所述片段;
    将所述任一类别对应的所述帧标签标记为所述片段标签。
  23. 根据权利要求20或21所述的视频剪辑方法,其特征在于,所述通过所述样本帧的帧标签,对所述视频素材中所述样本帧对应的片段进行标记的步骤包括:
    当所述样本帧中包括第三样本帧和第四样本帧,满足所述第三样本帧和所述第四样本帧对应的所述人物表情类别、所述人物年龄类别和所述人物动作类别中的任一类别的类别相同,且满足所述第三样本帧和所述第四样本帧之间间隔的帧数小于预设的帧数阈值时,将所述第三样本帧和所述第四样本帧之间的全部帧的集合确定为所述片段;
    将所述任一类别对应的所述帧标签标记为所述片段标签。
  24. 根据权利要求20或21所述的视频剪辑方法,其特征在于,所述对所述样本帧进行图像识别的过程还包括:
    对所述样本帧进行图像识别,获取所述样本帧对应的人物和所述人物互动的物体;以及
    所述通过所述样本帧的帧标签,对所述视频素材中所述样本帧对应的片段进行标记的步骤包括:
    当所述样本帧中包括第五样本帧和第六样本帧,满足所述第五样本帧与所述第六样本帧中对应的所述人物为相同人物,且所述物体为相同物体时,将所述第五样本帧和所述第六样本帧之间的全部帧的集合确定为所述 片段。
  25. 根据权利要求17至24中任一项所述的视频剪辑方法,其特征在于,所述匹配结果包括匹配度,所述:
    根据所述人物年龄类别和所述人物动作类别,在预设的匹配数据库中获取对应的所述匹配度;以及
    当所述匹配度大于预设的匹配度阈值时,确定所述匹配结果满足所述第一条件。
  26. 根据权利要求17至24中任一项所述的视频剪辑方法,其特征在于,所述匹配结果包括所述人物年龄类别和所述人物动作类别的对应关系,所述人物年龄类别包括幼年类别、童年类别、青年类别、中年类别和老年类别;以及
    所述第一条件包括:
    所述人物年龄类别为所述童年类别、所述青年类别或所述中年类别;或
    所述人物年龄类别为所述老年类别,且所述人物动作类别为第一动作类别;或
    所述人物年龄类别为所述幼年类别,且所述人物动作类别为第二动作类别。
  27. 根据权利要求26所述的视频剪辑方法,其特征在于,所述动作类别包括非运动类动作和运动类动作,其中所述第一动作类别包括:
    所述非运动类动作、运动速度小于预设的速度阈值的所述运动类动作和运动幅度小于预设的幅度阈值的所述运动类动作;
    所述第二动作类别包括:
    所述非运动类动作、运动速度小于所述速度阈值的所述运动类动作。
  28. 根据权利要求19至24中任一项所述的视频剪辑方法,其特征在于,所述对所述样本帧进行图像识别,确定所述样本帧对应的人物表情类别、人物年龄类别和人物动作类别的步骤包括:
    将所述样本帧输入至图像识别模型中,通过所述图像识别模型识别所述样本帧中包括的人脸信息和人体信息;
    通过第一神经网络模型对所述人脸信息进行检测,以得到所述人物表情类别和所述人物年龄类别;
    通过第二神经网络模型对所述人体信息进行检测,以得到原始人物动作类别;
    根据所述时间戳,对连续的预设数量的所述样本帧对应的所述原始人物动作类别的连续性进行验证,以获得验证结果;
    当所述验证结果满足预设的第二条件时,将所述原始人物动作类别确定为所述人物动作类别。
  29. 根据权利要求28所述的视频剪辑方法,其特征在于,所述验证结果包括连续性得分;以及
    当所述连续性得分大于或等于预设的连续性阈值时,确定所述验证结果满足所述第二条件。
  30. 根据权利要求28所述的视频剪辑方法,其特征在于,还包括:
    对所述预设数量的所述样本帧进行图像识别,确定所述预设数量的所述样本帧对应的人物肢体位置;以及
    所述验证结果包括:所述预设数量的所述样本帧中,任两个所述样本帧对应的人物肢体位置之间的位置变化度;
    当所述位置变化幅度小于预设的变化度阈值时,确定所述验证结果满足所述第二条件。
  31. 根据权利要求17至30中任一项所述的视频剪辑方法,其特征在于,当所述匹配结果不满足所述第一条件时,重新获取所述样本帧对应的所述人物年龄类别和/或所述人物动作类别。
  32. 根据权利要求28至30中任一项所述的视频剪辑方法,其特征在于,当所述匹配结果不满足所述第一条件,且所述验证结果不满足所述第二条件时,重新执行所述对所述样本帧进行图像识别的步骤。
  33. 一种可移动平台,其特征在于,所述可移动平台包括:
    第一图像传感器,用于采集视频素材;
    存储器、处理器及存储在所述存储器上并可被所述处理器执行的程序或指令,所述处理器执行所述程序或指令时实现如权利要求17至32中任 一项所述的视频剪辑方法。
  34. 一种智能云台,其特征在于,所述智能云台包括:
    第二图像传感器,用于采集视频素材;
    存储器、处理器及存储在所述存储器上并可被所述处理器执行的程序或指令,所述处理器执行所述程序或指令时实现如权利要求17至32中任一项所述的视频剪辑方法。
  35. 一种可读存储介质,所述可读存储介质中存储有程序或指令,其特征在于,所述程序或指令被处理器执行时实现如权利要求17至32中任一项所述的视频剪辑方法的步骤。
  36. 一种硬件设备,其特征在于,所述硬件设备包括:
    用户输入模块,用于接收用户输入;
    网络模块,用于访问网络,以及与终端设备和/或服务器之间进行数据指令交互;
    输出模块,用于输出声音信息、图像信息、振动信息、网络信号中的一种或多种;
    电源模块,用于为所述硬件设备供电;
    存储器、处理器及存储在所述存储器上并可被所述处理器执行的程序或指令,所述处理器执行所述程序或指令时实现如权利要求17至32中任一项所述的视频剪辑方法。
  37. 根据权利要求36所述的硬件设备,其特征在于,所述硬件设备包括:
    手机、平板电脑、无人机、遥控器、智能云台、笔记本电脑、掌上电脑、车载电子设备、可穿戴设备、个人数字助理、个人计算机和电视机。
PCT/CN2020/130074 2020-11-19 2020-11-19 视频剪辑装置、方法、可移动平台、云台和硬件设备 WO2022104637A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/130074 WO2022104637A1 (zh) 2020-11-19 2020-11-19 视频剪辑装置、方法、可移动平台、云台和硬件设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/130074 WO2022104637A1 (zh) 2020-11-19 2020-11-19 视频剪辑装置、方法、可移动平台、云台和硬件设备

Publications (1)

Publication Number Publication Date
WO2022104637A1 true WO2022104637A1 (zh) 2022-05-27

Family

ID=81708169

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/130074 WO2022104637A1 (zh) 2020-11-19 2020-11-19 视频剪辑装置、方法、可移动平台、云台和硬件设备

Country Status (1)

Country Link
WO (1) WO2022104637A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116095363A (zh) * 2023-02-09 2023-05-09 西安电子科技大学 基于关键行为识别的移动端短视频高光时刻剪辑方法
CN116471452A (zh) * 2023-05-10 2023-07-21 武汉亿臻科技有限公司 一种基于智能ai的视频剪辑平台

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015135805A1 (en) * 2014-03-10 2015-09-17 Alcatel Lucent Dynamic content delivery in a multiscreen digital televison environment
CN107566907A (zh) * 2017-09-20 2018-01-09 广东欧珀移动通信有限公司 视频剪辑方法、装置、存储介质及终端
US10445582B2 (en) * 2016-12-20 2019-10-15 Canon Kabushiki Kaisha Tree structured CRF with unary potential function using action unit features of other segments as context feature
CN110855904A (zh) * 2019-11-26 2020-02-28 Oppo广东移动通信有限公司 视频处理方法、电子装置和存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015135805A1 (en) * 2014-03-10 2015-09-17 Alcatel Lucent Dynamic content delivery in a multiscreen digital televison environment
US10445582B2 (en) * 2016-12-20 2019-10-15 Canon Kabushiki Kaisha Tree structured CRF with unary potential function using action unit features of other segments as context feature
CN107566907A (zh) * 2017-09-20 2018-01-09 广东欧珀移动通信有限公司 视频剪辑方法、装置、存储介质及终端
CN110855904A (zh) * 2019-11-26 2020-02-28 Oppo广东移动通信有限公司 视频处理方法、电子装置和存储介质

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116095363A (zh) * 2023-02-09 2023-05-09 西安电子科技大学 基于关键行为识别的移动端短视频高光时刻剪辑方法
CN116095363B (zh) * 2023-02-09 2024-05-14 西安电子科技大学 基于关键行为识别的移动端短视频高光时刻剪辑方法
CN116471452A (zh) * 2023-05-10 2023-07-21 武汉亿臻科技有限公司 一种基于智能ai的视频剪辑平台
CN116471452B (zh) * 2023-05-10 2024-01-19 武汉亿臻科技有限公司 一种基于智能ai的视频剪辑平台

Similar Documents

Publication Publication Date Title
US11990160B2 (en) Disparate sensor event correlation system
US11355160B2 (en) Multi-source event correlation system
US10213645B1 (en) Motion attributes recognition system and methods
US9911045B2 (en) Event analysis and tagging system
US9690982B2 (en) Identifying gestures or movements using a feature matrix that was compressed/collapsed using principal joint variable analysis and thresholds
CN103760968B (zh) 数字标牌显示内容选择方法和装置
US8156067B1 (en) Systems and methods for performing anytime motion recognition
WO2016169432A1 (zh) 身份验证方法、装置及终端
CN113596537B (zh) 显示设备及播放速度方法
WO2022104637A1 (zh) 视频剪辑装置、方法、可移动平台、云台和硬件设备
KR102266219B1 (ko) 퍼스널 트레이닝 서비스 제공 방법 및 시스템
US10088901B2 (en) Display device and operating method thereof
JP2011170856A (ja) 複数の検出ストリームを用いたモーション認識用システム及び方法
KR102089002B1 (ko) 행동에 대한 피드백을 제공하는 웨어러블 디바이스 및 방법
US11954869B2 (en) Motion recognition-based interaction method and recording medium
US11682157B2 (en) Motion-based online interactive platform
CN109769213A (zh) 用户行为轨迹记录的方法、移动终端及计算机存储介质
KR20180015648A (ko) 자동화 분류 및/또는 성능센서 유닛으로 부터 얻어진 사용자 성능특성에 기초하여 미디어 데이터 검색이 가능하도록 구성된 구조, 장치 및 방법
WO2023040449A1 (zh) 利用健身动作触发客户端操作指令
Takano et al. A multimedia tennis instruction system: Tracking and classifying swing motions
JP2011170857A (ja) 最小のディレイでモーション認識を行うシステム及び方法
CN114432683A (zh) 一种智能语音播放方法及设备
WO2022135177A1 (zh) 控制方法和电子设备
KR102276009B1 (ko) 심폐소생술 훈련 시뮬레이션 장치 및 방법
WO2024055661A1 (zh) 一种显示设备及显示方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20961928

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20961928

Country of ref document: EP

Kind code of ref document: A1