WO2019100757A1 - Video generation method and device, and electronic apparatus - Google Patents


Info

Publication number
WO2019100757A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
human body
action
motion
audio
Prior art date
Application number
PCT/CN2018/098602
Other languages
French (fr)
Chinese (zh)
Inventor
张晨曦
李震
杨鹏博
戴硕
李鹤
黄怡青
Original Assignee
乐蜜有限公司
张晨曦
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 乐蜜有限公司 and 张晨曦
Publication of WO2019100757A1

Classifications

    • H04N21/43074 — Synchronising the rendering of additional data with content streams on the same device, e.g. of EPG data or an interactive icon with a TV program
    • H04N21/4394 — Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44004 — Processing of video elementary streams involving video buffer management, e.g. video decoder buffer or video display buffer
    • H04N21/44218 — Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • H04N21/47217 — End-user interface for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H04N21/4781 — Games
    • H04N21/8456 — Structuring of content, e.g. by decomposing the content in the time domain into time segments

Definitions

  • the present application relates to the field of mobile terminal technologies, and in particular, to a video generation method, apparatus, and electronic device.
  • Somatosensory dance games achieve human-computer interaction through an Internet operating platform.
  • the user makes the corresponding body movements according to the prompts of the somatosensory dancing device, so that the user can achieve a fitness function while enjoying the somatosensory interaction experience of dancing.
  • the somatosensory dance game is mainly applied to fixed devices, such as a somatosensory dance machine or a computer, so its portability is poor.
  • the judgment of the user's body movement is made only by determining whether the user steps on the foot arrows correctly, so the manner of dancing is relatively simple.
  • the user's participation is low because the game process cannot be recorded.
  • the present application aims to solve at least one of the technical problems in the related art to some extent.
  • the first object of the present application is to provide a video generation method. Since the standard action is a human body action that the user needs to make, the dance actions can be effectively enriched compared with the prior-art foot-arrow dance mode, enhancing the user experience. In addition, action evaluation information is generated according to the degree of difference between the standard action and the human body action at the same time node, which enables the user to know in time whether the action made is standard, further enhancing the user experience.
  • the user can play back or share the video, thereby enhancing the user's sense of participation, and solving the prior-art problem that the somatosensory dance game is mainly applied to fixed devices, such as a somatosensory dance machine or a computer, with poor portability.
  • it also solves the problem that the user's body movement is judged only by whether the user steps on the foot arrows correctly, so that the manner of dancing is relatively simple.
  • the user's sense of participation is low due to the inability to record the game process.
  • a second object of the present application is to propose a video generating apparatus.
  • a third object of the present application is to propose an electronic device.
  • a fourth object of the present application is to propose a non-transitory computer readable storage medium.
  • a fifth object of the present application is to propose a computer program product.
  • the first aspect of the present application provides a video generating method, including:
  • a target video is generated based on the audio, each video frame, and the motion evaluation information of each human body motion.
  • before the playing of the audio and the synchronous acquisition of the video images, the method further includes:
  • the generating of the target video according to the audio, each video frame, and the motion evaluation information of each human motion includes:
  • the target video is generated based on the audio and the video frames after adding the motion evaluation information.
  • the total evaluation information is generated according to the action evaluation information of each human body motion
  • the total evaluation information is displayed.
  • the result display interface further includes: a review control, a shooting control, and a sharing control;
  • the shooting interface is displayed to regenerate the target video
  • the target video is shared when a triggering operation for the sharing control is detected.
  • the sharing of the target video includes:
  • the sharing interface includes a self-owned platform sharing control and a third-party platform sharing control
  • a video aggregation page is displayed; the video aggregation page includes the target video and/or a video that has been shared on the own platform.
  • before the acquiring of the selected audio, the method further includes:
  • the song selection interface is displayed when an operation for the shooting control is detected.
  • the recognizing of the human body motion in the video frame that is synchronously collected at the time node includes:
  • the human body motion is determined according to the actual angle between the line connecting each two adjacent joints and a preset reference direction.
  • the video generating method of the embodiment of the present application obtains the selected audio and the standard actions corresponding to each time node in the audio; plays the audio and collects each video frame during the playing; when the audio is played to each time node, displays the corresponding standard action and identifies the human body motion in the video frame synchronously acquired at that time node; generates action evaluation information according to the degree of difference between the standard action and the human body action at the same time node; and generates a target video according to the audio, each video frame, and the motion evaluation information of each human body motion.
  • the standard action is a human body action that the user needs to make
  • the dance action can be effectively enriched and the user experience can be improved.
  • the action evaluation information of the human body action is generated, which enables the user to know in time whether the human body action made is standard, further enhancing the user experience.
  • the user can play back or share the video, thereby enhancing the user's sense of participation, and solving the prior-art problem that the somatosensory dance game is mainly used on fixed devices, such as a somatosensory dance machine or a computer, with poor portability.
  • the judgment of the user's body movement is made only by determining whether the user steps on the foot arrows correctly, so the manner of dancing is relatively simple.
  • the user's sense of participation is low due to the inability to record the game process.
  • the second aspect of the present application provides a video generating apparatus, including:
  • a selection module for acquiring selected audio and standard actions corresponding to each time node in the audio
  • An acquisition module configured to play the audio, and collect each video frame during the playing of the audio
  • a display module configured to display a corresponding standard action when the audio is played to each time node, and identify the human body motion in the video frame that is synchronously acquired at the time node;
  • An evaluation module configured to generate action evaluation information of the human body action according to a degree of difference between the standard action and the human body action at the same time node;
  • a generating module configured to generate a target video according to the audio, each video frame, and the motion evaluation information of each human body motion.
  • the device further includes:
  • a display determining module configured to display a preparation action, collect a preparation image, and determine that the human body action in the preparation image matches the preparation action, before the playing of the audio and the synchronous acquisition of the video images.
  • the generating module is specifically configured to:
  • the target video is generated based on the audio and the video frames after adding the motion evaluation information.
  • the device further includes:
  • a display generation module configured to: after the action evaluation information of the human body motion is generated according to the degree of difference between the standard action and the human body action at the same time node, display the action evaluation information of each human body action on the shooting interface; when the audio play ends, generate total evaluation information according to the action evaluation information of each human body motion; and display the total evaluation information on the result display interface.
  • the result display interface further includes: a review control, a shooting control, and a sharing control; and the display generating module is further configured to:
  • the shooting interface is displayed to regenerate the target video
  • the target video is shared when a triggering operation for the sharing control is detected.
  • the display generating module is specifically configured to:
  • the sharing interface includes a self-owned platform sharing control and a third-party platform sharing control
  • a video aggregation page is displayed; the video aggregation page includes the target video and/or a video that has been shared on the own platform.
  • the device further includes:
  • an interface display module configured to display a song selection interface when an operation for the shooting control is detected, before the obtaining of the selected audio.
  • the display module is specifically configured to:
  • the human body motion is determined according to the actual angle between the line connecting each two adjacent joints and a preset reference direction.
  • the video generating apparatus of the embodiment of the present application acquires the selected audio and the standard actions corresponding to each time node in the audio; plays the audio and collects each video frame during the playing; when the audio is played to each time node, displays the corresponding standard action and identifies the human body motion in the video frame synchronously acquired at that time node; generates action evaluation information according to the degree of difference between the standard action and the human body action at the same time node; and generates a target video according to the audio, each video frame, and the motion evaluation information of each human body motion.
  • the standard action is a human body action that the user needs to make
  • the dance action can be effectively enriched and the user experience can be improved.
  • the action evaluation information of the human body action is generated, which enables the user to know in time whether the human body action made is standard, further enhancing the user experience.
  • the user can play back or share the video to enhance the user's sense of participation, solving the problem that the existing somatosensory dance game is mainly used on fixed devices, such as somatosensory dance machines or computers.
  • an embodiment of the third aspect of the present application provides an electronic device including: a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is disposed inside the space enclosed by the housing, and the processor and the memory are disposed on the circuit board; the power supply circuit is used for powering each circuit or device of the electronic device; the memory is used for storing executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to execute the video generating method described in the first aspect of the present application.
  • the fourth aspect of the present application provides a non-transitory computer readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the video generation method of the embodiment of the first aspect of the present application.
  • the fifth aspect of the present application provides a computer program product, where when the instructions in the computer program product are executed by a processor, the video generation method according to the embodiment of the first aspect of the present application is executed.
  • FIG. 1 is a schematic flowchart diagram of a first video generating method according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a second video generating method according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart diagram of a third video generating method according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart diagram of a fourth video generating method according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of another video generating apparatus according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
  • the existing somatosensory dance game is mainly applied to fixed devices, such as a somatosensory dance machine, a computer, etc., and the portability is poor.
  • the judgment of the user's body movement is made only by determining whether the user steps on the foot arrows correctly, so the manner of dancing is relatively simple.
  • and the technical problem that the user's sense of participation is low due to the inability to record the game process.
  • in the present application, the selected audio and the standard action corresponding to each time node in the audio are acquired; the audio is played, and each video frame is captured during playback; the corresponding standard action is displayed when the audio is played to each time node, and the human motion in the video frame synchronously acquired at that time node is identified; motion evaluation information of the human motion is generated according to the degree of difference between the standard motion and the human motion; and a target video is generated based on the audio, each video frame, and the motion evaluation information of each human motion.
  • the standard action is a human body action that the user needs to make, compared with the dance mode of the user's foot arrow in the prior art, the dance action can be effectively enriched and the user experience can be improved.
  • the action evaluation information of the human body action is generated, which enables the user to know in time whether the human body action made is standard, further enhancing the user experience.
  • the user can play back or share the video, enhancing the user's sense of participation.
  • FIG. 1 is a schematic flowchart diagram of a first video generating method according to an embodiment of the present application.
  • the video generation method can be applied to an application on an electronic device, such as a personal computer (PC), a cloud device, or a mobile device such as a smart phone or a tablet computer.
  • the video generation method includes the following steps:
  • Step 101 Acquire selected audio, and standard actions corresponding to each time node in the audio.
  • for example, a trigger condition for audio selection may be set in the application of the electronic device.
  • the trigger condition may be an audio selection control, and the user may trigger the selection of audio through the audio selection control.
  • the song selection interface can be invoked, and then the user can select any audio from the song selection interface as the selected audio.
  • the application can get the audio selected by the user.
  • alternatively, a shooting control may be set in the application of the electronic device. When the application detects the user's operation for the shooting control, for example when the user clicks the shooting control, the song selection interface can be automatically displayed on the application interface, and then the user can select an audio from the song selection interface according to his own needs.
  • the application can get the audio selected by the user.
  • the audio in the song selection interface may be pre-associated with corresponding standard actions. Specifically, each time node in the audio has a corresponding standard action. Therefore, after the application obtains the selected audio, it can obtain the standard actions corresponding to each time node from the audio.
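As a concrete illustration, the pre-imported standard actions can be modelled as a lookup from time nodes to action identifiers. This is a minimal sketch; the node values and action names below are invented for demonstration and do not come from the application itself.

```python
# Hypothetical mapping of time nodes (seconds into the audio) to the
# standard actions pre-imported with the song; all values are illustrative.
standard_actions = {
    4.0: "raise_both_arms",
    8.0: "left_arm_horizontal",
    12.0: "squat",
}

def action_for_node(time_node):
    """Return the standard action registered for the given time node, if any."""
    return standard_actions.get(time_node)

print(action_for_node(8.0))  # left_arm_horizontal
```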
  • Step 102 Play the audio, and collect each video frame during the playing of the audio.
  • the electronic device can play the audio according to the user's operation. For example, when the electronic device detects that the user clicks the audio, it can play the audio and simultaneously turn on the camera to capture each video frame.
  • Step 103 When the audio is played to each time node, display the corresponding standard action.
  • a preset advance time can be set, so that the corresponding standard action is displayed before each time node of the audio is played.
  • the preset advance time can be set by the user according to his own needs, or the preset advance time can be preset by the built-in program of the electronic device, which is not limited. It should be understood that the preset advance time should not be set too long, for example, the preset advance time may be 0.2 s.
  • specifically, the advance time is subtracted from the time node to obtain a difference, the difference is used as the starting time, and the schematic diagram of the standard action is displayed from that starting time.
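The starting-time computation above amounts to a single subtraction; a minimal sketch, assuming the 0.2 s example advance time (clamping at zero for very early nodes is an added assumption):

```python
PRESET_ADVANCE_S = 0.2  # example value given in the description

def display_start_time(time_node_s, advance_s=PRESET_ADVANCE_S):
    """The schematic diagram of the standard action is displayed from
    (time node - preset advance time)."""
    return max(time_node_s - advance_s, 0.0)  # clamp at 0: assumed behaviour
```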
  • a schematic diagram of a standard motion may be displayed in any area of the shooting interface.
  • the position of the schematic diagram of the standard motion may be fixed, or the schematic diagram may move along a preset trajectory, which is not limited.
  • the preset track may be preset for the built-in program of the electronic device.
  • the user can watch the standard action.
  • a semi-transparent mask can be displayed on the shooting interface, wherein the mask has a hollowed-out area of interest, and an image showing the standard action is displayed in that area, i.e., a schematic diagram of the standard action is shown in the area of interest.
  • the corresponding standard action can be displayed in the form of a barrage on the shooting interface, which is not limited.
  • when the schematic diagram of the standard motion moves along the preset trajectory, the photographing interface displays the schematic diagram, and the schematic diagram can be controlled to move along the preset trajectory.
  • Step 104 Identify the human body motion in the video frame that is synchronously acquired at the time node.
  • the camera for collecting video frames may be a camera capable of collecting the user's depth information, and the human body motion in the video frame can be identified from the acquired depth information.
  • for example, the camera may be an RGB-Depth (RGBD) camera, which can acquire the depth information of the human body in the video frame while imaging, so that the human body motion in the video frame can be identified according to the depth information.
  • alternatively, the depth information of the body motion can be acquired by structured light or a time-of-flight (TOF) lens, so that the human body motion in the video frame can be identified according to the depth information, which is not limited.
  • specifically, the joints of the human body in the video frame can be identified.
  • the face information in the video frame and the position information of the face can be recognized by face recognition technology, and then the position information of each joint of the human body can be calculated according to the proportional relationships between the limbs and the height in human anatomy.
  • the position information of each joint of the human body in the video picture frame can also be determined by other algorithms, which is not limited.
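The proportional calculation can be sketched as follows. The application does not give concrete ratios, so the head-height multiples below are rough anthropometric assumptions purely for illustration:

```python
def estimate_joints(face_top_y, face_height):
    """Roughly place joint heights (image y grows downward) below a detected
    face box, using assumed limb/height proportions."""
    ratios = {  # multiples of face height below the top of the face (assumed)
        "shoulder": 1.3,
        "elbow": 2.5,
        "hip": 3.8,
        "knee": 5.5,
        "ankle": 7.3,
    }
    return {joint: face_top_y + r * face_height for joint, r in ratios.items()}
```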
  • after that, each two adjacent joints of the human body can be connected to obtain the line between them, and the human motion in the video frame is finally determined according to the actual angle between each such line and the preset reference direction.
  • the preset reference direction may be a horizontal direction or a vertical direction.
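One way to compute the actual angle between a joint-to-joint line and a horizontal reference direction is with `atan2`; this is a sketch, and folding the result into [0, 180) is an assumption, since the application does not fix an angle convention:

```python
import math

def segment_angle(joint_a, joint_b):
    """Angle in degrees between the line joining two adjacent joints and a
    horizontal reference direction, folded into [0, 180)."""
    dx = joint_b[0] - joint_a[0]
    dy = joint_b[1] - joint_a[1]
    return math.degrees(math.atan2(dy, dx)) % 180.0
```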
  • Step 105 Generate action evaluation information of the human body action according to the degree of difference between the standard action and the human body action at the same time node.
  • the action evaluation information of the human body action includes a human action score, which is used to indicate the degree of difference between the human body action and the corresponding standard action. Specifically, a higher score indicates a smaller difference between the human body action and the corresponding standard action, and a lower score indicates a greater difference.
  • whether the human body motion and the standard motion match is determined according to whether the degree of difference between them is greater than a difference threshold. Specifically, it is possible to determine the standard angle between the line connecting each two adjacent joints and the reference direction when the standard action is performed, and to compare, for each two adjacent joints, the difference between the corresponding standard angle and the actual angle.
  • if the human motion in the video frame does not match the standard motion, it indicates that the difference between the human body motion made by the user and the corresponding standard motion is large, and the score of the human motion is set to 0.
  • when the human body motion in the video frame matches the standard motion, the difference between the human body motion made by the user and the corresponding standard motion is small; at this time, for the line between each two adjacent joints, a scoring coefficient of the line can be determined according to the corresponding difference and error range.
  • the evaluation information of the line may then be generated according to its scoring coefficient and the score corresponding to the line; for example, the evaluation information of the line may be equal to the scoring coefficient multiplied by the score corresponding to the line.
  • finally, the motion evaluation information of the human body motion can be obtained by adding up the evaluation information of the lines between all adjacent joints.
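A hedged sketch of this scoring scheme with assumed numbers: the difference threshold, the per-line base score, and a linear fall-off of the scoring coefficient inside the error range are all illustrative choices, as the application does not specify them.

```python
DIFF_THRESHOLD_DEG = 30.0  # assumed: beyond this, the action is a mismatch
ERROR_RANGE_DEG = 30.0     # assumed error range for the scoring coefficient

def action_score(standard_angles, actual_angles, base_score_per_line=25.0):
    """Score one human action from the per-line angle differences."""
    diffs = [abs(s - a) for s, a in zip(standard_angles, actual_angles)]
    if any(d > DIFF_THRESHOLD_DEG for d in diffs):
        return 0.0                            # mismatch: score is set to 0
    total = 0.0
    for d in diffs:
        coeff = 1.0 - d / ERROR_RANGE_DEG     # scoring coefficient of the line
        total += coeff * base_score_per_line  # coefficient x line score
    return total                              # summed over adjacent-joint lines

# Four lines, all matching the standard angles exactly:
print(action_score([0, 90, 45, 135], [0, 90, 45, 135]))  # 100.0
```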
  • the motion evaluation information of the human body motion may further include an animation effect corresponding to the interval to which the human action score belongs. For example, if the human action score belongs to the interval [90, 100], the animation effect can be "perfect" matched with a diamond flash; for the interval [80, 90), the animation effect can be "good" matched with flowers.
  • for example, if the generated human action score is 94 points, the animation effect generated on the shooting interface is "perfect" matched with the diamond flash.
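The interval-to-effect mapping from the example can be encoded directly; the behaviour below 80 points is not specified in the text and is assumed here:

```python
def animation_effect(score):
    """Map a human action score to the animation effect of its interval."""
    if 90 <= score <= 100:
        return "perfect + diamond flash"   # interval [90, 100]
    if 80 <= score < 90:
        return "good + flowers"            # interval [80, 90)
    return "no effect"                     # below 80: assumed behaviour

print(animation_effect(94))  # perfect + diamond flash
```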
  • Step 106 Generate a target video according to audio, each video frame frame, and motion evaluation information of each human body motion.
  • specifically, the action evaluation information of the human body actions corresponding to different time nodes may be acquired, and then the target video is generated according to the audio, the acquired video frames, and the corresponding motion evaluation information.
  • for example, the motion evaluation information corresponding to the human body motion may be added to each video frame, and then the target video is generated according to the audio and the video frames to which the motion evaluation information has been added.
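Step 106 can be sketched with frames and audio modelled as plain data structures; a real implementation would render the overlays and mux the audio with a media framework, and all names below are assumptions:

```python
def generate_target_video(audio, frames, evaluations_by_node):
    """Attach each frame's action evaluation info as an overlay, then pair
    the annotated frames with the audio track."""
    annotated = []
    for frame in frames:
        info = evaluations_by_node.get(frame["time_node"])  # may be None
        annotated.append({**frame, "overlay": info})
    return {"audio": audio, "frames": annotated}

video = generate_target_video(
    "song.mp3",
    [{"time_node": 4.0}, {"time_node": 8.0}],
    {4.0: "perfect", 8.0: "good"},
)
print(video["frames"][0]["overlay"])  # perfect
```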
  • the video generating method of this embodiment obtains the selected audio and the standard actions corresponding to each time node in the audio; plays the audio and collects each video frame during the playing; when the audio is played to each time node, displays the corresponding standard action and identifies the human body motion in the video frame synchronously acquired at that time node; generates action evaluation information according to the degree of difference between the standard action and the human body action at the same time node; and generates a target video according to the audio, each video frame, and the motion evaluation information of each human body motion.
  • since the standard action is a human body action that the user needs to make, compared with the prior-art dance mode in which the user steps on arrows, the dance actions can be effectively enriched and the user experience improved.
  • in addition, generating the action evaluation information of the human body motion enables the user to know in time whether the human body actions made are standard, further improving the user experience.
  • finally, because a video is generated, the user can play back or share the video, enhancing the user's sense of participation.
  • to prevent the user from inadvertently triggering the shooting control of the electronic device, which would cause the camera to accidentally acquire images, or to prevent the camera from collecting images before the user is ready, which would result in invalid input, a preparation stage may be entered in advance. The above process will be described in detail below with reference to FIG. 2.
  • FIG. 2 is a schematic flowchart diagram of a second video generating method according to an embodiment of the present application.
  • the video generation method includes the following steps:
  • Step 201: The preparation action is displayed, and a preparation image is collected.
  • the preparation action may be displayed on a preparation interface and may be preset by a built-in program of the electronic device; the preparation action may be, for example, an action made with both hands, or another action, which is not limited herein.
  • the camera of the electronic device can capture the preparation image, wherein the preparation image includes the human body action made by the user.
  • the preparation action may be displayed in any area of the preparation interface, and the preparation action may be fixed in a preset time period, or the preparation action may be moved along a preset track, which is not limited.
  • the preset track may be preset by a built-in program of the electronic device.
  • the user can watch the preparation action.
  • a semi-transparent mask may be displayed on the preparation interface, wherein the mask has a hollowed-out region of interest, and an image indicating the preparation action is displayed in the region of interest; that is, a schematic diagram of the preparation action is shown in the region of interest.
  • the preparation action may be displayed in the form of a barrage in the preparation interface, which is not limited. Thereby, the user can view other content while watching the preparation action, thereby improving the user experience.
  • Step 202 Determine that the human body action in the preparation image matches the preparation action.
  • the human body motion in the preparation image can be identified, and then it is determined whether the human body motion in the preparation image matches the preparation motion, and when the human body motion in the preparation image is determined to match the preparation motion, the image acquisition can be started.
  • the camera for collecting the preparation image may be a camera capable of collecting depth information of the user, and the human body motion in the preparation image may be identified according to the acquired depth information.
  • for example, the camera may be a depth camera, and the depth information of the human body in the preparation image may be acquired while imaging, so that the human body motion in the preparation image can be identified according to the depth information.
  • alternatively, the depth information of the human body motion can be acquired through structured light or a time-of-flight (TOF) lens, so that the human body motion in the preparation image can be identified according to the depth information, which is not limited herein.
  • specifically, each joint of the human body in the preparation image can be identified; then, adjacent joints among the joints of the human body are connected to obtain a line between each adjacent two joints; finally, the human body motion in the image is determined according to the actual angle between the line connecting the adjacent two joints and a preset reference direction.
  • whether the human body motion matches the preparation motion can be determined according to whether the degree of difference between the human body motion and the preparation motion is greater than a difference threshold. Specifically, a standard angle between the line connecting each adjacent two joints and the reference direction when performing the preparation action may be determined, and, for each adjacent two joints, the difference between the corresponding standard angle and the actual angle is calculated. When the difference calculated for the line between each adjacent two joints is within an error range, it can be determined that the human body motion in the preparation image matches the preparation action; when the difference calculated for the line between at least one pair of adjacent joints is not within the error range, it can be determined that the human body motion in the preparation image does not match the preparation action.
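The angle-comparison matching just described can be sketched as follows. The skeleton, joint coordinates, standard angles, and tolerance below are hypothetical illustrations; only the technique (comparing the angle of each line between adjacent joints against a standard angle within an error range) is taken from the text:

```python
import math

def line_angle(joint_a, joint_b, reference=(1.0, 0.0)):
    """Angle in degrees between the line joining two adjacent joints
    and a preset reference direction (here the horizontal axis)."""
    dx, dy = joint_b[0] - joint_a[0], joint_b[1] - joint_a[1]
    ang = math.degrees(math.atan2(dy, dx) - math.atan2(reference[1], reference[0]))
    return abs(ang)

def matches(actual_joints, standard_angles, bones, tolerance=10.0):
    """True when, for every pair of adjacent joints (a 'bone'), the
    difference between the actual and the standard angle falls within
    the error range; otherwise the action does not match."""
    for bone in bones:
        a, b = bone
        actual = line_angle(actual_joints[a], actual_joints[b])
        if abs(actual - standard_angles[bone]) > tolerance:
            return False
    return True

# Hypothetical two-bone arm: shoulder-elbow and elbow-wrist.
joints = {"shoulder": (0, 0), "elbow": (1, 0.05), "wrist": (1.5, 1.0)}
bones = [("shoulder", "elbow"), ("elbow", "wrist")]
standard = {("shoulder", "elbow"): 0.0, ("elbow", "wrist"): 60.0}
print(matches(joints, standard, bones))
```

A single bone outside the tolerance is enough to reject the match, mirroring the "at least one pair of adjacent joints" condition above.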
  • the video generation method of this embodiment enters a preparation stage in advance, before image acquisition by the electronic device. Specifically, the preparation action is displayed and the preparation image is collected; image acquisition is started only after the human body action in the preparation image is determined to match the preparation action. This prevents the user from inadvertently triggering the shooting control of the electronic device, which would cause the camera to accidentally acquire images, and avoids the camera performing image acquisition before the user is ready, which would result in invalid images, thereby ensuring the validity and accuracy of subsequent image acquisition.
  • the video generation method may further include the following steps:
  • Step 301 Display motion evaluation information of each human body motion on a shooting interface for collecting each video frame.
  • there may be multiple synchronously collected video picture frames, and each video picture frame has corresponding motion evaluation information. The motion evaluation information of the human body action is added to the synchronously collected video picture frames; that is, the action evaluation information of each human body action is displayed on the shooting interface used to collect each video picture frame.
  • alternatively, the generated pieces of motion evaluation information may be filtered so that the highest-rated motion evaluation information is retained, and the highest-rated motion evaluation information is then added to at least one of the synchronously collected video picture frames, wherein the at least one video picture frame displays the human body motion corresponding to the highest-rated motion evaluation information.
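The alternative of retaining only the highest-rated evaluation reduces to a simple selection over the evaluations generated for the synchronously collected frames. The field names below are illustrative, not the apparatus's actual schema:

```python
def best_evaluation(evaluations):
    """From several pieces of motion evaluation information generated for
    synchronously collected frames, keep only the highest-rated one."""
    return max(evaluations, key=lambda e: e["score"])

evals = [{"frame": 3, "score": 72}, {"frame": 4, "score": 91}, {"frame": 5, "score": 88}]
print(best_evaluation(evals))  # the frame-4 evaluation is retained
```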
  • Step 302 When the audio playback ends, the total evaluation information is generated according to the motion evaluation information of each human body motion.
  • specifically, a total score may be generated from the human action scores included in the action evaluation information of each human body action, and the total evaluation information is generated together with the animation effect corresponding to the interval to which the total score belongs.
  • the weight corresponding to each standard action in the audio may be preset. After the motion evaluation information of each human motion is determined, the human action score of each human motion may be multiplied by the corresponding weight to obtain a product value; the total score is obtained by accumulating the product values, and the corresponding animation effect is then determined according to the interval to which the total score belongs.
  • for example, the weight corresponding to each standard action can be set to 0.01. After the motion evaluation information of each human body action is determined, the product value can be obtained by multiplying the human action score of each human body action by the corresponding weight, and the total score is obtained by accumulating the product values. If the total score obtained is 87, the interval to which it belongs is [80, 90), so the animation effect can be "good", matched with flowers.
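The weighted accumulation in this example can be sketched as follows. The helper names, the assumption of 100 standard actions (so that weights of 0.01 sum to 1), and the label for the lower intervals are all hypothetical:

```python
def total_score(scores, weights):
    """Multiply each human action score by the weight preset for its
    standard action, then accumulate the product values."""
    return sum(s * w for s, w in zip(scores, weights))

def total_effect(total):
    """Animation effect for the interval the total score falls in,
    following the example mapping in the text."""
    if 90 <= total <= 100:
        return "perfect"
    if 80 <= total < 90:
        return "good"
    return "keep practicing"  # hypothetical label for lower intervals

# With a weight of 0.01 per standard action, 100 scores averaging 87
# accumulate to a total of about 87, which falls in [80, 90).
scores = [87] * 100
weights = [0.01] * 100
print(total_effect(total_score(scores, weights)))
```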
  • Step 303: The total evaluation information is displayed on the result display interface.
  • the total evaluation information may be displayed on the result display interface, so that the user can understand whether the human body action made by the user is standard, and the user experience is improved.
  • in the video generation method of this embodiment, the action evaluation information of each human body motion is displayed on the shooting interface used to collect each video picture frame; when the audio playback ends, the total evaluation information is generated according to the action evaluation information of each human body motion; and the total evaluation information is displayed on the result display interface.
  • the result display interface further includes: a look-back control, a shooting control, and a sharing control.
  • when the electronic device detects a trigger operation of the user on the look-back control, the target video can be played, so that the user can review and correct the human body motions while watching the video, making the actions more standard the next time a video is recorded.
  • when a trigger operation on the shooting control is detected, the device may display the shooting interface and trigger steps 102-106 to regenerate the target video; that is, the user may shoot the video again by triggering the shooting control.
  • sharing the target video includes the following steps:
  • Step 401 showing a sharing interface.
  • the sharing interface includes a self-owned platform sharing control and a third-party platform sharing control.
  • the third party platform may be, for example, Instagram, Facebook, Twitter, or the like.
  • the sharing interface is displayed, so that the user can share the target video through the sharing control of the sharing interface.
  • Step 402: When a trigger operation on the own-platform sharing control is detected, the shooting control and a display control are displayed on the sharing interface.
  • that is, when the user triggers the own-platform sharing control, the sharing interface can display the shooting control and the display control.
  • the electronic device can acquire the audio in the target video and display the preparation interface, so that the user can regenerate a video based on the audio in the target video; alternatively, step 403 can be triggered.
  • Step 403 When the triggering operation for the display control is detected, the video aggregation page is displayed; the video aggregation page includes the target video and/or the video shared by the own platform.
  • the electronic device when the user clicks on the display control, the electronic device can display the video aggregation page, so that the user can share the target video or view the video shared by other users.
  • the video aggregation page can also include a shooting control so that the user can reselect the audio through the shooting control and record the video.
  • in the video generation method of this embodiment, the sharing interface is displayed; when a trigger operation on the own-platform sharing control is detected, the shooting control and the display control are displayed on the sharing interface; and when a trigger operation on the display control is detected, the video aggregation page is displayed, the video aggregation page containing the target video and/or videos that have been shared on the own platform. Thereby, the user can share the target video so that other users can watch it, enhancing the user's sense of participation.
  • the present application also proposes a video generating apparatus.
  • FIG. 5 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present disclosure.
  • the video generating apparatus 500 includes a selection module 510, an acquisition module 520, a presentation module 530, an evaluation module 540, and a generation module 550, wherein:
  • the selection module 510 is configured to acquire selected audio and standard actions corresponding to each time node in the audio.
  • the acquisition module 520 is configured to play audio and collect each video frame during the playing of the audio.
  • the display module 530 is configured to display a corresponding standard action when the audio is played to each time node, and identify a human body motion of the video frame frame acquired by the time node synchronously.
  • the display module 530 is specifically configured to: identify each joint of the human body in the video picture frame; connect adjacent joints among the joints of the human body to obtain a line between the adjacent two joints; and determine the human body motion according to the actual angle between the line between the adjacent two joints and a preset reference direction.
  • the evaluation module 540 is configured to generate motion evaluation information of the human body motion according to the degree of difference between the standard motion and the human body motion at the same time node.
  • the generating module 550 is configured to generate a target video according to audio, each video frame frame, and motion evaluation information of each human body motion.
  • the generating module 550 is specifically configured to: add the motion evaluation information of the corresponding human body motion to each video picture frame according to the human body motion recognized in the frame; and generate the target video according to the audio and the video picture frames to which the motion evaluation information has been added.
  • the video generating apparatus 500 may further include:
  • the display determining module 560 is configured to display a preparation action before the audio is played and synchronously capture the video picture, and collect the preparation image to determine that the human body action in the preparation image matches the preparation action.
  • the display generation module 570 is configured to: after the action evaluation information of the human body motion is generated according to the degree of difference between the standard action and the human body motion at the same time node, display the action evaluation information of each human body motion on the shooting interface used to collect each video picture frame; generate the total evaluation information according to the action evaluation information of each human body motion when the audio playback ends; and display the total evaluation information on the result display interface.
  • the interface display module 580 is configured to display a song selection interface when detecting an operation for the shooting control before acquiring the selected audio.
  • the result display interface further includes: a look-back control, a shooting control, and a sharing control; the display generation module 570 is further configured to: play the target video when a trigger operation on the look-back control is detected; display the shooting interface to regenerate the target video when a trigger operation on the shooting control is detected; and share the target video when a trigger operation on the sharing control is detected.
  • the display generation module 570 is specifically configured to display a sharing interface; wherein the sharing interface includes a self-owned platform sharing control and a third-party platform sharing control; when detecting a trigger operation for sharing the control of the own platform Displaying the shooting control and display control in the sharing interface; displaying the video aggregation page when detecting the triggering operation for the display control; the video aggregation page includes the target video and/or the video shared by the own platform.
  • the video generating apparatus of this embodiment acquires the selected audio and the standard actions corresponding to each time node in the audio; plays the audio, and collects each video picture frame during the playing of the audio; displays the corresponding standard action when the audio is played to each time node, and identifies the human body motion in the video picture frame synchronously acquired at that time node; generates the action evaluation information of the human body motion according to the degree of difference between the standard action and the human body motion at the same time node; and generates the target video according to the audio, each video picture frame, and the action evaluation information of each human body motion.
  • since the standard action is a human body action that the user needs to make, compared with the prior-art dance mode in which the user steps on arrows, the dance actions can be effectively enriched and the user experience improved.
  • in addition, generating the action evaluation information of the human body motion enables the user to know in time whether the human body actions made are standard, further improving the user experience.
  • finally, because a video is generated, the user can play back or share the video, enhancing the user's sense of participation.
  • the embodiment of the present application further provides an electronic device, where the electronic device includes the device described in any of the foregoing embodiments.
  • FIG. 7 is a schematic structural diagram of an embodiment of an electronic device according to the present application, which may implement the process of the embodiment shown in FIG. 1-6 of the present application.
  • the electronic device may include: a housing 71, a processor 72, a memory 73, a circuit board 74, and a power supply circuit 75.
  • the circuit board 74 is disposed inside the space enclosed by the housing 71; the processor 72 and the memory 73 are disposed on the circuit board 74; and the power supply circuit 75 is used to supply power to each circuit or component of the electronic device.
  • the memory 73 is used to store executable program code; and the processor 72 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 73, so as to perform the video generation method of any of the foregoing embodiments.
  • the electronic device exists in a variety of forms including, but not limited to:
  • Mobile communication devices: these devices are characterized by mobile communication functions and are mainly aimed at providing voice and data communication.
  • Such terminals include: smart phones (such as the iPhone), multimedia phones, functional phones, and low-end phones.
  • Ultra-mobile personal computer devices: this type of device belongs to the category of personal computers, has computing and processing functions, and generally also has mobile Internet access capability.
  • Such terminals include: PDA, MID, and UMPC devices, such as the iPad.
  • Portable entertainment devices: these devices can display and play multimedia content. Such devices include: audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
  • Server: a device that provides computing services. The server consists of a processor, a hard disk, a memory, a system bus, and the like; it is similar to a general-purpose computer in architecture, but, because highly reliable services need to be provided, it has higher requirements in terms of processing capability, stability, reliability, security, scalability, and manageability.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
  • the present application further provides a non-transitory computer readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the video generation method described in the foregoing embodiments.
  • the present application also provides a computer program product that, when executed by a processor, executes a video generation method as described in the foregoing embodiments.
  • the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
  • thus, features defined with "first" or "second" may explicitly or implicitly include at least one of the features.
  • in the description of the present application, "a plurality" means at least two, such as two or three, unless specifically defined otherwise.
  • a "computer-readable medium” can be any apparatus that can contain, store, communicate, propagate, or transport a program for use in an instruction execution system, apparatus, or device, or in conjunction with the instruction execution system, apparatus, or device.
  • computer readable media include the following: electrical connections (electronic devices) having one or more wires, portable computer disk cartridges (magnetic devices), random access memory (RAM), Read only memory (ROM), erasable editable read only memory (EPROM or flash memory), fiber optic devices, and portable compact disk read only memory (CDROM).
  • the computer readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and can then be stored in a computer memory.
  • portions of the application can be implemented in hardware, software, firmware, or a combination thereof.
  • for example, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • if, as in another embodiment, they are implemented in hardware, they can be implemented by any one, or a combination, of the following techniques well known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
  • each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules, if implemented in the form of software functional modules and sold or used as separate products, may also be stored in a computer readable storage medium.
  • the above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. While the embodiments of the present application have been shown and described above, it should be understood that the above-described embodiments are illustrative and are not to be construed as limiting the scope of the present application; within the scope of the application, the embodiments are subject to variations, modifications, substitutions and alterations by those of ordinary skill in the art.

Abstract

Provided are a video generation method and device, and an electronic apparatus. The method comprises: acquiring a selected audio file and standard movements corresponding to respective time points in the audio file; playing the audio file, and collecting respective video picture frames in the process of playing the audio file; when the respective time points in the audio file are reached, displaying the corresponding standard movements, and recognizing body movements in the video picture frames synchronously collected at the time points; generating, according to a degree of difference between the standard movement and the body movement at the same time point, movement evaluation information associated with the body movement; and generating a target video according to the audio file, the respective video picture frames and the movement evaluation information associated with the respective body movements. Since the standard movements are body movements to be performed by a user, the invention can effectively enrich dancing movements compared to the prior art, which requires a user to step on corresponding arrows to dance. Moreover, generation of movement evaluation information enables a user to timely ascertain whether a body movement performed by the user meets the standard, thereby improving user experience.

Description

Video generation method, device and electronic device
Cross-reference to related applications
The present application claims priority to Chinese Patent Application No. 201711185439.6, filed by 乐蜜有限公司 on November 23, 2017, and entitled "Video Generation Method, Apparatus, and Electronic Apparatus".
Technical field
The present application relates to the field of mobile terminal technologies, and in particular, to a video generation method, apparatus, and electronic device.
Background art
In a somatosensory dance game, human-computer interaction is carried out through an Internet operating platform. The user makes corresponding body movements according to the prompts of the somatosensory dance device, so that the user can achieve a fitness effect while dancing and enjoy the experience of somatosensory interaction.
In the prior art, somatosensory dance games are mainly run on fixed devices, such as somatosensory dance machines and computers, which are poorly portable. In addition, the user's body movement is judged by determining whether the direction of the arrow the user steps on is correct, so the manner of dancing is relatively simple. Moreover, because the game process cannot be recorded while the user is playing, the user's sense of participation is low.
Summary of the invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, the first object of the present application is to provide a video generation method. Since the standard action is a human body action that the user needs to make, compared with the prior-art dance mode in which the user steps on arrows, the dance actions can be effectively enriched and the user experience improved. In addition, the action evaluation information of the human body motion, generated according to the degree of difference between the standard action and the human body motion at the same time node, enables the user to know in time whether the human body actions made are standard, further improving the user experience. Finally, a video is generated when the audio playback ends, so that the user can play back or share the video, enhancing the user's sense of participation. This solves the prior-art technical problems that somatosensory dance games are mainly run on fixed devices such as somatosensory dance machines and computers and are poorly portable, that judging the user's body movement by the direction of the arrow stepped on makes the manner of dancing relatively simple, and that the user's sense of participation is low because the game process cannot be recorded.
A second object of the present application is to propose a video generating apparatus.
A third object of the present application is to propose an electronic device.
A fourth object of the present application is to propose a non-transitory computer readable storage medium.
A fifth object of the present application is to propose a computer program product.
为达上述目的,本申请第一方面实施例提出了一种视频生成方法,包括:To achieve the above objective, the first aspect of the present application provides a video generating method, including:
获取选定的音频,以及所述音频中各时间节点对应的标准动作;Obtaining selected audio, and standard actions corresponding to each time node in the audio;
播放所述音频,并在播放所述音频过程中采集各视频画面帧;Playing the audio, and collecting each video frame during the playing of the audio;
在所述音频播放至每一个时间节点时,展示对应的标准动作,并识别所述时间节点同步采集的视频画面帧中的人体动作;Displaying a corresponding standard action when the audio is played to each time node, and identifying a human body motion in the video frame frame acquired by the time node synchronously;
根据同一时间节点的所述标准动作与所述人体动作之间的差异程度,生成所述人体动作的动作评价信息;Generating action evaluation information of the human body motion according to a degree of difference between the standard action of the node at the same time and the human body motion;
根据所述音频、各视频画面帧和各人体动作的动作评价信息,生成目标视频。A target video is generated based on the audio, each video frame frame, and motion evaluation information of each human body motion.
可选地,作为第一方面的第一种可能的实现方式,所述播放所述音频,并同步采集视频画面之前,还包括:Optionally, as the first possible implementation manner of the first aspect, before the playing the audio and simultaneously acquiring the video image, the method further includes:
展示准备动作,并采集准备图像;Display preparation actions and collect preparation images;
确定所述准备图像中的人体动作与所述准备动作匹配。It is determined that the human body motion in the preparation image matches the preparation motion.
可选地,作为第一方面的第二种可能的实现方式,所述根据所述音频、各视频画面帧和各人体动作的动作评价信息,生成目标视频,包括:Optionally, as a second possible implementation manner of the first aspect, the generating the target video according to the motion evaluation information of the audio, each video frame, and each human motion includes:
根据各视频画面帧所识别出的人体动作,在各视频画面帧中,添加相应人体动作的动作评价信息;Adding action evaluation information corresponding to the human body motion in each video frame frame according to the human body motion recognized by each video frame frame;
根据所述音频和添加所述动作评价信息后的视频画面帧,生成所述目标视频。The target video is generated based on the audio and a video frame frame after adding the motion evaluation information.
可选地,作为第一方面的第三种可能的实现方式,所述根据同一时间节点的所述标准动作与所述人体动作之间的差异程度,生成所述人体动作的动作评价信息之后,还包括:Optionally, as a third possible implementation manner of the first aspect, after the action evaluation information of the human motion is generated according to the difference between the standard action and the human motion of the same time node, Also includes:
在用于采集各视频画面帧的拍摄界面上,展示每一个人体动作的动作评价信息;Displaying motion evaluation information of each human body motion on a shooting interface for collecting each video frame;
当所述音频播放结束时,根据每一个人体动作的动作评价信息,生成总评价信息;When the audio playback ends, the overall evaluation information is generated according to the motion evaluation information of each human body motion;
在结果展示界面,展示所述总评价信息。In the result display interface, the total evaluation information is displayed.
可选地,作为第一方面的第四种可能的实现方式,所述结果展示界面还包括:回看控件、拍摄控件和分享控件;Optionally, as a fourth possible implementation manner of the first aspect, the result display interface further includes: reviewing a control, a shooting control, and a sharing control;
当探测到针对所述回看控件的触发操作时,播放所述目标视频;Playing the target video when a triggering operation for the lookback control is detected;
当探测到针对所述拍摄控件的触发操作时,展示所述拍摄界面,以重新生成所述目标视频;When the triggering operation for the shooting control is detected, the shooting interface is displayed to regenerate the target video;
当探测到针对所述分享控件的触发操作时,对所述目标视频进行分享。The target video is shared when a triggering operation for the sharing control is detected.
可选地,作为第一方面的第五种可能的实现方式,所述对所述目标视频进行分享,包括:Optionally, as the fifth possible implementation manner of the first aspect, the sharing, by the target video, includes:
展示分享界面；其中，所述分享界面包括自有平台分享控件和第三方平台分享控件；Displaying a sharing interface, where the sharing interface includes an own-platform sharing control and a third-party platform sharing control;
当探测到针对所述自有平台分享控件的触发操作时,在所述分享界面展示所述拍摄控件和展示控件;Displaying the shooting control and the display control on the sharing interface when detecting a triggering operation for the own platform sharing control;
当探测到针对所述展示控件的触发操作时,展示视频聚合页面;所述视频聚合页面包含所述目标视频和/或在自有平台已分享的视频。When a triggering operation for the display control is detected, a video aggregation page is displayed; the video aggregation page includes the target video and/or a video that has been shared on the own platform.
可选地,作为第一方面的第六种可能的实现方式,所述获取选定的音频之前,还包括:Optionally, as the sixth possible implementation manner of the first aspect, before the acquiring the selected audio, the method further includes:
当探测到针对拍摄控件的操作时,展示歌曲选择界面。The song selection interface is displayed when an operation for the shooting control is detected.
可选地，作为第一方面的第七种可能的实现方式，所述识别所述时间节点同步采集的视频画面帧的人体动作，包括：Optionally, as a seventh possible implementation manner of the first aspect, the recognizing of the human body motion in the video picture frame synchronously collected at the time node includes:
识别所述视频画面帧中,人体的各关节;Identifying the joints of the human body in the frame of the video picture;
连接人体各关节中相邻的两关节,得到相邻两关节之间的连线;Connecting two adjacent joints in each joint of the human body to obtain a connection between two adjacent joints;
根据相邻两关节之间的连线与预设参考方向之间的实际夹角,确定人体动作。The human body motion is determined according to the actual angle between the connection between the adjacent two joints and the preset reference direction.
本申请实施例的视频生成方法，通过获取选定的音频，以及音频中各时间节点对应的标准动作；播放音频，并在播放音频过程中采集各视频画面帧；在音频播放至每一个时间节点时，展示对应的标准动作，并识别时间节点同步采集的视频画面帧中的人体动作；根据同一时间节点的标准动作与人体动作之间的差异程度，生成人体动作的动作评价信息；根据音频、各视频画面帧和各人体动作的动作评价信息，生成目标视频。本实施例中，由于标准动作为用户需要做出的人体动作，相比于现有技术中用户脚踩箭头的跳舞方式，能够有效丰富跳舞动作，提升用户体验。此外，根据同一时间节点的标准动作与人体动作之间的差异程度，生成人体动作的动作评价信息，能够使得用户及时了解自己做出的人体动作是否标准，进一步提升用户的使用体验。最后，通过在音频播放结束时，生成视频，由此，用户可以回放或者分享视频，提升用户的参与感，用于解决现有技术中体感跳舞游戏主要应用于固定设备上，例如体感跳舞机、电脑等，便携性较差。此外，对用户身体动作的判断，是通过确定用户脚踩的箭头方向正确与否，跳舞的方式较为单一。并且，用户在玩游戏时，由于无法记录游戏过程，导致用户的参与感较低的技术问题。In the video generation method of the embodiments of the present application, selected audio and the standard action corresponding to each time node in the audio are acquired; the audio is played, and video picture frames are collected during playback; when the audio plays to each time node, the corresponding standard action is displayed, and the human body motion in the video picture frame synchronously collected at that time node is recognized; motion evaluation information of the human body motion is generated according to the degree of difference between the standard action and the human body motion at the same time node; and a target video is generated according to the audio, the video picture frames, and the motion evaluation information of each human body motion. In this embodiment, since the standard action is a human body action that the user needs to make, the dance actions are effectively enriched and the user experience is improved compared with the prior-art dance mode in which the user steps on arrows. In addition, generating motion evaluation information according to the degree of difference between the standard action and the human body motion at the same time node lets the user know in time whether the action they made is standard, further improving the user experience. Finally, a video is generated when the audio playback ends, so the user can play back or share the video, enhancing the user's sense of participation. This solves the technical problems that prior-art somatosensory dance games are mainly applied on fixed devices, such as somatosensory dance machines and computers, with poor portability; that judging the user's body motion only by whether the direction of the arrow stepped on is correct makes the dance mode rather monotonous; and that the inability to record the game process gives the user a low sense of participation.
为达上述目的,本申请第二方面实施例提出了一种视频生成装置,包括:In order to achieve the above objective, the second aspect of the present application provides a video generating apparatus, including:
选择模块,用于获取选定的音频,以及所述音频中各时间节点对应的标准动作;a selection module for acquiring selected audio and standard actions corresponding to each time node in the audio;
采集模块,用于播放所述音频,并在播放所述音频过程中采集各视频画面帧;An acquisition module, configured to play the audio, and collect each video frame during the playing of the audio;
展示模块，用于在所述音频播放至每一个时间节点时，展示对应的标准动作，并识别所述时间节点同步采集的视频画面帧的人体动作；A display module, configured to display the corresponding standard action when the audio plays to each time node, and to recognize the human body motion in the video picture frame synchronously collected at the time node;
评价模块,用于根据同一时间节点的所述标准动作与所述人体动作之间的差异程度,生成所述人体动作的动作评价信息;An evaluation module, configured to generate action evaluation information of the human body action according to a degree of difference between the standard action and the human body action at the same time node;
生成模块,用于根据所述音频、各视频画面帧和各人体动作的动作评价信息,生成目标视频。And a generating module, configured to generate a target video according to the audio, each video frame frame, and motion evaluation information of each human body motion.
可选地,作为第二方面的第一种可能的实现方式,所述装置还包括:Optionally, as a first possible implementation manner of the second aspect, the device further includes:
展示确定模块，用于在所述播放所述音频，并同步采集视频画面之前，展示准备动作，并采集准备图像，确定所述准备图像中的人体动作与所述准备动作匹配。A display determining module, configured to: before the playing of the audio and the synchronous collection of video pictures, display a preparation action, collect a preparation image, and determine that the human body motion in the preparation image matches the preparation action.
可选地,作为第二方面的第二种可能的实现方式,所述生成模块,具体用于:Optionally, as a second possible implementation manner of the second aspect, the generating module is specifically configured to:
根据各视频画面帧所识别出的人体动作，在各视频画面帧中，添加相应人体动作的动作评价信息；According to the human body motion recognized in each video picture frame, adding the motion evaluation information of the corresponding human body motion to that video picture frame;
根据所述音频和添加所述动作评价信息后的视频画面帧，生成所述目标视频。The target video is generated based on the audio and the video picture frames to which the motion evaluation information has been added.
可选地,作为第二方面的第三种可能的实现方式,所述装置还包括:Optionally, as a third possible implementation manner of the second aspect, the device further includes:
展示生成模块，用于在所述根据同一时间节点的所述标准动作与所述人体动作之间的差异程度，生成所述人体动作的动作评价信息之后，在用于采集各视频画面帧的拍摄界面上，展示每一个人体动作的动作评价信息；当所述音频播放结束时，根据每一个人体动作的动作评价信息，生成总评价信息；在结果展示界面，展示所述总评价信息。A display generation module, configured to: after the motion evaluation information of the human body motion is generated according to the degree of difference between the standard action and the human body motion at the same time node, display the motion evaluation information of each human body motion on the shooting interface used to collect the video picture frames; when the audio playback ends, generate overall evaluation information according to the motion evaluation information of each human body motion; and display the overall evaluation information on the result display interface.
可选地,作为第二方面的第四种可能的实现方式,所述结果展示界面还包括:回看控件、拍摄控件和分享控件;所述展示生成模块,还用于:Optionally, as a fourth possible implementation manner of the second aspect, the result display interface further includes: a look back control, a shooting control, and a sharing control; and the display generating module is further configured to:
当探测到针对所述回看控件的触发操作时,播放所述目标视频;Playing the target video when a triggering operation for the lookback control is detected;
当探测到针对所述拍摄控件的触发操作时,展示所述拍摄界面,以重新生成所述目标视频;When the triggering operation for the shooting control is detected, the shooting interface is displayed to regenerate the target video;
当探测到针对所述分享控件的触发操作时,对所述目标视频进行分享。The target video is shared when a triggering operation for the sharing control is detected.
可选地,作为第二方面的第五种可能的实现方式,所述展示生成模块,具体用于:Optionally, as a fifth possible implementation manner of the second aspect, the display generating module is specifically configured to:
展示分享界面；其中，所述分享界面包括自有平台分享控件和第三方平台分享控件；Displaying a sharing interface, where the sharing interface includes an own-platform sharing control and a third-party platform sharing control;
当探测到针对所述自有平台分享控件的触发操作时,在所述分享界面展示所述拍摄控件和展示控件;Displaying the shooting control and the display control on the sharing interface when detecting a triggering operation for the own platform sharing control;
当探测到针对所述展示控件的触发操作时,展示视频聚合页面;所述视频聚合页面包含所述目标视频和/或在自有平台已分享的视频。When a triggering operation for the display control is detected, a video aggregation page is displayed; the video aggregation page includes the target video and/or a video that has been shared on the own platform.
可选地,作为第二方面的第六种可能的实现方式,所述装置还包括:Optionally, as a sixth possible implementation manner of the second aspect, the device further includes:
界面展示模块,用于在所述获取选定的音频之前,当探测到针对拍摄控件的操作时,展示歌曲选择界面。The interface display module is configured to display a song selection interface when detecting an operation for the shooting control before the obtaining the selected audio.
可选地,作为第二方面的第七种可能的实现方式,所述展示模块,具体用于:Optionally, as a seventh possible implementation manner of the second aspect, the display module is specifically configured to:
识别所述视频画面帧中,人体的各关节;Identifying the joints of the human body in the frame of the video picture;
连接人体各关节中相邻的两关节,得到相邻两关节之间的连线;Connecting two adjacent joints in each joint of the human body to obtain a connection between two adjacent joints;
根据相邻两关节之间的连线与预设参考方向之间的实际夹角,确定人体动作。The human body motion is determined according to the actual angle between the connection between the adjacent two joints and the preset reference direction.
本申请实施例的视频生成装置，通过获取选定的音频，以及音频中各时间节点对应的标准动作；播放音频，并在播放音频过程中采集各视频画面帧；在音频播放至每一个时间节点时，展示对应的标准动作，并识别时间节点同步采集的视频画面帧中的人体动作；根据同一时间节点的标准动作与人体动作之间的差异程度，生成人体动作的动作评价信息；根据音频、各视频画面帧和各人体动作的动作评价信息，生成目标视频。本实施例中，由于标准动作为用户需要做出的人体动作，相比于现有技术中用户脚踩箭头的跳舞方式，能够有效丰富跳舞动作，提升用户体验。此外，根据同一时间节点的标准动作与人体动作之间的差异程度，生成人体动作的动作评价信息，能够使得用户及时了解自己做出的人体动作是否标准，进一步提升用户的使用体验。最后，通过在音频播放结束时，生成视频，由此，用户可以回放或者分享视频，提升用户的参与感，用于解决现有体感跳舞游戏主要应用于固定设备上，例如体感跳舞机、电脑等，便携性较差。此外，对用户身体动作的判断，是通过确定用户脚踩的箭头方向正确与否，跳舞的方式较为单一。并且，用户在玩游戏时，由于无法记录游戏过程，导致用户的参与感较低的技术问题。In the video generating apparatus of the embodiments of the present application, selected audio and the standard action corresponding to each time node in the audio are acquired; the audio is played, and video picture frames are collected during playback; when the audio plays to each time node, the corresponding standard action is displayed, and the human body motion in the video picture frame synchronously collected at that time node is recognized; motion evaluation information of the human body motion is generated according to the degree of difference between the standard action and the human body motion at the same time node; and a target video is generated according to the audio, the video picture frames, and the motion evaluation information of each human body motion. In this embodiment, since the standard action is a human body action that the user needs to make, the dance actions are effectively enriched and the user experience is improved compared with the prior-art dance mode in which the user steps on arrows. In addition, generating motion evaluation information according to the degree of difference between the standard action and the human body motion at the same time node lets the user know in time whether the action they made is standard, further improving the user experience. Finally, a video is generated when the audio playback ends, so the user can play back or share the video, enhancing the user's sense of participation. This solves the technical problems that existing somatosensory dance games are mainly applied on fixed devices, such as somatosensory dance machines and computers, with poor portability; that judging the user's body motion only by whether the direction of the arrow stepped on is correct makes the dance mode rather monotonous; and that the inability to record the game process gives the user a low sense of participation.
为达上述目的，本申请第三方面实施例提出了一种电子设备，包括：壳体、处理器、存储器、电路板和电源电路，其中，电路板安置在壳体围成的空间内部，处理器和存储器设置在电路板上；电源电路，用于为上述电子设备的各个电路或器件供电；存储器用于存储可执行程序代码；处理器通过读取存储器中存储的可执行程序代码来运行与可执行程序代码对应的程序，用于执行本申请第一方面实施例所述的视频生成方法。To achieve the above objective, an embodiment of the third aspect of the present application provides an electronic device, including: a housing, a processor, a memory, a circuit board, and a power supply circuit, where the circuit board is disposed inside the space enclosed by the housing, and the processor and the memory are disposed on the circuit board; the power supply circuit is configured to supply power to each circuit or component of the electronic device; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the video generation method described in the embodiments of the first aspect of the present application.
为达上述目的，本申请第四方面实施例提出了一种非临时性计算机可读存储介质，其上存储有计算机程序，其特征在于，该程序被处理器执行时实现如本申请第一方面实施例所述的视频生成方法。To achieve the above objective, an embodiment of the fourth aspect of the present application provides a non-transitory computer-readable storage medium having a computer program stored thereon, where, when the program is executed by a processor, the video generation method described in the embodiments of the first aspect of the present application is implemented.
为达上述目的，本申请第五方面实施例提出了一种计算机程序产品，当所述计算机程序产品中的指令由处理器执行时，执行如本申请第一方面实施例所述的视频生成方法。To achieve the above objective, an embodiment of the fifth aspect of the present application provides a computer program product; when the instructions in the computer program product are executed by a processor, the video generation method described in the embodiments of the first aspect of the present application is performed.
本申请附加的方面和优点将在下面的描述中部分给出，部分将从下面的描述中变得明显，或通过本申请的实践了解到。Additional aspects and advantages of the present application will be set forth in part in the following description, will in part become apparent from the following description, or will be learned through practice of the present application.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图是本申请的一些实施例,对于本领域普通 技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the embodiments will be briefly described below. Obviously, the drawings in the following description are some embodiments of the present application. Those skilled in the art can also obtain other drawings based on these drawings without paying any creative work.
图1为本申请实施例所提供的第一种视频生成方法的流程示意图;FIG. 1 is a schematic flowchart diagram of a first video generating method according to an embodiment of the present application;
图2为本申请实施例所提供的第二种视频生成方法的流程示意图;2 is a schematic flowchart of a second video generating method according to an embodiment of the present application;
图3为本申请实施例所提供的第三种视频生成方法的流程示意图;FIG. 3 is a schematic flowchart diagram of a third video generating method according to an embodiment of the present application;
图4为本申请实施例所提供的第四种视频生成方法的流程示意图;FIG. 4 is a schematic flowchart diagram of a fourth video generating method according to an embodiment of the present application;
图5为本申请实施例提供的一种视频生成装置的结构示意图;FIG. 5 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present disclosure;
图6为本申请实施例提供的另一种视频生成装置的结构示意图;FIG. 6 is a schematic structural diagram of another video generating apparatus according to an embodiment of the present disclosure;
图7为本申请电子设备一个实施例的结构示意图。FIG. 7 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
具体实施方式DETAILED DESCRIPTION
下面详细描述本申请的实施例,所述实施例的示例在附图中示出,其中自始至终相同或类似的标号表示相同或类似的元件或具有相同或类似功能的元件。下面通过参考附图描述的实施例是示例性的,旨在用于解释本申请,而不能理解为对本申请的限制。The embodiments of the present application are described in detail below, and the examples of the embodiments are illustrated in the drawings, wherein the same or similar reference numerals are used to refer to the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the accompanying drawings are intended to be illustrative, and are not to be construed as limiting.
针对现有体感跳舞游戏主要应用于固定设备上，例如体感跳舞机、电脑等，便携性较差。此外，对用户身体动作的判断，是通过确定用户脚踩的箭头方向正确与否，跳舞的方式较为单一。并且，用户在玩游戏时，由于无法记录游戏过程，导致用户的参与感较低的技术问题，本申请实施例中，通过获取选定的音频，以及音频中各时间节点对应的标准动作；播放音频，并在播放音频过程中采集各视频画面帧；在音频播放至每一个时间节点时，展示对应的标准动作，并识别时间节点同步采集的视频画面帧中的人体动作；根据同一时间节点的标准动作与人体动作之间的差异程度，生成人体动作的动作评价信息；根据音频、各视频画面帧和各人体动作的动作评价信息，生成目标视频。本实施例中，由于标准动作为用户需要做出的人体动作，相比于现有技术中用户脚踩箭头的跳舞方式，能够有效丰富跳舞动作，提升用户体验。此外，根据同一时间节点的标准动作与人体动作之间的差异程度，生成人体动作的动作评价信息，能够使得用户及时了解自己做出的人体动作是否标准，进一步提升用户的使用体验。最后，通过在音频播放结束时，生成视频，由此，用户可以回放或者分享视频，提升用户的参与感。Existing somatosensory dance games are mainly applied on fixed devices, such as somatosensory dance machines and computers, and are poorly portable. In addition, the user's body motion is judged only by whether the direction of the arrow the user steps on is correct, so the dance mode is rather monotonous. Moreover, because the game process cannot be recorded while the user plays, the user's sense of participation is low. To address these technical problems, in the embodiments of the present application, selected audio and the standard action corresponding to each time node in the audio are acquired; the audio is played, and video picture frames are collected during playback; when the audio plays to each time node, the corresponding standard action is displayed, and the human body motion in the video picture frame synchronously collected at that time node is recognized; motion evaluation information of the human body motion is generated according to the degree of difference between the standard action and the human body motion at the same time node; and a target video is generated according to the audio, the video picture frames, and the motion evaluation information of each human body motion. In this embodiment, since the standard action is a human body action that the user needs to make, the dance actions are effectively enriched and the user experience is improved compared with the prior-art dance mode in which the user steps on arrows. In addition, generating motion evaluation information according to the degree of difference between the standard action and the human body motion at the same time node lets the user know in time whether the action they made is standard, further improving the user experience. Finally, a video is generated when the audio playback ends, so the user can play back or share the video, enhancing the user's sense of participation.
下面参考附图描述本申请实施例的视频生成方法、装置和电子设备。The video generation method, apparatus, and electronic device of the embodiments of the present application are described below with reference to the accompanying drawings.
图1为本申请实施例所提供的第一种视频生成方法的流程示意图。该视频生成方法可以应用于电子设备的应用程序中,其中,电子设备例如为个人电脑(Personal Computer,PC),云端设备或者移动设备,移动设备例如智能手机,或者平板电脑等。FIG. 1 is a schematic flowchart diagram of a first video generating method according to an embodiment of the present application. The video generation method can be applied to an application of an electronic device, such as a personal computer (PC), a cloud device or a mobile device, a mobile device such as a smart phone, or a tablet computer.
如图1所示,该视频生成方法包括以下步骤:As shown in FIG. 1, the video generation method includes the following steps:
步骤101,获取选定的音频,以及音频中各时间节点对应的标准动作。Step 101: Acquire selected audio, and standard actions corresponding to each time node in the audio.
作为一种可能的实现方式，电子设备的应用程序上可以设置一个音频选取的触发条件，例如，触发条件可以为一个音频选取控件，用户可以通过该音频选取控件触发选取音频。例如，当用户触发该音频选取控件时，可以调用歌曲选择界面，而后用户可以从歌曲选择界面任意选取一个音频，作为自身选定的音频。当用户选定音频后，应用程序可以获取用户选定的音频。As a possible implementation manner, a trigger condition for audio selection may be set in the application of the electronic device. For example, the trigger condition may be an audio selection control, through which the user can trigger audio selection. For example, when the user triggers the audio selection control, a song selection interface may be invoked, and the user may then select any audio from the song selection interface as the selected audio. After the user selects the audio, the application can acquire the audio selected by the user.
作为另一种可能的实现方式，电子设备的应用程序上可以设置一个拍摄控件，当应用程序探测到用户针对该拍摄控件的操作时，例如，当用户点击该拍摄控件时，该应用程序的界面可以自动展示歌曲选择界面，而后用户可以根据自身需求，从歌曲选择界面选取一个音频，作为自身选定的音频。当用户选定音频后，应用程序可以获取用户选定的音频。As another possible implementation manner, a shooting control may be set in the application of the electronic device. When the application detects the user's operation on the shooting control (for example, when the user taps the shooting control), the application interface may automatically display the song selection interface, and the user may then select an audio from the song selection interface according to their own needs as the selected audio. After the user selects the audio, the application can acquire the audio selected by the user.
本实施例中，歌曲选择界面中的音频，可以预先导入对应的标准动作，具体地，音频中每个时间节点均具有对应的标准动作，因此，在应用程序获取选定的音频后，该应用程序可以从该音频中获取各时间节点对应的标准动作。In this embodiment, corresponding standard actions may be imported in advance for the audio in the song selection interface. Specifically, each time node in the audio has a corresponding standard action; therefore, after the application acquires the selected audio, it can obtain from that audio the standard action corresponding to each time node.
步骤102,播放音频,并在播放音频过程中采集各视频画面帧。In step 102, the audio is played, and each video frame is collected during the playing of the audio.
可选地，在拍摄界面，当用户选定音频后，电子设备可以根据用户的操作对该音频进行播放，例如，当电子设备监测到用户点击该音频时，电子设备可以播放该音频，同时打开摄像头，采集各视频画面帧。Optionally, on the shooting interface, after the user selects the audio, the electronic device may play the audio according to the user's operation. For example, when the electronic device detects that the user taps the audio, the electronic device may play the audio and at the same time turn on the camera to collect the video picture frames.
步骤103,在音频播放至每一个时间节点时,展示对应的标准动作。In step 103, when the audio is played to each time node, the corresponding standard action is displayed.
由于用户从看到标准动作到做出人体动作，大脑需要反应一段时间，因此，本申请实施例中，为了便于用户及时做出人体动作，可以在音频播放至每一个时间节点之前，提前预设的提前时长，展示对应的标准动作。其中，预设的提前时长可以由用户根据自身需求进行设置，或者，预设的提前时长可以由电子设备的内置程序预先设定，对此不作限制。应当理解的是，预设的提前时长不应设置的过长，例如预设的提前时长可以为0.2s。Because the user's brain needs some reaction time between seeing the standard action and making the corresponding human body motion, in the embodiments of the present application, to help the user act in time, the corresponding standard action may be displayed a preset advance duration before the audio plays to each time node. The preset advance duration may be set by the user according to their own needs, or may be preset by a built-in program of the electronic device, which is not limited herein. It should be understood that the preset advance duration should not be set too long; for example, it may be 0.2 s.
具体地，可以针对每一个时间节点，将时间节点与提前时长作差，得到差值，而后将差值作为起始时刻，进而可以从起始时刻开始，展示标准动作的示意图。Specifically, for each time node, the advance duration may be subtracted from the time node to obtain a difference, and the difference is taken as the start moment; the schematic diagram of the standard action can then be displayed starting from that moment.
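The subtraction above can be sketched as follows. This is an illustrative example, not code from the application; the 200 ms default advance duration and the millisecond units are assumptions.

```python
def cue_start_times_ms(time_nodes_ms, advance_ms=200):
    """Show each standard-action cue `advance_ms` earlier than its time node,
    clamping to the start of the audio for very early nodes."""
    return [max(node - advance_ms, 0) for node in time_nodes_ms]

# Time nodes at 1.0 s, 2.5 s and 4.0 s yield cue moments 0.2 s earlier.
print(cue_start_times_ms([1000, 2500, 4000]))  # [800, 2300, 3800]
```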
作为一种可能的实现方式，可以在拍摄界面的任意区域展示标准动作的示意图，该标准动作的示意图可以是固定不动的，或者，该标准动作的示意图可以沿预设轨迹移动，对此不作限制。其中，预设轨迹可以为电子设备的内置程序预先设置的。As a possible implementation manner, the schematic diagram of the standard action may be displayed in any area of the shooting interface; the schematic diagram may stay fixed, or it may move along a preset trajectory, which is not limited herein. The preset trajectory may be preset by a built-in program of the electronic device.
作为另一种可能的实现方式，为了不影响用户查看电子设备屏幕上内容的同时，用户又能观看标准动作，本实施例中，可以在拍摄界面，展示半透明蒙版，其中，蒙版具有镂空的关注区，关注区内展示有用于示意标准动作的图像，即在关注区内展示标准动作的示意图。或者，可以在拍摄界面以弹幕的形式展示对应的标准动作，对此不作限制。As another possible implementation manner, so that the user can watch the standard action without it interfering with the content on the screen of the electronic device, in this embodiment a semi-transparent mask may be displayed on the shooting interface, where the mask has a hollowed-out attention area in which an image illustrating the standard action is displayed; that is, the schematic diagram of the standard action is shown in the attention area. Alternatively, the corresponding standard action may be displayed on the shooting interface in the form of a bullet comment, which is not limited herein.
当标准动作的示意图沿预设轨迹移动时,在拍摄界面展示标准动作的示意图的同时, 可以控制该标准动作的示意图沿预设轨迹移动。When the schematic diagram of the standard motion moves along the preset trajectory, while the photographing interface displays the schematic diagram of the standard motion, the schematic diagram of the standard motion can be controlled to move along the preset trajectory.
步骤104，识别时间节点同步采集的视频画面帧中的人体动作。Step 104: Recognize the human body motion in the video picture frame synchronously collected at the time node.
作为一种可能的实现方式，用于采集视频画面帧的摄像头可以为能够采集用户深度信息的摄像头，通过获取的深度信息，可以识别出视频画面帧中的人体动作。例如，该摄像头可以为深度摄像头（Red-Green-Blue Depth，RGBD），成像的同时可以获取视频画面帧中人体的深度信息，从而根据深度信息可以识别视频画面帧中的人体动作。此外，还可以通过结构光或者TOF镜头进行人体动作深度信息的获取，从而根据深度信息可以识别视频画面帧中的人体动作，对此不作限制。As a possible implementation manner, the camera used to collect the video picture frames may be one capable of collecting the user's depth information, and the human body motion in the video picture frame can be recognized from the acquired depth information. For example, the camera may be a depth camera (Red-Green-Blue Depth, RGBD), which can acquire the depth information of the human body in the video picture frame while imaging, so that the human body motion can be recognized according to the depth information. In addition, the depth information of the human body motion may also be acquired through structured light or a TOF lens, so that the human body motion in the video picture frame can be recognized according to the depth information, which is not limited herein.
作为另一种可能的实现方式，可以识别视频画面帧中，人体的各关节。例如，可以根据人脸识别技术识别出视频画面帧中的人脸以及人脸的位置信息，而后根据人体解剖学中肢体与身高的比例关系，可计算得到人体各关节的位置信息。当然也可以通过其他算法确定视频画面帧中人体的各关节的位置信息，对此不作限制。As another possible implementation manner, the joints of the human body in the video picture frame may be recognized. For example, the face in the video picture frame and its position information may be recognized through face recognition technology, and the position information of each joint of the human body may then be calculated according to the proportional relationship between limbs and height in human anatomy. Of course, the position information of each joint in the video picture frame may also be determined by other algorithms, which is not limited herein.
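As a hypothetical illustration of the anatomy-ratio idea, joint positions can be placed below a detected face box using fixed head-unit proportions. The joint set and the ratio values below are assumptions for the sketch, not values given in the application.

```python
def rough_joints(face_top_y, face_height, center_x):
    """Estimate joint positions from the face position using head-unit ratios.
    Coordinates grow downward, as in image space."""
    head = face_height  # one "head unit", approximated by the face height
    ratios = {"head": 0.5, "shoulder": 1.5, "hip": 4.0, "knee": 6.0}
    return {name: (center_x, face_top_y + r * head) for name, r in ratios.items()}

print(rough_joints(0.0, 20.0, 50.0)["shoulder"])  # (50.0, 30.0)
```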
在识别各关节后，可以连接人体各关节相邻的两关节，得到相邻两关节之间的连线，最后根据相邻两关节之间的连线与预设参考方向之间的实际夹角，确定视频画面帧中的人体动作。其中，预设参考方向可以为水平方向或者垂直方向。After the joints are recognized, every two adjacent joints of the human body may be connected to obtain the line between them, and the human body motion in the video picture frame is finally determined according to the actual angle between each such line and a preset reference direction. The preset reference direction may be horizontal or vertical.
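A minimal sketch of this step, assuming 2D joint coordinates and a horizontal reference direction; the joint names and the two-bone skeleton below are illustrative assumptions, not from the application.

```python
import math

# Adjacent joint pairs whose connecting lines are reduced to angles.
SKELETON = [("shoulder", "elbow"), ("elbow", "wrist")]

def bone_angles(joints, skeleton=SKELETON):
    """Angle in degrees between each adjacent-joint line and the horizontal axis."""
    angles = {}
    for a, b in skeleton:
        (xa, ya), (xb, yb) = joints[a], joints[b]
        angles[(a, b)] = math.degrees(math.atan2(yb - ya, xb - xa))
    return angles

# A pose with a horizontal upper arm and a vertical forearm.
pose = {"shoulder": (0.0, 0.0), "elbow": (1.0, 0.0), "wrist": (1.0, 1.0)}
print(bone_angles(pose))
```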
步骤105,根据同一时间节点的标准动作与人体动作之间的差异程度,生成人体动作的动作评价信息。Step 105: Generate action evaluation information of the human body action according to the degree of difference between the standard action and the human body action at the same time node.
本申请实施例中，人体动作的动作评价信息包括人体动作分值，用于指示人体动作与对应的标准动作之间的差异程度，具体地，人体动作分值越高，表明人体动作与对应的标准动作之间的差异程度越小，而人体动作分值越低，表明人体动作与对应的标准动作之间的差异程度越大。In the embodiments of the present application, the motion evaluation information of a human body motion includes a score indicating the degree of difference between the human body motion and the corresponding standard action: the higher the score, the smaller the difference between the human body motion and the corresponding standard action; the lower the score, the greater the difference.
本申请实施例中，在生成人体动作的动作评价信息之前，可以预先根据人体动作与标准动作之间的差异程度是否大于差异阈值，判断人体动作与标准动作是否匹配。具体地，可以确定在执行标准动作时，各相邻两关节之间的连线与参考方向之间的标准角度，针对每一条相邻两关节之间的连线，比较对应的标准角度与实际角度之间的差值。当每一条相邻两关节之间的连线计算出的差值均在误差范围内时，可以确定视频画面帧中的人体动作与标准动作匹配，而当存在至少一条相邻两关节之间的连线计算出的差值未处于误差范围内时，可以确定视频画面帧中的人体动作与标准动作不匹配。In the embodiments of the present application, before the motion evaluation information is generated, whether a human body motion matches the standard action may first be judged according to whether their degree of difference exceeds a difference threshold. Specifically, the standard angle between each adjacent-joint line and the reference direction when the standard action is performed can be determined, and for each adjacent-joint line, the corresponding standard angle is compared with the actual angle to obtain their difference. When the difference calculated for every adjacent-joint line falls within the error range, it can be determined that the human body motion in the video picture frame matches the standard action; when the difference calculated for at least one adjacent-joint line falls outside the error range, it can be determined that the human body motion does not match the standard action.
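The per-connection comparison can be sketched like this. It is a simplified illustration: a single symmetric tolerance stands in for the error range, and the angle keys follow the earlier assumed skeleton.

```python
def pose_matches(standard_angles, actual_angles, tolerance=15.0):
    """Match only if every adjacent-joint connection stays within tolerance."""
    return all(abs(standard_angles[k] - actual_angles[k]) <= tolerance
               for k in standard_angles)

standard = {("shoulder", "elbow"): 0.0, ("elbow", "wrist"): 90.0}
print(pose_matches(standard, {("shoulder", "elbow"): 5.0, ("elbow", "wrist"): 80.0}))   # True
print(pose_matches(standard, {("shoulder", "elbow"): 40.0, ("elbow", "wrist"): 90.0}))  # False
```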
可选地，当视频画面帧中的人体动作与标准动作不匹配时，表明用户做出的人体动作与对应的标准动作之间的差异程度较大，此时，可以将用户做出的人体动作得到的评分置0，而当视频画面帧中的人体动作与标准动作匹配时，表明用户做出的人体动作与对应的标准动作之间的差异程度较小，此时，可以针对每一条相邻两关节之间的连线，根据对应的差值和误差范围，确定连线的评分系数，例如，标记误差范围为[a,b]，误差为Δ，可以根据公式p=1-[2Δ/(a-b)]，计算得到连线的评分系数p，或者可以根据其他算法计算连线的评分系数，对此不作限制。当得到连线的评分系数后，可以根据连线的评分系数和连线对应的分值，生成连线的评价信息，例如，连线的评价信息可以等于该连线的评分系数乘以连线对应的分值。最后，可以通过将各条相邻两关节之间的连线的评价信息相加，得到人体动作的动作评价信息。Optionally, when the human body motion in the video picture frame does not match the standard action, the difference between the user's motion and the corresponding standard action is large, and the score of the user's motion may be set to 0. When the human body motion matches the standard action, the difference is small; in this case, for each adjacent-joint line, a scoring coefficient of the line may be determined according to the corresponding difference and the error range. For example, with the error range denoted [a, b] and the error denoted Δ, the scoring coefficient p of the line may be calculated according to the formula p = 1 - [2Δ/(a-b)], or by other algorithms, which is not limited herein. After the scoring coefficient of a line is obtained, the evaluation information of the line may be generated according to the scoring coefficient and the point value corresponding to the line; for example, the evaluation information of a line may equal its scoring coefficient multiplied by its point value. Finally, the motion evaluation information of the human body motion can be obtained by adding up the evaluation information of all adjacent-joint lines.
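A sketch of the scoring step. Note that the application writes the coefficient as p = 1 - [2Δ/(a-b)] for an error range [a, b]; here that is read as using the range width (b - a) so the coefficient decreases with the error, and the clamping to [0, 1] and the example point values are assumptions.

```python
def connection_coefficient(delta, a, b):
    """Scoring coefficient p = 1 - 2*delta/(b - a), clamped to [0, 1]."""
    p = 1.0 - 2.0 * delta / (b - a)
    return max(0.0, min(1.0, p))

def action_score(connections):
    """Sum each connection's coefficient times its point value.
    `connections` holds (delta, a, b, points) tuples, one per adjacent-joint line."""
    return sum(connection_coefficient(d, a, b) * pts for d, a, b, pts in connections)

# Two connections worth 50 points each over an error range [0, 10]:
print(action_score([(0.0, 0.0, 10.0, 50.0), (2.5, 0.0, 10.0, 50.0)]))  # 75.0
```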
进一步地，人体动作的动作评价信息还可以包括人体动作分值所属区间对应的动画效果。例如，当人体动作分值满分为100时，若人体动作分值所属的区间[90,100]，动画效果可以为“完美或perfect”并搭配钻石闪烁，所属的区间[80,90)，动画效果可以为“很好或good”并搭配鲜花闪烁。Further, the motion evaluation information of a human body motion may also include an animation effect corresponding to the interval to which the score belongs. For example, with a full score of 100, if the score falls in the interval [90, 100], the animation effect may read "完美" or "perfect", accompanied by sparkling diamonds; if it falls in the interval [80, 90), the animation effect may read "很好" or "good", accompanied by sparkling flowers.
举例而言,根据时间节点A的标准动作与人体动作之间的差异程度,生成的人体动作分值为94分,在拍摄界面生成的动画效果为“perfect”并搭配钻石闪烁。由此,可以使得用户及时了解自己做出的人体动作是否标准,从而提升了用户的代入感。For example, according to the degree of difference between the standard action of the time node A and the human body motion, the generated human action score is 94 points, and the animation effect generated on the shooting interface is “perfect” and is matched with the diamond flashing. Thereby, the user can be made aware of whether the human body movements made by the user are in a timely manner, thereby improving the user's sense of substitution.
Step 106: generate the target video according to the audio, the video frames, and the motion evaluation information of each human body motion.
In this embodiment of the present application, when the audio playback ends, the motion evaluation information of the human body motions corresponding to the different time nodes may be acquired, and the target video may then be generated according to the audio, the acquired video frames, and the corresponding motion evaluation information.
As a possible implementation, according to the human body motion recognized in each video frame, the motion evaluation information of the corresponding motion may be added to that frame, and the target video may then be generated from the audio and the video frames to which the evaluation information has been added.
In the video generation method of this embodiment, the selected audio and the standard motions corresponding to the time nodes in the audio are acquired; the audio is played, and video frames are captured during playback; each time the audio reaches a time node, the corresponding standard motion is displayed, and the human body motion in the video frame captured synchronously at that node is recognized; motion evaluation information of the human body motion is generated according to the degree of difference between the standard motion and the human body motion at the same time node; and the target video is generated according to the audio, the video frames, and the motion evaluation information. In this embodiment, because the standard motions are full-body motions the user needs to perform, compared with the prior-art dance mode in which the user merely steps on arrows, the dance motions are effectively enriched and the user experience is improved. In addition, generating motion evaluation information according to the degree of difference between the standard motion and the human body motion at the same time node allows the user to learn in time whether the motions are standard, further improving the user experience. Finally, a video is generated when the audio playback ends, so the user can replay or share the video, enhancing the sense of participation.
As a possible implementation, to prevent the user from unintentionally triggering the shooting control of the electronic device and thus causing the camera to capture images by mistake, or to prevent the camera from capturing images before it is aimed at the user and thus recording invalid images, in this embodiment of the present application a preparation stage may be entered before the electronic device begins image capture. This process is described in detail below with reference to FIG. 2.
FIG. 2 is a schematic flowchart of a second video generation method according to an embodiment of the present application.
As shown in FIG. 2, the video generation method includes the following steps:
Step 201: display a preparation motion and capture a preparation image.
In this embodiment of the present application, the preparation motion may be displayed on a preparation interface. The preparation motion may be preset by a built-in program of the electronic device; it may be, for example, raising both arms horizontally, or some other motion, which is not limited here. While the preparation motion is displayed, the camera of the electronic device may capture a preparation image, which contains the human body motion made by the user.
As a possible implementation, the preparation motion may be displayed in any region of the preparation interface; it may remain stationary for a preset period, or it may move along a preset trajectory, which is not limited here. The preset trajectory may be preset by a built-in program of the electronic device.
As another possible implementation, so that the user can watch the preparation motion without being kept from viewing the other content on the screen of the electronic device, in this embodiment a semi-transparent mask may be displayed on the preparation interface, where the mask has a hollowed-out region of interest in which an image illustrating the preparation motion is displayed, that is, a schematic diagram of the preparation motion is shown in the region of interest. Alternatively, the preparation motion may be displayed on the preparation interface in the form of a bullet-screen comment, which is not limited here. In this way the user can view other content while watching the preparation motion, improving the user experience.
Step 202: determine that the human body motion in the preparation image matches the preparation motion.
In this embodiment of the present application, the human body motion in the preparation image may be recognized, and it may then be judged whether that motion matches the preparation motion; when it is determined that the human body motion in the preparation image matches the preparation motion, image capture may begin.
As a possible implementation, the camera used to capture the preparation image may be one capable of collecting depth information of the user, and the human body motion in the preparation image may be recognized from the acquired depth information. For example, the camera may be a depth camera that obtains the depth information of the human body in the preparation image while imaging, so that the human body motion in the preparation image can be recognized according to the depth information. In addition, the depth information of the human body motion may be acquired with structured light or a TOF lens, so that the human body motion in the preparation image can be recognized according to the depth information; this is not limited here.
As another possible implementation, the joints of the human body in the preparation image may be recognized, adjacent joints may then be connected to obtain the line segments between adjacent joints, and the human body motion may finally be determined according to the actual angles between those segments and a preset reference direction.
After the human body motion in the preparation image is recognized, whether it matches the preparation motion may be judged according to whether the degree of difference between the human body motion and the preparation motion exceeds a difference threshold. Specifically, the standard angle between each segment connecting two adjacent joints and the reference direction when the preparation motion is performed may be determined, and for each such segment the corresponding standard angle may be compared with the actual angle to obtain a difference value. When the difference value computed for every segment is within the error range, it may be determined that the human body motion in the preparation image matches the preparation motion; when the difference value computed for at least one segment falls outside the error range, it may be determined that the human body motion in the preparation image does not match the preparation motion.
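The matching check just described can be sketched as follows. The segment names, angle values, and 5-degree tolerance are illustrative assumptions; the patent does not fix a particular joint set or error range.

```python
def matches_preparation(standard_angles, actual_angles, tolerance):
    """standard_angles / actual_angles: dicts mapping a segment name
    (e.g. 'shoulder-elbow') to its angle against the reference
    direction, in degrees.  The motion matches only when every
    segment's |standard - actual| difference is within tolerance."""
    return all(
        abs(standard_angles[seg] - actual_angles[seg]) <= tolerance
        for seg in standard_angles
    )

standard = {"shoulder-elbow": 90.0, "elbow-wrist": 90.0}  # arms raised
actual_ok = {"shoulder-elbow": 87.0, "elbow-wrist": 93.0}
actual_bad = {"shoulder-elbow": 60.0, "elbow-wrist": 92.0}

matches_preparation(standard, actual_ok, 5.0)   # both segments within 5°
matches_preparation(standard, actual_bad, 5.0)  # one segment too far off
```

A single out-of-range segment is enough to reject the match, which is exactly the "at least one segment outside the error range" condition above.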
In the video generation method of this embodiment, a preparation stage is entered before the electronic device begins image capture. Specifically, a preparation motion is displayed and a preparation image is captured; it is then determined that the human body motion in the preparation image matches the preparation motion. In this embodiment, image capture begins only when the human body motion matches the preparation motion, which prevents the user from unintentionally triggering the shooting control of the electronic device and causing the camera to capture images by mistake, and prevents the camera from capturing images before it is aimed at the user and thus recording invalid images, ensuring the validity and accuracy of subsequent image capture.
As a possible implementation, to enhance the sense of participation and the fun of the video generation process, the human body motions made by the user may be evaluated. Referring to FIG. 3, on the basis of the embodiment shown in FIG. 1, after step 105 the video generation method may further include the following steps:
Step 301: display the motion evaluation information of each human body motion on the shooting interface used to capture the video frames.
In this embodiment of the present application, while the shooting interface displays a standard motion, multiple video frames may be captured synchronously, each frame having its own piece of motion evaluation information. The motion evaluation information of the human body motion is added to the synchronously captured video frames, that is, the motion evaluation information of each human body motion is displayed on the shooting interface used to capture the frames. As a possible implementation, the generated pieces of motion evaluation information may be filtered so that only the highest-rated one is retained, and the highest-rated motion evaluation information may then be added to at least one of the synchronously captured video frames, where that frame shows the human body motion corresponding to the highest-rated evaluation.
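The filtering step above, keeping only the highest-rated evaluation among the frames captured while one standard motion was displayed, can be sketched as follows; the frame identifiers and scores are made up for illustration.

```python
def best_evaluation(frame_scores):
    """frame_scores: list of (frame_id, score) pairs captured while one
    standard motion was displayed.  Returns the pair with the highest
    score, whose evaluation is then shown on the shooting interface."""
    return max(frame_scores, key=lambda fs: fs[1])

frames = [("f1", 72.0), ("f2", 94.0), ("f3", 88.0)]
best = best_evaluation(frames)  # the 94-point frame is retained
```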
Step 302: when the audio playback ends, generate total evaluation information according to the motion evaluation information of each human body motion.
In this embodiment of the present application, when the audio playback ends, a total score may be generated from the human motion scores contained in the motion evaluation information of the individual motions, and the total evaluation information may be generated together with the animation effect corresponding to the interval in which the total score falls.
As a possible implementation, a weight may be preset for each standard motion in the audio. After the motion evaluation information of each human body motion is determined, the human motion score of each motion may be multiplied by the corresponding weight to obtain a product, the products may be accumulated to obtain the total score, and the corresponding animation effect may then be determined according to the interval in which the total score falls.
For example, when the audio has 100 time nodes, that is, 100 standard motions, a weight may be set for each standard motion, for example 0.01 for each. After the motion evaluation information of each human body motion is determined, the human motion score of each motion may be multiplied by the corresponding weight to obtain a product, and the products may be accumulated to obtain the total score. If the total score obtained is 87, it falls in the interval [80, 90), so the animation effect may be "good" with sparkling flowers.
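The weighted accumulation and interval lookup in this example can be sketched as follows. The [90, 100] → "perfect" and [80, 90) → "good" mappings follow the text; the label for lower intervals is an invented placeholder.

```python
def total_score(scores, weight=0.01):
    """Accumulate per-motion scores, each multiplied by its weight
    (here a uniform 0.01 for 100 standard motions)."""
    return sum(s * weight for s in scores)

def animation_for(score):
    """Map the total score to the animation effect of its interval."""
    if score >= 90:
        return "perfect"     # with sparkling diamonds
    if score >= 80:
        return "good"        # with sparkling flowers
    return "keep trying"     # assumed label for lower intervals

scores = [87.0] * 100           # every one of the 100 motions scored 87
total = total_score(scores)     # 100 * 87 * 0.01 = 87.0
effect = animation_for(total)   # falls in [80, 90), so "good"
```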
Step 303: display the total evaluation information on a result display interface.
In this embodiment, after the total evaluation information is determined, it may be displayed on the result display interface, so that the user can learn whether the motions he or she made are standard, improving the user experience.
In the video generation method of this embodiment, the motion evaluation information of each human body motion is displayed on the shooting interface used to capture the video frames; when the audio playback ends, total evaluation information is generated according to the motion evaluation information of each motion; and the total evaluation information is displayed on the result display interface. In this way the user can learn whether the motions he or she made are standard, improving the user experience.
In this embodiment of the present application, the result display interface further includes a replay control, a shooting control, and a sharing control. Specifically, when the electronic device detects a trigger operation by the user on the replay control, the target video may be played, so that the user can review and correct the motions during playback and perform them more accurately the next time a video is recorded; when the electronic device detects a trigger operation by the user on the shooting control, the shooting interface may be displayed and steps 102-106 triggered to regenerate the target video, that is, the user can shoot a video again by triggering the shooting control; and when the electronic device detects a trigger operation on the sharing control, the target video is shared.
As a possible implementation, referring to FIG. 4, sharing the target video specifically includes the following steps:
Step 401: display a sharing interface.
In this embodiment of the present application, the sharing interface includes a native-platform sharing control and third-party-platform sharing controls. The third-party platform may be, for example, Instagram, Facebook, or Twitter.
In this embodiment of the present application, the sharing interface is displayed so that the user can share the target video through the sharing controls of the sharing interface.
Step 402: when a trigger operation on the native-platform sharing control is detected, display a shooting control and a display control on the sharing interface.
In this embodiment of the present application, when the user triggers the native-platform sharing control, the sharing interface may display a shooting control and a display control. When the user taps the shooting control, the electronic device may acquire the audio of the target video and display the preparation interface, so that the user can regenerate a video based on that audio. When the user taps the display control, step 403 may be triggered.
Step 403: when a trigger operation on the display control is detected, display a video aggregation page; the video aggregation page contains the target video and/or videos already shared on the native platform.
In this embodiment of the present application, when the user taps the display control, the electronic device may display the video aggregation page, so that the user can share the target video or view videos already shared by other users.
Optionally, the video aggregation page may also contain a shooting control, so that the user can reselect audio through the shooting control and record a video.
In the video generation method of this embodiment, a sharing interface is displayed; when a trigger operation on the native-platform sharing control is detected, a shooting control and a display control are displayed on the sharing interface; and when a trigger operation on the display control is detected, a video aggregation page is displayed, which contains the target video and/or videos already shared on the native platform. The user can thus share the target video so that other users can watch it, enhancing the sense of participation.
To implement the above embodiments, the present application further proposes a video generation apparatus.
FIG. 5 is a schematic structural diagram of a video generation apparatus according to an embodiment of the present application.
As shown in FIG. 5, the video generation apparatus 500 includes a selection module 510, a capture module 520, a display module 530, an evaluation module 540, and a generation module 550, where:
The selection module 510 is configured to acquire the selected audio and the standard motions corresponding to the time nodes in the audio.
The capture module 520 is configured to play the audio and capture video frames during playback.
The display module 530 is configured to display the corresponding standard motion each time the audio reaches a time node, and to recognize the human body motion in the video frame captured synchronously at that node.
As a possible implementation, the display module 530 is specifically configured to: recognize the joints of the human body in a video frame; connect adjacent joints to obtain the line segments between adjacent joints; and determine the human body motion according to the actual angles between those segments and a preset reference direction.
The evaluation module 540 is configured to generate the motion evaluation information of a human body motion according to the degree of difference between the standard motion and the human body motion at the same time node.
The generation module 550 is configured to generate the target video according to the audio, the video frames, and the motion evaluation information of each human body motion.
As a possible implementation, the generation module 550 is specifically configured to: add, to each video frame, the motion evaluation information of the human body motion recognized in that frame; and generate the target video from the audio and the video frames to which the evaluation information has been added.
Further, as a possible implementation of an embodiment of the present application, referring to FIG. 6, on the basis of the embodiment shown in FIG. 5, the video generation apparatus 500 may further include:
a display-and-determination module 560, configured to display a preparation motion and capture a preparation image before the audio is played and video frames are captured synchronously, and to determine that the human body motion in the preparation image matches the preparation motion;
a display-and-generation module 570, configured to: display the motion evaluation information of each human body motion on the shooting interface used to capture the video frames, after the motion evaluation information is generated according to the degree of difference between the standard motion and the human body motion at the same time node; generate total evaluation information according to the motion evaluation information of each human body motion when the audio playback ends; and display the total evaluation information on a result display interface; and
an interface display module 580, configured to display a song selection interface when an operation on the shooting control is detected, before the selected audio is acquired.
In this embodiment of the present application, the result display interface further includes a replay control, a shooting control, and a sharing control. The display-and-generation module 570 is further configured to: play the target video when a trigger operation on the replay control is detected; display the shooting interface to regenerate the target video when a trigger operation on the shooting control is detected; and share the target video when a trigger operation on the sharing control is detected.
As a possible implementation, the display-and-generation module 570 is specifically configured to: display a sharing interface, where the sharing interface includes a native-platform sharing control and third-party-platform sharing controls; display a shooting control and a display control on the sharing interface when a trigger operation on the native-platform sharing control is detected; and display a video aggregation page when a trigger operation on the display control is detected, the video aggregation page containing the target video and/or videos already shared on the native platform.
It should be noted that the foregoing explanation of the video generation method embodiments also applies to the video generation apparatus 500 of this embodiment, and details are not repeated here.
In the video generation apparatus of this embodiment, the selected audio and the standard motions corresponding to the time nodes in the audio are acquired; the audio is played, and video frames are captured during playback; each time the audio reaches a time node, the corresponding standard motion is displayed, and the human body motion in the video frame captured synchronously at that node is recognized; motion evaluation information of the human body motion is generated according to the degree of difference between the standard motion and the human body motion at the same time node; and the target video is generated according to the audio, the video frames, and the motion evaluation information. In this embodiment, because the standard motions are full-body motions the user needs to perform, compared with the prior-art dance mode in which the user merely steps on arrows, the dance motions are effectively enriched and the user experience is improved. In addition, generating motion evaluation information according to the degree of difference between the standard motion and the human body motion at the same time node allows the user to learn in time whether the motions are standard, further improving the user experience. Finally, a video is generated when the audio playback ends, so the user can replay or share the video, enhancing the sense of participation.
An embodiment of the present application further provides an electronic device that includes the apparatus described in any of the foregoing embodiments.
FIG. 7 is a schematic structural diagram of an embodiment of an electronic device of the present application, which can implement the processes of the embodiments shown in FIGS. 1-6 of the present application. As shown in FIG. 7, the electronic device may include: a housing 71, a processor 72, a memory 73, a circuit board 74, and a power supply circuit 75, where the circuit board 74 is disposed inside the space enclosed by the housing 71, and the processor 72 and the memory 73 are disposed on the circuit board 74; the power supply circuit 75 is configured to supply power to the circuits or components of the electronic device; the memory 73 is configured to store executable program code; and the processor 72 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 73, so as to perform the video generation method described in any of the foregoing embodiments.
For the specific execution of the above steps by the processor 72 and the further steps performed by the processor 72 by running the executable program code, reference may be made to the description of the embodiments shown in FIGS. 1-6 of the present application, and details are not repeated here.
The electronic device exists in various forms, including but not limited to:
(1) Mobile communication devices: such devices are characterized by mobile communication capability, with voice and data communication as their primary goal. Such terminals include smartphones (for example, the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing capabilities, and generally also have mobile Internet access. Such terminals include PDA, MID, and UMPC devices, for example the iPad.
(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players (for example, the iPod), handheld game consoles, e-book readers, as well as smart toys and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server consists of a processor, a hard disk, memory, a system bus, and so on; its architecture is similar to that of a general-purpose computer, but because highly reliable services must be provided, the requirements on processing capability, stability, reliability, security, scalability, and manageability are high.
(5) Other electronic devices with data interaction capability.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that can readily occur to those skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
To implement the above embodiments, the present application further provides a non-transitory computer-readable storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the video generation method described in the foregoing embodiments is implemented.
To implement the above embodiments, the present application further provides a computer program product, wherein when instructions in the computer program product are executed by a processor, the video generation method described in the foregoing embodiments is performed.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", "some examples", and the like means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, where no contradiction arises, those skilled in the art may combine different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, for example two or three, unless otherwise specifically and explicitly defined.
Any process or method description in the flowcharts, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing steps of a custom logic function or process; and the scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved. This should be understood by those skilled in the art to which the embodiments of the present application belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it if necessary, and then be stored in a computer memory.
It should be understood that parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented with software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any of the following technologies known in the art, or a combination thereof, may be used: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
Those of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although the embodiments of the present application have been shown and described above, it can be understood that the above embodiments are exemplary and shall not be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Claims (19)

  1. A video generation method, characterized by comprising the following steps:
    obtaining selected audio, and a standard action corresponding to each time node in the audio;
    playing the audio, and capturing video frames during playback of the audio;
    when the audio is played to each time node, displaying the corresponding standard action, and recognizing a human body action in the video frame captured synchronously at the time node;
    generating action evaluation information of the human body action according to a degree of difference between the standard action and the human body action at the same time node;
    generating a target video according to the audio, the video frames, and the action evaluation information of each human body action.
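As a non-limiting illustration of the evaluation step in claim 1 — mapping the degree of difference between the standard action and the recognized human body action at a time node to action evaluation information — a minimal sketch could look as follows. The angle-based pose representation, tolerance thresholds, and grade labels are assumptions for illustration, not details taken from the application:

```python
def evaluate_action(standard_angles, detected_angles, tolerance_deg=15.0):
    """Map the difference between a standard action and a detected human body
    action (both reduced to lists of joint-pair angles, in degrees) to an
    evaluation grade. Thresholds and grade names are illustrative."""
    # Mean absolute angular difference over all compared joint pairs.
    diffs = [abs(s - d) for s, d in zip(standard_angles, detected_angles)]
    mean_diff = sum(diffs) / len(diffs)
    # Smaller difference degree => better evaluation information.
    if mean_diff <= tolerance_deg:
        return "PERFECT"
    if mean_diff <= 2 * tolerance_deg:
        return "GOOD"
    return "MISS"
```

Here a pose is reduced to a list of joint-pair angles; in the application those angles would come from the joint recognition described in claim 8.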
  2. The video generation method according to claim 1, wherein before playing the audio and synchronously capturing video frames, the method further comprises:
    displaying a preparation action, and capturing a preparation image;
    determining that a human body action in the preparation image matches the preparation action.
  3. The video generation method according to claim 1 or 2, wherein generating the target video according to the audio, the video frames, and the action evaluation information of each human body action comprises:
    adding, to each video frame, action evaluation information of the corresponding human body action according to the human body action recognized in the video frame;
    generating the target video according to the audio and the video frames to which the action evaluation information has been added.
  4. The video generation method according to any one of claims 1-3, wherein after generating the action evaluation information of the human body action according to the degree of difference between the standard action and the human body action at the same time node, the method further comprises:
    displaying the action evaluation information of each human body action on a shooting interface used for capturing the video frames;
    when the audio playback ends, generating total evaluation information according to the action evaluation information of each human body action;
    displaying the total evaluation information on a result display interface.
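The total evaluation information of claim 4 could, as one hedged possibility, be an aggregate of the per-action grades into a score plus per-grade counts. The point values and grade names below are illustrative assumptions, not taken from the application:

```python
from collections import Counter

def total_evaluation(per_action_grades):
    """Aggregate per-action evaluation info into the total evaluation info
    shown on the result display interface (scoring scheme is assumed)."""
    points = {"PERFECT": 2, "GOOD": 1, "MISS": 0}
    score = sum(points[g] for g in per_action_grades)
    counts = Counter(per_action_grades)
    return {"score": score,
            "max_score": 2 * len(per_action_grades),
            "perfect": counts["PERFECT"],
            "good": counts["GOOD"],
            "miss": counts["MISS"]}
```

For example, grades of PERFECT, GOOD, MISS, PERFECT yield a score of 5 out of a possible 8.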
  5. The video generation method according to claim 4, wherein the result display interface further comprises a review control, a shooting control, and a sharing control; and the method further comprises:
    playing the target video when a trigger operation on the review control is detected;
    displaying the shooting interface to regenerate the target video when a trigger operation on the shooting control is detected;
    sharing the target video when a trigger operation on the sharing control is detected.
  6. The video generation method according to claim 5, wherein sharing the target video comprises:
    displaying a sharing interface, wherein the sharing interface comprises an own-platform sharing control and a third-party-platform sharing control;
    displaying the shooting control and a display control on the sharing interface when a trigger operation on the own-platform sharing control is detected;
    displaying a video aggregation page when a trigger operation on the display control is detected, wherein the video aggregation page comprises the target video and/or videos already shared on the own platform.
  7. The video generation method according to any one of claims 1-6, wherein before obtaining the selected audio, the method further comprises:
    displaying a song selection interface when an operation on a shooting control is detected.
  8. The video generation method according to any one of claims 1-7, wherein recognizing the human body action in the video frame captured synchronously at the time node comprises:
    identifying joints of a human body in the video frame;
    connecting adjacent joints of the human body to obtain a connection line between the adjacent joints;
    determining the human body action according to an actual angle between the connection line between the adjacent joints and a preset reference direction.
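The recognition step of claim 8 — connecting adjacent joints and comparing the angle of the connecting line against a preset reference direction — could be sketched in 2-D as follows. The reference direction, joint coordinates, the "raised arm" rule, and its threshold are illustrative assumptions only:

```python
import math

def joint_line_angle(joint_a, joint_b, reference=(1.0, 0.0)):
    """Angle in degrees (0-180) between the line connecting two adjacent
    joints and a preset reference direction (default: horizontal)."""
    vx, vy = joint_b[0] - joint_a[0], joint_b[1] - joint_a[1]
    rx, ry = reference
    cos_theta = (vx * rx + vy * ry) / (math.hypot(vx, vy) * math.hypot(rx, ry))
    # Clamp to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

def classify_arm(shoulder, wrist, raised_threshold=60.0):
    """Toy rule: the arm counts as raised when the shoulder-to-wrist line is
    steep relative to the horizontal reference (threshold is assumed)."""
    angle = joint_line_angle(shoulder, wrist)
    return "arm_raised" if angle >= raised_threshold else "arm_lowered"
```

A full implementation would apply such angle tests to every pair of adjacent joints returned by the joint detector, and match the resulting angle vector against the standard action.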
  9. A video generation apparatus, characterized in that the apparatus comprises:
    a selection module, configured to obtain selected audio and a standard action corresponding to each time node in the audio;
    a capture module, configured to play the audio and capture video frames during playback of the audio;
    a display module, configured to display the corresponding standard action when the audio is played to each time node, and to recognize a human body action in the video frame captured synchronously at the time node;
    an evaluation module, configured to generate action evaluation information of the human body action according to a degree of difference between the standard action and the human body action at the same time node;
    a generation module, configured to generate a target video according to the audio, the video frames, and the action evaluation information of each human body action.
  10. The video generation apparatus according to claim 9, wherein the apparatus further comprises:
    a display determination module, configured to, before the audio is played and video frames are synchronously captured, display a preparation action, capture a preparation image, and determine that a human body action in the preparation image matches the preparation action.
  11. The video generation apparatus according to claim 9 or 10, wherein the generation module is specifically configured to:
    add, to each video frame, action evaluation information of the corresponding human body action according to the human body action recognized in the video frame;
    generate the target video according to the audio and the video frames to which the action evaluation information has been added.
  12. The video generation apparatus according to any one of claims 9-11, wherein the apparatus further comprises:
    a display generation module, configured to, after the action evaluation information of the human body action is generated according to the degree of difference between the standard action and the human body action at the same time node, display the action evaluation information of each human body action on a shooting interface used for capturing the video frames; generate total evaluation information according to the action evaluation information of each human body action when the audio playback ends; and display the total evaluation information on a result display interface.
  13. The video generation apparatus according to claim 12, wherein the result display interface further comprises a review control, a shooting control, and a sharing control; and the display generation module is further configured to:
    play the target video when a trigger operation on the review control is detected;
    display the shooting interface to regenerate the target video when a trigger operation on the shooting control is detected;
    share the target video when a trigger operation on the sharing control is detected.
  14. The video generation apparatus according to claim 13, wherein the display generation module is specifically configured to:
    display a sharing interface, wherein the sharing interface comprises an own-platform sharing control and a third-party-platform sharing control;
    display the shooting control and a display control on the sharing interface when a trigger operation on the own-platform sharing control is detected;
    display a video aggregation page when a trigger operation on the display control is detected, wherein the video aggregation page comprises the target video and/or videos already shared on the own platform.
  15. The video generation apparatus according to any one of claims 9-14, wherein the apparatus further comprises:
    an interface display module, configured to display a song selection interface when an operation on a shooting control is detected before the selected audio is obtained.
  16. The video generation apparatus according to any one of claims 9-15, wherein the display module is specifically configured to:
    identify joints of a human body in the video frame;
    connect adjacent joints of the human body to obtain a connection line between the adjacent joints;
    determine the human body action according to an actual angle between the connection line between the adjacent joints and a preset reference direction.
  17. An electronic device, characterized by comprising: a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is disposed inside a space enclosed by the housing, and the processor and the memory are disposed on the circuit board; the power supply circuit is configured to supply power to each circuit or component of the electronic device; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the video generation method according to any one of claims 1-8.
  18. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the video generation method according to any one of claims 1-8 is implemented.
  19. A computer program product, wherein when instructions in the computer program product are executed by a processor, the video generation method according to any one of claims 1-8 is performed.
PCT/CN2018/098602 2017-11-23 2018-08-03 Video generation method and device, and electronic apparatus WO2019100757A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711185439.6 2017-11-23
CN201711185439.6A CN107920269A (en) 2017-11-23 2017-11-23 Video generation method, device and electronic equipment

Publications (1)

Publication Number Publication Date
WO2019100757A1 true WO2019100757A1 (en) 2019-05-31

Family

ID=61897675

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/098602 WO2019100757A1 (en) 2017-11-23 2018-08-03 Video generation method and device, and electronic apparatus

Country Status (2)

Country Link
CN (1) CN107920269A (en)
WO (1) WO2019100757A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110958386A (en) * 2019-11-12 2020-04-03 北京达佳互联信息技术有限公司 Video synthesis method and device, electronic equipment and computer-readable storage medium
CN112750184A (en) * 2019-10-30 2021-05-04 阿里巴巴集团控股有限公司 Data processing, action driving and man-machine interaction method and equipment
CN112752142A (en) * 2020-08-26 2021-05-04 腾讯科技(深圳)有限公司 Dubbing data processing method and device and electronic equipment
CN113132808A (en) * 2019-12-30 2021-07-16 腾讯科技(深圳)有限公司 Video generation method and device and computer readable storage medium
CN113283384A (en) * 2021-06-17 2021-08-20 贝塔智能科技(北京)有限公司 Taiji interaction system based on limb recognition technology
CN113365133A (en) * 2021-06-02 2021-09-07 北京字跳网络技术有限公司 Video sharing method, device, equipment and medium
CN113810536A (en) * 2021-08-02 2021-12-17 惠州Tcl移动通信有限公司 Method, device and terminal for displaying information based on motion trajectory of human body in video
CN113949891A (en) * 2021-10-13 2022-01-18 咪咕文化科技有限公司 Video processing method and device, server and client
CN114745576A (en) * 2022-03-25 2022-07-12 上海合志信息技术有限公司 Family fitness interaction method and device, electronic equipment and storage medium

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107920269A (en) * 2017-11-23 2018-04-17 乐蜜有限公司 Video generation method, device and electronic equipment
CN109068081A (en) * 2018-08-10 2018-12-21 北京微播视界科技有限公司 Video generation method, device, electronic equipment and storage medium
CN109525891B (en) * 2018-11-29 2020-01-21 北京字节跳动网络技术有限公司 Multi-user video special effect adding method and device, terminal equipment and storage medium
CN109621425B (en) * 2018-12-25 2023-08-18 广州方硅信息技术有限公司 Video generation method, device, equipment and storage medium
CN113678137B (en) * 2019-08-18 2024-03-12 聚好看科技股份有限公司 Display apparatus
CN110465074B (en) * 2019-08-20 2023-10-20 腾讯科技(深圳)有限公司 Information prompting method and device
CN112560605B (en) * 2020-12-02 2023-04-18 北京字节跳动网络技术有限公司 Interaction method, device, terminal, server and storage medium
CN113596353A (en) * 2021-08-10 2021-11-02 广州艾美网络科技有限公司 Somatosensory interaction data processing method and device and somatosensory interaction equipment
CN114513694A (en) * 2022-02-17 2022-05-17 平安国际智慧城市科技股份有限公司 Scoring determination method and device, electronic equipment and storage medium
CN114549706A (en) * 2022-02-21 2022-05-27 成都工业学院 Animation generation method and animation generation device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN201349264Y (en) * 2008-12-30 2009-11-18 深圳市同洲电子股份有限公司 Motion image processing device and system
CN102799191A (en) * 2012-08-07 2012-11-28 北京国铁华晨通信信息技术有限公司 Method and system for controlling pan/tilt/zoom based on motion recognition technology
US9805766B1 (en) * 2016-07-19 2017-10-31 Compal Electronics, Inc. Video processing and playing method and video processing apparatus thereof
CN107920269A (en) * 2017-11-23 2018-04-17 乐蜜有限公司 Video generation method, device and electronic equipment
CN107952238A (en) * 2017-11-23 2018-04-24 乐蜜有限公司 Video generation method, device and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5704863B2 (en) * 2010-08-26 2015-04-22 キヤノン株式会社 Image processing apparatus, image processing method, and storage medium
CN102724449A (en) * 2011-03-31 2012-10-10 青岛海信电器股份有限公司 Interactive TV and method for realizing interaction with user by utilizing display device
CN102622509A (en) * 2012-01-21 2012-08-01 天津大学 Three-dimensional game interaction system based on monocular video
CN103390174A (en) * 2012-05-07 2013-11-13 深圳泰山在线科技有限公司 Physical education assisting system and method based on human body posture recognition
US10803762B2 (en) * 2013-04-02 2020-10-13 Nec Solution Innovators, Ltd Body-motion assessment device, dance assessment device, karaoke device, and game device
CN104899912B (en) * 2014-03-07 2019-07-05 腾讯科技(深圳)有限公司 Animation method and back method and equipment


Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112750184A (en) * 2019-10-30 2021-05-04 阿里巴巴集团控股有限公司 Data processing, action driving and man-machine interaction method and equipment
CN112750184B (en) * 2019-10-30 2023-11-10 阿里巴巴集团控股有限公司 Method and equipment for data processing, action driving and man-machine interaction
CN110958386A (en) * 2019-11-12 2020-04-03 北京达佳互联信息技术有限公司 Video synthesis method and device, electronic equipment and computer-readable storage medium
CN113132808B (en) * 2019-12-30 2022-07-29 腾讯科技(深圳)有限公司 Video generation method and device and computer readable storage medium
CN113132808A (en) * 2019-12-30 2021-07-16 腾讯科技(深圳)有限公司 Video generation method and device and computer readable storage medium
CN112752142A (en) * 2020-08-26 2021-05-04 腾讯科技(深圳)有限公司 Dubbing data processing method and device and electronic equipment
CN113365133A (en) * 2021-06-02 2021-09-07 北京字跳网络技术有限公司 Video sharing method, device, equipment and medium
CN113365133B (en) * 2021-06-02 2022-10-18 北京字跳网络技术有限公司 Video sharing method, device, equipment and medium
CN113283384A (en) * 2021-06-17 2021-08-20 贝塔智能科技(北京)有限公司 Taiji interaction system based on limb recognition technology
CN113810536A (en) * 2021-08-02 2021-12-17 惠州Tcl移动通信有限公司 Method, device and terminal for displaying information based on motion trajectory of human body in video
CN113810536B (en) * 2021-08-02 2023-12-12 惠州Tcl移动通信有限公司 Information display method, device and terminal based on human limb action track in video
CN113949891A (en) * 2021-10-13 2022-01-18 咪咕文化科技有限公司 Video processing method and device, server and client
CN113949891B (en) * 2021-10-13 2023-12-08 咪咕文化科技有限公司 Video processing method and device, server and client
CN114745576A (en) * 2022-03-25 2022-07-12 上海合志信息技术有限公司 Family fitness interaction method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN107920269A (en) 2018-04-17

Similar Documents

Publication Publication Date Title
WO2019100757A1 (en) Video generation method and device, and electronic apparatus
WO2019100755A1 (en) Video generation method and device, and electronic apparatus
WO2019100753A1 (en) Video generation method and apparatus, and electronic device
WO2019100756A1 (en) Image acquisition method and apparatus, and electronic device
WO2019100754A1 (en) Human body movement identification method and device, and electronic device
US9278288B2 (en) Automatic generation of a game replay video
CN107096221B (en) System and method for providing time-shifted intelligent synchronized gaming video
CN107029429B (en) System, method, and readable medium for implementing time-shifting tutoring for cloud gaming systems
KR102342933B1 (en) Information processing device, control method of information processing device, and storage medium
CN104620522B (en) User interest is determined by detected body marker
US10549203B2 (en) Systems and methods for providing time-shifted intelligently synchronized game video
KR20200135946A (en) Deinterleaving of gameplay data
CN113453034B (en) Data display method, device, electronic equipment and computer readable storage medium
US20160045828A1 (en) Apparatus and method of user interaction
US11334621B2 (en) Image search system, image search method and storage medium
JP7248437B2 (en) Programs, electronics and data recording methods
KR102365431B1 (en) Electronic device for providing target video in sports play video and operating method thereof
JP2014023745A (en) Dance teaching device
US20170193668A1 (en) Intelligent Equipment-Based Motion Sensing Control Method, Electronic Device and Intelligent Equipment
CN115237314B (en) Information recommendation method and device and electronic equipment
CN115442658B (en) Live broadcast method, live broadcast device, storage medium, electronic equipment and product
CN110102057A (en) A kind of cut scene marching method, device, equipment and medium
Chen Capturing fast motion with consumer grade unsynchronized rolling-shutter cameras
WO2018035832A1 (en) Advertisement video playback device
WO2018035829A1 (en) Advertisement playback device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18880767

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18880767

Country of ref document: EP

Kind code of ref document: A1