CN113038280A - Video interaction method and device and storage medium

Info

Publication number
CN113038280A
Authority
CN
China
Prior art keywords
video
audience
generating
playing
interaction method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911251512.4A
Other languages
Chinese (zh)
Inventor
唐自信
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hode Information Technology Co Ltd
Original Assignee
Shanghai Hode Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hode Information Technology Co Ltd filed Critical Shanghai Hode Information Technology Co Ltd
Priority to CN201911251512.4A priority Critical patent/CN113038280A/en
Publication of CN113038280A publication Critical patent/CN113038280A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213Monitoring of end-user related data
    • H04N21/44218Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting

Abstract

The invention relates to a video interaction method, device, and storage medium, and belongs to the field of internet technology. The method first plays a first video while simultaneously shooting and playing an audience video of the viewer; it then identifies the audience images in the audience video to generate an identification result, and generates and executes an operation instruction, associated with the first video or the audience video, based on that result. Feedback or operation driven by the viewer's self-shot picture is thereby achieved while the video plays: the operation can be associated with the first video to control its playback, or with the audience video. This effectively improves the audience's sense of participation and immersion, and in turn greatly improves the user experience of network video viewers.

Description

Video interaction method and device and storage medium
Technical Field
The invention relates to the field of internet technology, in particular to network video applications, and specifically to a video interaction method, a video interaction device, and a storage medium.
Background
Network video is currently a highly popular internet business. With the continuous development of the mobile internet, network video resources have become increasingly abundant and now form an important part of internet applications.
Today, network video audiences no longer simply want to watch videos. Various distinctive application forms are continuously being developed to meet audiences' needs in different scenarios. In some of these scenarios, viewers may be expected to react to the network video, and it may further be desirable for such reactions to be fed back into the video, giving users a sense of participation and immersion.
In the prior art, sending bullet comments (barrage) is the most common form of audience participation. However, in some scenarios user participation cannot be achieved by sending a bullet comment, or the user simply cannot send one at all.
For example, take the very popular baking tutorial videos online: the viewer usually wants to practice while watching. After watching a step in the video tutorial, the viewer needs to pause the video, perform the corresponding operation, and then resume playback to watch the next segment and learn the next step. Teaching videos for art, calligraphy, and handcrafts such as paper folding have the same requirement. Because it is difficult for viewers to free their hands to control video playback during operations such as baking, the user experience is poor.
In other scenarios, such as videos of calisthenics, dance, and fitness demonstrations, viewers need to watch the video and imitate its movements, shoot themselves while doing so, and preferably compare the self-shot picture with the video to determine whether their movements are standard. The current prior art cannot meet this requirement, and users' sense of participation is low.
Therefore, how to provide a novel video interaction method that obtains various feedback from viewers while a video plays, thereby improving audience participation and immersion and improving user experience, is an urgent problem to be solved in the field.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art by providing a video interaction method, device, and storage medium that play the viewer's self-shot video alongside the played video and enable feedback on, or operation of, the played video according to the self-shot picture, thereby improving audience participation and immersion and effectively improving the user experience of network video viewers.
In order to achieve the above object, the video interaction method of the present invention comprises the following steps:
playing the first video;
shooting and playing the video of the audience;
identifying the audience images in the audience video to generate an identification result;
and generating and executing an operation instruction according to the identification result, wherein the operation instruction is related to at least one of the first video and the audience video.
In the video interaction method, the operation instruction is a play control instruction for controlling the play of the first video.
In the video interaction method, the identifying of the audience images in the audience video to generate an identification result specifically comprises the following steps:
identifying a viewer hand image in said viewer video;
the generating and operating the operation instruction according to the identification result specifically comprises:
and generating and operating the play control instruction according to the hand image of the audience.
In the video interaction method, the identifying of the audience images in the audience video to generate an identification result specifically comprises the following steps:
judging whether a person image exists in the audience video, and generating a judgment result;
the generating and operating the operation instruction according to the identification result specifically comprises:
and when the judgment result shows that no person image exists, generating prompt information, and playing the prompt information in at least one of the first video and the audience video.
In the video interaction method, the judging of whether a person image exists in the audience video to generate a judgment result specifically comprises:
judging whether a person image exists in a designated area of the audience video to generate a judgment result.
In the video interaction method, the played audience video comprises a timer,
the generating and operating the operation instruction according to the identification result specifically comprises:
when the judgment result shows that a person image exists in the designated area, the timer starts timing; and when the judgment result shows that no person image exists in the designated area, the timer stops timing.
In the video interaction method, at least one of the first video and the audience video comprises a social function interaction interface,
the operation instruction is associated with the interactive interface so as to realize the corresponding social function through the interactive interface.
In the video interaction method, the identifying of the audience images in the audience video to generate an identification result specifically comprises the following steps:
identifying viewer actions in said viewer video;
generating and operating an operation instruction according to the identification result, specifically:
and generating and operating a social function operation instruction corresponding to the action according to the audience action.
In the video interaction method, the first video comprises a character dynamic picture;
the identifying of the audience images in the audience video to generate an identification result specifically comprises the following steps:
identifying the audience posture in the audience video in real time to generate audience posture data;
the generating and operating the operation instruction according to the identification result specifically comprises:
and generating evaluation information according to the coincidence degree of the audience posture data and the character dynamic picture, and playing the evaluation information in at least one of the first video and the audience video.
In the video interaction method, the playing of the first video specifically comprises:
playing a first video comprising a character dynamic picture, and synchronously playing a character posture demonstration picture corresponding to the character dynamic picture;
the generating of evaluation information according to the coincidence degree of the audience posture data and the character dynamic picture specifically comprises the following steps:
and generating evaluation information according to the coincidence degree of the audience posture data and the character posture demonstration picture.
In the video interaction method, the playing of the first video specifically comprises the following steps:
playing a first video comprising a character dynamic picture;
identifying a character dynamic picture in the first video to generate first video character posture data;
generating the character posture demonstration picture according to the first video character posture data;
playing the character posture demonstration picture;
the generating of evaluation information according to the coincidence degree of the audience posture data and the character dynamic picture specifically comprises the following steps:
and generating evaluation information according to the coincidence degree of the audience posture data and the first video character posture data.
In the video interaction method, the playing of the first video is live broadcasting or recorded broadcasting.
The present invention also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the above-described video interaction method.
The invention also provides a video interaction device, which comprises a processor, a memory, a display and a camera, wherein the memory is stored with a computer program, and the computer program is executed by the processor to realize the video interaction method.
In the video interaction device, the first video and the audience video are simultaneously displayed on the same display.
With the video interaction method, device, and storage medium of the invention, the method first plays a first video while simultaneously shooting and playing an audience video of the viewer; it then identifies the audience images in the audience video to generate an identification result, and generates and executes an operation instruction, associated with the first video or the audience video, based on that result. Feedback or operation driven by the viewer's self-shot picture is thereby achieved while the video plays: the operation can be associated with the first video to control its playback, or with the audience video. This effectively improves the audience's sense of participation and immersion, and in turn greatly improves the user experience of network video viewers.
Drawings
FIG. 1 is a flowchart illustrating steps of a video interaction method according to the present invention.
Fig. 2 is a schematic interface diagram of a baking teaching video interaction according to embodiment 1 implemented by the method of the present invention.
Fig. 3 is a schematic interface diagram of the reading-accompanying video interaction described in embodiment 3, implemented by the method of the present invention.
Fig. 4 is a schematic interface diagram of the meditation video interaction described in embodiment 4, implemented by the method of the present invention.
Fig. 5 is a schematic interface diagram of the live video interaction described in embodiment 5, implemented by the method of the present invention.
Fig. 6 is a schematic interface diagram of the dance video interaction described in embodiment 6, implemented by the method of the present invention.
Detailed Description
In order to clearly understand the technical contents of the present invention, the following examples are given in detail.
Please refer to fig. 1, which is a flowchart illustrating a video interaction method according to the present invention.
In one embodiment, the video interaction method comprises the following steps:
playing a first video, wherein the first video can be a live video or a recorded video;
shooting and playing the video of the audience;
identifying the audience images in the audience video to generate an identification result;
and generating and executing an operation instruction according to the identification result, wherein the operation instruction is related to at least one of the first video and the audience video.
In a preferred embodiment, the operation command is a play control command for controlling the playing of the first video.
Correspondingly, the identifying the audience image in the audience video to generate an identification result specifically comprises:
identifying a viewer hand image in said viewer video;
the generating and operating the operation instruction according to the identification result specifically comprises:
and generating and operating the play control instruction according to the hand image of the audience.
For example, a two-handed T gesture means pause, a one-handed clockwise circle means play, a counterclockwise circle means play from the beginning, a one-handed swipe to the left means rewind, a one-handed swipe to the right means fast forward, and so on. The correspondence between gestures and play control instructions can be set according to actual needs.
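By way of illustration, such a mapping can be held in a simple lookup table. The sketch below is a minimal Python example; the gesture labels are assumed outputs of an upstream hand-gesture recognizer, and the command names are placeholders rather than anything fixed by the method:

```python
# Minimal sketch: gesture label -> play control command.
# Labels and command names are illustrative assumptions.
GESTURE_COMMANDS = {
    "two_hand_t":  "pause",
    "cw_circle":   "play",
    "ccw_circle":  "restart",       # play from the beginning
    "swipe_left":  "rewind",
    "swipe_right": "fast_forward",
}

def command_for(gesture):
    """Return the play control command for a recognized gesture, or None."""
    return GESTURE_COMMANDS.get(gesture)
```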
In another preferred embodiment, the identifying the audience image in the audience video to generate an identification result specifically includes:
judging whether a person image exists in the audience video, and generating a judgment result;
the generating and operating the operation instruction according to the identification result specifically comprises:
and when the judgment result shows that no person image exists, generating prompt information, and playing the prompt information in at least one of the first video and the audience video.
This embodiment can be applied in scenarios where the viewer should be encouraged to stay in front of the camera the whole time, for example while studying or resting.
In a further preferred embodiment, the determining of whether a person image exists in the audience video to generate a judgment result specifically comprises:
judging whether a person image exists in a designated area of the audience video to generate a judgment result.
Also, a timer may be included in the viewer video being played,
the generating and operating the operation instruction according to the identification result specifically comprises:
when the judgment result shows that a person image exists in the designated area, the timer starts timing; and when the judgment result shows that no person image exists in the designated area, the timer stops timing.
This embodiment can be applied to scenarios such as encouraging the viewer to keep sitting in meditation in front of the camera.
In yet another preferred embodiment, at least one of the first video and the viewer video includes a social function interactive interface,
the operation instruction is associated with the interactive interface so as to realize the corresponding social function through the interactive interface.
The identifying of the audience images in the audience video to generate an identification result specifically comprises the following steps:
identifying viewer actions in said viewer video;
generating and operating an operation instruction according to the identification result, specifically:
and generating and operating a social function operation instruction corresponding to the action according to the audience action.
This embodiment is particularly suitable for viewers performing social operations, such as liking and coin-tipping, on the live video (the first video) through their actions.
In yet another preferred embodiment, the first video comprises a character moving picture;
the identifying of the audience images in the audience video to generate an identification result specifically comprises the following steps:
identifying the audience posture in the audience video in real time to generate audience posture data;
the generating and operating the operation instruction according to the identification result specifically comprises:
and generating evaluation information according to the coincidence degree of the audience posture data and the character dynamic picture, and playing the evaluation information in at least one of the first video and the audience video.
This embodiment is particularly suitable for application scenarios where the viewer imitates movements in the first video, such as learning dance or fitness movements.
In a further preferred embodiment, the playing the first video specifically includes:
playing a first video comprising a character dynamic picture, and synchronously playing a character posture demonstration picture corresponding to the character dynamic picture;
the generating of evaluation information according to the coincidence degree of the audience posture data and the character dynamic picture specifically comprises the following steps:
and generating evaluation information according to the coincidence degree of the audience posture data and the character posture demonstration picture.
In a more preferred embodiment, the playing the first video specifically includes the following steps:
playing a first video comprising a character dynamic picture;
identifying a character dynamic picture in the first video to generate first video character posture data;
generating the character posture demonstration picture according to the first video character posture data;
playing the character posture demonstration picture;
the generating of evaluation information according to the coincidence degree of the audience posture data and the character dynamic picture specifically comprises the following steps:
and generating evaluation information according to the coincidence degree of the audience posture data and the first video character posture data.
The invention also provides a video interaction device and a computer-readable storage medium. The video interaction device comprises a processor, a memory, a display, and a camera. The memory is the computer-readable storage medium and stores a computer program; when the computer program is executed by the processor, the video interaction method of the above embodiments is implemented. In the video interaction device, the first video and the audience video are displayed simultaneously on the same display.
The following embodiments illustrate how the video interaction method of the present invention is implemented in practical applications.
Example 1
Take the example of watching a baking instructional video.
The viewer first plays the baking teaching video (first video) with a device such as a mobile phone or tablet, and the teaching video may be a recorded video.
At the same time, the front camera of the device begins shooting the viewer, and the self-shot video (audience video) is played on the device simultaneously; the interface is shown in fig. 2.
The device then identifies the audience images in the audience video to generate an identification result: specifically, it identifies the viewer's hand image in the audience video, and generates and executes the play control instruction according to that hand image.
For example, when the viewer is recognized as making a T gesture with both hands, playback of the teaching video is paused; a one-handed clockwise circle starts playback; a counterclockwise circle restarts playback from the beginning; a one-handed swipe to the left rewinds; and a one-handed swipe to the right fast-forwards. The correspondence between gestures and play control instructions can be set according to actual needs.
In practice, gesture recognition can be implemented with LibHand or other software with similar functions. LibHand is an open-source library for rendering and recognizing human hand poses, and it provides a simple programming interface for acquiring and analyzing hand images.
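By way of a hedged illustration only: LibHand itself is a C++ library, so the sketch below substitutes the MediaPipe Hands Python API, and a single deliberately simple gesture (an open palm toggling pause/resume) stands in for the richer gesture set above. The player object is a hypothetical stand-in for the device's playback interface.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def palm_is_open(hand):
    """Heuristic: all four fingertips above their middle joints (image y grows downward)."""
    tips, pips = (8, 12, 16, 20), (6, 10, 14, 18)
    lm = hand.landmark
    return all(lm[t].y < lm[p].y for t, p in zip(tips, pips))

def run(player):
    """Watch the front camera and toggle playback when an open palm appears."""
    cap = cv2.VideoCapture(0)          # front camera shooting the viewer
    was_open = False
    with mp_hands.Hands(max_num_hands=2) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            now_open = bool(result.multi_hand_landmarks) and any(
                palm_is_open(h) for h in result.multi_hand_landmarks)
            if now_open and not was_open:   # fire once per gesture, not per frame
                player.toggle_pause()       # hypothetical player control call
            was_open = now_open
    cap.release()
```

A production recognizer would track landmark trajectories over several frames to detect circles and swipes; the one-shot palm check above only illustrates where such a classifier plugs into the loop.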
In this embodiment, the viewer can practice the corresponding operations while watching the baking teaching video, with the device playing the teaching video and the viewer's self-shot video simultaneously. Suppose the viewer starts kneading dough along with the teaching video. Kneading is simple in action but takes a long time, so teaching videos are usually cut, while the actual operation certainly needs more time. The viewer naturally wants to pause the video, but with both hands covered in flour it is very inconvenient to operate the device; instead, the viewer can simply make a T gesture with both hands in front of the camera, and once the T gesture in the self-shot video is recognized, the corresponding pause control instruction is triggered. When the viewer has finished kneading and wants to continue learning the subsequent steps, drawing a circle clockwise with one hand is recognized and associated with the play control instruction, and the video resumes. The method therefore greatly facilitates the viewer's control of the video in this scenario, and is especially suitable for watch-while-practicing teaching videos such as baking, art, calligraphy, and handcrafts.
The above correspondence between gestures and commands is for illustration only; the play control command for each gesture can be set according to actual needs and user habits.
Example 2 (AR gifting)
Also take the example of watching a baking instructional video.
The difference from Embodiment 1 is that in this Embodiment 2 the baking teaching video is generally a live video, and the viewer can interact with the teaching anchor through gestures during the live broadcast.
At least one of the live video (first video) and the self-shot video (audience video) includes a social-function interactive interface that enables a gift-giving social function. On one hand, the interactive interface may have touch buttons like a conventional interface; on the other hand, the corresponding interactive function can also be triggered by the viewer's actions.
In particular, in this embodiment, viewer actions in the audience video are identified, and a social-function operation instruction corresponding to the action is generated and executed, thereby achieving the effect of giving a gift.
For example, while watching the live video, the viewer raises an arm and traces an S shape with one hand; this gesture generates a gift-giving instruction, achieving the interactive effect of giving a gift to the anchor. The correspondence between gestures and operation instructions can be set according to actual needs.
As in Embodiment 1, gesture recognition can in practice be implemented with LibHand or other software with similar functions.
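One practical detail worth sketching is debouncing: a per-frame recognizer reports the same gesture on many consecutive frames, and a gift should be sent once, not thirty times a second. Below is a minimal Python sketch under the assumption that some recognizer emits a gesture label per frame; the 0.5 s hold threshold and the send_gift() call are illustrative placeholders.

```python
import time

class GestureTrigger:
    """Turn a stream of per-frame gesture labels into a single event."""

    def __init__(self, gesture="s_shape", hold_seconds=0.5):
        self.gesture = gesture      # label the upstream recognizer emits
        self.hold = hold_seconds    # gesture must persist this long
        self.since = None
        self.fired = False

    def update(self, label):
        """Feed the latest label; returns True exactly once per held gesture."""
        if label != self.gesture:
            self.since, self.fired = None, False
            return False
        now = time.monotonic()
        if self.since is None:
            self.since = now
        if not self.fired and now - self.since >= self.hold:
            self.fired = True
            return True
        return False

def send_gift(anchor_id):
    print(f"gift sent to anchor {anchor_id}")   # hypothetical platform call

trigger = GestureTrigger()
# per frame: if trigger.update(recognized_label): send_gift("anchor-123")
```

Requiring the gesture to persist across frames also filters out one-frame misdetections.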
Example 3
Take watching a reading-accompanying video as an example.
The reading-accompanying video is a novel form of network video application: it may be a courseware teaching video paired with an accompanying function, or a simple accompanying video combined with animation.
Similar to Embodiment 1, the viewer first starts playing the reading-accompanying video (first video) using a device such as a mobile phone or tablet; the reading-accompanying video may be a recorded or live video.
At the same time, the front camera of the device begins shooting the viewer, and the self-shot video (audience video) is played on the device simultaneously. The interface is shown in fig. 3.
The device then judges whether a person image exists in the audience video, i.e., whether the viewer is visible in the audience video. If so, the reading-accompanying video plays normally; if not, a prompt message is generated and played in at least one of the first video and the audience video.
The prompt message can be voice or text, such as displaying encouragements like "Keep it up!" or "Come back and study!" in the audience video; if the reading-accompanying video is combined with animation, the prompt can also be demonstrated by an animated character in the video. If the reading-accompanying video is a courseware teaching video, playback can be paused when the viewer is judged absent from the audience video, and resumed once a person image is identified again.
Network video courseware, especially exam-preparation videos of all kinds, is popular with more and more students because of its professionalism and flexibility. Learning by watching videos online allows greater autonomy; at the same time, however, autonomous learning often lacks supervision, resulting in low efficiency. While the viewer watches courseware or studies autonomously, this embodiment can detect whether the viewer is at the study position and give corresponding prompts or control the playing of the video courseware, thereby improving learning efficiency.
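A minimal Python sketch of the presence check follows, using OpenCV's bundled Haar face detector as a stand-in for whatever person detector an actual implementation would use; the prompt text and the pause hook are illustrative.

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def viewer_present(frame_bgr):
    """True if at least one face is detected in the viewer frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return len(face_cascade.detectMultiScale(gray, 1.1, 5)) > 0

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    if not viewer_present(frame):
        cv2.putText(frame, "Come back and study!", (30, 60),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 2)
        # a courseware player would also pause the first video here
    cv2.imshow("audience video", frame)
    if cv2.waitKey(1) & 0xFF == 27:   # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```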
Example 4
Take the meditation video as an example.
Because the pace of modern life is fast and work brings great pressure, relaxation practices such as sitting meditation are more and more popular, and meditation videos have emerged to meet this need. Such a video may be just a simple natural scenery picture or a sitting-meditation demonstration picture, paired with soothing music to bring a relaxing effect to viewers.
Similarly to embodiment 3, the viewer first starts playing the meditation video (first video) using a device such as a mobile phone or a tablet, and the meditation video may be a recorded video or a live video.
At the same time, the front camera of the device begins shooting the viewer, and the self-shot video (audience video) is played on the device simultaneously. The interface is shown in fig. 4.
The difference from Embodiment 3 is that this embodiment not only determines whether a person image exists in the audience video, but further determines whether the person image is inside a designated area of the audience video. If not, a corresponding prompt can be given by a method similar to that of Embodiment 3.
The designated area may be defined by a contour, which may be a sitting posture contour, to ensure that the viewer maintains a standard posture while watching the video and meditating for better relaxation.
Further, in this embodiment, a timer may be displayed in the audience video: when a person image is present in the designated area, the timer starts timing; when no person image is present, the timer stops timing. This meets the viewer's need to time the duration of the meditation session.
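The sketch below combines the designated-area check with such a timer. The rectangular sitting area and the Haar face detector are assumptions; the description above suggests a sitting-posture contour, which a rectangle merely approximates.

```python
import time
import cv2

REGION = (200, 100, 440, 380)  # assumed x1, y1, x2, y2 of the sitting area
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def person_in_region(gray):
    """True if any detected face center falls inside REGION."""
    x1, y1, x2, y2 = REGION
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
        cx, cy = x + w // 2, y + h // 2
        if x1 <= cx <= x2 and y1 <= cy <= y2:
            return True
    return False

elapsed, started = 0.0, None
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    present = person_in_region(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    now = time.monotonic()
    if present and started is None:
        started = now                    # person entered: start timing
    elif not present and started is not None:
        elapsed += now - started         # person left: stop timing
        started = None
    total = elapsed + (now - started if started is not None else 0.0)
    cv2.rectangle(frame, REGION[:2], REGION[2:], (0, 255, 0), 2)
    cv2.putText(frame, f"meditation {total:6.1f}s", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
    cv2.imshow("audience video", frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break
cap.release()
cv2.destroyAllWindows()
```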
Example 5
Take live video as an example.
With the development of the mobile internet, personal live streaming has become an occupation welcomed by young people, and live video has gradually become an important social medium.
Similar to the embodiments above, the viewer first opens the live video (first video) using a device such as a mobile phone or tablet.
At the same time, the front camera of the device begins shooting the viewer, and the self-shot video (audience video) is played on the device simultaneously. The interface is shown in fig. 5.
The difference from the above embodiments is that here at least one of the live video (first video) and the self-shot video (audience video) includes a social-function interactive interface, through which social functions such as liking, coin-tipping, favoriting, and sending flowers can be performed. On one hand, the interactive interface can provide touch buttons like a conventional interface; on the other hand, the corresponding interactive functions can also be triggered by viewer actions.
In particular, in this embodiment, viewer actions in the audience video are identified, and a social-function operation instruction corresponding to the action is generated and executed.
For example, limb movements of the viewer may be recognized, such as the viewer making a heart shape with both hands in front of the chest, which can be recognized and associated with the favorite function. A gesture may also be recognized, such as a one-handed thumbs-up, which can be associated with the like function, and so forth.
Limb-movement recognition can be implemented in practice with the open-source human pose recognition project OpenPose or other similar methods. OpenPose is an open-source library built on the Caffe framework and developed with convolutional neural networks and supervised learning. It can estimate human body poses, facial expressions, finger motions, and the like, and such pose estimation techniques apply well to fields such as sports and fitness, motion capture, and 3D fitting.
Gesture recognition can likewise be implemented with LibHand or other software with similar functions, as described in Embodiment 1.
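As a hedged illustration, the sketch below uses MediaPipe Pose as a stand-in for OpenPose, and a deliberately simple rule (both wrists raised above the nose triggers a like) as a stand-in for the heart-shape and thumbs-up actions described above; send_social_action() is a hypothetical client call to the live platform.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def hands_raised(landmarks):
    """True when both wrists are above the nose (image y grows downward)."""
    lm = landmarks.landmark
    nose = lm[mp_pose.PoseLandmark.NOSE]
    return (lm[mp_pose.PoseLandmark.LEFT_WRIST].y < nose.y and
            lm[mp_pose.PoseLandmark.RIGHT_WRIST].y < nose.y)

def send_social_action(action):
    print("sending", action)   # hypothetical call to the live-streaming backend

cap = cv2.VideoCapture(0)
was_raised = False
with mp_pose.Pose() as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        res = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        raised = bool(res.pose_landmarks) and hands_raised(res.pose_landmarks)
        if raised and not was_raised:   # trigger once per raise, not per frame
            send_social_action("like")
        was_raised = raised
cap.release()
```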
Through this embodiment, viewers can perform the corresponding social functions through actions or gestures, which gives them a much stronger sense of participation and improves the user experience.
Example 6
Take a dance video as an example.
Practicing dance by following dance videos is yet another major use of internet video.
Similar to the embodiments above, the viewer first opens the dance video (first video) using a device such as a mobile phone or tablet.
At the same time, the front camera of the device begins shooting the viewer, and the self-shot video (audience video) is played on the device simultaneously. The interface is shown in fig. 6.
The difference from the above embodiments is that the dance video includes a moving picture of a character. The viewer's posture in the audience video is recognized in real time to generate viewer posture data; evaluation information is then generated according to the degree of coincidence between the viewer posture data and the character's moving picture, and played in at least one of the first video and the audience video. The evaluation information may be a score or take another form of evaluation.
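The description does not fix a particular metric for the degree of coincidence. One simple, common choice, sketched below in Python, compares the directions of corresponding limb segments between the viewer's keypoints and the video character's keypoints; the 17-point COCO keypoint layout is an assumption.

```python
import numpy as np

# Limb segments as keypoint index pairs (assumed 17-point COCO layout:
# 5/6 shoulders, 7/8 elbows, 9/10 wrists, 11/12 hips, 13/14 knees, 15/16 ankles).
LIMBS = [(5, 7), (7, 9), (6, 8), (8, 10),         # arms
         (11, 13), (13, 15), (12, 14), (14, 16)]  # legs

def coincidence(viewer_kps, demo_kps):
    """Mean cosine similarity of corresponding limb directions, mapped to [0, 1].

    viewer_kps, demo_kps: (17, 2) float arrays of (x, y) keypoints.
    """
    sims = []
    for a, b in LIMBS:
        v = viewer_kps[b] - viewer_kps[a]
        d = demo_kps[b] - demo_kps[a]
        norm = np.linalg.norm(v) * np.linalg.norm(d)
        if norm > 0:
            sims.append(np.dot(v, d) / norm)
    return float((np.mean(sims) + 1) / 2) if sims else 0.0
```

A score near 1 means the limb directions closely match; thresholds on this score can then drive the evaluation information played back to the viewer.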
Further, in this embodiment, the playing the first video specifically includes:
playing a first video including a character's moving picture while synchronously playing a character posture demonstration picture corresponding to it; that is, the first video contains the character's dance picture together with a stick-figure-style posture demonstration picture, which shows the dance movements more clearly and helps viewers learn. Evaluation information can then be generated according to the degree of coincidence between the viewer posture data and the character posture demonstration picture.
Furthermore, especially when the dance video is a live video, the posture demonstration picture can be generated in real time; the specific process comprises the following steps:
generating first video character pose data by identifying character dynamic pictures in the first video; generating a character posture demonstration picture according to the first video character posture data;
accordingly, evaluation information can be generated according to the degree of coincidence between the viewer posture data and the first-video character posture data.
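Rendering the stick-figure demonstration picture from such pose data can be as simple as drawing skeleton edges, as in the sketch below (the same assumed 17-point COCO layout; the canvas size is arbitrary):

```python
import numpy as np
import cv2

SKELETON = [(5, 6), (5, 7), (7, 9), (6, 8), (8, 10),   # shoulders and arms
            (5, 11), (6, 12), (11, 12),                 # torso
            (11, 13), (13, 15), (12, 14), (14, 16)]     # legs

def draw_stick_figure(kps, size=(480, 360)):
    """Render 17 (x, y) keypoints as a stick figure on a white canvas."""
    canvas = np.full((size[1], size[0], 3), 255, np.uint8)
    pts = [(int(x), int(y)) for x, y in kps]
    for a, b in SKELETON:
        cv2.line(canvas, pts[a], pts[b], (0, 0, 0), 3)
    for p in pts:
        cv2.circle(canvas, p, 4, (0, 0, 255), -1)
    return canvas
```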
In practice, recognizing movements in the audience video and in the dance video can use the same method, i.e., the open-source human pose recognition project OpenPose or other similar approaches.
Through this embodiment, viewers can on the one hand learn the dance movements in the video more intuitively, and on the other hand obtain an objective evaluation of their own movements, which is very convenient for dance learners. The embodiment can also be applied to scenarios such as learning fitness and gymnastics movements.
Example 7
The invention can also be applied to multi-person interaction scenarios. Multi-person interaction means that three or more users each shoot a self-shot video (audience video) that serves as a live video (first video) for the other users; that is, each viewer sees his or her own self-shot video and, at the same time, the live videos of the other participants. In such a scenario, effects such as a watch party or a try-not-to-laugh challenge can be realized.
Taking the watch party as an example, multiple viewers watch one recorded video (a movie, etc.) or live video at the same time; during this process, they can communicate with each other by video in real time and share the viewing experience instantly, achieving the experience of watching the video together with all participants in the multi-person interactive scene.
The try-not-to-laugh challenge is similar to the watch party: participants can see the video feeds of the other users in the interactive scene, and once all users are ready (each self-shot video contains a face image), the challenge can begin. If facial-expression recognition then detects that a user has laughed, that user loses the challenge and the other participants win.
Recognition of the user's facial expression in this embodiment can likewise be implemented with the open-source human pose recognition project OpenPose or other similar methods, as described in Embodiments 5 and 6.
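As a hedged sketch, OpenCV's bundled smile cascade stands in below for the facial-expression recognizer; detecting a smile inside a detected face marks the participant as having laughed.

```python
import cv2

face_c = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
smile_c = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_smile.xml")

def is_laughing(frame_bgr):
    """True if a smile is detected inside any detected face region."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_c.detectMultiScale(gray, 1.1, 5):
        roi = gray[y:y + h, x:x + w]
        # conservative parameters so a brief mouth movement does not count
        if len(smile_c.detectMultiScale(roi, 1.7, 22)) > 0:
            return True
    return False
```

Running this check per frame on each participant's self-shot stream is enough to decide who laughed first.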
With the video interaction method, device, and storage medium of the invention, the method first plays a first video while simultaneously shooting and playing an audience video of the viewer; it then identifies the audience images in the audience video to generate an identification result, and generates and executes an operation instruction, associated with the first video or the audience video, based on that result. Feedback or operation driven by the viewer's self-shot picture is thereby achieved while the video plays: the operation can be associated with the first video to control its playback, or with the audience video. This effectively improves the audience's sense of participation and immersion, and in turn greatly improves the user experience of network video viewers.
In this specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (14)

1. A video interaction method, comprising the steps of:
playing the first video;
shooting and playing the video of the audience;
identifying the audience images in the audience video to generate an identification result;
and generating and executing an operation instruction according to the identification result, wherein the operation instruction is related to at least one of the first video and the audience video.
2. The video interaction method of claim 1,
the operation instruction is a play control instruction for controlling the first video to be played.
3. The video interaction method of claim 2,
the identifying of the audience images in the audience video to generate an identification result specifically comprises the following steps:
identifying a viewer hand image in said viewer video;
the generating and operating the operation instruction according to the identification result specifically comprises:
and generating and operating the play control instruction according to the hand image of the audience.
4. The video interaction method of claim 1,
the identifying of the audience images in the audience video to generate an identification result specifically comprises the following steps:
judging whether a person image exists in the audience video, and generating a judgment result;
the generating and operating the operation instruction according to the identification result specifically comprises:
and when the judgment result shows that no person image exists, generating prompt information, and playing the prompt information in at least one of the first video and the audience video.
5. The video interaction method of claim 4,
the judging of whether a person image exists in the audience video to generate a judgment result specifically comprises:
judging whether a person image exists in a designated area of the audience video to generate a judgment result.
6. The video interaction method of claim 5,
a timer is included in the viewer's video that is played,
the generating and operating the operation instruction according to the identification result specifically comprises:
when the judgment result shows that a person image exists in the designated area, the timer starts timing; and when the judgment result shows that no person image exists in the designated area, the timer stops timing.
7. The video interaction method of claim 1,
at least one of the first video and the viewer video includes a social function interactive interface,
the operation instruction is associated with the interactive interface so as to realize the corresponding social function through the interactive interface.
8. The video interaction method of claim 7,
the identifying of the audience images in the audience video to generate an identification result specifically comprises the following steps:
identifying viewer actions in said viewer video;
generating and operating an operation instruction according to the identification result, specifically:
and generating and operating a social function operation instruction corresponding to the action according to the audience action.
9. The video interaction method of claim 1,
the first video comprises a character dynamic picture;
the identifying of the audience images in the audience video to generate an identification result specifically comprises the following steps:
identifying the audience posture in the audience video in real time to generate audience posture data;
the generating and operating the operation instruction according to the identification result specifically comprises:
and generating evaluation information according to the coincidence degree of the audience posture data and the character dynamic picture, and playing the evaluation information in at least one of the first video and the audience video.
10. The video interaction method of claim 9,
the playing of the first video specifically includes:
playing a first video comprising a character dynamic picture, and synchronously playing a character posture demonstration picture corresponding to the character dynamic picture;
the generating of evaluation information according to the coincidence degree of the audience posture data and the character dynamic picture specifically comprises the following steps:
and generating evaluation information according to the coincidence degree of the audience posture data and the character posture demonstration picture.
11. The video interaction method of claim 10,
the playing of the first video specifically comprises the following steps:
playing a first video comprising a character dynamic picture;
identifying a character dynamic picture in the first video to generate first video character posture data;
generating the character posture demonstration picture according to the first video character posture data;
playing the character posture demonstration picture;
the generating of evaluation information according to the coincidence degree of the audience posture data and the character dynamic picture specifically comprises the following steps:
and generating evaluation information according to the coincidence degree of the audience posture data and the first video character posture data.
12. The video interaction method of claim 1,
the first video playing is live broadcasting or recorded broadcasting.
13. A video interaction device comprising a processor, a memory, a display and a camera, said memory having stored thereon a computer program, wherein said computer program, when executed by said processor, implements the video interaction method of any one of claims 1 to 12.
14. The video interaction device of claim 13, wherein the first video and the viewer video are simultaneously displayed on a same display.
CN201911251512.4A 2019-12-09 2019-12-09 Video interaction method and device and storage medium Pending CN113038280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911251512.4A CN113038280A (en) 2019-12-09 2019-12-09 Video interaction method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911251512.4A CN113038280A (en) 2019-12-09 2019-12-09 Video interaction method and device and storage medium

Publications (1)

Publication Number Publication Date
CN113038280A true CN113038280A (en) 2021-06-25

Family

ID=76451996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911251512.4A Pending CN113038280A (en) 2019-12-09 2019-12-09 Video interaction method and device and storage medium

Country Status (1)

Country Link
CN (1) CN113038280A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363685A (en) * 2021-12-20 2022-04-15 咪咕文化科技有限公司 Video interaction method and device, computing equipment and computer storage medium


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150208125A1 (en) * 2014-01-22 2015-07-23 Lenovo (Singapore) Pte. Ltd. Automated video content display control using eye detection
CN207458369U (en) * 2017-05-19 2018-06-05 深圳市奥拓电子股份有限公司 A kind of intelligent training equipment in gymnasium
CN109309878A (en) * 2017-07-28 2019-02-05 Tcl集团股份有限公司 The generation method and device of barrage
CN107551521A (en) * 2017-08-17 2018-01-09 广州视源电子科技股份有限公司 Exercise guide method and device, smart machine and storage medium
CN107730529A (en) * 2017-10-10 2018-02-23 上海魔迅信息科技有限公司 A kind of video actions methods of marking and system
CN109996148A (en) * 2017-12-29 2019-07-09 青岛有屋科技有限公司 A kind of intelligent kitchen multimedia play system
CN109064358A (en) * 2018-06-28 2018-12-21 合肥右传媒科技有限公司 A kind of long-range line education systems based on wireless communication
CN109151356A (en) * 2018-09-05 2019-01-04 传线网络科技(上海)有限公司 video recording method and device
CN110519617A (en) * 2019-07-18 2019-11-29 平安科技(深圳)有限公司 Video comments processing method, device, computer equipment and storage medium
CN110493618A (en) * 2019-08-02 2019-11-22 广州长嘉电子有限公司 Android method for intelligently controlling televisions and system based on USB3.0 interface


Similar Documents

Publication Publication Date Title
TWI778477B (en) Interaction methods, apparatuses thereof, electronic devices and computer readable storage media
US9244533B2 (en) Camera navigation for presentations
Bobick et al. The KidsRoom: A perceptually-based interactive and immersive story environment
Maynes-Aminzade et al. Techniques for interactive audience participation
US9381426B1 (en) Semi-automated digital puppetry control
US10354256B1 (en) Avatar based customer service interface with human support agent
US8714982B2 (en) System and method for teaching social skills, social thinking, and social awareness
US20080165195A1 (en) Method, apparatus, and software for animated self-portraits
EP3696648A1 (en) Interaction method and device
CN112289116B (en) Court rehearsal system under virtual reality environment
CN113687720A (en) Multi-person online virtual reality education system and use method thereof
CN114125529A (en) Method, equipment and storage medium for generating and demonstrating video
CN107037946B (en) Digital user interface providing drawing guidance to guide a user
US20140118522A1 (en) Dance learning system using a computer
CN113038280A (en) Video interaction method and device and storage medium
US20240054736A1 (en) Adjustable Immersion Level for Content
TWI329289B (en)
Pell Envisioning holograms: design breakthrough experiences for mixed reality
CN114900738A (en) Film viewing interaction method and device and computer readable storage medium
Kuramoto et al. Augmented practice mirror: A self-learning support system of physical motion with real-time comparison to teacher’s model
Kang et al. One-Man Movie: A System to Assist Actor Recording in a Virtual Studio
US11652654B2 (en) Systems and methods to cooperatively perform virtual actions
van Welbergen et al. Presenting in virtual worlds: Towards an architecture for a 3D presenter explaining 2D-presented information
van Welbergen et al. Presenting in virtual worlds: An architecture for a 3d anthropomorphic presenter
US20230080799A1 (en) In-Cinema And/Or Online Edutainment System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination