CN113840177A - Live broadcast interaction method and device, storage medium and electronic equipment - Google Patents

Live broadcast interaction method and device, storage medium and electronic equipment

Info

Publication number
CN113840177A
Authority
CN
China
Prior art keywords
gesture
interactive
recognition result
live
video stream
Prior art date
Legal status
Pending
Application number
CN202111107176.3A
Other languages
Chinese (zh)
Inventor
郭昀霖
Current Assignee
Guangzhou Boguan Information Technology Co Ltd
Original Assignee
Guangzhou Boguan Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Boguan Information Technology Co Ltd filed Critical Guangzhou Boguan Information Technology Co Ltd
Priority to CN202111107176.3A
Publication of CN113840177A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60Network streaming of media packets
    • H04L65/75Media network packet handling

Abstract

The disclosure provides a live broadcast interaction method and device, a storage medium and electronic equipment, and relates to the technical field of image processing. The live broadcast interaction method comprises the following steps: detecting a gesture outline in the collected live video stream; recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model that takes a pre-configured gesture skeleton data set as a prior condition, and determining a gesture recognition result corresponding to the gesture outline; determining, based on semantic analysis, a trigger instruction of the interactive function corresponding to the gesture recognition result; and sending the trigger instruction and the live video stream to the audience terminal through the live broadcast server, so that the audience terminal executes the corresponding interactive function according to the trigger instruction when playing the live video stream. By taking the pre-configured gesture skeleton data set as a prior condition and recognizing gestures in the live video stream with a trained machine learning model, the disclosure improves the accuracy and efficiency of gesture recognition in live broadcast interaction.

Description

Live broadcast interaction method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a live broadcast interaction method and apparatus, a storage medium, and an electronic device.
Background
With the development of the internet and the rise of online video and audio platforms, live webcasting has become a popular form of instant video and audio entertainment.
During a webcast, audiences can watch the anchor's live content through audience terminals and interact with the anchor by sending text or giving virtual gifts. After receiving a virtual gift from an audience member, the anchor usually expresses thanks and interacts with the audience verbally.
Disclosure of Invention
The present disclosure provides a live broadcast interaction method, device, storage medium and electronic device, so as to improve the accuracy and efficiency of gesture recognition in live broadcast interaction.
According to a first aspect of the present disclosure, a live broadcast interaction method is provided, which is applied to an anchor terminal, and the method includes:
detecting a gesture outline according to the collected live video stream;
recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model taking a pre-configured gesture skeleton data set as a prior condition, and determining a gesture recognition result corresponding to the gesture outline;
determining a triggering instruction of the interaction function corresponding to the gesture recognition result based on semantic analysis;
and sending the trigger instruction of the interaction function corresponding to the gesture recognition result and the live video stream to the audience terminal through the live broadcast server, so that the audience terminal executes the corresponding interaction function according to the trigger instruction when playing the live video stream.
According to a second aspect of the present disclosure, there is also provided a live broadcast interaction method applied to a viewer terminal, the method including:
receiving a trigger instruction of an interactive function corresponding to a gesture recognition result and a live video stream, both sent by an anchor terminal through a live broadcast server; the trigger instruction is determined by the anchor terminal by detecting a gesture outline in the collected live video stream, recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model that takes a pre-configured gesture skeleton data set as a prior condition to determine a gesture recognition result corresponding to the gesture outline, and performing semantic analysis on the gesture recognition result;
and when the live video stream is played, executing the corresponding interactive function according to the trigger instruction of the interactive function corresponding to the gesture recognition result.
According to a third aspect of the present disclosure, there is also provided a live broadcast interaction apparatus, applied to an anchor terminal, the apparatus including:
a detection module configured to detect a gesture profile from a captured live video stream;
the recognition module is configured to recognize the detected gesture outline through a pre-trained gesture recognition machine learning model with a pre-configured gesture skeleton data set as a prior condition, and determine a gesture recognition result corresponding to the gesture outline;
the determining module is configured to determine, based on semantic analysis, a trigger instruction of the interactive function corresponding to the gesture recognition result;
and the sending module is configured to send the trigger instruction of the interaction function corresponding to the gesture recognition result and the live video stream to the audience terminal through the live broadcast server so that the audience terminal executes the corresponding interaction function according to the trigger instruction when playing the live video stream.
According to a fourth aspect of the present disclosure, there is also provided a live broadcast interactive device applied to a viewer terminal, the device including:
a receiving module configured to receive a trigger instruction of an interactive function corresponding to a gesture recognition result and a live video stream, both sent by the anchor terminal through a live broadcast server, the trigger instruction being determined by the anchor terminal by detecting a gesture outline in the collected live video stream, recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model that takes a pre-configured gesture skeleton data set as a prior condition, and performing semantic analysis on the resulting gesture recognition result;
and a playing module configured to execute the corresponding interactive function according to the trigger instruction of the interactive function corresponding to the gesture recognition result when the live video stream is played.
According to a fifth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the live interaction method of the above-described embodiments.
According to a sixth aspect of the present disclosure, there is provided an electronic device comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the live interaction method of the above embodiments via execution of the executable instructions.
The technical scheme of the disclosure has the following beneficial effects:
the live broadcast interaction implementation scheme can be used for recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model with a pre-configured gesture skeleton data set as a prior condition, determining a gesture recognition result corresponding to the gesture outline, and on the basis of determining the interaction gesture, recognizing the interaction gesture type by combining pose information, so that the accuracy and efficiency of gesture recognition in live broadcast interaction are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is apparent that the drawings in the following description are only some embodiments of the present disclosure, and that other drawings can be obtained from those drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a schematic architecture diagram of a live interactive system in the present exemplary embodiment;
FIG. 2 shows a flow diagram of a live interaction method in the present exemplary embodiment;
FIG. 3 is a schematic flow chart illustrating a process for determining a gesture recognition result corresponding to a determined gesture profile in the exemplary embodiment;
FIG. 4 is a diagram illustrating a live interface in the exemplary embodiment;
FIG. 5 shows a flow diagram of a live interaction method in the present exemplary embodiment;
fig. 6 is a schematic structural diagram of a live interaction device in the present exemplary embodiment;
fig. 7 is a schematic structural diagram of a live interaction device in the present exemplary embodiment;
fig. 8 shows a schematic structural diagram of an electronic device in the present exemplary embodiment.
Detailed Description
Exemplary embodiments will now be described more fully with reference to the accompanying drawings. The exemplary embodiments, however, may be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other apparatus, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In the related art, webcast technology allows an anchor terminal and viewer terminals to interact through interactive special effects. During a webcast, a viewer terminal can interact with the anchor terminal through text (such as bullet-screen comments) or by presenting virtual gifts; after receiving interaction information from a viewer terminal, the anchor can make an interactive gesture, such as a heart gesture or a fist holding gesture, and the viewer terminal then displays the interactive special effect corresponding to that gesture.
However, in the interaction schemes provided in the related art, the recognition of interactive gestures is generally time-consuming and inefficient, and the display of the interactive special effect may be delayed, which degrades the user's viewing experience.
In view of the foregoing problems, exemplary embodiments of the present disclosure provide a live broadcast interaction scheme directed at live broadcast service scenarios. Application scenarios of the scheme include, but are not limited to, the following. During a webcast, an anchor terminal and at least one viewer terminal are each connected through a live broadcast server. When the anchor terminal detects that a trigger condition for gesture recognition is satisfied, it can collect its live video stream and recognize the anchor's gesture in the stream to obtain a gesture recognition result. If the anchor's gesture is determined to be an interactive gesture, a trigger instruction of the interactive function corresponding to the gesture recognition result can be sent to the live broadcast server, which forwards the trigger instruction and the live video stream to the viewer terminal; when playing the live video stream, the viewer terminal can execute the corresponding interactive function according to the trigger instruction. The trigger condition for gesture recognition may be interaction information received by the anchor terminal, such as a virtual article presenting message generated after the viewer terminal detects that a user has presented a virtual gift, which is not limited here. It can be understood that the viewer terminal sends such live interactive messages to the anchor terminal through the live broadcast server.
In order to implement the above live interactive scheme, exemplary embodiments of the present disclosure provide a live interactive system. Fig. 1 shows a schematic architecture diagram of the live interactive system. As shown in fig. 1, the live interactive system 100 may include an anchor terminal 110, a viewer terminal 120, and a live server 130. The live server 130 is a background server deployed by a service provider (such as a live application platform, a video application platform, or another third-party application platform). The anchor terminal 110 may be a terminal device used by an anchor initiating a webcast, and the viewer terminal 120 may be a terminal device used by a viewer watching the webcast; the terminal device may be a smart phone, a personal computer, a tablet computer, or the like. The live server 130 may establish a connection with the anchor terminal 110 and the viewer terminal 120, respectively, through a network to implement webcasting.
It should be understood that the live server 130 may be a single server or a cluster formed by multiple servers, and the present disclosure is not limited to the specific architecture of the live server 130.
The following explains the live broadcast interaction scheme from the perspective of the anchor terminal. Fig. 2 shows an exemplary flow of a method for performing live interaction by the anchor terminal, comprising:
step S201, detecting a gesture outline according to the collected live video stream.
In the embodiment of the disclosure, the anchor terminal can respond to the anchor's live broadcast initiating operation by starting the camera to collect live picture content in real time, generating a live video stream, and sending it to the live broadcast server, which forwards it to the viewer terminals. After the anchor terminal receives a live interactive message sent by a viewer terminal through the live broadcast server, it can detect a gesture outline in the collected live video stream to determine whether the anchor has made an interactive gesture.
Step S202, recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model taking a pre-configured gesture skeleton data set as a prior condition, and determining a gesture recognition result corresponding to the gesture outline.
In the embodiment of the present disclosure, the gesture recognition machine learning model is used to recognize gesture types to determine whether a user makes an interactive gesture, the gesture skeleton data set includes a preconfigured relationship data table, the relationship data table reflects a corresponding relationship between the interactive gesture and gesture pose information, and a specific interactive gesture type can be determined by using the preconfigured gesture skeleton data set to obtain a gesture recognition result.
Step S203, determining, based on semantic analysis, the trigger instruction of the interactive function corresponding to the gesture recognition result.
In the embodiment of the disclosure, semantic analysis may be performed on the gesture recognition result, and a trigger instruction of the interactive function corresponding to the gesture recognition result is determined, so as to trigger execution of the interactive function at the audience terminal.
And step S204, sending the trigger instruction of the interaction function corresponding to the gesture recognition result and the live video stream to the audience terminal through the live broadcast server, so that the audience terminal executes the corresponding interaction function according to the trigger instruction when playing the live video stream.
In summary, the live broadcast interaction method provided by the embodiment of the present disclosure can detect a gesture profile according to a collected live broadcast video stream, and perform gesture recognition by using the gesture profile, so as to reduce data processing amount and improve gesture recognition efficiency; and recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model taking a pre-configured gesture skeleton data set as a prior condition, and determining a gesture recognition result corresponding to the gesture outline, wherein the interactive gesture type can be recognized by combining pose information on the basis of determining the interactive gesture, so that the accuracy and efficiency of gesture recognition in live interaction are improved.
In step S201, the anchor terminal may detect a gesture profile according to the captured live video stream.
In the embodiment of the disclosure, to avoid wasting the anchor terminal's data processing resources, the anchor terminal may, after live broadcasting starts, collect its live video stream and enable the gesture recognition function only when it detects that a trigger condition for gesture recognition is satisfied. The trigger condition may be a live interactive message sent by a viewer terminal through the live broadcast server: for example, a virtual article presenting message generated after the viewer terminal detects the user's virtual gift presenting operation, or a text interactive message on the live content detected by the viewer terminal, which is not limited in the embodiments of the present disclosure.
In an alternative embodiment, the process of the viewer terminal sending a live interactive message through the live broadcast server may include: after detecting a virtual gift presenting operation, the viewer terminal generates a virtual article presenting message and sends it to the live broadcast server; the live broadcast server forwards the message to the anchor terminal; and after receiving it, the anchor terminal detects a gesture outline in the live video stream collected in real time.
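For concreteness, the sketch below shows one plausible shape of such a virtual article presenting message as relayed by the live broadcast server; all field names and values are illustrative assumptions, not taken from the disclosure.

```python
# Hypothetical payload of a live interactive message; every field name and
# value here is illustrative rather than specified by the disclosure.
virtual_article_presenting_message = {
    "type": "virtual_gift",          # kind of live interactive message
    "room_id": "live_room_42",       # which live broadcast room it belongs to
    "sender_id": "viewer_001",       # viewer terminal user that sent the gift
    "gift_id": "rose",               # which virtual gift was presented
    "timestamp": 1632300000,         # when the presenting operation occurred
}
# Flow: viewer terminal -> live broadcast server -> anchor terminal.
# Receiving this message is the trigger condition for gesture recognition.
```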
In an alternative embodiment, the anchor terminal may detect a gesture profile from the captured live video stream. Specifically, the method may include: preprocessing the collected live video stream to obtain a human body contour binary image; and detecting the gesture outline in the human body outline binary image by using the gesture outline model.
The process of preprocessing the collected live video stream to obtain the human body contour binary image may include: acquiring each frame of live picture image in the live video stream; and preprocessing each frame of live picture image, in the chronological order of the frames in the live video stream, to obtain the human body contour binary image.
In an alternative embodiment, the process of preprocessing a live picture image to obtain the human body contour binary image may include: inputting the live picture image into a skin color detection model, so that the skin color detection model identifies the human body area in the live picture image, deletes the background area outside the human body area, and outputs a human body area image. The skin color detection model may be a threshold-based skin color detection model in the YCbCr color space, or a skin detection model combining the YCbCr color space with an elliptical skin model, among others.
The process of detecting the gesture outline in the human body contour binary image by using the gesture outline model may include: performing hand recognition on the human body contour binary image by using the gesture outline model to obtain a hand region image, and extracting the contour of the hand region image to obtain the gesture outline.
In an alternative embodiment, the process of using the gesture contour model to perform hand recognition on the binary image of the human body contour to obtain the hand region image may include: and inputting the human body contour binary image into a hand recognition model, so that the hand recognition model recognizes and divides a hand region in the human body contour binary image, and outputting a hand region image. The hand recognition model may be a BlazePalm model or a neural network model.
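As a minimal sketch of this preprocessing-and-detection pipeline, the Python/OpenCV code below thresholds skin tones in the YCbCr color space to obtain a binary image and then keeps the largest contour as the gesture contour candidate. The specific threshold values, the morphological cleanup, and treating the largest contour as the hand are all assumptions made for illustration; the disclosure's gesture outline model (e.g., BlazePalm) is not reproduced here.

```python
import cv2
import numpy as np

def detect_gesture_contour(frame):
    """Illustrative pipeline: YCbCr skin threshold -> binary image -> contour."""
    # OpenCV stores YCbCr channels in Y, Cr, Cb order.
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    # Commonly cited Cr/Cb skin ranges; the actual thresholds are assumptions.
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    binary = cv2.inRange(ycrcb, lower, upper)           # contour binary image
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    # Extract contours and keep the largest one as the hand/gesture candidate,
    # standing in for the gesture outline model of the embodiment.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea) if contours else None
```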
In step S202, the anchor terminal may identify the detected gesture contour through a pre-trained gesture recognition machine learning model with a pre-configured gesture skeleton data set as a priori condition, and determine a gesture recognition result corresponding to the gesture contour.
In an optional implementation manner, after the anchor terminal detects a gesture contour, the anchor terminal may identify the detected gesture contour through a pre-trained gesture recognition machine learning model with a pre-configured gesture skeleton data set as a priori condition, and determine a gesture recognition result corresponding to the gesture contour. Specifically, as shown in fig. 3, the process of determining the gesture recognition result corresponding to the gesture contour according to the gesture contour may include:
step S301, recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model, and obtaining a gesture outline type recognition result corresponding to the gesture outline.
In the embodiment of the present disclosure, the gesture recognition machine learning model may be trained in advance. Specifically, the training process may include: obtaining a sample gesture image set; dividing the sample gesture image set into a training set and a verification set; iteratively training the gesture recognition machine learning model to be trained with the sample gesture images in the training set until the model converges; and optimizing the model with the verification set to obtain the trained gesture recognition machine learning model. The sample gesture image set may be composed of multiple sample images of various gestures, and may be built from an existing gesture image data set or by collecting a large number of gesture images. The gesture recognition machine learning model may be a deep neural network model, a convolutional neural network model, a recurrent neural network, and the like, which is not limited in the embodiments of the present disclosure.
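The training procedure described above could be realized along the following lines; this PyTorch sketch assumes a classification-style model and a dataset yielding (image tensor, label) pairs, with the 80/20 split, optimizer, and hyperparameters chosen purely for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split

def train_gesture_model(model: nn.Module, dataset, epochs: int = 10):
    """Sketch of the train/verify split and iterative training described above."""
    train_len = int(0.8 * len(dataset))
    train_set, val_set = random_split(dataset, [train_len, len(dataset) - train_len])
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=32)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        # Verification pass, used to monitor convergence and tune the model.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                preds = model(images).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.numel()
        print(f"epoch {epoch}: validation accuracy {correct / total:.3f}")
    return model
```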
In step S301, the anchor terminal may recognize the detected gesture outline through a pre-trained gesture recognition machine learning model, and obtain a gesture outline type recognition result corresponding to the gesture outline.
In an alternative implementation, the process of the anchor terminal recognizing the detected gesture outline through the pre-trained gesture recognition machine learning model to obtain a gesture outline type recognition result may include: after detecting the gesture outline in the collected live video stream, the anchor terminal acquires the gesture outline corresponding to the current frame of the live picture image, calls the pre-trained gesture recognition machine learning model, and recognizes the gesture outline to obtain a gesture outline type recognition result. The result can indicate that the hand motion corresponding to the gesture outline is either an interactive gesture or a non-interactive gesture. An interactive gesture may be a heart gesture, a kiss gesture, or a fist holding gesture made by the anchor. The current frame live picture image is the frame in the collected live video stream in which the gesture outline was detected.
It should be noted that, in the embodiment of the present disclosure, if the anchor terminal determines that the gesture contour type recognition result indicates that the hand motion corresponding to the gesture contour is a non-interactive gesture, it may be determined that the anchor does not make an interactive gesture, the current flow is ended, and gesture contour type recognition is performed on the gesture contour corresponding to the next frame of live broadcast picture image; if the anchor terminal determines that the gesture outline type recognition result indicates that the hand action corresponding to the gesture outline is the interactive gesture, the step S302 is continuously executed, so that the waste of data processing resources of the anchor terminal can be reduced.
Step S302, determining gesture pose information to be recognized of the gesture outline type recognition result.
In the embodiment of the disclosure, the gesture pose information is the position information and angle information of hand feature points in a reference coordinate system. A hand feature point may be a hand joint point; the reference coordinate system may be a three-dimensional coordinate system whose origin is a target position of the hand, such as the wrist; the position information may be the position coordinates of a hand feature point in the reference coordinate system; and the angle information may be the angle of a hand feature line in the reference coordinate system, where a hand feature line is the line connecting a hand feature point with the coordinate origin.
In an alternative implementation, the anchor terminal may determine the gesture pose information to be recognized for the gesture outline type recognition result. Specifically, the process may include: detecting the hand feature points and hand feature lines in the hand area image corresponding to the gesture outline type recognition result; determining the position information of each hand feature point in the reference coordinate system and the angle information of each hand feature line in the reference coordinate system; and combining the position information of the hand feature points with the angle information of the hand feature lines to obtain the gesture pose information to be recognized. The reference coordinate system is a three-dimensional coordinate system; in the gesture pose information F(x, y, z | w), x, y, and z are the position coordinates of the hand on the x-, y-, and z-axes of the reference coordinate system, and w is the angle information of the hand feature line in the reference coordinate system.
The process of detecting the hand feature point and the hand feature line in the hand region image corresponding to the gesture contour type recognition result may include: determining a gesture outline corresponding to the gesture outline type recognition result, acquiring a hand area image associated with the gesture outline corresponding to the gesture outline type recognition result, recognizing the hand area image to determine hand joint points, establishing a three-dimensional coordinate system by taking the wrist position as the coordinate origin, and connecting the hand joint points and the coordinate origin to obtain a hand characteristic line.
The process of obtaining the gesture pose information to be recognized based on the combination of the position information of the hand feature points and the angle information of the hand feature lines may include: the anchor terminal can combine the position information of each hand joint point and the angle information of the corresponding hand characteristic line to obtain the pose information of each hand joint point, and further, the anchor terminal can acquire the pose information of each hand joint point and determine the set of the pose information of each hand joint point as the gesture pose information to be recognized.
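A minimal sketch of how the gesture pose information F(x, y, z | w) might be computed from 3D hand joint points: the wrist (assumed here to be landmark index 0) is taken as the coordinate origin, and w is read as the angle of each hand feature line against the reference z-axis. The landmark indexing and the choice of reference axis are assumptions, not specified by the disclosure.

```python
import numpy as np

def gesture_pose_info(joints_xyz):
    """Compute F(x, y, z | w) for each hand joint relative to the wrist.

    joints_xyz: (N, 3) array of 3D hand joint positions; index 0 is assumed
    to be the wrist, which serves as the coordinate origin.
    """
    joints = np.asarray(joints_xyz, dtype=float)
    wrist = joints[0]
    pose = []
    for joint in joints[1:]:
        x, y, z = joint - wrist          # position in the wrist-origin frame
        line = joint - wrist             # hand feature line: joint -> origin
        # Angle of the feature line against the reference z-axis, as one
        # possible reading of the angle information w.
        w = np.degrees(np.arccos(line[2] / (np.linalg.norm(line) + 1e-9)))
        pose.append((x, y, z, w))
    return pose
```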
Step S303, a relation data table configured in advance in the gesture skeleton data set is obtained.
In the embodiment of the present disclosure, the gesture skeleton data set is a set of pre-configured interactive gestures and corresponding pose information, the corresponding relationship between the interactive gestures and the gesture pose information is stored by a relationship data table, and the relationship data table reflects the corresponding relationship between the interactive gestures and the gesture pose information.
The process of pre-configuring the gesture skeleton data set may include: acquiring multiple sample interactive gesture images; determining the gesture pose information of the interactive gesture in each sample image; determining a preset deviation value for the gesture pose information of each interactive gesture; and establishing a relation data table of interactive gestures, gesture pose information and preset deviation values, obtaining a gesture skeleton data set {C1(F), C2(F), C3(F), …, Cn(F)}, where C denotes a type of interactive gesture, C1(F) denotes the pose information corresponding to the first type of interactive gesture, and C2(F) denotes the pose information corresponding to the second type of interactive gesture. The preset deviation value can be determined based on actual needs, which is not limited in the embodiments of the disclosure. For example, the first type of interactive gesture may be a heart gesture and the second type may be a fist holding gesture.
Wherein, the process of determining the pose information of the interaction gesture in each sample interaction gesture image may include: inputting the sample interactive gesture image into a hand recognition model so that the hand recognition model recognizes and divides a hand region in the sample interactive gesture image, outputting a sample hand region image, and determining a hand feature point and a hand feature line in the sample hand region image; determining position information of the hand characteristic point in a reference coordinate system and angle information of the hand characteristic line in the reference coordinate system; and obtaining the pose information of the interactive gesture based on the combination of the position information of the hand characteristic points and the angle information of the hand characteristic lines.
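Building on the gesture_pose_info helper sketched earlier, the gesture skeleton data set {C1(F), …, Cn(F)} could be assembled roughly as follows; averaging the sample poses and using a single preset deviation value per gesture are illustrative simplifications.

```python
def build_gesture_skeleton_dataset(samples, preset_deviation=0.15):
    """Build the relation data table of interactive gestures, gesture pose
    information, and preset deviation values from sample gesture images.

    samples: dict mapping interactive gesture type (e.g. "heart", "fist")
    to a list of (N, 3) hand joint arrays extracted from sample images.
    """
    table = {}
    for gesture, joint_sets in samples.items():
        poses = [gesture_pose_info(joints) for joints in joint_sets]
        # C_i(F): reference pose information for this interactive gesture,
        # here averaged over its sample images.
        table[gesture] = {
            "pose": np.mean(np.asarray(poses, dtype=float), axis=0),
            "preset_deviation": preset_deviation,
        }
    return table
```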
In an alternative embodiment, the anchor terminal may obtain a preconfigured relational data table in the gesture skeletal data set.
Step S304, according to the corresponding relation between the interactive gesture and the gesture pose information in the obtained relation data table, taking the interactive gesture corresponding to the gesture pose information to be recognized as a gesture recognition result.
In an optional implementation manner, the process of the anchor terminal taking the interactive gesture corresponding to the gesture pose information to be recognized as the gesture recognition result according to the corresponding relationship between the interactive gesture and the gesture pose information in the obtained relationship data table may include:
comparing the gesture pose information to be recognized with each piece of gesture pose information in the relation data table to obtain a comparison result; if the comparison result indicates that target gesture pose information exists, taking the interactive gesture corresponding to the target gesture pose information as the gesture recognition result based on the correspondence between interactive gestures and gesture pose information, and executing step S203; if the comparison result indicates that no target gesture pose information exists, determining that the gesture is not an interactive gesture that requires displaying an interactive function, and ending the flow, which further improves the recognition precision for interactive gestures corresponding to interactive functions. Here, the target gesture pose information is the gesture pose information in the table whose difference from the gesture pose information to be recognized is less than or equal to the preset deviation value. Because interactive gesture recognition and interactive gesture type recognition are performed separately, gesture recognition efficiency can be further improved; the interactive special effect can be displayed quickly and accurately at the viewer terminal after the anchor makes an interactive gesture, improving the user's live viewing experience.
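The comparison step could then look like the sketch below, reusing the table built above: a table entry whose stored pose differs from the to-be-recognized pose by no more than its preset deviation value is the target gesture pose information. Using the mean absolute difference as the comparison metric is an assumption.

```python
def match_interactive_gesture(pose_to_recognize, relation_table):
    """Return the interactive gesture whose stored pose information is within
    the preset deviation of the pose to be recognized, or None otherwise."""
    query = np.asarray(pose_to_recognize, dtype=float)
    for gesture, entry in relation_table.items():
        difference = np.abs(query - entry["pose"]).mean()
        if difference <= entry["preset_deviation"]:
            return gesture        # target gesture pose information found
    return None                   # not an interactive gesture; end the flow
```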
In step S203, the anchor terminal may determine, based on semantic analysis, the trigger instruction of the interactive function corresponding to the gesture recognition result.
In the embodiment of the present disclosure, the interactive function may include an interactive special effect. It may also include sending thank-you text containing the viewer terminal's user information in the live chat room, or prompting or triggering a co-streaming session or other interactive function with a target viewer terminal when the virtual resources sent by that viewer terminal are detected to meet a certain requirement, where the target viewer terminal is the viewer terminal that sent the interactive message.
In an alternative implementation, the process of the anchor terminal determining, based on semantic analysis, the trigger instruction of the interactive function corresponding to the gesture recognition result may include: performing semantic analysis on the gesture recognition result and determining the corresponding semantic information; determining the interactive function to be triggered corresponding to the semantic information; and determining the trigger instruction of the interactive function to be triggered corresponding to the gesture recognition result.
In an optional implementation manner, the process of determining a trigger instruction of the interactive function to be triggered corresponding to the gesture recognition result may include: acquiring a pre-established corresponding relation table of interactive functions and interactive function triggering instructions; determining an interactive function triggering instruction corresponding to the interactive function to be triggered according to a corresponding relation table of the interactive function and the interactive function triggering instruction; wherein, the corresponding relation table reflects the corresponding relation between the interactive function and the interactive function triggering instruction.
For example, if the gesture recognition result indicates that the interactive gesture is a heart gesture, the corresponding semantic information "heart" may be obtained, the interactive function to be triggered corresponding to "heart" may be determined to be a heart interactive special effect, and the pre-established correspondence table of interactive functions and interactive function trigger instructions may then be looked up to obtain the trigger instruction corresponding to the heart interactive special effect.
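The two-stage lookup described above (gesture recognition result, to semantic information, to interactive function, to trigger instruction) reduces to simple table lookups; all keys and instruction codes below are hypothetical, not from the disclosure.

```python
# Hypothetical lookup tables; none of these names or codes come from the patent.
SEMANTICS_BY_GESTURE = {"heart": "heart", "fist": "thanks"}
FUNCTION_BY_SEMANTICS = {"heart": "heart_special_effect",
                         "thanks": "thank_you_message"}
TRIGGER_BY_FUNCTION = {"heart_special_effect": 0x01, "thank_you_message": 0x02}

def trigger_instruction_for(gesture_recognition_result: str) -> int:
    """Map a gesture recognition result to its interactive function trigger
    instruction via the semantic-analysis lookup chain."""
    semantics = SEMANTICS_BY_GESTURE[gesture_recognition_result]
    function = FUNCTION_BY_SEMANTICS[semantics]
    return TRIGGER_BY_FUNCTION[function]
```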
In step S204, the anchor terminal may send the trigger instruction of the interaction function corresponding to the gesture recognition result and the live video stream to the audience terminal through the live server, so that the audience terminal executes the corresponding interaction function according to the trigger instruction when playing the live video stream.
In the embodiment of the disclosure, after receiving the trigger instruction of the interactive function corresponding to the gesture recognition result and the live video stream sent by the anchor terminal through the live broadcast server, the viewer terminal may execute the corresponding interactive function according to the trigger instruction when playing the live video stream.
When the viewer terminal plays the live video stream, the process of executing the corresponding interactive function according to the trigger instruction may include: and determining the corresponding interactive function according to the trigger instruction of the interactive function corresponding to the gesture recognition result, and displaying the interactive function in a live broadcast picture of the live broadcast video stream.
In an optional implementation manner, the interactive function includes an interactive special effect, and before the anchor terminal sends the trigger instruction of the interactive function corresponding to the gesture recognition result and the live video stream to the audience terminal through the live server, the anchor terminal may further determine, according to position information of the gesture contour in a live frame of the live video stream, display position information of the interactive special effect corresponding to the gesture recognition result in the live frame of the audience terminal.
In this case, sending the trigger instruction of the interactive function corresponding to the gesture recognition result and the live video stream to the viewer terminal through the live broadcast server, so that the viewer terminal executes the corresponding interactive function according to the trigger instruction when playing the live video stream, includes: sending the trigger instruction of the interactive special effect corresponding to the gesture recognition result, the display position information, and the live video stream to the viewer terminal through the live broadcast server, so that the viewer terminal, when playing the live video stream, displays the interactive special effect at the display position according to the trigger instruction.
When the viewer terminal plays the live video stream, the process of displaying the interactive special effect at the display position according to the trigger instruction may include: determining the corresponding interactive special effect identifier according to the trigger instruction; and displaying the interactive special effect corresponding to that identifier at the display position indicated by the display position information in the live picture. Fig. 4 shows a live interface diagram of a viewer terminal, in which, after the anchor makes an interactive gesture, an interactive special effect 402 is displayed in the anchor's hand area 401.
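On the viewer side, deriving the display position from the gesture contour and overlaying the effect could be sketched as follows; placing the effect just above the hand's bounding box and using straightforward alpha blending are illustrative choices, and the sketch assumes the effect image fits inside the frame.

```python
import cv2

def display_position_from_contour(contour):
    """Derive display position info from the gesture contour's bounding box."""
    x, y, w, h = cv2.boundingRect(contour)
    return x, max(0, y - h)        # e.g. place the effect just above the hand

def render_interactive_effect(frame, effect_rgba, position):
    """Alpha-blend an RGBA special-effect image onto the live frame at the
    given display position (assumes the effect fits within the frame)."""
    x, y = position
    h, w = effect_rgba.shape[:2]
    roi = frame[y:y + h, x:x + w]
    alpha = effect_rgba[:, :, 3:4].astype(float) / 255.0
    roi[:] = (alpha * effect_rgba[:, :, :3] + (1.0 - alpha) * roi).astype(roi.dtype)
    return frame
```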
The following explains the live interaction scheme from the viewpoint of the viewer terminal. Fig. 5 shows an exemplary flow of a method for performing live interaction by a viewer terminal, which may include:
and S501, receiving a trigger instruction and a live video stream of an interactive function, which are sent by the anchor terminal through the live server and correspond to the gesture recognition result.
In the embodiment of the disclosure, the trigger instruction of the interactive function corresponding to the gesture recognition result is determined by the anchor terminal by: detecting a gesture outline in the collected live video stream; recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model that takes a pre-configured gesture skeleton data set as a prior condition, to determine the gesture recognition result corresponding to the gesture outline; and performing semantic analysis on the gesture recognition result.
In step S501, the viewer terminal may receive the trigger instruction of the interactive function corresponding to the gesture recognition result and the live video stream sent by the anchor terminal through the live broadcast server. The trigger instruction is used to instruct the viewer terminal to execute the interactive function corresponding to the gesture recognition result when playing the live video stream.
Step S502, when playing the live video stream, executing the corresponding interactive function according to the trigger instruction of the interactive function corresponding to the gesture recognition result.
In step S502, when playing the live video stream, the viewer terminal may execute the corresponding interactive function according to the trigger instruction of the interactive function corresponding to the gesture recognition result.
In an optional implementation manner, when the viewer terminal plays a live video stream, the process of executing the corresponding interactive function according to the trigger instruction of the interactive function corresponding to the gesture recognition result may include: and determining the corresponding interactive function according to the trigger instruction of the interactive function corresponding to the gesture recognition result, and displaying the interactive function in a live broadcast picture of the live broadcast video stream.
It can be understood that, before receiving the trigger instruction and the live video stream, the viewer terminal may send a live interactive message to the anchor terminal. The live interactive message may be a virtual article presenting message generated after the viewer terminal detects the user's virtual gift presenting operation, or a message generated when the viewer terminal detects the user's text interaction with the live content.
In an alternative implementation where the interactive function includes an interactive special effect, the process of the viewer terminal receiving the trigger instruction and the live video stream may include: receiving, from the anchor terminal through the live broadcast server, the trigger instruction of the interactive special effect corresponding to the gesture recognition result, the display position information of the interactive special effect in the viewer terminal's live picture, and the live video stream; the anchor terminal determines the display position information according to the position information of the gesture outline in the live picture of the live video stream;
and the process of the viewer terminal executing the corresponding interactive function according to the trigger instruction when playing the live video stream may include: determining the interactive special effect corresponding to the trigger instruction; and, when playing the live video stream, displaying the interactive special effect at the display position according to the trigger instruction.
The embodiment of the present disclosure provides a live broadcast interaction apparatus, which is applied to an anchor terminal. As shown in fig. 6, the live broadcast interaction apparatus 600 includes:
a detection module 601 configured to detect a gesture profile from a captured live video stream;
the recognition module 602 is configured to recognize the detected gesture contour through a pre-trained gesture recognition machine learning model with a pre-configured gesture skeleton data set as a priori condition, and determine a gesture recognition result corresponding to the gesture contour;
a determining module 603 configured to determine, based on semantic analysis, the trigger instruction of the interactive function corresponding to the gesture recognition result;
the sending module 604 is configured to send the trigger instruction of the interaction function corresponding to the gesture recognition result and the live video stream to the audience terminal through the live server, so that the audience terminal executes the corresponding interaction function according to the trigger instruction when playing the live video stream.
In an alternative embodiment, the identifying module 602 is configured to:
recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model to obtain a gesture outline type recognition result corresponding to the gesture outline;
determining gesture pose information to be recognized of a gesture contour type recognition result;
acquiring a relation data table configured in advance in a gesture skeleton data set, wherein the relation data table reflects the corresponding relation between an interactive gesture and gesture pose information;
and taking the interactive gesture corresponding to the gesture pose information to be recognized as a gesture recognition result according to the corresponding relation between the interactive gesture and the gesture pose information in the obtained relation data table.
In an alternative embodiment, the identifying module 602 is configured to:
detecting a hand characteristic point and a hand characteristic line in the hand area image corresponding to the gesture outline type recognition result;
determining position information of the hand characteristic point in a reference coordinate system and angle information of the hand characteristic line in the reference coordinate system;
and obtaining gesture pose information to be recognized based on the combination of the position information of the hand characteristic points and the angle information of the hand characteristic lines.
In an alternative embodiment, the identifying module 602 is configured to:
comparing the gesture pose information to be recognized with each gesture pose information in the relation data table to obtain a comparison result;
if the comparison result indicates that target gesture pose information exists in the gesture pose information, taking the interactive gesture corresponding to the target gesture pose information as a gesture recognition result based on the corresponding relation between the interactive gesture and the gesture pose information; the target gesture pose information is gesture pose information of which the difference value with the gesture pose information to be recognized in the gesture pose information is smaller than or equal to a preset deviation value.
In an alternative embodiment, the detection module 601 is configured to:
preprocessing the collected live video stream to obtain a human body contour binary image;
and detecting the gesture outline in the human body outline binary image by using the gesture outline model.
In an alternative embodiment, as shown in fig. 6, the live interactive apparatus 600 further includes: an acquisition module 605 configured to:
and when detecting that the triggering condition of gesture recognition is met, acquiring the live video stream of the anchor terminal.
In an alternative embodiment, the determining module 603 is configured to:
performing semantic analysis on the gesture recognition result, and determining semantic information corresponding to the gesture recognition result;
determining an interactive function to be triggered corresponding to the semantic information;
and determining a triggering instruction of the interactive function to be triggered corresponding to the gesture recognition result.
In an alternative embodiment, the determining module 603 is configured to:
acquiring a pre-established corresponding relation table of interactive functions and interactive function triggering instructions; the corresponding relation table reflects the corresponding relation between the interactive function and the interactive function triggering instruction;
and determining an interactive function triggering instruction corresponding to the interactive function to be triggered according to the corresponding relation table of the interactive function and the interactive function triggering instruction.
In an alternative embodiment, the interactive function includes an interactive special effect, and the determining module 603 is further configured to:
determining display position information of an interactive special effect corresponding to the gesture recognition result in a live broadcast picture of the audience terminal according to position information of the gesture outline in the live broadcast picture of the live broadcast video stream;
a sending module 604 configured to:
and sending the trigger instruction of the interactive special effect corresponding to the gesture recognition result, the display position information and the live broadcast video stream to the audience terminal through the live broadcast server, so that the audience terminal displays the interactive special effect according to the trigger instruction of the interactive special effect on the display position information when playing the live broadcast video stream.
The embodiment of the present disclosure provides a live broadcast interaction device, which is applied to audience terminals, as shown in fig. 7, the live broadcast interaction device 700 includes:
the receiving module 701 is configured to receive the trigger instruction of the interactive function corresponding to the gesture recognition result and the live video stream, both sent by the anchor terminal through the live broadcast server; the trigger instruction is determined by the anchor terminal by detecting a gesture outline in the collected live video stream, recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model that takes a pre-configured gesture skeleton data set as a prior condition to determine the gesture recognition result corresponding to the gesture outline, and performing semantic analysis on the gesture recognition result;
and the playing module 702 is configured to execute the corresponding interactive function according to the trigger instruction of the interactive function corresponding to the gesture recognition result when the live video stream is played.
In an alternative embodiment, the interactive function includes an interactive special effect, and the receiving module 701 is configured to:
receive the trigger instruction of the interactive special effect corresponding to the gesture recognition result, the display position information of the interactive special effect in the live picture of the audience terminal, and the live video stream, all sent by the anchor terminal through the live broadcast server; the anchor terminal determines the display position information according to the position information of the gesture outline in the live picture of the live video stream.
In this embodiment, executing the corresponding interactive function according to the trigger instruction of the interactive function corresponding to the gesture recognition result when playing the live video stream includes:
determining the interactive special effect corresponding to the trigger instruction of the interactive function; and
displaying the interactive special effect at the position indicated by the display position information according to the trigger instruction of the interactive special effect when playing the live video stream, as in the rendering sketch below.
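To make the last step concrete, here is a minimal rendering sketch (not from the original disclosure): it composites an effect sprite onto a decoded frame at the normalized position received from the anchor terminal. The BGRA sprite layout, the normalized {"x", "y"} payload, and the simplified edge handling (the sprite is assumed to fit fully inside the frame) are all illustrative assumptions.

```python
import numpy as np

def overlay_effect(frame: np.ndarray, sprite: np.ndarray,
                   pos: dict) -> np.ndarray:
    """Composite a BGRA effect sprite onto a BGR frame, centred on the
    normalized display position {"x": ..., "y": ...} sent by the anchor
    terminal. Assumes the sprite fits inside the frame at that position."""
    fh, fw = frame.shape[:2]
    sh, sw = sprite.shape[:2]
    x0 = int(pos["x"] * fw) - sw // 2
    y0 = int(pos["y"] * fh) - sh // 2
    # Alpha-blend the sprite over the frame region it covers.
    alpha = sprite[..., 3:4].astype(np.float32) / 255.0
    region = frame[y0:y0 + sh, x0:x0 + sw].astype(np.float32)
    blended = alpha * sprite[..., :3].astype(np.float32) + (1.0 - alpha) * region
    frame[y0:y0 + sh, x0:x0 + sw] = blended.astype(np.uint8)
    return frame
```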
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium, which may be implemented as a program product including program code; when the program product runs on an electronic device, the program code causes the electronic device to perform the steps of the various exemplary embodiments of the present disclosure described in the preceding sections of this specification. In one embodiment, the program product may be implemented as a portable compact disc read-only memory (CD-ROM) including the program code, and may be run on an electronic device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
Exemplary embodiments of the present disclosure also provide an electronic device, which may be a background server of a live platform. The electronic device is explained below with reference to fig. 8. It should be understood that the electronic device 800 shown in fig. 8 is only one example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 8, electronic device 800 is in the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: at least one processing unit 810, at least one memory unit 820, and a bus 830 that couples the various system components including the memory unit 820 and the processing unit 810.
The storage unit stores program code that can be executed by the processing unit 810, causing the processing unit 810 to perform the steps of the various exemplary embodiments of the present disclosure described in the preceding sections of this specification. For example, the processing unit 810 may perform the method steps shown in fig. 2.
The storage unit 820 may include volatile storage units such as a random access memory unit (RAM) 821 and/or a cache storage unit 822, and may further include a read-only memory unit (ROM) 823.
Storage unit 820 may also include a program/utility 824 having a set (at least one) of program modules 825, such program modules 825 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may include a data bus, an address bus, and a control bus.
The electronic device 800 may also communicate with one or more external devices 900 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.); such communication may occur through an input/output (I/O) interface 840. The electronic device 800 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 850. As shown, the network adapter 850 communicates with the other modules of the electronic device 800 via the bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to exemplary embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided into and embodied by a plurality of modules or units.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, an apparatus, or a program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system." Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the following claims.

Claims (15)

1. A live broadcast interaction method applied to an anchor terminal, characterized by comprising the following steps:
detecting a gesture outline according to the collected live video stream;
recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model taking a pre-configured gesture skeleton data set as a prior condition, and determining a gesture recognition result corresponding to the gesture outline;
determining a triggering instruction of the interaction function corresponding to the gesture recognition result based on semantic analysis;
and sending the trigger instruction of the interaction function corresponding to the gesture recognition result and the live video stream to the audience terminal through the live broadcast server, so that the audience terminal executes the corresponding interaction function according to the trigger instruction when playing the live video stream.
2. The live broadcast interaction method of claim 1, wherein the recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model that takes a pre-configured gesture skeleton data set as a prior condition, and determining a gesture recognition result corresponding to the gesture outline, comprises:
recognizing the detected gesture outline through the pre-trained gesture recognition machine learning model to obtain a gesture outline type recognition result corresponding to the gesture outline;
determining gesture pose information to be recognized of the gesture outline type recognition result;
acquiring a relation data table pre-configured in the gesture skeleton data set, wherein the relation data table reflects the correspondence between interactive gestures and gesture pose information; and
according to the correspondence between the interactive gestures and the gesture pose information in the acquired relation data table, taking the interactive gesture corresponding to the gesture pose information to be recognized as the gesture recognition result.
3. The live interaction method as claimed in claim 2, wherein the determining gesture pose information to be recognized of the gesture outline type recognition result comprises:
detecting hand feature points and hand feature lines in the hand region image corresponding to the gesture outline type recognition result;
determining position information of the hand feature points in a reference coordinate system and angle information of the hand feature lines in the reference coordinate system; and
obtaining the gesture pose information to be recognized based on the combination of the position information of the hand feature points and the angle information of the hand feature lines.
4. The live interaction method as claimed in claim 2, wherein the relationship data table includes interaction gestures and gesture pose information corresponding thereto; the step of taking the interactive gesture corresponding to the gesture pose information to be recognized as a gesture recognition result according to the corresponding relation between the interactive gesture and the gesture pose information in the obtained relation data table comprises the following steps:
comparing the gesture pose information to be recognized with each gesture pose information in a relation data table to obtain a comparison result;
if the comparison result indicates that target gesture pose information exists in the gesture pose information, taking the interactive gesture corresponding to the target gesture pose information as a gesture recognition result based on the corresponding relation between the interactive gesture and the gesture pose information; the target gesture pose information is gesture pose information of which the difference value with the gesture pose information to be recognized is smaller than or equal to a preset deviation value.
5. The live interaction method as claimed in claim 1, wherein the detecting a gesture outline from the collected live video stream comprises:
preprocessing the collected live video stream to obtain a binary image of the human body outline; and
detecting the gesture outline in the binary image using a gesture outline model.
6. The live interaction method of claim 1, wherein before the detecting a gesture outline from the collected live video stream, the method further comprises:
acquiring the live video stream of the anchor terminal when it is detected that the trigger condition for gesture recognition is met.
7. The live interaction method as claimed in claim 1, wherein the determining, based on semantic analysis, the trigger instruction of the interactive function corresponding to the gesture recognition result comprises:
performing semantic analysis on the gesture recognition result, and determining semantic information corresponding to the gesture recognition result;
determining an interactive function to be triggered corresponding to the semantic information;
and determining a triggering instruction of the interactive function to be triggered corresponding to the gesture recognition result.
8. The live broadcast interaction method of claim 7, wherein the determining a trigger instruction of the interaction function to be triggered corresponding to the gesture recognition result comprises:
acquiring a pre-established corresponding relation table of interactive functions and interactive function triggering instructions; the corresponding relation table reflects the corresponding relation between the interactive function and the interactive function triggering instruction;
and determining an interactive function triggering instruction corresponding to the interactive function to be triggered according to the corresponding relation table of the interactive function and the interactive function triggering instruction.
9. The live interaction method of claim 1, wherein the interactive function comprises an interactive special effect, and wherein the method further comprises:
determining, according to the position information of the gesture outline in the live picture of the live video stream, the display position information of the interactive special effect corresponding to the gesture recognition result in the live picture of the audience terminal;
wherein the sending the trigger instruction of the interactive function corresponding to the gesture recognition result and the live video stream to the audience terminal through the live broadcast server, so that the audience terminal executes the corresponding interactive function according to the trigger instruction when playing the live video stream, comprises:
sending the trigger instruction of the interactive special effect corresponding to the gesture recognition result, the display position information, and the live video stream to the audience terminal through the live broadcast server, so that the audience terminal displays the interactive special effect at the position indicated by the display position information according to the trigger instruction of the interactive special effect when playing the live video stream.
10. A live broadcast interaction apparatus applied to an anchor terminal, characterized in that the apparatus comprises:
a detection module configured to detect a gesture outline from the collected live video stream;
a recognition module configured to recognize the detected gesture outline through a pre-trained gesture recognition machine learning model that takes a pre-configured gesture skeleton data set as a prior condition, and to determine a gesture recognition result corresponding to the gesture outline;
a determining module configured to determine, based on semantic analysis, a trigger instruction of the interactive function corresponding to the gesture recognition result; and
a sending module configured to send the trigger instruction of the interactive function corresponding to the gesture recognition result and the live video stream to the audience terminal through the live broadcast server, so that the audience terminal executes the corresponding interactive function according to the trigger instruction when playing the live video stream.
11. A live broadcast interaction method applied to an audience terminal, characterized by comprising the following steps:
receiving the trigger instruction of the interactive function corresponding to the gesture recognition result and the live video stream, both sent by the anchor terminal through the live broadcast server; wherein the trigger instruction of the interactive function corresponding to the gesture recognition result is determined by the anchor terminal by detecting a gesture outline from the collected live video stream, recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model that takes a pre-configured gesture skeleton data set as a prior condition to determine the gesture recognition result corresponding to the gesture outline, and performing semantic analysis on the gesture recognition result corresponding to the gesture outline;
and when the live video stream is played, executing the corresponding interactive function according to the trigger instruction of the interactive function corresponding to the gesture recognition result.
12. The live broadcast interaction method of claim 11, wherein the interactive function comprises an interactive special effect, and the receiving the trigger instruction of the interactive function corresponding to the gesture recognition result and the live video stream sent by the anchor terminal through the live broadcast server comprises:
receiving a trigger instruction of an interactive special effect, display position information of the interactive special effect in a live broadcast picture of a viewer terminal and the live broadcast video stream, which are sent by the anchor terminal through the live broadcast server and correspond to the gesture recognition result; the anchor terminal determines the display position information according to the position information of the gesture outline in the live broadcast picture of the live broadcast video stream;
when the live video stream is played, executing the corresponding interactive function according to the trigger instruction of the interactive function corresponding to the gesture recognition result, wherein the method comprises the following steps:
determining an interaction special effect corresponding to the trigger instruction of the interaction function;
and when the live video stream is played, displaying the interactive special effect on the display position information according to the trigger instruction of the interactive special effect.
13. A live broadcast interaction apparatus applied to an audience terminal, characterized by comprising:
a receiving module configured to receive the trigger instruction of the interactive function corresponding to the gesture recognition result and the live video stream, both sent by the anchor terminal through the live broadcast server; wherein the trigger instruction of the interactive function corresponding to the gesture recognition result is determined by the anchor terminal by detecting a gesture outline from the collected live video stream, recognizing the detected gesture outline through a pre-trained gesture recognition machine learning model that takes a pre-configured gesture skeleton data set as a prior condition to determine the gesture recognition result corresponding to the gesture outline, and performing semantic analysis on the gesture recognition result corresponding to the gesture outline;
and the playing module is configured to execute the corresponding interactive function according to the trigger instruction of the interactive function corresponding to the gesture recognition result when the live video stream is played.
14. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the live broadcast interaction method of any one of claims 1 to 9 or any one of claims 11 to 12.
15. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform, via execution of the executable instructions, the steps of the live broadcast interaction method of any one of claims 1 to 9 or any one of claims 11 to 12.
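For readers tracing claims 3 and 4, the sketch below (illustrative only; the tolerance value, data layout, and matching rule are assumptions, not the claimed method) shows how gesture pose information built from feature-point positions and feature-line angles might be compared against a relation data table entry within a preset deviation:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GesturePose:
    points: List[Tuple[float, float]]  # feature-point positions in a reference frame
    angles: List[float]                # feature-line angles in degrees

def pose_matches(candidate: GesturePose, template: GesturePose,
                 max_deviation: float = 0.15) -> bool:
    """Claim-4-style comparison: the template counts as the target pose when
    every point and (scaled) angle deviates from the candidate by no more
    than the preset deviation. Assumes both poses are non-empty and aligned."""
    point_dev = max(abs(cx - tx) + abs(cy - ty)
                    for (cx, cy), (tx, ty) in zip(candidate.points, template.points))
    angle_dev = max(abs(ca - ta) / 180.0
                    for ca, ta in zip(candidate.angles, template.angles))
    return max(point_dev, angle_dev) <= max_deviation
```

Under these assumptions, a recognizer would scan the relation data table with pose_matches and return the interactive gesture of the first (or closest) matching entry as the gesture recognition result.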
CN202111107176.3A 2021-09-22 2021-09-22 Live broadcast interaction method and device, storage medium and electronic equipment Pending CN113840177A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111107176.3A CN113840177A (en) 2021-09-22 2021-09-22 Live broadcast interaction method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111107176.3A CN113840177A (en) 2021-09-22 2021-09-22 Live broadcast interaction method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN113840177A (en) 2021-12-24

Family

ID=78960322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111107176.3A Pending CN113840177A (en) 2021-09-22 2021-09-22 Live broadcast interaction method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113840177A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101763515A (en) * 2009-09-23 2010-06-30 中国科学院自动化研究所 Real-time gesture interaction method based on computer vision
US20140253429A1 (en) * 2013-03-08 2014-09-11 Fastvdo Llc Visual language for human computer interfaces
CN105809144A (en) * 2016-03-24 2016-07-27 重庆邮电大学 Gesture recognition system and method adopting action segmentation
CN109359514A (en) * 2018-08-30 2019-02-19 浙江工业大学 A kind of gesture tracking identification federation policies method towards deskVR
CN109753876A (en) * 2018-12-03 2019-05-14 西北工业大学 A kind of construction method of the extraction identification and three-dimensional gesture interaction system of three-dimension gesture
CN110602516A (en) * 2019-09-16 2019-12-20 腾讯科技(深圳)有限公司 Information interaction method and device based on live video and electronic equipment
CN209895305U (en) * 2019-04-30 2020-01-03 杭州赛鲁班网络科技有限公司 Gesture interaction system
CN111709384A (en) * 2020-06-22 2020-09-25 北京思特奇信息技术股份有限公司 AR gesture recognition method and device, electronic equipment and storage medium
CN112183198A (en) * 2020-08-21 2021-01-05 北京工业大学 Gesture recognition method for fusing body skeleton and head and hand part profiles
WO2021047419A1 (en) * 2019-09-12 2021-03-18 广州华多网络科技有限公司 Live broadcast interaction method, live broadcast system, electronic device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination