CN115988232A - Interaction method and device for live broadcast room virtual image, electronic equipment and storage medium - Google Patents

Interaction method and device for live broadcast room virtual image, electronic equipment and storage medium Download PDF

Info

Publication number
CN115988232A
CN115988232A (application number CN202211725720.5A)
Authority
CN
China
Prior art keywords
text
test
interactive voice
voice
script
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211725720.5A
Other languages
Chinese (zh)
Inventor
曾衍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Cubesili Information Technology Co Ltd
Original Assignee
Guangzhou Cubesili Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Cubesili Information Technology Co Ltd filed Critical Guangzhou Cubesili Information Technology Co Ltd
Priority to CN202211725720.5A priority Critical patent/CN115988232A/en
Publication of CN115988232A publication Critical patent/CN115988232A/en
Pending legal-status Critical Current

Abstract

The application relates to the field of live broadcasting and provides an interaction method, apparatus, device, and storage medium for a virtual image in a live broadcast room. The method enables a user to interact with the three-dimensional avatar in the live broadcast room through voice input, improving the convenience of that interaction. The method comprises the following steps: receiving interactive voice input by a target user for the three-dimensional avatar in the live broadcast room; obtaining, from a script database, a target action script matched with the interactive voice; obtaining, from a material resource library, the material resources corresponding to the target action script; and finally playing the response action of the three-dimensional avatar in the live broadcast room according to the target action script and the material resources.

Description

Interaction method and device for live broadcast room virtual image, electronic equipment and storage medium
Technical Field
The present application relates to the field of live webcast technologies, and in particular, to a live webcast virtual image interaction method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of webcast technology, more and more live interaction modes are available to the anchor and the audience in a live broadcast room, and three-dimensional avatars have also been applied to live broadcast scenes.
In the prior art, a live broadcast room can display three-dimensional avatars in the form of characters, animals, and the like, and these avatars can perform response actions under interactive operations by the anchor or the audience. However, the interactive operations that prior-art interaction schemes require of the user are relatively cumbersome, so there is the technical problem of poor convenience when interacting with the three-dimensional avatar in the live broadcast room.
Disclosure of Invention
Based on this, it is necessary to provide a live broadcast room avatar interaction method, apparatus, electronic device and computer readable storage medium for solving the above technical problems.
In a first aspect, the application provides a live broadcast room avatar interaction method. The method comprises the following steps:
receiving interactive voice input by a target user to the three-dimensional virtual image in the live broadcast room;
acquiring a target action script matched with the interactive voice in a script database according to the interactive voice;
acquiring material resources corresponding to the target action script in a material resource library according to the target action script;
and playing the response action of the three-dimensional virtual image according to the target action script and the material resource.
In a second aspect, the application provides a live broadcast room avatar interaction device. The device comprises:
the voice receiving module is used for receiving interactive voice input by a target user to the three-dimensional virtual image in the live broadcast room;
the script acquisition module is used for acquiring a target action script matched with the interactive voice in a script database according to the interactive voice;
the resource acquisition module is used for acquiring material resources corresponding to the target action script in a material resource library according to the target action script;
and the action playing module is used for playing the response action of the three-dimensional virtual image according to the target action script and the material resources.
In a third aspect, the present application provides an electronic device. The electronic device comprises a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:
receiving interactive voice input by a target user to the three-dimensional virtual image in the live broadcast room; acquiring a target action script matched with the interactive voice in a script database according to the interactive voice; acquiring material resources corresponding to the target action script in a material resource library according to the target action script; and playing the response action of the three-dimensional virtual image according to the target action script and the material resources.
In a fourth aspect, the present application provides a computer-readable storage medium. The computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
receiving interactive voice input by a target user to the three-dimensional virtual image in the live broadcast room; acquiring a target action script matched with the interactive voice in a script database according to the interactive voice; acquiring material resources corresponding to the target action script in a material resource library according to the target action script; and playing the response action of the three-dimensional virtual image according to the target action script and the material resource.
With the interaction method, apparatus, device, and storage medium for the live broadcast room avatar, interactive voice input by a target user for the three-dimensional avatar in the live broadcast room is received; a target action script matched with the interactive voice is obtained from a script database; the material resources corresponding to the target action script are obtained from a material resource library; and finally the response action of the three-dimensional avatar is played in the live broadcast room according to the target action script and the material resources. This scheme enables a user to interact with the three-dimensional avatar in the live broadcast room through voice input and improves the convenience of that interaction. It can further be applied to interaction between users and three-dimensional avatars in metaverse live broadcast scenes, making such interaction more convenient and enriching the interaction modes available in metaverse live broadcasts.
Drawings
Fig. 1 is an application scene diagram of an interaction method of an avatar in a live broadcast room in an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for interacting with an avatar in a live broadcast room in an embodiment of the present application;
FIG. 3 is a schematic flow chart of a method for interacting with an avatar in a live broadcast room according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating steps of identifying a target user in an embodiment of the present application;
FIG. 5 is a block diagram of an interaction apparatus for an avatar in a live broadcast room in an embodiment of the present application;
fig. 6 is an internal structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The interaction method for the live broadcast room avatar provided by the embodiments of the present application can be applied to the application scene shown in fig. 1. The application scene may include terminals and a server, where the terminals may specifically include an anchor terminal and a plurality of viewer terminals of a live broadcast room (such as viewer terminal 1 and viewer terminal 2). The anchor terminal and the viewer terminals each communicate with the server through the internet, and the server provides live-broadcast-related services for them. A terminal may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device such as a head-mounted device; the server may be implemented as a stand-alone server or as a server cluster composed of multiple servers. The application scene may be a metaverse live broadcast scene: the server can create an independent metaverse scene for the anchor and the audience of a live broadcast room, building a virtual space parallel to the real world, that is, a virtual world that maps and interacts with the real world, created and linked by technological means. Various three-dimensional avatars, such as people and animals, can be displayed in this virtual space, and the anchor and users can freely interact with them there.
Specifically, a user in a live broadcast room can perform human-computer interaction through terminals such as a mobile phone or AR glasses, realizing basic control operations such as gesture-based movement switching and selecting to exit. The terminal can receive and display data such as the live video stream and the three-dimensional avatar sent by the server, rendering the metaverse scene and providing metaverse information to the user.
For a metaverse live broadcast scene, the anchor or a performing guest can broadcast from a green-screen studio. Photographic lighting, a microphone, and a video camera can be placed on the studio stage; the camera pushes the captured live video stream in real time to an on-site chroma-keying server, which separates the person from the green-screen background and keys the person out, either in software or with hardware keying as in the prior art. The processed person video stream can then be composited into a VR panorama, with the person positioned at suitable coordinates in a pre-prepared scene background, which may be either a two-dimensional background or a three-dimensional background scene. When the audience watches through a VR device, the video in the VR live broadcast is mainly panoramic video, which may be of the 180-degree or the 360-degree kind; after video of the corresponding kind is received, the live video stream can be played on the inner surface of a transparent hemisphere (180 degrees), sphere (360 degrees), or quarter sphere, with a virtual camera placed at a suitable position to display the picture. The live broadcast room avatar can also be displayed in this metaverse space for the user to interact with, achieving an effect that combines the virtual and the real.
In scenes including but not limited to metaverse live broadcasts, based on the interaction method for the live broadcast room avatar, the terminal can display the three-dimensional avatar in the live broadcast room, and the user can make the three-dimensional avatar perform a response action by inputting interactive voice on the terminal, realizing convenient interaction with the three-dimensional avatar.
The following describes an interaction method of a live broadcast avatar according to the present application, based on an application environment shown in fig. 1, with reference to various embodiments and corresponding drawings.
In one embodiment, as shown in fig. 2, a method for interacting with the live broadcast room avatar is provided, which may be performed by a terminal as shown in fig. 1 and includes the following steps:
step S201, receiving interactive voice input by a target user to the three-dimensional virtual image in the live broadcast room.
In this step, the target user refers to one or more specific users in the live broadcast room, who may be the anchor or viewers. Specifically, the anchor, or a viewer having a preset authority, may be determined as the target user. In practice, for identity authority, the anchor of the live broadcast room may be determined as a target user; for level authority, a viewer who has reached a preset audience level in the live broadcast room may be determined as a target user, where the setting and promotion of audience levels may follow the live broadcast service rules. These are only illustrative examples of the target user. Further, if the local user is determined to be the target user, the terminal may receive the interactive voice that the target user inputs for the three-dimensional avatar in the live broadcast room. The three-dimensional avatar may take various forms displayed in the live broadcast room, such as a character or an animal. The interactive voice is the voice the target user uses to interact with the three-dimensional avatar; it may be a sentence or a word, and on its basis the three-dimensional avatar may perform a corresponding response action. The interactive voice therefore generally carries the meaning of asking or requesting the three-dimensional avatar to perform some response action. For example, "please sit down" carries the meaning of asking the three-dimensional avatar to sit down, and "can you dance for everyone" carries the meaning of asking it to dance, so sentences such as these can serve as the interactive voice the user inputs to the terminal.
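The identity-authority and level-authority checks described above can be sketched as follows. This is an illustrative sketch, not code from the patent; the role names, the `User` structure, and the level threshold are all assumptions.

```python
from dataclasses import dataclass

MIN_AUDIENCE_LEVEL = 10  # hypothetical preset audience level from the service rules

@dataclass
class User:
    role: str           # "anchor" or "viewer"
    audience_level: int

def is_target_user(user: User) -> bool:
    """Anchors qualify by identity; viewers qualify once they reach the preset level."""
    if user.role == "anchor":
        return True
    return user.audience_level >= MIN_AUDIENCE_LEVEL

print(is_target_user(User("anchor", 0)))    # anchor qualifies regardless of level
print(is_target_user(User("viewer", 12)))   # viewer above the threshold qualifies
print(is_target_user(User("viewer", 3)))    # viewer below the threshold does not
```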
And step S202, acquiring a target action script matched with the interactive voice in the script database according to the interactive voice.
Step S203, according to the target action script, obtaining the material resource corresponding to the target action script in the material resource library.
For steps S202 and S203, the script database and the material resource library may be configured first. Three-dimensional avatars in various forms, such as characters and animals, can be designed by designers according to a common specification; designing to a specification preserves the reusability of action scripts. When a three-dimensional avatar performs an action, each bone joint of the whole avatar moves along with it, and the bone joints determine the fineness of the avatar's movement: generally, the higher the required fineness, the greater the number of bone joints and the finer the design. Performing an action may also be accompanied by material resources such as sounds and pictures. Sound material resources include background music, expression music, and the like, while picture material resources include the avatar's clothes, props, and the like. These material resources need to be produced and recorded in advance to form a material resource library, and the various material resources it contains can each be associated with action scripts, so that once an action script is obtained, the corresponding material resources can be obtained from it.
Specifically, regarding action scripts: different three-dimensional avatars can have specific actions, and performing a specific action may require specific sound, picture, and other material resources for its expression. The specific actions and the material resources can be controlled and bound through code scripts. For example, a three-dimensional avatar may have actions such as walking, standing, raising hands, singing, and dancing; when making the avatar's model, the corresponding code scripts need to be written for the model's bone joints and the required material resources. The action scripts for walking, standing, raising hands, singing, dancing, and so on are then stored, and an identifier and description information are set for each action script, where the description information may specifically include the trigger instruction of the action script and similar information. In practice, if another three-dimensional avatar model is designed according to the same specification, the corresponding action scripts can be reused to a certain extent, improving the production efficiency of action scripts.
Accordingly, many action scripts accumulate during the production of different three-dimensional avatars. As mentioned above, to make the action scripts convenient to use, reuse, and manage later, a unique identifier can be set for each action script, its name can be defined (describing the meaning of the action, such as walking, standing, raising hands, singing, or dancing), and labels can be set for it, such as the applicable three-dimensional avatar, the corresponding live broadcast scene, the corresponding avatar attributes, and the corresponding time, for further classified storage, thereby forming the script database.
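One record in such a script database might look like the following. This is a hypothetical sketch: every field name, identifier, and file name is illustrative rather than taken from the patent.

```python
# One illustrative script-database record: a unique identifier, a name that
# describes the action's meaning, classification labels, the trigger instruction
# from the description information, compatible avatar models, and associated
# material resources.
action_script = {
    "id": "act_0001",
    "name": "dance",
    "tags": {
        "scene": "stage",              # corresponding live broadcast scene
        "avatar_attribute": "dancer",  # corresponding avatar attribute
    },
    "trigger": "dance for everyone",
    "compatible_avatars": ["avatar_cat_v1", "avatar_girl_v2"],
    "materials": ["music_dance_bg.ogg", "outfit_stage.png"],
}

# Keyed by identifier for lookup, reuse, and classified management.
script_database = {action_script["id"]: action_script}
print(script_database["act_0001"]["name"])  # dance
```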
Based on this, in step S202, after the terminal obtains the interactive voice input by the target user, it may use an existing voice recognition model to recognize the interactive voice as corresponding text, called the interactive voice text. For example, the terminal may recognize "please sing a song" in voice form as "please sing a song" in text form. The terminal may then send the interactive voice text to the server, and the server performs matching in the script database according to the interactive voice text, which may be done by fuzzy matching or exact matching; the action script in the script database matched with the interactive voice text is taken as the target action script. As an example, for the interactive voice text "sing a song", the server may match it against the names of the action scripts in the script database, find the action script named "sing a song" as the target action script, and return it to the terminal. In a specific implementation, a matching-degree threshold can be predefined: when the matching degree reaches the threshold, the server considers the match successful and sends the matched target action script to the terminal; if the matching degree does not reach the threshold, the server considers the match failed, does not issue an action script, and instead prompts the target user through the terminal that the match failed.
Further, in a specific implementation, during the early cold-start phase relatively few action scripts may have accumulated. To improve the efficiency and effectiveness of search matching, the terminal can prompt the target user with the names or instructions of the currently available action scripts, so that the target user can input the corresponding interactive voice accordingly; once a large number of action scripts have accumulated later, this prompting step can be omitted.
With respect to step S202, in some embodiments, step S202 may further include: determining the live broadcast scene where the three-dimensional avatar is located and the avatar attributes of the three-dimensional avatar; and obtaining, from the script database, a target action script matched with the live broadcast scene, the avatar attributes, and the interactive voice. In this embodiment, the three-dimensional avatar displayed in the live broadcast room may be in various live broadcast scenes, where a live broadcast scene refers to the virtual scene laid out in the live broadcast room and may be outdoor or indoor, such as an exhibition booth or a stage; the three-dimensional avatar may also have its own attributes, such as character traits or singing level. Specifically, labels for the live broadcast scene, avatar attributes, and so on can be set for the action scripts in the script database, so that the server can match the corresponding target action script according to the live broadcast scene where the three-dimensional avatar is located, the avatar attributes, and the interactive voice. In this way, for the same interactive voice, the three-dimensional avatar can perform more vivid and better-adapted response actions in different live broadcast scenes and under different avatar attributes, enriching the avatar's interactive expression.
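Narrowing the candidate scripts by scene and attribute labels before voice matching can be sketched as a simple filter. The label schema here is an assumption carried over from nothing in the patent itself; only the idea of tag-based filtering comes from the text.

```python
def filter_scripts(scripts: list[dict], scene: str, attribute: str) -> list[dict]:
    """Keep only scripts whose labels fit the current scene and avatar attribute."""
    return [
        s for s in scripts
        if s["tags"]["scene"] == scene and s["tags"]["avatar_attribute"] == attribute
    ]

scripts = [
    {"name": "dance", "tags": {"scene": "stage", "avatar_attribute": "dancer"}},
    {"name": "dance", "tags": {"scene": "booth", "avatar_attribute": "host"}},
    {"name": "sing",  "tags": {"scene": "stage", "avatar_attribute": "dancer"}},
]

# Only stage/dancer scripts remain; voice matching then runs on this subset,
# so the same interactive voice can yield different scripts in different scenes.
candidates = filter_scripts(scripts, scene="stage", attribute="dancer")
print([s["name"] for s in candidates])  # ['dance', 'sing']
```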
In step S203, the terminal may receive the target action script sent by the server and then download the material resources corresponding to the target action script from the material resource library; specifically, the terminal may download the material resources corresponding to the identifier of the target action script according to that identifier.
For step S203, in one embodiment, step S203 may further include: obtaining the identifier of the target action script and the identifier of the three-dimensional avatar; and if the identifier of the target action script matches the identifier of the three-dimensional avatar, obtaining the material resources corresponding to the target action script from the material resource library. Specifically, the target action script issued by the server is not necessarily adapted to the three-dimensional avatar currently displayed in the live broadcast room; for example, the bone structures may not match. To prevent control errors, after the terminal receives the target action script, it matches the identifier of the target action script against the identifier of the three-dimensional avatar. As described above, the adapted three-dimensional avatars are recorded when an action script is made, so the avatars adaptable to the target action script can be recorded by setting an identifier correspondence, and whether the two identifiers match can be judged based on that correspondence. If they match, the terminal can download the corresponding material resources, such as music and pictures, from the material resource library according to the identifier of the target action script; otherwise, the terminal does not operate the three-dimensional avatar and can feed the identifier-mismatch information back to the server to raise an alarm and allow subsequent repair.
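The identifier-correspondence check can be sketched as a lookup table, assuming the mapping recorded at script-production time is available on the terminal. All identifiers here are hypothetical.

```python
# Hypothetical identifier correspondence recorded when each action script was made:
# script identifier -> set of avatar identifiers the script is adapted to.
COMPATIBILITY: dict[str, set[str]] = {
    "act_0001": {"avatar_cat_v1", "avatar_girl_v2"},
}

def can_play(script_id: str, avatar_id: str) -> bool:
    """True when the displayed avatar is among the script's recorded avatars."""
    return avatar_id in COMPATIBILITY.get(script_id, set())

if can_play("act_0001", "avatar_cat_v1"):
    print("download materials and play")  # identifiers match: fetch from the library
else:
    print("report mismatch to server")    # feed back for alarm and later repair
```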
And step S204, playing the response action of the three-dimensional virtual image according to the target action script and the material resources.
In step S204, the terminal may bind the target action script to the three-dimensional avatar, so that the avatar performs the corresponding action presentation according to the target action script and the material resources. For example, the three-dimensional avatar may perform a walking response action according to the issued target action script, and the target action script may specify which bone of the avatar moves, in which direction, to which coordinates, and for how long.
In the interaction method for the live broadcast room avatar of this embodiment, interactive voice input by the target user for the three-dimensional avatar in the live broadcast room is received; a target action script matched with the interactive voice is obtained from the script database; the material resources corresponding to the target action script are obtained from the material resource library; and finally the response action of the three-dimensional avatar can be played in the live broadcast room according to the target action script and the material resources. This scheme enables a user to interact with the three-dimensional avatar in the live broadcast room through voice input and improves the convenience of that interaction. It can be applied to interaction between users and three-dimensional avatars in metaverse live broadcast scenes, making such interaction more convenient and enriching the interaction modes available in metaverse live broadcasts.
As a specific example, the interaction method for the live broadcast room avatar of the present application may include the steps shown in fig. 3. The production of the three-dimensional avatar and material resources, the management of action scripts and material resources, the voice input, the matching of action scripts and material resources, and the avatar's actions may all refer to the corresponding content disclosed in the above embodiments. As for action synchronization: if the picture of the three-dimensional avatar's response action needs to be synchronized with the live picture in real time, a stream-pushing synchronization scheme may be adopted. For example, when the anchor is running a lottery through the three-dimensional avatar, or the avatar is communicating with the anchor or the audience, the avatar's response action needs to be synchronized with the live audio and video in real time; the picture or audio containing the response action can be mixed into the original live stream and pushed to each viewer, so that the viewers see the avatar's action in sync with the live broadcast. If real-time synchronization with the live audio and video is not required, the server can instead issue the target action script to all terminals at the same time, and each terminal, upon receiving it, plays the response action of the three-dimensional avatar in combination with the material resources.
In the result-reporting step, when playback succeeds, the terminal can report to the server that the three-dimensional avatar played the target action script successfully; if playback fails, the terminal can report the failure, making it convenient to raise an alarm and repair subsequent problems.
In some embodiments, as shown in fig. 4, before step S201, namely receiving the interactive voice input by the target user to the three-dimensional avatar in the live broadcast room, the method further comprises the following steps:
step S401, if the home terminal user has the preset authority, displaying a preset test text for the interactive voice test.
Step S402, obtaining the test voice input by the home terminal user.
Step S403, identifying a test speech text corresponding to the test speech.
And S404, if the test voice text is matched with the preset test text, setting the home terminal user as a target user.
In this embodiment, the target user is identified by verifying voice input only after the local user holds the preset authority described in the above embodiments, which also tests whether the user's terminal device has adequate voice input performance and avoids wasting the server's search-matching resources. Specifically, in step S401, if the local user has a preset authority, for example being the anchor of the live broadcast room or a viewer who has reached a certain audience level, the terminal may display for that user a preset test text for the interactive voice test. The preset test text may be a preset sentence related to interacting with the three-dimensional avatar, such as "please dance for everyone", and the terminal may prompt the local user to read the preset test text aloud and input the corresponding voice, called the test voice. In step S402, the terminal obtains the test voice input by the local user; in step S403, the text corresponding to the test voice, called the test voice text, can be recognized through the voice recognition model; and in step S404, the terminal matches the test voice text against the preset test text. When the two are consistent, the terminal determines that they match and sets the local user as the target user.
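Steps S401 to S404 reduce to a comparison between the recognized test voice text and the preset test text. The sketch below mocks out speech recognition entirely; the preset sentence and the case-insensitive comparison are assumptions, since the patent only requires the texts to be consistent.

```python
PRESET_TEST_TEXT = "please dance for everyone"  # assumed preset test sentence

def verify_and_promote(recognized_text: str) -> bool:
    """Return True (the local user becomes the target user) when the recognized
    test voice text is consistent with the preset test text."""
    return recognized_text.strip().lower() == PRESET_TEST_TEXT

print(verify_and_promote("Please dance for everyone"))  # consistent: target user
print(verify_and_promote("please sing for everyone"))   # inconsistent: not promoted
```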
In one embodiment, the displaying of the preset test text for the interactive voice test in step S401 may include: acquiring a preset test text for the interactive voice test, wherein the preset test text sequentially comprises a plurality of test text segments; and displaying the plurality of test text segments, together with their ordering information within the preset test text, at different viewing angles of the home terminal user onto the live broadcast room, so that the home terminal user can read the segments in sequence and input the corresponding test voice.
In this embodiment, the preset test text obtained by the terminal sequentially comprises a plurality of test text segments, which may be obtained by segmenting the preset test text in advance. For example, "please dance for everybody" may be divided in advance into the two test text segments "please dance" and "for everybody". Each test text segment further carries ordering information, that is, its position within the preset test text: the ordering information of "please dance" may be set to 1 to represent the beginning, and that of "for everybody" to N to represent the end. Accordingly, when the terminal displays the test text segments, each segment is displayed together with its ordering information, for example "please dance" together with 1, and "for everybody" together with N. Specifically, in scenes such as VR live broadcast and metaverse live broadcast, the user has multiple viewing angles onto the live broadcast room on the terminal and can change the viewing angle by rotating a mobile phone or VR glasses; for instance, a user wearing VR glasses can rotate to switch from a viewing angle in front of the stage of the live broadcast room to a viewing angle to the left of the stage. On this basis, the terminal may display the test text segments and their respective ordering information at different viewing angles of the home terminal user onto the live broadcast room, for example displaying "please dance" with its ordering information in front of the stage and "for everybody" with its ordering information to the left of the stage, so that the home terminal user finds, reads, and inputs the test text segments in sequence. Through the scheme of this embodiment, interactive verification can be combined with scenes such as VR live broadcast and metaverse live broadcast, so that a user whose terminal supports viewing-angle switching with adequate performance can be identified as a target user, providing target-user identification for the diversified interaction modes involving the three-dimensional avatar in such scenes.
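A minimal sketch of this placement, treating viewing angles simply as labels; the function name and the example segments are illustrative assumptions:

```python
def assign_segments_to_views(segments, views):
    """Pair each ordered test text segment with a distinct viewing angle of
    the live broadcast room, attaching its ordering information (1..N) so
    the home terminal user can read the segments back in sequence."""
    if len(views) < len(segments):
        raise ValueError("need at least one distinct viewing angle per segment")
    return {views[i]: (i + 1, seg) for i, seg in enumerate(segments)}

placement = assign_segments_to_views(
    ["please dance", "for everybody"],
    ["front of stage", "left of stage"])
print(placement)
# {'front of stage': (1, 'please dance'), 'left of stage': (2, 'for everybody')}
```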
Based on the scheme of the above embodiment, in a further embodiment, after the home terminal user is set as the target user, the method may further include:
acquiring a preset interactive text for the three-dimensional avatar sent by a server, wherein the preset interactive text sequentially comprises a plurality of interactive text segments; and displaying the plurality of interactive text segments, together with their ordering information within the preset interactive text, at different viewing angles of the target user onto the live broadcast room, so that the target user can read the segments in sequence and input the corresponding interactive voice.
This embodiment builds on the above scheme of determining the home terminal user as a target user. At the stage of actual interaction with the three-dimensional avatar, the server may predefine certain interaction texts for the three-dimensional avatar, referred to as preset interactive texts. For example, a corresponding interactive text may be set for a specific festival, such as the Spring Festival, so that the interactive text causes the three-dimensional avatar to perform a specific response action expressing Spring Festival festivity. The preset interactive text may be divided in advance into a plurality of sequential interactive text segments. Since there may be multiple target users in a live broadcast room, the server may issue the preset interactive text, sequentially comprising the plurality of interactive text segments, to the terminal of each target user in the live broadcast room. After receiving the preset interactive text, each terminal obtains the interactive text segments and their respective ordering information within the preset interactive text, and then displays them at different viewing angles of the target user onto the live broadcast room, so that the target user reads the segments in sequence and inputs the corresponding interactive voice.
Based on this, further, the obtaining of the target action script matched with the interactive voice in the script database according to the interactive voice in step S202 of the method of the present application specifically includes:
identifying an interactive voice text corresponding to the interactive voice, and sending the interactive voice text to a server; and receiving the target action script returned by the server.
The terminal receives the interactive voice input by the target user, recognizes the interactive voice text corresponding to the interactive voice, and sends the interactive voice text to the server. The server matches the interactive voice text against the preset interactive text; if they match, the server obtains the target action script corresponding to the preset interactive text from the script database and returns it to the terminal, whereupon the terminal receives the target action script and plays the response action of the three-dimensional avatar. In an actual scene there may be multiple target users in the live broadcast room, so the server may receive multiple interactive voice texts. When the first interactive voice text is successfully matched, the server can stop matching the interactive voice texts from the other terminals against the preset interactive text, instruct the terminals of the target users to cancel the display of the preset interactive text, and send the target action script to the terminals so that the response action of the three-dimensional avatar is played.
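The server-side first-match-wins behavior described above might be sketched as follows; the helper name is hypothetical, and real submissions would arrive asynchronously rather than as a list:

```python
def first_matching_user(submissions, preset_interactive_text):
    """Walk interactive voice texts in arrival order; the first one that is
    consistent with the preset interactive text wins, and matching of the
    later submissions is stopped (here simply never performed)."""
    for user, voice_text in submissions:
        if voice_text == preset_interactive_text:
            return user  # terminals are then told to cancel the prompt display
    return None

winner = first_matching_user(
    [("userA", "happy new year to you"),
     ("userB", "happy spring festival everybody"),
     ("userC", "happy spring festival everybody")],
    "happy spring festival everybody")
print(winner)  # userB -- userC's identical but later submission is ignored
```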
According to the scheme of this embodiment, the interaction modes involving the three-dimensional avatar in scenes such as VR live broadcast and metaverse live broadcast can be enriched. Multiple target users in a live broadcast room race to find and read the interactive text segments distributed across multiple viewing angles of the live broadcast room and assemble them into the complete interactive voice, so that the three-dimensional avatar performs the specific response action for whichever user's interactive voice text is successfully matched against the preset interactive text first. Identification information such as the nickname of the successfully matched target user may further be announced through the response action of the three-dimensional avatar, thereby enriching the interaction modes.
In some other embodiments, the displaying of the preset test text for the interactive voice test in step S401 may include: acquiring a preset test text for the interactive voice test, wherein the preset test text contains preset test keywords; and displaying the preset test text while highlighting the preset test keywords contained in it.
In this embodiment, the preset test text acquired by the terminal contains a preset test keyword; illustratively, the word "dance" in the preset test text "please dance for everybody" may be set as the preset test keyword. The terminal can then display the preset test text in the live broadcast room for the home terminal user, highlight the preset test keyword contained in it, and prompt the user to read the preset test keyword at a relatively high volume while reading the preset test text, where "relatively" means relative to the volume used for the other words of the preset test text. For example, the user may read the rest of the sentence at a volume X and read "dance" at a volume Y greater than X.
Accordingly, the terminal can receive and acquire the test voice that the home terminal user inputs for this preset test text, and recognize the test voice text corresponding to the test voice. After the test voice text corresponding to the test voice is recognized, the method further includes: acquiring a test voice text segment corresponding to the high-volume voice segment in the test voice text. The high-volume voice segment in the test voice refers to the portion of the voice read and input by the home terminal user at a relatively high volume, where "relatively" has the meaning described above. Specifically, the high-volume voice segment can be extracted according to the volume information of the test voice, and the piece of text corresponding to it can then be located in the recognized test voice text; this piece of text is recorded as the test voice text segment. Illustratively, if the home terminal user reads "dance" at a relatively high volume while reading "please dance for everybody", then "dance" is the test voice text segment.
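One way to realize the extraction of the test voice text segment, assuming the recognizer can attach a volume estimate to each recognized word; the function name, volume scale, and threshold are illustrative assumptions:

```python
def loud_text_segment(words_with_volume, threshold):
    """Return the words of the recognized test voice text whose measured
    volume exceeds the threshold -- the test voice text segment that
    corresponds to the high-volume voice segment."""
    return " ".join(word for word, volume in words_with_volume
                    if volume > threshold)

# The user reads "please dance for everybody", raising their voice on "dance"
seg = loud_text_segment(
    [("please", 0.4), ("dance", 0.9), ("for", 0.35), ("everybody", 0.4)],
    threshold=0.6)
print(seg)  # dance
```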
On this basis, step S404 of the method of the present application, namely setting the home terminal user as the target user if the test voice text matches the preset test text, specifically includes: setting the home terminal user as the target user if the test voice text matches the preset test text and the test voice text segment matches the preset test keyword. In this embodiment, the home terminal user is identified as the target user only when both conditions hold, so that home terminal users with clear voice entry and clearly expressed instructions can be accurately selected as target users, avoiding resource waste in the search matching performed by the server.
Based on the above scheme of the embodiment, further, in an embodiment, the step S202 in the method of the present application, namely obtaining a target action script matched with the interactive voice in the script database according to the interactive voice, specifically includes:
identifying an interactive voice text corresponding to interactive voice, and acquiring an interactive voice text segment corresponding to a high-volume voice segment in the interactive voice text; sending the interactive voice text segment to a server; and receiving the target action script returned by the server.
In this embodiment, the target user may autonomously input the interactive voice through the terminal; for example, the target user may input an interactive voice whose content is "can you dance for everybody", reading "dance" at a relatively high volume during the input. The terminal can then recognize the interactive voice text corresponding to the interactive voice, obtaining the text form "can you dance for everybody"; the terminal can also obtain the high-volume voice segment according to the volume information of the interactive voice, and then obtain from the interactive voice text the piece of text "dance" corresponding to the high-volume voice segment, recorded as the interactive voice text segment. The terminal then sends the interactive voice text segment to the server, and the server matches it in the script database. Specifically, each action script in the script database may correspond to a preset keyword, which may be the name of the action script or the like. The server matches the interactive voice text segment against the preset keywords; if the preset keyword "dance" is hit, the server determines that the interactive voice text segment matches that preset keyword, obtains the action script corresponding to it from the script database as the target action script, and returns the target action script to the terminal, which receives it.
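The server-side keyword lookup can be sketched as below; the script database contents, the identifier strings, and the substring-matching rule are all assumptions for illustration:

```python
SCRIPT_DATABASE = {
    # preset keyword -> action script identifier (hypothetical contents)
    "dance": "script_dance_01",
    "wave":  "script_wave_01",
}

def match_action_script(interactive_voice_text_segment):
    """Match the high-volume interactive voice text segment against the
    preset keyword of each action script and return the target script."""
    for keyword, script in SCRIPT_DATABASE.items():
        if keyword in interactive_voice_text_segment:
            return script
    return None

print(match_action_script("dance"))       # script_dance_01
print(match_action_script("somersault"))  # None
```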
According to the scheme of this embodiment, the user can issue interactive voice to the three-dimensional avatar in natural, plain, non-command-style sentences and still cause the three-dimensional avatar to perform a response action, so that every user in the live broadcast room can express themselves naturally even during voice interaction with the three-dimensional avatar, improving the interaction experience.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in these flowcharts may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and whose execution order is not necessarily sequential; they may be performed in turn or in alternation with other steps, or with sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides a live broadcast room avatar interaction apparatus for implementing the live broadcast room avatar interaction method described above. The implementation scheme by which the apparatus solves the problem is similar to that recorded for the method, so for specific limitations in the following embodiments of the interaction apparatus for one or more live broadcast room avatars, reference may be made to the limitations on the interaction method for the live broadcast room avatar above, and details are not repeated here.
In one embodiment, as shown in fig. 5, an interactive apparatus for a live room avatar is provided, the apparatus 500 may include:
a voice receiving module 501, configured to receive an interactive voice input by a target user to a three-dimensional virtual image in a live broadcast room;
a script obtaining module 502, configured to obtain, according to the interactive voice, a target action script matched with the interactive voice in a script database;
a resource obtaining module 503, configured to obtain, according to the target action script, a material resource corresponding to the target action script in a material resource library;
an action playing module 504, configured to play the response action of the three-dimensional avatar according to the target action script and the material resource.
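The four modules of apparatus 500 can be pictured as one pipeline. The sketch below is a loose stand-in, not the claimed apparatus: plain dictionaries replace the script database and material resource library, and a returned dictionary replaces actual playback:

```python
class LiveRoomAvatarInteraction:
    """Modules 501-504 chained: receive the voice text, fetch the matching
    action script, fetch its material resource, and hand both to playback."""

    def __init__(self, script_db, resource_lib):
        self.script_db = script_db        # keyword -> target action script
        self.resource_lib = resource_lib  # action script -> material resource

    def on_interactive_voice(self, voice_text):
        script = self.script_db.get(voice_text)           # module 502
        if script is None:
            return None
        resource = self.resource_lib.get(script)          # module 503
        return {"script": script, "resource": resource}   # played by module 504

device = LiveRoomAvatarInteraction({"dance": "s1"}, {"s1": "dance_animation.fbx"})
print(device.on_interactive_voice("dance"))
# {'script': 's1', 'resource': 'dance_animation.fbx'}
```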
In one embodiment, the apparatus 500 may further include: the user identification module is used for displaying a preset test text for the interactive voice test if the home terminal user has a preset authority; acquiring test voice input by the home terminal user; identifying a test voice text corresponding to the test voice; and if the test voice text is matched with the preset test text, setting the home terminal user as a target user.
In one embodiment, the user identification module is used for acquiring a preset test text for the interactive voice test; the preset test text sequentially comprises a plurality of test text sections; and displaying the plurality of test text segments and the sequencing information of the plurality of test text segments in the preset test text in different viewing angles of the home terminal user to the live broadcast room, so that the home terminal user can read and input the test voice in sequence.
In one embodiment, the voice receiving module 501 is further configured to obtain a preset interactive text for the three-dimensional avatar sent by the server, the preset interactive text sequentially comprising a plurality of interactive text segments, and to display the plurality of interactive text segments, together with their ordering information within the preset interactive text, at different viewing angles of the target user onto the live broadcast room, so that the target user can read and input the interactive voice in sequence. The script obtaining module 502 is configured to identify the interactive voice text corresponding to the interactive voice and send it to the server, and to receive the target action script returned by the server, the target action script being the one corresponding to the preset interactive text, obtained and returned from the script database by the server when the interactive voice text matches the preset interactive text.
In one embodiment, the user identification module is used for acquiring a preset test text for the interactive voice test; the preset test text comprises preset test keywords; displaying the preset test text and highlighting a preset test keyword contained in the preset test text; acquiring a test voice text segment corresponding to a high-volume voice segment in the test voice text; and if the test voice text is matched with the preset test text and the test voice text segment is matched with the preset test keyword, setting the home terminal user as a target user.
In one embodiment, the script obtaining module 502 is configured to identify an interactive voice text corresponding to the interactive voice, and obtain an interactive voice text segment corresponding to a high-volume voice segment in the interactive voice text; sending the interactive voice text segment to a server; receiving a target action script returned by the server; and the target action script is the target action script corresponding to the preset keyword, which is obtained and returned from the script database by the server when the interactive voice text segment is matched with the preset keyword.
In one embodiment, the resource obtaining module 503 is configured to obtain an identifier of the target action script, and obtain an identifier of the three-dimensional avatar; and if the identification of the target action script is matched with the identification of the three-dimensional virtual image, acquiring material resources corresponding to the target action script in a material resource library.
In one embodiment, the script obtaining module 502 is configured to determine a live scene in which the three-dimensional avatar is located and an avatar attribute of the three-dimensional avatar; and acquiring a target action script matched with the live broadcast scene, the image attribute and the interactive voice in the script database according to the live broadcast scene, the image attribute and the interactive voice.
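One possible shape for this joint lookup is a three-part key over the live scene, the avatar attribute, and the interactive voice text; the key schema and example entries are assumptions, since the text does not fix a concrete data model:

```python
def match_script_with_context(script_db, live_scene, avatar_attribute, voice_text):
    """Retrieve the target action script keyed jointly on the live scene,
    the avatar attribute, and the interactive voice text."""
    return script_db.get((live_scene, avatar_attribute, voice_text))

script_db = {("new_year_party", "cartoon_cat", "dance"): "cat_new_year_dance"}
print(match_script_with_context(script_db, "new_year_party", "cartoon_cat", "dance"))
# cat_new_year_dance
```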
The modules in the interactive device of the live broadcast room avatar can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent of a processor in the electronic device, or can be stored in a memory in the electronic device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, an electronic device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 6. The electronic device comprises a processor, a memory, a communication interface, a display screen and an input device which are connected through a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the electronic device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a live room avatar interaction method. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the electronic equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the configuration shown in fig. 6 is a block diagram of only a portion of the configuration associated with the present application, and does not constitute a limitation on the electronic device to which the present application is applied, and a particular electronic device may include more or less components than those shown in the drawings, or may combine certain components, or have a different arrangement of components.
In one embodiment, an electronic device is further provided, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, or the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and so on, without limitation.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described, but any such combination should be considered within the scope of this specification as long as the combined features are not contradictory.
The above embodiments express only several implementations of the present application; their description is relatively specific and detailed, but is not to be construed as limiting the scope of the present application. It should be noted that several variations and modifications can be made by a person skilled in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (11)

1. A method of interacting with a live room avatar, the method comprising:
receiving interactive voice input by a target user to the three-dimensional virtual image in the live broadcast room;
acquiring a target action script matched with the interactive voice in a script database according to the interactive voice;
acquiring material resources corresponding to the target action script in a material resource library according to the target action script;
and playing the response action of the three-dimensional virtual image according to the target action script and the material resources.
2. The method of claim 1, wherein prior to receiving the interactive voice input by the target user into the three-dimensional avatar in the live broadcast room, further comprising:
if the home terminal user has the preset authority, displaying a preset test text for the interactive voice test;
acquiring test voice input by the home terminal user;
identifying a test voice text corresponding to the test voice;
and if the test voice text is matched with the preset test text, setting the home terminal user as a target user.
3. The method of claim 2, wherein presenting the predetermined test text for the interactive voice test comprises:
acquiring a preset test text for an interactive voice test; the preset test text sequentially comprises a plurality of test text sections;
and displaying the sequencing information of the test text segments and the test text segments in the preset test text in different viewing angles of the home terminal user to the live broadcast room, so that the home terminal user can read and input the test voice in sequence.
4. The method of claim 3,
after the home terminal user is set as the target user, the method further includes:
acquiring a preset interactive text aiming at the three-dimensional virtual image and sent by a server; the preset interactive text sequentially comprises a plurality of interactive text sections;
displaying the plurality of interactive text segments and the sequencing information of the plurality of interactive text segments in the preset interactive text in different viewing visual angles of the target user to the live broadcast room, so that the target user can read and input the interactive voice in sequence;
the obtaining of the target action script matched with the interactive voice in the script database according to the interactive voice comprises:
identifying an interactive voice text corresponding to the interactive voice, and sending the interactive voice text to the server;
receiving a target action script returned by the server; and the target action script is the target action script corresponding to the preset interactive text which is obtained and returned from the script database by the server when the interactive voice text is matched with the preset interactive text.
5. The method of claim 2,
the displaying of the preset test text for the interactive voice test includes:
acquiring a preset test text for an interactive voice test; the preset test text comprises preset test keywords;
displaying the preset test text and highlighting a preset test keyword contained in the preset test text;
after the test voice text corresponding to the test voice is identified, the method further includes:
acquiring a test voice text segment corresponding to a high-volume voice segment in the test voice text;
if the test voice text is matched with the preset test text, setting the home terminal user as a target user, including:
and if the test voice text is matched with the preset test text and the test voice text segment is matched with the preset test keyword, setting the home terminal user as a target user.
6. The method according to claim 5, wherein the obtaining a target action script matched with the interactive voice in a script database according to the interactive voice comprises:
identifying an interactive voice text corresponding to the interactive voice, and acquiring an interactive voice text segment corresponding to a high-volume voice segment in the interactive voice text;
sending the interactive voice text segment to a server;
receiving a target action script returned by the server; and the target action script is the target action script corresponding to the preset keyword, which is obtained and returned from the script database by the server when the interactive voice text segment is matched with the preset keyword.
7. The method according to any one of claims 1 to 6, wherein the obtaining material resources corresponding to the target action script in a material resource library according to the target action script comprises:
acquiring an identifier of the target action script and acquiring an identifier of the three-dimensional virtual image;
and if the identification of the target action script is matched with the identification of the three-dimensional virtual image, acquiring material resources corresponding to the target action script in a material resource library.
8. The method according to any one of claims 1 to 6, wherein the obtaining a target action script matched with the interactive voice in a script database according to the interactive voice comprises:
determining a live broadcast scene where the three-dimensional virtual image is located and image attributes of the three-dimensional virtual image;
and acquiring a target action script matched with the live broadcast scene, the image attribute and the interactive voice in the script database according to the live broadcast scene, the image attribute and the interactive voice.
9. An apparatus for live room avatar interaction, the apparatus comprising:
the voice receiving module is used for receiving interactive voice input by a target user to the three-dimensional virtual image in the live broadcast room;
the script acquisition module is used for acquiring a target action script matched with the interactive voice in a script database according to the interactive voice;
the resource acquisition module is used for acquiring material resources corresponding to the target action script in a material resource library according to the target action script;
and the action playing module is used for playing the response action of the three-dimensional virtual image according to the target action script and the material resources.
10. An electronic device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method according to any one of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
CN202211725720.5A 2022-12-30 2022-12-30 Interaction method and device for live broadcast room virtual image, electronic equipment and storage medium Pending CN115988232A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211725720.5A CN115988232A (en) 2022-12-30 2022-12-30 Interaction method and device for live broadcast room virtual image, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115988232A true CN115988232A (en) 2023-04-18

Family

ID=85966405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211725720.5A Pending CN115988232A (en) 2022-12-30 2022-12-30 Interaction method and device for live broadcast room virtual image, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115988232A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination