CN107423809A - Multi-modal interaction method and system for a virtual robot applied to a live video streaming platform - Google Patents


Info

Publication number
CN107423809A
CN107423809A (application CN201710551230.0A)
Authority
CN
China
Prior art keywords
modal, data, mode, virtual robot, information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710551230.0A
Other languages
Chinese (zh)
Other versions
CN107423809B (en)
Inventor
黄钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Virtual Point Technology Co Ltd
Original Assignee
Beijing Guangnian Wuxian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Guangnian Wuxian Technology Co Ltd
Priority to CN201710551230.0A
Publication of CN107423809A
Application granted
Publication of CN107423809B
Active legal status (current)
Anticipated expiration legal status


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/004: Artificial life, i.e. computing arrangements simulating life
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from the processing unit to the output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21: Server components or server architectures
    • H04N 21/218: Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187: Live feed
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41: Structure of client; Structure of client peripherals
    • H04N 21/4104: Peripherals receiving signals from specially adapted client devices
    • H04N 21/4122: Peripherals receiving signals from specially adapted client devices, additional display device, e.g. video projector
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47: End-user applications
    • H04N 21/478: Supplemental services, e.g. displaying phone caller identification, shopping application

Abstract

The invention discloses a multi-modal interaction method for a virtual robot applied to a live video streaming platform. The platform is connected to a virtual robot that possesses multi-modal interaction capabilities, and the method comprises the following steps: displaying a virtual robot with a specific visual image in a preset area, entering a default live-assist mode, and receiving in real time the multi-modal data and multi-modal instructions input in the live room; parsing the multi-modal data and the multi-modal instructions, and using the multi-modal interaction capabilities of the virtual robot to discriminate and determine a target live-assist mode; and activating the target live-assist mode, in which the virtual robot carries out multi-modal interaction and display. By converting among live-assist modes, the invention presents multi-modal interaction in multiple forms, which increases user interest, maintains user stickiness, and improves the user experience.

Description

Multi-modal interaction method and system for a virtual robot applied to a live video streaming platform
Technical field
The present invention relates to the technical field of internet live streaming platforms, and in particular to a multi-modal interaction method and system for a virtual robot applied to a live video streaming platform.
Background technology
With the development of the online live streaming industry, users can earn virtual prizes by watching streams or taking part in activities on a live streaming platform, and can give the prizes they obtain to their favorite anchors as a form of interaction, thereby cultivating viewing habits and platform stickiness.
However, in existing live streaming platforms, the mechanisms for monitoring the anchor's live state are imperfect, and the anchor's performance style is monotonous, which gives users a poor experience. Improving the intelligence of live streaming platforms is therefore an important technical problem that urgently needs to be solved.
The content of the invention
To solve the above technical problem, the embodiments of the present application first provide a multi-modal interaction method for a virtual robot applied to a live video streaming platform. The platform is connected to a virtual robot that possesses multi-modal interaction capabilities, and the method comprises the following steps. Multi-modal information input step: display a virtual robot with a specific visual image in a preset area, enter a default live-assist mode, and receive in real time the multi-modal data and multi-modal instructions input in the live room. Data processing and mode discrimination step: parse the multi-modal data and/or the multi-modal instructions, and use the multi-modal interaction capabilities of the virtual robot to discriminate and determine a target live-assist mode. Multi-modal interaction information output step: activate the target live-assist mode, in which the virtual robot carries out multi-modal interaction and display.
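The three steps above (multi-modal input, data processing with mode discrimination, and multi-modal interaction output) can be sketched as a minimal pipeline. This sketch is an illustration only; the class name, mode names, and trigger rules below are assumptions of this rewrite, not part of the patent disclosure:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VirtualRobotPipeline:
    """Illustrative three-step pipeline: receive, discriminate, output."""
    mode: str = "default_assist"              # the default live-assist mode
    history: list = field(default_factory=list)

    def receive(self, data: dict, instruction: Optional[str]):
        """Step 1: receive multi-modal data and instructions from the live room."""
        return data, instruction

    def discriminate(self, data: dict, instruction: Optional[str]) -> str:
        """Step 2: parse the input and determine the target live-assist mode."""
        if instruction is not None:
            return instruction                # an explicit anchor command wins
        speech = data.get("speech", "")
        if "dance" in speech or "sing" in speech:
            return "performance_basic"
        if data.get("robot_name_spoken"):
            return "dialogue"
        return self.mode                      # no trigger: keep the current mode

    def output(self, target_mode: str) -> None:
        """Step 3: activate the target mode; the robot then performs in it."""
        self.mode = target_mode
        self.history.append(target_mode)

robot = VirtualRobotPipeline()
data, cmd = robot.receive({"speech": "Tutu, dance for everyone!"}, None)
robot.output(robot.discriminate(data, cmd))
print(robot.mode)  # performance_basic
```

The separation into three methods mirrors the three claimed steps; a real system would attach recognition and rendering behind each one.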
Preferably, the data processing and mode discrimination comprise: receiving the multi-modal data during the live stream and extracting wake-up data directed at the virtual robot; entering the multi-modal interaction mode that matches the wake-up data, and performing the multi-modal interaction and display actions of the current multi-modal interaction mode.
Preferably, the multi-modal interaction modes include: a dialogue mode, a basic performance mode, an audience interaction mode, and a mode for interacting with other virtual robots.
Preferably, the data processing and mode discrimination further comprise: obtaining the multi-modal instruction set by the anchor for mode conversion; and parsing and responding to the mode-conversion setting by switching from the current multi-modal interaction mode to another multi-modal interaction mode, namely the target live-assist mode.
Preferably, the multi-modal data and/or the multi-modal instructions include one or more of text information, voice information, visual information, control command information, and combinations thereof.
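The mode-conversion behavior described in these clauses (parse the anchor's mode-conversion instruction, then switch from the current multi-modal interaction mode to the target one) might be modelled as a small controller. The mode names and the `switch_to:` instruction format are assumptions made for illustration, not specified by the patent:

```python
# The four preferable multi-modal interaction modes named above.
MODES = {"dialogue", "performance_basic", "audience_interaction",
         "other_robot_interaction"}

class ModeController:
    def __init__(self, initial: str = "dialogue"):
        self.current = initial

    def convert(self, instruction: str) -> str:
        """Parse a mode-conversion instruction of the assumed form
        'switch_to:<mode>' and switch to the target live-assist mode."""
        prefix = "switch_to:"
        if instruction.startswith(prefix):
            target = instruction[len(prefix):]
            if target in MODES:
                self.current = target   # the target mode becomes active
        return self.current             # unrecognized input: no change

ctrl = ModeController()
print(ctrl.convert("switch_to:audience_interaction"))  # audience_interaction
print(ctrl.convert("switch_to:unknown_mode"))          # audience_interaction
```

Validating the target against a known mode set keeps a malformed instruction from leaving the robot in an undefined state.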
In another aspect, the embodiments of the present application provide a storage medium on which program code executable to perform the method steps of any of the above items is stored.
In yet another aspect, the embodiments of the present application provide a multi-modal interaction system for a virtual robot applied to a live video streaming platform. The platform is connected to a virtual robot that possesses multi-modal interaction capabilities, and the system comprises the following modules. A multi-modal information input module, which displays a virtual robot with a specific visual image in a preset area, enters a default live-assist mode, and receives in real time the multi-modal data and multi-modal instructions input in the live room. A data processing and mode discrimination module, which parses the multi-modal data and the multi-modal instructions, and uses the multi-modal interaction capabilities of the virtual robot to discriminate and determine a target live-assist mode. A multi-modal interaction information output module, which activates the target live-assist mode, in which the virtual robot carries out multi-modal interaction and display.
Preferably, the data processing and mode discrimination module extracts, based on the multi-modal data, wake-up data directed at the virtual robot; it enters the multi-modal interaction mode that matches the wake-up data and performs the multi-modal interaction and display actions of the current multi-modal interaction mode.
Preferably, the multi-modal interaction modes include: a dialogue mode, a basic performance mode, an audience interaction mode, and a mode for interacting with other virtual robots.
Preferably, the data processing and mode discrimination module further obtains the multi-modal instruction set by the anchor for mode conversion, parses and responds to the mode-conversion setting, and switches from the current multi-modal interaction mode to another multi-modal interaction mode, namely the target live-assist mode.
Preferably, the multi-modal data and/or the multi-modal instructions include one or more of text information, voice information, visual information, control command information, and combinations thereof.
Compared with the prior art, one or more of the embodiments of the above scheme can have the following advantages or beneficial effects:
The embodiments of the present invention provide a scheme in which a virtual robot assists the anchor in live streaming work. The scheme lets the virtual robot present multi-modal interaction according to the determined live-assist mode, which can increase user interest, maintain user stickiness, and improve the user experience.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by practicing the technical solution of the present invention. The objects and other advantages of the present invention can be realized and obtained through the structures and/or flows particularly pointed out in the description, the claims, and the accompanying drawings.
Brief description of the drawings
The accompanying drawings provide a further understanding of the technical solution of the present application or of the prior art, and constitute a part of the description. The drawings expressing the embodiments of the present application serve, together with the embodiments, to explain the technical solution of the application, but do not limit it.
Fig. 1 is a schematic diagram of a multi-modal interaction application scenario of the live streaming platform of the embodiment of the present application.
Fig. 2a is a schematic diagram of a comedy-mode scene of the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Fig. 2b is a schematic diagram of a buffering and basic performance mode scene of the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Fig. 2c is a schematic diagram of an audience-interaction-mode scene of the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Fig. 2d is a schematic diagram of a scene of mic-linking (Lian Mai) with another virtual robot in the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Fig. 2e is a schematic diagram of a scene of the mode for interacting with other virtual robots in the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Fig. 3 is a schematic structural diagram of the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Fig. 4 is a mode transition diagram of the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Fig. 5 is a module block diagram of the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Fig. 6 is a module block diagram of the side-face detection module 522 in the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Fig. 7 is a flow chart of implementing the side-face detection function in the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Fig. 8 is a module block diagram of the voice recognition module 524 of the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Fig. 9 is a flow chart of implementing the voice recognition function in the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Fig. 10 is a module block diagram of the mode discrimination module 523 of the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Fig. 11 is a module block diagram of the semantic analysis module 525 of the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Fig. 12 is a flow chart of implementing the semantic analysis function in the multi-modal interaction system of the live streaming platform of the embodiment of the present application.
Detailed description of the embodiments
The embodiments of the present invention are described in detail below with reference to the drawings and examples, so that the implementation process by which the present invention applies technical means to solve technical problems and achieve the relevant technical effects can be fully understood and carried out. The features of the embodiments of the present application can be combined with each other provided they do not conflict, and the technical solutions so formed all fall within the protection scope of the present invention.
In addition, the steps illustrated in the flow charts of the drawings can be performed in a computer system executing, for example, a set of computer-executable instructions. Moreover, although a logical order is shown in the flow charts, the steps shown or described can in some cases be performed in an order different from the one herein.
Fig. 1 is a schematic diagram of an application scenario of the multi-modal interaction system of the live streaming platform of the embodiment of the present application. As shown in Fig. 1, the system is applied in a live streaming platform 300. Before the system is used, a live streaming application must be installed on the anchor's device 121; the anchor 111 opens the application, actively initiates a live streaming task, and enters the live room platform 300 to give a live performance. In addition, the audience users (211 ... 21n) need to install on their user devices (221 ... 22n) the live streaming application of the same name as the one on the anchor's device 121. A user (211 ... 21n) can enter the live room address on his or her device (221 ... 22n) and reach the live room platform 300 via the internet, where the user watches the live performance of the anchor 111 through the live room user display interface (2211 ... 22n1). It should be noted that the present application does not specifically limit the types of the user devices (221 ... 22n) and the anchor's device 121, which can be, for example, devices such as smartphones, computers, and tablet computers.
More specifically, when the anchor 111 opens the live streaming application and issues a start-streaming command, the application presents the live room anchor display interface 1211 on the display screen of the anchor's device 121. With reference to Fig. 1, the live room anchor display interface 1211 has the following display areas: an anchor performance area that feeds back the anchor 111's performance video in real time; a bullet-screen, audience message, and audience gifting display area that scrolls the bullet-screen comments, audience messages, and gifting data sent by the audience; an anchor master control area from which the anchor 111 issues control commands such as starting or ending the stream and mic-linking with audience members (such control commands can be implemented, for example, as function buttons); and a robot-assisted performance area that feeds back in real time status information such as the expressions, speech, and actions of the virtual robot 111f. In addition, after a user (211 ... 21n) enters the anchor's live room 300, the user can watch, through the live user display interface (2211 ... 22n1), a performance picture that is roughly the same as the anchor performance area and robot-assisted performance area of the anchor display interface 1211. However, the live user display interface (2211 ... 22n1) differs from the live anchor display interface 1211 in two respects: first, in addition to the display functions of the corresponding area of the anchor interface, the bullet-screen, message, and gifting display area of the user interface also allows the user (211 ... 21n) to input text messages; second, the user control area in the live user display interface (2211 ... 22n1) includes a control button for the user to leave the live room.
It should be noted that the multi-modal interaction system of the live streaming platform of the present application is configured with a virtual robot 111f possessing multi-modal interaction capabilities. Using an animated image as its carrier, it can output multi-modal information such as text information, voice information, animated expression information, and action information. In the embodiment of the present application, the functions that the system can implement with the virtual robot 111f include: in the absence of anchor instructions, the virtual robot 111f can assist the anchor's performance and thank specified audience users; and it can switch to the corresponding multi-modal interaction mode according to different instructions from the anchor. Loading the virtual robot makes it possible to interact with the audience in place of an inarticulate or tired anchor and perform for the audience accordingly, and also to converse with the anchor, so as to keep up the live room's traffic and popularity and maintain the quality and duration of the stream.
The above anchor instructions include the following operations: the anchor turning his or her side face toward the virtual robot 111f while speaking; the anchor speaking the virtual robot's name; the anchor saying key instructions such as "dance", "sing", or "tell a story"; the anchor pressing the button for interacting with the audience; and the anchor pressing the button for interacting with other virtual robots. The above multi-modal interaction modes include the following: a comedy mode, a dialogue mode, a basic performance mode, an audience interaction mode, and a mode for interacting with other virtual robots.
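The instruction-to-mode matching listed above can be summarized as a lookup table. The trigger and mode identifiers below are invented for illustration; the patent names the operations and modes but does not prescribe an encoding:

```python
# Assumed encoding of the anchor operations listed above.
TRIGGER_TO_MODE = {
    "side_face_toward_robot": "dialogue",
    "robot_name_spoken":      "dialogue",
    "keyword_dance":          "performance_basic",
    "keyword_sing":           "performance_basic",
    "keyword_story":          "performance_basic",
    "audience_button":        "audience_interaction",
    "other_robot_button":     "other_robot_interaction",
}

def target_mode(triggers: list, current: str = "comedy") -> str:
    """Return the mode matching the first recognized trigger, else the current mode."""
    for trigger in triggers:
        if trigger in TRIGGER_TO_MODE:
            return TRIGGER_TO_MODE[trigger]
    return current  # no special instruction: the robot stays in comedy mode

print(target_mode(["keyword_sing"]))  # performance_basic
print(target_mode([]))                # comedy
```

Defaulting to the current mode reproduces the behavior described below, where the robot remains in comedy mode until a special instruction arrives.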
Next, how the matching and conversion between anchor instructions and the virtual robot's live-assist modes is implemented in the multi-modal interaction system, and how the virtual robot carries out the assisted performance process in each mode, are described in detail.
(First mode)
In the embodiment of the present application, when the anchor gives a live performance without issuing any special instruction, the virtual robot is in the comedy mode state.
Fig. 2a is a schematic diagram of a comedy-mode scene of the multi-modal interaction system of the live streaming platform of the embodiment of the present application. As shown in Fig. 2a, in comedy mode the anchor master control area of the live room anchor display interface 1211 also includes mood command buttons, for example basic mood commands such as excited, cheerful, calm, surprised, and sad. In the comedy mode, the virtual robot can simultaneously output, through its animated image, comedic multi-modal information comprising the voice, action, and expression that correspond to the selected mood. It should be noted that for each mood, every modality is provided with multiple items of content, from which the virtual robot randomly selects one as the output of that modality under the given mood command. Specifically, in one embodiment, under the "excited" mood command the audio information the virtual robot can output includes "Excellent!", "Not bad", "Keep going", and so on; the action information it can output includes spinning around, giving a thumbs-up, dancing, and so on; the expression actions it can output include a toothy laugh, a head-thrown-back laugh, and so on. The virtual robot selects randomly from the items of each modality and outputs the following matched comedic multi-modal information: the spinning action matched with the line "Not bad" and the toothy-laugh expression; or the dancing action matched with the line "Excellent!" and the head-thrown-back laugh. In another embodiment, under the "surprised" mood command the audio information the virtual robot can output includes "Really?", "Mon Dieu", "What on earth", and so on; the action information it can output includes spreading both hands, stepping back, and waving; the expression actions include a mouth shaped as a vertical oval and wide-open eyes. The virtual robot selects randomly from the items of each modality and outputs the following matched comedic multi-modal information: the hands-spread action matched with the line "Really?" and the wide-eyed expression; or the step-back action matched with the line "What on earth" and the vertical-oval mouth expression.
On the other hand, the anchor master control area of the live room anchor display interface is also provided with a thanks command button in the comedy mode. When the anchor presses the thanks command button, the system randomly selects several gifters to thank according to the statistics of the audience gifting data. Specifically, when the anchor issues the thanks command, the system counts, according to the gifting situation, the user names of audience members who "sent a yacht", "sent a villa", and so on, randomly selects three gifters, and has the virtual robot output, through its animated image, the audio information of the thanks command, for example: "Thanks to *** for the present", "Thanks to *** for the support", "Thanks to *** for the attention", "Thank you all for the presents", and so on (where *** is matched to the gifting audience member's user name).
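A minimal sketch of the thanks command: tally the gifting records, randomly select three distinct gifters, and fill each name into a thank-you template. The function name, record format, and template wording are assumptions of this rewrite:

```python
import random

THANKS_TEMPLATES = [
    "Thanks to {} for the present",
    "Thanks to {} for the support",
    "Thanks to {} for the attention",
]

def thank_gifters(gift_log, k=3, rng=None):
    """Randomly select up to k distinct gifters and produce thank-you lines."""
    rng = rng or random.Random()
    gifters = sorted({name for name, _gift in gift_log})  # distinct user names
    chosen = rng.sample(gifters, min(k, len(gifters)))
    return [rng.choice(THANKS_TEMPLATES).format(name) for name in chosen]

log = [("user_a", "yacht"), ("user_b", "villa"),
       ("user_c", "rocket"), ("user_a", "yacht")]
for line in thank_gifters(log):
    print(line)
```

Deduplicating before sampling ensures a heavy gifter is thanked once rather than crowding out other names.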
In the live room user display interface, the user can not only watch the anchor's performance in real time but also see the animation fragments (matched outputs of voice, expression, and action) displayed by the virtual robot in between.
(Second mode)
In the embodiment of the present application, if the anchor speaks the virtual robot's name during the live performance, and/or turns his or her side face toward the virtual robot, the virtual robot enters the dialogue mode state.
Fig. 2b is a schematic diagram of a dialogue-mode scene of the multi-modal interaction system of the live streaming platform of the embodiment of the present application. As shown in Fig. 2b, when the system enters the dialogue mode, the live room anchor display interface 1211 and the live room user display interfaces (2211 ... 22n1) switch from the comedy-mode interface to the dialogue-mode interface. In the dialogue-mode interface, the image of the anchor 111 and the animated image of the virtual robot are shown on the device screen, while the conversation between the two is scrolled in real time in text form.
Specifically, in the embodiment of the present application, when the anchor turns sideways toward the virtual robot and/or says to it: "Tutu ('Tutu' is the name of the virtual robot that assists the anchor 111 in the embodiment of the present application), say hello to everyone!", the system's dialogue-mode command starts, and in one embodiment the anchor and the virtual robot can complete the following dialogue.
The virtual robot says to the anchor: "Hi, hello everyone, I'm Tutu, the animated co-host."
The anchor says to the virtual robot: "How was my performance just now?"
The virtual robot says to the anchor: "Especially good!"
The anchor says to the virtual robot: "Where do you think I could still improve?"
The virtual robot says to the anchor: "It would be even better if your expressions were a bit richer!" ...
In the above dialogue, the virtual robot can respond in real time to the anchor's questions. It should be noted that when the multi-modal interaction system is in the dialogue mode and there is no answer information (response dialogue text) to output, the virtual robot enters the buffering state within the dialogue mode. In the buffering state, the virtual robot, using its animated image as its carrier, presents expression and voice information on the live room display interface, filling the blank period in the live stream that an overly long response time would otherwise cause. Specifically, for example, in this state the robot's output expression mood is cheerful, and the output voice content consists of phrases commonly used in audience interaction.
(Third mode)
In the embodiment of the present application, when the system is in the dialogue mode and the anchor speaks a specific performance form command to the virtual robot during the live performance, the virtual robot enters the basic performance mode state. A performance form command is a sentence containing keywords such as "sing", "dance", or "tell a story".
Fig. 2c is a schematic diagram of a basic-performance-mode scene of the multi-modal interaction system of the live streaming platform of the embodiment of the present application. As shown in Fig. 2c, when the system enters the basic performance mode, the live room anchor display interface 1211 and the live room user display interfaces (2211 ... 22n1) switch from the dialogue-mode interface to the basic-performance-mode interface. The system parses the performance form command, and the virtual robot responds to the parsed command by performing accordingly: using its animated image as the carrier, it displays on the live room display interface video stream information that has been set and matched in advance (the video stream information includes voice information, action information, and so on). Each performance form command corresponds to several groups of video stream information with different content. Specifically, in one embodiment, when the anchor says the performance form command "Tutu, dance for everyone!" in dialogue mode, the system parses the command the anchor issued as dancing, randomly selects one group from the several groups of different content corresponding to the keyword "dance", and outputs it, so that the dancing state is shown on the virtual robot's live room display interface. In another embodiment, when the anchor says the performance form command "Tutu, can you tell everyone a joke?" in dialogue mode, the system parses the command as joke-telling, randomly selects one group from the several groups of different content corresponding to the keywords "tell" and "joke", and outputs it, so that the joke-telling state is shown on the virtual robot's live room display interface. It should be noted that in the embodiment of the present application, the kinds of keywords of the performance form commands and the number and content of the data groups corresponding to each keyword are not specifically limited; those implementing the present application can adjust them in real time according to actual requirements.
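The keyword parsing and random group selection for performance form commands could be sketched like this (the library structure and content identifiers are invented for illustration):

```python
import random

# Several groups of pre-matched video-stream content per keyword (assumed IDs).
PERFORMANCE_LIBRARY = {
    "dance": [{"audio": "dance_track_1", "motion": "dance_clip_1"},
              {"audio": "dance_track_2", "motion": "dance_clip_2"}],
    "joke":  [{"audio": "joke_take_1", "motion": "gesture_set_1"},
              {"audio": "joke_take_2", "motion": "gesture_set_2"}],
}

def parse_performance_command(utterance: str):
    """Find the first performance keyword in the utterance and randomly
    select one matched content group for the virtual robot to play."""
    for keyword, groups in PERFORMANCE_LIBRARY.items():
        if keyword in utterance:
            return keyword, random.choice(groups)
    return None, None  # no performance keyword: stay in dialogue mode

keyword, group = parse_performance_command("Tutu, can you tell everyone a joke?")
print(keyword)  # joke
```

Storing audio and motion together in one group matches the description's requirement that each output is pre-matched rather than assembled on the fly.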
(Fourth mode)
In the embodiment of the present application, when the system described above is in dialogue mode, if the anchor presses the audience-interaction control command button during the live performance, the virtual robot enters the audience interaction mode.
Fig. 2d is a scene schematic diagram of the audience interaction mode of the multi-modal interactive system for a live-streaming platform according to an embodiment of the present application. As shown in Fig. 2d, when the system enters the audience interaction mode, the anchor-side display interface 1211 and the live-room user display interfaces (2211 ... 22n1) switch from the dialogue-mode interface to the audience-interaction-mode interface, and the system stops capturing the live image of the anchor 111. The virtual robot randomly selects several audience messages and answers them; the live-room display interface not only scrolls the text of the dialogue with the audience, but also has the animated avatar of the virtual robot read the replies to the audience messages aloud, outputting audio and text synchronously. When the anchor presses the return-to-default-mode button, the audience interaction mode ends, the system stops capturing the video image of the mic-linked audience member, and the system returns to the amusement mode.
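The random selection of audience messages described above can be sketched as follows; a minimal illustration, assuming a plain list of message strings and a sample size of two (neither is specified in the text), with an optional seed added only to make the sketch reproducible.

```python
import random

def pick_messages(messages, k=2, seed=None):
    """Randomly pick up to k audience messages for the robot to answer."""
    rng = random.Random(seed)
    k = min(k, len(messages))       # never ask for more messages than exist
    return rng.sample(messages, k)  # sampling without replacement
```

Each picked message would then be answered, with the reply both scrolled as text and read aloud by the avatar, per the description above.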
(Fifth mode)
In the embodiment of the present application, when the system described above is in dialogue mode, if the anchor presses the control command button for interacting with another intelligent robot during the live performance, and the anchor 111a assisted by another virtual robot presses the accept-mic-link-request button after receiving the mic-link notification, the system enters the mode of interacting with other virtual robots.
Fig. 2e is a scene schematic diagram of the mode of interacting with other virtual robots in the multi-modal interactive system for a live-streaming platform according to an embodiment of the present application. As shown in Fig. 2e, in this scene, virtual robot A and virtual robot B hold a mic-linked dialogue. Here, virtual robot A is the virtual robot that assists the live broadcast of the anchor 111 in the embodiment of the present application, and virtual robot B is the virtual robot that assists the live broadcast of the anchor 111a who is mic-linked with the anchor 111. Before the mic-linking mode starts, users (211 ... 21n) watch the live performance of the anchor 111 through the anchor 111's live-room platform, and users (211a ... 21na) watch the live performance of the anchor 111a through the anchor 111a's live-room platform. When the anchor 111 sends the anchor 111a a request for a mic-link between virtual robots, and the anchor 111a accepts the request, the system enters the mode of interacting with other virtual robots. The anchor display interfaces (1211 and 1211a) and the live-room user display interfaces (2211 ... 22n1 and 211a ... 21na) switch from the dialogue-mode interface to the interface of this mode, so that users (211 ... 21n) and users (211a ... 21na) can simultaneously watch the dialogue between the virtual robots assisting the anchor 111 and the anchor 111a. In this mode, the system stops capturing the live images of the anchors 111 and 111a, and the live-room display interfaces show the scrolling dialogue text of the two virtual robots. When the anchor 111 presses the return-to-default-mode button, this mode ends and the system returns to the amusement mode.
Fig. 3 is a structural schematic diagram of the multi-modal interactive system for a live-streaming platform according to an embodiment of the present application. As shown in Fig. 3, the system includes the following elements: an anchor camera 511, an anchor microphone 512, an anchor master-control button 513, a live-room platform 300, and a cloud server 400.
The elements of the system are described one by one below. The anchor camera 511 captures the live image of the anchor 111 in real time; the anchor microphone 512 captures the live voice information of the anchor 111 in real time; the anchor master-control button 513 receives the anchor 111's commands and emits control command signals. Furthermore, the anchor camera 511, the anchor microphone 512, and the anchor master-control button 513 respectively transmit the captured live image information, live voice information, and control command signals to the anchor-side data acquisition interface of the live-room platform 300. It should be noted that the present application does not specifically limit the installation position of the anchor camera, the device form and installation position of the anchor microphone, or the output form of control commands (a button is only one specific example of a control-command output form).
With reference to Fig. 3, the live-room platform 300 includes the anchor side of a live-streaming application and the user side of that application, which communicates with the anchor side over the Internet. The anchor side of the live-streaming application is configured with an API that defines the corresponding communication rules and data transmission formats. The virtual robot 1211 connects to the anchor side of the live-streaming application through this API in the form of a functional plug-in, and is installed in the anchor-side live-streaming application. Accordingly, the virtual robot 1211 plug-in must comply with the data transmission rules of the API before it can be loaded into the live-streaming application software (the virtual robot plug-in is installed in the live-streaming application software on the anchor device 121); it then exchanges information in real time with the cloud server 400 and with the user side of the live-streaming application via Internet transport protocols. In addition, the virtual robot 1211 plug-in must run simultaneously with the live-streaming application software, so that the virtual robot can add a new auxiliary live-broadcast function to a conventional live-room platform.
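The plug-in contract described above, under which the host application only loads plug-in data that follows its API transmission rules, can be sketched as follows. The class names, the required-field set, and the packet layout are all assumptions for illustration; the patent does not specify them.

```python
class LiveAppHost:
    """Stand-in for the anchor-side live-streaming application's API."""
    REQUIRED_FIELDS = {"packet_code", "function_code", "payload"}

    def accepts(self, packet: dict) -> bool:
        # A packet is accepted only if it carries every required field,
        # mirroring "must comply with the data transmission rules".
        return self.REQUIRED_FIELDS <= set(packet)

class VirtualRobotPlugin:
    """Stand-in for the virtual robot plug-in producing API-conformant packets."""
    def make_packet(self, function_code: int, payload) -> dict:
        return {"packet_code": 1, "function_code": function_code, "payload": payload}
```

A conformant packet passes the host's check, while an arbitrary dict does not, which is the loading condition the paragraph above describes.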
The cloud server 400 is connected to the virtual robot 1211 plug-in via the Internet. It possesses massive storage space and powerful computing capability, and can efficiently compute, store, and analyze large amounts of data. In the embodiment of the present application, the virtual robot 1211 uses the computing and storage capabilities of the cloud server 400 to obtain its multi-modal interaction capabilities, for example: outputting one or more of text information, voice information, visual information, and combinations thereof.
In the embodiment of the present application, the virtual robot plug-in outputs multi-modal data through its distinctive animated avatar, and when it runs it can add a multi-modal interactive function to conventional live-streaming application software, thereby constituting the multi-modal interactive system for a live-streaming platform of the present application. When this system is running, it possesses the following functions. First, it can receive the anchor live-image information sent by the anchor camera 511, the anchor voice information sent by the anchor microphone 512, and the control command signals sent by the anchor master-control button 513; it can also receive, over the Internet, the user text information, user gift data, and the video information of the designated mic-link target sent by users. Second, it accesses and interacts with the cloud server 400 over the Internet, so that the cloud server 400 can analyze and process the received mass data in real time. Third, it can feed back to users in real time, over the Internet, the text information, voice information, animation information, video-stream information, and the like of the virtual robot's responses. Furthermore, in its data processing, the system also possesses functions such as: detecting the profile (side) face of the anchor 111, converting voice information to text, identifying the system mode, and generating response text for the voice text information (the text information converted from voice information).
In the embodiment of the present application, the multi-modal interactive system for a live-streaming platform possesses multiple modes. After the virtual robot 1211 plug-in is connected to the live-streaming application software, the system enters the live-streaming process and can switch between these modes. The modes include the amusement mode and the multi-modal interactive modes. Further, the multi-modal interactive modes include: the dialogue mode, the basic performance mode, the audience interaction mode, and the mode of interacting with other virtual robots. Fig. 4 is the mode transition diagram of the multi-modal interactive system for a live-streaming platform according to an embodiment of the present application; as shown in Fig. 4, the mode transition process follows the steps below.
First, during the live broadcast, a virtual robot with a specific avatar is displayed in a preset area, the system enters the default live auxiliary mode, and it receives in real time the multi-modal data and multi-modal instructions input from the live room. The system then analyzes the multi-modal data and instructions and, using the multi-modal interaction capabilities of the virtual robot, discriminates and determines the target live auxiliary mode. The multi-modal data and instructions specifically include the following information: live image information, voice information, control command signals, user text information, and user gift data. Further, the system extracts wake-up data directed at the virtual robot from the multi-modal data sent during the live broadcast, enters the multi-modal interactive mode that matches the wake-up data (in the embodiment of the present application, the system first enters the dialogue mode), and performs multi-modal interaction and display actions in the current multi-modal interactive mode. It then obtains the multi-modal instructions set by the anchor for mode conversion, performs functional parsing on these instructions and the multi-modal data, responds with a mode conversion, and switches from the current multi-modal interactive mode to another multi-modal interactive mode, i.e., the target live auxiliary mode. Specifically, the virtual robot first judges whether the anchor has issued wake-up data: if not, the system stays in the amusement mode; if so, the system enters the dialogue mode (one of the multi-modal interactive modes). The system then analyzes the multi-modal instructions and, using the multi-modal data, switches from the current dialogue mode to another multi-modal interactive mode, i.e., the target live auxiliary mode. Here, wake-up data means that the anchor speaks the name of the virtual robot during the live performance, and/or that the anchor turns the profile of his or her face toward the virtual robot. The multi-modal data and multi-modal commands include one or more of text information, voice information, visual information, and combinations thereof. The target live auxiliary modes include the basic performance mode (dancing, singing, story-telling, and the like), the audience interaction mode, and the mode of interacting with other virtual robots.
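The wake-up and mode-switching flow above can be sketched as a small state machine. The mode names follow the description; the function name, the command labels, and the single-function shape are assumptions made for illustration.

```python
DEFAULT = "amusement"  # the system's default mode per the description

def next_mode(current, woke=False, command=None):
    """Return the next live auxiliary mode given wake-up data and a parsed command."""
    if current == DEFAULT:
        # wake-up data (robot name spoken, or profile face) enters dialogue mode
        return "dialogue" if woke else DEFAULT
    if current == "dialogue":
        # from dialogue mode, parsed instructions select the target mode
        transitions = {
            "perform": "performance",
            "audience": "audience_interaction",
            "robot": "robot_interaction",
            "end": DEFAULT,
        }
        return transitions.get(command, "dialogue")
    # any target mode returns to the default mode when it ends
    return DEFAULT if command == "end" else current
```

This captures the two-stage behaviour described above: the amusement mode is left only via wake-up data, and the other interactive modes are reached only from the dialogue mode.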
It should be noted that, during the live broadcast, the default mode of the system is the amusement mode.
It should be noted that the present application does not specifically limit the actual kinds of multi-modal data and instructions; implementers may adjust their content according to actual requirements. Finally, when the new live auxiliary mode ends, the system returns to the default mode (the amusement mode).
Fig. 5 is the module block diagram of the multi-modal interactive system for a live-streaming platform according to an embodiment of the present application. As shown in Fig. 5, the system is composed of the following devices: a multi-modal information input module 51, a data processing and mode discrimination module 52, and a multi-modal interactive information output module 53. The multi-modal information input module 51 collects and receives in real time the multi-modal data input from the live room during the broadcast and the multi-modal instructions set for mode conversion, applies function-coding processing to the different kinds of functional information, and forwards the completed multi-modal input data packets to the cloud server 400. The data processing and mode discrimination module 52 receives and parses the multi-modal input data packets sent by the multi-modal information input module 51; according to the obtained wake-up data and the multi-modal data and instructions set for mode conversion, it discriminates and determines the target live auxiliary mode, then invokes the multi-modal interaction capabilities to process the data under the corresponding mode into multi-modal output data packets for the live room, and sends these packets over the Internet to the multi-modal interactive information output module 53. The multi-modal interactive information output module 53 opens the target live auxiliary mode; the virtual robot performs multi-modal interaction and display according to the current live auxiliary mode, parses the multi-modal output data packets, and obtains and outputs the corresponding system output information under the target live auxiliary mode. It should be noted that the functions of the data processing and mode discrimination module 52 are executed by the cloud server 400, which possesses the multi-modal interaction capabilities and can realize functions such as profile-face detection, speech recognition, and semantic analysis.
The module composition and functions of the multi-modal interactive system for a live-streaming platform are described in detail below. First, the multi-modal information input module 51. As shown in Fig. 5, this module is composed of six acquisition modules (511–516) and an information forwarding module 517. The first acquisition module 511 collects the video information of the anchor's performance in real time during the broadcast, converts it from video format into single-frame images, applies function coding to the frame-image information (for example, function code 111), and outputs an image input data packet containing the frame-image function code and the frame-image data. The second acquisition module 512 collects the anchor's voice information in real time during the broadcast, applies function coding to it (for example, function code 112), and outputs a voice input data packet containing the voice function code and the voice data. The third acquisition module 513 collects in real time the control command signals sent from the anchor's master-control area, applies function coding to them (for example, function code 113), and outputs a command input data packet containing the control-command-signal function code and the control command signal. The fourth acquisition module 514 collects in real time the written text information sent by the live-room user side during the broadcast, including audience messages and bullet-chat (barrage) information, applies function coding to it (for example, function code 114), and outputs a text input data packet containing the written-text function code and the text data. The fifth acquisition module 515 collects in real time the audience gift information sent by the live-room user side during the broadcast, where the gift information includes the gift code and the name of the gifting user; it applies function coding to this information (for example, function code 115) and outputs a gift input data packet containing the gift-information function code and the gift data. The sixth acquisition module 516 collects, over the Internet, the voice information of the designated mic-link party (another virtual robot), applies function coding to this information (for example, function code 116), and outputs a mic-link input data packet containing the mic-link-party function code and the corresponding voice information. The information forwarding module 517 receives the data packets of the first to sixth acquisition modules, integrates the six kinds of packets received at the same acquisition frequency, and applies a packet coding to the integrated data, thereby obtaining a new collected-information input data packet carrying a data packet code. The control command signals include the audience-interaction command signal (for example, function code 121), the other-virtual-robot-interaction signal (for example, function code 122), and the amusement-mode emotion command signals covering several emotions (for example, function codes 1231–123n).
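The function-coding and bundling scheme above can be sketched as follows. The function codes mirror the examples in the text; the dict-based packet layout and the helper names are assumptions for illustration.

```python
def encode(function_code, data):
    """One acquisition module's output: data tagged with its function code
    (e.g. 111 for frame images, 112 for anchor speech, 115 for gifts)."""
    return {"function_code": function_code, "data": data}

def bundle(packet_code, packets):
    """The forwarding module's output: packets captured at the same
    acquisition tick, integrated under one data packet code."""
    return {"packet_code": packet_code, "packets": list(packets)}
```

A downstream parser can then recover each modality from the bundle by its function code, which is exactly what the data reception module described later does.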
Next, the composition and functions of the data processing and mode discrimination module 52 are described in detail. With reference to Fig. 5, this module is composed of a data reception module 521, a profile-face detection module 522, a speech recognition module 524, a mode discrimination module 523, a semantic analysis module 525, a read-aloud module 526, a basic performance mode module 527, an amusement mode module 528, and a data sending module 529. The functions and composition of each sub-module of the data processing and mode discrimination module 52 are detailed one by one below.
The data reception module 521 receives the collected-information input data packet sent by the information forwarding module 517, parses it according to the data packet code and the data function codes, converts the parsed data into the form of function data packets, and distributes them to the subsequent modules. The collected input information is converted into function data packets carrying the following data identifiers: data packet code, data function code, and data content. Specifically, in one embodiment, when the parsed data function code is 122 and the data content of the other-virtual-robot-interaction signal is "1", the module encodes this information according to the identifiers above, obtains the corresponding function data packet, and transmits it to the mode discrimination module 523. In a second embodiment, when the parsed data function code is 114, the corresponding data content is the written text of audience messages; the module encodes this information accordingly and transmits the resulting function data packet to the mode discrimination module 523. In a third embodiment, when the parsed data function code is 115, the corresponding data content is audience gift information; the module encodes it accordingly and transmits the resulting function data packet to the mode discrimination module 523. In a fourth embodiment, when the parsed data function code is 111, the corresponding data content is single-frame image data; the module encodes it accordingly and transmits the resulting function data packet to the profile-face detection module 522.
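The routing rule in the embodiments above can be sketched as a small dispatch table. The codes and destinations follow the text; the table form and the fallback value are assumptions.

```python
# Each parsed function code is forwarded to a downstream module.
ROUTES = {
    111: "side_face_detection",     # single-frame image data
    112: "speech_recognition",      # anchor voice data
    114: "pattern_discrimination",  # audience message text
    115: "pattern_discrimination",  # audience gift information
    122: "pattern_discrimination",  # other-robot interaction signal
}

def route(function_code):
    """Name the destination module for a parsed data function code."""
    return ROUTES.get(function_code, "unknown")
```

Unrecognized codes fall through to `"unknown"`; how a real implementation handles them is not specified in the text.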
Fig. 6 is the module block diagram of the profile-face detection module 522 in the multi-modal interactive system for a live-streaming platform according to an embodiment of the present application. As shown in Fig. 6, the module includes the following units: an image input unit 5221, a profile-face detection unit 5222, a profile-face signal determination unit 5223, and a data output unit 5224. The image input unit 5221 receives and parses the function data packets with data function code 111 sent by the data reception module 521, obtaining single-frame image data. The profile-face detection unit 5222 detects profile-face images in the single frame and outputs the detection result. The profile-face signal determination unit 5223 outputs the profile-face signal based on that detection result. The data output unit 5224 applies function coding to the profile-face signal (for example, function code 222) and forms a new mode-decision data packet.
Fig. 7 is the flow chart of the profile-face detection function in the multi-modal interactive system for a live-streaming platform according to an embodiment of the present application. As shown in Fig. 7, after the image input unit 5221 obtains the single-frame image data, processing enters the profile-face detection unit 5222. In this unit, profile-face images are detected using the Adaboost algorithm: according to a previously generated profile-face cascade classification detector, the unit judges whether a profile-face image exists in the single frame, outputs the detection result, and transmits it to the profile-face signal determination unit 5223. The profile-face signal determination unit 5223 then determines the content of the profile-face signal data based on the detection result and outputs it to the data output unit 5224: when a profile-face image is detected, the data content of the profile-face signal is "1"; when none is detected, it is "0". After the data output unit 5224 receives the profile-face signal data, it re-encodes the processing result of the profile-face detection module 522 to obtain a new mode-decision data packet, which carries data identifiers such as the data packet code, the profile-face-signal function code, and the profile-face signal data.
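The signal-determination step above can be sketched as follows. The cascade detector itself is stood in for by its output (a list of bounding boxes), since the text leaves the detector implementation open; the function names and packet layout are assumptions.

```python
def side_face_signal(detections):
    """Map a detector's result to the profile-face signal described above.

    detections: list of (x, y, w, h) boxes returned by a profile-face
    cascade detector (e.g. an Adaboost cascade) for one frame.
    """
    return "1" if len(detections) > 0 else "0"

def make_decision_packet(packet_code, detections, function_code=222):
    """Re-encode the detection result as a mode-decision data packet."""
    return {
        "packet_code": packet_code,
        "function_code": function_code,  # 222 = profile-face signal, per the text
        "signal": side_face_signal(detections),
    }
```

This keeps the detector pluggable, consistent with the note below that other detection methods may be substituted for Adaboost.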
Furthermore, in the profile-face detection unit 5222, the profile-face classification detector is constructed as follows: using a face database, with face rotation angles of 45° to 90° labelled as the profile-face rotation range, the extracted facial features are recalculated according to this rotation range to obtain profile-face features, and the profile-face feature classification detector is then obtained with the Adaboost algorithm.
It should be noted that, in the embodiments of the present invention, the Adaboost algorithm is used to detect the profile-face state in single live frames; the present application does not specifically limit the implementation of profile-face detection, and other methods may be substituted.
Fig. 8 is the module block diagram of the speech recognition module 524 of the multi-modal interactive system for a live-streaming platform according to an embodiment of the present application. As shown in Fig. 8, the module includes the following units: a voice input unit 5241, an audio-to-text conversion unit 5242, a text matching unit 5243, and a voice-text output unit 5244. The voice input unit 5241 receives and parses the function data packets with data function code 112 sent by the data reception module 521, obtaining voice data. The audio-to-text conversion unit 5242 converts the voice data into the matching voice text data. The text matching unit 5243 matches the voice text data against preset keyword information and outputs keyword codes. The voice-text output unit 5244 applies function coding to the voice text data and keyword code data obtained by the speech recognition module 524 and forms a new mode-decision data packet.
Fig. 9 is the flow chart of the speech recognition function in the multi-modal interactive system for a live-streaming platform according to an embodiment of the present application. As shown in Fig. 9, the voice input unit 5241 parses the voice function data packet to obtain the data packet code and the voice data, and then sends the voice data to the audio-to-text conversion unit 5242 for execution. In this unit, the audio information must be converted into text information.
Specifically, the conversion process comprises the following steps: 1) pre-processing the voice signal, such as cutting off the silent segments at the head and tail and dividing the signal into frames; 2) extracting the features of the input voice data using the acoustic model and language model that have been trained in advance and stored in the audio-to-text conversion unit 5242; 3) matching the single-frame voice features against the acoustic model and language model; 4) integrating the matching results using a semantic understanding database and outputting the voice recognition result (the voice text information).
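Step 1 of the pipeline above can be sketched in miniature: trimming leading and trailing silence from a sample sequence and splitting it into fixed-size frames. The threshold and frame length are arbitrary assumptions; real front ends use energy-based endpointing and overlapping windows, and the acoustic/language-model steps are beyond a sketch.

```python
def trim_silence(samples, threshold=0.01):
    """Cut off the silent segments at the head and tail of the signal."""
    idx = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not idx:
        return []
    return samples[idx[0]:idx[-1] + 1]

def frame(samples, frame_len=4):
    """Divide the trimmed signal into frames of frame_len samples."""
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
```

Each resulting frame would then go through feature extraction and model matching, as described in steps 2–4 above.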
After the anchor's voice information has been converted into text, processing enters the text matching unit 5243. This unit stores a preset keyword database related to mode discrimination, in which each keyword corresponds to a keyword code; when a keyword from the database is detected in the voice text, the corresponding code is output. For example, the keyword database may contain mode-discrimination keywords such as: no keyword present (for example, corresponding code 212999); "Tutu" (the name of the virtual robot assisting the anchor 111's broadcast; for example, code 212001); "Tutu" & "Xiaoling" (together containing the names of the mutually mic-linkable assisting virtual robots of different anchors; for example, code 212006); "sing" (for example, code 212021); "sing" & "May" (for example, code 212025); "sing" & "Sun Yanzi" (for example, code 212027); "dance" (for example, code 212201); "dance" & "ballet" (for example, code 212014); "dance" & "peacock dance"; "tell" (for example, code 212401); "tell" & "joke" (for example, code 212412); "tell" & "story" (for example, code 212420); "tell" & "children's story" (for example, code 212421); "tell" & "history story" (for example, code 212425); and so on.
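The keyword lookup above can be sketched as follows, using a few of the example entries (keywords translated to English; codes taken from the text). The most-specific-match rule, under which a multi-word entry such as "sing" & "May" wins over the single keyword "sing", is an assumption, as the text does not state how overlapping entries are resolved.

```python
KEYWORD_CODES = {
    ("Tutu",): 212001,          # robot name
    ("sing",): 212021,
    ("sing", "May"): 212025,    # all words of an entry must appear
    ("tell", "joke"): 212412,
}
NO_KEYWORD = 212999  # fallback code when no keyword is present

def match_keywords(text):
    """Return the code of the most specific matching entry, else 212999."""
    best = None
    for words, code in KEYWORD_CODES.items():
        if all(w.lower() in text.lower() for w in words):
            if best is None or len(words) > len(best[0]):
                best = (words, code)
    return best[1] if best else NO_KEYWORD
```

The returned code is what the voice-text output unit packs into the mode-decision data packet.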
Finally, the voice-text output unit 5244 is executed. After this unit receives the data packet code sent by the voice input unit 5241, the voice text data output by the audio-to-text conversion unit 5242, and the keyword code data sent by the text matching unit 5243, it first applies function coding to the voice text data and keyword code data (for example, function code 211 for the voice text data and 212 for the keyword code data); it then re-encodes the data obtained by the speech recognition module into a new mode-decision data packet, which for this module carries data identifiers such as the data packet code, the voice-text function code, the voice text data, the keyword-code function code, and the keyword code data.
Fig. 10 is the module block diagram of the mode discrimination module 523 of the multi-modal interactive system for a live-streaming platform according to an embodiment of the present application. As shown in Fig. 10, the module is divided into the following units: a data input unit 5231, a mode discrimination unit 5232, a data classification unit 5233, and a data classification sending unit 5234. The units of the mode discrimination module 523 are described in detail below.
First, the data input unit 5231. It receives the audience gift-information function data packets, the audience message text function data packets, the mic-link-party voice-information function data packets, and the various control-command function data packets sent by the data reception module 521, and parses them to extract the key decision data for mode discrimination and the pre-response data required under the target live auxiliary mode. The control-command function data packets specifically include commands such as: the audience-interaction command (for example, function code 121), the other-virtual-robot-interaction command (for example, function code 122), the amusement-mode emotion commands including the "delighted" command of the amusement mode (for example, function code 1232) and the "calm" command of the amusement mode (for example, function code 1235), and the end-dialogue-mode command (for example, function code 124). In a specific implementation, after parsing, the module obtains the following data: the key decision data, including control command data, the profile-face signal, and the keyword code data; the audience text information; the mic-link-party voice information; the audience gift information; and so on.
Next, the pattern discrimination unit 5232 is described in detail. Based on the analysis results of the data input unit 5231, this unit evaluates the key decision data in the parsed model, determines the target mode of the system, obtains the corresponding mode code, and function-codes that mode code. Specifically: (first embodiment) when the parsed side-face signal has the data content "0" and the keyword code data is "212999", and/or the control command signal has a function code in the range "12301~12320", the target mode is judged to be the amusement mode state (for example, the function code of the amusement mode is 2131). (Second embodiment) when the parsed side-face signal has the data content "1", and/or the keyword code data falls in the range "212001~212004" (i.e. the keyword contains one virtual robot name) while the control command signal has the function code "120" (no control command), the target mode is judged to be the dialogue mode state (for example, the function code of the dialogue mode is 2132), and the dialogue mode state is locked; when the parsed control command signal has a function code in "121~124" (for example, the function code of the end-dialogue command is 124), and/or the keyword code data falls in the range "212021~212900" (the keyword contains specific performance content), the dialogue mode lock is released. (Third embodiment) when the system is currently in the dialogue mode state and the parsed keyword code data falls in the range "212021~212900", the target mode is judged to be the basic performance mode state (for example, the function code of the basic performance mode is 2133). (Fourth embodiment) when the parsed control command signal has the function code "121", the target mode is judged to be the audience interaction mode state (for example, the function code of the audience interaction mode is 2134). (Fifth embodiment) when the parsed control command signal has the function code "122" and the keyword code data falls in the range "212006~212020" (i.e. the keyword contains two virtual robot names), the target mode is judged to be the mode of interacting with another virtual robot (for example, the function code of this mode is 2135).
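By way of illustration only, the decision rules above may be sketched as follows. The function name, the signal representations, and the rule ordering are assumptions of this sketch rather than part of the disclosed embodiment; only the numeric codes reuse the example codes given above.

```python
# Illustrative sketch of the mode-discrimination rules; signal names and
# ordering are assumptions, the numeric codes mirror the example codes.
AMUSEMENT_MODE, DIALOGUE_MODE = 2131, 2132
PERFORMANCE_MODE, AUDIENCE_MODE, OTHER_ROBOT_MODE = 2133, 2134, 2135

def discriminate_mode(face_signal, keyword_code, command_code, current_mode=None):
    """Map parsed input signals to a target live-assist mode code."""
    # Fifth embodiment: command 122 plus a keyword naming two virtual robots.
    if command_code == 122 and 212006 <= keyword_code <= 212020:
        return OTHER_ROBOT_MODE
    # Fourth embodiment: the audience-interaction command.
    if command_code == 121:
        return AUDIENCE_MODE
    # Third embodiment: a performance keyword while already in dialogue mode.
    if current_mode == DIALOGUE_MODE and 212021 <= keyword_code <= 212900:
        return PERFORMANCE_MODE
    # Second embodiment: face signal "1", or a keyword naming one virtual
    # robot with no control command (code 120).
    if face_signal == "1" or (212001 <= keyword_code <= 212004 and command_code == 120):
        return DIALOGUE_MODE
    # First embodiment: face signal "0" with the amusement keyword, or an
    # amusement mood command (codes 12301~12320).
    if face_signal == "0" and (keyword_code == 212999 or 12301 <= command_code <= 12320):
        return AMUSEMENT_MODE
    return current_mode  # no mode change when nothing matches
```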
After the current live mode has been determined, the data parsed by the data input unit 5231 are classified, recombined, and otherwise processed according to the output data required under that mode, so as to obtain the pre-response packet for the target mode; the function code of the current mode, together with the corresponding mode pre-response packet, is then sent to the data classification transmission unit 5234. Specifically: (first embodiment) when the judged mode is the amusement mode, the amusement-mode pre-response packet includes the packet code, the amusement-mode function code, the amusement-mode mood command code data, the audience gift information data, and so on; (second embodiment) when the judged mode is the dialogue mode, the dialogue-mode pre-response packet includes the packet code, the dialogue-mode function code, the anchor's speech text information, the keyword code data, and so on; (third embodiment) when the judged mode is the basic performance mode, the basic-performance-mode pre-response packet includes the packet code, the basic-performance-mode function code, the keyword code data, and so on; (fourth embodiment) when the judged mode is the audience interaction mode, the audience-interaction-mode pre-response packet includes the packet code, the audience-interaction-mode function code, the audience interaction command function code, the keyword code data, the audience message text data, and so on; (fifth embodiment) when the judged mode is the mode of interacting with another virtual robot, the corresponding pre-response packet includes the packet code, the function code of that mode, the function code of the interaction command with the other virtual robot, the keyword code data, the speech text function code, the speech text data, and so on.
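The per-mode packet assembly above can be sketched as follows. The field names and the dictionary representation are illustrative assumptions of this sketch; the embodiment only fixes the example code values.

```python
# Sketch of assembling a mode-specific pre-response packet from parsed input.
# Field names are illustrative; the embodiment only fixes the code values.
def build_pre_response_packet(packet_code, mode_code, parsed):
    packet = {"packet_code": packet_code, "mode_code": mode_code}
    if mode_code == 2131:            # amusement mode
        packet["mood_command_code"] = parsed.get("mood_command_code")
        packet["gift_info"] = parsed.get("gift_info")
    elif mode_code == 2132:          # dialogue mode
        packet["speech_text"] = parsed.get("speech_text")
        packet["keyword_code"] = parsed.get("keyword_code")
    elif mode_code == 2133:          # basic performance mode
        packet["keyword_code"] = parsed.get("keyword_code")
    elif mode_code == 2134:          # audience-interaction mode
        packet["interaction_command_code"] = parsed.get("interaction_command_code")
        packet["keyword_code"] = parsed.get("keyword_code")
        packet["audience_text"] = parsed.get("audience_text")
    elif mode_code == 2135:          # interaction with another virtual robot
        packet["interaction_command_code"] = parsed.get("interaction_command_code")
        packet["keyword_code"] = parsed.get("keyword_code")
        packet["speech_text"] = parsed.get("speech_text")
    return packet
```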
Finally, the data classification transmission unit 5234 distributes, according to the current-mode function code data, the pre-response packet of the associated mode to the subsequent modules.
Figure 11 is a block diagram of the semantic analysis module 525 of the multi-modal interactive system for a network live-broadcast platform according to an embodiment of the present application. As shown in Figure 11, the module comprises the following units: a data input unit 5251, a reply data search unit 5252, and a reply text output unit 5253. The data input unit 5251 receives and parses the mode pre-response packet containing speech text sent by the pattern discrimination module 523, and obtains the speech text information data. The reply data search unit 5252 searches a preset reply database for the reply text corresponding to the input text, and outputs the reply text information. The reply text output unit 5253 function-codes the reply text data (for example, with the function code 217) and forms a new reply text response packet.
Figure 12 is a flowchart of the semantic analysis function in the multi-modal interactive system for a network live-broadcast platform according to an embodiment of the present application. Referring to Figure 12, after the data input unit 5251 obtains the mode pre-response packet containing language text information, it parses the packet and extracts the input language text information data. The reply data search unit 5252 then takes the language text information data and, through a search engine operating on the reply dialogue database resource, finds the reply text data corresponding to the input search text. Next, after the reply text output unit 5253 obtains the reply text data, it recodes the data including the packet code, the mode function code, the command control signal code, the function code of the reply text, the reply text data, and the keyword code data, thereby obtaining a new reply text response packet. In constructing the reply dialogue database resource, the input-output pairs from a large volume of conventional conversation history data and internet-slang conversation history data are first used as training data to generate a reply dialogue text model; a large volume of input text from actual use is then employed as test data, thereby completing the establishment of the reply dialogue database resource.
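A minimal sketch of the reply lookup performed by unit 5252 follows. The tiny in-memory "database" and the fallback reply are illustrative placeholders, not part of the disclosed reply dialogue database resource.

```python
# Minimal sketch of the reply lookup in unit 5252: search a preset reply
# database for the reply text matching the input speech text.
REPLY_DB = {
    "hello": "Hello, welcome to the stream!",
    "what can you do": "I can chat, sing, and play games with you.",
}

def search_reply(speech_text):
    """Return the stored reply for the input text, or a default fallback."""
    return REPLY_DB.get(speech_text.strip().lower(), "Sorry, could you say that again?")
```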
Referring again to Figure 5, the read-aloud module 526, the basic performance mode module 527, the amusement mode module 528, and the data transmission module 529 in the data processing and pattern discrimination module 52 are described in detail one by one below.
The read-aloud module 526 receives and parses the reply dialogue response packet sent by the semantic analysis module 525, extracts the reply text information data, converts the reply text information data into audio format using a preset text read-aloud database, and obtains the reply voice information data. The reply voice information data are then function-coded (for example, with the function code 218). Finally, the data including the packet code, the mode function code data, the command control signal code, the function code of the reply voice data, the reply voice information data, and the keyword code data are recoded, thereby obtaining a new reply voice response packet.
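The repackaging step of the read-aloud module may be sketched as below. The placeholder "synthesis" stands in for the actual text-to-audio conversion, and the field names are assumptions of this sketch; only the function code 218 follows the example above.

```python
# Sketch of the read-aloud step: convert the reply text to audio (a
# placeholder transformation, not a real TTS call) and repackage it with
# the example function code 218.
def synthesize_reply(packet):
    audio = f"<audio:{packet['reply_text']}>"   # stand-in for TTS output
    return {
        "packet_code": packet["packet_code"],
        "mode_code": packet["mode_code"],
        "function_code": 218,
        "reply_audio": audio,
        "reply_text": packet["reply_text"],
    }
```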
Next, the basic performance mode module 527 receives and parses the basic-performance-mode pre-response packet sent by the above-described pattern discrimination module 523, obtains the keyword code data ("212021~212900"), searches a preset basic performance database for the video stream data corresponding to the keyword code, obtains the basic-performance-mode video stream data, and function-codes these data (for example, with the function code 215). Finally, the data including the packet code, the basic-performance-mode function code, the keyword code data, the function code of the basic-performance-mode video stream data, and the basic-performance-mode video stream data are recoded, thereby obtaining a new performance video stream response packet. The performance database is preset in the basic performance mode module 527; each keyword code in it corresponds to several groups of related performance video stream information, and the module randomly selects one group of associated video stream data to output. Specifically, in one embodiment, if the parsed keyword code data is "212025", the keyword information corresponding to this code is "singing" & "Mayday"; the code data corresponds to the video stream information of several Mayday songs, so one group can be output at random.
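The keyword-to-video lookup with random selection can be sketched as follows; the database contents and file names are illustrative placeholders.

```python
# Sketch of the performance-module lookup: each keyword code maps to several
# candidate video streams, one of which is chosen at random.
import random

PERFORMANCE_DB = {
    212025: ["mayday_song_1.flv", "mayday_song_2.flv", "mayday_song_3.flv"],
}

def pick_performance_stream(keyword_code):
    """Randomly pick one candidate stream for the keyword code, if any."""
    candidates = PERFORMANCE_DB.get(keyword_code)
    return random.choice(candidates) if candidates else None
```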
Next, the amusement mode module 528 receives and parses the amusement-mode pre-response packet sent by the above-described pattern discrimination module 523, obtains the amusement-mode mood command code data ("12301~12320") and the audience gift information data, and uses the amusement-mode mood command code data to search a preset amusement performance database for the amusement multi-modal information corresponding to the mood command (tri-modal combination data of several pieces of voice information, several pieces of action information, and several pieces of expression information under the same mood), thereby obtaining the amusement-mode multi-modal data; these data are function-coded (for example, with the function code 216). Finally, the data including the packet code, the amusement-mode function code, the amusement-mode mood command code data, the function code of the amusement-mode multi-modal data, and the amusement-mode multi-modal data are recoded, thereby obtaining a new amusement-mode multi-modal response packet. The amusement performance database is preset in the amusement mode module 528; each mood command code in it corresponds to several groups of amusement multi-modal information, and the module randomly selects one group of amusement multi-modal information to output. Specifically, in one embodiment, if the parsed amusement-mode mood command code data is "12308", the mood corresponding to this command code is "thanks". The data group corresponding to the thanks mood command in the amusement mode includes several pieces of voice information (for example, "Thanks *** for the follow", "Thanks *** for the gift", etc.), several pieces of action information (for example, a thank-you gesture, nodding), and several pieces of expression information (for example, smiling). The module can randomly select one piece of audience gift information, extract the user name from it, and thank that specific user; the corresponding amusement multi-modal information is then generated by randomly selecting among the above tri-modal information.
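The "thanks" behaviour described above may be sketched as follows; the database entries, field names, and message templates are illustrative placeholders for this sketch.

```python
# Sketch of the amusement module's "thanks" behaviour: pick one gift record,
# extract the sender's name, and combine one random voice, action, and
# expression entry into a tri-modal output.
import random

THANKS_GROUP = {
    "voice": ["Thanks {user} for the follow!", "Thanks {user} for the gift!"],
    "action": ["thank-you gesture", "nod"],
    "expression": ["smile"],
}

def generate_thanks(gift_records):
    """Build one tri-modal thanks output addressed to a random gift sender."""
    record = random.choice(gift_records)
    return {
        "voice": random.choice(THANKS_GROUP["voice"]).format(user=record["user"]),
        "action": random.choice(THANKS_GROUP["action"]),
        "expression": random.choice(THANKS_GROUP["expression"]),
    }
```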
Finally, referring again to Figure 5, the data transmission module 529 in the data processing and pattern discrimination module 52 is described in detail. This module receives and parses the reply text response packet output by the semantic analysis module 525, the reply voice response packet output by the read-aloud module 526, the performance video stream response packet sent by the basic performance mode module 527, and the amusement-mode multi-modal response packet sent by the amusement mode module 528. It collects the response packets that share the same packet code and integrates their coding, obtaining an output response packet that includes the packet code, the target-mode function code, the reply voice data function code, the reply voice data, the reply text data function code, the reply text data, the basic-performance-mode video stream data function code, the basic-performance-mode video stream data, the amusement-mode multi-modal data function code, the amusement-mode multi-modal data, and so on. The new output response packet is then transmitted over the internet to the multi-modal interactive information output module 53.
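The integration step can be sketched as follows; the merge-by-dictionary representation is an assumption of this sketch, with only the match-by-packet-code behaviour taken from the description above.

```python
# Sketch of the integration step in module 529: collect the response packets
# that share one packet code and merge their fields into a single output
# packet.
def integrate_packets(packets, packet_code):
    """Merge all packets carrying the given packet code into one packet."""
    merged = {"packet_code": packet_code}
    for p in packets:
        if p.get("packet_code") == packet_code:
            for key, value in p.items():
                if key != "packet_code":
                    merged[key] = value
    return merged
```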
After the server 400 completes the data processing and pattern discrimination work in the cloud, the multi-modal interactive information output module 53 further parses and distributes the above processing results. As shown in Figure 5, the multi-modal interactive information output module 53 includes the following modules: an information relay module 531, an interface output module 532, a video stream output module 533, a voice output module 534, and a text output module 535. The information relay module 531 receives and parses the output response packet sent by the data processing and pattern discrimination module 52 and, according to the parsed target-mode function code, reply voice data function code, reply text data function code, basic-performance-mode video stream data function code, and amusement-mode multi-modal data function code, sends the target-mode function code to the interface output module 532; sends the target-mode function code, the basic-performance-mode video stream data, and the amusement-mode multi-modal data to the video stream output module 533; sends the target-mode function code and the reply voice data to the voice output module 534; and sends the target-mode function code and the reply text data to the text output module 535. The interface output module 532 switches the current live interface to the target live-room display interface of the associated mode according to the function code of the target mode. The video stream output module 533 outputs, based on the target-mode function code (the amusement mode or the basic performance mode), the basic-performance-mode video stream data and the amusement-mode multi-modal data of the associated mode. The voice output module 534 outputs the reply voice data based on the target-mode function code (the dialogue mode, the audience interaction mode, or the mode of interacting with another auxiliary robot). The text output module 535 outputs the reply text data based on the target-mode function code (the dialogue mode, the audience interaction mode, or the mode of interacting with another auxiliary robot).
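The routing performed by the information relay module may be sketched as follows; the payload field names and the set-based dispatch are assumptions of this sketch, while the mode codes follow the examples in this embodiment.

```python
# Sketch of the routing in module 531: each payload in the output packet is
# dispatched to the output module suggested by the target-mode code.
VIDEO_MODES = {2131, 2133}          # amusement, basic performance
SPEECH_MODES = {2132, 2134, 2135}   # dialogue, audience, other-robot

def route_packet(packet):
    """Return the per-output-module payloads for one output response packet."""
    routes = {"interface": packet["mode_code"]}   # interface module always gets the mode
    if packet["mode_code"] in VIDEO_MODES and "video" in packet:
        routes["video"] = packet["video"]
    if packet["mode_code"] in SPEECH_MODES:
        if "speech" in packet:
            routes["voice"] = packet["speech"]
        if "text" in packet:
            routes["text"] = packet["text"]
    return routes
```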
It should be noted that a cache database is stored in the video stream output module 533. When the information relay module 531 parses that the target mode is the dialogue mode and both the reply voice data and the reply text data are empty, it sends a cache command signal to the video stream output module 533; upon receiving the cache command signal, the video stream output module 533 randomly retrieves and outputs one of the preset video stream segments from the cache database. The cache database is a video stream database preset for the above cache state, and every group of video stream data in it is provided with multi-modal information including voice information, action information, and expression information.
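The cache fallback can be sketched as follows; the clip names and the tuple-based return value are illustrative assumptions, with only the empty-reply condition in dialogue mode taken from the description above.

```python
# Sketch of the cache fallback: when dialogue mode (code 2132) produces
# neither speech nor text, a random idle clip from the preset cache
# database is played instead.
import random

IDLE_CLIPS = ["idle_wave.flv", "idle_blink.flv", "idle_stretch.flv"]

def select_output(mode_code, speech, text):
    """Fall back to a random cached clip when dialogue mode has no reply."""
    if mode_code == 2132 and not speech and not text:
        return ("cache", random.choice(IDLE_CLIPS))
    return ("normal", (speech, text))
```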
It should be noted that, in the embodiments of the present application, the function codes of all input and output data are merely specific examples of the present application; those implementing the application may design the distinctive codes of the data functions according to the actual situation, and this part of the content is not specifically limited by the present invention.
Because the method for the present invention describes what is realized in computer systems.The computer system can for example be set In the control core processor of robot.For example, method described herein can be implemented as to perform with control logic Software, it is performed by the CPU in robot operating system.Function as described herein, which can be implemented as being stored in non-transitory, to be had Programmed instruction set in shape computer-readable medium.When implemented in this fashion, the computer program includes one group of instruction, When group instruction is run by computer, it promotes computer to perform the method that can implement above-mentioned function.FPGA can be temporary When or be permanently mounted in non-transitory tangible computer computer-readable recording medium, such as ROM chip, computer storage, Disk or other storage mediums.
It should be understood that the disclosed embodiments of the present invention are not limited to the specific structures, processing steps, or materials disclosed herein, but extend to their equivalents as would be understood by those of ordinary skill in the relevant art. It should also be understood that the terms used herein are for the purpose of describing specific embodiments only and are not intended to be limiting.
" one embodiment " or " embodiment " mentioned in specification means special characteristic, the structure described in conjunction with the embodiments Or during characteristic is included at least one embodiment of the present invention.Therefore, the phrase " reality that specification various places throughout occurs Apply example " or " embodiment " same embodiment might not be referred both to.
Although the embodiments above have been disclosed, the content described is merely an implementation adopted to facilitate understanding of the present invention and is not intended to limit it. Any person skilled in the art to which the present invention pertains may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the present invention; the scope of patent protection of the present invention shall nevertheless remain subject to the scope defined by the appended claims.

Claims (11)

1. A multi-modal interaction method for a virtual robot applied to a live video platform, characterized in that the virtual robot is accessed by an application of the live video platform and possesses multi-modal interaction capabilities, the multi-modal interaction method comprising the following steps:
a multi-modal information input step: displaying a virtual robot having a specific image in a preset area, entering a default live-assist mode, and receiving in real time the multi-modal data and multi-modal instructions input in the live room;
a data processing and pattern discrimination step: parsing the multi-modal data and/or the multi-modal instructions and, using the multi-modal interaction capabilities of the virtual robot, discriminating and determining a target live-assist mode;
a multi-modal interactive information output step: activating the target live-assist mode, the virtual robot performing multi-modal interaction and display according to the target live-assist mode.
2. The method according to claim 1, characterized in that the data processing and pattern discrimination comprises:
receiving the multi-modal data during the live broadcast, and extracting wake-up data for the virtual robot;
entering the one of the multi-modal interaction modes that matches the wake-up data, and performing the multi-modal interaction and display actions under the current multi-modal interaction mode.
3. The method according to claim 1 or 2, characterized in that
the multi-modal interaction modes include: a dialogue mode, a basic performance mode, an audience interaction mode, and a mode of interacting with other virtual robots.
4. The method according to claim 2 or 3, characterized in that, in the data processing and pattern discrimination, further:
the anchor's multi-modal instruction settings for mode conversion are obtained;
the mode conversion settings are parsed and responded to, switching from the current multi-modal interaction mode to another multi-modal interaction mode, namely the target live-assist mode.
5. The method according to claim 4, characterized in that
the multi-modal data and/or multi-modal instructions include one or more of: text information, voice information, visual information, control command information, and combinations thereof.
6. A storage medium on which is stored program code executable to perform the method steps according to any one of claims 1-5.
7. A multi-modal interaction system for a virtual robot applied to a live video platform, characterized in that the virtual robot is accessed by an application of the live video platform and possesses multi-modal interaction capabilities, the multi-modal interaction system comprising the following modules:
a multi-modal information input module, which displays a virtual robot having a specific image in a preset area, enters a default live-assist mode, and receives in real time the multi-modal data and multi-modal instructions input in the live room;
a data processing and pattern discrimination module, which parses the multi-modal data and the multi-modal instructions and, using the multi-modal interaction capabilities of the virtual robot, discriminates and determines a target live-assist mode;
a multi-modal interactive information output module, which activates the target live-assist mode, the virtual robot performing multi-modal interaction and display according to the target live-assist mode.
8. The system according to claim 7, characterized in that, in the data processing and pattern discrimination module:
wake-up data for the virtual robot are extracted based on the multi-modal data;
the one of the multi-modal interaction modes that matches the wake-up data is entered, and the multi-modal interaction and display actions under the current multi-modal interaction mode are performed.
9. The system according to claim 7 or 8, characterized in that
the multi-modal interaction modes include: a dialogue mode, a basic performance mode, an audience interaction mode, and a mode of interacting with other virtual robots.
10. The system according to claim 8 or 9, characterized in that the data processing and pattern discrimination module further:
obtains the anchor's multi-modal instruction settings for mode conversion;
parses and responds to the mode conversion settings, switching from the current multi-modal interaction mode to another multi-modal interaction mode, namely the target live-assist mode.
11. The system according to claim 10, characterized in that
the multi-modal data and/or the multi-modal instructions include one or more of: text information, voice information, visual information, control command information, and combinations thereof.
CN201710551230.0A 2017-07-07 2017-07-07 Virtual robot multi-mode interaction method and system applied to video live broadcast platform Active CN107423809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710551230.0A CN107423809B (en) 2017-07-07 2017-07-07 Virtual robot multi-mode interaction method and system applied to video live broadcast platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710551230.0A CN107423809B (en) 2017-07-07 2017-07-07 Virtual robot multi-mode interaction method and system applied to video live broadcast platform

Publications (2)

Publication Number Publication Date
CN107423809A true CN107423809A (en) 2017-12-01
CN107423809B CN107423809B (en) 2021-02-26

Family

ID=60427526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710551230.0A Active CN107423809B (en) 2017-07-07 2017-07-07 Virtual robot multi-mode interaction method and system applied to video live broadcast platform

Country Status (1)

Country Link
CN (1) CN107423809B (en)

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108052250A (en) * 2017-12-12 2018-05-18 北京光年无限科技有限公司 Virtual idol deductive data processing method and system based on multi-modal interaction
CN108337568A (en) * 2018-02-08 2018-07-27 北京潘达互娱科技有限公司 A kind of information replies method, apparatus and equipment
CN108833810A (en) * 2018-06-21 2018-11-16 珠海金山网络游戏科技有限公司 The method and device of subtitle is generated in a kind of live streaming of three-dimensional idol in real time
CN110148406A (en) * 2019-04-12 2019-08-20 北京搜狗科技发展有限公司 A kind of data processing method and device, a kind of device for data processing
CN110543290A (en) * 2018-09-04 2019-12-06 谷歌有限责任公司 Multimodal response
CN110730390A (en) * 2019-09-23 2020-01-24 杭州铠麒网络科技有限公司 Interaction method and device for intelligently activating atmosphere of live broadcast room
CN110850983A (en) * 2019-11-13 2020-02-28 腾讯科技(深圳)有限公司 Virtual object control method and device in video live broadcast and storage medium
CN110866963A (en) * 2018-08-28 2020-03-06 日本聚逸株式会社 Moving image distribution system, moving image distribution method, and recording medium
CN110871813A (en) * 2018-08-31 2020-03-10 比亚迪股份有限公司 Control method and device of virtual robot, vehicle, equipment and storage medium
CN111290682A (en) * 2018-12-06 2020-06-16 阿里巴巴集团控股有限公司 Interaction method and device and computer equipment
CN111312240A (en) * 2020-02-10 2020-06-19 北京达佳互联信息技术有限公司 Data control method and device, electronic equipment and storage medium
CN111343473A (en) * 2020-02-25 2020-06-26 北京达佳互联信息技术有限公司 Data processing method and device for live application, electronic equipment and storage medium
CN111443852A (en) * 2020-03-25 2020-07-24 北京百度网讯科技有限公司 Digital human action control method and device, electronic equipment and storage medium
CN111918073A (en) * 2020-06-30 2020-11-10 北京百度网讯科技有限公司 Management method and device of live broadcast room
CN112135160A (en) * 2020-09-24 2020-12-25 广州博冠信息科技有限公司 Virtual object control method and device in live broadcast, storage medium and electronic equipment
CN112446938A (en) * 2020-11-30 2021-03-05 重庆空间视创科技有限公司 Multi-mode-based virtual anchor system and method
CN112560622A (en) * 2020-12-08 2021-03-26 中国联合网络通信集团有限公司 Virtual object motion control method and device and electronic equipment
CN112601100A (en) * 2020-12-11 2021-04-02 北京字跳网络技术有限公司 Live broadcast interaction method, device, equipment and medium
CN112616063A (en) * 2020-12-11 2021-04-06 北京字跳网络技术有限公司 Live broadcast interaction method, device, equipment and medium
CN112887751A (en) * 2021-02-03 2021-06-01 北京城市网邻信息技术有限公司 Data processing method and device for live broadcast room
CN112948228A (en) * 2021-03-15 2021-06-11 河海大学 Multi-mode database evaluation benchmark system facing streaming data and construction method thereof
CN112995777A (en) * 2021-02-03 2021-06-18 北京城市网邻信息技术有限公司 Interaction method and device for live broadcast room
CN113691829A (en) * 2021-10-26 2021-11-23 阿里巴巴达摩院(杭州)科技有限公司 Virtual object interaction method, device, storage medium and computer program product
CN113778580A (en) * 2021-07-28 2021-12-10 赤子城网络技术(北京)有限公司 Modal user interface display method, electronic device and storage medium
CN114998491A (en) * 2022-08-01 2022-09-02 阿里巴巴(中国)有限公司 Digital human driving method, device, equipment and storage medium
CN115604501A (en) * 2022-11-28 2023-01-13 广州钛动科技有限公司(Cn) Internet advertisement live broadcasting system and method
EP4181513A4 (en) * 2020-08-24 2023-11-29 Beijing Bytedance Network Technology Co., Ltd. Virtual gift display method, server, and target receiving end
CN117319758A (en) * 2023-10-13 2023-12-29 南京霍巴信息科技有限公司 Live broadcast method and live broadcast system based on cloud platform

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110081972A1 (en) * 2009-10-01 2011-04-07 Mind Optics Llc Methods for providing an online event in a gaming environment and devices thereof
CN103368816A (en) * 2012-03-29 2013-10-23 深圳市腾讯计算机系统有限公司 Instant communication method based on virtual character and system
CN104363471A (en) * 2014-11-21 2015-02-18 广州华多网络科技有限公司 Interaction method based on live video and relevant device and system
US9135356B2 (en) * 2009-12-03 2015-09-15 Microsoft Technology Licensing, Llc Pseudonaming anonymous participants
US20160110922A1 (en) * 2014-10-16 2016-04-21 Tal Michael HARING Method and system for enhancing communication by using augmented reality
CN105828145A (en) * 2016-03-18 2016-08-03 广州酷狗计算机科技有限公司 Interaction method and interaction device
CN105959718A (en) * 2016-06-24 2016-09-21 乐视控股(北京)有限公司 Real-time interaction method and device in video live broadcasting
US20160277802A1 (en) * 2015-03-20 2016-09-22 Twitter, Inc. Live video stream sharing
CN106507207A (en) * 2016-10-31 2017-03-15 北京小米移动软件有限公司 Interactive method and device in live application
CN106863319A (en) * 2017-01-17 2017-06-20 北京光年无限科技有限公司 A kind of robot awakening method and device
CN106878820A (en) * 2016-12-09 2017-06-20 北京小米移动软件有限公司 Living broadcast interactive method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SUZUKI S ET AL.: "Haishin-oukoku (Live video streaming Kingdom): Live video streaming, realize communication service on avatar", ICE TECHNICAL REPORT *
LI YU: "Application of Emerging Media Technologies and the Transformation of TV News: Mobile Live Streaming, Robot Journalism and Virtual Reality as Examples", SOUTHERN TELEVISION JOURNAL *

CN112995777A (en) * 2021-02-03 2021-06-18 北京城市网邻信息技术有限公司 Interaction method and device for live broadcast room
CN112887751B (en) * 2021-02-03 2022-04-22 北京城市网邻信息技术有限公司 Data processing method and device for live broadcast room
CN112948228A (en) * 2021-03-15 2021-06-11 河海大学 Multi-mode database evaluation benchmark system facing streaming data and construction method thereof
CN113778580B (en) * 2021-07-28 2023-12-08 赤子城网络技术(北京)有限公司 Modal user interface display method, electronic device and storage medium
CN113778580A (en) * 2021-07-28 2021-12-10 赤子城网络技术(北京)有限公司 Modal user interface display method, electronic device and storage medium
WO2023071917A1 (en) * 2021-10-26 2023-05-04 阿里巴巴达摩院(杭州)科技有限公司 Virtual object interaction method and device, and storage medium and computer program product
CN113691829A (en) * 2021-10-26 2021-11-23 阿里巴巴达摩院(杭州)科技有限公司 Virtual object interaction method, device, storage medium and computer program product
CN114998491B (en) * 2022-08-01 2022-11-18 阿里巴巴(中国)有限公司 Digital human driving method, device, equipment and storage medium
CN114998491A (en) * 2022-08-01 2022-09-02 阿里巴巴(中国)有限公司 Digital human driving method, device, equipment and storage medium
WO2024027661A1 (en) * 2022-08-01 2024-02-08 阿里巴巴(中国)有限公司 Digital human driving method and apparatus, device and storage medium
CN115604501B (en) * 2022-11-28 2023-04-07 广州钛动科技股份有限公司 Internet advertisement live broadcasting system and method
CN115604501A (en) * 2022-11-28 2023-01-13 广州钛动科技有限公司 Internet advertisement live broadcasting system and method
CN117319758A (en) * 2023-10-13 2023-12-29 南京霍巴信息科技有限公司 Live broadcast method and live broadcast system based on cloud platform
CN117319758B (en) * 2023-10-13 2024-03-12 南京霍巴信息科技有限公司 Live broadcast method and live broadcast system based on cloud platform

Also Published As

Publication number Publication date
CN107423809B (en) 2021-02-26

Similar Documents

Publication Publication Date Title
CN107423809A (en) The multi-modal exchange method of virtual robot and system applied to net cast platform
CN108000526B (en) Dialogue interaction method and system for intelligent robot
CN110381388B (en) Subtitle generating method and device based on artificial intelligence
CN110288077B (en) Method and related device for synthesizing speaking expression based on artificial intelligence
CN107203953B (en) Teaching system based on internet, expression recognition and voice recognition and implementation method thereof
CN105556594B (en) Voice recognition processing unit, voice recognition processing method and display device
CN107632980A (en) Speech translation method and apparatus, and apparatus for speech translation
CN107992195A (en) Course content processing method, apparatus, server and storage medium
CN107766506A (en) Multi-turn dialogue model construction method based on hierarchical attention mechanism
CN106406806A (en) A control method and device for intelligent apparatuses
WO2008049834A2 (en) Virtual assistant with real-time emotions
CN110602516A (en) Information interaction method and device based on live video and electronic equipment
CN206711600U (en) The voice interactive system with emotive function based on reality environment
CN109241330A (en) Method, apparatus, device and medium for identifying key phrases in audio
CN104021326B (en) Foreign language teaching method and foreign language teaching aid
CN111263227A (en) Multimedia playing method, device and storage medium
CN107704612A (en) Dialogue interaction method and system for intelligent robot
CN107483445A (en) Silent voiceprint recognition registration method, apparatus, server and storage medium
CN107291704A (en) Processing method and apparatus, and apparatus for processing
CN103546790A (en) Language interaction method and system based on mobile terminal and interactive television
CN110600033A (en) Learning condition evaluation method and device, storage medium and electronic equipment
CN104461446B (en) Software running method and system based on interactive voice
CN107180115A (en) Interaction method and system for a robot
CN112837401A (en) Information processing method and device, computer equipment and storage medium
CN114065720A (en) Conference summary generation method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230927

Address after: 100000 6198, Floor 6, Building 4, Yard 49, Badachu Road, Shijingshan District, Beijing

Patentee after: Beijing Virtual Dynamic Technology Co., Ltd.

Address before: 100000 Fourth Floor Ivy League Youth Venture Studio No. 193, Yuquan Building, No. 3 Shijingshan Road, Shijingshan District, Beijing

Patentee before: Beijing Guangnian Infinite Technology Co., Ltd.