CN110399040B

CN110399040B - Multi-mode interaction method, user terminal equipment, server and system

Info

Publication number: CN110399040B
Application number: CN201910667966.3A
Authority: CN
Inventors: 徐伟刚; 陈熙旻
Original assignee: Yutou Technology Hangzhou Co Ltd
Current assignee: Yutou Technology Hangzhou Co Ltd
Priority date: 2019-07-23
Filing date: 2019-07-23
Publication date: 2023-05-12
Anticipated expiration: 2039-07-23
Also published as: CN110399040A

Abstract

The invention relates to a multi-modal interaction method, a client device, a server and a system. The method comprises the following steps: one or more device sides receive interaction requests of one or more modalities input by a user and send the interaction requests to a server; the server receives the interaction requests sent by the device side, generates one or more interaction commands according to the interaction requests, and sends the interaction commands to the device side, wherein at least some of the interaction commands include an interaction package or information of the interaction package; and the device side receives and executes the interaction commands, dynamically loading the interaction package to realize interaction with the user. The multi-modal interaction method, client device, server and system are easy to extend and inexpensive to upgrade, and combine the advantages of local and cloud approaches.

Description

Multi-mode interaction method, user terminal equipment, server and system

Technical Field

The present invention relates to the field of man-machine interaction technologies, and in particular, to a multi-mode interaction method, a client device, a server, and a system.

Background

In the PC age, the main modalities of human-computer interaction are devices such as mice, keyboards, displays, speakers, etc. In recent years, with technical innovation, especially with development of voice recognition and semantic understanding technologies, the existing interaction modes are greatly expanded, and man-machine interaction of voice modes is becoming popular.

Many manufacturers have introduced intelligent devices such as smart speakers. Such a device receives voice input from the user and uploads it to a voice service on a server side, such as a cloud server; the voice service processes the voice input and data and then issues voice commands and data to the device, and a software development kit (SDK) on the device side interprets and executes the commands. The SDK encapsulates some functions of the device side and interacts with the device-side system through an application programming interface (API).

An intelligent device differs from a PC not only in that voice interaction is introduced but, more importantly, in that the interaction content is processed by the server and issued to the device, so that the computing capacity of the server is utilized and the interaction content can be managed by issuing it dynamically.

Existing device-side interaction methods and existing server-side interaction methods both have a number of drawbacks. For example, the interaction capability of a device-side method is determined by the hardware capability and the system capability of the device, and the interaction content cannot be changed dynamically. Although existing server-side interaction methods use the computing capability of the cloud and can issue interaction content to the device side dynamically, they still have the following disadvantages:

1. The interactive capability is determined by the cloud rendering language capability and the device-side interactive SDK capability. The SDK is integrated into the system of the device, and the upgrade cost is high, so that the expansion of the SDK capability is inconvenient.

2. Regarding display services such as a graphical user interface (GUI), because different device sides have different GUI systems and their controls differ in display layout and behavior, a GUI rendered in the cloud is difficult to unify with the style of the device side.

Disclosure of Invention

The invention aims to provide a novel multi-mode interaction method, user equipment, a server and a system.

The aim of the invention is achieved by adopting the following technical scheme. The multi-mode interaction method provided by the invention comprises the following steps: one or more equipment ends receive interaction requests of one or more modes input by a user and send the interaction requests to a server; the server receives the interaction request sent by the equipment end, generates one or more interaction commands according to the interaction request, and sends the interaction commands to the equipment end; at least some of the interactive commands include an interactive package or information of the interactive package; and the equipment end receives and executes the interaction command, and the interaction package is dynamically loaded to realize interaction with a user.

The object of the invention can be further achieved by the following technical measures.

In the foregoing multi-modal interaction method, the interaction package comprises one or more objects based on a first language as first objects, and instructions for the first objects; the dynamically loading the interaction package to realize interaction with a user includes: using dynamic loading and bridging, controlling a second object bridged with the first object according to the first object and the instructions for the first object; wherein the second object is an object, local to the device side, based on a second language.

Before the step of receiving the interaction request of one or more modes input by the user, the multi-mode interaction method further includes: developing in advance at a development end to obtain the interaction package, and sending the interaction package to the server; and the server receives the interaction package uploaded by the development terminal and stores the interaction package.

In the foregoing multi-modal interaction method, the second object includes one or more second attributes and one or more second methods based on the second language; the step of developing the interaction package at the development end in advance comprises: in a bridging manner, mapping the second attributes to attributes based on the first language as first attributes, and mapping the second methods to methods based on the first language as first methods, so as to obtain a bridging object based on the first language, comprising the first attributes and the first methods, as the first object; and packaging one or more of the first object, a script, and resource data to obtain the interaction package.
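The attribute and method mapping described above can be illustrated with a minimal sketch. It is only an analogy: Python stands in for both the first and the second language, and the names `NativeLabel` and `Bridge` are hypothetical and do not appear in the invention. The bridging object exposes only the attributes and methods that were mapped, and forwards every access to the device-local object.

```python
from typing import Any, Iterable


class NativeLabel:
    """Stand-in for a device-local (second-language) object."""

    def __init__(self) -> None:
        self.text = ""       # a "second attribute"
        self.shown = False

    def show(self) -> None:  # a "second method"
        self.shown = True


class Bridge:
    """Stand-in for a first-language bridging object (the "first object")."""

    def __init__(self, target: Any, attributes: Iterable[str],
                 methods: Iterable[str]) -> None:
        # Bypass our own __setattr__ so the bookkeeping lives on the bridge.
        object.__setattr__(self, "_target", target)
        object.__setattr__(self, "_attributes", frozenset(attributes))
        object.__setattr__(self, "_methods", frozenset(methods))

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found on the bridge itself.
        if name in self._attributes or name in self._methods:
            return getattr(self._target, name)
        raise AttributeError(name)

    def __setattr__(self, name: str, value: Any) -> None:
        # Writes are forwarded only for mapped attributes.
        if name in self._attributes:
            setattr(self._target, name, value)
        else:
            raise AttributeError(name)


native = NativeLabel()
label = Bridge(native, attributes=["text"], methods=["show"])
label.text = "hello"   # forwarded to native.text
label.show()           # forwarded to native.show()
```

In this sketch, an instruction in the script operates on `label`, and the effect lands on `native`; names that were not mapped (here `shown`) are not reachable through the bridge.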

In the multi-mode interaction method, the interaction package comprises object data and a script, wherein the object data comprises data for adding an extended object to the equipment end and/or data for updating an original object in the equipment end, and the script comprises an instruction for the extended object and/or an instruction for the original object; the dynamically loading the interaction package to enable interaction with a user includes: and controlling the object of one or more modes according to the interaction package, wherein the method specifically comprises one or more of updating the original object, adding the extension object, operating the original object, operating the updated original object or operating the extension object according to the interaction package.

In the multi-mode interaction method, the information of the interaction package comprises an address of the interaction package; the device side receiving and executing the interaction command further comprises: and before the interaction package is dynamically loaded, acquiring the interaction package according to the address of the interaction package.

In the foregoing multi-modal interaction method, at least some of the interaction commands include a version number; before the step of acquiring the interaction package according to the address of the interaction package, the method further comprises: searching the device side, according to the version number, for an interaction package with the same version number, and if none exists, acquiring the interaction package according to the address of the interaction package.
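The version check before acquisition can be sketched as a small device-side cache. This is an illustrative assumption, not the claimed implementation: the class name `PackageCache`, the injected `fetch` callable, and the choice to key the cache by version number alone are all hypothetical.

```python
from typing import Callable, Dict


class PackageCache:
    """Hypothetical device-side store of interaction packages by version."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                 # downloads a package from an address
        self._store: Dict[str, bytes] = {}  # version number -> package bytes

    def get(self, version: str, address: str) -> bytes:
        # Acquire from the address only if this version is not already local.
        if version not in self._store:
            self._store[version] = self._fetch(address)
        return self._store[version]


# Usage with a fake downloader that records which addresses were requested.
requested = []


def fake_fetch(address: str) -> bytes:
    requested.append(address)
    return b"package-bytes"


cache = PackageCache(fake_fetch)
first = cache.get("1.0.0", "https://example.invalid/pkg-1.0.0.zip")
second = cache.get("1.0.0", "https://example.invalid/pkg-1.0.0.zip")
```

The second call is served from the local store, so the download happens once per version in this sketch.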

In the foregoing multi-mode interaction method, the receiving, by the one or more device sides, an interaction request of one or more modes input by a user, and sending the interaction request to a server side includes: receiving an interaction request of a first mode input by a user by utilizing a dynamic engine of the equipment end, and sending the interaction request of the first mode to the service end through the interaction engine, or receiving an interaction request of a second mode input by the user by utilizing an interaction SDK of the equipment end, and sending the interaction request of the second mode to the service end; the generating one or more interaction commands according to the interaction request comprises: the server judges according to the received interaction request, if the interaction request is the interaction request of the first mode, the interaction command comprising the interaction package or the information of the interaction package is generated as the first interaction command, and if the interaction request is the interaction request of the second mode, the second interaction command comprising interaction data of one or more modes and/or addresses of the interaction data and instructions of one or more modes, which are used for the interaction SDK of the equipment end to carry out interaction output, is generated.

In the foregoing multi-mode interaction method, the receiving and executing the interaction command by the device side further includes: judging the received interaction command; if the interaction command is the first interaction command or a part of the interaction command having the first modality, the dynamically loading the interaction package specifically includes: the interaction engine is utilized to acquire and decompress the interaction package, the dynamic engine is utilized to dynamically load and analyze the interaction package, and one or more modal engines in the equipment end are called by the dynamic engine to operate the object of the corresponding mode; and if the interaction command is the second interaction command or part of the interaction command with the second modality, distributing the second interaction command or the part with the second modality to the interaction SDK, and calling one or more modality engines in the equipment end through the interaction SDK to execute the second interaction command so as to realize interaction with a user.
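The judging step above amounts to routing each part of a command to one of two handlers. The sketch below assumes, purely for illustration, that a part counts as a first interaction command when its content carries a package or a package address; the field names `package` and `package_url` and the function `dispatch` are hypothetical.

```python
from typing import Any, Callable, Dict, List

Content = Dict[str, Any]
Handler = Callable[[str, Content], None]


def dispatch(command: Dict[str, Content],
             load_package: Handler,
             run_sdk: Handler) -> List[str]:
    """Route each modality's content to the dynamic engine path or the SDK path."""
    routes: List[str] = []
    for modality, content in command.items():
        if "package" in content or "package_url" in content:
            load_package(modality, content)   # first interaction command
            routes.append("dynamic")
        else:
            run_sdk(modality, content)        # second interaction command
            routes.append("sdk")
    return routes


# Usage with handlers that only record what they were given.
loaded: List[str] = []
executed: List[str] = []

routes = dispatch(
    {
        "display": {"version": "1.2.0",
                    "package_url": "https://example.invalid/pkg.zip"},
        "voice": {"data": "hello"},
    },
    load_package=lambda modality, content: loaded.append(modality),
    run_sdk=lambda modality, content: executed.append(modality),
)
```

In a real device, `load_package` would hand the part to the interaction engine and dynamic engine, and `run_sdk` would hand it to the interaction SDK; both are stubs here.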

In the foregoing multi-modal interaction method, before the step of sending the interaction command to the device side, the method further includes: integrating one or more interaction commands into JavaScript Object Notation (JSON) format; the sending the interaction command to the device side comprises: sending the interaction command in JSON format to the device side.

In the foregoing multi-mode interaction method, the interaction command of the JavaScript object notation format includes: a key field for recording a modality type of the interactive command and a value field for recording contents of the interactive command; wherein the value field includes one or more of the version number, the address of the interaction package, the interaction data and/or the address of the interaction data recorded in the form of key-value pairs.
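A possible shape for such a JSON command is sketched below. The specification fixes only that the key records the modality type and the value records the content as key-value pairs; the concrete field names (`version`, `package_url`, `data`, `data_url`), the helper names, and the rule of one entry per modality are assumptions made for this example.

```python
import json
from typing import Any, Dict, Iterable, Optional

Command = Dict[str, Dict[str, Any]]


def build_command(modality: str, *, version: Optional[str] = None,
                  package_url: Optional[str] = None, data: Any = None,
                  data_url: Optional[str] = None) -> Command:
    """Key = modality type; value = whichever content fields are present."""
    fields = {"version": version, "package_url": package_url,
              "data": data, "data_url": data_url}
    return {modality: {k: v for k, v in fields.items() if v is not None}}


def integrate(commands: Iterable[Command]) -> str:
    """Merge several commands into one JSON document (one entry per modality)."""
    merged: Command = {}
    for command in commands:
        for modality, content in command.items():
            if modality in merged:
                raise ValueError(f"duplicate modality: {modality}")
            merged[modality] = content
    return json.dumps(merged, sort_keys=True)


payload = integrate([
    build_command("display", version="1.2.0",
                  package_url="https://example.invalid/pkg.zip"),
    build_command("voice", data="hello"),
])
```

The device side would parse `payload` with an ordinary JSON parser and inspect each key to learn which modality the content belongs to.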

The aim of the invention is also achieved by adopting the following technical scheme. The user terminal device according to the present invention comprises: the user input receiving module is used for receiving an interaction request of one or more modes input by a user and sending the interaction request to the server; the interactive command execution module is used for receiving and executing the interactive command sent by the server; wherein at least some of the interactive commands include an interactive package or information of the interactive package; the interaction command execution module comprises a dynamic loading unit, and the dynamic loading unit is used for dynamically loading the interaction package to realize interaction with a user.

In the foregoing client device, the interaction package comprises one or more objects based on a first language as first objects, and instructions for the first objects; the dynamic loading unit is specifically configured to: using dynamic loading and bridging, control a second object bridged with the first object according to the first object and the instructions for the first object; wherein the second object is a second-language-based object local to the client device.

The foregoing client device, the interaction package includes object data and a script, where the object data includes data for adding an extended object to the client device and/or data for updating an original object in the client device, and the script includes an instruction for the extended object and/or an instruction for the original object; the dynamic loading unit is specifically configured to: and controlling the object of one or more modes according to the interaction package, wherein the method specifically comprises one or more of updating the original object, adding the extension object, operating the original object, operating the updated original object or operating the extension object according to the interaction package.

In the foregoing client device, the information of the interaction package comprises the address of the interaction package; the interaction command execution module further comprises an interaction package acquisition unit, which is used for acquiring the interaction package according to the address of the interaction package.

In the foregoing client device, at least some of the interaction commands include a version number; the client device further comprises a version number judging unit, which is used for searching the client device, according to the version number, for an interaction package with the same version number, and if none exists, notifying the interaction package acquisition unit to acquire the interaction package according to the address of the interaction package.

In the foregoing client device, the user input receiving module includes a first mode input receiving unit and a second mode input receiving unit; the first-modality input receiving unit is used for receiving an interaction request of a first modality input by a user by utilizing a dynamic engine, and sending the interaction request of the first modality to the server through the interaction engine so that the server can generate the interaction command comprising the interaction package or the information of the interaction package as a first interaction command according to the interaction request of the first modality; the second-modality input receiving unit is configured to receive an interaction request of a second modality input by a user by using an interaction SDK, and send the interaction request of the second modality to the server, so that the server generates a second interaction command including interaction data of one or more modalities and/or addresses of the interaction data for the interaction SDK of the user terminal device to perform interaction output, and instructions of one or more modalities according to the interaction request of the second modality; the interactive command execution module further comprises an interactive command receiving unit, which is used for receiving the interactive command sent by the server, and the interactive command receiving unit is used for receiving the first interactive command or the second interactive command.

In the foregoing client device, the interactive command execution module further includes: the judging unit is used for judging the received interaction command, informing the dynamic loading unit to process if the interaction command is the first interaction command or the part with the first mode in the interaction command, and informing the interaction SDK executing unit to process if the interaction command is the second interaction command or the part with the second mode in the interaction command; the dynamic loading unit is specifically configured to: the interaction engine is utilized to acquire and decompress the interaction package, the dynamic engine is utilized to dynamically load and analyze the interaction package, and one or more modal engines in the user side equipment are called by the dynamic engine to operate the object of the corresponding mode; the interactive SDK execution unit is used for: and distributing the second interaction command or the part related to the second modality to an interaction SDK of the user side equipment, and calling one or more modality engines in the user side equipment through the interaction SDK to execute the second interaction command so as to realize interaction with a user.

In the foregoing client device, the interaction command execution module is specifically configured to: receive and execute an interaction command in JavaScript Object Notation (JSON) format; the interaction command in JSON format comprises a key field for recording the modality type of the interaction command and a value field for recording the content of the interaction command; the value field includes one or more of the version number, the address of the interaction package, the interaction data, and/or the address of the interaction data, recorded in the form of key-value pairs.

The aim of the invention is also achieved by adopting the following technical scheme. The server according to the present invention comprises: an interaction request receiving module, used for receiving interaction requests of one or more modalities sent by one or more client devices; an interaction command generation module, used for generating one or more interaction commands according to the interaction requests; and an interaction command sending module, used for sending the interaction commands to the client device; wherein at least some of the interaction commands include an interaction package or information of the interaction package.

The aforementioned server, wherein the interaction package comprises one or more objects based on a first language as a first object, and instructions for the first object; wherein the first object is a bridging object of a second object, and the second object is a second language-based object local to the client device.

The server further comprises an interaction package storage module, wherein the interaction package storage module is used for receiving the interaction package uploaded by the development terminal and storing the interaction package.

The foregoing server, the interaction package includes object data and a script, where the object data includes data for adding an extended object to the client device and/or data for updating an original object in the client device, and the script includes an instruction for the extended object and/or an instruction for the original object.

The foregoing server, the interaction request receiving module is specifically configured to: receiving an interaction request of a first mode or an interaction request of a second mode sent by the user side equipment; wherein the first modality interaction request is received through a dynamic engine of the client device, and the second modality interaction request is received through an interaction SDK of the client device; the interactive command generating module comprises one or more of a first interactive command generating unit and a second interactive command generating unit; the first interaction command generating unit is used for generating the interaction command comprising the interaction package or the information of the interaction package as a first interaction command according to the first modal request; the second interaction command generating unit is configured to generate, according to the second modality request, a second interaction command including interaction data of one or more modalities for the interaction SDK of the user side device to perform interaction output and/or an address of the interaction data, and instructions of the one or more modalities.

The foregoing server further comprises a unified format module, which is used for integrating one or more interaction commands into JavaScript Object Notation (JSON) format; the interaction command sending module is specifically configured to send the interaction command in JSON format to the client device; the interaction command in JSON format comprises a key field for recording the modality type of the interaction command and a value field for recording the content of the interaction command; the value field includes one or more of the version number, the address of the interaction package, the interaction data, and/or the address of the interaction data, recorded in the form of key-value pairs.

The aim of the invention is also achieved by adopting the following technical scheme. According to the multi-mode interaction system provided by the invention, the system comprises at least one equipment end and at least one service end, and any multi-mode interaction method is realized.

Compared with the prior art, the invention has obvious advantages and beneficial effects. By means of the technical scheme, the multi-mode interaction method, the user equipment, the server and the system provided by the invention have at least the following advantages and beneficial effects:

(1) The invention uses the dynamic loading capability of a dynamic engine, such as a JavaScript engine: a locally developed multi-modal interaction package is uploaded to the server, the instruction issued to the device is the download address of the package, and the device downloads the multi-modal interaction package from that address. The scope of dynamic loading is thereby extended from interaction within the device to interaction that combines the device and the server. In addition to the purely local and purely cloud modes of multi-modal interaction, a mode of local development with cloud deployment is added, which combines the advantages of both;

(2) The interaction package is developed in a local development environment and interfaces with the local system in a bridging manner, so that extending capabilities does not depend on SDK version updates; extension is convenient and the upgrade cost is low;

(3) Through bridging, the invention can connect different languages and their runtime environments, so that the interaction content issued by the server interfaces with the local system of the device side; taking the GUI as an example, the invention enables GUI controls issued by the server to interface with the various local control systems of multiple device sides and to be consistent with the style of the local control system;

(4) The invention can extend the interaction capability by using the interaction package to add extension objects to the device side or to update original objects.

The foregoing description is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, preferred embodiments are described in detail below in conjunction with the accompanying drawings.

Drawings

FIG. 1 is a flow diagram of a prior art multi-modal interaction method;

FIG. 2 is a flow diagram of a multi-modal interaction method of one embodiment of the invention;

FIG. 3 is a flow chart of a multi-modal interaction method according to another embodiment of the invention;

FIG. 4 is a schematic structural diagram of a client device according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a server according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a multi-modal interaction system in accordance with one embodiment of the invention.

Detailed Description

In order to further describe the technical means and effects adopted to achieve the preset purpose of the present invention, the following detailed description refers to the specific implementation, structure, features and effects of the multi-modal interaction method, the client device, the server and the system according to the present invention with reference to the accompanying drawings and preferred embodiments.

Modalities herein, also referred to as interactive modalities, include speech modalities, video modalities, display modalities, and the like.

Fig. 1 is a schematic flow chart of a conventional server-based multi-modal interaction method. Referring to fig. 1, an interactive SDK at a device side obtains user input and uploads the user input to a skill service at a server side; the skill service of the service end performs unified decision processing based on the service and the scene, and comprises the steps of requesting services of multiple modes such as voice, video, display and the like to perform corresponding processing such as voice recognition, video processing and the like according to received user input so as to respectively obtain data such as voice stream, video stream, image, display layout and the like, performing unified processing on the data returned by the services of multiple modes, and then issuing commands and data for the equipment end to output to a user to an interaction engine of the equipment end; the interaction engine of the equipment end interprets the command and distributes the command to the interaction SDK; the interactive SDK invokes the speech, video, display, etc. engines in the device-side system to interact with the end user.

The ability of an interactive system or interactive application to interact with the outside world is referred to herein as an interaction capability, or skill. Taking a speaker as an example, a traditional speaker only has interaction capabilities such as receiving key-press input and controlling audio playback; compared with the traditional speaker, a smart speaker has more capabilities, such as receiving voice input and controlling other devices such as a desk lamp.

Note that in the example shown in FIG. 1, what the server issues is a fixed protocol. A fixed protocol, also called a static protocol, is one whose content and format are fixed in an already developed program, such as an executable program or a library. The capabilities specified by the fixed protocol are determined by the capabilities of the device-side interaction SDK. Because a fixed protocol must take generality into account, it is difficult for it to exploit the local special capabilities of each modality of each device, and extending its capabilities depends on version updates of the SDK.

FIG. 2 is a schematic flow diagram of one embodiment of a multimodal interaction method of the present invention. FIG. 3 is a schematic flow chart of another embodiment of the multi-modal interaction method of the present invention. Referring to fig. 2 and 3, the multi-modal interaction method of the present invention mainly includes the following steps:

In step S11, one or more device sides receive an interaction request of one or more modalities input by a user, and send the interaction request to a server. It should be noted that the interaction request is the signal transmitted when the user provides input to the system during human-computer interaction, so the interaction request is also called an interaction input; the device side is also called the client side.

In step S12, the server receives the interaction request sent by the device, generates one or more interaction commands according to the interaction request, and sends the interaction commands to the device for the device to output interactively according to the interaction commands. It should be noted that the interactive command is a signal transmitted during feedback from the human-computer interaction system to the user, and thus the interactive command is also referred to as an interactive output. Wherein at least some of the interactive commands include an interactive package or information of an interactive package. Optionally, the information of the interaction package includes a download address of the interaction package. Optionally, the interaction command further includes a field for recording a modality type corresponding to the interaction command.

In step S13, the device side receives and executes the interaction command sent by the server side, so as to implement interaction with the user. The device side executing the interaction command specifically includes: the interaction package is dynamically loaded to enable interaction with the user.

Dynamic loading is defined as loading, while a program is running, executable files that are not already present in the computer system, and running the code logic in those files. Optionally, the interaction package is dynamically loaded with a dynamic engine to enable interaction with the user. Optionally, the dynamic engine enables interaction with the user by invoking one or more modal engines in the device-side system, such as a speech engine, a video engine, a GUI engine (graphical user interface engine, also referred to as a display engine), and the like.

In some embodiments of the invention, the interaction package includes a script that includes instructions issued to the device side. It should be noted that the present invention does not limit the programming language of the interaction package, which is generally a scripting language, for example JavaScript (abbreviated as JS) or Lua. It should also be noted that the present invention does not limit the type of dynamic engine used, which is generally an engine corresponding to the programming language of the interaction package.

It should be noted that the "interaction package" and the "interaction SDK" in the present invention are different. The interaction SDK is a module for executing interaction instructions on the device and is statically installed in the device. The interaction package of the invention is interaction content and is dynamically issued to the device. An interaction package can be regarded as an extension of the SDK whose advantage over the SDK is that it can be dynamically loaded and dynamically updated. Dynamic loading and dynamic updating of interaction packages may be achieved, for example, by a dynamic engine. For this reason, the interaction package may also be referred to as an SDK dynamic plug-in.

In an alternative example, the programming language of the interaction package is JavaScript language, the dynamic engine is JavaScript engine, and the dynamic loading capability is the capability of the JavaScript engine to dynamically run JavaScript. In fact, existing mainstream JavaScript engines have this capability.
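The idea of dynamic loading can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in `exec()` stands in for the dynamic engine (the text itself names JavaScript or Lua engines), and `SpeechEngine`, `DynamicEngine`, and the delivered script are hypothetical names that do not come from the invention.

```python
# Illustrative stand-in: Python's built-in exec() plays the role of the
# dynamic engine. All class names and the script text are hypothetical.


class SpeechEngine:
    """Hypothetical modality engine that records what it was asked to say."""

    def __init__(self):
        self.spoken = []

    def say(self, text):
        self.spoken.append(text)


class DynamicEngine:
    """Runs script source that was not part of the installed program."""

    def __init__(self, **modality_engines):
        # Names the delivered script is allowed to see, e.g. speech=...
        self.globals = dict(modality_engines)

    def load(self, script_source):
        # Compile and run the delivered script at run time.
        code = compile(script_source, "<interaction-package>", "exec")
        exec(code, self.globals)


speech = SpeechEngine()
engine = DynamicEngine(speech=speech)

# Script text as it might arrive from the server (contents are made up).
delivered_script = 'speech.say("Hello from a dynamically loaded package")'
engine.load(delivered_script)
```

The script only reaches the device-side system through the engines handed to the dynamic engine, which mirrors the way the dynamic engine invokes modality engines on behalf of the interaction package.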

Optionally, input at the device side is handled separately for each device and each modality. All inputs are uploaded to the server, which makes a unified decision, packages the outputs, and distributes them to the devices.

According to the multi-mode interaction method of the invention, skills and applications are dynamically issued by means of dynamic loading, and the issued protocol is not a fixed protocol but a dynamic one, that is, the format and content of the issued interaction commands are not fixed, so that the interaction capability is expanded and updated by dynamically issuing interaction packages.

In some embodiments of the invention, the interaction package includes object data and a script. Further, the interaction package also includes resource data. The object data includes data for adding an extended object to the device side and/or data for updating an original object in the device side. The script includes instructions for the extended object and/or instructions for the original object. Optionally, the script also includes instructions for objects built into the dynamic engine, for example generic logic such as pure mathematical calculations. Dynamically loading the interaction package in step S13 includes: controlling objects of one or more modalities according to the interaction package, which specifically comprises one or more of updating the original object, adding the extended object, operating the original object, operating the updated original object, or operating the extended object. Optionally, an object is operated by calling the modality engine in the device side that corresponds to the object.

According to the multi-mode interaction method of the invention, the interaction capability can be expanded by using the interaction package to add extended objects to the device side or to update original objects.

It should be noted that although fig. 3 only shows a "speech engine, video engine, display engine," other types of modality engines may also be included. Indeed, in embodiments of the present invention that utilize an interaction package to add an extension object, the JavaScript engine may also invoke an engine corresponding to the extension object to operate on the extension object.

In some embodiments of the present invention, the interaction command includes an address of the interaction package. The aforementioned step S13 further includes: before the interaction package is dynamically loaded, obtaining the interaction package according to its address in the interaction command. What is issued to the device in the interaction command may be either the download address of the interaction package or the complete interaction package, but is generally the download address, so as to save traffic.

Optionally, the interaction engine at the device end is utilized to receive and interpret the interaction commands and download the interaction package.

In some embodiments of the invention, at least some of the interaction commands include a version number indicating a version of the interaction package, the interaction command, or the protocol format. Before the step of downloading the interaction package according to its address in step S13, the multi-modal interaction method of the present invention further includes: searching, according to the version number in the interaction command, whether an interaction package with the same version number exists locally on the device side, and, if no such package exists or the version numbers differ, downloading the interaction package according to its address. Optionally, this step may be performed by the interaction engine at the device side. By adding a version number field to the interaction command, the multi-mode interaction method makes the protocol easy to modify or extend and improves compatibility.
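The version check described above can be sketched as a small local cache. This is a minimal sketch, assuming the package is looked up by its address and that the version number is a plain string; `PackageCache`, `fake_download`, and the example address are hypothetical.

```python
class PackageCache:
    """Hypothetical local store of interaction packages, keyed by address."""

    def __init__(self, downloader):
        self._downloader = downloader   # callable: url -> bytes
        self._store = {}                # url -> (version, package bytes)

    def get(self, url, version):
        cached = self._store.get(url)
        if cached is not None and cached[0] == version:
            # A package with the same version number already exists locally.
            return cached[1]
        # No local package, or the version numbers differ: download it.
        package = self._downloader(url)
        self._store[url] = (version, package)
        return package


downloads = []


def fake_download(url):
    # Stand-in for a real network fetch; it records every call.
    downloads.append(url)
    return b"package-bytes-%d" % len(downloads)


cache = PackageCache(fake_download)
first = cache.get("https://example.invalid/pkg", "1.0")
second = cache.get("https://example.invalid/pkg", "1.0")   # served locally
third = cache.get("https://example.invalid/pkg", "1.1")    # version changed
```

Only the first and third requests reach the downloader, which is the traffic saving the version number field is meant to allow.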

In some embodiments of the invention, the interaction package is based on a first language. The aforementioned step S13 specifically includes: executing the first-language-based interaction package by dynamic loading and bridging, so as to control local objects of the device side that are based on a second language according to the interaction package, thereby realizing interaction with the user. In particular, this step may be implemented with an engine of the first language that supports dynamic loading and bridging. Generally, the capabilities of the first language are provided by its engine, including the capability to communicate with the local second language, so that the first language can have the capabilities of the second language. In practice, mainstream JavaScript engines have this bridging capability.

Bridging is a way of connecting two objects while avoiding strong coupling between them: the two objects are connected through a bridge and can still change independently. The multi-mode interaction method of the invention realizes object extension by bridging, so that extended attributes are available on the object and local methods can be called directly.

Further, in some embodiments of the present invention, the interaction package includes one or more first-language-based objects as first objects, and instructions for the first objects. Optionally, the instructions for the first objects are also based on the first language. Optionally, the interaction package further comprises resource data. Dynamically loading the interaction package in step S13 specifically includes: controlling, by dynamic loading and bridging, a second object bridged with the first object according to the first object and the instructions for the first object. The second object is a second-language-based object local to the device side. It should be noted that the first object and the second object may each be an original object or an extended object as described above.

In some examples, controlling the local-language-based object in accordance with the JavaScript script and the bridged JavaScript object is implemented by a JavaScript engine. Specifically, the JavaScript engine provides a registration function, and a user of the JavaScript engine registers a binding relationship between the JavaScript object and the local object. The JavaScript engine maintains all binding relationships in a mapping table. The user of the engine here may be the device side or the multi-modal interaction system.
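The registration function and mapping table can be sketched as below. This is an illustrative model only, written in Python rather than as a real JavaScript engine binding; `BridgeRegistry`, `NativeButton`, and the name `okButton` are hypothetical.

```python
class NativeButton:
    """Stand-in for a device-local (second-language) GUI control."""

    def __init__(self):
        self.text = ""
        self.clicks = 0

    def click(self):
        self.clicks += 1


class BridgeRegistry:
    """Mapping table from script-side object names to local objects."""

    def __init__(self):
        self._bindings = {}

    def register(self, script_name, native_object):
        # Record the binding relationship between the two objects.
        self._bindings[script_name] = native_object

    def set_attribute(self, script_name, attribute, value):
        # An attribute write on the script object lands on the local object.
        setattr(self._bindings[script_name], attribute, value)

    def call(self, script_name, method, *args):
        # A method call on the script object runs the local method.
        return getattr(self._bindings[script_name], method)(*args)


registry = BridgeRegistry()
button = NativeButton()
registry.register("okButton", button)

registry.set_attribute("okButton", "text", "OK")
registry.call("okButton", "click")
```

Because the script side only knows the registered name, the local control can be replaced or changed without touching the script, which is the loose coupling that bridging is meant to provide.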

It should be noted that the present invention does not limit the second language, that is, the development language of the local objects at the device side. For example, where the first language is JavaScript, the second language may be C++, Java, or the like. As an optional example, taking the display modality as an example, the aforementioned first-language-based object is a JavaScript object, and the aforementioned second-language-based object local to the device side may be a GUI control, local to the device side, based on C++ or Java.

According to the multi-mode interaction method of the invention, bridging connects different languages and their running environments, so that the interaction content issued by the server interfaces with the local system of the device side. Taking the GUI as an example, the invention enables GUI controls issued by the server to interface with the various local control systems of multiple device sides and to be consistent in style with those local control systems.

In some embodiments of the present invention, before step S11, the multi-modal interaction method of the present invention further includes:

Step S21, an interaction package is developed in advance at a development end and sent to the server. Optionally, the development end may be a development platform such as a personal computer (PC), for example a desktop computer or a notebook computer.

In step S22, the server receives the interaction package uploaded by the development end and stores it. Optionally, the interaction package may be sent to a storage service of the server for storage. Optionally, where the first language is JavaScript, the storage service is specifically a JavaScript script storage service.

In some embodiments of the invention, the second object includes one or more second attributes based on the second language and one or more second methods. The step S21 specifically includes:

Step S31, mapping each second attribute to a corresponding first-language-based attribute as a first attribute, and mapping each second method to a corresponding first-language-based method as a first method, so as to obtain a first-language-based bridging object of the second object, containing the first attribute and the first method, as the first object;

Step S32, packaging the one or more bridged first objects, the script, and the resource data to obtain an interaction package.
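Steps S31 and S32 can be sketched as follows. This is a rough illustration under several assumptions: the bridged object is modelled as a plain description of same-named attributes and methods, the package is a zip archive, and the file names `objects.json`, `main.script`, and `res/` are invented for the example. None of these details are specified by the invention.

```python
import io
import json
import zipfile


def describe_bridged_object(name, native_class):
    """Step S31 sketch: mirror public attributes and methods by name."""
    instance = native_class()
    attributes = sorted(k for k in vars(instance) if not k.startswith("_"))
    methods = sorted(
        k for k, v in vars(native_class).items()
        if callable(v) and not k.startswith("_")
    )
    return {"name": name, "attributes": attributes, "methods": methods}


def build_package(objects, script, resources):
    """Step S32 sketch: pack object data, script and resources together."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("objects.json", json.dumps(objects))
        archive.writestr("main.script", script)
        for path, data in resources.items():
            archive.writestr("res/" + path, data)
    return buffer.getvalue()


class Label:
    """Hypothetical local control developed at the development end."""

    def __init__(self):
        self.text = ""
        self.width = 0

    def show(self):
        pass


objects = [describe_bridged_object("titleLabel", Label)]
package = build_package(
    objects, 'titleLabel.text = "Hi"', {"logo.txt": b"placeholder"}
)

# Reading the archive back shows the three kinds of content it carries.
with zipfile.ZipFile(io.BytesIO(package)) as archive:
    names = sorted(archive.namelist())
```

The resulting bytes are what the development end would upload to the server in step S21 under these assumptions.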

In some embodiments of the present invention, for example, in the case where the second object is a newly added extension object, the second object is developed at the development end, and then the foregoing steps S31 and S32 are performed.

Optionally, the JavaScript interaction package is generated entirely by the development end. The server is only responsible for storing and forwarding it, and does not need to generate the JavaScript interaction package using the server's multi-modality services such as the voice service, video service, and display service.

As a specific embodiment, for the case of the display modality where the first language is JavaScript, the aforementioned second object is a GUI control local to the device side, and the aforementioned first object is a JavaScript object bridged with the GUI control. A GUI control is developed at the development end; the GUI control comprises layout attributes, resource data such as picture resources and text resources, and interaction methods. At the development end, by bridging, each layout attribute is mapped to a JavaScript object attribute with the same name and each interaction method is mapped to a JavaScript object method with the same name. The JavaScript object attributes and methods are then encapsulated into a JavaScript object. Next, one or more bridged control objects, the resource data, and the JavaScript script are packaged and compressed to obtain the JavaScript interaction package. Finally, the JavaScript interaction package is sent to the server and stored there.

According to the multi-mode interaction method of the embodiment of the present invention, the bridged object is loaded and executed in step S13 by a dynamic engine having dynamic loading, dynamic updating, and bridging capabilities, so that the first language also has the capabilities of the second language.

In some embodiments of the invention, as shown in FIG. 3, the interaction engine is used to obtain the interaction package and decompress it to obtain the object data, the JavaScript script, and the resource data. The JavaScript engine is used to parse and execute the JavaScript script; if the script operates on an object of a certain modality, the JavaScript engine calls a method of the corresponding modality engine. The modality engine is the executor of the second object, and the second object is the content that it executes.

In some embodiments of the invention, different modes of interaction may be performed simultaneously.

Specifically, the foregoing step S11 specifically includes one or more of the following:

receiving an interaction request of a first modality input by a user by using a first-language dynamic engine of the device side, and sending the interaction request of the first modality to the server through the interaction engine;

receiving an interaction request of a second modality input by a user by using the interaction SDK of the device side, and sending the interaction request of the second modality to the server.

Optionally, the dynamic engine provides a registration listening mechanism for the developer, and the developer decides which inputs are processed by the dynamic engine. For example, the user input processed by the dynamic engine may be selected based on the scene or based on the interaction modality category.
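A registration listening mechanism of this kind might look like the sketch below. It assumes that inputs are selected by modality category and that anything not registered falls back to the interaction SDK; `InputRouter` and both handler functions are hypothetical.

```python
class InputRouter:
    """Hypothetical registration listening mechanism for user input."""

    def __init__(self, sdk_handler):
        self._sdk_handler = sdk_handler   # default path: the interaction SDK
        self._listeners = {}              # modality -> dynamic-engine handler

    def register(self, modality, handler):
        # The developer opts a modality in to dynamic-engine processing.
        self._listeners[modality] = handler

    def dispatch(self, modality, payload):
        # Registered modalities go to their listener, all others to the SDK.
        handler = self._listeners.get(modality, self._sdk_handler)
        return handler(modality, payload)


def via_dynamic_engine(modality, payload):
    return ("dynamic_engine", modality, payload)


def via_interaction_sdk(modality, payload):
    return ("interaction_sdk", modality, payload)


router = InputRouter(via_interaction_sdk)
router.register("touch", via_dynamic_engine)

touch_route = router.dispatch("touch", {"x": 10, "y": 20})
voice_route = router.dispatch("voice", "what is the weather")
```

Selecting by scene instead of by modality would only change the key used in the listener table.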

The step of generating one or more interaction commands by the server in step S12 according to the interaction request specifically includes:

the server judges according to the received interaction request;

if the interaction request of the first mode is received, generating an interaction command which is used for the equipment end to carry out interaction output through a dynamic engine and comprises an interaction package or information of the interaction package as a first interaction command; optionally, generating the first interaction command according to the interaction request of the first modality and by requesting the aforementioned storage service;

if the interaction request of the second modality is received, generating a second interaction command, wherein the second interaction command comprises interaction data of one or more modalities and/or addresses of the interaction data, for the device side to perform interaction output through the interaction SDK, and instructions of one or more modalities; optionally, one or more types of interaction data such as a voice stream, a video stream, an image, or a display layout are obtained according to the interaction request of the second modality by requesting, through the skill service, one or more services such as voice, video, and display, so as to obtain the second interaction command. Optionally, the second interaction command is an interaction command based on a fixed protocol.
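The server-side decision in step S12 can be sketched as below. The sketch assumes that each request carries a `source` field saying whether it came through the dynamic engine or the interaction SDK; that field, `PACKAGE_STORE`, the addresses, and the returned dictionary layout are all hypothetical.

```python
# Hypothetical storage service: modality -> address of an interaction package.
PACKAGE_STORE = {"display": "https://example.invalid/packages/weather-card"}


def generate_command(request):
    """Return a first or second interaction command for one request."""
    modality = request["modality"]
    if request["source"] == "dynamic_engine":
        # First modality: reply with information about an interaction
        # package, to be executed by the device's dynamic engine.
        return {
            "kind": "first",
            "modality": modality,
            "url": PACKAGE_STORE[modality],
        }
    # Second modality: reply with interaction data and modality
    # instructions, to be executed through the interaction SDK.
    return {
        "kind": "second",
        "modality": modality,
        "data": {"text": "It is sunny today"},
        "instructions": ["play"],
    }


first_command = generate_command(
    {"source": "dynamic_engine", "modality": "display"}
)
second_command = generate_command(
    {"source": "interaction_sdk", "modality": "voice"}
)
```

In a fuller system the second branch would call the skill service and the voice, video, or display services instead of returning a fixed text.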

Further, the step S13 further includes:

the equipment end judges the received interaction command, including judging whether the received interaction command is a first interaction command or a second interaction command, or judging whether the received interaction command has a part related to a first mode or a part related to a second mode; alternatively, the determination may be made using an interaction engine at the device side;

if the received interaction command is a first interaction command, or has a part relating to the first modality, downloading the interaction package locally according to its address and dynamically loading it with the dynamic engine to realize interaction with the user, which specifically comprises: obtaining and decompressing the interaction package with the interaction engine of the device side, dynamically loading and parsing the interaction package with the dynamic engine of the device side, and calling, by the dynamic engine, one or more modality engines in the device side to operate objects of the corresponding modality;

if the received interaction command is a second interaction command or part of the received interaction command with respect to a second modality, the second interaction command or the part with respect to the second modality is distributed to an interaction SDK at the device side, and one or more modality engines such as a voice engine, a video engine, a GUI engine (graphical user interface engine, which may also be referred to as a display engine) in the system at the device side are called through the interaction SDK to execute the second interaction command to implement interaction with the user.
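The device-side judgment can be sketched as a simple dispatcher. The sketch assumes a hypothetical `is_package` flag to tell the two kinds of command parts apart; the invention does not say how the distinction is encoded, and `DeviceSide` and its methods are invented for the example.

```python
class DeviceSide:
    """Hypothetical device that routes each command part to one executor."""

    def __init__(self):
        self.executed = []

    def _run_dynamic_engine(self, modality, content):
        # A real device would download, decompress and load the package.
        self.executed.append(("dynamic_engine", modality, content["url"]))

    def _run_interaction_sdk(self, modality, content):
        # A real device would call the speech, video or GUI engine.
        self.executed.append(("interaction_sdk", modality, content["text"]))

    def handle(self, command):
        for modality, content in command.items():
            if content.get("is_package"):
                self._run_dynamic_engine(modality, content)
            else:
                self._run_interaction_sdk(modality, content)


device = DeviceSide()
device.handle({
    "display": {"is_package": True,
                "url": "https://example.invalid/gui-package"},
    "voice": {"text": "It is sunny today"},
})
```

A single received command may therefore trigger both execution paths, one part per modality.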

In some embodiments of the present invention, before the step of sending the interaction command to the device side in step S12, the method further includes: the server integrates one or more interaction commands into an interaction command in a unified format. Optionally, one or more interaction commands relating to one or more modalities, or the first interaction command and the second interaction command, are integrated into an interaction command in a unified format. Sending the interaction command to the device in step S12 specifically includes: sending the integrated interaction command in the unified format to the device side. As an alternative embodiment, the unified format is the JavaScript Object Notation (JSON) format. Specifically, the interaction command in JSON format takes the form of a key-value pair, including a key field for recording the modality type of the interaction command and a value field for recording the content of the interaction command. Further, the value field includes one or more of the following, recorded as key-value pairs: a version number, an address of the interaction package, interaction data, and an address of the interaction data.

Note that JSON is a widely recognized standard data format. Most programming languages, such as C++, Java, and Lua, support JSON, and many JSON parsing libraries are available (C++ and Java in particular have many popular ones), which facilitates the implementation of the interaction engine.

As a specific example, the present invention proposes an interaction command in JSON standard format, as follows:

{
    "type": data,
    ……
}

Here, "type": data is a key-value pair representing one interaction command. type is the key field and records the modality type of the interaction command (the name of the interaction modality), such as a voice modality, a video modality, or a display modality. Optionally, type is a string. data is the value field and records the data required for the interaction. Optionally, data is itself in JSON format and includes one or more key-value pairs, each of which records the type and specific value of one data item.

For example, one format of an interactive command regarding a display modality is:

wherein the display field is the key field of this command, indicating that a display command is being processed. The contents of the brackets "[ ]" are the value field of this command. The value field in turn includes a plurality of key-value pairs: the version key records the version number; the width key records the display width and the height key records the display height, which facilitates adapting to the screen layout of the device-side display system; and the url key records the download address of the GUI interaction package.

For another example, one format of an interactive command for a voice modality is:

wherein the voice field indicates that a voice command is being processed; the version key records the version number; the text key records the voice text; and the url key records the download address of the voice file. Note that the voice file is different from the voice text: the voice text is text recording the content of the voice stream, whereas the voice file is the voice stream corresponding to the voice text.

Note that in the foregoing example, the interactive command regarding the display modality is the aforementioned first interactive command in which the download address of the interactive package is recorded; the interactive command regarding the voice modality is the aforementioned second interactive command, in which specific contents of the interactive data or an address of the interactive data are recorded.
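A unified command carrying both a display part and a voice part might be built and parsed as in the sketch below. The key names (display, voice, version, width, height, text, url) follow the description above; every value is hypothetical, and representing each value field as a JSON object is an assumption.

```python
import json

# All values are made up for illustration.
display_command = {
    "display": {
        "version": "1.0",
        "width": 1280,
        "height": 720,
        "url": "https://example.invalid/gui-package",
    }
}
voice_command = {
    "voice": {
        "version": "1.0",
        "text": "It is sunny today",
        "url": "https://example.invalid/voice-file",
    }
}

# Server side: integrate both commands into one unified JSON command.
unified = {**display_command, **voice_command}
wire_text = json.dumps(unified)

# Device side: parse the text and list the modalities it contains.
received = json.loads(wire_text)
modalities = sorted(received)
```

Adding a new modality only means adding another top-level key, and adding new data only means adding another key-value pair inside a value field.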

According to the multi-mode interaction method of the invention, integrating the interaction command into the JSON format makes it convenient both to add new modalities and to extend the data required for interaction.

It should be noted that integrating the interaction command into the JSON format is only a preferred embodiment of the present invention; the present invention does not necessarily include the step of integrating the interaction command, nor does the integrated format have to be the JSON format. For example, the command may also be implemented with fixed command values: a fixed integer indicates which function the device is to perform, such as 1 for playing music, 2 for lighting, etc.
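The fixed command value alternative can be sketched with an integer enumeration. The two values 1 and 2 come from the example above; the names `FixedCommand` and `describe`, and the handling of unknown values, are assumptions.

```python
from enum import IntEnum


class FixedCommand(IntEnum):
    PLAY_MUSIC = 1   # value taken from the text
    LIGHTING = 2     # value taken from the text


def describe(value):
    """Translate a fixed integer into the function the device performs."""
    try:
        return FixedCommand(value).name
    except ValueError:
        # Integers outside the fixed table cannot be interpreted.
        return "UNKNOWN"


known = describe(1)
unknown = describe(99)
```

Unlike the JSON form, every new function here requires a new agreed integer on both the server and the device, which is why the fixed form is harder to extend.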

It should be noted that the present invention does not limit the types of the device side and the server side. Optionally, the server side is a server. The device side is a client device, including products with multi-mode interaction such as smart phones, smart speakers, set-top boxes, smart glasses, smart watches, and mobile computers.

Optionally, the multi-mode interaction method provided by the invention can be implemented by software such as an application (APP). The device side and the server side are each provided with software implementing the method, so that the steps shown in the previous embodiments can be carried out by the device side and the server side.

In some embodiments of the invention, interaction of GUI modalities may be implemented using web means. Specifically, the invention implements interaction by embedding webview in the interaction SDK. Wherein the webview can be regarded as a page browser.

Note that web pages are typically written in HTML and CSS, and a page browser is an engine that displays web pages. HTML supports embedding JavaScript scripts with <script> tags. JavaScript scripts run in the browser sandbox and can access browser-provided capabilities, such as operating on the window, by manipulating JavaScript objects built into the browser.

Fig. 4 is a schematic block diagram of a client device 100 according to an embodiment of the present invention. The client device 100 may be a product with multi-modal interaction, such as a smart phone, a smart speaker, a set-top box, smart glasses, a smart watch, a mobile computer, and the like. Fig. 5 is a schematic block diagram of a server 200 according to an embodiment of the present invention.

Referring to fig. 4 and 5, in some embodiments, the client device 100 according to the present invention mainly includes a user input receiving module 110 and an interactive command executing module 120.

The user input receiving module 110 is configured to receive an interaction request of one or more modalities input by a user, and send the interaction request to the server 200.

The interactive command execution module 120 is configured to receive and execute interactive commands issued by the server 200. Wherein at least some of the interactive commands include an interactive package or information of an interactive package.

Further, the interactive command execution module 120 includes a dynamic loading unit (not shown in the figure). The dynamic loading unit is used for dynamically loading the interaction package to realize interaction with a user.

In some embodiments of the invention, the interaction package includes object data and a script. Further, the interaction package also includes resource data. The object data includes data for adding an extended object to the client device 100 and/or data for updating an original object in the client device 100. The script includes instructions for the extended object and/or instructions for the original object. Optionally, the script also includes instructions for objects built into the dynamic engine, for example generic logic such as pure mathematical calculations. The aforementioned dynamic loading unit is specifically configured to: control objects of one or more modalities according to the interaction package, which specifically comprises one or more of updating the original object, adding the extended object, operating the original object, operating the updated original object, or operating the extended object. Optionally, an object is operated by invoking the modality engine in the client device 100 that corresponds to the object.

In some embodiments of the present invention, the information of the interaction package includes an address of the interaction package. The interaction command execution module 120 further includes an interaction package acquisition unit, configured to acquire the interaction package according to its address.

In some embodiments of the invention, at least some of the interaction commands include a version number. The client device 100 further includes a version number determining unit, configured to: search, according to the version number, whether an interaction package with the same version number exists in the client device 100, and, if no such package exists or the version numbers differ, notify the interaction package acquisition unit to acquire the interaction package according to its address.

In some embodiments of the invention, the interaction package is based on a first language. The aforementioned dynamic loading unit is specifically configured to: the interaction package is executed by using a dynamic engine in a dynamic loading manner and in a bridging manner, so as to control the local object based on the second language of the client device 100 according to the interaction package. Wherein the dynamic engine is an engine of a first language that supports dynamic loading and bridging.

Further, in some embodiments of the present invention, the interaction package includes one or more first-language-based objects as first objects, and instructions for the first objects. Optionally, the instructions for the first objects are also based on the first language. Optionally, the interaction package further comprises resource data. The aforementioned dynamic loading unit is specifically configured to: control, by dynamic loading and bridging, a second object bridged with the first object according to the first object and the instructions for the first object. The second object is a second-language-based object local to the client device 100.

In some embodiments of the present invention, the user input receiving module 110 includes a first modality input receiving unit and a second modality input receiving unit. The first modality input receiving unit is configured to: the dynamic engine is utilized to receive an interaction request of a first mode input by a user, and the interaction request of the first mode is sent to the server 200 through the interaction engine, so that the server 200 generates the interaction command including the interaction package or the information of the interaction package according to the interaction request of the first mode as a first interaction command. The second modality input receiving unit is configured to: and receiving an interaction request of a second mode input by a user by utilizing the interaction SDK, and sending the interaction request of the second mode to the server 200 so that the server 200 can generate a second interaction command according to the interaction request of the second mode. The second interaction command includes interaction data of one or more modes and/or addresses of the interaction data for the interaction SDK of the user equipment 100 to perform interaction output, and instructions of one or more modes. The interactive command execution module 120 further includes an interactive command receiving unit for: receiving the interaction command issued by the server 200 includes receiving the first interaction command or the second interaction command described above.

In some embodiments of the present invention, the interactive command execution module 120 further includes a judgment unit and an interactive SDK execution unit.

The judging unit is used for: judging the received interaction command, which specifically comprises judging whether the received interaction command is a first interaction command or a second interaction command, or judging whether the received interaction command has a part relating to the first modality or a part relating to the second modality; if the received interaction command is a first interaction command or has a part relating to the first modality, notifying the interaction package acquisition unit and/or the dynamic loading unit to process it; if the received interaction command is a second interaction command or has a part relating to the second modality, notifying the interaction SDK execution unit to process it.

In some embodiments of the present invention, the interaction package acquisition unit is specifically configured to: obtain and decompress the interaction package by means of the interaction engine of the client device 100. The dynamic loading unit is specifically configured to: dynamically load and parse the interaction package by means of the dynamic engine of the client device 100, and invoke, by the dynamic engine, one or more modality engines in the client device 100 to operate objects of the corresponding modality.

The interactive SDK execution unit is used for: the second interaction command or a portion related to the second modality is distributed to the interaction SDK of the user side device 100, and the second interaction command is executed by the interaction SDK invoking one or more modality engines in the user side device 100 to achieve interaction with the user.

In some embodiments of the present invention, the interaction command execution module 120 is specifically configured to: receive, interpret, and execute the interaction command in the JavaScript Object Notation (JSON) format.

As an alternative embodiment, the interaction command in the JavaScript Object Notation (JSON) format includes: a key field for recording the modality type of the interaction command and a value field for recording the content of the interaction command. The value field includes one or more of the following, recorded as key-value pairs: a version number, an address of the interaction package, interaction data, and an address of the interaction data.

Referring to fig. 4 and 5, in some embodiments, the server 200 of the present invention mainly includes an interactive request receiving module 210, an interactive command generating module 220, and an interactive command transmitting module 230.

The interactive request receiving module 210 is configured to receive interactive requests of one or more modalities sent by one or more client devices 100.

The interactive command generating module 220 is configured to generate one or more interactive commands according to the interactive request. Wherein at least some of the interactive commands include an interactive package or information of an interactive package. Optionally, the information of the interaction package includes a download address of the interaction package.

The interactive command sending module 230 is configured to send an interactive command to the user equipment 100.

In some embodiments of the invention, the interaction package includes object data and a script. Further, the interaction package also includes resource data. The object data includes data for adding an extended object to the client device 100 and/or data for updating an original object in the client device 100. The script includes instructions for the extended object, and/or instructions for the original object. Optionally, the script also includes generic logic such as pure mathematical calculations for object instructions built into the dynamic engine.
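A minimal sketch of how such a package could be applied on the device is given below. The `InteractionPackage` layout, the dictionary-based object registry, and the `(object, attribute, value)` instruction form are all hypothetical simplifications, not the format used by the invention.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class InteractionPackage:
    # Object data: extended objects to add, and updates to original objects.
    extended_objects: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    original_updates: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # Script: (object name, attribute, value) instructions, applied in order.
    script: List[Tuple[str, str, Any]] = field(default_factory=list)
    # Optional resource data such as images or audio.
    resources: Dict[str, bytes] = field(default_factory=dict)


def load_package(objects: Dict[str, Dict[str, Any]], package: InteractionPackage) -> None:
    """Update original objects, add extended objects, then run the script."""
    for name, attrs in package.original_updates.items():
        objects[name].update(attrs)
    for name, attrs in package.extended_objects.items():
        objects[name] = dict(attrs)
    for name, attr, value in package.script:
        objects[name][attr] = value


device_objects = {"lamp": {"on": False, "color": "white"}}
package = InteractionPackage(
    extended_objects={"fan": {"speed": 0}},
    original_updates={"lamp": {"color": "warm"}},
    script=[("lamp", "on", True), ("fan", "speed", 2)],
)
load_package(device_objects, package)
```

Here the script can address both the original object (`lamp`) and the extended object (`fan`) because the object data is applied before the script runs.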

In some embodiments of the invention, at least some of the interactive commands include a version number.
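One use of the version number, consistent with the device-side check of whether an interaction package with the same version number already exists, is to skip a download. The sketch below assumes a cache keyed by version number alone and a caller-supplied `download` function; both are hypothetical.

```python
from typing import Callable, Dict, List


def obtain_package(cache: Dict[str, bytes], version: str, address: str,
                   download: Callable[[str], bytes]) -> bytes:
    """Return the cached package for this version, downloading only on a miss."""
    if version not in cache:
        cache[version] = download(address)
    return cache[version]


# Stand-in downloader that records each address it is asked to fetch.
download_calls: List[str] = []


def fake_download(address: str) -> bytes:
    download_calls.append(address)
    return b"package-bytes"


cache: Dict[str, bytes] = {}
first = obtain_package(cache, "1.0.0", "https://example.invalid/pkg.zip", fake_download)
second = obtain_package(cache, "1.0.0", "https://example.invalid/pkg.zip", fake_download)
```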

In some embodiments of the invention, the server 200 further includes an interaction package storage module. The interaction package storage module, also called a storage service, is used for receiving the interaction package uploaded by the development end and storing the interaction package.

In some embodiments of the invention, the interaction package is based on a first language and includes one or more first language-based objects as first objects, together with instructions for the first objects. Optionally, the instructions for the first objects are also based on the first language. The first object is a bridging object of a second object, and the second object is a second language-based object local to the client device 100.
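The bridging relationship can be sketched as a thin proxy. In practice the first and second languages would differ (for example a scripting language on top of native code); here both sides are written in Python purely for illustration, and `NativeLamp`, `BridgeObject`, and the mapped names are hypothetical.

```python
from typing import Any, Dict


class NativeLamp:
    """Stands in for a second-language object local to the client device."""

    def __init__(self) -> None:
        self.brightness = 0

    def set_brightness(self, level: int) -> None:
        # Clamp to the range the hardware is assumed to accept.
        self.brightness = max(0, min(100, level))


class BridgeObject:
    """First object: exposes mapped names and forwards them to the native object."""

    def __init__(self, native: Any, attr_map: Dict[str, str],
                 method_map: Dict[str, str]) -> None:
        self._native = native
        self._attr_map = attr_map      # first attribute name -> second attribute name
        self._method_map = method_map  # first method name -> second method name

    def get(self, name: str) -> Any:
        return getattr(self._native, self._attr_map[name])

    def call(self, name: str, *args: Any) -> Any:
        return getattr(self._native, self._method_map[name])(*args)


lamp = BridgeObject(
    NativeLamp(),
    attr_map={"brightness": "brightness"},
    method_map={"setBrightness": "set_brightness"},
)
lamp.call("setBrightness", 150)  # an instruction for the first object
```

An instruction aimed at the first object therefore ends up controlling the second object without the package needing any second-language code.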

In some embodiments of the present invention, the interactive request receiving module 210 is specifically configured to receive an interaction request of a first modality or an interaction request of a second modality issued by the client device 100. The interaction request of the first modality is a user-input interaction request received through the dynamic engine of the client device 100, and the interaction request of the second modality is a user-input interaction request received through the interaction SDK of the client device 100.

The interactive command generation module 220 includes one or more of a first interaction command generating unit and a second interaction command generating unit. The first interaction command generating unit is used for generating, according to the interaction request of the first modality, the interaction command including the interaction package or the information of the interaction package as a first interaction command. The second interaction command generating unit is used for generating a second interaction command according to the interaction request of the second modality. The second interaction command includes interaction data of one or more modalities and/or addresses of the interaction data, for the interaction SDK of the client device 100 to perform interaction output, together with instructions of one or more modalities.
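A minimal sketch of this branching on the server is shown below. The request fields (`modality`, `intent`), the `PACKAGE_INDEX` lookup table, and the shape of the returned commands are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical index of stored packages: intent -> (version number, address).
PACKAGE_INDEX: Dict[str, Tuple[str, str]] = {
    "weather": ("1.0.0", "https://example.invalid/packages/weather.zip"),
}


def generate_command(request: dict) -> dict:
    """Produce a first or second interaction command depending on the request."""
    if request["modality"] == "first":
        # First interaction command: carries information of the interaction package.
        version, address = PACKAGE_INDEX[request["intent"]]
        return {"kind": "first", "version": version, "package_address": address}
    if request["modality"] == "second":
        # Second interaction command: carries interaction data plus instructions.
        return {
            "kind": "second",
            "data": {"voice": "Playing " + request["intent"]},
            "instructions": {"voice": "speak"},
        }
    raise ValueError("unknown request modality: " + repr(request["modality"]))


first_command = generate_command({"modality": "first", "intent": "weather"})
second_command = generate_command({"modality": "second", "intent": "music"})
```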

In some embodiments of the invention, the server 200 further includes a unified format module. The unified format module is used for integrating one or more interaction commands into a single interaction command in a unified format. Optionally, one or more interaction commands relating to one or more modalities, or the first interaction command and the second interaction command, are integrated into one interaction command in the unified format. As an alternative embodiment, the unified format is JavaScript Object Notation (JSON) format. The interactive command sending module 230 is specifically configured to send the interaction command in the unified format to the client device 100.
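The integration step can be sketched as merging per-modality commands into one JSON object keyed by modality type. The `modality` and `content` field names and the rule of rejecting duplicate modalities are assumptions of this sketch, not requirements stated in the description.

```python
import json
from typing import Dict, List


def unify(commands: List[Dict]) -> str:
    """Merge per-modality commands into a single JSON-format command."""
    unified: Dict[str, Dict] = {}
    for command in commands:
        modality = command["modality"]
        if modality in unified:
            raise ValueError("duplicate modality: " + modality)
        unified[modality] = command["content"]
    # sort_keys gives a stable serialization, which is convenient for testing.
    return json.dumps(unified, sort_keys=True)


unified_text = unify([
    {"modality": "dynamic",
     "content": {"version": "1.0.0",
                 "package_address": "https://example.invalid/packages/weather.zip"}},
    {"modality": "voice", "content": {"data": "It is sunny today."}},
])
```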

FIG. 6 is a schematic block diagram of one embodiment of a multi-modal interaction system 300 of the present invention. Referring to FIG. 6, an exemplary multi-modal interaction system 300 of the present invention mainly includes at least one device side and at least one server, and implements the multi-modal interaction method of any of the foregoing embodiments.

The present invention is not limited to the above-mentioned embodiments. Those skilled in the art can make any simple modification, equivalent change, or adaptation to the above embodiments according to the technical substance of the present invention without departing from the scope of the present invention.

Claims

1. A method of multimodal interaction, the method comprising the steps of:

one or more device sides receive interaction requests of one or more modalities input by a user and send the interaction requests to a server;

the server receives the interaction request sent by the device side, generates one or more interaction commands according to the interaction request, and sends the interaction commands to the device side; at least some of the interaction commands include an interaction package or information of the interaction package;

the device side receives and executes the interaction command, including dynamically loading the interaction package to realize interaction with a user;

wherein the interaction package includes one or more first language-based objects as first objects, and instructions for the first objects;

the dynamically loading the interaction package to enable interaction with a user includes: controlling, by means of dynamic loading and bridging, a second object bridged with the first object according to the first object and the instructions for the first object; wherein the second object is a second language-based object local to the device side.

2. The multi-modal interaction method of claim 1, further comprising, prior to the step of the one or more device sides receiving interaction requests of one or more modalities input by a user:

developing the interaction package in advance at a development end, and sending the interaction package to the server;

the server receiving the interaction package uploaded by the development end and storing the interaction package.

3. The multi-modal interaction method of claim 2 wherein:

the second object includes one or more second attributes based on the second language and one or more second methods;

the step of developing the interaction package in advance at the development end comprises:

in a bridging manner, mapping the second attribute to a first language-based attribute as a first attribute, and mapping the second method to a first language-based method as a first method, so as to obtain, as the first object, a first language-based bridging object comprising the first attribute and the first method;

packaging one or more of the first object, script, and resource data to obtain the interaction package.

4. The multi-modal interaction method of claim 1 wherein:

the interaction package comprises object data and a script, wherein the object data comprises data for adding an extended object to the device side and/or data for updating an original object in the device side, and the script comprises instructions for the extended object and/or instructions for the original object;

the dynamically loading the interaction package to enable interaction with a user includes: controlling objects of one or more modalities according to the interaction package, which specifically comprises one or more of updating the original object, adding the extended object, operating the original object, operating the updated original object, or operating the extended object according to the interaction package.

5. The multi-modal interaction method of claim 1 wherein:

the information of the interaction package comprises an address of the interaction package;

the device side receiving and executing the interaction command further comprises: before the interaction package is dynamically loaded, acquiring the interaction package according to the address of the interaction package.

6. The multi-modal interaction method as set forth in claim 5 wherein:

at least some of the interaction commands include a version number;

before the step of acquiring the interaction package according to the address of the interaction package, the method further comprises: searching, according to the version number, whether an interaction package with the same version number exists in the device side, and if not, acquiring the interaction package according to the address of the interaction package.

7. The multi-modal interaction method as claimed in claim 1, wherein,

the one or more device sides receiving interaction requests of one or more modalities input by a user and sending the interaction requests to a server comprises:

receiving an interaction request of a first modality input by a user by using a dynamic engine of the device side, and sending the interaction request of the first modality to the server through an interaction engine,

or receiving an interaction request of a second modality input by a user by using an interaction SDK of the device side, and sending the interaction request of the second modality to the server;

the generating one or more interaction commands according to the interaction request comprises:

the server judges according to the received interaction request,

if the interaction request is the interaction request of the first modality, generating the interaction command comprising the interaction package or the information of the interaction package as a first interaction command,

and if the interaction request is the interaction request of the second modality, generating a second interaction command containing interaction data of one or more modalities and/or addresses of the interaction data, for the interaction SDK of the device side to perform interaction output, together with instructions of one or more modalities.

8. The multi-modal interaction method of claim 7, wherein the device side receiving and executing the interaction command further comprises:

judging the received interaction command;

if the interaction command is the first interaction command, or a part of the interaction command relates to the first modality, the dynamically loading the interaction package specifically includes: acquiring and decompressing the interaction package by using the interaction engine, dynamically loading and parsing the interaction package by using the dynamic engine, and calling, by the dynamic engine, one or more modality engines in the device side to operate objects of the corresponding modality;

and if the interaction command is the second interaction command, or a part of the interaction command relates to the second modality, distributing the second interaction command or the part relating to the second modality to the interaction SDK, and calling one or more modality engines in the device side through the interaction SDK to execute the second interaction command so as to realize interaction with a user.

9. The method of multi-modal interaction as claimed in any of claims 1 to 8 wherein,

before the step of sending the interaction command to the device side, the method further comprises: integrating one or more interaction commands into an interaction command in JavaScript Object Notation (JSON) format;

the sending the interaction command to the device side comprises: sending the interaction command in JSON format to the device side.

10. The multi-modal interaction method according to claim 9, wherein the interaction command in JavaScript Object Notation (JSON) format comprises: a key field for recording a modality type of the interaction command and a value field for recording contents of the interaction command; wherein the value field includes one or more of a version number recorded in the form of a key-value pair, an address of the interaction package, interaction data, and/or an address of the interaction data.

11. A client device, the client device comprising:

the user input receiving module is used for receiving an interaction request of one or more modes input by a user and sending the interaction request to the server;

the interactive command execution module is used for receiving and executing the interactive command sent by the server; wherein at least some of the interactive commands include an interactive package or information of the interactive package;

the interactive command execution module comprises a dynamic loading unit, wherein the dynamic loading unit is used for dynamically loading the interaction package so as to realize interaction with a user;

the dynamic loading unit is specifically configured to: control, by means of dynamic loading and bridging, a second object bridged with the first object according to the first object and the instructions for the first object; wherein the second object is a second language-based object local to the client device.

12. The client device of claim 11, wherein:

the interaction package comprises object data and a script, wherein the object data comprises data for adding an extended object to the user side equipment and/or data for updating an original object in the user side equipment, and the script comprises instructions for the extended object and/or instructions for the original object;

the dynamic loading unit is specifically configured to: control objects of one or more modalities according to the interaction package, which specifically comprises one or more of updating the original object, adding the extended object, operating the original object, operating the updated original object, or operating the extended object according to the interaction package.

13. The client device of claim 11, wherein:

the interactive command execution module further comprises an interaction package acquisition unit, which is used for acquiring the interaction package according to the address of the interaction package.

14. The client device of claim 13, wherein:

at least some of the interactive commands include a version number;

the client device further comprises a version number judging unit, which is used for searching, according to the version number, whether an interaction package with the same version number exists in the client device, and if not, notifying the interaction package acquisition unit to acquire the interaction package according to the address of the interaction package.

15. The client device of claim 11, wherein,

the user input receiving module comprises a first-modality input receiving unit and a second-modality input receiving unit;

the first-modality input receiving unit is used for receiving an interaction request of a first modality input by a user by utilizing a dynamic engine, and sending the interaction request of the first modality to the server through the interaction engine so that the server can generate the interaction command comprising the interaction package or the information of the interaction package as a first interaction command according to the interaction request of the first modality;

the second-modality input receiving unit is configured to receive an interaction request of a second modality input by a user by using an interaction SDK, and send the interaction request of the second modality to the server, so that the server generates, according to the interaction request of the second modality, a second interaction command including interaction data of one or more modalities and/or addresses of the interaction data, for the interaction SDK of the client device to perform interaction output, together with instructions of one or more modalities;

the interactive command execution module further comprises an interactive command receiving unit, which is used for receiving the interactive command sent by the server, and the interactive command receiving unit is used for receiving the first interactive command or the second interactive command.

16. The client device of claim 15, wherein,

the interactive command execution module further includes:

the judging unit is used for judging the received interaction command, notifying the dynamic loading unit to process it if the interaction command is the first interaction command or a part of the interaction command relates to the first modality, and notifying an interaction SDK execution unit to process it if the interaction command is the second interaction command or a part of the interaction command relates to the second modality;

the dynamic loading unit is specifically configured to: acquire and decompress the interaction package by using the interaction engine, dynamically load and parse the interaction package by using the dynamic engine, and call, by the dynamic engine, one or more modality engines in the client device to operate objects of the corresponding modality;

the interaction SDK execution unit is used for: distributing the second interaction command or the part relating to the second modality to the interaction SDK of the client device, and calling one or more modality engines in the client device through the interaction SDK to execute the second interaction command so as to realize interaction with a user.

17. The client device according to any one of claims 11 to 16, wherein:

the interactive command execution module is specifically configured to: receive and execute an interaction command in JavaScript Object Notation (JSON) format;

the interaction command in JSON format comprises a key field for recording the modality type of the interaction command and a value field for recording the content of the interaction command; the value field includes one or more of a version number recorded in the form of a key-value pair, an address of the interaction package, interaction data, and/or an address of the interaction data.

18. A server, the server comprising:

the interaction request receiving module is used for receiving interaction requests of one or more modalities sent by one or more client devices;

the interactive command generation module is used for generating one or more interactive commands according to the interactive request;

the interactive command sending module is used for sending the interactive command to the user terminal equipment;

wherein at least some of the interactive commands include an interactive package or information of the interactive package;

wherein the interaction package includes one or more first language-based objects as first objects, and instructions for the first objects; wherein the first object is a bridging object of a second object, and the second object is a second language-based object local to the client device.

19. The server of claim 18, further comprising an interaction package storage module configured to receive the interaction package uploaded by a development terminal and store the interaction package.

20. The server according to claim 18, wherein: the interaction package comprises object data and a script, wherein the object data comprises data for adding an extended object to the user side equipment and/or data for updating an original object in the user side equipment, and the script comprises instructions for the extended object and/or instructions for the original object.

21. The server according to claim 18, wherein:

the interaction request receiving module is specifically configured to: receiving an interaction request of a first mode or an interaction request of a second mode sent by the user side equipment; wherein the first modality interaction request is received through a dynamic engine of the client device, and the second modality interaction request is received through an interaction SDK of the client device;

the interactive command generating module comprises one or more of a first interactive command generating unit and a second interactive command generating unit;

the first interaction command generating unit is used for generating, according to the interaction request of the first modality, the interaction command comprising the interaction package or the information of the interaction package as a first interaction command;

the second interaction command generating unit is configured to generate, according to the interaction request of the second modality, a second interaction command including interaction data of one or more modalities and/or addresses of the interaction data, for the interaction SDK of the client device to perform interaction output, together with instructions of one or more modalities.

22. The server according to any one of claims 18 to 21, characterized in that:

the server further comprises a unified format module, wherein the unified format module is used for integrating one or more interaction commands into an interaction command in JavaScript Object Notation (JSON) format;

the interactive command sending module is specifically configured to send the interaction command in JSON format to the client device.

23. A multi-modal interaction system, characterized in that the system comprises at least one device side and at least one server, implementing the method according to any of claims 1 to 10.