CN111009245B

CN111009245B - Instruction execution method, system and storage medium

Info

Publication number: CN111009245B
Application number: CN201911309677.2A
Authority: CN
Inventors: 杨治银; 王爱飞
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-12-18
Filing date: 2019-12-18
Publication date: 2021-09-14
Anticipated expiration: 2039-12-18
Also published as: CN111009245A

Abstract

The application discloses an instruction execution method, a system, and a storage medium. The method comprises: receiving a voice information analysis request sent by a first terminal; obtaining a corpus information set from a second server according to the request; acquiring a target instruction corresponding to any one piece of corpus information in the corpus information set; sending the target instruction to the first terminal; receiving a target instruction execution state message sent by the first terminal, which serves as the current notification message; when residual corpus information exists in the corpus information set, determining a residual instruction matched with the residual corpus information based on the current notification message; sending the residual instruction to the first terminal; receiving a residual instruction execution state message which is sent by the first terminal and is used as the current notification message again; and, while residual corpus information remains in the corpus information set, repeating the step of determining a residual instruction matched with the residual corpus information based on the current notification message. With this technical scheme, the user can use a plurality of skill services through a single voice input, which improves the efficiency with which the user obtains information.

Description

Instruction execution method, system and storage medium

Technical Field

The present application relates to the field of internet communications technologies, and in particular, to a method, a system, and a storage medium for executing instructions.

Background

At present, intelligent voice devices generally provide services to users in the form of skill services, such as checking the weather, listening to news, music, or radio, and tracking express deliveries. In many scenarios a user needs several skill services. In current implementations the skill services are independent of one another, so a user who needs several skills must wait for one skill service to complete before requesting the next. For example, on getting up in the morning the user says "jingle, good morning" to the smart speaker; in this scenario the user may also want to know today's weather, today's latest news, and so on. The user then has to say "jingle, what is the weather like today" to the speaker and wait for the weather skill to return and end, and only then say "jingle, play today's news" to trigger the news skill service. Therefore, the user cannot use a plurality of skill services seamlessly with one voice input, and the user cannot customize the skills according to personal preferences.

Therefore, it is necessary to provide an instruction execution method, system and storage medium, which enable a user to use multiple skill services seamlessly through one voice, and improve the efficiency of obtaining information for the user.

Disclosure of Invention

The application provides an instruction execution method, an instruction execution system and a storage medium, which can enable a user to use a plurality of skill services seamlessly through one-time voice, and improve the efficiency of obtaining information by the user.

In one aspect, the present application provides a method for executing an instruction, the method comprising:

receiving a voice information analysis request sent by a first terminal in response to voice information; the voice information analysis request carries the voice information and identification information of the first terminal;

sending a corpus information acquisition request to a second server according to the voice information analysis request, wherein the corpus information acquisition request carries the identification information of the first terminal and the text information corresponding to the voice information;

receiving a corpus information set which is sent by the second server, is determined based on the corpus information acquisition request and is matched with the identification information of the first terminal and the text information; the corpus information set comprises at least two corpus information;

acquiring a target instruction corresponding to any one of the corpus information in the corpus information set;

sending the target instruction to the first terminal so as to enable the first terminal to execute the target instruction;

receiving a target instruction execution state message which is sent by the first terminal after the target instruction is executed and is used as a current notification message;

when residual corpus information exists in the corpus information set, determining a residual instruction matched with the residual corpus information based on the current notification message; the residual corpus information is corpus information except the currently matched corpus information in the corpus information set;

sending the residual instruction to the first terminal so as to enable the first terminal to execute the residual instruction;

receiving a remaining instruction execution state message which is sent by the first terminal after the execution of the remaining instruction is finished and is used as the current notification message again;

while residual corpus information remains in the corpus information set, repeating the step of: determining a residual instruction matched with the residual corpus information based on the current notification message.
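The first-server steps above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `StatusMessage`, `dispatch_corpus_set`, `resolve_instruction`, and `send_and_wait` are hypothetical, the "any one" target corpus item is simply taken to be the first one in the set, and sending an instruction is modeled as a call that returns the terminal's execution state message.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StatusMessage:
    """Hypothetical shape of an execution state message from the terminal."""
    instruction: str
    finished: bool


def dispatch_corpus_set(
    corpus_set: List[str],
    resolve_instruction: Callable[[str], str],
    send_and_wait: Callable[[str], StatusMessage],
) -> List[StatusMessage]:
    """Send one instruction per corpus item, each after the previous one finished."""
    if len(corpus_set) < 2:
        raise ValueError("the corpus information set holds at least two items")

    remaining = list(corpus_set)
    notifications: List[StatusMessage] = []

    # Target instruction: assumed here to be the first corpus item.
    current = send_and_wait(resolve_instruction(remaining.pop(0)))
    notifications.append(current)

    # Residual instructions: each is sent only after the current
    # notification message reports that the previous one finished.
    while remaining and current.finished:
        current = send_and_wait(resolve_instruction(remaining.pop(0)))
        notifications.append(current)

    return notifications
```

In this sketch the loop stops early if the terminal reports an unfinished instruction; how the described method treats such a case is not stated in the text above, so that behavior is an assumption.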

Another aspect provides a method of instruction execution, the method comprising:

receiving a corpus information acquisition request sent by a first server according to a voice information analysis request, wherein the corpus information acquisition request carries identification information of a first terminal and text information corresponding to the voice information; the voice information and the identification information of the first terminal are information carried in a voice information analysis request sent by the first terminal to the first server in response to the voice information;

determining a corpus information set matched with the identification information of the first terminal and the text information based on the corpus information acquisition request; the corpus information set comprises at least two corpus information;

sending the corpus information set to the first server, so that the first server acquires a target instruction corresponding to any one piece of corpus information in the corpus information set and sends the target instruction to the first terminal for execution; the first terminal, after executing the target instruction, sends a target instruction execution state message to the first server, and the target instruction execution state message serves as a current notification message; when residual corpus information exists in the corpus information set, the first server determines a residual instruction matched with the residual corpus information based on the current notification message, the residual corpus information being corpus information in the corpus information set other than the currently matched corpus information, and sends the residual instruction to the first terminal for execution; the first terminal, after executing the residual instruction, sends a residual instruction execution state message to the first server, which serves as the current notification message again; and the first server repeats, while residual corpus information remains in the corpus information set, the step of determining a residual instruction matched with the residual corpus information based on the current notification message.

Another aspect provides a method of instruction execution, the method comprising:

sending, in response to voice information, a voice information analysis request to a first server, the voice information analysis request carrying the voice information and identification information of the first terminal, so that the first server sends a corpus information acquisition request to a second server according to the voice information analysis request, the corpus information acquisition request carrying the identification information of the first terminal and text information corresponding to the voice information, and so that the second server determines, based on the corpus information acquisition request, a corpus information set matched with the identification information of the first terminal and the text information, the corpus information set comprising at least two pieces of corpus information, and sends the corpus information set to the first server;

receiving a target instruction sent by the first server, wherein the target instruction is an instruction corresponding to any one of the corpus information in the corpus information set acquired by the first server;

executing the target instruction, and sending a target instruction execution state message to the first server after the target instruction is executed; taking the target instruction execution state message as a current notification message;

receiving a residual instruction which is sent by the first server and is determined based on the current notification message and matched with the residual corpus information when the corpus information set has the residual corpus information, wherein the residual corpus information is the corpus information except the currently matched corpus information in the corpus information set;

executing the residual instruction, sending a residual instruction execution state message to the first server after the residual instruction is executed, and using the residual instruction execution state message as the current notification message again, so that the first server repeats, while residual corpus information remains in the corpus information set, the step of determining a residual instruction matched with the residual corpus information based on the current notification message.
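The terminal-side steps above amount to "execute, then report". The following is a minimal sketch under stated assumptions: the class `Terminal`, its `handle_instruction` method, the `report` callback, and the dictionary keys of the state message are all hypothetical, and "executing" an instruction is simulated by recording it.

```python
from typing import Callable, Dict, List


class Terminal:
    """Hypothetical first terminal that executes instructions and reports state."""

    def __init__(self, terminal_id: str, report: Callable[[Dict[str, str]], None]):
        self.terminal_id = terminal_id
        self._report = report  # stands in for sending a message to the first server
        self.executed: List[str] = []

    def handle_instruction(self, instruction: str) -> None:
        # Execution is simulated; a real device would play audio, etc.
        self.executed.append(instruction)
        # After execution finishes, send the execution state message,
        # which the first server treats as the current notification message.
        self._report({
            "terminal_id": self.terminal_id,
            "instruction": instruction,
            "state": "finished",
        })
```

The same handler serves both the target instruction and every residual instruction, since the terminal's behavior is identical in both cases.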

Another aspect provides an instruction execution first server, the first server comprising:

the voice information analysis request receiving module is used for receiving a voice information analysis request sent by the first terminal in response to the voice information; the voice information analysis request carries the voice information and identification information of the first terminal;

a corpus information acquisition request sending module, configured to send a corpus information acquisition request to a second server according to the voice information analysis request, where the corpus information acquisition request carries identification information of the first terminal and text information corresponding to the voice information;

a corpus information set receiving module, configured to receive a corpus information set that is sent by the second server, determined based on the corpus information acquisition request, and matches the identification information of the first terminal and the text information; the corpus information set comprises at least two corpus information;

the target instruction acquisition module is used for acquiring a target instruction corresponding to any one of the corpus information in the corpus information set;

a target instruction sending module, configured to send the target instruction to the first terminal, so that the first terminal executes the target instruction;

a target instruction execution state message receiving module, configured to receive a target instruction execution state message, which is sent by the first terminal after the target instruction execution is completed and serves as a current notification message;

a residual instruction determining module, configured to determine, based on the current notification message, a residual instruction matching the residual corpus information when the corpus information set has residual corpus information; the residual corpus information is corpus information except the currently matched corpus information in the corpus information set;

a remaining instruction sending module, configured to send the remaining instruction to the first terminal, so that the first terminal executes the remaining instruction;

a remaining instruction execution state message receiving module, configured to receive a remaining instruction execution state message that is sent by the first terminal after the execution of the remaining instruction is completed and is used as the current notification message again; and, while residual corpus information remains in the corpus information set, to repeat the step of determining a residual instruction matched with the residual corpus information based on the current notification message.

Another aspect provides an instruction execution second server, comprising:

a corpus information acquisition request receiving module, configured to receive a corpus information acquisition request sent by a first server according to a voice information analysis request, where the corpus information acquisition request carries identification information of a first terminal and text information corresponding to voice information; the voice information and the identification information of the first terminal are information carried in a voice information analysis request sent by the first terminal to the first server in response to the voice information;

a corpus information set determining module, configured to determine a corpus information set matching the identification information of the first terminal and the text information based on the corpus information acquisition request; the corpus information set comprises at least two corpus information;

a corpus information set sending module, configured to send the corpus information set to the first server, so that the first server acquires a target instruction corresponding to any one piece of corpus information in the corpus information set and sends the target instruction to the first terminal for execution; the first terminal, after executing the target instruction, sends a target instruction execution state message to the first server, and the target instruction execution state message serves as a current notification message; when residual corpus information exists in the corpus information set, the first server determines a residual instruction matched with the residual corpus information based on the current notification message, the residual corpus information being corpus information in the corpus information set other than the currently matched corpus information, and sends the residual instruction to the first terminal for execution; the first terminal, after executing the residual instruction, sends a residual instruction execution state message to the first server, which serves as the current notification message again; and the first server repeats, while residual corpus information remains in the corpus information set, the step of determining a residual instruction matched with the residual corpus information based on the current notification message.

Another aspect provides an instruction execution first terminal, comprising:

a voice information analysis request sending module, configured to send, in response to voice information, a voice information analysis request to the first server, the voice information analysis request carrying the voice information and identification information of the first terminal, so that the first server sends a corpus information acquisition request to a second server according to the voice information analysis request, the corpus information acquisition request carrying the identification information of the first terminal and text information corresponding to the voice information, and so that the second server determines, based on the corpus information acquisition request, a corpus information set matched with the identification information of the first terminal and the text information, the corpus information set comprising at least two pieces of corpus information, and sends the corpus information set to the first server;

a target instruction receiving module, configured to receive a target instruction sent by the first server, where the target instruction is an instruction corresponding to any corpus information in the corpus information set acquired by the first server;

a target instruction execution state message sending module, configured to execute the target instruction, and send a target instruction execution state message to the first server after the target instruction is executed; taking the target instruction execution state message as a current notification message;

a residual instruction receiving module, configured to receive a residual instruction, which is sent by the first server and is determined based on the current notification message when residual corpus information exists in the corpus information set and matches the residual corpus information, where the residual corpus information is corpus information in the corpus information set except for currently matched corpus information;

a remaining instruction execution state message sending module, configured to execute the remaining instruction, send a remaining instruction execution state message to the first server after the execution of the remaining instruction is completed, and use the remaining instruction execution state message as the current notification message again, so that the first server repeats, while residual corpus information remains in the corpus information set, the step of determining a residual instruction matched with the residual corpus information based on the current notification message.

In another aspect, an instruction execution system is provided, the system including a first terminal, a first server, and a second server;

the first terminal is used for responding to voice information and sending a voice information analysis request to the first server; the voice information analysis request carries the voice information and identification information of the first terminal; executing a target instruction, and sending a target instruction execution state message to the first server after the target instruction is executed; taking the target instruction execution state message as a current notification message; executing a residual instruction, sending a residual instruction execution state message to the first server after the residual instruction is executed, and taking the residual instruction execution state message as the current notification message again;

the first server is used for sending a corpus information acquisition request to the second server according to the voice information analysis request, wherein the corpus information acquisition request carries the identification information of the first terminal and the text information corresponding to the voice information; acquiring a target instruction corresponding to any one of the corpus information in the corpus information set; and sending the target instruction to the first terminal; when the corpus information set contains residual corpus information, determining a residual instruction matched with the residual corpus information based on the current notification message; the residual corpus information is corpus information except the currently matched corpus information in the corpus information set; and sending the remaining instructions to the first terminal; and repeating the steps of: when residual corpus information exists in the corpus information set, determining a residual instruction matched with the residual corpus information based on the current notification message;

the second server is used for determining a corpus information set matched with the identification information of the first terminal and the text information based on the corpus information acquisition request; the corpus information set comprises at least two corpus information; and sending the corpus information set to the first server.

Another aspect provides an instruction execution server, including a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded by the processor and executed to implement the instruction execution method as described above.

Another aspect provides an instruction execution terminal, including a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded by the processor and executed to implement the instruction execution method as described above.

Another aspect provides a computer-readable storage medium storing at least one instruction or at least one program, the at least one instruction or at least one program being loaded and executed by a processor to implement the instruction execution method as described above.

The instruction execution method, the instruction execution system and the storage medium have the following technical effects:

a voice information analysis request sent by a first terminal in response to voice information is received; a matched corpus information set comprising at least two pieces of corpus information is acquired from a second server according to the voice information analysis request; a target instruction corresponding to any one piece of corpus information in the corpus information set is then sent to the first terminal, so that the first terminal executes the target instruction; after the first terminal finishes executing the target instruction, an instruction corresponding to the next piece of corpus information is sent to the first terminal, so that the first terminal sequentially executes the plurality of instructions corresponding to the corpus information set. With this technical scheme, the user can use a plurality of skill services seamlessly through a single voice input, which improves the efficiency with which the user obtains information.

Drawings

In order to more clearly illustrate the technical solutions and advantages of the embodiments of the present application or of the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a system provided by an embodiment of the present application;

FIG. 2 is a flowchart illustrating a method for executing instructions according to an embodiment of the present disclosure;

FIG. 3 is a flowchart illustrating a method for determining a target domain and a target intent according to an embodiment of the present application;

FIG. 4 is a flowchart illustrating a method for determining a corpus information set matching the identification information of the first terminal and the text information according to an embodiment of the present application;

FIG. 5 is a block diagram of an instruction execution system according to an embodiment of the present disclosure;

FIG. 6 is a flowchart illustrating another method for executing instructions according to an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of a blockchain system according to an embodiment of the present disclosure;

FIG. 8 is a block diagram according to an embodiment of the present application;

FIG. 9 is a flowchart illustrating another method for executing instructions according to an embodiment of the present application;

FIG. 10 is a flowchart illustrating another method for executing instructions according to an embodiment of the present application;

FIG. 11 is a flowchart illustrating another method for executing instructions according to an embodiment of the present application;

FIG. 12 is a block diagram illustrating a first server for executing instructions according to an embodiment of the present disclosure;

FIG. 13 is a schematic structural diagram of an instruction execution second server according to an embodiment of the present application;

FIG. 14 is a schematic structural diagram of an instruction execution first terminal according to an embodiment of the present application.

Detailed Description

Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making. Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics.

The key technologies of Speech Technology are automatic speech recognition (ASR), speech synthesis (text-to-speech, TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the development direction of future human-computer interaction, and voice is expected to become one of the most promising modes of human-computer interaction.

Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics. Research in this field involves natural language, i.e., the language that people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graphs, and the like.

With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.

The technical solution provided by the embodiments of the present application relates to a speech technology and a natural language processing technology of artificial intelligence, and will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Referring to fig. 1, fig. 1 is a schematic diagram of a system according to an embodiment of the present disclosure, and as shown in fig. 1, the system may include at least a first server 01, a second server 02, and a first terminal 03.

Specifically, the first server 01 may include an independently operating server, or a distributed server, or a server cluster composed of a plurality of servers. The first server 01 may comprise a network communication unit, a processor, a memory, etc. The first server 01 may provide a background service for the first terminal 03.

Specifically, the second server 02 may include a server that operates independently, or a distributed server, or a server cluster composed of a plurality of servers. The second server 02 may comprise a network communication unit, a processor and a memory, etc. The second server 02 may interact with the first server 01.

Specifically, the first terminal 03 may include a smart phone, a tablet computer, a notebook computer, a digital assistant, a smart wearable device, a vehicle-mounted terminal, and other types of physical devices, and may also include software running in the physical devices; for example, the first terminal 03 may include an application program for detecting voice information.

The following describes an instruction execution method of the present application based on the above system. FIG. 2 is a flowchart of an instruction execution method provided by an embodiment of the present application. The present specification provides the method operation steps as described in the embodiment or the flowchart, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. In practice, a system or server product may execute the steps sequentially or in parallel (e.g., in a parallel-processor or multi-threaded environment) according to the embodiments or the methods shown in the figures. Specifically, as shown in FIG. 2, the method may include:

s201: the first terminal responds to the voice information and sends a voice information analysis request to the first server; the voice information analysis request carries the voice information and the identification information of the first terminal.

In an embodiment of the present specification, the first server may be a voice server (TVS); the first terminal may be an intelligent voice device, for example, a smart speaker.

In this embodiment of the present specification, the voice information may be voice information uttered by a user. For example, the user may say "good morning" to the smart speaker, and the corresponding voice information is the "good morning" utterance. The identification information of the first terminal may be an ID (identity document) of the first terminal and may serve as a unique identifier of the first terminal.
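As a non-limiting illustration, the request of step S201 may be sketched as a small data structure. The class name, the field names and the sample values below are hypothetical and are not specified by the present application; the sketch only shows that the request carries both the voice information and the identification information of the first terminal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceParseRequest:
    """Hypothetical payload for step S201; field names are illustrative."""
    terminal_id: str   # unique identifier of the first terminal
    voice_data: bytes  # raw voice information captured from the user


def build_voice_parse_request(terminal_id: str, voice_data: bytes) -> VoiceParseRequest:
    # The request must carry both the voice information and the terminal ID.
    if not terminal_id:
        raise ValueError("terminal_id is required")
    if not voice_data:
        raise ValueError("voice_data is required")
    return VoiceParseRequest(terminal_id=terminal_id, voice_data=voice_data)


# Example values are placeholders, not real device data.
request = build_voice_parse_request("speaker-001", b"\x00\x01fake-pcm")
```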

S203: the first server sends a corpus information acquisition request to a second server according to the voice information analysis request, wherein the corpus information acquisition request carries the identification information of the first terminal and the text information corresponding to the voice information.

In an embodiment of the present disclosure, the second server may be a domain linkage server, and the domain linkage server may be configured to determine a corresponding multi-skill list according to the voice information.

In an embodiment of this specification, before the step of sending, by the first server, the corpus information acquisition request to the second server according to the voice information parsing request, the method further includes:

the first server converts the voice information into text information based on the voice information analysis request.

In this embodiment, the first server may convert the voice information into text information, for example, may convert the user's voice "good morning" into text information.

In this embodiment of the present specification, as shown in fig. 3, after the step of sending, by the first server, the corpus information acquisition request to the second server according to the voice information parsing request, the method may further include:

S2041: the second server sends a domain intention acquisition request to a third server based on the corpus information acquisition request, wherein the domain intention acquisition request carries the text information;

In this embodiment of the present specification, the third server may be a semantic server. The semantic server analyzes the trigger corpus (the text information) to determine the field and the intention corresponding to the skill linkage service, and then sends a request to a skill management server (TSKM). After receiving the request, the TSKM routes the data to a domain linkage server (UGCSKILL) according to the field and the intention.

In this embodiment of the present specification, a mapping relationship between the trigger corpus and the multi-skill information may be set in the factory configuration of the first terminal. For example, the smart speaker device may respond to a "good morning" utterance from any user by sequentially broadcasting information such as the day's weather and news.

S2043: the third server determines a target field and a target intention corresponding to the text information according to the field intention acquisition request;

In the embodiment of the present specification, the field is the unique identification information of a skill, and the intention is a specific user intention supported by the skill. The skills may include smart home, weather, and news.

S2045: the third server sends a target field and a target intention corresponding to the text information to the second server;

correspondingly, the determining, by the second server, the corpus information set matched with the identification information of the first terminal and the text information based on the corpus information acquisition request includes:

the second server determines a preset database matched with the identification information of the first terminal based on the corpus information acquisition request;

and the second server searches a corpus information set matched with the target field and the target intention corresponding to the text information from the preset database.

In the embodiment of the present specification, each first terminal corresponds to a preset database, and the preset database stores a mapping relationship between text information and corpus information.

In a specific embodiment, the fields corresponding to the text information can be determined according to the target field, and a corpus information set matched with the target intention is then determined from those fields. For example, when the text information is "good morning", the corresponding fields include the smart home, weather and news skills: the smart home skill can determine, according to the time information in the text information, that the bedroom light needs to be turned on; the weather skill can broadcast the current day's weather; and the news skill can push morning news to the user according to the time information in the text information.

S205: the second server determines a corpus information set matched with the identification information of the first terminal and the text information based on the corpus information acquisition request; the corpus information set comprises at least two corpus information.

In this embodiment of the present specification, when the text information is "good morning", the corresponding fields include the smart home, weather, and news skills, and the corresponding corpus information set may include "turn on the bedroom light", "what is the weather today", and "play today's news".
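By way of example only, the lookup performed by the second server may be sketched as follows. The in-memory dictionary, the domain and intention labels ("skill_linkage", "good_morning") and the function name are assumptions introduced for illustration; the present application does not prescribe a storage format. The sketch selects the preset database by terminal ID and then searches it by target field and target intention.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for the per-terminal preset databases.
# terminal ID -> {(domain, intent): corpus information set}
PRESET_DATABASES: Dict[str, Dict[Tuple[str, str], List[str]]] = {
    "speaker-001": {
        ("skill_linkage", "good_morning"): [
            "turn on the bedroom light",
            "what is the weather today",
            "play today's news",
        ],
    },
}


def find_corpus_set(terminal_id: str, domain: str, intent: str) -> List[str]:
    """Return the corpus information set for a terminal, domain and intent."""
    database = PRESET_DATABASES.get(terminal_id, {})
    corpus_set = database.get((domain, intent), [])
    # The method works on sets with at least two corpus entries; returning an
    # empty list for smaller sets is a design choice of this sketch.
    if len(corpus_set) < 2:
        return []
    return list(corpus_set)
```

Keying the outer dictionary by terminal ID mirrors the statement that each first terminal corresponds to its own preset database.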

In this embodiment of the present specification, as shown in fig. 4, the determining, by the second server, the corpus information set that matches the identification information of the first terminal and the text information based on the corpus information acquisition request may include:

S2051: the second server acquires current scene information based on the corpus information acquisition request;

in this embodiment, the current scene information may include current time, date, holiday, location, user emotion, and other information.

S2053: the second server determines first target information in the text information and second target information in the current scene information;

in this embodiment of the present specification, the first target information is information in a user voice, and the first target information and the second target information may be information representing time;

S2055: the second server judges whether the first target information is matched with the second target information;

in this embodiment of the present specification, if the first target information and the second target information represent the same time, for example, both represent the time of the morning, it is determined that the first target information matches the second target information; and if the time represented by the first target information is the morning and the time represented by the second target information is the afternoon, determining that the first target information is not matched with the second target information.

S2057: and if the first target information is matched with the second target information, the second server determines a corpus information set matched with the identification information of the first terminal and the text information.

In an embodiment of the present specification, if the first target information does not match the second target information, the method further includes:

the second server replaces the first target information in the text information with the second target information to obtain replaced text information;

and the second server determines a corpus information set matched with the identification information of the first terminal and the replaced text information.

In the embodiment of the present specification, for example, the first target information may be "morning" in the text information "good morning", while the time in the current scene is 8 o'clock in the evening, so that the second target information is "evening". The time represented by the first target information is then inconsistent with the time represented by the second target information. Because the second target information is derived from the current time, its reliability is high, and it can be determined that the first target information is a slip of the tongue by the user. In this case, the first target information in the text information needs to be replaced by the second target information; that is, the text information "good morning" may be replaced by "good evening", and accordingly the obtained corpus information set is the information set corresponding to "good evening". For example, if it is currently evening and the user says "good morning" to the speaker, the scene rule engine may, based on the current time, replace the first entry in the list, "broadcast + picture | good morning, little master", with "broadcast + picture | good evening, little master".

In the embodiment of the present specification, a corpus information set with high accuracy can be determined through the voice information of the user and the current scene information.
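The matching and replacement of steps S2053 to S2057 may be sketched, under stated assumptions, as follows. The vocabulary of time-of-day words and the hour boundaries are hypothetical, since the present application does not define them; the function names are likewise illustrative.

```python
from datetime import datetime

# Hypothetical vocabulary of time-of-day words; the source does not list one.
PERIODS = ("morning", "afternoon", "evening")


def period_of(now: datetime) -> str:
    """Map a clock time to a coarse period (the boundaries are assumptions)."""
    if 5 <= now.hour < 12:
        return "morning"
    if 12 <= now.hour < 18:
        return "afternoon"
    return "evening"


def align_text_with_scene(text: str, now: datetime) -> str:
    """Replace a time word in the text that contradicts the current time."""
    second_target = period_of(now)  # second target information (from the scene)
    for first_target in PERIODS:    # candidate first target information
        if first_target in text and first_target != second_target:
            # Mismatch: the scene information is treated as more reliable.
            return text.replace(first_target, second_target)
    # Match, or no time word found: keep the text unchanged.
    return text
```

For instance, calling align_text_with_scene("good morning", datetime(2019, 12, 18, 20, 0)) yields "good evening", after which the corpus information set is looked up for the replaced text.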

In an embodiment of the present specification, the method further comprises:

the second terminal responds to the operation in the target application program and determines the mapping relation between the target text information and the target corpus information set;

the second terminal constructs a text corpus database according to the mapping relation between the target text information and the target corpus information set;

the second terminal sends the text corpus database, the identification information of the second terminal and the identification information of the target application program to the second server;

the second server determines the identification information of the first terminal according to the identification information of the target application program;

the second server judges whether the identification information of the first terminal is matched with the identification information of the second terminal;

correspondingly, the determining, by the second server, the corpus information set matching the identification information of the first terminal and the text information based on the corpus information acquisition request may include:

and if the identification information of the first terminal is matched with the identification information of the second terminal, the second server searches the corpus information set matched with the identification information of the first terminal and the text information from a text corpus database corresponding to the identification information of the first terminal based on the corpus information acquisition request.

In the embodiment of the specification, the user can set the mapping relationship between the text information and the corpus information set in a user-defined manner through the second terminal, so that the information which the user is interested in can be continuously acquired through one-time voice, and the information acquisition efficiency of the user is improved.
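A minimal sketch of the user-defined mapping flow is given below. It assumes that "matched" means the first terminal and the second terminal have been paired in advance; the class name, the pairing set and the method names are hypothetical and only illustrate the sequence of upload, terminal resolution by application ID, and conditional lookup.

```python
from typing import Dict, List, Optional, Set, Tuple


class SecondServer:
    """Hypothetical sketch of custom-mapping storage on the second server."""

    def __init__(self, app_to_terminal: Dict[str, str],
                 paired: Set[Tuple[str, str]]) -> None:
        self.app_to_terminal = app_to_terminal  # app ID -> first-terminal ID
        self.paired = paired  # assumed set of (first ID, second ID) pairings
        # first-terminal ID -> (uploading second-terminal ID, text corpus database)
        self.uploads: Dict[str, Tuple[str, Dict[str, List[str]]]] = {}

    def upload(self, database: Dict[str, List[str]],
               second_terminal_id: str, app_id: str) -> bool:
        """Store a text corpus database sent by the second terminal."""
        first_terminal_id = self.app_to_terminal.get(app_id)
        if first_terminal_id is None:
            return False
        self.uploads[first_terminal_id] = (second_terminal_id, dict(database))
        return True

    def lookup(self, first_terminal_id: str, text: str) -> Optional[List[str]]:
        """Search the database only if the two terminals match."""
        entry = self.uploads.get(first_terminal_id)
        if entry is None:
            return None
        second_terminal_id, database = entry
        if (first_terminal_id, second_terminal_id) not in self.paired:
            return None
        return database.get(text)
```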

In a specific embodiment, the second server may store the mapping relationship between the target text information and the target corpus information set based on a blockchain system, where the blockchain system includes a plurality of nodes, and the plurality of nodes form a peer-to-peer network therebetween.

In some embodiments, the blockchain system may have the structure shown in fig. 7, in which a peer-to-peer (P2P) network is formed among a plurality of nodes, and the P2P protocol is an application layer protocol running on top of the Transmission Control Protocol (TCP). In the blockchain system, any machine, such as a server or a terminal, can join and become a node, and a node comprises a hardware layer, a middle layer, an operating system layer and an application layer.

The functions of each node in the blockchain system shown in fig. 7 involve:

1) Routing: a basic function of a node, used to support communication between nodes.

Besides the routing function, the node may also have the following functions:

2) Application: deployed in the blockchain to implement specific services according to actual service requirements. The application records data related to the implemented functions to form record data, which carries a digital signature indicating the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block once the source and integrity of the record data are verified successfully.

3) Blockchain: a series of blocks linked to one another in the chronological order of their generation. Once a new block is added to the blockchain, it cannot be removed, and the blocks record the record data submitted by the nodes in the blockchain system.

In some embodiments, the block structure may be the structure shown in fig. 8, where each block includes the hash value of the transaction records stored in the block (the hash value of the block) and the hash value of the previous block, and the blocks are connected by the hash values to form a blockchain. A block may also include information such as a timestamp of the time at which the block was generated. A blockchain is essentially a decentralized database: a series of data blocks associated with one another by cryptographic methods, each of which contains information for verifying the validity of its contents (anti-counterfeiting) and for generating the next block.
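The hash linking described above may be illustrated by the following sketch, which uses SHA-256 from the Python standard library. The field layout of the block, the JSON serialization and the all-zero hash for the first block are assumptions made for illustration; fig. 8 is not reproduced here, and no consensus or signature handling is shown.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import List

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block


@dataclass(frozen=True)
class Block:
    index: int
    timestamp: float  # time at which the block was generated
    records: str      # serialized record data (e.g., text-to-corpus mappings)
    prev_hash: str    # hash value of the previous block
    block_hash: str   # hash value of this block


def compute_hash(index: int, timestamp: float, records: str, prev_hash: str) -> str:
    payload = json.dumps([index, timestamp, records, prev_hash]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Block], records: str) -> Block:
    """Create a block linked to the current tail and append it to the chain."""
    prev_hash = chain[-1].block_hash if chain else GENESIS_PREV_HASH
    index, timestamp = len(chain), time.time()
    block = Block(index, timestamp, records, prev_hash,
                  compute_hash(index, timestamp, records, prev_hash))
    chain.append(block)
    return block


def is_valid(chain: List[Block]) -> bool:
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1].block_hash if i else GENESIS_PREV_HASH
        if block.prev_hash != expected_prev:
            return False
        if block.block_hash != compute_hash(
                block.index, block.timestamp, block.records, block.prev_hash):
            return False
    return True
```

Because each block stores the hash of its predecessor, altering the records of an earlier block causes validation to fail, which is the anti-counterfeiting property referred to above.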

In an embodiment of this specification, before the step of constructing, by the second terminal, the text corpus database based on the mapping relationship between the target text information and the target corpus information set, the method further includes:

the second terminal preprocesses the target text information and the target corpus information set;

In the embodiment of the present specification, the preprocessing may be security filtering such as sensitive word detection and detection of pornographic or political pictures. The corpus and the corresponding multi-skill list data are then entered into a database, and a message describing the data operation is sent to a message queue; the data synchronization service processes the messages in the message queue and synchronizes the data and its operations to the semantic server. The response content does not need to be synchronized.

Correspondingly, the second terminal building a text corpus database based on the mapping relationship between the target text information and the target corpus information set includes:

and the second terminal constructs a text corpus database based on the mapping relation between the processed target text information and the processed target corpus information set.

In the embodiment of the specification, the safety and the legality of the data in the text corpus database are improved through data preprocessing.
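As one possible illustration of the preprocessing, the sketch below applies a simple sensitive word filter to a mapping between target text information and target corpus information sets. The blocklist contents and the policy of dropping offending entries are assumptions; picture detection is outside the scope of this sketch.

```python
from typing import Dict, Iterable, List

# Hypothetical blocklist; a real deployment would use a maintained lexicon.
SENSITIVE_WORDS = frozenset({"forbidden", "banned"})


def contains_sensitive_word(text: str,
                            words: Iterable[str] = SENSITIVE_WORDS) -> bool:
    lowered = text.lower()
    return any(word in lowered for word in words)


def preprocess(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Drop trigger texts and corpus entries that fail the filter."""
    cleaned: Dict[str, List[str]] = {}
    for text, corpus_set in mapping.items():
        if contains_sensitive_word(text):
            continue  # the trigger text itself is rejected
        kept = [c for c in corpus_set if not contains_sensitive_word(c)]
        if kept:
            cleaned[text] = kept
    return cleaned
```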

In an embodiment of the present specification, the method may further include:

the second terminal sends the text corpus database to a third server, wherein sample corpus data are stored in the text corpus database, and the sample corpus data comprise user-defined corpus data and/or user operation record data;

the third server trains to obtain a domain intention recognition model, namely a semantic model, based on the text corpus database;

specifically, in some embodiments, the third server may label data in the text corpus database, label a corresponding field and intention of the data, and then perform model training to obtain a field intention recognition model with an accuracy greater than a preset threshold.

Correspondingly, the determining, by the third server, the target domain and the target intention corresponding to the text information according to the domain intention acquisition request may include:

and the third server inputs the text information into a field intention recognition model according to the field intention acquisition request to obtain a target field and a target intention corresponding to the text information.

S207: the second server sends the corpus information set to the first server.

S209: the first server acquires a target instruction corresponding to any one piece of corpus information in the corpus information set.

In an embodiment of this specification, the obtaining, by the first server, a target instruction corresponding to any corpus information in the corpus information set may include:

the first server can determine target corpus information in the corpus information set and send the target corpus information to a third server;

the third server can determine a corresponding skill server according to the target corpus information; for example, if the target corpus information is "turn on the bedroom light", the corresponding skill server is a smart home server;

the third server sends the target corpus information to the skill server;

the skill server determines a target instruction according to the target corpus information and sends the target instruction to the third server;

the third server sends a target instruction analysis request to the first server, wherein the target instruction analysis request carries the target instruction;

the first server converts the target instruction into a voice instruction according to the target instruction analysis request;

the first server sends a voice instruction playing request to the first terminal;

and the first terminal plays the voice command based on the voice command playing request.

In this embodiment of the present specification, the first terminal may display text information corresponding to the voice instruction in a display interface while playing the voice instruction. Each piece of corpus information in the corpus information set is processed in the same way, and the resulting voice instruction is played through the first terminal.

In the embodiment of the present specification, when no priority is set for the corpus information in the corpus information set, a target instruction corresponding to any one piece of corpus information in the set may be obtained; in some embodiments, if priorities are set for the corpus information in the set, the target instruction corresponding to the corpus information with the highest priority is obtained first.
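The selection rule may be sketched as follows. The representation of a priority as an optional integer, the convention that a larger number means a higher priority, and the choice of the first entry when no priority is set are all assumptions of this sketch.

```python
from typing import List, Optional, Tuple

# Each entry pairs corpus information with an optional priority
# (larger number = higher priority; this convention is an assumption).
CorpusEntry = Tuple[str, Optional[int]]


def pick_target(entries: List[CorpusEntry]) -> str:
    """Choose the corpus information whose instruction is obtained first."""
    if not entries:
        raise ValueError("corpus information set is empty")
    prioritized = [entry for entry in entries if entry[1] is not None]
    if prioritized:
        return max(prioritized, key=lambda entry: entry[1])[0]
    # No priority set: any entry may be chosen; this sketch takes the first.
    return entries[0][0]
```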

S2011: the first server sends the target instruction to the first terminal.

S2013: the first terminal executes the target instruction; after the target instruction is executed, the first terminal sends a target instruction execution state message to the first server; taking the target instruction execution state message as a current notification message;

S2015: when the corpus information set contains residual corpus information, the first server determines a residual instruction matched with the residual corpus information based on the current notification message; and the residual corpus information is the corpus information except the currently matched corpus information in the corpus information set.

In an embodiment of this specification, before the step of sending, by the first server, the current target instruction matching the remaining corpus information to the first terminal based on the current notification message when the remaining corpus information exists in the corpus information set, the method may further include:

the first server judges whether the corpus information set has residual corpus information or not;

and when the residual corpus information does not exist in the corpus information set, the first server sends an instruction completion message to the first terminal.

In this embodiment, the instruction completion message may be presented in the first terminal in a form of voice and/or text.

S2017: the first server sends the residual instruction to the first terminal.

S2019: the first terminal executes the remaining instructions; after the execution of the remaining instruction is finished, the first terminal sends a remaining instruction execution state message to the first server, and the remaining instruction execution state message is used as the current notification message again;

S2021: when residual corpus information still exists in the corpus information set, repeating the step: the first server determines a residual instruction matched with the residual corpus information based on the current notification message.

In this embodiment of the present specification, each time the first terminal executes one instruction, the first server is notified, and the first server issues the next execution instruction according to the notification message until all instructions corresponding to the corpus information in the first server are executed.
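The notification-driven loop of steps S209 to S2021 may be sketched as follows. The class names, the "finished" status string and the in-process call between server and terminal are assumptions made for illustration; in the described system these would be separate devices exchanging messages over a network.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class FirstServer:
    """Sketch of the first server's role; all names are hypothetical."""

    def __init__(self, corpus_set: List[str],
                 resolve: Callable[[str], str]) -> None:
        self.pending: Deque[str] = deque(corpus_set)
        self.resolve = resolve  # maps corpus information to an instruction

    def next_instruction(self) -> Optional[str]:
        """Return the next instruction, or None when nothing remains."""
        if not self.pending:
            return None
        return self.resolve(self.pending.popleft())

    def on_execution_state(self, notification: str) -> Optional[str]:
        """Handle the current notification message from the terminal."""
        # Only a completion message triggers the next instruction.
        if notification != "finished":
            return None
        return self.next_instruction()


class FirstTerminal:
    def __init__(self) -> None:
        self.executed: List[str] = []

    def execute(self, instruction: str) -> str:
        self.executed.append(instruction)
        return "finished"  # execution state message sent back to the server


def run_session(server: FirstServer, terminal: FirstTerminal) -> List[str]:
    instruction = server.next_instruction()
    while instruction is not None:
        notification = terminal.execute(instruction)
        instruction = server.on_execution_state(notification)
    return terminal.executed
```

The server never issues an instruction until the previous one has been reported as executed, so the terminal runs the instructions strictly one after another.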

In a specific embodiment, a single voice input to the speaker device can yield multiple responses. If the user says "good morning", the skill returns a user-defined greeting, such as "good morning, little master", and then returns information such as weather, finance and news in sequence.

In a specific application embodiment, as shown in fig. 5, fig. 5 is a schematic structural diagram of an instruction execution system, where the instruction execution system includes:

A. The intelligent voice device integrates the TVS software development kit (TVS-SDK) to interact with the TVS service. It receives the user's voice input and transmits it to the back-end TVS service; executes instructions issued by the TVS service, such as broadcasting voice, playing audio and video, and presenting a user interface (UI); and reports state events, such as playback progress, to the TVS service;

B. The TVS server provides the access interface for the intelligent voice device, receives requests from the terminal, receives the states reported by the terminal voice device, and calls the downstream services. It acquires the multi-skill data list configured by the user in the skill linkage service, requests the skills one by one in list order, and finally sends the results of the multiple skills to the terminal voice device in sequence;

C. The semantic server receives requests from the TVS, performs semantic understanding, determines the field and the intention, and sends the request to the skill management server (TSKM); it also trains on the corpus data entered by users;

D. The TSKM skill management server routes the request to a specific skill server according to the field and the intention;

E. The skill server fills in the service content after receiving the request, and the data is finally returned, layer by layer, to the screen device; for example, the weather skill returns today's weather conditions, the FM skill returns recommended broadcasts, and so on;

F. The content management server provides an interface for user management and custom multi-skill configuration, through which a user can enter and manage, via a mobile phone APP, custom skill data such as "good morning" and the skill data list corresponding to it;

G. The skill linkage server receives requests from the TSKM and is responsible for parsing the custom linkage data. It combines the current scene (such as time, festival, location and the speaker's emotion) with the user's preference data, retrieves a suitable skill list through a rule engine, and finally returns the list to the TVS service;

H. The data synchronization server integrates an offline full synchronization module and a quasi-real-time incremental synchronization module; data entered and operated on by users is fed into the semantic server through a streaming log system so that the semantic server can train its model.
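The routing role of the TSKM in the system above may be sketched as a registry that maps a field to a skill server. The class, the handler signature and the registration method are hypothetical; the present application does not describe the TSKM's internal interfaces.

```python
from typing import Callable, Dict

# A skill server is modeled here as a function from intention to response.
SkillHandler = Callable[[str], str]


class SkillManager:
    """Hypothetical stand-in for the TSKM routing step."""

    def __init__(self) -> None:
        self._routes: Dict[str, SkillHandler] = {}

    def register(self, domain: str, handler: SkillHandler) -> None:
        self._routes[domain] = handler

    def route(self, domain: str, intent: str) -> str:
        """Forward the request to the skill server registered for the field."""
        handler = self._routes.get(domain)
        if handler is None:
            raise KeyError(f"no skill server registered for domain {domain!r}")
        return handler(intent)
```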

The user can interact with the content management server through an application program (APP) on a mobile phone and can enter, update, delete, upload and download the trigger corpora for multi-skill linkage and the corresponding multi-skill response content. The data includes the ID of a device (e.g., a smart speaker), the ID of the user, the trigger corpus, the response content, and the like. The ID of the user may be a WeChat or QQ account. The corresponding data is shown in Table 1 below.

TABLE 1

The process of user-defined data entry is as follows:

A. First, the user enters a custom combination through the APP. For example, for the trigger "good morning", the corresponding response list is: ["broadcast | good morning, little master", "skill | turn on the bedroom light", "skill | what is the weather today", "skill | play today's news"];

B. The user sets preferences, such as skills and industries of interest;

C. After receiving the data, the content management server performs security filtering (sensitive word detection, detection of pornographic and political pictures, and the like), records the corpus and the corresponding multi-skill list data in a database, and sends a message describing the data operation to a message queue; the data synchronization service then synchronizes the data and its operations to the semantic service. The response content does not need to be synchronized;

D. After receiving the synchronized user-defined corpus data, the semantic server trains and updates the semantic model so that the corresponding field and intention can be correctly parsed from the trigger corpus.
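The response list entered in step A may, for example, be parsed into typed entries as sketched below. The interpretation of the text before the "|" separator as an entry type (such as "broadcast" or "skill") and of the text after it as the content is an assumption based on the example list; the names used are hypothetical.

```python
from typing import List, NamedTuple


class ResponseItem(NamedTuple):
    kind: str     # entry type, e.g. "broadcast" or "skill"
    content: str  # text to broadcast, or corpus that triggers a skill


def parse_response_list(raw_items: List[str]) -> List[ResponseItem]:
    """Split each 'type | content' entry of a response list."""
    items: List[ResponseItem] = []
    for raw in raw_items:
        kind, separator, content = raw.partition("|")
        if not separator:
            raise ValueError(f"missing '|' separator in {raw!r}")
        items.append(ResponseItem(kind.strip(), content.strip()))
    return items
```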

After the user enters data in the mobile phone APP, the smart speaker bound to the APP can provide the multi-skill linkage service. When a trigger corpus is spoken to the device, the TVS service obtains, through the downstream semantic, skill and other services, the response list data corresponding to the corpus. The list data includes trigger data for multiple skills, such as "what is the weather today" for the weather skill, a song request for the music skill, and "turn on the bedroom light" for the smart home skill. The TVS traverses the list, requests the weather, music, smart home and other skills in sequence, and completes the multi-skill linkage according to the skill-finished states reported by the terminal. The data in Table 2 below is given as an example.

TABLE 2

As shown in fig. 6, fig. 6 is a method for executing an instruction corresponding to table 2, which specifically includes:

A. The user says "good morning" to the speaker by voice, corresponding to step 1 in fig. 6;

B. The terminal sends a voice analysis request to the voice server (TVS), which converts the voice data into text and then transmits the text to the semantic server, corresponding to step 2 in fig. 6;

C. The semantic server analyzes the field and the intention corresponding to the skill linkage service according to information such as the user's ID and the device, together with the trigger corpus, and then sends a request to the skill management server (TSKM);

D. After receiving the request, the TSKM routes the data to the domain linkage server (UGCSKILL) according to the field and the intention, corresponding to step 3 in fig. 6;

E. The domain linkage server pulls the user's custom entries from the database (for example, the response content part of the example data) and, combining the current scene (such as time, holidays, location and the speaker's emotion) with the user's preference data, modifies the user's list through a rule engine. For example, if it is currently evening and the user says "good morning" to the speaker, the scene rule engine can, based on the current time, replace the first entry in the list, "broadcast + picture | good morning, little master", with "broadcast + picture | good evening, little master". After processing by the scene and preference rule engine, the final skill list is returned to the TVS service, corresponding to steps 4-5 in fig. 6;

F. The TVS service traverses the pulled list and sends the corpus corresponding to each skill in the list to the semantic service in sequence. After the semantic service parses the field and intention of the specific skill, it again requests the corresponding skill service, such as news, broadcast (FM), music or weather, through the TSKM. After the TVS service receives the skill data, it controls the terminal to present the content and execute instructions such as playing music or broadcasting the weather. The example data requests the smart home skill, the weather skill and the news skill in sequence, corresponding to steps 6, 13 and 20 in fig. 6;

G. When a single skill finishes, the speaker terminal device automatically reports the finished state to the TVS service through its state reporting module, and the TVS service continues polling until the list is finished.

With this method, a user can seamlessly experience the services of multiple skills with a single voice input. The present application designs a multi-skill linkage scheme for multiple scenes, so that a user can experience multiple skill services with one voice input and without explicitly switching skills, and is provided with a customizable, multi-scene service experience. The scheme is also convenient for skill developers, who can invoke other skills in linkage to build their own distinctive skill services.

As can be seen from the technical solutions provided in the embodiments of the present specification, a voice information analysis request sent by a first terminal in response to a voice message is received; acquiring a matched corpus information set comprising at least two corpus information from a second server according to the voice information analysis request; then sending a target instruction corresponding to any one of the corpus information in the corpus information set to the first terminal, so that the first terminal executes the target instruction; after the first terminal finishes executing the target instruction, continuing to send an instruction corresponding to the next corpus information to the first terminal, so that the first terminal sequentially executes a plurality of instructions corresponding to the corpus information set; by adopting the technical scheme, the user can use a plurality of skill services seamlessly through one-time voice, and the efficiency of obtaining information by the user is improved.

A specific embodiment of an instruction execution method in this specification is described below with a first server as an execution subject, and fig. 9 is a flowchart of the instruction execution method provided in this embodiment of the present application, and specifically, with reference to fig. 9, the method may include:

S901: receiving a voice information analysis request sent by a first terminal in response to voice information; the voice information analysis request carries the voice information and identification information of the first terminal;

S903: sending a corpus information acquisition request to a second server according to the voice information analysis request, wherein the corpus information acquisition request carries the identification information of the first terminal and the text information corresponding to the voice information;

S905: receiving a corpus information set which is sent by the second server, is determined based on the corpus information acquisition request and is matched with the identification information of the first terminal and the text information; the corpus information set comprises at least two corpus information;

S907: acquiring a target instruction corresponding to any one of the corpus information in the corpus information set;

S909: sending the target instruction to the first terminal so as to enable the first terminal to execute the target instruction;

S9011: receiving a target instruction execution state message which is sent by the first terminal after the target instruction is executed and is used as a current notification message;

S9013: when residual corpus information exists in the corpus information set, determining a residual instruction matched with the residual corpus information based on the current notification message; the residual corpus information is corpus information except the currently matched corpus information in the corpus information set;

S9015: sending the residual instruction to the first terminal so as to enable the first terminal to execute the residual instruction;

s9017: receiving a remaining instruction execution state message which is sent by the first terminal after the execution of the remaining instruction is finished and is used as the current notification message again; when the corpus information is concentrated with residual corpus information, repeating the steps: and determining a residual instruction matched with the residual corpus information based on the current notification message.

In some embodiments, before the step of sending the corpus information acquisition request to the second server according to the voice information analysis request, the method may further include:

and converting the voice information into text information based on the voice information analysis request.
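By way of a non-limiting illustration, the two requests and the voice-to-text conversion described above might be sketched as follows. The class names, field names, the byte-string representation of the voice information, and the `transcribe` callable are all assumptions made for this sketch; the specification does not prescribe any of them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VoiceAnalysisRequest:
    """Voice information analysis request sent by the first terminal."""
    terminal_id: str  # identification information of the first terminal
    voice: bytes      # voice information (the encoding is an assumption)


@dataclass(frozen=True)
class CorpusAcquisitionRequest:
    """Corpus information acquisition request sent to the second server."""
    terminal_id: str  # identification information of the first terminal
    text: str         # text information corresponding to the voice information


def build_corpus_request(
    request: VoiceAnalysisRequest,
    transcribe: Callable[[bytes], str],
) -> CorpusAcquisitionRequest:
    """Convert the voice information to text, then build the corpus request.

    `transcribe` is a hypothetical stand-in for whatever speech-to-text
    component the first server uses.
    """
    text = transcribe(request.voice)
    return CorpusAcquisitionRequest(terminal_id=request.terminal_id, text=text)
```

The terminal identification is copied unchanged from the first request to the second, which is what allows the second server to match the corpus information set to the first terminal.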

In some embodiments, before the step of sending, to the first terminal, the residual instruction matched with the residual corpus information based on the current notification message when residual corpus information exists in the corpus information set, the method may further include:

judging whether residual corpus information exists in the corpus information set;

correspondingly, the method further comprises the following step:

when no residual corpus information exists in the corpus information set, sending an instruction completion message to the first terminal.
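By way of a non-limiting illustration, the first-server behaviour of steps S907 to S9017, together with the instruction completion message, might be sketched as a small per-terminal session object. The sketch assumes that corpus information is consumed in list order (the specification only requires "any one" piece to be taken first), that messages are plain dictionaries, and that `to_instruction` is a hypothetical mapping from corpus information to an instruction.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


@dataclass
class InstructionSession:
    """Tracks one corpus information set for one first terminal (hypothetical helper)."""
    corpus_set: List[str]
    to_instruction: Callable[[str], Message]  # corpus information -> instruction
    _cursor: int = field(default=0, init=False)

    def _next_instruction(self) -> Optional[Message]:
        # Return the instruction for the next unmatched corpus information, if any.
        if self._cursor >= len(self.corpus_set):
            return None
        corpus = self.corpus_set[self._cursor]
        self._cursor += 1
        return self.to_instruction(corpus)

    def start(self) -> Message:
        """S907/S909: obtain the target instruction to send first."""
        if len(self.corpus_set) < 2:
            raise ValueError("the corpus information set needs at least two entries")
        instruction = self._next_instruction()
        assert instruction is not None
        return instruction

    def on_status(self, status: Message) -> Message:
        """S9011 to S9017: react to an execution state message.

        The status message acts as the current notification message. This
        sketch only uses its arrival as a trigger; a fuller implementation
        would also inspect its content.
        """
        instruction = self._next_instruction()
        if instruction is None:
            # No residual corpus information: send the completion message.
            return {"type": "instruction_complete"}
        return instruction
```

Each call to `on_status` either yields the next residual instruction or, once the set is exhausted, the completion message, so the first terminal never receives a second instruction before reporting on the previous one.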

The present specification provides a first server for instruction execution, which includes a processor and a memory, where at least one instruction or at least one program is stored in the memory and is loaded and executed by the processor to implement the instruction execution method described above.

A specific embodiment of the instruction execution method in this specification is described below with the second server as the execution subject. Fig. 10 is a flowchart of the instruction execution method provided in this embodiment of the present application. Specifically, with reference to fig. 10, the method may include:

S1001: receiving a corpus information acquisition request sent by a first server according to a voice information analysis request, wherein the corpus information acquisition request carries identification information of a first terminal and text information corresponding to voice information; the voice information and the identification information of the first terminal are carried in the voice information analysis request sent by the first terminal to the first server in response to the voice information;

S1003: determining, based on the corpus information acquisition request, a corpus information set matched with the identification information of the first terminal and the text information; the corpus information set comprises at least two pieces of corpus information;

S1005: sending the corpus information set to the first server, so that the first server acquires a target instruction corresponding to any one piece of corpus information in the corpus information set and sends the target instruction to the first terminal; the first terminal executes the target instruction and, after the target instruction is executed, sends a target instruction execution state message to the first server, which is taken as a current notification message; when residual corpus information exists in the corpus information set, the first server determines a residual instruction matched with the residual corpus information based on the current notification message, the residual corpus information being the corpus information in the corpus information set other than the currently matched corpus information, and sends the residual instruction to the first terminal; the first terminal executes the residual instruction and, after execution of the residual instruction is finished, sends a residual instruction execution state message to the first server, which is taken as the current notification message again; and when residual corpus information still exists in the corpus information set, the first server repeats the step of determining a residual instruction matched with the residual corpus information based on the current notification message.

In some embodiments, the determining, based on the corpus information acquisition request, a corpus information set that matches the identification information of the first terminal and the text information includes:

acquiring current scene information based on the corpus information acquisition request;

determining first target information in the text information and second target information in the current scene information;

judging whether the first target information is matched with the second target information;

and if the first target information is matched with the second target information, determining a corpus information set matched with the identification information of the first terminal and the text information.

In some embodiments, if the first target information does not match the second target information, the method further comprises:

replacing the first target information in the text information with the second target information to obtain replaced text information;

and determining a corpus information set matched with the identification information of the first terminal and the replaced text information.
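By way of a non-limiting illustration, the matching and replacement of target information described above might be sketched as follows. The sketch assumes that the first and second target information have already been extracted as strings, that "matching" means exact string equality, and that only the first occurrence is replaced; the specification does not fix any of these choices, and the function name is hypothetical.

```python
def resolve_text(text: str, first_target: str, second_target: str) -> str:
    """Return the text information to use when determining the corpus set.

    first_target:  target information found in the text information.
    second_target: target information found in the current scene information.
    """
    if first_target == second_target:
        # The text already agrees with the current scene: use it unchanged.
        return text
    # Mismatch: substitute the scene's target information into the text.
    return text.replace(first_target, second_target, 1)
```

The returned string is then used, together with the identification information of the first terminal, to determine the corpus information set.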

In some embodiments, the method further comprises:

receiving, from a second terminal, identification information of the second terminal, identification information of a target application program, and a text corpus database constructed according to a mapping relation between target text information and a target corpus information set; the mapping relation between the target text information and the target corpus information set is determined by the second terminal in response to an operation in the target application program;

determining the identification information of the first terminal according to the identification information of the target application program;

judging whether the identification information of the first terminal is matched with the identification information of the second terminal;

correspondingly, the determining, based on the corpus information acquisition request, a corpus information set matching the identification information of the first terminal and the text information includes:

and if the identification information of the first terminal is matched with the identification information of the second terminal, searching a corpus information set matched with the identification information of the first terminal and the text information from a text corpus database corresponding to the identification information of the first terminal based on the corpus information acquisition request.
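By way of a non-limiting illustration, the storage and lookup of a text corpus database might be sketched as follows. The sketch assumes that databases are held in an in-memory dictionary keyed by the identification information of the second terminal, and that the first terminal "matches" the second terminal when the two identifiers are equal; both are assumptions of this sketch, as are the function names.

```python
from typing import Dict, List, Mapping, Optional

# terminal identification -> {target text information -> target corpus information set}
TextCorpusDatabases = Dict[str, Dict[str, List[str]]]


def register_database(
    databases: TextCorpusDatabases,
    second_terminal_id: str,
    mapping: Mapping[str, List[str]],
) -> None:
    """Store the text corpus database received from the second terminal."""
    databases[second_terminal_id] = dict(mapping)


def find_corpus_set(
    databases: TextCorpusDatabases,
    first_terminal_id: str,
    text: str,
) -> Optional[List[str]]:
    """Look up the corpus information set for a terminal and its text information."""
    database = databases.get(first_terminal_id)
    if database is None:
        # No second terminal identification matches the first terminal.
        return None
    return database.get(text)
```

Keeping one database per terminal identification means that a mapping defined through the target application program only takes effect for the terminal it was defined for.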

In some embodiments, after the step of receiving the corpus information acquisition request sent by the first server according to the voice information analysis request, the method may further include:

sending a domain intention acquisition request to a third server based on the corpus information acquisition request, wherein the domain intention acquisition request carries the text information;

receiving a target domain and a target intention that are sent by the third server, correspond to the text information, and are determined according to the domain intention acquisition request;

correspondingly, the determining, based on the corpus information obtaining request, a corpus information set matching the identification information of the first terminal and the text information may include:

determining a preset database matched with the identification information of the first terminal based on the corpus information acquisition request;

and searching the preset database for a corpus information set matched with the target domain and the target intention corresponding to the text information.
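By way of a non-limiting illustration, the lookup by target domain and target intention might be sketched as follows. The `classify` callable stands in for the round trip to the third server, and the preset databases are modelled as in-memory dictionaries; both are assumptions of this sketch, as are all names in it.

```python
from typing import Callable, Dict, List, Tuple

DomainIntent = Tuple[str, str]                    # (target domain, target intention)
PresetDatabase = Dict[DomainIntent, List[str]]    # -> corpus information set


def lookup_by_domain_intent(
    text: str,
    terminal_id: str,
    classify: Callable[[str], DomainIntent],
    preset_databases: Dict[str, PresetDatabase],
) -> List[str]:
    """Find the corpus information set for the text's domain and intention."""
    # Hypothetical stand-in for the domain intention acquisition request.
    domain_intent = classify(text)
    # Select the preset database matched with the first terminal.
    database = preset_databases.get(terminal_id, {})
    # Search it for the corpus information set; empty when nothing matches.
    return database.get(domain_intent, [])
```

Searching by domain and intention rather than by literal text lets differently worded requests resolve to the same corpus information set.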

In some embodiments, the receiving, which is sent by the third server, a target domain and a target intention corresponding to the text information determined according to the domain intention acquisition request may include:

receiving the target domain and the target intention that are sent by the third server and correspond to the text information, the target domain and the target intention being obtained by the third server by inputting the text information into a semantic model according to the domain intention acquisition request; the semantic model is obtained by the third server through training on sample corpus data that is sent by the second terminal and labeled with domains and intentions, and the sample corpus data comprises user-defined corpus data and/or user operation record data.

The present specification provides a second server for instruction execution, which includes a processor and a memory, where at least one instruction or at least one program is stored in the memory and is loaded and executed by the processor to implement the instruction execution method described above.

A specific embodiment of the instruction execution method in this specification is described below with the first terminal as the execution subject. Fig. 11 is a flowchart of the instruction execution method provided in this embodiment of the present application. Specifically, with reference to fig. 11, the method may include:

S1101: in response to voice information, sending a voice information analysis request to a first server; the voice information analysis request carries the voice information and identification information of the first terminal, so that the first server sends a corpus information acquisition request to a second server according to the voice information analysis request, wherein the corpus information acquisition request carries the identification information of the first terminal and text information corresponding to the voice information, and so that the second server determines, based on the corpus information acquisition request, a corpus information set matched with the identification information of the first terminal and the text information and sends the corpus information set to the first server; the corpus information set comprises at least two pieces of corpus information;

S1103: receiving a target instruction sent by the first server, wherein the target instruction is an instruction corresponding to any one piece of corpus information in the corpus information set acquired by the first server;

S1105: executing the target instruction and, after the target instruction is executed, sending a target instruction execution state message to the first server, the target instruction execution state message being taken as a current notification message;

S1107: receiving a residual instruction sent by the first server, the residual instruction being determined by the first server based on the current notification message and matched with residual corpus information when residual corpus information exists in the corpus information set, wherein the residual corpus information is the corpus information in the corpus information set other than the currently matched corpus information;

S1109: executing the residual instruction and, after execution of the residual instruction is finished, sending a residual instruction execution state message to the first server, the residual instruction execution state message being taken as the current notification message again, so that when residual corpus information still exists in the corpus information set, the first server repeats the step of determining a residual instruction matched with the residual corpus information based on the current notification message.
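By way of a non-limiting illustration, the first-terminal side of this exchange might be sketched as a loop that executes one instruction, reports its execution state, and waits for whatever the first server sends next. The sketch assumes dictionary messages with hypothetical `type` and `name` keys, and a synchronous `report_status` callable that returns the server's reply; none of these is prescribed by the specification.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


def run_terminal(
    first_instruction: Message,
    execute: Callable[[Message], None],
    report_status: Callable[[Message], Optional[Message]],
) -> List[str]:
    """Execute instructions one at a time until the server sends no more.

    Returns the names of the executed instructions, in order.
    """
    executed: List[str] = []
    instruction: Optional[Message] = first_instruction
    while instruction is not None and instruction.get("type") == "instruction":
        execute(instruction)
        executed.append(instruction["name"])
        # The execution state message becomes the current notification
        # message; the reply is the next residual instruction, if any.
        status = {"type": "status", "name": instruction["name"], "state": "done"}
        instruction = report_status(status)
    return executed
```

Because the terminal reports only after an instruction has finished, the instructions corresponding to the corpus information set are executed strictly one after another.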

The present specification provides a first terminal for instruction execution, which includes a processor and a memory, where at least one instruction or at least one program is stored in the memory and is loaded and executed by the processor to implement the instruction execution method described above.

In the embodiments of the present disclosure, the memory may be used to store software programs and modules, and the processor executes various functional applications and performs data processing by running the software programs and modules stored in the memory. The memory may mainly comprise a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required by functions, and the like, and the data storage area may store data created according to use of the apparatus, and the like. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide the processor with access to the memory.

The present specification provides a computer readable storage medium, which stores at least one instruction or at least one program, and the at least one instruction or at least one program is loaded by a processor and executed to implement the instruction execution method as described above.

An embodiment of the present application further provides a first server for executing an instruction, and as shown in fig. 12, the first server may include:

a voice information analysis request receiving module 1210, configured to receive a voice information analysis request sent by the first terminal in response to the voice information; the voice information analysis request carries the voice information and identification information of the first terminal;

a corpus information acquisition request sending module 1220, configured to send a corpus information acquisition request to a second server according to the voice information analysis request, where the corpus information acquisition request carries identification information of the first terminal and text information corresponding to the voice information;

a corpus information set receiving module 1230, configured to receive a corpus information set that is sent by the second server, is determined based on the corpus information acquisition request, and matches the identification information of the first terminal and the text information; the corpus information set comprises at least two pieces of corpus information;

a target instruction obtaining module 1240, configured to obtain a target instruction corresponding to any corpus information in the corpus information set;

a target instruction sending module 1250, configured to send the target instruction to the first terminal, so that the first terminal executes the target instruction;

a target instruction execution state message receiving module 1260, configured to receive a target instruction execution state message, which is sent by the first terminal after the target instruction is executed and serves as a current notification message;

a residual instruction determining module 1270, configured to determine a residual instruction matched with the residual corpus information based on the current notification message when the corpus information set has residual corpus information; the residual corpus information is corpus information except the currently matched corpus information in the corpus information set;

a remaining instruction sending module 1280, configured to send the remaining instruction to the first terminal, so that the first terminal executes the remaining instruction;

a remaining instruction execution status message receiving module 1290, configured to receive a remaining instruction execution status message that is sent by the first terminal after execution of the remaining instruction is completed and is taken as the current notification message again; and, when residual corpus information still exists in the corpus information set, to repeat the step of determining a residual instruction matched with the residual corpus information based on the current notification message.

In some embodiments, the first server may further include:

and the information conversion module is used for converting the voice information into text information based on the voice information analysis request.

The server embodiments and the method embodiments are based on the same inventive concept.

An embodiment of the present application further provides an instruction execution second server, as shown in fig. 13, where the second server may include:

a corpus information acquisition request receiving module 1310, configured to receive a corpus information acquisition request sent by a first server according to a voice information analysis request, where the corpus information acquisition request carries identification information of a first terminal and text information corresponding to voice information; the voice information and the identification information of the first terminal are information carried in a voice information analysis request sent by the first terminal to the first server in response to the voice information;

a corpus information set determining module 1320, configured to determine, based on the corpus information acquisition request, a corpus information set that matches the identification information of the first terminal and the text information; the corpus information set comprises at least two corpus information;

a corpus information set sending module 1330, configured to send the corpus information set to the first server, so that the first server acquires a target instruction corresponding to any one piece of corpus information in the corpus information set and sends the target instruction to the first terminal; the first terminal executes the target instruction and, after the target instruction is executed, sends a target instruction execution state message to the first server, which is taken as a current notification message; when residual corpus information exists in the corpus information set, the first server determines a residual instruction matched with the residual corpus information based on the current notification message, the residual corpus information being the corpus information in the corpus information set other than the currently matched corpus information, and sends the residual instruction to the first terminal; the first terminal executes the residual instruction and, after execution of the residual instruction is finished, sends a residual instruction execution state message to the first server, which is taken as the current notification message again; and when residual corpus information still exists in the corpus information set, the first server repeats the step of determining a residual instruction matched with the residual corpus information based on the current notification message.

In some embodiments, the corpus information set determination module may include:

a current scene information obtaining unit, configured to obtain current scene information based on the corpus information obtaining request;

the target information determining unit is used for determining first target information in the text information and second target information in the current scene information;

a target information matching unit, configured to determine whether the first target information matches the second target information;

and the corpus information set determining unit is used for determining a corpus information set matched with the identification information of the first terminal and the text information if the first target information is matched with the second target information.

In some embodiments, if the first target information does not match the second target information, the second server may further include:

the target information replacing module is used for replacing the first target information in the text information with the second target information to obtain replaced text information;

and the corpus information set determining module is used for determining a corpus information set matched with the identification information of the first terminal and the replaced text information.

In some embodiments, the second server may further include:

the text corpus database receiving module is used for receiving the identification information of the second terminal, the identification information of the target application program and a text corpus database constructed according to the mapping relation between the target text information and the target corpus information set, which are sent by the second terminal; the mapping relation between the target text information and the target corpus information set is determined by the second terminal in response to the operation in the target application program;

the identification information determining module of the first terminal is used for determining the identification information of the first terminal according to the identification information of the target application program;

the judging module is used for judging whether the identification information of the first terminal is matched with the identification information of the second terminal;

correspondingly, the corpus information set determining module may include:

and the first corpus information set searching unit is used for searching a corpus information set matched with the identification information of the first terminal and the text information from a text corpus database corresponding to the identification information of the first terminal based on the corpus information acquisition request if the identification information of the first terminal is matched with the identification information of the second terminal.

In some embodiments, the second server may further include:

a domain intention acquisition request sending module, configured to send a domain intention acquisition request to a third server based on the corpus information acquisition request, where the domain intention acquisition request carries the text information;

a target domain and target intention receiving module, configured to receive a target domain and a target intention that are sent by the third server, correspond to the text information, and are determined according to the domain intention acquisition request;

correspondingly, the corpus information set determining module may include:

a preset database determining unit, configured to determine, based on the corpus information acquisition request, a preset database that matches the identification information of the first terminal;

and the second corpus information set searching unit is used for searching the preset database for a corpus information set matched with the target domain and the target intention corresponding to the text information.

An embodiment of the present application further provides an instruction execution first terminal, as shown in fig. 14, where the first terminal may include:

a voice information analysis request sending module 1410, configured to send a voice information analysis request to the first server in response to voice information; the voice information analysis request carries the voice information and identification information of the first terminal, so that the first server sends a corpus information acquisition request to a second server according to the voice information analysis request, wherein the corpus information acquisition request carries the identification information of the first terminal and text information corresponding to the voice information, and so that the second server determines, based on the corpus information acquisition request, a corpus information set matched with the identification information of the first terminal and the text information and sends the corpus information set to the first server; the corpus information set comprises at least two pieces of corpus information;

a target instruction receiving module 1420, configured to receive a target instruction sent by the first server, where the target instruction is an instruction corresponding to any corpus information in the corpus information set acquired by the first server;

a target instruction execution status message sending module 1430, configured to execute the target instruction, and send a target instruction execution status message to the first server after the target instruction is executed; taking the target instruction execution state message as a current notification message;

a residual instruction receiving module 1440, configured to receive a residual instruction sent by the first server, the residual instruction being determined by the first server based on the current notification message and matched with residual corpus information when residual corpus information exists in the corpus information set, wherein the residual corpus information is the corpus information in the corpus information set other than the currently matched corpus information;

a remaining instruction execution status message sending module 1450, configured to execute the remaining instruction and, after execution of the remaining instruction is completed, send a remaining instruction execution status message to the first server, the remaining instruction execution status message being taken as the current notification message again, so that when residual corpus information still exists in the corpus information set, the first server repeats the step of determining a residual instruction matched with the residual corpus information based on the current notification message.

The terminal embodiments and the method embodiments are based on the same inventive concept.

The application also provides an instruction execution system, which comprises a first terminal, a first server and a second server;

As can be seen from the above embodiments of the instruction execution method, apparatus, server, terminal, storage medium, or system provided by the present application, in the embodiments of the present specification, a voice information analysis request, sent by a first terminal in response to voice information, is received; a matched corpus information set comprising at least two pieces of corpus information is acquired from a second server according to the voice information analysis request; a target instruction corresponding to any one piece of corpus information in the corpus information set is then sent to the first terminal, so that the first terminal executes the target instruction; and after the first terminal finishes executing the target instruction, an instruction corresponding to the next piece of corpus information is sent to the first terminal, so that the first terminal sequentially executes the plurality of instructions corresponding to the corpus information set. With this technical solution, a user can use a plurality of skill services seamlessly through a single voice input, which improves the efficiency with which the user obtains information.

It should be noted that the order of the embodiments of the present application is for description only and does not indicate the relative merits of the embodiments. Specific embodiments of the present specification have been described above; other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The embodiments in the present specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the embodiments of the server, the terminal, the system, and the storage medium are substantially similar to the method embodiments, their description is relatively brief, and for relevant points reference may be made to the corresponding parts of the description of the method embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer storage medium, and the above storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. An instruction execution method, the method comprising:

sending a corpus information acquisition request to a second server according to the voice information analysis request, wherein the corpus information acquisition request carries the identification information of the first terminal and the text information corresponding to the voice information; enabling the second server to acquire current scene information based on the corpus information acquisition request; determining first target information in the text information and second target information in the current scene information; if the first target information is matched with the second target information, determining a corpus information set matched with the identification information of the first terminal and the text information; the corpus information set comprises at least two corpus information;

receiving the corpus information set sent by the second server;

receiving a remaining instruction execution state message which is sent by the first terminal after the execution of the remaining instruction is finished and is used as the current notification message again; when residual corpus information still exists in the corpus information set, repeating the step of determining a residual instruction matched with the residual corpus information based on the current notification message.

2. The method according to claim 1, wherein before the step of sending the corpus information acquisition request to the second server according to the voice information parsing request, the method further comprises:

3. An instruction execution method, the method comprising:

if the first target information is matched with the second target information, determining a corpus information set matched with the identification information of the first terminal and the text information; the corpus information set comprises at least two corpus information;

4. The method of claim 3, wherein if the first target information does not match the second target information, the method further comprises:

5. The method of claim 3, further comprising:

if the identification information of the first terminal matches the identification information of the second terminal, searching, based on the corpus information acquisition request, a text corpus database corresponding to the identification information of the first terminal for a corpus information set matching the identification information of the first terminal and the text information;

after the step of receiving the corpus information acquisition request sent by the first server according to the voice information analysis request, the method further includes:

6. The method according to claim 5, wherein receiving the target domain and the target intention, which are sent by the third server and determined for the text information according to the domain intention acquisition request, comprises:

7. An instruction execution method, the method comprising:

in response to the voice information, sending a voice information analysis request to a first server, wherein the voice information analysis request carries the voice information and identification information of the first terminal, so that the first server sends a corpus information acquisition request to a second server according to the voice information analysis request, wherein the corpus information acquisition request carries the identification information of the first terminal and text information corresponding to the voice information, and so that the second server acquires current scene information based on the corpus information acquisition request, determines first target information in the text information and second target information in the current scene information, determines, if the first target information matches the second target information, a corpus information set matching the identification information of the first terminal and the text information, the corpus information set comprising at least two pieces of corpus information, and sends the corpus information set to the first server;

8. An instruction execution system, the system comprising a first terminal, a first server and a second server;

the second server is configured to: acquire current scene information based on the corpus information acquisition request; determine first target information in the text information and second target information in the current scene information; if the first target information matches the second target information, determine a corpus information set matching the identification information of the first terminal and the text information, the corpus information set comprising at least two pieces of corpus information; and send the corpus information set to the first server.

9. A computer storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the instruction execution method of any of claims 1-2, any of claims 3-6, or claim 7.

10. An instruction execution server, the server comprising a processor and a memory, the memory having stored therein at least one instruction or at least one program, the at least one instruction or at least one program being loaded and executed by the processor to implement the instruction execution method of any of claims 1-2, any of claims 3-6, or claim 7.

11. An instruction execution terminal, the terminal comprising a processor and a memory, the memory having stored therein at least one instruction or at least one program, the at least one instruction or at least one program being loaded and executed by the processor to implement the instruction execution method of any of claims 1-2, any of claims 3-6, or claim 7.