CN115705226A - Method, device and equipment for executing voice scene task and storage medium - Google Patents


Info

Publication number
CN115705226A
CN115705226A (application number CN202110927751.8A)
Authority
CN
China
Prior art keywords
scene
domain
determining
current
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110927751.8A
Other languages
Chinese (zh)
Inventor
丁磊 (Ding Lei)
蒋瑞 (Jiang Rui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Human Horizons Shanghai Internet Technology Co Ltd
Original Assignee
Human Horizons Shanghai Internet Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Human Horizons Shanghai Internet Technology Co Ltd filed Critical Human Horizons Shanghai Internet Technology Co Ltd
Priority to CN202110927751.8A priority Critical patent/CN115705226A/en
Publication of CN115705226A publication Critical patent/CN115705226A/en
Pending legal-status Critical Current

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the application provides a method, an apparatus, a device, and a storage medium for executing a voice scene task. The method includes: determining a target scene domain from a plurality of preset scene domains according to a semantic request of a current user, wherein each scene domain includes a plurality of preset execution domains and each execution domain includes a plurality of tasks to be executed; determining a target execution domain from the plurality of preset execution domains in the target scene domain according to the semantic request; and executing the tasks to be executed in the target execution domain. This technical scheme facilitates rapid iteration of voice conversation scenes.

Description

Method, device and equipment for executing voice scene task and storage medium
Technical Field
The present application relates to the field of dialog engine technologies, and in particular, to a method, an apparatus, a device, and a storage medium for executing a voice scene task.
Background
In the related art, dialog logic is hard-coded into the system to implement vehicle-end voice conversation scenes. For example, to support voice-controlled navigation, the execution logic linking the voice conversation and the vehicle components is written into the system. If the execution logic changes, the entire system must be updated; if a new dialog scene is to be added, new dialog logic must be arranged, written into the system, and the system updated again. This approach is inflexible and cannot be iterated quickly.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a storage medium for executing a voice scene task, which are used for solving the problems in the related technology, and the technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a method for executing a voice scene task, including:
determining a target scene domain from a plurality of preset scene domains according to a semantic request of a current user, wherein the scene domains comprise a plurality of preset execution domains, and the execution domains comprise a plurality of tasks to be executed;
determining a target execution domain from a plurality of preset execution domains in a target scene domain according to the semantic request;
and executing the task to be executed in the target execution domain.
In one embodiment, determining a target scene domain from a plurality of preset scene domains according to a semantic request of a current user includes:
determining the print of the current user according to the semantic request;
searching the current dialogue context from the global dialogue context of the current user according to the print;
under the condition that the current conversation context is searched, determining a target scene according to a scene corresponding to the current conversation context;
and determining a target scene domain according to the target scene.
In one embodiment, determining a target scene domain from a plurality of preset scene domains according to a semantic request of a current user includes:
determining the print of the current user according to the semantic request;
searching the current dialogue context from the global dialogue context of the current user according to the print;
under the condition that the current conversation context is not searched, searching a scene corresponding to the semantic request from a scene name directory tree to determine a target scene;
and determining a target scene domain according to the target scene.
In one embodiment, determining a target scene according to a scene corresponding to a current dialog context includes:
judging whether the current conversation context is reused;
and under the condition of multiplexing the current conversation context, determining a target scene according to the scene corresponding to the current conversation context.
In one embodiment, determining a target scene according to a scene corresponding to a current dialog context includes:
judging whether the current conversation context is reused;
and searching a scene corresponding to the semantic request from the scene name directory tree to determine a target scene without multiplexing the current conversation context.
In one embodiment, determining a target execution domain from a plurality of preset execution domains in a target scene domain according to a semantic request includes:
determining the intention of the current user according to the semantic request;
and determining a target execution domain from a plurality of preset execution domains in the target scene domain according to the intention.
In one embodiment, prior to searching the current dialog context from the global dialog context of the current user based on the print, further comprising:
and determining the global conversation context of the current user from the global conversation context set according to the identity of the current user, wherein the global conversation context set comprises the global conversation contexts of a plurality of users.
In a second aspect, an embodiment of the present application provides an apparatus for executing a voice scene task, including:
the target scene domain determining module is used for determining a target scene domain from a plurality of preset scene domains according to a semantic request of a current user, wherein the scene domains comprise a plurality of preset execution domains, and the execution domains comprise a plurality of tasks to be executed;
the target execution domain determining module is used for determining a target execution domain from a plurality of preset execution domains in a target scene domain according to the semantic request;
and the task execution module is used for executing the task to be executed in the target execution domain.
In one embodiment, the target scene domain determining module is specifically configured to:
determining the print of the current user according to the semantic request;
searching the current dialogue context from the global dialogue context of the current user according to the print;
under the condition that the current conversation context is searched, determining a target scene according to a scene corresponding to the current conversation context;
and determining a target scene domain according to the target scene.
In one embodiment, the target scene domain determining module is specifically configured to:
determining the print of the current user according to the semantic request;
searching the current dialogue context from the global dialogue context of the current user according to the print;
under the condition that the current conversation context is not searched, searching a scene corresponding to the semantic request from a scene name directory tree to determine a target scene;
and determining a target scene domain according to the target scene.
In one embodiment, the target scene domain determining module is specifically configured to:
judging whether the current conversation context is reused;
and under the condition of multiplexing the current conversation context, determining a target scene according to the scene corresponding to the current conversation context.
In one embodiment, the target scene domain determining module is specifically configured to:
judging whether the current conversation context is reused or not;
and searching a scene corresponding to the semantic request from the scene name directory tree to determine a target scene without multiplexing the current conversation context.
In one embodiment, the target execution domain determining module is specifically configured to:
determining the intention of the current user according to the semantic request;
and determining a target execution domain from a plurality of preset execution domains in the target scene domain according to the intention.
In one embodiment, the apparatus further comprises:
and the global conversation context determining module is used for determining the global conversation context of the current user from the global conversation context set according to the identity of the current user before searching the current conversation context from the global conversation context of the current user according to the print, wherein the global conversation context set comprises the global conversation contexts of a plurality of users.
In a third aspect, an embodiment of the present application further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above method.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the above method.
The advantages or beneficial effects of the above technical solution at least include: the variable voice conversation scenes are naturally isolated from the invariant execution engine, and iteration of the dialog logic and of the execution engine do not affect each other, which facilitates rapid iteration of voice conversation scenes.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present application will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
FIG. 1 is a flowchart of a method for performing a speech scene task according to an embodiment of the present application;
FIG. 2 is an exemplary diagram of performing domain generation according to an embodiment of the present application;
FIG. 3 is an exemplary diagram of a scene domain according to an embodiment of the present application;
fig. 4 is a flowchart of a target scene domain determining method according to an embodiment of the present application;
FIG. 5 is a diagram illustrating an exemplary application of a method for performing a speech scene task according to an embodiment of the present application;
FIG. 6 is a diagram of another application example of a method for executing a speech scene task according to an embodiment of the present application;
FIG. 7 is a block diagram of an apparatus for performing a speech scene task according to an embodiment of the present disclosure;
fig. 8 is a schematic diagram of an electronic device according to an embodiment of the application.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present application. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
An embodiment of the present application provides a method for executing a voice scene task, as shown in fig. 1, the method includes:
step S101: determining a target scene domain from a plurality of preset scene domains according to a semantic request of a current user, wherein the scene domains comprise a plurality of preset execution domains, and the execution domains comprise a plurality of tasks to be executed;
step S102: determining a target execution domain from a plurality of preset execution domains in a target scene domain according to the semantic request;
step S103: and executing the task to be executed in the target execution domain.
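The dispatch flow of steps S101 to S103 can be sketched as follows. This is a minimal illustration only; the data layout and every name in it (`execute_scene_task`, the dictionary keys, the lambda task) are assumptions for exposition, not the embodiment's actual implementation.

```python
# Hypothetical sketch of steps S101-S103: scene domain -> execution domain -> tasks.
def execute_scene_task(semantic_request, scene_domains):
    """Dispatch a semantic request through the three-step flow of FIG. 1."""
    # S101: determine the target scene domain from the preset scene domains.
    target_scene = scene_domains[semantic_request["scene"]]
    # S102: determine the target execution domain inside it from the request.
    target_exec = target_scene["execution_domains"][semantic_request["intent"]]
    # S103: execute the tasks to be executed in the target execution domain.
    return [task(semantic_request) for task in target_exec["tasks"]]

# Toy data: one navigation scene domain with one execution domain and one task.
scene_domains = {
    "navigation": {
        "execution_domains": {
            "set_destination": {
                "tasks": [lambda req: f"navigating to {req['slot']}"],
            },
        },
    },
}

result = execute_scene_task(
    {"scene": "navigation", "intent": "set_destination", "slot": "home"},
    scene_domains,
)
```

The point of the sketch is the strict two-level lookup: the scene domain is chosen first, and only its own execution domains are candidates for the second step.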
In the embodiment of the application, the scene can be a vehicle-end conversation scene, such as a conversation in a vehicle control scene, a conversation in a driving scene, a conversation in a video entertainment scene, a conversation in a navigation scene, and the like. The vehicle control is understood to mean the control of vehicle body components, such as opening a vehicle door, opening a vehicle window, opening a vehicle lamp, adjusting a seat, etc. The driving scenario may be understood as a vehicle driving-related scenario, such as adjusting a suspension, controlling an Electronic Parking Brake (EPB), shifting gears, and the like.
A scene domain can be preset for each scene. Illustratively, a scene domain includes a plurality of execution domains and a plurality of common keys; an execution domain is generated by logically arranging at least one task to be executed, and a task to be executed is generated by performing modality arrangement on modal data in at least one scope. All modal data in the same scope have the same or associated data types. The details are explained below.
(1) Scope of action
The scope may include a parameter domain (Param), a Tag domain (Tag), a Signal domain (Signal), a Service domain (Service), a Request domain (Request), a Mask domain (Mask), a Response domain (Response), and a Sandbox domain (Sandbox), among others.
Illustratively, the parameter domain may be used to store parameters passed across blocks at runtime by the scene domain, the execution domain, the tasks to be executed, and so on. The tag domain may be used to carry a data-centric big-data user portrait, with tags such as age, gender, hobbies, and behavior habits; it is a user tag pool formed from multiple tags. There may be multiple tag domains, one per user, each caching that user's big-data portrait.
The signal domain may be used to carry modal data of vehicle sensing devices, Vehicle-to-Everything (V2X) devices, wearable devices, and the like; it is a signal pool that can be orchestrated, observed (monitored), and scheduled. There may be multiple signal domains, one per device, each caching that device's signal modal data.
Contextual dialogs and interactive scene scripts need to call Application Programming Interfaces (APIs) of applications within the system and services outside the system; the service domain encapsulates the calling details of these services and arranges the data they return. In-system applications may include navigation, multimedia, radio (FM), accounts, the digital car, and so on; off-system services may serve the vehicle enterprise's scenes, such as the geek world, or third parties, such as the Artificial Intelligence (AI) service irobot.
The request domain is used in passive dialog; it is generated from a single semantic input of the user. The modal data of the request domain may be encoded by the encoder into data arranged in the hidden domain. The hidden domain (the mask domain above) carries the dialog context corresponding to the current semantic input and is reused or regenerated as needed by the dialog and interaction scenes; hidden domains are bound to execution domains, each execution domain being bound to one hidden domain that carries its dialog context. The response domain may be used to carry the response results of services such as active dialog, passive dialog, interactive dialog, timeline, and listener, which the decoder produces by decoding the modal data in the hidden domain bound to the execution domain. The sandbox domain may carry the global dialog context of a device or a user; it is a super context that isolates the programmable data of different devices or different users.
It is understood that during the execution of the task in the execution domain, data is generated, which is stored as a dialog context in the hidden domain, and the hidden domain is bound to the execution domain to record the dialog. The sandbox domain stores the global dialog context, including all the dialog contexts.
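The hidden-domain/sandbox-domain relationship just described can be illustrated with a minimal data-structure sketch. All class and method names here (`HiddenDomain`, `SandboxDomain`, `bind`, `lookup`) are assumptions for exposition, not the patent's implementation; only the relationship (one hidden domain per dialog context, the sandbox domain aggregating them per user, keyed by the print) follows the text.

```python
class HiddenDomain:
    """Carries the dialog context of one execution domain."""
    def __init__(self):
        self.context = {}

class SandboxDomain:
    """Global dialog context of one user: maps a print to its bound hidden domain."""
    def __init__(self):
        self._by_print = {}

    def bind(self, print_id, hidden):
        # The engine binds each print one-to-one to a hidden domain.
        self._by_print[print_id] = hidden

    def lookup(self, print_id):
        # Returns None when no current dialog context exists for this print.
        return self._by_print.get(print_id)

sandbox = SandboxDomain()
hd = HiddenDomain()
hd.context["last_intent"] = "set_destination"
sandbox.bind("print-001", hd)
```

A failed `lookup` corresponds to the "current dialog context not searched" branch of the method, which falls back to the scene name directory tree.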
(2) Execution Domain
The task to be executed is generated by performing modality arrangement on modal data in at least one scope. Illustratively, as shown in FIG. 2, modality arrangement decides how the modal data in the scopes are operated on; operators may include comparison operations, containment operations, assignment operations, and the like, and can be called when generating tasks to be executed. In modality arrangement (operation), a symbol for calling the modal data of each scope can be preset: the parameter domain is called through (:), the tag domain through (|), the signal domain through (-), the service domain through (@), the request domain through (%), the hidden domain through ($), the response domain through (^), and the sandbox domain through (#).
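The sigil-to-scope convention above can be sketched as a small resolver. The symbols follow the text; the scope representation (plain dictionaries) and the `resolve` function are assumptions for illustration only.

```python
# Map each preset calling symbol to its scope, per the convention above.
SCOPE_SIGILS = {
    ":": "param",     # parameter domain
    "|": "tag",       # tag domain
    "-": "signal",    # signal domain
    "@": "service",   # service domain
    "%": "request",   # request domain
    "$": "hidden",    # hidden (mask) domain
    "^": "response",  # response domain
    "#": "sandbox",   # sandbox domain
}

def resolve(expr, scopes):
    """Resolve an expression like '$last_city' to scopes['hidden']['last_city']."""
    sigil, key = expr[0], expr[1:]
    return scopes[SCOPE_SIGILS[sigil]][key]

scopes = {"hidden": {"last_city": "Shanghai"}, "param": {"volume": 5}}
```

Under this convention, a task's condition or assignment expression can reference data from any scope with a one-character prefix, which is what makes the modality arrangement compact.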
The task to be executed is a basic dialog or service scene unit; it does not support independent execution and is the smallest unit composing an execution domain. A task to be executed may be a standard task, i.e., a task composed of a condition group and an execution group; a logical task, i.e., a logical-block task composed of logical conditions such as if, elseif, else, and foreach; or a simple task, i.e., a task that quickly looks up a value by a key and sets the state to END or RETURN.
Therefore, a service developer can determine a task to be executed according to a service requirement, call modal data in a corresponding action domain according to the task requirement of the task to be executed, and perform operation to generate the task to be executed. Further, an execution domain may be generated by logically arranging one or more tasks to be executed.
Illustratively, as shown in FIG. 2, the logical orchestration may include referencing (using) the task to be performed or embedding (embed) the task to be performed in the execution domain. Execution domains may include simple domains, complex domains, and aggregate domains, depending on the logical arrangement.
The simple domain includes one or more first tasks to be executed, where a first task to be executed implements a single-turn dialog service; that is, the simple domain carries an executable single-turn dialog, or a simple active or passive service scene. The complex domain includes one or more second tasks to be executed, where a second task to be executed implements a multi-turn dialog service; that is, the complex domain carries executable multi-turn contextual dialogs or complex interactive service scenes. The aggregation domain includes one or more first tasks to be executed and one or more second tasks to be executed; that is, the aggregation domain carries executable multi-turn, cross-domain, aggregated scenario-type dialogs, or an interactive service scene aggregated from multiple single, isolated, incoherent capability points, service segments, scenario segments, and the like.
Therefore, the execution domain may schedule one task to be executed or schedule a plurality of tasks to be executed, that is, n is a positive integer greater than or equal to 1 in fig. 2.
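The simple/complex/aggregation split can be sketched as a classification over the kinds of tasks an execution domain orchestrates. The tagged-tuple representation and function name are illustrative assumptions; the three-way rule itself follows the text.

```python
def classify_execution_domain(tasks):
    """Classify an execution domain by its tasks.

    tasks: list of (kind, payload) tuples, where kind is
    'single' (first task, single-turn dialog) or
    'multi' (second task, multi-turn dialog).
    """
    kinds = {kind for kind, _ in tasks}
    if kinds == {"single"}:
        return "simple"      # only single-turn dialog tasks
    if kinds == {"multi"}:
        return "complex"     # only multi-turn contextual dialog tasks
    return "aggregate"       # both kinds: cross-domain aggregated scene
```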
(3) Scene domain
As shown in fig. 3, the scenario domain includes a plurality of execution domains and a plurality of common keys, where the execution domains may be simple domains and/or complex domains and/or aggregate domains.
In one embodiment, the common keys include a third task to be executed, which is placed among the common keys according to how many times the corresponding scene domain reuses it. The third task to be executed may be a first task to be executed or a second task to be executed. For example, according to the service requirements of a scene domain, if the number of times a first task to be executed is reused exceeds a preset number, that task may be placed directly in the scene domain as one of its common keys. The "task (Atomic)" shown in fig. 3 is the third task to be executed.
In one embodiment, the common key may include an atomic operation module for encapsulating the vehicle signal invocation operation. The common key may further include at least one of a time axis module, a natural language generation module, and a guidance module.
By calling the time axis module, tasks with a time axis can be generated or executed. By calling the Natural Language Generation (NLG) module, user-facing dialog results can be generated. By calling the guidance module, a user-facing guidance interface can be generated, such as "next" in the navigation setup process.
The common keys may include a service module, through which APIs of in-system applications and off-system services can be called. The common keys may include a signal module, which listens for signal modal data that is highly relevant to the scene domain or whose frequency of use exceeds a preset frequency.
The common keys may include a jumper, in which the re-entry policy of the scene domain it belongs to is configured, including a jump-out policy and a jump-in policy. Based on the jump-out policy, it can be determined whether the scene domain can be jumped out of, i.e., whether its task execution process is stopped. Based on the jump-in policy, it can be determined whether the scene domain can be entered.
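A jumper can be sketched as follows. The class and its parameters (`allow_jump_out`, `allowed_entry_scenes`) are hypothetical names for the two policies the text describes, not the patent's configuration format.

```python
class Jumper:
    """Re-entry policy of one scene domain: jump-out and jump-in rules."""
    def __init__(self, allow_jump_out=True, allowed_entry_scenes=None):
        self.allow_jump_out = allow_jump_out
        self.allowed_entry_scenes = allowed_entry_scenes or set()

    def can_jump_out(self):
        # Jump-out policy: may the scene domain's task execution be stopped?
        return self.allow_jump_out

    def can_jump_in(self, from_scene):
        # Jump-in policy: may this scene domain be entered from from_scene?
        return from_scene in self.allowed_entry_scenes

# Example: a scene that must not be interrupted and can only be entered
# from the vehicle-control scene (assumed scene names).
nav_jumper = Jumper(allow_jump_out=False,
                    allowed_entry_scenes={"vehicle_control"})
```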
In addition, the common keys may also include configuration, templates, macros, and the like. Each scene domain also corresponds to an interpretable script, generated by logically arranging the execution domains and common keys of that scene domain.
That is, a scene domain may be understood as a logical region that is composed of a class of elements with certain dependencies, such as configuration, template, macro, semaphore, timeline, codec, atomic operation, task, execution domain, etc., that are loosely coupled together.
For example, in step S101, acquiring the semantic request of the current user may include: and receiving the voice of the current user to obtain the semantic request of the current user. The semantic request may include information such as a field, an intention, a word slot, a print (dookie), and the like, and further, based on the information, a target scene domain and a target execution domain may be determined, and one or more tasks to be executed in the target execution domain may be executed, that is, a corresponding dialog scene may be implemented.
In one embodiment, as shown in fig. 4, step S101 may include:
step S401: determining the print of the current user according to the semantic request;
step S402: searching the current dialogue context from the global dialogue context of the current user according to the print;
step S403: under the condition that the current conversation context is searched, determining a target scene according to a scene corresponding to the current conversation context;
step S404: and determining a target scene domain according to the target scene.
Based on the above descriptions of the hidden domain and the sandbox domain, the global dialog context of the current user is stored in that user's sandbox domain, and each dialog corresponds to a dialog context stored in a corresponding hidden domain. The print is generated by the engine and has a one-to-one binding with a hidden domain; it is thus the engine's key for finding the hidden domain, and the hidden domain corresponding to the print can be searched for in the current user's sandbox domain according to the print. If the corresponding hidden domain is found, the target scene is determined from the scene corresponding to that hidden domain, and the target scene corresponds to the target scene domain.
In one embodiment, before step S402, the method may include: and according to the identity of the current user, determining the global conversation context of the current user from a global conversation context set (a database of a scope holder), wherein the global conversation context set comprises the global conversation contexts of a plurality of users.
Illustratively, in the database of the scope holder, the global conversation context of each user and each device is included, and the global conversation context of the current user can be obtained according to the identity of the current user.
Furthermore, the database of the scope holder also includes data structures such as an executable scenario script (schema), a sandbox domain, a signal domain, a tag domain, a registered listener, a request session and the like, and the database is used by the engine.
In one embodiment, as shown in fig. 4, step S101 may further include:
step S405: and in the case that the current conversation context is not searched, searching a scene corresponding to the semantic request from the scene name directory tree to determine a target scene.
Illustratively, in the case that the corresponding hidden field is not searched, according to the field, the intention, the word slot and the print in the semantic request, the scene corresponding to the semantic request is searched from the scene name directory tree as the target scene, and then the target scene field is determined.
In one embodiment, in step S403, determining a target scene according to a scene corresponding to the current dialog context may include: judging whether the current conversation context is reused; under the condition of multiplexing the current conversation context, determining a target scene according to a scene corresponding to the current conversation context; and searching a scene corresponding to the semantic request from the scene name directory tree to determine a target scene without multiplexing the current conversation context.
Illustratively, when the current conversation context is searched, namely the corresponding hidden domain is searched, the logic judges whether the hidden domain is multiplexed or not, if the hidden domain is multiplexed, the target scene is directly obtained according to the hidden domain, and if the hidden domain is not multiplexed, the target scene corresponding to the semantic request is searched from the scene name directory tree according to the field, the intention, the word slot and the print in the semantic request.
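The scene-location logic of steps S401 to S405, including the reuse decision just described, can be sketched in one function. Every name here (`locate_target_scene`, the dict-based sandbox and directory tree, the `reuse_context` flag) is an illustrative assumption; only the branching follows the text.

```python
def locate_target_scene(print_id, sandbox, directory_tree, request_scene,
                        reuse_context=True):
    """S401-S405: find the target scene for a semantic request."""
    hidden = sandbox.get(print_id)            # S402: search the hidden domain by print
    if hidden is not None and reuse_context:  # S403: context found and reused
        return hidden["scene"]                # S404: target scene from the context
    # S405 (or context found but not reused): fall back to the
    # scene name directory tree, keyed here by the request's scene name.
    return directory_tree[request_scene]

sandbox = {"print-7": {"scene": "navigation"}}
tree = {"music": "media_scene"}
```

Note that "found but not reused" and "not found" converge on the same directory-tree fallback, which is why the text describes the two embodiments symmetrically.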
In one embodiment, step S102 may include: determining the intention of the current user according to the semantic request; and determining a target execution domain from a plurality of preset execution domains in the target scene domain according to the intention.
In one application example, the method may be used to perform standard tasks, as shown in FIG. 5. Specifically, the scene execution module builds a request field (build request) according to the semantic request of the user, calls a passive engine to create a new session (dialog), and binds the session. Then, scene localization (determining a target scene domain) and version localization are carried out. Specifically, the target scene domain may be determined by the method of steps S401 to S405. The version positioning may include performing version orientation based on a mapping relationship between a name of a target scene and an identity of a current user, and device information such as a vehicle type, a vehicle series, a system version, and the like. Next, the encoder encodes data, i.e., data related to the current dialog in the request domain, the hidden domain, and the parameter domain, for execution in the execution domain. In the execution of the execution domain, the execution domain localization (determining the target execution domain) is performed first, and the execution domain localization may be performed with reference to the method of step S102.
Then, task iteration and execution are performed on the tasks to be executed in the target execution domain, including sequential execution/return, ending, and jumping; specifically, this is done by the task execution factory. In this application example, the task to be executed is a standard task. The execution condition group can perform condition determination and macro execution according to the existing IF expressions (nesting of no more than two levels can be supported), where during macro execution the macro is looked up by reference (using).
When the execution result of the condition determination of the execution condition group is true (true), the execution group executes the standard task, such as executing a service (calling an API of an application in the system or an off-system service), executing evaluation according to the evaluation expression, and executing an action according to the condition. Performing actions includes performing assignments, performing NLGs, performing guidance, and the like. The NLG public key can be searched in a reference mode, NLG is executed, a dialogue result is returned to the user, the guidance public key can be searched in the reference mode, guidance is executed, and a guidance result is returned to the user.
Here, binding a session can be understood as follows: a semantic request causes the engine to generate a session; the session binds references to the executable scene script (schema), the sandbox domain, the signal domain, and the label domain; it binds hidden-domain references multiplexed or generated by the engine; and it binds the request domain, response domain, parameter domain, and service domain dynamically generated by the engine. The session is bound to the identity rid of the scope holder through a thread-local variable (ThreadLocal) mechanism, so that successors of the scope holder can quickly and conveniently look up the session from the global context via the rid. The engine maintains the relation between the rid and the session through the ThreadLocal mechanism.
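The rid-to-session binding can be sketched with the Python analogue of a ThreadLocal variable, `threading.local` (all class and function names below are invented for illustration, not the engine's API):

```python
# Hypothetical sketch of session binding: the engine keeps a
# thread-local mapping from the holder identity (rid) to the session,
# so later stages running on the same thread can retrieve the session
# quickly from the "global context" without passing it explicitly.
import threading

_context = threading.local()

class Session:
    def __init__(self, rid):
        self.rid = rid
        self.domains = {}      # request/response/parameter/service domains
        self.hidden_refs = []  # hidden-domain references

def bind_session(session):
    if not hasattr(_context, "sessions"):
        _context.sessions = {}
    _context.sessions[session.rid] = session

def lookup_session(rid):
    return getattr(_context, "sessions", {}).get(rid)

def unbind_session(rid):
    getattr(_context, "sessions", {}).pop(rid, None)

s = Session(rid="user-42")
bind_session(s)
assert lookup_session("user-42") is s  # successors find the session by rid
unbind_session("user-42")
assert lookup_session("user-42") is None
```

Because the mapping lives in thread-local storage, sessions bound on one request-handling thread are invisible to other threads, which mirrors the isolation a ThreadLocal-based engine would provide.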
After the task is executed, the data produced in the process is decoded and called back to the listener for caching, after which the session is unbound.
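The whole standard-task flow described above can be summarized in one sketch; every method name here is invented (the patent publishes no API), and a stub engine stands in so the flow is runnable:

```python
# Hypothetical end-to-end sketch of the standard-task flow:
# build request -> create and bind session -> scene/version
# localization -> encode -> execution-domain localization -> run
# tasks -> decode -> callback to listener -> unbind session.
class StubEngine:
    """Minimal stand-in engine so the flow below is runnable."""
    def __init__(self):
        self.bound = None
        self.log = []
    def build_request(self, req):      self.log.append("build"); return {"text": req}
    def create_session(self, request): self.log.append("create"); return {"request": request}
    def bind(self, s):    self.bound = s;    self.log.append("bind")
    def unbind(self, s):  self.bound = None; self.log.append("unbind")
    def locate_scene(self, s):          self.log.append("scene"); return "navigation"
    def locate_version(self, scene, s): self.log.append("version")
    def encode(self, s):                self.log.append("encode")
    def locate_execution_domain(self, scene, s):
        self.log.append("exec-domain"); return ["task"]
    def execute_tasks(self, ex, s): self.log.append("run"); return "ok"
    def decode(self, s):            self.log.append("decode")
    def callback_listener(self, s): self.log.append("callback")

def run_standard_task(engine, semantic_request):
    request_domain = engine.build_request(semantic_request)
    session = engine.create_session(request_domain)
    engine.bind(session)
    try:
        scene = engine.locate_scene(session)
        engine.locate_version(scene, session)
        engine.encode(session)  # request/hidden/parameter domain data
        execution = engine.locate_execution_domain(scene, session)
        result = engine.execute_tasks(execution, session)
        engine.decode(session)
        engine.callback_listener(session)  # cache via the listener
        return result
    finally:
        engine.unbind(session)  # the session is always unbound at the end

e = StubEngine()
print(run_standard_task(e, "navigate home"))  # ok
```

Placing the unbind in a `finally` block reflects the requirement that the session be released even if a task fails partway through.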
In yet another application example, the method may be used to execute simple tasks, as shown in FIG. 6. Specifically, the scene execution module builds a request domain (build request) from the user's semantic request, invokes the passive engine to create a new session (dialog), and binds the session. Then, scene localization (determining the target scene domain) and version localization are performed. Specifically, the target scene domain may be determined by the method of steps S401 to S405. Version localization may include locating the version based on the mapping between the name of the target scene and the identity of the current user, together with device information such as vehicle model, vehicle series, and system version. Next, the encoder encodes the data related to the current dialog in the request domain, the hidden domain, and the parameter domain for execution in the execution domain. During execution in the execution domain, execution-domain localization (determining the target execution domain) is performed first, which may follow the method of step S102.
Then, task iteration and execution are performed on the tasks to be executed in the target execution domain, including sequential execution/return or ending/jumping, via the task execution factory. In this application example, the task to be executed is a simple task: evaluation is performed according to the evaluation expression, and actions are performed according to conditions. Performing actions includes performing assignments, executing NLG, executing guidance, and the like. The public NLG component may be looked up by reference to execute NLG and return the dialogue result to the user; likewise, the public guidance component may be looked up by reference to execute guidance and return the guidance result to the user.
After the task is executed, the data produced in the process is decoded and called back to the listener for caching, after which the session is unbound.
According to the execution method of the embodiment of the present application, various data generated during vehicle use (such as vehicle data, user data, and environment data) can be comprehensively obtained; after being abstracted (into modal data), the data are classified (readable, writable, or readable-and-writable) and stored in the action domain. A service developer can then arrange the modal data in the action domain according to actual requirements to generate functional units (tasks to be executed); multiple functional units are arranged and combined into relatively complete service units, which are classified and stored in execution domains. In this way, service developers can conveniently arrange various dialogue logics according to actual requirements to implement the corresponding passive dialogue scenes.
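The action domain described above can be sketched as a table of access-tagged modal data plus a small functional unit composed from it (all field names and values are invented for illustration):

```python
# Hypothetical sketch of the action domain: abstracted modal data are
# classified as readable, writable, or readable-and-writable, and a
# developer composes them into a functional unit (task to be executed).
modal_data = {
    "vehicle.speed":   {"value": 80,  "access": "readable"},
    "ac.temperature":  {"value": 22,  "access": "readable-writable"},
    "window.position": {"value": 0.0, "access": "writable"},
}

def can_read(name):
    return modal_data[name]["access"] in ("readable", "readable-writable")

def can_write(name):
    return modal_data[name]["access"] in ("writable", "readable-writable")

def set_ac_temperature(target):
    """A minimal functional unit: write one writable modal datum."""
    if not can_write("ac.temperature"):
        raise PermissionError("ac.temperature is not writable")
    modal_data["ac.temperature"]["value"] = target
    return modal_data["ac.temperature"]["value"]

print(set_ac_temperature(24))  # 24
```

Checking the access class before every write is what makes the readable/writable classification useful: a functional unit composed by a service developer cannot accidentally mutate read-only vehicle data.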
Compared with the prior art, in which dialogue logic is hard-coded or configured through forms, the execution method of the embodiment of the present application disassembles, abstracts, and models the design process of the service unit (dialogue logic) to obtain an XML-based script language. Based on this script language, a human-computer dialogue process suited to the user's habits can be rapidly developed and iterated online, so that the design of a service unit can be completed entirely online by a developer and quickly deployed to the vehicle end through a hot-update operation channel.
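The patent does not publish the script language's schema, but a miniature XML scene script of this general shape (element and attribute names invented here) can be parsed with a standard XML library into an orderable structure:

```python
# Hypothetical sketch: a tiny XML scene script parsed into a structure
# an engine could iterate over. Element/attribute names are invented;
# the real script language's schema is not published in the patent.
import xml.etree.ElementTree as ET

script = """
<scene name="navigation">
  <execution intent="set_destination">
    <task type="standard">
      <condition expr="gps.ready == true"/>
      <action kind="nlg" text="Destination set."/>
    </task>
  </execution>
</scene>
"""

root = ET.fromstring(script)
scene_name = root.get("name")
executions = {e.get("intent"): [t.get("type") for t in e.findall("task")]
              for e in root.findall("execution")}
print(scene_name, executions)  # navigation {'set_destination': ['standard']}
```

Because such a script is plain data rather than compiled code, it is the kind of artifact that can be edited online and pushed to the vehicle through a hot-update channel without touching the execution engine.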
In addition, the variable dialogue logic (the arrangement of domains) is naturally isolated from the invariable execution engine, so iterations of the dialogue logic and of the execution engine do not affect each other, which facilitates rapid iteration of the dialogue logic.
An embodiment of the present application further provides an apparatus for executing a voice scene task. As shown in FIG. 7, the apparatus includes:
a target scene domain determining module 701, configured to determine a target scene domain from multiple preset scene domains according to a semantic request of a current user, where the scene domain includes multiple preset execution domains, and the execution domain includes multiple tasks to be executed;
a target execution domain determining module 702, configured to determine a target execution domain from a plurality of preset execution domains in a target scene domain according to a semantic request;
the task execution module 703 is configured to execute the task to be executed in the target execution domain.
In an embodiment, the target scene domain determining module 701 is specifically configured to:
determining the voiceprint of the current user according to the semantic request;
searching the current conversation context from the global conversation context of the current user according to the voiceprint;
under the condition that the current conversation context is searched, determining a target scene according to a scene corresponding to the current conversation context;
and determining a target scene domain according to the target scene.
In an embodiment, the target scene domain determining module 701 is specifically configured to:
determining the voiceprint of the current user according to the semantic request;
searching the current conversation context from the global conversation context of the current user according to the voiceprint;
under the condition that the current conversation context is not searched, searching a scene corresponding to the semantic request from a scene name directory tree to determine a target scene;
and determining a target scene domain according to the target scene.
In an embodiment, the target scene domain determining module 701 is specifically configured to:
judging whether to reuse the current conversation context;
and under the condition that the current conversation context is reused, determining a target scene according to the scene corresponding to the current conversation context.
In one embodiment, the target scene domain determining module is specifically configured to:
judging whether to reuse the current conversation context;
and under the condition that the current conversation context is not reused, searching a scene corresponding to the semantic request from the scene name directory tree to determine a target scene.
In one embodiment, the target execution domain determining module is specifically configured to:
determining the intention of the current user according to the semantic request;
and determining a target execution domain from a plurality of preset execution domains in the target scene domain according to the intention.
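Selecting the target execution domain by intent amounts to a keyed lookup within the located scene domain; the sketch below uses invented scene and intent names:

```python
# Hypothetical sketch: within the target scene domain, the target
# execution domain is selected by matching the current user's intent.
scene_domain = {
    "set_destination": {"tasks": ["validate_poi", "start_route"]},
    "cancel_route":    {"tasks": ["stop_guidance"]},
}

def locate_execution_domain(scene_domain, intent):
    if intent not in scene_domain:
        raise LookupError(f"no execution domain for intent: {intent}")
    return scene_domain[intent]

domain = locate_execution_domain(scene_domain, "cancel_route")
print(domain["tasks"])  # ['stop_guidance']
```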
In one embodiment, the apparatus further comprises:
and the global conversation context determining module is used for determining the global conversation context of the current user from the global conversation context set according to the identity of the current user before searching the current conversation context from the global conversation context of the current user according to the voiceprint, wherein the global conversation context set comprises the global conversation contexts of a plurality of users.
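Taken together, the localization path these modules describe is: identity selects a global dialogue context from the set, voiceprint selects the current dialogue context, and the scene comes either from a reusable context or from the scene-name directory tree. A sketch with invented data:

```python
# Hypothetical sketch of scene localization: identity -> global
# dialogue context; voiceprint -> current dialogue context; reuse the
# context's scene if possible, otherwise fall back to the scene-name
# directory tree.
global_context_set = {
    "user-1": {"vp-001": {"scene": "navigation", "reusable": True},
               "vp-002": {"scene": "media",      "reusable": False}},
}
scene_directory_tree = {"media": {}, "navigation": {}}

def locate_scene(identity, voiceprint, requested_scene):
    user_contexts = global_context_set.get(identity, {})
    ctx = user_contexts.get(voiceprint)
    if ctx is not None and ctx["reusable"]:
        return ctx["scene"]                # reuse the current context's scene
    if requested_scene in scene_directory_tree:
        return requested_scene             # fall back to the directory tree
    raise LookupError("no matching scene")

print(locate_scene("user-1", "vp-001", "media"))  # navigation
print(locate_scene("user-1", "vp-003", "media"))  # media
```

The first call reuses the existing context and stays in the navigation scene; the second, with no matching context, resolves the requested scene through the directory tree.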
The functions of each module in each apparatus in the embodiment of the present application may refer to corresponding descriptions in the above method, and are not described herein again.
Fig. 8 shows a block diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the device includes: a memory 801 and a processor 802, the memory 801 storing instructions executable on the processor 802. When executing the instructions, the processor 802 implements any of the methods in the embodiments described above. There may be one or more memories 801 and processors 802. The terminal or server is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, and mainframes, as well as various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the application described and/or claimed herein.
The device may further include a communication interface 803 for communicating with external devices for interactive data transmission. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor 802 may process instructions for execution within the terminal or server, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple terminals or servers may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 8, but this does not mean there is only one bus or one type of bus.
Optionally, in an implementation, if the memory 801, the processor 802, and the communication interface 803 are integrated on a chip, the memory 801, the processor 802, and the communication interface 803 may complete communication with each other through an internal interface.
It should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor. Note that the processor may be a processor supporting the Advanced RISC Machine (ARM) architecture.
Embodiments of the present application provide a computer-readable storage medium (such as the above-mentioned memory 801) storing computer instructions, which when executed by a processor implement the methods provided in embodiments of the present application.
Optionally, the memory 801 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function; the storage data area may store data created according to the use of a terminal or a server, and the like. Further, the memory 801 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 801 may optionally include memory located remotely from the processor 802, which may be connected to a terminal or server via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
Any process or method descriptions in flowcharts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. All or part of the steps of the methods of the above embodiments may be implemented by instructing relevant hardware through a program, which may be stored in a computer-readable storage medium and which, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module may also be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various changes or substitutions within the technical scope of the present application, and these should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

1. A method for executing a voice scene task is characterized by comprising the following steps:
determining a target scene domain from a plurality of preset scene domains according to a semantic request of a current user, wherein the scene domains comprise a plurality of preset execution domains, and the execution domains comprise a plurality of tasks to be executed;
determining a target execution domain from a plurality of preset execution domains in the target scene domain according to the semantic request;
and executing the task to be executed in the target execution domain.
2. The method of claim 1, wherein determining the target scene domain from a plurality of preset scene domains according to the semantic request of the current user comprises:
determining the voiceprint of the current user according to the semantic request;
searching a current conversation context from the global conversation context of the current user according to the voiceprint;
under the condition that the current conversation context is searched, determining a target scene according to a scene corresponding to the current conversation context;
and determining the target scene domain according to the target scene.
3. The method of claim 1, wherein determining the target scene domain from a plurality of preset scene domains according to the semantic request of the current user comprises:
determining the voiceprint of the current user according to the semantic request;
searching a current conversation context from the global conversation context of the current user according to the voiceprint;
under the condition that the current conversation context is not searched, searching a scene corresponding to the semantic request from a scene name directory tree to determine a target scene;
and determining the target scene domain according to the target scene.
4. The method of claim 2, wherein determining a target scene according to the scene corresponding to the current dialog context comprises:
judging whether to reuse the current conversation context;
and under the condition that the current conversation context is reused, determining the target scene according to the scene corresponding to the current conversation context.
5. The method of claim 2, wherein determining a target scene according to the scene corresponding to the current dialog context comprises:
judging whether to reuse the current conversation context;
and under the condition that the current conversation context is not reused, searching a scene corresponding to the semantic request from a scene name directory tree to determine the target scene.
6. The method of claim 1, wherein determining a target execution domain from a plurality of preset execution domains in the target scene domain according to the semantic request comprises:
determining the intention of the current user according to the semantic request;
and determining the target execution domain from a plurality of preset execution domains in the target scene domain according to the intention.
7. The method of claim 2, wherein before searching for the current conversation context from the global conversation context of the current user according to the voiceprint, the method further comprises:
and determining the global conversation context of the current user from a global conversation context set according to the identity of the current user, wherein the global conversation context set comprises the global conversation contexts of a plurality of users.
8. An apparatus for performing a speech scene task, comprising:
the system comprises a target scene domain determining module, a task executing module and a task executing module, wherein the target scene domain determining module is used for determining a target scene domain from a plurality of preset scene domains according to a semantic request of a current user, the scene domains comprise a plurality of preset executing domains, and the executing domains comprise a plurality of tasks to be executed;
the target execution domain determining module is used for determining a target execution domain from a plurality of preset execution domains in the target scene domain according to the semantic request;
and the task execution module is used for executing the task to be executed in the target execution domain.
9. The apparatus of claim 8, wherein the target scene domain determining module is specifically configured to:
determining the voiceprint of the current user according to the semantic request;
searching a current conversation context from the global conversation context of the current user according to the voiceprint;
under the condition that the current conversation context is searched, determining a target scene according to a scene corresponding to the current conversation context;
and determining the target scene domain according to the target scene.
10. The apparatus of claim 8, wherein the target scene domain determining module is specifically configured to:
determining the voiceprint of the current user according to the semantic request;
searching a current conversation context from the global conversation context of the current user according to the voiceprint;
under the condition that the current conversation context is not searched, searching a scene corresponding to the semantic request from a scene name directory tree to determine a target scene;
and determining the target scene domain according to the target scene.
11. The apparatus of claim 9, wherein the target scene domain determining module is specifically configured to:
judging whether to reuse the current conversation context;
and under the condition that the current conversation context is reused, determining the target scene according to the scene corresponding to the current conversation context.
12. The apparatus of claim 9, wherein the target scene domain determining module is specifically configured to:
judging whether to reuse the current conversation context;
and under the condition that the current conversation context is not reused, searching a scene corresponding to the semantic request from a scene name directory tree to determine the target scene.
13. The apparatus of claim 8, wherein the target execution domain determining module is specifically configured to:
determining the intention of the current user according to the semantic request;
and determining the target execution domain from a plurality of preset execution domains in the target scene domain according to the intention.
14. The apparatus of claim 9, further comprising:
and the global conversation context determining module is used for determining the global conversation context of the current user from a global conversation context set according to the identity of the current user before searching the current conversation context from the global conversation context of the current user according to the voiceprint, wherein the global conversation context set comprises the global conversation contexts of a plurality of users.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 7.
16. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202110927751.8A 2021-08-12 2021-08-12 Method, device and equipment for executing voice scene task and storage medium Pending CN115705226A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110927751.8A CN115705226A (en) 2021-08-12 2021-08-12 Method, device and equipment for executing voice scene task and storage medium

Publications (1)

Publication Number Publication Date
CN115705226A true CN115705226A (en) 2023-02-17



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination