CN112992145B - Offline online semantic recognition arbitration method, electronic device and storage medium


Info

Publication number: CN112992145B
Application number: CN202110503801.XA
Other versions: CN112992145A (Chinese)
Authority: CN (China)
Prior art keywords: voice, semantic, recognition result, online, semantic recognition
Legal status: Active (application granted)
Inventors: 杨竞喆, 孙晓欣, 曹阳
Current assignee: Ecarx Hubei Tech Co Ltd
Original assignee: Hubei Ecarx Technology Co Ltd
Application filed by Hubei Ecarx Technology Co Ltd; priority to CN202110503801.XA

Classifications

    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/30 — Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L 15/32 — Multiple recognisers used in sequence or in parallel; score combination systems therefor, e.g. voting systems
    • G10L 2015/223 — Execution procedure of a spoken command
    • G10L 2015/225 — Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • User Interface Of Digital Computer (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides an offline online semantic recognition arbitration method, an electronic device and a storage medium. The method comprises: acquiring a user voice instruction and, in a network-connected state, performing local voice semantic processing and online voice semantic processing on the user voice instruction; when the local voice semantic recognition result is obtained first and the vertical domain category it contains is supported offline, judging whether content data for the user voice instruction is needed, and if not, broadcasting feedback information; if content data is needed, judging whether the local cache contains content data for the user voice instruction, and if so, broadcasting the content data; if not, directly calling an online content search interface and waiting for content data to be returned online; and if no content data is returned online within a first time period, broadcasting a voice prompt indicating that the content data was not found. The response speed and stability of voice interaction are thereby improved.

Description

Offline online semantic recognition arbitration method, electronic device and storage medium
Technical Field
The invention relates to the technical field of intelligent voice interaction, and in particular to an offline online semantic recognition arbitration method, an electronic device and a storage medium.
Background
Intelligent voice assistants are widely used in vehicle-mounted scenarios. With the development of the Internet of Vehicles, most modern vehicles support networking; however, because a vehicle's surroundings change frequently, the network environment of vehicle-mounted equipment is quite unstable. For this reason, modern vehicle-mounted intelligent voice assistants generally adopt a combined offline and online voice semantic processing mode.
Because the cloud has greater computing power and can adopt the latest technology for online voice semantic processing, the online processing effect is generally better than the offline effect, so current vehicle-mounted voice assistants provide voice feedback based on the online result.
However, a scheme that relies on online results for voice feedback is highly dependent on the network. When the network is good, the whole interaction is smooth and the feedback is accurate. The vehicle-mounted network environment, however, is quite unstable: when the network is poor, the assistant has to keep waiting for the online voice processing result, a process that may last nearly 10 seconds and prolongs the voice interaction. Moreover, when the network is very poor, the user may wait nearly 10 seconds only to be told that "the network state is not good, please retry". Existing vehicle-mounted voice interaction therefore responds slowly and is unstable under poor network conditions, resulting in a poor user experience.
Disclosure of Invention
The embodiments of the invention aim to provide an offline online semantic recognition arbitration method, an electronic device and a storage medium, so as to improve the response speed and stability of voice interaction. The specific technical scheme is as follows:
in order to achieve the above object, an embodiment of the present application provides an offline online semantic recognition arbitration method, where the method includes:
acquiring a user voice instruction and judging a network connection state;
when the network state is a connected state, performing local voice semantic processing and online voice semantic processing on the user voice instruction respectively;
when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported offline, judging whether content data for the user voice instruction is needed; if not, broadcasting feedback information for the semantic recognition result; if so, judging whether the local cache contains content data for the user voice instruction, and if it does, broadcasting the content data;
if it does not, directly calling an online content search interface and waiting, within a preset first time period, for content data for the user voice instruction to be returned online; if no content data is returned online within the first time period, broadcasting a voice prompt indicating that the content data was not found;
when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported only online, waiting, within a preset second time period, for feedback information based on the semantic recognition result of the online voice semantic processing to be returned online; and if no feedback information is returned online within the second time period, broadcasting a voice prompt indicating that the operation related to the vertical domain category cannot be executed.
Optionally, the method further includes:
when the semantic recognition result of the online voice semantic processing is obtained first, stopping the local voice semantic processing and receiving the feedback information, returned online, for the semantic recognition result of the online voice semantic processing.
Optionally, the method further includes:
when the semantic recognition result of the local voice semantic processing is obtained first and the semantic recognition result indicates that the user voice instruction cannot be understood, waiting, within a preset third time period, for feedback information based on the semantic recognition result of the online voice semantic processing to be returned online; and if no feedback information is returned online within the third time period, broadcasting a voice prompt indicating that the user voice instruction cannot be understood.
Optionally, the method further includes:
when the semantic recognition result of the local voice semantic processing is obtained first and the semantic recognition result indicates that the command semantics for the vertical domain category contained in the semantic recognition result cannot be understood, broadcasting a guidance prompt for the vertical domain category contained in the semantic recognition result.
Optionally, the broadcasting of a voice prompt indicating that the content data was not found includes:
if the semantic recognition result of the local voice semantic processing contains a valid entity, broadcasting a voice prompt indicating that content data of the vertical domain category for the valid entity was not found;
if the semantic recognition result of the local voice semantic processing does not contain a valid entity, broadcasting a voice prompt indicating that content data for the vertical domain category was not found.
Optionally, the method further includes:
when the network state is a disconnected state, performing local voice semantic processing on the user voice instruction;
when the vertical domain category contained in the semantic recognition result of the local voice semantic processing is a vertical domain category supported offline, judging whether content data for the user voice instruction is needed; if not, broadcasting feedback information for the semantic recognition result; if so, judging whether the local cache contains content data for the user voice instruction, and if it does, broadcasting the content data;
if it does not, broadcasting a voice prompt indicating that the content data cannot be found;
when the vertical domain category contained in the semantic recognition result of the local voice semantic processing is a vertical domain category supported only online, broadcasting a voice prompt indicating that the operation related to the vertical domain category cannot be executed.
Optionally, the method further includes:
when the semantic recognition result of the local voice semantic processing indicates that the user voice instruction cannot be understood, broadcasting a voice prompt indicating that the user voice instruction cannot be understood.
Optionally, the method further includes:
when the semantic recognition result of the local voice semantic processing indicates that the command semantics for the vertical domain category contained in the semantic recognition result cannot be understood, broadcasting a guidance prompt for the vertical domain category contained in the semantic recognition result.
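The method above leaves the waiting periods and the split between offline-supported and online-only vertical domain categories as implementation choices. A minimal configuration sketch in Python is given below; all concrete values and category names are illustrative assumptions (the 4-second waits and the 15-minute cache refresh come from the examples later in this description) and are not part of the claimed method.

```python
# Illustrative arbitration configuration; all values and names are assumptions,
# taken from the examples in the detailed description rather than the claims.
FIRST_TIME_PERIOD_S = 4    # wait for online content data after a local cache miss
SECOND_TIME_PERIOD_S = 4   # wait for online feedback on online-only vertical domains
THIRD_TIME_PERIOD_S = 4    # wait for online feedback when the local result is "not understood"

CACHE_REFRESH_INTERVAL_S = 15 * 60   # periodic cloud refresh of the local content cache

OFFLINE_SUPPORTED_DOMAINS = {"weather", "flight", "air_conditioner"}   # can be handled locally
ONLINE_ONLY_DOMAINS = {"idiom_chain_game"}                             # require the cloud
```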
In order to achieve the above object, an embodiment of the present application further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor configured to implement any of the above method steps when executing the program stored in the memory.
To achieve the above object, an embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the above method steps.
The embodiment of the invention has the following beneficial effects:
By applying the offline online semantic recognition arbitration method, the electronic device and the storage medium provided by the embodiments of the application, a user voice instruction is acquired and the network connection state is judged; when the network state is a connected state, local voice semantic processing and online voice semantic processing are performed on the user voice instruction respectively; when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported offline, it is judged whether content data for the user voice instruction is needed, and if not, feedback information for the semantic recognition result is broadcast; if so, it is judged whether the local cache contains content data for the user voice instruction, and if it does, the content data is broadcast; if it does not, an online content search interface is called directly and content data for the user voice instruction returned online is awaited within a preset first time period; if no content data is returned online within the first time period, a voice prompt indicating that the content data was not found is broadcast; when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported only online, feedback information based on the semantic recognition result of the online voice semantic processing is awaited online within a preset second time period; and if no feedback information is returned online within the second time period, a voice prompt indicating that the operation related to the vertical domain category cannot be executed is broadcast.
Therefore, compared with the existing scheme of providing voice feedback only from the online result, when the network is poor, information feedback can be given based on the semantic recognition result of the local voice semantic processing, and offline cache data corresponding to different semantic recognition results is cached locally in advance, so most voice interaction requirements can be met and the response speed of most interaction instructions under weak-network conditions is significantly improved. Even when an online search is needed, the user does not have to wait too long, so the response speed and stability of voice interaction are improved.
In addition, since the locally cached offline data can satisfy most user voice instructions, the number of scenarios that wait for online feedback is greatly reduced, which also reduces the load on the cloud server.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic flowchart of an offline online semantic recognition arbitration method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of the offline online semantic recognition arbitration method when the network state is a connected state, according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of the offline online semantic recognition arbitration method when the network state is a disconnected state, according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an offline online semantic recognition arbitration device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived from the embodiments given herein by one of ordinary skill in the art, are within the scope of the invention.
In order to solve the technical problems of low response speed and unstable voice interaction of the conventional vehicle-mounted voice semantic processing under the condition of poor network environment, the embodiment of the application provides an offline online semantic recognition arbitration method, electronic equipment and a storage medium.
The method can be applied to vehicle-mounted equipment, which may specifically comprise a voice acquisition device, a voice semantic processing device and a voice broadcasting device, where the voice semantic processing device may comprise a local voice semantic processing device and an online voice semantic processing device.
Referring to fig. 1, fig. 1 is a schematic flowchart of an offline online semantic recognition arbitration method according to an embodiment of the present application, and as shown in fig. 1, the method may include the following steps:
s101: and acquiring a user voice instruction and judging the network connection state.
In the embodiment of the application, the voice acquisition equipment can acquire the voice instruction of the user and transmit the voice instruction of the user to the voice semantic processing equipment.
In addition, the vehicle-mounted device can judge the network connection state, where the network connection state is either a connected state or a disconnected state.
S102: and when the network state is a connection state, local voice semantic processing and online voice semantic processing are respectively executed on the user voice instruction.
In the embodiment of the application, when the network state of the vehicle is the connected state, that is, the vehicle is connected to the network, local voice semantic processing and online voice semantic processing can be executed on the voice instruction simultaneously.
Specifically, a local voice semantic processing module exists at the local end, and an online voice semantic processing module exists at the network end.
The local voice semantic processing module comprises a local voice recognition module and a local natural language processing module; the online voice semantic processing module comprises an online voice recognition module and an online natural language processing module.
At the local end, the local voice recognition module recognizes and converts the user voice instruction to obtain a local recognition text, and the local natural language processing module can perform vertical domain classification, intention classification and entity extraction by combining context scene information to obtain a semantic recognition result of local voice semantic recognition.
The vertical domain represents the field to which the user voice instruction relates, such as music or weather; the intention represents the operation the user wishes to perform, such as query, purchase or open; and the entity represents the specific operation object, such as a date, a place or a person.
As an example, for the voice instruction "help me check tomorrow's flights from Beijing to Shanghai", in the semantic recognition result the vertical domain is classified as "flight", the intention is classified as "query", and the extracted entities are "tomorrow", "Beijing" and "Shanghai".
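The structure of such a semantic recognition result can be illustrated with a small Python sketch; the field names below are illustrative assumptions, not an interface defined by this application.

```python
from dataclasses import dataclass, field
from typing import Optional, Dict

@dataclass
class SemanticResult:
    """Output of local or online natural language processing (illustrative field names)."""
    vertical_domain: Optional[str]            # e.g. "flight", "weather", "air_conditioner"
    intention: Optional[str]                  # e.g. "query", "open"
    entities: Dict[str, str] = field(default_factory=dict)
    understood: bool = True                   # False when domain/intention cannot be matched

# The flight example from the text above.
result = SemanticResult(
    vertical_domain="flight",
    intention="query",
    entities={"date": "tomorrow", "departure": "Beijing", "arrival": "Shanghai"},
)
print(result)
```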
In the embodiment of the application, the local natural language processing can be realized by adopting a deep learning technology.
At the network end, the online voice recognition module synchronously recognizes and converts the user voice instruction to obtain an online recognition text, and the online natural language processing module can likewise perform vertical domain classification, intention classification and entity extraction in combination with context scene information to obtain the semantic recognition result of the online semantic recognition.
In the embodiment of the application, the semantic recognition results obtained by the local voice semantic processing and the online voice semantic processing are both fed back to the dialogue management module at the local end, and the dialogue management module completes the subsequent voice interaction by arbitrating according to which result arrives first, the content of the semantic recognition results and the data cached in the local cache.
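A minimal sketch of running the two pipelines in parallel and handing the dialogue management module whichever result returns first is given below, assuming Python threads; the placeholder pipelines and their delays are purely illustrative.

```python
import concurrent.futures as cf
import time

def local_semantic_processing(utterance: str) -> dict:
    # Placeholder for on-device ASR + NLU: usually fast, works without a network.
    time.sleep(0.2)
    return {"source": "local", "domain": "weather", "intention": "query"}

def online_semantic_processing(utterance: str) -> dict:
    # Placeholder for cloud ASR + NLU: usually more accurate, but network-dependent.
    time.sleep(1.5)
    return {"source": "online", "domain": "weather", "intention": "query"}

def first_semantic_result(utterance: str) -> dict:
    """Submit both pipelines and return whichever semantic recognition result finishes first."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(local_semantic_processing, utterance),
               pool.submit(online_semantic_processing, utterance)]
    done, _pending = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    pool.shutdown(wait=False)   # let the slower pipeline finish in the background
    return next(iter(done)).result()

print(first_semantic_result("check the weather in Beijing"))
```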
S103: when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported offline, judging whether content data for the user voice instruction is needed; if not, broadcasting feedback information for the semantic recognition result; if so, and the local cache contains content data for the user voice instruction, broadcasting the content data.
In the embodiment of the application, if the semantic recognition result of the local voice semantic processing is obtained first, it is judged whether the vertical domain category contained in that semantic recognition result is a vertical domain category supported offline.
The vertical domain categories may include vertical domain categories supported offline and vertical domain categories supported only online. A vertical domain category supported offline can be processed either online or offline; a vertical domain category supported only online can only be processed online.
For example, in the embodiment of the present application, information such as flights and weather may be stored in the local cache in advance, so both "flight" and "weather" are vertical domain categories supported offline; the idiom chain game, by contrast, cannot be processed offline and is a vertical domain category supported only online.
In the embodiment of the application, when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported offline, it can be judged whether content data for the user voice instruction is needed.
Specifically, when the user voice instruction belongs to the control class, content data for the user voice instruction is generally not needed; when the user voice instruction belongs to the query class, content data for the user voice instruction is generally required.
For example, the user voice instruction "turn on the air conditioner" belongs to the control class and no content data needs to be fed back to the user, whereas the user instruction "check the weather in Beijing" belongs to the query class and content data does need to be fed back to the user.
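A minimal sketch of this control/query distinction is shown below; the intention sets are illustrative assumptions and would in practice come from the intention classifier.

```python
# Hypothetical split of intentions into control-class (execute directly) and
# query-class (fetch and broadcast content data) commands.
CONTROL_INTENTIONS = {"open", "close", "turn_on", "turn_off"}
QUERY_INTENTIONS = {"query", "search"}

def needs_content_data(intention: str) -> bool:
    """Return True when the recognized intention requires content data for the user."""
    return intention in QUERY_INTENTIONS

print(needs_content_data("turn_on"))   # False: just confirm "the air conditioner is on"
print(needs_content_data("query"))     # True: e.g. look up the weather in Beijing
```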
If content data for the user voice instruction is not needed, the feedback information for the semantic recognition result can be broadcast directly.
As an example, if the user voice instruction is "turn on the air conditioner", the corresponding vertical domain category in the semantic recognition result is air conditioner, which is supported offline and does not involve content data, so feedback information such as "the air conditioner has been turned on" can be broadcast directly after the operation is completed.
If content data for the user voice instruction is needed, it is judged whether the local cache contains content data for the user voice instruction. That is, the local cache is queried for the corresponding content data, and if it is present, the content data is broadcast.
As an example, if the user voice instruction is "check the weather in Beijing", the corresponding vertical domain category in the semantic recognition result is weather, which is supported offline, and the content data "Beijing weather" is involved; it can then be queried whether "Beijing weather" is cached in the local cache, and if so, it can be broadcast.
The locally cached content may be obtained from the cloud in advance; for example, new data such as "weather" and "flight" information is obtained from the cloud every 15 minutes.
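A minimal sketch of such a periodically refreshed local cache, assuming Python and treating entries older than the 15-minute refresh interval mentioned above as misses (the key format and API are illustrative):

```python
import time
from typing import Dict, Optional, Tuple

class OfflineContentCache:
    """Local content cache that is pre-filled from the cloud and treats entries
    older than the refresh interval as misses (illustrative sketch)."""

    def __init__(self, refresh_interval_s: int = 15 * 60):
        self._refresh_interval_s = refresh_interval_s
        self._store: Dict[str, Tuple[float, str]] = {}

    def put(self, key: str, value: str) -> None:
        self._store[key] = (time.time(), value)

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        fetched_at, value = entry
        if time.time() - fetched_at > self._refresh_interval_s:
            return None        # stale entry: treat as a miss so fresh data is fetched
        return value

cache = OfflineContentCache()
cache.put("weather:Beijing", "Beijing: sunny today, 11 to 23 degrees C")
print(cache.get("weather:Beijing"))   # cached -> broadcast directly
print(cache.get("weather:Shanghai"))  # None -> call the online content search interface
```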
S104: if the local cache does not contain the content data, directly calling an online content search interface and waiting, within a preset first time period, for content data for the user voice instruction to be returned online; and if no content data is returned online within the first time period, broadcasting a voice prompt indicating that the content data was not found.
In the embodiment of the application, if the local cache does not contain content data for the user voice instruction, the online content search interface can be called directly, and content data for the user voice instruction returned online is awaited within the preset first time period.
Specifically, an online content search interface may be configured in the car machine system for searching data that is not cached locally. If content data is returned online within the first time period, the content data is broadcast; if no content data is returned online within the first time period, a voice prompt indicating that the content data was not found is broadcast.
In addition, when broadcasting the voice prompt indicating that the content data was not found: if the semantic recognition result of the local voice semantic processing contains a valid entity, a voice prompt indicating that content data of the vertical domain category for the valid entity was not found is broadcast; if the semantic recognition result of the local voice semantic processing does not contain a valid entity, a voice prompt indicating that content data for the vertical domain category was not found is broadcast.
As an example, the preset first time period is 4 seconds. If the user voice instruction is "check the weather in Shanghai", the corresponding vertical domain category in the semantic recognition result is weather, which is supported offline, and the valid entity "Shanghai" can be extracted, but "Shanghai weather" has not been cached locally in advance, then the online content search interface is called to search for "Shanghai weather". If the content data returned online is obtained within 4 seconds, it is broadcast, for example "Shanghai is sunny today, with a temperature of 11 ℃ to 23 ℃"; if no content data is returned online within 4 seconds, a voice prompt indicating that content data of the vertical domain category for the valid entity was not found is broadcast, for example "Could not find weather information for Shanghai, please try again later"; and if the valid entity "Shanghai" was not extracted, a voice prompt indicating that content data for the vertical domain category was not found can be broadcast, for example "Could not find the weather information, please try again later".
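A minimal sketch of calling the online content search interface with the first time period as a timeout and choosing the fallback prompt according to whether a valid entity was extracted; the search function and its delay are illustrative placeholders, not the actual car machine interface.

```python
import concurrent.futures as cf
import time
from typing import Dict

FIRST_TIME_PERIOD_S = 4   # the "first time period" used in the example above

def online_content_search(domain: str, entities: Dict[str, str]) -> str:
    # Placeholder for the online content search interface; the delay simulates
    # a poor network where the reply arrives too late.
    time.sleep(6)
    return "Shanghai is sunny today, 11 to 23 degrees C"

def fetch_content_or_prompt(domain: str, entities: Dict[str, str]) -> str:
    """Wait up to the first time period for online content data, then fall back
    to the 'not found' voice prompt described above."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(online_content_search, domain, entities)
    pool.shutdown(wait=False)               # do not block on the slow request
    try:
        return future.result(timeout=FIRST_TIME_PERIOD_S)
    except cf.TimeoutError:
        if entities:                        # a valid entity was extracted
            entity = next(iter(entities.values()))
            return f"Could not find {domain} information for {entity}, please try again later."
        return f"Could not find {domain} information, please try again later."

print(fetch_content_or_prompt("weather", {"place": "Shanghai"}))
```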
S105: when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported only online, waiting, within a preset second time period, for feedback information based on the semantic recognition result of the online voice semantic processing to be returned online; and if no feedback information is returned online within the second time period, broadcasting a voice prompt indicating that the operation related to the vertical domain category cannot be executed.
In the embodiment of the application, when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported only online, feedback information based on the semantic recognition result of the online voice semantic processing, returned online, has to be awaited.
As an example, the preset second time period is 4 seconds. When the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is the idiom chain game, feedback information based on the semantic recognition result of the online voice semantic processing, returned online, can be awaited within 4 seconds. If feedback information is returned within 4 seconds, it is broadcast, for example "The idiom chain game starts, please say an idiom"; if no feedback information is returned within 4 seconds, a voice prompt indicating that the operation related to the vertical domain category cannot be executed is broadcast, for example "The network is not good, please try the idiom chain game again later".
By applying the offline online semantic recognition arbitration method provided by the embodiment of the application, a user voice instruction is acquired and the network connection state is judged; when the network state is a connected state, local voice semantic processing and online voice semantic processing are performed on the user voice instruction respectively; when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported offline, it is judged whether content data for the user voice instruction is needed, and if not, feedback information for the semantic recognition result is broadcast; if so, it is judged whether the local cache contains content data for the user voice instruction, and if it does, the content data is broadcast; if it does not, an online content search interface is called directly and content data for the user voice instruction returned online is awaited within a preset first time period; if no content data is returned online within the first time period, a voice prompt indicating that the content data was not found is broadcast; when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported only online, feedback information based on the semantic recognition result of the online voice semantic processing is awaited online within a preset second time period; and if no feedback information is returned online within the second time period, a voice prompt indicating that the operation related to the vertical domain category cannot be executed is broadcast.
Therefore, compared with the existing scheme of providing voice feedback only from the online result, when the network is poor, information feedback can be given based on the semantic recognition result of the local voice semantic processing, and offline cache data corresponding to different semantic recognition results is cached locally in advance, so most voice interaction requirements can be met and the response speed of most interaction instructions under weak-network conditions is significantly improved. Even when an online search is needed, the user does not have to wait too long, so the response speed and stability of voice interaction are improved.
In addition, since the locally cached offline data can satisfy most user voice instructions, the number of scenarios that wait for online feedback is greatly reduced, which also reduces the load on the cloud server.
In an embodiment of the present application, if the semantic recognition result of the online voice semantic processing is obtained first, this indicates that the network quality is good; in this case, the local voice semantic processing may be stopped, and the feedback information, returned online, for the semantic recognition result of the online voice semantic processing may be received.
In an embodiment of the application, when the semantic recognition result of the local voice semantic processing is obtained first and the semantic recognition result indicates that the user voice instruction cannot be understood, feedback information based on the semantic recognition result of the online voice semantic processing, returned online, is awaited within a preset third time period; and if no feedback information is returned online within the third time period, a voice prompt indicating that the user voice instruction cannot be understood is broadcast.
Specifically, during local voice semantic processing it may happen that the confidence of the vertical domain classification and the intention classification is low, or that the vertical domain category cannot be matched with the intention classification; in such cases the local voice semantic processing device cannot understand the user voice instruction, and the online voice semantic processing is awaited.
As an example, the preset third time period is 4 seconds. If the user voice instruction is "What has Xiaoming been doing lately", the vertical domain category and the intention classification cannot be effectively identified during local voice semantic processing, so the confidence of the output vertical domain category and intention classification is low and the two cannot be matched; the final semantic recognition result therefore indicates that the user voice instruction cannot be understood. Feedback information based on the semantic recognition result of the online voice semantic processing, returned online, can be awaited within 4 seconds, and if feedback information is returned within 4 seconds it is broadcast; if no feedback information is returned within 4 seconds, a voice prompt indicating that the user voice instruction cannot be understood is broadcast, for example "I didn't quite catch that, please rephrase and try again, or you can exit directly".
In an embodiment of the application, when the semantic recognition result of the local voice semantic processing is obtained first and the semantic recognition result indicates that the command semantics for the vertical domain category contained in the semantic recognition result cannot be understood, a guidance prompt for the vertical domain category contained in the semantic recognition result is broadcast.
Specifically, during local voice semantic processing it may happen that a vertical domain category is effectively extracted but intention classification information for that vertical domain category cannot be extracted, so the command semantics for the vertical domain category cannot be understood.
As an example, if the user voice instruction is "get me a flight" and, during local voice semantic processing, the vertical domain category "flight" can be effectively identified but intention classification information for "flight" cannot be extracted, a guidance prompt for the vertical domain category may be broadcast, for example "Do you want to check a flight? You can tell me the flight you want to query".
Referring to FIG. 2, FIG. 2 is a schematic flowchart of the arbitration process of the offline online semantic recognition arbitration method when the network state is a connected state, according to an embodiment of the present application.
As shown in FIG. 2, local voice semantic processing and online voice semantic processing are performed on the user voice instruction, and it is judged whether the semantic recognition result of the online voice semantic processing is obtained first; if so, the local voice semantic processing is stopped and the feedback information, returned online, for the semantic recognition result of the online voice semantic processing is received.
If the semantic recognition result of the online voice semantic processing is not obtained first, that is, the semantic recognition result of the local voice semantic processing is obtained first, it is judged whether that semantic recognition result indicates that the user voice instruction cannot be understood; if so, feedback information based on the semantic recognition result of the online voice semantic processing, returned online, is awaited within a preset time period.
If not, that is, the semantic recognition result indicates that the user voice instruction can be understood, it is judged whether the vertical domain category contained in the semantic recognition result is a vertical domain category supported only online; if so, feedback information based on the semantic recognition result of the online voice semantic processing, returned online, is awaited within a preset time period.
If not, that is, the vertical domain category contained in the semantic recognition result is a vertical domain category supported offline, it is judged whether the semantic recognition result indicates that the command semantics for the vertical domain category contained in the semantic recognition result cannot be understood; if so, a guidance prompt for the vertical domain category contained in the semantic recognition result is broadcast.
If not, that is, the semantic recognition result indicates that the command semantics for the vertical domain category contained in the semantic recognition result can be understood, it is judged whether content data for the user voice instruction is needed; if not, feedback information for the semantic recognition result is broadcast; if so, it is judged whether the local cache contains content data for the user voice instruction.
If it does, the content data is broadcast; if it does not, the online content search interface is called directly and content data for the user voice instruction returned online is awaited within a preset time period.
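The connected-state branching of FIG. 2, from the point where the local semantic recognition result has arrived first, can be summarized in a short Python sketch. All names, domain sets, timeouts and prompt wordings are illustrative assumptions; the two callables stand in for waiting on online feedback and for the online content search interface.

```python
from typing import Callable, Dict, Optional

OFFLINE_SUPPORTED_DOMAINS = {"weather", "flight", "air_conditioner"}   # illustrative
QUERY_INTENTIONS = {"query", "search"}                                  # illustrative
FIRST_S, SECOND_S, THIRD_S = 4, 4, 4                                    # example time periods

def arbitrate_connected(result: Dict,
                        cache: Dict[str, str],
                        wait_online_feedback: Callable[[int], Optional[str]],
                        search_online_content: Callable[[int], Optional[str]]) -> str:
    """Sketch of FIG. 2 after the local semantic recognition result arrived first."""
    if not result.get("understood"):                      # local NLU could not understand
        return wait_online_feedback(THIRD_S) or \
            "I didn't quite catch that, please rephrase and try again."
    domain = result["vertical_domain"]
    if domain not in OFFLINE_SUPPORTED_DOMAINS:           # vertical domain supported only online
        return wait_online_feedback(SECOND_S) or \
            f"The network is not good, please try the {domain} function again later."
    if result.get("intention") is None:                   # domain known, intention not extracted
        return f"You can tell me what you want to do with {domain}."
    if result["intention"] not in QUERY_INTENTIONS:       # control-class command
        return f"Done: the {domain} command has been executed."
    key = f"{domain}:{result['entities'].get('place', '')}"
    if key in cache:                                      # content data cached locally
        return cache[key]
    return search_online_content(FIRST_S) or \
        f"Could not find {domain} information, please try again later."

# Example: cached Beijing weather is answered locally without touching the network.
cache = {"weather:Beijing": "Beijing is sunny today, 11 to 23 degrees C"}
print(arbitrate_connected(
    {"understood": True, "vertical_domain": "weather", "intention": "query",
     "entities": {"place": "Beijing"}},
    cache,
    wait_online_feedback=lambda timeout_s: None,
    search_online_content=lambda timeout_s: None))
```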
In an embodiment of the present application, when the network state is a disconnected state, only local voice semantic processing is performed on the user voice instruction.
When the vertical domain category contained in the semantic recognition result of the local voice semantic processing is a vertical domain category supported offline, it is judged whether content data for the user voice instruction is needed; if not, feedback information for the semantic recognition result is broadcast; if so, and the local cache contains content data for the user voice instruction, the content data is broadcast.
If the local cache does not contain the content data, a voice prompt indicating that the content data cannot be found is broadcast.
When the vertical domain category contained in the semantic recognition result of the local voice semantic processing is a vertical domain category supported only online, a voice prompt indicating that the operation related to the vertical domain category cannot be executed is broadcast.
As an example, the user voice instruction is "the weather in Shanghai", the corresponding vertical domain category in the semantic recognition result is weather, which is supported offline, and content data for the user voice instruction is needed; it is then judged whether the content data "Shanghai weather" has been cached locally in advance, and if so, the content data is broadcast, for example "Shanghai is sunny today, with a temperature of 11 ℃ to 23 ℃". If not, a voice prompt indicating that "Shanghai weather" cannot be found can be broadcast directly, for example "The network is not connected, so the weather information for Shanghai cannot be found; please try again later".
As an example, the user voice instruction is "I want to play the idiom chain game", and the corresponding vertical domain category in the semantic recognition result is supported only online; since the network is not connected, a voice prompt indicating that the operation related to the vertical domain category cannot be executed can be broadcast directly, for example "The idiom chain game is only supported online, please try again after connecting to the network".
In an embodiment of the application, when the semantic recognition result of the local voice semantic processing indicates that the user voice instruction cannot be understood, a voice prompt indicating that the user voice instruction cannot be understood is broadcast.
As an example, if the user voice instruction is "What has Xiaoming been doing lately" and the semantic recognition result of the local voice semantic processing indicates that this user voice instruction cannot be understood, a voice prompt indicating that the user voice instruction cannot be understood can be broadcast directly, for example "I didn't quite catch that, please rephrase and try again, or you can exit directly".
In an embodiment of the application, when the semantic recognition result of the local voice semantic processing indicates that the command semantics for the vertical domain category contained in the semantic recognition result cannot be understood, a guidance prompt for the vertical domain category contained in the semantic recognition result is broadcast.
As an example, if the user voice instruction is "get me a flight" and, during local voice semantic processing, the vertical domain category "flight" can be effectively identified but intention classification information for "flight" cannot be extracted, a guidance prompt for the vertical domain category may be broadcast, for example "Do you want to check a flight? You can tell me the flight you want to query".
Referring to FIG. 3, FIG. 3 is a schematic flowchart of the arbitration process of the offline online semantic recognition arbitration method when the network state is a disconnected state, according to an embodiment of the present application.
As shown in FIG. 3, local voice semantic processing is performed on the user voice instruction, and it is judged whether the semantic recognition result of the local voice semantic processing indicates that the user voice instruction cannot be understood; if so, a voice prompt indicating that the user voice instruction cannot be understood is broadcast.
If not, that is, the semantic recognition result indicates that the user voice instruction can be understood, it is judged whether the vertical domain category contained in the semantic recognition result is a vertical domain category supported only online; if so, a voice prompt indicating that the operation related to the vertical domain category cannot be executed is broadcast.
If not, that is, the vertical domain category contained in the semantic recognition result is a vertical domain category supported offline, it is judged whether the semantic recognition result indicates that the command semantics for the vertical domain category contained in the semantic recognition result cannot be understood; if so, a guidance prompt for the vertical domain category contained in the semantic recognition result is broadcast.
If not, that is, the semantic recognition result indicates that the command semantics for the vertical domain category contained in the semantic recognition result can be understood, it is judged whether content data for the user voice instruction is needed; if not, feedback information for the semantic recognition result is broadcast; if so, and the local cache contains content data for the user voice instruction, the content data is broadcast; if the local cache does not contain the content data, a voice prompt indicating that the content data cannot be found is broadcast.
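The disconnected-state branching of FIG. 3 differs from FIG. 2 only in that there is no online fallback; a minimal Python sketch under the same illustrative assumptions:

```python
from typing import Dict

OFFLINE_SUPPORTED_DOMAINS = {"weather", "flight", "air_conditioner"}   # illustrative
QUERY_INTENTIONS = {"query", "search"}                                  # illustrative

def arbitrate_disconnected(result: Dict, cache: Dict[str, str]) -> str:
    """Sketch of FIG. 3: no network, so every branch must end in a local answer or prompt."""
    if not result.get("understood"):
        return "I didn't quite catch that, please rephrase and try again."
    domain = result["vertical_domain"]
    if domain not in OFFLINE_SUPPORTED_DOMAINS:           # online-only vertical domain
        return (f"The {domain} function is only supported online, "
                f"please try again after connecting to the network.")
    if result.get("intention") is None:                   # domain known, intention not extracted
        return f"You can tell me what you want to do with {domain}."
    if result["intention"] not in QUERY_INTENTIONS:       # control-class command
        return f"Done: the {domain} command has been executed."
    key = f"{domain}:{result['entities'].get('place', '')}"
    if key in cache:
        return cache[key]
    return (f"The network is not connected, so {domain} information cannot be found; "
            f"please try again later.")

# Example matching the text: Shanghai weather is already cached, so it can be broadcast offline.
cache = {"weather:Shanghai": "Shanghai is sunny today, 11 to 23 degrees C"}
print(arbitrate_disconnected(
    {"understood": True, "vertical_domain": "weather", "intention": "query",
     "entities": {"place": "Shanghai"}}, cache))
```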
In this way, the offline online semantic recognition arbitration method provided by the embodiment of the application can still perform voice semantic processing on the user voice instruction when the network state is a disconnected state to obtain a semantic recognition result, and offline cache data corresponding to different semantic recognition results is cached locally in advance, so part of the voice interaction requirements can still be met and the user experience of voice interaction without a network connection is improved.
Corresponding to the offline online semantic recognition arbitration method provided by the embodiment of the present application, the embodiment of the present application further provides an offline online semantic recognition arbitration device, referring to fig. 4, the device may include the following modules:
an obtaining module 401, configured to obtain a user voice instruction and determine a network connection state;
a processing module 402, configured to perform local voice semantic processing and online voice semantic processing on the user voice instruction when the network state is a connected state;
a judging module 403, configured to, when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported offline, judge whether content data for the user voice instruction is needed, and if not, broadcast feedback information for the semantic recognition result; if so, judge whether the local cache contains content data for the user voice instruction, and if it does, broadcast the content data;
a first feedback module 404, configured to directly call an online content search interface if the local cache does not contain content data for the user voice instruction, and wait, within a preset first time period, for content data for the user voice instruction to be returned online; and if no content data is returned online within the first time period, broadcast a voice prompt indicating that the content data was not found;
a second feedback module 405, configured to, when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported only online, wait, within a preset second time period, for feedback information based on the semantic recognition result of the online voice semantic processing to be returned online; and if no feedback information is returned online within the second time period, broadcast a voice prompt indicating that the operation related to the vertical domain category cannot be executed.
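A minimal class skeleton grouping the modules 401-405 of FIG. 4 is sketched below; the method names are illustrative and the bodies are intentionally left unimplemented, since the behaviour of each module is the one described above.

```python
from typing import Tuple

class OfflineOnlineArbitrationDevice:
    """Skeleton of the arbitration device of FIG. 4 (illustrative method names)."""

    def acquire(self, audio: bytes) -> Tuple[str, bool]:
        """Obtaining module 401: return the user voice instruction and whether the network is connected."""
        raise NotImplementedError

    def process(self, instruction: str) -> dict:
        """Processing module 402: run local and online voice semantic processing in parallel."""
        raise NotImplementedError

    def judge(self, local_result: dict) -> str:
        """Judging module 403: decide whether content data is needed and whether it is cached."""
        raise NotImplementedError

    def feedback_offline_domain(self, local_result: dict) -> str:
        """First feedback module 404: online content search bounded by the first time period."""
        raise NotImplementedError

    def feedback_online_only_domain(self, local_result: dict) -> str:
        """Second feedback module 405: wait for online feedback within the second time period."""
        raise NotImplementedError
```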
By applying the offline online semantic recognition arbitration device provided by the embodiment of the application, a user voice instruction is acquired and the network connection state is judged; when the network state is a connected state, local voice semantic processing and online voice semantic processing are performed on the user voice instruction respectively; when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported offline, it is judged whether content data for the user voice instruction is needed, and if not, feedback information for the semantic recognition result is broadcast; if so, it is judged whether the local cache contains content data for the user voice instruction, and if it does, the content data is broadcast; if it does not, an online content search interface is called directly and content data for the user voice instruction returned online is awaited within a preset first time period; if no content data is returned online within the first time period, a voice prompt indicating that the content data was not found is broadcast; when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported only online, feedback information based on the semantic recognition result of the online voice semantic processing is awaited online within a preset second time period; and if no feedback information is returned online within the second time period, a voice prompt indicating that the operation related to the vertical domain category cannot be executed is broadcast.
Therefore, compared with the existing scheme of providing voice feedback only from the online result, when the network is poor, information feedback can be given based on the semantic recognition result of the local voice semantic processing, and offline cache data corresponding to different semantic recognition results is cached locally in advance, so most voice interaction requirements can be met and the response speed of most interaction instructions under weak-network conditions is significantly improved. Even when an online search is needed, the user does not have to wait too long, so the response speed and stability of voice interaction are improved.
In addition, since the locally cached offline data can satisfy most user voice instructions, the number of scenarios that wait for online feedback is greatly reduced, which also reduces the load on the cloud server.
The method and the device are based on the same application concept, and because the principles of solving the problems of the method and the device are similar, the implementation of the device and the method can be mutually referred, and repeated parts are not repeated.
An embodiment of the present invention further provides an electronic device, as shown in fig. 5, which includes a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 complete mutual communication through the communication bus 504,
a memory 503 for storing a computer program;
the processor 501, when executing the program stored in the memory 503, implements the following steps:
acquiring a user voice instruction and judging a network connection state;
when the network state is a connected state, performing local voice semantic processing and online voice semantic processing on the user voice instruction respectively;
when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported offline, judging whether content data for the user voice instruction is needed; if not, broadcasting feedback information for the semantic recognition result; if so, judging whether the local cache contains content data for the user voice instruction, and if it does, broadcasting the content data;
if it does not, directly calling an online content search interface and waiting, within a preset first time period, for content data for the user voice instruction to be returned online; if no content data is returned online within the first time period, broadcasting a voice prompt indicating that the content data was not found;
when the semantic recognition result of the local voice semantic processing is obtained first and the vertical domain category contained in the semantic recognition result is a vertical domain category supported only online, waiting, within a preset second time period, for feedback information based on the semantic recognition result of the online voice semantic processing to be returned online; and if no feedback information is returned online within the second time period, broadcasting a voice prompt indicating that the operation related to the vertical domain category cannot be executed.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Compared with the existing scheme that relies on the online result for voice feedback, the electronic device provided by the embodiment of the present application can perform information feedback based on the semantic recognition result of local voice semantic processing when the network is poor, and locally pre-caches offline cache data corresponding to different semantic recognition results, so that most voice interaction requirements can be met and the response speed of most interaction instructions under weak network conditions is significantly improved. Even when an online search is needed, the user does not have to wait long, which improves the response speed and stability of voice interaction.
In addition, since the locally cached offline data can satisfy most user voice commands, the number of scenarios that wait for online feedback results is greatly reduced, which also reduces the load on the cloud server.
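As an illustration of this pre-caching idea, the following is a minimal Python sketch of a local offline content cache that is filled from the cloud in advance and consulted before any online search. The class name OfflineContentCache, the cache file path, the time-to-live value, and the cloud_fetch callable are assumptions made for this sketch and are not part of the patent.

import json
import time
from pathlib import Path

class OfflineContentCache:
    """Local pre-cache: content for frequent query-type results is pulled from the
    cloud while the network is good and served from local storage later."""

    def __init__(self, path="offline_cache.json", ttl_s=24 * 3600):
        self.path = Path(path)
        self.ttl_s = ttl_s
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def prefetch(self, keys, cloud_fetch):
        # cloud_fetch is an assumed callable that returns content for a semantic key.
        for key in keys:
            self.data[key] = {"content": cloud_fetch(key), "ts": time.time()}
        self.path.write_text(json.dumps(self.data, ensure_ascii=False))

    def get(self, key):
        entry = self.data.get(key)
        if entry and time.time() - entry["ts"] < self.ttl_s:
            return entry["content"]
        return None   # miss or stale entry: the caller falls back to the online content search interface

A cache miss (get returning None) is what causes the flow to fall back to the online content search interface within the preset first duration.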
In yet another embodiment provided by the present application, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above offline online semantic recognition arbitration methods.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions that, when executed on a computer, cause the computer to perform the steps of any of the above-described embodiments of offline online semantic recognition arbitration methods.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present invention are produced wholly or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in this specification are described in a related manner; identical or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, for the offline online semantic recognition arbitration device, electronic device, computer-readable storage medium, and computer program product embodiments, since they are substantially similar to the offline online semantic recognition arbitration method embodiment, their description is relatively brief, and reference may be made to the corresponding parts of the description of the offline online semantic recognition arbitration method embodiment.
The above description is only for the preferred embodiment of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (9)

1. An offline online semantic recognition arbitration method, the method comprising:
acquiring a user voice instruction and determining a network connection state;
when the network state is a connected state, performing local voice semantic processing and online voice semantic processing respectively on the user voice instruction;
when a semantic recognition result of local voice semantic processing is obtained first and a vertical domain type contained in the semantic recognition result is a vertical domain type supported offline, determining whether content data for the user voice instruction is needed; if not, broadcasting feedback information for the semantic recognition result; if so, determining whether the local cache contains content data for the user voice instruction, and if so, broadcasting the content data; wherein content data for the user voice instruction is not needed when the user voice instruction is a control-type instruction, content data for the user voice instruction is needed when the user voice instruction is a query-type instruction, and the content in the local cache is acquired from the cloud in advance;
if not, directly calling an online content search interface and waiting, within a preset first duration, for content data for the user voice instruction to be returned online; if no content data is returned online within the first duration, broadcasting a voice prompt indicating that no content data was found;
when the semantic recognition result of local voice semantic processing is obtained first and the vertical domain type contained in the semantic recognition result is a vertical domain type supported only online, waiting, within a preset second duration, for feedback information returned online based on the semantic recognition result of online voice semantic processing; if no feedback information is returned online within the second duration, broadcasting a voice prompt indicating that the operation related to the vertical domain type cannot be performed;
the method further comprises:
when the network state is a disconnected state, performing local voice semantic processing on the user voice instruction;
when the vertical domain type contained in the semantic recognition result of the local voice semantic processing is a vertical domain type supported offline, determining whether content data for the user voice instruction is needed; if not, broadcasting feedback information for the semantic recognition result; if so, determining whether the local cache contains content data for the user voice instruction, and if so, broadcasting the content data;
if not, broadcasting a voice prompt indicating that no content data can be found;
and when the vertical domain type contained in the semantic recognition result of the local voice semantic processing is a vertical domain type supported only online, broadcasting a voice prompt indicating that the operation related to the vertical domain type cannot be performed.
2. The method of claim 1, further comprising:
when the semantic recognition result of the online voice semantic processing is obtained first, stopping the local voice semantic processing, and receiving feedback information returned online for the semantic recognition result of the online voice semantic processing.
3. The method of claim 1, further comprising:
when a semantic recognition result of local voice semantic processing is obtained first and the semantic recognition result indicates that the user voice instruction cannot be understood, waiting, within a preset third duration, for feedback information returned online based on the semantic recognition result of online voice semantic processing; and if no feedback information is returned online within the third duration, broadcasting a voice prompt indicating that the user voice instruction cannot be understood.
4. The method of claim 1, further comprising:
when the semantic recognition result of the local voice semantic processing is obtained first and the semantic recognition result indicates that the command semantics for the vertical domain type contained in the semantic recognition result cannot be understood, broadcasting a guidance prompt for the vertical domain type contained in the semantic recognition result.
5. The method according to claim 1, wherein the step of broadcasting a voice prompt indicating that no content data was found comprises:
if the semantic recognition result of the local voice semantic processing contains a valid entity, broadcasting a voice prompt indicating that no content data of the vertical domain type was found for the valid entity;
and if the semantic recognition result of the local voice semantic processing does not contain a valid entity, broadcasting a voice prompt indicating that no content data was found for the vertical domain type.
6. The method of claim 1, further comprising:
when the semantic recognition result of the local voice semantic processing indicates that the user voice instruction cannot be understood, broadcasting a voice prompt indicating that the user voice instruction cannot be understood.
7. The method of claim 1, further comprising:
when the semantic recognition result of the local voice semantic processing indicates that the command semantics for the vertical domain type contained in the semantic recognition result cannot be understood, broadcasting a guidance prompt for the vertical domain type contained in the semantic recognition result.
8. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1 to 7 when executing the program stored in the memory.
9. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1 to 7.
CN202110503801.XA 2021-05-10 2021-05-10 Offline online semantic recognition arbitration method, electronic device and storage medium Active CN112992145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110503801.XA CN112992145B (en) 2021-05-10 2021-05-10 Offline online semantic recognition arbitration method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110503801.XA CN112992145B (en) 2021-05-10 2021-05-10 Offline online semantic recognition arbitration method, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN112992145A CN112992145A (en) 2021-06-18
CN112992145B true CN112992145B (en) 2021-08-06

Family

ID=76337404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110503801.XA Active CN112992145B (en) 2021-05-10 2021-05-10 Offline online semantic recognition arbitration method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN112992145B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102021205528A1 (en) 2021-05-31 2022-12-01 Volkswagen Aktiengesellschaft Method and system for checking the availability of an online function of a voice recognition system in a motor vehicle
CN115410579B (en) * 2022-10-28 2023-03-31 广州小鹏汽车科技有限公司 Voice interaction method, voice interaction device, vehicle and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105206266A (en) * 2015-09-01 2015-12-30 重庆长安汽车股份有限公司 Vehicle-mounted voice control system and method based on user intention guess
CN106992009A (en) * 2017-05-03 2017-07-28 深圳车盒子科技有限公司 Vehicle-mounted voice exchange method, system and computer-readable recording medium
CN110444206A (en) * 2019-07-31 2019-11-12 北京百度网讯科技有限公司 Voice interactive method and device, computer equipment and readable medium
CN111833875A (en) * 2020-07-10 2020-10-27 安徽芯智科技有限公司 Embedded voice interaction system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AUPR082400A0 (en) * 2000-10-17 2000-11-09 Telstra R & D Management Pty Ltd An information retrieval system
DE102017220266B3 (en) * 2017-11-14 2018-12-13 Audi Ag Method for checking an onboard speech recognizer of a motor vehicle and control device and motor vehicle
CN110136713A (en) * 2019-05-14 2019-08-16 苏州思必驰信息科技有限公司 Dialogue method and system of the user in multi-modal interaction

Also Published As

Publication number Publication date
CN112992145A (en) 2021-06-18

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220330

Address after: 430051 No. b1336, chuanggu startup area, taizihu cultural Digital Creative Industry Park, No. 18, Shenlong Avenue, Wuhan Economic and Technological Development Zone, Hubei Province

Patentee after: Yikatong (Hubei) Technology Co.,Ltd.

Address before: 430056 building B (qdxx-f7b), No.7 building, qiedixiexin science and Technology Innovation Park, South taizihu innovation Valley, Wuhan Economic and Technological Development Zone, Hubei Province

Patentee before: HUBEI ECARX TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right