CN115662430A - Input data analysis method and device, electronic equipment and storage medium - Google Patents

Input data analysis method and device, electronic equipment and storage medium

Info

Publication number
CN115662430A
Authority
CN
China
Prior art keywords
analysis result
input data
offline
result
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211338183.9A
Other languages
Chinese (zh)
Other versions
CN115662430B (en)
Inventor
周文欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Apollo Zhixing Technology Guangzhou Co Ltd
Original Assignee
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Apollo Zhixing Technology Guangzhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Apollo Intelligent Connectivity Beijing Technology Co Ltd, Apollo Zhixing Technology Guangzhou Co Ltd filed Critical Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority to CN202211338183.9A priority Critical patent/CN115662430B/en
Publication of CN115662430A publication Critical patent/CN115662430A/en
Application granted granted Critical
Publication of CN115662430B publication Critical patent/CN115662430B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The disclosure provides an input data analysis method and device, electronic equipment, and a storage medium, and relates to the field of data processing, in particular to voice technology, the Internet of Things, and automatic driving. The specific implementation scheme is as follows: sending input data provided by a user to a server so that the server analyzes the input data online; performing offline analysis on the input data to obtain an offline analysis result, and performing credibility detection on the offline analysis result; when the offline analysis result is credible and the online analysis result has not been received, obtaining the credible offline analysis result and determining it as the analysis result of the input data; and when the offline analysis result is not credible, or the online analysis result is received first, obtaining the online analysis result and determining it as the analysis result of the input data. Embodiments of the disclosure can improve the accuracy of input data analysis.

Description

Input data analysis method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing, in particular to voice technology, the Internet of Things, and automatic driving, and more specifically to an input data parsing method, apparatus, electronic device, and storage medium.
Background
With the popularization of intelligent devices, human-machine interaction is becoming more convenient. Compared with typing, a mouse, or touch-screen control, voice interaction, gesture interaction, and the like are more convenient: the machine understands human language and responds to it, and can therefore serve people better.
Specifically, a voice interaction device can upload the received voice to the cloud and rely on the cloud's powerful processing capacity to perform voice recognition and natural language understanding.
Disclosure of Invention
The disclosure provides an input data analysis method, an input data analysis device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided an input data parsing method, including:
sending input data provided by a user to a server so that the server analyzes the input data online;
performing offline analysis on the input data to obtain an offline analysis result, and performing credibility detection on the offline analysis result;
when the offline analysis result is credible and the online analysis result has not been received, obtaining the credible offline analysis result and determining the credible offline analysis result as the analysis result of the input data; and
when the offline analysis result is not credible, or the online analysis result is received first, obtaining the online analysis result and determining the online analysis result as the analysis result of the input data.
According to another aspect of the present disclosure, there is provided an input data parsing apparatus, including:
an input data acquisition module, configured to send input data provided by a user to a server so that the server analyzes the input data online;
an offline analysis credibility detection module, configured to perform offline analysis on the input data to obtain an offline analysis result and to perform credibility detection on the offline analysis result;
a trusted result obtaining module, configured to obtain a trusted offline analysis result and determine the trusted offline analysis result as the analysis result of the input data when the offline analysis result is trusted and the online analysis result has not been received; and
an online result acquisition module, configured to obtain the online analysis result and determine the online analysis result as the analysis result of the input data when the offline analysis result is not credible or the online analysis result is received first.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of input data parsing according to any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the input data parsing method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the input data parsing method according to any of the embodiments of the present disclosure.
The embodiment of the disclosure can improve the accuracy of input data analysis.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a method of input data parsing disclosed in accordance with an embodiment of the present disclosure;
FIG. 2 is a flow chart of another input data parsing method disclosed in accordance with an embodiment of the present disclosure;
FIG. 3 is a flow chart of another input data parsing method disclosed in accordance with an embodiment of the present disclosure;
FIG. 4 is a scene diagram of another input data parsing method disclosed in accordance with an embodiment of the present disclosure;
FIG. 5 is a block diagram of an input data parsing apparatus according to an embodiment of the disclosure;
FIG. 6 is a block diagram of an electronic device for implementing an input data parsing method of an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flowchart of an input data parsing method according to an embodiment of the present disclosure. This embodiment is applicable to parsing input data. The method may be executed by an input data analysis device, which may be implemented in software and/or hardware and configured in an electronic device with a certain data computing capability. The electronic device may be a client device such as a mobile phone, a tablet computer, a vehicle-mounted terminal, a desktop computer, or an Internet of Things device.
S101, sending input data provided by a user to a server so that the server analyzes the input data online.
The input data is recognized and analyzed to obtain an instruction, which instructs a corresponding module or device to execute it, thereby realizing human-machine interaction. The input data may be provided by the user as text, voice, images, video, and the like. The input data (i.e., the source data) may be sent to the server directly, or it may be processed first and the processed input data sent to the server. The server analyzes the input data online to determine the user's intention. Because the server resides in the network and communicates with the current electronic device over the network, the server's analysis of the input data is an online analysis process, and the analysis result fed back by the server is the online analysis result.
S102, performing offline analysis on the input data to obtain an offline analysis result, and performing credibility detection on the offline analysis result.
Analyzing the input data offline means analyzing it with local resources rather than network resources. Offline analysis runs locally, so it can run in an environment where network resources are difficult to obtain, such as an environment with no network or a weak network. The offline analysis result may include an action and an action object, that is, action + object. Illustratively, if the input is the voice sentence "open a window", the offline analysis result is the action "open" and the action object "window". Credibility detection checks whether the offline analysis result is credible. For example, at least one of the following may be used: the confidence, the accuracy, whether an instruction can be generated, and whether the generated instruction is usable.
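As a minimal sketch of the action + object form described above, the following Python snippet parses a recognized sentence with a small rule table. The vocabulary tables `ACTIONS` and `OBJECTS`, the `OfflineResult` type, and the function `parse_offline` are hypothetical and only illustrate the shape of the result; the disclosure does not specify how offline analysis is implemented.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies; a real device would derive them from its own functions.
ACTIONS = {"open": "OPEN", "close": "CLOSE"}
OBJECTS = {"window": "WINDOW", "sunshade curtain": "SUNSHADE_CURTAIN"}


@dataclass(frozen=True)
class OfflineResult:
    action: str
    obj: str


def parse_offline(text: str) -> Optional[OfflineResult]:
    """Return an action + object pair, or None when either part is missing."""
    lowered = text.lower()
    words = lowered.split()
    # The action must appear as a whole word; the object may be a multi-word phrase.
    action = next((a for word, a in ACTIONS.items() if word in words), None)
    obj = next((o for phrase, o in OBJECTS.items() if phrase in lowered), None)
    if action is None or obj is None:
        return None
    return OfflineResult(action, obj)
```

A result of `None` here corresponds to an offline analysis that could not produce an action + object pair, which a credibility check would treat as not credible.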
S103, when the offline analysis result is credible and the online analysis result has not been received, obtaining the credible offline analysis result and determining the credible offline analysis result as the analysis result of the input data.
S104, when the offline analysis result is not credible, or the online analysis result is received first, obtaining the online analysis result and determining the online analysis result as the analysis result of the input data.
In effect, both the online analysis result and a credible offline analysis result can be regarded as accurate analysis results. Of the two, the one obtained first in time is determined as the analysis result of the input data. If the offline analysis result is not credible, the device waits for the online analysis result and determines the received online analysis result as the analysis result of the input data.
Optionally, the input data parsing method further includes: when the offline analysis result is credible, obtaining the credible offline analysis result and intercepting the online analysis result sent by the server; and when the offline analysis result is not credible, waiting for the online analysis result fed back by the server and receiving it.
If a credible offline analysis result is obtained before the online analysis result has been received, the online analysis result is intercepted when it arrives: it is discarded without any processing, and the resources reserved for processing it are released. Conversely, if the online analysis result is obtained while offline analysis or credibility detection is still in progress, the offline analysis process is stopped and the resources it uses are released.
If the offline analysis result is determined to be not credible, the device waits for the online analysis result. If the wait times out, a response timeout is reported to the user. This avoids erroneous operations caused by processing a non-credible offline analysis result; in the field of automatic driving, processing such a result may even cause a safety accident.
Processing the credible offline analysis result and intercepting the online analysis result, while waiting for the online analysis result when the offline analysis result is not credible, improves the credibility of the adopted analysis result and thus the analysis accuracy, so that instructions are executed correctly and interaction accuracy improves. Using whichever result is obtained first as the input of subsequent processing also improves the response speed of human-machine interaction.
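The selection rule described above can be sketched as a small event-driven arbiter. This is an illustrative sketch only: the class `ResultArbiter`, its callback names, and the choice to treat any wait without a chosen result as a timeout are assumptions, not details given in the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultArbiter:
    """Chooses the final analysis result from offline and online candidates."""
    final: Optional[str] = None
    source: Optional[str] = None  # "offline", "online", or "timeout"

    def on_offline(self, result: str, credible: bool) -> None:
        if self.source is not None:
            return  # a result was already chosen; the offline work is dropped
        if credible:
            self.final, self.source = result, "offline"
        # A non-credible offline result is never adopted; keep waiting for the server.

    def on_online(self, result: str) -> None:
        if self.source is not None:
            return  # intercepted: discarded without any processing
        self.final, self.source = result, "online"

    def on_timeout(self) -> None:
        if self.source is None:
            self.source = "timeout"  # report a response timeout to the user
```

Whichever of `on_offline` (with a credible result) or `on_online` fires first fixes the outcome; later events are ignored, which mirrors intercepting the late online result or stopping the offline analysis.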
Optionally, an instruction is generated according to the analysis result of the input data, and the instruction is sent to the corresponding module, so that the module executes the instruction to implement the corresponding function.
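One possible way to route an analysis result to the module that executes it is a lookup table keyed by action and object. The handler `open_window`, the table `HANDLERS`, and the function `dispatch` below are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[], str]


def open_window() -> str:
    # Stand-in for the module that actually drives the window.
    return "window opened"


# Hypothetical mapping from (action, object) to the module function that executes it.
HANDLERS: Dict[Tuple[str, str], Handler] = {("OPEN", "WINDOW"): open_window}


def dispatch(action: str, obj: str) -> str:
    """Send the instruction to the matching module and return its outcome."""
    handler = HANDLERS.get((action, obj))
    if handler is None:
        raise LookupError(f"no module handles {action} {obj}")
    return handler()
```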
In the prior art, for example during voice recognition and especially under a weak network, an instruction may not return for a long time or a network error may be reported, which affects the user's use of the intelligent device. Meanwhile, because offline analysis results have low accuracy, using them directly for instruction analysis leads to instruction analysis errors.
According to this technical scheme, online analysis and offline analysis are performed simultaneously, and the credible analysis result that is obtained first is adopted as the analysis result of the user's input data. This balances analysis efficiency and accuracy, so that user requests can be answered accurately and in real time.
Fig. 2 is a flowchart of another input data parsing method according to an embodiment of the present disclosure. It further optimizes and extends the above technical solution and can be combined with the above optional embodiments. Here, performing credibility detection on the offline analysis result includes detecting the accuracy and the availability of the offline analysis result.
S201, sending input data provided by a user to a server, so that the server analyzes the input data on line.
User voice data is acquired and sent to the server for online voice recognition to obtain a voice recognition result. Under no network or a weak network, the current electronic device performs offline voice recognition on the user voice data to obtain a voice recognition result.
Optionally, the performing offline analysis on the input data to obtain an offline analysis result includes: performing voice recognition on the input data; and carrying out semantic analysis on the voice recognition result to obtain an offline analysis result.
The input data is user voice data, and the method is applied to a voice interaction scenario. User voice data is audio data formed from the user's speech. When the user interacts with the current electronic device by voice, the device, with the user's authorization, records the user's voice to obtain the user voice data and takes it as the input data provided by the user. Voice recognition may be implemented with Automatic Speech Recognition (ASR) technology, which converts a speech signal into a text instruction. The voice recognition that the current electronic device performs on the user voice data is an offline process.
Acquiring the user's voice data and recognizing it as text to obtain a voice recognition result improves the accuracy and real-time performance of voice interaction control in voice interaction scenarios.
Optionally, the input data parsing method further includes: performing word segmentation on the voice recognition result to obtain at least one candidate word; acquiring pronunciation information of the candidate word; querying pre-stored expected words for an expected word that matches the pronunciation information of the candidate word; and replacing the candidate word with the expected word to correct the voice recognition result.
The result of offline voice recognition is often not very accurate and can be corrected. The candidate word serves as the unit of correction: each candidate word is checked for errors and corrected if necessary. The pronunciation information may be the pronunciation of the candidate word. The pre-stored expected words are words that are expected to appear, specifically words that denote control objects or control operations in the control domain. The expected words may be determined from the functions of the functional modules of the current electronic device and from its control functions. For example, the user says "zheyanglian", meaning the control object "sunshade curtain", but offline recognition may return "this year"; the expected word is "sunshade curtain". In addition, the user may have a dialect accent, which causes a large difference between the offline recognition result and what the user actually meant. The expected words supply the standardized content of the control function that the user wants to express. In other words, the user's voice recognition result is normalized and standardized and then mapped to the control function, so that the speech is recognized accurately.
The pronunciation of the candidate word is the same as or similar to that of the expected word, and the expected word replaces the candidate word to correct the voice recognition result. Illustratively, the pronunciations may be considered the same or similar when the ratio of the number of syllables that the candidate word shares with the expected word to the total number of syllables of the expected word is greater than or equal to a preset threshold. Here a syllable may be the pronunciation of a single character, and a word generally consists of at least one character.
A set of known, common test voice queries can be obtained; the test set may consist of single Chinese characters, phrases, or sentence patterns. However, accents and actual pronunciation differ between users, so in actual offline voice recognition "sunshade curtain" may be misrecognized as, for example, "sun-like curtain" or "sunshade sample year". The recognized candidate word "sun-like curtain" can then be corrected to the expected word "sunshade curtain" to obtain a more accurate offline analysis result, from which a correct instruction can be parsed.
Vocabulary error correction can be achieved with a fuzzy syllable matching strategy. Specifically, querying for the expected word that matches the pronunciation information of the candidate word may include: determining an expected word whose pronunciation information has at least two syllables identical to syllables in the pronunciation information of the candidate word as the expected word that matches the candidate word. Illustratively, the pronunciation information of the candidate word "sun year" is "zheyangnian" and that of the expected word "sunshade curtain" is "zheyanglian"; the two syllables "zhe" and "yang" are identical, so the expected word "sunshade curtain" is determined to match the candidate word "sun year". A stricter variant may require at least two consecutive identical syllables. The method may also be applied to candidate words in other languages, such as English, French, or Japanese, without particular limitation.
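The fuzzy syllable matching strategy can be sketched as follows, assuming the pronunciation of each word is already available as a list of syllables. The function names, the `lexicon` structure, the tie-breaking by highest score, and the sample entry for "window" are hypothetical; only the "at least two identical syllables" rule and its consecutive variant come from the text.

```python
from typing import Dict, List, Optional, Sequence


def shared_syllables(a: Sequence[str], b: Sequence[str]) -> int:
    """Count syllables of `a` that also occur in `b`, using each syllable of `b` once."""
    remaining = list(b)
    count = 0
    for syllable in a:
        if syllable in remaining:
            remaining.remove(syllable)
            count += 1
    return count


def longest_shared_run(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest run of consecutive syllables common to both words."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def match_expected(candidate: Sequence[str],
                   lexicon: Dict[str, List[str]],
                   min_shared: int = 2,
                   contiguous: bool = False) -> Optional[str]:
    """Return the expected word with the most matching syllables, or None."""
    score = longest_shared_run if contiguous else shared_syllables
    best_word, best_score = None, 0
    for word, syllables in lexicon.items():
        s = score(candidate, syllables)
        if s >= min_shared and s > best_score:
            best_word, best_score = word, s
    return best_word
```

With a lexicon containing `"sunshade curtain": ["zhe", "yang", "lian"]`, the candidate `["zhe", "yang", "nian"]` shares "zhe" and "yang" and is therefore mapped to "sunshade curtain" under both the plain and the consecutive variant.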
Correcting the voice recognition result improves voice recognition accuracy, and performing offline analysis on the more accurate voice recognition result in turn improves the accuracy of the analysis result.
S202, performing offline analysis on the input data to obtain an offline analysis result, and detecting the accuracy and the availability of the offline analysis result.
The accuracy check detects whether the offline analysis result was analyzed accurately, and the availability check detects whether it can be used for subsequent operations. Illustratively, accuracy can be detected from at least one of: the analysis confidence of the offline analysis result, whether additional data for analyzing the input data exists, the richness of the input data, and the processing accuracy of the input data. Availability can be detected from at least one of: whether the offline analysis result is executable, whether an instruction can be generated, and whether the generated instruction is executable.
S203, under the condition that the off-line analysis result is credible and the on-line analysis result is not received, obtaining the credible off-line analysis result, and determining the credible off-line analysis result as the analysis result of the input data.
S204, when the offline analysis result is not credible, or the online analysis result is received first, obtaining the online analysis result and determining the online analysis result as the analysis result of the input data.
Optionally, detecting the accuracy of the offline analysis result includes at least one of the following: obtaining a sentence recognition confidence score corresponding to the input data and detecting whether the score is greater than or equal to a preset confidence score threshold; and detecting whether information of a multi-turn dialog can be acquired.
The sentence recognition confidence score measures sentence recognition accuracy; generally, the higher the score, the more accurate the recognition. Here the input data is a text recognition result, and the analysis result is the semantic analysis result of that text. Illustratively, if the input data is a voice recognition result, the sentence recognition confidence score is a voice recognition confidence score; if the input data is an image recognition result, it is an image recognition confidence score. The confidence score threshold is used to decide whether sentence recognition of the input data is accurate. Considering the confidence score alone, sentence recognition of the input data is determined to be accurate when the score is greater than or equal to the threshold and inaccurate when the score is below the threshold. Illustratively, the sentence recognition confidence score may be produced by a pre-trained machine learning model.
A multi-turn dialog is a series of question-and-answer turns about the same intention that serves to clarify the user's intention. If information of a multi-turn dialog can be acquired, the input data is one turn of that dialog, so it can be combined with the information of the multi-turn dialog to refine the user's intention; the intention of the input data becomes clearer and an accurate analysis result is easier to obtain. In a multi-turn dialog scenario, the input data can also be analyzed based on the context state to correct or optimize the offline analysis result and improve its accuracy. That is, a multi-turn dialog enriches the input data with more content that represents the intention, which improves the accuracy of the offline analysis result. Generally, in a voice interaction scenario, information about each dialog is stored, such as dialog identification information, whether the dialog is multi-turn, and whether the dialog has ended. For a multi-turn dialog, the content of the preceding and following turns, the dialog type, the intention, and similar information are also stored. For example, when the input data is acquired, it may be detected whether information of a multi-turn dialog can be acquired, which indicates whether the input data belongs to one turn of a multi-turn dialog.
Information of a multi-turn dialog is determined to be acquirable when the current dialog containing the input data is a multi-turn dialog that has not ended; it is determined to be unavailable when the current dialog is not a multi-turn dialog or the multi-turn dialog has ended. When the information of a multi-turn dialog can be acquired, sentence recognition of the input data is determined to be accurate; when it cannot, sentence recognition of the input data is determined to be inaccurate. For example, the intention of the first turn may be determined from the input data the user provides in that turn, and the input data and/or the intention may be examined to detect whether the dialog is a multi-turn dialog; if it is, the dialog is marked as multi-turn and its state is recorded as not ended.
The intentions that start a multi-turn dialog may be preset. For example, suppose the preset multi-turn intention is the navigation intention and the intention of the input data is to navigate to place A. It can then be determined that information of a multi-turn dialog can be acquired, and the context information, intention, and so on of the multi-turn dialog can be used: a second piece of input data provided by the user shortly afterwards is determined to be the second turn of the multi-turn dialog, and intention detection is performed on it according to the context information and intention of the multi-turn dialog, that is, the first turn and the intention determined from it. As another example, the content that starts a multi-turn dialog may be preset, for instance content that describes a state. If the input data the user provides in the first turn is "hot weather", the first turn is determined to belong to a multi-turn dialog; the electronic device may then ask "whether to open a window" as the second turn, and the user's input in the third turn may be "yes" or "no". The third-turn input data and the previous two turns belong to the same multi-turn dialog. Other cases exist and can be configured as needed.
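The "hot weather" example can be sketched as a small dialog state holder, in which a short reply such as "yes" is interpreted against the stored context. The `Dialog` record, its fields, the reply strings, and the rule that any reply other than "yes" cancels the pending action are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dialog:
    """Hypothetical record of one dialog, as stored by the device."""
    dialog_id: str
    multi_turn: bool = False
    ended: bool = False
    pending_action: Optional[str] = None  # action awaiting user confirmation
    history: List[str] = field(default_factory=list)


def handle_turn(dialog: Dialog, user_text: str) -> str:
    """Process one user turn and return the device's reply."""
    dialog.history.append(user_text)
    text = user_text.strip().lower()
    if dialog.pending_action is None:
        if text == "hot weather":  # preset state-describing content
            dialog.multi_turn = True
            dialog.pending_action = "open window"
            return "Open the window?"
        dialog.ended = True
        return "Sorry, not understood."
    # A question is pending, so the reply is read against the dialog context.
    action, dialog.pending_action = dialog.pending_action, None
    dialog.ended = True
    return f"OK: {action}" if text == "yes" else "OK, cancelled."
```

Without the stored context, the third-turn input "yes" carries no action or object on its own; with it, the reply resolves to the pending "open window" action.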
When the sentence recognition confidence score is greater than or equal to the confidence score threshold and the information of a multi-turn dialog can be acquired, sentence recognition of the input data is determined to be accurate; when the score is below the threshold or the information of a multi-turn dialog cannot be acquired, sentence recognition of the input data is determined to be inaccurate. Accuracy may also be determined in other ways, for example by detecting the accuracy of the offline analysis result directly, which is not specifically limited.
Determining the accuracy of the offline analysis result from the sentence recognition confidence score and the multi-turn dialog simplifies the accuracy detection process, and checking accuracy from several angles improves the precision of the accuracy detection.
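The accuracy checks above can be combined as in the following sketch. The `DialogInfo` type, the default threshold of 0.8, and the two switches that select which checks are enabled are assumptions for illustration; the disclosure states only that at least one of the checks is used and that, when both are used, both must pass.

```python
from dataclasses import dataclass


@dataclass
class DialogInfo:
    multi_turn: bool
    ended: bool


def multi_turn_info_available(info: DialogInfo) -> bool:
    """Information is acquirable only for a multi-turn dialog that has not ended."""
    return info.multi_turn and not info.ended


def is_accurate(confidence: float,
                info: DialogInfo,
                threshold: float = 0.8,  # hypothetical preset threshold
                check_confidence: bool = True,
                check_multi_turn: bool = True) -> bool:
    """Apply the enabled accuracy checks; every enabled check must pass."""
    if not (check_confidence or check_multi_turn):
        raise ValueError("enable at least one accuracy check")
    ok = True
    if check_confidence:
        ok = ok and confidence >= threshold
    if check_multi_turn:
        ok = ok and multi_turn_info_available(info)
    return ok
```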
Optionally, the detecting the availability of the offline analysis result includes at least one of the following: acquiring a prediction intention of the offline analysis result, and detecting whether the prediction intention corresponding to the offline analysis result is an offline support intention; and detecting whether the offline analysis result is analyzed to obtain a trusted instruction.
An offline support intent is an intent whose corresponding function can be implemented in an offline scenario. Some functions require network interaction to acquire resources: for example, a navigation intent needs map resource data from the network, and in an offline scenario these resources cannot be acquired, so the function corresponding to the navigation intent cannot be implemented. In addition, the processing for some intents must be performed online, and the current electronic device does not have the capability to perform the corresponding operation in an offline scenario. For example, if the operation corresponding to the intent is verifying the user's permission information and this operation is performed by the server, it cannot be performed offline, and so the operation corresponding to the intent cannot be carried out. In other words, an offline support intent is an intent that does not rely on online resources or services. Optionally, the functions that the functional modules of the current electronic device provide in the offline scenario may be collected, the offline-supported intents determined accordingly, and these intents added to a whitelist that stores the offline support intents.
A trusted instruction is an instruction that can be executed. Detecting whether a trusted instruction is obtained by parsing specifically includes detecting whether an instruction can be generated and detecting whether the generated instruction is trusted. An instruction is trusted when the corresponding functional module can execute it and obtain a correct execution result; specifically, the functional module can acquire valid resources to execute the instruction correctly. For example, a trusted instruction does not depend on online resources and can be processed correctly using offline resources alone.
Optionally, the range of instructions that each functional module can execute may be determined according to the functions of the functional modules of the current electronic device in the offline scenario, thereby determining the range of trusted instructions. Whether an instruction within the trusted instruction range can be generated from the offline analysis result is then detected, so as to detect whether the offline analysis result can be parsed into a trusted instruction. As another example, a range of designated fields that the offline analysis result may contain can be configured based on the range of trusted instructions: when the offline analysis result contains any field in the range, it is determined that the offline analysis result can be parsed into a trusted instruction; when the offline analysis result contains none of the fields in the range, it is determined that it cannot. Availability may also be determined in other ways, which is not specifically limited.
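As an illustration of the whitelist idea, the following Python sketch collects the intents that each functional module can serve offline and checks a predicted intent against them. The module names and intent labels are hypothetical and are not taken from the disclosure.

```python
# Hypothetical mapping: functional module -> intents it can serve offline.
OFFLINE_MODULE_INTENTS = {
    "media_control": {"switch_song", "adjust_volume"},
    "hardware_control": {"open_window", "close_window"},
}


def build_offline_whitelist(module_intents: dict) -> set:
    """Collect every intent that some module supports offline."""
    whitelist = set()
    for intents in module_intents.values():
        whitelist |= intents
    return whitelist


def is_offline_supported(predicted_intent: str, whitelist: set) -> bool:
    """An intent is an offline support intent if it is in the whitelist."""
    return predicted_intent in whitelist


WHITELIST = build_offline_whitelist(OFFLINE_MODULE_INTENTS)
```

With this configuration a navigation intent, which needs map data from the network, is simply absent from the whitelist and is therefore rejected.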
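The designated-field variant can be sketched as follows. The field names and the dictionary shape of the offline analysis result are assumptions made for illustration only.

```python
# Hypothetical designated fields derived from the trusted instruction range.
TRUSTED_FIELDS = frozenset({"volume", "song", "window"})


def has_trusted_field(offline_result: dict,
                      trusted_fields: frozenset = TRUSTED_FIELDS) -> bool:
    """True if the offline analysis result contains ANY designated field;
    False if it contains none of them."""
    return any(field in offline_result for field in trusted_fields)
```

Checking for the presence of a field is cheaper than generating a full instruction, which may be why the text offers it as an alternative.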
The usability of the offline analysis result is determined by detecting the offline support intention and the trusted instruction, the usability detection process can be refined, and the usability is detected through a plurality of angles, so that the usability detection precision is improved.
Optionally, the detecting whether the offline analysis result is parsed into a trusted instruction includes: detecting whether the offline analysis result is parsed into at least one instruction of a matching function type, and determining an instruction resolvable detection result; acquiring the resource dependency type of the parsed instruction, and determining a resource validity detection result; and detecting, according to the instruction resolvable detection result and the resource validity detection result, whether the offline analysis result is parsed into a trusted instruction.
The function type is a type of a function corresponding to the command obtained by the analysis, and may be determined according to a function realized by the input data, an intention corresponding to the offline analysis result, or the like. For example, in the scene of the internet of things, the controlled electronic device is a sound box, the achievable functions include playing songs, and the function types may include types of switching songs or adjusting volume. The instruction for matching the corresponding function type comprises a song switching instruction or a volume adjusting instruction.
The instruction resolvable detection result is used to determine whether the offline analysis result can be parsed into an instruction. It is either a resolvable result or an unresolvable result. For example, several parsing modes may be configured according to the function type, and these parsing modes are used to detect whether the offline analysis result can be parsed. The parsed instruction may be an instruction of a matching function type, a general instruction, or an empty instruction. When any one parsing mode can parse the offline analysis result, an instruction of a matching function type is obtained, and the instruction resolvable detection result is a resolvable result. When none of the parsing modes can parse it, either only a general instruction is obtained, or no instruction is obtained, that is, the parsed instruction is empty; in both cases the instruction resolvable detection result is an unresolvable result. For example, the parsing process may replace specified characters in the offline analysis result with target characters that the functional module can recognize, and configuring a parsing mode for a function type may consist of configuring the replacement rules corresponding to that function type.
The resource dependency type is used to determine the type of resource needed during instruction execution. The resource dependency type includes, for example, an online resource dependency type or an offline resource dependency type; as another example, it includes a valid resource dependency type or an invalid (untrusted) resource dependency type. The resource validity detection result is used to detect whether the valid resources required during instruction execution can be acquired, and therefore whether the parsed instruction can be executed. The resource validity detection result is either valid resource or invalid resource, and can be determined according to the resource dependency type. Specifically, the correspondence between resource dependency types and resource validity detection results may be preset. In a specific example, when the resource dependency type is the online resource dependency type, the resource validity detection result is determined to be invalid resource; when the resource dependency type is the offline resource dependency type, the resource validity detection result is determined to be valid resource.
The instruction resolvable detection result and the resource validity detection result are used together to detect whether the offline analysis result is parsed into a trusted instruction. If, according to the two results, an instruction of a matching function type can be obtained by parsing and that instruction can be executed correctly, it is determined that the offline analysis result is parsed into a trusted instruction; if an instruction of a matching function type cannot be obtained, or the instruction cannot be executed correctly, it is determined that the offline analysis result cannot be parsed into a trusted instruction. Specifically, when the instruction resolvable detection result is a resolvable result and the resource validity detection result is valid resource, it is determined that the offline analysis result is parsed into a trusted instruction; when the instruction resolvable detection result is an unresolvable result or the resource validity detection result is invalid resource, it is determined that the offline analysis result cannot be parsed into a trusted instruction.
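The replacement-rule style of parsing might look like the following Python sketch. The rule table, the phrases, and the target tokens such as MEDIA_NEXT are invented for illustration; the disclosure only says that specified characters are replaced with target characters recognizable by the functional module.

```python
from typing import Optional, Tuple

# Hypothetical replacement rules, one rule set (parsing mode) per function type.
REPLACEMENT_RULES = {
    "switch_song": {"next song": "MEDIA_NEXT", "previous song": "MEDIA_PREV"},
    "adjust_volume": {"louder": "VOLUME_UP", "quieter": "VOLUME_DOWN"},
}


def parse_instruction(offline_result: str,
                      rules: dict = REPLACEMENT_RULES) -> Optional[Tuple[str, str]]:
    """Try each parsing mode in turn. Return (function_type, instruction)
    for the first rule that matches, or None if no mode can parse the text."""
    for function_type, replacements in rules.items():
        for phrase, token in replacements.items():
            if phrase in offline_result:
                return function_type, offline_result.replace(phrase, token)
    return None


def is_resolvable(offline_result: str) -> bool:
    """Resolvable only when an instruction of a matching function type is obtained."""
    return parse_instruction(offline_result) is not None
```

In this sketch an empty result (None) stands for the unresolvable case; general instructions are not modelled.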
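The combination of the two detection results is a simple conjunction, sketched below in Python. The enum and the mapping table are illustrative names; the mapping follows the specific example in the text (online dependency is invalid, offline dependency is valid).

```python
from enum import Enum


class Dependency(Enum):
    ONLINE = "online"
    OFFLINE = "offline"


# Preset correspondence: resource dependency type -> is the resource valid offline?
RESOURCE_VALID = {Dependency.ONLINE: False, Dependency.OFFLINE: True}


def is_trusted_instruction(resolvable: bool, dependency: Dependency) -> bool:
    """Trusted only if the instruction is resolvable AND its resources are valid."""
    return resolvable and RESOURCE_VALID[dependency]
```

Either an unresolvable result or an invalid resource is sufficient to reject the instruction.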
An instruction generating module can be configured, and the instruction generating module processes the offline analysis result, generates an instruction adapting to the functional module according to the information of the functional module, and performs credible detection on the instruction. Detecting whether the offline analysis result is analyzed to obtain at least one instruction with matched function types through an instruction generation module, and determining an instruction analyzable detection result; and acquiring the resource dependence type of the instruction obtained by analysis, and determining the effective detection result of the resource. And detecting whether the offline analysis result is analyzed to obtain a credible instruction according to the instruction analyzable detection result and the resource effective detection result.
Whether the offline analysis result can be parsed into a trusted instruction is detected from whether an instruction can be parsed from the offline analysis result and whether that instruction depends on valid resources. The availability of the offline analysis result is thereby detected from both the parsing angle and the executability angle, which enriches the detection dimensions of availability, enlarges the detection range, and improves the availability detection precision.
Optionally, the performing the trusted detection on the offline analysis result may include: obtaining a statement identification confidence score corresponding to the input data, and detecting whether the statement identification confidence score is greater than or equal to a preset confidence score threshold value; detecting whether information of multiple rounds of conversations can be acquired; acquiring a prediction intention of the offline analysis result, and detecting whether the prediction intention corresponding to the offline analysis result is an offline support intention; and detecting whether the offline analysis result is analyzed to obtain a trusted instruction. Determining that the offline analysis result is credible under the condition that the statement identification confidence score is greater than or equal to a preset confidence score threshold value, the information of multiple rounds of conversations can be obtained, and the offline analysis result can be analyzed to obtain a credible instruction; or determining that the offline analysis result is credible under the condition that the statement identification confidence score is larger than or equal to a preset confidence score threshold value, the prediction intention is an offline support intention, and the offline analysis result can be analyzed to obtain a credible instruction. The offline parsing results for the remaining cases are not trusted.
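The two trusted combinations listed above share the confidence check and the trusted-instruction check, and differ only in whether multi-round information or an offline support intent is present. A minimal Python sketch, with all parameter names chosen for illustration:

```python
def is_offline_result_trusted(confidence: float,
                              threshold: float,
                              multi_round_available: bool,
                              intent_offline_supported: bool,
                              has_trusted_instruction: bool) -> bool:
    """Trusted if (score >= threshold AND trusted instruction) AND
    (multi-round info available OR predicted intent is offline-supported).
    Every other combination is untrusted."""
    if confidence < threshold or not has_trusted_instruction:
        return False
    return multi_round_available or intent_offline_supported
```

Factoring out the shared conditions makes it clear that a low confidence score or a missing trusted instruction always leads to an untrusted result.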
According to the technical scheme, whether the offline analysis result is credible or not is determined through the accuracy and the availability detection of the offline analysis result, the credible detection dimension is increased, the detection range is increased, the credible detection accuracy is improved, the credible offline analysis result is obtained, and therefore the accuracy of the analysis result corresponding to the input data is improved.
Fig. 3 is a flowchart of another input data parsing method disclosed according to an embodiment of the present disclosure, which is further optimized and expanded based on the above technical solution, and can be combined with the above optional embodiments. The input data parsing method is optimized to further comprise: obtaining context information and conversation types of current multiple rounds of conversations; determining an association intention according to the context information; determining a target instruction according to the analysis result of the input data; acquiring a prediction intention corresponding to an analysis result of the input data; and determining a target function module according to the associated intention, the predicted intention and the conversation type, and sending the target instruction to the target function module so as to enable the target function module to execute the target instruction.
S301, sending input data provided by a user to a server, so that the server analyzes the input data on line.
S302, performing offline analysis on the input data to obtain an offline analysis result, and performing credible detection on the offline analysis result.
S303, under the condition that the off-line analysis result is credible and the on-line analysis result is not received, obtaining the credible off-line analysis result, and determining the credible off-line analysis result as the analysis result of the input data.
S304, in the case that the offline analysis result is not trusted or the online analysis result is received first, obtaining the online analysis result, and determining the online analysis result as the analysis result of the input data.
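Steps S303 and S304 amount to an arbitration between the two results. The Python sketch below shows one way to express it; the function name, the string labels, and the convention that `None` means "no online result received yet" are assumptions, and the "wait_online" outcome is one reading of how an untrusted offline result is handled while the online result is still pending.

```python
from typing import Any, Optional, Tuple


def select_result(offline_result: Any,
                  offline_trusted: bool,
                  online_result: Optional[Any]) -> Tuple[str, Optional[Any]]:
    """online_result is None while no online result has been received."""
    if online_result is not None:
        # S304: the online result arrived first, so it is used.
        return "online", online_result
    if offline_trusted:
        # S303: trusted offline result and no online result yet.
        return "offline", offline_result
    # S304: offline result untrusted, so keep waiting for the online result.
    return "wait_online", None
```

In a real system this decision would be re-evaluated whenever either result arrives; the sketch only captures a single evaluation.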
S305, obtaining the context information and the conversation type of the current multiple rounds of conversations.
In the case that information of a multi-round dialog is detected, that is, the current dialog is a multi-round dialog, the information of the current multi-round dialog may be acquired, which may specifically include context information and a dialog type. The current multi-round dialog is the multi-round dialog to which the input data provided by the user belongs. Context information is content associated with the current multi-round dialog; for example, it may include the duration, whether the dialog is a multi-round dialog, whether the multi-round dialog has ended, identification information, the intent determined by previous rounds, and the like. The dialog type is used to determine the scope of the user's reply content in the current multi-round dialog. Illustratively, the dialog types may include a restricted user dialog content type and an unrestricted user dialog content type.
If the input data is provided by the user in the first round of the dialog, the content related to the input data may be determined as the context information of the current multi-round dialog, the dialog type may be determined, and both may be stored. If the input data is provided by the user in the second round, the pre-stored context information and dialog type can be acquired directly. In addition, the context information and dialog type may be updated according to the second-round input data for use in subsequent rounds, or the context information and dialog type of the first round may continue to be used without updating; whether and how to update may be set as needed and is not specifically limited. By analogy, for the input data of any subsequent round, the context information and dialog type recorded from the preceding rounds can be acquired. The dialog type is determined according to the context information: if the intent corresponding to the context information is one that restricts the user's dialog content, such as navigating or making a call, the dialog type is determined to be the restricted user dialog content type; if the intent is an unrestricted one, such as opening the sunroof, the dialog type is determined to be the unrestricted user dialog content type.
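The store-then-reuse behaviour of the context information can be illustrated with a small Python class. The class name, the intent labels, and the `update` flag are hypothetical; the set of restricting intents follows the examples in the text (navigating, making a call).

```python
# Intents that restrict the user's reply content, per the examples above.
RESTRICTING_INTENTS = {"navigate", "make_call"}


def dialog_type_for(intent: str) -> str:
    """Derive the dialog type from the intent stored in the context."""
    return "restricted" if intent in RESTRICTING_INTENTS else "unrestricted"


class DialogContext:
    """Stores context for the current multi-round dialog."""

    def __init__(self) -> None:
        self._state = None

    def on_input(self, intent: str, update: bool = False) -> dict:
        # First round: store context. Later rounds: reuse it unless asked to update.
        if self._state is None or update:
            self._state = {"intent": intent, "dialog_type": dialog_type_for(intent)}
        return self._state
```

For example, after a first round with a navigation intent, a second round with a different intent still sees the stored navigation context unless `update=True` is passed.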
S306, determining the association intention according to the context information.
The associated intent may refer to the user's intent determined by the first few rounds of the current multi-round dialog. The context information may include an intent, which is extracted from the context information and determined to be the associated intent. Alternatively, the context information includes the content of the first few rounds of the dialog, and the associated intent is parsed from that content. S305-S306 may be performed concurrently with S302.
S307, determining a target instruction according to the analysis result of the input data.
The target instruction is an instruction obtained by parsing the analysis result of the input data. It is distributed to a functional module for execution; the functional module receives and executes the target instruction, thereby implementing the function corresponding to the user's intent. When the online analysis result is obtained first, either the online analysis result is parsed to obtain the target instruction, or, together with the online analysis result, an instruction issued by the server is obtained and determined to be the target instruction. When the trusted offline analysis result is obtained first, the instruction obtained by parsing the trusted offline analysis result is determined as the target instruction. In the foregoing embodiment, the offline analysis result is already parsed into an instruction for trusted-instruction detection; accordingly, when a trusted instruction can be obtained by parsing and the online analysis result is not waited for, that trusted instruction is determined as the target instruction. S307 is executed after S303.
And S308, acquiring the prediction intention corresponding to the analysis result of the input data.
The predicted intent may refer to an intent determined by the input data. In the scenario of a multi-turn conversation, the associated intent is the intent determined by the first few turns of the conversation in the current multi-turn conversation. The predicted intention is the intention determined by the current turn of the dialog in the current multiple turns of the dialog.
S308 may be performed concurrently with S302. S307 may also be transposed with respect to the order of execution of S304-S306. And when the analysis result of the input data is the online analysis result, receiving the online analysis result and simultaneously receiving the intention corresponding to the online analysis result, and determining the intention as the prediction intention corresponding to the analysis result of the input data.
S309, determining a target function module according to the associated intention, the prediction intention and the conversation type, and sending the target instruction to the target function module so that the target function module executes the target instruction.
The associated intent, the predicted intent, and the dialog type are used to collectively determine a target function module. The target function module is used for executing a target instruction. The input data analysis method can be applied to the application scene of the Internet of things, and the target function module can be configured in the equipment of the Internet of things. The internet of things equipment can be connected with network equipment, receives instructions through the network and executes the instructions, and can be divided into fixed equipment or mobile equipment, for example, the fixed equipment is intelligent household equipment, and the mobile equipment is vehicle-mounted equipment and the like. Specifically, in the application scene of the internet of things, the target function module comprises a module of the internet of things device, and can comprise a table lifting module, a sound box, a cabinet door sliding module or a vehicle window control module and the like. Illustratively, the target function module is a media control module for controlling playing media, such as controlling audio playing or video playing. As another example, the target function module is a telephone module for establishing a telephone communication connection. In another example, the target function module is a navigation module for providing navigation functions. As another example, the target function module is a hardware control module, for example, used for controlling a vehicle window, a vehicle door, an air conditioner, and the like. In addition, there are other cases, and this is not particularly limited.
In practice, the associated intent may differ from the predicted intent. In a multi-round dialog, the intents expressed by the user are generally consistent, and one complete intent is described over several rounds. If the associated intent differs from the predicted intent, it is judged whether the user has changed intent: if not, a target intent is determined jointly from the associated intent and the predicted intent, and the target function module corresponding to the target intent is determined; if so, the target function module is determined based on the predicted intent. The dialog type is used to select which intent or intents the target function module is determined from. For example, according to the dialog type, the target function module corresponding to the predicted intent is selected, the target function module corresponding to the associated intent is selected, or the target function module corresponding to a target intent determined jointly from the associated intent and the predicted intent is selected; different dialog types correspond to different selected intents. As another example, the same selection is made according to both the dialog type and the target intent, in which case different combinations of dialog type and target intent correspond to different selected intents.
Illustratively, the associated intent is to adjust Bluetooth and the predicted intent is to adjust the volume. The dialog type is the restricted user dialog content type, for which the target intent is determined jointly from the associated intent and the predicted intent; that is, the target intent is to adjust the Bluetooth volume. Without the associated intent, it could not be determined from the predicted intent alone whether the system volume or the Bluetooth volume should be adjusted. The target instruction for adjusting the volume is sent to the Bluetooth module. As another example, the associated intent is to open a window, and the predicted intent is derived from the input "the weather is hot today". The dialog type is the unrestricted user dialog content type, for which the target intent is determined from the predicted intent; that is, the target intent is to query the weather. A target instruction for temperature detection is sent to the temperature control module, or an instruction for obtaining the weather is sent to the wireless module. Other examples are possible; these are not specifically limiting and should not be construed as limiting the present disclosure.
Within the current multi-round dialog, the first round may generate a target instruction based on the online analysis result and send it to the target function module, while the second round generates a target instruction based on a trusted offline analysis result and sends it to the target function module; or the first round may use a trusted offline analysis result and the second round an online analysis result. Different rounds thus use different analysis results to generate and send target instructions. To avoid incorrectly recognized intents in successive rounds, and the resulting false recalls, caused by generating instructions from different analysis results, the correct target function module is determined based on the context state so that the correct function is performed. This improves the seamless hand-over between online and offline instructions, allows the target function module that executes the target instruction to be determined accurately when switching to or from the offline analysis result, and improves instruction execution accuracy.
In addition, the offline analysis itself may be performed on the input data according to the context information, which improves the offline analysis accuracy. That is, the input data parsing method further includes: in the case that information of a multi-round dialog can be acquired, acquiring the context information of the current multi-round dialog; and performing offline analysis on the input data according to that context information, and performing trusted detection on the offline analysis result. Correspondingly, the context information may also be sent to the server together with the input data, so that the server can analyze the input data online according to the context information.
Optionally, the determining a target function module according to the associated intention, the predicted intention and the dialog type includes: determining the functional module corresponding to the associated intention as a target functional module under the condition that the conversation type is the limited user conversation content type; determining a function module corresponding to the prediction intention of the online analysis result as a target function module under the condition that the conversation type is a non-limited user conversation content type and the analysis result of the input data is an online analysis result; or determining the function module corresponding to the associated intention as a target function module under the condition that the conversation type is a non-limited user conversation content type and the analysis result of the input data is an offline analysis result.
The limited user dialog content type means that the user's dialog content falls within a preset range. The unlimited user dialog content type means that the user's dialog content is not limited. For example, in a multi-round dialog, the current electronic device asks where to navigate to: 1. place A, 2. place B, 3. place C. The user's dialog content can only be a choice among these three options. The reply may name a place or an option number, and the form of the reply is not limited, but its content is limited to the range provided. As another example, in a multi-round dialog, the current electronic device asks no question, or asks "what should be adjusted?", and the user's dialog content is "it is really hot today". Here the user's dialog content is not limited, and the user can reply arbitrarily. When the dialog type is the limited user dialog content type, the user can only choose dialog content within a preset range; the preset range is the range of dialog content determined based on the associated intent, and different associated intents correspond to different preset ranges. In this case, the user's intent is mainly the associated intent, and the function module corresponding to the associated intent is determined as the target function module. When the dialog type is the unlimited user dialog content type, the user's dialog content is not limited. In this case, the user's intent is mainly the predicted intent. However, a trusted offline analysis result is not necessarily accurate, so for a trusted offline analysis result the function module corresponding to the associated intent is determined as the target function module. The online analysis result is accurate, so for an online analysis result the function module corresponding to the predicted intent is determined as the target function module.
Further, if the target function module cannot be determined from the predicted intent alone, it may be determined jointly from the associated intent and the predicted intent. Illustratively, the associated intent is to adjust Bluetooth and the predicted intent is to adjust the volume. The dialog type is the limited user dialog content type, and the Bluetooth module corresponding to adjusting Bluetooth is selected as the target function module. As another example, the associated intent is to open a window and the predicted intent is to make a call. The dialog type is the unlimited user dialog content type. If the analysis result of the input data is an online analysis result, the telephone module corresponding to making a call is selected as the target function module. If the analysis result of the input data is a trusted offline analysis result, the window control module corresponding to opening the window is selected as the target function module.
The target function module is determined from the predicted intent, the associated intent and dialog type of the multi-round dialog context, and whether the analysis result is online or offline. Different scenarios can thus be distinguished, the most accurate intent determined, and the corresponding function module chosen as the target function module that executes the target instruction. This improves the accuracy of determining the target function module and of instruction execution, adapts to different function modules, and broadens the application scenarios.
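The three-way selection rule and the examples above can be condensed into the following Python sketch. The string labels for dialog types and result sources, and the intent-to-module table, are assumptions introduced for illustration.

```python
# Hypothetical mapping from intent to the functional module that serves it.
INTENT_TO_MODULE = {
    "adjust_bluetooth": "bluetooth",
    "adjust_volume": "media_control",
    "open_window": "window_control",
    "make_call": "telephone",
}


def select_target_module(dialog_type: str,
                         result_source: str,
                         associated_intent: str,
                         predicted_intent: str,
                         intent_to_module: dict = INTENT_TO_MODULE) -> str:
    """dialog_type: 'limited' or 'unlimited'; result_source: 'online' or 'offline'."""
    if dialog_type == "limited":
        chosen = associated_intent      # reply is constrained by the earlier intent
    elif result_source == "online":
        chosen = predicted_intent       # online result is treated as accurate
    else:
        chosen = associated_intent      # trusted offline result may still be wrong
    return intent_to_module[chosen]
```

Only the combination of an unlimited dialog type and an online result lets the predicted intent override the associated intent.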
In addition, if the dialog content provided by the user is not within the preset range, the user is provided with prompt information for replying within the preset range. Optionally, the input data parsing method further includes: providing clarification information of input data to the user in the case that the conversation type is a limited user conversation content type and the association intention is different from the predicted intention; acquiring new input data provided by a user; and determining the analysis result of the new input data aiming at the new input data. The clarification information is used for prompting the user to reply within a preset range.
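The clarification step can be sketched as a function that returns a prompt only when it is needed. The function name, the wording of the prompt, and the list of options are hypothetical.

```python
from typing import List, Optional


def clarification_prompt(dialog_type: str,
                         associated_intent: str,
                         predicted_intent: str,
                         options: List[str]) -> Optional[str]:
    """Return clarification text when the dialog type is limited and the
    associated and predicted intents differ; otherwise return None."""
    if dialog_type == "limited" and associated_intent != predicted_intent:
        return "Please choose one of: " + ", ".join(options)
    return None
```

When a prompt is returned, the new input data provided by the user would then go through the same analysis flow again.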
Optionally, the target function module includes a module of an in-vehicle device.
The in-vehicle device is a device installed on a vehicle; it can connect to a network and can receive and execute instructions over the network. The input data parsing method can be applied to driver assistance and automatic driving scenarios. The target function module may be configured in the current electronic device, or the device to which the target function module belongs and the current electronic device may be independent electronic devices that communicate via a network. In mobile scenarios, the internet is generally accessed through a mobile network such as cellular data. If the mobile terminal enters an enclosed environment such as a tunnel or an underground garage, the network may become weak or unavailable, so that the network connection is unstable and the cloud service is unreliable. When the cloud service is unreliable, selecting the trusted offline analysis result as the basis for generating the instruction improves the accuracy and reliability of vehicle control and improves vehicle safety; it also avoids waiting a long time for the online analysis result from the cloud, which improves the parsing speed.
Configuring the target function module as a module of the in-vehicle device enriches the application scenarios, speeds up acquisition of the analysis result while preserving parsing accuracy, and improves the accuracy and reliability of vehicle control, thereby improving vehicle safety.
According to this technical solution, in a multi-round dialog, the associated intent is determined from the context information of the multi-round dialog, the predicted intent corresponding to the analysis result of the input data is obtained, the target function module is determined from the associated intent, the predicted intent and the dialog type, and the target instruction is sent to the target function module for execution. In this way, the function module that executes the target instruction can be determined accurately and the target instruction can be executed correctly. At the same time, instruction switching to the offline analysis result can be realized, and the parsing accuracy and execution accuracy when switching to the offline analysis result are improved.
Fig. 4 is a scene diagram of another input data parsing method disclosed according to an embodiment of the present disclosure. The input data parsing method may include:
S401, the voice client records audio.
The voice client is started, the system recording function is enabled with the user's authorization, and the user's voice is recorded to obtain the user's voice data, which is determined as the input data provided by the user.
S402, sending input data provided by a user to a server so that the server performs online voice recognition on the input data.
S403, the server carries out on-line semantic analysis on the voice recognition result of the input data.
Online parsing includes online speech recognition and online semantic parsing.
The input data provided by the user is transmitted over the network to an online speech recognition engine (i.e., the server), and the online ASR (Automatic Speech Recognition) result returned by the online speech engine is obtained. After the online ASR result is obtained, the cloud speech recognition server passes the recognized text to an online semantic processing server to obtain the semantic parsing NLU (Natural Language Understanding) result. In the present invention, the NLU result returned online is denoted r1.
S404, performing off-line voice recognition on the input data.
The input data is provided to an offline speech recognition engine integrated into the client, and the offline ASR result returned by the recognition engine is obtained. The speech recognition result is then corrected as follows: performing word segmentation on the speech recognition result to obtain at least one candidate word; acquiring pronunciation information of the candidate word; querying the prestored expected words for an expected word that matches the pronunciation information of the candidate word; and replacing the candidate word with the expected word. Because the returned ASR result is recognized offline, it is often not very accurate and needs to be corrected. Specifically, the pinyin of the recognition result is obtained, and if more than two syllables of the recognized characters match the pinyin syllables of a target word, the recognition result is replaced with the target word.
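The correction step can be illustrated with a small sketch. It is a simplification under stated assumptions: the input is assumed to be already segmented into candidate words, pronunciations come from a hypothetical hand-written `PRONUNCIATION` table rather than a real pinyin library, and a candidate is replaced only when its whole toneless pronunciation equals that of an expected word (the source describes a syllable-count criterion instead).

```python
from typing import Dict, List, Tuple

# Hypothetical pronunciation table: word -> toneless pinyin syllables.
PRONUNCIATION: Dict[str, Tuple[str, ...]] = {
    "大开": ("da", "kai"),   # misrecognized word
    "打开": ("da", "kai"),   # expected word meaning "open"
    "车窗": ("che", "chuang"),
}

# Hypothetical list of prestored expected words.
EXPECTED_WORDS: List[str] = ["打开", "车窗"]


def correct_words(candidates: List[str]) -> List[str]:
    """Replace each candidate word with an expected word of equal pronunciation."""
    # Index the expected words by their pronunciation for direct lookup.
    expected_by_sound = {PRONUNCIATION[w]: w for w in EXPECTED_WORDS}
    corrected = []
    for word in candidates:
        sound = PRONUNCIATION.get(word)  # None if the word is unknown
        # Keep the original word when no expected word sounds the same.
        corrected.append(expected_by_sound.get(sound, word))
    return corrected
```

For example, `correct_words(["大开", "车窗"])` replaces the first word with the expected word "打开" and leaves the second unchanged; a word that is missing from the table is passed through untouched.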
S405, performing offline semantic analysis on the obtained voice recognition result to obtain an offline analysis result.
The corrected speech recognition result is provided to a local semantic parsing engine for semantic analysis to obtain the offline NLU result. Online analysis and offline analysis are performed simultaneously; that is, the offline steps S404 and S405 are executed concurrently with the online steps S402 and S403. Online analysis results are more reliable, since the accuracy of online analysis is far higher than that of offline analysis, so in theory the online analysis result should be used. However, because of network fluctuation the online analysis result may be returned slowly, or may not be returned at all because of a network timeout, in which case no instruction can be derived from it and executed.
S406, obtaining the statement identification confidence score corresponding to the input data, and detecting whether the statement identification confidence score is larger than or equal to a preset confidence score threshold value.
When the offline speech recognition result is corrected, the sentence recognition confidence score of the pre-correction speech recognition result is calculated. The score is calculated from the degree to which the characters and syllables recognized offline match the predefined expected characters and syllables. Specifically, the ratio of the number of words that are the same in the speech recognition result before and after correction to the number of words in the speech recognition result before correction is determined as the sentence recognition confidence score. If the sentence recognition confidence score is greater than or equal to the preset confidence score threshold, S407 and S408 are executed; otherwise, the offline analysis result is not trusted and the online analysis result is awaited.
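One possible reading of the confidence score is sketched below: the share of words that the correction step left unchanged. This interpretation, the function name and the example threshold of 0.6 are assumptions for illustration; the source does not give a concrete threshold. The sketch also assumes that correction replaces words one for one, so both lists have the same length.

```python
from typing import List


def sentence_confidence(before: List[str], after: List[str]) -> float:
    """Fraction of pre-correction words that are identical after correction."""
    if not before:
        return 0.0  # nothing was recognized, so nothing can be trusted
    unchanged = sum(1 for b, a in zip(before, after) if b == a)
    return unchanged / len(before)


# Hypothetical threshold, chosen only for this example.
CONFIDENCE_THRESHOLD = 0.6


def passes_confidence_check(before: List[str], after: List[str]) -> bool:
    """True when the score reaches the preset threshold (step S406)."""
    return sentence_confidence(before, after) >= CONFIDENCE_THRESHOLD
```

If one of two recognized words had to be corrected, the score is 0.5, which is below the example threshold, so the client would wait for the online analysis result.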
S407, detecting whether information of multiple rounds of conversations can be acquired; or acquiring the prediction intention of the offline analysis result, and detecting whether the prediction intention corresponding to the offline analysis result is an offline support intention.
An instruction behavior consists of a domain and an intent. A domain is a functional category, and intents are finer subdivisions within a domain. The input data is used to perform some function, and the user's purpose can be classified by function: the broad classes are domains, and each broad class is further divided into smaller classes, which are intents. A domain can thus be understood as a functional area. Some intents may be unavailable offline because of resource limitations, or may require network interaction, such as navigation-related information; these can be distinguished in advance. Intents that are determined to be supportable offline are added to a white list. The white list stores domains and the intents under each domain. Domains may be represented as vertical classes, such as a navigation vertical class, a music vertical class, a car control vertical class, a system control vertical class, and a telephone vertical class. If the domain to which the predicted intent belongs hits the white list, or the predicted intent itself hits the white list, the predicted intent is determined to be an offline support intent. If information of the multi-round dialog can be acquired or the predicted intent is an offline support intent, S408 is executed; otherwise, the offline analysis result is not trusted and the online analysis result is awaited.
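The white list lookup can be sketched as a small table keyed by domain. The data layout is an assumption: a value of `None` is used here to mean that the whole domain is supported offline, and a set lists the individual intents supported within a domain. The domain and intent names are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical white list: domain -> supported intents (None = whole domain).
OFFLINE_WHITELIST: Dict[str, Optional[Set[str]]] = {
    "car_control": None,                       # every intent works offline
    "telephone": {"make_call"},                # only selected intents
    "system_control": {"adjust_volume"},
}


def is_offline_support_intent(domain: str, intent: str) -> bool:
    """True if the domain as a whole, or the specific intent, hits the white list."""
    if domain not in OFFLINE_WHITELIST:
        return False  # e.g. a domain that needs network resources
    intents = OFFLINE_WHITELIST[domain]
    return intents is None or intent in intents
```

Under this layout, an intent in the navigation domain misses the white list, so the client would wait for the online analysis result unless multi-round dialog information is available.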
S408, detecting whether the offline analysis result is analyzed to obtain a trusted instruction.
The offline analysis result is sent to the instruction generation module, and the instruction and the trust detection result fed back by the instruction generation module are obtained. Specifically: detecting whether the instruction generation module can parse the offline analysis result into at least one instruction of a matching function type, and determining the instruction-parsable detection result; acquiring the resource dependence type of the parsed instruction, and determining the resource-valid detection result; and detecting, according to the instruction-parsable detection result and the resource-valid detection result, whether the offline analysis result can be parsed into a trusted instruction.
The instruction generation module may be divided into a plurality of function-specific instruction generation units, for example, an instruction generation unit for the navigation vertical class, an instruction generation unit for the music vertical class, an instruction generation unit for the car control vertical class, an instruction generation unit for the system control vertical class, and an instruction generation unit for the telephone vertical class. If the instruction generation unit of the current vertical class can process the offline analysis result and generate an instruction, and the instruction can obtain valid resources, the generated instruction is determined to be a trusted instruction. If the instruction depends on online resources or cannot be processed offline, the generated instruction is determined to be an untrusted instruction. If none of the instruction generation units can process the offline analysis result and generate an instruction, a general instruction is generated and determined to be an untrusted instruction; for example, the general instruction is a TTS (Text To Speech, speech synthesis) instruction that announces by voice that the instruction is not supported. The instruction generated by the instruction generation unit of the current vertical class is sent to the function module corresponding to that vertical class for execution. For example, the instruction generated by the instruction generation unit of the navigation vertical class is executed by the navigation module.
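A minimal sketch of this trusted instruction check follows. The `Instruction` fields, the two generation units and the dictionary shape of the NLU result are hypothetical; the sketch only mirrors the three outcomes described above: a trusted instruction, an instruction that depends on online resources, and the general fallback instruction.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Instruction:
    vertical: str                # vertical class that will execute it
    action: str                  # what the function module should do
    needs_online_resource: bool  # resource dependence type


# A generation unit returns an instruction, or None if it cannot handle the input.
GenerationUnit = Callable[[dict], Optional[Instruction]]


def car_control_unit(nlu: dict) -> Optional[Instruction]:
    if nlu.get("domain") == "car_control":
        return Instruction("car_control", nlu["intent"], needs_online_resource=False)
    return None


def navigation_unit(nlu: dict) -> Optional[Instruction]:
    if nlu.get("domain") == "navigation":
        # Assumed here: route planning needs map data from the network.
        return Instruction("navigation", nlu["intent"], needs_online_resource=True)
    return None


UNITS: List[GenerationUnit] = [car_control_unit, navigation_unit]


def generate_instruction(nlu: dict) -> Tuple[Instruction, bool]:
    """Return the generated instruction and whether it is trusted."""
    for unit in UNITS:
        instruction = unit(nlu)
        if instruction is not None:
            # Parsable; trusted only if its resources are available offline.
            return instruction, not instruction.needs_online_resource
    # No unit matched: general TTS instruction announcing "not supported".
    return Instruction("general", "tts_not_supported", False), False
```

Returning the trust flag together with the instruction lets the caller decide in one step whether to execute the offline instruction or keep waiting for the online result.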
S409, under the condition that the off-line analysis result is credible and the on-line analysis result is not received, obtaining the credible off-line analysis result and obtaining the off-line instruction of the credible off-line analysis result.
The offline analysis result is determined to be trusted in the case that the sentence recognition confidence score is greater than or equal to the preset confidence score threshold, information of the multi-round dialog can be acquired for the input data, and the offline analysis result can be parsed into a trusted instruction; or in the case that the sentence recognition confidence score is greater than or equal to the preset confidence score threshold, the predicted intent is an offline support intent, and the offline analysis result can be parsed into a trusted instruction. In all other cases the offline analysis result is not trusted.
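The two alternative conditions share the confidence check and the trusted instruction check, so they reduce to a single boolean expression. The function and parameter names below are hypothetical.

```python
def offline_result_trusted(confidence: float,
                           threshold: float,
                           has_multi_round_info: bool,
                           is_offline_support_intent: bool,
                           yields_trusted_instruction: bool) -> bool:
    """Combine the checks of S406, S407 and S408 into one decision."""
    return (confidence >= threshold
            and (has_multi_round_info or is_offline_support_intent)
            and yields_trusted_instruction)
```

Any single failing check makes the offline analysis result untrusted, in which case the online analysis result is awaited.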
S410, waiting to receive the online analysis result while the offline analysis is in progress or when the offline analysis result is not trusted.
If the offline analysis result is not trusted, the online analysis result is awaited.
S411, when the offline analysis result is not trusted or the online analysis result is received first, obtaining the online analysis result and the online instruction of the online analysis result.
The trusted instruction of the trusted offline analysis result is determined as the target instruction; or the online instruction of the online analysis result is acquired and the target instruction is generated from it. The target instruction is then dispatched as follows: obtaining the context information and dialog type of the current multi-round dialog; determining the associated intent according to the context information; determining the target instruction according to the analysis result of the input data; acquiring the predicted intent corresponding to the analysis result of the input data; and determining the target function module according to the associated intent, the predicted intent and the dialog type, and sending the target instruction to the target function module so that the target function module executes it. For example, the user says 'I want to make a call' and enters a multi-round interaction; a TTS voice prompt asks 'Who do you want to call?', speech recognition is started and enters the listening state, and the user's voice is recorded to obtain the input data. Because the input data is voice data entered by the user during the multi-round dialog, information of the multi-round dialog can be acquired. If the user then says 'weather today', the expected behavior is to call the contact named 'weather today'. However, if the context state is not recorded, 'weather today' is executed as an online instruction and the current weather is announced, which does not match the expected behavior.
Therefore, when each dialog is started, the session information of the dialog, that is, the context information, is recorded. The session stores information such as the vertical class currently being processed (used to identify the current vertical class and to prevent the same dialog from being recalled by other vertical classes), whether the dialog is a multi-round dialog, whether the dialog has ended (used to signal that the current dialog is finished), and the session Id (an Id that uniquely identifies each dialog).
When switching between offline and online instructions, regardless of whether the current result comes from offline or online analysis, the information is read from the saved session, and the instruction is then dispatched to the function module of the correct vertical class and executed. For example, after the user says 'I want to make a call', because the context information is recorded, the subsequent input 'weather today' is not sent to the function module of the weather query vertical class to execute a weather query instruction; instead, it is sent to the function module of the telephone vertical class, which executes the instruction to look up and dial the contact 'weather today', checks whether that contact exists in the address book, and then produces the correct behavior.
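The session record and the context-aware dispatch can be sketched as follows. The field names are hypothetical and simply mirror the four pieces of information listed for the session: the current vertical class, the multi-round flag, the ended flag and the session Id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    session_id: str       # uniquely identifies the dialog
    vertical: str         # vertical class currently handling the dialog
    multi_round: bool     # whether the dialog spans several rounds
    ended: bool = False   # set once the dialog is finished


def route_vertical(session: Optional[Session], predicted_vertical: str) -> str:
    """Choose the vertical class whose function module receives the instruction."""
    if session is not None and session.multi_round and not session.ended:
        # An open multi-round dialog keeps ownership of the next input,
        # so another vertical class cannot recall it by mistake.
        return session.vertical
    return predicted_vertical
```

In the telephone example, an open session owned by the telephone vertical class routes the input 'weather today' to the telephone module even though the predicted vertical class is weather; once the session has ended, the same input is routed to the weather module.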
S412, executing the instruction.
The target instruction is sent to the target function module so that the target function module executes it. Specifically, the instruction is sent to the designated target function module for execution.
This technical solution solves the problem of slow instruction response in weak network environments and enables seamless handover between online and offline instructions. It also provides a convenient and fast solution to the false recall problem (input identified as another intent) that arises when instructions are linked by context during online/offline switching, which greatly improves response speed and user experience.
Fig. 5 is a structural diagram of an input data analysis device according to an embodiment of the present disclosure, and the embodiment of the present disclosure is applied to a case where input data is analyzed. The device is realized by software and/or hardware and is specifically configured in electronic equipment with certain data operation capacity.
An input data parsing apparatus 500 as shown in fig. 5 comprises: an input data acquisition module 501, an offline analysis credibility detection module 502, an offline result acquisition module 503 and an online result acquisition module 504; wherein,
an input data obtaining module 501, configured to send input data provided by a user to a server, so that the server performs online analysis on the input data;
an offline analysis trusted detection module 502, configured to perform offline analysis on the input data to obtain an offline analysis result, and perform trusted detection on the offline analysis result;
an offline result obtaining module 503, configured to obtain a trusted offline analysis result and determine an analysis result of the input data according to the trusted offline analysis result when the offline analysis result is trusted and the online analysis result is not received;
an online result obtaining module 504, configured to obtain an online analysis result and determine the online analysis result as the analysis result of the input data when the offline analysis result is not trusted or the online analysis result is received first.
According to this technical solution, online analysis and offline analysis are performed simultaneously, and the most reliable analysis result available at the earliest time is adopted as the analysis result of the data input by the user, so that both parsing efficiency and accuracy are taken into account and the user request can be responded to accurately and in real time.
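The arbitration between the two concurrent analyses can be sketched with Python's `asyncio`. This is an illustrative sketch, not the device's actual implementation: `resolve` and the two fake analysis coroutines are hypothetical, the offline coroutine is assumed to return a `(result, trusted)` pair, and error handling for a failed online request is omitted.

```python
import asyncio
from typing import Any, Awaitable, Tuple


async def resolve(online_coro: Awaitable[Any],
                  offline_coro: Awaitable[Tuple[Any, bool]]) -> Tuple[str, Any]:
    """Run both analyses concurrently and pick the result to use."""
    online = asyncio.create_task(online_coro)
    offline = asyncio.create_task(offline_coro)
    done, _ = await asyncio.wait({online, offline},
                                 return_when=asyncio.FIRST_COMPLETED)
    if online in done:
        # The online result arrived first: it is preferred.
        offline.cancel()
        return "online", online.result()
    result, trusted = offline.result()
    if trusted:
        # Trusted offline result and no online result yet: use it and
        # stop waiting for the server response.
        online.cancel()
        return "offline", result
    # Untrusted offline result: wait for the online result.
    return "online", await online


async def fake_online(delay: float, value: str) -> str:
    await asyncio.sleep(delay)  # stands in for network latency
    return value


async def fake_offline(delay: float, value: str, trusted: bool) -> Tuple[str, bool]:
    await asyncio.sleep(delay)  # stands in for local processing time
    return value, trusted
```

For instance, `asyncio.run(resolve(fake_online(0.2, "r1"), fake_offline(0.01, "r2", True)))` selects the offline result because it is trusted and arrives before the slow online response.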
Further, the offline analysis trusted detection module 502 includes: an accuracy and availability detection unit, configured to detect the accuracy and the availability of the offline analysis result.
Further, the accuracy and availability detection unit includes at least one of the following: a recognition accuracy detection subunit, configured to acquire the sentence recognition confidence score corresponding to the input data and detect whether the sentence recognition confidence score is greater than or equal to a preset confidence score threshold; and a multi-round dialog detection subunit, configured to detect whether information of the multi-round dialog can be acquired.
Further, the accuracy and availability detection unit includes at least one of the following: an intent detection subunit, configured to acquire the predicted intent of the offline analysis result and detect whether the predicted intent corresponding to the offline analysis result is an offline support intent; and a trusted instruction detection subunit, configured to detect whether the offline analysis result can be parsed into a trusted instruction.
Further, the trusted instruction detection subunit is specifically configured to: detecting whether the offline analysis result is analyzed to obtain at least one instruction matched with the function type, and determining an instruction analyzable detection result; acquiring a resource dependence type of the instruction obtained by analysis, and determining a resource effective detection result; and detecting whether the offline analysis result is analyzed to obtain a trusted instruction according to the instruction analyzable detection result and the resource effective detection result.
Further, the offline parsing trust detection module 502 includes: the voice recognition module is used for carrying out voice recognition on the input data; and the offline analysis module is used for performing semantic analysis on the voice recognition result to obtain an offline analysis result.
Further, the input data analysis device further includes: the recognition result word segmentation module is used for segmenting words of the voice recognition result to obtain at least one alternative word; the pronunciation information determining module is used for acquiring pronunciation information of the alternative words; the expected word query module is used for querying an expected word matched with the pronunciation information of the alternative word in the prestored expected words; and the recognition result correction module is used for replacing the candidate words with the expected words and correcting the voice recognition result.
Further, the input data analysis device further includes: the conversation information acquisition module is used for acquiring the context information and the conversation type of the current multiple rounds of conversations; an intent determination module to determine an associated intent from the context information; the target instruction determining module is used for determining a target instruction according to the analysis result of the input data; the prediction intention determining module is used for acquiring a prediction intention corresponding to an analysis result of the input data; and the function module determining module is used for determining a target function module according to the associated intention, the prediction intention and the conversation type and sending the target instruction to the target function module so as to enable the target function module to execute the target instruction.
Further, the function module determining module includes: a first function determining unit, configured to determine, as a target function module, a function module corresponding to the association intention when the dialog type is a user-restricted dialog content type; a second function determining unit, configured to determine, as a target function module, a function module corresponding to a prediction intention of an online analysis result when the conversation type is an unlimited user conversation content type and an analysis result of the input data is the online analysis result; or a third function determining unit, configured to determine, as a target function module, the function module corresponding to the association intention when the conversation type is an unlimited user conversation content type and an analysis result of the input data is an offline analysis result.
Further, the input data parsing apparatus further includes: the online result intercepting module is used for intercepting an online analysis result sent by the server under the condition that the offline analysis result is credible and the online analysis result is not received; and the online result waiting module is used for waiting to receive the online analysis result fed back by the server under the condition that the offline analysis result is not credible.
Further, the target function module comprises a module of the vehicle-mounted device.
The input data analysis device can execute the input data analysis method provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of executing the input data analysis method.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of users' personal information all comply with the relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, and the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above, such as the input data parsing method. For example, in some embodiments, the input data parsing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the input data parsing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the input data parsing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (25)

1. An input data parsing method, comprising:
sending input data provided by a user to a server, so that the server analyzes the input data online;
performing offline analysis on the input data to obtain an offline analysis result, and performing credibility detection on the offline analysis result;
in a case where the offline analysis result is credible and an online analysis result has not been received, obtaining the credible offline analysis result, and determining the credible offline analysis result as the analysis result of the input data; and
in a case where the offline analysis result is not credible, or the online analysis result is received first, obtaining the online analysis result, and determining the online analysis result as the analysis result of the input data.
2. The method of claim 1, wherein the performing credibility detection on the offline analysis result comprises:
detecting the accuracy and the availability of the offline analysis result.
3. The method of claim 2, wherein the detecting the accuracy of the offline analysis result comprises at least one of:
obtaining a sentence recognition confidence score corresponding to the input data, and detecting whether the sentence recognition confidence score is greater than or equal to a preset confidence score threshold; and
detecting whether information of a multi-turn dialog is available.
4. The method of claim 2, wherein the detecting the availability of the offline analysis result comprises at least one of:
obtaining a predicted intention of the offline analysis result, and detecting whether the predicted intention corresponding to the offline analysis result is an offline-supported intention; and
detecting whether a trusted instruction is obtained by parsing the offline analysis result.
5. The method of claim 4, wherein the detecting whether a trusted instruction is obtained by parsing the offline analysis result comprises:
detecting whether at least one instruction matching a function type is obtained by parsing the offline analysis result, and determining an instruction parsability detection result;
obtaining a resource dependency type of the parsed instruction, and determining a resource validity detection result; and
detecting, according to the instruction parsability detection result and the resource validity detection result, whether a trusted instruction is obtained by parsing the offline analysis result.
6. The method of claim 1, wherein the performing offline analysis on the input data to obtain an offline analysis result comprises:
performing speech recognition on the input data; and
performing semantic analysis on the speech recognition result to obtain the offline analysis result.
7. The method of claim 6, further comprising:
performing word segmentation on the speech recognition result to obtain at least one candidate word;
obtaining pronunciation information of the candidate word;
querying prestored expected words for an expected word that matches the pronunciation information of the candidate word; and
replacing the candidate word with the expected word to correct the speech recognition result.
8. The method of claim 1, further comprising:
obtaining context information and a dialog type of a current multi-turn dialog;
determining an associated intention according to the context information;
determining a target instruction according to the analysis result of the input data;
obtaining a predicted intention corresponding to the analysis result of the input data; and
determining a target function module according to the associated intention, the predicted intention, and the dialog type, and sending the target instruction to the target function module, so that the target function module executes the target instruction.
9. The method of claim 8, wherein the determining a target function module according to the associated intention, the predicted intention, and the dialog type comprises:
in a case where the dialog type is a type that limits user dialog content, determining the function module corresponding to the associated intention as the target function module;
in a case where the dialog type is a type that does not limit user dialog content and the analysis result of the input data is the online analysis result, determining the function module corresponding to the predicted intention of the online analysis result as the target function module; or
in a case where the dialog type is a type that does not limit user dialog content and the analysis result of the input data is the offline analysis result, determining the function module corresponding to the associated intention as the target function module.
10. The method of claim 1, further comprising:
intercepting the online analysis result sent by the server in a case where the offline analysis result is credible and the online analysis result has not been received; and
waiting to receive the online analysis result fed back by the server in a case where the offline analysis result is not credible.
11. The method of claim 8, wherein the target function module comprises a module of an in-vehicle device.
12. An input data parsing apparatus, comprising:
an input data acquisition module, configured to send input data provided by a user to a server, so that the server analyzes the input data online;
an offline analysis credibility detection module, configured to perform offline analysis on the input data to obtain an offline analysis result, and to perform credibility detection on the offline analysis result;
an offline result acquisition module, configured to obtain the credible offline analysis result in a case where the offline analysis result is credible and an online analysis result has not been received, and to determine the credible offline analysis result as the analysis result of the input data; and
an online result acquisition module, configured to obtain the online analysis result in a case where the offline analysis result is not credible, or the online analysis result is received first, and to determine the online analysis result as the analysis result of the input data.
13. The apparatus of claim 12, wherein the offline analysis credibility detection module comprises:
an accuracy and availability detection unit, configured to detect the accuracy and the availability of the offline analysis result.
14. The apparatus of claim 13, wherein the accuracy and availability detection unit comprises at least one of:
a recognition accuracy detection subunit, configured to obtain a sentence recognition confidence score corresponding to the input data, and to detect whether the sentence recognition confidence score is greater than or equal to a preset confidence score threshold; and
a multi-turn dialog detection subunit, configured to detect whether information of a multi-turn dialog is available.
15. The apparatus of claim 13, wherein the accuracy and availability detection unit comprises at least one of:
an intention detection subunit, configured to obtain a predicted intention of the offline analysis result, and to detect whether the predicted intention corresponding to the offline analysis result is an offline-supported intention; and
a trusted instruction detection subunit, configured to detect whether a trusted instruction is obtained by parsing the offline analysis result.
16. The apparatus of claim 15, wherein the trusted instruction detection subunit is specifically configured to:
detect whether at least one instruction matching a function type is obtained by parsing the offline analysis result, and determine an instruction parsability detection result;
obtain a resource dependency type of the parsed instruction, and determine a resource validity detection result; and
detect, according to the instruction parsability detection result and the resource validity detection result, whether a trusted instruction is obtained by parsing the offline analysis result.
17. The apparatus of claim 12, wherein the offline analysis credibility detection module comprises:
a speech recognition module, configured to perform speech recognition on the input data; and
an offline analysis module, configured to perform semantic analysis on the speech recognition result to obtain the offline analysis result.
18. The apparatus of claim 17, further comprising:
a recognition result word segmentation module, configured to perform word segmentation on the speech recognition result to obtain at least one candidate word;
a pronunciation information determining module, configured to obtain pronunciation information of the candidate word;
an expected word query module, configured to query prestored expected words for an expected word that matches the pronunciation information of the candidate word; and
a recognition result correction module, configured to replace the candidate word with the expected word to correct the speech recognition result.
19. The apparatus of claim 12, further comprising:
a dialog information acquisition module, configured to obtain context information and a dialog type of a current multi-turn dialog;
an associated intention determining module, configured to determine an associated intention according to the context information;
a target instruction determining module, configured to determine a target instruction according to the analysis result of the input data;
a predicted intention determining module, configured to obtain a predicted intention corresponding to the analysis result of the input data; and
a function module determining module, configured to determine a target function module according to the associated intention, the predicted intention, and the dialog type, and to send the target instruction to the target function module, so that the target function module executes the target instruction.
20. The apparatus of claim 19, wherein the function module determining module comprises:
a first function determining unit, configured to determine the function module corresponding to the associated intention as the target function module in a case where the dialog type is a type that limits user dialog content;
a second function determining unit, configured to determine the function module corresponding to the predicted intention of the online analysis result as the target function module in a case where the dialog type is a type that does not limit user dialog content and the analysis result of the input data is the online analysis result; or
a third function determining unit, configured to determine the function module corresponding to the associated intention as the target function module in a case where the dialog type is a type that does not limit user dialog content and the analysis result of the input data is the offline analysis result.
21. The apparatus of claim 12, further comprising:
an online result intercepting module, configured to intercept the online analysis result sent by the server in a case where the offline analysis result is credible and the online analysis result has not been received; and
an online result waiting module, configured to wait to receive the online analysis result fed back by the server in a case where the offline analysis result is not credible.
22. The apparatus of claim 19, wherein the target function module comprises a module of an in-vehicle device.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the input data parsing method of any of claims 1-11.
24. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions cause a computer to perform the input data parsing method of any one of claims 1-11.
25. A computer program product comprising a computer program which, when executed by a processor, implements the input data parsing method of any one of claims 1-11.
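The checks recited in claims 2 to 5 and the module selection of claim 9 can be sketched as plain predicates. This is an illustrative sketch only, not the claimed implementation: every name below (`OfflineResult`, `CONFIDENCE_THRESHOLD`, `OFFLINE_SUPPORTED_INTENTS`, the `"local"`/`"network"` dependency values, and the module table) is a hypothetical assumption, the threshold value is arbitrary, and the sketch applies all of the checks together although claims 3 and 4 only require at least one check from each group.

```python
from dataclasses import dataclass
from typing import Dict, Optional

CONFIDENCE_THRESHOLD = 0.8                                       # assumed value
OFFLINE_SUPPORTED_INTENTS = {"open_window", "set_temperature"}   # assumed set


@dataclass
class OfflineResult:
    confidence: float           # sentence recognition confidence score
    context_available: bool     # multi-turn dialog information is available
    predicted_intent: str
    instruction: Optional[str]  # None if no instruction matched a function type
    resource_dependency: str    # hypothetical values: "local" or "network"


def is_accurate(r: OfflineResult) -> bool:
    # Accuracy checks: confidence threshold and multi-turn dialog information.
    return r.confidence >= CONFIDENCE_THRESHOLD and r.context_available


def has_trusted_instruction(r: OfflineResult, network_up: bool) -> bool:
    # An instruction must have been parsed, and its resource must be usable.
    parsable = r.instruction is not None
    resource_valid = r.resource_dependency == "local" or network_up
    return parsable and resource_valid


def is_available(r: OfflineResult, network_up: bool) -> bool:
    # Availability checks: offline-supported intention and trusted instruction.
    return (r.predicted_intent in OFFLINE_SUPPORTED_INTENTS
            and has_trusted_instruction(r, network_up))


def is_credible(r: OfflineResult, network_up: bool = False) -> bool:
    return is_accurate(r) and is_available(r, network_up)


def pick_target_module(dialog_limited: bool, result_source: str,
                       associated_intent: str, predicted_intent: str,
                       modules: Dict[str, str]) -> str:
    # A limited dialog, or an offline result, follows the associated intention;
    # an online result in an unlimited dialog follows the predicted intention.
    if dialog_limited or result_source == "offline":
        return modules[associated_intent]
    return modules[predicted_intent]
```

Treating a `"network"` dependency as invalid while the network is down is one possible reading of the resource validity check; the claims do not specify how resource validity is decided.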

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211338183.9A CN115662430B (en) 2022-10-28 2022-10-28 Input data analysis method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211338183.9A CN115662430B (en) 2022-10-28 2022-10-28 Input data analysis method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115662430A true CN115662430A (en) 2023-01-31
CN115662430B CN115662430B (en) 2024-03-29

Family

ID=84993082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211338183.9A Active CN115662430B (en) 2022-10-28 2022-10-28 Input data analysis method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115662430B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130085753A1 (en) * 2011-09-30 2013-04-04 Google Inc. Hybrid Client/Server Speech Recognition In A Mobile Device
US20130132084A1 (en) * 2011-11-18 2013-05-23 Soundhound, Inc. System and method for performing dual mode speech recognition
WO2016191319A1 (en) * 2015-05-27 2016-12-01 Google Inc. Local persisting of data for selectively offline capable voice action in a voice-enabled electronic device
WO2017166649A1 (en) * 2016-03-30 2017-10-05 乐视控股(北京)有限公司 Voice signal processing method and device
US20180330714A1 (en) * 2017-05-12 2018-11-15 Apple Inc. Machine learned systems
CN112331203A (en) * 2020-11-06 2021-02-05 深圳市欧瑞博科技股份有限公司 Intelligent household equipment control method and device, electronic equipment and storage medium
CN112331213A (en) * 2020-11-06 2021-02-05 深圳市欧瑞博科技股份有限公司 Intelligent household equipment control method and device, electronic equipment and storage medium
CN112509585A (en) * 2020-12-22 2021-03-16 北京百度网讯科技有限公司 Voice processing method, device and equipment of vehicle-mounted equipment and storage medium
US20210383802A1 (en) * 2020-06-05 2021-12-09 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for evaluating user intention understanding satisfaction, electronic device and storage medium

Also Published As

Publication number Publication date
CN115662430B (en) 2024-03-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant