CN112948585A

CN112948585A - Natural language processing method, device, equipment and storage medium based on classification

Info

Publication number: CN112948585A
Application number: CN202110310479.9A
Authority: CN
Inventors: 李洁琼; 莫凡; 毛丽旦; 于洋
Original assignee: Shanghai Xiandou Intelligent Robot Co ltd; Shanghai Xianta Intelligent Technology Co Ltd
Current assignee: Shanghai Xiandou Intelligent Robot Co ltd; Shanghai Xianta Intelligent Technology Co Ltd
Priority date: 2021-03-23
Filing date: 2021-03-23
Publication date: 2021-06-11

Abstract

The application provides a classification-based natural language processing method, apparatus, device and storage medium. The method comprises: obtaining input voice and obtaining a recognition text of the input voice; judging whether the sentence length of the recognition text of the input voice meets a first preset condition; when the sentence length meets the first preset condition, classifying the recognition text of the input voice and judging, according to its category, whether the recognition text belongs to a valid recognition category; and when the recognition text belongs to the valid recognition category and its confusion degree (perplexity) meets a second preset condition, performing natural language processing on the recognition text and generating a feedback result. In this way, voice that is not part of the human-computer conversation is refused natural language processing and response during human-computer interaction, which improves the robustness of the system's natural language processing and thereby the user experience.

Description

Natural language processing method, device, equipment and storage medium based on classification

Technical Field

The present application relates to the field of intelligent dialogue, and in particular, to a natural language processing method, apparatus, device, and storage medium based on classification.

Background

At present, in practical application scenarios of a vehicle-mounted intelligent assistant, the voice input by a user is inevitably mixed with speech that is not an instruction to the vehicle, such as chatting with other people in the vehicle or talking to oneself. Some meaningless noise in the input voice can be filtered out by VAD (voice activity detection). Voice containing actual content passes through an ASR (automatic speech recognition) system, is recognized as text, and is then passed to a subsequent NLU (natural language understanding) dialogue system; the NLU module performs natural language understanding on the input text and gives an interactive response. In the second round of interaction, the next sentence is then judged and may be rejected, so as to filter out part of the meaningless input text. However, the technical problem of this approach is that the rejection rate is low and most of the input text is still executed, which causes errors to propagate and also affects the user's chat experience while driving. For example, when a chat robot is used, some sentences composed of words from certain sets turn out, once their semantics are analyzed, to be personal natural conversation rather than interactive instructions to the robot. Such sentences cannot be truly and effectively rejected by the rejection module in the ASR stage alone: analysis shows that they are not instructions sent to the in-vehicle assistant and do not need the in-vehicle assistant to respond.

Disclosure of Invention

An object of the embodiments of the present application is to provide a natural language processing method, apparatus, device, and storage medium based on classification, so as to improve recognition rejection accuracy of a non-human-computer conversation during an interaction with a voice assistant, thereby improving robustness of system natural language processing, and thus improving user experience.

To this end, a first aspect of the present application discloses a classification-based natural language processing method, the method comprising:

acquiring input voice;

recognizing the input voice according to an acoustic model and a first language model, and obtaining a recognition text of the input voice;

judging whether the sentence length of the recognized text of the input voice meets a first preset condition or not;

when the sentence length of the recognition text of the input voice meets the first preset condition, classifying the recognition text of the input voice to obtain the category of the recognition text of the input voice;

judging whether the recognition text of the input voice belongs to an effective recognition category or not according to the category of the recognition text of the input voice;

calculating the confusion degree of the recognition text of the input voice when the recognition text of the input voice belongs to the effective recognition category;

and when the confusion degree of the recognition text of the input voice meets a second preset condition, performing natural language processing on the recognition text of the input voice and generating a feedback result.

In the first aspect of the application, the input voice is recognized to obtain its recognition text, and the recognition text is then classified. Based on the category of the recognition text, it can be accurately judged whether the input voice is voice that does not need to be recognized and responded to. This improves the accuracy of the feedback that the voice assistant finally gives the speaker in human-machine interaction, which improves the robustness of the voice assistant's natural language processing and the user experience.

In the first aspect of the present application, as an optional implementation manner, after the determining whether the sentence length of the recognition text of the input speech satisfies the first preset condition, before the classifying the recognition text of the input speech to obtain the category of the recognition text of the input speech, the method further includes:

and when the sentence length of the recognition text of the input voice does not meet the first preset condition, refusing to perform natural language processing on the input voice.

In the embodiment of the application, when the sentence length of the recognition text of the input voice does not meet the first preset condition, further processing of the recognition text of the input voice is refused.

In the first aspect of the present application, as an optional implementation manner, the first preset condition is that a sentence length of a recognized text of the input speech is less than or equal to 50 words.

In this alternative embodiment, when the sentence length of the recognized text of the input speech is less than or equal to 50 words, it may be determined that the recognized text of the input speech satisfies the first preset condition. When the sentence length is greater than 50 words, further processing of the recognized text is refused, that is, no response is given to the final recognition result of the input speech.
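The length check described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names and the `measure` parameter are hypothetical, and counting whitespace-separated words is an assumption, since the text does not say how sentence length is measured. For unsegmented text, a character count could be passed as `measure` instead.

```python
from typing import Callable

MAX_SENTENCE_LENGTH = 50  # bound named in the optional embodiment


def count_words(text: str) -> int:
    """Hypothetical length measure: number of whitespace-separated words."""
    return len(text.split())


def satisfies_first_condition(
    recognized_text: str,
    max_length: int = MAX_SENTENCE_LENGTH,
    measure: Callable[[str], int] = count_words,
) -> bool:
    """Return True when the sentence length is less than or equal to max_length."""
    return measure(recognized_text) <= max_length


short_text = "open the sunroof"
long_text = " ".join(["word"] * 51)
print(satisfies_first_condition(short_text))  # True
print(satisfies_first_condition(long_text))   # False
```

Text that fails this check would be refused before classification, so the later stages never see it.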

In the first aspect of the present application, as an optional implementation manner, the category of the recognized text of the input speech is at least one of a task category, a car encyclopedia category, a general encyclopedia category, a chat category, an assistant category, and a recognition rejection category.

In this alternative embodiment, the category of the recognized text of the input speech is at least one of a task category, a car encyclopedia category, a general encyclopedia category, a chat category, an assistant category, and a rejection category.

In the first aspect of the present application, as an optional implementation manner, the determining, according to the category of the recognized text of the input speech, whether the recognized text of the input speech belongs to a valid recognition category includes:

determining that the recognition text of the input voice belongs to a valid recognition category when the category of the recognition text of the input voice is one of the task category, the car encyclopedia category, the common encyclopedia category, the chatting category and the assistant category;

and when the type of the recognition text of the input voice is the rejection type, determining that the recognition text of the input voice does not belong to the effective recognition type.

In this alternative embodiment, since past interaction between the user and the device has often involved utterances of the task category, the car encyclopedia category, the general encyclopedia category, the chat category, and the assistant category, when the category of the recognized text of the input voice is one of these categories, it may be determined that the input voice is a valid voice, and the voice is responded to.
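The mapping from category to validity can be sketched as a simple set membership test. The enum and function names below are hypothetical, and the classifier that produces the category is outside the scope of this sketch.

```python
from enum import Enum


class Category(Enum):
    """The six categories named in this optional embodiment."""
    TASK = "task"
    CAR_ENCYCLOPEDIA = "car_encyclopedia"
    GENERAL_ENCYCLOPEDIA = "general_encyclopedia"
    CHAT = "chat"
    ASSISTANT = "assistant"
    REJECTION = "rejection"


# Every category except the rejection category counts as a valid recognition category.
VALID_CATEGORIES = frozenset(Category) - {Category.REJECTION}


def is_valid_category(category: Category) -> bool:
    """Return True when the recognized text should proceed to the confusion-degree check."""
    return category in VALID_CATEGORIES


print(is_valid_category(Category.CHAT))       # True
print(is_valid_category(Category.REJECTION))  # False
```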

In the first aspect of the present application, as an optional implementation manner, when the confusion degree of the recognized text of the input speech satisfies a second preset condition, performing natural language processing on the recognized text of the input speech to generate a feedback result includes:

and when the confusion degree of the recognition text of the input voice meets a second preset condition, recognizing the recognition text of the input voice according to a second language model and generating the feedback result.

In this alternative embodiment, the confusion degree (perplexity) is commonly used to measure whether a sentence is reasonable. For example, a high confusion degree indicates that the recognized text of the input speech contains many unknown expressions and that the sentence is incomplete. Whether the input speech is valid can therefore be further determined from the confusion degree of its recognized text; if it is not valid, no feedback result is generated for the input speech.

A second aspect of the present application discloses a classification-based natural language processing apparatus, the apparatus comprising:

the acquisition module is used for acquiring input voice;

the first voice recognition module is used for recognizing the input voice according to the acoustic model and the first language model and obtaining a recognition text of the input voice;

the first judgment module is used for judging whether the sentence length of the recognition text of the input voice meets a first preset condition or not;

the text classification module is used for classifying the recognition text of the input voice when the sentence length of the recognition text of the input voice meets the first preset condition so as to obtain the category of the recognition text of the input voice;

the second judging module is used for judging whether the recognition text of the input voice belongs to an effective recognition category according to the category of the recognition text of the input voice;

the calculation module is used for calculating the confusion degree of the recognition text of the input voice when the recognition text of the input voice belongs to the effective recognition category;

and the natural language processing module is used for performing natural language processing on the recognition text of the input voice and generating a feedback result when the confusion degree of the recognition text of the input voice meets a second preset condition.

By executing the classification-based natural language processing method, the apparatus disclosed in the second aspect of the application can recognize the input voice to obtain its recognition text and classify the recognition text. Based on the category of the recognition text, it can accurately identify whether the input voice is voice that does not need a response, which improves the final natural language processing of the input voice, improves the robustness of the voice assistant's natural language processing, and improves the user experience.

In the second aspect of the present application, as an optional implementation manner, the first determining module is further configured to refuse to recognize the recognition text of the input speech when the sentence length of the recognition text of the input speech does not satisfy the first preset condition.

In this optional embodiment, when the sentence length of the recognition text of the input speech does not satisfy the first preset condition, the first determining module refuses to recognize the recognition text of the input speech, that is, refuses further processing of it.

A third aspect of the present application discloses a classification-based natural language processing device, the device comprising:

a processor; and

a memory configured to store machine readable instructions which, when executed by the processor, cause the processor to perform the classification-based natural language processing method of the first aspect of the application.

By executing the classification-based natural language processing method, the device disclosed in the third aspect of the application can recognize the input voice to obtain its recognition text and classify the recognition text. Based on the category of the recognition text, it can accurately identify whether the input voice is voice that does not need a response, which improves the final natural language processing of the input voice, improves the robustness of the voice assistant's natural language processing, and improves the user experience.

A fourth aspect of the present application discloses a storage medium storing a computer program which, when executed by a processor, performs the classification-based natural language processing method disclosed in the first aspect of the present application.

By executing the classification-based natural language processing method, the storage medium disclosed in the fourth aspect of the present application can recognize the input voice to obtain its recognition text and classify the recognition text. Based on the category of the recognition text, it can accurately identify whether the input voice is voice that does not need a response, which improves the final natural language processing of the input voice, improves the robustness of the voice assistant's natural language processing, and improves the user experience.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.

FIG. 1 is a flow chart of a natural language processing method based on classification according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a natural language processing apparatus based on classification according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a natural language processing device based on classification according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

Example one

Referring to fig. 1, fig. 1 is a flowchart illustrating a natural language processing method based on classification according to an embodiment of the present application. As shown in fig. 1, the method of the embodiment of the present application includes the steps of:

101. acquiring input voice;

102. recognizing the input voice according to the acoustic model and the first language model, and obtaining a recognition text of the input voice;

103. judging whether the sentence length of the recognized text of the input voice meets a first preset condition or not;

104. when the sentence length of the recognition text of the input voice meets a first preset condition, classifying the recognition text of the input voice to obtain the category of the recognition text of the input voice;

105. judging whether the recognition text of the input voice belongs to an effective recognition category or not according to the category of the recognition text of the input voice;

106. calculating the confusion degree of the recognition text of the input voice when the recognition text of the input voice belongs to the effective recognition category;

107. and when the confusion degree of the recognition text of the input voice meets a second preset condition, performing natural language processing on the recognition text of the input voice and generating a feedback result.

In the embodiment of the application, the input voice is recognized to obtain its recognition text, and the recognition text is then classified. Based on the category of the recognition text, it can be accurately identified whether the input voice is voice that does not need a response, which improves the final natural language processing of the input voice, improves the robustness of the voice assistant's natural language processing, and improves the user experience.
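Steps 101 to 107 can be sketched as a chain of gates, each of which may stop processing. This is only an illustrative sketch: the recognizer, classifier, confusion-degree scorer and natural language understanding model are passed in as callables because the application does not fix them, and the names `process`, `Result` and the string category labels are hypothetical. The defaults of 50 and 0.7 are the example values given elsewhere in the description, and counting words by whitespace is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Categories treated as valid; anything else (e.g. "rejection") is refused.
VALID_CATEGORIES = {"task", "car_encyclopedia", "general_encyclopedia", "chat", "assistant"}


@dataclass
class Result:
    accepted: bool
    stage: str                      # where processing stopped
    feedback: Optional[str] = None  # only set when accepted


def process(
    audio: bytes,                          # step 101: acquired input voice
    recognize: Callable[[bytes], str],     # step 102: acoustic + first language model
    classify: Callable[[str], str],        # step 104: text classifier
    confusion: Callable[[str], float],     # step 106: confusion degree (perplexity)
    understand: Callable[[str], str],      # step 107: second language model
    max_length: int = 50,
    threshold: float = 0.7,
) -> Result:
    text = recognize(audio)
    if len(text.split()) > max_length:           # step 103: first preset condition
        return Result(False, "length")
    if classify(text) not in VALID_CATEGORIES:   # step 105: valid category check
        return Result(False, "category")
    if not confusion(text) < threshold:          # second preset condition
        return Result(False, "confusion")
    return Result(True, "nlu", understand(text))


# Usage with stand-in callables in place of real models.
result = process(
    b"",
    recognize=lambda _audio: "open the window",
    classify=lambda _text: "task",
    confusion=lambda _text: 0.3,
    understand=lambda text: f"OK: {text}",
)
print(result.accepted, result.feedback)
```

Ordering the gates from cheapest to most expensive means most non-conversational input is refused before the natural language understanding model is invoked.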

Specifically, in step 101 of the embodiment of the present application, acquiring the input voice of the user may be implemented by a recording function, for example, by a recording function of a recorder.

Specifically, in step 102 of the embodiment of the present application, the acoustic model and the first language model may be an acoustic model and a language model from the related art; for example, the acoustic model may be a model constructed based on a hidden Markov model in the related art. For specific descriptions of the acoustic model and the first language model, for example their specific structures and use processes, please refer to the prior art; details are not described in the embodiments of the present application.

Specifically, in step 107 of the embodiment of the present application, the confusion degree, also called perplexity, is a standard evaluation index of a language model: for a string S of length N to which the language model assigns probability p(S), the perplexity of the language model is $2^{-\frac{1}{N}\log_2 p(S)}$. In the embodiment of the application, the confusion degree of the recognition text of the input voice can be used to further judge whether the semantics of the recognition text are clear and the sentence is complete, so that input voice with ambiguous semantics or an incomplete sentence is not processed further.

Please refer to the prior art for other descriptions about the confusion, which will not be described in detail in the embodiments of the present application.
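As a worked illustration of the perplexity definition, the following sketch computes 2^(-(1/N) log2 p(S)) from per-token probabilities. It assumes that p(S) is the product of the N conditional token probabilities supplied by some language model; how those probabilities are obtained is not specified in this application, and the function name is hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity of a string whose N tokens have the given model probabilities."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    n = len(token_probs)
    # log2 p(S) is the sum of the per-token log2 probabilities.
    log2_p = sum(math.log2(p) for p in token_probs)
    return 2 ** (-log2_p / n)


# Four tokens, each assigned probability 1/4: p(S) = (1/4)^4, so the result is 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0
```

Summing logarithms rather than multiplying the probabilities directly avoids numerical underflow on long strings.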

In the first embodiment of the present application, as an optional implementation manner, after determining whether the sentence length of the recognition text of the input speech satisfies the first preset condition and before classifying the recognition text of the input speech to obtain the category of the recognition text of the input speech, the method of the embodiment of the present application further includes the step of:

and when the sentence length of the recognition text of the input voice does not meet the first preset condition, refusing to recognize the recognition text of the input voice.

In the embodiment of the application, when the sentence length of the recognition text of the input voice does not meet the first preset condition, recognition of the recognition text can be refused, that is, the recognition text of the input voice is not processed further.

In the embodiment of the present application, further, as an optional implementation manner, the first preset condition is that a sentence length of the recognized text of the input speech is less than or equal to 50 words.

In the embodiment of the present application, as an optional implementation manner, the category of the recognition text of the input speech is at least one of a task category, a car encyclopedia category, a general encyclopedia category, a chat category, an assistant category, and a recognition rejection category.

In the embodiment of the present application, as an optional implementation manner, step 105: judging whether the recognition text of the input voice belongs to an effective recognition category according to the category of the recognition text of the input voice, comprising the following steps:

when the type of the recognition text of the input voice is one of a task type, a car encyclopedia type, a common encyclopedia type, a chatting type and an assistant type, determining that the recognition text of the input voice belongs to an effective recognition type;

and when the category of the recognized text of the input voice is the rejection category, determining that the recognized text of the input voice does not belong to the valid recognition category.

In the embodiment of the present application, as an optional implementation manner, step 107: when the confusion degree of the recognition text of the input voice meets a second preset condition, performing natural language processing on the recognition text of the input voice and generating a feedback result, and the method comprises the following substeps:

and when the confusion degree of the recognition text of the input voice meets a second preset condition, performing natural language processing on the recognition text of the input voice according to the second language model and generating a feedback result.

In this alternative embodiment, the confusion degree is used to measure whether a sentence is reasonable. For example, a confusion degree that is too high indicates that the recognized text of the input speech contains many unknown expressions and that the sentence is incomplete. Whether the input speech is valid can therefore be further determined from the confusion degree of its recognized text; if it is not valid, no feedback result is generated for the input speech.

In this embodiment of the application, the second preset condition may be that the confusion is smaller than a preset threshold, where the preset threshold may be 0.5, or 0.7, or other numerical values that can be used to determine whether the confusion satisfies the second preset condition.
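The second preset condition can be sketched as a strict threshold comparison. The function name is hypothetical, the default of 0.7 is one of the example values above, and the sketch assumes that the confusion-degree score and the threshold are expressed on the same scale, which the text does not specify.

```python
def satisfies_second_condition(confusion: float, threshold: float = 0.7) -> bool:
    """Return True when the confusion degree is smaller than the preset threshold."""
    return confusion < threshold


# The same score can pass or fail depending on which example threshold is chosen.
print(satisfies_second_condition(0.6, threshold=0.7))  # True
print(satisfies_second_condition(0.6, threshold=0.5))  # False
```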

In the embodiment of the present application, the second language model refers to a natural language understanding model, for example ERNIE 2.0, which is capable of recognizing the user's intention based on the recognition text so as to match the corresponding feedback result. It should be noted that the embodiment of the present application does not limit the specific model used as the second language model.

Example two

Referring to fig. 2, fig. 2 is a schematic structural diagram of a classification-based natural language processing apparatus according to an embodiment of the present application. As shown in fig. 2, the apparatus of the embodiment of the present application includes:

an obtaining module 201, configured to obtain an input voice;

the voice recognition module 202 is configured to recognize an input voice according to the acoustic model and the first language model, and obtain a recognition text of the input voice;

the first judging module 203 is configured to judge whether a sentence length of the recognized text of the input speech meets a first preset condition;

the text classification module 204 is configured to classify the recognition text of the input speech to obtain a category of the recognition text of the input speech when a sentence length of the recognition text of the input speech meets a first preset condition;

a second judging module 205, configured to judge whether the recognition text of the input voice belongs to a valid recognition category according to the category of the recognition text of the input voice;

a calculating module 206, configured to calculate a confusion degree of the recognition text of the input speech when the recognition text of the input speech belongs to the valid recognition category;

and the natural language processing module 207 is configured to perform natural language processing on the recognition text of the input speech and generate a feedback result when the confusion of the recognition text of the input speech satisfies a second preset condition.

By executing the classification-based natural language processing method, the apparatus disclosed in the embodiment of the application can recognize the input voice to obtain its recognition text and classify the recognition text. Based on the category of the recognition text, it can accurately identify whether the input voice is voice that does not need a response, which improves the final natural language processing of the input voice, improves the robustness of the voice assistant's natural language processing, and improves the user experience.

In this embodiment, as an optional implementation manner, the first determining module is further configured to reject to recognize the recognition text of the input speech when the sentence length of the recognition text of the input speech does not satisfy the first preset condition.

For other descriptions of the apparatus in the embodiments of the present application, refer to the detailed description of the first embodiment of the present application, which is not repeated herein.

Example three

Referring to fig. 3, fig. 3 is a schematic structural diagram of a natural language processing device based on classification according to an embodiment of the present application. As shown in fig. 3, the apparatus of the embodiment of the present application includes:

a processor 301; and

the memory 302 is configured to store machine readable instructions, which when executed by the processor 301, cause the processor 301 to execute the classification-based natural language processing method according to the first embodiment of the present application.

Example four

The embodiment of the application discloses a storage medium, wherein a computer program is stored in the storage medium, and the computer program is executed by a processor to execute the natural language processing method based on classification disclosed by the embodiment of the application.

By executing the classification-based natural language processing method, the storage medium disclosed in the embodiment of the application can recognize the input voice to obtain its recognition text and classify the recognition text. Based on the category of the recognition text, it can accurately identify whether the input voice is voice that does not need a response, which improves the final natural language processing of the input voice, improves the robustness of the voice assistant's natural language processing, and improves the user experience.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, a division of a unit is merely a division of one logic function, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

It should be noted that the functions, if implemented in the form of software functional modules and sold or used as independent products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.

The above embodiments are merely examples of the present application and are not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method for natural language processing based on classification, the method comprising:

acquiring input voice;

2. The method according to claim 1, wherein after the determining whether the sentence length of the recognized text of the input speech satisfies the first preset condition, before the classifying the recognized text of the input speech to obtain the category of the recognized text of the input speech, the method further comprises:

3. The method of claim 2, wherein the first preset condition is that a sentence length of the recognized text of the input speech is 50 words or less.

4. The method of claim 1, wherein the category of the recognized text of the input speech is at least one of a task category, a car encyclopedia category, a general encyclopedia category, a chat category, an assistant category, and a rejection category.

5. The method according to claim 4, wherein the determining whether the recognized text of the input speech belongs to a valid recognition category according to the category of the recognized text of the input speech comprises:

determining that the recognized text of the input voice belongs to the valid recognition category when the category of the recognized text of the input voice is one of the task category, the car encyclopedia category, the common encyclopedia category, the chatting category, and the assistant category;

6. The method of claim 5, wherein when the degree of confusion of the recognized text of the input speech satisfies a second preset condition, performing natural language processing on the recognized text of the input speech to generate a feedback result comprises:

7. A classification-based natural language processing apparatus, the apparatus comprising:

the acquisition module is used for acquiring input voice;

the voice recognition module is used for recognizing the input voice according to the acoustic model and the first language model and obtaining a recognition text of the input voice;

and the natural language processing module is used for performing natural language processing on the recognition text of the input voice to generate a feedback result when the confusion degree of the recognition text of the input voice meets a second preset condition.

8. The apparatus of claim 7, wherein the first determining module is further configured to reject recognition of the recognized text of the input speech when the sentence length of the recognized text of the input speech does not satisfy the first preset condition.

9. A classification-based natural language processing device, the device comprising:

a processor; and

a memory configured to store machine readable instructions that, when executed by the processor, cause the processor to perform the classification-based natural language processing method of any one of claims 1-6.

10. A storage medium storing a computer program which, when executed by a processor, performs the classification-based natural language processing method according to any one of claims 1 to 6.