CN108255956B

CN108255956B - Method and system for adaptively acquiring word bank field based on historical data and machine learning

Info

Publication number: CN108255956B
Application number: CN201711391038.6A
Authority: CN
Inventors: 蔡劲松; 苏少炜; 陈孝良; 冯大航; 常乐
Original assignee: Beijing SoundAI Technology Co Ltd
Current assignee: Beijing SoundAI Technology Co Ltd
Priority date: 2017-12-21
Filing date: 2017-12-21
Publication date: 2020-04-03
Anticipated expiration: 2037-12-21
Also published as: CN108255956A

Abstract

The invention provides a method for adaptively acquiring a speech lexicon domain based on historical data and machine learning, comprising the following steps: step S1, performing sentence-model classification on the semantic plane on the speech recognition result, and finding the action core in the voice command and the action elements associated with it; step S2, extracting the action elements from the voice command, and selecting a plurality of lexicons by combining machine learning with the user's historical data; step S3, performing word segmentation on the syntactic plane within the selected lexicons by a natural language processing method, evaluating the combined results of the plurality of lexicon domains, taking the domain with the highest evaluation score as the optimal result, outputting the optimal result, and updating the user's historical data; and step S4, combining the optimal result with sentence-class analysis on the pragmatic plane to determine the final lexicon domain. By combining the usage of the user's historical lexicons with machine learning, the corresponding domain is acquired adaptively from the user's historical data, so that flexibility and accuracy are greatly improved.

Description

Method and system for adaptively acquiring word bank field based on historical data and machine learning

Technical Field

The disclosure relates to the field of artificial intelligence voice interaction, and in particular to a method and a system for adaptively acquiring a speech lexicon domain based on historical data and machine learning.

Background

Voice interaction is one of the directions being explored for human-machine interaction. With its continuous development in recent years, various manufacturers have developed ASR and use Chinese word segmentation to understand word meaning. Owing to the characteristics of Chinese, analyzing complex sentences is complicated and time-consuming. To reduce the complexity of the algorithm, ASR vendors generally let users select a lexicon that matches the relevant domain, such as the music domain or the chat domain.

However, most voice instructions received by a smart speaker are simple sentences, that is, sentences with only one action-core structure, and most are imperative or interrogative sentences whose patterns are relatively limited. The recognition result and the corresponding domain can therefore be optimized according to these characteristics of the voice instructions received by the smart speaker.

Existing smart speakers lack flexibility in selecting the speech lexicon domain, which generally has to be specified manually, passed as a call parameter, or filled in by hand when applying for an ASR service. Moreover, once the lexicon domain has been specified, there is no way to adjust it according to the usage scenario and the user's historical data.

Brief Summary of the Present Disclosure

(I) Technical problem to be solved

The present disclosure provides a method and system for adaptively acquiring a speech lexicon field based on historical data and machine learning to at least partially solve the above-mentioned technical problems.

(II) Technical solution

According to one aspect of the present disclosure, a method for adaptively acquiring a speech lexicon domain based on historical data and machine learning is provided, which includes: step S1, performing sentence-model classification on the semantic plane on the speech recognition result, and finding the action core in the voice command and the action elements associated with it; step S2, extracting the action elements from the voice command, and selecting a plurality of lexicons by combining machine learning with the user's historical data; step S3, performing word segmentation on the syntactic plane within the selected lexicons by a natural language processing method, evaluating the combined results of the plurality of lexicon domains, taking the domain with the highest evaluation score as the optimal result, outputting the optimal result, and updating the user's historical data; and step S4, combining the optimal result with sentence-class analysis on the pragmatic plane to determine the final lexicon domain.

In some embodiments of the present disclosure, the sentence-model classification on the semantic plane in step S1 uses a pattern matching algorithm to obtain the action core in the voice instruction and the action elements associated with it.

In some embodiments of the present disclosure, step S2 includes: separating the action core in the voice instruction from the action elements associated with it, extracting the action elements, and selecting a plurality of lexicons according to the action elements in combination with machine deep learning; and selecting the plurality of lexicon domains most frequently used by the user according to the user's historical data.

In some embodiments of the present disclosure, in step S3, word segmentation on the syntactic plane is performed within the selected lexicons by the N-shortest path method of natural language processing, and the shortest path is selected using a greedy algorithm or Dijkstra's shortest path algorithm.

In some embodiments of the present disclosure, in step S3, evaluating the combined results of the plurality of lexicon domains includes: evaluating the degree of association between words, and evaluating the result of the shortest path algorithm. Updating the user's historical data includes: updating the usage of the user's historical lexicon domains, and optimizing the weights of the nouns in the lexicons through historical data and machine learning.

In some embodiments of the present disclosure, step S1 is preceded by step S0, in which the ASR recognition engine receives a voice command issued by a user and performs speech recognition to obtain a speech recognition result.

According to another aspect of the present disclosure, a system for adaptively acquiring a speech lexicon domain based on historical data and machine learning is provided, which includes: a semantic plane analysis module for performing sentence-model classification on the speech recognition result and sending the classification result to the selected lexicons; a syntactic plane analysis module for performing syntactic word segmentation within the selected lexicons, evaluating the combined results of the plurality of lexicon domains, and outputting the optimal result; and a pragmatic plane analysis module for determining the final lexicon domain by combining the optimal result with sentence-class analysis on the pragmatic plane.

In some embodiments of the present disclosure, the semantic plane analysis module comprises: a sentence-model classification submodule for performing sentence-model classification on the semantic plane on the recognition result and finding the action core in the voice command and the action elements associated with it; a machine selection submodule for sending the extracted action elements to a plurality of lexicons selected in combination with machine deep learning; and a history selection submodule for sending the extracted action elements to the plurality of lexicon domains most frequently used by the user according to the user's historical data.

In some embodiments of the present disclosure, the syntactic plane analysis module comprises: a word segmentation submodule for performing word segmentation on the syntactic plane within the selected lexicons using the N-shortest path method of natural language processing; and an evaluation and updating submodule that evaluates the degree of association between words and the result of the shortest path algorithm, takes the domain with the highest evaluation score as the optimal result, updates the usage of the user's historical lexicon domains, and optimizes the weights of the nouns in the lexicons through historical data and machine learning.

In some embodiments of the present disclosure, the system further comprises an ASR recognition engine configured to receive a voice instruction issued by a user and perform speech recognition to obtain a speech recognition result.

(III) Advantageous effects

According to the above technical solution, the method and system of the present disclosure for adaptively acquiring a speech lexicon domain based on historical data and machine learning have at least one of the following beneficial effects:

(1) lexicons with a high usage frequency are found first according to the usage of the user's historical lexicons, and machine learning is used as a supplement so that the corresponding domain is acquired adaptively from the user's historical data; the user is not forced into a specific domain by parameters or other means, so that flexibility and accuracy are greatly improved;

(2) sentence analysis is divided into three planes, and the results from the syntactic, semantic, and pragmatic planes are integrated, so that the complexity of the analysis is reduced and the recognition accuracy is improved.

Drawings

Fig. 1 is a flowchart of a method for adaptively acquiring a speech lexicon field based on historical data and machine learning according to an embodiment of the present disclosure.

Fig. 2 is a schematic structural diagram of a system for adaptively acquiring a speech lexicon field based on historical data and machine learning according to an embodiment of the present disclosure.

Detailed Description

The disclosure provides a method and a system for adaptively acquiring a speech lexicon domain based on historical data and machine learning. The present disclosure adopts an analysis method that classifies sentences on three planes: syntactic, semantic, and pragmatic. The sentence type distinguished on the syntactic plane may be called the sentence pattern; for example, sentences are divided into predicate sentences and non-predicate sentences. The sentence type distinguished on the semantic plane may be called the sentence model; for example, sentences are divided into "action core + agent", "action core + agent + patient", and so on. The sentence type distinguished on the pragmatic plane may be called the sentence class; for example, sentences may be divided into declarative sentences, interrogative sentences, quiz sentences, and the like.

Because the sentence types obtained from the three planes differ, combining the different levels makes the sentence analysis result and the domain selection more reasonable. Adaptive selection of the lexicon domain for the user's voice instructions means that the user or developer does not have to specify the domain; the domain can be selected quickly from the historical data of the user's instructions, supplemented by domains obtained through machine learning. Analyzing the sentence on three planes reduces the complexity of the analysis.

Before describing the solution to the problem, it is helpful to define certain terms.

ASR: Automatic Speech Recognition technology;

action core: generally formed by the verb of a sentence, or by the verb and adjective components at the core of the predicate;

action element: a mandatory semantic component associated with the action core.

For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.

Certain embodiments of the present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the disclosure are shown. Indeed, various embodiments of the disclosure may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements.

In a first exemplary embodiment of the present disclosure, a method for adaptively acquiring a speech lexicon domain based on historical data and machine learning is provided. Fig. 1 is a flowchart of adaptively selecting a lexicon domain according to the first embodiment of the disclosure. As shown in Fig. 1, the method of the present disclosure for adaptively acquiring a speech lexicon domain based on historical data and machine learning includes:

step S0, the ASR recognition engine receives a voice command issued by a user and performs speech recognition to obtain a recognition result;

step S1, sentence-model classification on the semantic plane is performed on the recognition result, and the action core in the voice command and the action elements associated with it are found, the action elements being the agent, the patient, and the like, most of which are expressed by noun components;

step S2, the action elements in the voice command are extracted, a plurality of lexicons are selected in combination with machine deep learning, and the plurality of lexicon domains most frequently used by the user are selected according to the user's historical data;

step S3, word segmentation on the syntactic plane is performed within the selected lexicons using the N-shortest path method of natural language processing, the combined results of the plurality of lexicon domains are evaluated, the domain with the highest evaluation score is taken as the optimal result, the optimal result is output, and the user's historical data is updated;

and step S4, sentence-class analysis on the pragmatic plane is performed to determine the final lexicon domain.

Each step of the method for adaptively acquiring a speech lexicon according to the embodiment is described in detail below.

In step S1, sentence-model classification on the semantic plane is performed using a pattern matching algorithm, so as to obtain the action core in the voice command and the action elements associated with it.
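The pattern matching in step S1 can be illustrated with a minimal sketch. The patent does not specify the patterns, so the regular expressions, the `SentenceModel` type, and the function name below are all hypothetical, and English commands stand in for the Chinese voice instructions the disclosure targets.

```python
import re
from typing import List, NamedTuple, Optional


class SentenceModel(NamedTuple):
    action_core: str            # verb at the core of the command
    action_elements: List[str]  # noun-like components tied to the core


# Hypothetical sentence-model patterns; a real system would need many more.
PATTERNS = [
    re.compile(r"^(?:i want to |please )?(?P<core>play|hear|listen to) (?P<patient>.+)$"),
    re.compile(r"^(?:i want to |please )?(?P<core>navigate|go) to (?P<patient>.+)$"),
]


def classify_sentence_model(text: str) -> Optional[SentenceModel]:
    """Return the action core and its action elements, or None if no pattern fits."""
    normalized = text.strip().lower()
    for pattern in PATTERNS:
        match = pattern.match(normalized)
        if match:
            return SentenceModel(match.group("core"), [match.group("patient")])
    return None
```

Only the extracted action elements (here the `patient` group) would be passed on to lexicon selection in step S2; the action core itself is kept for the later planes.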

In step S2, the action core in the voice command is separated from the action elements associated with it, the action elements are extracted, and a plurality of lexicons are selected according to the action elements in combination with machine deep learning, for example: music, navigation, etc.; in addition, the plurality of lexicon domains most frequently used by the user are selected according to the historical data, for example: chat, celebrities, etc.
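A minimal sketch of the candidate selection in step S2 follows. It assumes that a learned model has already scored each domain for the extracted action elements (`model_scores`) and that the history is a plain list of previously used domains; the function name, both cut-offs, and the domain names are illustrative and not taken from the patent.

```python
from collections import Counter
from typing import Dict, List


def select_candidate_domains(
    model_scores: Dict[str, float],
    usage_history: List[str],
    k_model: int = 2,
    k_history: int = 2,
) -> List[str]:
    """Merge the model's top domains with the user's most frequently used ones."""
    # Domains the (assumed) model considers most likely for the action elements.
    by_model = sorted(model_scores, key=model_scores.get, reverse=True)[:k_model]
    # Domains the user has used most often in the past.
    by_history = [domain for domain, _ in Counter(usage_history).most_common(k_history)]
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(by_model + by_history))
```

For instance, with model scores favouring music and navigation and a history dominated by chat, all three domains would go forward to segmentation in step S3.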

In step S3, word segmentation on the syntactic plane is performed within the selected lexicons using the N-shortest path method of natural language processing. The principle of the N-shortest path method is as follows: each sentence yields a directed acyclic graph in which each character serves as a vertex and each edge represents a possible word. Each edge carries a weight (initial value 1) that represents the probability of the word; preferably, the weight is the TF-IDF value obtained from a dictionary. The N paths with the largest weight values are then searched for in the directed acyclic graph. There is usually more than one shortest path, and when selecting the shortest path either a greedy algorithm, which yields a suboptimal solution, or Dijkstra's shortest path algorithm is used. Because the optimal and suboptimal solutions of the shortest path differ little in word segmentation quality, the greedy algorithm is preferably used to select the path.
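The following sketch shows one way to realise N-best segmentation over such a graph. It is not the patent's implementation: it assumes each edge weight is a word probability, converts it to a cost of `-log(probability)` so that the most probable segmentation is the shortest path, and uses dynamic programming over the acyclic graph instead of the greedy or Dijkstra search mentioned above. The function name and the `unknown_prob` fallback are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple


def n_shortest_segmentations(
    sentence: str,
    word_prob: Dict[str, float],
    n: int = 2,
    unknown_prob: float = 1e-6,
) -> List[Tuple[float, List[str]]]:
    """Return up to n (cost, words) segmentations, cheapest first."""
    length = len(sentence)
    max_word_len = max(map(len, word_prob), default=1)
    # best[i] holds up to n (cost, words) pairs covering sentence[:i].
    best: List[List[Tuple[float, List[str]]]] = [[] for _ in range(length + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, length + 1):
        candidates = []
        for start in range(max(0, end - max_word_len), end):
            word = sentence[start:end]
            if word in word_prob:
                prob = word_prob[word]
            elif end - start == 1:
                prob = unknown_prob  # single-character fallback keeps the graph connected
            else:
                continue  # multi-character strings outside the lexicon are not edges
            edge_cost = -math.log(prob)
            for cost, words in best[start]:
                candidates.append((cost + edge_cost, words + [word]))
        best[end] = heapq.nsmallest(n, candidates, key=lambda item: item[0])
    return best[length]
```

Running the function once per candidate lexicon gives one set of paths per domain, which is what the evaluation stage compares.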

Evaluating the combined results of the plurality of lexicon domains includes: evaluating the degree of association between words (for example, for the voice instruction "I want to hear the re-heading of rhizoma zingiberis mionectsis", the association between "rhizoma zingiberis mionectsis" and "re-heading" is higher than that between "chiori" and "re-heading"), and evaluating the result of the shortest path algorithm (for example, the word "rhizoma zingiberis mionectsis" scores higher than "chiori" in the shortest path algorithm).
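How the two evaluations are combined is not stated in the patent. The sketch below assumes a simple weighted sum of a word-association score and a path score, both normalised to the range 0 to 1; the `DomainResult` type, the function names, and the default weight are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DomainResult:
    domain: str
    path_score: float   # quality of the best segmentation path, in [0, 1]
    association: float  # degree of association between the segmented words, in [0, 1]


def evaluate(result: DomainResult, association_weight: float = 0.5) -> float:
    """Weighted sum of the two evaluations (the weighting is an assumption)."""
    return (association_weight * result.association
            + (1.0 - association_weight) * result.path_score)


def best_domain(results: List[DomainResult], association_weight: float = 0.5) -> DomainResult:
    """Pick the domain with the highest combined evaluation score."""
    return max(results, key=lambda r: evaluate(r, association_weight))
```

A domain whose lexicon contains both the singer and the song title as strongly associated entries would score higher on both terms than a domain that splits the name into unrelated fragments.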

Updating the user's historical data includes: updating the usage of the user's historical lexicon domains, and optimizing the weights of the nouns in the lexicons through historical data and machine learning.
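A minimal sketch of the history update is given below. The patent leaves the learning method open, so an exponential moving average that nudges the weight of each used noun toward 1 is used here purely as a stand-in; the function name and `learning_rate` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def update_history(
    domain_usage: Counter,
    noun_weights: Dict[str, float],
    chosen_domain: str,
    used_nouns: Iterable[str],
    learning_rate: float = 0.1,
) -> None:
    """Record one use of the chosen domain and raise the weight of each noun used."""
    domain_usage[chosen_domain] += 1
    for noun in used_nouns:
        old = noun_weights.get(noun, 0.0)
        # Move the weight a fraction of the way toward 1 (stand-in for a learned update).
        noun_weights[noun] = old + learning_rate * (1.0 - old)
```

Because the weights stay between 0 and 1, they can feed straight back into the edge weights used for segmentation on the next request.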

In step S4, the sentence-class analysis on the pragmatic plane includes analyzing whether the sentence asks a question, issues a command, or the like; the lexicon domain is finally determined according to the analysis result and the optimal result.
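The sketch below illustrates one possible reading of step S4. The rule for telling questions from commands and the table stating which sentence classes each domain accepts are both assumptions, as are all names; the patent only states that the sentence class and the optimal result are combined.

```python
from typing import Dict, List, Set

QUESTION_WORDS = ("what", "who", "where", "when", "why", "how")

# Hypothetical table: which sentence classes each domain is expected to handle.
DOMAIN_SENTENCE_CLASSES: Dict[str, Set[str]] = {
    "music": {"command"},
    "navigation": {"command"},
    "weather": {"question", "command"},
    "chat": {"question", "command"},
}


def classify_sentence_class(text: str) -> str:
    """Very rough rule: question words or a question mark mean 'question'."""
    normalized = text.strip().lower()
    if normalized.endswith("?") or normalized.startswith(QUESTION_WORDS):
        return "question"
    return "command"


def final_domain(text: str, ranked_domains: List[str]) -> str:
    """Return the best-ranked domain compatible with the sentence class.

    ranked_domains must be non-empty and ordered best first.
    """
    sentence_class = classify_sentence_class(text)
    for domain in ranked_domains:
        if sentence_class in DOMAIN_SENTENCE_CLASSES.get(domain, set()):
            return domain
    return ranked_domains[0]  # fall back to the top-scoring domain
```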

The present disclosure can adaptively find the matching user lexicon domain from the user's historical data in combination with machine learning, and dynamically update the usage of the user's lexicon domains. For sentence analysis, the action core and action elements of the sentence are found on the semantic (sentence-model) plane, and the action elements are then analyzed on the syntactic plane within the lexicons. Finally, the results from the semantic and syntactic planes are combined with analysis on the pragmatic plane. In this way, the accuracy of lexicon domain selection, and hence of voice instruction recognition, can be improved.

This concludes the description of the method of the first embodiment of the present disclosure for adaptively acquiring a speech lexicon domain based on historical data and machine learning.

In a second exemplary embodiment of the present disclosure, a system for adaptively acquiring a speech lexicon domain based on historical data and machine learning is provided. Fig. 2 is a schematic structural diagram of a system for adaptively acquiring a speech lexicon field based on historical data and machine learning according to an embodiment of the present disclosure. As shown in fig. 2, the system includes: the system comprises an ASR recognition engine, a semantic plane analysis module, a syntax plane analysis module and a pragmatic plane analysis module.

The following describes each part of the system for adaptively acquiring the speech lexicon field based on the history data and the machine learning in detail.

The ASR recognition engine is used for receiving a voice instruction issued by a user and performing speech recognition to obtain a recognition result;

a semantic plane analysis module, the semantic plane analysis module comprising:

a sentence-model classification submodule for performing sentence-model classification on the semantic plane on the recognition result, and finding the action core in the voice command and the action elements associated with it (namely the agent, the patient, and the like, most of which are expressed by noun components);

a machine selection submodule for sending the extracted action elements to a plurality of lexicons selected in combination with machine deep learning;

and a history selection submodule for simultaneously sending the extracted action elements to the plurality of lexicon domains most frequently used by the user according to the user's historical data.

A syntactic plane analysis module, the syntactic plane analysis module comprising:

a word segmentation submodule for performing word segmentation on the syntactic plane within the selected lexicons using the N-shortest path method of natural language processing. The principle of the N-shortest path method is as follows: each sentence yields a directed acyclic graph in which each character serves as a vertex and each edge represents a possible word. Each edge carries a weight (initial value 1) that represents the probability of the word; preferably, the weight is the TF-IDF value obtained from a dictionary. The N paths with the largest weight values are then searched for in the directed acyclic graph. There is usually more than one shortest path, and when selecting the shortest path either a greedy algorithm, which yields a suboptimal solution, or Dijkstra's shortest path algorithm is used. Because the optimal and suboptimal solutions of the shortest path differ little in word segmentation quality, the greedy algorithm is preferably used to select the path;

an evaluation and updating submodule for evaluating the combined results of the plurality of lexicon domains, taking the domain with the highest evaluation score as the optimal result, outputting the optimal result, and updating the user's historical data;

the evaluation of the combined results of the plurality of lexicon domains comprises: evaluating the degree of association between words (for example, for the voice instruction "I want to hear the re-heading of rhizoma zingiberis mionectsis", the association between "rhizoma zingiberis mionectsis" and "re-heading" is higher than that between "chiori" and "re-heading"), and evaluating the result of the shortest path algorithm (for example, the word "rhizoma zingiberis mionectsis" scores higher than "chiori" in the shortest path algorithm).

The pragmatic plane analysis module is used for performing sentence-class analysis on the pragmatic plane and determining the final lexicon domain.

For brevity, any technical feature of the first embodiment that is equally applicable here is incorporated into this description and is not repeated.

This concludes the description of each part of the system of the second embodiment of the present disclosure for adaptively acquiring a speech lexicon domain based on historical data and machine learning.

So far, the embodiments of the present disclosure have been described in detail with reference to the accompanying drawings. It is to be noted that implementations not shown or described in the drawings or in the description are all forms known to those of ordinary skill in the art and are not described in detail. Further, the above definitions of the various elements and methods are not limited to the specific structures, shapes, or arrangements mentioned in the examples, which may be readily modified or substituted by those of ordinary skill in the art.

Furthermore, the word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.

In addition, unless steps are specifically described or must occur in sequence, the order of the steps is not limited to that listed above and may be changed or rearranged as the design requires. The embodiments described above may be mixed and matched with each other or with other embodiments based on design and reliability considerations, i.e., technical features in different embodiments may be freely combined to form further embodiments.

The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, this disclosure is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the present disclosure as described herein, and any descriptions above of specific languages are provided for disclosure of enablement and best mode of the present disclosure.

The disclosure may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. Various component embodiments of the disclosure may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in the relevant apparatus according to embodiments of the present disclosure. The present disclosure may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present disclosure may be stored on a computer-readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Also in the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the disclosure, various features of the disclosure are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various disclosed aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed disclosure requires more features than are expressly recited in each claim. Rather, as the following claims reflect, disclosed aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this disclosure.

The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims

1. A method for adaptively acquiring a speech lexicon domain based on historical data and machine learning, comprising the following steps:

step S0, an ASR recognition engine receives a voice command issued by a user and performs speech recognition to obtain a speech recognition result;

step S1, sentence-model classification on the semantic plane is performed on the speech recognition result, and the action core in the voice command and the action elements associated with it are found;

step S2, the action elements in the voice command are extracted, and a plurality of lexicons are selected according to the action elements in combination with machine learning; the plurality of lexicon domains most frequently used by the user are selected according to the user's historical data;

step S3, word segmentation on the syntactic plane is performed within the selected lexicons using a natural language processing method, the combined results of the plurality of lexicon domains are evaluated, the domain with the highest evaluation score is taken as the optimal result, the optimal result is output, the usage of the user's historical lexicon domains is updated, and the weights of the nouns in the lexicons are optimized through historical data and machine learning, wherein the weight is the probability of the noun, and the evaluation of the results of the plurality of lexicon domains comprises evaluating the degree of association between words in the speech recognition result;

and step S4, the optimal result is combined with sentence-class analysis on the pragmatic plane to determine the final lexicon domain.

2. The method according to claim 1, wherein the sentence-model classification on the semantic plane in step S1 uses a pattern matching algorithm to obtain the action core in the voice command and the action elements associated with it.

3. The method according to claim 1, wherein the machine learning in the step S2 is deep learning.

4. The method of claim 1, wherein,

in step S3, word segmentation on the syntactic plane is performed within the selected lexicons using the N-shortest path method of natural language processing, and the shortest path is selected using a greedy algorithm or Dijkstra's shortest path algorithm.

5. The method according to claim 1, wherein, in the step S3,

evaluating the combined results of the plurality of lexicon domains comprises: evaluating the result of the shortest path algorithm.

6. The method of claim 1, wherein in step S0, the ASR recognition engine employs automatic speech recognition techniques.

7. A system for adaptively acquiring a speech lexicon domain based on historical data and machine learning, comprising:

an ASR recognition engine for receiving a voice instruction issued by a user and performing speech recognition to obtain a speech recognition result;

a semantic plane analysis module for performing sentence-model classification on the speech recognition result and finding the action core in the voice command and the action elements associated with it; extracting the action elements from the voice command, and selecting a plurality of lexicons according to the action elements in combination with machine learning; and selecting the plurality of lexicon domains most frequently used by the user according to the user's historical data;

a syntactic plane analysis module for performing syntactic word segmentation within the selected lexicons, evaluating the combined results of the plurality of lexicon domains, taking the lexicon domain with the highest evaluation score as the optimal result, outputting the optimal result, updating the usage of the user's historical lexicon domains, and optimizing the weights of the nouns in the lexicons through historical data and machine learning, wherein the weight is the probability of the noun, and the evaluation of the results of the plurality of lexicon domains comprises evaluating the degree of association between words in the speech recognition result;

and a pragmatic plane analysis module for determining the final lexicon domain by combining the optimal result with sentence-class analysis on the pragmatic plane.

8. The system of claim 7, wherein the semantic plane analysis module comprises:

a sentence-model classification submodule for performing sentence-model classification on the semantic plane on the recognition result, and finding the action core in the voice command and the action elements associated with it using a pattern matching algorithm;

and a history selection submodule for sending the extracted action elements to the plurality of lexicon domains most frequently used by the user according to the user's historical data.

9. The system of claim 7, wherein the syntactic plane analysis module comprises:

a word segmentation submodule for performing word segmentation on the syntactic plane within the selected lexicons using the N-shortest path method of natural language processing;

and an evaluation and updating submodule that evaluates the degree of association between words and the result of the shortest path algorithm, takes the domain with the highest evaluation score as the optimal result, updates the usage of the user's historical lexicon domains, and optimizes the weights of the nouns in the lexicons through historical data and machine learning.

10. The system of claim 7, wherein the ASR recognition engine employs automatic speech recognition techniques.