CN111104803B

CN111104803B - Semantic understanding processing method, device, equipment and readable storage medium

Info

Publication number: CN111104803B
Application number: CN201911415186.6A
Authority: CN
Inventors: 艾坤; 梅林海; 刘权; 陈志刚; 王智国; 胡国平
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2024-02-13
Anticipated expiration: 2039-12-31
Also published as: CN111104803A

Abstract

Embodiments of the invention provide a semantic understanding processing method, apparatus, device and readable storage medium. Word segmentation is performed on a sentence to be analyzed, and corresponding labels are set for the word segmentation results; the labels comprise a part-of-speech label representing a general part of speech and a dictionary label representing a special part of speech. The labels are substituted into the sentence to be analyzed to obtain an updated sentence to be analyzed, and the updated sentence is matched against preset matching rules to obtain a matching result. Each matching rule comprises an intention and a rule, and the rule comprises at least an N-tuple formed by connecting word segments with separator marks; keywords in the rule are represented by their corresponding dictionary labels, and non-keywords in the rule are represented by the non-keywords themselves. The rules in the embodiments support multiple matching modes and can meet the generalization requirements of semantic understanding rules.

Description

Semantic understanding processing method, device, equipment and readable storage medium

Technical Field

The present invention relates to the field of semantic understanding, and in particular, to a semantic understanding processing method, apparatus, device, and readable storage medium.

Background

Communication and understanding with others using natural language is one of the hallmarks of human intelligence and one of the most challenging capabilities for artificial intelligence. How to let a machine understand the semantics contained in natural language is an important task in the development of artificial intelligence technology.

Existing methods for semantic understanding are mainly based on neural network models. Such schemes often need a large amount of training data to perform well, and when the model's performance is problematic the model must be retrained, so update iterations are slow. The semantic understanding of a particular sentence pattern may be poor, and user needs differ between scenarios. In many cases the order of words in a sentence does not affect its semantics, yet the model may fail to recognize the correct semantics, so more samples are needed for training and the training period is long.

Disclosure of Invention

The embodiment of the invention provides a semantic understanding processing method, a semantic understanding processing device, semantic understanding equipment and a readable storage medium, which are used for solving the problems of weak generalization capability of the semantic understanding method and high labor cost for constructing rules in the prior art.

In a first aspect, an embodiment of the present invention provides a semantic understanding processing method, including:

Performing word segmentation processing on the sentence to be analyzed, and setting corresponding labels for word segmentation results; the labels comprise a part-of-speech label used for representing the general part-of-speech and a dictionary label used for representing the special part-of-speech;

substituting the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matching the updated statement to be analyzed with a preset matching rule to obtain a matching result;

wherein the matching rule comprises an intention and a rule, and the rule at least comprises an N-tuple formed by connecting word segments with separator marks; keywords in the rule are represented by the dictionary labels corresponding to the keywords, and non-keywords in the rule are represented by the non-keywords themselves.

Preferably, the matching of the updated statement to be analyzed with a preset matching rule to obtain a matching result includes:

matching the keywords in the updated sentences to be analyzed with a preset matching rule to obtain an intermediate result;

and matching the non-keywords in the updated statement to be analyzed with the matching rules in the intermediate result to obtain the matching result.

Preferably, the matching rules comprise a plurality of matching rules, each matching rule corresponds to a different matching level, and the matching precision corresponding to each matching level from high to low is sequentially reduced;

Correspondingly, matching the updated statement to be analyzed with a preset matching rule to obtain a matching result includes:

and according to the matching levels, matching the updated sentences to be analyzed with the matching rules corresponding to the matching levels in sequence from high to low until the matching is successful, and obtaining a matching result.

Preferably, the matching the sentence to be analyzed and the tag with the matching rule corresponding to each matching level in sequence from high to low until the matching is successful, and obtaining the matching result includes:

if the statement to be analyzed only belongs to one field, determining the matching result according to the matching score corresponding to the matching rule of successful matching in the field;

if the statement to be analyzed belongs to a plurality of fields, determining the matching result according to the matching level and the matching score corresponding to the matching rule of successful matching in each field.

Preferably, determining the matching result according to the matching score corresponding to the matching rule of successful matching in the field includes:

determining the matching score of each successfully matched matching rule according to the number of part-of-speech labels in the sentence to be analyzed, the number of dictionary labels, and the matching length between the sentence to be analyzed and the successfully matched matching rule, and determining the matching result according to the matching scores;

The step of determining the matching result according to the matching level corresponding to the matching rule of successful matching in each field comprises the following steps:

comparing the matching levels corresponding to the matching rules successfully matched in the fields;

if the matching levels are different, taking the matching rule corresponding to the highest matching level as the matching result;

if the matching levels are the same, determining the matching score of each successfully matched matching rule according to the number of part-of-speech labels in the sentence to be analyzed, the number of dictionary labels, and the matching length between the sentence to be analyzed and the successfully matched matching rule, and determining the matching result according to the matching scores.

Preferably, the method further comprises:

performing field classification on all the collected linguistic data, and performing word segmentation on the linguistic data by applying a word segmentation system corresponding to each field;

setting dictionary labels for word segmentation results by using a domain dictionary corresponding to the domain;

determining intent of the corpus by using verbs in the word segmentation result;

and generating a matching rule according to the intention, the word segmentation result and the dictionary label.

Preferably, generating a matching rule according to the intention, the word segmentation result and the dictionary label includes:

Generating a first-level matching rule according to the intention, the word segmentation result and the dictionary label;

generating a multi-level matching rule by using the generalization field based on the first-level matching rule;

wherein the generalization field includes:

a first field, which connects word segments into an N-tuple using the "+" symbol as a separator mark where no data may be inserted between them, and using the "-" symbol as a separator mark where data may be inserted between them;

a second field for concatenating the tokens with "-" symbols allowing the respective dictionary labels to change order as separator tokens to form an N-tuple;

a word length constraint field, which represents the maximum difference between the total length of the sentence to be analyzed and the number of words in the rule, and the maximum number of words that may be inserted between two dictionary labels;

a dictionary linked list field, configured to characterize a relationship between dictionary labels in a domain dictionary corresponding to each domain, where the relationship includes: equal to, contain, and combine.
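As an illustration only, the first field and the insertion limit of the word length constraint field can be sketched as a small matcher. The function name `match_with_separators`, the `max_insert` parameter and the list-based encoding of a rule are assumptions made for this sketch; the embodiment does not prescribe a data format, and the second field (order changes) is not modeled here.

```python
def match_with_separators(tokens, separators, sentence, max_insert=2):
    """Match rule tokens against a sentence from start to end.

    tokens:     rule elements (literal words or dictionary label names)
    separators: one "+" or "-" between each pair of adjacent rule elements
    "+" requires the next element to be adjacent; "-" tolerates up to
    max_insert extra words before the next element (hypothetical limit).
    """
    def step(rule_idx, sent_idx):
        if rule_idx == len(tokens):
            # Every rule element is consumed; the sentence must be consumed too.
            return sent_idx == len(sentence)
        if rule_idx == 0 or separators[rule_idx - 1] == "+":
            gaps = [0]
        else:
            gaps = range(max_insert + 1)
        for gap in gaps:
            pos = sent_idx + gap
            if pos < len(sentence) and sentence[pos] == tokens[rule_idx]:
                if step(rule_idx + 1, pos + 1):
                    return True
        return False

    return step(0, 0)


rule_tokens = ["please", "date", "weather"]
rule_separators = ["+", "-"]
print(match_with_separators(rule_tokens, rule_separators, ["please", "date", "the", "weather"]))
print(match_with_separators(rule_tokens, rule_separators, ["please", "the", "date", "weather"]))
```

The first call succeeds because the inserted word falls on a "-" separator; the second fails because it falls on a "+" separator.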

In a second aspect, an embodiment of the present invention provides a semantic understanding processing apparatus, including:

the label setting unit is used for performing word segmentation processing on the sentences to be analyzed and setting corresponding labels for word segmentation results; the labels comprise a part-of-speech label used for representing the general part-of-speech and a dictionary label used for representing the special part-of-speech;

the rule matching unit is used for substituting the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matching the updated statement to be analyzed with a preset matching rule to obtain a matching result.

In a third aspect, embodiments of the present invention provide a semantic understanding processing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as provided in the first aspect when the program is executed.

In a fourth aspect, embodiments of the present invention provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as provided by the first aspect.

The semantic understanding processing method, the semantic understanding processing device, the semantic understanding processing equipment and the readable storage medium can meet the generalization requirement of semantic consistent sentences with different word sequences by supporting the rule matching process of various matching modes.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of an embodiment of a semantic understanding processing method of the present invention;

FIG. 2 is a flow chart of generating a preset matching rule in the semantic understanding processing method of the present invention;

FIG. 3 is a schematic diagram of a semantic understanding processing device according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an embodiment of the semantic understanding processing device according to the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In one embodiment of the present invention, a semantic understanding processing method is provided, which is described in detail with reference to fig. 1, and the semantic understanding processing method includes:

s100, word segmentation processing is carried out on the sentences to be analyzed, and corresponding labels are set for word segmentation results.

After a user request containing a sentence to be analyzed is received, the request is first dispatched: a classification model performs domain classification to determine the domain to which the input sentence belongs. The classification model may be an existing classification model or one trained during rule construction.

If the input user request does not belong to any domain, the input user request can be classified into a chat class, wherein the chat class is a non-task domain, such as greeting chat content.

After the input sentence is determined to belong to a specific field, a word segmentation system corresponding to the field is applied to perform word segmentation processing on the sentence to be analyzed. The segmented sentence can be regarded as a segmented result composed of a series of ordered words. The word segmentation system can use the existing word segmentation system to perform word segmentation processing, such as character string matching, full segmentation, word formation and other modes.

After the word segmentation process is completed, a corresponding tag needs to be set for the words in the sentence. Each word has corresponding general parts of speech, such as nouns, verbs, adverbs, prepositions, auxiliary words and the like; each word may also have a specific part of speech, e.g. the vocabulary of the music domain may contain singers, songs, bands, styles etc. Thus, the labels mentioned in this embodiment include a part-of-speech label for characterizing a generic part-of-speech and a dictionary label for characterizing a specific part-of-speech.

Part-of-speech tags are set for the word segmentation results using a general dictionary. The segmented sentence consists of a series of ordered words, and every word has a corresponding attribute, such as noun, verb, adverb, preposition or auxiliary word; nouns can be further subdivided into general categories such as personal names and company names. The general dictionary can therefore be used to set a part-of-speech tag for each word. Such tags apply to all domains and are unlikely to be ambiguous. Specifically, once word segmentation of the sentence to be analyzed is complete, each word segment is looked up in the general dictionary to obtain its matching part of speech, and that part of speech is set as the part-of-speech tag of the word segment.

Dictionary labels are set for the word segmentation results using the domain dictionary of the corresponding domain. Specifically, the dictionaries of each domain are used to attach fine-grained dictionary labels to the nouns and verbs of the sentence to be analyzed, such as singer, song, band and style in the music domain; for dictionaries that cannot be enumerated, such as places and non-famous persons, an NER model is used to label the sentence. Dictionary labels give words a more specific meaning than part-of-speech tags. After each word segment has been given a part-of-speech tag, it is looked up in the domain dictionary corresponding to the sentence to be analyzed; if a matching word is found, the matching entry in the domain dictionary is used to set the dictionary label of the word segment.
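The two lookups described above can be sketched as follows. This is a minimal illustration, assuming both dictionaries are plain in-memory mappings; `GENERAL_DICT`, `MUSIC_DICT` and `tag_tokens` are hypothetical names, and the NER step for non-enumerable dictionaries is omitted.

```python
# Hypothetical dictionaries; the embodiment does not specify their contents or storage format.
GENERAL_DICT = {
    "play": "verb",
    "a": "article",
    "song": "noun",
    "by": "preposition",
    "Liu Dehua": "noun",
}
MUSIC_DICT = {"Liu Dehua": "singer"}


def tag_tokens(tokens, general_dict, domain_dict):
    """Attach a part-of-speech label and, when one exists, a dictionary label."""
    tagged = []
    for tok in tokens:
        tagged.append({
            "word": tok,
            "pos": general_dict.get(tok),        # general part of speech, or None
            "dict_label": domain_dict.get(tok),  # domain-specific label, or None
        })
    return tagged


tagged = tag_tokens(["play", "a", "song", "by", "Liu Dehua"], GENERAL_DICT, MUSIC_DICT)
for item in tagged:
    print(item)
```

Only "Liu Dehua" receives a dictionary label in this example; the other words carry a part-of-speech label alone.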

In addition, after labels have been set for the word segmentation results, a model can be used to remove conflicting part-of-speech labels or dictionary labels. For example, if Liu Dehua is both a singer and a song title, a language model is used to decide whether the singer or the song reading is more likely. The language model measures the fluency of a sentence: its input is a labeled sentence and its output is a score between 0 and 1, where a higher score indicates a more fluent sentence. Sentences with scores below a preset value are deleted, which achieves the de-duplication. For instance, in "I want to listen to Liu Dehua's song", Liu Dehua is labeled both as artist and as song, so the original sentence is rewritten as "I want to listen to [song]'s song" and "I want to listen to [artist]'s song"; the former is clearly not fluent and can be deleted.
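The label de-duplication can be illustrated with a short sketch. The scorer `toy_fluency` is a hard-coded stand-in for the language model and is purely hypothetical, as are the function name `resolve_label` and the threshold value of 0.5; a real system would call a trained model that returns a fluency score between 0 and 1.

```python
def toy_fluency(sentence):
    """Stand-in for a language model: returns a fluency score in [0, 1].

    Hypothetical: the implausible pattern is hard-coded for the example.
    """
    implausible = {"[song] 's song"}
    return 0.1 if any(pattern in sentence for pattern in implausible) else 0.9


def resolve_label(template, candidates, score_fn, threshold=0.5):
    """Keep the candidate labels whose rewritten sentence scores at or above threshold."""
    kept = []
    for label in candidates:
        rewritten = template.format(label=f"[{label}]")
        if score_fn(rewritten) >= threshold:
            kept.append(label)
    return kept


labels = resolve_label("I want to listen to {label} 's song", ["artist", "song"], toy_fluency)
print(labels)
```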

After this step, some words in the sentence to be analyzed carry only a part-of-speech tag, while others carry both a part-of-speech tag and a dictionary tag.

S200, substituting the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matching the updated statement to be analyzed with a preset matching rule to obtain a matching result.

Specifically, the label set for the statement to be analyzed output in the step S100 is brought into the statement to be analyzed to form an updated statement to be analyzed, and the updated statement to be analyzed is matched with different rules in the rule base to obtain a matching result. The intention of the sentence to be analyzed is the intention of the rule successfully matched with the sentence.

Specifically, if a dictionary label is among the labels retained in step S100, the dictionary label is used preferentially to construct the fixed sentence pattern; if there is no dictionary label, the fixed sentence pattern is constructed using the part-of-speech label.

Furthermore, each matching rule mentioned in this step includes an intention and a rule. The role of the intention is to classify the sentence semantically; the role of the rule is to match the words required by the semantics and to extract the keywords. In this embodiment a keyword is defined as follows: if replacing a word in the sentence changes the semantics of the sentence in a specific scenario, that word is a keyword.

Rules at least comprise N-tuples formed by connecting tokens through separator tags, keywords in the rules are represented by the dictionary labels corresponding to the keywords, and non-keywords in the rules are represented by the non-keywords themselves.

For example: intent = pause: would you please stop it. This statement indicates that the intent of the sentence is pause.

As another example, consider asking for the weather of [ weekend ] (date) [ san francisco ] (location). The intent of this statement is ask_weather, and it contains two keywords, the date and the location. The keywords in a rule can be replaced by dictionary labels, giving intent = ask_weather: ask for the weather of [ date0: date ] [ location0:% location% ], where date0 and location0 are dictionary sets. A keyword that has been replaced by a dictionary label in a rule can be matched by any word in that dictionary.

Non-keywords are matched using the non-keywords themselves, so the rule for this sentence becomes intent = ask_weather: [ please ] + [ date0: date ] + [ location0:% location% ] + [ weather ].
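A minimal sketch of label substitution followed by matching against the ask_weather rule is given below. The `Slot` class, the dictionary-based rule encoding and the function names are assumptions introduced for illustration; only the strict "+" connection is modeled, so nothing may be inserted between elements.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    value: str      # a literal word, or a dictionary label name
    is_label: bool  # True when the slot is a keyword expressed as a dictionary label


# Hypothetical encoding of: intent = ask_weather: [please] + [date] + [location] + [weather]
RULE = {
    "intent": "ask_weather",
    "slots": [
        Slot("please", False),
        Slot("date", True),
        Slot("location", True),
        Slot("weather", False),
    ],
}


def substitute_labels(tagged):
    """Build the updated sentence: keywords become their dictionary label, other words stay."""
    return [(t["dict_label"] or t["word"], t["dict_label"] is not None, t["word"]) for t in tagged]


def match_strict(rule, updated):
    """Match slots joined by "+": same length, same order, nothing inserted."""
    if len(rule["slots"]) != len(updated):
        return None
    extracted = {}
    for slot, (value, is_label, word) in zip(rule["slots"], updated):
        if slot.value != value or slot.is_label != is_label:
            return None
        if is_label:
            extracted[value] = word  # remember the original keyword text
    return {"intent": rule["intent"], "slots": extracted}


tagged = [
    {"word": "please", "dict_label": None},
    {"word": "weekend", "dict_label": "date"},
    {"word": "San Francisco", "dict_label": "location"},
    {"word": "weather", "dict_label": None},
]
result = match_strict(RULE, substitute_labels(tagged))
print(result)
```

On success the sketch returns the intention of the matched rule together with the extracted keywords, which mirrors the two roles that the intention and the rule play in the description above.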

The semantic understanding processing method in the embodiment supports various rule matching modes, and can meet the generalization requirement of semantic understanding rules.

In one embodiment of the present invention, substituting the tag into the sentence to be analyzed to obtain an updated sentence to be analyzed, and matching the updated sentence to be analyzed with a preset matching rule to obtain a matching result, further including:

In a semantic understanding system, the constructed rule base may contain a large number of rules, and a sentence to be analyzed that is input to the system also contains multiple word segments. Matching a single sentence to be analyzed against every rule in the rule base one by one would severely challenge the operating efficiency of the system.

In this embodiment, the process of matching the updated statement to be analyzed with a preset matching rule is specifically divided into two steps: keyword matching and non-keyword matching.

Specifically, the updated sentence to be analyzed is obtained by substituting the label into the original obtained sentence to be analyzed, and therefore, the label data indicates the keyword information in the sentence to be analyzed. Firstly, matching keywords in sentences to be analyzed with matching rules in a rule base, judging which rules in the rule base can be matched with keyword information in sentences to be analyzed, and taking the matching rules successfully matched as intermediate results. The set of matching rules contained in the intermediate result is a subset of the set of matching rules in the overall semantic understanding system rules base.

Further, matching the non-keywords in the sentence to be analyzed with the matching rules contained in the intermediate result, judging which rules in the intermediate result can be matched with the non-keyword information in the sentence to be analyzed, and taking the matching rules successfully matched as the final matching result of the embodiment.

In this embodiment, matching the updated sentence to be analyzed with the preset matching rules is performed in two steps. In the keyword matching step, matching the keywords against the rules in the rule base screens out a large number of rules that cannot match, which greatly reduces the number of rules that must be tried in the non-keyword matching step, reduces the computation required for rule matching, and improves the efficiency of semantic understanding.
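The two-step procedure can be sketched as two successive filters. In this illustration, set containment stands in for the real matcher and word order is ignored; the rule fields `labels` and `words` and both function names are hypothetical.

```python
def keyword_filter(rules, sentence_labels):
    """Step 1: keep the rules whose dictionary labels all occur in the sentence."""
    present = set(sentence_labels)
    return [rule for rule in rules if set(rule["labels"]) <= present]


def non_keyword_filter(rules, sentence_words):
    """Step 2: among the survivors, keep the rules whose literal words all occur too."""
    present = set(sentence_words)
    return [rule for rule in rules if set(rule["words"]) <= present]


RULES = [
    {"intent": "ask_weather", "labels": ["date", "location"], "words": ["weather"]},
    {"intent": "book_flight", "labels": ["date", "location"], "words": ["flight"]},
    {"intent": "play_music", "labels": ["singer"], "words": ["play"]},
]

sentence_labels = ["date", "location"]   # dictionary labels found in the updated sentence
sentence_words = ["please", "weather"]   # remaining non-keywords

intermediate = keyword_filter(RULES, sentence_labels)
final = non_keyword_filter(intermediate, sentence_words)
print([rule["intent"] for rule in intermediate])
print([rule["intent"] for rule in final])
```

The intermediate result is a subset of the rule base, so the second, more detailed comparison runs over fewer rules.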

In one embodiment of the present invention, the matching of the updated sentence to be analyzed with a preset matching rule to obtain a matching result further includes:

In this embodiment, the matching rules include a plurality of matching rules, each matching rule corresponds to a different matching level, and the matching precision corresponding to each matching level from high to low is sequentially reduced.

Specifically, this embodiment proposes a multi-level rule system in which rules are graded according to their precision and recall. For example, the rules may be divided into four levels, level one to level four, from high precision to low precision; level-one rules have the highest precision and the lowest recall, and so on. No specific precision is prescribed for any level; it is only required that rules at a higher level are more precise than rules at a lower level, and the level of each rule can be adjusted according to actual use.

By using multi-level rule files, the semantic understanding processing method in this embodiment meets users' differing precision requirements for semantic understanding in different scenarios.
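Level-by-level matching, from the highest level down until a match succeeds, can be sketched as follows. The numbering (1 as the most precise level), the rule encoding and `toy_matcher` are assumptions made for this example only.

```python
def match_by_level(sentence, rules_by_level, matcher):
    """Try levels in ascending number (1 = most precise) and stop at the first level that matches."""
    for level in sorted(rules_by_level):
        hits = [rule for rule in rules_by_level[level] if matcher(rule, sentence)]
        if hits:
            return level, hits
    return None, []


def toy_matcher(rule, sentence):
    """Hypothetical matcher: exact sequence when strict, in-order subsequence otherwise."""
    if rule["strict"]:
        return rule["pattern"] == sentence
    remaining = iter(sentence)
    # "tok in remaining" advances the iterator, so order is respected.
    return all(tok in remaining for tok in rule["pattern"])


RULES_BY_LEVEL = {
    1: [{"intent": "ask_weather", "pattern": ["date", "location", "weather"], "strict": True}],
    2: [{"intent": "ask_weather", "pattern": ["location", "weather"], "strict": False}],
}

level, hits = match_by_level(["please", "location", "today", "weather"], RULES_BY_LEVEL, toy_matcher)
print(level, [rule["intent"] for rule in hits])
```

Here the level-one rule fails, so the looser level-two rule supplies the result; a sentence that matches at level one never reaches level two.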

In one embodiment of the present invention, matching the sentence to be analyzed and the tag with the matching rule corresponding to each matching level in order from high to low until the matching is successful, further includes:

if the statement to be analyzed only belongs to one field, determining the matching result according to the matching level and the matching score corresponding to the matching rule of successful matching in the field;

The system first classifies rule files by domain, such as music, weather, navigation, etc. Each domain has its own rule system file. The rule system file contains the following parts, (1) a dictionary file, (2) a rule dictionary file, (3) a hierarchical rule file, and (4) a dictionary feature relation file.

The dictionary file mainly collects enumerable sets of words of the same category in the corresponding domain, such as singers and songs. The rule dictionary file mainly captures non-enumerable sets of words of the same category in the corresponding domain, such as times and places. Together they form the domain dictionary, which is used to set dictionary labels for the word segments of the sentence to be analyzed: for each word segment obtained by word segmentation, the corresponding dictionary label is set by looking it up in the domain dictionary of the corresponding domain, that is, in the dictionary file and the rule dictionary file.

The hierarchical rules in the hierarchical rule file are the rules described above and form the rule base made up of all rules generated in the present invention. When a sentence to be analyzed is matched, the corresponding rules are looked up in the rule base and matched against the sentence to obtain a matching result.

The dictionary feature relation file records the relationships between dictionary labels, including equality, inclusion and combination, which the system defines on the basis of all dictionaries. When the dictionary label corresponding to keyword X in a rule is dictionary A, and the dictionary feature relation file states that dictionary A contains dictionary B, then words in both dictionary A and dictionary B can match keyword X. For example, the person dictionary may contain the artist, singer and student dictionaries; for the rule intent = eat + [ please ] + [ person ] + [ like eat ] + [ fruit ], words in the artist and student dictionaries can also match [ person ].
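The inclusion relation can be illustrated with a small lookup. The table `CONTAINS`, the sample entries (including the names Alice and Bob) and both function names are hypothetical; the equality and combination relations are not modeled in this sketch.

```python
# Hypothetical relation table: "person" contains the "artist", "singer" and "student" dictionaries.
CONTAINS = {"person": ["artist", "singer", "student"]}
DICTIONARIES = {
    "person": {"Alice"},
    "artist": {"Liu Dehua"},
    "singer": set(),
    "student": {"Bob"},
}


def expand_label(label, contains):
    """Return the label plus every label it contains, directly or transitively."""
    seen, stack = [], [label]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(contains.get(current, []))
    return seen


def word_matches_label(word, label, dictionaries, contains):
    """A word matches a rule label if it appears in that dictionary or in any contained dictionary."""
    return any(word in dictionaries.get(name, set()) for name in expand_label(label, contains))


print(word_matches_label("Bob", "person", DICTIONARIES, CONTAINS))
print(word_matches_label("Bob", "artist", DICTIONARIES, CONTAINS))
```

Containment is one-directional in this sketch: a student matches [ person ], but a student does not match [ artist ].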

Specifically, for a sentence to be analyzed, two situations may occur in which the sentence to be analyzed belongs to one domain or multiple domains according to the semantic richness of the sentence to be analyzed. If the sentence to be analyzed belongs to one field, the situation that the sentence to be analyzed is matched with a plurality of matching rules in the field may occur in the process of rule matching; the sentences to be analyzed belong to a plurality of fields, and in the process of rule matching, the situation that the sentences to be analyzed are matched with a plurality of matching rules in the plurality of fields can occur. And for the two cases that the sentence to be analyzed belongs to one field or a plurality of fields, the ways of determining the matching result from the matched plurality of matching rules are different.

The following describes how the matching result is determined from the matching rules in the two cases.

If the statement to be analyzed only belongs to one field, determining the matching result according to the matching level and the matching score corresponding to the matching rule of successful matching in the field. It can be understood that a sentence to be analyzed belonging to a field may be matched with a plurality of matching rules of different levels in the field, and at this time, a matching result needs to be determined according to the level of the matching rules; the sentence to be analyzed may also be matched with a plurality of rules in the same level in the field, and at this time, a matching result needs to be judged according to the matching score of the matching rule.

Specifically, determining the matching result according to the matching level and the matching score corresponding to the matching rule of successful matching in the field includes:

comparing the matching levels corresponding to the matching rules successfully matched in the field;

If the statement to be analyzed belongs to a plurality of fields, determining the matching result according to the matching level and the matching score corresponding to the matching rule of successful matching in each field. It can be understood that, for a sentence to be analyzed belonging to multiple fields, multiple matching rules of different levels in multiple fields may be matched, where a matching result needs to be determined according to the level of the matching rule; the sentence to be analyzed may also match a plurality of rules of the same level in a plurality of fields, and at this time, a matching result needs to be determined according to a matching score of the matching rule.

Specifically, the determining the matching result according to the matching level corresponding to the matching rule of successful matching in each field includes:

In the above description, if the matching levels are different, the matching rule corresponding to the highest matching level is used as the matching result, specifically, the higher the matching rule, the more accurate the matching rule result is, and the better semantic understanding effect can be achieved.

If the matching levels are the same, the matching score of each successfully matched rule is determined, and the matching result is determined according to the matching scores. Specifically, if several rules of the same level match, the best result is chosen by combining factors such as part-of-speech labels, dictionary labels and matched length. Words with part-of-speech labels are generally just nouns, verbs, adverbs and the like, which cover a relatively broad range, whereas words with dictionary labels cover a narrower and more definite range, such as singers and song titles. As an example, suppose that in a matching rule the number of dictionary-labeled words is n, the number of part-of-speech-labeled words is m, the number of non-keywords is o, the number of unmatched words at the head is p, the number of unmatched words in the middle is q, the number of unmatched words at the tail is r, and k1 to k6 are coefficients. The matching score is then:

$$\text{score} = k_1\frac{n^2}{m+n+o} + k_2\frac{m^2}{m+n+o} + k_3\frac{o^2}{m+n+o} - k_4\frac{p}{m+n+o} - k_5\frac{q}{m+n+o} - k_6\frac{r}{m+n+o}$$

The squared terms in the formula amplify the effect of changes in n, m and o; a purely linear formula gives worse results.
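The matching score can be computed directly from the six counts. In the sketch below the coefficient values default to 1.0 purely for illustration, since the description does not give values for k1 to k6; the function name `matching_score` is likewise an assumption.

```python
def matching_score(n, m, o, p, q, r, k=(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)):
    """Score a matched rule.

    n: dictionary-labeled words      m: part-of-speech-labeled words
    o: non-keywords                  p: unmatched words at the head
    q: unmatched words in the middle r: unmatched words at the tail
    k: coefficients k1..k6 (placeholder values, not taken from the source)
    """
    total = m + n + o
    if total == 0:
        raise ValueError("the rule must contain at least one word")
    k1, k2, k3, k4, k5, k6 = k
    return (k1 * n**2 + k2 * m**2 + k3 * o**2 - k4 * p - k5 * q - k6 * r) / total


# Two dictionary-labeled words, one part-of-speech-labeled word, one non-keyword,
# and one unmatched word in the middle: (4 + 1 + 1 - 1) / 4
print(matching_score(n=2, m=1, o=1, p=0, q=1, r=0))  # 1.25
```

Dividing every term by the same total m + n + o, as the formula does, is equivalent to dividing the combined numerator once, which is how the sketch is written.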

Under the condition of matching to a plurality of rules, the semantic understanding processing method in the embodiment realizes the optimal selection of the matching rules by comparing the matching level between the plurality of matching rules successfully matched and the matching score between the same-level rules.
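The selection among several successfully matched rules, by matching level first and matching score second, can be sketched as a single ordering. The candidate records and their field names are hypothetical, and level 1 is assumed to denote the highest matching level.

```python
def select_best(candidates):
    """Pick the winner: highest matching level first (1 is highest), then highest score."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c["level"], -c["score"]))


candidates = [
    {"domain": "music", "intent": "play_music", "level": 2, "score": 1.5},
    {"domain": "weather", "intent": "ask_weather", "level": 1, "score": 0.8},
    {"domain": "navigation", "intent": "navigate", "level": 1, "score": 1.1},
]
best = select_best(candidates)
print(best["domain"], best["intent"])
```

The music rule has the largest score but loses to both level-one rules; between the two level-one rules, the higher score decides.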

In another implementation, to avoid unnecessary low-precision matching, once the sentence to be analyzed has been successfully matched with a rule at a high matching level, rules at lower matching levels need not be tried. In this case, if the sentence to be analyzed belongs to only one domain, the matching result can be determined directly from the matching scores of the successfully matched rules in that domain. Specifically, the matching score of each successfully matched rule is determined according to the number of part-of-speech labels in the sentence to be analyzed, the number of dictionary labels, and the matching length between the sentence and the rule, and the matching result is determined according to the matching scores. For a sentence to be analyzed that belongs to multiple domains, the successfully matched rules in different domains may still have different matching levels, so the matching result is still determined according to the matching level and matching score of the successfully matched rules in each domain.

In an embodiment of the present invention, the preset matching rules against which the updated sentence to be analyzed is matched may be generated in advance. The generation process is described in detail below with reference to fig. 2.

S210, performing field classification on all the collected corpuses, and performing word segmentation on the corpuses by applying word segmentation systems corresponding to all the fields.

Relevant corpora are collected for a given field. In the first stage, all collected corpora may be classified manually; once enough field data are available, a field classification model is trained on the existing corpora and the trained model is used for classification, so that manual classification is no longer needed. The corpora are then segmented using existing word segmentation software.

S220, a domain dictionary corresponding to the domain is applied to set dictionary labels for word segmentation results.

After word segmentation, the word segmentation results are calibrated and merged using the dictionary. For example, if the segmenter splits "Liu Dehua" into two words but the dictionary contains the word "Liu Dehua", the pieces are merged into "Liu Dehua". Likewise, if the segmenter tags a word as a verb but the word is actually a noun in the dictionary, the tag is corrected, ensuring that each word has a correct dictionary label.
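The calibration step can be sketched as a greedy longest-match merge over adjacent segments. The function name `calibrate`, the `max_span` limit and the `joiner` parameter are assumptions for illustration; the embodiment does not specify how the merge is carried out.

```python
def calibrate(tokens, dictionary, max_span=4, joiner=""):
    """Merge adjacent segments found in the dictionary and correct their tags.

    tokens: list of (word, tag) pairs produced by the segmenter.
    dictionary: dict mapping a word to its dictionary label.
    joiner: string placed between merged pieces ("" suits Chinese text).
    """
    result = []
    i = 0
    while i < len(tokens):
        merged = False
        # Try the longest candidate span first, down to two segments.
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = joiner.join(word for word, _ in tokens[i:i + span])
            if candidate in dictionary:
                result.append((candidate, dictionary[candidate]))
                i += span
                merged = True
                break
        if not merged:
            word, tag = tokens[i]
            # A dictionary entry overrides the segmenter's tag.
            result.append((word, dictionary.get(word, tag)))
            i += 1
    return result
```

For romanized input such as `[("play", "v"), ("Liu", "n"), ("Dehua", "n")]` with the dictionary `{"Liu Dehua": "singer"}` and `joiner=" "`, the two name pieces are merged into a single segment labelled `singer`.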

S230, determining the intention of the corpus by using verbs in the word segmentation result.

The intention of the corpus is judged using the verbs in the word segmentation result, such as playing or checking, or using a trained intention classification model. In the initial stage, the intention type of each corpus is labeled manually according to its verbs. Once a sufficiently large training set has been obtained, the labeled corpora are used to train an intention classification model, and after the classification accuracy of the model exceeds that of the initial method, the model can be used to classify intentions.

S240, generating a matching rule according to the intention, the word segmentation result and the dictionary label.

Further, the method specifically comprises the following steps:

and generating a first-level matching rule according to the intention, the word segmentation result and the dictionary label.

As described in the foregoing embodiment, a matching rule includes an intention and a rule. The rule is generated from the word segmentation result of the sentence and the dictionary labels.

For example, for a sentence to be analyzed whose intention is "ask weather", whose word segmentation result is "please ask/weekend/San Francisco/weather", and whose relevant dictionary labels are date and location, the generated first-level rule is: intent=ask_weather:['please ask']+[date0:%date%]+[location0:%location%]+['weather'].
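The following sketch shows one way such a first-level rule string could be assembled from an intention and labelled segments. The function name `first_level_rule` is hypothetical, and the output syntax is modelled on the rule examples given in this description rather than on a published grammar.

```python
def first_level_rule(intent, segments):
    """Build a first-level rule string joined with '+'.

    segments: list of (word, dictionary_label) pairs, where the label is
    None for non-keywords. Keywords are replaced by their dictionary label;
    non-keywords are kept literally.
    """
    parts = []
    counters = {}  # per-label running index: date0, date1, ...
    for word, label in segments:
        if label is None:
            parts.append(f"['{word}']")
        else:
            index = counters.get(label, 0)
            counters[label] = index + 1
            parts.append(f"[{label}{index}:%{label}%]")
    return f"intent={intent}:" + "+".join(parts)
```

Calling it with the intention `ask_weather` and the segments `please ask`, `weekend` (date), `san francisco` (location) and `weather` yields a rule of the same shape as the example above.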

Generating multi-level matching rules from the first-level matching rule by using the generalization fields.

The generalization field mentioned in this step is exemplified as follows.

Assume that the original rule is:

intent=ask_weather:['please ask']+[date0:%date%]+[location0:%location%]+['weather'].

In order to further improve the generalization capability of the rule, the present embodiment provides four generalization fields, which are respectively:

A first field, which connects the word segments into an N-tuple using either the "+" symbol, a separator mark that does not allow data to be inserted, or the "-" symbol, a separator mark that allows data to be inserted.

With the "-" separator, the above matching rule is converted into:

intent=ask_weather:['please ask']-[date0:%date%]-[location0:%location%]-['weather']. A request such as "please ask about tomorrow's San Francisco weather" can then also be matched by the rule: data may be inserted between word segments connected by "-", as long as the order of the word segments is consistent with the rule.

A second field, which connects the word segments into an N-tuple using the "~" symbol, a separator mark that allows the dictionary labels to change order.

For example, when the user requests "please ask about San Francisco tomorrow's weather", the order of the time and the place is swapped, yet the request can still be recognized by the rule.

A word length constraint field, which specifies the maximum difference between the total length of the sentence to be analyzed and the number of words in the rule (len), and the maximum number of words inserted between two dictionary labels (win), for example the rule intent=ask_weather:len=+5win=+2['please ask']~[date0:%date%]~[location0:%location%]+['weather'].

This rule indicates that the total length of the matched sentence cannot exceed the number of words in the rule by more than the configured amount (currently 5), and that at most two words can be inserted between date and location. For example, a request such as "please ask what the weather is like in San Francisco tomorrow" can be matched by the rule.

A dictionary linked list field, which characterizes the relationships between dictionary labels in the domain dictionary corresponding to each field; the relationships include, but are not limited to, equal to, contains, and merge. For example, for the rule intent=eat:[please]+[person]+[favorite]+[frey], words in the artist dictionary and the student dictionary can also match.
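The behaviour of the three separator marks and of the len/win constraints can be sketched as follows. This is a simplified illustration under several assumptions: the sentence and the rule are already lists of tokens with keywords replaced by `%label%` placeholders, "+" is read as requiring one contiguous run, "~" is read as letting every element change position, and all function names are hypothetical.

```python
import re
from collections import Counter


def match_plus(rule, sentence):
    """'+': rule elements must appear as one contiguous run."""
    size = len(rule)
    return any(sentence[i:i + size] == rule
               for i in range(len(sentence) - size + 1))


def match_minus(rule, sentence):
    """'-': rule elements must appear in order; other words may be inserted."""
    remaining = iter(sentence)  # each lookup consumes the iterator
    return all(element in remaining for element in rule)


def match_tilde(rule, sentence):
    """'~': every rule element must appear, in any order."""
    available = Counter(sentence)
    return all(available[element] >= count
               for element, count in Counter(rule).items())


def parse_constraints(header):
    """Read len/win limits from a header such as 'len=+5win=+2'."""
    return {name: int(value)
            for name, value in re.findall(r"(len|win)=\+(\d+)", header)}


def within_limits(rule, sentence, limits):
    """Check the total-length limit (len) and the gap between labels (win)."""
    if "len" in limits and len(sentence) - len(rule) > limits["len"]:
        return False
    labels = [i for i, token in enumerate(sentence)
              if token.startswith("%") and token.endswith("%")]
    gaps = [b - a - 1 for a, b in zip(labels, labels[1:])]
    return "win" not in limits or all(gap <= limits["win"] for gap in gaps)
```

For the rule `["please ask", "%date%", "%location%", "weather"]`, a sentence with extra words between the elements fails `match_plus` but passes `match_minus`, and a sentence with date and location swapped passes only `match_tilde`.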

The generalization fields above illustrate how second-level and third-level rules with better generalization capability are generated from a first-level rule.

A first-level rule is mainly connected with "+", and the difference between the length of the request sentence and the number of words in the rule must not be too large. An example first-level rule is as follows:

intent=ask_weather:len=+3['weather']+['like']+[date0:%date%]+['in']+[location0:%location%].

Replacing the "+" in the first-level rule with "-" and relaxing the word length constraint yields the second-level rule:

intent=ask_weather:len=+5win=+2['weather']-['like']-[date0:%date%]-['in']-[location0:%location%].

Furthermore, changing the separators in the rule to "~" improves the generalization capability further and yields the third-level rule:

intent=ask_weather:len=+10win=+4['weather']~['like']~[date0:%date%]~['in']~[location0:%location%].

The fourth level generalizes the previous level further by increasing the matching length and extending the dictionary range; for example, location0 is extended to NER, indicating that a noun is valid, and date0 is replaced by time:

intent=ask_weather:len=+15win=+6['weather']~['like']~[time:%date%]~['in']~[NER:%location%].
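The progression from the first level to the fourth level can be sketched as a table-driven rewrite. The `LEVELS` table simply reuses the separators and len/win values of the four example rules, and `WIDER_LABELS`, `widen` and `derive_rule` are hypothetical names; a real system would presumably tune these values per field.

```python
# Separator and constraint header per level, copied from the example rules.
LEVELS = {
    1: ("+", "len=+3"),
    2: ("-", "len=+5win=+2"),
    3: ("~", "len=+10win=+4"),
    4: ("~", "len=+15win=+6"),
}

# Level-4 widening of label names, as in the example rule.
WIDER_LABELS = {"date0": "time", "location0": "NER"}


def widen(element):
    """Replace a label name such as 'date0' by its wider counterpart."""
    for name, wider in WIDER_LABELS.items():
        prefix = f"[{name}:"
        if element.startswith(prefix):
            return f"[{wider}:" + element[len(prefix):]
    return element


def derive_rule(intent, elements, level):
    """Render the rule for the given level from first-level rule elements."""
    separator, constraint = LEVELS[level]
    if level == 4:
        elements = [widen(element) for element in elements]
    return f"intent={intent}:{constraint}" + separator.join(elements)
```

Passing the five elements of the first-level example together with levels 1 to 4 reproduces the four example rules in ASCII form.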

The rule system of this embodiment can automatically generate first-level rules on the basis of an existing dictionary; each next-level rule can be generated automatically from the previous-level rule and can then be adjusted automatically according to actual needs.

In one embodiment of the present invention, a semantic understanding processing apparatus is provided, which is described in detail with reference to fig. 3, and the semantic understanding processing apparatus includes:

a tag setting unit 31, configured to perform word segmentation processing on the sentence to be analyzed, and set a corresponding tag for the word segmentation result;

The tag setting unit 31 of the semantic understanding processing apparatus in this embodiment has two functions: word segmentation and tag setting. After acquiring the sentence to be analyzed in the user request, the tag setting unit 31 first performs word segmentation processing on the sentence to be analyzed. In the field of semantic understanding, word segmentation is generally applied as a text processing step once a piece of text has been acquired, and it is the basis of semantic understanding. The word segmentation processing in this embodiment may use common approaches such as character string matching, full segmentation, and word formation.

After the word segmentation process is completed, the tag setting unit 31 sets a corresponding tag for the words in the sentence. Each word has corresponding parts of speech, such as nouns, verbs, adverbs, prepositions, auxiliary words and the like; each word may also have specific semantics, such as vocabularies in the music domain may include singers, songs, bands, styles, etc. Thus, the labels mentioned in this embodiment include a part-of-speech label for characterizing a generic part-of-speech and a dictionary label for characterizing a specific part-of-speech.

And the rule matching unit 32 substitutes the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matches the updated statement to be analyzed with a preset matching rule to obtain a matching result.

After the tag setting unit 31 sets the tags, the rule matching unit 32 substitutes the tags into the original sentence to form a fixed sentence pattern, and matches this fixed sentence pattern against the rules of each field in the rule base to obtain the matching result.

Specifically, if dictionary labels exist among the tags set by the tag setting unit 31, the dictionary labels are preferentially used to construct the fixed sentence pattern; if there is no dictionary label, the fixed sentence pattern is constructed using the part-of-speech labels.

Further, each matching rule mentioned in the present embodiment includes an intention and a rule. The function of the intention is to classify sentences; the function of the rule is to match the words required by the semantics and to extract the keywords. Keywords are defined in this embodiment as follows: if replacing a certain word in the sentence changes the semantics of the sentence in a specific scenario, the word is a keyword.

The matching rule comprises an intention and a rule, wherein the rule at least comprises an N-tuple formed by connecting word segmentation through a separator mark; keywords in the rule are represented by the dictionary labels corresponding to the keywords, and non-keywords in the rule are represented by the non-keywords.

The semantic understanding processing apparatus in this embodiment supports various rule matching modes and can meet the generalization requirements of semantically consistent sentences with different word orders.

The following describes a semantic understanding processing device according to an embodiment of the present invention, and details are described with reference to fig. 4, where the semantic understanding processing device includes:

processor 410, communication interface (Communications Interface) 420, memory 430 and communication bus 440, wherein processor 410, communication interface 420 and memory 430 communicate with each other via communication bus 440. The processor 410 may call logic instructions in the memory 430 to perform, for example, the following methods: performing word segmentation processing on the sentence to be analyzed, and setting corresponding labels for word segmentation results; the labels comprise a part-of-speech label used for representing the general part-of-speech and a dictionary label used for representing the special part-of-speech; substituting the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matching the updated statement to be analyzed with a preset matching rule to obtain a matching result; wherein the matching rule comprises an intention and a rule, and the rule at least comprises an N-tuple formed by connecting segmentation through a separator mark; keywords in the rule are represented by the dictionary labels corresponding to the keywords, and non-keywords in the rule are represented by the non-keywords.

Further, the logic instructions in the memory 430 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art or in a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

In another aspect, embodiments of the present invention further provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the semantic understanding processing method provided in the above embodiments, for example, including: performing word segmentation processing on the sentence to be analyzed, and setting corresponding labels for word segmentation results; the labels comprise a part-of-speech label used for representing the general part-of-speech and a dictionary label used for representing the special part-of-speech; substituting the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matching the updated statement to be analyzed with a preset matching rule to obtain a matching result; wherein the matching rule comprises an intention and a rule, and the rule at least comprises an N-tuple formed by connecting segmentation through a separator mark; in the N-tuple, the keywords in the rule are represented by the dictionary labels corresponding to the keywords, and the non-keywords in the rule are represented by the non-keywords.

The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A semantic understanding processing method, comprising:

substituting the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matching keywords in the updated statement to be analyzed with a preset matching rule to obtain an intermediate result; matching the non-keywords in the updated statement to be analyzed with the matching rules in the intermediate result to obtain a matching result;

The matching rule comprises an intention and a rule, the rule at least comprises an N-tuple formed by connecting word segmentation through a separator mark, the keywords in the rule are represented by the corresponding dictionary labels, and the non-keywords in the rule are represented by the non-keywords in the rule;

the step of generating the preset matching rule comprises the following steps:

generating a matching rule according to the intention, the word segmentation result and the dictionary label;

the generating a matching rule according to the intention, the word segmentation result and the dictionary label comprises the following steps:

2. The semantic understanding processing method according to claim 1, wherein the matching rules include a plurality of matching rules, each matching rule corresponds to a different matching level, and matching accuracy corresponding to each matching level from high to low is sequentially lowered;

3. The semantic understanding processing method according to claim 2, wherein the sequentially matching the sentence to be analyzed and the tag with the matching rules corresponding to the matching levels in the order from high to low until the matching is successful, and the obtaining the matching result comprises:

4. A semantic understanding processing method according to claim 3, wherein determining the matching result according to a matching score corresponding to a matching rule of successful matching in the field comprises:

5. The semantic understanding processing method according to any one of claims 1 to 4, wherein the generalization field includes:

The second field is used for connecting the word segments into an N-tuple using the "~" symbol, which allows the dictionary labels to change order, as the separator mark;

6. A semantic understanding processing apparatus, comprising:

the rule matching unit substitutes the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matches keywords in the updated statement to be analyzed with a preset matching rule to obtain an intermediate result; matching the non-keywords in the updated statement to be analyzed with the matching rules in the intermediate result to obtain a matching result;

Wherein the matching rule comprises an intention and a rule, and the rule at least comprises an N-tuple formed by connecting segmentation through a separator mark; the keywords in the rule are represented by the dictionary labels corresponding to the keywords, and the non-keywords in the rule are represented by the non-keywords;

the step of generating the preset matching rule comprises the following steps:

7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the semantic understanding processing method according to any one of claims 1 to 5 when the program is executed.

8. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the semantic understanding processing method according to any one of claims 1 to 5.