CN111104803A

CN111104803A - Semantic understanding processing method, device and equipment and readable storage medium

Info

Publication number: CN111104803A
Application number: CN201911415186.6A
Authority: CN
Inventors: 艾坤; 梅林海; 刘权; 陈志刚; 王智国; 胡国平
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2020-05-05
Anticipated expiration: 2039-12-31
Also published as: CN111104803B

Abstract

The embodiment of the invention provides a semantic understanding processing method, a semantic understanding processing device, semantic understanding processing equipment and a readable storage medium, wherein a sentence to be analyzed is subjected to word segmentation processing, and a corresponding label is set for a word segmentation result; the tags comprise part-of-speech tags for characterizing general parts-of-speech and dictionary tags for characterizing special parts-of-speech; substituting the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matching the updated statement to be analyzed with a preset matching rule to obtain a matching result; wherein the matching rules include intents and rules, the rules including at least N-tuples formed by concatenating participles via delimiter markers; keywords in the rules are represented by their corresponding dictionary tags, and non-keywords in the rules are represented by themselves. The rule in the embodiment of the invention supports various matching modes, and can meet the generalization requirement of the semantic understanding rule.

Description

Semantic understanding processing method, device and equipment and readable storage medium

Technical Field

The present invention relates to the field of semantic understanding, and in particular, to a semantic understanding processing method, apparatus, device, and readable storage medium.

Background

Using natural language to communicate with and understand others is one of the hallmarks of human intelligence and one of the most challenging capabilities for artificial intelligence. Enabling a machine to understand the semantics contained in natural language is an important task in the development of artificial intelligence technology.

Existing semantic understanding methods are mainly based on neural models. A neural-network-based scheme usually needs a large amount of training data to improve its effect, and when the model performs poorly it must be retrained, so update iteration is slow. The semantic understanding effect on certain sentence patterns may be poor, and user requirements differ across scenarios. In many cases the order of the words in a sentence does not affect its semantics, yet the model cannot identify the correct semantics without more training samples, and the training period is long.

Disclosure of Invention

The embodiment of the invention provides a semantic understanding processing method, a semantic understanding processing device, semantic understanding processing equipment and a readable storage medium, which are used for solving the problems that, in the prior art, semantic understanding methods have weak generalization capability and rule construction has a high labor cost.

In a first aspect, an embodiment of the present invention provides a semantic understanding processing method, including:

performing word segmentation processing on the sentence to be analyzed, and setting a corresponding label for a word segmentation result; the tags comprise part-of-speech tags for characterizing general parts-of-speech and dictionary tags for characterizing special parts-of-speech;

substituting the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matching the updated statement to be analyzed with a preset matching rule to obtain a matching result;

wherein the matching rules include intents and rules, the rules including at least N-tuples formed by concatenating participles via delimiter markers; keywords in the rules are represented by their corresponding dictionary tags, and non-keywords in the rules are represented by themselves.

Preferably, matching the updated sentence to be analyzed with a preset matching rule to obtain a matching result, including:

matching the keywords in the updated sentence to be analyzed with a preset matching rule to obtain an intermediate result;

and matching the non-keyword in the updated statement to be analyzed with the matching rule in the intermediate result to obtain the matching result.

Preferably, the matching rules include a plurality of matching rules, each matching rule corresponds to a different matching level, and matching accuracies corresponding to the matching levels from high to low are sequentially reduced;

correspondingly, matching the updated statement to be analyzed with a preset matching rule to obtain a matching result, including:

and according to the matching levels, sequentially matching the updated statement to be analyzed with the matching rules corresponding to the matching levels from high to low until the matching is successful, and obtaining a matching result.

Preferably, the matching the statement to be analyzed and the tag with the matching rules corresponding to the matching levels in sequence from high to low until the matching is successful, and obtaining the matching result includes:

if the sentence to be analyzed only belongs to one field, determining the matching result according to the matching score corresponding to the matching rule which is successfully matched in the field;

and if the sentence to be analyzed belongs to a plurality of fields, determining the matching result according to the matching level and the matching score corresponding to the matching rule which is successfully matched in each field.

Preferably, determining the matching result according to the matching score corresponding to the matching rule successfully matched in the field includes:

determining matching scores corresponding to matching rules which are successfully matched according to the number of part-of-speech tag labels in the sentence to be analyzed, the number of dictionary tag labels and the matching length of the sentence to be analyzed and each matching rule which is successfully matched, and determining the matching result according to the matching scores;

the determining the matching result according to the matching level corresponding to the matching rule successfully matched in each field comprises the following steps:

comparing the matching levels corresponding to the matching rules successfully matched in each field;

if the matching levels are different, the matching rule corresponding to the highest matching level is used as the matching result;

if the matching levels are the same, determining matching scores corresponding to matching rules which are successfully matched according to the number of part-of-speech tag labels in the sentence to be analyzed, the number of dictionary tag labels and the matching length of the sentence to be analyzed and the matching rules which are successfully matched, and determining the matching result according to the matching scores.

Preferably, the method further comprises:

performing field classification on all collected corpora, and performing word segmentation processing on the corpora by using word segmentation systems corresponding to the fields;

setting dictionary labels for word segmentation results by applying a domain dictionary corresponding to the domain;

determining the intention of the corpus by using verbs in the word segmentation result;

and generating a matching rule according to the intention, the word segmentation result and the dictionary label.

Preferably, generating a matching rule according to the intention, the word segmentation result and the dictionary label comprises:

generating a primary matching rule according to the intention, the word segmentation result and the dictionary label;

generating a multi-stage matching rule by utilizing a generalization field based on the one-stage matching rule;

wherein the generalization field comprises:

a first field for forming an N-tuple by concatenating the participles with a "+" sign as a delimiter mark that does not allow insertion of data, and for forming an N-tuple by concatenating the participles with a "-" sign as a delimiter mark that allows insertion of data;

a second field for connecting the participles to form an N-tuple with a "" symbol as a delimiter mark allowing the dictionary labels to change order;

the word length constraint field is used for representing the maximum difference between the total length of the sentence to be analyzed and the number of regular words and representing the maximum number of words inserted between two dictionary labels;

a dictionary linked list field for representing the relationship between dictionary labels in the domain dictionary corresponding to each domain, the relationship comprising: equal to, inclusive, and combinable.
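The difference between the "+" and "-" delimiter marks, together with the word length constraint, can be illustrated with a minimal sketch. The function name `matches`, the list-based rule representation and the parameter `max_insert` are assumptions made for illustration only; the source does not specify a data structure or a concrete rule syntax parser.

```python
def matches(elements, delimiters, tokens, max_insert=2):
    """Check whether `tokens` matches a rule made of `elements`.

    delimiters[i] joins elements[i] and elements[i + 1]:
      "+"  the next element must follow immediately (no inserted data);
      "-"  up to `max_insert` extra tokens may be inserted before it.
    `max_insert` stands in for the word length constraint (assumed name).
    """
    if len(delimiters) != len(elements) - 1:
        raise ValueError("need one delimiter between each pair of elements")

    def walk(ei, ti):
        # ei: index of the next rule element; ti: index of the next token
        if ei == len(elements):
            return ti == len(tokens)
        gap = 0 if ei == 0 or delimiters[ei - 1] == "+" else max_insert
        for skip in range(gap + 1):
            pos = ti + skip
            if (pos < len(tokens) and tokens[pos] == elements[ei]
                    and walk(ei + 1, pos + 1)):
                return True
        return False

    return walk(0, 0)


rule = ["ask", "[date]", "[location]", "weather"]
sentence = ["ask", "[date]", "in", "[location]", "weather"]
strict = matches(rule, ["+", "+", "+"], sentence)   # "in" is not allowed
relaxed = matches(rule, ["+", "-", "+"], sentence)  # "in" may be inserted
```

Under these assumptions the strict rule rejects the sentence and the relaxed rule accepts it, which is the kind of generalization the first field is described as providing.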

In a second aspect, an embodiment of the present invention provides a semantic understanding processing apparatus, including:

the label setting unit is used for performing word segmentation processing on the sentence to be analyzed and setting a corresponding label for a word segmentation result; the tags comprise part-of-speech tags for characterizing general parts-of-speech and dictionary tags for characterizing special parts-of-speech;

the rule matching unit is used for substituting the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matching the updated statement to be analyzed with a preset matching rule to obtain a matching result;

In a third aspect, an embodiment of the present invention provides a semantic understanding processing apparatus, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method provided in the first aspect when executing the program.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first aspect.

The semantic understanding processing method, the semantic understanding processing device, the semantic understanding processing equipment and the readable storage medium provided by the embodiment of the invention can meet the generalization requirement of sentences with different word sequences and consistent semantics by supporting the rule matching process of multiple matching modes.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a flow chart of an embodiment of a semantic understanding processing method of the present invention;

FIG. 2 is a flow chart of the generation of the preset matching rule in the semantic understanding processing method according to the present invention;

FIG. 3 is a schematic structural diagram of a semantic understanding processing apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an embodiment of the semantic understanding processing device of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In an embodiment of the present invention, a semantic understanding processing method is provided, which is described in detail with reference to fig. 1, and the semantic understanding processing method includes:

S100, performing word segmentation processing on the sentence to be analyzed, and setting a corresponding label for a word segmentation result.

After a sentence to be analyzed is received, that is, a sentence for which a user requests semantic understanding, the input user request is dispatched: domain classification is performed by a classification model to determine the domain to which the input sentence belongs. The classification model can be an existing classification model or one obtained by training when the rules are built.

If the input user request does not belong to any field, the input can be classified into a chat class, which is a non-task field covering content such as greetings and small talk.

After it is determined that the input sentence belongs to a specific field, the word segmentation system corresponding to that field is applied to segment the sentence to be analyzed. The segmented sentence can be regarded as a word segmentation result formed by a series of ordered words. An existing word segmentation system can be used, for example one based on character string matching, full segmentation, or word formation.

After the word segmentation process is completed, corresponding tags need to be set for the words in the sentence. Each word has corresponding universal part-of-speech, such as noun, verb, adverb, preposition, auxiliary word, etc.; each term may also have a specific part-of-speech, such as the vocabulary in the music field may include singers, songs, bands, styles, etc. Thus, the labels mentioned in this embodiment include part-of-speech labels for characterizing general parts-of-speech and dictionary labels for characterizing special parts-of-speech.

Setting part-of-speech tags for the word segmentation result by applying a general dictionary proceeds as follows. The segmented sentence consists of a series of ordered words, each of which has a corresponding attribute such as noun, verb, adverb, preposition or auxiliary word; nouns can be further subdivided into common categories such as person names and company names. A general dictionary is applied to obtain the general part of speech of each word in the segmented sentence, and that part of speech is set as the word's part-of-speech tag. Such tags are applicable to all fields and are unlikely to cause ambiguity. Specifically, for the segmented sentence to be analyzed, each word is first looked up in the general dictionary to obtain its matching part of speech, and the matching part of speech is set as the part-of-speech tag of that word.

Setting dictionary labels for the word segmentation result by applying the domain dictionary corresponding to the field may proceed as follows. The nouns and verbs of the sentence to be analyzed are labeled with fine-grained dictionary labels by using the dictionaries of the field, for example singer, song, band and style in the music field; for non-enumerable dictionaries, such as places and names of people who are not famous, the sentence to be analyzed is labeled by an NER (named entity recognition) model. Dictionary labels give words a more specific meaning than part-of-speech labels. After each word in the sentence to be analyzed has been given a part-of-speech label, each word is looked up in the domain dictionary corresponding to the sentence; if a matching entry is found, the label of that entry in the domain dictionary is taken as the dictionary label of the word.
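The two lookups described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionaries `GENERAL_DICT` and `MUSIC_DICT`, the class `TaggedWord` and the function `tag_words` are hypothetical names, the entries are toy data, and the NER step for non-enumerable dictionaries is omitted.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy dictionaries; a real system would load them from files.
GENERAL_DICT = {
    "i": "pronoun", "want": "verb", "to": "preposition",
    "listen": "verb", "liu de hua": "noun", "songs": "noun",
}
MUSIC_DICT = {"liu de hua": "singer"}  # domain dictionary: word -> label


@dataclass
class TaggedWord:
    word: str
    pos_tag: Optional[str]          # general part of speech
    dict_tag: Optional[str] = None  # special (domain) part of speech


def tag_words(words, general_dict, domain_dict):
    """Give every word a part-of-speech tag and, if found, a dictionary tag."""
    tagged = []
    for word in words:
        key = word.lower()
        tagged.append(
            TaggedWord(word, general_dict.get(key), domain_dict.get(key))
        )
    return tagged


tagged = tag_words(
    ["I", "want", "to", "listen", "to", "Liu De Hua", "songs"],
    GENERAL_DICT,
    MUSIC_DICT,
)
```

In this sketch "Liu De Hua" ends up with both a part-of-speech tag and a dictionary tag, while the remaining words carry only a part-of-speech tag.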

In addition, after the corresponding tags are set for the word segmentation result, a model can be used to disambiguate conflicting part-of-speech tags or dictionary tags. For example, when "Liu De Hua" is both a singer and a song, a language model is used to determine whether "Liu De Hua" is more likely the singer or the song. The language model measures how fluent a sentence is: it takes a tagged sentence as input and outputs a score between 0 and 1, where a higher score indicates a more fluent sentence. Sentences whose scores are lower than a preset value are deleted, which removes the conflicting readings. For instance, in "I want to listen to Liu De Hua's songs", "Liu De Hua" is tagged as both artist and song, so the original sentence is rewritten as "I want to listen to song's songs" and "I want to listen to artist's songs"; the former is clearly not fluent and can be deleted.

Consequently, after this step, some words in the sentence to be analyzed carry only part-of-speech tags, while others carry both part-of-speech tags and dictionary tags.
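The filtering of conflicting tags by a fluency score can be sketched as below. The function `toy_score` is a hard-coded stand-in for the language model, which the source does not specify; the names `candidate_sentences` and `filter_by_fluency`, the bracket notation for tags and the threshold value are likewise assumptions.

```python
from itertools import product


def candidate_sentences(words, tag_options):
    """Expand each ambiguous word into one candidate sentence per tag.

    tag_options maps a word to the list of dictionary tags it may carry.
    """
    slots = [
        [f"[{tag}]" for tag in tag_options[w]] if w in tag_options else [w]
        for w in words
    ]
    return [list(choice) for choice in product(*slots)]


def filter_by_fluency(candidates, score_fn, threshold):
    """Delete candidates whose fluency score is below the preset value."""
    return [c for c in candidates if score_fn(c) >= threshold]


def toy_score(sentence):
    """Stand-in for a language model returning a score between 0 and 1."""
    text = " ".join(sentence)
    return 0.1 if "[song] 's songs" in text else 0.9


words = ["I", "want", "to", "listen", "to", "Liu De Hua", "'s", "songs"]
candidates = candidate_sentences(words, {"Liu De Hua": ["artist", "song"]})
kept = filter_by_fluency(candidates, toy_score, threshold=0.5)
```

With these toy scores only the reading in which "Liu De Hua" is an artist survives the filter.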

S200, substituting the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matching the updated statement to be analyzed with a preset matching rule to obtain a matching result.

Specifically, in this step, the tags output in step S100 are substituted into the sentence to be analyzed to form an updated sentence to be analyzed, which is then matched against the rules in the rule base to obtain a matching result. The intent of the sentence to be analyzed is the intent of the rule that it successfully matches.

Specifically, if there is a dictionary label among the labels retained in step S100, the dictionary label is preferentially used to construct a fixed sentence pattern; if there are no dictionary labels, the part-of-speech labels are used to construct a fixed sentence pattern.

Further, each matching rule mentioned in this step includes an intention and a rule. Wherein the intent functions to semantically classify the sentence; the role of the rules is to match the words required by the semantics and to extract the keywords. The definition of the keywords in this embodiment is as follows: after a word in a sentence is replaced, if the semantic of the sentence changes in a specific scene, the word is a keyword.

The rules comprise at least N-tuples formed by concatenating participles by delimiter markers, keywords in the rules being represented by their (keyword) corresponding dictionary tags, non-keywords in the rules being represented by their (non-keyword) themselves.

For example, a matching rule may indicate that the intent of the sentence is to stop.

As another example, consider the sentence: ask for the weather of [end of week] (date) [San Francisco] (location). It contains several keywords, namely the date and the location. Keywords in a rule can be replaced by dictionary tags, each of which denotes a set of words, so the sentence becomes: ask for the weather of [date0:date] [location0:%location], where date0 and location0 are dictionary sets. For a keyword in a rule that has been replaced by a dictionary tag, any word in the corresponding dictionary can be used to match the keyword.

Non-keywords are matched by themselves, so the rule for the above sentence becomes intent = ask_weather: [ask] + [date0:date] + [location0:%location] + [weather].
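A rule of this shape, with every element joined by "+", can be represented and matched as in the following sketch. The classes `Slot` and `Rule`, the function `match_strict` and the token format (a word paired with its dictionary tag or `None`) are assumptions for illustration; the source does not describe how rules are stored internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    value: str                # literal word, or dictionary tag for keywords
    is_keyword: bool = False
    name: str = ""            # slot name such as "date0" (keywords only)


@dataclass(frozen=True)
class Rule:
    intent: str
    slots: tuple


def match_strict(rule, tokens):
    """Match a rule whose elements are all joined by '+'.

    tokens is a list of (word, dict_tag) pairs; dict_tag is None for
    non-keywords. Returns the intent and extracted keywords, or None.
    """
    if len(tokens) != len(rule.slots):
        return None
    extracted = {}
    for slot, (word, dict_tag) in zip(rule.slots, tokens):
        if slot.is_keyword:
            if dict_tag != slot.value:
                return None
            extracted[slot.name] = word
        elif word != slot.value:
            return None
    return {"intent": rule.intent, "slots": extracted}


ask_weather = Rule("ask_weather", (
    Slot("ask"),
    Slot("date", is_keyword=True, name="date0"),
    Slot("location", is_keyword=True, name="location0"),
    Slot("weather"),
))
tokens = [("ask", None), ("weekend", "date"),
          ("San Francisco", "location"), ("weather", None)]
result = match_strict(ask_weather, tokens)
```

Here the non-keywords "ask" and "weather" must appear literally, while any word carrying the date or location tag fills the corresponding keyword slot.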

The semantic understanding processing method in the embodiment supports multiple rule matching modes, and can meet the generalization requirement of the semantic understanding rule.

In an embodiment of the present invention, substituting the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matching the updated statement to be analyzed with a preset matching rule to obtain a matching result, further including:

In the semantic understanding system, a large number of rules may be contained in the constructed rule base, and the sentence to be analyzed input into the semantic understanding system also contains a plurality of participles, and if the whole sentence to be analyzed is matched with each rule in the rule base one by one, great challenge is brought to the operation efficiency of the semantic understanding system.

In this embodiment, the process of matching the updated sentence to be analyzed with the preset matching rule is specifically divided into two steps: keyword matching and non-keyword matching.

Specifically, the updated sentence to be analyzed is obtained by substituting the tag into the originally obtained sentence to be analyzed, and therefore, the tag data indicates the keyword information in the sentence to be analyzed. Firstly, matching keywords in the sentence to be analyzed with matching rules in a rule base, judging which rules in the rule base can be matched with keyword information in the sentence to be analyzed, and taking the matching rules which are successfully matched as intermediate results. The set of matching rules contained in the intermediate result is a subset of the set of matching rules in the entire semantic understanding system rule base.

Further, matching the non-keyword in the sentence to be analyzed with the matching rule contained in the intermediate result, determining which rules in the intermediate result can be matched with the non-keyword information in the sentence to be analyzed, and taking the matching rule successfully matched as the final matching result of the embodiment.

In this embodiment, the process of matching the updated statement to be analyzed with the preset matching rule is performed in two steps. In the keyword matching step, matching the keywords against the rules in the rule base screens out a large number of rules that cannot match, so the number of rules that must be tried in the non-keyword matching step is greatly reduced. This lowers the amount of computation in the rule matching process and improves the efficiency of semantic understanding.
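The two-step filtering can be sketched as follows. To keep the example short, both stages use simple set containment instead of ordered matching, which is a simplification and not the method of the source; the rule dictionaries and function names are hypothetical.

```python
def keyword_stage(rules, sentence_tags):
    """Stage 1: keep rules whose dictionary tags all occur in the sentence."""
    present = set(sentence_tags)
    return [r for r in rules if set(r["keywords"]) <= present]


def non_keyword_stage(candidates, sentence_words):
    """Stage 2: keep candidates whose literal words all occur in the sentence."""
    present = set(sentence_words)
    return [r for r in candidates if set(r["literals"]) <= present]


def two_stage_match(rules, sentence_tags, sentence_words):
    intermediate = keyword_stage(rules, sentence_tags)
    return non_keyword_stage(intermediate, sentence_words)


RULES = [
    {"intent": "ask_weather", "keywords": ["date", "location"],
     "literals": ["ask", "weather"]},
    {"intent": "ask_traffic", "keywords": ["date", "location"],
     "literals": ["ask", "traffic"]},
    {"intent": "play_song", "keywords": ["singer"], "literals": ["play"]},
]

intermediate = keyword_stage(RULES, ["date", "location"])
final = two_stage_match(RULES, ["date", "location"], ["ask", "weather"])
```

The keyword stage discards `play_song` without ever comparing its non-keywords, so the second stage only has to examine the two weather and traffic rules.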

In an embodiment of the present invention, matching the updated statement to be analyzed with a preset matching rule to obtain a matching result, further comprising:

In this embodiment, the matching rules include a plurality of matching rules, each matching rule corresponds to a different matching level, and the matching precision corresponding to each matching level from high to low decreases in sequence.

Specifically, the present embodiment proposes a multi-level rule system in which rules are graded according to their precision and recall. For example, the rules may be divided into four levels, numbered one to four from high precision to low precision, where level-one rules have the highest precision and the lowest recall, and so on. The specific precision of each level is not limited, provided that the precision of each level is higher than that of the next level; the level of each rule can be adjusted according to actual use.

The semantic understanding processing method in the embodiment adopts a multi-level rule file scheme, and meets the requirements of users on different semantic understanding accuracies in different scenes.
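Matching level by level from high precision to low precision, stopping at the first success, can be sketched as below. The predicates `exact` and `contains_all` are invented examples of a stricter and a looser rule; the source does not define what distinguishes the levels beyond their precision.

```python
def exact(pattern):
    """Stricter predicate: the sentence must equal the pattern."""
    return lambda sentence: sentence == pattern


def contains_all(pattern):
    """Looser predicate: every pattern item must occur in the sentence."""
    return lambda sentence: set(pattern) <= set(sentence)


# Level 1 has the highest precision; larger numbers are less precise.
RULES_BY_LEVEL = {
    1: [("ask_weather", exact(["ask", "[date]", "[location]", "weather"]))],
    2: [("ask_weather", contains_all(["[location]", "weather"]))],
}


def match_by_level(rules_by_level, sentence):
    """Try each level in order and return the first level that matches."""
    for level in sorted(rules_by_level):
        hits = [intent for intent, predicate in rules_by_level[level]
                if predicate(sentence)]
        if hits:
            return level, hits
    return None, []


level_a, hits_a = match_by_level(
    RULES_BY_LEVEL, ["ask", "[date]", "[location]", "weather"])
level_b, hits_b = match_by_level(
    RULES_BY_LEVEL, ["please", "tell", "[location]", "weather"])
```

The first sentence is resolved at level one, while the second falls through to the looser level-two rule.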

In an embodiment of the present invention, matching the statement to be analyzed and the tag with the matching rules corresponding to the matching levels in sequence from high to low until matching is successful, further includes:

if the sentence to be analyzed only belongs to one field, determining the matching result according to the matching level and the matching score corresponding to the matching rule which is successfully matched in the field;

The system first classifies rule files by domain, such as music, weather and navigation. Each domain has its own rule system file, which comprises the following parts: (1) a dictionary file, (2) a rule dictionary file, (3) a graded rule file, and (4) a dictionary feature relation file.

The dictionary file mainly collects enumerable word sets of the same category in the corresponding field, such as singers and songs. The rule dictionary file mainly captures non-enumerable word sets of the same category in the corresponding field, such as time and place information. Together they form the domain dictionary used to set dictionary labels for the words in the sentence to be analyzed. For each word in the segmented sentence to be analyzed, its dictionary label is set by looking it up in the domain dictionary of the corresponding field, namely the dictionary file and the rule dictionary file.

The graded rules in the graded rule file have been described above; they form the rule base made up of all the rules generated in the present invention. When rule matching is performed on the statement to be analyzed, the corresponding rule is searched for in the rule base and matched with the statement to be analyzed, so that a matching result is obtained.

The dictionary feature relation file defines, on the basis of all the dictionaries, the relationships between dictionary labels, including equal to, contains, and combinable. When the dictionary label of a keyword X in a rule is dictionary A, and the dictionary feature relation file records that dictionary A contains dictionary B, words in both dictionary A and dictionary B can match the keyword X. For example, the person dictionary may contain the artist, singer and student dictionaries; for the rule intent = eat: [ask] + [person] + [like eat] + [front], words in the artist and student dictionaries can therefore be matched as well.
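The "contains" relationship between dictionaries can be sketched as a small lookup. Only the person example comes from the text; the mapping name `CONTAINS`, the function names, and the choice to follow the relationship transitively are assumptions.

```python
# "contains" relationships between dictionary labels (person example from
# the text; the data structure itself is an assumption).
CONTAINS = {"person": {"artist", "singer", "student"}}


def expand(label, contains):
    """Return the label plus every dictionary it (transitively) contains."""
    seen = set()
    stack = [label]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(contains.get(current, ()))
    return seen


def keyword_matches(rule_label, word_label, contains):
    """A word matches a rule keyword if its label is covered by the rule's."""
    return word_label in expand(rule_label, contains)


person_ok = keyword_matches("person", "student", CONTAINS)
artist_ok = keyword_matches("artist", "student", CONTAINS)
```

A rule keyword labeled person therefore accepts a word labeled student, whereas a keyword labeled artist does not, because the relationship is directional.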

Specifically, for a sentence to be analyzed, two situations may occur, that is, the sentence to be analyzed belongs to one field or multiple fields, according to the semantic richness of the sentence to be analyzed. If the statement to be analyzed belongs to one field, the condition that the statement to be analyzed is matched with a plurality of matching rules in the field may occur in the process of rule matching; the statement to be analyzed belongs to multiple fields, and the condition that the statement to be analyzed is matched with multiple matching rules in multiple fields may occur in the process of rule matching. And for two conditions that the statement to be analyzed belongs to one field or a plurality of fields, the mode of determining the matching result from the matched matching rules is different.

How to determine the matching result from the matched multiple matching rules in the two cases is described below.

And if the sentence to be analyzed only belongs to one field, determining the matching result according to the matching level and the matching score corresponding to the matching rule which is successfully matched in the field. It can be understood that a statement to be analyzed belonging to a field may match a plurality of matching rules of different levels in the field, and at this time, a matching result needs to be judged according to the level of the matching rule; the sentence to be analyzed may also match a plurality of rules of the same level in the field, and at this time, the matching result needs to be judged according to the matching score of the matching rule.

Specifically, determining the matching result according to the matching level and the matching score corresponding to the matching rule successfully matched in the field includes:

comparing the matching levels corresponding to the matching rules successfully matched in the field;

And if the sentence to be analyzed belongs to a plurality of fields, determining the matching result according to the matching level and the matching score corresponding to the matching rule which is successfully matched in each field. It can be understood that, for a sentence to be analyzed belonging to multiple fields, it may match multiple matching rules of different levels in the multiple fields, and at this time, it is necessary to determine a matching result according to the level of the matching rule; the sentence to be analyzed may also match a plurality of rules of the same level in a plurality of fields, and at this time, the matching result needs to be judged according to the matching score of the matching rule.

Specifically, the determining the matching result according to the matching level corresponding to the matching rule successfully matched in each field includes:

As described above, if the matching levels are different, the matching rule corresponding to the highest matching level is used as the matching result: the higher the level of a matching rule, the more precise its result and the better the semantic understanding effect.

If the matching levels are the same, the matching scores of the successfully matched rules are determined, and the matching result is determined according to the matching scores. Specifically, if several rules of the same level match, the optimal result is chosen by combining the part-of-speech tags, the dictionary tags, the matched length and so on. Words tagged by part of speech are generally only nouns, verbs, adverbs and the like, which is a relatively broad range, whereas words tagged by dictionary cover a small and relatively well-defined range, such as singers and song titles. For example, let the matched rule contain n dictionary-tagged words, m part-of-speech-tagged words and o non-keywords, let p, q and r be the numbers of unmatched words at the head, in the middle and at the tail respectively, and let k1 to k6 be coefficients. The matching score is then:

k1*n²/(m+n+o) + k2*m²/(m+n+o) + k3*o²/(m+n+o) - k4*p/(m+n+o) - k5*q/(m+n+o) - k6*r/(m+n+o).

The squared terms in the formula increase the weight given to changes in n, m and o; the effect would be worse if all terms were linear.
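The score can be computed directly from the six counts. In the sketch below the function name `match_score` and the default coefficients (all set to 1.0) are placeholders; the source gives no values for k1 to k6.

```python
def match_score(n, m, o, p, q, r, k=(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)):
    """Matching score from the formula above.

    n: dictionary-tagged words      p: unmatched words at the head
    m: part-of-speech-tagged words  q: unmatched words in the middle
    o: non-keywords                 r: unmatched words at the tail
    k: coefficients k1..k6 (placeholder values, not from the source)
    """
    total = m + n + o
    if total == 0:
        raise ValueError("the rule must contain at least one word")
    k1, k2, k3, k4, k5, k6 = k
    return (k1 * n**2 + k2 * m**2 + k3 * o**2
            - k4 * p - k5 * q - k6 * r) / total


full_match = match_score(n=2, m=0, o=2, p=0, q=0, r=0)
partial_match = match_score(n=1, m=1, o=2, p=1, q=0, r=0)
```

With unit coefficients the rule that matches two dictionary-tagged words and leaves nothing unmatched scores higher than the rule that relies on a part-of-speech tag and leaves one leading word unmatched.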

In the semantic understanding processing method in this embodiment, when a plurality of rules are matched, the optimal selection of the matching rules is realized by comparing the matching levels between the plurality of matching rules that are successfully matched with each other and the matching scores between the rules of the same level.
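Choosing among several successful matches, first by level and then by score, can be sketched as follows. The dictionary keys and the function name `select_best` are assumptions; level 1 is taken to be the highest level, in line with the four-level example given earlier.

```python
def select_best(matches):
    """Pick the match with the highest level, then the highest score.

    Each match is a dict with 'intent', 'level' (1 is the highest, most
    precise level) and 'score'. Returns None when nothing matched.
    """
    if not matches:
        return None
    return min(matches, key=lambda m: (m["level"], -m["score"]))


different_levels = [
    {"intent": "play_song", "level": 2, "score": 3.0},
    {"intent": "ask_weather", "level": 1, "score": 1.5},
]
same_level = [
    {"intent": "play_song", "level": 1, "score": 2.0},
    {"intent": "play_radio", "level": 1, "score": 1.25},
]
best_a = select_best(different_levels)
best_b = select_best(same_level)
```

In the first list the level-one rule wins despite its lower score; in the second list, where the levels are equal, the higher score decides.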

In another implementation, in order to avoid unnecessary low-precision matching, once a sentence to be analyzed has successfully matched a rule at a high matching level, rules at lower matching levels need not be tried. In this case, if the sentence to be analyzed belongs to only one field, the matching result may be determined directly according to the matching scores of the rules successfully matched in that field. Specifically, the matching scores of the successfully matched rules are determined according to the number of part-of-speech tags in the sentence to be analyzed, the number of dictionary tags, and the matching length between the sentence and each successfully matched rule, and the matching result is determined according to those scores. For a sentence to be analyzed that belongs to a plurality of fields, the matching levels of the rules successfully matched in different fields may differ, so the matching result is determined according to both the matching level and the matching score of the rules successfully matched in each field.

In an embodiment of the present invention, the preset matching rule with which the updated sentence to be analyzed is matched may be generated in advance; the generation process is described in detail with reference to Fig. 2.

S210, classifying all collected corpora by field, and performing word segmentation on the corpora using the word segmentation system corresponding to each field.

Corpora related to a given field are collected. In the first stage, all collected corpora are classified manually; once enough field data exist, a field classification model is trained on the existing corpora, and the trained model then performs the classification without manual work. The corpora are then segmented using existing word segmentation software.

S220, setting dictionary labels for the word segmentation results using the domain dictionary corresponding to the field.

After word segmentation, the dictionary is used to calibrate and merge the word segmentation results. For example, if the segmenter splits "Liu De Hua" into the two words "Liu De" and "Hua" while "Liu De Hua" exists in the dictionary, the two words are merged into "Liu De Hua"; if the segmenter labels a word as a verb but the dictionary lists it as a noun, the label is corrected. This ensures that each word has a correct dictionary label.
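The merging step can be illustrated with a small sketch. It assumes the segmenter output is a list of strings, that the dictionary maps an entry to its label, and that adjacent tokens are joined with a space (suitable for the romanized example; Chinese text would be joined without a separator). The function name and the `max_span` limit are hypothetical, and part-of-speech correction is not covered.

```python
def merge_with_dictionary(tokens, dictionary, max_span=4, joiner=" "):
    """Greedily merge adjacent tokens whose concatenation is a dictionary entry.

    Returns a list of (word, label) pairs; label is None for words
    that are not in the dictionary.
    """
    result = []
    i = 0
    while i < len(tokens):
        merged = False
        # Try the longest span first so that longer dictionary entries win.
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = joiner.join(tokens[i:i + span])
            if candidate in dictionary:
                result.append((candidate, dictionary[candidate]))
                i += span
                merged = True
                break
        if not merged:
            word = tokens[i]
            result.append((word, dictionary.get(word)))
            i += 1
    return result
```

With the tokens `["play", "Liu De", "Hua", "songs"]` and a dictionary containing `"Liu De Hua"` as a singer, the two middle tokens are merged into one labelled word.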

S230, determining the intention of the corpus by using verbs in the word segmentation result.

The intention of a corpus, such as play or view, is determined either from the verbs in the word segmentation result or by a trained intention classification model. In the initial stage, the intention types of the corpora are labelled manually according to the verbs; once a sufficient training set has been obtained, an intention classification model is trained on the labelled corpora, and the model is used for intention classification once its classification precision exceeds that of the initial method.
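The initial, verb-based stage might look like the following sketch. The verb-to-intent table and the tag name `"verb"` are hypothetical; the text does not specify how verbs are mapped to intentions.

```python
def intent_from_verbs(tagged_words, verb_to_intent):
    """Return the intent of the first known verb, or None if no verb is known.

    tagged_words: list of (word, part_of_speech) pairs.
    verb_to_intent: hypothetical lookup table from verb to intent name.
    """
    for word, pos_tag in tagged_words:
        if pos_tag == "verb" and word in verb_to_intent:
            return verb_to_intent[word]
    return None
```

Sentences for which this lookup returns `None` would still need manual labelling or the trained classification model.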

S240, generating a matching rule according to the intention, the word segmentation result and the dictionary label.

Further, this step specifically includes:

Generating a first-level matching rule according to the intention, the word segmentation result and the dictionary label.

According to the constitution of the matching rule described in the foregoing embodiment, the matching rule includes the intention and the rule. And the rules are generated from the word segmentation results and dictionary labels of the sentence.

For example, for a sentence to be analyzed whose intention is "ask about the weather", whose word segmentation result is "ask / this weekend / San Francisco / of / weather", and whose relevant dictionary labels are date and location, the generated first-level rule is: intent＝ask_weather:[‘ask’]+[date0:％date％]+[location0:％location％]+[‘of’]+[‘weather’].
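A minimal sketch of this generation step is shown below. It assumes each segmented word arrives with its dictionary label (or `None` for non-keywords), numbers repeated labels from 0, and writes the rule with ASCII characters in place of the full-width ones; the function name is hypothetical.

```python
def build_primary_rule(intent, tagged_words):
    """Build a first-level rule string from (word, dictionary_label) pairs."""
    parts = []
    counts = {}
    for word, label in tagged_words:
        if label is None:
            # Non-keywords are represented by themselves.
            parts.append(f"['{word}']")
        else:
            # Keywords are represented by their dictionary label.
            idx = counts.get(label, 0)
            counts[label] = idx + 1
            parts.append(f"[{label}{idx}:%{label}%]")
    return f"intent={intent}:" + "+".join(parts)
```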

Generating multi-level matching rules from the first-level matching rule by using the generalization fields.

The generalization fields mentioned in this step are exemplified as follows.

Assume the original sentence pattern is:

intent＝ask_weather:[‘ask’]+[date0:％date％]+[location0:％location％]+[‘of’]+[‘weather’].

In order to further improve the generalization capability of the rule, the present embodiment provides four generalization fields, namely:

A first field, in which the "+" symbol is a delimiter that connects participles into an N-tuple without allowing data to be inserted between them, and the "-" symbol is a delimiter that connects participles into an N-tuple while allowing data to be inserted between them.

The matching rule of the sentence above is converted into:

intent＝ask_weather:[‘ask’]-[date0:％date％]-[location0:％location％]-[‘of’]-[‘weather’].

When the user asks "may I check the weather in San Francisco tomorrow", the request can also be matched by this rule: the "-" symbol allows data to be inserted between the connected participles, while the order of the participles must remain consistent with the rule.
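The difference between the two delimiters can be sketched as follows. The representation is an assumption: the rule is a list of items (literal words or `%label%` placeholders), and the sentence is reduced to the same form. The "+" case is modelled as a contiguous run, which still tolerates unmatched words at the head and tail; the "-" case is modelled as an ordered subsequence.

```python
def to_pattern(tagged_words):
    """Replace each keyword by its dictionary label; keep non-keywords as they are."""
    return [f"%{label}%" if label else word for word, label in tagged_words]


def match_in_order(rule_items, pattern, allow_insertion):
    """Match rule items against the sentence pattern in rule order.

    allow_insertion=False models "+"; allow_insertion=True models "-".
    """
    if not allow_insertion:
        # "+": the rule items must appear as one contiguous run.
        n = len(rule_items)
        return any(pattern[i:i + n] == rule_items for i in range(len(pattern) - n + 1))
    # "-": the rule items must appear in order, possibly with words in between.
    remaining = iter(pattern)
    return all(item in remaining for item in rule_items)
```

A sentence with two extra words between "ask" and the date matches under "-" but not under "+".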

A second field, in which the "～" symbol is a delimiter that connects participles into an N-tuple while allowing the dictionary labels to change order.

With this delimiter, a request in which the order of the time and the place is changed, such as "ask about the weather in San Francisco tomorrow", can still be recognized by the rule.
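Order-free matching can be sketched with a multiset comparison. This assumes the rule and the sentence are both lists of items (literal words or `%label%` placeholders) and that extra words in the sentence are tolerated, which the text does not state explicitly for this delimiter.

```python
from collections import Counter


def match_any_order(rule_items, pattern):
    """Model an order-free delimiter: every rule item must occur, in any order."""
    needed = Counter(rule_items)
    available = Counter(pattern)
    return all(available[item] >= count for item, count in needed.items())
```

A sentence pattern in which `%location%` precedes `%date%` therefore still matches a rule that lists the date first.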

A word length constraint field, which characterizes the maximum difference between the total length of the sentence to be analyzed and the number of words in the rule, and the maximum number of words that may be inserted between two dictionary labels, for example the rule intent＝ask_weather:len＝+5win＝+2[‘ask’]-[date0:％date％]-[location0:％location％]+[‘of’]+[‘weather’].

This rule indicates that the matched sentence may be at most 5 words longer than the rule (which currently has 5 words), and that at most two words may be inserted before the date and the location. For example, the request "ask what the weather is like tomorrow in San Francisco" can be matched by this rule.
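The two limits can be checked together with an in-order matcher, sketched below under several assumptions: the rule is a list of items plus a list of delimiters, the window limit applies only after a "-" delimiter (a "+" allows no inserted words), and the length limit compares the word counts of the sentence and the rule. All names are hypothetical.

```python
def match_with_limits(items, delims, pattern, max_extra, max_gap):
    """In-order match with a length limit (len) and an insertion window (win).

    items: rule items; delims[i] is the delimiter ("+" or "-") before items[i + 1].
    """
    if len(pattern) - len(items) > max_extra:
        return False

    def search(idx, start):
        if idx == len(items):
            return True
        if idx == 0:
            stop = len(pattern)  # the first item may start anywhere
        else:
            gap = max_gap if delims[idx - 1] == "-" else 0
            stop = min(len(pattern), start + gap + 1)
        # Try every allowed position so that a bad early choice can be undone.
        for pos in range(start, stop):
            if pattern[pos] == items[idx] and search(idx + 1, pos + 1):
                return True
        return False

    return search(0, 0)
```

With a window of 2, one inserted word before the date is accepted, while three inserted words are rejected.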

A dictionary linked list field, which represents the relationships between dictionary labels in the domain dictionary corresponding to each domain; the relationships include but are not limited to: equals, contains, and combines. For example, for the rule intent＝eat:[ask]+[person]+[like eat]+[fruit], words in the artist and student dictionaries can also be matched against person.
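A minimal sketch of how a "contains" relation between dictionary labels might be resolved is given below. The table layout (parent label mapped to the labels it contains) and the assumption that person contains artist and student are illustrative; the "equals" and "combines" relations are not modelled.

```python
def label_matches(rule_label, word_label, contains):
    """Return True if word_label equals rule_label or is (transitively) contained in it."""
    if rule_label == word_label:
        return True
    seen = set()
    stack = [rule_label]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the relation table
        seen.add(current)
        for child in contains.get(current, ()):
            if child == word_label:
                return True
            stack.append(child)
    return False
```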

Based on the generalization fields, the following illustrates how second-, third- and fourth-level rules with better generalization capability are generated from a first-level rule.

The first-level rule mainly uses "+" connections, and the difference between the number of words in the request sentence and in the rule must not be too large, for example:

intent＝ask_weather:len＝+3[‘weather’]+[‘like’]+[date0:％date％]+[‘in’]+[location0:％location％]。

Replacing the "+" in the first-level rule with "-" and relaxing the word length constraint gives the second-level rule:

intent＝ask_weather:len＝+5win＝+2[‘weather’]-[‘like’]-[date0:％date％]-[‘in’]-[location0:％location％]。

Furthermore, changing the delimiters in the above rule to "～" improves the generalization capability further, giving the third-level rule:

intent＝ask_weather:len＝+10win＝+4[‘weather’]～[‘like’]～[date0:％date％]～[‘in’]～[location0:％location％]。

The fourth level generalizes the previous level further by increasing the matching length and expanding the dictionary range; for example, location0 is expanded to NER, meaning that any noun suffices, and date0 is generalized to time:

intent＝ask_weather:len＝+15win＝+6[‘weather’]～[‘like’]～[time:％date％]～[‘in’]～[NER:％location％]。

In a general rule system, all rules have to be added manually. The rule system of this embodiment can automatically generate first-level rules on the basis of an existing dictionary and automatically generate each next-level rule from the previous-level rule; the rules can then be adjusted according to user requirements and actual needs.
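The level-by-level derivation illustrated above can be sketched as follows. The len and win values are the ones shown in the four example rules; the function name, the list-of-items input and the `expansions` table are assumptions, and ASCII characters replace the full-width ones except for the "～" delimiter.

```python
def derive_levels(intent, items, expansions):
    """Derive level 1 to level 4 rule strings from the items of a first-level rule.

    expansions: hypothetical table mapping a slot to its widened form (used at level 4).
    """
    levels = [("+", 3, None), ("-", 5, 2), ("～", 10, 4), ("～", 15, 6)]
    rules = []
    for level, (delim, max_extra, max_gap) in enumerate(levels, start=1):
        # Only the fourth level widens the dictionary range of the slots.
        parts = [expansions.get(i, i) for i in items] if level == 4 else items
        header = f"intent={intent}:len=+{max_extra}"
        if max_gap is not None:
            header += f"win=+{max_gap}"
        rules.append(header + delim.join(parts))
    return rules
```

Generated rules would still be reviewed and adjusted by hand where needed, as the text notes.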

In an embodiment of the present invention, a semantic understanding processing apparatus is provided, which is described in detail with reference to fig. 3, and includes:

the label setting unit 31 is configured to perform word segmentation processing on the sentence to be analyzed, and set a corresponding label for the word segmentation result;

In the semantic understanding processing apparatus of this embodiment, the label setting unit 31 has two functions: word segmentation and label setting. After obtaining the sentence to be analyzed from the user request, the label setting unit 31 first performs word segmentation on it. In the field of semantic understanding, word segmentation is usually the first processing step applied to an acquired text and is the basis of semantic understanding. The word segmentation in this embodiment may use common approaches such as string matching, full segmentation and word formation.

After the word segmentation process is completed, the tag setting unit 31 sets a corresponding tag for a word in a sentence. Each word has corresponding part of speech, such as noun, verb, adverb, preposition, assistant word, etc.; each term may also have specific semantics, such as the vocabulary of the music domain may include singer, song, band, genre, and so forth. Thus, the labels mentioned in this embodiment include part-of-speech labels for characterizing general parts-of-speech and dictionary labels for characterizing special parts-of-speech.

the rule matching unit 32 is configured to substitute the label into the sentence to be analyzed to obtain an updated sentence to be analyzed, and to match the updated sentence to be analyzed with a preset matching rule to obtain a matching result.

After the label setting unit 31 sets the label, the rule matching unit 32 brings the label into the original sentence to form a fixed sentence pattern, and matches the fixed sentence pattern with different rules in the rule base according to the rules in each field to obtain a matching result.

Specifically, if the tags set by the tag setting unit 31 include a dictionary tag, the dictionary tag is preferentially used to construct the fixed sentence pattern; if there are no dictionary tags, the part-of-speech tags are used to construct the fixed sentence pattern.

Further, each of the matching rules mentioned in the present embodiment includes an intention and a rule. Wherein the intent is to classify the sentence; the role of the rules is to match the words required by the semantics and to extract the keywords. The definition of the keywords in this embodiment is as follows: after a word in a sentence is replaced, if the semantic of the sentence changes in a specific scene, the word is a keyword.

The matching rules comprise intents and rules, the rules at least comprise N-tuples formed by connecting participles through delimiter marks; keywords in the rules are represented by their corresponding dictionary tags, and non-keywords in the rules are represented by themselves.

The semantic understanding processing apparatus of this embodiment supports multiple rule matching modes and can meet the generalization requirements of sentences that have different word orders but the same semantics.

A semantic understanding processing device provided by an embodiment of the present invention is described in detail below with reference to Fig. 4. The semantic understanding processing device includes:

a processor 410, a communication interface 420, a memory 430 and a communication bus 440, wherein the processor 410, the communication interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may call logic instructions in the memory 430 to perform, for example, the following method: performing word segmentation processing on the sentence to be analyzed, and setting a corresponding label for a word segmentation result; the tags comprise part-of-speech tags for characterizing general parts-of-speech and dictionary tags for characterizing special parts-of-speech; substituting the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matching the updated statement to be analyzed with a preset matching rule to obtain a matching result; wherein the matching rules include intents and rules, the rules including at least N-tuples formed by concatenating participles via delimiter markers; keywords in the rules are represented by their corresponding dictionary tags, and non-keywords in the rules are represented by themselves.

In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the semantic understanding processing method provided in the foregoing embodiments, and for example, the method includes: performing word segmentation processing on the sentence to be analyzed, and setting a corresponding label for a word segmentation result; the tags comprise part-of-speech tags for characterizing general parts-of-speech and dictionary tags for characterizing special parts-of-speech; substituting the label into the statement to be analyzed to obtain an updated statement to be analyzed, and matching the updated statement to be analyzed with a preset matching rule to obtain a matching result; wherein the matching rules include intents and rules, the rules including at least N-tuples formed by concatenating participles via delimiter markers; in the N-tuple, the keywords in the rule are represented by their corresponding dictionary tags, and the non-keywords in the rule are represented by themselves.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A semantic understanding processing method, comprising:

wherein the matching rules include intents and rules, the rules include at least N-tuples formed by connecting participles by delimiter markers, keywords in the rules are represented by their corresponding dictionary labels, and non-keywords in the rules are represented by themselves.

2. The semantic understanding processing method according to claim 1, wherein the step of matching the updated sentence to be analyzed with a preset matching rule to obtain a matching result comprises:

3. The semantic understanding processing method according to claim 2, wherein the matching rules include a plurality of matching rules, each matching rule corresponds to a different matching level, and matching accuracies corresponding to the matching levels from high to low are sequentially reduced;

4. The semantic understanding processing method according to claim 3, wherein the matching the sentence to be analyzed and the tag with the matching rules corresponding to the matching levels in sequence from high to low until the matching is successful, and obtaining the matching result comprises:

5. The semantic understanding processing method according to claim 4, wherein determining the matching result according to the matching score corresponding to the matching rule matching successfully in the field comprises:

6. The semantic understanding processing method according to any one of claims 1 to 5, further comprising:

7. The semantic understanding processing method according to claim 6, wherein generating a matching rule according to the intention, the word segmentation result, and the dictionary label includes:

wherein the generalization field comprises:

a second field for connecting participles to form an N-tuple, using as the delimiter the "～" symbol, which allows the dictionary labels to change order;

8. A semantic understanding processing apparatus, comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the semantic understanding processing method according to any one of claims 1 to 7 are implemented when the processor executes the program.

10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the semantic understanding processing method according to any one of claims 1 to 7.