CN112347235A - Rule base generation method and device - Google Patents

Rule base generation method and device

Info

Publication number
CN112347235A
CN112347235A
Authority
CN
China
Prior art keywords
request
rule
preset
alternative
texts
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011222725.7A
Other languages
Chinese (zh)
Inventor
孟振南
雷欣
李志飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yufanzhi Information Technology Co ltd
Original Assignee
Beijing Yufanzhi Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yufanzhi Information Technology Co ltd filed Critical Beijing Yufanzhi Information Technology Co ltd
Priority to CN202011222725.7A priority Critical patent/CN112347235A/en
Publication of CN112347235A publication Critical patent/CN112347235A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis

Abstract

A rule base generation method and device are disclosed. The method comprises the following steps: receiving a request text; searching, according to a preset similarity algorithm, a request text library for a plurality of historical request texts similar to the request text; deleting stop words from the request text and the plurality of historical request texts respectively to generate a plurality of reference request texts; training each of the plurality of reference request texts according to a preset training model to generate an alternative rule; determining whether the looseness of the alternative rule meets a preset standard; and if the looseness of the alternative rule meets the preset standard, adding the alternative rule to the rule base.

Description

Rule base generation method and device
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method and an apparatus for generating a rule base.
Background
There are many rule systems in current human-machine dialog applications (e.g., intelligent robots). Because a rule system offers high accuracy but low recall, existing human-machine dialog systems rely heavily on rule systems, and a rule system must be built on a rule base. At present, the rules in a rule base are all written manually. Manual writing is not automated, is error-prone, easily causes conflicts between rules, makes maintenance very cumbersome, and consumes a large amount of manpower and material resources.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method and an apparatus for generating a rule base, which can automatically generate a rule base containing a large number of high-precision rules, thereby saving manpower and material resources.
In order to achieve the above object, in a first aspect, an embodiment of the present invention provides a rule generation method, where the method includes:
receiving a request text;
searching, according to a preset similarity algorithm, a request text library for a plurality of historical request texts similar to the request text;
deleting stop words in the request text and the plurality of historical request texts respectively to generate a plurality of reference request texts;
training each reference request text in the multiple reference request texts according to a preset training model to generate an alternative rule;
determining whether the looseness of the alternative rule meets a preset standard;
and if the looseness of the alternative rule meets a preset standard, adding the alternative rule into a rule base.
Preferably, the determining whether the looseness of the alternative rule meets a preset standard includes: matching the alternative rule with spam request texts in a spam request text set; if the number of matched spam request texts is greater than a first preset number, determining that the looseness of the alternative rule does not meet the preset standard; and if the number of matched spam request texts is not greater than the first preset number, determining that the looseness of the alternative rule meets the preset standard.
Preferably, the determining whether the looseness of the alternative rule meets a preset standard includes: matching the alternative rule with normal request texts in normal request text sets of a plurality of different domains; if the number of matched domains is greater than a second preset number, determining that the looseness of the alternative rule does not meet the preset standard; and if the number of matched domains is not greater than the second preset number, determining that the looseness of the alternative rule meets the preset standard.
Preferably, the method further comprises: and if the looseness of the alternative rule does not meet the preset standard, discarding the alternative rule.
Preferably, the preset training model comprises: a CRF model, a BERT model, or an SVM model.
In a second aspect, an embodiment of the present invention provides a rule base generation apparatus, including:
a receiving unit configured to receive a request text;
a searching unit, configured to search, according to a preset similarity algorithm, a request text library for a plurality of historical request texts similar to the request text;
a deleting unit, configured to delete stop words in the request text and the plurality of history request texts, respectively, and generate a plurality of reference request texts;
a training unit, configured to train each of the plurality of reference request texts according to a preset training model to generate an alternative rule;
a determining unit, configured to determine whether the looseness of the alternative rule meets a preset standard;
and a processing unit, configured to add the alternative rule to a rule base if the looseness of the alternative rule meets the preset standard.
Preferably, the determining unit is specifically configured to: match the alternative rule with spam request texts in a spam request text set; if the number of matched spam request texts is greater than a first preset number, determine that the looseness of the alternative rule does not meet the preset standard; and if the number of matched spam request texts is not greater than the first preset number, determine that the looseness of the alternative rule meets the preset standard.
Preferably, the determining unit is specifically configured to: match the alternative rule with normal request texts in normal request text sets of a plurality of different domains; if the number of matched domains is greater than a second preset number, determine that the looseness of the alternative rule does not meet the preset standard; and if the number of matched domains is not greater than the second preset number, determine that the looseness of the alternative rule meets the preset standard.
Preferably, the processing unit is further configured to discard the alternative rule if the looseness of the alternative rule does not meet the preset standard.
Preferably, the preset training model comprises: a CRF model, a BERT model, or an SVM model.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where the storage medium stores a computer program for executing the rule base generation method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instruction from the memory, and execute the instruction to implement the rule base generation method according to the first aspect.
With the rule base generation method and device provided by the present invention, after a request text is received, a plurality of historical request texts similar to the request text are found in the request text library according to a preset similarity algorithm. Stop words in the request text and the plurality of historical request texts are then deleted to generate a plurality of reference request texts, and each reference request text is trained according to a preset training model to generate an alternative rule. Alternative rules whose looseness meets the preset standard are added to the rule base. A rule base containing a large number of high-precision rules is thus generated automatically, saving manpower and material resources.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a schematic flowchart of a rule base generation method provided in an exemplary embodiment of the present application;
fig. 2 is a block diagram of a rule base generation apparatus according to an exemplary embodiment of the present application;
fig. 3 is a block diagram of an electronic device provided for an exemplary embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Fig. 1 is a schematic flowchart of a method for generating a rule base according to an embodiment of the present application. The method can be applied to electronic equipment. The method for generating the rule base provided by the embodiment of the application can comprise the following steps:
step 101, receiving a request text.
Step 102, searching a plurality of historical request texts similar to the request text from a request text library according to a preset similarity algorithm.
In one example, an offline request text library may be created in advance, containing historical request texts from different users. After a request text is received, historical request texts similar to the current request text can then be found in the request text library by a preset similarity algorithm, forming a large-scale request text set, which helps train high-precision rules.
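The "preset similarity algorithm" is left unspecified by the patent. As one illustrative possibility only, a minimal sketch using Jaccard token overlap to retrieve similar historical request texts might look like this (the function names, threshold, and example library are assumptions, not part of the disclosure):

```python
def jaccard_similarity(a: str, b: str) -> float:
    # Token-overlap similarity; a stand-in for the unspecified
    # "preset similarity algorithm".
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if (ta or tb) else 0.0

def find_similar_requests(request: str, library: list[str],
                          threshold: float = 0.5) -> list[str]:
    # Return historical request texts whose similarity to `request`
    # meets or exceeds the threshold.
    return [t for t in library if jaccard_similarity(request, t) >= threshold]

library = ["play hello by adele", "what is the weather", "play rolling by adele"]
similar = find_similar_requests("play hello by adele", library)
```

In practice an embedding-based or edit-distance similarity could be substituted without changing the surrounding steps.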
Step 103, deleting stop words from the request text and from each of the plurality of historical request texts in the request text set corresponding to the request text, to generate a plurality of reference request texts.
In one example, stop words include, but are not limited to, meaningless symbols and mood words (modal particles). It should be noted that any existing method may be used to identify stop words; the present invention is not limited in this respect.
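As a minimal sketch of this step (the stop-word list below is invented for illustration and is not from the patent), deleting stop words to produce a reference request text could look like:

```python
STOP_WORDS = {"please", "um", "ah", "?", "!"}  # illustrative list only

def remove_stop_words(text: str) -> str:
    # Delete stop words (meaningless symbols, mood words) from a request
    # text to produce a reference request text.
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)

reference = remove_stop_words("please play hello by adele ?")
```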
And 104, training each reference request text in the multiple reference request texts according to a preset training model to generate an alternative rule.
The preset training model may include: a Conditional Random Field (CRF) model, a Bidirectional Encoder Representations from Transformers (BERT) model, or a Support Vector Machine (SVM) model.
In one example, proper nouns (e.g., names of people, times, locations, organizations, etc.) in the reference request text may be labeled using a CRF or BERT model. The reference request text then becomes a rule in which the proper nouns are replaced with slots. Because the proper nouns are generalized, the resulting rule can match more request texts, increasing the recall of the rule model. For example, if the reference request text is "play Zhang San's abc", training it with a CRF or BERT model may generate the rule "play $(Singer)'s $(Song)".
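A toy sketch of this generalization step, with a hard-coded entity lookup standing in for the trained CRF/BERT labeller (the entity table, slot syntax handling, and regex compilation are all illustrative assumptions):

```python
import re

# A hard-coded lookup stands in for a trained CRF/BERT proper-noun labeller.
ENTITY_SLOTS = {"hello": "Song", "adele": "Singer"}

def generalize(reference: str) -> str:
    # Replace labelled proper nouns with $(Type) slots to form a rule.
    return " ".join(f"$({ENTITY_SLOTS[t]})" if t in ENTITY_SLOTS else t
                    for t in reference.split())

def rule_to_regex(rule: str) -> str:
    # Compile a slot rule into a regex where each $(Type) slot matches one word.
    return " ".join(r"\w+" if t.startswith("$(") else re.escape(t)
                    for t in rule.split())

rule = generalize("play hello by adele")   # -> "play $(Song) by $(Singer)"
pattern = rule_to_regex(rule)              # -> r"play \w+ by \w+"
```

The generalized pattern now recalls request texts the literal reference text would have missed.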
Furthermore, a BERT or SVM model can be used to classify the reference request into one of a number of domains predefined by semantic protocols, and to match dictionaries or knowledge graphs within those domains, so that rules in a domain can be abstracted further. This strengthens the generalization of similar rules and increases the recall of the rule model. For example, training the rule "play $(Singer)'s $(Song)" with a BERT or SVM model may generate the rule "$(Play) $(Feature)'s $(Song)".
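A sketch of the domain-classification idea, with simple domain dictionaries standing in for the BERT/SVM classifier and in-domain knowledge graphs (the dictionary contents are invented for illustration):

```python
from typing import Optional

# Illustrative domain dictionaries; a trained BERT/SVM classifier plus
# domain knowledge graphs would play this role in practice.
DOMAIN_DICTIONARIES = {
    "music": {"play", "song", "singer", "album"},
    "weather": {"weather", "temperature", "rain", "forecast"},
}

def classify_domain(request: str) -> Optional[str]:
    # Assign the request to the domain whose dictionary it overlaps most.
    tokens = set(request.lower().split())
    scores = {d: len(tokens & vocab) for d, vocab in DOMAIN_DICTIONARIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```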
And 105, determining whether the looseness of the alternative rule meets a preset standard.
If the looseness of the alternative rule meets the preset standard, executing step 106; if the slack of the alternative rule does not meet the preset criteria, step 107 is performed.
In one example, determining whether the looseness of the alternative rule meets a preset standard includes: matching the alternative rule with the spam request texts in a spam request text set; if the number of matched spam request texts is greater than a first preset number, determining that the looseness of the alternative rule does not meet the preset standard; and if the number of matched spam request texts is not greater than the first preset number, determining that the looseness of the alternative rule meets the preset standard. For convenience of description, this way of determining whether the looseness of the alternative rule meets the preset standard is referred to as the first determination method.
Specifically, a spam request text set is created in advance. The spam request text set contains spam request texts, i.e., request texts without actual meaning, on which no subsequent operation can be performed. Spam request texts can be obtained by labeling historical request texts, by automatic generation, or by crawling web pages. After an alternative rule is generated, it is matched against the spam request texts in the spam request text set. If the number of spam request texts matched by the alternative rule is greater than the first preset number, the alternative rule is too loose; to ensure the accuracy of the rules in the rule base, its looseness is determined not to meet the preset standard, step 107 is executed, and the alternative rule is discarded. If the number of matched spam request texts is not greater than the first preset number, the alternative rule is appropriately tight and of high precision; its looseness is determined to meet the preset standard, step 106 is executed, and the alternative rule is added to the rule base.
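The first determination method can be sketched as follows, treating the alternative rule as a regular expression (the rule representation, threshold value, and example spam set are assumptions for illustration):

```python
import re

def looseness_meets_standard_spam(rule_regex: str, spam_texts: list[str],
                                  first_preset_number: int) -> bool:
    # First determination method: the alternative rule is too loose if it
    # matches more than `first_preset_number` spam request texts.
    hits = sum(bool(re.fullmatch(rule_regex, s)) for s in spam_texts)
    return hits <= first_preset_number

spam_set = ["asdf qwer", "blah blah blah"]
```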
In another example, determining whether the looseness of the alternative rule meets a preset standard includes: matching the alternative rule with the normal request texts in normal request text sets of a plurality of different domains; if the number of matched domains is greater than a second preset number, determining that the looseness of the alternative rule does not meet the preset standard; and if the number of matched domains is not greater than the second preset number, determining that the looseness of the alternative rule meets the preset standard. For convenience of description, this way of determining whether the looseness of the alternative rule meets the preset standard is referred to as the second determination method.
Specifically, normal request text sets for a plurality of domains are created in advance. The normal request text set of each domain contains the normal request texts of that domain, i.e., request texts on which subsequent operations can be performed. Normal request texts can be obtained by labeling historical request texts. After an alternative rule is generated, it is matched against the normal request texts in the normal request text sets of the plurality of different domains. If the number of domains matched by the alternative rule is greater than the second preset number, the alternative rule is too loose; to reduce conflicts among the rules in the rule base, its looseness is determined not to meet the preset standard, step 107 is executed, and the alternative rule is discarded. If the number of matched domains is not greater than the second preset number, the alternative rule is appropriately tight and unlikely to cause conflicts; its looseness is determined to meet the preset standard, step 106 is executed, and the alternative rule is added to the rule base.
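Likewise, a sketch of the second determination method, counting how many domains' normal request text sets the rule matches into (the rule representation and example data are illustrative assumptions):

```python
import re

def looseness_meets_standard_domains(rule_regex: str,
                                     domain_texts: dict[str, list[str]],
                                     second_preset_number: int) -> bool:
    # Second determination method: the alternative rule is too loose if it
    # matches normal request texts in more than `second_preset_number` domains.
    matched_domains = sum(
        any(re.fullmatch(rule_regex, t) for t in texts)
        for texts in domain_texts.values())
    return matched_domains <= second_preset_number

normal_sets = {
    "music": ["play hello by adele"],
    "weather": ["what is the weather"],
    "alarm": ["set an alarm"],
}
```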
It should be noted that the first preset number and the second preset number may be configured manually in advance, based on experience.
The first determination method and the second determination method may be applied separately or together. When both are applied, the looseness of the alternative rule is determined to meet the preset standard only when the number of matched spam request texts is not greater than the first preset number and the number of matched domains is not greater than the second preset number; otherwise, the looseness of the alternative rule is determined not to meet the preset standard. The order in which the two determination methods are applied is not limited by the embodiments of the present invention.
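When both determination methods are applied together, the combined acceptance test can be sketched as a single self-contained check (thresholds, rule representation, and example data are assumptions):

```python
import re

def rule_accepted(rule_regex: str, spam_texts: list[str],
                  domain_texts: dict[str, list[str]],
                  first_preset_number: int, second_preset_number: int) -> bool:
    # The alternative rule is added to the rule base only if BOTH
    # determination methods pass; the order of the checks does not matter.
    spam_hits = sum(bool(re.fullmatch(rule_regex, s)) for s in spam_texts)
    if spam_hits > first_preset_number:
        return False  # too loose: matches too many spam request texts
    matched_domains = sum(any(re.fullmatch(rule_regex, t) for t in texts)
                          for texts in domain_texts.values())
    return matched_domains <= second_preset_number

spam_set = ["asdf qwer"]
normal_sets = {"music": ["play hello by adele"], "weather": ["rain today"]}
```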
Step 106, adding the alternative rule to the rule base.
It can be understood that, in the embodiment of the present invention, a plurality of alternative rules can be obtained from received request texts and screened by looseness, with the alternative rules whose looseness meets the preset standard added to the rule base, so that a large-scale, high-precision rule base is generated in batch.
Step 107, the alternative rule is discarded.
With the rule base generation method provided by the embodiment of the present invention, after a request text is received, a plurality of historical request texts similar to the request text are found in the request text library according to a preset similarity algorithm. Stop words in the request text and the plurality of historical request texts are then deleted to generate a plurality of reference request texts, and each reference request text is trained according to a preset training model to generate an alternative rule. Alternative rules whose looseness meets the preset standard are added to the rule base. A rule base containing a large number of high-precision rules is thus generated automatically, saving manpower and material resources.
An embodiment of the present invention provides a rule base generating device, and fig. 2 is a structural diagram of the rule base generating device. The apparatus is applied to an electronic device, and as shown in fig. 2, the rule base generating apparatus includes:
a receiving unit 201, configured to receive a request text;
a searching unit 202, configured to search, according to a preset similarity algorithm, a plurality of historical request texts similar to the request text from a request text library;
a deleting unit 203, configured to delete stop words in the request text and the plurality of history request texts, respectively, and generate a plurality of reference request texts;
the training unit 204 is configured to train each of the multiple reference request texts according to a preset training model, and generate an alternative rule;
a determining unit 205, configured to determine whether the looseness of the candidate rule meets a preset standard;
and the processing unit 206 is configured to add the alternative rule to a rule base if the looseness of the alternative rule meets a preset standard.
Preferably, the determining unit 205 is specifically configured to: match the alternative rule with spam request texts in a spam request text set; if the number of matched spam request texts is greater than a first preset number, determine that the looseness of the alternative rule does not meet the preset standard; and if the number of matched spam request texts is not greater than the first preset number, determine that the looseness of the alternative rule meets the preset standard.
Preferably, the determining unit 205 is specifically configured to: match the alternative rule with normal request texts in normal request text sets of a plurality of different domains; if the number of matched domains is greater than a second preset number, determine that the looseness of the alternative rule does not meet the preset standard; and if the number of matched domains is not greater than the second preset number, determine that the looseness of the alternative rule meets the preset standard.
Preferably, the processing unit 206 is further configured to discard the alternative rule if the looseness of the alternative rule does not meet the preset standard.
Preferably, the preset training model comprises: a CRF model, a BERT model, or an SVM model.
With the rule base generation device provided by the present invention, after a request text is received, a plurality of historical request texts similar to the request text are found in the request text library according to a preset similarity algorithm. Stop words in the request text and the plurality of historical request texts are then deleted to generate a plurality of reference request texts, and each reference request text is trained according to a preset training model to generate an alternative rule. Alternative rules whose looseness meets the preset standard are added to the rule base. A rule base containing a large number of high-precision rules is thus generated automatically, saving manpower and material resources.
Next, an electronic apparatus 11 according to an embodiment of the present application is described with reference to fig. 3.
As shown in fig. 3, the electronic device 11 includes one or more processors 111 and memory 112.
The processor 111 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 11 to perform desired functions.
Memory 112 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 111 to implement the rule base generation methods of the various embodiments of the application described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 11 may further include: an input device 113 and an output device 114, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 113 may include, for example, a keyboard, a mouse, and the like.
The output device 114 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 114 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for the sake of simplicity, only some of the components of the electronic device 11 relevant to the present application are shown in fig. 3, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 11 may include any other suitable components, depending on the particular application.
Exemplary Computer Program Product and Computer-Readable Storage Medium
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the rule base generation method according to various embodiments of the present application described in the "exemplary methods" section of this specification, supra.
The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device; partly on the user's device, as a stand-alone software package; partly on the user's computing device and partly on a remote computing device; or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the rule base generation method according to various embodiments of the present application described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including", "comprising", and "having" are open-ended words that mean "including, but not limited to", and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, "and/or", unless the context clearly dictates otherwise. The words "such as" are used herein to mean, and are used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (12)

1. A method of rule base generation, the method comprising:
receiving a request text;
searching, according to a preset similarity algorithm, a request text library for a plurality of historical request texts similar to the request text;
deleting stop words in the request text and the plurality of historical request texts respectively to generate a plurality of reference request texts;
training each reference request text of the plurality of reference request texts according to a preset training model to generate an alternative rule;
determining whether the looseness of the alternative rule meets a preset standard;
and if the looseness of the alternative rule meets a preset standard, adding the alternative rule into a rule base.
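The claimed pipeline can be illustrated with a minimal sketch. Note that the stop-word list, the token-overlap similarity, and the regex-style rule format below are all illustrative assumptions standing in for the patent's unspecified "preset" algorithms and training model:

```python
import re

# Illustrative stand-ins; the patent does not disclose these specifics.
STOP_WORDS = {"the", "a", "an", "some", "please", "to"}

def similar_history(request, history, top_k=3):
    """Step 2: search a request text library for historical texts similar to
    the request (token overlap stands in for the preset similarity algorithm)."""
    req = set(request.lower().split())
    return sorted(history, key=lambda h: -len(req & set(h.lower().split())))[:top_k]

def strip_stop_words(text):
    """Step 3: delete stop words to obtain a reference request text."""
    return " ".join(w for w in text.lower().split() if w not in STOP_WORDS)

def candidate_rule(reference_texts):
    """Step 4: derive an alternative (candidate) rule; here, a regex over the
    tokens shared by every reference text (the patent trains a preset model)."""
    common = set.intersection(*(set(t.split()) for t in reference_texts))
    return ".*".join(sorted(common)) or ".*"

def looseness_ok(rule, spam_texts, max_hits=1):
    """Step 5: reject rules loose enough to match too many spam texts."""
    return sum(bool(re.search(rule, s)) for s in spam_texts) <= max_hits

# Step 6: add the rule to the rule base only if its looseness passes the check.
rule_base = []
history = ["play some jazz music", "play pop music", "check the weather"]
refs = [strip_stop_words(t)
        for t in ["play music"] + similar_history("play music", history, top_k=2)]
rule = candidate_rule(refs)
if looseness_ok(rule, spam_texts=["asdf qwer", "buy cheap pills"]):
    rule_base.append(rule)
```

With these toy inputs the shared tokens are "play" and "music", so the generated rule is narrow enough to pass the looseness check and enter the rule base.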
2. The method of claim 1, wherein the determining whether the slack of the alternative rule meets a preset criterion comprises:
matching the alternative rule against spam request texts in a spam request text set;
if the number of matched spam request texts is greater than a first preset number, determining that the looseness of the alternative rule does not meet the preset standard;
and if the number of matched spam request texts is not greater than the first preset number, determining that the looseness of the alternative rule meets the preset standard.
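Claim 2's looseness check can be sketched as follows; the function name, the regex matching, and the sample spam texts are assumptions, not the patent's actual matching mechanism:

```python
import re

def looseness_meets_standard(alternative_rule, spam_texts, first_preset_number):
    """Match the alternative rule against a spam request text set; matching
    more than the first preset number of spam texts means the rule is too loose."""
    matched = sum(1 for text in spam_texts if re.search(alternative_rule, text))
    return matched <= first_preset_number

spam_set = ["click here to win", "free money now", "win free money fast"]
loose = looseness_meets_standard(r"free.*money", spam_set, first_preset_number=1)
strict = looseness_meets_standard(r"play.*jazz", spam_set, first_preset_number=1)
# loose is False (the rule matched 2 spam texts); strict is True (0 matches)
```

The intuition: a rule that fires on nonsense or spam inputs is too permissive to be a reliable intent-matching rule.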
3. The method of claim 1, wherein the determining whether the slack of the alternative rule meets a preset criterion comprises:
matching the alternative rule against normal request texts in normal request text sets of a plurality of different domains;
if the number of matched domains is greater than a second preset number, determining that the looseness of the alternative rule does not meet the preset standard;
and if the number of matched domains is not greater than the second preset number, determining that the looseness of the alternative rule meets the preset standard.
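Claim 3's cross-domain variant of the check can be sketched similarly; the domain names, sample texts, and regex matching are illustrative assumptions:

```python
import re

def domains_matched(alternative_rule, normal_sets_by_domain):
    """Count how many different domains contain at least one normal request
    text matched by the alternative rule."""
    return sum(1 for texts in normal_sets_by_domain.values()
               if any(re.search(alternative_rule, t) for t in texts))

normal_sets = {
    "music":   ["play some music", "next song please"],
    "weather": ["what is the weather today", "will it rain tomorrow"],
    "alarm":   ["set an alarm for seven", "wake me up at noon"],
}
second_preset_number = 1
count = domains_matched(r"play.*music", normal_sets)
meets_standard = count <= second_preset_number  # a domain-specific rule passes
```

Here a rule that matches normal requests across many unrelated domains is judged too loose, since a well-formed rule should be specific to one domain.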
4. The method according to any one of claims 1-3, further comprising:
and if the looseness of the alternative rule does not meet the preset standard, discarding the alternative rule.
5. The method according to any one of claims 1-3, wherein the preset training model comprises: a CRF model, a BERT model, or an SVM model.
6. An apparatus for generating a rule base, the apparatus comprising:
a receiving unit configured to receive a request text;
the searching unit is used for searching a request text library for a plurality of historical request texts similar to the request text according to a preset similarity algorithm;
a deleting unit, configured to delete stop words in the request text and the plurality of history request texts, respectively, and generate a plurality of reference request texts;
the training unit is used for training each reference request text of the plurality of reference request texts according to a preset training model to generate an alternative rule;
the determining unit is used for determining whether the looseness of the alternative rule meets a preset standard;
and the processing unit is used for adding the alternative rule into a rule base if the looseness of the alternative rule meets a preset standard.
7. The apparatus according to claim 6, wherein the determining unit is specifically configured to:
matching the alternative rule against spam request texts in a spam request text set;
if the number of matched spam request texts is greater than a first preset number, determining that the looseness of the alternative rule does not meet the preset standard;
and if the number of matched spam request texts is not greater than the first preset number, determining that the looseness of the alternative rule meets the preset standard.
8. The apparatus according to claim 6, wherein the determining unit is specifically configured to:
matching the alternative rule against normal request texts in normal request text sets of a plurality of different domains;
if the number of matched domains is greater than a second preset number, determining that the looseness of the alternative rule does not meet the preset standard;
and if the number of matched domains is not greater than the second preset number, determining that the looseness of the alternative rule meets the preset standard.
9. The apparatus according to any of claims 6-8, wherein the processing unit is further configured to discard the alternative rule if the slack of the alternative rule does not meet a preset criterion.
10. The apparatus according to any one of claims 6-8, wherein the preset training model comprises: a CRF model, a BERT model, or an SVM model.
11. A computer-readable storage medium storing a computer program for executing the rule base generation method according to any one of claims 1 to 5.
12. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the rule base generation method of any one of claims 1 to 5.
CN202011222725.7A 2020-11-05 2020-11-05 Rule base generation method and device Pending CN112347235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011222725.7A CN112347235A (en) 2020-11-05 2020-11-05 Rule base generation method and device


Publications (1)

Publication Number Publication Date
CN112347235A true CN112347235A (en) 2021-02-09

Family

ID=74428320



Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101398810A (en) * 2007-09-30 2009-04-01 日电(中国)有限公司 Self-adapting service choice device and method thereof, enquiry system and method thereof
CN108153895A (en) * 2018-01-06 2018-06-12 国网福建省电力有限公司 A kind of building of corpus method and system based on open data
CN108182515A (en) * 2017-12-13 2018-06-19 中国平安财产保险股份有限公司 Intelligent rules engine rule output method, equipment and computer readable storage medium
CN110348472A (en) * 2019-05-24 2019-10-18 中国平安财产保险股份有限公司 Data Detection rule generating method, device, computer equipment and storage medium
US20190362222A1 (en) * 2018-05-22 2019-11-28 Adobe Inc. Generating new machine learning models based on combinations of historical feature-extraction rules and historical machine-learning models
CN110516586A (en) * 2019-08-23 2019-11-29 深圳力维智联技术有限公司 A kind of facial image clustering method, system, product and medium
CN110909526A (en) * 2018-09-17 2020-03-24 阿里巴巴集团控股有限公司 Junk short message rule base construction method and device and electronic equipment
CN110945538A (en) * 2017-04-03 2020-03-31 维择科技公司 Automatic rule recommendation engine
CN111178085A (en) * 2019-12-12 2020-05-19 科大讯飞(苏州)科技有限公司 Text translator training method, and professional field text semantic parsing method and device
CN111400493A (en) * 2020-03-06 2020-07-10 中国平安人寿保险股份有限公司 Text matching method, device and equipment based on slot position similarity and storage medium
CN111582786A (en) * 2020-04-29 2020-08-25 上海中通吉网络技术有限公司 Express bill number identification method, device and equipment based on machine learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination