CN114611707A - Method and system for machine learning by combining rules - Google Patents

Method and system for machine learning by combining rules

Info

Publication number
CN114611707A
CN114611707A
Authority
CN
China
Prior art keywords
rule
prediction
machine learning
data record
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210203843.6A
Other languages
Chinese (zh)
Inventor
罗远飞
陈雨强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd filed Critical 4Paradigm Beijing Technology Co Ltd
Priority to CN202210203843.6A
Publication of CN114611707A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/29 - Graphical models, e.g. Bayesian networks
    • G06F18/295 - Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models

Abstract

A method and system for machine learning in conjunction with rules are provided. The method comprises: (A) obtaining a data record, wherein the data record comprises a plurality of attribute information; (B) applying at least one rule relating to a prediction target to the plurality of attribute information to generate rule-related features of the data record; (C) forming a prediction sample based at least on the rule-related features; and (D) generating a machine learning prediction result for the prediction target based on the prediction sample using a machine learning prediction model, wherein the machine learning prediction model is trained to provide respective machine learning prediction results for prediction samples. With this method and system, rules can be turned into rule-related features that participate in machine learning, so that the rules are effectively integrated into the machine learning process and the prediction effect is improved.

Description

Method and system for machine learning by combining rules
This application is a divisional application of the patent application entitled "Method and system for machine learning by combining rules" filed on August 25, 2016.
Technical Field
Exemplary embodiments of the present invention relate generally to the field of artificial intelligence and, more particularly, to a method and system for machine learning (e.g., training and prediction) in conjunction with rules.
Background
In recent years, machine learning techniques have increasingly been applied in the field of artificial intelligence in place of conventional rule systems. This is because, as application scenarios grow more complex, the number of rules that must be summarized and maintained grows ever larger, and the pace of data growth sometimes leaves people unable to keep up with the rule changes that changing data demands.
Accordingly, people prefer to use machine learning techniques to solve such problems. However, many machine learning algorithms are black boxes: the machine learning models they produce often cannot clearly explain why a particular decision was made. On the other hand, even when practice has shown that a certain factor plays a critical role in a particular scenario, it is difficult to apply such a finding directly and effectively within a machine learning system.
In particular, U.S. patent application publication No. US20160171386 presents a system and method for opinion mining in which a rule-based system acts as an emotion detection module, and a machine-learning-based system communicates with that module and processes the data it provides.
Further, Chinese patent application publication No. CN105721194A discloses a solution that uses technologies such as real-time big data processing and machine learning to locate operator network faults fully automatically. The experience of operation and maintenance personnel is codified into the system, giving the system basic judgment intelligence.
In addition, Chinese patent application publication No. CN105320960A discloses a voting-based cross-language subjective/objective sentiment classification method comprising the following steps: S1, constructing an emotion dictionary of a target language from the emotion dictionary of a source language; S2, extracting words from each sentence in the text to be labeled using three algorithms (a rule algorithm, an algorithm combining machine translation with statistical machine learning, and a polarity characteristic value calculation algorithm), judging the emotion polarity of the words against the constructed emotion dictionary of the target language, and thereby judging whether each sentence is subjective or objective; and S3, collecting the subjective/objective judgments produced by the three algorithms and deciding the subjectivity or objectivity of the sentence by voting.
It can be seen that in existing solutions the rule system and the machine learning system are either connected in sequence, that is, the rule system preprocesses the data samples fed to the machine learning system or corrects the prediction results the machine learning system produces; or the rule system and the machine learning system each predict the target object independently, and one of the two prediction results is then selected. In both arrangements the rule system and the machine learning system work independently, so the rules are difficult to integrate effectively into machine learning.
Disclosure of Invention
Exemplary embodiments of the present invention aim to overcome the drawback that rules are difficult to incorporate effectively into machine learning.
According to an exemplary embodiment of the present invention, there is provided a method of machine learning in conjunction with rules, comprising: (A) obtaining a data record, wherein the data record comprises a plurality of attribute information; (B) applying at least one rule relating to a prediction target to the plurality of attribute information to generate rule-related features of the data record; (C) forming a prediction sample based at least on the rule-related features; and (D) generating a machine learning prediction result for the prediction target based on the prediction sample using a machine learning prediction model, wherein the machine learning prediction model is trained to provide respective machine learning prediction results for prediction samples.
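The claimed steps (A) through (D) can be illustrated with a minimal sketch. The dictionary-based record format, the function names, and the toy loan-scenario rules below are illustrative assumptions, not the patent's actual implementation:

```python
# Hypothetical sketch of steps (A)-(D); all names and data are assumed.

def apply_rules(record, rules):
    """(B) Apply each rule's condition to the record's attribute
    information, yielding one rule-related feature per rule."""
    return [1.0 if rule["condition"](record) else 0.0 for rule in rules]

def form_sample(record, rule_features):
    """(C) Form a prediction sample from the raw attribute values
    plus the rule-related features."""
    return list(record.values()) + rule_features

# Two toy rules for a hypothetical loan-default prediction target.
rules = [
    {"condition": lambda r: r["age"] < 18},
    {"condition": lambda r: r["income"] > 10000},
]
record = {"age": 25, "income": 12000}   # (A) the obtained data record
features = apply_rules(record, rules)    # (B) rule-related features
sample = form_sample(record, features)   # (C) the prediction sample
print(sample)  # [25, 12000, 0.0, 1.0]
```

In step (D), a trained machine learning prediction model would consume `sample`; because the rule outcomes are now ordinary feature columns, the rules participate in the model rather than running as a separate system.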
Optionally, in the method, the rule-related features comprise rule prediction features and/or rule description features, wherein in step (B) the rule prediction features are generated based on a rule prediction result obtained by applying the at least one rule to the data record, and/or the rule description features are generated based on whether the condition of each rule among the at least one rule holds for the data record.
Optionally, in the method, in step (B), each rule description feature is generated by multiplying the logical value, indicating whether the condition of the corresponding rule among the at least one rule holds for the data record, by the weight of that rule; or, in step (B), the logical value indicating whether the condition of each rule among the at least one rule holds for the data record is taken as the rule description feature, and the weight of each rule among the at least one rule, which represents the certainty of that rule, is taken as the initial value of the corresponding rule description feature coefficient of the machine learning prediction model.
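The first alternative above (logical value multiplied by rule weight) can be sketched as follows; the rule format and weights are illustrative assumptions:

```python
# Assumed sketch: each rule description feature is the condition's
# logical value (1.0 if it holds for the record, else 0.0) scaled by
# the rule's weight, which expresses how certain the rule is.

def rule_description_features(record, rules):
    feats = []
    for rule in rules:
        logical = 1.0 if rule["condition"](record) else 0.0
        feats.append(logical * rule["weight"])
    return feats

rules = [
    {"condition": lambda r: r["amount"] > 5000, "weight": 0.8},
    {"condition": lambda r: r["count"] > 3, "weight": 0.3},
]
feats = rule_description_features({"amount": 6000, "count": 1}, rules)
print(feats)  # [0.8, 0.0]
```

Under the second alternative, the feature would instead stay as the bare logical value and the weight would seed the model coefficient, letting training adjust it away from the expert's initial certainty.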
Optionally, in the method, the weight is set by human specification and/or by a rule learning engine, wherein the rule learning engine is configured to learn the weight of each rule among the at least one rule based on a rule training sample.
Optionally, in the method, in step (B), each rule description feature is generated by multiplying the logical value, indicating whether the condition of the corresponding rule among the at least one rule holds for the data record, by the conclusion value of that rule; or, in step (B), the logical value indicating whether the condition of each rule among the at least one rule holds for the data record is taken as the rule description feature, and the conclusion value of each rule among the at least one rule, which represents the result caused when the condition of that rule holds, is taken as the initial value of the corresponding rule description feature coefficient of the machine learning prediction model.
Optionally, in the method, the conclusion value is a positive value when the result is positively correlated with the prediction target, and the conclusion value is a negative value when the result is negatively correlated with the prediction target.
Optionally, in the method, in step (B), each rule description feature is generated by multiplying the logical value, indicating whether the condition of the corresponding rule among the at least one rule holds for the data record, by the product of that rule's weight and conclusion value; or, in step (B), the logical value indicating whether the condition of each rule among the at least one rule holds for the data record is taken as the rule description feature, and the product of each rule's weight, which represents the certainty of that rule, and its conclusion value, which represents the result caused when the condition of that rule holds, is taken as the initial value of the corresponding rule description feature coefficient of the machine learning prediction model.
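The weight-times-conclusion variant can be illustrated with a short sketch (an assumed formulation, with made-up numbers): per the sign convention above, the conclusion value is positive when the rule's result is positively correlated with the prediction target and negative otherwise, so the product both scales and signs each rule's contribution:

```python
# Assumed sketch: feature = logical_value * weight * conclusion_value.
rules = [
    {"holds": True,  "weight": 0.9, "conclusion": 1.0},   # supports target
    {"holds": True,  "weight": 0.5, "conclusion": -1.0},  # opposes target
    {"holds": False, "weight": 0.7, "conclusion": 1.0},   # condition fails
]
feats = [(1.0 if r["holds"] else 0.0) * r["weight"] * r["conclusion"]
         for r in rules]
print(feats)  # [0.9, -0.5, 0.0]
```

Under the second alternative, these same products would instead be used as the initial coefficients of the rule description features, so the model starts from the expert's signed certainties and refines them during training.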
Alternatively, in the method, an initial value of a rule description feature coefficient of the machine learning prediction model is multiplied by a coefficient correction value set according to an algorithm of the machine learning prediction model.
Optionally, the method further comprises: (E) fusing the machine learning prediction result with a rule prediction result obtained by applying the at least one rule to the data record, to obtain a fused prediction result corresponding to the prediction sample.
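One plausible realization of the fusion in step (E) is a convex combination of the two predictions; the mixing coefficient `alpha` and the function name are illustrative assumptions, since the patent does not fix a particular fusion formula here:

```python
# Hedged sketch of step (E): blend the machine learning prediction
# with the rule prediction. alpha weights the ML side (assumed value).

def fuse(ml_pred, rule_pred, alpha=0.7):
    """Convex combination of the two prediction results."""
    return alpha * ml_pred + (1.0 - alpha) * rule_pred

fused = fuse(0.9, 0.5)
print(round(fused, 2))  # 0.78
```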
Optionally, the method further comprises: (F) combining the data record and the machine learning prediction result or the fused prediction result into a rule training sample.
Optionally, in the method, the weight of each rule among the at least one rule is set based on an updated value of a rule description feature coefficient of the machine learning prediction model.
Optionally, in the method, the rule learning engine is based on a Markov logic network.
Optionally, before step (B), the method further comprises: (G) obtaining the at least one rule regarding the prediction target.
According to another exemplary embodiment of the present invention, there is provided a method of machine learning in conjunction with rules, comprising: (A) acquiring a historical data record, wherein the historical data record comprises a plurality of attribute information and a label serving as the actual value of the prediction target; (B) applying at least one rule relating to the prediction target to the plurality of attribute information to generate rule-related features of the historical data record; (C) forming a training sample based at least on the rule-related features and the label; and (D) training a machine learning prediction model based on the training sample, wherein the machine learning prediction model is used to provide machine learning prediction results regarding the prediction target for new data records.
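The training-side steps (A) through (D) can be sketched end to end. Everything below is an illustrative assumption: labeled toy samples whose last column is a precomputed rule-related feature, and a plain logistic-regression-style model fitted with stochastic gradient steps so the sketch stays self-contained:

```python
# Hypothetical sketch of training steps (A)-(D); data and names assumed.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, lr=0.1, epochs=200):
    """(D) Fit a logistic model by per-sample gradient updates."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

# (A)-(C) Historical records reduced to [bias, rule_feature] samples,
# each with a label serving as the actual value of the prediction target.
samples = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 1, 0, 1]
w = train(samples, labels)

# The trained model now scores a new data record whose rule condition holds.
pred = sigmoid(sum(wi * xi for wi, xi in zip(w, [1.0, 1.0])))
print(pred > 0.5)  # True
```

In this toy setup the label is perfectly explained by the rule-related feature, so training drives its coefficient positive; with the coefficient-initialization variants described above, that coefficient would instead start from the rule's weight or weight-times-conclusion product.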
Optionally, in the method, the rule-related features comprise rule prediction features and/or rule description features, wherein in step (B) the rule prediction features are generated based on a rule prediction result obtained by applying the at least one rule to the historical data record, and/or the rule description features are generated based on whether the condition of each rule among the at least one rule holds for the historical data record.
Optionally, in the method, in step (B), each rule description feature is generated by multiplying the logical value, indicating whether the condition of the corresponding rule among the at least one rule holds for the historical data record, by the weight of that rule; alternatively, in step (B), the logical value indicating whether the condition of each rule among the at least one rule holds for the historical data record is taken as the rule description feature, and, in step (D), the weight of each rule among the at least one rule, which represents the certainty of that rule, is taken as the initial value of the corresponding rule description feature coefficient of the machine learning prediction model.
Optionally, in the method, the weight is set by human specification and/or by a rule learning engine, wherein the rule learning engine is configured to learn the weight of each rule among the at least one rule based on a rule training sample.
Optionally, in the method, in step (B), each rule description feature is generated by multiplying the logical value, indicating whether the condition of the corresponding rule among the at least one rule holds for the historical data record, by the conclusion value of that rule; or, in step (B), the logical value indicating whether the condition of each rule among the at least one rule holds for the historical data record is taken as the rule description feature, and, in step (D), the conclusion value of each rule among the at least one rule, which represents the result caused when the condition of that rule holds, is taken as the initial value of the corresponding rule description feature coefficient of the machine learning prediction model.
Optionally, in the method, the conclusion value is a positive value when the result is positively correlated with the prediction target, and the conclusion value is a negative value when the result is negatively correlated with the prediction target.
Optionally, in the method, in step (B), each rule description feature is generated by multiplying the logical value, indicating whether the condition of the corresponding rule among the at least one rule holds for the historical data record, by the product of that rule's weight and conclusion value; or, in step (B), the logical value indicating whether the condition of each rule among the at least one rule holds for the historical data record is taken as the rule description feature, and, in step (D), the product of each rule's weight, which represents the certainty of that rule, and its conclusion value, which represents the result caused when the condition of that rule holds, is taken as the initial value of the corresponding rule description feature coefficient of the machine learning prediction model.
Alternatively, in the method, in step (D), an initial value of the rule description feature coefficient of the machine learning prediction model is multiplied by a coefficient correction value set according to an algorithm of the machine learning prediction model.
Optionally, in the method, the weight of each rule among the at least one rule is set based on an updated value of a rule description feature coefficient of the machine learning prediction model.
Optionally, in the method, the rule learning engine is based on a Markov logic network.
Optionally, before step (B), the method further comprises: (G) obtaining the at least one rule regarding the prediction target.
According to another exemplary embodiment of the present invention, there is provided a system for machine learning in conjunction with rules, including: data record obtaining means for obtaining a data record, wherein the data record includes a plurality of attribute information; rule-related feature generation means for applying at least one rule regarding the prediction target to the plurality of attribute information to generate rule-related features of the data record; prediction sample generation means for forming a prediction sample based at least on the rule-related features; and machine learning prediction means for generating a machine learning prediction result regarding the prediction target based on the prediction sample using a machine learning prediction model, wherein the machine learning prediction model is trained to provide respective machine learning prediction results for prediction samples.
Optionally, in the system, the rule-related features include rule prediction features and/or rule description features, wherein the rule-related feature generation means generates the rule prediction features based on a rule prediction result obtained by applying the at least one rule to the data record, and/or generates the rule description features based on whether the condition of each rule among the at least one rule holds for the data record.
Optionally, in the system, the rule-related feature generation means generates the rule description feature by multiplying a logical value indicating whether or not a condition of the data record is established for each rule among the at least one rule by a weight of the rule corresponding to the rule description feature, respectively; alternatively, the rule-related feature generation means may take, as the rule description feature, a logical value indicating whether or not a condition of the data record for each of the at least one rule holds, and take, as initial values of rule description feature coefficients of the machine learning prediction model, weights of each of the at least one rule, the weights representing the certainty of the respective rule, respectively.
Optionally, in the system, the weight is set by a human specification and/or by a rule learning engine, wherein the rule learning engine is configured to learn the weight of each rule among the at least one rule based on the rule training samples.
Optionally, in the system, the rule-related feature generation means generates the rule-describing feature by multiplying a logical value indicating whether or not a condition of the data record for each rule among the at least one rule is satisfied by a conclusion value of the rule corresponding to the rule-describing feature, respectively; alternatively, the rule-related feature generation means may take, as the rule description feature, a logical value indicating whether or not the condition of the data record for each of the at least one rule is satisfied, and take, as initial values of rule description feature coefficients of the machine learning prediction model, conclusion values of each of the at least one rule, the conclusion values representing results caused when the condition of the corresponding rule is satisfied, respectively.
Optionally, in the system, the conclusion value is a positive value when the result is positively correlated with the prediction target, and the conclusion value is a negative value when the result is negatively correlated with the prediction target.
Optionally, in the system, the rule-related feature generation means generates each rule description feature by multiplying the logical value, indicating whether the condition of the corresponding rule among the at least one rule holds for the data record, by the product of that rule's weight and conclusion value; alternatively, the rule-related feature generation means takes, as the rule description feature, the logical value indicating whether the condition of each rule among the at least one rule holds for the data record, and takes, as the initial value of the corresponding rule description feature coefficient of the machine learning prediction model, the product of each rule's weight, which represents the certainty of that rule, and its conclusion value, which represents the result caused when the condition of that rule holds.
Alternatively, in the system, an initial value of a rule description feature coefficient of the machine learning prediction model is multiplied by a coefficient correction value set according to an algorithm of the machine learning prediction model.
Optionally, the system further comprises: a fusion device for fusing the machine learning prediction result with the rule prediction result obtained by applying the at least one rule to the data record, to obtain a fused prediction result corresponding to the prediction sample.
Optionally, in the system, the machine learning prediction device further combines the data records and the machine learning prediction results into rule training samples; or the fusion device also combines the data records and the fusion prediction result into a rule training sample.
Optionally, in the system, the weight of each rule among the at least one rule is set based on an updated value of a rule description feature coefficient of the machine learning prediction model.
Optionally, in the system, the rule learning engine is based on a Markov logic network.
Optionally, the system further comprises: rule obtaining means for obtaining the at least one rule regarding the prediction target.
According to another exemplary embodiment of the present invention, there is provided a system for machine learning in conjunction with rules, including: historical data record obtaining means for obtaining a historical data record, wherein the historical data record includes a plurality of attribute information and a label serving as the actual value of the prediction target; rule-related feature generation means for applying at least one rule concerning the prediction target to the plurality of attribute information to generate rule-related features of the historical data record; training sample generation means for forming a training sample based at least on the rule-related features and the label; and machine learning model training means for training a machine learning prediction model based on the training sample, wherein the machine learning prediction model is used to provide machine learning prediction results regarding the prediction target for new data records.
Optionally, in the system, the rule-related features include rule prediction features and/or rule description features, wherein the rule-related feature generation means generates the rule prediction features based on a rule prediction result obtained by applying the at least one rule to the historical data record, and/or generates the rule description features based on whether the condition of each rule among the at least one rule holds for the historical data record.
Optionally, in the system, the rule-related feature generation means generates the rule-describing features by multiplying logical values indicating whether or not the condition of the history data record for each rule among the at least one rule is satisfied by weights of the rules corresponding to the rule-describing features, respectively; alternatively, the rule-related feature generation means may take, as the rule description feature, a logical value indicating whether or not the condition of the history data record for each of the at least one rule is satisfied, and the machine learning model training means may take, as initial values of rule description feature coefficients of the machine learning prediction model, weights of each of the at least one rule, the weights representing the certainty of the corresponding rule, respectively.
Optionally, in the system, the weight is set by a human specification and/or by a rule learning engine, wherein the rule learning engine is configured to learn the weight of each rule among the at least one rule based on the rule training samples.
Optionally, in the system, the rule-related feature generation means generates the rule-describing feature by multiplying a logical value indicating whether or not a condition of the history data record for each rule among the at least one rule is satisfied by a conclusion value of the rule corresponding to the rule-describing feature, respectively; alternatively, the rule-related feature generation means may take, as the rule description feature, a logical value indicating whether or not the condition of the history data record for each of the at least one rule is satisfied, and the machine learning model training means may take, as initial values of rule description feature coefficients of the machine learning prediction model, conclusion values of each of the at least one rule, the conclusion values indicating results caused when the condition of the corresponding rule is satisfied, respectively.
Optionally, in the system, the conclusion value is a positive value when the result is positively correlated with the prediction target, and the conclusion value is a negative value when the result is negatively correlated with the prediction target.
Optionally, in the system, the rule-related feature generation means generates each rule description feature by multiplying the logical value, indicating whether the condition of the corresponding rule among the at least one rule holds for the historical data record, by the product of that rule's weight and conclusion value; alternatively, the rule-related feature generation means takes, as the rule description feature, the logical value indicating whether the condition of each rule among the at least one rule holds for the historical data record, and the machine learning model training means takes, as the initial value of the corresponding rule description feature coefficient of the machine learning prediction model, the product of each rule's weight, which represents the certainty of that rule, and its conclusion value, which represents the result caused when the condition of that rule holds.
Alternatively, in the system, the machine learning model training means multiplies the initial value of the rule description feature coefficient of the machine learning prediction model by a coefficient correction value set according to an algorithm of the machine learning prediction model.
Optionally, in the system, the weight of each rule among the at least one rule is set based on an updated value of a rule description feature coefficient of the machine learning prediction model.
Optionally, in the system, the rule learning engine is based on a Markov logic network.
Optionally, the system further comprises: rule obtaining means for obtaining the at least one rule regarding the prediction target.
According to another exemplary embodiment of the present invention, there is provided a computing device for machine learning in conjunction with rules, comprising a processor and a storage component, wherein the storage component stores a set of computer-executable instructions which, when executed by the processor, perform the following steps: (A) obtaining a data record, wherein the data record comprises a plurality of attribute information; (B) applying at least one rule relating to a prediction target to the plurality of attribute information to generate rule-related features of the data record; (C) forming a prediction sample based at least on the rule-related features; and (D) generating a machine learning prediction result for the prediction target based on the prediction sample using a machine learning prediction model, wherein the machine learning prediction model is trained to provide respective machine learning prediction results for prediction samples.
Optionally, in the computing device, the rule-related features comprise rule prediction features and/or rule description features, wherein in step (B) the rule prediction features are generated based on a rule prediction result obtained by applying the at least one rule to the data record, and/or the rule description features are generated based on whether the condition of each rule among the at least one rule holds for the data record.
Optionally, in the computing device, in step (B), each rule description feature is generated by multiplying the logical value, indicating whether the condition of the corresponding rule among the at least one rule holds for the data record, by the weight of that rule; or, in step (B), the logical value indicating whether the condition of each rule among the at least one rule holds for the data record is taken as the rule description feature, and the weight of each rule among the at least one rule, which represents the certainty of that rule, is taken as the initial value of the corresponding rule description feature coefficient of the machine learning prediction model.
Optionally, in the computing device, the weight is set by human specification and/or by a rule learning engine, wherein the rule learning engine is configured to learn the weight of each rule among the at least one rule based on a rule training sample.
Optionally, in the computing device, in step (B), each rule description feature is generated by multiplying the logical value, indicating whether the condition of the corresponding rule among the at least one rule holds for the data record, by the conclusion value of that rule; or, in step (B), the logical value indicating whether the condition of each rule among the at least one rule holds for the data record is taken as the rule description feature, and the conclusion value of each rule among the at least one rule, which represents the result caused when the condition of that rule holds, is taken as the initial value of the corresponding rule description feature coefficient of the machine learning prediction model.
Optionally, in the computing device, the conclusion value is a positive value when the result is positively correlated with the prediction target, and the conclusion value is a negative value when the result is negatively correlated with the prediction target.
Optionally, in the computing device, in step (B), a rule description feature is generated by multiplying a logical value, indicating whether or not the condition of the data record for each rule among the at least one rule is satisfied, by the product of the weight and the conclusion value of the rule corresponding to the rule description feature, respectively; or, in step (B), a logical value indicating whether or not the condition of the data record for each of the at least one rule is satisfied is taken as a rule description feature, and the product of the weight of each of the at least one rule (the weight representing the certainty of the respective rule) and its conclusion value (the conclusion value representing the result caused when the condition of the respective rule is satisfied) is taken as the initial value of the corresponding rule description feature coefficient of the machine learning prediction model.
Optionally, in the computing device, the initial value of the rule description feature coefficient of the machine learning prediction model is multiplied by a coefficient correction value set according to the algorithm of the machine learning prediction model.
Optionally, in the computing device, when the set of computer-executable instructions is executed by the processor, the following step is further performed: (E) fusing the machine learning prediction result with a rule prediction result obtained for the data record according to the at least one rule, to obtain a fused prediction result corresponding to the prediction sample.
Optionally, in the computing device, when the set of computer-executable instructions is executed by the processor, the following step is further performed: (F) combining the data record and the machine learning prediction result or the fused prediction result into a rule training sample.
Optionally, in the computing device, a weight of each rule among the at least one rule is set based on an updated value of a rule description feature coefficient of the machine learning prediction model.
Optionally, in the computing device, the rule learning engine is based on a Markov logic network.
Optionally, in the computing device, when the set of computer-executable instructions is executed by the processor, before step (B), further performing the steps of: (G) obtaining the at least one rule regarding the predicted objective.
According to another exemplary embodiment of the invention, a computing device for machine learning in conjunction with rules is provided, comprising a processor and a storage component, the storage component having stored therein a set of computer-executable instructions which, when executed by the processor, perform the following steps: (A) acquiring a historical data record, wherein the historical data record comprises a plurality of attribute information and a label serving as the actual value of the prediction target; (B) applying at least one rule relating to the prediction target to the plurality of attribute information to generate rule-related features of the historical data record; (C) forming a training sample based on at least the rule-related features and the label; and (D) training a machine learning prediction model based on the training sample, wherein the machine learning prediction model is used to provide a machine learning prediction result regarding the prediction target for a new data record.
Optionally, in the computing device, the rule-related features include rule prediction features and/or rule description features, wherein in step (B), the rule prediction features are generated based on rule prediction results obtained by the historical data record in accordance with the at least one rule, and/or the rule description features are generated based on whether conditions of the historical data record for each rule of the at least one rule are established.
Optionally, in the computing apparatus, in the step (B), rule description features are generated by multiplying logical values indicating whether or not conditions of the history data record for each rule among the at least one rule are satisfied by weights of the rules corresponding to the rule description features, respectively; or, in step (B), a logical value indicating whether or not a condition of the history data record for each of the at least one rule is satisfied is taken as a rule description feature, and in step (D), weights of each of the at least one rule are taken as initial values of rule description feature coefficients of the machine learning prediction model, respectively, wherein the weights represent certainty of the respective rule.
Optionally, in the computing device, the weight is set by human specification and/or by a rule learning engine, wherein the rule learning engine is configured to learn the weight of each rule among the at least one rule based on a rule training sample.
Optionally, in the computing apparatus, in step (B), generating a rule description feature by multiplying a logical value indicating whether or not a condition of the history data record for each rule among the at least one rule is satisfied by a conclusion value of the rule corresponding to the rule description feature, respectively; or, in step (B), a logical value indicating whether or not the condition of the history data record for each of the at least one rule is satisfied is taken as a rule description feature, and, in step (D), conclusion values of each of the at least one rule, which represent results caused when the condition of the corresponding rule is satisfied, are taken as initial values of rule description feature coefficients of the machine learning prediction model, respectively.
Optionally, in the computing device, the conclusion value is a positive value when the result is positively correlated with the prediction target, and the conclusion value is a negative value when the result is negatively correlated with the prediction target.
Optionally, in the computing device, in step (B), a rule description feature is generated by multiplying a logical value, indicating whether or not the condition of the historical data record for each rule among the at least one rule is satisfied, by the product of the weight and the conclusion value of the rule corresponding to the rule description feature, respectively; or, in step (B), a logical value indicating whether or not the condition of the historical data record for each of the at least one rule is satisfied is taken as a rule description feature, and, in step (D), the product of the weight of each of the at least one rule (the weight representing the certainty of the respective rule) and its conclusion value (the conclusion value representing the result caused when the condition of the respective rule is satisfied) is taken as the initial value of the corresponding rule description feature coefficient of the machine learning prediction model.
Optionally, in the computing device, in step (D), the initial value of the rule description feature coefficient of the machine learning prediction model is multiplied by a coefficient correction value set according to the algorithm of the machine learning prediction model.
Optionally, in the computing device, a weight of each rule among the at least one rule is set based on an updated value of a rule description feature coefficient of the machine learning prediction model.
Optionally, in the computing device, the rule learning engine is based on a Markov logic network.
Optionally, in the computing device, when the set of computer-executable instructions is executed by the processor, the following steps are further performed: (E) obtaining the at least one rule regarding the predicted objective.
In the method and system for machine learning in conjunction with rules according to exemplary embodiments of the present invention, rules can be converted into rule-related features that participate in machine learning, so that the rules are effectively incorporated into the machine learning process, thereby improving the effect of machine learning.
Drawings
These and/or other aspects and advantages of the present invention will become more apparent and more readily appreciated from the following detailed description of the embodiments of the invention, taken in conjunction with the accompanying drawings of which:
FIG. 1 illustrates a block diagram of a system for machine learning in conjunction with rules, according to an exemplary embodiment of the present invention;
FIG. 2 illustrates a block diagram of a system for machine learning in conjunction with rules, according to another exemplary embodiment of the present invention;
FIG. 3 illustrates a flow diagram of a method of machine learning in conjunction with rules, according to an exemplary embodiment of the invention;
FIG. 4 illustrates an example of a Markov logic network according to an exemplary embodiment of the present invention;
FIG. 5 illustrates a flow diagram of a method of machine learning in conjunction with rules, according to another exemplary embodiment of the present invention;
FIG. 6 illustrates a block diagram of a system for machine learning in conjunction with rules, according to another exemplary embodiment of the present invention; and
FIG. 7 illustrates a flow diagram of a method of machine learning in conjunction with rules, according to another exemplary embodiment of the invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, exemplary embodiments thereof will be described in further detail below with reference to the accompanying drawings and detailed description.
In an exemplary embodiment of the present invention, machine learning is performed by: generating rule-related features to be involved in machine learning by applying rules regarding the prediction target to respective attribute information of data records for prediction or training; prediction or training of the machine learning model is performed using machine learning samples (e.g., prediction samples or training samples) that encompass at least rule-related features.
Here, machine learning is a natural product of the development of artificial intelligence research, aimed at improving the performance of a system through computation, using experience. In a computer system, "experience" usually exists in the form of "data", from which a "model" can be generated by a machine learning algorithm; that is, by providing empirical data to a machine learning algorithm, a model can be generated based on that data, and the model provides a corresponding judgment, i.e., a prediction, when faced with a new situation. It should be noted that the present invention places no particular limitation on the specific machine learning algorithm used.
Here, a rule generally refers to a logic rule that is semantically explicit, describes an objective regularity or a domain concept implied by the data distribution, and can be written in the form "if … then …". Formally, a rule takes the form:

⊕ ← f1 ∧ f2 ∧ … ∧ fL

where the part to the right of "←" is called the "rule body" and represents the premise of the rule (i.e., the condition of the rule), and the part to the left is called the "rule head" and represents the result holding when the condition is satisfied (i.e., the conclusion value of the rule). The rule body is a conjunction of logical literals fk, in which the conjunction symbol "∧" denotes "and" and each literal fk is a Boolean expression examining an attribute of an example, such as "(color = black)". L is the number of literals in the rule body, called the length of the rule. The rule head ⊕ is a logical literal that generally indicates the target category or concept determined by the rule, such as "good melon"; the rule head may also indicate a quantitative determination result, such as the specific sweetness of a melon. Such rules are also referred to as "if-then rules".
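As an illustrative sketch (not part of the patent text), a rule of the above form can be represented as a conjunction of Boolean literals over attribute fields, together with a head; all names below are hypothetical.

```python
# Illustrative sketch: an if-then rule whose body is a conjunction of
# Boolean literals f1 ... fL over attribute fields, and whose head is
# the conclusion. All names are hypothetical.

def make_rule(literals, head):
    """literals: list of predicates over a data record (a dict);
    head: the rule's conclusion when every literal holds."""
    def body_holds(record):
        # conjunction: f1 AND f2 AND ... AND fL
        return all(lit(record) for lit in literals)
    return {"body_holds": body_holds, "head": head, "length": len(literals)}

# A rule of length L = 2: head <- f1 AND f2
rule = make_rule(
    [lambda r: r["color"] == "black",    # literal f1
     lambda r: r["root"] == "curled"],   # literal f2
    head="good melon",
)
print(rule["length"])                                            # 2
print(rule["body_holds"]({"color": "black", "root": "curled"}))  # True
print(rule["body_holds"]({"color": "green", "root": "curled"}))  # False
```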
Take rules about the quality of watermelons as an example:
Rule 1: good melon ← (root = curled) ∧ (navel = concave);
Rule 2: ¬good melon ← (texture = blurry).
Rule 1 has a length of 2 and discriminates examples by judging the valuation (assignment) of two logical literals; a sample that satisfies a rule's body is said to be "covered" by the rule. Note that samples covered by rule 1 are good melons, but samples not covered by rule 1 are not necessarily bad melons; by contrast, samples covered by rule 2, i.e., those with blurry texture, are not good melons.
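A small sketch of the coverage semantics above, under the assumption that a sample's attributes are stored in a dict (all names are illustrative):

```python
# Sketch of rule "coverage": a sample is covered by a rule when every
# literal in the rule body holds for that sample. Names illustrative.

def covers(body, record):
    return all(literal(record) for literal in body)

rule1_body = [lambda r: r["root"] == "curled",
              lambda r: r["navel"] == "concave"]     # head: good melon
rule2_body = [lambda r: r["texture"] == "blurry"]    # head: not good melon

melon = {"root": "curled", "navel": "concave", "texture": "clear"}
print(covers(rule1_body, melon))  # True: rule 1 covers it, judged a good melon
print(covers(rule2_body, melon))  # False: rule 2 says nothing about this sample
```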
According to an exemplary embodiment of the present invention, the rule header of a rule may be directly or indirectly related (e.g., positively or negatively related) to a prediction target, and the rule body is then a specific check for the respective attribute information of the data record, accordingly. For example, where the prediction objective involves fraud auditing of credit card transactions, the rule header may indicate the result of a determination as to whether a credit card transaction is suspected of being fraudulent, and the rule body may include a specific verification of attribute information or other relevant information regarding the condition of the credit card transaction. In the case where there are a plurality of rules regarding the predicted target, it may not be necessary to require that the rule header of each rule be directly related to the predicted target, but the plurality of rules may be associated with each other so as to be related to the predicted target as a whole.
Accordingly, in the exemplary embodiment of the present invention, the above-mentioned rules about the prediction target can be effectively integrated into the prediction or training samples of the machine learning model, so as to achieve a better machine learning effect.
A specific scheme of machine learning in conjunction with rules according to an exemplary embodiment of the present invention will be described in detail below with reference to the accompanying drawings.
FIG. 1 illustrates a block diagram of a system for machine learning in conjunction with rules, according to an exemplary embodiment of the invention. Specifically, the system proposes a processing architecture for prediction using a machine learning model based on a prediction sample into which a rule is fused, where the rule and the machine learning model relate to the same or similar prediction target, and an application result of the rule is converted into a prediction sample characteristic. The system shown in fig. 1 may be implemented entirely by a computer program, as a software program, as a dedicated hardware device, or as a combination of software and hardware. Accordingly, each device constituting the system shown in fig. 1 may be a virtual module that realizes the corresponding function only by means of a computer program, may be a general-purpose or dedicated device that realizes the function by means of a hardware structure, or may be a processor or the like on which the corresponding computer program runs. By utilizing the system, the rule about the prediction target can be effectively merged into the prediction sample of the machine learning model, so that a better prediction result is obtained.
As shown in fig. 1, the data record obtaining apparatus 100 is configured to obtain a data record, wherein the data record includes a plurality of attribute information.
In an exemplary embodiment of the present invention, rules regarding the prediction objective are also incorporated in using machine learning techniques to derive some prediction result for the attribute information of the data records.
By way of example, the prediction objective herein may relate to business judgment, e.g., predicting whether fraud is suspected, credit scoring, differentiated pricing, probability of enterprise closure, etc. For example, the data record may include personal information that is filled in when a person applies for a credit card, and accordingly, the predictive target may indicate whether the application relates to fraud or indicates credit card availability that should be approved, etc. Alternatively, the data record may comprise information about the commodity and/or the potential purchaser to be priced differentially, and accordingly the predicted objective may be a personalized estimated price of the commodity for the potential purchaser. Alternatively, the data record may include information relating to the registration and operation of the enterprise, and accordingly, the prediction objective may be the probability of the enterprise closing within a predetermined period of time in the future.
Furthermore, the predicted objective may also relate to behavioral determinations, such as predicted click probability, marketing response rate, recommendation response rate, and the like. It should be noted that the prediction target here may be any content or matter that can be predicted, that is, the exemplary embodiment of the present invention does not impose any particular limitation on the data record (and its attribute information) and the prediction target, and any data record and prediction target that can be predicted by a machine learning technique may be applied to the exemplary embodiment of the present invention.
The data record may be data generated on-line, data generated and stored in advance, or data received from the outside through an input device or a transmission medium. Such data may relate to information about an individual, business, or organization, such as identity, academic calendar, occupation, assets, contact details, liabilities, income, earnings, taxes, and the like. Alternatively, the data may relate to information about business related items, such as transaction amount, transaction partner, subject matter, transaction location, etc. about the sales contract. It should be noted that the attribute information content mentioned in the exemplary embodiments of the present invention may relate to the performance or nature of any object or matter in some respect, and is not limited to defining or describing individuals, objects, organizations, units, organizations, items, events, and so forth.
The data record acquisition device 100 may acquire structured or unstructured data from different sources, such as text data or numerical data. The acquired data records can be used to form machine learning examples and participate in the training/prediction process of machine learning. Such data may originate from within the entity desiring the prediction, e.g., from a bank, business, school, etc. desiring the prediction; such data may also originate from other than the aforementioned entities, such as from data providers, the internet (e.g., social networking sites), mobile operators, APP operators, courier companies, credit agencies, and so forth. Optionally, the internal data and external data described above may be used in combination to form a machine learning sample with more information.
The data may be input to the data record acquisition device 100 through an input device, or automatically generated by the data record acquisition device 100 according to existing data, or may be obtained by the data record acquisition device 100 from a network (e.g., a storage medium (e.g., a data warehouse) on the network), and furthermore, an intermediate data exchange device such as a server may facilitate the data record acquisition device 100 to acquire corresponding data from an external data source. Here, the acquired data may be converted into a format that is easy to handle by a data conversion module such as a text analysis module in the data record acquisition apparatus 100. It should be noted that the data record acquisition apparatus 100 may be configured as various modules composed of software, hardware, and/or firmware, and some or all of these modules may be integrated or cooperate together to accomplish a specific function.
The rule-related feature generating means 200 is configured to apply at least one rule regarding the prediction target to the plurality of attribute information of the data record to generate a rule-related feature of the data record.
As described above, the rules according to the exemplary embodiments of the present invention refer to rules related to a prediction target, that is, the rule headers of the rules are the same as or related to (directly related or indirectly related) the prediction target; accordingly, the rule-related feature generation apparatus 200 is intended to perform rule verification with respect to the attribute information of the data record, for example, to determine whether the attribute information of the data record conforms to the content of the rule body defined by each rule, and the rule-related feature generation apparatus 200 generates at least a part of the sample features of the subsequently input machine learning model based on the result of the rule verification. In this way, rules (e.g., business rules reflecting expert experience) can be effectively fused with the subsequent machine learning process, so as to obtain a better prediction result.
Here, as an example, the rule-related feature generating device 200 may convert the rule verification result of the data record into the feature of the machine learning sample corresponding to the data record in a suitable manner, where the rule verification result may indicate an individual judgment result of the data record for each rule, or may indicate an integrated judgment result of the data record under multiple rules, for example, the rule verification result may be a rule estimation result of the data record under all rules. Accordingly, the rule-related feature generation apparatus 200 may generate at least a part of features (i.e., rule-related features) of the machine learning sample based on the rule check result, and these rule-related features may serve as all features of the machine learning sample, or these rule-related features may also constitute features of the machine learning sample together with other features (e.g., attribute features generated based on attribute information of the data records).
Specifically, the rule-related feature generation apparatus 200 may obtain the verification result of the data record under the rules. Here, in the case where a rule relates to the attribute information itself of the data record, the rule-related feature generation apparatus 200 may directly apply the rule to the attribute information of the data record; further, if a rule relates to a transformation result of the attribute information (i.e., of an attribute field), such as discretization, field combination, extraction of a partial field value, or rounding, the rule-related feature generation apparatus 200 needs to first perform the transformation corresponding to each rule on the attribute information of the data record and then apply the rule to the transformed attribute information.
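As a hedged sketch of the transformation case just described: when a rule is written against a discretized form of an attribute, the transform runs first and the rule body is checked against the transformed value (the field names and bucket threshold are assumptions):

```python
# Sketch: a rule over a transformed attribute. The raw "amount" field is
# discretized into a bucket first, and the rule condition is checked
# against the bucket rather than the raw value. Illustrative only.

def bucketize(amount):
    return "high" if amount > 10000 else "low"

def rule_condition(record):
    # the rule body refers to the transformed (discretized) attribute
    return bucketize(record["amount"]) == "high"

print(rule_condition({"amount": 20000}))  # True
print(rule_condition({"amount": 3000}))   # False
```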
As described above, the rule verification result may be the verification result of each individual rule, or may be the combined verification result of a plurality of rules or even all rules, wherein the combined verification result based on all rules may be regarded as the rule prediction result of the data record. As an example, the rule-related feature generation apparatus 200 may obtain a prediction result of the data record based on the rules as a whole by means of a Markov logic network.
The rule verification result may be used as a rule-related feature of the data record, and the rule-related feature may include, for example, a rule prediction feature and/or a rule description feature; that is, the rule-related feature may be only the rule prediction feature or only the rule description feature, or may be a combination of the two. The rule-related feature generation apparatus 200 may generate a rule prediction feature based on a rule prediction result obtained for the data record according to the at least one rule; further, the rule-related feature generation apparatus 200 may generate a rule description feature based on whether the condition of the data record for each rule among the at least one rule is established.
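The two kinds of features can be sketched as follows. All names are hypothetical, and the weighted vote below merely stands in for a real combined inference (such as a Markov logic network); it is not the patented method.

```python
# Sketch: turning rule checks on one data record into features.
# rule_description_features: one logical value (0/1) per rule.
# rule_prediction_feature: a combined result over all rules; here a
# simple weighted vote stands in for e.g. Markov-logic-network inference.

def rule_description_features(rules, record):
    return [1.0 if rule["cond"](record) else 0.0 for rule in rules]

def rule_prediction_feature(rules, record):
    # weighted sum of conclusion values over the rules whose condition holds
    return sum(rule["weight"] * rule["conclusion"]
               for rule in rules if rule["cond"](record))

rules = [
    {"cond": lambda r: r["amount"] > 10000, "weight": 0.8, "conclusion": 1.0},
    {"cond": lambda r: r["country"] == r["home"], "weight": 0.5, "conclusion": -1.0},
]
record = {"amount": 20000, "country": "FR", "home": "FR"}
feats = rule_description_features(rules, record)
pred = rule_prediction_feature(rules, record)
print(feats)  # [1.0, 1.0]
print(pred)   # 0.8 - 0.5, i.e. about 0.3
```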
Optionally, the rule-related feature generation apparatus 200 may further consider the respective confidence differences among the rules when applying them to generate rule-related features. In practice, rules are often summarized from previous experience with respect to the prediction target, and their applicability is limited; for example, in the case of telemarketing to bank customers, the number of past marketing rules is often limited and cannot cover all customers, which is also an important reason for applying machine learning to improve marketing accuracy.
As an example, according to an exemplary embodiment of the present invention, the rule-related feature generation apparatus 200 may apply each rule with a corresponding weight set for it, wherein the weight represents the certainty (e.g., confidence) of the corresponding rule. That is, the rule-related feature generation apparatus 200 may apply the at least one rule with the weights set to the plurality of attribute information, so that the rule verification result can reflect the confidence of the corresponding rule. For example, the rule-related feature generation apparatus 200 may generate a rule description feature by multiplying a logical value, indicating whether or not the condition of the data record for each rule among the at least one rule is satisfied, by the weight of the rule corresponding to that rule description feature, respectively. In this way, the limitations of the rules can be relaxed to a certain degree, over-reliance of the machine learning model on the rules is avoided, the drawback of hard decision errors is overcome, and a rule application result that is more accurate as a whole is obtained.
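The weight-multiplication variant just described can be sketched as follows, yielding a softened feature rather than a hard 0/1 judgment (all names are assumptions):

```python
# Sketch: each rule description feature equals the logical value of the
# rule condition multiplied by the rule's weight (its confidence), so a
# low-confidence rule contributes a smaller feature value when it fires.

def weighted_rule_description_features(rules, record):
    return [(1.0 if rule["cond"](record) else 0.0) * rule["weight"]
            for rule in rules]

rules = [
    {"cond": lambda r: r["age"] < 25, "weight": 0.9},
    {"cond": lambda r: r["income"] > 50000, "weight": 0.4},
]
feats = weighted_rule_description_features(rules, {"age": 22, "income": 30000})
print(feats)  # [0.9, 0.0]: rule 1 fires at full weight, rule 2 does not fire
```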
Here, the weight may be set by a human designation and/or by a rule learning engine for learning the weight of the at least one rule based on the rule training samples. Here, a rule training sample refers to a sample of historical data that already has the actual value of the predicted target, which may be used by the rule learning engine to continually learn the weights of the various rules. In addition, the rule training sample is used as a historical true sample and can also be used as a basis for artificially specifying the weight of each rule.
By way of example, the rule learning engine herein may be based on a Markov logic network, where the Markov logic network is not limited to its initial version but also includes variants or equivalents such as Probabilistic Soft Logic. However, it should be noted that the rule learning engine according to an exemplary embodiment of the present invention is not limited to the form of a Markov logic network, but may take any form capable of learning rule weights.
As an example, the setting of the weight may be done by the rule-related feature generation apparatus 200 before the rule is applied, and in particular, the rule-related feature generation apparatus 200 may set the corresponding weight for each of the at least one rule and apply the at least one rule with the weight set to the plurality of attribute information. Here, the rule-related feature generation apparatus 200 may set the weight of each rule in various appropriate manners, for example, the weight of each rule may be learned or updated by a rule learning manner, or may be specified according to manual input by a service person.
For example, the rule-related feature generation apparatus 200 may set a corresponding weight for the at least one rule respectively by human designation and/or by a rule learning engine.
As an example, the rule-related feature generation apparatus 200 may set, in conjunction with the rule training sample set, a corresponding weight for each of the at least one rule through human specification and/or through a rule learning engine. Here, the rule training sample set refers to a set of data samples (i.e., rule training samples) that already have an actual value of a prediction target, wherein the rule training samples are not limited to weights used for obtaining respective rules through machine learning, but may be used in any manner. For example, these rule training samples can be used to help business personnel understand the decision-making effect of each rule, so as to set the corresponding weight; in addition, these rule training samples may also be used to relax rules, for example, using a Markov logic network based rule learning engine to learn the weights of the rules in conjunction with a set of rule training samples. Such a set of rule training samples may be obtained in advance by the rule-related feature generation apparatus 200, and these rule training samples may be derived from the same data records as those of the machine learning prediction model, as an example. Furthermore, new rule training samples may also be supplemented via the machine learning results of the prediction samples, i.e. the data records together with the prediction results of the respective prediction samples are composed into new rule training samples.
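The weight-learning engine is left open above (e.g., a Markov logic network based engine). Purely to make the data flow concrete, the sketch below substitutes a much simpler scheme, estimating each rule's weight as the log-odds of its precision on the rule training samples; this substitution and all names are assumptions, not the patented method.

```python
import math

def learn_rule_weights(rules, samples, eps=1e-6):
    """samples: (record, label) pairs where label is the actual value of
    the prediction target. A rule's weight is the log-odds of it being
    correct on the samples it covers (a crude stand-in for learning
    weights with a Markov logic network)."""
    weights = []
    for rule in rules:
        covered = [(r, y) for r, y in samples if rule["cond"](r)]
        if not covered:
            weights.append(0.0)
            continue
        correct = sum(1 for _, y in covered if y == rule["label"])
        p = (correct + eps) / (len(covered) + 2 * eps)  # smoothed precision
        weights.append(math.log(p / (1.0 - p)))
    return weights

rules = [{"cond": lambda r: r["amount"] > 10000, "label": 1}]
samples = [({"amount": 20000}, 1), ({"amount": 15000}, 1),
           ({"amount": 12000}, 0), ({"amount": 5000}, 0)]
ws = learn_rule_weights(rules, samples)
print(ws)  # about [0.69]: the rule is right on 2 of the 3 samples it covers
```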
Here, as an example, the system shown in fig. 1 may further include a rule obtaining device (not shown) for obtaining the at least one rule regarding the prediction target. Here, the rule obtaining means may output a graphical user interface for inputting the rule, which may include a rule editing interface for manually inputting the rule and/or a selection input type interface for displaying constituent items of the rule header and/or the rule body for manual configuration, as an example. Further, optionally, the selection input-type interface can further include a component for manually setting the weights of the respective rules, such that a business person can manually specify the weights of the respective rules.
It should be noted that the above-described manner of setting the weights is merely an example and is not intended to limit the scope of the exemplary embodiments of the present invention, and the above-described manners may be used alone or in combination.
The above shows an example of applying the weight of the rule to the value of the rule-related feature, but according to the exemplary embodiment of the present invention, the application manner of the rule weight is not limited thereto.
For example, the weights of the rules may be applied to the training process of the machine learning prediction model, in such a way that the training phase of the machine learning prediction model may effectively use the experience of the rules, thereby better learning the machine learning prediction model.
Specifically, the rule-related feature generation apparatus 200 may use, as the rule description feature, a logical value indicating whether or not a condition of the data record is satisfied for each of the at least one rule, and accordingly, weights of each of the at least one rule are used as initial values of rule description feature coefficients of the machine learning prediction model, respectively, where the weights represent the certainty of the respective rule.
Further, in the above example, the weight of each rule among the at least one rule may be set based on an updated value of the rule description feature coefficient of the machine learning prediction model. Here, as an example, in the case where the initial value of a rule description feature coefficient is set to the weight of the corresponding rule, the value of the coefficient is continuously updated during the training of the machine learning prediction model and comes to represent a confidence of the rule newly determined under machine learning, and this confidence may in turn be fed back as the weight of the rule.
Specifically, the rule-related feature generation apparatus 200 may set the weight of each rule among the at least one rule based on an updated value of the corresponding rule description feature coefficient of the machine learning prediction model, wherein the rule description feature coefficients are used to set corresponding weights for the at least one rule, whether those weights were originally specified by a human and/or by a rule learning engine. According to an exemplary embodiment of the present invention, the rule application result may be converted into at least a part of the sample features of the machine learning prediction model, i.e., the rule description features; accordingly, the coefficients related to the rule description features in the machine learning prediction model may in turn influence the setting of the rule weights. For example, the coefficients may be fed back as the corresponding rule weights of a Markov logic network, and the rule weights to be applied to the attribute information or the model may then be iterated based on the continuously fed-back coefficients. In this way, the machine learning system and the rule system can interact in terms of weights, thereby iterating toward a better model.
Further, it should be noted that, according to exemplary embodiments of the present invention, the result of a rule may relate not only to the result value of a classification (e.g., binary) decision, but also to a quantized conclusion value. That is, as an example, the result of a rule may be a classification result taking the value "0" or "1", or may be a conclusion value taking any real number (positive or negative). In particular, the conclusion value is positive when the result is positively correlated with the prediction target and negative when it is negatively correlated. Similarly, the application of the rule result is not limited to serving as the value of a rule-related feature; the rule result can also be applied to the training process of the machine learning prediction model, so that the training phase can effectively draw on the experience embodied in the rules, thereby training a better machine learning prediction model.
Specifically, the rule-related feature generation apparatus 200 may use, as the rule description features, logical values indicating, for each rule among the at least one rule, whether the condition of that rule holds for the data record; accordingly, the conclusion value of each rule among the at least one rule, which indicates the result caused when the condition of the corresponding rule holds, is used as the initial value of the corresponding rule description feature coefficient of the machine learning prediction model.
It should be noted that the above-described ways of applying the rule weights and the rule results may also be combined such that the common influence of the rule weights and the rule results is reflected in the respective coefficients of the rule-related features or the machine learning model. As an example, the rule-related feature generation apparatus 200 may generate the rule-describing feature by multiplying a logical value indicating whether or not the condition of the data record for each rule among the at least one rule is satisfied by a product of the weight of the rule corresponding to the rule-describing feature and the conclusion value, respectively.
Alternatively, as another example, the rule-related feature generation apparatus 200 may use, as the rule description feature, a logical value indicating whether or not a condition of the data record is satisfied for each of the at least one rule, and accordingly, products of the weights and conclusion values of each of the at least one rule are used as initial values of rule description feature coefficients of the machine learning prediction model, respectively.
It should be noted that when the rule weights and/or rule conclusion values are applied to the rule description features or the corresponding model feature coefficient initial values, their numerical ranges may be appropriately adjusted to better suit the machine learning model.
In addition, as an alternative, the initial values may be further adjusted to better embody the characteristics of the algorithm or the expert experience, on the basis of setting the rule weights and/or the rule conclusion values as the initial values of the corresponding model feature coefficients. Specifically, the initial value of the rule description feature coefficient of the machine learning prediction model may be further multiplied by a coefficient correction value set according to the algorithm of the machine learning prediction model.
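The idea of seeding a linear model's rule description feature coefficients with rule weights, and then letting training update them, can be illustrated with a minimal sketch (not the patent's implementation): a logistic regression trained by plain gradient descent, whose coefficients start from assumed rule weights. All data, weights, and the learning rate below are illustrative assumptions.

```python
import math

def train_logistic(samples, labels, init_coef, lr=0.5, epochs=500):
    """Gradient-descent logistic regression whose coefficients start
    from the given initial values (e.g., the rule weights)."""
    w = list(init_coef)
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            # update each coefficient toward reducing the log loss
            for j in range(len(w)):
                w[j] += lr * (y - p) * x[j]
    return w

# Two rule description features (e.g., rules 3 and 4); their coefficients
# are initialized with the assumed rule weights 0.8 and 0.3.
samples = [[1, 0], [0, 1], [1, 0], [0, 0]]
labels = [1, 0, 1, 0]  # in this toy data, rule 3 is reliable, rule 4 is not
coefs = train_logistic(samples, labels, init_coef=[0.8, 0.3])
# Training raises the coefficient of the reliable rule and lowers the
# other; the updated values can be fed back as new rule weights.
```

The feedback loop described in the text corresponds to reading `coefs` after (or during) training and handing the values back to whatever component sets the rule weights.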
The prediction sample generation apparatus 300 is arranged to form prediction samples based at least on the rule-related features. Here, as an example, the prediction sample generation apparatus 300 may generate the prediction sample so as to cover only the rule-related features generated by the rule-related feature generation apparatus 200. Alternatively, the prediction sample generation apparatus 300 may generate the prediction sample by combining the rule-related features with other features, where the other features may be attribute features generated based on the attribute information of the data record. Here, an attribute feature may be the attribute information itself, or may be a result obtained by subjecting the attribute information (i.e., the attribute fields) to feature processing, for example, various feature engineering processes such as discretization, field combination, extraction of partial field values, rounding, and the like.
The machine learning prediction apparatus 400 is configured to generate a machine learning prediction result regarding a prediction target based on prediction samples using a machine learning prediction model trained to provide respective machine learning prediction results for the prediction samples.
Specifically, after obtaining a prediction sample that incorporates the rule check results of the data record, the machine learning prediction apparatus 400 may provide a machine learning prediction result regarding the prediction target using a machine learning model (i.e., a machine learning prediction model) previously trained based on a machine learning technique. Here, the machine learning prediction model is trained based on a machine learning algorithm; specifically, the machine learning prediction model may be trained based on a specific machine learning algorithm using a large amount of historical data as training samples, where the features of the training samples are the same as those of the prediction samples and the corresponding actual values of the prediction target are used as the labels of the training samples.
Accordingly, when a new prediction sample arrives, the machine learning prediction apparatus 400 may input the features of the prediction sample to the machine learning prediction model, i.e., may obtain the prediction result of the new prediction sample with respect to the prediction target.
It can be seen that, according to an exemplary embodiment of the present invention, the machine learning prediction apparatus 400 may use a prediction model trained based on an arbitrary machine learning algorithm, because the rule application result is converted into a feature in a prediction sample, ensuring the independence of the original machine learning algorithm, and accordingly, the system shown in fig. 1 may be understood as a general machine learning system, which not only effectively introduces rule judgment, but also does not need to change the original machine learning algorithm.
Further, as an example, the machine learning prediction apparatus 400 may also combine the data records and the machine learning prediction results into rule training samples. As described above, in the exemplary embodiment of the present invention, the rule training sample set may be combined to set the corresponding weight for each rule, and the above operations may be performed by the rule-related feature generation apparatus 200 or other apparatuses. Accordingly, by composing the data records together with the prediction results of the respective prediction samples into new rule training samples, the rule weights can be adjusted based on the prediction results of machine learning to better overcome the limitations of the rules themselves. For this purpose, the machine learning prediction apparatus 400 may use the machine learning prediction result as a label of the new rule training sample, and the label and the data record may be combined into a complete rule training sample, and accordingly, the machine learning prediction apparatus 400 may provide the combined rule training sample to the rule-related feature generating apparatus 200.
Furthermore, the machine learning prediction apparatus 400 may further extract rule description feature coefficients of the machine learning prediction model, wherein the rule description feature coefficients are used for setting respective weights for the at least one rule through human specification and/or through a rule learning engine. In this way, the machine learning prediction apparatus 400 can extract the weight of the relevant rule obtained by the machine learning manner, that is, the rule description feature coefficient, and supply the extracted coefficient to the apparatus for setting the rule weight (for example, the rule-related feature generation apparatus 200 or other apparatus) to update the previously set rule weight. It can be seen that the machine learning system and the rule system influence each other in the aspect of rule coefficients, which is helpful for obtaining a better estimation result.
It should be understood that the above-described devices shown in fig. 1 may each be configured as software, hardware, firmware, or any combination thereof that performs a particular function. These means may correspond, for example, to an application-specific integrated circuit, to pure software code, or to a combination of software and hardware elements or modules. Further, one or more functions implemented by these apparatuses may also be collectively performed by components in a physical entity device (e.g., a processor, a client, a server, or the like).
In addition, to further integrate the rules with the machine learning prediction result, as an optional approach, on the basis of the machine learning prediction result obtained from the prediction sample into which the rule-related features have been merged, the rule prediction result given by the rules as a whole can be further fused in, yielding a final prediction result.
FIG. 2 illustrates a block diagram of a system for machine learning in conjunction with rules, according to another exemplary embodiment of the present invention. Here, the data record obtaining device 100, the rule-related feature generating device 200, the prediction sample generating device 300, and the machine learning predicting device 400 in the system shown in fig. 2 may perform similar operations to those described above in fig. 1, and will not be described again here. In addition, the system shown in fig. 2 further includes a rule prediction device 500 and a fusion device 600.
Here, the rule prediction device 500 is configured to obtain a rule prediction result of the data record according to the at least one rule, and accordingly, the fusion device 600 is configured to fuse the machine learning prediction result and the rule prediction result to obtain a fusion prediction result corresponding to the prediction sample.
Specifically, the rule prediction apparatus 500 may obtain the rule prediction result of the data record under the rules as a whole. As an example, the rule prediction apparatus 500 may include a rule learning engine that can learn weights for the respective rules based on a Markov logic network; accordingly, the rule prediction apparatus 500 applies the rules with the weights set to the data record, thereby obtaining the rule prediction result of the data record.
It can be seen that, as an example, the rule-related features may also include the rule prediction result, in which case the rule prediction apparatus 500 may be provided within the rule-related feature generation apparatus 200, or the rule-related feature generation apparatus 200 may implement the function of the rule prediction apparatus 500, so that the rule prediction apparatus 500 need not be separately provided in the system. Accordingly, the obtained rule prediction result is not only incorporated as a rule-related feature (specifically, a rule prediction feature) into the prediction sample, but is also to be fused with the machine learning result of the prediction sample.
Further, as another example, the rule-related feature may not include the rule prediction result, in which case a separate rule prediction apparatus 500 is provided as shown in fig. 2.
The fusion device 600 may receive the machine learning prediction result from the machine learning prediction device 400, receive the rule prediction result from the rule prediction device 500 or the rule-related feature generation device 200, and fuse the two results to obtain a fusion prediction result corresponding to the prediction sample.
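The fusion step can be as simple as a convex combination of the two scores. The following sketch assumes a fixed fusion coefficient `alpha`; in practice the coefficient, and indeed the whole fusion strategy, could be chosen differently.

```python
def fuse(ml_pred: float, rule_pred: float, alpha: float = 0.7) -> float:
    """Weighted average of the machine learning prediction result and the
    rule prediction result; alpha is an assumed fusion coefficient."""
    return alpha * ml_pred + (1 - alpha) * rule_pred

# e.g., an ML score of 0.9 and a rule score of 0.4 fuse to 0.75
fused = fuse(0.9, 0.4)
```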
Furthermore, as an alternative, the above-mentioned fusion prediction result may also be used to form a new rule training sample, and specifically, the fusion apparatus 600 may combine the data record and the fusion prediction result into the rule training sample.
A flow chart of a method of machine learning in conjunction with rules according to an exemplary embodiment of the present invention is described below with reference to fig. 3. Here, the method shown in fig. 3 may be performed by the system shown in fig. 1, may be implemented entirely in software by a computer program, and may be performed by a specifically configured computing device, as an example. For convenience of description, it is assumed that the method shown in fig. 3 is performed by the system shown in fig. 1.
As shown, in step S100, a data record is acquired by the data record acquisition apparatus 100, wherein the data record includes a plurality of attribute information.
Here, as an example, each acquired data record may correspond to an item to be predicted (e.g., an event or an object) for which prediction regarding a prediction target is to be performed, and accordingly, the data record may include various attribute information fields reflecting the performance or nature (i.e., attributes) of the event or object in some respect. These attribute information fields may be filtered or otherwise processed accordingly. Here, the data record acquiring apparatus 100 may collect data manually, semi-automatically or automatically, or process the collected raw data, so that the processed various attribute information may be used as a sample feature for machine learning. As an example, the data record acquisition device 100 may collect data in batches.
Here, the data record obtaining apparatus 100 may receive the data record to be predicted, which is manually input by the user, through an input device (e.g., a workstation). Furthermore, the data record acquisition device 100 may systematically retrieve the data record to be predicted from the data source in a fully automated manner, for example, by systematically requesting the data source and obtaining the requested data from the response via a timer mechanism implemented in software, firmware, hardware, or a combination thereof. The data sources may include one or more databases or other servers. The manner in which the data is obtained in a fully automated manner may be implemented via an internal network and/or an external network, which may include transmitting encrypted data over the internet. Where servers, databases, networks, etc. are configured to communicate with one another, data collection may be automated without human intervention, but it should be noted that certain user input operations may still exist in this manner. The semi-automatic mode is between the manual mode and the full-automatic mode. The semi-automatic mode differs from the fully automatic mode in that a trigger mechanism activated by the user replaces the timer mechanism. In this case, the request for extracting data is generated only in the case where a specific user input is received. Each time data is acquired, the captured data may preferably be stored in non-volatile memory. As an example, a data warehouse may be utilized to store raw data collected during acquisition as well as processed data.
The data records obtained above may originate from the same or different data sources, that is, each data record may also be the result of a concatenation of different data records. For example, in addition to obtaining information data records (which include attribute information fields of income, academic history, post, property condition, and the like) filled by a customer when applying for opening a credit card to a bank, the data record obtaining apparatus 100 may obtain other data records of the customer at the bank, such as loan records, daily transaction data, and the like, and these obtained data records may be spliced into a complete data record. Furthermore, the data record acquisition device 100 may also acquire data originating from other private or public sources, such as data originating from a data provider, data originating from the internet (e.g., social networking sites), data originating from a mobile operator, data originating from an APP operator, data originating from an express company, data originating from a credit agency, and so forth.
Optionally, the data record acquiring apparatus 100 may store and/or process the acquired data by means of a hardware cluster (such as a Hadoop cluster, a Spark cluster, etc.), for example, store, sort, and perform other offline operations. In addition, the data record acquisition device 100 may perform online streaming processing on the acquired data.
As an example, a data conversion module such as a text analysis module may be included in the data record obtaining device 100, and accordingly, in step S100, the data record obtaining device 100 may convert unstructured data such as text into more usable structured data for further processing or reference. Text-based data may include emails, documents, web pages, graphics, spreadsheets, call center logs, transaction reports, and the like.
According to an exemplary embodiment of the present invention, the data record obtaining apparatus 100 may optionally perform feature engineering processing on the obtained data records, that is, processing the attribute information field values of the data records to obtain attribute information of sample features that can be used for rule learning and/or machine learning. For example, the data record acquisition device 100 may perform various feature engineering processes such as discretization, field combination, extracting partial field values, rounding, etc. on the raw attribute information fields of the received data records, thereby converting the raw attribute information field values into attribute information that can be used as rule learning features and/or machine learning features.
In the prior art, rules participate in machine learning mainly by filtering out data that do not conform to the rules, transforming the raw data with the rules, and the like; the rules therefore cannot participate in the training process of the machine learning model, and influence the final prediction result only through early-stage data preprocessing or later-stage correction of the prediction result.
On the other hand, according to the exemplary embodiments of the present invention, the rules can be effectively integrated into the machine learning process while preserving the independence of the machine learning model, which enables the versatility of the machine learning system to be realized using any machine learning model suitable for the prediction target without being limited to a certain specific machine learning model.
Specifically, in step S200, at least one rule regarding the prediction target is applied to the plurality of attribute information of the data record by the rule-related feature generation means 200 to generate a rule-related feature of the data record.
As an example, before step S200, a step of acquiring the at least one rule regarding the prediction target may be further included. As an example, the rules may be received externally using a visual interface, and further, components for a user to configure the rules or rule weights may be exposed in the interface for the user to conveniently set or adjust the rules.
Here, the at least one rule is related to the prediction objective as a whole, wherein each rule may directly or indirectly relate to the prediction objective.
For example, assuming that prediction of spam is targeted, there may be at least two rules:
rule 3: spam ← title contains "promotion";
rule 4: spam ← title contains "contribution".
For another example, assuming that a prediction of cancer is targeted, there may be at least two rules:
rule 5: ∀x Smokes(x) ⇒ Cancer(x);
rule 6: ∀x, y Friends(x, y) ⇒ (Smokes(x) ⇔ Smokes(y)).
According to an exemplary embodiment of the present invention, the above rules may be applied to the acquired data record to obtain the corresponding rule-related features. Here, assuming that the set of acquired data records is X, each data record may have d pieces of attribute information; that is, the i-th data record x_i ∈ X can be expressed as x_i = (x_{i,1}; x_{i,2}; …; x_{i,d}), where i and d are positive integers. Accordingly, a rule-related feature may be a verification result obtained by applying the rules to at least a part of the attribute information, where the verification result may correspond to a single rule or a plurality of rules, and may, as an example, be a rule prediction result corresponding to the rules as a whole.
As an example, the rule-related features may be obtained by determining whether the condition of each rule holds; for example, a logical value indicating whether the condition of each rule among the at least one rule holds for the data record may be used as the rule-related feature. In particular, in the above example of predicting spam, for the i-th data record x_i, corresponding rule-related features (R_{i,3}; R_{i,4}) may be generated, where R_{i,3} may indicate whether the condition of rule 3 holds and R_{i,4} may indicate whether the condition of rule 4 holds. For example, when the title information of the i-th data record x_i contains "promotion", that is, when the rule body of rule 3 holds, R_{i,3} may take the value 1; when the title information does not contain "promotion", R_{i,3} may take the value 0. Similarly, when the title information of the i-th data record x_i contains "contribution", that is, when the rule body of rule 4 holds, R_{i,4} may take the value 1; when the title information does not contain "contribution", R_{i,4} may take the value 0.
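The check of rules 3 and 4 against a record's title can be sketched as follows; the record layout and the field name `title` are assumptions for illustration.

```python
def rule_description_features(record: dict) -> list:
    """Logical-value features (R_i3, R_i4): 1 if the rule body holds, else 0."""
    title = record.get("title", "")
    r3 = 1 if "promotion" in title else 0     # rule 3: title contains "promotion"
    r4 = 1 if "contribution" in title else 0  # rule 4: title contains "contribution"
    return [r3, r4]

features = rule_description_features({"title": "summer promotion"})  # [1, 0]
```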
In practice, rules tend to have uncertainty, e.g., mail whose header contains a "promotion" or "contribution" is not necessarily spam. Uncertainty in the rules themselves can easily lead to large discrepancy between the final prediction and the objective situation, and the continued use of such rules can lead to a continuous deterioration of the prediction.
For this reason, according to an exemplary embodiment of the present invention, in applying the rule to the data record in step S200, the confidence of the rule may be considered, and in particular, the at least one rule with the weight set may be applied to the plurality of attribute information, so that the confidences corresponding to different rules can be distinguished in a subsequent machine learning model. The weights herein may be preset by human designation and/or by a rule learning engine.
Specifically, in step S200, the rule description feature may be generated by multiplying the logical value indicating whether the condition of each rule among the at least one rule holds for the data record by the weight of the corresponding rule. For example, in the example of predicting spam above, rule 3 may be set with a weight of 0.8 and rule 4 with a weight of 0.3; accordingly, for a data record, when the rule body of rule 3 holds, R_{i,3} may take the value 0.8; when the rule body of rule 3 does not hold, R_{i,3} may take the value 0. Similarly, when the rule body of rule 4 holds, R_{i,4} may take the value 0.3; when it does not hold, R_{i,4} may take the value 0.
In addition to using the logical value itself indicating whether the condition of a rule holds as the rule-related feature, the result caused when the condition holds can also be introduced into the rule-related feature; such a rule-related feature can effectively embody the quantitative conclusion value produced when the condition of the rule holds. Specifically, in step S200, the rule description feature may be generated by multiplying the logical value indicating whether the condition of each rule among the at least one rule holds for the data record by the conclusion value of the corresponding rule.
For example, in another example of predicting spam, there may be rules that can judge spam probability, such as:
rule 33: spam with probability 70% ← title contains "promotion";
rule 44: spam with probability 40% ← title contains "contribution".
Accordingly, for the i-th data record x_i, corresponding rule-related features (R_{i,33}; R_{i,44}) may be generated, where R_{i,33} may correspond to the result when the condition of rule 33 holds and R_{i,44} may correspond to the result when the condition of rule 44 holds. For example, when the title information of the i-th data record x_i contains "promotion", that is, when the rule body of rule 33 holds, R_{i,33} may take the value 0.7; when the title information does not contain "promotion", R_{i,33} may take the value 0. Similarly, when the title information of the i-th data record x_i contains "contribution", that is, when the rule body of rule 44 holds, R_{i,44} may take the value 0.4; when the title information does not contain "contribution", R_{i,44} may take the value 0.
In the case where the result values of the rules are applied as above, the weights of the respective rules may be further combined, and in particular, in step S200, the rule description feature may be generated by multiplying the logical value indicating whether the condition of the data record for each rule among the at least one rule is satisfied by the product of the weight of the rule corresponding to the rule description feature and the conclusion value, respectively.
For example, assume that rule 33 is set with a weight of 0.6 and rule 44 with a weight of 0.5. Accordingly, when the title information of the i-th data record x_i contains "promotion", that is, when the rule body of rule 33 holds, R_{i,33} may take the value 0.7 × 0.6 = 0.42; when the title information does not contain "promotion", R_{i,33} may take the value 0. Similarly, when the title information of the i-th data record x_i contains "contribution", that is, when the rule body of rule 44 holds, R_{i,44} may take the value 0.4 × 0.5 = 0.2; when the title information does not contain "contribution", R_{i,44} may take the value 0.
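The three variants above (weight only, conclusion value only, and weight × conclusion value) differ only in the scalar that each logical value is multiplied by. A sketch, with the weights and conclusion values assumed from the running examples:

```python
def weighted_rule_features(record: dict, scalars: list) -> list:
    """Multiply each rule's logical value by a scalar: a weight, a
    conclusion value, or the product of the two."""
    title = record.get("title", "")
    conds = [1 if "promotion" in title else 0,
             1 if "contribution" in title else 0]
    return [c * s for c, s in zip(conds, scalars)]

record = {"title": "big promotion"}
by_weight = weighted_rule_features(record, [0.8, 0.3])              # rules 3 and 4
by_conclusion = weighted_rule_features(record, [0.7, 0.4])          # rules 33 and 44
by_product = weighted_rule_features(record, [0.7 * 0.6, 0.4 * 0.5]) # R_33 ~ 0.42
```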
It should be noted that, when determining a specific feature value of a rule-related feature, the value range of the weight or conclusion value may be appropriately adjusted, so that the feature value can be effectively applied to machine learning operation.
In essence, the rule-related feature may be characterized as a rule-describing feature that may be generated based on whether the data record satisfies a condition for each rule of the at least one rule, may indicate a result of whether the condition of the rule is satisfied, and may further incorporate a weight and/or conclusion value for the rule.
It can be seen that in the above example, for a data record, each rule has a corresponding rule description feature whose value indicates the result of the verification when the rule is applied to the attribute information of the data record. However, exemplary embodiments of the present invention are not limited thereto, and the rule description feature may correspond to a comprehensive result after a plurality of rules are applied. It should be noted that the manner of generating the rule-related feature is not limited to the above-described example, and any manner of generating the related feature by applying the rule to the data record may be applied to the exemplary embodiment of the present invention.
For example, the rule-related feature may also be characterized as a rule prediction feature that is generated based on a rule prediction result obtained by the data record in accordance with the at least one rule. For example, a rule learning engine may be implemented in the system (for example, in the rule-related feature generation apparatus 200), and a prediction result obtained by the data record based on the entire rule may be obtained in step S200, and the prediction result may be used as the rule-related feature.
For example, in the above example of predicting spam, in step S200, a rule learning engine can be used to predict, based on the rules as a whole (optionally together with their corresponding weights) including rule 3, rule 4, and other relevant rules, that a certain email is spam with a probability of 0.4, and the prediction result 0.4 is taken as the rule-related feature P_RUL of the email. It should be noted that the rule prediction feature may serve, together with the rule description features, as the rule-related features of the data record.
According to an exemplary embodiment of the present invention, the rule learning engine may be configured to learn the weight of each rule based on the input rules and corresponding instances (i.e., real historical data records serving as rule training samples), and to give a prediction result (e.g., an estimated probability) for a new data record based on the weighted rules. Here, the input rules may or may not include weights. The rule weights may be initially specified by a human, and the human-specified weights are then continuously updated using, for example, a Markov logic network. In addition, the rule weights may also be derived from the subsequent machine learning model; that is, the coefficients of the rule-related features (e.g., rule description features) in the machine learning model, obtained during or after training, are fed back to the rule learning engine, so that the rule learning part and the machine learning part interact through the weights of the rule features, iterating toward a better-performing model.
The above-described rule learning engine may be obtained based on a Markov logic network (or a variant thereof, e.g., probabilistic soft logic), and may also be constructed in other forms. Specifically, a Markov logic network is a set of pairs, each consisting of a first-order logic formula serving as a rule and its corresponding weight value. The basic idea of Markov logic networks is to relax the hard constraints of first-order logic formulas: the more formulas a world violates, the less probable it becomes, but its probability does not necessarily drop to 0. A Markov logic network can be obtained by instantiating (grounding) the rules based on the rule training sample set, and learning and inference can then be carried out on it.
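A minimal sketch of this idea for a single weighted rule (the rule "Smokes(x) ⇒ Cancer(x)" and its weight 1.5 are assumptions for illustration): the probability of a world is proportional to exp(weight × number of satisfied groundings), so the world that violates the rule is less likely but not impossible.

```python
import itertools
import math

WEIGHT = 1.5  # assumed weight for the rule "Smokes(x) => Cancer(x)"

# Worlds for a single person: all (smokes, cancer) truth assignments.
worlds = list(itertools.product([0, 1], repeat=2))

def n_satisfied(world):
    smokes, cancer = world
    return 1 if (not smokes or cancer) else 0  # does the implication hold?

# P(world) is proportional to exp(WEIGHT * n_satisfied(world))
scores = [math.exp(WEIGHT * n_satisfied(w)) for w in worlds]
z = sum(scores)
probs = {w: s / z for w, s in zip(worlds, scores)}
# The only world violating the rule, (smokes=1, cancer=0), gets the
# smallest probability, yet it remains nonzero: the constraint is soft.
```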
Figure 4 illustrates an example of a Markov logic network according to an exemplary embodiment of the present invention. In the example shown in fig. 4, the prediction target is whether a person develops cancer, and there are accordingly two rules:
rule 5: Smokes(x) ⇒ Cancer(x)
rule 6: Friends(x, y) ⇒ (Smokes(x) ⇔ Smokes(y))
The instantiation of the above rules for two persons A and B is shown in fig. 4. Those skilled in the art will understand that, over the set of possible worlds composed of the instances related to the above rules, the weight corresponding to each rule, as well as the prediction result based on the rules as a whole, can be learned.
It can be seen that, according to the exemplary embodiment of the present invention, the rule weights learned by the Markov logic network and the predictions derived from them can be applied to general machine learning problems (e.g., classification problems), so that, in combination with general machine learning methods, the solution no longer relies on the Markov random field as a whole and is not limited to statistical relational learning and inference problems.
Referring again to fig. 3, in step S300, a prediction sample is formed by the prediction sample generation apparatus 300 based on at least the rule-related features. In this way, the raw data records may be expanded to accommodate sample features associated with the rules, thereby enabling the rules to effectively participate directly in the prediction process of the model.
Here, the prediction sample covers at least the rule-related feature, and the prediction sample may further include other features, for example, an attribute feature obtained based on the attribute information. As an example, the prediction sample generation apparatus 300 may generate the prediction sample by concatenating the rule-related feature with the other attribute feature.
For example, suppose that a data record x_i = (x_{i,1}; x_{i,2}; …; x_{i,d}) is checked against m rules (where m is a positive integer), yielding m rule-related features (r_{i,1}; r_{i,2}; …; r_{i,m}). The value of each rule-related feature depends on the outcome of applying the corresponding rule to the data record: for example, when the rule body of the corresponding rule holds, the feature may take the value 1, the weight of the rule, the conclusion value of the rule, the product of the weight and the conclusion value, and so on; otherwise, the feature may take the value 0. Accordingly, the prediction sample generation device 300 may generate, based on the data record, the prediction sample y_i = (x_{i,1}; x_{i,2}; …; x_{i,d}; r_{i,1}; r_{i,2}; …; r_{i,m}).
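The concatenation just described can be sketched in a few lines. This is a minimal illustration using the 0/1 encoding (1 when the rule body holds, 0 otherwise), which is only one of the encodings the text lists; the attribute values and rule bodies are made up.

```python
# Form a prediction sample y_i by appending the m rule description
# features to the original attribute features of the data record.

def make_prediction_sample(x, rules):
    """x: list of attribute values; rules: list of predicates over x."""
    rule_features = [1 if rule(x) else 0 for rule in rules]  # 0/1 encoding
    return x + rule_features

x_i = [5.0, 0.3, 12]        # (x_i,1; x_i,2; x_i,3)
rules = [
    lambda x: x[0] > 1.0,   # hypothetical rule body 1
    lambda x: x[2] < 10,    # hypothetical rule body 2
]
y_i = make_prediction_sample(x_i, rules)  # [5.0, 0.3, 12, 1, 0]
```

Replacing the `1` above with the rule's weight, conclusion value, or their product yields the other encodings mentioned in the text.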
Also for example, suppose that a data record x_i = (x_{i,1}; x_{i,2}; …; x_{i,d}) is examined comprehensively against the m rules to obtain a prediction result for the prediction target, and this prediction result is used as the rule-related feature P_RUL,i. Accordingly, the prediction sample generation device 300 may generate, based on the data record, the prediction sample y_i = (x_{i,1}; x_{i,2}; …; x_{i,d}; P_RUL,i).
Alternatively, both the rule description features and the rule prediction feature may be used simultaneously as rule-related features; accordingly, the prediction sample generation apparatus 300 may generate, based on the data record x_i, the prediction sample y_i = (x_{i,1}; x_{i,2}; …; x_{i,d}; r_{i,1}; r_{i,2}; …; r_{i,m}; P_RUL,i).
It should be noted that the prediction sample generation apparatus 300 is not limited to concatenating the rule-related feature with other features when generating the prediction sample, and may employ various appropriate feature processing methods (e.g., feature combinations, etc.).
Next, in step S400, a machine learning prediction result regarding the prediction target is generated by the machine learning prediction apparatus 400 based on the prediction samples using a machine learning prediction model trained to provide respective machine learning prediction results for the prediction samples.
Specifically, the machine learning prediction apparatus 400 may input the prediction sample into the machine learning prediction model to obtain a machine learning prediction result about the prediction target. The machine learning prediction model described herein may be any machine learning model suitable for the original data records; for example, if the original data records are suited to a support vector machine, logistic regression, or the like, the machine learning prediction model may adopt the same algorithm, since the introduction of rule-related features imposes no limitation on the choice.
As an example, the machine learning prediction model may be a pre-trained prediction model. Specifically, for data records that have historically obtained real results about the prediction target (e.g., emails finally confirmed to be spam), the real results are used as labels under supervised learning and the corresponding prediction sample features are used as training sample features, so that a machine learning prediction model that predicts based on prediction samples combined with rule-related features can be trained.
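The supervised training step can be illustrated with a small self-contained sketch: labeled historical records, already augmented with a rule feature as described above, are fit with a simple logistic-regression model by gradient descent. Logistic regression stands in for whatever algorithm suits the original records; the data, learning rate, and feature layout are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by plain batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

# Each row: one attribute feature plus one 0/1 rule description feature
# (last column); labels y are the historically confirmed real results.
X = np.array([[0.1, 1.0],
              [0.2, 1.0],
              [0.9, 0.0],
              [0.8, 0.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])

w = train_logreg(X, y)
preds = sigmoid(X @ w)  # scores on the training records
```

The learned coefficient for the last column is exactly the kind of rule-feature coefficient that, per the embodiments above, can later be fed back to the rule learning engine as an updated rule weight.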
Here, the machine learning prediction model may be trained in advance by the machine learning prediction apparatus 400 itself. Alternatively, it may be trained in advance by a model training device (not shown) provided in the system shown in fig. 1, or by an external device outside that system; in either case, the machine learning prediction apparatus 400 may receive the trained model from the model training device or the external device.
After obtaining the prediction results regarding the prediction targets in step S400, the machine learning prediction apparatus 400 may store the machine learning prediction results in a corresponding memory for further processing at a later time, or these machine learning prediction results may be transmitted to an external processing apparatus. In addition, the machine learning prediction result can also be displayed to the user through an output device.
According to an exemplary embodiment of the present invention, intermediate or final outcomes of machine learning may be fed back to the rules section. As an example, the method may further comprise the steps of: and combining the data records and the machine learning prediction results into rule training samples. As another example, the method may further comprise the steps of: an updated value of the rule description feature coefficients of the machine learning prediction model is extracted for setting a weight of each rule among the at least one rule.
It can be seen that in an exemplary embodiment of the present invention, the machine learning prediction result can be regarded as a label of the rule training sample (i.e., instance) capable of updating the rule weight, and/or the rule-related feature coefficients of the machine learning model itself can also be fed back to the rule system to directly act on the update of the weight. In this way, the machine learning system and the rule system can interact with each other and iterate over and over to produce a more effective model.
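The coefficient-feedback half of this loop amounts to reading back the trained coefficients that belong to the rule description features and handing them to the rule system as updated weights. The sketch below assumes a linear model whose coefficient vector lays out the attribute features first and the rule features last; that layout, and the numbers, are assumptions for illustration.

```python
# Extract the rule-feature coefficients of a trained linear model so
# they can be fed back to the rule system as updated rule weights.

def extract_rule_weights(coefficients, num_attr_features):
    """Return the slice of model coefficients belonging to the rule features."""
    return list(coefficients[num_attr_features:])

coefs = [0.7, -0.2, 1.9, 0.4]   # hypothetical trained coefficients
rule_weights = extract_rule_weights(coefs, num_attr_features=2)
# rule_weights == [1.9, 0.4]; these would update the weights of the rules
```

The label-feedback half of the loop simply pairs each data record with the model's prediction result to form a new rule training sample (instance).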
According to an exemplary embodiment of the present invention, in addition to fusing rules at the feature level of the predicted samples, machine learning and rules may be further fused at the level of the prediction result. That is, the machine learning prediction result and the rule prediction result may be fused as the final prediction result instead of the machine learning prediction result.
FIG. 5 illustrates a flow diagram of a method of machine learning in conjunction with rules, according to another exemplary embodiment of the invention. Referring to fig. 5, steps S100 to S400 are substantially similar to steps S100 to S400 shown in fig. 3, and will not be described again here.
The method shown in fig. 5 further includes step S600, in which the fusion device 600 fuses the machine learning prediction result and the rule prediction result to obtain a fused prediction result corresponding to the prediction sample. Here, the machine learning prediction result may come from step S400, and the rule prediction result may come from step S200 (in the case where the rule-related features include the rule prediction result) or from another step. That is, in the case where the rule-related features do not include a rule prediction result, the method shown in fig. 5 further includes the step of obtaining a rule prediction result for the data record according to the at least one rule. Accordingly, the machine learning prediction result and the rule prediction result may be merged into a final prediction result in step S600. As an example, assume that, for data record x_i, the rule prediction result is P_RUL,i and the machine learning prediction result is P_ML,i; then, in step S600, the fusion device 600 may fuse the two results, for example by weighted averaging, into P = w × P_RUL,i + (1 − w) × P_ML,i, where 0 ≤ w ≤ 1. The specific value of w may be set as needed. For example, if the rule prediction result is 0.4 and the machine learning prediction result is 0.7 for a certain data record, and the fusion method is simple averaging, the fused result is 0.55.
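The weighted-average fusion of step S600 is direct to express in code. This sketch reproduces the formula from the text; only the function name is an invention of this illustration.

```python
# Fuse the rule prediction P_RUL and the machine learning prediction P_ML
# by weighted averaging: P = w * P_RUL + (1 - w) * P_ML, with 0 <= w <= 1.

def fuse(p_rul, p_ml, w=0.5):
    assert 0.0 <= w <= 1.0
    return w * p_rul + (1.0 - w) * p_ml

# The example from the text: averaging 0.4 and 0.7 gives 0.55.
result = fuse(0.4, 0.7, w=0.5)  # 0.55
```

Setting w = 1 degenerates to a pure rule prediction and w = 0 to a pure machine learning prediction, so w controls how much trust is placed in each part.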
It should be noted that the above-described machine learning prediction result may indicate not only a prediction result obtained in a case where the feature level fuses the rule, but also a simple machine learning prediction result that does not involve the rule.
Further, according to an exemplary embodiment of the present invention, the fusion result may be fed back to the rule processing section for updating the weight of the rule. As an example, the method shown in fig. 5 may further comprise the steps of: and combining the data records and the fusion result into a rule training sample.
It should be noted that the system for machine learning combined with rules described above according to an exemplary embodiment of the present invention may rely entirely on the execution of a computer program to realize the corresponding functions; that is, each device corresponds to a step in the functional architecture of the computer program, so that the whole system can be invoked as a dedicated software package (e.g., a lib library) to realize the corresponding prediction function.
FIG. 6 illustrates a block diagram of a system for machine learning in conjunction with rules, according to another exemplary embodiment of the invention. Here, those skilled in the art will appreciate that the system shown in FIG. 6 is intended to train out a machine learning predictive model according to an exemplary embodiment of the invention, and that the processing performed may correspond to the processing performed by the system shown in FIG. 1. For example, the devices constituting the above system and the operations performed by the devices may have correspondence, and accordingly, with respect to the specific operations of each device in fig. 6, the following description will be made with reference to the system shown in fig. 1, without repeating the relevant details.
Specifically, the system shown in fig. 6 includes: a history data record acquisition device 1000, a rule-related feature generation device 2000, a training sample generation device 3000, and a machine learning model training device 4000.
The history data record acquisition means 1000 is for acquiring a history data record including a plurality of attribute information and a flag as a prediction target actual value. Here, it should be understood that the history data acquisition apparatus 1000 may operate in a similar manner to the data record acquisition apparatus 100 except that it acquires history data that already has a predicted target actual value.
The rule-related feature generating means 2000 is configured to apply at least one rule regarding the predicted target to the plurality of attribute information to generate a rule-related feature of the history data record. Here, it should be understood that the rule-related feature generation apparatus 2000 is intended to generate rule-related features in model training samples, and the specific operation may correspond to the manner in which the rule-related feature generation apparatus 200 of fig. 1 generates rule-related features in model prediction samples.
The training sample generating means 3000 is adapted to form a training sample based on at least the rule-related features and the labels. Here, in form, the training samples further include the labels in the history data records as compared with the prediction samples, and accordingly, it is to be understood that the training sample generating apparatus 3000 may operate in a similar manner to the prediction sample generating apparatus 300 except that label information is further included in the training samples.
The machine learning model training apparatus 4000 is configured to train a machine learning prediction model based on the training samples, wherein the machine learning prediction model is configured to provide a machine learning prediction result regarding the prediction target for the new data record.
As described above, the history data record acquisition means 1000, the rule-related feature generation means 2000, the training sample generation means 3000, and the machine learning model training means 4000 may be similar in specific operation to the data record acquisition means 100, the rule-related feature generation means 200, the prediction sample generation means 300, and the machine learning prediction means 400 shown in fig. 1, so that those skilled in the art can know the corresponding processing details. In addition, the system shown in fig. 6 may also include additional means, such as rule obtaining means, like the system shown in fig. 1, wherein the rule obtaining means is used for obtaining the at least one rule about the prediction target.
As described above, the history data record acquisition means 1000, the rule-related feature generation means 2000, the training sample generation means 3000, and the machine learning model training means 4000 may be integrated with the data record acquisition means 100, the rule-related feature generation means 200, the prediction sample generation means 300, and the machine learning prediction means 400 shown in fig. 1, respectively, so that the integrated means performs corresponding operations in the model training and model prediction stages, respectively. In addition, at least one or all of the devices in the system shown in fig. 6 may be separate from the system shown in fig. 1 and exist as separate parts.
According to an exemplary embodiment of the present invention, in the training process of the machine learning model, a training sample containing rule-related features is used, i.e., the rules are converted into features, thereby participating in the machine learning process more effectively.
A flowchart of a method of machine learning in conjunction with rules according to another exemplary embodiment of the present invention will be described below with reference to fig. 7. Here, the method shown in fig. 7 may be performed by the system shown in fig. 6, may be implemented entirely in software by a computer program, or may be performed by a specifically configured computing device. For convenience of description, it is assumed that the method shown in fig. 7 is performed by the system shown in fig. 6.
Here, those skilled in the art will appreciate that the method illustrated in FIG. 7 is intended to train a machine learning predictive model according to an exemplary embodiment of the present invention, and that the processing performed may correspond to the processing performed in the method illustrated in FIG. 3. Accordingly, with respect to the specific operation of each step in fig. 7, the following description will be made with reference to the method shown in fig. 3, and relevant details are not repeated, and only the technical features not embodied in fig. 3 are described with emphasis.
As shown in the drawing, in step S1000, a history data record including a plurality of attribute information and a flag as a prediction target actual value is acquired by the history data record acquisition means 1000. Here, it should be understood that, in step S1000, the history data record acquisition means 1000 may acquire the history data record in accordance with an operation similar to that performed in step S100 by the data record acquisition means 100, except that it acquires history data that already has a predicted target actual value.
Next, in step S2000, at least one rule regarding the prediction target is applied to the plurality of attribute information by the rule-related feature generation means 2000 to generate a rule-related feature of the history data record. Here, in step S2000, the rule-related feature generation means 2000 may generate the same rule-related feature in accordance with an operation similar to that performed in step S200 by the rule-related feature generation means 200.
In step S3000, a training sample is formed by the training sample generating device 3000 based on at least the rule-related features and the labels. Here, in step S3000, the training sample generating device 3000 may generate training samples according to operations similar to those performed in step S300 by the prediction sample generating device 300, except that the training samples also need to include corresponding labels.
In step S4000, a machine learning prediction model for providing a machine learning prediction result about a prediction target for a new data record is trained based on training samples by the machine learning model training device 4000.
Here, in training the machine learning prediction model, as an option, the rules may be further applied to the coefficients of the model; for example, the weight of a rule and/or the conclusion value of a rule may be applied to the corresponding coefficient. In this way, prior knowledge is effectively leveraged, so that the machine learning model can be learned quickly and accurately.
In particular, for a rule-describing feature in a training sample, the initial value of its corresponding coefficient in the model may be set to correlate with the rule.
As an example, in step S2000, the rule-related feature generation apparatus 2000 may take, as the rule description feature, a logical value indicating whether or not the condition of the history data record for each of the at least one rule is satisfied, and accordingly, in step S4000, the machine learning model training apparatus 4000 may take, as the initial values of the rule description feature coefficients of the machine learning prediction model, the weights of each of the at least one rule, which represent the certainty of the corresponding rule, respectively.
In the above example, each rule of the at least one rule may also be set based on an updated value of the rule description feature coefficient of the machine learning prediction model, thereby enabling both machine learning and the rule to interact, thereby iterating through a better model.
As another example, in step S2000, the rule-related feature generation apparatus 2000 may use, as the rule description feature, a logical value indicating whether or not the condition of the history data record for each of the at least one rule is satisfied, and accordingly, in step S4000, the machine learning model training apparatus 4000 may use, as the initial value of the rule description feature coefficient of the machine learning prediction model, a conclusion value of each of the at least one rule, respectively, where the conclusion value represents a result caused when the condition of the corresponding rule is satisfied.
As still another example, in step S2000, the rule-related feature generation apparatus 2000 may use, as the rule description feature, a logical value indicating whether or not the condition of the history data record for each rule among the at least one rule is satisfied, and accordingly, in step S4000, the machine learning model training apparatus 4000 may use, as the initial values of the rule description feature coefficients of the machine learning prediction model, the products of the weights and conclusion values of each rule among the at least one rule, respectively.
In addition, the initial values of the rule description feature coefficients may be adjusted by a coefficient correction value. Specifically, in training the machine learning prediction model, the method may further include the step of multiplying the initial values of the rule description feature coefficients of the machine learning prediction model by the coefficient correction value, making the training process of the model more effective. Here, the coefficient correction value may be adjusted manually by a programmer or set automatically according to the algorithm of the machine learning prediction model.
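The three initialization options just enumerated (weight, conclusion value, or their product, each optionally scaled by the correction value) can be summarized in one small helper. This is a sketch under the assumption that each rule is represented as a (weight, conclusion value) pair; the numbers are illustrative.

```python
# Compute the initial values of the rule description feature coefficients
# from the rules themselves, per the three options in the text, scaled by
# an optional coefficient correction value.

def initial_rule_coefficients(rules, mode="weight", correction=1.0):
    """rules: list of (weight, conclusion_value) pairs."""
    init = []
    for weight, conclusion in rules:
        if mode == "weight":
            value = weight            # certainty of the rule
        elif mode == "conclusion":
            value = conclusion        # result when the rule's condition holds
        else:                         # "product"
            value = weight * conclusion
        init.append(value * correction)
    return init

rules = [(2.0, 1.0), (0.5, -1.0)]     # hypothetical (weight, conclusion) pairs
coeffs = initial_rule_coefficients(rules, mode="product", correction=0.1)
```

The returned values would seed the model's rule-feature coefficients before training, after which the trained coefficients can be fed back to update the rule weights as described earlier.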
Alternatively, the various means shown in fig. 1, 2 or 6 may be implemented by hardware, software, firmware, middleware, microcode or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor may perform the corresponding operations by reading and executing the corresponding program code or code segments.
Here, the exemplary embodiments of the present invention may also be implemented as a computing device comprising a storage component and a processor, the storage component having stored therein a set of computer-executable instructions that, when executed by the processor, perform the above-described method of machine learning in conjunction with rules.
In particular, the computing devices may be deployed in servers or clients as well as on node devices in a distributed network environment. Further, the computing device may be a PC computer, tablet device, personal digital assistant, smart phone, or other device capable of executing the set of instructions.
The computing device need not be a single computing device, but can be any collection of devices or circuits capable of executing the instructions (or instruction sets), individually or jointly. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the computing device, the processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
Some of the operations described in the above method for machine learning in conjunction with rules may be implemented in software, some of the operations may be implemented in hardware, or a combination of both.
The processor may execute instructions or code stored in one of the memory components, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory component may be integral to the processor, e.g., having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage component may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the storage component.
Further, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the computing device may be connected to each other via a bus and/or a network.
The operations described above with respect to the method of machine learning in conjunction with rules may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operated according to non-exact boundaries.
In particular, as described above, a computing device for machine learning in conjunction with rules according to an exemplary embodiment of the present invention may include a storage component having stored therein a set of computer-executable instructions that, when executed by the processor, perform the steps of: (A) obtaining a data record, wherein the data record comprises a plurality of attribute information; (B) applying at least one rule relating to a predicted target to the plurality of attribute information to generate a rule-related characteristic of the data record; (C) forming prediction samples based at least on the rule-related features; and (D) generating a machine-learned prediction result for the prediction target based on the prediction samples using a machine-learned prediction model, wherein the machine-learned prediction model is trained to provide respective machine-learned prediction results for the prediction samples. It should be noted that the details of the processing of the method for machine learning according to the exemplary embodiment of the present invention in conjunction with the rules have been described above in conjunction with fig. 3 to 5, and the details of the processing when the computing device performs the steps will not be described herein again.
Furthermore, a computing device for machine learning in conjunction with rules according to another exemplary embodiment of the present invention may include a storage component having stored therein a set of computer-executable instructions that, when executed by the processor, perform the steps of: (A) acquiring a historical data record, wherein the historical data record comprises a plurality of attribute information and a mark serving as a prediction target actual value; (B) applying at least one rule relating to a predicted target to the plurality of attribute information to generate a rule-related feature of a historical data record; (C) forming a training sample based on at least the rule-related features and labels; and (D) training a machine learning prediction model based on the training samples, wherein the machine learning model is to provide machine learning prediction results for the prediction objective for the new data records.
While exemplary embodiments of the invention have been described above, it should be understood that the above description is illustrative only and not exhaustive, and that the invention is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the protection scope of the present invention should be subject to the scope of the claims.

Claims (10)

1. A method of machine learning in conjunction with rules, comprising:
(A) obtaining a data record, wherein the data record comprises a plurality of attribute information;
(B) applying at least one rule relating to a predicted target to the plurality of attribute information to generate a rule-related characteristic of the data record;
(C) forming prediction samples based at least on the rule-related features; and
(D) machine learning prediction results are generated for the prediction targets based on the prediction samples using a machine learning prediction model trained to provide respective machine learning prediction results for the prediction samples.
2. The method of claim 1, wherein the rule-related features comprise rule prediction features and/or rule description features, wherein in step (B) rule prediction features are generated based on rule prediction results obtained by the data record in accordance with the at least one rule and/or rule description features are generated based on whether conditions of the data record for each rule of the at least one rule are established.
3. The method of claim 2, wherein, in step (B), the rule description feature is generated by multiplying a logical value indicating whether or not a condition of the data record for each rule among the at least one rule is satisfied by a weight of the rule corresponding to the rule description feature, respectively;
or, in the step (B), a logical value indicating whether or not a condition of the data record for each rule among the at least one rule holds is taken as a rule description feature, and weights of each rule among the at least one rule are taken as initial values of rule description feature coefficients of the machine learning prediction model, respectively,
wherein the weights represent the certainty of the respective rule.
4. The method of claim 2, wherein, in step (B), the rule description feature is generated by multiplying a logical value indicating whether or not a condition of the data record for each rule among the at least one rule is satisfied by a conclusion value of the rule corresponding to the rule description feature, respectively;
or, in the step (B), a logical value indicating whether or not a condition of the data record for each of the at least one rule is satisfied is taken as a rule description feature, and conclusion values of each of the at least one rule are taken as initial values of rule description feature coefficients of the machine learning prediction model, respectively,
wherein the conclusion value represents a result caused when a condition of the corresponding rule is established.
5. The method of claim 2, wherein in step (B), the rule description feature is generated by multiplying a logical value indicating whether or not the condition of the data record for each rule among the at least one rule is satisfied by a product of a weight of the rule corresponding to the rule description feature and a conclusion value, respectively;
or, in the step (B), a logical value indicating whether or not the condition of the data record for each of the at least one rule is satisfied is taken as a rule description feature, and products of weights and conclusion values of each of the at least one rule are taken as initial values of rule description feature coefficients of the machine learning prediction model, respectively,
wherein the weight represents a certainty of the corresponding rule, and the conclusion value represents a result caused when a condition of the corresponding rule is satisfied.
6. A method of machine learning in conjunction with rules, comprising:
(A) acquiring a historical data record, wherein the historical data record comprises a plurality of attribute information and a mark serving as a prediction target actual value;
(B) applying at least one rule relating to a predicted target to the plurality of attribute information to generate a rule-related feature of a historical data record;
(C) forming a training sample based on at least the rule-related features and labels; and
(D) training a machine learning prediction model based on the training samples, wherein the machine learning model is used for providing a machine learning prediction result about a prediction target for the new data record.
7. A system for machine learning in conjunction with rules, comprising:
data record obtaining means for obtaining a data record, wherein the data record includes a plurality of attribute information;
rule-related feature generation means for applying at least one rule concerning the prediction target to the plurality of attribute information to generate a rule-related feature of the data record;
prediction sample generation means for forming prediction samples based at least on said rule-related features; and
machine learning prediction means for generating a machine learning prediction result about the prediction target based on the prediction samples, using a machine learning prediction model trained to provide respective machine learning prediction results for the prediction samples.
8. A system for machine learning in conjunction with rules, comprising:
history data record obtaining means for obtaining a historical data record, wherein the historical data record includes a plurality of attribute information and a label serving as the actual value of the prediction target;
rule-related feature generation means for applying at least one rule concerning the prediction target to the plurality of attribute information to generate a rule-related feature of the historical data record;
training sample generating means for forming a training sample based on at least the rule-related features and the label; and
model training means for training a machine learning prediction model based on the training samples, wherein the machine learning prediction model is used to provide a machine learning prediction result about the prediction target for a new data record.
9. A computing device for machine learning in conjunction with rules, comprising a storage component having stored therein a set of computer-executable instructions that, when executed by a processor, perform the steps of:
(A) obtaining a data record, wherein the data record comprises a plurality of attribute information;
(B) applying at least one rule concerning the prediction target to the plurality of attribute information to generate a rule-related feature of the data record;
(C) forming prediction samples based at least on the rule-related features; and
(D) generating a machine learning prediction result about the prediction target based on the prediction samples, using a machine learning prediction model trained to provide respective machine learning prediction results for the prediction samples.
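The prediction-time flow of claim 9 can be illustrated on its own. In this sketch the rules, field names, and model coefficients are hypothetical stand-ins for a model already trained on rule-related features:

```python
# Hypothetical rules concerning the prediction target (fields are illustrative)
rules = [lambda r: r["amount"] > 100, lambda r: r["age"] < 25]

def form_prediction_sample(record):
    # steps (A)-(C): take the record's attribute information, apply each
    # rule, and assemble the rule-related features into a prediction sample
    return [1.0 if rule(record) else 0.0 for rule in rules]

class TrainedModel:
    """Stand-in for a trained machine learning prediction model."""
    def __init__(self, coeffs, bias):
        self.coeffs, self.bias = coeffs, bias

    def predict(self, sample):
        # step (D): linear score over rule-related features, thresholded
        score = sum(c * x for c, x in zip(self.coeffs, sample)) + self.bias
        return 1 if score > 0 else 0

# illustrative coefficients, as if learned during training
model = TrainedModel(coeffs=[2.0, 0.5], bias=-1.0)
result = model.predict(form_prediction_sample({"amount": 150, "age": 30}))
```

The point of the claim is that the trained model consumes the rule-derived sample at prediction time, so new records pass through the same rule-application step as the training data.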
10. A computing device for machine learning in conjunction with rules, comprising a storage component having stored therein a set of computer-executable instructions that, when executed by a processor, perform the steps of:
(A) acquiring a historical data record, wherein the historical data record comprises a plurality of attribute information and a label serving as the actual value of the prediction target;
(B) applying at least one rule concerning the prediction target to the plurality of attribute information to generate a rule-related feature of the historical data record;
(C) forming a training sample based on at least the rule-related features and the label; and
(D) training a machine learning prediction model based on the training samples, wherein the machine learning prediction model is used to provide machine learning prediction results about the prediction target for new data records.
CN202210203843.6A 2016-08-25 2016-08-25 Method and system for machine learning by combining rules Pending CN114611707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210203843.6A CN114611707A (en) 2016-08-25 2016-08-25 Method and system for machine learning by combining rules

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610728180.4A CN106407999A (en) 2016-08-25 2016-08-25 Rule combined machine learning method and system
CN202210203843.6A CN114611707A (en) 2016-08-25 2016-08-25 Method and system for machine learning by combining rules

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201610728180.4A Division CN106407999A (en) 2016-08-25 2016-08-25 Rule combined machine learning method and system

Publications (1)

Publication Number Publication Date
CN114611707A true CN114611707A (en) 2022-06-10

Family

ID=58005214

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210203843.6A Pending CN114611707A (en) 2016-08-25 2016-08-25 Method and system for machine learning by combining rules
CN201610728180.4A Pending CN106407999A (en) 2016-08-25 2016-08-25 Rule combined machine learning method and system

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201610728180.4A Pending CN106407999A (en) 2016-08-25 2016-08-25 Rule combined machine learning method and system

Country Status (1)

Country Link
CN (2) CN114611707A (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273979B (en) * 2017-06-08 2020-12-01 第四范式(北京)技术有限公司 Method and system for performing machine learning prediction based on service level
CN112990486A (en) * 2017-07-20 2021-06-18 第四范式(北京)技术有限公司 Method and system for generating combined features of machine learning samples
CN111079942B (en) * 2017-08-30 2023-03-24 第四范式(北京)技术有限公司 Distributed system for performing machine learning and method thereof
CN107704871A (en) * 2017-09-08 2018-02-16 第四范式(北京)技术有限公司 Generate the method and system of the assemblage characteristic of machine learning sample
CN107679549A (en) * 2017-09-08 2018-02-09 第四范式(北京)技术有限公司 Generate the method and system of the assemblage characteristic of machine learning sample
CN107679985B (en) * 2017-09-12 2021-01-05 创新先进技术有限公司 Risk feature screening and description message generating method and device and electronic equipment
CN108008942B (en) * 2017-11-16 2020-04-07 第四范式(北京)技术有限公司 Method and system for processing data records
CN107818483B (en) * 2017-11-27 2021-08-24 微梦创科网络科技(中国)有限公司 Network card and ticket recommendation method and system
CN108089440A (en) * 2017-12-06 2018-05-29 北京百度网讯科技有限公司 Energy-saving control method and device
US20190220824A1 (en) * 2018-01-12 2019-07-18 Wei Liu Machine learning systems for matching job candidate resumes with job requirements
US10522143B2 (en) * 2018-02-27 2019-12-31 Microsoft Technology Licensing, Llc Empathetic personal virtual digital assistant
CN110390224B (en) * 2018-04-16 2021-06-25 阿里巴巴(中国)有限公司 Traffic sign recognition method and device
CN108763002A (en) * 2018-05-25 2018-11-06 郑州云海信息技术有限公司 The method and system of cpu fault are predicted based on machine learning
CN109144648B (en) * 2018-08-21 2020-06-23 第四范式(北京)技术有限公司 Method and system for uniformly performing feature extraction
CN109447682A (en) * 2018-09-18 2019-03-08 北京三快在线科技有限公司 Determine method, system, electronic equipment and the storage medium of the business status in shop
CN110188887B (en) * 2018-09-26 2022-11-08 第四范式(北京)技术有限公司 Data management method and device for machine learning
CN110968887B (en) * 2018-09-28 2022-04-05 第四范式(北京)技术有限公司 Method and system for executing machine learning under data privacy protection
CN110060738B (en) * 2019-04-03 2021-10-22 中国人民解放军军事科学院军事医学研究院 Method and system for predicting bacterial protective antigen protein based on machine learning technology
CN110276069B (en) * 2019-05-17 2021-04-02 中国科学院计算技术研究所 Method, system and storage medium for automatically detecting Chinese braille error
CN112101562B (en) * 2019-06-18 2024-01-30 第四范式(北京)技术有限公司 Implementation method and system of machine learning modeling process
CN110837527B (en) * 2019-11-14 2022-03-22 深圳市超算科技开发有限公司 Safe application method and system of machine learning model
CN111274480B (en) * 2020-01-17 2023-04-04 深圳市雅阅科技有限公司 Feature combination method and device for content recommendation
CN111882072B (en) * 2020-07-09 2023-11-14 北京华如科技股份有限公司 Intelligent model automatic course training method for playing chess with rules
US11743272B2 (en) * 2020-08-10 2023-08-29 International Business Machines Corporation Low-latency identification of network-device properties
WO2022077231A1 (en) * 2020-10-13 2022-04-21 Alipay (Hangzhou) Information Technology Co., Ltd. System and method for efficiently training intelligible models
CN112434104B (en) * 2020-12-04 2023-10-20 东北大学 Redundant rule screening method and device for association rule mining

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8793146B2 (en) * 2001-12-31 2014-07-29 Genworth Holdings, Inc. System for rule-based insurance underwriting suitable for use by an automated system
US7836004B2 (en) * 2006-12-11 2010-11-16 International Business Machines Corporation Using data mining algorithms including association rules and tree classifications to discover data rules
CN100470547C (en) * 2007-01-10 2009-03-18 华为技术有限公司 Method, system and device for implementing data mining model conversion and application
CN104331394A (en) * 2014-08-29 2015-02-04 南通大学 Text classification method based on viewpoint
US9690772B2 (en) * 2014-12-15 2017-06-27 Xerox Corporation Category and term polarity mutual annotation for aspect-based sentiment analysis
CN105320960B (en) * 2015-10-14 2022-04-05 北京航空航天大学 Voting-based cross-language subjective and objective emotion classification method

Also Published As

Publication number Publication date
CN106407999A (en) 2017-02-15

Similar Documents

Publication Publication Date Title
CN114611707A (en) Method and system for machine learning by combining rules
CN110070391B (en) Data processing method and device, computer readable medium and electronic equipment
US11348012B2 (en) System and method for forming predictions using event-based sentiment analysis
CN110705719A (en) Method and apparatus for performing automatic machine learning
US11334941B2 (en) Systems and computer-implemented processes for model-based underwriting
US11804302B2 (en) Supervised machine learning-based modeling of sensitivities to potential disruptions
US20230049817A1 (en) Performance-adaptive sampling strategy towards fast and accurate graph neural networks
US20230342797A1 (en) Object processing method based on time and value factors
CN115630221A (en) Terminal application interface display data processing method and device and computer equipment
US20030084014A1 (en) Process of selecting portfolio managers based on automated artificial intelligence techniques
Poluru et al. Applications of Domain-Specific Predictive Analytics Applied to Big Data
CN114638695A (en) Credit evaluation method, device, equipment and medium
Rahaman et al. Bitcoin trading indicator: a machine learning driven real time bitcoin trading indicator for the crypto market
Abdallah et al. Investigating factors impacting customer acceptance of artificial intelligence chatbot: banking sector of Kuwait
CN113762973A (en) Data processing method and device, computer readable medium and electronic equipment
Lukita et al. Predictive and Analytics using Data Mining and Machine Learning for Customer Churn Prediction
EP3493082A1 (en) A method of exploring databases of time-stamped data in order to discover dependencies between the data and predict future trends
CN115330490A (en) Product recommendation method and device, storage medium and equipment
CN114493686A (en) Operation content generation and pushing method and device
WO2019236338A1 (en) Computerized relevance scoring engine for identifying potential investors for a new business entity
Aguirre et al. Predictive analysis for calculating the valuation of the affiliated fund of a private pension system using machine learning techniques and tools
US20230146973A1 (en) Collective intelligence systems, methods, and devices
Yang et al. Value at risk estimation under stochastic volatility models using adaptive PMCMC methods
US20220114518A1 (en) Computer system and computer implemented method
Basha et al. Implementation of Time-Series Analysis: Prediction of Stock Prices using Machine Learning and Deep learning models: A Hybrid Approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination