CN113435212A - Text inference method and device based on rule embedding - Google Patents
Text inference method and device based on rule embedding
- Publication number
- CN113435212A (application CN202110984877.9A)
- Authority
- CN
- China
- Prior art keywords
- network
- text
- input text
- rule
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/041—Abduction
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
A text inference method based on rule embedding performs neural retrieval and inference over the different components of a logic rule using a pre-trained semantic logic network, and supports changes in user requirements and task migration. It further combines the semantic logic network with a neural classification network in a parallel structure and, during network fine-tuning training, constrains the consistency of the two inference results with the Jensen-Shannon divergence, a probability-distribution distance function. The proposed semantic logic network encodes user rules into semantic vectors, so that detection of the logic rules better preserves the semantic information of the text and accommodates linguistic flexibility and textual diversity. The invention also provides a method that integrates user rules into the neural classification network to improve text inference performance: the parallel prediction structure of the neural classification network and the semantic logic network is trained with a joint consistency loss, so that the two networks benefit from each other, and the rule detection results serve as evidence for the text inference.
Description
Technical Field
The invention discloses a text inference method and a text inference device based on rule embedding, and belongs to the technical field of natural language processing.
Background
Public opinion subscription is an important application scenario in the new-media era: a media organization regularly pushes texts, such as internet public opinion or news items of interest, to subscribing users according to their requirements. These user requirements are usually expressed as keyword logic rules that describe the text content the user prefers. The text inference task based on user requirements is to judge whether a given text satisfies a user requirement; this task has important application value in the scenario.
Existing techniques for this inference task fall into two main categories. The first infers from keyword Boolean retrieval: a text is compared with the keyword logic expression defined by the user to find texts matching the expression. Keyword Boolean retrieval, however, is limited: the flexibility of natural language means that texts with the same meaning can take very different surface forms, which degrades the matching result. The second is classification based on deep learning, which performs text category inference with pre-trained word vectors and a neural network trained by supervised learning on a large-scale labeled data set, so that the network understands and infers at the semantic level whether a text meets the user requirement. For example, Chinese patent document CN113076488A, "Method and system for recommending information based on user data", models features of specific sentences in a text that carry user information through preset keywords, based on text representation vectors obtained from a convolutional neural network. Its drawbacks are that it struggles with the topical diversity of user requirements and adapts poorly to changes in those requirements.
Disclosure of Invention
Aiming at the problems in the prior art, the invention discloses a text inference method based on rule embedding.
The invention also discloses a device implementing the text inference method, so as to perform inference processing on texts.
Summary of the invention:
A text inference method based on rule embedding comprises two parts. First, based on a pre-trained semantic logic network, the different components of a logic rule are neurally retrieved and inferred over, supporting changes in user requirements and task migration. Second, the semantic logic network is combined with a neural classification network in a parallel structure, and the consistency of the two inference results is constrained during network fine-tuning training using the Jensen-Shannon divergence, a probability-distribution distance function. Finally, fused inference is performed from the prediction results of the semantic logic network and the neural classification network, and the activation result of the semantic logic network serves as evidence for the text inference result.
The invention provides a semantic logic network that approximates the logical inference process in a neural manner. The process comprises detecting components of the logic rule at different granularities in the text and combining the detection results; the components are items, conjunction rules and disjunction rules. Three independent loss functions verify the containment relationship of the text to these components respectively. To handle the challenge posed by dynamically changing user requirements, the semantic logic network is trained with a pre-training and fine-tuning scheme. The network consists of three modules that perform semantic detection of the items, the conjunction rules and the disjunction rules in the user rules, and text inference is performed by combining their detection results. Texts are obtained from a general Chinese corpus such as Chinese Wikipedia, and a general keyword-set corpus is obtained from a Chinese synonym forest such as the Chinese WordNet. Each module is pre-trained with the general corpus to strengthen the robustness of keyword detection and then fine-tuned on user data, improving adaptability to changing user requirements.
In addition, the invention provides an optional parallel structure that combines a neural classification network with the semantic logic network, and the networks are fine-tuned by joint training to improve inference performance. To combine the neural classification network and the semantic logic network, a Jensen-Shannon loss term is used as a regularizer, and the consistency of the prediction results on the two sides of the parallel structure is constrained during the network fine-tuning stage.
Technical term interpretation:
1. User requirement: also called a user rule. In the present invention, a subscribing user describes their preferences for text content in the form of a logic rule over keyword sets, where keywords are words or phrases. A dynamically changing user requirement means that when the user raises a new concern, the logic expression is changed by adding or deleting keywords.
2. Text inference: for a given user requirement, it is inferred whether the input text meets the requirement.
3. Semantic logic network: refers to a neural network used for semantic detection and inference of input text.
4. Parallel network: the parallel network in the invention comprises a semantic logic network and a neural classification network which are arranged in parallel.
5. Consistency constraint: the Jensen-Shannon divergence (JS distance for short), a probability-distribution distance function, is introduced into the loss function as a regularization term, constraining the inference results on the two sides of the parallel network to be optimized toward consistent probability distributions. The JS distance is a variant of the Kullback-Leibler (KL) divergence that resolves the asymmetry of the KL divergence.
The detailed technical scheme of the invention is as follows:
A text inference method based on rule embedding, the method comprising:
1) Converting the keyword logic expression describing the user requirement into an equivalent disjunctive normal form, wherein the user requirement is a propositional formula P, and the disjunctive normal form of P is:

P = r_1 ∨ r_2 ∨ … ∨ r_n    (1)

In formula (1), n indicates the number of conjunction rules and r_i is the i-th user rule. In the propositional formula P, the connectives are taken from the set {∧, ∨}; an item is a keyword set containing keywords related to a topic or semantics and their synonyms. By the normal-form existence theorem, the propositional formula P can always be converted into an equivalent disjunctive normal form. Each r_i is a conjunction rule formed from keyword-set items, i.e. r_i = t_{i,1} ∧ t_{i,2} ∧ … ∧ t_{i,m_i}, where m_i denotes the number of items in r_i; the set of all conjunction rules that make up the user requirement is written R = {r_1, …, r_n}, i.e. the user rule set, where n is the number of conjunction rules. Disjunctive Normal Form is abbreviated DNF. The DNF offers flexibility in handling changes to the user requirement, since such changes can be accommodated efficiently by adding or deleting conjunction rules. The conversion itself follows the conventional transformation of logic expressions and is not part of the content to be protected by the invention.
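For illustration only, the DNF of a user requirement can be held as nested collections, as in the following Python sketch; the type aliases and the function name matches_boolean are hypothetical, and the matcher shows the plain keyword Boolean matching that the semantic logic network is designed to go beyond.

```python
from typing import List, Set

# An item is a keyword set (a keyword plus its synonyms);
# a conjunction rule r_i is a list of items that must all be satisfied;
# the user requirement P is the disjunction of all conjunction rules.
Item = Set[str]
ConjunctionRule = List[Item]
UserRequirement = List[ConjunctionRule]   # P = r_1 OR r_2 OR ... OR r_n

def matches_boolean(text: str, requirement: UserRequirement) -> bool:
    """Exact string-containment Boolean matching baseline."""
    for rule in requirement:                          # disjunction over rules
        if all(any(kw in text for kw in item)         # conjunction over items
               for item in rule):
            return True
    return False

# Example: "(typhoon OR rainstorm) AND (Shandong OR Jinan)" as one conjunction rule.
requirement = [[{"typhoon", "rainstorm"}, {"Shandong", "Jinan"}]]
print(matches_boolean("A rainstorm hit Jinan yesterday", requirement))  # True
```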
2) Determining whether an input text satisfies the user rules:

Given a text collection X and the user rule set R = {r_1, …, r_n}, an input text x ∈ X is considered. The inferred output at the text level is the probability ŷ that the input text x satisfies the user requirement. The rule-level output is a probability vector p of length n, whose i-th component is the predicted probability that the input text x satisfies user rule r_i; from the values of p, it can be judged which user rules the text x satisfies.
The invention understands the input text x with the semantic logic network and infers whether the user rules corresponding to the user requirement are satisfied:

The semantic logic network performs item detection, conjunction rule detection and disjunctive normal form detection on the input text x in turn, and finally judges whether the input text satisfies the user rules. This step judges whether the input text x satisfies the semantics of the items, the conjunction rules and the disjunction rule; as shown on the right side of FIG. 1, the three modules perform, from bottom to top, item detection, conjunction rule detection and disjunction rule detection respectively.
Preferably, according to the present invention, the text inference method based on rule embedding further includes a neural classification network arranged in parallel with the semantic logic network. The neural classification network is configured to perform category prediction on the input text to obtain the probability that the input text meets the user requirement, i.e. its prediction result.

The input text is processed by the neural classification network and the semantic logic network respectively to obtain a prediction result from each; finally, the consistency of the two prediction results is constrained with the Jensen-Shannon divergence (JS distance for short).
According to the invention, the specific method for performing item detection, conjunction rule detection and disjunctive normal form detection on the input text x in turn comprises the following steps:
2-1) Item detection

Item detection determines whether the input text x contains semantics related to an item t of the disjunctive normal form. The output is recorded as the detection result ŷ_t, representing the probability that the input text x contains item t.

The input text x is converted into a matrix formed by the corresponding pre-trained word vectors. The pre-trained word vectors are Chinese word vectors obtained by training the word2vec algorithm on the Chinese Wikipedia corpus. The matrix composed of the pre-trained word vectors of all words in the input text x is recorded as E ∈ ℝ^(u×d), where ℝ denotes the real number field, u is the truncation length of the input text x, d is the length of the pre-trained word vectors, and each row e_j is the vector of length d corresponding to the j-th word.

The item t is converted into vector form: the vector v_t of item t is the average of the pre-trained word vectors of all keywords in the corresponding keyword set, i.e. v_t = (1/|t|) Σ_{k∈t} e_k, where k is a keyword in the set and e_k is its pre-trained word vector.

The mutual information between item t and every word of the input text x is then computed: the vector v_t and the pre-trained word embedding matrix E of the input text x are multiplied to obtain the interaction vector, recorded as

a = E · v_t    (2)

The input text x is semantically encoded by the encoding network ENC to obtain the text semantic vector h. Different convolutional neural networks may be adopted; a TextCNN structure is preferred as the encoding network ENC, with three convolution kernel sizes of 2×d, 3×d and 4×d, where d is the dimension of the pre-trained word vectors, and 64 kernels of each size.

The text semantic vector h and the interaction vector a are concatenated and reduced in dimension by a multilayer perceptron network MLP to obtain the vector u_t, which expresses the containment relationship of the input text x with respect to item t:

u_t = MLP([h; a])    (3)

The value activated by the sigmoid function is taken as the detected probability that the input text x contains item t, i.e. the inference result ŷ_t, representing the degree to which the input text x satisfies the semantics of the keyword set corresponding to item t:

ŷ_t = σ(W_t · u_t + b_t)    (4)

The semantic logic network thereby predicts the probability that the input text x contains item t; the vector u_t also serves as the input of the next-stage conjunction rule module. σ denotes the sigmoid activation function, and W_t and b_t are network parameters.

A cross-entropy loss function evaluates the difference between the distributions of the inference result ŷ_t and the true result, i.e. the true item label y_t, to obtain the loss

L_T = E_x[ −(1/M) Σ_t ( y_t log ŷ_t + (1 − y_t) log(1 − ŷ_t) ) ] + λ‖Θ_T‖    (5)

where y_t, the true label of the item, is obtained by string matching between the text and the keywords together with synonym expansion; E_x denotes the expectation over training-set samples; M is the number of keyword sets. The training process updates all parameters of the item detection network by minimizing the loss L_T, and λ‖Θ_T‖ denotes norm regularization of the item detection network parameters to avoid overfitting.
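The following PyTorch sketch illustrates the item detection module described above (formulas (2)-(5)). The hidden size, MLP depth and truncation length u are assumptions; the patent fixes only the TextCNN kernel widths (2, 3, 4) and 64 kernels per width.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ItemDetector(nn.Module):
    """Sketch of item detection: interaction vector + TextCNN + MLP + sigmoid."""
    def __init__(self, d: int = 300, u: int = 1000, hidden: int = 128):
        super().__init__()
        # TextCNN-style encoder ENC over the u x d word-vector matrix.
        self.convs = nn.ModuleList(
            [nn.Conv1d(d, 64, kernel_size=k) for k in (2, 3, 4)])
        self.mlp = nn.Sequential(nn.Linear(3 * 64 + u, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, 1)          # sigmoid head, formula (4)

    def forward(self, E: torch.Tensor, v_t: torch.Tensor):
        # E: (batch, u, d) pre-trained word vectors of the input text
        # v_t: (batch, d) item vector, the mean of the keyword-set word vectors
        a = torch.bmm(E, v_t.unsqueeze(-1)).squeeze(-1)          # interaction vector, formula (2)
        h = torch.cat([F.relu(conv(E.transpose(1, 2))).max(dim=-1).values
                       for conv in self.convs], dim=-1)          # text semantic vector h
        u_t = self.mlp(torch.cat([h, a], dim=-1))                # formula (3); reused by the conjunction module
        y_hat = torch.sigmoid(self.head(u_t)).squeeze(-1)        # formula (4)
        return y_hat, u_t

# Item-level training step, formula (5): binary cross entropy against weak labels,
# with optimizer weight decay standing in for the norm regularization term.
detector = ItemDetector()
E, v_t = torch.randn(2, 1000, 300), torch.randn(2, 300)
y_true = torch.tensor([1.0, 0.0])
y_hat, _ = detector(E, v_t)
loss = F.binary_cross_entropy(y_hat, y_true)
```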
2-2) Conjunction rule detection

Conjunction rule detection verifies whether the input text x satisfies the semantics of a conjunction rule r_i.

A conjunction rule embedding network, CNet, is used; the invention verifies that different network structures have the capability of approximating the logical conjunction operation. A conjunction rule r_i comprises a sequence of items t_{i,1}, …, t_{i,m_i}, and the representation vectors obtained when these items are detected form a sequence u_{i,1}, …, u_{i,m_i}. All vectors in the sequence are concatenated as the input and passed through CNet to obtain the representation vector g_i of the conjunction rule; this output vector encodes the containment relationship of the input text with respect to conjunction rule r_i:

g_i = CNet([u_{i,1}; …; u_{i,m_i}])    (6)

Activation by the sigmoid function yields the detection probability of the conjunction rule, as shown in formula (7), where σ denotes the sigmoid activation function, W_r and b_r are network parameters, and ŷ_{r_i} is the probability that the input text contains conjunction rule r_i, i.e. the inference result:

ŷ_{r_i} = σ(W_r · g_i + b_r)    (7)

A cross-entropy loss function measures the difference between the prediction ŷ_{r_i} and the true result, i.e. the true rule label y_{r_i}, giving the loss L_C, where y_{r_i} is obtained by the Boolean conjunction of the labels of the related items and E_x denotes the expectation over training-set samples. The training process updates all parameters of UNet and the conjunction rule detection module by minimizing the loss L_C, and λ‖Θ_C‖ denotes norm regularization of all parameters of UNet and the conjunction rule detection module to avoid overfitting:

L_C = E_x[ −( y_{r_i} log ŷ_{r_i} + (1 − y_{r_i}) log(1 − ŷ_{r_i}) ) ] + λ‖Θ_C‖    (8)
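A minimal PyTorch sketch of the conjunction rule module (formulas (6)-(8)) follows. The MLP realisation and the fixed number of items per rule are assumptions; the patent only requires a structure able to approximate logical conjunction.

```python
import torch
import torch.nn as nn

class ConjunctionRuleDetector(nn.Module):
    """Sketch of CNet: concatenate item vectors, embed the rule, score it."""
    def __init__(self, item_dim: int = 128, n_items: int = 3, rule_dim: int = 64):
        super().__init__()
        self.cnet = nn.Sequential(                        # CNet, formula (6)
            nn.Linear(n_items * item_dim, rule_dim), nn.ReLU())
        self.head = nn.Linear(rule_dim, 1)                # sigmoid head, formula (7)

    def forward(self, item_vectors: torch.Tensor):
        # item_vectors: (batch, n_items, item_dim), the u_t vectors from item detection
        g = self.cnet(item_vectors.flatten(start_dim=1))  # rule representation vector g_i
        y_hat = torch.sigmoid(self.head(g)).squeeze(-1)   # probability the text satisfies r_i
        return y_hat, g

# The rule label used in formula (8) is the Boolean AND of the item labels,
# e.g. y_rule = int(all(item_labels)); training again uses binary cross entropy.
```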
2-3) Disjunctive normal form detection

Disjunctive normal form detection verifies whether the input text x satisfies the complete user rule set, which is equivalent to whether the text satisfies any conjunction rule in the user rule set.

The input is: the conjunction rule representation vector from step 2-2) and the representation vectors of the other associated conjunction rules.

The output is: the predicted probability ŷ that the input text satisfies the user rule set.

The disjunction network is implemented with the max function: the maximum probability among the inference results of step 2-2) is taken as the text inference result, representing the inferred probability that the input text x satisfies the user requirement, where ŷ is the predicted probability that the input text satisfies the user rule set, max denotes taking the maximum probability, and ŷ_{r_i} is the inference result output by the conjunction rule detection module:

ŷ = max_i ŷ_{r_i}    (9)

The loss L_R is computed with a cross-entropy loss function as shown in formula (10), where y is the true label of the input text, i.e. whether the text meets the user requirement as annotated by an expert, and E_x denotes the expectation over training-set samples. The training process updates all parameters of the semantic logic network by minimizing the loss L_R, and λ‖Θ‖ denotes norm regularization of the semantic logic network parameters to avoid overfitting:

L_R = E_x[ −( y log ŷ + (1 − y) log(1 − ŷ) ) ] + λ‖Θ‖    (10)
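A short sketch of the max-based disjunction step and the text-level loss (formulas (9)-(10)), assuming the rule probabilities come from the conjunction rule module:

```python
import torch
import torch.nn.functional as F

def disjunction_inference(rule_probs: torch.Tensor) -> torch.Tensor:
    """Formula (9): the text satisfies the requirement if it satisfies any rule."""
    # rule_probs: (batch, n_rules) probabilities from the conjunction rule module
    return rule_probs.max(dim=-1).values

def text_level_loss(rule_probs: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Formula (10): binary cross entropy against the expert text-level labels."""
    return F.binary_cross_entropy(disjunction_inference(rule_probs), y)

print(disjunction_inference(torch.tensor([[0.98, 0.73, 0.43]])))  # tensor([0.9800])
```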
According to a preferred embodiment of the present invention, the processing method of the neural classification network comprises the following steps:

A semantic vector of the input text is constructed by a text encoding module; the text encoding network used is ENC2, preferably a CNN-, RNN- or BERT-based encoding module. After the semantic representation vector h_c of the input text is obtained through the text encoding module, category prediction is performed on it, as shown in formula (11), where ŷ_c represents the probability, predicted by the neural classification network, that the input text meets the user requirement, i.e. the output text-level label; σ denotes the sigmoid activation function, and W_c and b_c are the network parameters:

ŷ_c = σ(W_c · h_c + b_c)    (11)

A cross-entropy loss function measures the difference between the prediction ŷ_c of the neural classification network and the true result, i.e. the true label y of the input text, as shown in formula (12), giving the loss L_c; all parameters of the neural classification network are updated by minimizing L_c, where y is the true label of the input text, i.e. whether the text meets the user requirement as annotated by an expert, E_x denotes the expectation over training-set samples, and λ‖Θ_c‖ denotes norm regularization of all parameters of the neural classification network to avoid overfitting:

L_c = E_x[ −( y log ŷ_c + (1 − y) log(1 − ŷ_c) ) ] + λ‖Θ_c‖    (12)
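A minimal sketch of the parallel neural classification network (formulas (11)-(12)). Any encoder ENC2 (CNN-, RNN- or BERT-based) can be plugged in; mean pooling over word vectors is used here only as a placeholder encoder.

```python
import torch
import torch.nn as nn

class NeuralClassifier(nn.Module):
    """Sketch of the classification branch: placeholder ENC2 + sigmoid head."""
    def __init__(self, d: int = 300, hidden: int = 128):
        super().__init__()
        self.enc2 = nn.Sequential(nn.Linear(d, hidden), nn.ReLU())  # placeholder ENC2
        self.head = nn.Linear(hidden, 1)

    def forward(self, E: torch.Tensor) -> torch.Tensor:
        # E: (batch, u, d) pre-trained word vectors of the input text
        h_c = self.enc2(E.mean(dim=1))                     # semantic representation vector
        return torch.sigmoid(self.head(h_c)).squeeze(-1)   # formula (11)
```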
3) The input text is processed by the neural classification network and the semantic logic network respectively to obtain a prediction result from each, and finally the consistency of the two prediction results is constrained with the Jensen-Shannon divergence (JS distance for short).

The JS distance measures the similarity between the prediction distributions of the neural classification network and the semantic logic network; the greater their similarity, the smaller the JS distance. Denote the probability distribution output by the neural classification network as P and that output by the semantic logic network as Q; the JS distance between them is computed as

JS(P ‖ Q) = ½ KL(P ‖ (P+Q)/2) + ½ KL(Q ‖ (P+Q)/2)    (13)

KL(·‖·) denotes the Kullback-Leibler (KL) divergence, computed as shown in formula (14); the JS distance is a variant of the KL divergence that resolves the asymmetry of the KL divergence:

KL(P ‖ Q) = Σ_z P(z) log( P(z) / Q(z) )    (14)

The JS distance is taken as a regularization term in the joint loss. The joint loss is computed as in formula (15), combining the neural classification loss L_c, the semantic logic network loss L_R and the JS term, where the hyperparameters weighing the different loss terms take values in (0, 1) and satisfy the stated constraint.

During training of the parallel structure, all parameters of the neural classification network and the semantic logic network are updated by minimizing the joint loss.
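A hedged sketch of the consistency term and the joint loss (formulas (13)-(15)). The weighting with alpha + beta + gamma = 1 is an assumption, since the patent states only that the hyperparameters lie in (0, 1) and satisfy a constraint; each scalar output probability is expanded to the two-point distribution [p, 1-p].

```python
import torch

def js_divergence(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """JS distance between the two branches' output probabilities (formulas (13)-(14))."""
    P = torch.stack([p, 1 - p], dim=-1).clamp_min(eps)
    Q = torch.stack([q, 1 - q], dim=-1).clamp_min(eps)
    M = 0.5 * (P + Q)
    kl = lambda a, b: (a * (a / b).log()).sum(dim=-1)     # formula (14)
    return (0.5 * kl(P, M) + 0.5 * kl(Q, M)).mean()       # formula (13)

def joint_loss(loss_cls, loss_logic, p_cls, p_logic, alpha=0.4, beta=0.4, gamma=0.2):
    """Joint loss in the spirit of formula (15); the weighting scheme is assumed."""
    return alpha * loss_cls + beta * loss_logic + gamma * js_divergence(p_cls, p_logic)
```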
An apparatus for implementing the text inference method is characterized by comprising: a semantic logic network module;
the semantic logic network module is used for: determining whether an input text satisfies a user rule; the semantic logic network module comprises: the device comprises an item detection module, a conjunction rule detection module and a disjunction normal form detection module which are sequentially arranged along the direction of data flow.
According to the preferable embodiment of the present invention, the apparatus for implementing the text inference method further includes a neural classification network module disposed in parallel with the semantic logic network module;
the neural classification network is configured to perform category prediction on the input text to obtain the probability that the input text meets the user requirement, i.e. its prediction result;
the input text is processed by the neural classification network and the semantic logic network respectively to obtain a prediction result from each, and finally the consistency of the two prediction results is constrained with the Jensen-Shannon divergence.
The technical advantages of the invention are as follows:
(1) The semantic logic network provided by the invention encodes the user rules into semantic vectors, so that the semantic information of the text is better retained while the logic rules are detected, supporting linguistic flexibility and textual diversity.
(2) The invention also provides a method that integrates the user rules into the neural classification network to improve text inference performance: the parallel prediction structure of the neural classification network and the semantic logic network is combined with a joint consistency loss, so that the two networks benefit from each other, and the rule detection results serve as evidence for the text inference.
(3) The pre-training-based semantic logic inference provided by the invention better accommodates dynamically changing user requirements. To address the challenge this poses to supervised learning, the invention pre-trains the semantic logic network on massive general corpora such as Chinese Wikipedia and on open-domain linguistic knowledge such as synonym and near-synonym sets extracted from the Chinese WordNet, and then fine-tunes it on specific user data, which strengthens the robustness of keyword detection and helps handle dynamically changing user requirements efficiently.
Drawings
FIG. 1 is a schematic diagram of an apparatus for implementing a rule embedding based text inference method of the present invention;
FIG. 2 is an example of a user requirement decision tree in the embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the following examples and the accompanying drawings of the specification, but is not limited thereto.
Embodiment 1
A text inference method based on rule embedding, the method comprising:
1) Converting the keyword logic expression describing the user requirement into an equivalent disjunctive normal form, wherein the user requirement is a propositional formula P, and the disjunctive normal form of P is:

P = r_1 ∨ r_2 ∨ … ∨ r_n    (1)

In formula (1), n indicates the number of conjunction rules and r_i is the i-th user rule. In the propositional formula P, the connectives are taken from the set {∧, ∨}; an item is a keyword set containing keywords related to a topic or semantics and their synonyms. By the normal-form existence theorem, the propositional formula P can always be converted into an equivalent disjunctive normal form. Each r_i is a conjunction rule formed from keyword-set items, i.e. r_i = t_{i,1} ∧ t_{i,2} ∧ … ∧ t_{i,m_i}, where m_i denotes the number of items in r_i; the set of all conjunction rules that make up the user requirement is written R = {r_1, …, r_n}, i.e. the user rule set, where n is the number of conjunction rules. Disjunctive Normal Form is abbreviated DNF. The DNF offers flexibility in handling changes to the user requirement, since such changes can be accommodated efficiently by adding or deleting conjunction rules.
2) Determining whether an input text satisfies the user rules:

Given a text collection X and the user rule set R = {r_1, …, r_n}, an input text x ∈ X is considered. The inferred output at the text level is the probability ŷ that the input text x satisfies the user requirement. The rule-level output is a probability vector p of length n, whose i-th component is the predicted probability that the input text x satisfies user rule r_i; from the values of p, it can be judged which user rules the text x satisfies.
The invention understands the input text x with the semantic logic network and infers whether the user rules corresponding to the user requirement are satisfied:

The semantic logic network performs item detection, conjunction rule detection and disjunctive normal form detection on the input text x in turn, and finally judges whether the input text satisfies the user rules.
The specific method for performing item detection, conjunction rule detection and disjunctive normal form detection on the input text x in turn comprises the following steps:
2-1) Item detection

Item detection determines whether the input text x contains semantics related to an item t of the disjunctive normal form. The output is recorded as the detection result ŷ_t, representing the probability that the input text x contains item t.

The input text x is converted into a matrix formed by the corresponding pre-trained word vectors. The pre-trained word vectors are Chinese word vectors obtained by training the word2vec algorithm on the Chinese Wikipedia corpus. The matrix composed of the pre-trained word vectors of all words in the input text x is recorded as E ∈ ℝ^(u×d), where ℝ denotes the real number field, u is the truncation length of the input text x, d is the length of the pre-trained word vectors, and each row e_j is the vector of length d corresponding to the j-th word.

The item t is converted into vector form: the vector v_t of item t is the average of the pre-trained word vectors of all keywords in the corresponding keyword set, i.e. v_t = (1/|t|) Σ_{k∈t} e_k, where k is a keyword in the set and e_k is its pre-trained word vector.

The mutual information between item t and every word of the input text x is then computed: the vector v_t and the pre-trained word embedding matrix E of the input text x are multiplied to obtain the interaction vector, recorded as

a = E · v_t    (2)

The input text x is semantically encoded by the encoding network ENC to obtain the text semantic vector h. Different convolutional neural networks may be adopted; a TextCNN structure is preferred as the encoding network ENC, with three convolution kernel sizes of 2×d, 3×d and 4×d, where d is the dimension of the pre-trained word vectors, and 64 kernels of each size.

The text semantic vector h and the interaction vector a are concatenated and reduced in dimension by a multilayer perceptron network MLP to obtain the vector u_t, which expresses the containment relationship of the input text x with respect to item t:

u_t = MLP([h; a])    (3)

The value activated by the sigmoid function is taken as the detected probability that the input text x contains item t, i.e. the inference result ŷ_t, representing the degree to which the input text x satisfies the semantics of the keyword set corresponding to item t:

ŷ_t = σ(W_t · u_t + b_t)    (4)

The semantic logic network thereby predicts the probability that the input text x contains item t; the vector u_t also serves as the input of the next-stage conjunction rule module. σ denotes the sigmoid activation function, and W_t and b_t are network parameters.

A cross-entropy loss function evaluates the difference between the distributions of the inference result ŷ_t and the true result, i.e. the true item label y_t, to obtain the loss

L_T = E_x[ −(1/M) Σ_t ( y_t log ŷ_t + (1 − y_t) log(1 − ŷ_t) ) ] + λ‖Θ_T‖    (5)

where y_t, the true label of the item, is obtained by string matching between the text and the keywords together with synonym expansion; E_x denotes the expectation over training-set samples; M is the number of keyword sets. The training process updates all parameters of the item detection network by minimizing the loss L_T, and λ‖Θ_T‖ denotes norm regularization of the item detection network parameters to avoid overfitting.
2-2) Conjunction rule detection

Conjunction rule detection verifies whether the input text x satisfies the semantics of a conjunction rule r_i.

A conjunction rule embedding network, CNet, is used; the invention verifies that different network structures have the capability of approximating the logical conjunction operation. A conjunction rule r_i comprises a sequence of items t_{i,1}, …, t_{i,m_i}, and the representation vectors obtained when these items are detected form a sequence u_{i,1}, …, u_{i,m_i}. All vectors in the sequence are concatenated as the input and passed through CNet to obtain the representation vector g_i of the conjunction rule; this output vector encodes the containment relationship of the input text with respect to conjunction rule r_i:

g_i = CNet([u_{i,1}; …; u_{i,m_i}])    (6)

Activation by the sigmoid function yields the detection probability of the conjunction rule, as shown in formula (7), where σ denotes the sigmoid activation function, W_r and b_r are network parameters, and ŷ_{r_i} is the probability that the input text contains conjunction rule r_i, i.e. the inference result:

ŷ_{r_i} = σ(W_r · g_i + b_r)    (7)

A cross-entropy loss function measures the difference between the prediction ŷ_{r_i} and the true result, i.e. the true rule label y_{r_i}, giving the loss L_C, where y_{r_i} is obtained by the Boolean conjunction of the labels of the related items and E_x denotes the expectation over training-set samples. The training process updates all parameters of UNet and the conjunction rule detection module by minimizing the loss L_C, and λ‖Θ_C‖ denotes norm regularization of all parameters of UNet and the conjunction rule detection module to avoid overfitting:

L_C = E_x[ −( y_{r_i} log ŷ_{r_i} + (1 − y_{r_i}) log(1 − ŷ_{r_i}) ) ] + λ‖Θ_C‖    (8)
2-3) Disjunctive normal form detection

Disjunctive normal form detection verifies whether the input text x satisfies the complete user rule set, which is equivalent to whether the text satisfies any conjunction rule in the user rule set.

The input is: the conjunction rule representation vector from step 2-2) and the representation vectors of the other associated conjunction rules.

The output is: the predicted probability ŷ that the input text satisfies the user rule set.

The disjunction network is implemented with the max function: the maximum probability among the inference results of step 2-2) is taken as the text inference result, representing the inferred probability that the input text x satisfies the user requirement, where ŷ is the predicted probability that the input text satisfies the user rule set, max denotes taking the maximum probability, and ŷ_{r_i} is the inference result output by the conjunction rule detection module:

ŷ = max_i ŷ_{r_i}    (9)

The loss L_R is computed with a cross-entropy loss function as shown in formula (10), where y is the true label of the input text, i.e. whether the text meets the user requirement as annotated by an expert, and E_x denotes the expectation over training-set samples. The training process updates all parameters of the semantic logic network by minimizing the loss L_R, and λ‖Θ‖ denotes norm regularization of the semantic logic network parameters to avoid overfitting:

L_R = E_x[ −( y log ŷ + (1 − y) log(1 − ŷ) ) ] + λ‖Θ‖    (10)
examples 2,
The text inference method based on rule embedding of Embodiment 1 further comprises a neural classification network arranged in parallel with the semantic logic network. The neural classification network is configured to perform category prediction on the input text to obtain the probability that the input text meets the user requirement, i.e. its prediction result.

The input text is processed by the neural classification network and the semantic logic network respectively to obtain a prediction result from each; finally, the consistency of the two prediction results is constrained with the Jensen-Shannon divergence (JS distance for short).
The processing method of the neural classification network comprises the following steps:

A semantic vector of the input text is constructed by a text encoding module; the text encoding network used is ENC2, preferably a CNN-, RNN- or BERT-based encoding module. After the semantic representation vector h_c of the input text is obtained through the text encoding module, category prediction is performed on it, as shown in formula (11), where ŷ_c represents the probability, predicted by the neural classification network, that the input text meets the user requirement, i.e. the output text-level label; σ denotes the sigmoid activation function, and W_c and b_c are the network parameters:

ŷ_c = σ(W_c · h_c + b_c)    (11)

A cross-entropy loss function measures the difference between the prediction ŷ_c of the neural classification network and the true result, i.e. the true label y of the input text, as shown in formula (12), giving the loss L_c; all parameters of the neural classification network are updated by minimizing L_c, where y is the true label of the input text, i.e. whether the text meets the user requirement as annotated by an expert, E_x denotes the expectation over training-set samples, and λ‖Θ_c‖ denotes norm regularization of all parameters of the neural classification network to avoid overfitting:

L_c = E_x[ −( y log ŷ_c + (1 − y) log(1 − ŷ_c) ) ] + λ‖Θ_c‖    (12)
3) The input text is processed by the neural classification network and the semantic logic network respectively to obtain a prediction result from each, and finally the consistency of the two prediction results is constrained with the Jensen-Shannon divergence (JS distance for short).

The JS distance measures the similarity between the prediction distributions of the neural classification network and the semantic logic network; the greater their similarity, the smaller the JS distance. Denote the probability distribution output by the neural classification network as P and that output by the semantic logic network as Q; the JS distance between them is computed as

JS(P ‖ Q) = ½ KL(P ‖ (P+Q)/2) + ½ KL(Q ‖ (P+Q)/2)    (13)

KL(·‖·) denotes the Kullback-Leibler (KL) divergence, computed as shown in formula (14); the JS distance is a variant of the KL divergence that resolves the asymmetry of the KL divergence:

KL(P ‖ Q) = Σ_z P(z) log( P(z) / Q(z) )    (14)

The JS distance is taken as a regularization term in the joint loss. The joint loss is computed as in formula (15), combining the neural classification loss L_c, the semantic logic network loss L_R and the JS term, where the hyperparameters weighing the different loss terms take values in (0, 1) and satisfy the stated constraint.

During training of the parallel structure, all parameters of the neural classification network and the semantic logic network are updated by minimizing the joint loss.
Embodiment 3
An apparatus for implementing the text inference method according to embodiment 1, comprising: a semantic logic network module;
the semantic logic network module is used for: determining whether an input text satisfies a user rule; the semantic logic network module comprises: the device comprises an item detection module, a conjunction rule detection module and a disjunction normal form detection module which are sequentially arranged along the direction of data flow.
Embodiment 4
On the basis of embodiment 3, the apparatus for implementing the text inference method further includes a neural classification network module disposed in parallel with the semantic logic network module;
the neural classification network is configured to perform category prediction on the input text to obtain the probability that the input text meets the user requirement, i.e. its prediction result;
the input text is processed by the neural classification network and the semantic logic network respectively to obtain a prediction result from each, and finally the consistency of the two prediction results is constrained with the Jensen-Shannon divergence.
Application example
A practical application of the method and apparatus described in Embodiments 1-4 is as follows.
Pre-training a semantic logic network based on the general corpus:
obtaining a universal corpus, comprising: training texts are obtained from a Chinese general corpus such as Chinese Wikipedia, and keyword sets are obtained from a Chinese synonym forest such as Chinese version WordNet.
Item-level and conjunction-rule-level labels are generated automatically for the general corpus, as follows. For item annotation, a text x that contains at least one keyword of an item t receives item label y_t = 1; otherwise y_t = 0. For conjunction rule annotation, keyword sets are combined at random to generate conjunction rules; if the text x simultaneously satisfies all items of a conjunction rule r_i, the conjunction rule label y_{r_i} of the text is 1.
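A small sketch of this weak labeling procedure, assuming keyword sets already include their synonym expansions; the function names are illustrative.

```python
from typing import List, Set

def item_label(text: str, keyword_set: Set[str]) -> int:
    """Item-level weak label: 1 if the text contains at least one keyword."""
    return int(any(kw in text for kw in keyword_set))

def conjunction_label(text: str, rule: List[Set[str]]) -> int:
    """Conjunction-rule weak label: Boolean AND over the rule's item labels."""
    return int(all(item_label(text, item) for item in rule))

rule = [{"earthquake", "seism"}, {"rescue", "relief"}]
print(conjunction_label("earthquake relief work is under way", rule))  # 1
print(conjunction_label("a new museum opened downtown", rule))         # 0
```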
According to step 2) of Embodiment 1 and with reference to FIG. 1, the item detection module and the conjunction rule detection module are pre-trained with the general corpus, specifically as follows:

The general corpus texts x and the general keyword sets are input to train the item detection module.

The input tokens are converted into the corresponding pre-trained word vectors in the embedding layer of the model. For a keyword set to be detected, its vector is the average of the pre-trained word vectors of all words in the set; because synonyms occupy adjacent positions in the embedding space, the average vector captures their common semantic features. For place-name sets, on the other hand, the top-level prefecture word of the geographic region is used as a proxy word, because prefecture-level place names all imply that the event occurred in that region.

The item detection module outputs the probability ŷ_t corresponding to formula (4); with the true label y_t from the annotation above, the loss of formula (5) is computed and back-propagated to update the parameters of the item detection network, iterating until the improvement of the validation-set accuracy falls below a threshold.
According to step 2-2) of Embodiment 1, a conjunction rule detection module is added to UNet, and the two modules are trained with the general corpus, specifically as follows:

The text x and an item t are input to UNet to obtain the output vector u_t; the vectors corresponding to all items contained in a conjunction rule are concatenated and input to CNet to obtain the detection probability of the conjunction rule, corresponding to formula (7). All conjunction rules are detected in sequence.

The loss L_C is computed from the prediction ŷ_{r_i} and the label y_{r_i}, corresponding to formula (8), discarding the prediction part of the item detection module; the loss is back-propagated to update the parameters of UNet and CNet.
The network is then fine-tuned on user data, specifically as follows:
acquiring user requirements in the form of logic rules:
A subscribing user follows emergencies in a specific region, including public security incidents, natural disasters and the like. The user's requirement is depicted as a decision tree in FIG. 2, where white nodes represent logical OR operations and black nodes represent logical AND operations.

For a target text, Boolean predicate values are evaluated at the leaf nodes and passed up to the root node; examples of the keyword sets in FIG. 2 are shown in Table 1.
TABLE 1 keyword set example of subscribed users
The logic rules corresponding to the requirement decision tree of the subscribing user are written according to step 1) of embodiment 1, and propositional formulas and disjunctive normal forms equivalent to the decision tree are shown in table 2.
TABLE 2 logic rules for subscribing users
Fine tuning the semantic logic network using user samples and rules:
The sample set comprises texts of historical interest to the user, i.e. texts judged by experts and pushed; these form the positive sample set with label y = 1. Texts the user was historically not interested in, i.e. texts judged by experts not to be pushed, form the negative sample set with label y = 0.

The texts of the sample set are preprocessed, including Chinese word segmentation, text truncation or padding, and conversion of the segmented words into token input form. All keywords contained in the logic rules are likewise converted into token input form.
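A minimal preprocessing sketch, assuming the jieba segmenter and a word2vec-style vocabulary; the truncation length and the <pad>/<unk> handling are assumptions, not details fixed by the patent.

```python
import jieba  # Chinese word segmentation

def preprocess(text: str, vocab: dict, u: int = 1000) -> list:
    """Segment, truncate or pad to length u, and map words to token ids."""
    tokens = jieba.lcut(text)
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens[:u]]
    ids += [vocab["<pad>"]] * (u - len(ids))
    return ids

vocab = {"<pad>": 0, "<unk>": 1, "台风": 2, "山东": 3}
print(preprocess("台风影响山东", vocab, u=6))
```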
The procedure for fine-tuning the semantic logic network with the sample set is as follows:

According to step 2-1) of Embodiment 1, UNet is fine-tuned with the user sample set and the items of the logic rules; analogous to the pre-training process, UNet is trained with user data iteratively until the improvement of the validation-set accuracy falls below a threshold.

According to step 2-2), UNet and CNet are fine-tuned with the user sample set and the conjunction rules, analogous to the pre-training process, training iteratively until the improvement of the validation-set accuracy falls below the threshold.

According to step 2-3), the disjunction rule detection module is added and DNet is trained with the user sample set, specifically as follows:

The text x and the rules are input to UNet and CNet to obtain the predicted probabilities of all conjunction rules; the maximum probability in the MAX network is taken as the inferred probability that the text meets the user requirement, as shown in formula (9). For example, if CNet outputs three predicted probabilities of 0.98, 0.73 and 0.43, the MAX network outputs 0.98, reflecting that the text satisfies the user requirement as long as any one of the rules is satisfied.

Alternatively, if DNet is implemented with an MLP, all conjunction rule representation vectors are concatenated and input to DNet to obtain a representation vector R, and the prediction probability is obtained from R.

The loss L_R is computed from the prediction ŷ and the label y, as in formula (10), discarding the prediction parts of the item and conjunction rule detection modules; the loss L_R is back-propagated to update the parameters of the whole semantic logic network.
Training a parallel network based on user data, which is specifically as follows:
according to embodiments 2 and 4, training a parallel network structure using a user sample set specifically includes:
independently training the neural classification network: the neural classification network is fully trained using the user sample set, and the loss function is as in equation (12).
Jointly training the semantic logic network and the neural classification network: the trained semantic logic network and the trained neural classification network are combined for fine-tuning, and a JS term is introduced into the joint loss to constrain the consistency of the prediction results on the two sides of the parallel structure; the joint loss is as in formula (15). At this point both branches of the parallel network predict the category of the text simultaneously: the output of the neural classification network branch is ŷ_c, computed as in formula (11), and the output of the semantic logic network is ŷ, computed as in formula (9); the invention preferably adopts a fusion of the two as the final output. For example, in this application, the input text "Bin… is affected by the strong typhoon 'Liqima' …" is predicted to meet the user requirement, while the input text "In the latest updated scenario, … returns to Qingzhou …" is predicted not to meet the user requirement.
Claims (7)
1. A text inference method based on rule embedding, the method comprising:
1) converting the keyword logic expression describing the user requirement into an equivalent disjunctive normal form, wherein the user requirement is a propositional formula P, and the disjunctive normal form of P is:

P = r_1 ∨ r_2 ∨ … ∨ r_n    (1)

in formula (1), n indicates the number of conjunction rules and r_i is the i-th user rule; in the propositional formula P, the connectives are taken from the set {∧, ∨}; an item is a keyword set containing keywords related to a topic or semantics and their synonyms; by the normal-form existence theorem, the propositional formula P can always be converted into an equivalent disjunctive normal form; each r_i is a conjunction rule formed from keyword-set items, i.e. r_i = t_{i,1} ∧ t_{i,2} ∧ … ∧ t_{i,m_i}, where m_i denotes the number of items in r_i; the set of all conjunction rules that make up the user requirement is written R = {r_1, …, r_n}, i.e. the user rule set, where n is the number of conjunction rules; Disjunctive Normal Form is abbreviated DNF; the DNF offers flexibility in handling changes to the user requirement, since such changes can be accommodated efficiently by adding or deleting conjunction rules;
2) determining whether an input text satisfies the user rules:

performing item detection, conjunction rule detection and disjunctive normal form detection on the input text x in turn with a semantic logic network, and finally judging whether the input text satisfies the user rules.
2. The method of claim 1, further comprising a neural classification network disposed in parallel with the semantic logic network, the neural classification network being configured to perform category prediction on the input text to obtain the probability that the input text meets the user requirement, i.e. its prediction result;

the input text is processed by the neural classification network and the semantic logic network respectively to obtain a prediction result from each; finally, the consistency of the two prediction results is constrained with the Jensen-Shannon divergence (JS distance for short).
3. The method of claim 1, wherein the specific method for performing item detection, conjunction rule detection and disjunctive normal form detection on the input text x in turn comprises the following steps:
2-1) item detection

item detection determines whether the input text x contains semantics related to an item t of the disjunctive normal form; the output is recorded as the detection result ŷ_t, representing the probability that the input text x contains item t;

the input text x is converted into a matrix formed by the corresponding pre-trained word vectors, recorded as E ∈ ℝ^(u×d), where ℝ represents the real number field, u is the truncation length of the input text x, d is the length of the pre-trained word vectors, and each row e_j is the vector of length d corresponding to the j-th word;

the item t is converted into vector form: the vector v_t of item t is the average of the pre-trained word vectors of all keywords in the corresponding keyword set, i.e. v_t = (1/|t|) Σ_{k∈t} e_k, where k is a keyword in the set and e_k is its pre-trained word vector;

the vector v_t and the pre-trained word embedding matrix E of the input text x are multiplied to obtain the interaction vector, recorded as

a = E · v_t    (2)

the text semantic vector h and the interaction vector a are concatenated and reduced in dimension by a multilayer perceptron network MLP to obtain the vector u_t, which expresses the containment relationship of the input text x with respect to item t:

u_t = MLP([h; a])    (3)

the value activated by the sigmoid function is taken as the detected probability that the input text x contains item t, i.e. the inference result ŷ_t, representing the degree to which the input text x satisfies the semantics of the keyword set corresponding to item t:

ŷ_t = σ(W_t · u_t + b_t)    (4)

the semantic logic network thereby predicts the probability that the input text x contains item t; the vector u_t also serves as the input of the next-stage conjunction rule module; σ denotes the sigmoid activation function, and W_t and b_t are network parameters;

a cross-entropy loss function evaluates the difference between the distributions of the inference result ŷ_t and the true result, i.e. the true item label y_t, to obtain the loss

L_T = E_x[ −(1/M) Σ_t ( y_t log ŷ_t + (1 − y_t) log(1 − ŷ_t) ) ] + λ‖Θ_T‖    (5)

where y_t, the true label of the item, is obtained by string matching between the text and the keywords together with synonym expansion; E_x denotes the expectation over training-set samples; M is the number of keyword sets; the training process updates all parameters of the item detection network by minimizing the loss L_T, and λ‖Θ_T‖ denotes norm regularization of the item detection network parameters;
2-2) conjunction rule detection

conjunction rule detection verifies whether the input text x satisfies the semantics of a conjunction rule r_i;

a conjunction rule embedding network, CNet, is used; a conjunction rule r_i comprises a sequence of items t_{i,1}, …, t_{i,m_i}, and the representation vectors obtained when these items are detected form a sequence u_{i,1}, …, u_{i,m_i}; all vectors in the sequence are concatenated as the input and passed through CNet to obtain the representation vector g_i of the conjunction rule:

g_i = CNet([u_{i,1}; …; u_{i,m_i}])    (6)

the detection probability of the conjunction rule is shown in formula (7), where σ denotes the sigmoid activation function, W_r and b_r are network parameters, and ŷ_{r_i} is the probability that the input text contains conjunction rule r_i, i.e. the inference result:

ŷ_{r_i} = σ(W_r · g_i + b_r)    (7)

a cross-entropy loss function measures the difference between the prediction ŷ_{r_i} and the true result, i.e. the true rule label y_{r_i}, giving the loss L_C, where y_{r_i} is obtained by the Boolean conjunction of the labels of the related items and E_x denotes the expectation over training-set samples; the training process updates all parameters of UNet and the conjunction rule detection module by minimizing the loss L_C, and λ‖Θ_C‖ denotes norm regularization of all parameters of UNet and the conjunction rule detection module:

L_C = E_x[ −( y_{r_i} log ŷ_{r_i} + (1 − y_{r_i}) log(1 − ŷ_{r_i}) ) ] + λ‖Θ_C‖    (8)
2-3) disjunctive normal form detection

disjunctive normal form detection verifies whether the input text x satisfies the complete user rule set;

the input is: the conjunction rule representation vector from step 2-2) and the representation vectors of the other associated conjunction rules;

the output is: the predicted probability ŷ that the input text satisfies the user rule set;

the disjunction network is implemented with the max function: the maximum probability among the inference results of step 2-2) is taken as the text inference result, where ŷ is the predicted probability that the input text satisfies the user rule set, max denotes taking the maximum probability, and ŷ_{r_i} is the inference result output by the conjunction rule detection module:

ŷ = max_i ŷ_{r_i}    (9)

the loss L_R is computed with a cross-entropy loss function as shown in formula (10), where y is the true label of the input text, i.e. whether the text meets the user requirement as annotated by an expert, and E_x denotes the expectation over training-set samples; the training process updates all parameters of the semantic logic network by minimizing the loss L_R, and λ‖Θ‖ denotes norm regularization of the semantic logic network parameters:

L_R = E_x[ −( y log ŷ + (1 − y) log(1 − ŷ) ) ] + λ‖Θ‖    (10)
4. The method of claim 2, wherein the processing method of the neural classification network comprises:

a semantic vector of the input text is constructed by a text encoding module; the text encoding network used is ENC2; after the semantic representation vector h_c of the input text is obtained through the text encoding module, category prediction is performed on it, as shown in formula (11), where ŷ_c represents the probability, predicted by the neural classification network, that the input text meets the user requirement, i.e. the output text-level label; σ denotes the sigmoid activation function, and W_c and b_c are the network parameters:

ŷ_c = σ(W_c · h_c + b_c)    (11)

a cross-entropy loss function measures the difference between the prediction ŷ_c of the neural classification network and the true result, i.e. the true label y of the input text, as shown in formula (12), giving the loss L_c; all parameters of the neural classification network are updated by minimizing L_c, where y is the true label of the input text, i.e. whether the text meets the user requirement as annotated by an expert, E_x denotes the expectation over training-set samples, and λ‖Θ_c‖ denotes norm regularization of all parameters of the neural classification network:

L_c = E_x[ −( y log ŷ_c + (1 − y) log(1 − ŷ_c) ) ] + λ‖Θ_c‖    (12)

the input text is processed by the neural classification network and the semantic logic network respectively to obtain a prediction result from each, and finally the consistency of the two prediction results is constrained with the Jensen-Shannon divergence (JS distance for short).
5. The method of claim 4, wherein:

the JS distance measures the similarity between the prediction distributions of the neural classification network and the semantic logic network; denote the probability distribution output by the neural classification network as P and that output by the semantic logic network as Q; the JS distance between them is computed as

JS(P ‖ Q) = ½ KL(P ‖ (P+Q)/2) + ½ KL(Q ‖ (P+Q)/2)    (13)

the JS distance is taken as a regularization term in the joint loss; the joint loss is computed as in formula (15), combining the neural classification loss L_c, the semantic logic network loss L_R and the JS term, where the hyperparameters weighing the different loss terms take values in (0, 1) and satisfy the stated constraint.
6. An apparatus for implementing the text inference method of any one of claims 1-5, comprising: a semantic logic network module;
the semantic logic network module is used for: determining whether an input text satisfies a user rule; the semantic logic network module comprises: the device comprises an item detection module, a conjunction rule detection module and a disjunction normal form detection module which are sequentially arranged along the direction of data flow.
7. The apparatus of claim 6, further comprising a neural classification network module disposed in parallel with the semantic logic network module;
the neural classification network is configured to perform category prediction on the input text to obtain the probability that the input text meets the user requirement, i.e. its prediction result;
the input text is processed by the neural classification network and the semantic logic network respectively to obtain a prediction result from each, and finally the consistency of the two prediction results is constrained with the Jensen-Shannon divergence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110984877.9A CN113435212B (en) | 2021-08-26 | 2021-08-26 | Text inference method and device based on rule embedding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113435212A true CN113435212A (en) | 2021-09-24 |
CN113435212B CN113435212B (en) | 2021-11-16 |
Family
ID=77797888
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110984877.9A Active CN113435212B (en) | 2021-08-26 | 2021-08-26 | Text inference method and device based on rule embedding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113435212B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102708096A (en) * | 2012-05-29 | 2012-10-03 | 代松 | Network intelligence public sentiment monitoring system based on semantics and work method thereof |
CN103605729A (en) * | 2013-11-19 | 2014-02-26 | 段炼 | POI (point of interest) Chinese text categorizing method based on local random word density model |
CN103699663A (en) * | 2013-12-27 | 2014-04-02 | 中国科学院自动化研究所 | Hot event mining method based on large-scale knowledge base |
US10621499B1 (en) * | 2015-08-03 | 2020-04-14 | Marca Research & Development International, Llc | Systems and methods for semantic understanding of digital information |
CN110069623A (en) * | 2017-12-06 | 2019-07-30 | 腾讯科技(深圳)有限公司 | Summary texts generation method, device, storage medium and computer equipment |
CN109840322A (en) * | 2018-11-08 | 2019-06-04 | 中山大学 | It is a kind of based on intensified learning cloze test type reading understand analysis model and method |
CN110321432A (en) * | 2019-06-24 | 2019-10-11 | 拓尔思信息技术股份有限公司 | Textual event information extracting method, electronic device and non-volatile memory medium |
CN113268565A (en) * | 2021-04-27 | 2021-08-17 | 山东大学 | Method and device for quickly generating word vector based on concept text |
Non-Patent Citations (3)
Title |
---|
LI ZHAOHUI;BAI XIAOCHEN;HU RUI;LI XIAOLI: ""Measuring Phase-Amplitude Coupling Based on the Jensen-Shannon Divergence and Correlation Matrix"", 《IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING : A PUBLICATION OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY》 * |
刘云: "Attribute Inference Oriented to the Comment Behavior of Social Media Users", China Excellent Doctoral and Master's Theses Full-text Database (Master), Information Science and Technology Series * |
陈良军; 洪彧; SUJITH MANGALATHU; 勾红叶; 蒲黔辉: "Efficient reliability analysis with an adaptive sampling method based on the Jensen-Shannon divergence", Journal of Central South University * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114003726A (en) * | 2021-12-31 | 2022-02-01 | 山东大学 | Subspace embedding-based academic thesis difference analysis method |
Also Published As
Publication number | Publication date |
---|---|
CN113435212B (en) | 2021-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11182562B2 (en) | Deep embedding for natural language content based on semantic dependencies | |
Marivate et al. | Improving short text classification through global augmentation methods | |
US11281976B2 (en) | Generative adversarial network based modeling of text for natural language processing | |
US11481416B2 (en) | Question Answering using trained generative adversarial network based modeling of text | |
US10657259B2 (en) | Protecting cognitive systems from gradient based attacks through the use of deceiving gradients | |
Mahmood et al. | Deep sentiments in roman urdu text using recurrent convolutional neural network model | |
Ezaldeen et al. | A hybrid E-learning recommendation integrating adaptive profiling and sentiment analysis | |
Suissa et al. | Text analysis using deep neural networks in digital humanities and information science | |
US11663518B2 (en) | Cognitive system virtual corpus training and utilization | |
Rauf et al. | Using bert for checking the polarity of movie reviews | |
CN110781666B (en) | Natural language processing text modeling based on generative antagonism network | |
Essa et al. | Fake news detection based on a hybrid BERT and LightGBM models | |
Jiang et al. | A hierarchical model with recurrent convolutional neural networks for sequential sentence classification | |
Suresh Kumar et al. | Local search five‐element cycle optimized reLU‐BiLSTM for multilingual aspect‐based text classification | |
Patil et al. | Hate speech detection using deep learning and text analysis | |
CN113435212B (en) | Text inference method and device based on rule embedding | |
CN116956228A (en) | Text mining method for technical transaction platform | |
Neill et al. | Meta-embedding as auxiliary task regularization | |
Nazarizadeh et al. | Using Group Deep Learning and Data Augmentation in Persian Sentiment Analysis | |
Kandi | Language Modelling for Handling Out-of-Vocabulary Words in Natural Language Processing | |
Lou | Deep learning-based sentiment analysis of movie reviews | |
Jawale et al. | Sentiment analysis and vector embedding: A comparative study | |
Ait Benali et al. | Arabic named entity recognition in social media based on BiLSTM-CRF using an attention mechanism | |
Baruah et al. | Detection of Hate Speech in Assamese Text | |
Han | Emotion Analysis of Literary Works Based on Attentional Mechanisms and the Fusion of Two-Channel Features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||