CN113064967A - Complaint reporting credibility analysis method based on deep migration network - Google Patents
- Publication number
- CN113064967A
- Authority
- CN
- China
- Prior art keywords
- domain
- text
- feature
- source
- target domain
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Databases & Information Systems (AREA)
- Molecular Biology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a complaint reporting credibility analysis method based on a deep migration (transfer learning) network, and belongs to the technical field of artificial intelligence. The method comprises the following steps: first, microblog texts, complaint report texts, and a mixture of microblog and complaint report texts are each represented as matrices by a Word2vec text vectorization model; second, the vectorized texts are fed into three bidirectional LSTM networks for feature extraction, which respectively extract source domain private feature vectors, feature vectors shared by the source and target domains, and target domain private feature vectors; then, the shared features are fused with the private features of the source domain and of the target domain, respectively, through a self-attention mechanism to obtain the final source domain features and target domain features; finally, the source domain features and target domain features are input into a multilayer perceptron, which outputs the final classification result. The method addresses the difficulty of manual analysis and the lack of effective data labels in complaint reporting credibility analysis, and offers an approach to analyzing the credibility of environmental complaint reports.
Description
Technical Field
The invention relates to a method for analyzing the credibility of environmental complaint reports, and in particular to an environmental complaint reporting credibility analysis method based on a deep migration network.
Background
An environmental complaint report is a complaint by a citizen about environmental pollution phenomena or events that affect citizens' work and daily life or violate relevant national regulations. Complainants usually describe their complaints in text. Among the many complaint reports are non-credible ones that tamper with, exaggerate, or fabricate facts. Such complaints directly increase the difficulty for the authorities of handling water pollution events and reduce administrative efficiency. To improve administrative efficiency and avoid wasting management resources, administrative departments urgently need to analyze the credibility of complaint reports submitted by netizens.
At present, in the field of water environment complaint reporting, work on the credibility analysis of complaint reporting events is rare, and work based on the complaint report texts themselves is rarer still. However, similar work exists in other areas that performs credibility analysis based on textual content. Since the emergence of deep learning, various deep-learning-based methods have been proposed that perform very well on text-based credibility analysis tasks such as fake news detection and rumor detection. However, most machine learning and deep learning methods require a large amount of data with credibility labels. In environmental complaint report credibility analysis, the complaint report texts often lack credibility labels, and analyzing their credibility manually is very difficult.
To solve these problems, microblog texts are used to assist the credibility analysis of complaint reports. Both microblog texts and complaint report texts express the author's emotions and attitudes, and both microblog rumors and false complaint reports usually falsify and distort facts, so the two kinds of text share a certain semantic similarity. By combining a semi-supervised transfer learning method with transfer learning theory, the knowledge in microblog texts is transferred to the complaint report credibility analysis process through techniques such as feature transfer and domain adaptation, improving the performance of complaint reporting credibility analysis.
In conclusion, environmental complaint reporting credibility analysis based on a deep migration network is a novel research problem of significant research and application value.
Disclosure of Invention
The invention aims to solve the following problems in the credibility analysis of environmental complaint reports: manual analysis is difficult, effective credibility labels are lacking, and consequently an effective credibility analysis model cannot be trained. A deep migration network is proposed to address these problems. The method takes microblog texts as the source domain and complaint report texts as the target domain, designs effective feature extraction, feature transfer, and domain adaptation methods, and uses the microblog texts to assist the analysis of complaint report credibility.
The method for analyzing the credibility of environmental complaint reports based on the deep migration network comprises the following steps:
S1, collecting data;
S2, preprocessing the microblog text data (source domain) and the complaint report text data (target domain);
S3, inputting the preprocessed text into a Word2vec model for word vector training to generate word vectors;
S4, encoding the microblog text word vectors and the complaint report text word vectors with three separately designed encoders, a source domain feature encoder, a domain-shared feature encoder, and a target domain feature encoder, which extract the source domain private features, the domain-shared features, and the target domain private features, respectively;
S5, domain feature fusion: fusing the source domain private features with the domain-shared features by self-attention to obtain the source domain features, and fusing the target domain private features with the domain-shared features by self-attention to obtain the target domain features;
S6, calculating the MK-MMD distance between the source domain features and the target domain features, and transforming both sets of features to complete domain adaptation;
S7, passing the source domain features and the target domain features through a multilayer perceptron network to obtain the classification results.
Drawings
Fig. 1 is a schematic diagram illustrating the details of the complaint reporting credibility analysis method based on a deep migration network.
Fig. 2 is a schematic diagram of the bidirectional LSTM encoding process.
Fig. 3 is a flow chart of the complaint reporting credibility analysis method based on a deep migration network.
Detailed Description
The invention provides an environmental complaint reporting credibility analysis method based on a deep migration network, which mainly comprises the following steps, described in detail with reference to Fig. 1:
Step S1: microblog source texts are extracted from social media, and complaint report text data are extracted from a large water environment big data management platform to construct the data sets. The source domain (microblog text) data set is $D_s=\{(x_i^s, y_i^s)\}_{i=1}^{N_s}$, where $N_s$ is the number of samples, $x_i^s$ is a microblog text sample, and $y_i^s$ is its credibility label. The target domain (complaint report text) data set is $D_t=\{(x_j^t, y_j^t)\}_{j=1}^{N_t^{train}+N_t^{test}}$, where $N_t^{train}$ is the number of training samples, $N_t^{test}$ is the number of test samples, $x_j^t$ is a complaint report text sample, and $y_j^t$ is its credibility label.
Step S2, preprocessing the microblog text data (source domain) and the complaint report text data (target domain): preprocessing consists of data cleaning and word segmentation, without stop-word removal. After segmentation, a text $x_i^o$ is represented as a word sequence:
$$x_i^o = \{w_1^o, w_2^o, \dots, w_{T_i}^o\}$$
where $o \in \{s, t\}$, $s$ denotes the source domain and $t$ the target domain; $w_k^o$ are the words contained in sentence $x_i^o$; and $T_i$ is the sentence length.
Step S3, text vectorization:
the segmented texts are input into a Word2vec model for word vector training; each text sequence $x_i^o$ is then passed through the trained Word2vec model to obtain its vector representation $X_i^o \in \mathbb{R}^{T_i \times d}$, one $d$-dimensional vector per word, for each of the $n$ texts, where $n$ is the number of texts and $d$ is the word vector dimension. The generated word vectors are 300-dimensional ($d = 300$).
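The vectorization in Step S3 amounts to looking up each segmented word's trained vector and stacking one row per word. The sketch below assumes the trained Word2vec model has already been exported to a plain dictionary (`word_vectors`, a hypothetical name), uses toy 4-dimensional vectors instead of the 300 dimensions used in the patent, and maps out-of-vocabulary words to zero vectors, which is an assumption rather than something the patent specifies.

```python
import numpy as np

def vectorize(tokens, word_vectors, dim):
    """Map a segmented text (list of words) to a T_i x d matrix.

    word_vectors: hypothetical dict word -> (dim,) array standing in for a
    trained Word2vec model. Out-of-vocabulary words become zero vectors
    (an assumption, not taken from the patent).
    """
    rows = [word_vectors.get(w, np.zeros(dim)) for w in tokens]
    return np.stack(rows) if rows else np.zeros((0, dim))

# Toy 4-dimensional vectors instead of real 300-dimensional Word2vec output.
rng = np.random.default_rng(0)
vocab = ["river", "smells", "factory", "discharge"]
word_vectors = {w: rng.normal(size=4) for w in vocab}

text = ["factory", "discharge", "river", "unknown"]  # already segmented
X = vectorize(text, word_vectors, dim=4)
print(X.shape)  # (4, 4)
```

Each text keeps its own length $T_i$, so a batch of texts is a list of matrices rather than one array unless padding is added.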
Step S4, encoding the vectorized text. Encoding is the process of feeding the vectorized text into a neural network for feature extraction. Three encoders are designed: the source domain private feature encoder $E_p^s$ extracts the private features of the source domain (microblog text); the target domain private feature encoder $E_p^t$ extracts the private features of the target domain (complaint report text); and the domain-shared feature encoder $E_c$ extracts the features shared by the complaint report texts and the microblog texts. The three encoders have the same network structure, all based on a bidirectional LSTM network. As shown in Fig. 2, the encoding process is as follows:
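Step S401, for a vectorized text $x_i^o$, the outputs $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ of the forward and backward LSTMs are concatenated as the Bi-LSTM output at time $t$:
$$h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t] \qquad (1)$$
where $x_t$ is the input at the $t$-th of the $T_i$ time steps; $c_t$ is the LSTM cell state at time $t$, and $h_t$ is the output of the $t$-th time step, calculated for each direction by equation (2):
$$\begin{aligned}
f_t &= \sigma(W_f[h_{t-1}, x_t] + b_f)\\
i_t &= \sigma(W_i[h_{t-1}, x_t] + b_i)\\
\tilde{c}_t &= \tanh(W_c[h_{t-1}, x_t] + b_c)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
o_t &= \sigma(W_o[h_{t-1}, x_t] + b_o)\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned} \qquad (2)$$
where $W_f, W_i, W_o, W_c$ are weight matrices and $b_f, b_i, b_o, b_c$ are bias vectors; $\sigma$ is the sigmoid function and $\odot$ is element-wise multiplication; $f_t$ is the forget gate, $i_t$ the input gate, and $o_t$ the output gate. In the whole process, the forget gate $f_t$ first selectively filters out some information from the previous state. The input gate $i_t$ then decides which data are updated; the LSTM cell state $c_t$ is updated by forgetting historical information and adding the new candidate information $\tilde{c}_t$, so that the new state value replaces the old one. Finally, the output gate $o_t$ determines the output information: the output $h_t$ at the current time step is obtained by filtering the cell state through $o_t$.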
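Step S402, the final outputs of the two directions, $\overrightarrow{h}_{T_i}$ and $\overleftarrow{h}_{1}$, are concatenated as the encoded output of the $i$-th sentence:
$$e_i^o = [\overrightarrow{h}_{T_i} ; \overleftarrow{h}_{1}]$$
where $\overrightarrow{h}_{T_i}$ is the forward hidden-layer output of the text sequence $x_i^o$ (after reading the last word), $\overleftarrow{h}_{1}$ is its backward hidden-layer output (after reading back to the first word), and $e_i^o$ is the LSTM-encoded representation of text $x_i^o$, i.e., the output of the encoder.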
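A minimal NumPy sketch of the encoder in Steps S401 and S402 follows, assuming untrained random weights and the standard LSTM gate layout of equation (2), with each gate's weights acting on the concatenation $[h_{t-1}, x_t]$. The class names, hidden size, and initialisation are illustrative assumptions, as is feeding texts from both domains into the shared encoder $E_c$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMDirection:
    """One LSTM direction following equation (2); gates act on [h_{t-1}, x_t]."""

    def __init__(self, d_in, d_hid, rng):
        # Untrained random weights, purely for illustration.
        self.W = {g: rng.normal(scale=0.1, size=(d_hid, d_hid + d_in)) for g in "fioc"}
        self.b = {g: np.zeros(d_hid) for g in "fioc"}
        self.d_hid = d_hid

    def last_state(self, X):
        """Run over the rows of X (T x d_in) and return the final output h_T."""
        h, c = np.zeros(self.d_hid), np.zeros(self.d_hid)
        for x_t in X:
            z = np.concatenate([h, x_t])
            f = sigmoid(self.W["f"] @ z + self.b["f"])        # forget gate
            i = sigmoid(self.W["i"] @ z + self.b["i"])        # input gate
            c_tilde = np.tanh(self.W["c"] @ z + self.b["c"])  # candidate information
            c = f * c + i * c_tilde                           # cell state update
            o = sigmoid(self.W["o"] @ z + self.b["o"])        # output gate
            h = o * np.tanh(c)                                # filtered output
        return h

class BiLSTMEncoder:
    """Sentence encoder: e_i = [forward h_{T_i}; backward h_1], size m = 2 * d_hid."""

    def __init__(self, d_in, d_hid, seed=0):
        rng = np.random.default_rng(seed)
        self.fwd = LSTMDirection(d_in, d_hid, rng)
        self.bwd = LSTMDirection(d_in, d_hid, rng)

    def encode(self, X):
        h_forward = self.fwd.last_state(X)         # reads words 1..T_i
        h_backward = self.bwd.last_state(X[::-1])  # reads words T_i..1
        return np.concatenate([h_forward, h_backward])

    def encode_batch(self, texts):
        """Stack the encodings of n texts into an n x m feature matrix."""
        return np.stack([self.encode(X) for X in texts])

# Three encoders with identical structure but separate weights (E_p^s, E_c, E_p^t).
d, hidden = 4, 3
E_ps, E_c, E_pt = (BiLSTMEncoder(d, hidden, seed=s) for s in (1, 2, 3))
rng = np.random.default_rng(0)
source_texts = [rng.normal(size=(T, d)) for T in (5, 7)]
target_texts = [rng.normal(size=(T, d)) for T in (6,)]
e_p_s = E_ps.encode_batch(source_texts)              # n2 x m
e_p_t = E_pt.encode_batch(target_texts)              # n3 x m
e_c = E_c.encode_batch(source_texts + target_texts)  # n1 x m (both domains, assumed)
print(e_p_s.shape, e_p_t.shape, e_c.shape)  # (2, 6) (1, 6) (3, 6)
```

Because the three encoders share one structure but not their weights, the private encoders are free to pick up domain-specific cues while the shared encoder sees both domains.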
Step S403, the three encoders respectively extract the domain-shared features $e_c \in \mathbb{R}^{n_1 \times m} = [e_1, e_2, \dots, e_{n_1}]$, the source domain private features $e_p^s \in \mathbb{R}^{n_2 \times m}$, and the target domain private features $e_p^t \in \mathbb{R}^{n_3 \times m}$, where $m$ is the dimension of the Bi-LSTM output vector and $n_1$, $n_2 = N_s$, and $n_3$ are the numbers of texts fed to the respective encoders.
Step S5, domain feature fusion: the domain-shared feature encoder extracts the features shared by the source and target domains, while the domain private feature encoders extract domain-specific features, compensating for the shared encoder's inability to capture domain-specific information. To obtain the information shared by the source and target domains while retaining more complete domain-specific information, the domain-specific features $e_p^o$ must be fused with the domain-shared features $e_c$.
Step S501, the query matrix $W_Q$, key matrix $W_K$, and value matrix $W_V$ are multiplied with the input vectors, and the results are scored:
$$Q = E_o W_Q,\quad K = E_o W_K,\quad V = E_o W_V,\qquad S = \frac{Q K^{\top}}{\sqrt{d}}$$
where the rows of $E_o$ are the input vectors $e_b$, $b \in \{c, p\}$, with $c$ denoting domain-shared and $p$ domain-private features; $S$ is the scaled dot-product score; and $d$ is a constant, usually the dimension of the input word vectors, that keeps the dot products from becoming too large.
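Step S502, Softmax normalization is applied to the scores to obtain the attention weights $A = \mathrm{softmax}\left(QK^{\top}/\sqrt{d}\right)$, where $Q$, $K$, and $V$ are the query, key, and value vectors from step S501.
Step S503, the attention weights are multiplied by the value vectors to obtain the final source domain feature $e_s$ (or target domain feature $e_t$):
$$e_o = A V$$
where $o \in \{s, t\}$, $s$ denotes the source domain and $t$ the target domain, and $e_o$ is the fused feature.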
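The fusion step can be sketched as scaled dot-product self-attention over a two-token sequence made of one text's shared feature and its private feature, covering the scoring, Softmax normalization, and weighting steps. How the attended outputs become a single fused vector is not spelled out, so averaging the two output rows is an assumption here, as is scaling by the key dimension (the patent describes $d$ as usually the input word-vector dimension). `fuse` and the random projection matrices are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    ez = np.exp(z)
    return ez / ez.sum(axis=axis, keepdims=True)

def fuse(e_c, e_p, W_Q, W_K, W_V):
    """Fuse one text's shared and private features by self-attention.

    e_c, e_p: (m,) shared and private feature vectors of the same text.
    Assumption: the two vectors form a 2-token sequence and the two attention
    outputs are averaged into a single fused vector e_o.
    """
    E = np.stack([e_c, e_p])           # rows b in {c, p}, shape (2, m)
    Q, K, V = E @ W_Q, E @ W_K, E @ W_V
    d = K.shape[1]                     # scaling constant (key dimension here)
    A = softmax(Q @ K.T / np.sqrt(d))  # scores -> attention weights, rows sum to 1
    out = A @ V                        # weighted sum of value vectors
    return out.mean(axis=0), A

rng = np.random.default_rng(0)
m = 6
W_Q, W_K, W_V = (rng.normal(size=(m, m)) for _ in range(3))
e_o, A = fuse(rng.normal(size=m), rng.normal(size=m), W_Q, W_K, W_V)
print(e_o.shape)  # (6,)
```

The same projection matrices would be applied with $e_p^s$ for the source domain and $e_p^t$ for the target domain; whether the two domains share them is left open by the text.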
Step S6, domain adaptation: the source domain features $e_s$ and target domain features $e_t$ obtained after feature fusion follow different distributions, so domain adaptation is performed on $e_s$ and $e_t$ to make the data distributions of the two domains converge. Domain adaptation is carried out by feature alignment, that is, by transforming the features so that the source and target distributions converge. The distance between the source domain data and the target domain data is calculated by the MK-MMD method, added to the loss function, and used together with the label loss to update the network weights. The MK-MMD distance between the source and target domains is:
$$\mathrm{MMD}^2(e_s, e_t) = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s} \phi\big(e_s^{(i)}\big) - \frac{1}{n_t}\sum_{j=1}^{n_t} \phi\big(e_t^{(j)}\big) \right\|_{\mathcal{H}}^2$$
where $n_s$ and $n_t$ are the numbers of source and target samples; $\phi(\cdot)$ is a mapping into a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ that maps the original variables into the RKHS, and in MK-MMD the corresponding kernel $k(x, y) = \langle \phi(x), \phi(y) \rangle$ is a weighted combination of several base kernels; and $\mathrm{MMD}^2(e_s, e_t)$ is the distance between the source domain features and the target domain features.
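As a sketch of the MK-MMD term, the function below computes the biased empirical estimate of $\mathrm{MMD}^2$ with a weighted sum of Gaussian kernels, expanding the squared RKHS norm into within-source, within-target, and cross kernel averages. The bandwidths and the fixed, equal kernel weights are assumptions for illustration; the patent does not list them, and MK-MMD as usually described may also learn the kernel weights.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all row pairs of X and Y."""
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mk_mmd2(es, et, sigmas=(0.5, 1.0, 2.0, 4.0), betas=None):
    """Biased estimate of MMD^2 using a weighted sum of Gaussian kernels.

    es: (n_s, m) source features; et: (n_t, m) target features.
    Bandwidths and equal weights are assumptions, not values from the patent.
    """
    if betas is None:
        betas = np.full(len(sigmas), 1.0 / len(sigmas))
    total = 0.0
    for beta, s in zip(betas, sigmas):
        k_ss = gaussian_kernel(es, es, s).mean()  # within-source similarity
        k_tt = gaussian_kernel(et, et, s).mean()  # within-target similarity
        k_st = gaussian_kernel(es, et, s).mean()  # cross-domain similarity
        total += beta * (k_ss + k_tt - 2.0 * k_st)
    return total

rng = np.random.default_rng(0)
es = rng.normal(size=(50, 6))
print(round(mk_mmd2(es, es), 6))  # 0.0 for identical samples
print(mk_mmd2(es, es + 1.0) > 0)  # True for a shifted target distribution
```

In training, this value would play the role of $L_{da}$ and be recomputed on each mini-batch of fused features.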
Step S7, credibility classification: the source domain features and target domain features are fed into an MLP network to output the classification results, and the network parameters are updated according to the classification loss and the domain adaptation loss.
Step S701, the fused source domain features $e_s$ and target domain features $e_t$ are fed into the MLP:
$$\hat{y}_s = \mathrm{sigmoid}\big(\mathrm{MLP}(e_s)\big),\qquad \hat{y}_t = \mathrm{sigmoid}\big(\mathrm{MLP}(e_t)\big)$$
where $\hat{y}$ is the prediction vector, i.e., the prediction result; MLP denotes the multilayer perceptron; $\hat{y}_s$ and $\hat{y}_t$ are the predicted probabilities; and sigmoid is the activation function.
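A small forward pass for Step S701 might look as follows. The single ReLU hidden layer, its width, and the use of one MLP shared by both domains are assumptions; the patent only specifies a multilayer perceptron with a sigmoid output producing $\hat{y}_s$ and $\hat{y}_t$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_predict(e, W1, b1, w2, b2):
    """Two-layer perceptron: ReLU hidden layer, sigmoid output probability.

    e: (n, m) fused features; returns (n,) predicted probabilities y_hat.
    Depth, width, and ReLU are assumptions; only the sigmoid output is
    stated in the text.
    """
    hidden = np.maximum(0.0, e @ W1 + b1)
    return sigmoid(hidden @ w2 + b2)

rng = np.random.default_rng(0)
m, h = 6, 8
W1, b1 = rng.normal(size=(m, h)), np.zeros(h)
w2, b2 = rng.normal(size=h), 0.0
y_hat_s = mlp_predict(rng.normal(size=(4, m)), W1, b1, w2, b2)  # source batch
y_hat_t = mlp_predict(rng.normal(size=(3, m)), W1, b1, w2, b2)  # target batch
print(y_hat_s.shape, y_hat_t.shape)  # (4,) (3,)
```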
Step S702, a loss function is calculated from the classification results to update the network parameters. The deep migration network on the one hand learns the data discrepancy between the source and target domains to achieve domain adaptation, and on the other hand learns from the label loss. The final objective function (the overall network loss) therefore combines the MK-MMD loss, which measures the domain discrepancy, with the source domain and target domain label losses, as shown in equation (9):
$$L = L_{cls} + \lambda L_{da} \qquad (9)$$
where $\lambda$ is a trade-off parameter; $L_{da}$ is the domain adaptation loss, i.e., $\mathrm{MMD}^2(e_s, e_t)$; and $L_{cls} = L_{cls}^s + L_{cls}^t$ is the label loss, comprising the source domain label loss $L_{cls}^s$ and the target domain label loss $L_{cls}^t$. The cross-entropy criterion is used as the loss in this classification task:
$$L_{cls}^{o}(\theta) = -\frac{1}{N_o}\sum_{i=1}^{N_o}\Big[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big],\qquad o \in \{s, t\}$$
where $N_o$ is the number of labeled samples in domain $o$; $y \in \{0, 1\}$ is the credibility label; $\hat{y}_i$ is the predicted probability for sample $i$; and $\theta$ is the parameter to be optimized.
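The objective in equation (9) can be written as a plain function of the predictions and the MMD term. This sketch only evaluates the loss (training would need gradients from an autodiff framework), and it assumes the source and target label losses are added with equal weight; `total_loss` is a hypothetical name.

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Mean cross-entropy for labels y in {0, 1} and predicted probabilities y_hat."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

def total_loss(y_s, p_s, y_t, p_t, mmd2, lam):
    """Equation (9): L = L_cls + lambda * L_da, with L_cls = L_cls^s + L_cls^t.

    Equal weighting of the two label losses is an assumption.
    mmd2 is the domain adaptation loss MMD^2(e_s, e_t), computed elsewhere.
    """
    l_cls = binary_cross_entropy(y_s, p_s) + binary_cross_entropy(y_t, p_t)
    return l_cls + lam * mmd2

y_s, p_s = np.array([1, 0]), np.array([0.5, 0.5])
y_t, p_t = np.array([1]), np.array([0.5])
print(round(total_loss(y_s, p_s, y_t, p_t, mmd2=0.2, lam=0.5), 4))  # 1.4863
```

With every prediction at 0.5, each cross-entropy term equals $\ln 2$, so the total is $2\ln 2 + 0.5 \times 0.2 \approx 1.4863$.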
The credibility analysis accuracy metric of the model is the standardized partial AUC. In the water environment complaint reporting credibility classification task, care must be taken to avoid misjudging credible complaint reports, which would delay the handling of pollution events; that is, the true positive rate (TPR) should be raised while keeping the false positive rate (FPR) low (low-credibility texts are positive samples and high-credibility texts are negative samples). The task should therefore focus on the area under the ROC curve over the sub-region FPR ≤ maxfpr, denoted $\mathrm{AUC}_{FPR \le maxfpr}$. When maxfpr is very small, this partial AUC varies over a very narrow range and cannot discriminate well between models, so the standardized partial AUC ($\mathrm{SPAUC}_{FPR \le maxfpr}$) is used:
$$\mathrm{SPAUC}_{FPR \le maxfpr} = \frac{1}{2}\left(1 + \frac{\mathrm{AUC}_{FPR \le maxfpr} - s_{min}}{s_{max} - s_{min}}\right)$$
where $s_{min} = maxfpr^2/2$ is the partial AUC of a random classifier and $s_{max} = maxfpr$ is that of a perfect classifier. In the experiments, maxfpr = 0.05, so $\mathrm{SPAUC}_{FPR \le maxfpr}$ equals 0.5 for a random classifier and 1 for a perfect one. The experimental results show that the LSTM-based encoding network performs environmental complaint reporting credibility analysis well.
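The standardized partial AUC can be computed directly from the ROC curve: cut the curve at maxfpr (interpolating TPR at the cut), integrate with the trapezoidal rule, and apply the standardization above. The sketch treats label 1 as the positive (low-credibility) class, as the patent does; scikit-learn's `roc_auc_score` with `max_fpr` applies the same McClish standardization and could be used instead. The function names are hypothetical, and the sketch assumes both classes are present and 0 < maxfpr < 1.

```python
import numpy as np

def roc_points(y_true, y_score):
    """ROC curve points (FPR, TPR), one per distinct score threshold."""
    order = np.argsort(-y_score, kind="mergesort")  # descending scores
    y = y_true[order]
    s = y_score[order]
    # Last index of each run of tied scores, so ties share one ROC point.
    idx = np.r_[np.flatnonzero(np.diff(s)), len(s) - 1]
    tps = np.cumsum(y)[idx]
    fps = (idx + 1) - tps
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr

def standardized_partial_auc(y_true, y_score, max_fpr=0.05):
    """McClish-standardized partial AUC over FPR <= max_fpr (positives = label 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    fpr, tpr = roc_points(y_true, y_score)
    stop = np.searchsorted(fpr, max_fpr, side="right")  # points with FPR <= max_fpr
    # Linearly interpolate TPR where the curve crosses FPR = max_fpr.
    x0, x1 = fpr[stop - 1], fpr[stop]
    y0, y1 = tpr[stop - 1], tpr[stop]
    tpr_cut = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
    x = np.r_[fpr[:stop], max_fpr]
    y = np.r_[tpr[:stop], tpr_cut]
    partial = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)  # trapezoidal rule
    s_min = 0.5 * max_fpr ** 2  # partial AUC of a random classifier
    s_max = max_fpr             # partial AUC of a perfect classifier
    return 0.5 * (1.0 + (partial - s_min) / (s_max - s_min))

y_true = np.array([0, 0, 1, 1])  # 1 = low-credibility (positive)
scores = np.array([0.1, 0.2, 0.8, 0.9])
print(standardized_partial_auc(y_true, scores, max_fpr=0.05))  # 1.0
```

A classifier that ranks every positive last gets a partial AUC of 0 here and therefore a standardized value slightly below 0.5 (19/39 for maxfpr = 0.05).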
The method uses microblog source texts extracted from social media (133,346 texts in total, of which 66,131 are high-credibility and 67,215 are low-credibility) and complaint report text data extracted from a large water environment big data management platform (200K complaint report texts in total, of which 1,482 carry credibility labels: 889 high-credibility and 593 low-credibility complaints).
As shown in Table 1, the experiments used CNN, Transformer, GRU-2, RNN, LSTM_Attention, and LSTM as feature extractors. "Attention" means that the private features of both the source domain and the target domain are fused with the shared features; "Source_Attention" fuses only the source domain private features with the domain-shared features; "Target_Attention" fuses only the target domain private features with the domain-shared features; "No_Attention" means that no feature fusion is performed and only the domain-shared features are used. The deep migration network based on the bidirectional LSTM performs best on this task, which demonstrates the superiority of the deep migration network architecture and the feasibility of using microblog texts to assist complaint reporting credibility analysis. Ablation experiments were performed according to whether feature fusion with the attention mechanism was used. As the ablation results in Table 1 show, within the deep migration network every feature extractor performs better with attention-based feature fusion than with the domain-shared features alone, and fusing the source domain private features with the shared features is more effective than fusing the target domain private features with the shared features.
TABLE 1 complaint reporting credibility classification experimental results
In conclusion, the method makes good use of knowledge from the microblog text domain to assist complaint reporting credibility analysis, and completes the complaint reporting credibility analysis task well.
Claims (9)
1. A method for analyzing the credibility of environmental complaint reports based on a deep migration network, comprising the following specific steps:
S1, collecting data;
S2, preprocessing the source domain and target domain data;
S3, inputting the preprocessed text into a Word2vec model for word vector training to generate word vectors;
S4, encoding the vectorized microblog texts and complaint report texts and extracting high-level features;
S5, fusing the domain private features with the domain-shared features by a self-attention method;
S6, calculating the MK-MMD distance between the source domain features and the target domain features, transforming the source domain and target domain features, and performing domain adaptation;
S7, passing the source domain features and the target domain features through a multilayer perceptron network to obtain the classification results;
wherein the source domain is microblog text data and the target domain is complaint report text data.
2. The method for analyzing the credibility of the environmental complaint report based on the deep migration network as claimed in claim 1, wherein:
in step S1, microblog source texts are extracted from social media, and complaint report text data are extracted from a large water environment big data management platform to construct the data sets: $D_s=\{(x_i^s, y_i^s)\}_{i=1}^{N_s}$ represents the source domain, where $N_s$ is the number of samples, $x_i^s$ is a microblog text sample, and $y_i^s$ is its credibility label; the complaint report text data set $D_t=\{(x_j^t, y_j^t)\}_{j=1}^{N_t^{train}+N_t^{test}}$ represents the target domain, where $N_t^{train}$ is the number of training samples, $N_t^{test}$ is the number of test samples, $x_j^t$ is a complaint report text sample, and $y_j^t$ is its credibility label.
3. The method for analyzing the credibility of the environmental complaint report based on the deep migration network as claimed in claim 1, wherein the method comprises:
in step S2, preprocessing comprises data cleaning and word segmentation without stop-word removal, and the segmented text is represented as a word sequence $x_i^o = \{w_1^o, w_2^o, \dots, w_{T_i}^o\}$, where $o \in \{s, t\}$, $s$ denotes the source domain and $t$ the target domain; $w_k^o$ are the words contained in sentence $x_i^o$; and $T_i$ is the sentence length.
4. The method for analyzing the credibility of the environmental complaint report based on the deep migration network as claimed in claim 1, wherein the method comprises:
5. The method for analyzing the credibility of the environmental complaint report based on the deep migration network as claimed in claim 3, wherein: text vectorization is performed with a Word2vec model, and the generated word vectors have dimension d = 300.
6. The method for analyzing the credibility of the environmental complaint report based on the deep migration network as claimed in claim 1, wherein the method comprises:
the encoder in step S4 is a bidirectional long short-term memory network (Bi-LSTM), and three encoders with identical network structures extract the private features and the shared features of the source domain and the target domain;
the specific encoding process is as follows:
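step S401 uses a bidirectional LSTM as the core module of the encoder: for a text sequence $x_i^o$, the outputs $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ of the forward and backward LSTMs are concatenated as the Bi-LSTM output at time $t$:
$$h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t] \qquad (1)$$
where $o \in \{s, t\}$, $s$ denotes the source domain and $t$ the target domain; $x_t$ is the input at the $t$-th of the $T_i$ time steps; $c_t$ is the LSTM cell state at time $t$, and $h_t$ is the output of the $t$-th time step, calculated for each direction by equation (2):
$$\begin{aligned}
f_t &= \sigma(W_f[h_{t-1}, x_t] + b_f)\\
i_t &= \sigma(W_i[h_{t-1}, x_t] + b_i)\\
\tilde{c}_t &= \tanh(W_c[h_{t-1}, x_t] + b_c)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
o_t &= \sigma(W_o[h_{t-1}, x_t] + b_o)\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned} \qquad (2)$$
where $W_f, W_i, W_o, W_c$ are weight matrices and $b_f, b_i, b_o, b_c$ are bias vectors; $\sigma$ is the sigmoid function and $\odot$ is element-wise multiplication; $f_t$ is the forget gate, $i_t$ the input gate, and $o_t$ the output gate. In the whole process, the forget gate $f_t$ first selectively filters out some information from the previous state. The input gate $i_t$ then decides which data are updated; the LSTM cell state $c_t$ is updated by forgetting historical information and adding the new candidate information $\tilde{c}_t$, so that the new state value replaces the old one. Finally, the output gate $o_t$ determines the output information: the output $h_t$ at the current time step is obtained by filtering the cell state through $o_t$;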
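step S402 concatenates the final outputs of the two directions as the encoded output of the $i$-th sentence:
$$e_i^o = [\overrightarrow{h}_{T_i} ; \overleftarrow{h}_{1}]$$
where $\overrightarrow{h}_{T_i}$ is the forward hidden-layer output of the text sequence $x_i^o$, $\overleftarrow{h}_{1}$ is its backward hidden-layer output, and $e_i^o$ is the LSTM-encoded representation of text $x_i^o$, i.e., the output of the encoder;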
step S403, the three encoders respectively extract the domain-shared features $e_c \in \mathbb{R}^{n_1 \times m} = [e_1, e_2, \dots, e_{n_1}]$, the source domain private features $e_p^s \in \mathbb{R}^{n_2 \times m}$, and the target domain private features $e_p^t \in \mathbb{R}^{n_3 \times m}$, where $m$ is the dimension of the Bi-LSTM output vector and $n_1$, $n_2 = N_s$, and $n_3$ are the numbers of texts fed to the respective encoders.
7. The method for analyzing the credibility of the environmental complaint report based on the deep migration network as claimed in claim 1, wherein the method comprises:
the domain feature fusion in step S5 fuses the source domain private features and the target domain private features, respectively, with the domain-shared features through a self-attention mechanism; the specific feature fusion method is as follows:
step S501, the query matrix $W_Q$, key matrix $W_K$, and value matrix $W_V$ are multiplied with the input vectors, and the results are scored:
$$Q = E_o W_Q,\quad K = E_o W_K,\quad V = E_o W_V,\qquad S = \frac{Q K^{\top}}{\sqrt{d}}$$
where the rows of $E_o$ are the input vectors $e_b$, $b \in \{c, p\}$, with $c$ denoting domain-shared and $p$ domain-private features; $S$ is the scaled dot-product score; and $d$ is a constant, usually the dimension of the input word vectors, that keeps the dot products from becoming too large;
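step S502 applies Softmax normalization to the scores to obtain the attention weights $A = \mathrm{softmax}\left(QK^{\top}/\sqrt{d}\right)$, where $Q$, $K$, and $V$ are the query, key, and value vectors from step S501;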
step S503 multiplies the attention weights by the value vectors to obtain the final source domain feature (target domain feature):
$$e_o = A V$$
where $o \in \{s, t\}$, $s$ denotes the source domain and $t$ the target domain, and $e_o$ is the fused feature.
8. The method for analyzing the credibility of the environmental complaint report based on the deep migration network as claimed in claim 1, wherein the method comprises:
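the domain adaptation in step S6 uses the multi-kernel maximum mean discrepancy (MK-MMD) to calculate the distance between the source domain features $e_s$ and the target domain features $e_t$, adds this distance to the loss function, and transforms the features over the training iterations to complete domain adaptation:
$$\mathrm{MMD}^2(e_s, e_t) = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s} \phi\big(e_s^{(i)}\big) - \frac{1}{n_t}\sum_{j=1}^{n_t} \phi\big(e_t^{(j)}\big) \right\|_{\mathcal{H}}^2$$
where $n_s$ and $n_t$ are the numbers of source and target samples, and there is a mapping $\phi(\cdot)$ in a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ that maps the original variables into the RKHS.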
9. The method for analyzing the credibility of the environmental complaint report based on the deep migration network as claimed in claim 1, wherein the method comprises:
in step S7, the source domain features $e_s$ and target domain features $e_t$ obtained by domain feature fusion are each sent to an MLP network to output the classification results:
$$\hat{y}_s = \mathrm{sigmoid}\big(\mathrm{MLP}(e_s)\big),\qquad \hat{y}_t = \mathrm{sigmoid}\big(\mathrm{MLP}(e_t)\big)$$
where $\hat{y}$ is the prediction vector, i.e., the prediction result; MLP denotes the multilayer perceptron; $\hat{y}_s$ and $\hat{y}_t$ are the predicted probabilities; and sigmoid is the activation function.
Meanwhile, a loss function is calculated according to the classification result to update network parameters, on one hand, the deep migration network learns the data difference between the source field and the target field to realize field adaptation, and on the other hand, the deep migration network learns the label loss. The final objective function (the overall network loss function) is lost by the MK-MMD statistics source domain tags representing the domain differences, so the overall migration network loss function is:
$$L = L_{cls} + \lambda L_{da} \qquad (9)$$
where λ is a weighting (trade-off) parameter; $L_{da}$ is the domain adaptation loss, i.e., $\mathrm{MMD}^2(e_s, e_t)$; and $L_{cls}$ is the label loss, which includes the source-domain label loss and the target-domain label loss. The cross-entropy criterion is used in this classification task to minimize the loss function:
where y ∈ {0, 1} is the credibility label and θ denotes the parameters to be optimized.
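Equation (9) can be sketched for one batch as follows. The sketch assumes binary cross-entropy, which is the standard cross-entropy for a single sigmoid output with y ∈ {0, 1}. The cross-entropy is averaged within each domain, and the source and target label losses are simply summed to form $L_{cls}$. The helper names, the summation, and the default λ = 0.5 are all assumptions. $L_{da}$ is passed in as a precomputed MMD² value.

```python
import math
from typing import Sequence


def bce(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy: -[y log p + (1 - y) log(1 - p)]."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def total_loss(p_s: Sequence[float], y_s: Sequence[int],
               p_t: Sequence[float], y_t: Sequence[int],
               mmd2: float, lam: float = 0.5) -> float:
    """L = L_cls + lambda * L_da, with L_cls = source BCE + target BCE (assumed sum)."""
    l_cls = bce(p_s, y_s) + bce(p_t, y_t)
    l_da = mmd2  # L_da = MMD^2(e_s, e_t), computed elsewhere
    return l_cls + lam * l_da


# Hypothetical batch: predictions of 0.5 cost log 2 per domain.
loss = total_loss([0.5, 0.5], [1, 0], [0.5], [1], mmd2=0.2, lam=0.5)
print(round(loss, 4))  # 1.4863
```

In actual training, θ would be updated by differentiating L, typically with an automatic-differentiation framework. That step is not shown here.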
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110310932.6A CN113064967B (en) | 2021-03-23 | 2021-03-23 | Complaint reporting credibility analysis method based on deep migration network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113064967A (en) | 2021-07-02 |
CN113064967B (en) | 2024-03-22 |
Family
ID=76563241
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110310932.6A Active CN113064967B (en) | 2021-03-23 | 2021-03-23 | Complaint reporting credibility analysis method based on deep migration network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113064967B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200387570A1 (en) * | 2019-06-05 | 2020-12-10 | Fmr Llc | Automated identification and classification of complaint-specific user interactions using a multilayer neural network |
CN111522965A (en) * | 2020-04-22 | 2020-08-11 | 重庆邮电大学 | Question-answering method and system for entity relationship extraction based on transfer learning |
Non-Patent Citations (1)
Title |
---|
Liu Kan et al.: "Research on Twitter Rumor Detection Based on Deep Transfer Network" (基于深度迁移网络的 Twitter 谣言检测研究), DATA ANALYSIS AND KNOWLEDGE DISCOVERY, no. 10, pages 47 - 55 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113901810A (en) * | 2021-09-24 | 2022-01-07 | 杭州中科睿鉴科技有限公司 | Cross-domain false news detection method based on multi-representation learning |
CN114969321A (en) * | 2022-03-14 | 2022-08-30 | 北京工业大学 | Environment complaint report text classification method based on multi-weight self-training |
CN114969321B (en) * | 2022-03-14 | 2024-03-22 | 北京工业大学 | Environmental complaint reporting text classification method based on multi-weight self-training |
Also Published As
Publication number | Publication date |
---|---|
CN113064967B (en) | 2024-03-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chen et al. | Research on text sentiment analysis based on CNNs and SVM | |
CN111079985B (en) | Criminal case criminal period prediction method based on BERT and fused with distinguishable attribute features | |
Wei et al. | A target-guided neural memory model for stance detection in twitter | |
CN110110318B (en) | Text steganography detection method and system based on cyclic neural network | |
CN114021584B (en) | Knowledge representation learning method based on graph convolution network and translation model | |
CN110888980A (en) | Implicit discourse relation identification method based on knowledge-enhanced attention neural network | |
CN111026880B (en) | Joint learning-based judicial knowledge graph construction method | |
CN116245107B (en) | Electric power audit text entity identification method, device, equipment and storage medium | |
CN115631365A (en) | Cross-modal contrast zero sample learning method fusing knowledge graph | |
CN113064967A (en) | Complaint reporting credibility analysis method based on deep migration network | |
CN115510245B (en) | Unstructured data-oriented domain knowledge extraction method | |
CN111859979A (en) | Ironic text collaborative recognition method, ironic text collaborative recognition device, ironic text collaborative recognition equipment and computer readable medium | |
CN116383399A (en) | Event public opinion risk prediction method and system | |
CN116932661A (en) | Event knowledge graph construction method oriented to network security | |
CN112925907A (en) | Microblog comment viewpoint object classification method based on event graph convolutional neural network | |
Yu et al. | Policy text classification algorithm based on BERT | |
CN113255360A (en) | Document rating method and device based on hierarchical self-attention network | |
CN115906816A (en) | Text emotion analysis method of two-channel Attention model based on Bert | |
CN117951092A (en) | Multi-mode information fusion-based electronic archive image multi-stage classification method and device | |
CN116452241B (en) | User loss probability calculation method based on multi-mode fusion neural network | |
CN113157913A (en) | Ethical behavior discrimination method based on social news data set | |
CN115797795B (en) | Remote sensing image question-answer type retrieval system and method based on reinforcement learning | |
CN111563374A (en) | Personnel social relationship extraction method based on judicial official documents | |
CN116795980A (en) | Short text classification method integrating fine-grained element knowledge | |
CN117216617A (en) | Text classification model training method, device, computer equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |