CN114722189B - Multi-label unbalanced text classification method in budget execution audit

Publication number
CN114722189B
Authority
CN
China
Prior art keywords
sentence
matrix
training
word
label
Prior art date
Legal status
Active
Application number
CN202111534284.9A
Other languages
Chinese (zh)
Other versions
CN114722189A (en)
Inventor
伍之昂
张璐
方昌健
Current Assignee
Guangdong Weishen Information Technology Co ltd
NANJING AUDIT UNIVERSITY
Original Assignee
Guangdong Weishen Information Technology Co ltd
NANJING AUDIT UNIVERSITY
Priority date
2021-12-15
Filing date
2021-12-15
Publication date
2023-06-23

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/381 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using identifiers, e.g. barcodes, RFIDs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a multi-label unbalanced text classification method in budget execution audit, comprising the following steps: constructing a keyword library for the budget execution audit domain, selecting seed words from the keyword library as label descriptions, segmenting the text with a word segmentation tool and the keyword library, and computing the embedding matrices of the labels and of the segmented words; constructing a neural network that computes the similarity matrices between words, phrases, and labels (that is, label descriptions), deriving the word weights through a purpose-built pooling layer, combining the weights with the word embedding matrix to obtain the sentence embedding matrix, and passing the sentence embedding matrix to a classifier to obtain a prediction; and introducing class weights for unbalanced data into the loss function, adding the label descriptions to the loss function to strengthen learning of the small classes and of the labels, training the model by minimizing the loss function, and classifying payment summary text whose labels are unknown. The invention addresses the multi-label unbalanced classification of payment voucher summary text in budget execution audit.

Description

Multi-label unbalanced text classification method in budget execution audit
Technical Field
The invention relates to the field of text classification, and in particular to a multi-label unbalanced text classification method in budget execution audit.
Background
When auditing the execution of a fiscal budget, payment summaries must be classified to determine whether each expenditure is consistent with its budget item, so that auditors can review whether the expenditure is compliant and identify high-risk transactions. At present, much of this text classification still relies on manual labeling by auditors, which copes poorly with the explosive growth of audit data in a big-data environment. Although text classification has been studied for a long time, research and applications that specifically target payment summary text in budget execution audit remain rare, and general-purpose text classification algorithms and tools are difficult to apply directly to such a highly specialized domain. Text analysis in budget execution audit faces several problems: the text contains many domain-specific terms, the budget subjects fall into many categories, and the sample sizes of the categories are unbalanced. In addition, traditional text classification methods use an unsupervised sentence representation based on averaged word vectors, which cannot capture how strongly different words influence the classification model. To address these problems, the invention provides a multi-label unbalanced text classification method in budget execution audit that integrates sentence representation learning with the training of a multi-label unbalanced classification model in a supervised manner, with the aim of classifying payment summaries quickly and accurately and improving the efficiency of audit work.
Disclosure of Invention
Object of the invention: to provide a multi-label unbalanced text classification method in budget execution audit that solves the problem of multi-label unbalanced text classification in budget execution audit.
The technical solution is as follows:
A multi-label unbalanced text classification method in budget execution audit comprises the following steps:
Step one: data preprocessing and word embedding training to obtain the input data of the model. Labeled text data of payment voucher summaries is given, in which the number of samples differs between categories and the number of categories is K. A keyword library for budget execution audit, that is, the proper nouns of the domain, is constructed from the given text, and representative seed words are selected from the keyword library as the descriptions of the labels. The text is segmented using the keyword library and a word segmentation tool, and word embedding vectors are pre-trained on the full audit text data to obtain the word matrix $E_i = [e_{i1}, \dots, e_{iL}]^T$, where i is the index of the sentence, l is the index of a word within the sentence, and L is the length of the sentence. The seed words are mapped to word embeddings, and the word embeddings of the seed words of each category are averaged to obtain the embedding matrix of all labels, $\mathbf{L} = [l_1, \dots, l_K]^T$.
Step two: model construction, building the classification framework for multi-label unbalanced text. First, a similarity matrix is computed between the words of a sentence and the labels. A neural network then computes the similarity between the context information, that is, the phrases, and the labels; this network has two groups of trainable parameters, $W_1$ and $b_1$. A newly constructed pooling layer then computes the weight vector between the phrases and all category labels, and the weight vector is finally used to weight the original words. After training, this yields a suitable sentence embedding matrix, that is, a sentence embedding matrix fused with domain knowledge:
$$Z_i = f_1(E_i, \mathbf{L}) \qquad (1)$$
where $Z_i$ is the embedding matrix of the i-th sentence and $f_1$ is the mapping function that takes $E_i$ and $\mathbf{L}$ as input and produces $Z_i$ as output.
The sentence embedding matrix is then used as the input of a classifier that classifies the sentence; the classifier has two groups of trainable parameters, $W_2$ and $b_2$:
$$\hat{y}_i = f_2(Z_i) \qquad (2)$$
where $\hat{y}_i$ is the predicted class probability matrix for the sentence embedding $Z_i$, and $f_2$ is the mapping function that takes $Z_i$ as input and produces $\hat{y}_i$ as output.
Step three: constructing a unified objective function for sentence embedding and unbalanced multi-class classification to guide the training of the neural network. The cross-entropy loss is used as the base objective function; class weights are introduced to bias the loss toward the small classes and strengthen the classifier's training on them; and the label word embedding matrix is introduced into the loss function to strengthen learning of the labels. The model is trained by minimizing the resulting unbalanced objective function. After training, payment summary text data with unknown labels can be classified.
Further, in step two, the model is constructed and the classification framework for multi-label unbalanced text is built: a similarity matrix is first computed between the words of a sentence and the labels; a neural network with two groups of trainable parameters, $W_1$ and $b_1$, then computes the similarity between the context information, that is, the phrases, and the labels; a newly constructed pooling layer then computes the weight vector between the phrases and all category labels; and the weight vector is finally used to weight the original words, so that after training a suitable sentence embedding matrix, fused with domain knowledge, is obtained.
Specifically, in the first stage, the similarity matrix is computed first:
[Formula (3), given as an image in the original publication]
The similarity matrix $G_i$ has size L × K, where $\|\cdot\|$ denotes the $\ell_2$ norm.
The similarity between each phrase in the sentence, which carries contextual semantics, and the labels is then computed:
$$c_j = \mathrm{ReLU}(G_{i,\,j-p:j+p} W_1^T + b_1), \quad 1 \le j \le L \qquad (4)$$
where j is the index of the word at the center of the phrase, j−p and j+p are the indices of the leftmost and rightmost words of the phrase, and $W_1$ and $b_1$ are the two groups of neural network parameters, updated iteratively during training.
The weight value of each word is then computed:
[Formula (5), given as an image in the original publication]
where $c_{jk}$ is the similarity between the phrase centered on the j-th word and the k-th category label.
$\beta_j$ is then normalized:
[Formula (6), given as an image in the original publication]
where exp denotes the exponential function with base e and $\beta_{j'}$ is the similarity value of the j′-th word in the sentence.
Finally, the embedding matrix of the sentence is obtained:
[Formula (7), given as an image in the original publication]
The above process as a whole is expressed as formula (1).
In the second stage, a three-layer fully connected neural network classifier is built; the sentence embedding matrix $Z_i$ is input to the classifier, which is trained to produce the prediction output $\hat{y}_i$. The overall process is expressed as formula (2).
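The first-stage computation described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the patented implementation: formula (3) is assumed to be a cosine similarity, the pooling of formula (5) is assumed to be a maximum over labels (the source only calls it a newly constructed pooling layer), formula (6) is taken to be a softmax, and formula (7) a weighted sum of word vectors. The function names, the zero padding at sentence boundaries, and the shape chosen for `W1` are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(E, Lab, eps=1e-8):
    """G[j, k]: cosine similarity of word j and label k (assumed form of formula (3))."""
    E_n = E / (np.linalg.norm(E, axis=1, keepdims=True) + eps)
    L_n = Lab / (np.linalg.norm(Lab, axis=1, keepdims=True) + eps)
    return E_n @ L_n.T                      # shape (L, K)

def phrase_label_scores(G, W1, b1, p):
    """c[j] = ReLU(G[j-p : j+p] W1^T + b1), following formula (4)."""
    n_words, n_labels = G.shape
    G_pad = np.pad(G, ((p, p), (0, 0)))     # zero padding at the borders (assumption)
    c = np.empty((n_words, n_labels))
    for j in range(n_words):
        window = G_pad[j:j + 2 * p + 1].reshape(-1)   # flattened (2p+1) x K window
        c[j] = np.maximum(window @ W1.T + b1, 0.0)
    return c

def sentence_embedding(E, Lab, W1, b1, p):
    """Return the weighted sentence vector and the normalized word weights."""
    G = cosine_similarity_matrix(E, Lab)
    c = phrase_label_scores(G, W1, b1, p)
    beta = c.max(axis=1)                    # assumed pooling over the K labels
    w = np.exp(beta - beta.max())
    w /= w.sum()                            # softmax normalization over words
    return w @ E, w

# Toy usage with random values: 6 words, 4-dimensional embeddings, 3 labels.
rng = np.random.default_rng(0)
E = rng.normal(size=(6, 4))
Lab = rng.normal(size=(3, 4))
W1 = rng.normal(size=(3, 9))                # K x ((2p+1) * K) with p = 1
b1 = np.zeros(3)
z, w = sentence_embedding(E, Lab, W1, b1, p=1)
print(z.shape, w.round(3))
```

In a real model `W1` and `b1` would be learned by backpropagation together with the classifier; here they are fixed random values so that the data flow can be inspected.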
Further, in step three, a unified objective function for sentence embedding and unbalanced multi-class classification is constructed to guide the training of the neural network. The cross-entropy loss is used as the base objective function; class weights are introduced to bias the loss toward the small classes and strengthen the classifier's training on them; and the label word embedding matrix is introduced into the loss function to strengthen learning of the labels. The model is trained by minimizing the resulting unbalanced objective function. After training, payment summary text data with unknown labels can be classified.
Specifically, the inverse weights of the classes are computed first:
[Formula (8), given as an image in the original publication]
where c(·) is the number of samples in a class, median(·) denotes the median, $y_k$ is the label vector of the k-th class, the k′-th class is the class whose sample count is the median of all class sample counts, and $y_{k'}$ is the label vector of the k′-th class.
The inverse weights are then smoothed to obtain the final weight vector:
[Formula (9), given as an image in the original publication]
where S(·) denotes the sigmoid function, $r_k$ is the inverse weight of the k-th class, and $r_{k'}$ is the inverse weight of the k′-th class.
The weight vector is then introduced to construct the loss function:
[Formula (10), given as an image in the original publication]
where N is the total number of sentences in the dataset and CE(·) is the cross-entropy loss function. The composite notation used in formula (10) (two inline images in the original publication) means that the function f can be decomposed into two parts, $f_1$ and $f_2$, with the output of $f_1$ serving as the input of $f_2$. $y_i$ is the actual label matrix of the i-th sentence; Σ is the weight vector and $Σ^T$ its transpose; $y_{ik}$ is the value of the k-th label of the i-th sentence, equal to 1 at the position of the actual label and 0 elsewhere; and $\hat{y}_{ik}$ is the predicted probability of the k-th label of the i-th sentence.
To increase the importance of the labels in training, a dedicated label loss function is added:
[Formula (11), given as an image in the original publication]
where k is the index of the corresponding class, α is the penalty coefficient, and $y_k$ is the category label matrix.
Finally, the model is trained with the Adam algorithm, with the objective of minimizing formula (11).
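The class weighting described above can be illustrated with a short NumPy sketch. Because formulas (8) to (10) survive only as images, the exact expressions here are assumptions: the inverse weight is taken as the median class size divided by the class size, the smoothing as a plain sigmoid of that ratio (the source formula also involves $r_{k'}$, which is not modeled), and the loss as a class-weighted cross entropy averaged over sentences. The label loss term with penalty coefficient α is not covered. All function names and the class sizes are hypothetical.

```python
import numpy as np

def inverse_class_weights(counts):
    """r_k = median(counts) / counts_k (assumed form of the inverse weights)."""
    counts = np.asarray(counts, dtype=float)
    return np.median(counts) / counts

def smoothed_weights(r):
    """Sigmoid smoothing keeps every weight in (0, 1), so rare classes cannot dominate."""
    return 1.0 / (1.0 + np.exp(-np.asarray(r, dtype=float)))

def weighted_cross_entropy(y_true, y_prob, weights, eps=1e-12):
    """Mean over sentences of -sum_k weights_k * y_ik * log(y_prob_ik)."""
    per_sentence = -(y_true * np.log(y_prob + eps)) @ weights
    return per_sentence.mean()

# Hypothetical class sizes, from the largest class to the smallest.
counts = [1000, 300, 100, 50, 10]
w = smoothed_weights(inverse_class_weights(counts))
print(w.round(3))   # weights grow as the classes get smaller

# Two toy sentences with one-hot labels and uniform predictions.
y_true = np.eye(5)[[0, 4]]
y_prob = np.full((2, 5), 0.2)
print(weighted_cross_entropy(y_true, y_prob, w))
```

With these assumptions a misclassified sentence from the smallest class contributes roughly twice the loss of one from the largest class, which is the bias toward small classes that the text describes.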
Beneficial effects: the invention addresses the multi-label unbalanced classification of payment voucher summary text in budget execution audit. By introducing label similarity computation, it improves recall and overall performance on the small classes, and thereby improves the efficiency with which auditors check budget execution compliance and identify high-risk transactions.
Drawings
Fig. 1 is a flowchart of an unbalanced text classification method for an audit field according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a neural network framework in accordance with an embodiment of the present invention.
Fig. 3 is a schematic diagram of a model training process according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings. Fig. 1 is a flowchart of the unbalanced text classification method for the audit field according to an embodiment of the present invention. As shown in Fig. 1, the present embodiment includes the following steps:
Step one: data preprocessing and word embedding training to obtain the input data of the model. Labeled text data of payment voucher summaries is given, in which the number of samples differs between categories and the number of categories is K. A keyword library for budget execution audit, that is, the proper nouns of the domain, is constructed from the given text, and representative seed words are selected from the keyword library as the descriptions of the labels. The text is segmented using the keyword library and a word segmentation tool, and word embedding vectors are pre-trained on the full audit text data to obtain the word matrix $E_i = [e_{i1}, \dots, e_{iL}]^T$, where i is the index of the sentence and l is the index of a word within the sentence. The seed words are mapped to word embeddings, and the word embeddings of the seed words of each category are averaged to obtain the embedding matrix of all labels, $\mathbf{L} = [l_1, \dots, l_K]^T$.
Step two: model construction, building the classification framework for multi-label unbalanced text. As shown in Fig. 2, a similarity matrix is first computed between the words of a sentence and the labels. A neural network then computes the similarity between the context information, that is, the phrases, and the labels; this network has two groups of trainable parameters, $W_1$ and $b_1$. A newly constructed pooling layer then computes the weight vector between the phrases and all category labels, and the weight vector is finally used to weight the original words. After training, this yields a suitable sentence embedding matrix, that is, a sentence embedding matrix fused with domain knowledge:
$$Z_i = f_1(E_i, \mathbf{L}) \qquad (1)$$
where $Z_i$ is the embedding matrix of the i-th sentence and $f_1$ is the mapping function that takes $E_i$ and $\mathbf{L}$ as input and produces $Z_i$ as output.
Finally, the sentence embedding matrix is used as the input of a classifier that classifies the sentence; the classifier has two groups of trainable parameters, $W_2$ and $b_2$:
$$\hat{y}_i = f_2(Z_i) \qquad (2)$$
where $\hat{y}_i$ is the predicted class probability matrix for the sentence embedding $Z_i$, and $f_2$ is the mapping function that takes $Z_i$ as input and produces $\hat{y}_i$ as output.
Step three: constructing a unified objective function for sentence embedding and unbalanced multi-class classification to guide the training of the neural network. The cross-entropy loss is used as the base objective function; class weights are introduced to bias the loss toward the small classes and strengthen the classifier's training on them; and the label word embedding matrix is introduced into the loss function to strengthen learning of the labels. The model is trained by minimizing the resulting unbalanced objective function. After training, payment summary text data with unknown labels can be classified.
In a specific embodiment, the multi-label unbalanced text classification method in budget execution audit is described in detail as follows.
First, based on existing budget execution audit text data, sentences are segmented with the word segmentation tool LAC (Lexical Analysis of Chinese), the word frequencies within each category are counted, and a keyword library and seed words for the budget execution audit domain are constructed from the segmentation results and a collected domain lexicon.
The keyword library and seed words for the budget execution audit domain are shown in the following table:
[Table, given as an image in the original publication]
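The per-category word counting used to collect keyword and seed-word candidates can be sketched with the Python standard library. This is a minimal illustration that assumes the sentences have already been segmented into tokens (in the embodiment this is done by LAC); the function names and the toy English tokens are hypothetical, and the final choice of seed words in the source also draws on a collected domain lexicon, which is not modeled here.

```python
from collections import Counter, defaultdict

def category_word_frequencies(samples):
    """samples: iterable of (tokens, category) pairs; returns category -> Counter."""
    freq = defaultdict(Counter)
    for tokens, category in samples:
        freq[category].update(tokens)
    return freq

def top_candidates(freq, n):
    """The n most frequent words of each category, as seed-word candidates."""
    return {cat: [word for word, _ in counter.most_common(n)]
            for cat, counter in freq.items()}

# Hypothetical, already segmented payment summaries.
samples = [
    (["hotel", "fee", "trip"], "travel"),
    (["trip", "ticket"], "travel"),
    (["paper", "fee"], "office"),
]
freq = category_word_frequencies(samples)
print(top_candidates(freq, 1))
```

Words that are frequent in every category, such as "fee" above, are poor label descriptions, so a reviewer would normally filter the candidates before fixing the seed words.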
Sentences are segmented with LAC, using the budget execution audit lexicon and a conventional stop-word list; an example result is shown in the following table:
No. | Sentence | Word segmentation result
1 | Shenzhen specialist attends to the ball sea following project accommodation fee | Shenzhen specialist attends to the ball sea following project accommodation fee
The seed words are represented with CBOW (Continuous Bag of Words) to obtain the embedding matrices corresponding to the labels. Taking the travel category as an example, the embedding matrices of the seed words and of the label are shown in the following table:
[Table, given as an image in the original publication]
The seed word embedding matrices of the travel category are averaged to obtain the embedding matrix of the label, as shown in the following table:
[Table, given as an image in the original publication]
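The averaging step that turns seed-word embeddings into one embedding per label can be sketched as follows. The dictionary layout, the function name, and the two-dimensional toy vectors are hypothetical; in the embodiment the vectors come from a CBOW model trained on the audit text.

```python
import numpy as np

def label_embedding_matrix(seed_words, word_vectors):
    """Average the seed-word vectors of each label.

    seed_words:   dict mapping label -> list of seed words
    word_vectors: dict mapping word -> 1-D NumPy vector
    Returns the label order and a K x d matrix with one row per label.
    """
    labels = list(seed_words)
    rows = [np.mean([word_vectors[w] for w in seed_words[lab]], axis=0)
            for lab in labels]
    return labels, np.vstack(rows)

# Toy vectors standing in for pre-trained word embeddings.
vectors = {
    "trip": np.array([1.0, 0.0]),
    "hotel": np.array([0.0, 1.0]),
    "paper": np.array([2.0, 2.0]),
}
labels, label_matrix = label_embedding_matrix(
    {"travel": ["trip", "hotel"], "office": ["paper"]}, vectors)
print(labels)
print(label_matrix)
```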
The word segmentation results are then represented with CBOW to obtain the embedding matrices corresponding to the words, as shown in the following table:
[Table, given as an image in the original publication]
The data is divided in a given proportion into a training set and a test set, and the training set is input into the model for training; the training process is shown in Fig. 3.
After training, the test set is input into the trained model, and the computed $\beta_j$ values are applied as weights to obtain the sentence embedding matrix, as shown in the following table:
[Table, given as two images in the original publication]
The final prediction obtained after the sentence embedding matrix is input into the classifier is shown in the following table:
[Table, given as an image in the original publication]
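The classifier stage can be pictured as a forward pass through three fully connected layers. The source only states that the classifier has three fully connected layers with trainable parameters $W_2$ and $b_2$; the ReLU activations, the softmax output, the layer sizes, and the function names below are assumptions made for this sketch.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax, shifted for numerical stability."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def init_params(sizes, rng):
    """One (weight, bias) pair per fully connected layer."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(Z, params):
    """Hidden layers use ReLU; the last layer outputs class probabilities."""
    h = Z
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return softmax(h @ W + b)

# Five sentence embeddings of dimension 4, three classes, hidden width 8.
rng = np.random.default_rng(1)
params = init_params([4, 8, 8, 3], rng)
probs = mlp_forward(rng.normal(size=(5, 4)), params)
print(probs.round(3))
```

Each row of `probs` sums to 1, and the predicted category of a sentence is the index of the largest entry in its row.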
The overall prediction results are shown in the following table:
Category                          Precision  Recall  F1-score  Support
Five insurances and housing fund  0.965      0.971   0.968     17573
Personnel wages and assistance    0.905      0.907   0.906     11075
Office expenses                   0.931      0.905   0.918     3955
Property management fee           0.874      0.873   0.874     1983
Foundation fee                    0.896      0.791   0.840     826
Travel fee                        0.780      0.751   0.765     719
Special purchasing                0.697      0.685   0.677     691
Official business expense         0.645      0.690   0.667     519
Others                            0.500      0.757   0.602     189
Macro Avg                         0.799      0.811   0.805     37530
Weighted Avg                      0.922      0.921   0.921     37530
Big Avg                           0.911      0.867   0.888     15856
Small Avg                         0.743      0.783   0.759     21674
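The difference between the macro and weighted averages in the table can be checked from the per-category rows. The sketch below assumes the usual definitions (an unweighted mean over categories for the macro average and a support-weighted mean for the weighted average); the source does not state how its averages were computed, and only the precision column is recomputed here.

```python
import numpy as np

# Per-category precision and support, copied from the results table.
precision = np.array([0.965, 0.905, 0.931, 0.874, 0.896, 0.780, 0.697, 0.645, 0.500])
support = np.array([17573, 11075, 3955, 1983, 826, 719, 691, 519, 189])

macro_precision = precision.mean()                           # every category counts equally
weighted_precision = np.average(precision, weights=support)  # large categories dominate

print(round(float(macro_precision), 3))      # 0.799
print(round(float(weighted_precision), 3))   # 0.922
```

The macro figure is much lower than the weighted figure because the small categories, which are the hardest to classify, count as much as the large ones.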
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are also intended to fall within the scope of protection of the present invention.

Claims (3)

1. A multi-label unbalanced text classification method in budget execution audit, characterized by comprising the following steps:
step one: data preprocessing and word embedding training to obtain the input data of the model: labeled text data of payment voucher summaries is given, wherein the number of samples differs between categories and the number of categories in the data is K; a keyword library for budget execution audit, namely the proper nouns of the domain, is constructed from the given text, and representative seed words are selected from the keyword library as the descriptions of the labels; the text is segmented using the keyword library and a word segmentation tool, and pre-training of word embedding vectors is completed on the full audit text data to obtain the word matrix $E_i = [e_{i1}, \dots, e_{iL}]^T$, wherein i is the index of the sentence, l is the index of a word within the sentence, and L is the length of the sentence; the seed words are mapped to word embeddings, and the word embeddings of the seed words of each category are averaged to obtain the embedding matrix of all labels, $\mathbf{L} = [l_1, \dots, l_K]^T$;
step two: model construction, building the classification framework for multi-label unbalanced text: first, a similarity matrix is computed between the words of a sentence and the labels; a neural network then computes the similarity between the context information, namely the phrases, and the labels, wherein two groups of parameters, $W_1$ and $b_1$, need to be trained; a newly constructed pooling layer then computes the weight vector between the phrases and all category labels, and the weight vector is finally used to weight the original words, so that after the training process is completed a suitable sentence embedding matrix, namely a sentence embedding matrix fused with domain knowledge, is obtained, wherein the formula is as follows:
$$Z_i = f_1(E_i, \mathbf{L}) \qquad (1)$$
wherein $Z_i$ is the embedding matrix of the i-th sentence, and $f_1$ is the mapping function taking $E_i$ and $\mathbf{L}$ as input and $Z_i$ as output;
the sentence embedding matrix is then taken as input and the sentence is classified using a classifier, wherein two groups of parameters, $W_2$ and $b_2$, need to be trained, and the formula is as follows:
$$\hat{y}_i = f_2(Z_i) \qquad (2)$$
wherein $\hat{y}_i$ is the predicted class probability matrix for the sentence embedding $Z_i$, and $f_2$ is the mapping function taking $Z_i$ as input and $\hat{y}_i$ as output;
step three: constructing a unified objective function for sentence embedding and unbalanced multi-class classification to guide the training of the neural network: the cross-entropy loss function is used as the base objective function, class weights are introduced to bias the loss function toward the small classes and strengthen the training of the classifier on the small classes, the label word embedding matrix is introduced into the loss function to strengthen learning of the labels, and the model is trained with the aim of minimizing the resulting unbalanced objective function; after training, payment summary text data with unknown labels is classified.
2. The multi-label unbalanced text classification method in budget execution audit according to claim 1, characterized in that in step two the model is constructed and the classification framework for multi-label unbalanced text is built: first, a similarity matrix is computed between the words of a sentence and the labels; a neural network then computes the similarity between the context information, namely the phrases, and the labels, wherein two groups of parameters, $W_1$ and $b_1$, need to be trained; a newly constructed pooling layer then computes the weight vector between the phrases and all category labels, and the weight vector is finally used to weight the original words, so that after the training process is completed a suitable sentence embedding matrix, namely a sentence embedding matrix fused with domain knowledge, is obtained;
specifically: in the first stage, the similarity matrix is computed first, wherein the formula is as follows:
[Formula (3), given as an image in the original publication]
wherein the similarity matrix $G_i$ has size L × K and $\|\cdot\|$ denotes the $\ell_2$ norm;
the similarity between each phrase in the sentence, which carries contextual semantics, and the labels is then computed, wherein the formula is as follows:
$$c_j = \mathrm{ReLU}(G_{i,\,j-p:j+p} W_1^T + b_1), \quad 1 \le j \le L \qquad (4)$$
wherein j is the index of the word at the center of the phrase, j−p and j+p are the indices of the leftmost and rightmost words of the phrase, and $W_1$ and $b_1$ are the two groups of parameters of the neural network, which are updated iteratively during training;
the weight value of each word is then computed:
[Formula (5), given as an image in the original publication]
wherein $c_{jk}$ is the similarity between the phrase corresponding to the j-th word and the k-th category label;
$\beta_j$ is then normalized as follows:
[Formula (6), given as an image in the original publication]
wherein exp denotes the exponential function with base e, and $\beta_{j'}$ is the similarity value of the j′-th word in the sentence;
finally, the embedding matrix of the sentence is obtained, wherein the formula is as follows:
[Formula (7), given as an image in the original publication]
the above process as a whole is expressed as formula (1);
in the second stage, a three-layer fully connected neural network classifier is built, the sentence embedding matrix $Z_i$ is input to the classifier, and the classifier is trained to obtain the prediction output $\hat{y}_i$; the overall process is expressed as formula (2).
3. The multi-label unbalanced text classification method in budget execution audit according to claim 1, characterized in that in step three a unified objective function for sentence embedding and unbalanced multi-class classification is constructed to guide the training of the neural network: the cross-entropy loss function is used as the base objective function, class weights are introduced to bias the loss function toward the small classes and strengthen the training of the classifier on the small classes, the label word embedding matrix is introduced into the loss function to strengthen learning of the labels, and the model is trained with the aim of minimizing the resulting unbalanced objective function; after training, payment summary text data with unknown labels is classified;
specifically: first, the inverse weights of the classes are computed as follows:
[Formula (8), given as an image in the original publication]
wherein c(·) is the number of samples in a class, median(·) denotes the median, $y_k$ is the label vector of the k-th class, the k′-th class is the class whose sample count is the median of all class sample counts, and $y_{k'}$ is the label vector of the k′-th class;
the inverse weights are then smoothed to obtain the final weight vector, wherein the formula is as follows:
[Formula (9), given as an image in the original publication]
wherein S(·) denotes the sigmoid function, $r_k$ is the inverse weight of the k-th class, and $r_{k'}$ is the inverse weight of the k′-th class;
the weight vector is then introduced to construct the loss function, wherein the formula is as follows:
[Formula (10), given as an image in the original publication]
wherein N is the total number of sentences in the dataset and CE(·) is the cross-entropy loss function; the composite notation used in formula (10) (two inline images in the original publication) means that the function f can be decomposed into two parts, $f_1$ and $f_2$, with the output of $f_1$ serving as the input of $f_2$; $y_i$ is the actual label matrix of the i-th sentence, Σ is the weight vector, $Σ^T$ is the transpose of the weight vector, $y_{ik}$ is the value of the k-th label of the i-th sentence, equal to 1 at the position of the actual label and 0 at the remaining positions, and $\hat{y}_{ik}$ is the predicted probability of the k-th label of the i-th sentence;
to increase the importance of the labels in training, a label loss function is added, wherein the formula is as follows:
[Formula (11), given as an image in the original publication]
wherein k is the index of the corresponding class, α is the penalty coefficient, and $y_k$ is the category label matrix;
finally, the model is trained with the Adam algorithm, with the aim of minimizing formula (11).
CN202111534284.9A 2021-12-15 2021-12-15 Multi-label unbalanced text classification method in budget execution audit Active CN114722189B (en)

Publications (2)

Publication Number | Publication Date
CN114722189A | 2022-07-08
CN114722189B | 2023-06-23

Family

ID=82236185

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant