CN109446332A - People's mediation case classification system and method based on feature migration and adaptive learning
- Publication number: CN109446332A
- Application number: CN201811590326.9A
- Authority: CN (China)
- Prior art keywords: data, people, mediation, auxiliary, feature
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
Abstract
The present invention relates to a people's mediation case classification system and method based on feature migration and adaptive learning. The system comprises a data acquisition module, a feature extraction module, a feature migration module, and a network training module; its structure is simple and it is widely applicable. The method comprises: constructing a character vector table; vectorizing the auxiliary data; vectorizing the people's mediation data; feeding the vectorized auxiliary data into a neural network to extract auxiliary data features; migrating the extracted generic features of the auxiliary data into a new neural network; and feeding the vectorized people's mediation data into that network to train a classification model. The method converts all text effectively without ignoring low-frequency words, reduces dimensionality markedly, and trains quickly, which facilitates subsequent online iterative optimization. It also resolves the differences between the people's mediation domain and the auxiliary domain, meeting the individual needs of a specific domain.
Description
Technical field
The present invention relates to the technical field of data processing and classification, and in particular to a people's mediation case classification system and method based on feature migration and adaptive learning.
Background art
At present, China mediates more than 9 million disputes every year, yet only some 20 dispute types exist. As the economy and society develop, the number of cases grows and the types of cases diversify. How to classify cases quickly and accurately, add new case types in time, and improve the efficiency of mediation work is a serious problem facing people's mediation. The current people's mediation case types have the following shortcomings: 1. the existing case types are few and cannot cover all disputes; 2. newly emerging dispute types cannot be distinguished from the existing dispute types in time; 3. the subcategories under the existing dispute types are not fine-grained enough to reflect the key points of a dispute accurately.
People's mediation case types have many subdivisions. Text classification technology can accurately extract category features from massive data and classify cases automatically. Existing people's mediation data consists mainly of short texts, which are sparse, real-time, massive, and non-standard. These properties create the following difficulties for text classification: 1. a short text contains few feature words, so representing it with a traditional term-based vector space model yields a sparse vector space; moreover, information such as term frequency and term co-occurrence frequency cannot be fully exploited, and the latent semantic associations between words are lost; 2. the non-standard nature of short texts produces atypical feature words and out-of-vocabulary words that the word segmentation dictionary cannot recognize, so traditional text preprocessing and text representation methods are not accurate enough; 3. the volume of short text data is huge, so the choice of classification algorithm tends toward eager (non-lazy) learning methods, because lazy learning methods incur excessive time complexity.
With the large-scale generation of short text data, much exploration and practice has gone into short text classification techniques. However, the technology has not yet been applied in the people's mediation domain (highly specialized short text). Patent application CN201710686945.7 proposes a short text classification method that combines a composite dimensionality reduction algorithm with a weighted under-sampling SVM algorithm; it addresses high-dimensional sparsity and class imbalance in text classification, but its accuracy on multi-class classification is poor. Patent application CN201510271672.0 discloses a short text classification method based on convolutional neural networks: pre-trained word representation vectors are used to semantically expand the short text, and a convolutional neural network extracts a fixed-length semantic feature vector, which strengthens the vectorized semantic representation and ultimately improves classification performance. However, that method has difficulty expanding the corpus with external auxiliary data in a vertical domain.
Summary of the invention
To overcome the above shortcomings, the object of the present invention is to provide a people's mediation case classification system and method based on feature migration and adaptive learning. The system comprises a data acquisition module, a feature extraction module, a feature migration module, and a network training module; its structure is simple and it is widely applicable. The method comprises: constructing a character vector table; vectorizing the auxiliary data; vectorizing the people's mediation data; feeding the vectorized auxiliary data into a neural network to extract auxiliary data features; migrating the extracted generic features of the auxiliary data into a new neural network; and feeding the vectorized people's mediation data into that network to train a classification model. The method converts all text effectively without ignoring low-frequency words, reduces dimensionality markedly, and trains quickly, which facilitates subsequent online iterative optimization. It also resolves the differences between the people's mediation domain and the auxiliary domain, meeting the individual needs of a specific domain.
The present invention achieves the above object through the following technical solution: a people's mediation case classification system based on feature migration and adaptive learning, comprising a data acquisition module, a feature extraction module, a feature migration module, and a network training module. The data acquisition module collects people's mediation data and auxiliary data and applies data cleaning and deduplication preprocessing to the collected data, forming an auxiliary data set and a people's mediation data set. The feature extraction module uses a convolutional neural network to extract auxiliary data features and people's mediation data features, and performs convolution on the features to obtain the features specific to the people's mediation data. The feature migration module migrates the generic features of the auxiliary data into a new neural network for use in people's mediation case classification. The network training module trains the convolutional neural network to obtain the final trained model.
A people's mediation case classification method based on feature migration and adaptive learning includes the following steps:
(1) Collect people's mediation data and auxiliary data, and preprocess both to obtain auxiliary data set A and people's mediation data set B;
(2) Construct a character vector table and vectorize the auxiliary data; feed the vectorized auxiliary data into a convolutional neural network and extract auxiliary data features; at the same time, retrain the convolutional neural network to obtain the auxiliary-domain model, save the network structure of the auxiliary-domain model as a .meta file, and save the network parameters as a .checkpoint file;
(3) Migrate the extracted auxiliary data features into a new neural network using transfer learning; the new neural network is rebuilt from the network of the auxiliary-domain model, and the adaptation layer is determined within it;
(4) Vectorize the people's mediation data and feed the vectorized data into the convolutional neural network obtained in step (3); extract the features specific to the people's mediation data and train the classifier model; obtain and save the final people's mediation classification model; use this model to classify people's mediation cases.
Preferably, step (1) is as follows:
(1.1) Collect auxiliary data: collect domain-related long text data as auxiliary-domain data;
(1.2) Collect people's mediation data: collect people's mediation data from recent years and label it by class according to expert experience;
(1.3) Data cleaning: clean the collected auxiliary data by deleting interfering characters in the text and deleting records that are too short; clean the collected people's mediation data by deleting records that are of poor quality or too short and deleting interfering characters in the text;
(1.4) Data deduplication: on the cleaned data, delete duplicate and similar records using any one or more of the cosine angle algorithm, Euclidean distance, Jaccard similarity, longest common substring, and edit distance;
(1.5) Store the cleaned and deduplicated data in the data warehouse to obtain auxiliary data set A and people's mediation data set B.
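The deduplication of step (1.4) can be sketched as follows for the Jaccard similarity option. This is a minimal illustration rather than the patented implementation: it assumes similarity is computed over character sets, and the names `jaccard`, `deduplicate`, and `threshold` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of the character sets of two texts."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical (assumption)
    return len(sa & sb) / len(sa | sb)


def deduplicate(texts, threshold):
    """Keep a text only if its similarity to every kept text is <= threshold."""
    kept = []
    for text in texts:
        if all(jaccard(text, other) <= threshold for other in kept):
            kept.append(text)
    return kept
```

The pairwise comparison is quadratic in the number of records, so a large corpus would likely need blocking or hashing in front of it; that choice is outside what the source describes.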
Preferably, step (2) is as follows:
(2.1) Construct the character vector table: split the text of auxiliary data set A and people's mediation data set B into single characters and assign each character an ID; construct the character vector table over the character set;
(2.2) Text embedding: suppose the character sequence of a text is $[s_1, s_2, s_3, \dots, s_n]$, where $s_n$ is the n-th character of the text. From the character sequence and the character vector table, construct the text vector $[e_1, e_2, e_3, \dots, e_n]$, where $e_n$ is the ID of $s_n$. A WordEmbedding function assigns each character a vector representation of fixed length m. After the text of auxiliary data set A is embedded, the final output is the text vector space $I \in \mathbb{R}^{|L| \times |n \cdot m|}$, where m is the character vector length and L is the total number of records in auxiliary data set A;
(2.3) Feed the output text vector space I into the convolutional layers (K layers in total). First convolutional layer: a filter performs a convolution over the text matrix. Let the filter size be h × m, where h is the number of characters in the convolution kernel window; the feature $t_i$ output by the convolution is:
$t_i = f(W \cdot S_{i:i+h-1} + b)$
where $b \in \mathbb{R}$ is the bias term, $W \in \mathbb{R}^{h \times m}$ is the weight matrix of the convolution kernel, and f is the convolution kernel function. Applying the filter to the windows $\{S_{1:h}, S_{2:h+1}, \dots, S_{n-h+1:n}\}$ of a text yields the feature T:
$T = [t_1, t_2, t_3, t_4, \dots, t_{n-h+1}]$
where $T \in \mathbb{R}^{n-h+1}$. Similarly, the feature obtained after K convolutional layers is $T' = [t'_1, t'_2, \dots, t'_{n-Kh+K}]$. A max-pooling layer down-samples the feature and retains the most important one:
$\hat{t} = \max\{T'\}$
The feature vector V of the fully connected layer is then:
$V = [\hat{t}_1, \hat{t}_2, \dots, \hat{t}_k]$
where k is the number of convolution kernels. The result is normalized by a Softmax layer;
(2.4) Retrain the convolutional neural network on auxiliary data set A to obtain the auxiliary-domain model; save the network structure of the auxiliary-domain model as a .meta file and the network parameters as a .checkpoint file.
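The convolution and pooling of step (2.3) can be illustrated with a small pure-Python sketch. It covers a single convolutional layer with k kernels rather than the K stacked layers of the method, and it assumes ReLU for the kernel function f, which the source does not specify at this point; the function names are hypothetical.

```python
def relu(x):
    """Assumed kernel function f; the source leaves f unspecified here."""
    return max(0.0, x)


def conv1d(S, W, b, f):
    """Slide an h x m kernel W over an n x m text matrix S.

    Returns T = [t_1, ..., t_{n-h+1}] with t_i = f(W . S[i:i+h] + b).
    """
    n, h = len(S), len(W)
    T = []
    for i in range(n - h + 1):
        acc = b
        for r in range(h):
            acc += sum(w * s for w, s in zip(W[r], S[i + r]))
        T.append(f(acc))
    return T


def feature_vector(S, kernels):
    """Max-pool the feature map of each (W, b) kernel into V = [t_hat_1, ..., t_hat_k]."""
    return [max(conv1d(S, W, b, relu)) for W, b in kernels]
```

For a text of n characters and a window of h characters, each feature map has n − h + 1 entries, and max pooling reduces it to one value per kernel regardless of n.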
Preferably, training in step (2.4) is based on a cross-entropy objective function; that is, the training objective is to minimize the cross entropy between the target probability distribution and the actual probability distribution. The training objective function J(θ) is defined as:
$J(\theta) = -\frac{1}{l} \sum_{i=1}^{l} \log p(\hat{z}_i \mid x_i, \theta) + \alpha \lVert \theta \rVert_2^2$
where l is the number of training samples, α is the regularization factor, and $\hat{z}_i$ is the correct class of sample $x_i$. Based on this objective function, the sample error is computed by the gradient descent algorithm, and the parameter set θ of the network structure is updated by back propagation with the update formula:
$\theta \leftarrow \theta - \lambda \frac{\partial J(\theta)}{\partial \theta}$
where λ is the learning rate.
Preferably, the auxiliary-domain model is obtained by training as follows:
(i) Divide auxiliary data set A into P equal parts. In turn, take several parts as the training set and the remaining parts as the validation set, and perform cross-validation. The mean accuracy is taken as the accuracy on auxiliary data set A, and the trained model with the highest accuracy is saved as model M1;
(ii) Use the confusion matrix and the misclassification matrix to record the classes that model M1 confuses when predicting auxiliary data set A and the number of misclassifications per class. If analysis reveals data quality problems, clean the data further in a semi-manual way; the cleaned data forms data set D. Each column of the confusion matrix represents a predicted class, and each row represents the actual class;
(iii) Retrain the convolutional neural network on data set D and output the auxiliary-domain model with better classification results.
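Steps (i) and (ii) can be sketched as below. The sketch assumes the common variant in which exactly one of the P parts serves as the validation set in each round (the source says only "several" parts), and it follows the stated convention that rows are actual classes and columns are predicted classes. All function names are hypothetical.

```python
def k_fold_indices(n_samples, p):
    """Yield (train_indices, validation_indices) for each of P folds."""
    folds = [list(range(i, n_samples, p)) for i in range(p)]
    for v in range(p):
        train = [i for f in range(p) if f != v for i in folds[f]]
        yield train, folds[v]


def confusion_matrix(actual, predicted, n_classes):
    """matrix[a][b] counts samples of actual class a predicted as class b."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(actual, predicted):
        matrix[a][b] += 1
    return matrix


def misclassified_per_class(matrix):
    """Number of misclassified samples for each actual class (row sum minus diagonal)."""
    return [sum(row) - row[i] for i, row in enumerate(matrix)]
```

Classes with a high misclassification count are the natural candidates for the semi-manual cleaning described in step (ii).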
Preferably, step (3) migrates the auxiliary data features using transfer learning in the following specific steps:
(3.1) Construct the network graph: rebuild the neural network from the saved .meta file; its layers are the embedding layer (embedding), the convolutional layers (K layers in total), the pooling layer gmp, the fully connected layers fc1 and fc2, and the softmax layer;
(3.2) Determine the adaptation layer: perform transfer learning on the people's mediation data while fixing the network layers of the auxiliary-domain model one layer at a time, obtaining K+3 model accuracies in turn. When the accuracy drops for the first time, the neural network begins adaptive learning of the people's mediation data, so the first q layers of the network are the generic feature extraction layers of the auxiliary-domain model;
(3.3) Feature migration: initialize the parameters from the saved .checkpoint file, and migrate the generic feature layers of the auxiliary-domain model (the first q layers) into the neural network rebuilt in step (3.1).
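The selection of q in step (3.2) can be sketched as a scan for the first drop in accuracy. The reading used here is an assumption: `accuracies[i]` is taken to be the accuracy with the first i+1 layers fixed, and q is taken to be the number of layers fixed just before the first drop. The function name is hypothetical.

```python
def generic_layer_count(accuracies):
    """Return q, the number of fixed layers reached before accuracy first drops.

    accuracies[i] is assumed to be the model accuracy when the first i + 1
    layers of the auxiliary-domain model are kept fixed.
    """
    for i in range(1, len(accuracies)):
        if accuracies[i] < accuracies[i - 1]:
            return i
    return len(accuracies)  # accuracy never dropped: every layer is generic
```

With K convolutional layers the list would hold K+3 entries, one per fixing depth described in the step.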
Preferably, during feature migration it is determined whether auxiliary data set A and people's mediation data set B have the same number of classes: if they do, the model parameters are initialized from the saved .checkpoint file; if they do not, the softmax parameters are updated and the model parameters are initialized from the saved .checkpoint file.
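The class-count check can be sketched with plain dictionaries standing in for the saved checkpoint. Several points are assumptions: the output layer is the last entry of `layer_order`, its weight matrix has one row per class, and a small Gaussian is used to re-create it. None of these details comes from the source, and `transfer_parameters` is a hypothetical name.

```python
import random


def transfer_parameters(saved, layer_order, q, n_target_classes, seed=0):
    """Initialise a new model from saved weights and mark the first q layers as fixed.

    saved maps a layer name to its weight matrix (a list of rows).
    """
    rng = random.Random(seed)
    params = {name: [row[:] for row in saved[name]] for name in layer_order}
    output = layer_order[-1]
    if len(params[output]) != n_target_classes:
        # Class counts differ: rebuild the output layer for the target classes.
        width = len(params[output][0])
        params[output] = [[rng.gauss(0.0, 0.01) for _ in range(width)]
                          for _ in range(n_target_classes)]
    fixed = layer_order[:q]
    return params, fixed
```

When the two data sets share the same number of classes, the output layer is simply copied along with the rest of the parameters.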
Preferably, step (4) is as follows:
(4.1) Vectorize people's mediation data set B according to the character vector table and feed the vectorized people's mediation data into the neural network output by step (3). The first q layers extract the features that the people's mediation data shares with the auxiliary data; convolution over these shared features yields the people's mediation data feature T. The weights of network layers q to K+3-q are initialized, the features specific to the people's mediation data are extracted, and the classifier model is trained;
(4.2) Train the network iteratively until the loss value no longer decreases, then obtain and save the final people's mediation classification model, which can also serve as the auxiliary-domain model for the next round of transfer learning. Finally, use the people's mediation classification model to classify people's mediation cases.
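The stopping rule of step (4.2), training until the loss no longer decreases, can be sketched as a loop with a patience counter. The `patience` and `max_epochs` parameters and the `run_epoch` callback are assumptions introduced for illustration; the source states only the stopping condition.

```python
def train_until_plateau(run_epoch, patience=3, max_epochs=1000):
    """Call run_epoch() until the loss has not improved for `patience` epochs.

    run_epoch is a hypothetical callback that trains one epoch and returns
    the loss value. Returns the best loss and the full loss history.
    """
    best = float("inf")
    stale = 0
    history = []
    for _ in range(max_epochs):
        loss = run_epoch()
        history.append(loss)
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best, history
```

In practice the model would be saved whenever `best` improves, so that the stored model is the one with the lowest loss.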
Preferably, the auxiliary data is judgment document data.
The beneficial effects of the present invention are: (1) the invention uses a character-level convolutional neural network text classification method, which converts all text effectively without ignoring low-frequency words, reduces dimensionality markedly, and trains quickly, facilitating subsequent online iterative optimization; (2) the invention uses transfer learning to migrate the generic features of the auxiliary-domain data into the people's mediation data features, which solves the difficulty of extracting features from short text and improves the generalization ability of the model; (3) the invention uses a deep convolutional neural network with adaptive learning, which resolves the differences between the people's mediation domain and the auxiliary domain and meets the individual needs of a specific domain; (4) the technical solution is flexible for the people's mediation domain: people's mediation disputes keep evolving, and the invention can be migrated and applied quickly to new disputes that appear later.
Brief description of the drawings
Fig. 1 is a flow diagram of the method of the present invention;
Fig. 2 is a schematic diagram of the character vector representation in the embodiment of the present invention;
Fig. 3 is a schematic diagram of the result of assigning each character a vector representation of fixed length m=128 in the embodiment of the present invention;
Fig. 4 is an example of the confusion matrix used by the present invention;
Fig. 5 is a block flow diagram of the transfer learning of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to a specific embodiment, but the scope of protection of the invention is not limited thereto:
Embodiment: a people's mediation case classification system based on feature migration and adaptive learning, comprising a data acquisition module, a feature extraction module, a feature migration module, and a network training module. The data acquisition module collects people's mediation data and auxiliary data and applies data cleaning and deduplication preprocessing to the collected data, forming an auxiliary data set and a people's mediation data set. The feature extraction module uses a convolutional neural network to extract auxiliary data features and people's mediation data features, and performs convolution on the features to obtain the features specific to the people's mediation data. The feature migration module migrates the generic features of the auxiliary data into a new neural network for use in people's mediation case classification. The network training module trains the convolutional neural network to obtain the final trained model.
As shown in Fig. 1, a people's mediation case classification method based on feature migration and adaptive learning includes the following steps:
(1) Preprocess the people's mediation data and the auxiliary data:
(1.1) Collect auxiliary data: collect domain-related data (long text) as auxiliary-domain data. This embodiment collects nearly 100,000 judgment documents as auxiliary data, covering 20 judgment document types.
(1.2) Collect people's mediation data: this embodiment collects more than 60,000 people's mediation cases from the past 3 years and labels them by class according to expert experience; there are 88 class labels in total.
(1.3) Data cleaning: clean the collected auxiliary-domain data by deleting interfering characters in the text and deleting records that are too short; clean the collected people's mediation data by deleting records that are of poor quality or too short and deleting interfering characters in the text. This embodiment uses regular expressions to delete interfering characters such as times, dates, digits, and special characters (\n, *) from the judgment document data, and deletes judgment documents whose content is shorter than 30 characters. Expert judgment is used to delete people's mediation data whose case type is unclear; regular expressions delete interfering characters such as times, dates, ID card numbers, addresses, telephone numbers, and bank card numbers from the people's mediation data; and people's mediation records whose content is shorter than 15 characters are deleted.
(1.4) Data deduplication: on the data cleaned in step (1.3), duplicate and similar records can be deleted with methods such as the cosine angle algorithm, Euclidean distance, Jaccard similarity, longest common substring, or edit distance. This embodiment uses the Jaccard similarity algorithm to delete judgment documents whose similarity coefficient is greater than 0.8 and people's mediation cases whose similarity coefficient is greater than 0.9.
(1.5) Store the cleaned and deduplicated data in the data warehouse to obtain judgment document data set A and people's mediation data set B.
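The regular-expression cleaning of step (1.3) might look like the sketch below. The patterns are hypothetical stand-ins, since the source does not reproduce its actual expressions; only the minimum lengths (30 characters for judgment documents, 15 for people's mediation records) come from the embodiment.

```python
import re

# Hypothetical patterns; the embodiment does not list its expressions.
DATE = re.compile(r"\d{4}年\d{1,2}月\d{1,2}日|\d{4}-\d{1,2}-\d{1,2}")
TIME = re.compile(r"\d{1,2}:\d{2}(?::\d{2})?")
DIGITS = re.compile(r"\d+")
SPECIAL = re.compile(r"[\n\r\t*]")


def clean(text):
    """Remove dates, times, remaining digits, and special characters."""
    for pattern in (DATE, TIME, DIGITS, SPECIAL):
        text = pattern.sub("", text)
    return text.strip()


def clean_corpus(texts, min_chars):
    """Clean every text and drop those shorter than min_chars afterwards."""
    cleaned = (clean(t) for t in texts)
    return [t for t in cleaned if len(t) >= min_chars]
```

Under these assumptions, `clean_corpus(judgment_texts, 30)` and `clean_corpus(mediation_texts, 15)` would apply the two length limits of the embodiment; patterns for ID card numbers, addresses, and bank card numbers would need to be added for the people's mediation data.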
(2) Extract auxiliary-domain features with a convolutional neural network:
(2.1) Construct the character vector table: split the sentences of judgment document data set A and people's mediation data set B into single characters, deduplicate the characters, and store them one per line in the file vocab.txt; the line number is the ID of each character. In this embodiment the character set used in the data has size C=5000, including <PAD>, which stands for unknown characters absent from the character vector table and is also used for padding. The resulting character vector table is shown in Fig. 2.
(2.2) Text embedding: in this embodiment the fixed length of each record is set to 300; records longer than 300 characters are truncated, and records shorter than 300 characters are padded with the uniform character <PAD>. Suppose the character sequence of a text is $[s_1, s_2, s_3, \dots, s_n]$ (0 ≤ n ≤ 300), where $s_n$ is the n-th character of the text. From the character sequence and the character vector table, construct the text vector $[e_1, e_2, e_3, \dots, e_n]$, where $e_n$ is the ID of $s_n$. A WordEmbedding matrix assigns each character a vector representation of fixed length m=128, as shown in Fig. 3, so the text vector space of one text is $S \in \mathbb{R}^{300 \times 128}$. By extension, embedding the text of judgment document data set A finally outputs the text vector space $I \in \mathbb{R}^{|L| \times |300 \cdot 128|}$, where L is the total number of records in judgment document data set A.
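Steps (2.1) and (2.2) can be sketched as follows, up to the point where IDs would be looked up in the embedding matrix. Placing <PAD> at ID 0 and building the table in memory instead of reading vocab.txt are assumptions made for brevity; the fixed length of 300 comes from the embodiment.

```python
PAD = "<PAD>"


def build_vocab(texts):
    """Assign an ID to every distinct character; <PAD> gets ID 0 (assumption)."""
    vocab = {PAD: 0}
    for text in texts:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab, length=300):
    """Map a text to exactly `length` IDs: truncate long texts, pad short ones.

    Characters missing from the table are mapped to <PAD>, as in the embodiment.
    """
    ids = [vocab.get(ch, vocab[PAD]) for ch in text[:length]]
    return ids + [vocab[PAD]] * (length - len(ids))
```

Each ID then selects one row of the C × 128 embedding matrix, giving the 300 × 128 text matrix described above.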
(2.3) The network structure used in the present invention is shown in Table 1 below:
Title | embedding | filter | kernel size | hidden_dim | out size
Embedding | 128 | | | | [300×128]
Conv1 | | 256 | 3×128 | 128 | [298×1×256]
Conv2 | | 256 | 3×128 | 128 | [296×1×256]
Conv3 | | 256 | 3×128 | 128 | [294×1×256]
Conv4 | | 256 | 3×128 | 128 | [292×1×256]
Conv5 | | 256 | 3×128 | 128 | [290×1×256]
MaxPool | | | | | [256×1]
Dropout | | | | | [256×1]
Fc | | | | | [20×1] or [88×1]
Softmax | | | | | [20×1] or [88×1]
Table 1
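The output lengths in Table 1 follow from the relation n − h + 1 applied once per convolutional layer. A short check, with a hypothetical helper name:

```python
def conv_output_lengths(n, h, layers):
    """Sequence length after each of `layers` convolutions with window h and stride 1."""
    lengths = []
    for _ in range(layers):
        n = n - h + 1
        lengths.append(n)
    return lengths
```

For n=300, h=3, and five layers this gives 298, 296, 294, 292, and 290, which matches the Conv1 to Conv5 rows and the closed form n − Kh + K = 300 − 15 + 5 = 290.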
The text vector space I output by step (2.2) is passed through the convolutional layers (K layers in total).
First convolutional layer: a filter performs a convolution over the text matrix. Let the filter size be h × m, where h is the number of characters in the convolution kernel window; the feature $t_i$ output by the convolution is:
$t_i = f(W \cdot S_{i:i+h-1} + b)$
where $b \in \mathbb{R}$ is the bias term, $W \in \mathbb{R}^{h \times m}$ is the weight matrix of the convolution kernel, and f is the convolution kernel function. Applying the filter to the windows $\{S_{1:h}, S_{2:h+1}, \dots, S_{n-h+1:n}\}$ of a text yields the feature T:
$T = [t_1, t_2, t_3, t_4, \dots, t_{n-h+1}]$
where $T \in \mathbb{R}^{n-h+1}$. Similarly, the feature obtained after K convolutional layers is $T' = [t'_1, t'_2, \dots, t'_{n-Kh+K}]$. A max-pooling layer down-samples the feature and retains the most important one:
$\hat{t} = \max\{T'\}$
The feature vector V of the fully connected layer is then:
$V = [\hat{t}_1, \hat{t}_2, \dots, \hat{t}_k]$
where k is the number of convolution kernels. The result is normalized by a Softmax layer, whose function has the form:
$p(z_j \mid x_i, \theta) = \frac{\exp(\phi_j(x_i, \theta))}{\sum_{j'=1}^{|Z|} \exp(\phi_{j'}(x_i, \theta))}$
where $x_i$ is the input short text, $z_j$ is the j-th class, θ is the set of parameters to be estimated in the convolutional neural network, Z is the predefined class set of the training samples, and $\phi_j(x_i, \theta)$ is the score that the network structure assigns to sample $x_i$ for class $z_j$. That is, a multi-class logistic regression classifier maps the scores to a probability distribution vector over all predefined classes, and the dimension of the probability vector equals the size of the predefined class set.
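The Softmax normalization described above can be written in a few lines. Subtracting the maximum score before exponentiating is a standard numerical-stability measure added here; it does not change the result and is not mentioned in the source.

```python
import math


def softmax(scores):
    """Map class scores phi_j to a probability distribution over the classes."""
    m = max(scores)  # shift for numerical stability; the result is unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The output has one entry per predefined class, so its length is 20 for the judgment document model and 88 for the people's mediation model.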
After multiple rounds of testing, this embodiment achieves the best results with five convolutional layers and h = 3 characters in the convolution kernel window, which generates the feature T':
$T' = [t'_1, t'_2, \ldots, t'_{290}]$
where $T' \in \mathbb{R}^{290}$. The max-pooling layer takes the maximum value from each feature vector; the maximum represents the most important signal. This pooling scheme also handles sentence inputs of variable length, because the final output of the pooling layer is always the maximum value produced by the convolutional calculation layers.
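The feature length follows directly from the n − h + 1 rule given for a single filter. The sketch below computes the length that remains after one or more unpadded, stride-1 convolutions. The input lengths 292 and 300 are hypothetical values chosen only because they are consistent with the 290 features quoted above; this section does not state the actual input length or whether padding is used.

```python
def conv_output_length(n, h, layers=1):
    """Length left after `layers` unpadded, stride-1 convolutions of window h."""
    for _ in range(layers):
        if n < h:
            raise ValueError("input is shorter than the kernel window")
        n = n - h + 1
    return n

single_layer = conv_output_length(292, 3)           # one layer, h = 3
five_layers = conv_output_length(300, 3, layers=5)  # five stacked layers, h = 3
```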
To prevent vanishing gradients, this embodiment introduces the ReLU activation function in the first fully connected layer. In experiments, SGD converged much faster with ReLU than with sigmoid/tanh. Its mathematical expression is:
$f(x) = \mathbb{1}(x < 0)\,(a x) + \mathbb{1}(x \ge 0)\,(x)$
where a is a very small constant (this form, with a nonzero slope a for negative inputs, is commonly called leaky ReLU). This not only corrects the data distribution but also retains some values on the negative axis, so that negative-axis information is not lost entirely. To prevent over-fitting, this embodiment also introduces Dropout. Cross-validation shows that a hidden-node dropout rate of 0.5 works best, since at 0.5 dropout randomly generates the largest number of network structures. The second fully connected layer is normalized with Softmax and gives the probability distribution of a judgment document over 20 classes.
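As a rough illustration of the two techniques just described, the following sketch implements the activation formula and a dropout mask. The default slope a = 0.01 and the rescaling of surviving values ("inverted" dropout) are assumptions; the text gives only "a very small constant" and the rate 0.5.

```python
import random

def leaky_relu(x, a=0.01):
    """f(x) = x for x >= 0 and a * x for x < 0; a = 0.01 is an assumed default."""
    return x if x >= 0 else a * x

def dropout(values, rate=0.5, rng=random):
    """Zero each value with probability `rate` and rescale the survivors.

    The rescaling by 1 / (1 - rate) is an assumption, not stated in the text.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

activated = [leaky_relu(x, a=0.1) for x in (-2.0, 0.0, 3.0)]
dropped = dropout([1.0] * 10, rate=0.5, rng=random.Random(0))
```

With n hidden nodes, the number of masks that keep exactly k nodes is the binomial coefficient C(n, k), which peaks at k = n/2; this is the usual reading of the remark that a rate of 0.5 yields the most network structures.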
(2.4) Train the convolutional neural network iteratively on auxiliary data set A to obtain the auxiliary-domain model; save the network structure of the auxiliary-domain model as a .meta file and the network parameters as a .checkpoint file.
During iterative training, the training objective used in this embodiment is to minimize the cross-entropy between the target probability distribution and the actual probability distribution. In the definition of the training objective function J(θ), l is the number of training samples, α is the regularization factor, and each sample $x_i$ is scored against its correct class. Based on this objective function, the error of a batch of samples is computed by gradient descent, and the parameter set θ of the network structure is updated by back propagation (BP). The update formula is:
$\theta \leftarrow \theta - \lambda \frac{\partial J(\theta)}{\partial \theta}$
where λ is the learning rate. Testing in this embodiment found α = 0.3 and λ = 1e-3 to work best.
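A small numeric sketch may help make the objective and the update rule concrete. It assumes one common form of regularized cross-entropy (the mean negative log-probability of the correct class plus α times the squared L2 norm of θ); the exact form of the regularization term in the original formula is not recoverable from this text, so treat that term as an assumption. The function names are hypothetical.

```python
import math

def objective(probs, labels, theta, alpha):
    """Mean cross-entropy of the correct classes plus an assumed L2 penalty."""
    l = len(labels)
    cross_entropy = -sum(math.log(p[z]) for p, z in zip(probs, labels)) / l
    return cross_entropy + alpha * sum(t * t for t in theta)

def gradient_step(theta, grad, lam):
    """theta <- theta - lam * dJ/dtheta, applied element-wise."""
    return [t - lam * g for t, g in zip(theta, grad)]

# One sample, two classes, predicted distribution [0.5, 0.5], correct class 0.
loss = objective([[0.5, 0.5]], [0], theta=[0.0], alpha=0.3)
updated = gradient_step([1.0, -1.0], grad=[2.0, -4.0], lam=0.5)
```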
(2.5) Judgment document data set A is divided into 10 equal parts. In turn, 9 parts are used as the training set and the remaining 1 part as the validation set for cross-validation, and the average is taken as the accuracy on judgment document data set A. The training run with the highest accuracy is saved as model $M_1$.
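The 10-fold procedure can be sketched as a generator of (training set, validation set) pairs. This is an illustrative split only: the interleaved assignment of samples to folds and the function name are assumptions, and the model training that would run on each pair is omitted.

```python
def k_fold_splits(samples, k=10):
    """Yield (train, validation) pairs, holding out each of k folds once."""
    folds = [samples[i::k] for i in range(k)]  # interleaved assignment to folds
    for i in range(k):
        validation = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, validation

def mean_accuracy(accuracies):
    """Average of the per-fold accuracies."""
    return sum(accuracies) / len(accuracies)

splits = list(k_fold_splits(list(range(100)), k=10))
```

Each sample appears in exactly one validation fold, so averaging the per-fold accuracies weights every sample once.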
(2.6) Using a confusion matrix (each column represents a predicted class and each row an actual class), the misclassification matrix records which data in judgment document data set A model $M_1$ confuses and how many samples of each class are misclassified. Analysis revealed data quality problems (for example, judgment documents with wrong class labels or with unclear classes), so the data are further cleaned semi-manually to form judgment document data set D. The confusion matrix is shown in Figure 4.
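A confusion matrix with the row and column convention described in step (2.6) can be built as follows. The labels in the example are made up for illustration; the helper names are hypothetical.

```python
def confusion_matrix(actual, predicted, num_classes):
    """Rows are actual classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def misclassified_per_class(matrix):
    """Off-diagonal row sums: samples of each actual class that were mispredicted."""
    return [sum(row) - row[i] for i, row in enumerate(matrix)]

actual = [0, 0, 1, 1, 2]
predicted = [0, 1, 1, 1, 0]
matrix = confusion_matrix(actual, predicted, num_classes=3)
errors = misclassified_per_class(matrix)
```

Classes with large off-diagonal counts are natural candidates for the semi-manual label review described above.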
(2.7) The convolutional neural network is trained iteratively on data set D and outputs a judgment document model with good classification results (accuracy above 90%); this model serves as the auxiliary-domain model $M_2$.
(2.8) The network structure of model $M_2$ is saved as my_model.meta and the network parameters as my_model.checkpoint.
(3) Using transfer learning, apply the generic features of the auxiliary data to people's mediation case classification. The process is shown in Figure 5:
(3.1) Construct the network graph: rebuild the neural network from the saved my_model.meta file (the structure is identical to that of the judgment document neural network). The network layers are the embedding layer embedding; convolutional layers Conv1, Conv2, Conv3, Conv4, Conv5; pooling layer gmp; fully connected layers fc1, fc2; and the softmax layer.
(3.2) Determine the adaptive layers: transfer learning is run on the people's mediation data while the layers of the auxiliary model are frozen one after another, yielding 8 model accuracies in turn. When the accuracy declines for the first time, the network has begun adaptive learning on the people's mediation data, so the first q layers of the network are the generic feature extraction layers of the auxiliary model. In this embodiment, fine-tuning experiments were run in turn on Conv1, Conv2, Conv3, Conv4, Conv5, gmp, fc1 and fc2 of model $M_2$. They showed that the first three layers of $M_2$ learn generic features and that, as the network deepens, later layers lean increasingly toward features specific to the judgment document domain. This embodiment therefore migrates the parameters of Conv1, Conv2 and Conv3 from $M_2$ into the new neural network, while Conv4, Conv5, gmp, fc1, fc2 and softmax are initialized without loading saved parameters.
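The rule in step (3.2) for locating the generic layers can be expressed as a short search over the recorded accuracies. The sketch assumes that accuracies[i] is the accuracy measured with the first i + 1 layers frozen; the accuracy values are invented for illustration and are not results from the patent.

```python
def generic_layer_count(accuracies):
    """Return q, the number of leading layers frozen before accuracy first drops.

    Assumes accuracies[i] was measured with the first i + 1 layers frozen.
    """
    for i in range(1, len(accuracies)):
        if accuracies[i] < accuracies[i - 1]:
            return i
    return len(accuracies)

layers = ["Conv1", "Conv2", "Conv3", "Conv4", "Conv5", "gmp", "fc1", "fc2"]
accuracies = [0.81, 0.83, 0.84, 0.80, 0.78, 0.77, 0.75, 0.74]  # hypothetical
q = generic_layer_count(accuracies)
generic_layers = layers[:q]
```

With these made-up numbers the first drop occurs when a fourth layer is frozen, which matches the embodiment's choice of Conv1 to Conv3 as generic layers.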
(3.3) Judge whether the number of classes in judgment document data set A matches that in people's mediation data set B: if the two match, execute step (3.4); if they do not, execute step (3.5).
(3.4) Initialize the model parameters from the .checkpoint file saved in step (2.4): migrate the parameters of Conv1, Conv2 and Conv3 from model $M_2$ into the new neural network; Conv4, Conv5, gmp, fc1, fc2 and softmax are initialized without loading saved parameters.
(3.5) Using the .checkpoint file saved in step (2.4), update the softmax parameters and initialize the model parameters: migrate the parameters of Conv1, Conv2 and Conv3 from model $M_2$ into the new neural network; Conv4, Conv5, gmp, fc1, fc2 and softmax are initialized without loading saved parameters.
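Steps (3.3) to (3.5) amount to copying some saved layers and re-initializing the rest, with the softmax layer resized when the class counts differ. The sketch below models a checkpoint as a plain dictionary of flat weight lists; it deliberately avoids any real framework API, and every name, size and initialization range in it is an assumption made for illustration.

```python
import random

def transfer_parameters(saved, layer_sizes, generic_layers, rng=random):
    """Copy generic-layer weights from `saved`; freshly initialize all others."""
    params = {}
    for name, size in layer_sizes.items():
        if name in generic_layers:
            if len(saved[name]) != size:
                raise ValueError(f"shape mismatch for {name}")
            params[name] = list(saved[name])  # migrated from the auxiliary model
        else:
            # Not loaded from the checkpoint: small random initialization.
            params[name] = [rng.uniform(-0.1, 0.1) for _ in range(size)]
    return params

# Hypothetical checkpoint of the auxiliary model (20 output classes).
saved = {"Conv1": [0.1, 0.2], "Conv2": [0.3], "Conv3": [0.4], "softmax": [0.0] * 20}
# Target network with 88 output classes, as in the embodiment.
layer_sizes = {"Conv1": 2, "Conv2": 1, "Conv3": 1, "softmax": 88}
params = transfer_parameters(saved, layer_sizes, {"Conv1", "Conv2", "Conv3"},
                             rng=random.Random(0))
```

Because the softmax layer is never copied, a change in the number of classes only changes its size in layer_sizes.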
(4) Carry out adaptive learning using the feature extraction capability of the convolutional neural network:
(4.1) Vectorize people's mediation data set B using the character vector table output by steps (2.1) and (2.2), and feed the vectorized people's mediation data into the neural network output by step (3). The first three layers extract the features common to the people's mediation data and the judgment document data. The common features pass through two convolutional layers (Conv4, Conv5) to obtain the people's mediation data feature T. The max-pooling layer extracts the salient features from T, and the fully connected layers produce the final features specific to people's mediation data, on which the classifier model is trained.
(4.2) Train the network iteratively until the loss no longer decreases, and save the people's mediation classification model, which can serve as the auxiliary-domain model for the next round of transfer learning.
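The stopping rule in step (4.2), training until the loss no longer decreases, can be sketched as a loop with a patience counter. The patience of 3 epochs and the callback interface are assumptions; the text only says that training stops once the loss stops falling.

```python
def train_until_converged(step_fn, patience=3, max_epochs=1000):
    """Call step_fn() once per epoch; stop after `patience` epochs without improvement.

    Returns (best_loss, epochs_run). `patience` is an assumed tolerance.
    """
    best = float("inf")
    stale = 0
    epochs_run = 0
    for _ in range(max_epochs):
        loss = step_fn()
        epochs_run += 1
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best, epochs_run

# Stand-in for a real training epoch: replay a fixed, made-up loss curve.
losses = iter([1.0, 0.8, 0.7, 0.7, 0.71, 0.72, 0.5])
best_loss, epochs = train_until_converged(lambda: next(losses), patience=3)
```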
Because the number of judgment document classes differs from the number of people's mediation classes in this embodiment, the softmax parameters are updated (the number of fine-grained people's mediation classes is class = 88) and the weight matrices of the first three convolutional layers of model $M_2$ are restored. People's mediation data set B is vectorized using the character vector table output by steps (2.1) and (2.2), and the vectorized data are fed into this convolutional neural network to extract the features of the people's mediation data. The classification model is trained and saved as the people's mediation classification model $M_3$, which is then used to classify people's mediation cases.
As people's mediation informatization is rolled out, two situations can arise:
1. The volume of people's mediation data keeps growing while, in the short term, the dispute types do not change. In this case the generic feature extraction layers of model $M_3$ are migrated to the new people's mediation data to improve classification accuracy.
2. As people's mediation informatization matures, the volume of people's mediation data keeps growing and new dispute types may appear. In this case the generic feature extraction layers of model $M_3$ are migrated to the new people's mediation data and the softmax parameters are updated (to the new number of people's mediation classes), which avoids training from scratch.
The above describes specific embodiments of the present invention and the technical principles employed. Any change made according to the concept of the present invention whose resulting function does not depart from the spirit covered by the specification and drawings shall fall within the protection scope of the present invention.
Claims (10)
1. A people's mediation case classification method based on feature migration and adaptive learning, characterized by comprising the following steps:
(1) collecting people's mediation data and auxiliary data, and preprocessing the people's mediation data and auxiliary data to obtain auxiliary data set A and people's mediation data set B;
(2) constructing a character vector table, vectorizing the auxiliary data, inputting the vectorized auxiliary data into a convolutional neural network, and extracting auxiliary data features; at the same time training the convolutional neural network iteratively to obtain an auxiliary-domain model, and saving the network structure of the auxiliary-domain model as a .meta file and the network parameters as a .checkpoint file;
(3) migrating the extracted auxiliary data features into a new neural network using transfer learning, wherein the new neural network is rebuilt from the network of the auxiliary-domain model, and adaptive layers are determined in the new neural network;
(4) vectorizing the people's mediation data, inputting the vectorized people's mediation data into the convolutional neural network obtained in step (3), extracting the features specific to the people's mediation data, training a classifier model, and obtaining and saving the final people's mediation classification model; and classifying people's mediation cases using the people's mediation classification model.
2. The people's mediation case classification method based on feature migration and adaptive learning according to claim 1, characterized in that step (1) is specifically as follows:
(1.1) collecting auxiliary data: collecting long-text data relevant to the field as auxiliary-domain data;
(1.2) collecting people's mediation data: collecting people's mediation data from recent years and labeling the people's mediation data with class labels based on expert knowledge;
(1.3) data cleaning: cleaning the collected auxiliary data by deleting interfering characters in the text and deleting data that is too short; cleaning the collected people's mediation data by deleting data that is of poor quality or too short and deleting interfering characters in the text;
(1.4) data deduplication: on the cleaned data, deleting duplicate and similar data using any one or more of cosine similarity, Euclidean distance, Jaccard similarity, longest common substring, and edit distance;
(1.5) storing the cleaned and deduplicated data in a data warehouse to obtain auxiliary data set A and people's mediation data set B.
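As an illustration of the deduplication in step (1.4), the sketch below uses one of the listed measures, Jaccard similarity over character sets, to drop duplicate and near-duplicate texts. The threshold of 0.9 and the greedy keep-first strategy are assumptions for the example; the claim does not fix a threshold or a procedure.

```python
def jaccard(a, b):
    """Jaccard similarity of the character sets of two texts."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

def deduplicate(texts, threshold=0.9):
    """Keep a text only if it is below `threshold` similarity to every kept text."""
    kept = []
    for text in texts:
        if all(jaccard(text, other) < threshold for other in kept):
            kept.append(text)
    return kept

unique = deduplicate(["abcd", "abcd", "abce", "xyz"], threshold=0.9)
```

The pairwise comparison is quadratic in the number of texts, which is acceptable for a sketch but would need indexing or blocking on a large corpus.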
3. The people's mediation case classification method based on feature migration and adaptive learning according to claim 1, characterized in that step (2) is specifically as follows:
(2.1) constructing the character vector table: splitting the text of auxiliary data set A and people's mediation data set B into single characters and assigning each character an ID; constructing the character vector table over the character set;
(2.2) text embedding: assuming the character sequence of a text is $[s_1, s_2, s_3, \ldots, s_n]$, where $s_n$ is the n-th character of the text, the text vector constructed from the character sequence and the character vector table is $[e_1, e_2, e_3, \ldots, e_n]$, where $e_n$ is the ID of $s_n$; a WordEmbedding function assigns each character a fixed-length vector representation of length m, and after text embedding of auxiliary data set A the final output is the text vector space $I \in \mathbb{R}^{|L| \times |n \cdot m|}$, where m is the character vector length and L is the total size of auxiliary data set A;
(2.3) inputting the output text vector space I into the convolutional calculation layers (K layers in total);
first convolutional layer: a filter performs the convolution over the text matrix; let the filter size be h × m, where h is the number of characters in the convolution kernel window; the feature $t_i$ output by the convolution operation is:
$t_i = f(W \cdot S_{i:i+h-1} + b)$
where $b \in \mathbb{R}$ is the bias term, $W \in \mathbb{R}^{h \times m}$ is the weight matrix of the convolution kernel, and f is the convolution kernel function; applying the filter to the windows $\{S_{1:h}, S_{2:h+1}, \ldots, S_{n-h+1:n}\}$ of a text yields the feature T:
$T = [t_1, t_2, t_3, t_4, \ldots, t_{n-h+1}]$
where $T \in \mathbb{R}^{n-h+1}$; the features of the K-th convolutional layer are obtained in the same way; a max-pooling layer down-samples the features and retains the most important ones; the feature vector V of the fully connected layer is assembled from the pooled features, where k is the number of convolution kernels; V is normalized by the Softmax layer;
(2.4) training the convolutional neural network iteratively on auxiliary data set A to obtain the auxiliary-domain model, and saving the network structure of the auxiliary-domain model as a .meta file and the network parameters as a .checkpoint file.
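Steps (2.1) and (2.2) of this claim can be sketched as a character-to-ID table followed by a lookup into an embedding matrix. Reserving ID 0 for unseen characters and initializing the embedding uniformly at random are assumptions made for the example; in the claimed method the vectors come from a WordEmbedding function whose details are not given here.

```python
import random

def build_char_table(texts):
    """Assign each distinct character an ID, starting at 1 (0 is reserved)."""
    table = {}
    for text in texts:
        for ch in text:
            if ch not in table:
                table[ch] = len(table) + 1
    return table

def text_to_ids(text, table):
    """Map [s_1, ..., s_n] to [e_1, ..., e_n]; unseen characters map to 0."""
    return [table.get(ch, 0) for ch in text]

def init_embedding(vocab_size, m, rng):
    """One length-m vector per ID, including the reserved ID 0."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(vocab_size + 1)]

def embed(ids, embedding):
    """Look up the vector of every character ID, giving an n x m matrix."""
    return [embedding[i] for i in ids]

table = build_char_table(["调解", "调查"])
ids = text_to_ids("调查案", table)
embedding = init_embedding(len(table), m=4, rng=random.Random(0))
matrix = embed(ids, embedding)
```

Building the table over both data sets, as step (2.1) specifies, keeps the IDs consistent between the auxiliary domain and the people's mediation domain, which is what allows the embedding and early convolutional layers to be reused.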
4. The people's mediation case classification method based on feature migration and adaptive learning according to claim 3, characterized in that training in step (2.4) is based on a cross-entropy training objective function, that is, the training objective is to minimize the cross-entropy between the target probability distribution and the actual probability distribution; in the definition of the training objective function J(θ), l is the number of training samples, α is the regularization factor, and each sample $x_i$ is scored against its correct class; based on this training objective function, the error of the samples is computed by gradient descent, and the parameter set θ of the network structure is updated by back propagation, with the update formula:
$\theta \leftarrow \theta - \lambda \frac{\partial J(\theta)}{\partial \theta}$
where λ is the learning rate.
5. The people's mediation case classification method based on feature migration and adaptive learning according to claim 1, characterized in that the auxiliary-domain model is obtained by training as follows:
(i) dividing auxiliary data set A into P equal parts; in turn taking several parts as the training set and the remaining parts as the validation set for cross-validation; taking the average as the accuracy on auxiliary data set A; and saving the training run with the highest accuracy as model $M_1$;
(ii) using a confusion matrix, the misclassification matrix recording which data in auxiliary data set A model $M_1$ confuses and how many samples of each class are misclassified; if analysis reveals data quality problems, further cleaning the data semi-manually, the cleaned data forming data set D; wherein each column of the confusion matrix represents a predicted class and each row an actual class;
(iii) training the convolutional neural network iteratively on data set D and outputting the auxiliary-domain model with good classification results.
6. The people's mediation case classification method based on feature migration and adaptive learning according to claim 1, characterized in that step (3) migrates the auxiliary data features using transfer learning in the following specific steps:
(3.1) constructing the network graph: rebuilding the neural network from the saved .meta file, the network layers being the embedding layer embedding, the convolutional layers (K layers in total), the pooling layer gmp, the fully connected layers fc1 and fc2, and the softmax layer;
(3.2) determining the adaptive layers: running transfer learning on the people's mediation data while the layers of the auxiliary-domain model are frozen one after another, yielding K+3 model accuracies in turn; when the accuracy declines for the first time, the neural network has begun adaptive learning on the people's mediation data, so the first q layers of the neural network are the generic feature extraction layers of the auxiliary-domain model;
(3.3) feature migration: initializing the parameters from the saved .checkpoint file, and migrating the generic feature layers of the auxiliary-domain model (the first q layers) into the neural network rebuilt in step (3.1).
7. The people's mediation case classification method based on feature migration and adaptive learning according to claim 1, characterized in that, during feature migration, it is judged whether the number of classes in auxiliary data set A matches that in people's mediation data set B: if the two match, the model parameters are initialized from the saved .checkpoint file; if they do not, the softmax parameters are updated and the model parameters are initialized using the saved .checkpoint file.
8. The people's mediation case classification method based on feature migration and adaptive learning according to claim 1, characterized in that step (4) is specifically as follows:
(4.1) vectorizing people's mediation data set B using the character vector table, and inputting the vectorized people's mediation data into the neural network output by step (3); the first q layers extract the features common to the people's mediation data and the auxiliary data; convolution over the common features yields the people's mediation data feature T; the neural network weights of layers q to K+3-q are initialized; the features specific to the people's mediation data are extracted and the classifier model is trained;
(4.2) training the network iteratively until the loss no longer decreases, and obtaining and saving the final people's mediation classification model, which can serve as the auxiliary-domain model for the next round of transfer learning; finally, classifying people's mediation cases using the people's mediation classification model.
9. The people's mediation case classification method based on feature migration and adaptive learning according to any one of claims 1 to 8, characterized in that the auxiliary data is judgment document data.
10. A people's mediation case classification system based on feature migration and adaptive learning, characterized by comprising: a data acquisition module, a feature extraction module, a feature migration module, and a network training module; the data acquisition module is used to acquire people's mediation data and auxiliary data and to apply the preprocessing operations of data cleaning and deduplication to the collected people's mediation data and auxiliary data, forming an auxiliary data set and a people's mediation data set; the feature extraction module uses a convolutional neural network to extract auxiliary data features and people's mediation data features, and performs convolution on the features to obtain the features specific to the people's mediation data; the feature migration module is used to migrate the generic features of the auxiliary data into a new neural network for use in people's mediation case classification; the network training module is used to train the convolutional neural network and obtain the final trained model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811590326.9A CN109446332B (en) | 2018-12-25 | 2018-12-25 | People reconciliation case classification system and method based on feature migration and self-adaptive learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811590326.9A CN109446332B (en) | 2018-12-25 | 2018-12-25 | People reconciliation case classification system and method based on feature migration and self-adaptive learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109446332A (en) | 2019-03-08 |
CN109446332B (en) | 2023-08-25 |
Family
ID=65535335
Family Applications (1)
Application Number | Status | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201811590326.9A | Active | CN109446332B (en) | 2018-12-25 | 2018-12-25 | People reconciliation case classification system and method based on feature migration and self-adaptive learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109446332B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110110085A (en) * | 2019-04-24 | 2019-08-09 | 中电海康集团有限公司 | Traffic accident file classification method and system based on character level neural network and SVM |
CN110196911A (en) * | 2019-06-06 | 2019-09-03 | 申林森 | A kind of people's livelihood data automatic classification management system |
CN110362677A (en) * | 2019-05-31 | 2019-10-22 | 平安科技(深圳)有限公司 | The recognition methods of text data classification and device, storage medium, computer equipment |
CN110688487A (en) * | 2019-09-29 | 2020-01-14 | 中国建设银行股份有限公司 | Text classification method and device |
CN110704619A (en) * | 2019-09-24 | 2020-01-17 | 支付宝(杭州)信息技术有限公司 | Text classification method and device and electronic equipment |
CN110825872A (en) * | 2019-09-11 | 2020-02-21 | 成都数之联科技有限公司 | Method and system for extracting and classifying litigation request information |
CN111144112A (en) * | 2019-12-30 | 2020-05-12 | 广州广电运通信息科技有限公司 | Text similarity analysis method and device and storage medium |
CN111753137A (en) * | 2020-06-29 | 2020-10-09 | 四川长虹电器股份有限公司 | Video searching method based on voice characteristics |
CN112115264A (en) * | 2020-09-14 | 2020-12-22 | 中国科学院计算技术研究所苏州智能计算产业技术研究院 | Text classification model adjusting method facing data distribution change |
CN112347738A (en) * | 2020-11-04 | 2021-02-09 | 平安直通咨询有限公司上海分公司 | Judging document-based bidirectional encoder characteristic quantity model optimization method and device |
CN113052851A (en) * | 2019-12-27 | 2021-06-29 | 上海昕健医疗技术有限公司 | Medical image processing method and system based on deep learning and computer equipment |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150206309A1 (en) * | 2014-01-21 | 2015-07-23 | University Of Rochester | System and method for real-time image registration |
CN104834747A (en) * | 2015-05-25 | 2015-08-12 | 中国科学院自动化研究所 | Short text classification method based on convolution neutral network |
CN106777011A (en) * | 2016-12-07 | 2017-05-31 | 中山大学 | A kind of file classification method based on depth multi-task learning |
US20170161633A1 (en) * | 2015-12-07 | 2017-06-08 | Xerox Corporation | Transductive adaptation of classifiers without source data |
CN107967253A (en) * | 2017-10-27 | 2018-04-27 | 北京大学 | A kind of low-resource field segmenter training method and segmenting method based on transfer learning |
US20180165604A1 (en) * | 2016-12-09 | 2018-06-14 | U2 Science Labs A Montana | Systems and methods for automating data science machine learning analytical workflows |
CN108229651A (en) * | 2017-11-28 | 2018-06-29 | 北京市商汤科技开发有限公司 | Neural network model moving method and system, electronic equipment, program and medium |
CN108376267A (en) * | 2018-03-26 | 2018-08-07 | 天津大学 | A kind of zero sample classification method based on classification transfer |
CN108629772A (en) * | 2018-05-08 | 2018-10-09 | 上海商汤智能科技有限公司 | Image processing method and device, computer equipment and computer storage media |
CN108647741A (en) * | 2018-05-18 | 2018-10-12 | 湖北工业大学 | A kind of image classification method and system based on transfer learning |
CN108805137A (en) * | 2018-04-17 | 2018-11-13 | 平安科技(深圳)有限公司 | Extracting method, device, computer equipment and the storage medium of livestock feature vector |
Non-Patent Citations (7)
Title |
---|
JEREMY HOWARD et al.: "Universal Language Model Fine-tuning for Text Classification" * |
OKI SAPUTRA JAYA et al.: "Analysis of Convolution Neural Network for Transfer Learning of Sentiment Analysis in Indonesian Tweets", 《DSIT '18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON DATA SCIENCE AND INFORMATION TECHNOLOGY》 * |
SHUN MORIYA et al.: "Transfer Learning Method for Very Deep CNN for Text Classification and Methods for its Evaluation", 《2018 IEEE 42ND ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC)》 * |
TUSHAR SEMWAL et al.: "A Practitioners' Guide to Transfer Learning for Text Classification using Convolutional Neural Networks", 《PROCEEDINGS OF THE 2018 SIAM INTERNATIONAL CONFERENCE ON DATA MINING (SDM)》 * |
夏彬彬: "Research on Sentiment Analysis Based on Web Text Mining" * |
金佳佳: "Research and Application of Short Text Classification Algorithms Based on Deep Learning", 《China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology》 * |
陈钊: "Research on Sentiment Analysis Methods for Chinese Text", 《Wanfang Data Knowledge Service Platform》 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110110085A (en) * | 2019-04-24 | 2019-08-09 | 中电海康集团有限公司 | Traffic accident file classification method and system based on character level neural network and SVM |
CN110362677A (en) * | 2019-05-31 | 2019-10-22 | 平安科技(深圳)有限公司 | The recognition methods of text data classification and device, storage medium, computer equipment |
CN110196911B (en) * | 2019-06-06 | 2022-04-22 | 申林森 | Automatic classification management system for civil data |
CN110196911A (en) * | 2019-06-06 | 2019-09-03 | 申林森 | A kind of people's livelihood data automatic classification management system |
CN110825872A (en) * | 2019-09-11 | 2020-02-21 | 成都数之联科技有限公司 | Method and system for extracting and classifying litigation request information |
CN110825872B (en) * | 2019-09-11 | 2023-05-23 | 成都数之联科技股份有限公司 | Method and system for extracting and classifying litigation request information |
CN110704619A (en) * | 2019-09-24 | 2020-01-17 | 支付宝(杭州)信息技术有限公司 | Text classification method and device and electronic equipment |
CN110688487A (en) * | 2019-09-29 | 2020-01-14 | 中国建设银行股份有限公司 | Text classification method and device |
CN113052851A (en) * | 2019-12-27 | 2021-06-29 | 上海昕健医疗技术有限公司 | Medical image processing method and system based on deep learning and computer equipment |
CN111144112A (en) * | 2019-12-30 | 2020-05-12 | 广州广电运通信息科技有限公司 | Text similarity analysis method and device and storage medium |
CN111144112B (en) * | 2019-12-30 | 2023-07-14 | 广州广电运通信息科技有限公司 | Text similarity analysis method, device and storage medium |
CN111753137B (en) * | 2020-06-29 | 2022-05-03 | 四川长虹电器股份有限公司 | Video searching method based on voice characteristics |
CN111753137A (en) * | 2020-06-29 | 2020-10-09 | 四川长虹电器股份有限公司 | Video searching method based on voice characteristics |
CN112115264A (en) * | 2020-09-14 | 2020-12-22 | 中国科学院计算技术研究所苏州智能计算产业技术研究院 | Text classification model adjusting method facing data distribution change |
CN112115264B (en) * | 2020-09-14 | 2024-03-22 | 中科苏州智能计算技术研究院 | Text classification model adjustment method for data distribution change |
CN112347738A (en) * | 2020-11-04 | 2021-02-09 | 平安直通咨询有限公司上海分公司 | Judging document-based bidirectional encoder characteristic quantity model optimization method and device |
CN112347738B (en) * | 2020-11-04 | 2023-09-15 | 平安直通咨询有限公司上海分公司 | Bidirectional encoder characterization quantity model optimization method and device based on referee document |
Also Published As
Publication number | Publication date |
---|---|
CN109446332B (en) | 2023-08-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address after: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province. Applicant after: Yinjiang Technology Co.,Ltd. Address before: Floor 1, building 1, 223 Yile Road, Hangzhou, Zhejiang 310000. Applicant before: ENJOYOR Co.,Ltd. |
| GR01 | Patent grant | |