CN109840279A - Text classification method based on convolutional recurrent neural network - Google Patents


Info

Publication number
CN109840279A
Authority
CN
China
Prior art keywords
convolution
indicate
input
text
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910025175.0A
Other languages
Chinese (zh)
Inventor
Li Zhao (李钊)
Wang Ruishuang (王瑞霜)
Cao Jian (曹建)
Chen Tong (陈通)
Wang Lei (王磊)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yi Yun Information Technology Co Ltd
Shandong Computer Science Center National Super Computing Center in Jinan
Shandong Computer Science Center
Original Assignee
Shandong Yi Yun Information Technology Co Ltd
Shandong Computer Science Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Yi Yun Information Technology Co Ltd and Shandong Computer Science Center
Priority to CN201910025175.0A
Publication of CN109840279A
Legal status: Pending


Landscapes

  • Character Discrimination (AREA)

Abstract

The present invention discloses a text classification method based on a convolutional recurrent neural network. The method makes full use of the strength of convolutional neural networks at extracting local features to perform feature extraction on text, while using the memory capability of LSTM to link the extracted contextual features and better represent the semantic information of the text. The method not only achieves good classification results on English datasets but also achieves high classification accuracy on Chinese datasets.

Description

Text classification method based on convolutional recurrent neural network
Technical field
The present invention relates to text classification methods, and specifically to a text classification method based on a convolutional recurrent neural network.
Background art
With the rapid development of deep learning, convolutional neural networks and recurrent neural networks have achieved great success in a variety of machine learning tasks. For example, convolutional neural networks are widely used in the field of computer vision and handle such tasks relatively maturely, e.g. image classification, object detection, image segmentation, and speech recognition. Recurrent neural networks are another important branch of deep learning, used mainly to handle sequence problems. The long short-term memory network (LSTM) is a specific type of recurrent neural network that can capture the contextual information of a sequence and is widely used for time-series problems such as speech recognition and machine translation.
In recent years, for sequence data problems, more and more researchers have combined convolutional neural networks and recurrent neural networks. The hybrid model is called a convolutional recurrent neural network (CRNN); a CRNN can be simply described as a convolutional neural network followed by a recurrent neural network. In this model, the convolutional neural network is used mainly to extract features, while the recurrent neural network links the contextual feature information together. At present, the model has been applied to music classification, hyperspectral data classification, bird audio detection, and so on.
The convolutional recurrent neural network model is equally applicable to text classification. In text classification, convolutional neural networks can flexibly extract the features of the text; since the classification result is influenced by the entire text content, linking the extracted features with a long short-term memory network can better represent the text and in turn better realize text classification. Therefore, this document classifies text with a convolutional recurrent neural network and uses Chinese and English datasets as experimental data to compare it with other classification methods.
Summary of the invention
The technical problem to be solved by the present invention is to provide a text classification method based on a convolutional recurrent neural network: first, several groups of features are extracted from the input text information by a convolutional network and pooled separately to extract the important features of the text; the extracted features are then fused, fed into an LSTM neural network, and the classification result is output through a fully connected layer.
In order to solve the above technical problem, the present invention adopts the following technical solution: a text classification method based on a convolutional recurrent neural network, characterized by comprising the following steps:
S01) converting the sample data of a text sequence into a word-vector matrix as the input of the convolutional layer;
S02) performing convolution on the input data with convolution kernels of multiple scales, the height of the feature map after convolution being computed with formula 1. During the convolution operation, each local feature of the input is first computed with a single convolution kernel (formula 2), the computed features are then concatenated vertically (formula 3), and finally an activation function is applied to the result to perform a nonlinear computation and obtain the final convolution features (formula 4):
H_2 = ⌊(H_1 - F + 2P) / S⌋ + 1 (1)
h1_F(i) = f(W_F · X(i:i+F-1) + b) (2)
h1_F = [h1_F(1); h1_F(2); …; h1_F(H_2)] (3)
hr1_F = relu(h1_F) (4)
In the formulas, H_2 is the height of the feature map after convolution, H_1 is the height of the input before convolution, F is the height of the convolution kernel, P is the padding size, S is the stride, ⌊·⌋ denotes rounding down, W_F is a convolution kernel of height F, X(i:i+F-1) is the local feature vector from the i-th to the (i+F-1)-th feature of the sample input vector, and b is the bias;
S03) pooling the convolution results with the max-pooling layer MaxPooling1D to extract the important features of the text, then joining the pooled results with the Concatenate function as the input of the LSTM layer, as shown in formulas 5 and 6:
hrp1_F = max(hr1_F) (5)
h1 = [hrp1_2; hrp1_3] (6)
S04) taking the text feature sequence processed by the different convolution kernels as the input of the LSTM network, which can represent the semantic information of the text more accurately and thus better achieve the classification of the text; the LSTM network computes at each time step as follows:
f_t = σ(W_f · [h_{t-1}, h1_t] + b_f) (7)
i_t = σ(W_i · [h_{t-1}, h1_t] + b_i) (8)
c̃_t = tanh(W_c · [h_{t-1}, h1_t] + b_c) (9)
c_t = f_t ∘ c_{t-1} + i_t ∘ c̃_t (10)
o_t = σ(W_o · [h_{t-1}, h1_t] + b_o) (11)
h_t = o_t ∘ tanh(c_t) (12)
where f_t is the forget gate; σ is the sigmoid function; W_f is the weight matrix of the forget gate; [h_{t-1}, h1_t] denotes combining two vectors into one longer vector; h_{t-1} is the output of the LSTM network at the previous time step; h1_t is the convolution-pooling output h1 at time t; b_f is the bias of the forget gate; i_t is the input gate, with weight matrix W_i and bias b_i; c̃_t is the candidate cell state of the current input, computed from the previous output and the current input, with weight matrix W_c and bias b_c; c_t is the cell state at the current time, obtained by multiplying the forget gate f_t with the cell state c_{t-1} of the previous time step and adding the input gate i_t multiplied by the candidate cell state c̃_t, so that the long-term memory c_{t-1} of the LSTM is combined with the current memory c̃_t to form the new cell state c_t; o_t is the output gate, with weight matrix W_o and bias b_o; h_t is the final output, determined jointly by the cell state c_t and the output gate o_t.
Further, the method also includes step S05): adding a fully connected layer whose output dimension is the number of classes in the training set; the probability that a sample belongs to each class is computed with the Softmax function:
Softmax(y(i)) = exp(y(i)) / Σ_k exp(y(k))
where y(i) is the value of the i-th neuron of the output layer, y(k) is the value of the k-th neuron of the output layer, and exp is the exponential function with base e.
Further, step S01 includes the following detailed steps: (1) performing word segmentation on the Chinese training dataset; (2) building a dictionary and the mapping between dictionary and indices; (3) mapping the text sequences to index sequences; (4) processing all samples to the same sequence length, realized by zero-padding or truncation; (5) performing word embedding with pre-trained word vectors: if the sample sequence length is M and the pre-trained word-vector dimension is N, then after word embedding each sample is converted into an M×N word-vector matrix and used as the input of the convolutional layer.
Further, in step S02, one-dimensional convolutional layers are used to perform the convolution on the input; the kernel heights take the two scales 2 and 3, the number of kernels is 256, and the activation function is the ReLU function.
Further, a Batch Normalization layer is added between steps S02 and S03 to normalize the data and accelerate the convergence of the model.
Further, a Dropout layer is added between steps S04 and S05, randomly disconnecting a specified proportion of neuron connections to prevent overfitting.
Beneficial effects of the present invention: based on convolutional neural networks and the recurrent neural network LSTM, the present invention proposes a text classification method based on a convolutional recurrent neural network. The method makes full use of the strength of convolutional neural networks at extracting local features, while using the memory capability of LSTM to link the extracted contextual features and better represent the semantic information of the text. The method not only achieves good classification results on English datasets but also achieves high classification accuracy on Chinese datasets.
Brief description of the drawings
Fig. 1 is the structure diagram of the convolutional recurrent neural network;
Fig. 2 is the structure diagram of the convolutional neural network;
Fig. 3 is the structure diagram of the LSTM network.
Specific embodiment
The present invention is further illustrated below with reference to the drawings and specific embodiments.
Embodiment 1
This embodiment discloses a text classification method based on a convolutional recurrent neural network. The method is based on the convolutional recurrent neural network model shown in Fig. 1, which includes an input layer, a word embedding layer, convolutional layers, pooling layers, a long short-term memory (LSTM) network layer, and fully connected layers. The model first uses convolutional networks to extract several groups of features from the input text and pools them separately to extract the important features of the text; the extracted features are then fused, fed into the LSTM network, and the classification result is output through a fully connected layer.
The specific steps of this method are as follows:
S01) converting the sample data of a text sequence into a word-vector matrix as the input of the convolutional layer;
In text classification a sample is usually a text sequence, so before being fed to the neural network it must be represented as a word-vector matrix. Since sample lengths vary in text classification, the samples must be processed to the same length before word embedding; the sample length depends on the size of the dataset (let the sample length be M). Here, pre-trained word vectors of dimension N are used for word embedding, so each sample can be represented as an M×N word-vector matrix and used as the input of the convolutional layer.
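As a minimal sketch of this preprocessing (following the detailed steps (1)-(5) of the method), the snippet below uses Keras utilities; the toy corpus, the `pretrained` vector table, and the concrete values M = 100, N = 100, and a 20000-word dictionary (taken from the parameter settings in section 4.3) are illustrative assumptions rather than part of the patent:

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

M, N, VOCAB = 100, 100, 20000  # sample length, word-vector dim, dictionary size (section 4.3)

# (1) texts are assumed to be already word-segmented, space-separated tokens
texts = ["deep learning text classification", "convolutional recurrent network model"]

# (2)-(3) build the dictionary and map each text sequence to an index sequence
tokenizer = Tokenizer(num_words=VOCAB)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)

# (4) zero-pad or truncate so that every sample has the same length M
x = pad_sequences(sequences, maxlen=M, padding="post", truncating="post")

# (5) fill an embedding matrix from pre-trained word vectors;
# `pretrained` stands in for a loaded word2vec/GloVe table (assumption)
pretrained = {}  # e.g. {"text": np.random.rand(N), ...}
embedding_matrix = np.zeros((VOCAB, N))
for word, idx in tokenizer.word_index.items():
    if idx < VOCAB and word in pretrained:
        embedding_matrix[idx] = pretrained[word]
# after the Embedding layer, each padded sample becomes an M x N word-vector matrix
```

The embedding matrix built here is what the word embedding layer of the model would be initialized with before fine-tuning.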
S02) To represent the semantic features of the text more accurately, this embodiment performs convolution on the input data with convolution kernels of multiple scales, applies max pooling to the convolution results to extract the important features of the text, and then joins the pooled results as the input of the LSTM layer. The structure of the convolutional neural network is shown in Fig. 2.
In this embodiment, one-dimensional convolutional layers (Conv1D) are used to perform the convolution on the input; the kernel heights take the two scales 2 and 3, the number of kernels is 256, and the activation function is ReLU. The text length here is usually set to 100, so the heights of the feature maps after convolution are 99 and 98 respectively (computed with formula 1), and the feature map dimensions after convolution are (99, 256) and (98, 256) respectively.
H_2 = ⌊(H_1 - F + 2P) / S⌋ + 1 (1)
In formula (1), H_2 is the height of the feature map after convolution, H_1 is the height of the input before convolution, F is the height of the convolution kernel, P is the padding size (0 here), S is the stride (1 here), and ⌊·⌋ denotes rounding down. For example, with H_1 = 100, F = 2, P = 0, and S = 1, formula 1 gives H_2 = ⌊(100 - 2 + 0)/1⌋ + 1 = 99; with F = 3, H_2 = 98.
In the convolution feature extraction process, each local feature of the input is first computed with a single convolution kernel (formula 2), the computed features are then concatenated vertically (formula 3), and finally an activation function is applied to the result to perform a nonlinear computation and obtain the final convolution features (formula 4).
h1_F(i) = f(W_F · X(i:i+F-1) + b) (2)
h1_F = [h1_F(1); h1_F(2); …; h1_F(H_2)] (3)
hr1_F = relu(h1_F) (4)
where W_F is a convolution kernel of height F, X(i:i+F-1) is the local feature vector from the i-th to the (i+F-1)-th feature of the sample input vector, and b is the bias.
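The per-kernel computation of formulas 2-4 can be mimicked in a few lines of NumPy; the toy dimensions below are illustrative, and f in formula 2 is taken as the identity mapping since the nonlinearity is applied in formula 4:

```python
import numpy as np

H1, N, F, num_kernels = 100, 100, 2, 256  # input height, embedding dim, kernel height, kernel count
X = np.random.rand(H1, N)                 # one embedded sample (M x N word-vector matrix)
W = np.random.rand(num_kernels, F, N)     # 256 kernels of height F (W_F in formula 2)
b = np.zeros(num_kernels)                 # bias b

H2 = (H1 - F + 2 * 0) // 1 + 1            # formula 1 with P = 0, S = 1 -> 99 for F = 2

# formula 2: one convolution value per window position i, per kernel
h1 = np.empty((H2, num_kernels))
for i in range(H2):
    window = X[i:i + F]                   # X(i:i+F-1), the local feature block
    h1[i] = np.tensordot(W, window, axes=([1, 2], [0, 1])) + b

# formula 3 is the vertical stacking already realized by the h1 array;
# formula 4: the ReLU nonlinearity gives the final convolution features
hr1 = np.maximum(h1, 0.0)
print(hr1.shape)  # (99, 256), matching the feature-map dimensions stated in the text
```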
S03) pooling the convolution results with the max-pooling layer MaxPooling1D to extract the important features of the text, then joining the pooled results with the Concatenate function as the input of the LSTM layer, as shown in formulas 5 and 6:
hrp1_F = max(hr1_F) (5)
h1 = [hrp1_2; hrp1_3] (6)
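Continuing the sketch for formulas 5 and 6: the pooling window of 2 and the concatenation along the feature axis are assumptions, since the patent names MaxPooling1D and Concatenate without fixing these parameters:

```python
import numpy as np

hr1_2 = np.random.rand(99, 256)  # stands in for hr1 with F = 2 from the previous sketch
hr1_3 = np.random.rand(98, 256)  # stands in for hr1 with F = 3

# formula 5: 1-D max pooling along the time axis with an assumed window of 2
def max_pool_1d(feat, pool=2):
    steps = feat.shape[0] // pool
    return feat[:steps * pool].reshape(steps, pool, -1).max(axis=1)

hrp1_2 = max_pool_1d(hr1_2)  # shape (49, 256)
hrp1_3 = max_pool_1d(hr1_3)  # shape (49, 256)

# formula 6: join the pooled feature maps as the LSTM input sequence h1
h1_seq = np.concatenate([hrp1_2, hrp1_3], axis=-1)  # shape (49, 512)
```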
S04) Exploiting the ability of the long short-term memory (LSTM) network to capture contextual information, the text feature sequence processed by the different convolution kernels is used as the input of the LSTM network, which can represent the semantics of the text more accurately and in turn better achieve the classification of the text. The LSTM network structure is shown in Fig. 3.
The LSTM network computes at each time step as follows:
f_t = σ(W_f · [h_{t-1}, h1_t] + b_f) (7)
i_t = σ(W_i · [h_{t-1}, h1_t] + b_i) (8)
c̃_t = tanh(W_c · [h_{t-1}, h1_t] + b_c) (9)
c_t = f_t ∘ c_{t-1} + i_t ∘ c̃_t (10)
o_t = σ(W_o · [h_{t-1}, h1_t] + b_o) (11)
h_t = o_t ∘ tanh(c_t) (12)
where f_t is the forget gate; σ is the sigmoid function; W_f is the weight matrix of the forget gate; [h_{t-1}, h1_t] denotes combining two vectors into one longer vector; h_{t-1} is the output of the LSTM network at the previous time step; h1_t is the convolution-pooling output h1 at time t; b_f is the bias of the forget gate; i_t is the input gate, with weight matrix W_i and bias b_i; c̃_t is the candidate cell state of the current input, computed from the previous output and the current input, with weight matrix W_c and bias b_c; c_t is the cell state at the current time, obtained by multiplying the forget gate f_t with the cell state c_{t-1} of the previous time step and adding the input gate i_t multiplied by the candidate cell state c̃_t, so that the long-term memory c_{t-1} of the LSTM is combined with the current memory c̃_t to form the new cell state c_t; o_t is the output gate, with weight matrix W_o and bias b_o; h_t is the final output, determined jointly by the cell state c_t and the output gate o_t.
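The single time step of formulas 7-12 can be written directly in NumPy as below; the hidden size of 70 matches the parameter settings in section 4.3, while the random weights and the input dimension are placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_hid = 512, 70                        # input dim (pooled features), LSTM units (section 4.3)
rng = np.random.default_rng(0)
Wf, Wi, Wc, Wo = (rng.standard_normal((d_hid, d_hid + d_in)) * 0.01 for _ in range(4))
bf = bi = bc = bo = np.zeros(d_hid)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, h1_t]: two vectors joined into one
    f_t = sigmoid(Wf @ z + bf)               # formula 7, forget gate
    i_t = sigmoid(Wi @ z + bi)               # formula 8, input gate
    c_tilde = np.tanh(Wc @ z + bc)           # formula 9, candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # formula 10, new cell state
    o_t = sigmoid(Wo @ z + bo)               # formula 11, output gate
    h_t = o_t * np.tanh(c_t)                 # formula 12, final output
    return h_t, c_t

h, c = np.zeros(d_hid), np.zeros(d_hid)
for x_t in np.random.rand(49, d_in):         # the pooled feature sequence h1
    h, c = lstm_step(x_t, h, c)
```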
S05) To prevent overfitting, several Dropout layers with rate 0.5 are added to the model. The last layers of the model are fully connected; the output dimension of the final fully connected layer is the number of classes in the dataset, and the probability that a sample belongs to each class is computed with the softmax function, as in formula (13):
Softmax(y(i)) = exp(y(i)) / Σ_k exp(y(k)) (13)
where y(i) is the value of the i-th neuron of the output layer, y(k) is the value of the k-th neuron of the output layer, and exp is the exponential function with base e.
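Assembling the layers of Fig. 1, a minimal Keras sketch of the model could look as follows; the kernel heights 2 and 3, 256 kernels, ReLU, Batch Normalization between S02 and S03, 70 LSTM units, Dropout rate 0.5, and the softmax output follow the text, while the pool size, concatenation axis, and class count are assumptions:

```python
from tensorflow.keras import layers, models

M, N, VOCAB, NUM_CLASSES = 100, 100, 20000, 5

inp = layers.Input(shape=(M,))
emb = layers.Embedding(VOCAB, N)(inp)        # word embedding layer (pre-trained weights in practice)

branches = []
for F in (2, 3):                             # S02: multi-scale kernels of heights 2 and 3
    conv = layers.Conv1D(256, F, activation="relu")(emb)      # 256 kernels, ReLU
    conv = layers.BatchNormalization()(conv)                  # normalization between S02 and S03
    branches.append(layers.MaxPooling1D(pool_size=2)(conv))   # S03: max pooling

merged = layers.Concatenate()(branches)      # S03: join pooled feature maps (axis assumption)
lstm = layers.LSTM(70)(merged)               # S04: LSTM layer with 70 units
drop = layers.Dropout(0.5)(lstm)             # Dropout between S04 and S05, rate 0.5
out = layers.Dense(NUM_CLASSES, activation="softmax")(drop)   # S05: softmax over classes

model = models.Model(inp, out)
model.summary()
```

With valid-padded convolutions the two branches yield (99, 256) and (98, 256) feature maps, which pool to (49, 256) each, so the concatenated LSTM input is a sequence of 49 steps of 512 features under these assumptions.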
Embodiment 2
This embodiment selects 2 Chinese datasets and 5 commonly used English text classification datasets to evaluate the proposed convolutional recurrent neural network model. The Chinese datasets come from CNKI paper data collected by the authors, and the 5 English datasets come from Zhang et al.; the datasets cover different classification tasks such as sentiment analysis, topic classification, and news categorization. Training sample sizes range from 120K to 1.4M, and the number of classes in the classification tasks is between 4 and 14. Detailed dataset information is shown in the following table.
Table 1: Text classification dataset information
Dataset | Training samples | Test samples | Classes | Task | Language
Paper Data Set 1 | 160000 | 40000 | 5 | Document classification | CH
Paper Data Set 2 | 320000 | 80000 | 10 | Document classification | CH
AG's news | 120000 | 7600 | 4 | News categorization | EN
Sogou news | 450000 | 60000 | 5 | News categorization | EN
DBPedia | 560000 | 70000 | 14 | Ontology classification | EN
Yelp Review Full | 650000 | 50000 | 5 | Sentiment analysis | EN
Yahoo! Answers | 1400000 | 60000 | 10 | Topic classification | EN
Paper Data Set: the academic paper datasets come from academic papers in CNKI collected by the authors. Dataset 1 contains 5 document classes: clinical medicine, mathematics, electric power industry, biology, and vocational education. For each class, 40000 records are chosen as experimental data, of which 80% are used as training data and 20% as test data. Dataset 2 contains 10 document classes: chemistry, light industry and handicrafts, animal husbandry and veterinary medicine, pharmacy, news and media, railway transportation, pediatrics, sports, physics, and agricultural economics; likewise, 40000 records are chosen per class as experimental data, with 80% as training data and 20% as test data.
AG's news corpus: AG is a collection of more than 1 million news articles gathered by the ComeToMyHead activity over several years from more than 2000 news sources. The dataset is mainly intended for non-commercial uses such as data mining (classification, clustering) and information retrieval (ranking, search). The AG's news topic classification dataset was built by Zhang et al. from the above corpus for their character-level convolutional neural network text classification experiments. From the original corpus, the 4 largest classes were selected: World, Sports, Business, and Sci/Tech; each class has 30000 training samples and 1900 test samples. Each sample contains 3 columns: class index (1 to 4), title, and description.
Sogou news corpus: the Sogou news topic classification dataset was selected by Zhang et al. from SogouCA and SogouCS for their character-level convolutional neural network text classification experiments. The 5 largest classes were selected from the original corpus: sports, finance, entertainment, automobile, and technology; each class has 90000 samples for training and 12000 samples for testing. The dataset was originally a Chinese dataset, but Zhang et al. used the pypinyin package together with the jieba segmentation system to convert the Chinese data into pinyin text. Each sample likewise contains 3 columns: class index (1 to 5), title, and content.
DBPedia ontology dataset: DBpedia is a crowdsourced community aiming to extract structured content from Wikipedia [24]. The DBpedia ontology dataset was constructed by selecting 14 non-overlapping classes from DBpedia 2014: Company, EducationalInstitution, Artist, Athlete, OfficeHolder, MeanOfTransportation, Building, NaturalPlace, Village, Animal, Plant, Album, Film, and WrittenWork. From each of these 14 ontology classes, 40000 training samples and 5000 test samples were randomly selected. The fields of the dataset include the class index (1 to 14) and the title and abstract of each Wikipedia article.
Yelp Review Full: the Yelp review dataset was obtained from the 2015 Yelp Dataset Challenge. The original review dataset contains 5 star ratings, i.e. 1-5. The Yelp review dataset was built by randomly selecting 130000 training samples and 10000 test samples from each star rating. Each sample contains the star rating index (1 to 5) and the review content.
Yahoo! Answers dataset: the Yahoo! Answers dataset comes from the Yahoo! Webscope dataset. The Yahoo! Webscope corpus contains 4483032 questions and their answers. The Yahoo! Answers topic classification dataset was built by choosing the 10 largest classes from the original corpus; the topic classes are Society & Culture, Science & Mathematics, Health, Education & Reference, Computers & Internet, Sports, Business & Finance, Entertainment & Music, Family & Relationships, and Politics & Government. Each class contains 140000 training samples and 6000 test samples. Each sample contains the class index (1 to 10), question title, question content, and best answer.
4.2 Benchmark models
Classic classification models of recent years are chosen as benchmarks against the proposed convolutional recurrent neural network classification model. On the 2 self-built Chinese academic paper datasets, the classic fastText and HAN classification models are used as benchmarks. On the 5 common English datasets, the selected benchmarks include both traditional classification models and neural-network-based models. The traditional models are mainly linear methods, with results reported by Zhang et al. The neural-network-based models include char-CNN, fastText, and VDCNN, whose results are reported by Zhang et al., Joulin et al., and Conneau et al. respectively. All these benchmarks used the same experimental datasets, so for a further evaluation the proposed model is likewise tested on the above datasets.
4.3 Model parameter settings
Pre-trained word vectors are used to embed the input text and are fine-tuned during model training; the word-vector dimension is 100; the length of each sample depends on the maximum sentence length of the text; the size of the dictionary varies with the dataset and is usually set to 20000; a 0.1 fraction of the data is selected as the cross-validation set; the Dropout rate is 0.5; the convolution kernel sizes are 2 and 3 and the number of kernels is 256; the number of neurons in the LSTM layer is 70; the Adam optimization method is used with a learning rate of 1e-4; the batch size is set to 256.
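Under these settings, training could be wired up as in the sketch below, which continues the earlier model sketch; `model`, `x`, and `y` are assumed from those sketches and the epoch count is an assumption, while the Adam learning rate of 1e-4, batch size 256, and 0.1 validation split follow the listed parameters:

```python
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=1e-4),   # Adam with learning rate 1e-4
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# x: padded index sequences, y: one-hot class labels (assumed prepared as above)
history = model.fit(x, y,
                    batch_size=256,                  # batch size 256
                    epochs=10,                       # epoch count not specified in the text
                    validation_split=0.1)            # 0.1 of the data held out for validation
```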
4.4 Experimental results and analysis
The proposed convolutional recurrent neural network text classification model is evaluated on the above data and compared with the benchmark models. In addition, to allow the proposed model to achieve the best classification results, experiments are run with different numbers of convolution kernels: 64, 128, 256, and 512. The specific experimental results are shown in Tables 2 and 3.
Table 2: Experimental results with different numbers of convolution kernels
Table 3: Text classification experimental results
The experimental results in Table 2 show that, within a certain range, the accuracy of text classification improves as the number of convolution kernels increases; the classification effect is best with 256 kernels. In addition, the experimental results in Table 3 show that the proposed model not only achieves good classification results on the Chinese datasets, but its classification accuracy on the AG's news corpus and the DBPedia ontology dataset is also higher than that of the other benchmark models. In summary, the proposed model is applicable not only to the classification of Chinese datasets but equally to the classification of English datasets.
The present invention combines the strength of convolutional neural networks at extracting local features with the memory capability of the recurrent neural network LSTM to propose a text classification method based on a convolutional recurrent neural network, and evaluates the proposed model on 2 Chinese datasets and 5 commonly used English datasets. The experimental results show that the proposed model not only achieves high classification accuracy on the Chinese datasets but also performs well on the other English datasets.
The above describes only the basic principle and preferred embodiments of the present invention; improvements and substitutions made by those skilled in the art according to the present invention fall within the protection scope of the present invention.

Claims (6)

1. A text classification method based on a convolutional recurrent neural network, characterized by comprising the following steps:
S01) converting the sample data of a text sequence into a word-vector matrix as the input of the convolutional layer;
S02) performing convolution on the input data with convolution kernels of multiple scales, the height of the feature map after convolution being computed with formula 1; during the convolution operation, each local feature of the input is first computed with a single convolution kernel (formula 2), the computed features are then concatenated vertically (formula 3), and finally an activation function is applied to the result to perform a nonlinear computation and obtain the final convolution features (formula 4):
H_2 = ⌊(H_1 - F + 2P) / S⌋ + 1 (1)
h1_F(i) = f(W_F · X(i:i+F-1) + b) (2)
h1_F = [h1_F(1); h1_F(2); …; h1_F(H_2)] (3)
hr1_F = relu(h1_F) (4)
where H_2 is the height of the feature map after convolution, H_1 is the height of the input before convolution, F is the height of the convolution kernel, P is the padding size, S is the stride, ⌊·⌋ denotes rounding down, W_F is a convolution kernel of height F, X(i:i+F-1) is the local feature vector from the i-th to the (i+F-1)-th feature of the sample input vector, and b is the bias;
S03) pooling the convolution results with the max-pooling layer MaxPooling1D to extract the important features of the text, then joining the pooled results with the Concatenate function as the input of the LSTM layer, as shown in formulas 5 and 6:
hrp1_F = max(hr1_F) (5)
h1 = [hrp1_2; hrp1_3] (6)
S04) taking the text feature sequence processed by the different convolution kernels as the input of the LSTM network, which can represent the semantic information of the text more accurately and thus better achieve the classification of the text; the LSTM network computes at each time step as follows:
f_t = σ(W_f · [h_{t-1}, h1_t] + b_f) (7)
i_t = σ(W_i · [h_{t-1}, h1_t] + b_i) (8)
c̃_t = tanh(W_c · [h_{t-1}, h1_t] + b_c) (9)
c_t = f_t ∘ c_{t-1} + i_t ∘ c̃_t (10)
o_t = σ(W_o · [h_{t-1}, h1_t] + b_o) (11)
h_t = o_t ∘ tanh(c_t) (12)
where f_t is the forget gate; σ is the sigmoid function; W_f is the weight matrix of the forget gate; [h_{t-1}, h1_t] denotes combining two vectors into one longer vector; h_{t-1} is the output of the LSTM network at the previous time step; h1_t is the convolution-pooling output h1 at time t; b_f is the bias of the forget gate; i_t is the input gate, with weight matrix W_i and bias b_i; c̃_t is the candidate cell state of the current input, computed from the previous output and the current input, with weight matrix W_c and bias b_c; c_t is the cell state at the current time, obtained by multiplying the forget gate f_t with the cell state c_{t-1} of the previous time step and adding the input gate i_t multiplied by the candidate cell state c̃_t, so that the long-term memory c_{t-1} of the LSTM is combined with the current memory c̃_t to form the new cell state c_t; o_t is the output gate, with weight matrix W_o and bias b_o; h_t is the final output, determined jointly by the cell state c_t and the output gate o_t.
2. The text classification method based on a convolutional recurrent neural network according to claim 1, characterized by further comprising step S05): adding a fully connected layer whose output dimension is the number of classes in the training set, the probability that a sample belongs to each class being computed with the Softmax function:
Softmax(y(i)) = exp(y(i)) / Σ_k exp(y(k))
where y(i) is the value of the i-th neuron of the output layer, y(k) is the value of the k-th neuron of the output layer, and exp is the exponential function with base e.
3. The text classification method based on a convolutional recurrent neural network according to claim 1, characterized in that step S01 further comprises the following detailed steps: (1) performing word segmentation on the Chinese training dataset; (2) building a dictionary and the mapping between dictionary and indices; (3) mapping the text sequences to index sequences; (4) processing all samples to the same sequence length; (5) performing word embedding with pre-trained word vectors: if the sample sequence length is M and the pre-trained word-vector dimension is N, then after word embedding each sample is converted into an M×N word-vector matrix and used as the input of the convolutional layer.
4. The text classification method based on a convolutional recurrent neural network according to claim 1, characterized in that in step S02 the convolution is performed on the input with one-dimensional convolutional layers; the kernel heights take the two scales 2 and 3, the number of kernels is 256, and the activation function is the ReLU function.
5. The text classification method based on a convolutional recurrent neural network according to claim 1, characterized in that a Batch Normalization layer is added between steps S02 and S03 to normalize the data and accelerate the convergence of the model.
6. The text classification method based on a convolutional recurrent neural network according to claim 1, characterized in that a Dropout layer is added between steps S04 and S05, randomly disconnecting a specified proportion of neuron connections to prevent overfitting.
CN201910025175.0A 2019-01-10 2019-01-10 Text classification method based on convolutional recurrent neural network Pending CN109840279A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910025175.0A CN109840279A (en) 2019-01-10 2019-01-10 Text classification method based on convolutional recurrent neural network


Publications (1)

Publication Number Publication Date
CN109840279A (en) 2019-06-04

Family

ID=66883776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910025175.0A Pending CN109840279A (en) Text classification method based on convolutional recurrent neural network

Country Status (1)

Country Link
CN (1) CN109840279A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169035A (en) * 2017-04-19 2017-09-15 华南理工大学 A kind of file classification method for mixing shot and long term memory network and convolutional neural networks
CN108595632A (en) * 2018-04-24 2018-09-28 福州大学 A kind of hybrid neural networks file classification method of fusion abstract and body feature
CN108763216A (en) * 2018-06-01 2018-11-06 河南理工大学 A kind of text emotion analysis method based on Chinese data collection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chunting Zhou et al.: "A C-LSTM Neural Network for Text Classification", Computer Science *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399455A (en) * 2019-06-05 2019-11-01 福建奇点时空数字科技有限公司 A kind of deep learning data digging method based on CNN and LSTM
CN110347826A (en) * 2019-06-17 2019-10-18 昆明理工大学 A method of Laos's words and phrases feature is extracted based on character
CN110569400A (en) * 2019-07-23 2019-12-13 福建奇点时空数字科技有限公司 Information extraction method for personnel information modeling based on CNN and LSTM
CN110569358A (en) * 2019-08-20 2019-12-13 上海交通大学 Model, method and medium for learning long-term dependency and hierarchical structure text classification
CN110765785A (en) * 2019-09-19 2020-02-07 平安科技(深圳)有限公司 Neural network-based Chinese-English translation method and related equipment thereof
CN110765785B (en) * 2019-09-19 2024-03-22 平安科技(深圳)有限公司 Chinese-English translation method based on neural network and related equipment thereof
CN110717330A (en) * 2019-09-23 2020-01-21 哈尔滨工程大学 Word-sentence level short text classification method based on deep learning
CN114207605A (en) * 2019-10-31 2022-03-18 深圳市欢太科技有限公司 Text classification method and device, electronic equipment and storage medium
CN111078833A (en) * 2019-12-03 2020-04-28 哈尔滨工程大学 Text classification method based on neural network
CN111078833B (en) * 2019-12-03 2022-05-20 哈尔滨工程大学 Text classification method based on neural network
CN111310801A (en) * 2020-01-20 2020-06-19 桂林航天工业学院 Mixed dimension flow classification method and system based on convolutional neural network
CN113378556A (en) * 2020-02-25 2021-09-10 华为技术有限公司 Method and device for extracting text keywords
CN113378556B (en) * 2020-02-25 2023-07-14 华为技术有限公司 Method and device for extracting text keywords
CN111459927A (en) * 2020-03-27 2020-07-28 中南大学 CNN-LSTM developer project recommendation method
CN111459927B (en) * 2020-03-27 2022-07-08 中南大学 CNN-LSTM developer project recommendation method
CN111460100A (en) * 2020-03-30 2020-07-28 中南大学 Criminal legal document and criminal name recommendation method and system
CN112597311B (en) * 2020-12-28 2023-07-11 东方红卫星移动通信有限公司 Terminal information classification method and system based on low-orbit satellite communication
CN112597311A (en) * 2020-12-28 2021-04-02 东方红卫星移动通信有限公司 Terminal information classification method and system based on low-earth-orbit satellite communication
CN112989052A (en) * 2021-04-19 2021-06-18 北京建筑大学 Chinese news text classification method based on combined-convolutional neural network
CN112989052B (en) * 2021-04-19 2022-03-08 北京建筑大学 Chinese news long text classification method based on combination-convolution neural network
CN113297364A (en) * 2021-06-07 2021-08-24 吉林大学 Natural language understanding method and device for dialog system

Similar Documents

Publication Publication Date Title
CN109840279A (en) Text classification method based on convolutional recurrent neural network
CN107967318A (en) A kind of Chinese short text subjective item automatic scoring method and system using LSTM neutral nets
CN109740154A (en) A kind of online comment fine granularity sentiment analysis method based on multi-task learning
CN106445919A (en) Sentiment classifying method and device
CN110083700A (en) A kind of enterprise's public sentiment sensibility classification method and system based on convolutional neural networks
CN106599933A (en) Text emotion classification method based on the joint deep learning model
CN113033610B (en) Multi-mode fusion sensitive information classification detection method
CN108090099B (en) Text processing method and device
CN110825850B (en) Natural language theme classification method and device
CN109977199A (en) A kind of reading understanding method based on attention pond mechanism
Pacha et al. Towards self-learning optical music recognition
CN112347766A (en) Multi-label classification method for processing microblog text cognition distortion
CN109062958B (en) Primary school composition automatic classification method based on TextRank and convolutional neural network
Fei et al. Beyond prompting: Making pre-trained language models better zero-shot learners by clustering representations
Smitha et al. Meme classification using textual and visual features
Jishan et al. Natural language description of images using hybrid recurrent neural network
CN113033180B (en) Automatic generation service system for Tibetan reading problem of primary school
Kasthuri et al. An artificial bee colony and pigeon inspired optimization hybrid feature selection algorithm for twitter sentiment analysis
CN114925198B (en) Knowledge-driven text classification method integrating character information
Li et al. Multilingual toxic text classification model based on deep learning
Mouri et al. An empirical study on bengali news headline categorization leveraging different machine learning techniques
Alvarado et al. Detecting Disaster Tweets using a Natural Language Processing technique
Rawat et al. A Systematic Review of Question Classification Techniques Based on Bloom's Taxonomy
Alsharhan Natural Language Generation and Creative Writing A Systematic Review
Alhabeeb et al. An Investigation into Indonesian Students' Opinions on Educational Reforms through the Use of Machine Learning and Sentiment Analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190604