CN106776580A - Theme sentence recognition method using a hybrid CNN and RNN deep neural network - Google Patents

Theme sentence recognition method using a hybrid CNN and RNN deep neural network

Info

Publication number
CN106776580A
CN106776580A (application CN201710047031.6A)
Authority
CN
China
Prior art keywords
sentence
word
rnn
cnn
theme
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710047031.6A
Other languages
Chinese (zh)
Inventor
张志勇 (Zhang Zhiyong)
任江涛 (Ren Jiangtao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2017-01-20
Publication date: 2017-05-31
Application filed by Sun Yat-sen University
Priority to CN201710047031.6A
Publication of CN106776580A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The method of the invention trains word vectors on the full-network news dataset from Sogou Labs, so that semantically similar words lie close together in the vector space. It then crawls 600 travelogues each from the Baidu travel site and the Mafengwo travel site, splits the travelogues into sentences, and divides these sentences into a training set and a test set at a ratio of 8:2. For the training set, the information entropy and the pointwise mutual information (PMI) value of each word are computed from the entropy and mutual-information formulas. Then, for each sentence in the training set, features are constructed from the computed word vectors, information entropy, and mutual information and used as the input of the constructed hybrid deep neural network CNN_RNN to learn its parameters. Likewise, features for each sentence in the test set are constructed from the computed word vectors, information entropy, and mutual information and fed into the CNN_RNN; the classes are predicted with the learned parameters, the error between the gold-standard results and the predictions is obtained, and the method's performance is evaluated.

Description

Theme sentence recognition method using a hybrid CNN and RNN deep neural network
Technical field
The present invention relates to the field of text mining, and more particularly to a theme sentence recognition method using a hybrid CNN and RNN deep neural network.
Background art
In recent years, with economic development, more and more people have begun to travel to enrich their cultural lives. Indeed, travel not only relaxes people and makes them happier, it also broadens their horizons. According to data released by the National Tourism Administration, the tourism industry already contributes more than 10% to GDP. Travel has thus become a very important part of people's daily schedules. In the Internet age, many people share their travel experiences in text form through microblogs and social networking sites.
In general, travelogues describe most of what the author saw and heard while traveling, express the author's opinions of the visited sights, and offer recommendations to later visitors, but they also contain scattered unrelated content. Identifying the theme sentences is essential for successfully mining tourism knowledge, because the unrelated content adds a certain amount of noise to the results.
For example, in a travelogue describing Guangzhou on Mafengwo, someone writes: "Thank you for your attention and support; if you feel this article is worth sharing, please recommend it to your friends or WeChat groups, and share it in your own Moments." This is clearly not a description of what was seen and heard while traveling, so it is not a theme sentence, and such sentences undoubtedly add noise to text analysis. As another example, someone writes: "At nightfall, Flower City Square is brilliantly lit, and you can gaze out at the 'small waist' (the Canton Tower); it is more charming than in the daytime." This clearly is a theme sentence: it describes the night scene of Zhujiang New Town in Guangzhou. It is precisely the content of such theme sentences that is the focus of attention.
When making travel recommendations, note that visitors' travelogues about Guangzhou do not mention Guangzhou alone; they also mention cities around Guangzhou, for example what was seen and heard in Hong Kong, Zhuhai, Shenzhen, and other cities. Removing descriptions and comments that do not concern the target sights is of great significance for later knowledge discovery, because one shortcoming of the LDA model is precisely that it is rather sensitive to noise; that is, noise strongly influences its results.
Therefore, although most sentences in a travelogue describe the sights visited and comment on them, correctly identifying these theme sentences remains a challenging open problem.
Content of the invention
The present invention provides a theme sentence recognition method based on a hybrid CNN and RNN deep neural network with better effectiveness.
To achieve the above technical effect, the technical solution of the invention is as follows:
A theme sentence recognition method using a hybrid CNN and RNN deep neural network comprises the following steps:
S1: Train word vectors on the full-network news dataset from Sogou Labs, so that semantically similar words lie close together in the vector space;
S2: Crawl 600 travelogues each from the Baidu travel site and the Mafengwo travel site, split the travelogues into sentences, and divide these sentences into a training set and a test set at a ratio of 8:2; then, for the training set, compute the information entropy and PMI value of each word using the entropy and mutual-information formulas;
S3: For each sentence in the training set, construct features from the word vectors computed in S1 and the information entropy and mutual information computed in S2, and use them as the input of the constructed hybrid deep neural network CNN_RNN to learn its parameters;
S4: Likewise, for each sentence in the test set, construct features from the word vectors computed in S1 and the information entropy and mutual information computed in S2, feed them into the deep neural network CNN_RNN, and predict each sentence's class using the parameters obtained in S3; obtain the error between the gold-standard results and the predictions and evaluate the method's performance.
Further, the detailed procedure of step S1 is as follows:
S11: First download the full-network news dataset from Sogou Labs and clean it to obtain each complete news article;
S12: Perform word segmentation on the dataset and write the result to a file, separating words with "\t" and news articles with "\n";
S13: Call the word2vec tool in Python's gensim package to train the words without supervision and obtain their word-vector representations.
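As a minimal sketch of steps S11 to S13, the following assumes the cleaned, word-segmented corpus is stored in a hypothetical file sogou_news_segmented.txt, one news article per line with words separated by tabs as in S12; the vector dimensionality of 200 matches the word vectors used later in S31, and min_count=3 mirrors the frequency threshold of S22:

    from gensim.models import Word2Vec

    # Streams the corpus of S12: one news article per line, words separated by '\t'.
    class TabSeparatedCorpus:
        def __init__(self, path):
            self.path = path

        def __iter__(self):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    tokens = line.rstrip("\n").split("\t")
                    if tokens:
                        yield tokens

    # Unsupervised word2vec training (S13). gensim >= 4.0 names the dimensionality
    # parameter vector_size; older versions call the same parameter size.
    corpus = TabSeparatedCorpus("sogou_news_segmented.txt")  # hypothetical path
    model = Word2Vec(sentences=corpus, vector_size=200, window=5, min_count=3, workers=4)
    model.save("sogou_word2vec.model")

Streaming the corpus line by line keeps memory use flat even for the full news dataset.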
Further, the detailed procedure of step S2 is as follows:
S21: Perform word segmentation on each sentence in the training set and remove stop words, obtaining a set of words for each sentence; count the frequency with which each word appears in theme sentences and in non-theme sentences;
S22: Compute the information entropy IG of each word by the following formula:
informationGain = -K * Σ_{i=1}^{n} p_i log p_i
where K is a coefficient, n is the number of classes, and p_i is the probability that the word appears in class i; in addition, a frequency threshold is set, and words with frequency below 3 are not considered;
S23: Compute the pointwise mutual information of each word in each class by the following formula:
PMI(word, class) = log( p(word, class) / (p(word) * p(class)) )
For the word "pleasure", p(pleasure, theme sentence) denotes the number of times "pleasure" appears in theme sentences, and likewise p(pleasure, non-theme sentence) denotes the number of times "pleasure" appears in non-theme sentences;
The PMI value of each word is then computed as:
PMI(pleasure) = PMI(pleasure, theme sentence) / PMI(pleasure, non-theme sentence).
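A minimal sketch of S21 to S23 under stated assumptions: sentences is a hypothetical list of (tokens, label) pairs with label either "theme" or "non-theme", the coefficient K is taken as 1, and the frequency threshold of 3 from S22 is applied:

    import math
    from collections import Counter, defaultdict

    def word_statistics(sentences):
        """S21: count how often each word appears in theme and non-theme sentences.
        sentences: iterable of (tokens, label), label in {"theme", "non-theme"}."""
        counts = defaultdict(Counter)
        for tokens, label in sentences:
            counts[label].update(set(tokens))
        return counts

    def information_gain(counts, word, k=1.0, min_freq=3):
        """S22: IG(word) = -K * sum_i p_i * log(p_i) over the two classes;
        words with total frequency below min_freq are ignored."""
        freqs = [counts[c][word] for c in ("theme", "non-theme")]
        total = sum(freqs)
        if total < min_freq:
            return 0.0
        return -k * sum((f / total) * math.log(f / total) for f in freqs if f > 0)

    def pmi(counts, word, cls):
        """PMI(w, c) = log( p(w, c) / (p(w) * p(c)) ), estimated from the counts."""
        n_total = sum(sum(c.values()) for c in counts.values())
        p_wc = counts[cls][word] / n_total
        p_w = sum(counts[c][word] for c in counts) / n_total
        p_c = sum(counts[cls].values()) / n_total
        if p_wc == 0:
            return float("-inf")
        return math.log(p_wc / (p_w * p_c))

    def pmi_value(counts, word):
        """S23: PMI value of a word as the ratio of its PMI in theme sentences
        to its PMI in non-theme sentences (denominator assumed nonzero)."""
        return pmi(counts, word, "theme") / pmi(counts, word, "non-theme")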
Further, the detailed procedure of step S3 is as follows:
S31: From the preceding steps, each word has a 200-dimensional word vector plus its information entropy IG and PMI value, hence 202 features per word in total. The word count of the longest sentence in the training set is taken as the standard; suppose, for example, this sentence has 200 words. Each sentence is then represented by 200*202 features. If a sentence has fewer than 200 words, say 100, it actually has only 100*202 features, and the remaining (200-100)*202 values are padded with 0 (a sketch of this feature construction is given after S36 below);
S32: For each sentence vector, 200*202 features are obtained; these are first fed into the convolutional neural network layer, whose computation is:
x_j^l = f( Σ_{i∈M_j} x_i^{l-1} * k_{ij}^l + b_j^l )
where x_j^l denotes the j-th feature map of the l-th convolutional layer; the right-hand side convolves the previous layer's output x_i^{l-1} with the j-th convolution kernel k_{ij}^l, adds the bias vector b_j^l, and finally applies the activation function;
S33: With the above 200*202 input, suppose one convolution kernel of size 3 is set; the output of S32 is then 198-dimensional. Next, it is fed into the pooling layer of the convolutional neural network CNN, whose computation is:
x_j^l = f( max( x_j^{l-1} ) )
Here each of the above feature maps is 198-dimensional, and taking the maximum reduces it to 1 dimension. In fact, n feature maps are set for each sentence, so each sentence yields n features;
S34: The n features per sentence produced by the above convolutional neural network CNN are used as the input of the recurrent neural network RNN, and the hidden-node vector is computed as:
h_t = f( x_t U + h_{t-1} W + b_t )
where x_t is the input, U is the input-to-hidden transformation, h_{t-1} denotes the hidden state of the previous step, W denotes the hidden-to-hidden transformation, b_t is the bias vector, and the activation function f is applied last;
S35: Since the RNN mainly processes time-series models, classification is performed at the final step, and the output is computed as:
o_t = softmax( h_t V + b_t )
where o_t denotes the output, V denotes the hidden-to-output transformation, and the softmax function is applied last;
S36: After the result is computed, the predicted error is compared with the true labels and the loss function is computed; the parameters are then adjusted step by step so that the loss function is minimized (a sketch of the full CNN_RNN model is given below).
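Following S31 above, here is a minimal sketch of the 200*202 feature construction (also reused in S41); word_vectors, ig, and pmi are hypothetical dictionaries mapping a word to its 200-dimensional vector, information entropy, and PMI value, and the random vector for unseen words follows the remark in Embodiment 1 that words absent from the word2vec model receive randomly generated vectors:

    import numpy as np

    MAX_WORDS, FEAT_DIM = 200, 202  # 200 words per sentence; 200-dim vector + IG + PMI per word

    def sentence_matrix(tokens, word_vectors, ig, pmi):
        """S31/S41: build the 200*202 feature matrix; short sentences are zero-padded."""
        mat = np.zeros((MAX_WORDS, FEAT_DIM), dtype=np.float32)
        for i, w in enumerate(tokens[:MAX_WORDS]):
            vec = word_vectors.get(w)
            if vec is None:
                vec = np.random.uniform(-0.25, 0.25, 200)  # unseen word: random vector
            mat[i, :200] = vec
            mat[i, 200] = ig.get(w, 0.0)
            mat[i, 201] = pmi.get(w, 0.0)
        return mat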
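And a minimal sketch of the hybrid CNN_RNN of S32 to S36, written against the Keras API of TensorFlow (the experiments below use TensorFlow). The kernel size of 3, the pooling of each 198-dimensional feature map to a single maximum, the two-layer LSTM mentioned with Fig. 3, and the softmax output follow the text; the number of feature maps N_FILTERS, the LSTM width, and the optimizer are illustrative assumptions:

    import tensorflow as tf

    MAX_WORDS, FEAT_DIM = 200, 202
    N_FILTERS = 64  # the "n feature maps" per sentence (assumed value)

    def build_cnn_rnn():
        inputs = tf.keras.Input(shape=(MAX_WORDS, FEAT_DIM))
        # S32/S33: 1-D convolution with kernel size 3 (200 -> 198 positions),
        # then a max over each 198-dimensional feature map, giving n features.
        x = tf.keras.layers.Conv1D(N_FILTERS, kernel_size=3, activation="relu")(inputs)
        x = tf.keras.layers.GlobalMaxPooling1D()(x)
        # S34: feed the n features into the recurrent part; Fig. 3 uses two LSTM layers.
        x = tf.keras.layers.Reshape((1, N_FILTERS))(x)
        x = tf.keras.layers.LSTM(64, return_sequences=True)(x)
        x = tf.keras.layers.LSTM(64)(x)
        # S35: softmax output over the two classes (theme / non-theme).
        outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
        model = tf.keras.Model(inputs, outputs)
        # S36: the loss is minimized by stepwise parameter adjustment.
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

Treating the pooled n-dimensional vector as a one-step sequence for the LSTM stack is one plausible reading of S34; keeping the 198 convolution positions as the LSTM's time axis would be equally consistent with S32 to S34.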
Further, the detailed procedure of step S4 is as follows:
S41: For each sentence in the test set, perform word segmentation, remove stop words, and obtain each word's word vector, information entropy, and PMI value; sentences with fewer than 200 words are padded with 0;
S42: Represent each sentence in the 200*202 format and feed it into the CNN_RNN model to obtain the class of each sentence;
S43: Compare the model output with the gold-standard results and compute the precision, recall, F-measure, and accuracy.
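A minimal sketch of the evaluation in S43, using scikit-learn's standard implementations (an assumed dependency; y_true and y_pred are the gold-standard and predicted class labels, encoded as 0/1 with 1 meaning theme sentence):

    from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

    def evaluate(y_true, y_pred):
        """S43: compare model output with the gold standard."""
        return {
            "precision": precision_score(y_true, y_pred),
            "recall": recall_score(y_true, y_pred),
            "f1": f1_score(y_true, y_pred),
            "accuracy": accuracy_score(y_true, y_pred),
        }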
Compared with the prior art, the beneficial effects of the technical solution of the invention are:
The method of the invention trains word vectors on the full-network news dataset from Sogou Labs, so that semantically similar words lie close together in the vector space; it crawls 600 travelogues each from the Baidu travel site and the Mafengwo travel site, splits the travelogues into sentences, and divides these sentences into a training set and a test set at a ratio of 8:2, after which the information entropy and PMI value of each word in the training set are computed from the entropy and mutual-information formulas. Then, for each sentence in the training set, features are constructed from the computed word vectors, information entropy, and mutual information and used as the input of the constructed hybrid deep neural network CNN_RNN to learn its parameters. Likewise, features for each sentence in the test set are constructed from the computed word vectors, information entropy, and mutual information and fed into the deep neural network CNN_RNN; the classes are predicted with the obtained parameters, the error between the gold-standard results and the predictions is obtained, and the performance is evaluated. Experiments show that the method achieves a good recognition effect.
Brief description of the drawings
Fig. 1 is the flow chart of the method of the invention;
Fig. 2 is a schematic diagram of the CNN_RNN model structure built by the invention;
Fig. 3 is a histogram of the effect of the proposed information-entropy feature on the classification results;
Fig. 4 is a histogram comparing the classification performance of the proposed model with the traditional SVM and xgboost classifiers.
Specific embodiment
The accompanying drawings are for illustrative purposes only and shall not be construed as limiting this patent;
For better illustration of this embodiment, some parts of the drawings are omitted, enlarged, or reduced and do not represent the actual product size;
For those skilled in the art, it is understandable that some known structures and their explanations may be omitted from the drawings.
The technical solution of the invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Fig. 1, a theme sentence recognition method using a hybrid CNN and RNN deep neural network comprises the following steps:
S1: Train word vectors on the full-network news dataset from Sogou Labs, so that semantically similar words lie close together in the vector space;
S2: Crawl 600 travelogues each from the Baidu travel site and the Mafengwo travel site, split the travelogues into sentences, and divide these sentences into a training set and a test set at a ratio of 8:2; then, for the training set, compute the information entropy and PMI value of each word using the entropy and mutual-information formulas;
S3: For each sentence in the training set, construct features from the word vectors computed in S1 and the information entropy and mutual information computed in S2, and use them as the input of the constructed hybrid deep neural network CNN_RNN (the model built is shown in Fig. 2) to learn its parameters;
S4: Likewise, for each sentence in the test set, construct features from the word vectors computed in S1 and the information entropy and mutual information computed in S2, feed them into the deep neural network CNN_RNN, and predict each sentence's class using the parameters obtained in S3; obtain the error between the gold-standard results and the predictions and evaluate the method's performance.
Further, the detailed procedure of step S1 is as follows:
S11: First download the full-network news dataset from Sogou Labs and clean it to obtain each complete news article;
S12: Perform word segmentation on the dataset and write the result to a file, separating words with "\t" and news articles with "\n";
S13: Call the word2vec tool in Python's gensim package to train the words without supervision and obtain their word-vector representations.
Further, the detailed procedure of step S2 is as follows:
S21: Perform word segmentation on each sentence in the training set and remove stop words, obtaining a set of words for each sentence; count the frequency with which each word appears in theme sentences and in non-theme sentences;
S22: Compute the information entropy IG of each word by the following formula:
informationGain = -K * Σ_{i=1}^{n} p_i log p_i
where K is a coefficient, n is the number of classes, and p_i is the probability that the word appears in class i; in addition, a frequency threshold is set, and words with frequency below 3 are not considered;
S23: Compute the pointwise mutual information of each word in each class by the following formula:
PMI(word, class) = log( p(word, class) / (p(word) * p(class)) )
For the word "pleasure", p(pleasure, theme sentence) denotes the number of times "pleasure" appears in theme sentences, and likewise p(pleasure, non-theme sentence) denotes the number of times "pleasure" appears in non-theme sentences;
The PMI value of each word is then computed as:
PMI(pleasure) = PMI(pleasure, theme sentence) / PMI(pleasure, non-theme sentence).
Further, the detailed procedure of step S3 is as follows:
S31: From the preceding steps, each word has a 200-dimensional word vector plus its information entropy IG and PMI value, hence 202 features per word in total. The word count of the longest sentence in the training set is taken as the standard; suppose, for example, this sentence has 200 words. Each sentence is then represented by 200*202 features. If a sentence has fewer than 200 words, say 100, it actually has only 100*202 features, and the remaining (200-100)*202 values are padded with 0;
S32: For each sentence vector, 200*202 features are obtained; these are first fed into the convolutional neural network layer, whose computation is:
x_j^l = f( Σ_{i∈M_j} x_i^{l-1} * k_{ij}^l + b_j^l )
where x_j^l denotes the j-th feature map of the l-th convolutional layer; the right-hand side convolves the previous layer's output x_i^{l-1} with the j-th convolution kernel k_{ij}^l, adds the bias vector b_j^l, and finally applies the activation function;
S33: With the above 200*202 input, suppose one convolution kernel of size 3 is set; the output of S32 is then 198-dimensional. Next, it is fed into the pooling layer of the convolutional neural network CNN, whose computation is:
x_j^l = f( max( x_j^{l-1} ) )
Here each of the above feature maps is 198-dimensional, and taking the maximum reduces it to 1 dimension. In fact, n feature maps are set for each sentence, so each sentence yields n features;
S34: The n features per sentence produced by the above convolutional neural network CNN are used as the input of the recurrent neural network RNN, and the hidden-node vector is computed as:
h_t = f( x_t U + h_{t-1} W + b_t )
where x_t is the input, U is the input-to-hidden transformation, h_{t-1} denotes the hidden state of the previous step, W denotes the hidden-to-hidden transformation, b_t is the bias vector, and the activation function f is applied last;
S35: Since the RNN mainly processes time-series models, classification is performed at the final step, and the output is computed as:
o_t = softmax( h_t V + b_t )
where o_t denotes the output, V denotes the hidden-to-output transformation, and the softmax function is applied last;
S36: After the result is computed, the predicted error is compared with the true labels and the loss function is computed; the parameters are then adjusted step by step so that the loss function is minimized.
Further, the detailed procedure of step S4 is as follows:
S41: For each sentence in the test set, perform word segmentation, remove stop words, and obtain each word's word vector, information entropy, and PMI value; sentences with fewer than 200 words are padded with 0;
S42: Represent each sentence in the 200*202 format and feed it into the CNN_RNN model to obtain the class of each sentence;
S43: Compare the model output with the gold-standard results and compute the precision, recall, F-measure, and accuracy.
The method was tested as follows:
1. Experimental dataset: 1200 travelogues from Baidu travel and Mafengwo;
2. Experimental environment: Python 2.7.9 and TensorFlow;
3. Experimental toolset: Python open-source toolboxes;
4. Experimental method: The crawled dataset comprises 1200 Guangzhou-related travelogues, each between 2000 and 20000 characters long and containing between 20 and 500 sentences. Splitting the travelogues yields 50000 sentences in total, containing 100000 words altogether; these sentences were manually labeled as theme sentences or non-theme sentences, so the number of classes is 2.
First, each word in each sentence obtains its corresponding 200-dimensional word vector from the trained word2vec model; vectors for words that did not occur in training are generated randomly. The PMI value and information entropy are computed for each word in the training set, producing for each sentence a vector of 202 values per word. Here each sentence is standardized to 35 words, with shorter sentences padded with 0. These preprocessing steps have already been described for the model above and are not repeated.
First, the proposed model was compared with the single-model convolutional neural network CNN and recurrent neural network RNN, and the effects of adding the information-entropy and mutual-information features were compared separately. Fig. 3 mainly presents the evaluation of the different models; for convenience, the RNN is a two-layer stack of LSTM units.
5. Evaluation criteria: precision, recall, F-measure, and accuracy.
6. Experimental results: As shown in Fig. 4, to compare the model with traditional classifiers, the SVM and xgboost algorithms were used as baselines. Among traditional classifiers, SVM and xgboost are regarded as the best single classifier and the best ensemble classifier, respectively.
It can be seen that the proposed model achieves a very good effect compared with the other models.
The same or similar reference signs correspond to the same or similar parts;
The positional relationships described in the drawings are for illustrative purposes only and shall not be construed as limiting this patent;
Obviously, the above embodiments of the present invention are merely examples given to clearly illustrate the invention, and are not intended to limit its implementations. For those of ordinary skill in the art, other changes in different forms may be made on the basis of the above description. It is neither necessary nor possible to exhaust all implementations here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall be included within the protection scope of the claims of the invention.

Claims (5)

1. A theme sentence recognition method using a hybrid CNN and RNN deep neural network, characterized by comprising the following steps:
S1: Train word vectors on the full-network news dataset from Sogou Labs, so that semantically similar words lie close together in the vector space;
S2: Crawl 600 travelogues each from the Baidu travel site and the Mafengwo travel site, split the travelogues into sentences, and divide these sentences into a training set and a test set at a ratio of 8:2; then, for the training set, compute the information entropy and PMI value of each word using the entropy and mutual-information formulas;
S3: For each sentence in the training set, construct features from the word vectors computed in S1 and the information entropy and mutual information computed in S2, and use them as the input of the constructed hybrid deep neural network CNN_RNN to learn its parameters;
S4: Likewise, for each sentence in the test set, construct features from the word vectors computed in S1 and the information entropy and mutual information computed in S2, feed them into the deep neural network CNN_RNN, and predict each sentence's class using the parameters obtained in S3; obtain the error between the gold-standard results and the predictions and evaluate the method's performance.
2. The theme sentence recognition method using a hybrid CNN and RNN deep neural network according to claim 1, characterized in that the detailed procedure of step S1 is as follows:
S11: First download the full-network news dataset from Sogou Labs and clean it to obtain each complete news article;
S12: Perform word segmentation on the dataset and write the result to a file, separating words with "\t" and news articles with "\n";
S13: Call the word2vec tool in Python's gensim package to train the words without supervision and obtain their word-vector representations.
3. The theme sentence recognition method using a hybrid CNN and RNN deep neural network according to claim 2, characterized in that the detailed procedure of step S2 is as follows:
S21: Perform word segmentation on each sentence in the training set and remove stop words, obtaining a set of words for each sentence; count the frequency with which each word appears in theme sentences and in non-theme sentences;
S22: Compute the information entropy IG of each word by the following formula:
informationGain = -K * Σ_{i=1}^{n} p_i log p_i
where K is a coefficient, n is the number of classes, and p_i is the probability that the word appears in class i; in addition, a frequency threshold is set, and words with frequency below 3 are not considered;
S23: Compute the pointwise mutual information of each word in each class by the following formula:
PMI(word, class) = log( p(word, class) / (p(word) * p(class)) )
For the word "pleasure", p(pleasure, theme sentence) denotes the number of times "pleasure" appears in theme sentences, and likewise p(pleasure, non-theme sentence) denotes the number of times "pleasure" appears in non-theme sentences;
The PMI value of each word is then computed as:
PMI(pleasure) = PMI(pleasure, theme sentence) / PMI(pleasure, non-theme sentence).
4. The theme sentence recognition method using a hybrid CNN and RNN deep neural network according to claim 3, characterized in that the detailed procedure of step S3 is as follows:
S31: From the preceding steps, each word has a 200-dimensional word vector plus its information entropy IG and PMI value, hence 202 features per word in total. The word count of the longest sentence in the training set is taken as the standard; suppose, for example, this sentence has 200 words. Each sentence is then represented by 200*202 features. If a sentence has fewer than 200 words, say 100, it actually has only 100*202 features, and the remaining (200-100)*202 values are padded with 0;
S32: For each sentence vector, 200*202 features are obtained; these are first fed into the convolutional neural network layer, whose computation is:
x_j^l = f( Σ_{i∈M_j} x_i^{l-1} * k_{ij}^l + b_j^l )
where x_j^l denotes the j-th feature map of the l-th convolutional layer; the right-hand side convolves the previous layer's output x_i^{l-1} with the j-th convolution kernel k_{ij}^l, adds the bias vector b_j^l, and finally applies the activation function;
S33: With the above 200*202 input, suppose one convolution kernel of size 3 is set; the output of S32 is then 198-dimensional. Next, it is fed into the pooling layer of the convolutional neural network CNN, whose computation is:
x_j^l = f( max( x_j^{l-1} ) )
Here each of the above feature maps is 198-dimensional, and taking the maximum reduces it to 1 dimension. In fact, n feature maps are set for each sentence, so each sentence yields n features;
S34: The n features per sentence produced by the above convolutional neural network CNN are used as the input of the recurrent neural network RNN, and the hidden-node vector is computed as:
h_t = f( x_t U + h_{t-1} W + b_t )
where x_t is the input, U is the input-to-hidden transformation, h_{t-1} denotes the hidden state of the previous step, W denotes the hidden-to-hidden transformation, b_t is the bias vector, and the activation function f is applied last;
S35: Since the RNN mainly processes time-series models, classification is performed at the final step, and the output is computed as:
o_t = softmax( h_t V + b_t )
where o_t denotes the output, V denotes the hidden-to-output transformation, and the softmax function is applied last;
S36: After the result is computed, the predicted error is compared with the true labels and the loss function is computed; the parameters are then adjusted step by step so that the loss function is minimized.
5. The theme sentence recognition method using a hybrid CNN and RNN deep neural network according to claim 4, characterized in that the detailed procedure of step S4 is as follows:
S41: For each sentence in the test set, perform word segmentation, remove stop words, and obtain each word's word vector, information entropy, and PMI value; sentences with fewer than 200 words are padded with 0;
S42: Represent each sentence in the 200*202 format and feed it into the CNN_RNN model to obtain the class of each sentence;
S43: Compare the model output with the gold-standard results and compute the precision, recall, F-measure, and accuracy.
CN201710047031.6A 2017-01-20 2017-01-20 Theme sentence recognition method using a hybrid CNN and RNN deep neural network Pending CN106776580A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710047031.6A CN106776580A (en) 2017-01-20 2017-01-20 Theme sentence recognition method using a hybrid CNN and RNN deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710047031.6A CN106776580A (en) 2017-01-20 2017-01-20 Theme sentence recognition method using a hybrid CNN and RNN deep neural network

Publications (1)

Publication Number Publication Date
CN106776580A 2017-05-31

Family

ID=58943831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710047031.6A Pending CN106776580A (en) 2017-01-20 2017-01-20 Theme sentence recognition method using a hybrid CNN and RNN deep neural network

Country Status (1)

Country Link
CN (1) CN106776580A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107506350A (en) * 2017-08-16 2017-12-22 京东方科技集团股份有限公司 A kind of method and apparatus of identification information
CN108268613A (en) * 2017-12-29 2018-07-10 广州都市圈网络科技有限公司 Tour schedule generation method, electronic equipment and storage medium based on semantic analysis
CN108491515A (en) * 2018-03-26 2018-09-04 中国科学技术大学 A kind of sentence pair matching degree prediction technique for campus psychological consultation
CN108647263A (en) * 2018-04-28 2018-10-12 淮阴工学院 A kind of network address method for evaluating confidence crawled based on segmenting web page
CN109271989A (en) * 2018-09-03 2019-01-25 广东电网有限责任公司东莞供电局 Automatic handwritten test data identification method based on CNN and RNN models
CN109472020A (en) * 2018-10-11 2019-03-15 重庆邮电大学 A kind of feature alignment Chinese word cutting method
CN109472021A (en) * 2018-10-12 2019-03-15 北京诺道认知医学科技有限公司 Critical sentence screening technique and device in medical literature based on deep learning
CN109711253A (en) * 2018-11-19 2019-05-03 国家电网有限公司 Ammeter technique for partitioning based on convolutional neural networks and Recognition with Recurrent Neural Network
WO2019164078A1 (en) * 2018-02-23 2019-08-29 (주)에어사운드 Real-time multi-language interpretation wireless transmitting and receiving system capable of extracting topic sentence and transmitting and receiving method using same
CN110222328A (en) * 2019-04-08 2019-09-10 平安科技(深圳)有限公司 Participle and part-of-speech tagging method, apparatus, equipment and storage medium neural network based
CN110502898A (en) * 2019-07-31 2019-11-26 深圳前海达闼云端智能科技有限公司 Method, system, device, storage medium and the electronic equipment of the intelligent contract of audit
CN111542012A (en) * 2020-04-28 2020-08-14 南昌航空大学 Human body tumbling detection method based on SE-CNN
CN112100367A (en) * 2019-05-28 2020-12-18 贵阳海信网络科技有限公司 Public opinion early warning method and device for scenic spot

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567308A (en) * 2011-12-20 2012-07-11 上海电机学院 Information processing feature extracting method
CN104063472A (en) * 2014-06-30 2014-09-24 电子科技大学 KNN text classifying method for optimizing training sample set
CN104572892A (en) * 2014-12-24 2015-04-29 中国科学院自动化研究所 Text classification method based on cyclic convolution network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567308A (en) * 2011-12-20 2012-07-11 上海电机学院 Information processing feature extracting method
CN104063472A (en) * 2014-06-30 2014-09-24 电子科技大学 KNN text classifying method for optimizing training sample set
CN104572892A (en) * 2014-12-24 2015-04-29 中国科学院自动化研究所 Text classification method based on cyclic convolution network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Depeng Liang: "AC-BLSTM: Asymmetric Convolutional Bidirectional LSTM Networks for Text Classification", https://arxiv.org/abs/1611.01884v1 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107506350A (en) * 2017-08-16 2017-12-22 京东方科技集团股份有限公司 A kind of method and apparatus of identification information
CN108268613A (en) * 2017-12-29 2018-07-10 广州都市圈网络科技有限公司 Tour schedule generation method, electronic equipment and storage medium based on semantic analysis
WO2019164078A1 (en) * 2018-02-23 2019-08-29 (주)에어사운드 Real-time multi-language interpretation wireless transmitting and receiving system capable of extracting topic sentence and transmitting and receiving method using same
CN108491515A (en) * 2018-03-26 2018-09-04 中国科学技术大学 A kind of sentence pair matching degree prediction technique for campus psychological consultation
CN108491515B (en) * 2018-03-26 2021-10-01 中国科学技术大学 Sentence pair matching degree prediction method for campus psychological consultation
CN108647263A (en) * 2018-04-28 2018-10-12 淮阴工学院 A kind of network address method for evaluating confidence crawled based on segmenting web page
CN108647263B (en) * 2018-04-28 2022-04-12 淮阴工学院 Network address confidence evaluation method based on webpage segmentation crawling
CN109271989A (en) * 2018-09-03 2019-01-25 广东电网有限责任公司东莞供电局 Automatic handwritten test data identification method based on CNN and RNN models
CN109472020A (en) * 2018-10-11 2019-03-15 重庆邮电大学 A kind of feature alignment Chinese word cutting method
CN109472020B (en) * 2018-10-11 2022-07-01 重庆邮电大学 Feature alignment Chinese word segmentation method
WO2020074023A1 (en) * 2018-10-12 2020-04-16 北京大学第三医院 Deep learning-based method and device for screening for key sentences in medical document
CN109472021A (en) * 2018-10-12 2019-03-15 北京诺道认知医学科技有限公司 Critical sentence screening technique and device in medical literature based on deep learning
CN109711253A (en) * 2018-11-19 2019-05-03 国家电网有限公司 Ammeter technique for partitioning based on convolutional neural networks and Recognition with Recurrent Neural Network
CN110222328A (en) * 2019-04-08 2019-09-10 平安科技(深圳)有限公司 Participle and part-of-speech tagging method, apparatus, equipment and storage medium neural network based
CN110222328B (en) * 2019-04-08 2022-11-22 平安科技(深圳)有限公司 Method, device and equipment for labeling participles and parts of speech based on neural network and storage medium
CN112100367A (en) * 2019-05-28 2020-12-18 贵阳海信网络科技有限公司 Public opinion early warning method and device for scenic spot
CN110502898A (en) * 2019-07-31 2019-11-26 深圳前海达闼云端智能科技有限公司 Method, system, device, storage medium and the electronic equipment of the intelligent contract of audit
CN111542012A (en) * 2020-04-28 2020-08-14 南昌航空大学 Human body tumbling detection method based on SE-CNN
CN111542012B (en) * 2020-04-28 2022-05-03 南昌航空大学 Human body tumbling detection method based on SE-CNN

Similar Documents

Publication Publication Date Title
CN106776580A (en) Theme sentence recognition method using a hybrid CNN and RNN deep neural network
Li et al. Imbalanced text sentiment classification using universal and domain-specific knowledge
CN107578775B (en) Multi-classification voice method based on deep neural network
CN112199608B (en) Social media rumor detection method based on network information propagation graph modeling
CN109447140A (en) A method of the image recognition based on neural network deep learning simultaneously recommends cognition
CN107153642A (en) A kind of analysis method based on neural network recognization text comments Sentiment orientation
CN108038205B (en) Viewpoint analysis prototype system for Chinese microblogs
CN110532379B (en) Electronic information recommendation method based on LSTM (least Square TM) user comment sentiment analysis
CN107015963A (en) Natural language semantic parsing system and method based on deep neural network
CN110674410A (en) User portrait construction and content recommendation method, device and equipment
CN103729459A (en) Method for establishing sentiment classification model
CN110569920B (en) Prediction method for multi-task machine learning
CN108304493B (en) Hypernym mining method and device based on knowledge graph
CN113392197B (en) Question-answering reasoning method and device, storage medium and electronic equipment
As et al. Artificial intelligence in urban planning and design: Technologies, implementation, and impacts
Birhane Automating ambiguity: Challenges and pitfalls of artificial intelligence
Benzi et al. Principal patterns on graphs: Discovering coherent structures in datasets
CN113934846B (en) Online forum topic modeling method combining behavior-emotion-time sequence
Nguyen et al. Emotion analysis using multilayered networks for graphical representation of tweets
CN111598252A (en) University computer basic knowledge problem solving method based on deep learning
CA2895121A1 (en) Systems and methods for analyzing and deriving meaning from large scale data sets
Kim et al. Constructing and evaluating a novel crowdsourcing-based paraphrased opinion spam dataset
Indriyanti et al. K-means method for clustering learning classes
CN104346327A (en) Method and device for determining emotion complexity of texts
Sun et al. Urban region function mining service based on social media text analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination