CN108108355A - Text emotion analysis method and system based on deep learning - Google Patents

Publication number
CN108108355A
Authority
CN
China
Prior art keywords
text
field
classifier
text data
sentence
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711417352.7A
Other languages
Chinese (zh)
Inventor
王家彬
柳宜江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DIGITAL TELEVISION TECHNOLOGY CENTER BEIJING PEONY ELECTRONIC GROUP Co Ltd
Original Assignee
DIGITAL TELEVISION TECHNOLOGY CENTER BEIJING PEONY ELECTRONIC GROUP Co Ltd
Application filed by DIGITAL TELEVISION TECHNOLOGY CENTER BEIJING PEONY ELECTRONIC GROUP Co Ltd
Priority to CN201711417352.7A
Publication of CN108108355A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a text sentiment analysis method and system based on deep learning. The method comprises the following steps: normalizing initial text data to generate preprocessed text data, and clustering the preprocessed text data into preset domains; manually labeling a portion of the data in each domain, training a deep-learning-based sentiment analysis model, and establishing the proprietary depth of each preset domain; and performing sentiment classification on the input text to be classified using the resulting classifier in combination with the proprietary depth. The invention reduces labor costs, avoids the influence of feature engineering on classification results, and reduces the workload that feature engineering entails. In addition, taking the domain of the text into account improves the accuracy of sentiment analysis.

Description

Text emotion analysis method and system based on deep learning
Technical field
The present invention relates to the field of natural language processing, and in particular to a text sentiment analysis method and system based on deep learning.
Background art
In the Web 2.0 era, every internet user has become a source of published information. Information publishing platforms for various purposes have emerged; Facebook, Xiaonei, Sina Weibo, and others let users publish, obtain, and share all kinds of information. Because the internet user base is large and each platform generates a large amount of information per day on average, the total amount of information the internet generates daily is huge. Sentiment analysis, also known as sentiment mining or opinion mining, is the process of processing, analyzing, summarizing, and reasoning over text to derive its emotional coloring. Given the huge volume of information generated daily, mining internet data and analyzing its sentiment are both very difficult.
In text sentiment analysis, researchers outside China have mainly analyzed short texts from Twitter as corpora. For example, they use website text that carries sentiment labels as a training corpus and train various classifiers on abstract text features to perform subjective/objective classification and sentiment polarity classification. Sentiment polarity analysis of Chinese text has recently developed rapidly; however, Chinese text is considerably more complex than English text, so the quality of Chinese word segmentation often has a large influence on the final classification result. In addition, Chinese has higher information entropy and Chinese text is rich and varied in content, with many internet slang terms and new words constantly being added, all of which poses challenges for research.
Sentiment analysis methods fall into two categories: rule-based and learning-based. Rule-based methods generally consist of a manually defined rule base and depth; they usually work well, but the manual workload is very large. Learning-based methods are mostly based on conventional machine learning such as SVM and naive Bayes; they depend on feature engineering and require data features to be found manually, and the quality of the feature engineering directly affects the final classification performance.
Summary of the invention
To solve the above technical problems, the present invention provides a text sentiment analysis method and system based on deep learning.
In a first aspect, an embodiment of the present invention provides a text sentiment analysis method based on deep learning, comprising the following steps:
Step 1, collect initial text data for training and normalize the initial text data to generate preprocessed text data;
Step 2, use a clustering method to cluster the preprocessed text data into the corresponding preset domains;
Step 3, for each preset domain, manually label a portion of the preprocessed text data in that domain and train a domain-specific first classifier, using all of the preprocessed text data as the initial training corpus; at the same time, apply a dimensionality reduction method to reduce the dimensionality of the labeled preprocessed text data, so as to obtain the proprietary depth of each preset domain;
Step 4, use the trained first classifier to perform sentiment classification on the unlabeled preprocessed text data, obtaining a labeled corpus in each preset domain;
Step 5, train a domain-specific second classifier using the labeled corpus in each preset domain, with the obtained proprietary depth as feature information;
Step 6, obtain the text to be classified, use the clustering method to assign it to the corresponding domain, then perform sentiment analysis on it using the second classifier for that domain in combination with that domain's proprietary depth, and generate the sentiment classification result and output it for display.
The beneficial effects of the above scheme are as follows. The text sentiment analysis method of the present invention handles sentiment analysis separately for each domain and continuously and automatically expands the depth corresponding to each domain. It also combines a text domain classifier with the sentiment classifiers: input text is first assigned to its domain and then analyzed by the sentiment classifier for that domain. Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The invention can automatically establish the proprietary depth of each domain, avoiding the cost of establishing it entirely by hand;
2. The invention establishes the proprietary depth of each domain and performs sentiment classification of text per domain, making the classification results more accurate and comprehensive;
3. A small amount of manually screened raw corpus is used as training data, reducing labor costs;
4. Starting from a small amount of labeled data, a semi-supervised sentiment classifier generates a large labeled corpus with which the sentiment classifier of each domain is trained, making the analysis results more accurate.
Further, the method includes step 7: analyze whether the sentiment classification result of the text to be classified is correct; if it is not, use the text to be classified as initial text data and repeat steps 1 to 6, updating the first classifier and the second classifier of the corresponding domain according to the text to be classified.
In one example, step 2 includes training a text domain classifier using the preprocessed text data together with the column information of the initial text data, so as to cluster the preprocessed text data into the corresponding preset domains.
Further, step 2 specifically includes:
deleting all low-frequency words from the preprocessed text data, where a word is treated as low-frequency if it occurs fewer than 10 times across all texts;
sorting the sentences in each text by length, keeping the sentences whose length ranks in the top 7, and discarding the rest;
after the low-frequency words have been deleted, if a sentence in a text contains more than 100 words, deleting the excess words;
converting each sentence in the preprocessed text data into a two-dimensional vector using a 100 × 300 embedding layer;
passing the resulting vectors through a neural network consisting, in order, of a convolutional layer, a pooling layer, a fully connected layer, and a softmax layer, and training the neural network using the column information, so as to obtain the text domain classifier.
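The layer order named above (convolution, pooling, fully connected, softmax) can be illustrated with a forward pass in plain Python. This is only a sketch under assumptions the source does not state: the filter width of 3, the 4 filters, ReLU activation, max pooling over time, 5 output domains, and the random weights are all illustrative choices, and the function names are hypothetical. Training by backpropagation is not shown.

```python
import math
import random

def conv1d(sentence, filters, width):
    """Slide each filter over `width` consecutive word vectors (ReLU applied).

    sentence: list of word vectors, shape (n_words, emb_dim)
    filters:  list of flat weight lists, each of length width * emb_dim
    returns:  one feature map (list of floats) per filter
    """
    maps = []
    for weights in filters:
        fmap = []
        for start in range(len(sentence) - width + 1):
            window = [x for vec in sentence[start:start + width] for x in vec]
            fmap.append(max(0.0, sum(w * x for w, x in zip(weights, window))))
        maps.append(fmap)
    return maps

def max_pool(maps):
    """Keep the strongest response of each feature map."""
    return [max(fmap) for fmap in maps]

def dense(vec, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def domain_probabilities(sentence, filters, width, fc_weights, fc_bias):
    pooled = max_pool(conv1d(sentence, filters, width))
    return softmax(dense(pooled, fc_weights, fc_bias))

# Illustrative sizes: a 100 x 300 sentence matrix as in the text; the rest is assumed.
random.seed(0)
WORDS, EMB_DIM, WIDTH, N_FILTERS, N_DOMAINS = 100, 300, 3, 4, 5
sentence = [[random.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(WORDS)]
filters = [[random.uniform(-0.05, 0.05) for _ in range(WIDTH * EMB_DIM)]
           for _ in range(N_FILTERS)]
fc_weights = [[random.uniform(-1, 1) for _ in range(N_FILTERS)] for _ in range(N_DOMAINS)]
fc_bias = [0.0] * N_DOMAINS
probs = domain_probabilities(sentence, filters, WIDTH, fc_weights, fc_bias)
```

Here `probs` holds one probability per domain; in a trained network the column information would supply the target domain for each training text.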
Further, step 3 specifically includes the following steps:
deleting all low-frequency words from the preprocessed text data, where a word is treated as low-frequency if it occurs fewer than 10 times across all texts;
sorting the sentences in each text by length, keeping the sentences whose length ranks in the top 7, and discarding the rest;
after the low-frequency words have been deleted, if a sentence in a text contains more than 100 words, deleting the excess words;
converting the preprocessed text data into a 100-dimensional vector representation using a recursive autoencoder;
performing principal component analysis (PCA) dimensionality reduction on the converted data to obtain the proprietary depth of each preset domain;
building a matrix representing text features from the converted data, to serve as the input of a sentiment polarity transfer model based on the recursive autoencoder;
based on this input, generating the final model through multiple iterations of the L-BFGS algorithm, so as to obtain the domain-specific first classifier, where the model can perform sentiment classification on text represented as low-dimensional real-valued vectors and output its sentiment polarity.
Further, step 5 is specifically:
performing word segmentation on the labeled corpus in each preset domain and obtaining a distributed representation of the text;
passing the processed corpus through a convolutional layer to obtain feature maps;
extracting a window feature sequence from the feature maps and concatenating it with the proprietary depth of that preset domain obtained in step 3;
applying a gated recurrent unit (GRU) hidden layer to the feature sequence to obtain a high-level representation of the text;
classifying the obtained high-level representation using a softmax layer;
using the labels of the labeled corpus that serves as training data to backpropagate the error and train the parameters of the stacked convolutional recurrent neural network, so as to obtain the second classifier corresponding to the preset domain.
Further, in step 6, performing sentiment analysis on the text to be classified using the second classifier for the domain in combination with that domain's proprietary depth includes:
performing word segmentation on the text to be classified and obtaining a distributed representation of the text;
passing the processed text through a convolutional layer to obtain feature maps;
extracting a window feature sequence from the feature maps and concatenating it with the proprietary depth, obtained in step 3, of the domain to which the text belongs;
applying a gated recurrent unit (GRU) hidden layer to the feature sequence to obtain a high-level representation of the text;
classifying the obtained high-level representation using a softmax layer.
In a second aspect, the present invention provides a text sentiment analysis system based on deep learning, comprising a preprocessing module, a clustering module, a first training module, a first classification module, a second training module, and a second classification module:
The preprocessing module is configured to collect initial text data for training and normalize the initial text data to generate preprocessed text data;
The clustering module is configured to use a clustering method to cluster the preprocessed text data into the corresponding preset domains;
The first training module is configured, for each preset domain, to train a domain-specific first classifier from a manually labeled portion of the preprocessed text data in that domain, using all of the preprocessed text data as the initial training corpus, and at the same time to apply a dimensionality reduction method to reduce the dimensionality of the labeled preprocessed text data, so as to obtain the proprietary depth of each preset domain;
The first classification module is configured to use the trained first classifier to perform sentiment classification on the unlabeled preprocessed text data, obtaining a labeled corpus in each preset domain;
The second training module is configured to train a domain-specific second classifier using the labeled corpus in each preset domain, with the obtained proprietary depth as feature information;
The second classification module is configured to obtain the text to be classified, use the clustering method to assign it to the corresponding domain, then perform sentiment analysis on it using the second classifier for that domain in combination with that domain's proprietary depth, and generate the sentiment classification result and output it for display.
The beneficial effects of the above scheme are as follows. The text sentiment analysis system of the present invention handles sentiment analysis separately for each domain and continuously improves the depth corresponding to each domain. Compared with the prior art, the invention can be trained and can classify directly end to end, avoiding the influence of feature engineering on classification results. Furthermore, the invention incorporates a text domain classifier: text is first assigned to its domain and then classified by the sentiment classifier for that domain, which makes the results more accurate. In addition, the invention uses a small amount of manually screened raw corpus as training data, which reduces labor costs; and it uses a semi-supervised classifier to obtain a large labeled corpus with which to train the supervised sentiment classifiers, which makes the classification results more accurate.
Further, the system includes a correction module configured to analyze whether the sentiment classification result of the text to be classified is correct and, if it is not, to use the text to be classified as initial text data and drive the preprocessing module, the clustering module, the first training module, the first classification module, the second training module, and the second classification module to update the first classifier and the second classifier of the corresponding domain according to the text to be classified.
Advantages of additional aspects of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.
Description of the drawings
Fig. 1 is a schematic flowchart of a text sentiment analysis method based on deep learning provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic structural diagram of a text sentiment analysis system based on deep learning provided by Embodiment 2 of the present invention.
Detailed description of the embodiments
In the following description, specific details such as particular device structures, interfaces, and techniques are set forth for purposes of illustration rather than limitation, in order to provide a thorough understanding of the present invention. However, it will be clear to those skilled in the art that the invention can also be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the invention.
Fig. 1 shows a schematic flowchart of a text sentiment analysis method based on deep learning provided by Embodiment 1 of the present invention. As shown in Fig. 1, the method may be executed by a server and comprises the following steps:
Step 1, collect initial text data for training and normalize the initial text data to generate preprocessed text data;
Step 2, use a clustering method to cluster the preprocessed text data into the corresponding preset domains;
Step 3, for each preset domain, manually label a portion of the preprocessed text data in that domain and train a domain-specific first classifier, using all of the preprocessed text data as the initial training corpus; at the same time, apply a dimensionality reduction method to reduce the dimensionality of the labeled preprocessed text data, so as to obtain the proprietary depth of each preset domain;
Step 4, use the trained first classifier to perform sentiment classification on the unlabeled preprocessed text data, obtaining a labeled corpus in each preset domain;
Step 5, train a domain-specific second classifier using the labeled corpus in each preset domain, with the obtained proprietary depth as feature information;
Step 6, obtain the text to be classified, use the clustering method to assign it to the corresponding domain, then perform sentiment analysis on it using the second classifier for that domain in combination with that domain's proprietary depth, and generate the sentiment classification result and output it for display.
The text sentiment analysis method of this embodiment can automatically establish the proprietary depth of each domain, avoiding the cost of establishing it entirely by hand; and, using the proprietary depth of each domain, it performs sentiment classification of text per domain, which can make the classification results more accurate and comprehensive. In addition, using a small amount of manually screened raw corpus as training data reduces labor costs. Further, starting from a small amount of labeled data, a semi-supervised sentiment classifier generates a large labeled corpus with which the sentiment classifier of each domain is trained, which can make the analysis results more accurate.
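The semi-supervised idea in steps 3 to 5 (train on a small labeled seed, let that classifier label the rest, then train on the enlarged corpus) can be sketched as follows. The word-vote classifier is only a stand-in chosen to keep the example self-contained; the source uses a recursive-autoencoder model as the first classifier. The function names, the two labels "pos" and "neg", and the toy data are assumptions made for illustration.

```python
from collections import Counter

def train_first_classifier(seed):
    """seed: list of (tokens, label) pairs, label in {"pos", "neg"}.

    Stand-in model: each word votes +1 per positive and -1 per negative
    seed text it appears in.
    """
    scores = Counter()
    for tokens, label in seed:
        for tok in tokens:
            scores[tok] += 1 if label == "pos" else -1
    return scores

def classify(scores, tokens):
    """Sum the word votes; unseen words contribute 0."""
    total = sum(scores[tok] for tok in tokens)
    return "pos" if total >= 0 else "neg"

def build_labeled_corpus(seed, unlabeled):
    """Step 4: label the unlabeled texts with the first classifier."""
    first = train_first_classifier(seed)
    pseudo = [(tokens, classify(first, tokens)) for tokens in unlabeled]
    return seed + pseudo  # corpus on which a second classifier would be trained

seed = [(["great", "service"], "pos"), (["terrible", "delay"], "neg")]
unlabeled = [["great", "price"], ["delay", "again"]]
corpus = build_labeled_corpus(seed, unlabeled)
```

In this toy run the two unlabeled texts receive the labels "pos" and "neg" respectively, so the corpus passed on to step 5 is twice the size of the manually labeled seed.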
Each step of the method of Embodiment 1 is described in detail below.
In step 1 of a preferred embodiment, the initial text data for training may be public data from the internet or data collected through other channels, such as magazine data. The initial text data is cleaned of non-text symbols, separators, and the like to obtain normalized preprocessed text data, which the subsequent steps use for domain classification and sentiment analysis.
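A minimal sketch of this kind of cleaning is shown below. The source does not list which symbols count as non-text, so the rule used here (drop every character that is neither a Unicode word character nor whitespace, then collapse runs of whitespace) is an assumption, as is the function name `normalize`.

```python
import re

# Assumption: "non-text symbols and separators" means anything that is not a
# Unicode letter, digit, underscore, or whitespace character.
_NON_TEXT = re.compile(r"[^\w\s]")
_WHITESPACE = re.compile(r"\s+")

def normalize(raw):
    """Replace non-text symbols with spaces and collapse whitespace."""
    text = _NON_TEXT.sub(" ", raw)
    return _WHITESPACE.sub(" ", text).strip()

cleaned = normalize("Great\tphone!!! ||| 很好★")
# cleaned == "Great phone 很好"
```

Chinese characters survive because Python's `\w` matches Unicode word characters in `str` patterns; a production pipeline would likely need a more selective rule, for example to keep sentence-ending punctuation for later sentence splitting.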
In step 2 of a preferred embodiment, a convolutional neural network may be used to classify the preprocessed text data by domain. A convolutional neural network is essentially multiple layers of convolution followed by nonlinear transformations. Because its feature detection layers are learned from the training data, using a convolutional neural network avoids explicit feature extraction; features are instead learned implicitly from the training data. Furthermore, because the neurons on the same feature map share the same weights, the network can learn in parallel, which is a major advantage of convolutional networks over networks in which the neurons are fully interconnected. Using a convolutional neural network, the preprocessed text can be classified. Specifically, the column information of the initial text data retained in step 1 is used as the domain of each text; the convolutional neural network is trained on the texts labeled with their columns and can then classify text of unknown domain, for example assigning the preprocessed text data to domains such as finance, military affairs, government affairs, economics, and sports. In a preferred embodiment, step 2 specifically includes the following steps:
S201, delete all low-frequency words, where a word is treated as low-frequency if it occurs fewer than 10 times across all texts;
S202, sort the sentences in each document by length, keep the sentences whose length ranks in the top 7, and discard the rest;
S203, after the low-frequency words have been deleted, if a sentence in a document contains more than 100 words, delete the excess words;
S204, convert each sentence into a two-dimensional vector using a 100 × 300 embedding layer;
S205, pass the resulting vectors through a neural network consisting, in order, of a convolutional layer, a pooling layer, a fully connected layer, and a softmax layer, and train the neural network using the column information.
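Steps S201 to S203 can be sketched as a single preprocessing function. Several points are assumptions rather than statements from the source: the steps are applied in the listed order (so sentence length is measured after low-frequency words are removed), "top 7 by length" is read as the 7 longest sentences, excess words are cut from the end of a sentence, and the input is already segmented into word tokens. The name `preprocess` is hypothetical.

```python
from collections import Counter

MIN_COUNT = 10     # S201: words seen fewer than 10 times in all texts are dropped
TOP_SENTENCES = 7  # S202: sentences kept per document
MAX_WORDS = 100    # S203: words kept per sentence

def preprocess(documents, min_count=MIN_COUNT, top=TOP_SENTENCES, max_words=MAX_WORDS):
    """documents: list of documents; a document is a list of sentences;
    a sentence is a list of word tokens."""
    counts = Counter(w for doc in documents for sent in doc for w in sent)
    result = []
    for doc in documents:
        # S201: remove low-frequency words
        filtered = [[w for w in sent if counts[w] >= min_count] for sent in doc]
        # S202: keep the longest sentences (assumed reading of "top 7 by length")
        kept = sorted(filtered, key=len, reverse=True)[:top]
        # S203: truncate sentences that are still too long
        result.append([sent[:max_words] for sent in kept])
    return result

# Tiny thresholds so the effect is visible on toy data.
documents = [[["a", "b", "a"], ["a"], ["a", "a", "c", "a"]]]
small = preprocess(documents, min_count=2, top=2, max_words=2)
# small == [[["a", "a"], ["a", "a"]]]
```

Note that sorting discards the original sentence order; whether the order is restored before the embedding layer is not stated in the source.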
In step 2 of an alternative embodiment, ElasticSearch or another retrieval tool may be used to cluster the preprocessed text data into the corresponding preset domains. Clustering with ElasticSearch is a technique well known to those skilled in the art, so a detailed description is omitted here.
In a preferred embodiment, step 3 specifically includes the following steps:
S301, delete all low-frequency words, where a word is treated as low-frequency if it occurs fewer than 10 times across all documents;
S302, sort the sentences in each document by length, keep the sentences whose length ranks in the top 7, and discard the rest;
S303, after the low-frequency words have been deleted, if a sentence in a document contains more than 100 words, delete the excess words;
S304, convert the text data into a 100-dimensional vector representation using a recursive autoencoder;
S305, perform PCA (Principal Component Analysis) on the converted data to obtain the proprietary depth of each domain;
S306, build a matrix representing text features from the converted data, to serve as the input of a sentiment polarity transfer model based on the recursive autoencoder (RAE);
S307, based on this input, generate the final model through multiple iterations of the L-BFGS algorithm; the model can perform sentiment classification on text represented as low-dimensional real-valued vectors and output its sentiment polarity.
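For reference, the PCA operation in S305 can be written out as follows. This is the standard textbook formulation, not a formula taken from the source: the symbols are introduced here for illustration, with x_1, ..., x_n the 100-dimensional text vectors of one domain and k the target dimension, which the source does not specify.

```latex
\begin{align}
\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i,
  & x_i &\in \mathbb{R}^{100} \\
C &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^{\top}
  = V \Lambda V^{\top},
  & \lambda_1 &\ge \lambda_2 \ge \dots \ge \lambda_{100} \ge 0 \\
z_i &= V_k^{\top}(x_i - \mu),
  & z_i &\in \mathbb{R}^{k},\; k < 100
\end{align}
```

Here V_k holds the k eigenvectors of the covariance matrix C with the largest eigenvalues, so each z_i is the reduced representation of text i. How the reduced vectors of a domain are combined into that domain's proprietary depth is not described in the source.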
In another preferred embodiment, step 5 is specifically as follows:
First, word segmentation is performed on the labeled corpus in each preset domain and a distributed representation of the text is obtained. The data is then passed through a convolutional layer to obtain feature maps, from which a window feature sequence is extracted and concatenated with the proprietary depth of that preset domain obtained in step 3. Next, a gated recurrent unit (GRU) hidden layer is applied to the feature sequence to obtain a high-level representation of the text, and this representation is classified using a softmax layer. Finally, the labels of the training data are used to backpropagate the error and train the parameters of the stacked convolutional recurrent neural network.
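The hand-off from the convolutional layer to the GRU can be sketched as below. Two assumptions go beyond the source: the window feature sequence is built by taking, for each window position, the responses of all feature maps at that position; and the proprietary depth is represented as a fixed-length vector appended to every time step. Both function names are hypothetical.

```python
def window_feature_sequence(feature_maps):
    """Turn n feature maps of length T into T vectors of length n,
    one vector per window position."""
    return [list(step) for step in zip(*feature_maps)]

def append_domain_vector(sequence, domain_vector):
    """Concatenate the (assumed vector-valued) proprietary depth to every step."""
    return [step + list(domain_vector) for step in sequence]

feature_maps = [[0.1, 0.4, 0.0],   # responses of filter 1 at 3 window positions
                [0.7, 0.2, 0.5]]   # responses of filter 2
domain_vector = [1.0, -1.0]        # placeholder for a domain's proprietary depth
gru_input = append_domain_vector(window_feature_sequence(feature_maps), domain_vector)
# gru_input == [[0.1, 0.7, 1.0, -1.0], [0.4, 0.2, 1.0, -1.0], [0.0, 0.5, 1.0, -1.0]]
```

Each row of `gru_input` would be one time step fed to the GRU hidden layer. Keeping the sequence unpooled, rather than max-pooling each feature map, is what lets the recurrent layer see the order of the windows.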
In another preferred embodiment, step 6 comprises the following steps. First, word segmentation is performed on the text to be classified and a distributed representation of the text is obtained; the data is then passed through a convolutional layer to obtain feature maps; a window feature sequence is extracted from the feature maps and concatenated with the proprietary depth, obtained in step 3, of the domain to which the text belongs; a gated recurrent unit (GRU) hidden layer is then applied to the feature sequence to obtain a high-level representation of the text; finally, the high-level representation is classified using a softmax layer. Thus, after assigning the input text to its domain, the method obtains the sentiment analysis result computed by the classifier, such as positive, negative, or neutral. The above preferred embodiment can quickly train the classifier corresponding to each preset domain and use it to classify the sentiment of input text by domain, which improves the speed of sentiment analysis.
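The two-stage routing at inference time (first pick the domain, then apply that domain's second classifier together with its proprietary depth) can be sketched as a small dispatch function. The callables below are trivial keyword rules standing in for the trained networks, and every name in the example is hypothetical.

```python
def analyze(text, domain_classifier, second_classifiers, domain_depths):
    """Route `text` to its domain, then apply that domain's sentiment classifier."""
    domain = domain_classifier(text)
    classifier = second_classifiers[domain]
    return domain, classifier(text, domain_depths[domain])

# Stand-ins for trained models (assumptions, for illustration only).
def toy_domain_classifier(text):
    return "sports" if "match" in text else "finance"

def toy_sports_sentiment(text, depth):
    return "positive" if "won" in text else "negative"

def toy_finance_sentiment(text, depth):
    return "negative" if "loss" in text else "neutral"

second_classifiers = {"sports": toy_sports_sentiment, "finance": toy_finance_sentiment}
domain_depths = {"sports": [0.2, 0.8], "finance": [0.9, 0.1]}  # placeholder values

result = analyze("the team won the match", toy_domain_classifier,
                 second_classifiers, domain_depths)
# result == ("sports", "positive")
```

Keeping one classifier per domain in a dictionary means a single domain can be retrained or replaced without touching the others.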
In other preferred embodiments, the method further includes step 7: analyze whether the sentiment classification result of the text to be classified is correct; if it is not, use the text to be classified as initial text data and repeat steps 1 to 6, updating the first classifier and the second classifier of the corresponding domain according to the text to be classified. In another embodiment, where a text domain classifier is used in step 2 to cluster the preprocessed text, step 7 also updates the text domain classifier according to the text to be classified, in addition to updating the first and second classifiers. The above preferred embodiments can add clearly misclassified data to the training corpus, so that the domain classifier and the sentiment classifiers are continually corrected, further improving the classification accuracy of the text sentiment analysis method of the present invention.
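One way to organize this correction loop is to collect the misclassified texts per domain and hand them back to the training pipeline. The source does not say how correctness is judged; the sketch below assumes a reviewer supplies the correct label for some texts, and the function and variable names are hypothetical.

```python
def collect_corrections(predictions, reviewed_labels):
    """Group misclassified texts by domain for retraining.

    predictions:     list of (text, domain, predicted_label)
    reviewed_labels: dict mapping text -> correct label (assumed to come
                     from manual review; unreviewed texts are skipped)
    """
    retrain = {}
    for text, domain, predicted in predictions:
        correct = reviewed_labels.get(text)
        if correct is not None and correct != predicted:
            retrain.setdefault(domain, []).append((text, correct))
    return retrain

predictions = [
    ("shares fell sharply", "finance", "positive"),
    ("a thrilling final", "sports", "positive"),
    ("budget approved", "government affairs", "neutral"),
]
reviewed = {"shares fell sharply": "negative", "a thrilling final": "positive"}
to_retrain = collect_corrections(predictions, reviewed)
# to_retrain == {"finance": [("shares fell sharply", "negative")]}
```

Only the finance entry is returned, so only the finance classifiers would be updated by re-running steps 1 to 6 on that text.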
Fig. 2 is a schematic structural diagram of a text sentiment analysis system based on deep learning provided by Embodiment 2 of the present invention. As shown in Fig. 2, the system includes a preprocessing module, a clustering module, a first training module, a first classification module, a second training module, and a second classification module;
The preprocessing module is configured to collect initial text data for training and normalize the initial text data to generate preprocessed text data;
The clustering module is configured to use a clustering method to cluster the preprocessed text data into the corresponding preset domains;
The first training module is configured, for each preset domain, to train a domain-specific first classifier from a manually labeled portion of the preprocessed text data in that domain, using all of the preprocessed text data as the initial training corpus, and at the same time to apply a dimensionality reduction method to reduce the dimensionality of the labeled preprocessed text data, so as to obtain the proprietary depth of each preset domain;
The first classification module is configured to use the trained first classifier to perform sentiment classification on the unlabeled preprocessed text data, obtaining a labeled corpus in each preset domain;
The second training module is configured to train a domain-specific second classifier using the labeled corpus in each preset domain, with the obtained proprietary depth as feature information;
The second classification module is configured to obtain the text to be classified, use the clustering method to assign it to the corresponding domain, then perform sentiment analysis on it using the second classifier for that domain in combination with that domain's proprietary depth, and generate the sentiment classification result and output it for display.
The text emotion analysis system based on deep learning of above-described embodiment can by end-to-end direct training and classification, Influence of the Feature Engineering to classification results is not only avoided, and reduces the special engineered workload brought.In addition, by text Fields accounts for, and improves the efficiency that sentiment analysis is carried out to text, makes analysis result more accurate.
In another preferred embodiment, the preprocessing module is specifically used for carrying out non-text to the initial text data Word Symbol processing and/or separator cleaning.The cluster module is configured to using the preprocessed text data and with reference to described The column information of initial text data trains text field grader, by the preprocessed text data clusters to corresponding to Default field.
In a preferred embodiment, the above deep-learning-based text sentiment analysis system further includes a correction module. The correction module is configured to analyze whether the sentiment classification result of the text to be classified is correct and, if it is incorrect, to take the text to be classified as initial text data and drive the preprocessing module, the clustering module, the first training module, the first classification module, the second training module and the second classification module, so as to update the first classifier and the second classifier of the corresponding domain according to the text to be classified. In another embodiment, where step 2 uses a text domain classifier to cluster the preprocessed text, step 7 updates the text domain classifier according to the text to be classified in addition to updating the first classifier and the second classifier. The above preferred embodiments add clearly misclassified data to the training corpus, so that the domain classifier and the sentiment classifiers are continually corrected, further improving the classification accuracy of the text sentiment analysis method of the present invention.
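The correction loop described above might be organised along the following lines. This is a sketch only: the specification does not say how correctness is judged, so the example assumes that a verified label is supplied from outside (for instance by a human reviewer), and the class name `CorrectionModule` and the `retrain` callback are hypothetical placeholders for re-running the modules listed above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

LabeledText = Tuple[str, str]


@dataclass
class CorrectionModule:
    """Hypothetical correction module: collects misclassified texts and retrains."""

    # Callback standing in for re-running preprocessing, clustering and training.
    retrain: Callable[[List[LabeledText]], None]
    corpus: List[LabeledText] = field(default_factory=list)

    def review(self, text: str, predicted: str, verified: str) -> bool:
        """Return True if the prediction was wrong and an update was triggered."""
        if predicted == verified:
            return False
        # Feed the misclassified text back in as new training data.
        self.corpus.append((text, verified))
        self.retrain(self.corpus)
        return True


retrain_calls: List[List[LabeledText]] = []
module = CorrectionModule(retrain=lambda corpus: retrain_calls.append(list(corpus)))
module.review("the service was slow", predicted="positive", verified="negative")
print(len(retrain_calls))
```

Retraining on every single correction is the simplest possible policy; batching corrections before an update would be an equally plausible reading of the embodiment.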
The reader should understand that, in the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
Although embodiments of the present invention have been shown and described above, it should be understood that these embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.

Claims (10)

1. A text sentiment analysis method based on deep learning, characterized by comprising the following steps:
Step 1: collecting initial text data for training and normalizing the initial text data to generate preprocessed text data;
Step 2: clustering the preprocessed text data into corresponding preset domains using a clustering method;
Step 3: for the different preset domains, manually labeling a portion of the preprocessed text data in each preset domain, training a domain-specific first classifier using all of the preprocessed text data as the initial training corpus, and at the same time applying a dimensionality-reduction method to reduce the dimension of the labeled preprocessed text data so as to obtain the proprietary depth of each preset domain;
Step 4: performing sentiment classification on the unlabeled preprocessed text data using the trained first classifier to obtain a labeled corpus for each preset domain;
Step 5: training a domain-specific second classifier using the labeled corpus of each preset domain, with the obtained proprietary depth as feature information;
Step 6: obtaining a text to be classified, assigning the text to be classified to the corresponding domain using the clustering method, then performing sentiment analysis on the text to be classified using the second classifier for that domain in combination with the obtained proprietary depth of that domain, and generating, outputting and displaying a sentiment classification result.
2. The text sentiment analysis method based on deep learning according to claim 1, characterized by further comprising step 7, specifically: analyzing whether the sentiment classification result of the text to be classified is correct and, if it is incorrect, taking the text to be classified as the initial text data and repeating steps 1 to 6, so as to update the first classifier and the second classifier of the corresponding domain according to the text to be classified.
3. The text sentiment analysis method based on deep learning according to claim 1 or 2, characterized in that step 2 comprises training a text domain classifier using the preprocessed text data in combination with the column information of the initial text data, so as to cluster the preprocessed text data into the corresponding preset domains.
4. The text sentiment analysis method based on deep learning according to claim 3, characterized in that step 2 specifically comprises the following steps:
deleting all low-frequency words that occur fewer than 10 times in the preprocessed text data, wherein a word must occur at least 10 times across all texts and is otherwise regarded as a low-frequency word;
sorting the sentences in each text by length, selecting the 7 longest sentences and discarding the sentences not selected;
after the low-frequency words have been deleted, if a sentence in a text contains more than 100 words, deleting the excess words;
converting each sentence in the preprocessed text data into a two-dimensional vector using a 100 × 300 embedding layer;
passing the resulting vectors through a neural network consisting, in order, of a convolutional layer, a pooling layer, a fully connected layer and a softmax layer, and training the neural network using the column information to obtain the text domain classifier.
5. The text sentiment analysis method based on deep learning according to claim 1 or 2, characterized in that step 3 specifically comprises the following steps:
deleting all low-frequency words that occur fewer than 10 times in the preprocessed text data, wherein a word must occur at least 10 times across all texts and is otherwise regarded as a low-frequency word;
sorting the sentences in each text by length, selecting the 7 longest sentences and discarding the sentences not selected;
after the low-frequency words have been deleted, if a sentence in a text contains more than 100 words, deleting the excess words;
converting the preprocessed text data into a 100-dimensional vector representation using a recursive autoencoder;
performing a principal component analysis dimensionality-reduction operation on the converted data to obtain the proprietary depth of each preset domain;
building a matrix representing the text features from the converted data, as the input to a sentiment polarity transfer model based on recursive autoencoding;
based on the input, generating a final model through multiple iterations of the L-BFGS algorithm to obtain the domain-specific first classifier, wherein the model can perform sentiment classification on text represented as low-dimensional real-valued vectors and output its sentiment polarity.
6. The text sentiment analysis method based on deep learning according to claim 1 or 2, characterized in that step 5 specifically comprises:
performing word segmentation on the labeled corpus of each preset domain and obtaining a distributed representation of the text;
passing the processed corpus through a convolutional layer to obtain a feature map;
extracting a window feature sequence from the feature map and concatenating it with the proprietary depth of the relevant preset domain obtained in step 3;
obtaining a high-level representation of the text by applying a gated recurrent unit hidden layer to the feature sequence;
classifying the obtained high-level representation using a softmax layer;
using the labels of the labeled corpus as training data, back-propagating the error and training the parameters of the stacked convolutional recurrent neural network, to obtain the second classifier corresponding to the preset domain.
7. The text sentiment analysis method based on deep learning according to claim 1 or 2, characterized in that, in step 6, performing sentiment analysis on the text to be classified using the second classifier for that domain in combination with the obtained proprietary depth of that domain comprises:
performing word segmentation on the text to be classified and obtaining a distributed representation of the text;
passing the processed text through a convolutional layer to obtain a feature map;
extracting a window feature sequence from the feature map and concatenating it with the proprietary depth, obtained in step 3, of the domain to which the text belongs;
obtaining a high-level representation of the text by applying a gated recurrent unit hidden layer to the feature sequence;
classifying the obtained high-level representation using a softmax layer.
8. A text sentiment analysis system based on deep learning, characterized by comprising a preprocessing module, a clustering module, a first training module, a first classification module, a second training module and a second classification module, wherein:
the preprocessing module is configured to collect initial text data for training and to normalize the initial text data to generate preprocessed text data;
the clustering module is configured to cluster the preprocessed text data into corresponding preset domains using a clustering method;
the first training module is configured, for the different preset domains, to train a domain-specific first classifier by manually labeling a portion of the preprocessed text data in each preset domain and using all of the preprocessed text data as the initial training corpus, and at the same time to apply a dimensionality-reduction method to reduce the dimension of the labeled preprocessed text data so as to obtain the proprietary depth of each preset domain;
the first classification module is configured to perform sentiment classification on the unlabeled preprocessed text data using the trained first classifier, to obtain a labeled corpus for each preset domain;
the second training module is configured to train a domain-specific second classifier using the labeled corpus of each preset domain, with the obtained proprietary depth as feature information;
the second classification module is configured to obtain a text to be classified, assign the text to be classified to the corresponding domain using the clustering method, then perform sentiment analysis on the text to be classified using the second classifier for that domain in combination with the obtained proprietary depth of that domain, and generate, output and display a sentiment classification result.
9. The text sentiment analysis system based on deep learning according to claim 8, characterized by further comprising a correction module, wherein the correction module is configured to analyze whether the sentiment classification result of the text to be classified is correct and, if it is incorrect, to take the text to be classified as the initial text data and drive the preprocessing module, the clustering module, the first training module, the first classification module, the second training module and the second classification module, so as to update the first classifier and the second classifier of the corresponding domain according to the text to be classified.
10. The text sentiment analysis system based on deep learning according to claim 8 or 9, characterized in that the clustering module is configured to train a text domain classifier using the preprocessed text data in combination with the column information of the initial text data, so as to cluster the preprocessed text data into the corresponding preset domains.
Characterized in that the clustering module comprises a screening unit, an input unit and a training unit, wherein
the screening unit is configured to delete all low-frequency words that occur fewer than 10 times in the preprocessed text data, wherein a word must occur at least 10 times across all texts and is otherwise regarded as a low-frequency word;
to sort the sentences in each text by length, select the 7 longest sentences and discard the sentences not selected; and,
after the low-frequency words have been deleted, if a sentence in a text contains more than 100 words, to delete the excess words,
the input unit is configured to convert each sentence in the preprocessed text data into a two-dimensional vector using a 100 × 300 embedding layer; and
the training unit is configured to pass the resulting vectors through a neural network consisting, in order, of a convolutional layer, a pooling layer, a fully connected layer and a softmax layer, and to train the neural network using the column information to obtain the text domain classifier.
Characterized in that the first training module comprises a screening unit, an input unit and a training unit, wherein
the screening unit is configured to delete all low-frequency words that occur fewer than 10 times in the preprocessed text data, wherein a word must occur at least 10 times across all texts and is otherwise regarded as a low-frequency word;
to sort the sentences in each text by length, select the 7 longest sentences and discard the sentences not selected; and,
after the low-frequency words have been deleted, if a sentence in a text contains more than 100 words, to delete the excess words,
the input unit is configured to convert the preprocessed text data into a 100-dimensional vector representation using a recursive autoencoder;
to perform a principal component analysis dimensionality-reduction operation on the converted data to obtain the proprietary depth of each preset domain; and
to build a matrix representing the text features from the converted data, as the input to a sentiment polarity transfer model based on recursive autoencoding; and
the training unit is configured, based on the input from the input unit, to generate a final model through multiple iterations of the L-BFGS algorithm to obtain the domain-specific first classifier, wherein the model can perform sentiment classification on text represented as low-dimensional real-valued vectors and output its sentiment polarity.
Characterized in that the second training module is configured to:
perform word segmentation on the labeled corpus of each preset domain and obtain a distributed representation of the text;
pass the processed corpus through a convolutional layer to obtain a feature map;
extract a window feature sequence from the feature map and concatenate it with the proprietary depth of the relevant preset domain obtained in step 3;
obtain a high-level representation of the text by applying a gated recurrent unit hidden layer to the feature sequence;
classify the obtained high-level representation using a softmax layer;
use the labels of the labeled corpus as training data, back-propagate the error and train the parameters of the stacked convolutional recurrent neural network, to obtain the second classifier corresponding to the preset domain.
Characterized in that the second classification module is configured to:
perform word segmentation on the text to be classified and obtain a distributed representation of the text;
pass the processed text through a convolutional layer to obtain a feature map;
extract a window feature sequence from the feature map and concatenate it with the proprietary depth, obtained in step 3, of the domain to which the text belongs;
obtain a high-level representation of the text by applying a gated recurrent unit hidden layer to the feature sequence;
classify the obtained high-level representation using a softmax layer.
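The screening operations recited in the claims above (removing words that occur fewer than 10 times, keeping the 7 longest sentences of each text, and truncating sentences to 100 words) can be sketched as follows. This is an illustration under assumptions rather than the claimed implementation: texts are assumed to be already segmented into sentences and words, sentence length is measured after low-frequency words are removed (the claims do not fix this order), and the name `screen_corpus` and its parameters are hypothetical.

```python
from collections import Counter
from typing import List

Sentence = List[str]      # a sentence as a list of segmented words
Document = List[Sentence]  # a text as a list of sentences


def screen_corpus(
    docs: List[Document],
    min_count: int = 10,
    top_sentences: int = 7,
    max_words: int = 100,
) -> List[Document]:
    """Apply the three screening rules to every text in the corpus."""
    # Count each word over all texts; words below min_count are low-frequency.
    counts = Counter(word for doc in docs for sent in doc for word in sent)
    screened: List[Document] = []
    for doc in docs:
        # Rule 1: delete low-frequency words.
        filtered = [[w for w in sent if counts[w] >= min_count] for sent in doc]
        # Rule 2: keep the longest sentences (the sort is stable, so ties
        # keep their original order).
        longest = sorted(filtered, key=len, reverse=True)[:top_sentences]
        # Rule 3: drop words beyond the maximum sentence length.
        screened.append([sent[:max_words] for sent in longest])
    return screened


# Small thresholds so the effect is visible on a toy corpus.
toy_docs = [
    [["a", "b", "c"], ["a", "x"], ["a"]],
    [["a", "b"], ["b", "a", "a", "y"]],
]
print(screen_corpus(toy_docs, min_count=3, top_sentences=2, max_words=2))
```

With the default thresholds, every sentence that survives has at most 100 words and every text at most 7 sentences, which is consistent with the fixed 100-row input of the 100 × 300 embedding layer mentioned in the claims.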
CN201711417352.7A 2017-12-25 2017-12-25 Text emotion analysis method and system based on deep learning Pending CN108108355A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711417352.7A CN108108355A (en) 2017-12-25 2017-12-25 Text emotion analysis method and system based on deep learning

Publications (1)

Publication Number Publication Date
CN108108355A true CN108108355A (en) 2018-06-01

Family

ID=62212775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711417352.7A Pending CN108108355A (en) 2017-12-25 2017-12-25 Text emotion analysis method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN108108355A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180601