CN110032736A - Text analysis method, apparatus and storage medium - Google Patents
Text analysis method, apparatus and storage medium
- Publication number
- CN110032736A (application CN201910220954A)
- Authority
- CN
- China
- Prior art keywords
- text
- network model
- sample
- emotional value
- analyzed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
This application discloses a text analysis method, apparatus and storage medium, relating to the field of text classification. It addresses a shortcoming of the prior art: sentiment classification is treated as an ordinary text-classification task, and the emotional factors contained in the text are ignored. In the method, a first vector of the text to be analyzed, composed of emotion values, is obtained from a sentiment dictionary, and a second vector, composed of attention weights, is obtained from a long short-term memory (LSTM) network model with an attention mechanism. If the computed distance between the first vector and the second vector is less than a first preset threshold, the emotion expressed by the text to be analyzed is obtained. In this way, by combining the sentiment dictionary, the attention mechanism and the LSTM network model, the emotional factors of the text to be analyzed are mined.
Description
Technical field
This application relates to the field of text classification, and in particular to a text analysis method, apparatus and storage medium.
Background technique
The rapidly developing internet has become an inseparable part of people's daily lives. According to the latest report of the China Internet Network Information Center (CNNIC), the number of Chinese internet users has reached 772 million and continues to grow steadily. The greatest driving force behind this growth is the emergence and flourishing of countless new instant-media networks, which in turn produce massive amounts of text. How to mine and analyze this text has become a vital task of big-data analysis. Current mining analysis of text is relatively simple and cannot uncover deeper meaning, so a new text analysis method is needed.
Summary of the invention
Embodiments of the present application provide a text analysis method, apparatus and storage medium, to solve the prior-art problem that the mining analysis of text is relatively simple and cannot uncover deeper meaning.
In a first aspect, an embodiment of the present application provides a text analysis method, comprising:
obtaining a text to be analyzed;
analyzing the text to be analyzed by means of a pre-trained long short-term memory (LSTM) network model with an attention mechanism, to obtain the emotion expressed by the text to be analyzed; wherein the network model is trained as follows:
reading a sample text; obtaining, from a sentiment dictionary, a first vector of the sample text composed of emotion values; inputting the sample text into the network model to be trained to obtain a second vector of the sample text composed of attention weights; and
computing the distance between the first vector and the second vector, and adjusting the parameters of the network model so that the distance is less than a first preset threshold.
In a second aspect, an embodiment of the present application provides a text analysis apparatus, comprising:
a text-obtaining module, for obtaining a text to be analyzed;
an analysis module, for analyzing the text to be analyzed by means of a pre-trained LSTM network model with an attention mechanism, to obtain the emotion expressed by the text to be analyzed; wherein the network model is trained as follows:
a vector-obtaining module, for reading a sample text, obtaining from a sentiment dictionary a first vector of the sample text composed of emotion values, and inputting the sample text into the network model to be trained to obtain a second vector of the sample text composed of attention weights; and
a computing module, for computing the distance between the first vector and the second vector, and adjusting the parameters of the network model so that the distance is less than a first preset threshold.
In a third aspect, another embodiment of the present application provides a computing device, comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the text analysis method provided by the embodiments of the present application.
In a fourth aspect, another embodiment of the present application provides a computer storage medium storing computer-executable instructions for causing a computer to perform any one of the text analysis methods of the embodiments of the present application.
In the text analysis method, apparatus and storage medium provided by the embodiments of the present application, a first vector of the text to be analyzed, composed of emotion values, is obtained from a sentiment dictionary, and a second vector, composed of attention weights, is obtained from an LSTM network model with an attention mechanism. If the computed distance between the first vector and the second vector is less than a first preset threshold, the emotion expressed by the text to be analyzed is obtained. In this way, by combining the sentiment dictionary, the attention mechanism and the LSTM network model, the emotional factors of the text to be analyzed are mined.
Other features and advantages of the application will be set forth in the following description, will partly be apparent from the description, or may be learned by practicing the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description, the claims and the accompanying drawings.
Brief description of the drawings
The drawings described herein provide a further understanding of the present application and constitute a part of it; the illustrative embodiments of the application and their description are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flow diagram of training the LSTM network model with an attention mechanism in an embodiment of the present application;
Fig. 2 is a flow diagram of obtaining the first vector in an embodiment of the present application;
Fig. 3 is a flow diagram of obtaining the second vector in an embodiment of the present application;
Fig. 4 is a structural diagram of the LSTM network model with an attention mechanism in an embodiment of the present application;
Fig. 5 is a structural diagram of the LSTM network model in an embodiment of the present application;
Fig. 6 is a flow diagram of adjusting the loss function in an embodiment of the present application;
Fig. 7 is a flow diagram of text analysis in an embodiment of the present application;
Fig. 8 is a structural diagram of the text analysis apparatus in an embodiment of the present application;
Fig. 9 is a structural diagram of a computing device according to an embodiment of the present application.
Specific embodiment
To solve the prior-art problem that sentiment classification is treated as an ordinary text-classification task, ignoring the emotional factors contained in the text, the embodiments of the present application provide a text analysis method, apparatus and storage medium. To better understand the technical solution provided by the embodiments of the present application, its basic principle is briefly described here:
In the text analysis method, apparatus and storage medium provided by the embodiments of the present application, a first vector of the text to be analyzed, composed of emotion values, is obtained from a sentiment dictionary, and a second vector, composed of attention weights, is obtained from an LSTM network model with an attention mechanism. If the computed distance between the first vector and the second vector is less than a first preset threshold, the emotion expressed by the text to be analyzed is obtained. In this way, by combining the sentiment dictionary, the attention mechanism and the LSTM network model, the emotional factors of the text to be analyzed are mined. The mined emotional factors better characterize the true intention the text is meant to express.
Text analysis refers to the representation of a text and the selection of its feature items; it is a basic problem of text mining and information retrieval, in which feature words extracted from the text are quantified to represent the text. The semantics of a text reflect specific positions, viewpoints, values and interests. Such information generally contains rich emotional elements of definite research value; how to use this information efficiently for text sentiment analysis has therefore become a major hot topic in natural language processing and artificial intelligence.
Treating sentiment classification as an ordinary text-classification task ignores the emotional factors the text contains. How to train the LSTM network model with an attention mechanism is described in detail below. As shown in Fig. 1, training comprises the following steps:
Step 101: read a sample text.
Step 102: obtain, from a sentiment dictionary, a first vector of the sample text composed of emotion values.
In the sentiment dictionary, each word or phrase is assigned an emotional polarity or emotional intensity by experts; researchers combine the sentiment dictionary data with hand-crafted rules to judge the emotion values of the sample text.
Step 103: input the sample text into the network model to be trained, to obtain a second vector of the sample text composed of attention weights.
Step 104: compute the distance between the first vector and the second vector, and adjust the parameters of the network model so that the distance is less than a first preset threshold.
In this way, by combining the sentiment dictionary, the attention mechanism and the LSTM network model, the trained model performs text analysis in a way that better matches the emotion humans perceive.
In an embodiment of the present application, as shown in Fig. 2, obtaining the emotion values of the sample text from the sentiment dictionary requires segmenting the sample text and obtaining the emotion value of each token. Step 102 can therefore be implemented as the following steps:
Step 201: label the tokens of the sample text with a part-of-speech tagging tool to obtain the part of speech of each token.
Here a "token" refers to a unit produced by word segmentation. In Chinese, for example, the phrase 中国梦 ("Chinese dream") contains three characters, and segmentation yields the tokens "中", "国" and "梦"; in a foreign language such as English, each token is a word, so segmenting "I have a dream" yields the tokens "I", "have", "a" and "dream".
Part-of-speech tagging assigns one correct part of speech to each token in the segmentation result, i.e. it is the process of determining whether each token is a noun, verb, adjective or another part of speech.
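The segmentation-and-tagging step above can be sketched as follows. This is a minimal illustration using a hypothetical hard-coded dictionary `TOY_POS`; a real system would call an actual part-of-speech tagging tool rather than a lookup table.

```python
# Toy part-of-speech table; an assumption for illustration only.
TOY_POS = {
    "I": "pronoun", "have": "verb", "a": "article", "dream": "noun",
}

def tag_tokens(tokens):
    """Return (token, part-of-speech) pairs, defaulting to 'unknown'."""
    return [(t, TOY_POS.get(t, "unknown")) for t in tokens]

# English segmentation is word-level, as in the example in the text.
tags = tag_tokens("I have a dream".split())
```

The output pairs each token with exactly one part of speech, which is what the subsequent dictionary query (step 202) consumes.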
Step 202: query the sentiment dictionary to obtain the emotion value of each token under each of its parts of speech.
Step 203: compose the first vector of the sample text from the emotion values of the tokens.
In one embodiment, a token may have several senses. For example, "power" can mean electric power, strength, to energize, and so on, and each sense can correspond to its own emotion value. If a token has several senses, its emotion value is determined as follows:
Step A1: for each token, determine the senses of the token.
Step A2: determine the part of speech corresponding to each sense of the token.
Step A3: look up, in the sentiment dictionary, the emotion value corresponding to each part of speech of the token.
Step A4: take the ratio of the sum of the token's emotion values to the number of its senses as the final emotion value of the token.
For ease of understanding, for a token with several senses, the emotion value described in steps A1-A4 can be expressed by formula (1):
S = (e_1 + e_2 + … + e_n) / n    (1)
In formula (1), S is the emotion value of the token, n is the number of senses of the token, and e_i is the emotion value under the i-th sense.
In one embodiment, if a token has three senses, its emotion value is calculated after the emotion value of each sense has been determined. For example, if the three senses have emotion values 2, 4 and 6, the emotion value of the token is (2 + 4 + 6) / 3 = 4.
In this way, when obtaining the emotion value of each token, a token with several senses is handled by the method above, so an emotion value can be determined quickly even for polysemous tokens.
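The averaging rule of formula (1) and the worked example above can be sketched directly; the function name is an illustrative choice, not from the patent.

```python
def final_emotion_value(sense_values):
    """Formula (1): average the emotion values of a token's senses,
    S = (e_1 + ... + e_n) / n."""
    return sum(sense_values) / len(sense_values)

# The worked example from the text: senses valued 2, 4 and 6.
s = final_emotion_value([2, 4, 6])  # (2 + 4 + 6) / 3 = 4.0
```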
In another embodiment, if a token has several senses, the most probable sense can be determined from the token's context, and the emotion value of that sense used as the emotion value of the token.
In an embodiment of the present application, to further improve the accuracy of text analysis, tokens with low emotion values can be filtered out of the sample text, implemented as steps B1-B2:
Step B1: compare the emotion value of each token with a second preset threshold.
Step B2: filter out tokens whose emotion value is less than the second preset threshold, together with their emotion values.
In one embodiment, one sentence of the text is "I have a dream", and the emotion value of each word is obtained: for example, "I" has emotion value 6, "have" 8, "a" 2 and "dream" 10. If the second preset threshold is set to 5, "a" is filtered out, so the filtered sentence is "I have dream". When the first vector of the text is obtained, the emotion values of the tokens of the filtered text become the elements of the first vector. Note that the second preset threshold can be set according to the actual situation; this application does not limit it. Filtering out tokens below the second preset threshold highlights the emotional factors of the text and thus further improves the accuracy of text analysis.
After the first vector has been obtained from the sentiment dictionary, the sample text must also be fed into the network model to be trained to obtain the second vector of the sample text, as shown in Fig. 3, implemented as:
Step 301: input each token of the sample text into the network model to be trained, to obtain the attention weight of each token within the sample text.
As noted above, in an embodiment of the present application, if low-emotion tokens were filtered out of the sample text when the emotion values were obtained, then it is the filtered sample text that is input into the network model to be trained.
Step 302: compose the second vector of the sample text from the attention weights of its tokens.
Fig. 4 shows the training flow of the network model to be trained. W1, W2, ... Wm are the input tokens; inputting each token of the sample text into the LSTM (long short-term memory network model) yields the attention weight of each token within the sample text (H1, H2, ... Hm), where W and H correspond one to one. The attention weights of the tokens within the sample text form the second vector of the sample text.
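The per-token attention weights of step 301 can be sketched as a softmax over scores of the LSTM hidden states H1..Hm. The patent does not specify the scoring function, so the scoring vector `u` here is an assumption (in practice it would be learned); the LSTM states are stubbed with random values.

```python
import numpy as np

def attention_weights(hidden_states, u):
    """Softmax attention over hidden states: one scalar weight per
    input token, so the weights form the second vector."""
    scores = hidden_states @ u
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))   # stand-in for LSTM states of m=4 tokens
u = rng.normal(size=8)        # assumed scoring vector (learned in practice)
second_vector = attention_weights(H, u)
```

Each token receives a positive weight and the weights sum to one, matching the one-to-one W-to-H correspondence described for Fig. 4.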
The network model to be trained in the embodiment of the present application has been described above; the LSTM within the network model is now explained further, implemented as steps C1-C3:
Step C1: in the forget gate layer, determine the state of the activation function from the input token, and selectively discard tokens already present in the model according to that state, retaining the important elements.
Step C2: in the input gate layer, update the important elements according to the gating function and the input token.
Step C3: in the output gate layer, output the updated elements as the attention weight of the token, according to the gating function and the activation function.
In this way, by calling the LSTM to train on the sample text, the attention weight of each token in the sample text can be obtained. Fig. 5 is a structural diagram of the LSTM, in which σ is the activation function and tanh is the gating function.
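Steps C1-C3 correspond to the standard LSTM cell update, which can be sketched as below. The parameter shapes and random initialization are assumptions for illustration; σ is the sigmoid activation and tanh the gating nonlinearity, as in Fig. 5.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell update. W, U, b map each gate name
    ('f', 'i', 'o', 'c') to its parameters."""
    pre = {g: W[g] @ x + U[g] @ h_prev + b[g] for g in "fioc"}
    f = sigmoid(pre["f"])                      # C1: forget gate discards old state
    i = sigmoid(pre["i"])                      # C2: input gate admits new information
    o = sigmoid(pre["o"])                      # C3: output gate
    c = f * c_prev + i * np.tanh(pre["c"])     # updated cell state
    h = o * np.tanh(c)                         # emitted hidden state
    return h, c

d = 4
rng = np.random.default_rng(0)
W = {g: rng.normal(size=(d, d)) for g in "fioc"}
U = {g: rng.normal(size=(d, d)) for g in "fioc"}
b = {g: np.zeros(d) for g in "fioc"}
h, c = lstm_step(rng.normal(size=d), np.zeros(d), np.zeros(d), W, U, b)
```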
In an embodiment of the present application, after the first vector and the second vector are obtained, the network model to be trained is adjusted by computing the distance between them. The distance can be added to the loss function of the network model for calculation; Fig. 6 shows the flow of this method, which may include:
Step 601: add the distance to the loss function of the network model.
Step 602: adjust the parameters in the loss function so that the distance is less than the first preset threshold.
The distance can be computed as in formula (2):
L_δ = Σ_i |Cv_i − att_i|    (2)
In formula (2), Cv_i is the i-th element of the first vector, att_i is the corresponding i-th element of the second vector, and L_δ is the distance between the mutually corresponding elements of the two vectors.
Adding the above distance to the loss function as a term of the loss function yields a loss function of the form of formula (3):
Loss = −Σ_i Σ_j y log ŷ + L_δ + β‖θ‖    (3)
In formula (3), Loss is the value of the loss function; i and j are indexes of sentences in the training set, with i and j indexing different labels; y is the true label distribution of the text and ŷ is the label distribution predicted by the model; β‖θ‖ is the L2 regularization penalty term, which prevents the network model to be trained from over-fitting.
In this way, with the distance and the L2 regularization penalty added to the loss function of the model, adjusting the parameters in the loss function makes the trained model more accurate.
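The combined loss of steps 601-602 can be sketched as below. Since the patent's formula images are not reproduced in the text, the absolute-difference distance and the squared L2 penalty used here are assumptions consistent with the variable descriptions, not the patent's exact formulas.

```python
import numpy as np

def training_loss(y_true, y_pred, cv, att, theta, beta=0.01):
    """Cross-entropy on labels + first/second vector distance
    + L2 penalty, per the description of formula (3)."""
    eps = 1e-12
    ce = -np.sum(y_true * np.log(y_pred + eps))   # fit to true labels
    dist = np.sum(np.abs(cv - att))               # distance term (formula (2))
    l2 = beta * np.sum(theta ** 2)                # over-fitting penalty
    return ce + dist + l2

loss = training_loss(
    y_true=np.array([0.0, 1.0]), y_pred=np.array([0.2, 0.8]),
    cv=np.array([6.0, 8.0, 10.0]),          # first vector (emotion values)
    att=np.array([0.3, 0.3, 0.4]),          # second vector (attention weights)
    theta=np.array([0.5, -0.5]),            # stand-in model parameters
)
```

Minimizing this loss pulls the attention weights toward the dictionary-derived emotion values, which is the stated purpose of adding the distance term.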
How to train the LSTM network model with an attention mechanism has been described in detail above; how the trained network model performs text analysis on a text to be analyzed is now described in detail through a specific embodiment. Fig. 7 is a flow diagram of the text analysis method, comprising the following steps:
Step 701: obtain a text to be analyzed.
Step 702: analyze the text to be analyzed by means of the pre-trained LSTM network model with an attention mechanism, to obtain the emotion expressed by the text to be analyzed.
In this way, the trained network model yields the emotion expressed by the text to be analyzed, thereby mining the emotional factors of the text.
Based on the same inventive concept, an embodiment of the present application also provides a text analysis apparatus. As shown in Fig. 8, the apparatus comprises:
a text-obtaining module 801, for obtaining a text to be analyzed;
an analysis module 802, for analyzing the text to be analyzed by means of a pre-trained LSTM network model with an attention mechanism, to obtain the emotion expressed by the text to be analyzed; wherein the network model is trained as follows:
a vector-obtaining module 803, for reading a sample text, obtaining from a sentiment dictionary a first vector of the sample text composed of emotion values, and inputting the sample text into the network model to be trained to obtain a second vector of the sample text composed of attention weights; and
a computing module 804, for computing the distance between the first vector and the second vector, and adjusting the parameters of the network model so that the distance is less than a first preset threshold.
Further, the vector-obtaining module 803 comprises:
a part-of-speech labeling unit, for labeling the tokens of the sample text with a part-of-speech tagging tool to obtain the part of speech of each token;
a query unit, for querying the sentiment dictionary to obtain the emotion value of each token under each of its parts of speech; and
a first-vector unit, for composing the first vector of the sample text from the emotion values of the tokens.
Further, the query unit comprises:
a sense-determining subunit, for determining, for each token, the senses of the token;
a part-of-speech-determining subunit, for determining the part of speech corresponding to each sense of the token;
a lookup subunit, for looking up, in the sentiment dictionary, the emotion value corresponding to each part of speech of the token; and
an emotion-value subunit, for taking the ratio of the sum of the token's emotion values to the number of its senses as the final emotion value of the token.
Further, the vector-obtaining module 803 comprises:
an attention-weight unit, for inputting each token of the sample text into the network model to be trained, to obtain the attention weight of each token within the sample text; and
a second-vector unit, for composing the second vector of the sample text from the attention weights of its tokens.
Further, the apparatus also comprises:
a comparison module, for comparing the emotion value of each token with a second preset threshold before the vector-obtaining module 803 obtains, from the sentiment dictionary, the first vector of the sample text composed of emotion values; and
a filtering module, for filtering out tokens whose emotion value is less than the second preset threshold, together with their emotion values.
Further, the computing module 804 comprises:
an adding unit, for adding the distance to the loss function of the network model; and
an adjustment unit, for adjusting the parameters in the loss function so that the distance is less than the first preset threshold.
Further, the loss function includes an L2 regularization penalty term.
Having described the text analysis method and apparatus of the illustrative embodiments of the application, a computing device according to another illustrative embodiment of the application is introduced next.
Those skilled in the art will appreciate that the various aspects of the application can be implemented as a system, a method or a program product. Therefore, the aspects of the application can take the form of a complete hardware embodiment, a complete software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software, which may collectively be referred to here as a "circuit", "module" or "system".
In some possible embodiments, a computing device according to an embodiment of the application can include at least one processor and at least one memory. The memory stores program code which, when executed by the processor, causes the processor to perform steps 701-702 of the text analysis methods according to the various illustrative embodiments of the application described above in this specification.
A computing device 90 according to this embodiment of the application is described below with reference to Fig. 9. The computing device 90 shown in Fig. 9 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the application. The computing device can, for example, be a mobile phone or a tablet computer.
As shown in Fig. 9, the computing device 90 takes the form of a general-purpose computing apparatus. The components of the computing device 90 may include, but are not limited to: the at least one processor 91, the at least one memory 92, and a bus 93 connecting the different system components (including the memory 92 and the processor 91).
The bus 93 represents one or more of several kinds of bus structures, including a memory bus or memory controller, a peripheral bus, and a processor or local bus using any of a variety of bus structures.
The memory 92 may include readable media in the form of volatile memory, such as random access memory (RAM) 921 and/or cache memory 922, and can further include read-only memory (ROM) 923.
The memory 92 can also include a program/utility 925 having a set of (at least one) program modules 924; such program modules 924 include but are not limited to: an operating system, one or more application programs, other program modules and program data, and each of these examples, or some combination of them, may include an implementation of a network environment.
The computing device 90 can also communicate with one or more external devices 94 (such as sensing devices), with one or more devices that enable a user to interact with the computing device 90, and/or with any device (such as a router or modem) that enables the computing device 90 to communicate with one or more other computing devices. Such communication can take place through an input/output (I/O) interface 95. The computing device 90 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the internet) through a network adapter 96. As shown, the network adapter 96 communicates with the other modules of the computing device 90 through the bus 93. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the computing device 90, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
In some possible embodiments, the various aspects of the text analysis method provided by the application may also be implemented in the form of a program product comprising program code. When the program product runs on a computing device, the program code causes the computer device to execute the steps of the text analysis method according to the various exemplary embodiments of the application described above in this specification, for example steps 701 and 702 shown in Fig. 7.
The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more conducting wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The text analysis method of the embodiments of the application may employ a portable compact disc read-only memory (CD-ROM) that contains program code and can run on a computing device. However, the program product of the application is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by, or in connection with, an instruction execution system, apparatus, or device.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take various forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium; such a readable medium can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
The program code contained on a readable medium may be transmitted by any suitable medium, including, but not limited to, wireless, wired, optical cable, RF, or any suitable combination of the above.
The program code for performing the operations of the application may be written in any combination of one or more programming languages. The programming languages include object-oriented programming languages such as Java and C++, and also include conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
It should be noted that although several units or sub-units of the device are mentioned in the above detailed description, this division is only exemplary and not mandatory. In fact, according to the embodiments of the application, the features and functions of two or more of the units described above may be embodied in a single unit. Conversely, the features and functions of one unit described above may be further divided and embodied by multiple units.
In addition, although the operations of the method of the application are described in a particular order in the accompanying drawings, this does not require or imply that these operations must be executed in that order, or that all of the operations shown must be executed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be merged into one step for execution, and/or one step may be decomposed into multiple steps for execution.
Those skilled in the art should understand that the embodiments of the application may be provided as a method, a system, or a computer program product. Therefore, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical memory) containing computer-usable program code.
The application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can guide a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; thus the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although the preferred embodiments of the application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the application.
Obviously, those skilled in the art can make various modifications and variations to the application without departing from the spirit and scope of the application. Thus, if these modifications and variations of the application fall within the scope of the claims of the application and their technical equivalents, the application is also intended to include these modifications and variations.
Claims (10)
1. A text analysis method, characterized in that the method comprises:
obtaining a text to be analyzed;
analyzing the text to be analyzed through a pre-trained long short-term memory network model with an attention mechanism, to obtain the emotion expressed by the text to be analyzed; wherein the network model is trained according to the following method:
reading a sample text; obtaining, according to a sentiment dictionary, a first vector of the sample text composed of emotional values; and inputting the read sample text into the network model to be trained, to obtain a second vector of the sample text composed of attention weights;
calculating the distance between the first vector and the second vector, and making the distance less than a first preset threshold by adjusting the parameters of the network model.
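The training procedure in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `get_vectors`, `adjust_params`, and the threshold value are hypothetical placeholders, and in practice the second vector would be produced by the attention layer of the LSTM.

```python
import math

FIRST_PRESET_THRESHOLD = 0.1  # illustrative value; the patent does not fix it


def euclidean(a, b):
    """Distance between the dictionary emotion vector and the attention-weight vector."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_until_close(get_vectors, adjust_params, max_steps=100):
    """get_vectors() returns (first_vector, second_vector) for a sample under the
    current parameters; adjust_params(distance) performs one parameter update.
    Training stops once the distance falls below the threshold, per claim 1."""
    d = float("inf")
    for _ in range(max_steps):
        first, second = get_vectors()
        d = euclidean(first, second)
        if d < FIRST_PRESET_THRESHOLD:
            break
        adjust_params(d)
    return d
```

In a real model, `adjust_params` would be one gradient step on a loss that includes the distance term (claims 6-7 below).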
2. The method according to claim 1, characterized in that obtaining, according to the sentiment dictionary, the first vector of the sample text composed of emotional values specifically comprises:
labeling the words in the sample text with a part-of-speech tagging tool, to obtain the part of speech of each word;
querying the sentiment dictionary, to obtain the emotional value of each word under each of its parts of speech;
composing the first vector of the sample text from the emotional values of the words.
3. The method according to claim 2, characterized in that querying the sentiment dictionary to obtain the emotional value of each word under each of its parts of speech specifically comprises:
for each word, determining the paraphrases of the word;
determining the part of speech corresponding to each paraphrase of the word;
looking up, in the sentiment dictionary, the emotional value corresponding to each part of speech of the word;
taking the ratio of the sum of the emotional values of the word to the number of paraphrases of the word as the final emotional value of the word.
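The per-word computation in claim 3 can be sketched as follows. The dictionary contents and the word senses are hypothetical examples; a real system would draw them from a sentiment dictionary such as the one the patent presupposes.

```python
# Hypothetical sentiment dictionary mapping (word, part of speech) -> emotional value.
SENTIMENT_DICT = {
    ("fine", "adj"): 0.9,    # "in good health" sense
    ("fine", "noun"): -0.6,  # "monetary penalty" sense
}


def final_emotional_value(word, senses):
    """senses: list of (paraphrase, part_of_speech) pairs for the word.
    Per claim 3, the final value is the sum of the emotional values looked up
    for each part of speech, divided by the number of paraphrases."""
    total = sum(SENTIMENT_DICT.get((word, pos), 0.0) for _, pos in senses)
    return total / len(senses)
```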
4. The method according to claim 1, characterized in that inputting the read sample text into the network model to be trained to obtain the second vector of the sample text composed of attention weights specifically comprises:
inputting each word in the sample text into the network model to be trained, to obtain the attention weight of each word within the sample text;
composing the second vector of the sample text from the attention weights of the words within the sample text.
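The attention weights of claim 4 are commonly obtained by a softmax over per-word relevance scores produced by the LSTM. The sketch below assumes such scores are given; the patent does not specify the scoring function.

```python
import math


def attention_weights(scores):
    """Softmax over per-word scores: the resulting weights are non-negative,
    sum to 1, and together form the second vector of claim 4."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```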
5. The method according to claim 2, characterized in that before obtaining, according to the sentiment dictionary, the first vector of the sample text composed of emotional values, the method further comprises:
comparing the emotional value of each word with a second preset threshold;
filtering out the words whose emotional values are less than the second preset threshold, together with their corresponding emotional values.
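The pre-filtering of claim 5 can be sketched as a single pass over the word/value pairs. Whether the comparison uses the raw or the absolute emotional value is not stated in the claim, so the raw value is used here.

```python
def filter_low_emotion(word_values, second_threshold):
    """word_values: list of (word, emotional_value) pairs.
    Drops entries whose emotional value is below the second preset threshold,
    per claim 5; the surviving pairs feed the first vector."""
    return [(w, v) for w, v in word_values if v >= second_threshold]
```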
6. The method according to claim 5, characterized in that calculating the distance between the first vector and the second vector and making the distance less than the first preset threshold by adjusting the parameters of the network model specifically comprises:
adding the distance to the loss function of the network model;
adjusting the parameters in the loss function, to make the distance less than the first preset threshold.
7. The method according to claim 6, characterized in that the loss function includes an L2 regularization penalty term.
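Claims 6 and 7 augment the loss function with the first/second vector distance and an L2 regularization penalty. A sketch with NumPy follows; the equal weighting of the terms and the `l2_lambda` value are assumptions, since the claims leave them open.

```python
import numpy as np


def total_loss(base_loss, first_vec, second_vec, params, l2_lambda=0.01):
    """base_loss: the model's ordinary training loss.
    Adds the Euclidean distance between the dictionary emotion vector and the
    attention-weight vector (claim 6) and an L2 penalty on the parameters
    (claim 7)."""
    distance = np.linalg.norm(np.asarray(first_vec, dtype=float)
                              - np.asarray(second_vec, dtype=float))
    l2_penalty = l2_lambda * sum(float(np.sum(p ** 2)) for p in params)
    return base_loss + distance + l2_penalty
```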
8. A text analysis device, characterized in that the device comprises:
a text obtaining module, configured to obtain a text to be analyzed;
an analysis module, configured to analyze the text to be analyzed through a pre-trained long short-term memory network model with an attention mechanism, to obtain the emotion expressed by the text to be analyzed; wherein the network model is trained according to the following method:
a vector obtaining module, configured to read a sample text, to obtain, according to a sentiment dictionary, a first vector of the sample text composed of emotional values, and to input the read sample text into the network model to be trained, to obtain a second vector of the sample text composed of attention weights;
a computing module, configured to calculate the distance between the first vector and the second vector and to make the distance less than a first preset threshold by adjusting the parameters of the network model.
9. A computer-readable medium storing computer-executable instructions, characterized in that the computer-executable instructions are used to execute the method according to any one of claims 1-7.
10. A computing device, characterized by comprising:
at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor is able to execute the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910220954.6A CN110032736A (en) | 2019-03-22 | 2019-03-22 | A kind of text analyzing method, apparatus and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910220954.6A CN110032736A (en) | 2019-03-22 | 2019-03-22 | A kind of text analyzing method, apparatus and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110032736A true CN110032736A (en) | 2019-07-19 |
Family
ID=67236423
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910220954.6A Pending CN110032736A (en) | 2019-03-22 | 2019-03-22 | A kind of text analyzing method, apparatus and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110032736A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427330A (en) * | 2019-08-13 | 2019-11-08 | 腾讯科技(深圳)有限公司 | A kind of method and relevant apparatus of code analysis |
CN110991163A (en) * | 2019-11-29 | 2020-04-10 | 达而观信息科技(上海)有限公司 | Document comparison analysis method and device, electronic equipment and storage medium |
CN111291187A (en) * | 2020-01-22 | 2020-06-16 | 北京芯盾时代科技有限公司 | Emotion analysis method and device, electronic equipment and storage medium |
CN116738298A (en) * | 2023-08-16 | 2023-09-12 | 杭州同花顺数据开发有限公司 | Text classification method, system and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105138506A (en) * | 2015-07-09 | 2015-12-09 | 天云融创数据科技(北京)有限公司 | Financial text sentiment analysis method |
WO2017101342A1 (en) * | 2015-12-15 | 2017-06-22 | 乐视控股(北京)有限公司 | Sentiment classification method and apparatus |
CN107077486A (en) * | 2014-09-02 | 2017-08-18 | 菲特尔销售工具有限公司 | Affective Evaluation system and method |
WO2017149540A1 (en) * | 2016-03-02 | 2017-09-08 | Feelter Sales Tools Ltd | Sentiment rating system and method |
CN108170681A (en) * | 2018-01-15 | 2018-06-15 | 中南大学 | Text emotion analysis method, system and computer readable storage medium |
CN108460009A (en) * | 2017-12-14 | 2018-08-28 | 中山大学 | The attention mechanism Recognition with Recurrent Neural Network text emotion analytic approach of embedded sentiment dictionary |
CN108932227A (en) * | 2018-06-05 | 2018-12-04 | 天津大学 | A kind of short text emotion value calculating method based on sentence structure and context |
CN109271493A (en) * | 2018-11-26 | 2019-01-25 | 腾讯科技(深圳)有限公司 | A kind of language text processing method, device and storage medium |
2019
- 2019-03-22 CN CN201910220954.6A patent/CN110032736A/en active Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107077486A (en) * | 2014-09-02 | 2017-08-18 | 菲特尔销售工具有限公司 | Affective Evaluation system and method |
CN105138506A (en) * | 2015-07-09 | 2015-12-09 | 天云融创数据科技(北京)有限公司 | Financial text sentiment analysis method |
WO2017101342A1 (en) * | 2015-12-15 | 2017-06-22 | 乐视控股(北京)有限公司 | Sentiment classification method and apparatus |
WO2017149540A1 (en) * | 2016-03-02 | 2017-09-08 | Feelter Sales Tools Ltd | Sentiment rating system and method |
CN108460009A (en) * | 2017-12-14 | 2018-08-28 | 中山大学 | The attention mechanism Recognition with Recurrent Neural Network text emotion analytic approach of embedded sentiment dictionary |
CN108170681A (en) * | 2018-01-15 | 2018-06-15 | 中南大学 | Text emotion analysis method, system and computer readable storage medium |
CN108932227A (en) * | 2018-06-05 | 2018-12-04 | 天津大学 | A kind of short text emotion value calculating method based on sentence structure and context |
CN109271493A (en) * | 2018-11-26 | 2019-01-25 | 腾讯科技(深圳)有限公司 | A kind of language text processing method, device and storage medium |
Non-Patent Citations (4)
Title |
---|
XIAOFENG CAI et al.: "Multi-view and Attention-Based BI-LSTM for Weibo Emotion Recognition", NCCE 2018 *
YU WEN; ZHOU WUNENG: "Sentiment Analysis of Product Reviews Based on LSTM", Computer Systems & Applications, no. 08 *
YI SHUNMING; ZHOU HONGBIN; ZHOU GUODONG: "Research on Matching Twitter Posts with the Sentiment Dictionary SentiWordNet", Journal of Nanjing Normal University (Engineering and Technology Edition), no. 03 *
YI SHUNMING; YI HAO; ZHOU GUODONG: "Research on Twitter Sentiment Classification Using Sentiment Feature Vectors", Journal of Chinese Computer Systems, no. 11 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427330A (en) * | 2019-08-13 | 2019-11-08 | 腾讯科技(深圳)有限公司 | A kind of method and relevant apparatus of code analysis |
CN110427330B (en) * | 2019-08-13 | 2023-09-26 | 腾讯科技(深圳)有限公司 | Code analysis method and related device |
CN110991163A (en) * | 2019-11-29 | 2020-04-10 | 达而观信息科技(上海)有限公司 | Document comparison analysis method and device, electronic equipment and storage medium |
CN110991163B (en) * | 2019-11-29 | 2023-09-19 | 达观数据有限公司 | Document comparison and analysis method and device, electronic equipment and storage medium |
CN111291187A (en) * | 2020-01-22 | 2020-06-16 | 北京芯盾时代科技有限公司 | Emotion analysis method and device, electronic equipment and storage medium |
CN111291187B (en) * | 2020-01-22 | 2023-08-08 | 北京芯盾时代科技有限公司 | Emotion analysis method and device, electronic equipment and storage medium |
CN116738298A (en) * | 2023-08-16 | 2023-09-12 | 杭州同花顺数据开发有限公司 | Text classification method, system and storage medium |
CN116738298B (en) * | 2023-08-16 | 2023-11-24 | 杭州同花顺数据开发有限公司 | Text classification method, system and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108363790B (en) | Method, device, equipment and storage medium for evaluating comments | |
KR102577514B1 (en) | Method, apparatus for text generation, device and storage medium | |
CN109241524B (en) | Semantic analysis method and device, computer-readable storage medium and electronic equipment | |
CN110717339A (en) | Semantic representation model processing method and device, electronic equipment and storage medium | |
CN110032736A (en) | A kind of text analyzing method, apparatus and storage medium | |
CN111241237B (en) | Intelligent question-answer data processing method and device based on operation and maintenance service | |
CN102866989B (en) | Viewpoint abstracting method based on word dependence relationship | |
CN111738016B (en) | Multi-intention recognition method and related equipment | |
CN114694076A (en) | Multi-modal emotion analysis method based on multi-task learning and stacked cross-modal fusion | |
CN109271493A (en) | A kind of language text processing method, device and storage medium | |
CN113239169B (en) | Answer generation method, device, equipment and storage medium based on artificial intelligence | |
CN111310440B (en) | Text error correction method, device and system | |
Wang et al. | Response selection for multi-party conversations with dynamic topic tracking | |
CN108228576B (en) | Text translation method and device | |
CN111144120A (en) | Training sentence acquisition method and device, storage medium and electronic equipment | |
CN110377905A (en) | Semantic expressiveness processing method and processing device, computer equipment and the readable medium of sentence | |
CN108536670A (en) | Output statement generating means, methods and procedures | |
CN110851601A (en) | Cross-domain emotion classification system and method based on layered attention mechanism | |
CN110717341A (en) | Method and device for constructing old-Chinese bilingual corpus with Thai as pivot | |
US20230094730A1 (en) | Model training method and method for human-machine interaction | |
CN115357719A (en) | Power audit text classification method and device based on improved BERT model | |
CN116010581A (en) | Knowledge graph question-answering method and system based on power grid hidden trouble shooting scene | |
CN116127060A (en) | Text classification method and system based on prompt words | |
CN113723077B (en) | Sentence vector generation method and device based on bidirectional characterization model and computer equipment | |
CN116913278B (en) | Voice processing method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned | ||
Effective date of abandoning: 20240322 |