CN106547740A - Text message processing method and device - Google Patents

Publication number
CN106547740A
Authority
CN
China
Prior art keywords
word
dictionary
text message
term vector
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611043882.5A
Other languages
Chinese (zh)
Inventor
黄勇
卢康
张磊
宋国志
崔凯铜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Silent Information Technology Co Ltd
Original Assignee
Sichuan Silent Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Silent Information Technology Co Ltd
Priority to CN201611043882.5A
Publication of CN106547740A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a text information processing method and device, belonging to the technical field of natural language processing and data mining. The method includes: obtaining text information; performing word segmentation on the text information to obtain multiple candidate words; obtaining the word vector corresponding to each candidate word; calculating the similarity between the word vector of each candidate word and the word vector of each sentiment word in a preset sentiment dictionary; and determining the sentiment attribute of the text information according to those similarities. Compared with existing methods, the method and device lower the requirement on how quickly the sentiment dictionary must be updated, avoid the poor sentiment analysis results caused by an out-of-date dictionary, and effectively improve the accuracy of the analysis.

Description

Text message processing method and device
Technical Field
The present invention relates to the technical field of natural language processing and data mining, and in particular to a text information processing method and device.
Background
Sentiment analysis of text information is the process of analyzing, summarizing, and reasoning about subjective text that carries emotional coloring. It is widely used in areas such as Internet public-opinion analysis and early warning, and business decision-making. Traditional sentiment analysis methods rely mainly on the sentiment attributes of sample words. For example, the sample words include a large number of sentiment words, and the sentiment attribute of a word to be analyzed is determined by looking it up among the sample words. However, existing methods depend heavily on these sample words: if the word to be analyzed is not found among them, its sentiment attribute cannot be obtained. In other words, existing methods require the sample words to be updated very quickly.
Summary of the Invention
In view of this, an object of the present invention is to provide a text information processing method and device that can effectively alleviate the above problem.
To achieve this object, the technical solution of the present invention is as follows:
In a first aspect, an embodiment of the present invention provides a text information processing method. The method includes: obtaining text information; performing word segmentation on the text information to obtain multiple candidate words; obtaining the word vector corresponding to each candidate word; calculating the similarity between the word vector of each candidate word and the word vector of each sentiment word in a preset sentiment dictionary, where the sentiment dictionary includes at least two lexicons, each lexicon corresponds to one sentiment attribute, each lexicon includes at least one sentiment word, and each sentiment word corresponds to one word vector; and determining the sentiment attribute of the text information according to those similarities.
In a second aspect, an embodiment of the present invention further provides a text information processing device. The device includes a text information acquisition module, a word segmentation module, a word vector acquisition module, a similarity calculation module, and a sentiment attribute determination module. The text information acquisition module obtains text information. The word segmentation module performs word segmentation on the text information to obtain multiple candidate words. The word vector acquisition module obtains the word vector corresponding to each candidate word. The similarity calculation module calculates the similarity between the word vector of each candidate word and the word vector of each sentiment word in a preset sentiment dictionary, where the sentiment dictionary includes at least two lexicons, each lexicon corresponds to one sentiment attribute, each lexicon includes at least one sentiment word, and each sentiment word corresponds to one word vector. The sentiment attribute determination module determines the sentiment attribute of the text information according to those similarities.
The text information processing method and device provided by the embodiments of the present invention determine the sentiment attribute of text information from the similarity between the word vector of each candidate word and the word vector of each sentiment word in a preset sentiment dictionary. Compared with existing methods, the candidate words and their sentiment attributes need not already exist in the preset sentiment dictionary. This lowers the update requirement on the sentiment dictionary, avoids the poor sentiment analysis results caused by an out-of-date dictionary, and effectively improves the accuracy of the analysis.
To make the above objects, features, and advantages of the present invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope. A person of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 shows a structural block diagram of a computer to which the embodiments of the present invention can be applied;
Fig. 2 shows a flowchart of the text information processing method provided by the first embodiment of the present invention;
Fig. 3 shows a flowchart of one implementation of step S150 in the text information processing method provided by the first embodiment of the present invention;
Fig. 4 shows a flowchart of another implementation of step S150 in the text information processing method provided by the first embodiment of the present invention;
Fig. 5 shows a flowchart of the text information processing method provided by the second embodiment of the present invention;
Fig. 6 shows the architecture diagram of the NNLM model;
Fig. 7 shows the architecture diagram of the CBOW model;
Fig. 8 shows the architecture diagram of the Skip-gram model;
Fig. 9 shows a functional block diagram of the text information processing device provided by the third embodiment of the present invention;
Fig. 10 shows a functional block diagram of the text information processing device provided by the fourth embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. The components of the embodiments described and illustrated in the drawings can generally be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention; it merely represents selected embodiments. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a block diagram of a computer 100. The computer 100 includes a text information processing device, a memory 120, a memory controller 130, a processor 140, a peripheral interface 150, an input/output unit 160, and a display unit 170.
The memory 120, memory controller 130, processor 140, peripheral interface 150, input/output unit 160, and display unit 170 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements may be electrically connected through one or more communication buses or signal lines. The text information processing device includes at least one software function module that can be stored in the memory 120 in the form of software or firmware, or built into the operating system (OS) of the computer 100. The processor 140 executes the executable modules stored in the memory 120, such as the software function modules or computer programs included in the text information processing device provided by the embodiments of the present invention.
The memory 120 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like. The memory 120 stores a program, and the processor 140 executes the program after receiving an execution instruction. The method performed by the computer 100, as defined by the flow disclosed in any embodiment of the present invention, may be applied to or implemented by the processor 140.
The processor 140 may be an integrated circuit chip with signal processing capability. The processor 140 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The peripheral interface 150 couples various input/output devices to the processor 140 and the memory 120. In some embodiments, the peripheral interface 150, the processor 140, and the memory controller 130 may be implemented in a single chip. In other embodiments, they may each be implemented by a separate chip.
The input/output unit 160 lets the user enter data so that the user can interact with the computer 100. The input/output unit 160 may be, but is not limited to, a mouse, a keyboard, and the like.
The display unit 170 provides an interactive interface (for example, a user interface) between the computer 100 and the user, or displays image data for the user's reference. In this embodiment, the display unit 170 may be a liquid crystal display or a touch display. A touch display may be a capacitive or resistive touch screen that supports single-point and multi-point touch operations. Supporting single-point and multi-point touch operations means that the touch display can sense touch operations produced simultaneously at one or more positions on it and pass the sensed touch operations to the processor 140 for calculation and processing.
It should be understood that the structure shown in Fig. 1 is only illustrative. The computer 100 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1. Each component shown in Fig. 1 may be implemented in hardware, software, or a combination of the two.
Text is made up of words, so the sentiment attribute of a whole text can be determined by combining the sentiment attributes of its words. The sentiment attribute of a word can also be understood as the sentiment tendency the word expresses, for example like versus dislike, or positive versus negative. The embodiments of the present invention are aimed mainly at Chinese short text and propose a method for analyzing the sentiment attribute of text information. The sentiment attribute of the text information is determined from the similarity between the word vector of each candidate word and the word vector of each sentiment word in a preset sentiment dictionary. This lowers the requirement on how quickly the sentiment dictionary must be updated, avoids the poor sentiment analysis results caused by an out-of-date dictionary, and effectively improves the accuracy of the analysis. The text information processing method and device provided by the embodiments can of course also be used to analyze the sentiment attribute of text in other languages.
First embodiment
Fig. 2 shows a flowchart of the text information processing method provided by the first embodiment of the present invention. Referring to Fig. 2, the text information processing method includes:
Step S110: obtain text information;
In this embodiment, the text information is mainly Chinese short text. It can be entered through the input/output unit 160 or obtained over a network. The text information can also be text in another language, for example English.
Step S120: perform word segmentation on the text information to obtain multiple candidate words;
In English text, adjacent words are naturally delimited by spaces. Chinese text has no obvious delimiter between adjacent words, so when the text information is Chinese it must first undergo Chinese word segmentation, which cuts a sequence of Chinese characters into individual words. In this embodiment, Chinese word segmentation can use the Python jieba segmentation component or the Institute of Computing Technology, Chinese Lexical Analysis System (ICTCLAS). Other Chinese word segmentation algorithms can also be used. Each of the two methods has its advantages: jieba is simple to install and use, while ICTCLAS offers higher segmentation accuracy.
In addition, to improve analysis efficiency, the text information needs to be preprocessed before word segmentation. Preprocessing includes data cleaning: removing from the text the HTML tags, CSS tags, URL links, and similar content that carry no sentiment attribute.
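The cleaning step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `clean_text` and the three regular expressions are assumptions, since the source only states which kinds of content are removed. Regular expressions are a rough tool for HTML, and a real pipeline might prefer a proper parser.

```python
import re

# Hypothetical patterns: the source only says that HTML tags, CSS tags and
# URL links are removed, not how they are matched.
STYLE_OR_SCRIPT = re.compile(r"<(style|script)\b[^>]*>.*?</\1>",
                             re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def clean_text(raw: str) -> str:
    """Strip markup and links that carry no sentiment information."""
    text = STYLE_OR_SCRIPT.sub(" ", raw)   # drop CSS/script blocks with their content
    text = TAG.sub(" ", text)              # drop the remaining tags, keep inner text
    text = URL.sub(" ", text)              # drop bare links
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_text("<p>今天很开心</p><style>p{color:red}</style> http://t.cn/abc"))
```

Tags are removed before URLs so that a closing tag written directly after a link is not swallowed by the URL pattern.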
Preprocessing further includes converting the encoding of the text information to a preset encoding, for example UTF-8 or GBK. For instance, the encoding of the text information can be identified with chardet and then quickly unified using Python's decode and encode functions.
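A minimal sketch of the encoding unification follows. To stay within the standard library it does not call chardet; instead it tries a fixed list of candidate encodings in order, which is a swapped-in simplification. The function name `to_utf8` and the candidate list are assumptions.

```python
def to_utf8(raw: bytes, candidates=("utf-8", "gb18030")) -> bytes:
    """Re-encode raw bytes as UTF-8.

    Assumption: the input is in one of `candidates`. The source identifies
    the encoding with chardet; this sketch simply tries each candidate in
    order and keeps the first one that decodes without error.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")


if __name__ == "__main__":
    gbk_bytes = "情感".encode("gbk")
    print(to_utf8(gbk_bytes).decode("utf-8"))
```

Trying UTF-8 first matters: GBK-family decoders accept many byte sequences, so they are best used as a fallback.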
To further save storage space and improve analysis efficiency, stop words need to be removed from the candidate words after segmentation. Stop words are usually function words that, compared with other words, carry little concrete meaning. They can be prepositions, pronouns, function words, and characters unrelated to sentiment, for example "even if", "1.", and "#". Stop words can be removed as follows: compare the candidate words with the stop words in a preset stop-word list, and remove any candidate word for which an identical stop word is found in the list. For example, the stop-word list can be one or more of the stop-word list provided by Harbin Institute of Technology, the stop-word list provided by Baidu, and the stop-word list of the Sichuan University Machine Intelligence Laboratory.
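The stop-word filter described above amounts to a set-membership test. The sketch below assumes the candidate words are already segmented and that the stop-word list has been loaded into memory; `remove_stop_words` is a hypothetical name, and the tiny stop-word set is illustrative only.

```python
def remove_stop_words(words, stop_words):
    """Return the candidate words that do not appear in the stop-word list."""
    stop = set(stop_words)          # set lookup keeps each membership test O(1)
    return [w for w in words if w not in stop]


if __name__ == "__main__":
    candidates = ["即使", "下雨", "#", "开心"]
    stop_list = {"即使", "#", "1."}   # illustrative entries only
    print(remove_stop_words(candidates, stop_list))
```

Several published lists can be combined by taking the union of their entries before filtering.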
Step S130: obtain the word vector corresponding to each candidate word;
A word vector can be expressed as a distributed representation, that is, a low-dimensional real-valued vector. Word vectors capture semantics well and are a common way to characterize words. The value in each dimension represents a feature with a certain semantic and grammatical interpretation, so each dimension of a word vector can be called a word feature. The word vector of a word can be obtained by training a language model: training maps each word to a K-dimensional real-valued vector, where K is an integer greater than 1. For example, the word vector of a word might be [0.785, 0.109, -0.117, -0.127, 0.652, ...].
Step S140: calculate the similarity between the word vector of each candidate word and the word vector of each sentiment word in the preset sentiment dictionary;
The sentiment dictionary includes at least two lexicons. Each lexicon corresponds to one sentiment attribute and includes at least one sentiment word.
The sentiment dictionary can be configured according to the user's needs. For example, when the user needs to decide whether a microblog post is positive, negative, or neutral, the sentiment dictionary can include three lexicons whose sentiment attributes are positive, negative, and neutral respectively. The positive lexicon includes multiple representative positive words, such as "like", "joy", and "happiness". The negative lexicon includes multiple representative negative words, such as "depressed" and "gloomy". The neutral lexicon includes multiple representative neutral words.
In the sentiment dictionary, each sentiment word corresponds to one word vector. The word vector of each sentiment word can be obtained with the same method as in step S130, and it has the same dimension as the word vectors of the candidate words.
The semantic similarity between a candidate word and a sentiment word in the sentiment dictionary can then be determined from their word vectors, for example by cosine similarity or Euclidean distance. With cosine similarity, the cosine of the angle between two word vectors is taken as the similarity of the corresponding words. If the two word vectors are $a$ and $b$ and the angle between them is $\theta$, then $\cos\theta = \dfrac{a \cdot b}{|a|\,|b|}$. In this way the similarity between each candidate word and each sentiment word in the sentiment dictionary can be calculated.
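The cosine similarity named above can be written in a few lines of plain Python. The function name `cosine_similarity` and the handling of zero vectors (returning 0.0) are assumptions; the source does not discuss that edge case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0   # assumption: a zero vector is treated as unrelated
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity([0.785, 0.109], [0.652, 0.2]))
```

The result lies in [-1, 1]: values near 1 indicate vectors pointing the same way, values near -1 indicate opposite directions.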
Step S150: determine the sentiment attribute of the text information according to the similarity between the word vector of each candidate word and the word vector of each sentiment word in the sentiment dictionary.
The sentiment attribute of the whole text information can be determined by combining the semantics of all the candidate words. For example, if candidate words with a certain sentiment attribute are the most numerous among all the candidate words, that sentiment attribute can be taken as the sentiment attribute of the whole text information.
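The majority rule in this example can be sketched as below. The source does not say how an individual candidate word is assigned its attribute, so the sketch takes the per-word labels as given; the function name `majority_attribute` and the tie behavior (the attribute seen first wins) are assumptions.

```python
from collections import Counter


def majority_attribute(word_attributes):
    """Return the sentiment attribute held by the most candidate words."""
    if not word_attributes:
        raise ValueError("at least one candidate word is required")
    counts = Counter(word_attributes)
    # most_common keeps first-seen order among equal counts (an assumption
    # about tie-breaking; the source does not address ties).
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(majority_attribute(["positive", "negative", "positive", "neutral"]))
```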
Each lexicon in the sentiment dictionary includes at least one sentiment word. To make the judgment of the candidate words' sentiment attributes more accurate, each lexicon preferably includes multiple sentiment words.
Specifically, as shown in Fig. 3, one implementation of step S150 can include steps S151 and S152.
Step S151: for each lexicon, calculate the sum of the similarities between the word vectors of all the candidate words and the word vectors of the sentiment words belonging to that lexicon, and take the sum as the similarity between that lexicon and the text information.
The sum of the similarities between the word vector of a candidate word and the word vectors of the sentiment words belonging to the same lexicon can be taken as the similarity between that lexicon and the candidate word. The sum of the similarities between all the candidate words and the same lexicon is then taken as the similarity between that lexicon and the text information. The similarity between each lexicon and the text information can be obtained in this way.
For example, suppose step S120 yields 20 candidate words and the preset sentiment dictionary includes three lexicons with different sentiment attributes, S1, S2, and S3, each containing 10 sentiment words. The similarity of each candidate word to S1, S2, and S3 is calculated first. The sum of the similarities of the 20 candidate words to S1 is then taken as the similarity between the text information and S1, the sum of their similarities to S2 as the similarity between the text information and S2, and the sum of their similarities to S3 as the similarity between the text information and S3.
Step S152: take the sentiment attribute of the lexicon with the greatest similarity to the text information as the sentiment attribute of the text information.
The similarities between each lexicon and the text information calculated in step S151 are compared, and the sentiment attribute of the lexicon with the greatest similarity to the text information is taken as the sentiment attribute of the text information. If the similarity between the 20 candidate words and S3 is greater than their similarity to S1 and to S2, the sentiment attribute of S3 is taken as the sentiment attribute of the text information.
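Steps S151 and S152 can be sketched as a sum over all (candidate word, sentiment word) pairs followed by an argmax. The names `lexicon_scores` and `classify`, the dictionary layout, and the toy two-dimensional vectors are assumptions made for illustration; cosine similarity is used as the similarity measure.

```python
import math


def cosine(a, b):
    """Cosine similarity; zero vectors are treated as unrelated (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def lexicon_scores(word_vectors, lexicons):
    """Step S151: sum of similarities between all candidate words and each lexicon.

    `lexicons` maps a sentiment attribute to the word vectors of its sentiment words.
    """
    return {
        attribute: sum(cosine(w, s) for w in word_vectors for s in vectors)
        for attribute, vectors in lexicons.items()
    }


def classify(word_vectors, lexicons):
    """Step S152: pick the attribute of the lexicon with the largest summed similarity."""
    scores = lexicon_scores(word_vectors, lexicons)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    lexicons = {
        "positive": [[1.0, 0.0]],
        "negative": [[-1.0, 0.0]],
        "neutral": [[0.0, 1.0]],
    }
    words = [[0.9, 0.1], [0.8, -0.2]]
    print(classify(words, lexicons))
```

Because the score is a plain sum, a lexicon with more sentiment words contributes more terms; the averaged variant described in the text offsets this.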
As another implementation, the mean of the similarities between a candidate word and the sentiment words belonging to the same lexicon can be taken as the similarity between that candidate word and the lexicon. The sum of the similarities between all the candidate words and the same lexicon is then taken as the similarity between that lexicon and the text information.
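The averaged variant can be sketched as follows. The names `mean_lexicon_scores` and `classify_mean` and the toy vectors are assumptions. The example is chosen so that the lexicons differ in size, which is where averaging changes the outcome compared with a plain sum.

```python
import math


def cosine(a, b):
    """Cosine similarity; zero vectors are treated as unrelated (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_lexicon_scores(word_vectors, lexicons):
    """Per candidate word, average its similarity over a lexicon; then sum over words.

    Each lexicon is assumed non-empty, as the source requires at least one
    sentiment word per lexicon.
    """
    scores = {}
    for attribute, vectors in lexicons.items():
        scores[attribute] = sum(
            sum(cosine(w, s) for s in vectors) / len(vectors)
            for w in word_vectors
        )
    return scores


def classify_mean(word_vectors, lexicons):
    scores = mean_lexicon_scores(word_vectors, lexicons)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    lexicons = {
        "positive": [[1.0, 0.0]],
        "negative": [[-1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    }
    print(classify_mean([[0.6, 0.8]], lexicons))
```

In this toy data the single candidate word scores about 0.6 against the positive lexicon and about 0.33 on average against the larger negative lexicon, whereas a plain sum over the negative lexicon would give about 1.0 and flip the result.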
In one specific implementation of this embodiment, the sentiment dictionary includes a first lexicon and a second lexicon. The sentiment attribute of the first lexicon is positive and the sentiment attribute of the second lexicon is negative. In this case, as shown in Fig. 4, calculating the summed similarity for each lexicon and taking the sentiment attribute of the lexicon with the greatest similarity to the text information as the sentiment attribute of the text information specifically includes:
Step S1501: calculate the sum of the similarities between the word vectors of all the candidate words and the word vectors of the sentiment words in the first lexicon, as the similarity between the first lexicon and the text information;
Step S1502: calculate the sum of the similarities between the word vectors of all the candidate words and the word vectors of the sentiment words in the second lexicon, as the similarity between the second lexicon and the text information;
Step S1503: calculate the difference between the similarity of the first lexicon to the text information and the similarity of the second lexicon to the text information;
Step S1504: determine whether the difference is greater than 0;
When the difference is greater than zero, the similarity between the first lexicon and the text information is greater than that between the second lexicon and the text information, and step S1505 is executed. When the difference is less than zero, the similarity between the second lexicon and the text information is greater than that between the first lexicon and the text information, and step S1506 is executed.
Step S1505: determine that the sentiment attribute of the text information is positive;
Step S1506: determine that the sentiment attribute of the text information is negative.
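Steps S1503 to S1506 reduce to the sign of one difference. The sketch below takes the two summed similarities from steps S1501 and S1502 as inputs. The function name `judge_polarity` is hypothetical, and returning `None` when the difference is exactly zero is an assumption, because the source does not say what happens in that case.

```python
def judge_polarity(sim_first, sim_second):
    """Decide polarity from the two lexicon-to-text similarities.

    sim_first:  similarity between the first (positive) lexicon and the text
    sim_second: similarity between the second (negative) lexicon and the text
    """
    difference = sim_first - sim_second      # step S1503
    if difference > 0:                       # step S1504 -> S1505
        return "positive"
    if difference < 0:                       # step S1504 -> S1506
        return "negative"
    return None                              # zero difference: unspecified in the source


if __name__ == "__main__":
    print(judge_polarity(1.96, -1.96))
```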
The text information processing method provided by this embodiment determines the sentiment attribute of text information from the similarity between the word vector of each candidate word and the word vector of each sentiment word in the preset sentiment dictionary. Compared with existing methods, the candidate words and their sentiment attributes need not already exist in the preset sentiment dictionary. This lowers the requirement on how quickly the sentiment dictionary must be updated, avoids the poor sentiment analysis results caused by an out-of-date dictionary, and effectively improves the accuracy of the analysis.
Second embodiment
Fig. 5 shows a flowchart of the text information processing method provided by the second embodiment of the present invention. Referring to Fig. 5, the text information processing method includes:
Step S210: obtain a corpus;
In this embodiment, the training corpus can be the Sohu news data (SogouCS) provided by Sogou Labs, together with microblog text or forum comments collected with web-crawler technology.
Step S220: train on the corpus using the word2vec algorithm to obtain a correspondence table of multiple training words and the word vector corresponding to each training word;
Word2vec can express each word as a word vector in distributed-representation form, and similarity in the vector space can be used to represent the semantic similarity of words. Word2vec builds on the Neural Network Language Model (NNLM), a natural-language estimation model based on a three-layer neural network, and improves on it by proposing two log-linear models: the continuous bag-of-words (CBOW) model and the continuous Skip-gram model. This improvement greatly increases the training speed of the neural network.
The NNLM model is shown in Fig. 6. Because NNLM is an existing model, its principles are not described further here. Word2vec includes two training models, CBOW and Skip-gram. Fig. 7 shows the architecture diagram of the CBOW model and Fig. 8 shows the architecture diagram of the Skip-gram model. Both the CBOW model and the Skip-gram model include an input layer, a mapping layer, and an output layer. Their basic principles are likewise not described further here.
In this embodiment, either the CBOW model or the Skip-gram model can be used to train on the obtained corpus, yielding multiple training words and the word vector corresponding to each training word. These training words and their word vectors form the correspondence table.
Step S230: obtain text information;
Step S240: perform word segmentation on the text information to obtain multiple candidate words;
For the specific implementation of steps S230 and S240, refer to steps S110 and S120 in the first embodiment; the details are not repeated here.
Step S250, the instruction pre-conditioned with the consistent sexual satisfaction of word undetermined each described in the default correspondence table of lookup Practice word, using the term vector for the corresponding term vector of word being trained as the word undetermined;
Correspondence table is obtained by step S220, the correspondence table include one-to-one multiple training words and multiple words to Amount.It is in step S250, pre-conditioned setting to be needed according to user.For example, it is pre-conditioned to be:Lexical similarity is set Degree threshold value, when the acceptation similarity between two words is more than or equal to the threshold value, then judges that the similarity of two words is full Foot is pre-conditioned.Or, a synonym table is set, synonym table includes multigroup synonym, when existing in correspondence table and During current word identical training word undetermined, using the training word as bar default with the consistent sexual satisfaction of current word undetermined The training word of part;When not existing in correspondence table with current word identical training word undetermined, obtained according to synonym table The synonym of current word undetermined, will train with the synonym identical of current word undetermined in correspondence table word as with it is current The pre-conditioned training word of the consistent sexual satisfaction of word undetermined.
Using each word undetermined as current word undetermined, the concordance in lookup correspondence table with current word undetermined Meet pre-conditioned training word, and the corresponding term vector of word is trained as the word of current word undetermined using what is found Vector, so as to obtain the corresponding term vector of each word undetermined.
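The synonym-table variant of step S250 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the names `lookup_vector`, `table`, and `synonym_groups` are hypothetical, the vectors are toy values, and returning `None` for a word with no match is an assumption, since the text does not say what happens in that case.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def lookup_vector(
    word: str,
    table: Dict[str, Vector],
    synonym_groups: Sequence[Sequence[str]],
) -> Optional[Vector]:
    """Return the word vector for a candidate word, falling back to synonyms."""
    # Case 1: the correspondence table contains an identical training word.
    if word in table:
        return table[word]
    # Case 2: no identical training word, so try the candidate's synonyms.
    for group in synonym_groups:
        if word in group:
            for synonym in group:
                if synonym in table:
                    return table[synonym]
    # Assumption: unmatched words yield no vector (not specified in the text).
    return None


# Toy correspondence table and synonym table (illustrative values only).
table = {"happy": [0.9, 0.1], "terrible": [-0.8, 0.2]}
synonym_groups = [["happy", "glad", "joyful"], ["terrible", "awful"]]

candidates = ["glad", "awful", "table"]
vectors = {w: lookup_vector(w, table, synonym_groups) for w in candidates}
print(vectors)
```

The threshold-based variant would replace the synonym search with a scan for training words whose semantic similarity to the candidate meets the configured threshold.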
Step S260: calculate the similarity between the word vector corresponding to each candidate word and the word vector corresponding to each sentiment word in the preset sentiment dictionary;
Step S270: judge the sentiment attribute of the text information according to the similarity between the word vector corresponding to each candidate word and the word vector corresponding to each sentiment word in the sentiment dictionary.
For specific implementations of steps S260 and S270, refer to steps S140 and S150 in the first embodiment above; they are not repeated here.
The text information processing method provided by this embodiment judges the sentiment attribute of the text information according to the similarity between the word vector of each candidate word and the word vector of each sentiment word in the preset sentiment dictionary. The preset sentiment dictionary therefore need not be guaranteed to contain every candidate word to be judged together with its sentiment attribute. This lowers the requirement on how quickly the sentiment dictionary is updated, avoids the poor sentiment-analysis results caused by an out-of-date sentiment dictionary, and effectively improves the accuracy of the analysis.
Third embodiment
Fig. 9 shows a functional block diagram of a text information processing apparatus provided by the third embodiment of the present invention. The apparatus provided by this embodiment can run in the computer 100 and is used to implement the text information processing method provided by the first embodiment. Referring to Fig. 9, the text information processing apparatus 10 provided by this embodiment includes: a text information acquisition module 11, a word segmentation module 12, a word vector acquisition module 13, a similarity calculation module 14, and a sentiment attribute determination module 15.
The text information acquisition module 11 is configured to obtain text information;
The word segmentation module 12 is configured to perform word segmentation on the text information to obtain multiple candidate words;
The word vector acquisition module 13 is configured to obtain the word vector corresponding to each of the multiple candidate words;
The similarity calculation module 14 is configured to calculate the similarity between the word vector corresponding to each candidate word and the word vector corresponding to each sentiment word in a preset sentiment dictionary. The sentiment dictionary includes at least two lexicons; each lexicon corresponds to one sentiment attribute and includes at least one sentiment word, and each sentiment word corresponds to one word vector;
The sentiment attribute determination module 15 is configured to judge the sentiment attribute of the text information according to the similarity between the word vector corresponding to each candidate word and the word vector corresponding to each sentiment word in the sentiment dictionary.
Further, as shown in Fig. 9, the sentiment attribute determination module 15 includes:
A calculation unit 151, configured to calculate, separately for each lexicon, the sum of the similarities between the word vectors corresponding to all the candidate words and the word vectors corresponding to the sentiment words belonging to that lexicon, and to take this sum as the similarity between that lexicon and the text information;
A determination unit 152, configured to take the sentiment attribute of the lexicon with the greatest similarity to the text information as the sentiment attribute of the text information.
In one specific implementation of this embodiment, the sentiment dictionary includes a first lexicon and a second lexicon, where the sentiment attribute corresponding to the first lexicon is positive and the sentiment attribute corresponding to the second lexicon is negative.
In this case, the calculation unit 151 is specifically configured to calculate the sum of the similarities between the word vectors corresponding to all the candidate words and the word vectors corresponding to the sentiment words in the first lexicon, as the similarity between the first lexicon and the text information, and to calculate the sum of the similarities between the word vectors corresponding to all the candidate words and the word vectors corresponding to the sentiment words in the second lexicon, as the similarity between the second lexicon and the text information.
The determination unit 152 is specifically configured to calculate the difference between the similarity of the first lexicon to the text information and the similarity of the second lexicon to the text information. When the difference is greater than zero, the sentiment attribute of the text information is judged to be positive; when the difference is less than zero, the sentiment attribute of the text information is judged to be negative.
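The work of the calculation unit 151 and the determination unit 152 can be sketched as below. This is a hedged illustration rather than the patent's implementation: cosine similarity is assumed as the similarity measure because this passage does not name one, the function names are hypothetical, and the `"undetermined"` result for a difference of exactly zero is an assumption, since the text only covers the greater-than and less-than cases.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumed here, the passage does not fix the measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lexicon_score(word_vectors: Sequence[Vector], lexicon: Sequence[Vector]) -> float:
    """Calculation unit: sum of similarities over all candidate/sentiment word pairs."""
    return sum(cosine(w, e) for w in word_vectors for e in lexicon)


def classify(word_vectors: Sequence[Vector], lexicons: Dict[str, List[Vector]]) -> str:
    """Determination unit, general case: the lexicon with the greatest score wins."""
    scores = {label: lexicon_score(word_vectors, vecs) for label, vecs in lexicons.items()}
    return max(scores, key=scores.get)


def classify_binary(word_vectors, positive: List[Vector], negative: List[Vector]) -> str:
    """Determination unit, two-lexicon case: sign of the score difference."""
    diff = lexicon_score(word_vectors, positive) - lexicon_score(word_vectors, negative)
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "undetermined"  # assumption: the zero case is not specified in the text


# Toy vectors (illustrative values only).
text_vectors = [[1.0, 0.0], [0.8, 0.6]]
positive_lexicon = [[1.0, 0.0]]
negative_lexicon = [[-1.0, 0.0]]
print(classify_binary(text_vectors, positive_lexicon, negative_lexicon))
```

With exactly two lexicons, taking the sign of the difference gives the same result as picking the lexicon with the greater score, except when the two scores are equal.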
In this embodiment, each module may be implemented in software code, in which case the modules may be stored in the memory 120 of the computer 100. The modules may equally be implemented in hardware, such as an integrated circuit chip.
Fourth embodiment
Fig. 10 shows a functional block diagram of a text information processing apparatus provided by the fourth embodiment of the present invention. The apparatus provided by this embodiment can run in the computer 100 and is used to implement the text information processing method provided by the second embodiment. Referring to Fig. 10, the text information processing apparatus 20 provided by this embodiment includes: a corpus acquisition module 21, a training module 22, a text information acquisition module 23, a word segmentation module 24, a word vector acquisition module 25, a similarity calculation module 26, and a sentiment attribute determination module 27.
The corpus acquisition module 21 is configured to obtain a corpus;
The training module 22 is configured to train on the corpus using the word2vec algorithm to obtain the multiple training words of the correspondence table and the word vector corresponding to each training word;
The text information acquisition module 23 is configured to obtain text information;
The word segmentation module 24 is configured to perform word segmentation on the text information to obtain multiple candidate words;
The word vector acquisition module 25 is configured to look up, in the preset correspondence table, the training word whose consistency with each candidate word satisfies a preset condition, and to take the word vector corresponding to that training word as the word vector of the candidate word. The correspondence table includes multiple training words and multiple word vectors in one-to-one correspondence;
The similarity calculation module 26 is configured to calculate the similarity between the word vector corresponding to each candidate word and the word vector corresponding to each sentiment word in a preset sentiment dictionary. The sentiment dictionary includes at least two lexicons; each lexicon corresponds to one sentiment attribute and includes at least one sentiment word, and each sentiment word corresponds to one word vector;
The sentiment attribute determination module 27 is configured to judge the sentiment attribute of the text information according to the similarity between the word vector corresponding to each candidate word and the word vector corresponding to each sentiment word in the sentiment dictionary.
In this embodiment, each module may be implemented in software code, in which case the modules may be stored in the memory 120 of the computer 100. The modules may equally be implemented in hardware, such as an integrated circuit chip.
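When the modules are implemented in software, one possible arrangement is to treat each module as a replaceable component wired into a single pipeline. The sketch below is only an illustration of that wiring under stated assumptions: the class name `TextProcessingDevice`, its field names, the whitespace segmenter, and the dot-product similarity are all hypothetical stand-ins, not details from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Vector = List[float]


@dataclass
class TextProcessingDevice:
    """Hypothetical software wiring of modules 24 to 27."""
    segment: Callable[[str], List[str]]            # word segmentation module
    get_vector: Callable[[str], Optional[Vector]]  # word vector acquisition module
    similarity: Callable[[Vector, Vector], float]  # similarity calculation module
    lexicons: Dict[str, List[Vector]]              # preset sentiment dictionary

    def judge(self, text: str) -> str:
        """Sentiment attribute determination: the lexicon with the greatest score wins."""
        vectors = [v for v in map(self.get_vector, self.segment(text)) if v is not None]
        scores = {
            label: sum(self.similarity(w, e) for w in vectors for e in entries)
            for label, entries in self.lexicons.items()
        }
        return max(scores, key=scores.get)


# Toy components (illustrative only): whitespace segmentation, dict lookup,
# and a dot product standing in for the similarity measure.
table = {"good": [1.0, 0.0], "bad": [-1.0, 0.0]}
device = TextProcessingDevice(
    segment=str.split,
    get_vector=table.get,
    similarity=lambda u, v: sum(a * b for a, b in zip(u, v)),
    lexicons={"positive": [[1.0, 0.0]], "negative": [[-1.0, 0.0]]},
)
print(device.judge("good good bad"))
```

Passing the components in as callables keeps each module independently replaceable, which mirrors the statement that the modules may exist separately or be integrated.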
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the others, and for the parts that are the same or similar the embodiments may be referred to one another.
The text information processing apparatus provided by the embodiments of the present invention has the same implementation principle and technical effects as the foregoing method embodiments. For brevity, where the apparatus embodiments omit a point, refer to the corresponding content in the foregoing method embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architecture, functions, and operations of apparatuses, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form a single independent part, each module may exist separately, or two or more modules may be integrated to form a single independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be the personal computer 100, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above are merely preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection. It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.

Claims (10)

1. A text information processing method, characterized in that the method comprises:
obtaining text information;
performing word segmentation on the text information to obtain multiple candidate words;
obtaining the word vector corresponding to each of the multiple candidate words;
calculating the similarity between the word vector corresponding to each candidate word and the word vector corresponding to each sentiment word in a preset sentiment dictionary, wherein the sentiment dictionary comprises at least two lexicons, each lexicon corresponds to one sentiment attribute, each lexicon comprises at least one sentiment word, and each sentiment word corresponds to one word vector;
judging the sentiment attribute of the text information according to the similarity between the word vector corresponding to each candidate word and the word vector corresponding to each sentiment word in the sentiment dictionary.
2. The method according to claim 1, characterized in that obtaining the word vector corresponding to each of the multiple candidate words comprises:
looking up, in a preset correspondence table, the training word whose consistency with each candidate word satisfies a preset condition, and taking the word vector corresponding to that training word as the word vector of the candidate word, wherein the correspondence table comprises multiple training words and multiple word vectors in one-to-one correspondence.
3. The method according to claim 2, characterized in that, before the step of obtaining text information, the method further comprises:
obtaining a corpus;
training on the corpus using the word2vec algorithm to obtain the multiple training words of the correspondence table and the word vector corresponding to each training word.
4. The method according to claim 1, characterized in that judging the sentiment attribute of the text information according to the similarity between the word vector corresponding to each candidate word and the word vector corresponding to each sentiment word in the sentiment dictionary comprises:
calculating, separately for each lexicon, the sum of the similarities between the word vectors corresponding to all the candidate words and the word vectors corresponding to the sentiment words belonging to that lexicon, as the similarity between that lexicon and the text information;
taking the sentiment attribute of the lexicon with the greatest similarity to the text information as the sentiment attribute of the text information.
5. The method according to claim 4, characterized in that the sentiment dictionary comprises a first lexicon and a second lexicon, wherein the sentiment attribute corresponding to the first lexicon is positive and the sentiment attribute corresponding to the second lexicon is negative, and the steps of calculating, separately for each lexicon, the sum of the similarities between the word vectors corresponding to all the candidate words and the word vectors corresponding to the sentiment words belonging to that lexicon, as the similarity between that lexicon and the text information, and taking the sentiment attribute of the lexicon with the greatest similarity to the text information as the sentiment attribute of the text information, comprise:
calculating the sum of the similarities between the word vectors corresponding to all the candidate words and the word vectors corresponding to the sentiment words in the first lexicon, as the similarity between the first lexicon and the text information;
calculating the sum of the similarities between the word vectors corresponding to all the candidate words and the word vectors corresponding to the sentiment words in the second lexicon, as the similarity between the second lexicon and the text information;
calculating the difference between the similarity of the first lexicon to the text information and the similarity of the second lexicon to the text information;
when the difference is greater than zero, judging the sentiment attribute of the text information to be positive;
when the difference is less than zero, judging the sentiment attribute of the text information to be negative.
6. A text information processing apparatus, characterized in that the apparatus comprises:
a text information acquisition module, configured to obtain text information;
a word segmentation module, configured to perform word segmentation on the text information to obtain multiple candidate words;
a word vector acquisition module, configured to obtain the word vector corresponding to each of the multiple candidate words;
a similarity calculation module, configured to calculate the similarity between the word vector corresponding to each candidate word and the word vector corresponding to each sentiment word in a preset sentiment dictionary, wherein the sentiment dictionary comprises at least two lexicons, each lexicon corresponds to one sentiment attribute, each lexicon comprises at least one sentiment word, and each sentiment word corresponds to one word vector;
a sentiment attribute determination module, configured to judge the sentiment attribute of the text information according to the similarity between the word vector corresponding to each candidate word and the word vector corresponding to each sentiment word in the sentiment dictionary.
7. The apparatus according to claim 6, characterized in that the word vector acquisition module is specifically configured to look up, in a preset correspondence table, the training word whose consistency with each candidate word satisfies a preset condition, and to take the word vector corresponding to that training word as the word vector of the candidate word, wherein the correspondence table comprises multiple training words and multiple word vectors in one-to-one correspondence.
8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
a corpus acquisition module, configured to obtain a corpus;
a training module, configured to train on the corpus using the word2vec algorithm to obtain the multiple training words of the correspondence table and the word vector corresponding to each training word.
9. The apparatus according to claim 6, characterized in that the sentiment attribute determination module comprises:
a calculation unit, configured to calculate, separately for each lexicon, the sum of the similarities between the word vectors corresponding to all the candidate words and the word vectors corresponding to the sentiment words belonging to that lexicon, as the similarity between that lexicon and the text information;
a determination unit, configured to take the sentiment attribute of the lexicon with the greatest similarity to the text information as the sentiment attribute of the text information.
10. The apparatus according to claim 9, characterized in that the sentiment dictionary comprises a first lexicon and a second lexicon, wherein the sentiment attribute corresponding to the first lexicon is positive and the sentiment attribute corresponding to the second lexicon is negative;
the calculation unit is specifically configured to calculate the sum of the similarities between the word vectors corresponding to all the candidate words and the word vectors corresponding to the sentiment words in the first lexicon, as the similarity between the first lexicon and the text information, and to calculate the sum of the similarities between the word vectors corresponding to all the candidate words and the word vectors corresponding to the sentiment words in the second lexicon, as the similarity between the second lexicon and the text information;
the determination unit is specifically configured to calculate the difference between the similarity of the first lexicon to the text information and the similarity of the second lexicon to the text information, to judge the sentiment attribute of the text information to be positive when the difference is greater than zero, and to judge the sentiment attribute of the text information to be negative when the difference is less than zero.
CN201611043882.5A 2016-11-24 2016-11-24 Text message processing method and device Pending CN106547740A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611043882.5A CN106547740A (en) 2016-11-24 2016-11-24 Text message processing method and device

Publications (1)

Publication Number Publication Date
CN106547740A true CN106547740A (en) 2017-03-29

Family

ID=58394892

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611043882.5A Pending CN106547740A (en) 2016-11-24 2016-11-24 Text message processing method and device

Country Status (1)

Country Link
CN (1) CN106547740A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101634983A (en) * 2008-07-21 2010-01-27 华为技术有限公司 Method and device for text classification
CN102880600A (en) * 2012-08-30 2013-01-16 北京航空航天大学 Word semantic tendency prediction method based on universal knowledge network
CN103678278A (en) * 2013-12-16 2014-03-26 中国科学院计算机网络信息中心 Chinese text emotion recognition method
CN104462378A (en) * 2014-12-09 2015-03-25 北京国双科技有限公司 Data processing method and device for text recognition
US9075796B2 (en) * 2012-05-24 2015-07-07 International Business Machines Corporation Text mining for large medical text datasets and corresponding medical text classification using informative feature selection
CN104965822A (en) * 2015-07-29 2015-10-07 中南大学 Emotion analysis method for Chinese texts based on computer information processing technology
CN105589941A (en) * 2015-12-15 2016-05-18 北京百分点信息科技有限公司 Emotional information detection method and apparatus for web text
CN105893444A (en) * 2015-12-15 2016-08-24 乐视网信息技术(北京)股份有限公司 Sentiment classification method and apparatus

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169142A (en) * 2017-06-15 2017-09-15 厦门快商通科技股份有限公司 A kind of document sentiment analysis system and method automatically updated
CN107451126A (en) * 2017-08-21 2017-12-08 广州多益网络股份有限公司 A kind of near synonym screening technique and system
CN107885785A (en) * 2017-10-17 2018-04-06 北京京东尚科信息技术有限公司 Text emotion analysis method and device
CN107967258A (en) * 2017-11-23 2018-04-27 广州艾媒数聚信息咨询股份有限公司 The sentiment analysis method and system of text message
CN107967258B (en) * 2017-11-23 2021-09-17 广州艾媒数聚信息咨询股份有限公司 Method and system for emotion analysis of text information
CN108052508A (en) * 2017-12-29 2018-05-18 北京嘉和美康信息技术有限公司 A kind of information extraction method and device
CN108052508B (en) * 2017-12-29 2021-11-09 北京嘉和海森健康科技有限公司 Information extraction method and device
CN110134934A (en) * 2018-02-02 2019-08-16 普天信息技术有限公司 Text emotion analysis method and device
CN110457339A (en) * 2018-05-02 2019-11-15 北京京东尚科信息技术有限公司 Data search method and device, electronic equipment, storage medium
CN109271510A (en) * 2018-08-16 2019-01-25 龙马智芯(珠海横琴)科技有限公司 Emotion term vector construction method and system
CN109271510B (en) * 2018-08-16 2019-07-09 龙马智芯(珠海横琴)科技有限公司 Emotion term vector construction method and system
CN109299400A (en) * 2018-09-06 2019-02-01 北京奇艺世纪科技有限公司 A kind of viewpoint abstracting method, device and equipment
TWI687825B (en) * 2018-12-03 2020-03-11 國立臺灣師範大學 Method and system for mapping from natural language to color combination
CN109902300A (en) * 2018-12-29 2019-06-18 深兰科技(上海)有限公司 A kind of method, apparatus, electronic equipment and storage medium creating dictionary
CN109885687A (en) * 2018-12-29 2019-06-14 深兰科技(上海)有限公司 A kind of sentiment analysis method, apparatus, electronic equipment and the storage medium of text
CN109858004A (en) * 2019-02-12 2019-06-07 四川无声信息技术有限公司 Text Improvement, device and electronic equipment
CN109858004B (en) * 2019-02-12 2023-08-01 四川无声信息技术有限公司 Text rewriting method and device and electronic equipment
CN112446202A (en) * 2019-08-16 2021-03-05 阿里巴巴集团控股有限公司 Text analysis method and device
CN111199148B (en) * 2019-12-26 2023-01-20 东软集团股份有限公司 Text similarity determination method and device, storage medium and electronic equipment
CN111199148A (en) * 2019-12-26 2020-05-26 东软集团股份有限公司 Text similarity determination method and device, storage medium and electronic equipment
WO2021134177A1 (en) * 2019-12-30 2021-07-08 深圳市优必选科技股份有限公司 Sentiment labeling method, apparatus and device for speaking content, and storage medium
CN111164589A (en) * 2019-12-30 2020-05-15 深圳市优必选科技股份有限公司 Emotion marking method, device and equipment of speaking content and storage medium
CN111898377A (en) * 2020-07-07 2020-11-06 苏宁金融科技(南京)有限公司 Emotion recognition method and device, computer equipment and storage medium
CN112115212A (en) * 2020-09-29 2020-12-22 中国工商银行股份有限公司 Parameter identification method and device and electronic equipment
CN112115212B (en) * 2020-09-29 2023-10-03 中国工商银行股份有限公司 Parameter identification method and device and electronic equipment
CN112446217A (en) * 2020-11-27 2021-03-05 广州三七互娱科技有限公司 Emotion analysis method and device and electronic equipment
CN112446217B (en) * 2020-11-27 2024-05-28 广州三七互娱科技有限公司 Emotion analysis method and device and electronic equipment
CN113807807A (en) * 2021-08-16 2021-12-17 深圳市云采网络科技有限公司 Component parameter identification method and device, electronic equipment and readable medium
CN116580402A (en) * 2023-05-26 2023-08-11 读书郎教育科技有限公司 Text recognition method and device for dictionary pen
CN116580402B (en) * 2023-05-26 2024-06-25 读书郎教育科技有限公司 Text recognition method and device for dictionary pen
CN117112628A (en) * 2023-09-08 2023-11-24 廊坊丛林科技有限公司 Logistics data updating method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170329