CN109902300A - Method, apparatus, electronic device and storage medium for creating a dictionary - Google Patents

Info

Publication number
CN109902300A
CN109902300A CN201910132050.8A CN201910132050A CN109902300A CN 109902300 A CN109902300 A CN 109902300A CN 201910132050 A CN201910132050 A CN 201910132050A CN 109902300 A CN109902300 A CN 109902300A
Authority
CN
China
Prior art keywords
vocabulary
dictionary
sentiment dictionary
word
term vector
Prior art date
Legal status
Pending
Application number
CN201910132050.8A
Other languages
Chinese (zh)
Inventor
陈海波
Current Assignee
Deep Blue Technology Shanghai Co Ltd
Original Assignee
Deep Blue Technology Shanghai Co Ltd
Priority date
Filing date
Publication date
Application filed by Deep Blue Technology Shanghai Co Ltd
Publication of CN109902300A

Landscapes

  • Machine Translation (AREA)

Abstract

Embodiments of the present invention relate to the field of data processing and disclose a method, an apparatus, an electronic device and a storage medium for creating a dictionary. In some embodiments of the present application, the method for creating a dictionary comprises: obtaining the vocabulary items in a corpus; for each vocabulary item in the corpus that does not belong to a first sentiment dictionary, performing the following operations: determining the word in the first sentiment dictionary that is closest to the vocabulary item, and determining the polarity score of the vocabulary item according to the polarity score of that closest word, wherein the first sentiment dictionary contains N words and the polarity score of each word, N being a positive integer; and creating a second sentiment dictionary according to the vocabulary items in the corpus and their polarity scores. In this implementation, vocabulary items that do not belong to the sentiment dictionary can be written into a sentiment dictionary, which enriches its vocabulary.

Description

Method, apparatus, electronic device and storage medium for creating a dictionary
Technical field
Embodiments of the present invention relate to the field of data processing, and in particular to a method, an apparatus, an electronic device and a storage medium for creating a dictionary.
Background
At present, social media carries a large number of user comments and evaluations of products and services, and these have become an information source for users' everyday decisions. Because opinions about a given product are numerous and varied, it may be difficult for a user to sum up the overall sentiment from these comments or evaluations. The sentiment dictionary SentiWordNet is regarded as an effective lexical resource for sentiment analysis. Each term in SentiWordNet is associated with a set of scores expressing its positivity, negativity and neutrality, and the scores may be assigned according to the part of speech of the term. SentiWordNet is commonly used in sentiment analysis, which is a set of methods for determining the sentiment orientation (positive, negative or neutral) of a text.
However, the inventor has found at least the following problem in the prior art: SentiWordNet is currently the most commonly used sentiment dictionary for determining text polarity, but its vocabulary is limited, which limits the accuracy of sentiment analysis results.
It should be noted that the information disclosed in the Background section above is intended only to aid understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
An object of embodiments of the present invention is to provide a method, an apparatus, an electronic device and a storage medium for creating a dictionary, so as to enrich the vocabulary of a sentiment dictionary.
To solve the above technical problem, embodiments of the present invention provide a method for creating a dictionary, comprising the following steps: obtaining the vocabulary items in a corpus; for each vocabulary item in the corpus that does not belong to a first sentiment dictionary, performing the following operations: determining the word in the first sentiment dictionary that is closest to the vocabulary item, and determining the polarity score of the vocabulary item according to the polarity score of the closest word, wherein the first sentiment dictionary contains N words and the polarity score of each word, N being a positive integer; and creating a second sentiment dictionary according to the vocabulary items in the corpus and their polarity scores.
Embodiments of the present invention further provide an apparatus for creating a dictionary, comprising an obtaining module, a determining module and a creating module. The obtaining module is configured to obtain the vocabulary items in a corpus. The determining module is configured to perform the following operations for each vocabulary item in the corpus that does not belong to a first sentiment dictionary: determining the word in the first sentiment dictionary that is closest to the vocabulary item, and determining the polarity score of the vocabulary item according to the polarity score of the closest word, wherein the first sentiment dictionary contains N words and the polarity score of each word, N being a positive integer. The creating module is configured to create a second sentiment dictionary according to the vocabulary items in the corpus and their polarity scores.
Embodiments of the present invention further provide an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method for creating a dictionary described in the above embodiments.
Embodiments of the present invention further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for creating a dictionary described in the above embodiments.
Compared with the prior art, embodiments of the present invention use the polarity score of the word closest to a vocabulary item that does not belong to the first sentiment dictionary to assign a polarity score to that vocabulary item in the corpus. This increases the number of words in the second sentiment dictionary and enriches it. Because the second sentiment dictionary is richer, the results of later sentiment analysis of text are more accurate.
In addition, determining the word in the first sentiment dictionary that is closest to the vocabulary item that does not belong to the first sentiment dictionary specifically comprises: determining a first word vector of the vocabulary item that does not belong to the first sentiment dictionary and a second word vector of each word in the first sentiment dictionary; determining the distance between each second word vector and the first word vector; and taking the word corresponding to the second word vector nearest to the first word vector as the word closest to the vocabulary item. In this implementation, the closest word is determined from the distance between word vectors, so that the similarity between the vocabulary item and the words in the first sentiment dictionary is considered across multiple dimensions.
In addition, determining the distance between each second word vector and the first word vector specifically comprises: for each second word vector, calculating the distance between the second word vector and the first word vector according to formula a, where formula a is:
where a_i denotes the second word vector, j denotes the first word vector, ‖A‖_F denotes the distance between the first word vector and the second word vector, and abs is the absolute-value function.
In addition, determining the first word vector of the vocabulary item that does not belong to the first sentiment dictionary and the second word vector of each word in the first sentiment dictionary specifically comprises: determining the first word vector and the second word vectors using the word vector model Word2Vec. In this implementation, Word2Vec expresses a word in vector form quickly and effectively, which improves the processing speed of the electronic device.
In addition, determining the polarity score of the vocabulary item that does not belong to the first sentiment dictionary according to the polarity score of the closest word specifically comprises: using the polarity score of the closest word as the polarity score of the vocabulary item.
In addition, obtaining the vocabulary items in the corpus specifically comprises: extracting the vocabulary items from each sentence of the corpus using a word segmentation tool.
In addition, creating the second sentiment dictionary according to the vocabulary items in the corpus and their polarity scores specifically comprises: updating the first sentiment dictionary with the vocabulary items that do not belong to the first sentiment dictionary and their polarity scores, to obtain the second sentiment dictionary.
Brief description of the drawings
One or more embodiments are illustrated by the figures in the corresponding drawings. These illustrations do not limit the embodiments; elements bearing the same reference numerals in the drawings denote similar elements, and unless otherwise stated the figures are not drawn to scale.
Fig. 1 is a flowchart of the method for creating a dictionary according to the first embodiment of the present invention;
Fig. 2 is a flowchart of the method, according to the second embodiment of the present invention, for determining the word in the first sentiment dictionary that is closest to a vocabulary item that does not belong to the first sentiment dictionary;
Fig. 3 is a schematic structural diagram of the apparatus for creating a dictionary according to the third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the electronic device according to the fourth embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the drawings. A person of ordinary skill in the art will understand, however, that many technical details are set out in the embodiments so that the reader may better understand the present application; the technical solutions claimed in the present application can nevertheless be implemented without these technical details and with various changes and modifications based on the following embodiments.
The first embodiment of the present invention relates to a method for creating a dictionary, applied to an electronic device such as a computer, a mobile phone or another terminal. As shown in Fig. 1, the method comprises the following steps:
Step 101: obtain the vocabulary items in the corpus.
Specifically, the electronic device may use a word segmentation tool, for example the Jieba segmenter, to extract the vocabulary items from each sentence in the corpus.
In one example, before extracting the vocabulary items, the electronic device first preprocesses the corpus, for example by splitting the text in the corpus into sentences and removing the punctuation marks and/or meaningless auxiliary words from each sentence.
It should be noted that a person skilled in the art will understand that this embodiment is only an example. In practice, the tool used to extract the vocabulary items from the corpus may be chosen as needed, and this embodiment does not limit the extraction method.
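The preprocessing and extraction in step 101 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the `PARTICLES` stop list and the whitespace-based stand-in tokenizer are all assumptions introduced here.

```python
import re

# Hypothetical stop list of "meaningless" particles; a real list depends on the corpus language.
PARTICLES = {"的", "了", "吗", "呢", "a", "an", "the"}

def split_sentences(text):
    """Split raw text into sentences on common Chinese and Western end marks."""
    parts = re.split(r"[。！？!?.\n]+", text)
    return [p.strip() for p in parts if p.strip()]

def simple_tokenize(sentence):
    """Stand-in tokenizer (assumption): replace punctuation with spaces, lowercase, split on whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", sentence)
    return cleaned.lower().split()

def extract_vocabulary(text, tokenize=simple_tokenize):
    """Return the distinct vocabulary items of the corpus text, in order of first appearance."""
    vocabulary, seen = [], set()
    for sentence in split_sentences(text):
        for token in tokenize(sentence):
            if token in PARTICLES or token in seen:
                continue  # drop particles and repeated items
            seen.add(token)
            vocabulary.append(token)
    return vocabulary

corpus_text = "The screen is gorgeous. The battery drains fast!"
print(extract_vocabulary(corpus_text))
# ['screen', 'is', 'gorgeous', 'battery', 'drains', 'fast']
```

For Chinese text, a real segmenter would be passed as `tokenize` in place of the whitespace splitter, for instance a function wrapping Jieba's `lcut`.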
Step 102: for each vocabulary item in the corpus that does not belong to the first sentiment dictionary, perform the following operations: determine the word in the first sentiment dictionary that is closest to the vocabulary item; and determine the polarity score of the vocabulary item according to the polarity score of the closest word.
Specifically, the first sentiment dictionary contains N words and the polarity score of each word, N being a positive integer. The polarity score may also be called the sentiment score. The polarity score of a word in the first sentiment dictionary may include any one or any combination of the following: the positive score and the negative score of each sense of the word; the polarity score of each sense, obtained by subtracting the negative score from the positive score; and the average of the polarity scores of the senses corresponding to each part of speech of the word.
In one example, the first sentiment dictionary is SentiWordNet, which records for each word its part of speech, the identifier of the word, its positive score, its negative score, the senses of the word, its synonyms, and so on.
In one example, the polarity score of a word in the first sentiment dictionary includes the average of the polarity scores of the senses corresponding to each part of speech of the word. For each part of speech of the word, the electronic device performs the following operations: it determines the polarity score of each sense corresponding to that part of speech, where the polarity score of a sense equals the positive score of the word under that sense minus its negative score; it then calculates the average of the polarity scores of the senses corresponding to that part of speech and uses this average as the polarity score of the word for that part of speech. The electronic device records the polarity score of each part of speech of the word.
For example, the corpus contains the vocabulary item "statement", which can be used both as a verb and as a noun. For "statement" as a verb, the electronic device determines the average of the polarity scores of the senses that "statement" has as a verb and uses this average as the polarity score of "statement" as a verb. For "statement" as a noun, the electronic device determines the average of the polarity scores of the senses that "statement" has as a noun and uses this average as the polarity score of "statement" as a noun.
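The per-part-of-speech averaging described above can be written out in a few lines. The sketch below assumes each sense is given as a `(part_of_speech, positive_score, negative_score)` tuple; that input layout, the function name and the sample scores are illustrative assumptions and are not taken from SentiWordNet.

```python
from collections import defaultdict

def polarity_by_pos(senses):
    """Average (positive - negative) over the senses of each part of speech.

    senses: iterable of (pos, positive_score, negative_score), one tuple per sense.
    Returns a dict mapping each part of speech to its polarity score.
    """
    per_pos = defaultdict(list)
    for pos, positive, negative in senses:
        per_pos[pos].append(positive - negative)  # polarity of one sense
    return {pos: sum(scores) / len(scores) for pos, scores in per_pos.items()}

# Made-up scores for illustration only.
statement_senses = [
    ("n", 0.25, 0.0),
    ("n", 0.0, 0.25),
    ("v", 0.5, 0.0),
    ("v", 0.25, 0.25),
]
print(polarity_by_pos(statement_senses))  # {'n': 0.0, 'v': 0.25}
```

Keeping one score per part of speech, rather than a single score per word, matches the example of "statement" being scored separately as a verb and as a noun.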
In one example, the electronic device uses the polarity score of the closest word as the polarity score of the vocabulary item that does not belong to the first sentiment dictionary. The polarity score of the closest word can be determined as described above for step 102, and the details are not repeated here.
It should be noted that a person skilled in the art will understand that a correspondence between the polarity score of the closest word and the polarity score of the vocabulary item that does not belong to the first sentiment dictionary may also be defined. This embodiment does not limit the method by which the electronic device determines the polarity score of the vocabulary item from the polarity score of the closest word.
Step 103: create the second sentiment dictionary according to the vocabulary items in the corpus and their polarity scores.
In one example, the electronic device updates the first sentiment dictionary with the vocabulary items that do not belong to the first sentiment dictionary and their polarity scores, to obtain the second sentiment dictionary. Specifically, the electronic device adds the vocabulary items in the corpus that do not belong to the first sentiment dictionary, together with their polarity scores, to the first sentiment dictionary, and the result is the second sentiment dictionary.
In another example, the electronic device determines the polarity scores of the vocabulary items in the corpus that belong to the first sentiment dictionary from the polarity scores of the words in the first sentiment dictionary, and then creates a new sentiment dictionary, i.e. the second sentiment dictionary, from the vocabulary items in the corpus and their polarity scores. This sentiment dictionary contains only vocabulary items that occur in the corpus.
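The two ways of building the second sentiment dictionary in step 103 can be contrasted in a short sketch. It assumes dictionaries are plain mappings from a word to a single polarity score; the function names and sample data are hypothetical.

```python
def extend_dictionary(first_dict, inferred_scores):
    """Variant 1: the first dictionary plus the newly scored vocabulary items."""
    second = dict(first_dict)               # leave the first dictionary untouched
    for item, score in inferred_scores.items():
        second.setdefault(item, score)      # never overwrite an existing entry
    return second

def corpus_only_dictionary(first_dict, corpus_vocab, inferred_scores):
    """Variant 2: only vocabulary items that actually occur in the corpus."""
    second = {}
    for item in corpus_vocab:
        if item in first_dict:
            second[item] = first_dict[item]       # score taken from the first dictionary
        elif item in inferred_scores:
            second[item] = inferred_scores[item]  # score taken from the closest word
    return second

first = {"good": 0.75, "bad": -0.625, "scenic": 0.5}
vocab = ["good", "awesome"]
inferred = {"awesome": 0.75}

print(extend_dictionary(first, inferred))
# {'good': 0.75, 'bad': -0.625, 'scenic': 0.5, 'awesome': 0.75}
print(corpus_only_dictionary(first, vocab, inferred))
# {'good': 0.75, 'awesome': 0.75}
```

The second variant yields a smaller, corpus-specific dictionary, at the cost of dropping words such as "scenic" that the corpus never uses.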
It is worth mentioning that when the electronic device creates a sentiment dictionary for a particular corpus, fewer words need to be traversed when analysing the sentiment of a text, which speeds up sentiment analysis.
For example, if the text in the corpus consists of product reviews, the second sentiment dictionary is a sentiment dictionary for product reviews. Compared with the first sentiment dictionary, the second sentiment dictionary omits irrelevant words such as words describing scenery or people, so analysing the sentiment of product reviews with the second sentiment dictionary is faster than doing so with the first sentiment dictionary.
It should be noted that the above is only an example and does not limit the technical solution of the present invention.
Compared with the prior art, the method for creating a dictionary provided in this embodiment uses the polarity score of the word closest to a vocabulary item that does not belong to the first sentiment dictionary to assign a polarity score to that vocabulary item in the corpus. This increases the number of words in the second sentiment dictionary and enriches it. Because the second sentiment dictionary is richer, the results of later sentiment analysis of text are more accurate.
The second embodiment of the present invention relates to a method for creating a dictionary. This embodiment further refines the first embodiment by specifying the method of determining the word in the first sentiment dictionary that is closest to a vocabulary item that does not belong to the first sentiment dictionary.
Specifically, as shown in Fig. 2, the method of determining the word in the first sentiment dictionary that is closest to a vocabulary item that does not belong to the first sentiment dictionary comprises the following sub-steps:
Step 201: determine a first word vector of the vocabulary item that does not belong to the first sentiment dictionary and a second word vector of each word in the first sentiment dictionary.
In one example, the electronic device uses the word vector model Word2Vec to determine the first word vector and the second word vectors. Suppose the vocabulary items extracted from the corpus form the vocabulary T, with T = {t_1, t_2, t_3, ..., t_n}. The Skip-gram model in Word2Vec predicts the context from an input word. Using the Skip-gram model, the probability distribution of the other terms in the context of a given t_i can therefore be computed, which yields the vector representation of vocabulary item t_i, i.e. its word vector. In particular, t_i is represented by a word vector in which each entry is the probability value of another vocabulary item in T.
It should be noted that a person skilled in the art will understand that, in practice, other methods may be used to determine the first word vector and the second word vectors. They are not listed one by one here, and this embodiment does not limit the method of determining the first word vector and the second word vectors.
It is worth mentioning that Word2Vec expresses a word in vector form quickly and effectively, which improves the processing speed of the electronic device.
Step 202: determine the distance between each second word vector and the first word vector.
In one example, the electronic device determines the distance between each second word vector and the first word vector as follows: for each second word vector, it calculates the distance between the second word vector and the first word vector according to formula a, where formula a is:
where a_i denotes the second word vector, j denotes the first word vector, ‖A‖_F denotes the distance between the first word vector and the second word vector, and abs is the absolute-value function.
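Formula a itself is not reproduced in this text, so its exact form cannot be confirmed here. As one plausible reading of the surviving symbols (a norm written ‖A‖_F and an `abs` function), the sketch below assumes the distance is the Frobenius norm of the element-wise difference between the two vectors, which for vectors is the ordinary Euclidean distance. The function name and this choice of formula are assumptions, not the patent's stated definition.

```python
import math

def frobenius_distance(second_vec, first_vec):
    """Assumed distance: sqrt(sum(abs(a - j) ** 2)) over paired components."""
    if len(second_vec) != len(first_vec):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum(abs(a - j) ** 2 for a, j in zip(second_vec, first_vec)))

print(frobenius_distance([0.0, 3.0], [4.0, 0.0]))  # 5.0
```

Any other vector distance or similarity measure could be substituted without changing the rest of the method.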
It should be noted that a person skilled in the art will understand that, in practice, if the dimensions of the first word vector and the second word vector differ, the two vectors can first be brought to the same dimension by dimension alignment, after which the distance between them can be computed by any of the usual methods for finding the distance between vectors.
It should be noted that a person skilled in the art will understand that, in practice, the distance, or equivalently the similarity, between the first word vector and the second word vector may also be determined in other ways. These are not described one by one here, and this embodiment does not limit the method of determining the similarity or distance between the first word vector and the second word vector.
It is worth mentioning that determining the closest word from the distance between word vectors allows the similarity between the vocabulary item that does not belong to the first sentiment dictionary and the words in the first sentiment dictionary to be considered across multiple dimensions.
It should be noted that a person skilled in the art will understand that, in practice, the electronic device may also determine the word closest to the vocabulary item that does not belong to the first sentiment dictionary in other ways, and this embodiment does not limit the method used.
Step 203: take the word corresponding to the second word vector nearest to the first word vector as the word closest to the vocabulary item that does not belong to the first sentiment dictionary.
Specifically, because a word vector is a mapping of a word into a vector space, two words can be regarded as close in meaning if the distance between their word vectors is small. The word corresponding to the second word vector nearest to the first word vector can therefore be taken as the word closest to the vocabulary item that does not belong to the first sentiment dictionary.
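Steps 202 and 203 together amount to a nearest-neighbour search over the word vectors of the first sentiment dictionary. The sketch below uses Euclidean distance via the standard library's `math.dist` as an assumed distance measure; the function name, the toy two-dimensional vectors and the scores are illustrative only.

```python
import math

def nearest_word(first_vec, dictionary_vectors):
    """Return (word, distance) for the dictionary word whose vector is nearest to first_vec.

    dictionary_vectors: dict mapping each dictionary word to its (second) word vector.
    """
    best_word, best_dist = None, math.inf
    for word, second_vec in dictionary_vectors.items():
        dist = math.dist(first_vec, second_vec)  # Euclidean distance (assumed measure)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word, best_dist

# Toy vectors and scores, made up for illustration.
dictionary_vectors = {"good": [1.0, 0.0], "bad": [-1.0, 0.0]}
polarity = {"good": 0.75, "bad": -0.625}

word, _ = nearest_word([0.9, 0.1], dictionary_vectors)  # vector of an unseen item such as "awesome"
print(word, polarity[word])  # good 0.75
```

A linear scan is enough for a sketch; with a large dictionary an approximate nearest-neighbour index would be the usual replacement.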
It should be noted that a person skilled in the art will understand that, in practice, the closeness of two vocabulary items may be judged not only by the distance between their word vectors but also in other ways. These are not listed one by one here, and this embodiment does not limit the method of determining the word in the first sentiment dictionary that is closest to the vocabulary item that does not belong to the first sentiment dictionary.
It should be noted that the above is only an example and does not limit the technical solution of the present invention.
Compared with the prior art, the method for creating a dictionary provided in this embodiment uses the polarity score of the word closest to a vocabulary item that does not belong to the first sentiment dictionary to assign a polarity score to that vocabulary item in the corpus. This increases the number of words in the second sentiment dictionary and enriches it. Because the second sentiment dictionary is richer, the results of later sentiment analysis of text are more accurate. In addition, determining the closest word from the distance between word vectors allows the similarity between the vocabulary item and the words in the first sentiment dictionary to be considered across multiple dimensions.
The division of the above methods into steps is only for clarity of description. In implementation, steps may be merged into one step, or a step may be split into several steps; as long as the same logical relationship is preserved, all such variants fall within the protection scope of this patent. Adding insignificant modifications to an algorithm or process, or introducing insignificant design changes, without changing the core design of the algorithm or process, also falls within the protection scope of this patent.
The third embodiment of the present invention relates to an apparatus for creating a dictionary which, as shown in Fig. 3, comprises an obtaining module 301, a determining module 302 and a creating module 303. The obtaining module 301 is configured to obtain the vocabulary items in a corpus. The determining module 302 is configured to perform the following operations for each vocabulary item in the corpus that does not belong to a first sentiment dictionary: determining the word in the first sentiment dictionary that is closest to the vocabulary item, and determining the polarity score of the vocabulary item according to the polarity score of the closest word, wherein the first sentiment dictionary contains N words and the polarity score of each word, N being a positive integer. The creating module 303 is configured to create a second sentiment dictionary according to the vocabulary items in the corpus and their polarity scores.
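To show how the three logical modules might fit together, the following sketch wires them into one small pipeline. It is only an illustration under several assumptions: the class and method names are invented here, word vectors are supplied ready-made in a dictionary, tokens are split on whitespace, and Euclidean distance stands in for the distance measure.

```python
import math

class ObtainingModule:
    """Counterpart of module 301: collects distinct vocabulary items from sentences."""
    def obtain(self, sentences):
        seen, vocab = set(), []
        for sentence in sentences:
            for token in sentence.lower().split():
                if token not in seen:
                    seen.add(token)
                    vocab.append(token)
        return vocab

class DeterminingModule:
    """Counterpart of module 302: scores items missing from the first dictionary."""
    def __init__(self, first_dict, vectors):
        self.first_dict = first_dict
        self.vectors = vectors  # word -> vector, assumed to be precomputed

    def determine(self, vocab):
        candidates = [w for w in self.first_dict if w in self.vectors]
        scores = {}
        for item in vocab:
            if item in self.first_dict or item not in self.vectors or not candidates:
                continue  # already scored, or no vector available to compare
            closest = min(candidates,
                          key=lambda w: math.dist(self.vectors[w], self.vectors[item]))
            scores[item] = self.first_dict[closest]
        return scores

class CreatingModule:
    """Counterpart of module 303: builds the second dictionary."""
    def create(self, first_dict, scores):
        second = dict(first_dict)
        second.update(scores)
        return second

first_dict = {"good": 0.75, "bad": -0.625}
vectors = {"good": [1.0, 0.0], "bad": [-1.0, 0.0],
           "awesome": [0.9, 0.1], "awful": [-0.8, 0.2]}

vocab = ObtainingModule().obtain(["awesome screen", "awful battery"])
scores = DeterminingModule(first_dict, vectors).determine(vocab)
print(CreatingModule().create(first_dict, scores))
# {'good': 0.75, 'bad': -0.625, 'awesome': 0.75, 'awful': -0.625}
```

Items without a vector ("screen" and "battery" here) are simply skipped in this sketch; how such items should be handled is left open.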
It is easy to see that this embodiment is the system embodiment corresponding to the first embodiment, and the two can be implemented in cooperation with each other. The relevant technical details mentioned in the first embodiment remain valid in this embodiment and, to reduce repetition, are not described again here. Correspondingly, the relevant technical details mentioned in this embodiment also apply to the first embodiment.
It is worth mentioning that every module involved in this embodiment is a logical module. In practice, a logical unit may be a physical unit, a part of a physical unit, or a combination of several physical units. In addition, to highlight the innovative part of the present invention, units that are not closely related to solving the technical problem addressed by the present invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
The fourth embodiment of the present invention relates to an electronic device which, as shown in Fig. 4, comprises at least one processor 401 and a memory 402 communicatively connected to the at least one processor 401, wherein the memory 402 stores instructions executable by the at least one processor 401, and the instructions are executed by the at least one processor 401 so that the at least one processor 401 can perform the method for creating a dictionary described in the above embodiments.
The electronic device comprises one or more processors 401 and a memory 402; Fig. 4 takes one processor 401 as an example. The processor 401 and the memory 402 may be connected by a bus or in other ways; Fig. 4 takes connection by a bus as an example. As a non-volatile computer-readable storage medium, the memory 402 can be used to store non-volatile software programs, non-volatile computer-executable programs and modules. By running the non-volatile software programs, instructions and modules stored in the memory 402, the processor 401 executes the various functional applications and data processing of the device, that is, implements the above method for creating a dictionary.
The memory 402 may include a program storage area and a data storage area, wherein the program storage area can store an operating system and the application programs required by at least one function, and the data storage area can store an option list and the like. In addition, the memory 402 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some embodiments, the memory 402 optionally includes memory located remotely from the processor 401, and such remote memory may be connected to an external device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
One or more modules are stored in the memory 402 and, when executed by the one or more processors 401, perform the method for creating a dictionary in any of the above method embodiments.
The above product can perform the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
The fifth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the above method embodiments.
That is, a person skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes a number of instructions that cause a device (which may be a single-chip microcomputer, a chip or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
A person skilled in the art will understand that the above embodiments are specific embodiments for implementing the present invention, and that in practice various changes in form and detail may be made to them without departing from the spirit and scope of the present invention.

Claims (10)

1. A method for creating a dictionary, characterized by comprising:
obtaining the vocabulary items in a corpus;
for each vocabulary item in the corpus that does not belong to a first sentiment dictionary, performing the following operations: determining the word in the first sentiment dictionary that is closest to the vocabulary item not belonging to the first sentiment dictionary; and determining the polarity score of the vocabulary item not belonging to the first sentiment dictionary according to the polarity score of the closest word; wherein the first sentiment dictionary contains N words and the polarity score of each word, N being a positive integer; and
creating a second sentiment dictionary according to the vocabulary items in the corpus and the polarity scores of the vocabulary items in the corpus.
2. The method for creating a dictionary according to claim 1, characterized in that determining the word in the first sentiment dictionary that is closest to the vocabulary item not belonging to the first sentiment dictionary specifically comprises:
determining a first word vector for the vocabulary item not belonging to the first sentiment dictionary, and a second word vector for each word in the first sentiment dictionary;
determining the distance between each second word vector and the first word vector; and
taking the word corresponding to the second word vector nearest to the first word vector as the word closest to the vocabulary item not belonging to the first sentiment dictionary.
3. The method for creating a dictionary according to claim 2, characterized in that determining the distance between each second word vector and the first word vector specifically comprises:
for each second word vector, calculating the distance between that second word vector and the first word vector according to formula a, wherein formula a is:
where a_i denotes the second word vector, j denotes the first word vector, ||A||_F denotes the distance between the first word vector and the second word vector, and abs is the absolute-value function.
4. The method for creating a dictionary according to claim 2, characterized in that determining the first word vector for the vocabulary item not belonging to the first sentiment dictionary and the second word vector for each word in the first sentiment dictionary specifically comprises:
determining the first word vector and the second word vectors using the word-vector model Word2Vec.
5. the method for creation dictionary according to claim 1, which is characterized in that described according to the immediate word Polarity score, determine described in be not belonging to first sentiment dictionary vocabulary polarity score, specifically include:
Polarity point by the polarity score of the immediate word, as the vocabulary for being not belonging to first sentiment dictionary Number.
6. the method for creation dictionary according to claim 1, which is characterized in that the vocabulary obtained in corpus, tool Body includes:
Using participle tool, the vocabulary in each sentence of the corpus is extracted.
7. the method for creation dictionary according to any one of claim 1 to 6, which is characterized in that described according to institute's predicate Expect the polarity score of the vocabulary in library and the vocabulary in the corpus, create the second sentiment dictionary, specifically include:
According to the vocabulary for being not belonging to first sentiment dictionary and the vocabulary for being not belonging to first sentiment dictionary Polarity score, update first sentiment dictionary, obtain second sentiment dictionary.
8. A device for creating a dictionary, characterized by comprising: an obtaining module, a determining module, and a creating module;
the obtaining module is configured to obtain the vocabulary items in a corpus;
the determining module is configured to perform the following operations for each vocabulary item in the corpus that does not belong to a first sentiment dictionary: determining the word in the first sentiment dictionary that is closest to the vocabulary item not belonging to the first sentiment dictionary; and determining the polarity score of the vocabulary item not belonging to the first sentiment dictionary according to the polarity score of the closest word; wherein the first sentiment dictionary contains N words and the polarity score of each word, N being a positive integer;
the creating module is configured to create a second sentiment dictionary according to the vocabulary items in the corpus and the polarity scores of the vocabulary items in the corpus.
9. An electronic device, characterized by comprising: at least one processor; and
a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method for creating a dictionary according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method for creating a dictionary according to any one of claims 1 to 7.
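
The flow recited in claims 1, 2, 5 and 7 can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the claims: the distance is plain Euclidean distance (formula a is not reproduced in the text, so this is only a plausible stand-in), the word vectors are small hand-written tuples rather than Word2Vec output, and the names `euclidean`, `nearest_word` and `extend_lexicon`, along with all sample data, are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_word(target_vec, lexicon_vecs):
    """Return the lexicon word whose vector is nearest to target_vec."""
    return min(lexicon_vecs, key=lambda w: euclidean(lexicon_vecs[w], target_vec))


def extend_lexicon(first_lexicon, corpus_words, vectors):
    """Build a second lexicon: copy the first, then give every corpus word
    missing from it the polarity score of its nearest lexicon word."""
    lexicon_vecs = {w: vectors[w] for w in first_lexicon if w in vectors}
    second = dict(first_lexicon)
    for word in corpus_words:
        if word in first_lexicon or word not in vectors:
            continue  # already scored, or no vector available for this word
        closest = nearest_word(vectors[word], lexicon_vecs)
        second[word] = first_lexicon[closest]  # copy the score, as in claim 5
    return second


# Toy data (hypothetical): 2-D vectors standing in for learned embeddings.
first_lexicon = {"good": 1.0, "bad": -1.0}
vectors = {
    "good": (1.0, 0.0),
    "bad": (-1.0, 0.0),
    "great": (0.9, 0.1),
    "awful": (-0.8, 0.2),
}
second_lexicon = extend_lexicon(first_lexicon, ["great", "awful", "good"], vectors)
```

In this sketch the first dictionary is left untouched and the second is returned as a new mapping; updating the first dictionary in place, as the wording of claim 7 suggests, would be an equally reasonable reading. Corpus words without a vector are skipped, which is a choice of the sketch rather than something the claims specify.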

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811633396 2018-12-29
CN2018116333968 2018-12-29

Publications (1)

Publication Number Publication Date
CN109902300A true CN109902300A (en) 2019-06-18

Family

ID=66945281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910132050.8A Pending CN109902300A (en) 2018-12-29 2019-02-22 A kind of method, apparatus, electronic equipment and storage medium creating dictionary

Country Status (1)

Country Link
CN (1) CN109902300A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190618