CN104699766A - Implicit attribute mining method integrating word correlation and context deduction - Google Patents

Info

Publication number
CN104699766A
Authority
CN
China
Prior art keywords
word
attribute
notional
emotion
array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510082519.3A
Other languages
Chinese (zh)
Other versions
CN104699766B (en
Inventor
张宇
刘妙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201510082519.3A priority Critical patent/CN104699766B/en
Publication of CN104699766A publication Critical patent/CN104699766A/en
Application granted granted Critical
Publication of CN104699766B publication Critical patent/CN104699766B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Abstract

The invention discloses an implicit attribute mining method integrating word correlation and context deduction. The method comprises the following steps: establishing a corpus and, from the corpus, establishing a reference comment dataset, an attribute word dictionary, a sentiment word dictionary, a notional word dictionary, an attribute word-sentiment word modification matrix and an attribute word-notional word co-occurrence matrix for the current product category; then, using these resources together with the context of each clause, sequentially mining the clauses in a comment dataset to be analyzed that require implicit attribute mining, so as to obtain the implicit attribute mining results. The method combines two different word correlations, namely the attribute word-sentiment word modification relation and the attribute word-notional word co-occurrence relation, and additionally uses the context of the clause for deduction, so that the accuracy of implicit attribute mining is greatly improved.

Description

An implicit attribute mining method integrating word correlation and context deduction
Technical field
The present invention relates to the technical field of data mining, and in particular to an implicit attribute mining method integrating word correlation and context deduction.
Background
In the field of opinion mining, attribute word mining and emotion word mining are two basic subtasks. Attribute word mining makes it possible to classify and summarize user opinions and thereby provide better decision support for users. At present, attribute word mining techniques for product comments fall into two broad classes: explicit attribute mining and implicit attribute mining. Explicit attribute mining is relatively simple and has been studied extensively. Implicit attribute mining is considerably more complex, and related research is comparatively scarce.
For implicit attribute mining, Liu et al., in the document "Opinion observer: analyzing and comparing opinions on the Web", propose establishing a mapping between product attributes and attribute values by rule mining; for example, "heavy" is mapped to the attribute "weight" and "big" is mapped to the attribute "size", and implicit attributes are then mined through these mappings. However, establishing the mapping rules requires a certain amount of manual annotation, so the accuracy of implicit attribute mining is limited by the quality and quantity of the annotated rules. In addition, for a new domain the mapping rules must be annotated again by hand, which is time-consuming and makes accuracy hard to guarantee.
Su et al., in the document "Hidden sentiment association in Chinese Web opinion mining", propose an implicit attribute mining method based on the co-occurrence relation between attribute words and emotion words. A mutual reinforcement clustering algorithm is applied iteratively to the attribute words and emotion words to obtain attribute word clusters and emotion word clusters, so that the association between a single attribute word and a single emotion word is extended to an association between an attribute word cluster and an emotion word cluster. However, their method does not consider associations between attribute words and words other than emotion words.
Chou Guang et al., in the document "Implicit product attribute extraction based on regularized topic modeling", propose an implicit attribute mining method based on the idea of regularized topic modeling. The method mines implicit attributes from attribute-related words without requiring prior knowledge, but it does not consider the context of the comment clause.
Summary of the invention
In view of the deficiencies of the prior art, the present invention proposes an implicit attribute mining method integrating word correlation and context deduction.
An implicit attribute mining method integrating word correlation and context deduction comprises the following steps:
(1) building a corpus and, using the corpus, building a reference comment dataset, an attribute word dictionary, an emotion word dictionary, a notional word dictionary, an attribute word-emotion word modification matrix and an attribute word-notional word co-occurrence matrix for the current product category, specifically as follows:
(1-1) obtaining comment data for different product categories and preprocessing the obtained comment data;
The detailed process is as follows:
(1-11) normalization of the comment data: traditional Chinese characters in the comment data are converted to simplified characters, wrongly written characters are identified and corrected, and comment statements that contain garbled text or unrecognizable foreign words are deleted;
(1-12) spam comment filtering: regular expressions are used to filter out comment statements containing information such as QQ numbers, mobile phone numbers and website addresses;
(1-13) Chinese word segmentation and part-of-speech tagging are performed on the comment data, followed by stop word filtering; finally, comment statements that contain no punctuation at all and whose clauses are therefore excessively long are deleted.
(1-2) building the corpus from the preprocessed comment data;
The corpus built in the present invention is understood as the set of all preprocessed comment data.
(1-3) for the current product category, taking the comment data of that category in the corpus as the reference comment dataset of the category, and building the attribute word dictionary, emotion word dictionary and notional word dictionary of the category based on the reference comment dataset;
The present invention builds the attribute word dictionary, emotion word dictionary and notional word dictionary according to how each attribute word, emotion word and notional word occurs in the reference comment dataset, specifically as follows:
(a) The attribute word dictionary is built by the following operations:
Based on the reference comment dataset, an initial attribute word set F and an initial emotion word set O are built by bidirectional iteration.
For each attribute word in the initial attribute word set F, its TF-IDF weight is computed from its number of occurrences in the reference comment dataset using the following formula:
$$ w_i^T = tf_i \times idf_i = tf_i \times \log\left(\frac{N}{n_i}\right) $$
where $w_i^T$ is the TF-IDF weight of the i-th attribute word $f_i$ in the initial attribute word set F, $1 \le i \le n_f$, and $n_f$ is the number of attribute words in F; $tf_i$ is the normalized term frequency of $f_i$ in the reference comment dataset (the ratio of the number of occurrences of $f_i$ in the reference comment dataset to the number of occurrences of all notional words in the reference comment dataset); $idf_i = \log(N / n_i)$ is the inverse document frequency of $f_i$ in the corpus; $N$ is the total number of comments of all product categories in the corpus; and $n_i$ is the total number of comments in the corpus that contain $f_i$.
Attribute words whose TF-IDF weight is greater than a first threshold are selected to form the domain attribute word set; then, from the remaining attribute words of the initial attribute word set F, the 20 to 30 attribute words with the highest term frequency are selected manually to form the public attribute word set;
The domain attribute word set and the public attribute word set are merged (i.e. their union is taken) to form the attribute word dictionary.
Based on the TF-IDF weight of each attribute word in the initial attribute word set F, the present invention can select domain-specific attribute words with high discriminative power.
The value of the first threshold directly affects the construction of the domain attribute word set. Preferably, the first threshold is 0.01 to 0.02; more preferably, the first threshold is 0.015.
Optimally, 25 high-frequency attribute words are selected from the remaining attribute words of F to form the public attribute word set. In a specific implementation, the remaining attribute words of F are sorted by term frequency from high to low, and 25 attribute words that have a high term frequency and are common across domains are selected manually to form the public attribute word set.
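The TF-IDF screening described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the token-list input format and the use of the natural logarithm are assumptions (the text does not state the base of the logarithm), and the reference comments are assumed to contain notional words only after preprocessing.

```python
import math
from collections import Counter

def tfidf_weights(candidates, reference_comments, corpus_comments):
    """TF-IDF weight w_i^T = tf_i * log(N / n_i) for each candidate attribute word.

    reference_comments: tokenized comments of the current category
        (assumed to hold notional words only).
    corpus_comments: tokenized comments of all categories.
    """
    counts = Counter(tok for c in reference_comments for tok in c)
    total = sum(counts.values())      # occurrences of all notional words
    n_docs = len(corpus_comments)     # N
    weights = {}
    for w in candidates:
        n_i = sum(1 for c in corpus_comments if w in c)
        if n_i == 0 or total == 0:
            weights[w] = 0.0          # word never seen: no evidence
            continue
        tf = counts[w] / total        # normalized term frequency
        weights[w] = tf * math.log(n_docs / n_i)  # natural log (assumption)
    return weights

def split_by_threshold(weights, threshold=0.015):
    """Words above the first threshold form the domain set; the rest
    become candidates for manual selection of public attribute words."""
    domain = {w for w, v in weights.items() if v > threshold}
    return domain, set(weights) - domain
```

A word that occurs in every comment of the corpus gets weight 0 and therefore falls into the remainder, from which the public attribute words are then chosen by hand.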
(b) The emotion word dictionary is built by the following operation:
The "sentiment analysis word set" of HowNet and the "emotion vocabulary ontology" of Dalian University of Technology are each intersected with the initial emotion word set O, and the results form the emotion word dictionary.
(c) The notional word dictionary is built by the following operation:
The term frequencies of all notional words in the reference comment dataset are counted and sorted in descending order, and the notional words whose term frequency is greater than a second threshold are selected to form the notional word dictionary.
Preferably, the second threshold is 50.
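As a rough sketch of operation (c), the frequency screening can be written with a counter. The function name and the input format are hypothetical, and the comments are assumed to have been reduced to notional words by the part-of-speech filtering of the preprocessing step.

```python
from collections import Counter

def build_notional_dictionary(reference_comments, min_freq=50):
    """Return notional words whose frequency exceeds min_freq, most frequent first."""
    counts = Counter(tok for c in reference_comments for tok in c)
    # most_common() sorts by descending frequency, as the text requires
    return [w for w, n in counts.most_common() if n > min_freq]
```

The default of 50 is the preferred second threshold given in the text; a smaller value is convenient for small test datasets.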
(1-4) based on the reference comment dataset, building the attribute word-emotion word modification matrix and the attribute word-notional word co-occurrence matrix using the attribute word dictionary, emotion word dictionary and notional word dictionary;
Each value in the attribute word-emotion word modification matrix is the number of times an attribute word and an emotion word co-occur in the reference comment dataset, and each value in the attribute word-notional word co-occurrence matrix is the number of times an attribute word and a notional word co-occur in the reference comment dataset.
Building the attribute word-emotion word modification matrix and the attribute word-notional word co-occurrence matrix specifically comprises the following operations:
(1-41) traversing the reference comment dataset and, using the attribute word dictionary, emotion word dictionary and notional word dictionary, extracting attribute word-emotion word modification pairs and attribute word-notional word co-occurrence pairs from every clause in which an attribute word appears;
(1-42) building the attribute word-emotion word modification matrix from the extracted modification pairs, and building the attribute word-notional word co-occurrence matrix from the extracted co-occurrence pairs.
In the present invention, attribute word-emotion word modification pairs and attribute word-notional word co-occurrence pairs are extracted clause by clause: each clause in the reference comment dataset is processed in turn.
The implicit attribute mining method of the present invention builds a dedicated reference comment dataset, attribute word dictionary, emotion word dictionary, notional word dictionary, attribute word-emotion word modification matrix and attribute word-notional word co-occurrence matrix for each product category, which ensures the domain relevance of the attribute words and improves the accuracy of the implicit attribute mining results.
(2) processing each clause in the comment dataset to be analyzed in turn; when the current clause is processed, the attribute word dictionary is first used to judge whether the current clause requires implicit attribute mining; if not, the next clause is processed directly; otherwise, the following operations are performed:
(2-1) determining the candidate attribute word array $A_f$ of the current clause using the emotion word dictionary and the attribute word-emotion word modification matrix;
(2-2) analyzing the context of the current clause: if an explicit attribute word $f_i$ appears in the preceding clause or the following clause and $f_i \notin A_f$, then $f_i$ is added to the candidate attribute word array $A_f$ of the current clause and its context weight $w_{f_i}$ is set to 1; if $f_i \in A_f$, the context weight $w_{f_i}$ of $f_i$ is increased; here $1 \le i \le n_f$, where $n_f$ is the number of attribute words in $A_f$;
(2-3) generating the notional word array $A_t$ of the current clause using the emotion word dictionary and the notional word dictionary; for each attribute word in the candidate attribute word array $A_f$, computing the weighted association value between that attribute word and the notional words in $A_t$ from the co-occurrence counts of the attribute word with each notional word, the number of occurrences of each notional word of $A_t$ in the reference comment dataset, and the context weight of the attribute word; and choosing the candidate attribute word with the largest weighted association value as the implicit attribute mining result of the current clause.
The present invention judges whether the current clause requires implicit attribute mining as follows:
First, it is judged whether the clause is an opinion sentence; if it is not, implicit attribute mining is not required. If it is, a regular expression is used to judge whether the clause expresses an expectation, a wish or a hypothesis: if so, implicit attribute mining is not required; if not, implicit attribute mining is required.
In the present invention, the scope of each clause is determined by the pauses and punctuation of the comment text to be analyzed.
Step (2-1) comprises the following operations:
(2-11) using the emotion word dictionary, extracting all emotion words in the current clause to form the emotion word array $A_o$;
(2-12) computing the pointwise mutual information (PMI) between each emotion word in the emotion word array $A_o$ of the current clause and any attribute word $f_i$ that it modifies, using the following formula:
$$ PMI(f_i, o_j) = \log \frac{P(f_i, o_j)}{P(f_i)\,P(o_j)} $$
where $1 \le i \le n$ and $n$ is the number of attribute words in the attribute word dictionary; $o_j$ is an emotion word in the emotion word array $A_o$, $1 \le j \le n_o$, and $n_o$ is the number of emotion words in $A_o$; $P(f_i, o_j)$ is the number of times the attribute word $f_i$ and the emotion word $o_j$ co-occur in the reference comment dataset, and is read from the attribute word-emotion word modification matrix; $P(f_i)$ and $P(o_j)$ are the numbers of times $f_i$ and $o_j$, respectively, occur in the reference comment dataset;
(2-13) for each emotion word in $A_o$, choosing as candidate attribute words the 3 attribute words with the highest PMI values among the attribute words it modifies; then merging the candidate attribute words chosen for all emotion words in $A_o$ and removing duplicates to form the candidate attribute word array $A_f$ of the current clause, and initializing the context weight $w_{f_i}$ of each attribute word $f_i$ in $A_f$ to 1.
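Steps (2-11) to (2-13) can be sketched as follows. The sketch follows the text literally in using raw counts for $P(\cdot)$; the function names and the dictionary-based representation of the modification matrix (keys are `(attribute, emotion)` tuples) are assumptions made for illustration.

```python
import math

def pmi(pair_count, attr_count, emo_count):
    # As in the text, raw counts stand in for P(f, o), P(f) and P(o).
    return math.log(pair_count / (attr_count * emo_count))

def candidate_attributes(emotion_words, mod_matrix, attr_counts, emo_counts, top_k=3):
    """Return {attribute word: initial context weight 1.0} for the current clause.

    emotion_words: emotion words found in the clause (the array A_o).
    mod_matrix: {(attribute, emotion): co-occurrence count}.
    attr_counts / emo_counts: occurrence counts in the reference dataset.
    """
    candidates = {}
    for o in emotion_words:
        scored = []
        for f, n_f in attr_counts.items():
            n_fo = mod_matrix.get((f, o), 0)
            if n_fo > 0:  # only attribute words that o actually modifies
                scored.append((pmi(n_fo, n_f, emo_counts[o]), f))
        scored.sort(reverse=True)
        for _, f in scored[:top_k]:
            candidates.setdefault(f, 1.0)  # duplicates are merged
    return candidates
```

Storing the candidates in a dictionary keyed by attribute word removes duplicates and keeps the context weight next to each candidate for the later steps.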
In step (2-2) of the present invention, $f_i \in A_f$ means that the candidate attribute word array $A_f$ obtained from word association relations already contains the attribute word $f_i$ inferred from the context, so $f_i$ is more likely to be the implicit attribute word of the current clause, and its context weight is therefore increased. Preferably, in step (2-2), if $f_i \in A_f$, the context weight $w_{f_i}$ of $f_i$ is increased to 2 times its original value.
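A minimal sketch of the context weight update of step (2-2) is given below. The function and parameter names are hypothetical. The text does not say what happens when the same explicit attribute word appears in both the preceding and the following clause; this sketch assumes the weight is doubled only once.

```python
def apply_context(candidates, prev_attrs, next_attrs, boost=2.0):
    """Update context weights from the explicit attribute words of neighbouring clauses.

    candidates: {attribute word: context weight} from step (2-1).
    prev_attrs / next_attrs: explicit attribute words found in the
        preceding and the following clause.
    """
    weights = dict(candidates)  # leave the input untouched
    for f in set(prev_attrs) | set(next_attrs):
        if f in weights:
            weights[f] *= boost   # already a candidate: double its weight
        else:
            weights[f] = 1.0      # new candidate with weight 1
    return weights
```

For example, with candidates "battery" and "screen" and an explicit "battery" in the preceding clause, the weight of "battery" becomes 2 while "screen" stays at 1.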
Step (2-3) comprises the following operations:
(2-31) using the notional word dictionary, extracting all notional words in the current clause to form the notional word array $A_t$, and deleting the emotion words from $A_t$;
(2-32) computing the association value between each attribute word in the candidate attribute word array $A_f$ and all notional words in the notional word array $A_t$ using the following formula:
$$ T(f_i) = \frac{\sum_{k=1}^{v} P(f_i \mid t_k)}{v}, $$
where $T(f_i)$ is the association value between the attribute word $f_i$ and all notional words in the notional word array $A_t$; $t_k$ is a notional word in $A_t$; $1 \le i \le n_f$, where $n_f$ is the number of attribute words in the candidate attribute word array $A_f$; $1 \le k \le v$, where $v$ is the number of notional words in $A_t$; and $P(f_i \mid t_k)$ is the conditional probability that the attribute word $f_i$ co-occurs with the notional word $t_k$ of $A_t$ in the reference comment dataset, computed according to the following formula:
$$ P(f_i \mid t_k) = \frac{P(f_i, t_k)}{P(t_k)} = \frac{n_c / n_n}{n_{t_k} / n_n} = \frac{n_c}{n_{t_k}}, $$
where $n_c$ is the number of times the attribute word $f_i$ and the notional word $t_k$ co-occur in the reference comment dataset, read from the attribute word-notional word co-occurrence matrix; $n_{t_k}$ is the number of times the notional word $t_k$ occurs in the reference comment dataset; and $n_n$ is the number of times all notional words in the notional word dictionary occur in the reference comment dataset;
(2-33) for each candidate attribute word $f_i$ in the candidate attribute word array $A_f$, computing its weighted association value $T'(f_i)$ with all notional words in the notional word array $A_t$ using the following formula:
$$ T'(f_i) = w_{f_i} \times T(f_i) $$
where $w_{f_i}$ is the context weight of the candidate attribute word $f_i$, $1 \le i \le n_f$, and $n_f$ is the number of attribute words in $A_f$. According to the results, the candidate attribute word with the largest weighted association value is chosen as the implicit attribute mining result of the current clause.
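Steps (2-32) and (2-33) can be sketched as follows, using the simplified form $P(f_i \mid t_k) = n_c / n_{t_k}$ derived above. The function names and the dictionary-based co-occurrence matrix are assumptions made for illustration; the handling of an empty notional word array or an empty candidate array is not specified in the text and is chosen here only to keep the sketch safe to run.

```python
def association(f, notional_words, cooc, notional_counts):
    """T(f): mean of P(f | t_k) = n_c / n_{t_k} over the notional words of the clause."""
    if not notional_words:
        return 0.0  # assumption: no notional words means no evidence
    total = sum(cooc.get((f, t), 0) / notional_counts[t] for t in notional_words)
    return total / len(notional_words)

def pick_implicit_attribute(weights, notional_words, cooc, notional_counts):
    """Return (best attribute word, {attribute word: T'(f)}).

    weights: {attribute word: context weight w_f} after step (2-2).
    cooc: {(attribute, notional word): co-occurrence count}.
    notional_counts: occurrence count of each notional word.
    """
    scores = {f: w * association(f, notional_words, cooc, notional_counts)
              for f, w in weights.items()}
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores
```

With the counts used in the test below, "battery" scores 0.35 and "screen", despite a doubled context weight, scores 0.1, so "battery" is returned as the implicit attribute.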
Unless otherwise specified, in the present invention the term frequency of a word (a notional word, emotion word or attribute word) is the number of times the word occurs in the comment dataset of the current product category.
Unless otherwise specified, a comment statement in the present invention refers to one obtained comment, and comment data refers to a set of comment statements.
Compared with the prior art, the present invention has the following advantages:
(1) non-opinion clauses and clauses expressing an expectation, a wish or a hypothesis are identified first, and no implicit attribute inference is performed on them, which both reduces the workload and improves the accuracy of implicit attribute mining;
(2) multiple candidate attribute words are obtained from the modification relation between emotion words and attribute words, and implicit attribute mining is then performed according to the co-occurrence relation between candidate attribute words and notional words; by combining two different word association relations, the method can effectively improve the accuracy of implicit attribute mining;
(3) the context of the clause is taken into account, and adjusting the context weights of the candidate attribute words can further improve the accuracy of implicit attribute mining.
Brief description of the drawings
Fig. 1 is a flow chart of the implicit attribute mining method integrating word correlation and context deduction of the present embodiment;
Fig. 2 is a flow chart of preprocessing the comment data;
Fig. 3 is a flow chart of building the attribute word dictionary, emotion word dictionary and notional word dictionary;
Fig. 4 is a flow chart of computing the context weights of candidate attribute words;
Fig. 5 is a flow chart of computing the weighted association values of candidate attribute words.
Detailed description of the embodiments
The specific embodiments of the present invention are described in further detail below with reference to the drawings and specific examples.
The present embodiment is described using mobile phone product comments crawled from Taobao.
As shown in Fig. 1, the implicit attribute mining method integrating word correlation and context deduction of the present embodiment comprises the following steps:
(1) Comment data for different product categories, including clothing, jewelry, household appliances, mobile phones and digital products, are crawled from a website (Taobao in the present embodiment) and preprocessed to form the corpus S. The preprocessing flow for the comment data is shown in Fig. 2 and comprises the following steps:
(1-1) Normalization of the comment data: traditional Chinese characters in the comment data are converted to simplified characters, wrongly written characters are identified and corrected, and comment statements that contain garbled text or unrecognizable foreign words are deleted.
Each operation is illustrated below:
(a) Traditional-to-simplified conversion: in the clause "Dad really likes this phone", the character "huan" (in "xihuan", "to like") is written in its traditional form; after conversion, the output is the same clause written entirely in simplified characters.
(b) Identification and correction of wrongly written characters: in the clause "the phone's reflection is very slow", "reflection" should be "response"; after identification and correction, the output is "the phone's response is very slow".
(c) Identification and deletion of garbled statements: a comment statement made up of garbled characters, for example one produced by an encoding error, is deleted directly.
(1-2) Spam comment filtering: regular expressions are used to filter out comment statements containing information such as QQ numbers, mobile phone numbers and website addresses. The regular expression for mobile phone numbers is "(13|18|15|17)[0-9]{9}", which matches comment statements containing an 11-digit string beginning with 13, 18, 15 or 17. The regular expression for QQ numbers is ".*qq.*[1-9][0-9]{4,}|.*QQ.*[1-9][0-9]{4,}|.*扣扣.*[1-9][0-9]{4,}", where "[1-9][0-9]{4,}" matches 5 or more consecutive digits. If a keyword such as "QQ", "qq" or "扣扣" ("kou kou", a colloquial name for QQ) appears before the consecutive digits, the digits are judged to be a QQ number, and the comment statement is treated as a spam comment and deleted.
For example: "Got a rebate through [321fanli.cn] and found that this item returns a lot of money via [321fanli.cn]! Remember to enter [URL: 321fanli.cn] directly in the browser-----helping them promote and review also earns a reward, contact QQ:15325973793." A website address and a QQ number appear in this comment statement, so it is a spam comment; it is identified with the regular expressions above and deleted.
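The two regular expressions given above can be applied as in the following sketch. The function name is hypothetical, and the QQ pattern is written in an equivalent compact form for use with `search`. The pattern for website addresses is not given in the text, so it is omitted here rather than guessed.

```python
import re

# 11-digit string beginning with 13, 18, 15 or 17 (from the text)
PHONE_RE = re.compile(r"(13|18|15|17)[0-9]{9}")
# keyword "qq", "QQ" or "扣扣" followed later by 5 or more consecutive digits
QQ_RE = re.compile(r"(?:qq|QQ|扣扣).*[1-9][0-9]{4,}")

def is_spam(comment):
    """Return True if the comment contains a phone number or a QQ number."""
    return bool(PHONE_RE.search(comment) or QQ_RE.search(comment))

def filter_spam(comments):
    """Keep only the comments that are not flagged as spam."""
    return [c for c in comments if not is_spam(c)]
```

Note that the phone pattern flags any 11-digit run with one of the listed prefixes, so long order numbers may be flagged as well; the text does not address this case.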
(1-3) Chinese word segmentation and part-of-speech tagging are performed on the comment data, followed by stop word filtering; finally, comment statements that contain no punctuation at all and whose clauses are therefore excessively long are deleted.
For example: "mobile phone/n buys/v/u is good/d for a long time/a/u/d comes/v comment/v shyly/a/y mobile phone/n very/d is handy/a with/v/u is several/m days/q/father u/n very/d likes/a". This comment statement contains no punctuation at all and is excessively long, which easily leads to incorrect analysis results, so it is deleted.
(2) From the corpus S built in step (1), the reference comment dataset $S_{phone}$ of mobile phone products is used to build the attribute word dictionary Dic_F, the emotion word dictionary Dic_O and the notional word dictionary Dic_T of the mobile phone category; the concrete steps are shown in Fig. 3:
(2-1) The initial attribute word set F and the initial emotion word set O are built by bidirectional iteration:
First, 1 to 2 seed attribute words (2 in the present embodiment) are selected manually and added to the initial attribute word set F. For each attribute word $f_i$ in F, the comment statements in the reference comment dataset $S_{phone}$ are traversed to find, one by one, the emotion words $o_j$ that modify $f_i$. If $o_j \notin O$, then $o_j$ is added to the initial emotion word set O;
Conversely, for each emotion word $o_j$ in the initial emotion word set O, the comment statements in $S_{phone}$ are traversed to find, one by one, the attribute words $f_i$ that it modifies. If $f_i \notin F$, then $f_i$ is added to the initial attribute word set F. This is repeated until the number of words in F and in O no longer increases.
The corpus built in the present embodiment is the set of preprocessed comment data of all crawled product categories, and the mobile phone reference comment dataset is the set of all mobile phone product comment data in the corpus.
For example, with "mobile phone" and "service" chosen as seed words for the bidirectional iteration, the initial attribute word set F and the initial emotion word set O are finally obtained.
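The bidirectional iteration can be sketched as a fixed-point loop over modification pairs. How a modification relation is detected in a comment statement is not specified in the text, so this sketch assumes the (attribute word, emotion word) pairs have already been extracted; the function and variable names are hypothetical.

```python
def bidirectional_iteration(seed_attributes, modification_pairs):
    """Grow the attribute word set F and the emotion word set O from seed words.

    modification_pairs: iterable of (attribute word, emotion word) pairs
        found in the reference comment dataset (extraction assumed done).
    """
    pairs = list(modification_pairs)
    attrs, emos = set(seed_attributes), set()
    changed = True
    while changed:                      # stop when F and O no longer grow
        changed = False
        for f, o in pairs:
            if f in attrs and o not in emos:
                emos.add(o)             # emotion word modifying a known attribute
                changed = True
            if o in emos and f not in attrs:
                attrs.add(f)            # attribute modified by a known emotion word
                changed = True
    return attrs, emos
```

Starting from the seeds "phone" and "service", the pair ("phone", "good") adds "good" to O, ("screen", "good") then adds "screen" to F, and ("screen", "clear") adds "clear" to O on the next pass; a pair such as ("case", "ugly") that is never linked to the seeds is left out.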
(2-2) The following formula is used:
$$ w_i^T = tf_i \times idf_i = tf_i \times \log\left(\frac{N}{n_i}\right) $$
to compute the TF-IDF weight of each attribute word in the initial attribute word set F, where $w_i^T$ is the TF-IDF weight of the i-th attribute word $f_i$ in F, $1 \le i \le n_f$, and $n_f$ is the number of attribute words in F; $tf_i$ is the normalized term frequency of $f_i$ in the mobile phone reference comment dataset $S_{phone}$ (the ratio of the number of occurrences of $f_i$ in $S_{phone}$ to the number of occurrences of all notional words in $S_{phone}$); $idf_i = \log(N / n_i)$ is the inverse document frequency of $f_i$ in the corpus S; $N$ is the total number of comments of all product categories in the corpus S; and $n_i$ is the total number of comments in the corpus S that contain $f_i$.
Next, the attribute words are screened by threshold according to the computed TF-IDF weights: attribute words whose weight is greater than the first threshold of 0.015 are selected to form the domain attribute word set. Attribute words whose weight is less than or equal to the first threshold are added to the public attribute word candidate set, which is then screened manually to obtain the public attribute word set.
The manual screening is as follows: all attribute words remaining in the initial attribute word set F (i.e. the attribute words in the public attribute word candidate set) are sorted by term frequency (the number of times the attribute word occurs in the mobile phone reference comment dataset $S_{phone}$), in descending order in the present embodiment, and the attribute words that are common across domains are selected manually to form the public attribute word set.
Finally, the domain attribute word set and the public attribute word set are merged to form the attribute word dictionary Dic_F.
For example, the TF-IDF weights of words such as "mobile phone", "screen" and "button" are higher than the first threshold, so these words are selected and added to the domain attribute word set. The TF-IDF weights of words such as "item" and "logistics" are lower than the first threshold, so these words are added to the public attribute word set after manual screening. Finally, the domain attribute word set and the public attribute word set are merged to form the attribute word dictionary Dic_F.
(2-3) The initial emotion word set O is cross-screened against the HowNet "sentiment analysis word set" and the Dalian University of Technology "emotion vocabulary ontology library" to construct the emotion word dictionary Dic_O.
Emotion words that appear both in the initial emotion word set O and in the HowNet "sentiment analysis word set" are added to Dic_O. Likewise, emotion words that appear both in O and in the Dalian University of Technology "emotion vocabulary ontology library" are added to Dic_O. Duplicate emotion words are then deleted from Dic_O, which completes its construction.
(2-4) The word frequency of every notional word in the reference comment data set S_phone of mobile phone products (the number of times each notional word appears in S_phone) is counted and sorted in descending order; the notional words whose frequency is greater than the second threshold (50 in this embodiment) are selected to construct the notional word dictionary Dic_T.
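Steps (2-3) and (2-4) amount to set intersection and frequency counting. The sketch below illustrates them; the function names are hypothetical, and the two external lexicons are assumed to be already loaded as plain collections of words (loading them is outside the scope of this sketch).

```python
from collections import Counter


def build_emotion_dictionary(initial_emotion_words, hownet_words, dut_words):
    """Keep the initial emotion words found in at least one external lexicon.

    Using sets removes duplicates automatically, which corresponds to the
    final de-duplication of Dic_O.
    """
    initial = set(initial_emotion_words)
    return (initial & set(hownet_words)) | (initial & set(dut_words))


def build_notional_dictionary(reference_clauses, second_threshold=50):
    """Return notional words with frequency above the threshold, most frequent first."""
    counts = Counter(w for clause in reference_clauses for w in clause)
    return [w for w, c in counts.most_common() if c > second_threshold]
```

A list is returned for Dic_T so that each word has a stable position, which is convenient when the dictionary later indexes the columns of a matrix.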
(3) The reference comment data set S_phone of mobile phone products is used to build the attribute word-emotion word modification matrix M_FO and the attribute word-notional word co-occurrence matrix M_FT:
(3-1) S_phone is traversed and, using the dictionaries built in step (2) (the attribute word dictionary Dic_F, the emotion word dictionary Dic_O and the notional word dictionary Dic_T), attribute word-emotion word modification pairs and attribute word-notional word co-occurrence pairs are extracted.
In this embodiment, taking "battery/n charging/v /u time/n very/d not/d stable/a ,/w" as an example, the extraction results are as follows:
Attribute word-emotion word modification pair: "battery-stable";
Attribute word-notional word co-occurrence pairs: "battery-charging", "battery-time", "battery-stable".
(3-2) The attribute word-emotion word modification matrix M_FO is built from the extracted attribute word-emotion word modification pairs, and the attribute word-notional word co-occurrence matrix M_FT is built from the extracted attribute word-notional word co-occurrence pairs.
In this embodiment, for the modification pair "battery-stable" extracted above, the position i of "battery" in the attribute word dictionary Dic_F and the position j of "stable" in the emotion word dictionary Dic_O are looked up. Each time the pair "battery-stable" is extracted, the element in row i, column j of M_FO is incremented by 1. Likewise, each time an attribute word-notional word co-occurrence pair is extracted, the element at the corresponding position of M_FT is incremented by 1.
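The counting in step (3) can be sketched as follows. The function name `build_matrices` is hypothetical, and the pairing rule is a simplifying assumption: an attribute word is paired with every emotion word and every notional word in the same clause, whereas the rule the patent uses to decide that an emotion word modifies an attribute word is not stated in this passage.

```python
def build_matrices(clauses, dic_f, dic_o, dic_t):
    """Return (M_FO, M_FT) as nested lists of counts.

    dic_f, dic_o, dic_t are lists, so a word's list position gives its
    row (attribute word) or column (emotion / notional word).
    """
    f_index = {w: i for i, w in enumerate(dic_f)}
    o_index = {w: j for j, w in enumerate(dic_o)}
    t_index = {w: k for k, w in enumerate(dic_t)}
    m_fo = [[0] * len(dic_o) for _ in dic_f]
    m_ft = [[0] * len(dic_t) for _ in dic_f]
    for clause in clauses:
        for f in (w for w in clause if w in f_index):
            for w in clause:
                if w == f:
                    continue  # do not pair an attribute word with itself
                if w in o_index:  # assumed modification pair (same clause)
                    m_fo[f_index[f]][o_index[w]] += 1
                if w in t_index:  # co-occurrence pair
                    m_ft[f_index[f]][t_index[w]] += 1
    return m_fo, m_ft


# The clause from the embodiment, with part-of-speech tags stripped.
clause = ["battery", "charging", "time", "very", "not", "stable"]
m_fo, m_ft = build_matrices(
    [clause],
    dic_f=["battery", "screen"],
    dic_o=["stable", "durable"],
    dic_t=["battery", "charging", "time", "stable"],
)
```

On this single clause the sketch records the same pairs as the embodiment: "battery-stable" in M_FO, and "battery-charging", "battery-time" and "battery-stable" in M_FT.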
(4) A small amount of mobile phone product comment data is crawled again from Taobao (5000 comments in this embodiment; these comments are not contained in S_phone) and preprocessed according to the method in step (1) to build the comment data set D to be analyzed. The comment clauses in D are read one by one and analyzed according to the following steps until the last clause has been processed:
When processing the current comment clause, the clause is first read in and matched word by word against the attribute word dictionary Dic_F. If no explicit attribute word appears in the clause, the candidate attribute word array A_f is obtained according to the following steps;
An explicit attribute word is a product attribute word that appears explicitly in a comment clause. For example, in the clause "the price is too expensive", "price" appears explicitly and can be extracted directly using the attribute word dictionary Dic_F, so it is an explicit attribute word. In the second clause of the comment "The mobile phone is fine, just too expensive!", "expensive" modifies "price", but the attribute word "price" does not appear explicitly in the clause and can only be obtained through implicit attribute mining, so it is an implicit attribute word.
(4-1) First, it is determined whether the comment clause is an opinion sentence:
If it is not an opinion sentence, implicit attribute mining is not performed and the next clause is read in;
If it is an opinion sentence, regular expressions are used to make the following judgment:
If the comment clause expresses an expectation, a wish or a hypothetical, implicit attribute mining is likewise not performed for this clause and the next clause is read in;
Otherwise, implicit attribute mining is performed: all emotion words in the clause are extracted according to the emotion word dictionary Dic_O to form the emotion word array A_o of the clause.
Examples of each case are given below:
(a) Non-opinion sentence: "I/r this/r several/m days/q go on business/v /y ." The current clause contains no emotion word, so it is a non-opinion sentence and implicit attribute mining is not performed.
(b) Opinion sentence: "if/c again/d cheap/a 1 point/m just/d good/a /u ./w". The hypothetical pattern "if ... just ..." appears in the clause, so implicit attribute mining is not performed.
(c) For a clause that requires implicit attribute mining, all emotion words in it are extracted to form the emotion word array of the clause. For example, in "very/d not/d durable/a ./w" the emotion word "durable" appears but there is no explicit attribute word, so implicit attribute mining is required. "durable" is extracted from the clause to form its emotion word array A_o = {durable}.
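The decision logic of step (4-1) can be sketched as follows. The patent does not list its regular expressions, so the pattern below is purely hypothetical and is written against the English glosses used in the examples rather than the original Chinese text; the function name `emotion_words_for_mining` is also an assumption.

```python
import re

# Hypothetical pattern for expectation / wish / hypothetical clauses,
# e.g. "if ... just ..." or a verb such as "hope" or "wish".
NON_FACTUAL = re.compile(r"\bif\b.*\b(then|just)\b|\b(hope|wish|expect)\w*\b")


def emotion_words_for_mining(tokens, dic_f, dic_o):
    """Return the emotion word array A_o, or None if the clause is skipped."""
    if any(t in dic_f for t in tokens):
        return None  # an explicit attribute word is present
    emotion_words = [t for t in tokens if t in dic_o]
    if not emotion_words:
        return None  # no emotion word: not an opinion sentence
    if NON_FACTUAL.search(" ".join(tokens)):
        return None  # expectation, wish or hypothetical
    return emotion_words


dic_f = {"battery", "price", "screen"}
dic_o = {"cheap", "good", "durable"}
case_a = emotion_words_for_mining(["I", "this", "several", "days", "go on business"], dic_f, dic_o)
case_b = emotion_words_for_mining(["if", "again", "cheap", "1 point", "just", "good"], dic_f, dic_o)
case_c = emotion_words_for_mining(["very", "not", "durable"], dic_f, dic_o)
```

The three calls correspond to examples (a), (b) and (c): only the last one yields an emotion word array.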
(4-2) Using the attribute word-emotion word modification matrix M_FO built in step (3), the pointwise mutual information (PMI) value PMI(f_i, o_j) between each emotion word in the emotion word array A_o of the clause and each attribute word f_i that it modifies is calculated with the following formula:
$$\mathrm{PMI}(f_i, o_j) = \log \frac{P(f_i, o_j)}{P(f_i)\,P(o_j)}$$
where 1 ≤ i ≤ n, n is the number of attribute words in the attribute word dictionary, o_j is an emotion word in the emotion word array A_o, P(f_i, o_j) is the number of times attribute word f_i and emotion word o_j co-occur in the reference comment data set S_phone of mobile phone products (read from the attribute word-emotion word modification matrix M_FO), and P(f_i) and P(o_j) are the numbers of times (i.e. word frequencies) that f_i and o_j respectively appear in S_phone.
According to the calculated PMI values between each emotion word and the attribute words it modifies, for each emotion word in the emotion word array A_o of the clause, the 3 attribute words with the highest PMI values are added to the candidate attribute word array A_f of the clause. After all have been added, duplicate attribute words are deleted to obtain the candidate attribute word array A_f of the clause, and the context weight of each candidate attribute word f_i in A_f is initialized to 1.
For example, the PMI values between "durable" and all attribute words that have a modification relation with it are calculated, and the 3 attribute words with the highest PMI values are selected as the candidate attribute words of the clause:
PMI(battery) = log(918/6242) = -0.8325,
PMI(battery panel) = log(24/337) = -1.1474,
PMI(handset) = log(6/9616) = -3.2048.
The candidate attribute word array finally constructed is A_f = [battery, battery panel, handset].
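Step (4-2) can be sketched as follows. The function name `candidate_attribute_words` and all counts in the usage lines are illustrative assumptions. The base-10 logarithm is chosen because it reproduces the values quoted in the embodiment (for instance log10(918/6242) ≈ -0.8325). Following the definitions above, the counts are used directly as P(f_i, o_j), P(f_i) and P(o_j); normalizing them to probabilities would shift every score for one emotion word by the same constant and so would not change the ranking.

```python
import math


def candidate_attribute_words(emotion_words, dic_f, dic_o, m_fo, freq, top_k=3):
    """Return {candidate attribute word: initial context weight 1}.

    dic_f and dic_o are lists indexing the rows and columns of m_fo;
    freq maps a word to its word frequency in the reference set.
    """
    candidates = []
    for o in emotion_words:
        j = dic_o.index(o)
        scored = []
        for i, f in enumerate(dic_f):
            n_fo = m_fo[i][j]
            if n_fo == 0:
                continue  # no modification relation observed
            scored.append((math.log10(n_fo / (freq[f] * freq[o])), f))
        scored.sort(reverse=True)  # highest PMI first
        for _, f in scored[:top_k]:
            if f not in candidates:  # drop duplicates across emotion words
                candidates.append(f)
    return {f: 1 for f in candidates}


# Illustrative counts only.
dic_f = ["battery", "battery panel", "handset", "screen"]
dic_o = ["durable"]
m_fo = [[918], [24], [6], [0]]
freq = {"battery": 6242, "battery panel": 337, "handset": 9616,
        "screen": 5000, "durable": 1200}
a_f = candidate_attribute_words(["durable"], dic_f, dic_o, m_fo, freq)
```

With these counts the three candidates come out in the same order as in the embodiment, each with context weight 1.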
(5) The context weight of attribute word f_i is calculated as shown in Figure 4. First, the context clauses (i.e. the clause immediately before and the clause immediately after the current clause) are read in, and it is determined whether an explicit attribute word appears in them:
If an explicit attribute word f_i appears in a context clause and f_i ∉ A_f, f_i is extracted, added to the candidate attribute word array A_f, and assigned a context weight of 1. If f_i ∈ A_f, the context weight of f_i is doubled.
For example, in "battery/n charging/v /u time/n very/d not/d stable/a ,/w very/d not/d durable/a ./w", for the clause "very/d not/d durable/a ./w" the context yields the context attribute word "battery"; since "battery" ∈ A_f, the context weight of "battery" is doubled, i.e. w_battery = 2.
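The context weighting of step (5) can be sketched as follows. The function name `apply_context_weights` is hypothetical. The text does not say what happens when the same explicit attribute word appears in both the preceding and the following clause; this sketch assumes each distinct word is counted once.

```python
def apply_context_weights(candidate_weights, context_clauses, dic_f):
    """Update context weights from the preceding and following clauses.

    candidate_weights: {candidate attribute word: weight}, initially all 1.
    context_clauses: list of token lists (zero, one or two clauses).
    """
    weights = dict(candidate_weights)
    seen = set()
    for clause in context_clauses:
        for word in clause:
            if word not in dic_f or word in seen:
                continue
            seen.add(word)  # assumption: each explicit word counts once
            if word in weights:
                weights[word] *= 2  # already a candidate: double its weight
            else:
                weights[word] = 1   # new candidate from the context
    return weights


a_f = {"battery": 1, "battery panel": 1, "handset": 1}
previous_clause = ["battery", "charging", "time", "very", "not", "stable"]
weights = apply_context_weights(
    a_f, [previous_clause], {"battery", "battery panel", "handset", "screen"}
)
```

For the embodiment's example this gives "battery" a context weight of 2 and leaves the other candidates at 1.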
(6) The association value between each candidate attribute word in the candidate attribute word array A_f and the notional words appearing in the current clause is calculated as shown in Figure 5. The specific steps are as follows:
(6-1) All notional words in the current clause are extracted using the notional word dictionary Dic_T built in step (2), and all emotion words among them are deleted according to the emotion word dictionary Dic_O, forming the notional word array A_t.
For example, for "battery/n too/d not/d powerful/a ,/w once/m just/d do not have/v electricity/n /u very/d not/d durable/a ./w", all notional words in the second clause are extracted: "once", "do not have", "electricity", "durable". The emotion word "durable" is then deleted, forming the notional word array A_t = [once, do not have, electricity].
(6-2) For each attribute word f_i in the candidate attribute word array A_f, its association value T(f_i) with all notional words in the notional word array A_t is calculated with the following formula:
$$T(f_i) = \frac{\sum_{k=1}^{v} P(f_i \mid t_k)}{v}$$
where 1 ≤ i ≤ n_f, n_f is the number of candidate attribute words in the candidate attribute word array A_f, 1 ≤ k ≤ v, v is the number of notional words in the notional word array A_t, and P(f_i | t_k) is the conditional probability of attribute word f_i given notional word t_k in A_t, based on their co-occurrence in the reference comment data set S_phone of mobile phone products.
In this embodiment, P(f_i | t_k) is calculated according to the following formula:
$$P(f_i \mid t_k) = \frac{P(f_i, t_k)}{P(t_k)} = \frac{n_c / n_n}{n_{t_k} / n_n} = \frac{n_c}{n_{t_k}}$$
where n_c is the number of times attribute word f_i and notional word t_k co-occur (read from the attribute word-notional word co-occurrence matrix M_FT), n_{t_k} is the number of times (i.e. word frequency) notional word t_k appears in the reference comment data set S_phone, and n_n is the number of occurrences in S_phone of all notional words in the notional word dictionary Dic_T.
(6-3) For each candidate attribute word f_i in the candidate attribute word array A_f, its weighted association value T'(f_i) with all notional words in the notional word array A_t is calculated with the following formula:
$$T'(f_i) = w_{f_i} \times T(f_i)$$
where w_{f_i} is the context weight of candidate attribute word f_i, 1 ≤ i ≤ n_f, and n_f is the number of attribute words in the candidate attribute word array A_f. According to the calculation results, the candidate attribute word with the largest weighted association value is selected as the implicit attribute mining result and output.
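Steps (6-1) to (6-3) can be sketched together as follows. The function name `mine_implicit_attribute` and all counts in the usage lines are illustrative assumptions, and the handling of an empty notional word array (association value 0) is a choice made for this sketch, since the text does not cover that case.

```python
def mine_implicit_attribute(context_weights, clause_tokens,
                            dic_f, dic_t, dic_o, m_ft, t_freq):
    """Return (best candidate, {candidate: T'(f_i)}).

    dic_f and dic_t are lists indexing rows and columns of m_ft;
    t_freq maps a notional word to its frequency n_{t_k}.
    """
    # (6-1) notional words of the clause, with emotion words removed
    a_t = [w for w in clause_tokens if w in dic_t and w not in dic_o]
    scores = {}
    for f, w_f in context_weights.items():
        i = dic_f.index(f)
        if a_t:
            # (6-2) T(f_i): mean of P(f_i | t_k) = n_c / n_{t_k}
            t_value = sum(m_ft[i][dic_t.index(t)] / t_freq[t] for t in a_t) / len(a_t)
        else:
            t_value = 0.0  # assumption: no notional words, no association
        # (6-3) weighted association value T'(f_i) = w_{f_i} * T(f_i)
        scores[f] = w_f * t_value
    return max(scores, key=scores.get), scores


# Illustrative counts only.
dic_f = ["battery", "battery panel", "handset"]
dic_t = ["once", "do not have", "electricity", "durable"]
m_ft = [[10, 20, 30, 5], [1, 2, 3, 0], [5, 5, 5, 0]]
t_freq = {"once": 100, "do not have": 200, "electricity": 60, "durable": 50}
best, scores = mine_implicit_attribute(
    {"battery": 2, "battery panel": 1, "handset": 1},
    ["once", "just", "do not have", "electricity", "very", "not", "durable"],
    dic_f, dic_t, {"durable"}, m_ft, t_freq,
)
```

Because n_n cancels in the conditional probability, the sketch needs only the co-occurrence counts and the notional word frequencies.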
The above embodiment describes the technical solution and beneficial effects of the present invention in detail. It should be understood that the foregoing is only the most preferred embodiment of the present invention and is not intended to limit the present invention; any modification, supplement or equivalent replacement made within the spirit of the present invention shall fall within the scope of protection of the present invention.

Claims (7)

1. An implicit attribute mining method integrating word correlation and context deduction, characterized by comprising the following steps:
(1) a corpus is built, and the corpus is used to build the reference comment data set, attribute word dictionary, emotion word dictionary, notional word dictionary, attribute word-emotion word modification matrix and attribute word-notional word co-occurrence matrix of the current product category;
(2) each clause in the comment data set to be analyzed is processed in turn; when processing the current clause, the attribute word dictionary is first used to determine whether the current clause requires implicit attribute mining; if not, the next clause is processed directly; otherwise, the following operations are performed:
(2-1) the emotion word dictionary and the attribute word-emotion word modification matrix are used to determine the candidate attribute word array A_f of the current clause;
(2-2) the context of the current clause is analyzed: if an explicit attribute word f_i appears in the preceding clause or the following clause and f_i ∉ A_f, f_i is added to the candidate attribute word array A_f of the current clause and the context weight of f_i is set to 1; if f_i ∈ A_f, the context weight of f_i is increased, where 1 ≤ i ≤ n_f and n_f is the number of attribute words in the candidate attribute word array A_f;
(2-3) the emotion word dictionary and the notional word dictionary are used to build the notional word array A_t of the current clause; for each attribute word in the candidate attribute word array A_f of the current clause, the weighted association value between this attribute word and all notional words in A_t is calculated from the co-occurrence counts of the attribute word and the notional words, the occurrences of each notional word in A_t in the reference comment data set, and the context weight of the attribute word; and the candidate attribute word with the largest weighted association value is selected as the implicit attribute mining result of the current clause.
2. The implicit attribute mining method integrating word correlation and context deduction as claimed in claim 1, characterized in that step (1) comprises the following operations:
(1-1) comment data for different product categories are obtained and preprocessed;
(1-2) all preprocessed comment data are used to build the corpus;
(1-3) for the current product category, the comment data of that category in the corpus are taken as the reference comment data set of the current product category, and the attribute word dictionary, emotion word dictionary and notional word dictionary of the current product category are built from the reference comment data set;
(1-4) based on the reference comment data set, the attribute word dictionary, emotion word dictionary and notional word dictionary are used to build the attribute word-emotion word modification matrix and the attribute word-notional word co-occurrence matrix;
each value in the attribute word-emotion word modification matrix represents the number of times an attribute word and an emotion word co-occur in the reference comment data set, and each value in the attribute word-notional word co-occurrence matrix represents the number of times an attribute word and a notional word co-occur in the reference comment data set.
3. The implicit attribute mining method integrating word correlation and context deduction as claimed in claim 2, characterized in that step (1-1) preprocesses the comment data as follows:
(1-11) normalization of the comment data: traditional Chinese characters in the comment data are converted to simplified Chinese characters, wrongly written characters are identified and corrected, and comment statements that contain garbled text or unrecognizable foreign words are deleted;
(1-12) spam comment filtering: regular expressions are used to filter out comment statements containing QQ numbers, mobile phone numbers or website addresses;
(1-13) Chinese word segmentation and part-of-speech tagging are performed on the comment data, followed by stop-word filtering; finally, comment statements that have no punctuation throughout and whose clauses are too long are deleted.
4. The implicit attribute mining method integrating word correlation and context deduction as claimed in claim 2, characterized in that step (1-3) builds the attribute word dictionary, emotion word dictionary and notional word dictionary according to the occurrences of each notional word, attribute word and emotion word in the reference comment data set.
5. The implicit attribute mining method integrating word correlation and context deduction as claimed in claim 2, characterized in that step (1-4) comprises the following operations:
(1-41) the reference comment data set is traversed and, using the attribute word dictionary, emotion word dictionary and notional word dictionary, attribute word-emotion word modification pairs and attribute word-notional word co-occurrence pairs are extracted from the clauses in which an attribute word appears;
(1-42) the attribute word-emotion word modification matrix is built from the extracted attribute word-emotion word modification pairs, and the attribute word-notional word co-occurrence matrix is built from the extracted attribute word-notional word co-occurrence pairs.
6. The implicit attribute mining method integrating word correlation and context deduction as claimed in claim 1, characterized in that step (2-1) comprises the following operations:
(2-11) the emotion word dictionary is used to extract all emotion words in the current clause and form the emotion word array A_o;
(2-12) the pointwise mutual information value between each emotion word in the emotion word array A_o of the current clause and each attribute word f_i that it modifies is calculated with the following formula:
$$\mathrm{PMI}(f_i, o_j) = \log \frac{P(f_i, o_j)}{P(f_i)\,P(o_j)}$$
where 1 ≤ i ≤ n, n is the number of attribute words in the attribute word dictionary, o_j is an emotion word in the emotion word array A_o, 1 ≤ j ≤ n_o, n_o is the number of emotion words in A_o, P(f_i, o_j) is the number of times attribute word f_i and emotion word o_j co-occur in the reference comment data set and is read from the attribute word-emotion word modification matrix, and P(f_i) and P(o_j) are the numbers of times f_i and o_j respectively appear in the reference comment data set;
(2-13) according to the pointwise mutual information values between each emotion word in A_o and the attribute words it modifies, the 3 attribute words with the highest values are selected as candidate attribute words; the candidate attribute words selected for all emotion words in A_o are then merged, duplicate attribute words are deleted to construct the candidate attribute word array A_f of the current clause, and the context weight of each attribute word f_i in A_f is initialized to 1.
7. The implicit attribute mining method integrating word correlation and context deduction as claimed in any one of claims 1 to 6, characterized in that step (2-3) comprises the following operations:
(2-31) the notional word dictionary is used to extract all notional words in the current clause and form the notional word array A_t, and the emotion words in A_t are deleted;
(2-32) the association value between each attribute word f_i in the candidate attribute word array A_f and all notional words in the notional word array A_t is calculated with the following formula:
$$T(f_i) = \frac{\sum_{k=1}^{v} P(f_i \mid t_k)}{v},$$
where T(f_i) is the association value between attribute word f_i and all notional words in A_t, t_k is a notional word in A_t, 1 ≤ i ≤ n_f, n_f is the number of attribute words in the candidate attribute word array A_f, 1 ≤ k ≤ v, v is the number of notional words in A_t, and P(f_i | t_k) is the conditional probability of attribute word f_i given notional word t_k in A_t, based on their co-occurrence in the reference comment data set, calculated according to the following formula:
$$P(f_i \mid t_k) = \frac{P(f_i, t_k)}{P(t_k)} = \frac{n_c / n_n}{n_{t_k} / n_n} = \frac{n_c}{n_{t_k}},$$
where n_c is the number of times attribute word f_i and notional word t_k co-occur in the reference comment data set and is read from the attribute word-notional word co-occurrence matrix, n_{t_k} is the number of times notional word t_k appears in the reference comment data set, and n_n is the number of occurrences in the reference comment data set of all notional words in the notional word dictionary;
(2-33) for each candidate attribute word f_i in the candidate attribute word array A_f, its weighted association value T'(f_i) with all notional words in A_t is calculated with the following formula:
$$T'(f_i) = w_{f_i} \times T(f_i)$$
where w_{f_i} is the context weight of candidate attribute word f_i, 1 ≤ i ≤ n_f, and n_f is the number of attribute words in the candidate attribute word array A_f; according to the calculation results, the candidate attribute word with the largest weighted association value is selected as the implicit attribute mining result of the current clause.
CN201510082519.3A 2015-02-15 2015-02-15 Implicit attribute mining method integrating word correlation and context deduction Expired - Fee Related CN104699766B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510082519.3A CN104699766B (en) 2015-02-15 2015-02-15 Implicit attribute mining method integrating word correlation and context deduction

Publications (2)

Publication Number Publication Date
CN104699766A true CN104699766A (en) 2015-06-10
CN104699766B CN104699766B (en) 2018-01-02

Family

ID=53346887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510082519.3A Expired - Fee Related CN104699766B (en) 2015-02-15 2015-02-15 Implicit attribute mining method integrating word correlation and context deduction

Country Status (1)

Country Link
CN (1) CN104699766B (en)

Also Published As

Publication number Publication date
CN104699766B (en) 2018-01-02

CN103019924A (en) Input method intelligence evaluation system and input method intelligence evaluation method
Yan et al. Sentiment Analysis of Short Texts Based on Parallel DenseNet.
CN114491033A (en) Method for building user interest model based on word vector and topic model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180102

Termination date: 20190215
