CN105095288A - Data analysis method and data analysis device - Google Patents

Data analysis method and data analysis device


Publication number
CN105095288A
Authority
CN
China
Prior art keywords
label
library
subject
text
content
Legal status
Granted
Application number
CN201410204300.1A
Other languages
Chinese (zh)
Other versions
CN105095288B (en)
Inventor
温春龙
陈妍
梁璟彪
骆玘
黄利贤
樊中一
吕虹
刘敏
Current Assignee
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201410204300.1A
Publication of CN105095288A
Application granted
Publication of CN105095288B
Legal status: Active


Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a data analysis method and a data analysis device. The method comprises: establishing a product label library according to input text content; acquiring, according to the text content, a subject modified by a word-of-mouth word, where the word-of-mouth word is obtained by performing word segmentation on the text content and using a prestored dictionary to screen the words that reach a preset frequency after segmentation; matching the subject with the labels in the product label library; and generating, according to the label that matches the subject, a result label tree that reflects common problems in the text content. The method collects comment content comprehensively and in real time, simplifies existing data analysis approaches, and improves the accuracy of data analysis.

Description

Data analysis method and data analysis device
Technical field
The present invention relates to Internet technology, and in particular to a data analysis method and a data analysis device.
Background
At present, after some enterprises collect user feedback on a product, they classify it manually according to the text content, judging which specific aspect of the product a comment refers to (such as functions or bugs) and the sentiment polarity of the comment (positive or negative).
That is, the word of mouth of the product is judged manually, and the focal points of positive and negative reviews are summarized by hand. Staff read the comments one by one according to a manual rule, judge whether the sentiment expressed in each comment is positive, negative, or neutral, and at the same time judge which dimension of the product (for example, performance, function, or price) the evaluated object belongs to. The comments are then classified manually, and finally counted and sorted to determine which dimensions the product's positive and negative reviews mainly concentrate on.
However, when the data volume is large, excessive manual participation leads to duplicated labor and low efficiency, the classification and summarization lack systematicness and consistency, labor costs are high, and the results are not timely.
To address this, the prior art also includes Taobao-style attribute classification, in which preset attribute words and sentiment words are matched one by one and the results are summarized statistically.
However, Taobao attribute classification has the following defects: first, the classification of data is not comprehensive; second, it does not analyze the overall word-of-mouth situation, so only conclusions about a single aspect of the comments can be seen.
Therefore, a method that can perform data analysis comprehensively and in real time is needed.
Summary of the invention
To overcome the defects of the prior art, the present invention provides a data analysis method and a data analysis device for collecting comment content comprehensively and in real time, simplifying existing data analysis approaches, and improving the accuracy of data analysis.
In a first aspect, an embodiment of the present invention provides a data analysis method, comprising:
Establishing a product label library according to input text content;
Acquiring, according to the text content, a subject modified by a word-of-mouth word, where the word-of-mouth word is obtained by performing word segmentation on the text content and using a prestored dictionary to screen the words that reach a preset frequency after segmentation;
Matching the subject with the labels in the product label library;
Generating, according to the label that matches the subject, a result label tree that reflects common problems in the text content.
With reference to the first aspect, in a first possible implementation, establishing the product label library according to the input text content comprises:
Establishing a dynamic label library according to the input text content;
Establishing a special label library according to the product category corresponding to the text content;
Generating the product label library from the dynamic label library, the special label library, and a preset general label library.
With reference to the first possible implementation of the first aspect, in a second possible implementation, establishing the dynamic label library according to the input text content comprises:
Acquiring the nouns in the text content;
Judging whether the number of times a noun occurs is greater than a preset threshold;
If the number of times the noun occurs is greater than the preset threshold, determining whether the noun duplicates a label in the special label library or a label in the general label library;
If the noun duplicates no label in the special label library or the general label library, adding the noun to the dynamic label library as a label.
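The filter above can be sketched in a few lines of Python, with the final merge into the product label library included for completeness. This is a minimal sketch, assuming the nouns have already been extracted by segmentation and arrive as a plain list; the function name `build_dynamic_library`, the sample nouns, and the threshold value are hypothetical and not taken from the patent.

```python
from collections import Counter


def build_dynamic_library(nouns, special_labels, general_labels, threshold):
    """Hypothetical sketch: keep nouns that occur more than `threshold` times
    and do not duplicate any label in the special or general label library."""
    counts = Counter(nouns)
    existing = set(special_labels) | set(general_labels)
    return {
        noun
        for noun, freq in counts.items()
        if freq > threshold and noun not in existing
    }


special = {"interface"}
general = {"bug", "price"}
nouns = ["interface", "song recognition", "song recognition",
         "song recognition", "interface", "interface", "price"]

dynamic = build_dynamic_library(nouns, special, general, threshold=2)
# The product label library is the union of the three libraries.
product_library = dynamic | special | general
print(sorted(dynamic))          # ['song recognition']
print(sorted(product_library))  # ['bug', 'interface', 'price', 'song recognition']
```

Here "interface" passes the frequency check but is dropped because it already exists in the special label library, which keeps the three libraries free of duplicate labels.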
With reference to the first possible implementation of the first aspect, in a third possible implementation, establishing the special label library according to the product category corresponding to the text content comprises:
Acquiring, according to the product category corresponding to the text content, the custom labels that belong to the product;
Searching for synonyms and near-synonyms of the custom labels;
Generating the special label library of the text content from the custom labels and their synonyms and near-synonyms.
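A minimal sketch of the special label library, assuming a hand-maintained table of custom labels per product category and a hand-maintained synonym table (the patent does not say where synonyms come from). The table contents echo examples given later in the description; the names `CATEGORY_LABELS`, `SYNONYMS`, and `build_special_library` are hypothetical.

```python
# Hypothetical custom labels per product category (examples drawn from the text).
CATEGORY_LABELS = {
    "music": ["sound quality", "resources", "download speed"],
    "e-commerce": ["price", "logistics", "service attitude"],
    "video": ["interface"],
}

# Hypothetical synonym / near-synonym table; the patent does not name a source.
SYNONYMS = {
    "interface": ["page", "panel", "layout", "skin", "desktop"],
}


def build_special_library(category, category_labels, synonyms):
    """Map every custom label, synonym, and near-synonym to its canonical custom label."""
    library = {}
    for label in category_labels.get(category, []):
        library[label] = label
        for alias in synonyms.get(label, []):
            library[alias] = label
    return library


video_library = build_special_library("video", CATEGORY_LABELS, SYNONYMS)
print(video_library["skin"])  # interface
```

Mapping each alias back to its canonical custom label is one design choice among several; it lets a comment about "skin" be counted under "interface" later on.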
With reference to the second possible implementation of the first aspect, in a fourth possible implementation, acquiring the nouns in the text content comprises:
Performing word segmentation on the text content according to a custom dictionary to obtain the nouns of the text content.
With reference to the first aspect and the foregoing possible implementations, in a fifth possible implementation, acquiring, according to the text content, the subject modified by the word-of-mouth word comprises:
Acquiring, in the text content, the subject modified by the word-of-mouth word and/or an implicit subject;
Matching the subject with the labels in the product label library comprises:
Matching the subject and/or the implicit subject separately with the labels in the product label library.
With reference to the first aspect and the first to fourth possible implementations, in a sixth possible implementation, before the step of generating, according to the label that matches the subject, the result label tree that reflects common problems in the text content, the method further comprises:
Acquiring the expanded word-of-mouth words of the labels in the product label library;
Matching, in the text content, each expanded word-of-mouth word together with the label corresponding to it;
Generating, according to the label that matches the subject, the result label tree that reflects common problems in the text content comprises:
Generating the result label tree that reflects common problems in the text content according to the label that matches the subject and the matching results of the expanded word-of-mouth words and their corresponding labels.
With reference to the first aspect and the first to fourth possible implementations, in a seventh possible implementation, after the step of establishing the product label library according to the input text content, the method further comprises:
Establishing a multi-layer label tree according to the membership relationships between the labels in the product label library;
Matching the subject with the labels in the product label library comprises:
Matching the subject with the bottom-layer labels in the multi-layer label tree.
With reference to the seventh possible implementation of the first aspect, in an eighth possible implementation, generating, according to the label that matches the subject, the result label tree that reflects common problems in the text content comprises:
If the subject matches a bottom-layer label, recording the match at the position of that bottom-layer label;
Propagating the recorded results of the bottom-layer labels back up to the positions of their upper-layer labels in the multi-layer label tree, to obtain the result label tree that reflects common problems in the text.
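The record-and-propagate step can be sketched as a simple upward count, assuming that "recording" a match means adding one hit to the bottom-layer label and that the tree is stored as a child-to-parent map with no cycles. The label names loosely follow the Fig. 4A description; `PARENT` and `roll_up` are hypothetical names.

```python
from collections import Counter

# Hypothetical child -> parent membership, loosely modelled on the Fig. 4A description.
PARENT = {
    "speed": "laggy playback",
    "network speed": "laggy playback",
    "laggy playback": "performance",
    "crash": "performance",
}


def roll_up(matched_bottom_labels, parent):
    """Record one hit per matched bottom-layer label, then add the same hit to
    every ancestor. Assumes `parent` describes a tree (no cycles)."""
    counts = Counter()
    for label in matched_bottom_labels:
        node = label
        while node is not None:
            counts[node] += 1
            node = parent.get(node)  # None once we pass the top-level dimension
    return counts


result_tree = roll_up(["speed", "network speed", "crash"], PARENT)
print(result_tree["performance"])  # 3
```

Every upper-layer count is the sum of the hits beneath it, so sorting any layer of `result_tree` ranks the dimensions where complaints concentrate.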
With reference to the first possible implementation of the first aspect, in a ninth possible implementation, the method further comprises:
If the subject does not match any label, acquiring the similarity and importance of the subject according to semantic similarity and importance calculation rules;
If the similarity of the subject is greater than or equal to a first preset value and/or the importance of the subject is greater than or equal to a second preset value, adding the subject to the dynamic label library as a label.
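The patent does not define its similarity or importance rules, so the sketch below swaps in two simple stand-ins purely to show the threshold logic: `difflib.SequenceMatcher` gives a character-level ratio (not a semantic measure), and importance is taken as the subject's share of all extracted subjects. The threshold defaults and the `require_both` switch for the "and/or" wording are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(subject, labels):
    """Stand-in: best character-level ratio against existing labels.
    This is NOT semantic similarity; swap in a real measure in practice."""
    return max((SequenceMatcher(None, subject, label).ratio() for label in labels),
               default=0.0)


def importance(subject, subject_counts):
    """Stand-in: the subject's share of all extracted subjects."""
    total = sum(subject_counts.values())
    return subject_counts.get(subject, 0) / total if total else 0.0


def maybe_add(subject, labels, subject_counts, dynamic_library,
              first_preset=0.6, second_preset=0.2, require_both=False):
    """Add an unmatched subject to the dynamic label library if it clears the
    similarity threshold and/or the importance threshold."""
    sim_ok = similarity(subject, labels) >= first_preset
    imp_ok = importance(subject, subject_counts) >= second_preset
    accepted = (sim_ok and imp_ok) if require_both else (sim_ok or imp_ok)
    if accepted:
        dynamic_library.add(subject)
    return accepted


dynamic_library = set()
accepted = maybe_add("network speeds", {"interface", "network speed"},
                     {"network speeds": 1, "other": 9}, dynamic_library)
print(accepted, dynamic_library)  # True {'network speeds'}
```

With `require_both=False` the example is accepted on similarity alone, since its importance (0.1) is below the second threshold.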
In a second aspect, an embodiment of the present invention provides a data analysis device, comprising:
A product label library establishing unit, configured to establish a product label library according to input text content;
A subject acquiring unit, configured to acquire, according to the text content, a subject modified by a word-of-mouth word, where the word-of-mouth word is obtained by performing word segmentation on the text content and using a prestored dictionary to screen the words that reach a preset frequency after segmentation;
A matching unit, configured to match the subject acquired by the subject acquiring unit with the labels in the product label library established by the product label library establishing unit;
A result label tree generating unit, configured to generate, according to the label that the matching unit finds to match the subject, a result label tree that reflects common problems in the text content.
With reference to the second aspect, in a first possible implementation, the product label library establishing unit is configured to:
Establish a dynamic label library according to the input text content;
Establish a special label library according to the product category corresponding to the text content;
Generate the product label library from the dynamic label library, the special label library, and a preset general label library.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the product label library establishing unit is configured to:
Acquire the nouns in the text content;
Judge whether the number of times a noun occurs is greater than a preset threshold;
If the number of times the noun occurs is greater than the preset threshold, determine whether the noun duplicates a label in the special label library or a label in the general label library;
If the noun duplicates no label in the special label library or the general label library, add the noun to the dynamic label library as a label.
With reference to the first possible implementation of the second aspect, in a third possible implementation, the product label library establishing unit is configured to:
Acquire, according to the product category corresponding to the text content, the custom labels that belong to the product;
Search for synonyms and near-synonyms of the custom labels;
Generate the special label library of the text content from the custom labels and their synonyms and near-synonyms.
With reference to the second possible implementation of the second aspect, in a fourth possible implementation, the product label library establishing unit is configured to:
Perform word segmentation on the text content according to a custom dictionary to obtain the nouns of the text content.
With reference to the second aspect and the foregoing possible implementations, in a fifth possible implementation, the subject acquiring unit is configured to:
Acquire, in the text content, the subject modified by the word-of-mouth word and/or an implicit subject;
The matching unit is configured to:
Match the subject and/or the implicit subject acquired by the subject acquiring unit separately with the labels in the product label library established by the product label library establishing unit.
With reference to the second aspect and the first to fourth possible implementations, in a sixth possible implementation, the device further comprises:
An expanded word-of-mouth word acquiring unit, configured to acquire the expanded word-of-mouth words of the labels in the product label library established by the product label library establishing unit;
The matching unit is further configured to:
Match, in the text content, each expanded word-of-mouth word acquired by the expanded word-of-mouth word acquiring unit together with the label corresponding to it;
The result label tree generating unit is configured to:
Generate the result label tree that reflects common problems in the text content according to the label that matches the subject and the matching results of the expanded word-of-mouth words and their corresponding labels.
With reference to the second aspect and the first to fourth possible implementations, in a seventh possible implementation, the device further comprises:
A multi-layer label tree establishing unit, configured to establish a multi-layer label tree according to the membership relationships between the labels in the product label library established by the product label library establishing unit;
The matching unit is configured to:
Match the subject acquired by the subject acquiring unit with the bottom-layer labels in the multi-layer label tree established by the multi-layer label tree establishing unit.
With reference to the seventh possible implementation of the second aspect, in an eighth possible implementation, the result label tree generating unit is configured to:
If the subject matches a bottom-layer label, record the match at the position of that bottom-layer label;
Propagate the recorded results of the bottom-layer labels back up to the positions of their upper-layer labels in the multi-layer label tree, to obtain the result label tree that reflects common problems in the text.
With reference to the first possible implementation of the second aspect, in a ninth possible implementation, the device further comprises:
A subject similarity acquiring unit, configured to acquire, when the matching result of the matching unit shows that the subject does not match any label, the similarity and importance of the subject according to semantic similarity and importance calculation rules;
A subject processing unit, configured to add the subject to the dynamic label library as a label when the similarity of the subject acquired by the subject similarity acquiring unit is greater than or equal to a first preset value and/or the importance of the subject is greater than or equal to a second preset value.
As can be seen from the above technical solutions, the data analysis method and data analysis device of the embodiments of the present invention establish a comprehensive product label library, acquire the subject modified by a word-of-mouth word, match the subject with the labels in the product label library, and, when the subject matches a label, generate a result label tree that reflects common problems. In this way, comment content in the text can be collected comprehensively and in real time, existing data analysis approaches are simplified, and the accuracy of data analysis is improved.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of a data analysis method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a data analysis method according to another embodiment of the present invention;
Fig. 3 is a schematic diagram of generating a multi-layer label tree according to an embodiment of the present invention;
Fig. 4A is a schematic diagram of a multi-layer label tree according to an embodiment of the present invention;
Fig. 4B is a schematic diagram of a result label tree according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a data analysis device according to an embodiment of the present invention;
Fig. 6A and Fig. 6B are schematic structural diagrams of a data analysis device according to another embodiment of the present invention.
Detailed Description of the Embodiments
In the embodiments of the present invention, a label is the specific object a user comments on when commenting on a product. For example, in "The new-version interface of XX Video is garbage", the specific object of the comment is "interface", so the word "interface" forms a label.
The embodiments of the present invention provide a data analysis method and a data analysis device that automatically locate the dimensions on which a product's positive and negative word of mouth concentrate. They mainly address the following problems: judging the sentiment polarity (positive, negative, or neutral) of user comments; dynamically and automatically clustering common problems under each sentiment polarity; counting and sorting them; showing, by priority and at multiple levels, which aspects of the product the positive reviews, negative reviews, and discussion hotspots in user feedback concentrate on; and tracking how these change over time. For example, the data analysis method of the embodiments can be implemented as follows:
First, locate the main dimensions that affect word of mouth: by acquiring a large amount of user feedback (from microblogs, third-party app markets, and forums) together with positive and negative word-of-mouth words, automatically analyze, in real time and comprehensively, which dimensions the product's positive and negative word of mouth concentrate on; use in-depth semantic analysis to uncover the underlying causes that affect user acceptance; help the product team quickly locate the main negative-review points and problem points; and clarify where the product should improve.
Second, analyze how word of mouth changes in each product dimension: automatically analyze the word of mouth of each dimension of the product and its trend, such as a comparison of word of mouth before and after a new version is released, word of mouth for a new function, or changes in word of mouth on the interface dimension, and present abrupt changes visually. Employees responsible for different modules of a product team have different focuses; for example, developers may focus on performance, while designers may focus on the interface and style. Breaking word-of-mouth changes down by product dimension meets the needs of these different stakeholders.
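The patent does not define how per-dimension word of mouth is scored, so the sketch below assumes a simple hypothetical score (positive mentions minus negative mentions per dimension) and compares it across two releases. The names `net_word_of_mouth` and `compare_releases` and the sample records are illustrative only.

```python
from collections import Counter


def net_word_of_mouth(records):
    """records: (dimension, polarity) pairs with polarity +1 (positive) or -1 (negative).
    Returns dimension -> positive count minus negative count (a hypothetical score)."""
    score = Counter()
    for dimension, polarity in records:
        score[dimension] += polarity
    return score


def compare_releases(before, after):
    """Per-dimension change in net word of mouth between two releases."""
    b, a = net_word_of_mouth(before), net_word_of_mouth(after)
    return {dim: a[dim] - b[dim] for dim in sorted(set(b) | set(a))}


before = [("interface", -1), ("interface", -1), ("performance", +1)]
after = [("interface", +1), ("performance", +1), ("performance", -1)]
print(compare_releases(before, after))  # {'interface': 3, 'performance': -1}
```

A large positive or negative delta on one dimension is the kind of abrupt change the paragraph above proposes to surface for the team that owns that dimension.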
Third, classify feedback focal points: analyze the focal points of user comments and, by merging synonyms and near-synonyms, group the focal points of user feedback by module, making the classification results more accurate and practical.
Fig. 1 is a schematic flowchart of a data analysis method according to an embodiment of the present invention. As shown in Fig. 1, the data analysis method of this embodiment proceeds as follows.
101. Establish a product label library according to the input batch text content.
For example, the product label library in this embodiment may include a dynamic label library, a special label library, and a general label library.
The dynamic label library is established according to the input batch text content, and the special label library is established according to the product category corresponding to the batch text content.
The general label library may be built in advance by manual classification.
102. Acquire, according to the batch text content, the subject modified by a word-of-mouth word.
For example, the word-of-mouth words are obtained by performing word segmentation on the text content and using a prestored dictionary to screen the words that reach a preset frequency after segmentation.
It can be understood that raw information related to the product (including the product name, the series name, or the names of some important functional modules), which corresponds to the batch text content above, may be captured through an application programming interface (API) or a web crawler from microblogs and/or forums where user comments gather. After the raw information is cleaned, an existing Chinese lexical analysis system can be used to analyze word-of-mouth tendency and count the positive or negative word-of-mouth words.
For example, cleaning the raw information can be understood as removing duplicate and invalid information from it, that is, filtering the raw data. Word segmentation can then be performed on the filtered information with a Chinese lexical analysis system, after which the prestored dictionary is used to analyze the word-of-mouth tendency of the segmented words and screen them.
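The clean, segment, and screen sequence can be sketched as follows. This is a minimal sketch under stated assumptions: "invalid" posts are taken to be empty or URL-only (a hypothetical rule, since the patent does not define one), whitespace splitting stands in for a Chinese segmenter such as ICTCLAS, and the prestored dictionary is a plain word-to-polarity mapping. All names and sample data are illustrative.

```python
import re
from collections import Counter


def clean(raw_posts):
    """Remove duplicate and 'invalid' posts. Treating empty and URL-only posts
    as invalid is a hypothetical rule; the patent does not define one."""
    seen, cleaned = set(), []
    for post in raw_posts:
        text = post.strip()
        if not text or re.fullmatch(r"https?://\S+", text) or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


def screen_word_of_mouth(posts, prestored_dictionary, preset_frequency):
    """Segment, then keep dictionary words that reach the preset frequency.
    Whitespace splitting stands in for a Chinese segmenter such as ICTCLAS."""
    counts = Counter(tok for post in posts for tok in post.lower().split())
    return {word: prestored_dictionary[word]
            for word, n in counts.items()
            if word in prestored_dictionary and n >= preset_frequency}


raw = ["interface is garbage", "interface is garbage", "http://example.com/x",
       "", "app is garbage", "sound is great"]
dictionary = {"garbage": "negative", "great": "positive"}
posts = clean(raw)
print(screen_word_of_mouth(posts, dictionary, preset_frequency=2))  # {'garbage': 'negative'}
```

Deduplicating before counting matters: without it, the repeated post would inflate the frequency of "garbage" and could push noise over the preset threshold.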
103. Match the subject with the labels in the product label library.
104. Generate, according to the label that matches the subject, a result label tree that reflects common problems in the batch text content, as shown in Fig. 4B.
In the data analysis method of this embodiment, a comprehensive product label library is established, the subject modified by a word-of-mouth word is acquired and matched with the labels in the product label library, and, when the subject matches a label, a result label tree reflecting common problems is generated. In this way, comment content in the batch text content can be collected comprehensively and in real time, existing data analysis approaches are simplified, and the accuracy of data analysis is improved.
Fig. 2 is a schematic flowchart of a data analysis method according to another embodiment of the present invention, and Fig. 3 is a schematic diagram of generating a multi-layer label tree according to an embodiment of the present invention. As shown in Fig. 2 and Fig. 3, the data analysis method of this embodiment proceeds as follows.
201. Establish a dynamic label library according to the input batch text content.
The focus of user evaluations may shift over time, or new labels may appear, so a dynamic label library is needed to meet real-time and accuracy requirements. For example, XX Music newly launched a "song recognition" function (identifying a song by listening to it), which quickly became a focus of user attention, but "song recognition" was not included in the existing label library. A mechanism for screening and adding new labels is therefore needed to ensure the completeness and timeliness of the label library. Dynamic labels can be regarded as the hot words or new words that appear in a given period.
When establishing the label library, algorithms such as semantic-similarity or near-synonym discovery can be used to make the labels more comprehensive. For example, the label "interface" has similar expressions (similar words or near-synonyms) in user evaluations, such as page, panel, appearance, form, layout, skin, and desktop, and these need to be grouped as synonyms.
For example, step 201 may further include the following sub-steps (not shown in the figures):
A2011. Acquire the nouns in the batch text content.
For example, based on the generic nouns of the product and/or competing products, word segmentation is performed on the batch text with a Chinese lexical analysis system to obtain the nouns and/or word-of-mouth words of the batch text content.
For example, word segmentation can be performed on the batch text content by calling the segmentation interface provided by the ICTCLAS system, which applies its segmentation algorithm.
The ICTCLAS system needs to call a custom dictionary. The custom dictionary contains specific words and their part-of-speech tags and acts as a submodule of the word segmentation system; only on the basis of the custom dictionary can the segmentation algorithm split a sentence into the intended words. The comprehensiveness of the custom dictionary affects the accuracy of word segmentation, so the custom dictionary should be updatable, able to accumulate new entries, and suited to the language of microblogs and forums.
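To show why the custom dictionary matters, the sketch below uses forward maximum matching, a simple dictionary-based segmentation technique swapped in here only because it is short; it is not ICTCLAS's own algorithm and does not use the ICTCLAS API. The sample sentence and both dictionaries are hypothetical, built around the "song recognition" (听歌识曲) example from step 201.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary
    word (up to max_len characters); fall back to a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


base_dictionary = {"新版", "听歌", "很", "好用"}
custom_dictionary = base_dictionary | {"听歌识曲"}  # new function name added by hand

sentence = "新版听歌识曲很好用"  # "the new version's song recognition works well"
print(fmm_segment(sentence, base_dictionary))    # ['新版', '听歌', '识', '曲', '很', '好用']
print(fmm_segment(sentence, custom_dictionary))  # ['新版', '听歌识曲', '很', '好用']
```

Without the custom entry, the new function name is broken into fragments and can never become a noun candidate for the dynamic label library; adding it keeps the term intact.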
A2012. Judge whether the number of times a noun occurs is greater than a preset threshold.
A2013. If the number of times the noun occurs is greater than the preset threshold, determine whether the noun duplicates a label in the special label library or a label in the general label library.
If the number of times the noun occurs is less than or equal to the preset threshold, the noun can be ignored or discarded.
A2014. If the noun duplicates no label in the special label library or the general label library, add the noun to the dynamic label library as a label.
If the noun duplicates a label in the special label library, or a label in the general label library, the noun is discarded. As a result, no label in the dynamic label library duplicates a label in the special label library or the general label library.
In this embodiment, the dynamic label library may be a label library composed of nouns.
202. Establish a special label library according to the product category corresponding to the batch text content.
For example, if a piece of text content is "The XX Video interface is garbage", the corresponding product category may be the interface category in computing. In this case, the special label library may be the label library for the interface category, which may include labels such as interface, appearance, layout, skin, and desktop.
Special labels can be regarded as commonly used nouns in the field to which a piece of text content belongs, and each label in the special label library belongs to a specific field.
For example, step 202 may further include the following sub-steps (not shown in the figures):
A2021. Acquire, according to the product category corresponding to the batch text content, the custom labels that belong to the product;
A2022. Search for synonyms and near-synonyms of the custom labels;
For example, synonyms and near-synonyms of the custom labels can be found according to lexical similarity rules.
A2023. Generate the special label library of the batch text content from the custom labels and their synonyms and near-synonyms.
That is, because product categories differ, a custom label dictionary must be established for each product to ensure the accuracy of semantic analysis. For example, users of music products care about "sound quality, resources, download speed", whereas users of e-commerce products care about "price, logistics, service attitude", so a special label library must be established for each product.
203. Generate the product label library from the dynamic label library, the special label library, and the preset general label library.
In this embodiment, the general label library, also called the common label library, captures what all products have in common, and a general label dictionary is established to save time and labor. For example, in user feedback on all products, the commented objects involve labels such as "bug, network speed, interface, performance, charges"; these labels are shared across products and can be added to the general label library.
Because the general label library and the special label library provide basic coverage but cannot hit every label, this embodiment also provides a dynamic label library. The product label library formed through steps 201 to 203 is therefore both timely and comprehensive.
204. Establish a multi-layer label tree according to the membership relationships between the labels in the product label library.
Once the product label library is complete, the hierarchical or membership relationships between its labels need to be established, that is, a multi-layer label tree is built. A user's comment labels for a product initially fall into different dimensions, such as "overall, function, design, performance, content resources, activities and advertising". Each of these major dimensions can be divided into finer secondary dimensions; for example, "performance" includes "force-closing, freezing, black screen, laggy playback, upgrade and installation problems". When users describe the secondary dimension "laggy playback", they again use different expressions (synonyms or similar words); for example, labels such as "speed, network speed, networking, network loading" can all describe "playback speed". These labels sit at the bottom of the label library and are exactly the labels that appear in users' own wording, as shown in Fig. 4A.
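A minimal sketch of building the multi-layer label tree from (parent, child) membership pairs, identifying the bottom-layer labels that subjects are matched against, and recovering a bottom label's path to the top. The membership pairs loosely follow the Fig. 4A description; the root label "product" and all function names are hypothetical.

```python
def build_tree(memberships):
    """memberships: (parent, child) pairs. Returns parent->children and child->parent maps."""
    children, parent = {}, {}
    for upper, lower in memberships:
        children.setdefault(upper, []).append(lower)
        parent[lower] = upper
    return children, parent


def bottom_labels(children):
    """Labels that never appear as a parent are the bottom-layer labels."""
    uppers = set(children)
    return {c for kids in children.values() for c in kids if c not in uppers}


def path_to_root(label, parent):
    """Ancestors of a label from the top layer down to the label itself."""
    path = [label]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


# Hypothetical membership pairs modelled on the Fig. 4A description; "product" is an invented root.
MEMBERSHIPS = [
    ("product", "performance"), ("product", "design"),
    ("performance", "crash"), ("performance", "playback speed"),
    ("playback speed", "speed"), ("playback speed", "network speed"),
]
children, parent = build_tree(MEMBERSHIPS)
print(sorted(bottom_labels(children)))        # ['crash', 'design', 'network speed', 'speed']
print(path_to_root("network speed", parent))  # ['product', 'performance', 'playback speed', 'network speed']
```

Keeping both maps is a small convenience: `children` answers "what sits under this dimension", while `parent` makes upward propagation of matches cheap.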
205. Obtain, from the batch text content, the subjects modified by word-of-mouth words and/or implicit subjects.
For example, the subjects modified by word-of-mouth words and/or implicit subjects can be obtained from the batch text content using preset syntax rules.
It will be understood that the implicit subjects here may be obtained using all of the word-of-mouth words as the reference, or only some of them.
For example, analyzing the subjects modified by ordinary negative word-of-mouth words:
Extract the negative word-of-mouth words that appear in negative microblog posts, analyze the subject each one modifies, and extract that subject. For example, in the negative comment "The new-version interface of XX video is rubbish", "rubbish" is a negative word-of-mouth word and the subject it modifies is "interface", so the pair (interface, rubbish) is extracted. Depending on the levels of the subject, it may also be extracted as (XX video - new version - interface, rubbish).
Analyzing negative word-of-mouth words with implicit subjects:
Some negative reviews contain a negative word-of-mouth word but no obvious subject. For example, in "xx video looks beautiful, but it is just too laggy", "laggy" is identified as a negative word-of-mouth word, yet the subject it modifies is hidden: what the user means is that the speed of xx video is too laggy. For such negative word-of-mouth words that carry an implied referent, if no obvious subject can be found in the comment, the system automatically calls the corresponding subject library to supply one, and extracts (xx video - speed, laggy).
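The two cases above (explicit subject, then implicit-subject fallback) can be sketched on an already tokenised comment. This is a simplified stand-in for the preset syntax rules, which the patent does not spell out: here the subject is assumed to be the nearest known label before the word-of-mouth word, and the implicit-subject library is assumed to be a plain word-to-subject mapping. The function name is hypothetical.

```python
def extract_subject_pairs(tokens, word_of_mouth_words, labels, implicit_subjects):
    """Return (subject, word) pairs for one tokenised comment.

    Hypothetical rule standing in for the preset syntax rules: the subject of a
    word-of-mouth word is the nearest known label that precedes it. If there is
    none, the implicit-subject library supplies the subject instead.
    """
    pairs = []
    for i, token in enumerate(tokens):
        if token not in word_of_mouth_words:
            continue
        subject = next((t for t in reversed(tokens[:i]) if t in labels), None)
        if subject is None:
            subject = implicit_subjects.get(token)  # implicit subject fallback
        if subject is not None:
            pairs.append((subject, token))
    return pairs


labels = {"interface", "speed"}
negative_words = {"rubbish", "laggy"}
implicit = {"laggy": "speed"}

print(extract_subject_pairs(
    ["xx video", "new version", "interface", "very", "rubbish"],
    negative_words, labels, implicit))
# [('interface', 'rubbish')]
print(extract_subject_pairs(
    ["xx video", "looks", "beautiful", "but", "too", "laggy"],
    negative_words, labels, implicit))
# [('speed', 'laggy')]
```

The nearest-preceding-label rule is deliberately crude; a comment such as "interface is nice but laggy" would wrongly attach "laggy" to "interface", which is exactly the kind of case real syntax rules are meant to handle.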
206. Obtain the extended word-of-mouth words of the labels in the product label library.
For example, the extended word-of-mouth words of a label can be obtained using word co-occurrence rules and/or manual classification. Note that an extended word-of-mouth word is not a true sentiment word on its own; it only carries meaning when paired with a specific label. For example, in "the logistics is very fast", "fast" is an extended word-of-mouth word: it has actual sentiment meaning only when paired with "logistics".
The word co-occurrence rule refers to an algorithm that calculates the probability that two words or characters occur together.
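One simple way to realise such a co-occurrence rule is sketched below, assuming the probability is estimated at comment level (the share of comments containing both words). The patent does not fix the estimator; a sentence or token window, or pointwise mutual information, would be equally valid choices. Candidates found this way would still go through the manual classification mentioned above. Function names are hypothetical.

```python
def cooccurrence_probability(docs, a, b):
    """Estimate P(a and b occur together) as the share of documents containing both.

    docs is a list of token sets, one per comment.
    """
    if not docs:
        return 0.0
    both = sum(1 for doc in docs if a in doc and b in doc)
    return both / len(docs)


def candidate_extended_words(docs, label, vocabulary, min_prob):
    """Words whose co-occurrence probability with `label` reaches min_prob."""
    return sorted(
        w for w in vocabulary
        if w != label and cooccurrence_probability(docs, label, w) >= min_prob
    )


docs = [
    {"logistics", "very", "fast"},
    {"logistics", "fast"},
    {"screen", "big"},
    {"logistics", "slow", "cheap"},
]
print(cooccurrence_probability(docs, "logistics", "fast"))  # 0.5
print(candidate_extended_words(docs, "logistics", {"fast", "slow", "cheap", "big"}, 0.5))
# ['fast']
```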
For example, analyzing the subjects modified by negative extended word-of-mouth words:
Some user comments contain no obvious negative word-of-mouth word yet still express negative sentiment. For example, in "xx video's speed is very slow, and it also uses up data very fast", neither "slow" nor "fast" is a negative word-of-mouth word (treating them as such would cause a large number of misjudgments), but when each is paired with a specific subject (label) it expresses negative sentiment. A negative extended word-of-mouth dictionary and corresponding syntax rules are therefore needed to handle this case, yielding (speed, slow) and (data usage, fast).
207. Match the subjects and/or implicit subjects against the bottom labels of the multi-layer label tree, and match each extended word-of-mouth word together with its corresponding label in the batch text content.
In this embodiment, extended word-of-mouth words and (ordinary) word-of-mouth words can sit at the same level; the extended words only become meaningful when paired with their specific labels (for example, the bottom labels of the multi-layer label tree).
Extended word-of-mouth words are obtained for labels in the corresponding product label library; that is, the subject an extended word-of-mouth word modifies is fixed, and is in fact the label itself. The "label + extended word-of-mouth word" pairs are therefore matched in the batch text and the number of matches is counted.
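Because the subject of an extended word-of-mouth word is fixed, matching reduces to searching each comment for the "label + extended word" pair. The sketch below assumes a hypothetical proximity rule (the extended word must follow the label within a few tokens); the patent does not state how close the two must be. It also shows that "fast" is counted only when paired with "data usage", not with "speed".

```python
from collections import Counter


def count_extended_matches(comments, extended_pairs, window=3):
    """Count the comments that match each (label, extended word) pair.

    Hypothetical matching rule: the pair matches when the extended word occurs
    within `window` tokens after the label in the same tokenised comment.
    Each comment counts at most once per pair.
    """
    counts = Counter()
    for tokens in comments:
        for label, word in extended_pairs:
            positions = [i for i, t in enumerate(tokens) if t == label]
            if any(word in tokens[i + 1:i + 1 + window] for i in positions):
                counts[(label, word)] += 1
    return counts


comments = [
    ["speed", "very", "slow"],
    ["data usage", "also", "very", "fast"],
    ["speed", "fast"],  # "fast" is not an extended word for "speed": not counted
]
pairs = [("speed", "slow"), ("data usage", "fast")]
counts = count_extended_matches(comments, pairs)
print(counts[("speed", "slow")], counts[("data usage", "fast")])  # 1 1
```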
208. Generate the result label tree reflecting the common problems in the batch text content, based on the labels matched with the subjects (used as bottom labels) and on the matching results of the extended word-of-mouth words with their corresponding labels.
For example, if a subject matches a bottom label, a record is made at the position of that bottom label; and
when an extended word-of-mouth word and its corresponding label are matched in the batch text content, a record is made at the position of the label corresponding to that extended word-of-mouth word;
the records of the bottom labels can then be pulled in reverse to the positions of their upper labels in the multi-layer label tree, yielding a result label tree that contains the matching results.
In other words, in this embodiment the word-of-mouth words are first found in the batch text, followed by the subject each one modifies. If a subject is found, it is matched against the bottom labels of the multi-layer label tree, and on success that bottom label is incremented by 1. If no subject is found, the implicit subject of the word-of-mouth word is looked up and matched against the bottom labels, again incrementing the matched bottom label by 1 on success. At the same time, every "extended word-of-mouth word + label" pair is matched by traversing the batch text content, and each successful match increments the label corresponding to that extended word by 1.
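The "+1" bookkeeping described above can be sketched as a counter over bottom labels. This assumes the subjects have already been extracted (with implicit subjects substituted where no explicit subject was found) and that extended-pair matches arrive as one label per hit; the function name and input shapes are hypothetical. Unmatched subjects are returned so they can be considered for the dynamic label library.

```python
from collections import Counter


def count_bottom_labels(subjects, bottom_labels, extended_label_hits=()):
    """Tally matches at bottom labels.

    subjects: subjects found for word-of-mouth words, with implicit subjects
    already substituted where no explicit subject was found.
    extended_label_hits: one label per successful "extended word + label" match.
    Returns the per-label counts and the subjects that matched no bottom label.
    """
    counts = Counter()
    unmatched = []
    for subject in subjects:
        if subject in bottom_labels:
            counts[subject] += 1  # successful match: bottom label + 1
        else:
            unmatched.append(subject)
    for label in extended_label_hits:
        counts[label] += 1  # extended word matched with its label: label + 1
    return counts, unmatched


counts, unmatched = count_bottom_labels(
    subjects=["speed", "speed", "loading", "price"],
    bottom_labels={"speed", "network speed", "loading"},
    extended_label_hits=["speed"],
)
print(counts["speed"], counts["loading"], unmatched)  # 3 1 ['price']
```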
Matching in a multi-layer label tree usually accumulates counts upward from the bottom of the tree, but users browse the result label tree from the top down, for example "xx video" → "performance" → "playback speed lag", as shown in Fig. 4B. When a user then wants to view the specific text messages under that module, the matching results must be "pulled" (i.e. pulled in reverse).
In this embodiment, the system can pull the successful matching records, mark their positions in the text, and highlight or emphasize them; after this reverse pull, the result label tree containing the matching results is obtained.
In Fig. 4B, the number below each label at each level is the number of successful matches, i.e. the number of common problems under that module. It can be seen that in the negative feedback for xx video, complaints are most concentrated on function (300), design (250) and performance (240), and the detailed classification results are clear and readable.
Note that a bottom label is the lowest label of each branch, such as "speed", "network speed", "connection" and "loading" in Fig. 4B. If there are no further labels under "activities and advertising", then "activities and advertising" is also a bottom label.
This embodiment illustrates the common problems of negative word-of-mouth. In other embodiments, the same flow can also be used to automatically classify the common issues under positive word-of-mouth; this is not described in detail here.
Optionally, when a subject does not match any label, the similarity and importance of the subject are calculated according to the semantic similarity and importance calculation rules;
when the similarity of the subject is greater than or equal to a first preset value, and/or the importance of the subject is greater than or equal to a second preset value, the subject is added to the dynamic label library as a label.
That is, after the subjects modified by negative word-of-mouth words have been analyzed, they are matched against the dynamic label library and the matching results are recorded. Any subject that fails to match is preferentially entered into the dynamic label library according to the semantic similarity and importance calculation rules.
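A sketch of the admission check for an unmatched subject follows. The patent names "semantic similarity and importance calculation rules" without defining them, so both measures here are assumptions: difflib's character-level ratio is used only as a runnable placeholder for semantic similarity, and importance is taken to be the subject's share of all extracted subjects. The require_both flag picks between the "and" and "or" readings of "and/or". Function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(subject, labels):
    """Highest similarity between the subject and any existing label.

    Stand-in only: difflib's character-level ratio is NOT semantic similarity;
    a real system would substitute its own semantic measure here.
    """
    return max((SequenceMatcher(None, subject, label).ratio() for label in labels),
               default=0.0)


def admit_unmatched_subject(subject, freq, total, labels, dynamic_labels,
                            sim_min, imp_min, require_both=False):
    """Add an unmatched subject to the dynamic label library if it qualifies.

    Importance is assumed to be the subject's share of all extracted subjects.
    require_both selects the "and" reading of "and/or"; otherwise "or" is used.
    """
    sim = similarity(subject, labels)
    imp = freq / total if total else 0.0
    if require_both:
        ok = sim >= sim_min and imp >= imp_min
    else:
        ok = sim >= sim_min or imp >= imp_min
    if ok:
        dynamic_labels.add(subject)
    return ok


dynamic = set()
labels = {"network speed", "interface"}
print(admit_unmatched_subject("network speeds", 1, 100, labels, dynamic, 0.8, 0.05))  # True
print(admit_unmatched_subject("price", 1, 100, labels, dynamic, 0.8, 0.05))  # False
print(dynamic)  # {'network speeds'}
```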
Based on the matching results, the data analysis method of the above embodiment can classify and merge the results of labels of the same class and count them with duplicates removed. Statistics are then rolled up level by level along the label tree until every label has been processed, yielding the final result label tree.
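The level-by-level roll-up and the reverse "pull" can both be served by one structure, sketched below under stated assumptions: each bottom label holds the set of ids of the comments matched there, and each label's record is the union of its descendants' records. The set union implements the deduplicated counting mentioned above (a comment matched under two sibling labels counts once at their common ancestor); whether Fig. 4B counts in exactly this way is not stated in the source. The same sets give the comment ids to highlight when a user drills down from a node. Names and input shapes are hypothetical.

```python
from collections import defaultdict


def roll_up(bottom_records, parent_of):
    """Propagate matched comment ids from bottom labels to every ancestor.

    bottom_records maps a bottom label to the set of ids of comments matched
    there; parent_of maps each label to its parent (None at the root).
    Set union deduplicates: a comment matched under two sibling labels is
    counted once at their common ancestor.
    """
    records = defaultdict(set)
    for label, ids in bottom_records.items():
        node = label
        while node is not None:
            records[node] |= ids
            node = parent_of.get(node)
    return records


parent_of = {
    "xx video": None,
    "performance": "xx video",
    "design": "xx video",
    "playback speed lag": "performance",
    "speed": "playback speed lag",
    "loading": "playback speed lag",
}
records = roll_up({"speed": {1, 2, 3}, "loading": {3, 4}, "design": {5}}, parent_of)
counts = {label: len(ids) for label, ids in records.items()}
print(counts["playback speed lag"], counts["xx video"])  # 4 5
print(sorted(records["performance"]))  # [1, 2, 3, 4]
```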
Fig. 5 is a schematic structural diagram of a data analysis device according to an embodiment of the invention. As shown in Fig. 5, the data analysis device in this embodiment includes: a product label library building unit 51, a subject acquisition unit 52, a matching unit 53 and a result label tree generation unit 54;
the product label library building unit 51 is configured to build a product label library according to the input batch text content;
the subject acquisition unit 52 is configured to obtain, according to the batch text content, the subjects modified by word-of-mouth words, where the word-of-mouth words are obtained by performing word segmentation on the batch text content and using a prestored dictionary to screen out the words that reach a preset frequency after segmentation;
the matching unit 53 is configured to match the subjects obtained by the subject acquisition unit 52 against the labels in the product label library built by the product label library building unit 51;
the result label tree generation unit 54 is configured to generate, according to the labels matched with the subjects by the matching unit 53, a result label tree reflecting the common problems in the batch text content.
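The way word-of-mouth words are defined for unit 52 (segment the text, then keep prestored-dictionary words that reach a preset frequency) can be sketched as follows. Word segmentation itself is assumed to have been done already by some segmenter and is outside this sketch; "reach" is read as greater than or equal to. The function name is hypothetical.

```python
from collections import Counter


def screen_word_of_mouth_words(tokenised_texts, prestored_dictionary, min_freq):
    """Keep dictionary words that occur at least min_freq times after segmentation.

    tokenised_texts: texts already split into words by some word segmenter
    (segmentation itself is outside this sketch).
    """
    counts = Counter(
        token
        for tokens in tokenised_texts
        for token in tokens
        if token in prestored_dictionary
    )
    return {word for word, n in counts.items() if n >= min_freq}


texts = [
    ["interface", "rubbish"],
    ["speed", "laggy"],
    ["new version", "interface", "rubbish"],
    ["design", "nice"],
]
print(screen_word_of_mouth_words(texts, {"rubbish", "laggy", "nice"}, 2))  # {'rubbish'}
```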
For example, the product label library building unit 51 is configured to:
build a dynamic label library according to the input batch text content;
build a special label library according to the product category corresponding to the batch text content;
generate the product label library from the dynamic label library, the special label library and the preset universal label library.
In a first optional application scenario, the product label library building unit 51 is configured to:
obtain the nouns in the batch text content, for example by performing word segmentation on the batch text content according to a custom dictionary;
determine whether the number of times a noun occurs is greater than a preset threshold;
if it is, determine whether the noun duplicates a label in the special label library or in the universal label library;
if the noun duplicates no label in the special or universal label library, add the noun as a label to generate the dynamic label library.
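The steps above for building the dynamic label library can be sketched as below. The input format is an assumption: each text is given as (word, part-of-speech) pairs from segmentation with the custom dictionary, and the tag "n" is assumed to mark a noun. The threshold test uses "greater than", as in the text. The function name is hypothetical.

```python
from collections import Counter


def build_dynamic_labels(tagged_texts, threshold, special_labels, universal_labels):
    """Nouns occurring more than `threshold` times that are not already labels.

    tagged_texts: one list of (word, part_of_speech) pairs per text, as produced
    by segmentation with a custom dictionary; "n" is assumed to mark a noun.
    """
    noun_counts = Counter(
        word for text in tagged_texts for word, pos in text if pos == "n"
    )
    existing = set(special_labels) | set(universal_labels)
    return {w for w, n in noun_counts.items() if n > threshold and w not in existing}


tagged = [
    [("skin", "n"), ("ugly", "a")],
    [("skin", "n"), ("bug", "n")],
    [("skin", "n"), ("bug", "n")],
]
print(build_dynamic_labels(tagged, 1, {"playback speed"}, {"bug"}))  # {'skin'}
```

Here "bug" occurs often enough but is rejected because it already exists in the universal label library, and "ugly" is ignored because it is not a noun.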
In a second optional application scenario, the product label library building unit 51 may also be configured to:
obtain the customized labels belonging to the product according to the product category corresponding to the batch text content;
look up the synonyms and near-synonyms of the customized labels;
generate the special label library of the batch text content from the customized labels and their synonyms and near-synonyms.
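The special label library steps reduce to a category lookup followed by synonym expansion. In this sketch both lookup tables (customized labels per category, and a synonym/near-synonym table) are hypothetical inputs supplied by the operator; the patent does not say where they come from.

```python
def build_special_labels(category, custom_labels_by_category, synonym_table):
    """Customized labels for a product category plus their synonyms/near-synonyms.

    custom_labels_by_category and synonym_table are hypothetical lookup tables
    supplied by the operator; the patent does not specify their source.
    """
    labels = set()
    for label in custom_labels_by_category.get(category, []):
        labels.add(label)
        labels.update(synonym_table.get(label, []))
    return labels


special = build_special_labels(
    "video",
    {"video": ["playback speed", "definition"]},
    {"playback speed": ["speed", "loading"], "definition": ["resolution"]},
)
print(sorted(special))
# ['definition', 'loading', 'playback speed', 'resolution', 'speed']
```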
In a third optional application scenario, the subject acquisition unit 52 is configured to:
obtain, from the batch text content, the subjects modified by word-of-mouth words and/or implicit subjects;
and the matching unit 53 is configured to:
match the subjects and/or implicit subjects obtained by the subject acquisition unit against the labels in the product label library built by the product label library building unit, respectively.
In a fourth optional application scenario, the device may further include the extended word-of-mouth word acquisition unit 55 shown in Fig. 6A:
the extended word-of-mouth word acquisition unit 55 is configured to obtain the extended word-of-mouth words of the labels in the product label library built by the product label library building unit 51;
the matching unit 53 is further configured to match, in the batch text content, the extended word-of-mouth words obtained by the extended word-of-mouth word acquisition unit 55 together with their corresponding labels;
and the result label tree generation unit 54 is configured to:
generate the result label tree reflecting the common problems in the batch text content according to the labels matched with the subjects and the matching results of the extended word-of-mouth words with their corresponding labels.
In a fifth optional application scenario, the device may further include a multi-layer label tree building unit 56, as shown in Fig. 6B:
the multi-layer label tree building unit 56 is configured to build a multi-layer label tree according to the membership relationships between the labels in the product label library built by the product label library building unit 51;
and the matching unit 53 is configured to:
match the subjects obtained by the subject acquisition unit against the bottom labels in the multi-layer label tree built by the multi-layer label tree building unit.
In a sixth optional application scenario, the result label tree generation unit 54 is configured to:
if a subject matches a bottom label, make a record at the position of that bottom label;
pull the records of the bottom labels in reverse to the positions of their upper labels in the multi-layer label tree, to obtain a result label tree reflecting the common problems in the batch text.
That is, according to the matching results of the matching unit 53, when a subject correctly matches a label, the result label tree generation unit 54 records the match at the position of the corresponding bottom label in the multi-layer label tree, and
pulls the successful matching results at the bottom label positions in reverse to the upper labels in the multi-layer label tree, obtaining a result label tree reflecting the common problems in the batch text.
In a seventh optional application scenario, the device further includes a subject similarity acquisition unit 57 and a subject processing unit 58 (not shown):
the subject similarity acquisition unit 57 is configured to, according to the matching results of the matching unit 53, obtain the similarity and importance of a subject according to the semantic similarity and importance calculation rules when the subject does not match any label;
the subject processing unit 58 is configured to add the subject to the dynamic label library as a label when the similarity of the subject obtained by the subject similarity acquisition unit 57 is greater than or equal to the first preset value, and/or the importance of the subject is greater than or equal to the second preset value.
The above data analysis device can carry out the technical solution of any of the method embodiments shown in Figs. 1 to 3; its implementation principle and technical effects are similar and are not repeated here.
The data analysis device in the above embodiments offers: intelligent data processing, automatically judging the sentiment polarity of data from its features and automatically classifying the dimensions on which positive and negative reviews concentrate; high efficiency, since after a one-time configuration and customization the entire flow runs automatically, greatly reducing manpower; systematicness, resolving the inconsistent subjective criteria of different operators and the incomplete frameworks in data classification; and timeliness, quickly reflecting the latest trends of a product and supporting real-time display of results.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be implemented by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical discs.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some or all of their technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (20)

1. A data analysis method, characterized by comprising:
building a product label library according to input text content;
obtaining, according to the text content, a subject modified by a word-of-mouth word, wherein the word-of-mouth word is obtained by performing word segmentation on the text content and screening, with a prestored dictionary, the words that reach a preset frequency after the word segmentation;
matching the subject with labels in the product label library;
generating, according to the label matched with the subject, a result label tree reflecting common problems in the text content.
2. The method according to claim 1, characterized in that building the product label library according to the input text content comprises:
building a dynamic label library according to the input text content;
building a special label library according to the product category corresponding to the text content;
generating the product label library from the dynamic label library, the special label library and a preset universal label library.
3. The method according to claim 2, characterized in that building the dynamic label library according to the input text content comprises:
obtaining nouns in the text content;
judging whether the number of occurrences of a noun is greater than a preset threshold;
if the number of occurrences of the noun is greater than the preset threshold, determining whether the noun duplicates a label in the special label library or a label in the universal label library;
if the noun duplicates neither a label in the special label library nor a label in the universal label library, adding the noun as a label to generate the dynamic label library.
4. The method according to claim 2, characterized in that building the special label library according to the product category corresponding to the text content comprises:
obtaining, according to the product category corresponding to the text content, customized labels belonging to the product;
looking up synonyms and near-synonyms of the customized labels;
generating the special label library of the text content from the customized labels and the synonyms and near-synonyms of the customized labels.
5. The method according to claim 3, characterized in that obtaining the nouns in the text content comprises:
performing word segmentation on the text content according to a custom dictionary to obtain the nouns of the text content.
6. The method according to any one of claims 1 to 5, characterized in that obtaining, according to the text content, the subject modified by the word-of-mouth word comprises:
obtaining, from the text content, the subject modified by the word-of-mouth word and/or an implicit subject;
and matching the subject with the labels in the product label library comprises:
matching the subject and/or the implicit subject with the labels in the product label library, respectively.
7. The method according to any one of claims 1 to 5, characterized in that, before the step of generating, according to the label matched with the subject, the result label tree reflecting common problems in the text content, the method further comprises:
obtaining extended word-of-mouth words of the labels in the product label library;
matching, in the text content, the extended word-of-mouth words together with the labels corresponding to the extended word-of-mouth words;
and generating, according to the label matched with the subject, the result label tree reflecting common problems in the text content comprises:
generating the result label tree reflecting common problems in the text content according to the label matched with the subject and the matching results of the extended word-of-mouth words with their corresponding labels.
8. The method according to any one of claims 1 to 5, characterized in that, after the step of building the product label library according to the input text content, the method further comprises:
building a multi-layer label tree according to the membership relationships between the labels in the product label library;
and matching the subject with the labels in the product label library comprises:
matching the subject with bottom labels in the multi-layer label tree.
9. The method according to claim 8, characterized in that generating, according to the label matched with the subject, the result label tree reflecting common problems in the text content comprises:
if the subject matches a bottom label, making a record at the position of that bottom label;
pulling the records of the bottom labels in reverse to the positions of the upper labels in the multi-layer label tree, to obtain the result label tree reflecting common problems in the text.
10. The method according to claim 2, characterized in that the method further comprises:
if the subject does not match the labels, obtaining a similarity and an importance of the subject according to semantic similarity and importance calculation rules;
if the similarity of the subject is greater than or equal to a first preset value, and/or the importance of the subject is greater than or equal to a second preset value, adding the subject to the dynamic label library as a label.
11. A data analysis device, characterized by comprising:
a product label library building unit, configured to build a product label library according to input text content;
a subject acquisition unit, configured to obtain, according to the text content, a subject modified by a word-of-mouth word, wherein the word-of-mouth word is obtained by performing word segmentation on the text content and screening, with a prestored dictionary, the words that reach a preset frequency after the word segmentation;
a matching unit, configured to match the subject obtained by the subject acquisition unit with labels in the product label library built by the product label library building unit;
a result label tree generation unit, configured to generate, according to the label matched with the subject by the matching unit, a result label tree reflecting common problems in the text content.
12. The device according to claim 11, characterized in that the product label library building unit is configured to:
build a dynamic label library according to the input text content;
build a special label library according to the product category corresponding to the text content;
generate the product label library from the dynamic label library, the special label library and a preset universal label library.
13. The device according to claim 12, characterized in that the product label library building unit is configured to:
obtain nouns in the text content;
judge whether the number of occurrences of a noun is greater than a preset threshold;
if the number of occurrences of the noun is greater than the preset threshold, determine whether the noun duplicates a label in the special label library or a label in the universal label library;
if the noun duplicates neither a label in the special label library nor a label in the universal label library, add the noun as a label to generate the dynamic label library.
14. The device according to claim 12, characterized in that the product label library building unit is configured to:
obtain, according to the product category corresponding to the text content, customized labels belonging to the product;
look up synonyms and near-synonyms of the customized labels;
generate the special label library of the text content from the customized labels and the synonyms and near-synonyms of the customized labels.
15. The device according to claim 13, characterized in that the product label library building unit is configured to:
perform word segmentation on the text content according to a custom dictionary to obtain the nouns of the text content.
16. The device according to any one of claims 11 to 15, characterized in that the subject acquisition unit is configured to:
obtain, from the text content, the subject modified by the word-of-mouth word and/or an implicit subject;
and the matching unit is configured to:
match the subject and/or the implicit subject obtained by the subject acquisition unit with the labels in the product label library built by the product label library building unit, respectively.
17. The device according to any one of claims 11 to 15, characterized in that the device further comprises:
an extended word-of-mouth word acquisition unit, configured to obtain extended word-of-mouth words of the labels in the product label library built by the product label library building unit;
the matching unit is further configured to match, in the text content, the extended word-of-mouth words obtained by the extended word-of-mouth word acquisition unit together with the labels corresponding to the extended word-of-mouth words;
and the result label tree generation unit is configured to:
generate the result label tree reflecting common problems in the text content according to the label matched with the subject and the matching results of the extended word-of-mouth words with their corresponding labels.
18. The device according to any one of claims 11 to 15, characterized in that the device further comprises:
a multi-layer label tree building unit, configured to build a multi-layer label tree according to the membership relationships between the labels in the product label library built by the product label library building unit;
the matching unit is configured to match the subject obtained by the subject acquisition unit with bottom labels in the multi-layer label tree built by the multi-layer label tree building unit.
19. The device according to claim 18, characterized in that the result label tree generation unit is configured to:
if the subject matches a bottom label, make a record at the position of that bottom label;
pull the records of the bottom labels in reverse to the positions of the upper labels in the multi-layer label tree, to obtain the result label tree reflecting common problems in the text.
20. The device according to claim 12, characterized in that the device further comprises:
a subject similarity acquisition unit, configured to, according to the matching result of the matching unit, obtain a similarity and an importance of the subject according to semantic similarity and importance calculation rules when the subject does not match the labels;
a subject processing unit, configured to add the subject to the dynamic label library as a label when the similarity of the subject obtained by the subject similarity acquisition unit is greater than or equal to a first preset value, and/or the importance of the subject is greater than or equal to a second preset value.
CN201410204300.1A 2014-05-14 2014-05-14 Data analysis method and data analysis device Active CN105095288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410204300.1A CN105095288B (en) 2014-05-14 2014-05-14 Data analysis method and data analysis device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410204300.1A CN105095288B (en) 2014-05-14 2014-05-14 Data analysis method and data analysis device

Publications (2)

Publication Number Publication Date
CN105095288A true CN105095288A (en) 2015-11-25
CN105095288B CN105095288B (en) 2020-02-07

Family

ID=54575741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410204300.1A Active CN105095288B (en) 2014-05-14 2014-05-14 Data analysis method and data analysis device

Country Status (1)

Country Link
CN (1) CN105095288B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105824898A (en) * 2016-03-14 2016-08-03 苏州大学 Label extracting method and device for network comments
CN106021433A (en) * 2016-05-16 2016-10-12 北京百分点信息科技有限公司 Public praise analysis method and apparatus for product review data
CN106156041A (en) * 2015-03-26 2016-11-23 科大讯飞股份有限公司 Hot information finds method and system
CN106250420A (en) * 2016-07-21 2016-12-21 深圳市辣妈帮科技有限公司 Label correlating method and device
CN107153641A (en) * 2017-05-08 2017-09-12 北京百度网讯科技有限公司 Comment information determines method, device, server and storage medium
CN107391480A (en) * 2017-06-23 2017-11-24 广州市万隆证券咨询顾问有限公司 A kind of stock invester's personality characters analysis method and system based on stock invester's market sentiment
CN107861944A (en) * 2017-10-24 2018-03-30 广东亿迅科技有限公司 A kind of text label extracting method and device based on Word2Vec
CN107918778A (en) * 2016-10-11 2018-04-17 阿里巴巴集团控股有限公司 A kind of information matching method and relevant apparatus
CN107918667A (en) * 2017-11-28 2018-04-17 杭州有赞科技有限公司 Generation method, system and the device of text label word
CN108009715A (en) * 2017-11-28 2018-05-08 邢加和 It is a kind of automatically analyze index fluctuation root because method
CN108153856A (en) * 2017-12-22 2018-06-12 北京百度网讯科技有限公司 For the method and apparatus of output information
CN108510285A (en) * 2017-05-17 2018-09-07 苏州纯青智能科技有限公司 A kind of evaluation method based on trade order
CN109145301A (en) * 2018-08-29 2019-01-04 上海汽车集团股份有限公司 Information classification approach and device, computer readable storage medium
CN109325148A (en) * 2018-08-03 2019-02-12 百度在线网络技术(北京)有限公司 The method and apparatus for generating information
CN113505192A (en) * 2021-05-25 2021-10-15 平安银行股份有限公司 Data tag library construction method and device, electronic equipment and computer storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN101216842A (en) * 2008-01-07 2008-07-09 华为技术有限公司 Method for obtaining page key words and page information processing apparatus
CN101968788A (en) * 2009-07-27 2011-02-09 富士通株式会社 Method and device for extracting product attribute information
CN102982076A (en) * 2012-10-30 2013-03-20 新华通讯社 Multi-dimensional content labeling method based on a semantic label database
CN103399916A (en) * 2013-07-31 2013-11-20 清华大学 Internet comment and opinion mining method and system based on product features
CN103679462A (en) * 2012-08-31 2014-03-26 阿里巴巴集团控股有限公司 Comment data processing method and device, and search method and system
CN103761264A (en) * 2013-12-31 2014-04-30 浙江大学 Concept hierarchy establishing method based on product review document set

Families Citing this family (20)

Publication number Priority date Publication date Assignee Title
CN106156041A (en) * 2015-03-26 2016-11-23 科大讯飞股份有限公司 Hot information discovery method and system
CN106156041B (en) * 2015-03-26 2019-05-28 科大讯飞股份有限公司 Hot information discovery method and system
CN105824898A (en) * 2016-03-14 2016-08-03 苏州大学 Label extraction method and device for online comments
CN106021433A (en) * 2016-05-16 2016-10-12 北京百分点信息科技有限公司 Word-of-mouth analysis method and apparatus for product review data
CN106021433B (en) * 2016-05-16 2019-05-10 北京百分点信息科技有限公司 Word-of-mouth analysis method and device for product review data
CN106250420A (en) * 2016-07-21 2016-12-21 深圳市辣妈帮科技有限公司 Label association method and device
CN107918778A (en) * 2016-10-11 2018-04-17 阿里巴巴集团控股有限公司 Information matching method and related apparatus
CN107153641A (en) * 2017-05-08 2017-09-12 北京百度网讯科技有限公司 Method, device, server and storage medium for determining comment information
CN108510285A (en) * 2017-05-17 2018-09-07 苏州纯青智能科技有限公司 Evaluation method based on trade orders
CN107391480A (en) * 2017-06-23 2017-11-24 广州市万隆证券咨询顾问有限公司 Stock investor personality analysis method and system based on stock investor market sentiment
CN107861944A (en) * 2017-10-24 2018-03-30 广东亿迅科技有限公司 Text label extraction method and device based on Word2Vec
CN108009715A (en) * 2017-11-28 2018-05-08 邢加和 Method for automatically analyzing the root cause of index fluctuations
CN107918667A (en) * 2017-11-28 2018-04-17 杭州有赞科技有限公司 Method, system and device for generating text label words
CN107918667B (en) * 2017-11-28 2020-09-04 杭州有赞科技有限公司 Method, system and device for generating text label words
CN108153856A (en) * 2017-12-22 2018-06-12 北京百度网讯科技有限公司 Method and apparatus for outputting information
CN108153856B (en) * 2017-12-22 2022-09-06 北京百度网讯科技有限公司 Method and apparatus for outputting information
CN109325148A (en) * 2018-08-03 2019-02-12 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN109145301A (en) * 2018-08-29 2019-01-04 上海汽车集团股份有限公司 Information classification method and device, and computer-readable storage medium
CN109145301B (en) * 2018-08-29 2023-01-24 上海汽车集团股份有限公司 Information classification method and device and computer readable storage medium
CN113505192A (en) * 2021-05-25 2021-10-15 平安银行股份有限公司 Data tag library construction method and device, electronic equipment and computer storage medium

Also Published As

Publication number Publication date
CN105095288B (en) 2020-02-07

Similar Documents

Publication Publication Date Title
CN105095288A (en) Data analysis method and data analysis device
CN106682192B (en) Method and device for training answer intention classification model based on search keywords
Boia et al. A :) is worth a thousand words: How people attach sentiment to emoticons and words in tweets
Gu et al. "What parts of your apps are loved by users?" (T)
Rizzo et al. NERD meets NIF: Lifting NLP Extraction Results to the Linked Data Cloud.
CN101292238B (en) Method and system for automated rich presentation of a semantic topic
CN104503958B (en) Method and device for generating document summaries
CN104462363B (en) Method and apparatus for displaying comment points
CN103678564A (en) Internet product research system based on data mining
Halibas et al. Application of text classification and clustering of Twitter data for business analytics
CN111831802B (en) Urban domain knowledge detection system and method based on LDA topic model
CN104978332B (en) User-generated content label data generation method and device, and related method and device
Chawla et al. Product opinion mining using sentiment analysis on smartphone reviews
CN103425640A (en) Multimedia questioning-answering system and method
CN103390051A (en) Topic detection and tracking method based on microblog data
CN105005564A (en) Data processing method and apparatus based on question-and-answer platform
CN110738033B (en) Report template generation method, device and storage medium
CA3166094A1 (en) Commodity short title generation method and apparatus
US20160299891A1 (en) Matching of an input document to documents in a document collection
CN103092943A (en) Method of advertisement dispatch and advertisement dispatch server
Oramas et al. ELMD: An automatically generated entity linking gold standard dataset in the music domain
Gasparetti et al. Exploiting web browsing activities for user needs identification
CN110929007A (en) Electric power marketing knowledge system platform and application method
Menezes et al. Building a massive corpus for named entity recognition using free open data sources
CN105511869A (en) Demand tracking system and method based on user feedback

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 2023-12-27

Address after: 35th Floor, Tencent Building, No. 1 High-tech Zone, Nanshan District, Shenzhen City, Guangdong Province, 518000

Patentee after: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.

Patentee after: TENCENT CLOUD COMPUTING (BEIJING) Co.,Ltd.

Address before: Room 403, East, Building 2, SEG Science and Technology Park, Zhenxing Road, Futian District, Shenzhen, Guangdong, 518044

Patentee before: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.