CN105447036A - Opinion mining-based social media information credibility evaluation method and apparatus - Google Patents

Opinion mining-based social media information credibility evaluation method and apparatus

Info

Publication number
CN105447036A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410436605.5A
Other languages
Chinese (zh)
Other versions
CN105447036B (en
Inventor
尚利峰
李斌阳
黄锦辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201410436605.5A priority Critical patent/CN105447036B/en
Publication of CN105447036A publication Critical patent/CN105447036A/en
Application granted granted Critical
Publication of CN105447036B publication Critical patent/CN105447036B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

Embodiments of the present invention disclose an opinion mining-based social media information credibility evaluation method and apparatus. The method comprises: acquiring to-be-evaluated information; calculating an uncertainty score of each piece of the to-be-evaluated information; calculating credibility of a publisher of each piece of the to-be-evaluated information; counting a proportion of supporting opinions in comments of each piece of the to-be-evaluated information; and inputting the uncertainty score of each piece of the to-be-evaluated information, the credibility of the publisher of each piece of the to-be-evaluated information and the proportion of the supporting opinions in the comments of each piece of the to-be-evaluated information into a pre-trained quantitative evaluation model to perform calculation, so that the quantitative evaluation model outputs a credibility order of each piece of the to-be-evaluated information. According to the opinion mining-based social media information credibility evaluation method and apparatus provided by the embodiments of the present invention, the credibility of social media information can be accurately evaluated.

Description

Opinion mining-based social media information credibility evaluation method and apparatus
Technical Field
The present invention relates to the field of communication technologies, and in particular to an opinion mining-based social media information credibility evaluation method and apparatus.
Background
With the development and spread of second-generation Internet (Web 2.0) technology, various kinds of social media (for example, Weibo, WeChat, and Twitter) keep emerging and have profoundly changed the way people publish, obtain, exchange, and express information and opinions. In particular, as mobile communication technology matures and smart mobile devices become widely used, social media has become an indispensable platform for sharing information and expressing opinions in daily life. However, because the content on such platforms is mainly created and excerpted spontaneously by large numbers of Internet users, false and unreliable information is widespread. How well the credibility of social media information can be evaluated automatically directly affects the performance of downstream application systems such as information recommendation, market research, and automatic question answering.
The information credibility analysis provided in the prior art mainly targets data of a specific field and a specific type, such as biomedical experiment reports, newswire text, and Wikipedia. For credibility evaluation of biomedical experiment reports, such data has a fixed structure and pattern, so different features can be extracted easily; in particular, many related experiment reports usually exist, so reports can be cross-checked against one another to identify those with low credibility. The credibility of Wikipedia information is mainly characterized by the revision history of the information.
In other words, early information credibility analysis tools were mainly designed for data with a particular structure and do not take into account the data structure and language habits of social media information itself. In particular, text in social media is unstructured data whose processing depends heavily on natural language processing techniques such as semantic analysis and sentiment analysis, so these early techniques are not suitable for evaluating the credibility of social media information. A new method for evaluating the credibility of social media information is therefore needed.
Summary
In view of this, the present invention provides an opinion mining-based social media information credibility evaluation method and apparatus that can accurately evaluate the credibility of social media information.
According to a first aspect, an embodiment of the present invention provides an opinion mining-based social media information credibility evaluation method, comprising:
Acquiring to-be-evaluated information;
Calculating an uncertainty score of each piece of the to-be-evaluated information;
Calculating the credibility of the publisher of each piece of the to-be-evaluated information;
Counting the proportion of supporting opinions in the comments on each piece of the to-be-evaluated information;
Inputting the uncertainty score of each piece of the to-be-evaluated information, the credibility of its publisher, and the proportion of supporting opinions in its comments into a pre-trained quantitative evaluation model for calculation, where the output of the quantitative evaluation model is a credibility order of the pieces of to-be-evaluated information.
With reference to the first aspect, in a first implementation of the first aspect, before the to-be-evaluated information is acquired, the method further comprises:
Building a topic dictionary related to the current topic;
Combining each topic word in the topic dictionary with each sentiment word in a sentiment dictionary to form opinion word pairs;
Acquiring social media information related to the current topic;
Calculating an opinion value of each piece of social media information according to the similarity between each opinion word pair and that piece of social media information and the similarity between each opinion word pair and the comments on that piece of social media information;
Filtering out the social media information whose opinion value is less than a preset threshold, and using the remaining social media information as the to-be-evaluated information.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, building the topic dictionary related to the current topic specifically comprises:
Searching a social network for social media information related to the current topic;
Extracting the keywords in the social media information and counting the frequency with which each keyword occurs;
Selecting a preset number of keywords in descending order of frequency as topic words to build the topic dictionary.
With reference to the first implementation of the first aspect, in a third implementation of the first aspect, calculating the opinion value of each piece of social media information according to the similarity between each opinion word pair and that piece of social media information and the similarity between each opinion word pair and its comments specifically comprises:
Calculating the similarity between the topic word of an opinion word pair and each keyword in a piece of social media information, and taking the maximum similarity a; calculating the similarity between the topic word of the opinion word pair and each keyword in the comments on the piece of social media information, and taking the maximum similarity x;
Calculating the similarity between the sentiment word of the opinion word pair and each sentiment word in the piece of social media information, and taking the maximum similarity b; calculating the similarity between the sentiment word of the opinion word pair and each sentiment word in the comments on the piece of social media information, and taking the maximum similarity y;
The similarity between the opinion word pair and the piece of social media information is $s_1 = \lambda a + (1-\lambda) b$, where $0 < \lambda < 1$, and the similarity between the opinion word pair and the comments on the piece of social media information is $s_2 = \mu x + (1-\mu) y$, where $0 < \mu < 1$;
Adding the similarity between the opinion word pair and the piece of social media information to the similarity between the opinion word pair and its comments to obtain an opinion sub-value of the piece of social media information;
Processing every opinion word pair in the same way to obtain all opinion sub-values of the piece of social media information, and summing all the opinion sub-values to obtain the opinion value of the piece of social media information; the opinion value of every piece of social media information is obtained in the same manner.
With reference to the first aspect, or the first, second, or third implementation of the first aspect, in a fourth implementation of the first aspect, calculating the uncertainty score of each piece of the to-be-evaluated information comprises:
Determining the categories of uncertain content contained in each piece of the to-be-evaluated information;
Calculating a category score for each category of uncertain content contained in each piece of the to-be-evaluated information;
Multiplying the category score of each category of uncertain content contained in a piece of to-be-evaluated information by a preset weight and summing the products to obtain the uncertainty score of that piece of to-be-evaluated information.
With reference to the first aspect, or the first, second, or third implementation of the first aspect, in a fifth implementation of the first aspect, when the uncertainty score of each piece of the to-be-evaluated information, the credibility of its publisher, and the proportion of supporting opinions in its comments are input into the pre-trained quantitative evaluation model for calculation: the higher the uncertainty score of a piece of to-be-evaluated information, the lower its credibility; the lower the credibility of its publisher, the lower its credibility; and the smaller the proportion of supporting opinions in its comments, and/or the more that proportion shrinks over time, the lower its credibility.
According to a second aspect, an embodiment of the present invention provides an opinion mining-based social media information credibility evaluation apparatus, comprising:
A first acquiring unit, configured to acquire to-be-evaluated information;
A first computing unit, configured to calculate an uncertainty score of each piece of the to-be-evaluated information;
A second computing unit, configured to calculate the credibility of the publisher of each piece of the to-be-evaluated information;
A statistics unit, configured to count the proportion of supporting opinions in the comments on each piece of the to-be-evaluated information;
A credibility evaluation unit, configured to input the uncertainty score of each piece of the to-be-evaluated information, the credibility of its publisher, and the proportion of supporting opinions in its comments into a pre-trained quantitative evaluation model for calculation, where the output of the quantitative evaluation model is a credibility order of the pieces of to-be-evaluated information.
With reference to the second aspect, in a first implementation of the second aspect, the apparatus further comprises:
A dictionary building unit, configured to build a topic dictionary related to the current topic;
A word pair forming unit, configured to combine each topic word in the topic dictionary with each sentiment word in a sentiment dictionary to form opinion word pairs;
A second acquiring unit, configured to acquire social media information related to the current topic;
A third computing unit, configured to calculate an opinion value of each piece of social media information according to the similarity between each opinion word pair and that piece of social media information and the similarity between each opinion word pair and the comments on that piece of social media information;
An information filtering unit, configured to filter out the social media information whose opinion value is less than a preset threshold and use the remaining social media information as the to-be-evaluated information.
With reference to the first implementation of the second aspect, in a second implementation of the second aspect, the dictionary building unit specifically comprises:
A search subunit, configured to search a social network for social media information related to the current topic;
A statistics subunit, configured to extract the keywords in the social media information and count the frequency with which each keyword occurs;
A dictionary building subunit, configured to select a preset number of keywords in descending order of frequency as topic words to build the topic dictionary.
With reference to the first implementation of the second aspect, in a third implementation of the second aspect, the third computing unit is specifically configured to:
Calculate the similarity between the topic word of an opinion word pair and each keyword in a piece of social media information, and take the maximum similarity a; calculate the similarity between the topic word of the opinion word pair and each keyword in the comments on the piece of social media information, and take the maximum similarity x;
Calculate the similarity between the sentiment word of the opinion word pair and each sentiment word in the piece of social media information, and take the maximum similarity b; calculate the similarity between the sentiment word of the opinion word pair and each sentiment word in the comments on the piece of social media information, and take the maximum similarity y;
The similarity between the opinion word pair and the piece of social media information is $s_1 = \lambda a + (1-\lambda) b$, where $0 < \lambda < 1$, and the similarity between the opinion word pair and the comments on the piece of social media information is $s_2 = \mu x + (1-\mu) y$, where $0 < \mu < 1$;
Add the similarity between the opinion word pair and the piece of social media information to the similarity between the opinion word pair and its comments to obtain an opinion sub-value of the piece of social media information;
Process every opinion word pair in the same way to obtain all opinion sub-values of the piece of social media information, and sum all the opinion sub-values to obtain the opinion value of the piece of social media information; the opinion value of every piece of social media information is obtained in the same manner.
With reference to the second aspect, or the first, second, or third implementation of the second aspect, in a fourth implementation of the second aspect, the first computing unit is specifically configured to:
Determine the categories of uncertain content contained in each piece of the to-be-evaluated information;
Calculate a category score for each category of uncertain content contained in each piece of the to-be-evaluated information;
Multiply the category score of each category of uncertain content contained in a piece of to-be-evaluated information by a preset weight and sum the products to obtain the uncertainty score of that piece of to-be-evaluated information.
With reference to the second aspect, or the first, second, or third implementation of the second aspect, in a fifth implementation of the second aspect, when the credibility evaluation unit inputs the uncertainty score of each piece of the to-be-evaluated information, the credibility of its publisher, and the proportion of supporting opinions in its comments into the pre-trained quantitative evaluation model for calculation: the higher the uncertainty score of a piece of to-be-evaluated information, the lower its credibility; the lower the credibility of its publisher, the lower its credibility; and the smaller the proportion of supporting opinions in its comments, and/or the more that proportion shrinks over time, the lower its credibility.
In the embodiments of the present invention, the uncertainty score of each piece of to-be-evaluated information, the credibility of its publisher, and the proportion of supporting opinions in its comments are input into a pre-trained quantitative evaluation model for calculation. Evaluating each piece of to-be-evaluated information on these three kinds of data improves the accuracy of the evaluation.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings described below show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them.
Fig. 1 is a schematic diagram of an embodiment of the opinion mining-based social media information credibility evaluation method provided by the present invention;
Fig. 2 is a schematic diagram of another embodiment of the opinion mining-based social media information credibility evaluation method provided by the present invention;
Fig. 3 is a schematic diagram of an embodiment of the opinion mining-based social media information credibility evaluation apparatus provided by the present invention;
Fig. 4 is a schematic diagram of another embodiment of the opinion mining-based social media information credibility evaluation apparatus provided by the present invention;
Fig. 5 is a schematic diagram of another embodiment of the opinion mining-based social media information credibility evaluation apparatus provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention fall within the protection scope of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides an opinion mining-based social media information credibility evaluation method, and the method comprises:
101. Acquire to-be-evaluated information.
The to-be-evaluated information is extracted from the social media information in a social network, and all of it is related to the current topic. The social network may be Weibo, WeChat, Twitter, or the like.
102. Calculate an uncertainty score of each piece of the to-be-evaluated information.
This step determines whether each piece of to-be-evaluated information contains uncertain content and how uncertain the information is.
103. Calculate the credibility of the publisher of each piece of the to-be-evaluated information.
The credibility of an information publisher is calculated mainly from the publisher's features on the social network, for example the number of microblog posts published, whether the account is verified, and the user level. For the specific evaluation method, refer to existing methods; details are not repeated here.
104. Count the proportion of supporting opinions in the comments on each piece of the to-be-evaluated information.
105. Input the uncertainty score of each piece of the to-be-evaluated information, the credibility of its publisher, and the proportion of supporting opinions in its comments into a pre-trained quantitative evaluation model for calculation, where the output of the quantitative evaluation model is a credibility order of the pieces of to-be-evaluated information.
It should be noted that, in a specific implementation, steps 102 to 104 need not be performed in any particular order and may be performed in parallel.
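Because the three feature computations are independent of one another, they can be dispatched concurrently. The following Python sketch is only an illustration under stated assumptions: the three feature functions are hypothetical placeholders that read precomputed values from a dictionary, and a thread pool is merely one possible way to run them side by side.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholders for steps 102 to 104. A real system would compute
# these values; here they simply read fields of the item for illustration.
def uncertainty_score(item):
    return item["uncertainty"]

def publisher_credibility(item):
    return item["publisher"]

def support_ratio(item):
    return item["support"]

def extract_features(item):
    """Run the three independent feature computations concurrently."""
    steps = (uncertainty_score, publisher_credibility, support_ratio)
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = [pool.submit(step, item) for step in steps]
        # Results are collected in submission order, not completion order.
        return tuple(future.result() for future in futures)
```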
In this embodiment, the uncertainty score of each piece of to-be-evaluated information, the credibility of its publisher, and the proportion of supporting opinions in its comments are input into a pre-trained quantitative evaluation model for calculation. Evaluating each piece of to-be-evaluated information on these three kinds of data improves the accuracy of the evaluation.
For ease of understanding, the information credibility evaluation method of the present invention is described below with a specific embodiment. Referring to Fig. 2, the method of this embodiment comprises:
201. Build a topic dictionary related to the current topic.
In this embodiment, the topic dictionary related to the current topic is built by word-frequency statistics, as follows: search the social network for social media information related to the current topic, extract the keywords in the social media information and count the frequency with which each keyword occurs, and select a preset number of keywords in descending order of frequency as topic words to build the topic dictionary.
In a concrete example, after Huawei releases the P7 mobile phone, social media information about the P7 quickly appears in social networks. The social media information related to the current topic, the P7 mobile phone, can then be searched for, and keywords such as Huawei, screen, HiSilicon, and Xiaomi are extracted from the retrieved information. The frequency of each keyword is counted, and a preset number of the most frequent keywords are selected as topic words to build the topic dictionary.
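The word-frequency approach can be sketched in a few lines of Python. This is a minimal illustration, assuming that keyword extraction (tokenization and stop-word removal) has already been done upstream; the function name `build_topic_dictionary` and its parameters are hypothetical.

```python
from collections import Counter

def build_topic_dictionary(keyword_lists, size):
    """Return the `size` most frequent keywords across all retrieved posts.

    keyword_lists: one list of already-extracted keywords per post (assumed input).
    """
    counts = Counter()
    for keywords in keyword_lists:
        counts.update(keywords)
    # most_common sorts by descending frequency.
    return [word for word, _ in counts.most_common(size)]
```

With the keyword lists `[["Huawei", "screen", "HiSilicon"], ["Huawei", "screen"], ["Huawei", "Xiaomi"]]` and `size=2`, the two most frequent keywords, "Huawei" and "screen", become the topic words.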
In other examples, a conventional latent topic model may also be used to build the topic dictionary related to the current topic.
202. Combine each topic word in the topic dictionary with each sentiment word in a sentiment dictionary to form opinion word pairs.
The sentiment dictionary may be an existing mainstream sentiment dictionary. Each opinion word pair consists of one topic word and one sentiment word, for example <appearance, beautiful> or <HiSilicon, proud>.
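Pairing every topic word with every sentiment word amounts to a Cartesian product of the two dictionaries. A minimal Python sketch, with a hypothetical function name and toy word lists:

```python
from itertools import product

def form_opinion_word_pairs(topic_words, sentiment_words):
    """Pair each topic word with each sentiment word (Cartesian product)."""
    return list(product(topic_words, sentiment_words))

# Toy dictionaries for illustration only.
pairs = form_opinion_word_pairs(["appearance", "HiSilicon"], ["beautiful", "proud"])
```

Note that the number of pairs grows as the product of the two dictionary sizes, which is one reason to cap the topic dictionary at a preset number of words.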
203. Acquire social media information related to the current topic.
In a specific implementation, the keywords of the current topic may be used as the input for searching and crawling on social media.
204. Calculate an opinion value of each piece of social media information according to the similarity between each opinion word pair and that piece of social media information and the similarity between each opinion word pair and the comments on that piece of social media information.
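The similarity of any two words A and B is $\mathrm{sim}(A, B) = \dfrac{A_0 \cdot B_0}{\|A_0\| \, \|B_0\|}$, where $A_0$ and $B_0$ denote the word vectors of words A and B respectively, $\|A_0\|$ denotes the norm of $A_0$, and $\|B_0\|$ denotes the norm of $B_0$.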
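The definitions in terms of word vectors and their norms suggest cosine similarity. The sketch below assumes that reading; the handling of zero vectors (returning 0.0) is an added convention, not something the text specifies.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length word vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_a * norm_b)
```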
First, using the above similarity formula, calculate the similarity between the topic word of an opinion word pair and each keyword in a piece of social media information, and take the maximum similarity a; likewise calculate the similarity between the topic word of this opinion word pair and each keyword in the comments on this piece of social media information, and take the maximum similarity x.
Next, using the same formula, calculate the similarity between the sentiment word of this opinion word pair and each sentiment word in this piece of social media information, and take the maximum similarity b; likewise calculate the similarity between the sentiment word of this opinion word pair and each sentiment word in the comments on this piece of social media information, and take the maximum similarity y.
The similarity between this opinion word pair and this piece of social media information is $s_1 = \lambda a + (1-\lambda) b$, where $0 < \lambda < 1$ and $\lambda$ may be preset. The similarity between this opinion word pair and the comments on this piece of social media information is $s_2 = \mu x + (1-\mu) y$, where $0 < \mu < 1$ and $\mu$ may be preset.
The similarity between this opinion word pair and this piece of social media information is added to the similarity between this opinion word pair and its comments to obtain an opinion sub-value of this piece of social media information.
Every opinion word pair is processed in the same way to obtain all opinion sub-values of this piece of social media information, and all the opinion sub-values are summed to obtain the opinion value of this piece of social media information. The opinion value of every piece of social media information is obtained in the same manner.
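The opinion value computation described above can be sketched as follows. This is an illustration under assumptions: word vectors are looked up in a plain dictionary `vectors`, posts and comments are given as dictionaries with pre-extracted `keywords` and `sentiment_words` lists, an empty candidate list yields a maximum similarity of 0.0, and the default values of `lam` and `mu` are arbitrary. All names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def max_similarity(word, candidates, vectors):
    """Largest similarity between `word` and any candidate (0.0 if none)."""
    return max((cosine(vectors[word], vectors[c]) for c in candidates), default=0.0)

def opinion_value(pairs, post, comments, vectors, lam=0.5, mu=0.5):
    """Sum of the opinion sub-values s1 + s2 over all opinion word pairs."""
    total = 0.0
    for topic_word, sentiment_word in pairs:
        a = max_similarity(topic_word, post["keywords"], vectors)
        x = max_similarity(topic_word, comments["keywords"], vectors)
        b = max_similarity(sentiment_word, post["sentiment_words"], vectors)
        y = max_similarity(sentiment_word, comments["sentiment_words"], vectors)
        s1 = lam * a + (1 - lam) * b   # pair vs. the post itself
        s2 = mu * x + (1 - mu) * y     # pair vs. the comments on the post
        total += s1 + s2
    return total
```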
205. Filter out the social media information whose opinion value is less than a preset threshold, and use the remaining social media information as the to-be-evaluated information.
In this embodiment, social media information whose opinion value is less than the preset threshold is considered not to express any opinion subjectively and clearly; for example, it merely states a fact or describes a product without any emotional coloring. Such social media information is filtered out. Social media information whose opinion value is greater than or equal to the preset threshold is considered to express an opinion subjectively and clearly. Such information often becomes a focus of public opinion and shapes how people perceive an event or a product, so this embodiment uses it as the to-be-evaluated information and mainly evaluates its credibility.
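The filtering step keeps exactly the items whose opinion value is greater than or equal to the threshold. A minimal Python sketch, assuming the opinion values have already been computed and are supplied as (post, value) tuples; the function name is hypothetical:

```python
def select_information_to_evaluate(posts_with_values, threshold):
    """Keep posts whose opinion value is at least `threshold`.

    posts_with_values: iterable of (post, opinion_value) tuples (assumed input).
    """
    return [post for post, value in posts_with_values if value >= threshold]
```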
206. Calculate an uncertainty score of each piece of the to-be-evaluated information.
In this embodiment, an information uncertainty evaluation model may first be trained to classify the uncertain content contained in information. For example, the uncertain content contained in information may be classified as follows:
Type | Clue word or phrase | Example sentence
Question type | really | Does the P7 really use a HiSilicon chip?
Hearsay type | it is said | It is said that the P7 is on sale in Europe.
Wish type | really wish | I really wish I had a P7 right now.
Belief type | believe | I believe I will have a P7 some day.
Conditional type | if | If I get a raise, I will buy a P7.
Possibility type | should | I should be able to buy a P7 phone.
In a specific implementation, each piece of to-be-evaluated information is checked online against the clue words or phrases to determine the categories of uncertain content it contains. A category score is then calculated for each category of uncertain content contained in each piece of to-be-evaluated information. Finally, the category score of each category is multiplied by a preset weight, and the products are summed to obtain the uncertainty score of each piece of to-be-evaluated information.
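Clue-word detection can be sketched as a lookup over a small table. The following Python example is only an illustration: the English clue phrases and category names are stand-ins for the original-language clue words, and whole-word regular expression matching is an assumed matching strategy rather than the trained model the embodiment describes.

```python
import re

# Illustrative English stand-ins for the clue words of each category.
CLUE_WORDS = {
    "question": ["really"],
    "hearsay": ["it is said"],
    "wish": ["really wish"],
    "belief": ["believe"],
    "conditional": ["if"],
    "possibility": ["should"],
}

def detect_uncertainty_categories(text):
    """Return the sorted category names whose clue phrases occur in `text`."""
    lowered = text.lower()
    found = set()
    for category, clues in CLUE_WORDS.items():
        for clue in clues:
            # \b keeps "if" from matching inside words such as "life".
            if re.search(r"\b" + re.escape(clue) + r"\b", lowered):
                found.add(category)
    return sorted(found)
```

Because some clue phrases overlap (here "really" and "really wish"), a single sentence can be assigned to more than one category, which is consistent with the multi-category scoring described next.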
For example, a given piece of to-be-evaluated information may belong to several categories at once, say categories A, B, and C. The model assigns a score to each category, and a higher score indicates a higher likelihood that the information belongs to that category. If the category scores calculated for this piece of information are $S_a$, $S_b$, and $S_c$, its final uncertainty score is $H = W_a S_a + W_b S_b + W_c S_c$, where $W_a$, $W_b$, and $W_c$ are weight coefficients. The three weight coefficients may differ, for example by presetting a weight coefficient for each category as required, or they may all take the same value.
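The weighted sum can be written directly. In this Python sketch the category scores and weights are passed as dictionaries keyed by category name; the function name and the numeric values in the usage note are made up for illustration.

```python
def uncertainty_score(category_scores, weights):
    """Weighted sum of per-category scores: H = sum of W_c * S_c."""
    return sum(weights[category] * score
               for category, score in category_scores.items())
```

For instance, with scores `{"A": 0.5, "B": 0.2, "C": 0.1}` and weights `{"A": 1.0, "B": 2.0, "C": 3.0}`, the score is 0.5 + 0.4 + 0.3, or approximately 1.2 in floating point.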
207. Calculate the credibility of the publisher of each piece of the to-be-evaluated information.
The credibility of an information publisher is calculated mainly from the publisher's features on the social network, for example the number of microblog posts published, whether the account is verified, and the user level. For the specific evaluation method, refer to existing methods; details are not repeated here.
208. Count the proportion of supporting opinions in the comments on each piece of the to-be-evaluated information.
209. Input the uncertainty score of each piece of the to-be-evaluated information, the credibility of its publisher, and the proportion of supporting opinions in its comments into a pre-trained quantitative evaluation model for calculation, where the output of the quantitative evaluation model is a credibility order of the pieces of to-be-evaluated information.
Specifically, in the calculation, the higher the uncertainty score of a piece of to-be-evaluated information, the lower its credibility; the lower the credibility of its publisher, the lower its credibility; and the smaller the proportion of supporting opinions in its comments, and/or the more that proportion shrinks over time, the lower its credibility.
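The text does not specify the form of the pre-trained quantitative evaluation model. The Python sketch below is therefore only a stand-in: it uses a hypothetical linear score whose signs follow the stated monotonic relationships (higher uncertainty lowers credibility; higher publisher credibility and a higher supporting proportion raise it), with arbitrary weights, and it assumes each comment has already been labeled "support" or otherwise. The change of the supporting proportion over time is not modeled here.

```python
def support_ratio(comment_labels):
    """Proportion of comments labeled "support" (0.0 when there are none)."""
    if not comment_labels:
        return 0.0
    supporting = sum(1 for label in comment_labels if label == "support")
    return supporting / len(comment_labels)

def rank_by_credibility(items, w_uncertainty=1.0, w_publisher=1.0, w_support=1.0):
    """Order items from most to least credible using a hypothetical linear score.

    This linear form is an assumption standing in for the pre-trained model.
    """
    def score(item):
        return (-w_uncertainty * item["uncertainty"]
                + w_publisher * item["publisher_credibility"]
                + w_support * support_ratio(item["comment_labels"]))
    return sorted(items, key=score, reverse=True)
```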
It should be noted that, in a specific implementation, steps 206 to 208 have no required order of execution and may be performed in parallel.
In this embodiment, after the social media information related to the current topic is obtained, the constructed theme dictionary and the emotion dictionary are used to calculate the similarity between each viewpoint word pair and the social media information and its comments, so that social information that subjectively and clearly expresses a viewpoint is extracted for evaluation. During evaluation, the uncertainty score of each piece of to-be-evaluated information, the credibility of its publisher, and the proportion of supporting opinions in its comments are input into the pre-trained quantitative evaluation model for calculation. Evaluating each piece of to-be-evaluated information with these three kinds of data increases the accuracy of the evaluation.
In practical applications, the evaluation result makes it possible to understand accurately the hot topics that users of a social network follow, users' views on a given event, or users' demand for a given product, so that information can be recommended to users or the product can be improved in a targeted way to improve user experience.
The information credibility evaluation apparatus provided by the embodiments of the present invention is described below. Referring to Fig. 3, the apparatus 300 of this embodiment comprises:
A first acquiring unit 301, configured to acquire to-be-evaluated information;
A first calculating unit 302, configured to calculate the uncertainty score of each piece of to-be-evaluated information;
A second calculating unit 303, configured to calculate the credibility of the publisher of each piece of to-be-evaluated information;
A statistics unit 304, configured to count the proportion of supporting opinions in the comments of each piece of to-be-evaluated information;
A credibility evaluation unit 305, configured to input the uncertainty score of each piece of to-be-evaluated information, the credibility of the publisher of each piece of to-be-evaluated information, and the proportion of supporting opinions in the comments of each piece of to-be-evaluated information into a pre-trained quantitative evaluation model for calculation, the output of the quantitative evaluation model being the credibility order of each piece of to-be-evaluated information.
In this embodiment, the credibility evaluation unit uses the uncertainty score of each piece of to-be-evaluated information, the credibility of its publisher, and the proportion of supporting opinions in its comments as the inputs of the quantitative evaluation model. Evaluating each piece of to-be-evaluated information with these three kinds of data increases the accuracy of the evaluation.
For ease of understanding, the information credibility evaluation apparatus of the present invention is described below with a specific embodiment. Referring to Fig. 4, the apparatus 400 of this embodiment comprises:
A dictionary construction unit 401, configured to build a theme dictionary related to the current topic;
A word pair forming unit 402, configured to combine each theme word in the theme dictionary with each emotion word in an emotion dictionary to form viewpoint word pairs;
A second acquiring unit 403, configured to acquire social media information related to the current topic;
A third calculating unit 404, configured to calculate the viewpoint value of each piece of social media information according to the similarity between each viewpoint word pair and each piece of social media information and the similarity between each viewpoint word pair and the comments of each piece of social media information;
An information filtering unit 405, configured to filter out the social media information whose viewpoint value is less than a preset threshold and to use the remaining social media information as the to-be-evaluated information;
A first acquiring unit 406, configured to acquire the to-be-evaluated information;
A first calculating unit 407, configured to calculate the uncertainty score of each piece of to-be-evaluated information;
A second calculating unit 408, configured to calculate the credibility of the publisher of each piece of to-be-evaluated information;
A statistics unit 409, configured to count the proportion of supporting opinions in the comments of each piece of to-be-evaluated information;
A credibility evaluation unit 410, configured to input the uncertainty score of each piece of to-be-evaluated information, the credibility of the publisher of each piece of to-be-evaluated information, and the proportion of supporting opinions in the comments of each piece of to-be-evaluated information into a pre-trained quantitative evaluation model for calculation, the output of the quantitative evaluation model being the credibility order of each piece of to-be-evaluated information.
In addition, the dictionary construction unit 401 specifically comprises a search subunit 4011, a statistics subunit 4012 and a dictionary building subunit 4013, wherein:
The search subunit 4011 is configured to search a social network for social media information related to the current topic;
The statistics subunit 4012 is configured to extract the keywords in the social media information and count the frequency with which each keyword occurs;
The dictionary building subunit 4013 is configured to select a preset number of keywords in descending order of frequency as theme words to build the theme dictionary.
For further understanding, the interaction between the units of the information credibility evaluation apparatus 400 of this embodiment is described below with a practical application scenario, as follows:
First, the dictionary construction unit 401 builds the theme dictionary related to the current topic by word frequency. Specifically, the search subunit 4011 searches the social network for social media information related to the current topic; the statistics subunit 4012 then extracts the keywords in the social media information found by the search subunit 4011 and counts the frequency with which each keyword occurs; and the dictionary building subunit 4013 selects a preset number of keywords in descending order of frequency as theme words to build the theme dictionary.
In a concrete example, suppose Huawei has released the P7 mobile phone and social media information related to the P7 quickly appears on social networks. The search subunit 4011 searches for social media information related to the current topic, the P7 mobile phone. The statistics subunit 4012 extracts keywords from the information found, such as Huawei, screen, HiSilicon and Xiaomi, and counts the frequency with which each keyword occurs. The dictionary building subunit 4013 then selects the preset number of keywords with the highest frequency as theme words to build the theme dictionary.
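The frequency-based construction described above can be sketched as follows. The sketch assumes the posts are already tokenized by whitespace and uses a small hand-written stopword set; both are simplifications for illustration (Chinese text would need a word segmenter), and the function name and sample posts are hypothetical.

```python
from collections import Counter


def build_theme_dictionary(posts, top_n, stopwords=frozenset()):
    """Return the top_n most frequent keywords across the posts."""
    counts = Counter(
        token
        for post in posts
        for token in post.lower().split()
        if token not in stopwords
    )
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical posts about the current topic.
posts = [
    "huawei p7 screen looks great",
    "p7 screen is sharp",
    "huawei p7 uses a hisilicon chip",
]
theme_words = build_theme_dictionary(
    posts, top_n=3, stopwords={"is", "a", "uses", "looks"}
)
print(theme_words)
```

The `top_n` argument plays the role of the preset number of keywords mentioned in the text.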
In addition, in other examples, the dictionary construction unit 401 may also use a commonly used latent topic model to build the theme dictionary related to the current topic.
The word pair forming unit 402 combines each theme word in the theme dictionary built by the dictionary construction unit 401 with each emotion word in the emotion dictionary to form viewpoint word pairs. The emotion dictionary may be an existing mainstream sentiment dictionary. Each viewpoint word pair consists of one theme word and one emotion word, for example <appearance, beautiful> or <HiSilicon, proud>.
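Forming the word pairs amounts to taking the Cartesian product of the two dictionaries. A minimal sketch, with both word lists invented for illustration:

```python
from itertools import product

# Hypothetical dictionary contents.
theme_words = ["appearance", "HiSilicon"]
emotion_words = ["beautiful", "proud"]

# Every theme word is combined with every emotion word.
pairs = list(product(theme_words, emotion_words))
print(pairs)
```

With m theme words and n emotion words this yields m * n pairs, so the size of both dictionaries directly bounds the cost of the later similarity calculations.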
The second acquiring unit 403 acquires the social media information related to the current topic. In a specific implementation, the second acquiring unit 403 may use the keywords of the current topic as input to search social media and crawl the social media information related to the current topic.
The similarity of any two words A and B is sim(A, B) = (A_0 · B_0) / (||A_0|| ||B_0||), where A_0 and B_0 denote the word vectors of words A and B respectively, ||A_0|| denotes the norm of A_0, and ||B_0|| denotes the norm of B_0.
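A word similarity defined from two word vectors and their norms is most naturally read as cosine similarity. The sketch below assumes that reading; the function name and the zero-vector fallback are illustrative choices, not details from this document.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two word vectors given as sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumed convention: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Parallel vectors score close to 1 and orthogonal vectors score 0, so a higher value means the two words are used in more similar contexts.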
The third calculating unit 404 calculates the viewpoint value of each piece of social media information according to the similarity between each viewpoint word pair and each piece of social media information and the similarity between each viewpoint word pair and the comments of each piece of social media information.
Specifically, using the similarity formula above, the third calculating unit 404 calculates the similarity between the theme word of a viewpoint word pair and each keyword in a piece of social media information and takes the maximum similarity a; it likewise calculates the similarity between the theme word of the viewpoint word pair and each keyword in the comments of that social media information and takes the maximum similarity x;
Next, using the same formula, the third calculating unit 404 calculates the similarity between the emotion word of the viewpoint word pair and each emotion word in the social media information and takes the maximum similarity b; it likewise calculates the similarity between the emotion word of the viewpoint word pair and each emotion word in the comments of that social media information and takes the maximum similarity y;
The similarity between the viewpoint word pair and the social media information is s1 = λa + (1-λ)b, where λ is a preset value greater than 0 and less than 1. The similarity between the viewpoint word pair and the comments of the social media information is s2 = μx + (1-μ)y, where μ is a preset value greater than 0 and less than 1.
The third calculating unit 404 adds s1 and s2 to obtain one viewpoint sub-value of the social media information;
The third calculating unit 404 processes every viewpoint word pair in the same way to obtain all viewpoint sub-values of the social media information, and sums all the viewpoint sub-values to obtain the viewpoint value of the social media information. The viewpoint value of every piece of social media information is obtained in the same manner.
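The calculation above can be put together as a short sketch. It assumes cosine similarity between word vectors, treats an empty keyword or emotion-word list as contributing a similarity of 0, and uses a toy `vec` lookup table; all names, vectors and the default values of `lam` and `mu` are illustrative assumptions.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def max_sim(word, candidates, vec):
    """Largest similarity between `word` and any candidate (0.0 if none)."""
    return max((cosine(vec[word], vec[c]) for c in candidates), default=0.0)


def viewpoint_value(pairs, post_keywords, post_emotions,
                    comment_keywords, comment_emotions, vec, lam=0.5, mu=0.5):
    total = 0.0
    for theme, emotion in pairs:
        a = max_sim(theme, post_keywords, vec)       # theme word vs. post keywords
        b = max_sim(emotion, post_emotions, vec)     # emotion word vs. post emotion words
        x = max_sim(theme, comment_keywords, vec)    # theme word vs. comment keywords
        y = max_sim(emotion, comment_emotions, vec)  # emotion word vs. comment emotion words
        s1 = lam * a + (1 - lam) * b
        s2 = mu * x + (1 - mu) * y
        total += s1 + s2                             # one viewpoint sub-value per pair
    return total


# Toy word vectors, invented for the example.
vec = {"screen": [1.0, 0.0], "display": [1.0, 0.0],
       "good": [0.0, 1.0], "nice": [0.0, 1.0]}
value = viewpoint_value([("screen", "good")], ["display"], ["nice"],
                        ["display"], [], vec)
print(value)
```

In this toy case a, b and x are all 1 and y is 0, so s1 is 1.0, s2 is 0.5 and the single sub-value is 1.5.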
The information filtering unit 405 filters out the social media information whose viewpoint value is less than the preset threshold and uses the remaining social media information as the to-be-evaluated information. The first acquiring unit 406 acquires the social media information that remains after filtering by the information filtering unit 405.
In this embodiment, social media information whose viewpoint value is less than the preset threshold is considered not to express a viewpoint subjectively and clearly; for example, it merely states something or describes a product without any emotional color. This social media information is filtered out. Social media information whose viewpoint value is greater than or equal to the preset threshold is considered to express a viewpoint subjectively and clearly. Such information often becomes a focus of public opinion and affects how people perceive an event or product, so this embodiment uses it as the to-be-evaluated information and mainly evaluates its credibility.
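The filtering rule itself is a simple threshold test. A minimal sketch, with hypothetical post identifiers, viewpoint values and threshold:

```python
def filter_by_viewpoint_value(scored_posts, threshold):
    """Keep the posts whose viewpoint value is at least the threshold."""
    return [post for post, value in scored_posts if value >= threshold]


# Hypothetical (post id, viewpoint value) pairs.
scored_posts = [("post-1", 0.2), ("post-2", 1.5), ("post-3", 1.0)]
to_evaluate = filter_by_viewpoint_value(scored_posts, threshold=1.0)
print(to_evaluate)
```

A post whose value equals the threshold is kept, matching the "greater than or equal to" condition in the text.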
The first calculating unit 407 calculates the uncertainty score of each piece of to-be-evaluated information. In this embodiment, the first calculating unit 407 may first train an information uncertainty evaluation model that classifies the uncertain content contained in information. For example, the uncertain content contained in information may be classified as follows:
Type | Clue word or phrase | Example sentence
Question type | really | Is the P7's chip really a HiSilicon chip?
Hearsay type | it is said | It is said that the P7 is on sale in Europe
Wish type | really want | I really want to have a P7 right now
Belief type | believe | I believe I will have a P7 some day
Conditional type | if | If I get a raise, I can buy a P7
Possibility type | should | I should be able to buy a P7 mobile phone
In a specific implementation, the first calculating unit 407 checks each piece of to-be-evaluated information online against the clue words or phrases to determine the categories of uncertain content that it contains, then calculates a category score for each category of uncertain content contained in each piece of to-be-evaluated information, and finally multiplies each category score by its preset weight and sums the results to obtain the uncertainty score of each piece of to-be-evaluated information.
For example, a given piece of to-be-evaluated information may belong to several categories at once, say categories A, B and C. The model assigns a score to each category, and a higher score indicates a higher likelihood that the information belongs to that category. If the uncertainty scores assigned to these three categories are S_a, S_b and S_c respectively, the final uncertainty score calculated by the first calculating unit 407 is H = W_a*S_a + W_b*S_b + W_c*S_c, where W_a, W_b and W_c are weight coefficients. The three weight coefficients may differ, for example one weight coefficient may be set in advance for each category as required, or all three may take the same value.
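The clue-word detection that determines which categories an item belongs to can be sketched with simple pattern matching. The sketch covers only a subset of the categories, uses English stand-ins for the clue words, and matches whole words only; the category names and the `CLUES` table are illustrative assumptions, and the category scores themselves would come from the trained model rather than from this lookup.

```python
import re

# Illustrative English clue words for four of the uncertainty categories.
CLUES = {
    "hearsay": ["it is said"],
    "belief": ["believe"],
    "conditional": ["if"],
    "possibility": ["should"],
}


def detect_categories(text):
    """Return the set of uncertainty categories whose clue words occur in text."""
    lowered = text.lower()
    found = set()
    for category, clues in CLUES.items():
        # \b keeps "if" from matching inside longer words such as "life".
        if any(re.search(r"\b" + re.escape(clue) + r"\b", lowered) for clue in clues):
            found.add(category)
    return found


print(detect_categories("It is said that the P7 is on sale in Europe"))
print(detect_categories("If I get a raise, I should be able to buy a P7"))
```

The second sentence falls into two categories at once, which is the situation the weighted sum H is designed to handle.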
The second calculating unit 408 calculates the credibility of the publisher of each piece of to-be-evaluated information. The credibility of an information publisher is calculated mainly from the publisher's features on the social network, for example the number of microblog posts published, whether the account is verified, and the user level. The specific evaluation method may follow existing methods and is not described again here.
The statistics unit 409 counts the proportion of supporting opinions in the comments of each piece of to-be-evaluated information. The credibility evaluation unit 410 inputs the uncertainty score of each piece of to-be-evaluated information, the credibility of its publisher, and the proportion of supporting opinions in its comments into the pre-trained quantitative evaluation model for calculation; the output of the quantitative evaluation model is the credibility order of each piece of to-be-evaluated information. In the calculation performed by the credibility evaluation unit 410: the higher the uncertainty score of a piece of to-be-evaluated information, the lower its credibility; the lower the credibility of its publisher, the lower its credibility; and the smaller the proportion of supporting opinions in its comments, and/or the more that proportion shrinks over time, the lower its credibility.
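The support statistic and its change over time can be sketched as follows. The sketch assumes each comment has already been labeled with a stance ("support", "oppose" or "neutral") and grouped into consecutive time windows; how stances are assigned is not specified here, and the function names and labels are hypothetical.

```python
def support_ratio(stances):
    """Fraction of comments labeled 'support' (0.0 when there are no comments)."""
    if not stances:
        return 0.0
    return sum(1 for s in stances if s == "support") / len(stances)


def is_support_declining(windows):
    """True if the support ratio strictly decreases from each window to the next."""
    ratios = [support_ratio(w) for w in windows]
    return len(ratios) > 1 and all(
        later < earlier for earlier, later in zip(ratios, ratios[1:])
    )


# Hypothetical stance labels, grouped into three consecutive time windows.
windows = [
    ["support", "support"],
    ["support", "oppose"],
    ["oppose", "oppose"],
]
overall = support_ratio([s for w in windows for s in w])
print(overall, is_support_declining(windows))
```

Both a low overall ratio and a declining trend point toward lower credibility under the rules stated above.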
In this embodiment, after the second acquiring unit acquires the social media information related to the current topic, the third calculating unit uses the theme dictionary built by the dictionary construction unit and the emotion dictionary to calculate the similarity between each viewpoint word pair and the social media information and its comments, so that social information that subjectively and clearly expresses a viewpoint is extracted for evaluation. When the credibility evaluation unit evaluates each piece of to-be-evaluated information, it uses the uncertainty score of the information, the credibility of its publisher, and the proportion of supporting opinions in its comments as the inputs of the quantitative evaluation model. Evaluating each piece of to-be-evaluated information with these three kinds of data increases the accuracy of the evaluation.
Referring to Fig. 5, which is a schematic diagram of another embodiment of the information credibility evaluation apparatus of the present invention, the information credibility evaluation apparatus 500 of this embodiment may be used to implement the information credibility evaluation method provided by the above embodiments. In practical applications, the information credibility evaluation apparatus 500 may be integrated into an electronic device, which may be a mobile phone, a tablet computer or a similar device. Specifically:
The information credibility evaluation apparatus 500 may comprise components such as an RF (Radio Frequency) circuit 510, a memory 520 including one or more computer-readable storage media, an input unit 530, a display unit 540, a sensor 550, an audio circuit 560, a WiFi (wireless fidelity) module 570, a processor 580 including one or more processing cores, and a power supply 590. Those skilled in the art will understand that the structure shown in Fig. 5 does not limit the information credibility evaluation apparatus 500, which may comprise more or fewer components than shown, combine some components, or arrange the components differently. Wherein:
The RF circuit 510 may be used to receive and send signals during messaging or a call. In particular, after receiving downlink information from a base station, it passes the information to one or more processors 580 for processing; in addition, it sends uplink data to the base station. Generally, the RF circuit 510 includes but is not limited to an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 510 may communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System of Mobile communication), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), and the like.
Storer 520 can be used for storing software program and module, and processor 580 is stored in software program and the module of storer 520 by running, thus performs the application of various function and data processing.Storer 520 mainly can comprise storage program district and store data field, and wherein, storage program district can store operating system, application program (such as sound-playing function, image player function etc.) etc. needed at least one function; Storage data field can store and create data (such as voice data, phone directory etc.) according to the use of memory device.In addition, storer 520 can comprise high-speed random access memory, can also comprise nonvolatile memory, such as at least one disk memory, flush memory device or other volatile solid-state parts.Correspondingly, storer 520 can also comprise Memory Controller, to provide the access of processor 580 and input block 530 pairs of storeies 520.
Input block 530 can be used for the numeral or the character information that receive input, and produces and to arrange with user and function controls relevant keyboard, mouse, control lever, optics or trace ball signal and inputs.Particularly, input block 530 can comprise Touch sensitive surface 531 and other input equipments 532.Touch sensitive surface 531, also referred to as touch display screen or Trackpad, user can be collected or neighbouring touch operation (such as user uses any applicable object or the operations of annex on Touch sensitive surface 531 or near Touch sensitive surface 531 such as finger, stylus) thereon, and drive corresponding coupling arrangement according to the formula preset.Optionally, Touch sensitive surface 531 can comprise touch detecting apparatus and touch controller two parts.Wherein, touch detecting apparatus detects the touch orientation of user, and detects the signal that touch operation brings, and sends signal to touch controller; Touch controller receives touch information from touch detecting apparatus, and converts it to contact coordinate, then gives processor 580, and the order that energy receiving processor 580 is sent also is performed.In addition, the polytypes such as resistance-type, condenser type, infrared ray and surface acoustic wave can be adopted to realize Touch sensitive surface 531.Except Touch sensitive surface 531, input block 530 can also comprise other input equipments 532.Particularly, other input equipments 532 can include but not limited to one or more in physical keyboard, function key (such as volume control button, switch key etc.), trace ball, mouse, control lever etc.
The display unit 540 may be used to display information entered by the user, information provided to the user, and the various graphical user interfaces of the apparatus, which may be composed of graphics, text, icons, video and any combination thereof. The display unit 540 may comprise a display panel 541, which may optionally be configured as an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like. Further, the touch-sensitive surface 531 may cover the display panel 541. When the touch-sensitive surface 531 detects a touch operation on or near it, it passes the operation to the processor 580 to determine the type of touch event, and the processor 580 then provides a corresponding visual output on the display panel 541 according to the type of touch event. Although in Fig. 5 the touch-sensitive surface 531 and the display panel 541 implement the input and output functions as two separate components, in some embodiments the touch-sensitive surface 531 and the display panel 541 may be integrated to implement the input and output functions.
Information credibility apparatus for evaluating 500 also can comprise at least one sensor 550, such as optical sensor, motion sensor and other sensors.Particularly, optical sensor can comprise ambient light sensor and proximity transducer, and wherein, ambient light sensor the light and shade of environmentally light can regulate the brightness of display panel 541, proximity transducer when device 500 moves in one's ear, can cut out display panel 541 and/or backlight.As the one of motion sensor, Gravity accelerometer can detect the size of all directions (are generally three axles) acceleration, size and the direction of gravity can be detected time static, can be used for the application (such as horizontal/vertical screen switching, dependent game, magnetometer pose calibrating) of recognition device attitude, Vibration identification correlation function (such as passometer, knock) etc.; As for device 500 also other sensors such as configurable gyroscope, barometer, hygrometer, thermometer, infrared ray sensor, do not repeat them here.
Voicefrequency circuit 560, loudspeaker 561, microphone 562 can provide the audio interface between user and device.Voicefrequency circuit 560 can by receive voice data conversion after electric signal, be transferred to loudspeaker 561, by loudspeaker 561 be converted to voice signal export; On the other hand, the voice signal of collection is converted to electric signal by microphone 562, voice data is converted to after being received by voicefrequency circuit 560, after again voice data output processor 580 being processed, through RF circuit 510 to send to such as another device, or export voice data to storer 520 to process further.Voicefrequency circuit 560 also may comprise earphone jack, to provide the communication of peripheral hardware earphone and device.
WiFi belongs to short range wireless transmission technology, and information credibility apparatus for evaluating 500 can help user to send and receive e-mail by WiFi module 570, browse webpage and access streaming video etc., and its broadband internet wireless for user provides is accessed.Although Fig. 5 shows WiFi module 570, be understandable that, it does not belong to must forming of device, can omit in the scope of essence not changing invention as required completely.
The processor 580 is the control center of the information credibility evaluation apparatus. It connects the parts of the whole apparatus through various interfaces and lines, and performs the various functions of the apparatus and processes data by running or executing the software programs and/or modules stored in the memory 520 and calling the data stored in the memory 520, thereby monitoring the apparatus as a whole. Optionally, the processor 580 may comprise one or more processing cores. Preferably, the processor 580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 580.
The information credibility evaluation apparatus 500 also comprises a power supply 590 (such as a battery) that powers the components. Preferably, the power supply may be logically connected to the processor 580 through a power management system, so that charging, discharging and power consumption management are implemented through the power management system. The power supply 590 may also comprise any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
Although not shown, the information credibility evaluation apparatus 500 may also comprise a camera, a Bluetooth module and the like, which are not described here. Specifically, in this embodiment, the information credibility evaluation apparatus 500 comprises the memory 520 and one or more programs, where the one or more programs are stored in the memory 520 and are configured to be executed by one or more processors 580, the one or more programs containing instructions for performing the following operations:
Acquiring to-be-evaluated information;
Calculating the uncertainty score of each piece of to-be-evaluated information;
Calculating the credibility of the publisher of each piece of to-be-evaluated information;
Counting the proportion of supporting opinions in the comments of each piece of to-be-evaluated information;
Inputting the uncertainty score of each piece of to-be-evaluated information, the credibility of the publisher of each piece of to-be-evaluated information, and the proportion of supporting opinions in the comments of each piece of to-be-evaluated information into a pre-trained quantitative evaluation model for calculation, the output of the quantitative evaluation model being the credibility order of each piece of to-be-evaluated information.
It should be noted that the information credibility evaluation apparatus 500 provided by the embodiments of the present invention may also be used to implement the other functions of the above apparatus embodiments, which are not described again here.
It should also be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the present invention, a connection between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.
From the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware, and may of course also be implemented by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components and the like. Generally, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function may be diverse, such as analog circuits, digital circuits or dedicated circuits. For the present invention, however, a software implementation is the better embodiment in most cases. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc, and comprises several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform the methods described in the embodiments of the present invention.
The opinion mining-based social media information credibility evaluation method and apparatus provided by the embodiments of the present invention have been described in detail above. Those of ordinary skill in the art may, following the ideas of the embodiments of the present invention, make changes to the specific implementations and the scope of application. Therefore, the content of this description should not be construed as limiting the present invention.

Claims (12)

1. An opinion mining-based social media information credibility evaluation method, characterized by comprising:
Acquiring to-be-evaluated information;
Calculating the uncertainty score of each piece of to-be-evaluated information;
Calculating the credibility of the publisher of each piece of to-be-evaluated information;
Counting the proportion of supporting opinions in the comments of each piece of to-be-evaluated information;
Inputting the uncertainty score of each piece of to-be-evaluated information, the credibility of the publisher of each piece of to-be-evaluated information, and the proportion of supporting opinions in the comments of each piece of to-be-evaluated information into a pre-trained quantitative evaluation model for calculation, the output of the quantitative evaluation model being the credibility order of each piece of to-be-evaluated information.
2. The method according to claim 1, characterized in that, before the acquiring to-be-evaluated information, the method further comprises:
building a topic dictionary related to a current issue;
combining each topic word in the topic dictionary with each sentiment word in a sentiment dictionary to form opinion word pairs;
acquiring social media information related to the current issue;
calculating an opinion value of each piece of social media information according to a similarity between each opinion word pair and the piece of social media information and a similarity between each opinion word pair and comments on the piece of social media information; and
filtering out social media information whose opinion value is less than a preset threshold, and using the remaining social media information as the to-be-evaluated information.
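The pairing and filtering steps of claim 2 can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names and the example words are hypothetical, and the opinion values are taken as already computed.

```python
from itertools import product


def form_opinion_word_pairs(topic_words, sentiment_words):
    """Combine every topic word with every sentiment word (Cartesian product)."""
    return list(product(topic_words, sentiment_words))


def filter_by_opinion_value(posts, opinion_values, threshold):
    """Drop posts whose opinion value is less than the threshold; keep the rest."""
    return [post for post, value in zip(posts, opinion_values) if value >= threshold]


pairs = form_opinion_word_pairs(["earthquake", "rescue"], ["true", "fake"])
kept = filter_by_opinion_value(["p1", "p2", "p3"], [0.2, 1.5, 1.0], threshold=1.0)
```

A post whose opinion value equals the threshold is kept, since the claim filters out only values that are less than the threshold.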
3. The method according to claim 2, characterized in that the building a topic dictionary related to a current issue specifically comprises:
searching a social network for social media information related to the current issue;
extracting keywords from the social media information and counting a frequency of occurrence of each keyword; and
selecting a preset number of keywords in descending order of frequency as topic words to build the topic dictionary.
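The frequency-based selection in claim 3 might look like the sketch below. It assumes that keyword extraction has already been performed and that each post is represented as a list of keywords; the function name and the sample keywords are hypothetical.

```python
from collections import Counter


def build_topic_dictionary(keyword_lists, size):
    """Pick the `size` most frequent keywords across all posts as topic words.

    keyword_lists: one list of extracted keywords per post (extraction assumed done).
    """
    counts = Counter(kw for keywords in keyword_lists for kw in keywords)
    # most_common returns (keyword, count) pairs sorted by descending count.
    return [kw for kw, _ in counts.most_common(size)]


topic_dictionary = build_topic_dictionary(
    [["quake", "rescue"], ["quake", "aid"], ["quake", "rescue"]], size=2
)
```

The claim does not say how ties in frequency are broken; in this sketch they fall back to the order in which `Counter` first saw the keywords.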
4. The method according to claim 2, characterized in that the calculating an opinion value of each piece of social media information according to a similarity between each opinion word pair and the piece of social media information and a similarity between each opinion word pair and comments on the piece of social media information specifically comprises:
calculating a similarity between the topic word of an opinion word pair and each keyword in a piece of social media information, and extracting the maximum similarity a; calculating a similarity between the topic word of the opinion word pair and each keyword in the comments on the piece of social media information, and extracting the maximum similarity x;
calculating a similarity between the sentiment word of the opinion word pair and each sentiment word in the piece of social media information, and extracting the maximum similarity b; calculating a similarity between the sentiment word of the opinion word pair and each sentiment word in the comments on the piece of social media information, and extracting the maximum similarity y;
wherein the similarity between the opinion word pair and the piece of social media information is s1 = λa + (1-λ)b, where λ is greater than 0 and less than 1, and the similarity between the opinion word pair and the comments on the piece of social media information is s2 = μx + (1-μ)y, where μ is greater than 0 and less than 1;
adding the similarity s1 between the opinion word pair and the piece of social media information to the similarity s2 between the opinion word pair and the comments on the piece of social media information, to obtain an opinion sub-value of the piece of social media information; and
performing the same processing for each opinion word pair to obtain all opinion sub-values of the piece of social media information, and summing all the opinion sub-values to obtain the opinion value of the piece of social media information; and, by analogy, obtaining the opinion value of each piece of social media information.
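The calculation in claim 4 can be sketched as follows. The word-level similarity measure is not specified in this claim, so it is passed in as a parameter `sim`; the `exact_match` stand-in, the default values of 0.5 for λ and μ, and the choice of 0.0 as the maximum over an empty word list are assumptions made only for illustration.

```python
def max_similarity(word, candidates, sim):
    """Largest similarity between `word` and any candidate (0.0 if none: an assumption)."""
    return max((sim(word, c) for c in candidates), default=0.0)


def opinion_value(pairs, post_keywords, post_sentiments,
                  comment_keywords, comment_sentiments, sim, lam=0.5, mu=0.5):
    """Sum the opinion sub-values s1 + s2 over all opinion word pairs."""
    total = 0.0
    for topic_word, sentiment_word in pairs:
        a = max_similarity(topic_word, post_keywords, sim)           # topic vs. post
        x = max_similarity(topic_word, comment_keywords, sim)        # topic vs. comments
        b = max_similarity(sentiment_word, post_sentiments, sim)     # sentiment vs. post
        y = max_similarity(sentiment_word, comment_sentiments, sim)  # sentiment vs. comments
        s1 = lam * a + (1 - lam) * b
        s2 = mu * x + (1 - mu) * y
        total += s1 + s2
    return total


def exact_match(w1, w2):
    """Hypothetical stand-in similarity: 1.0 for identical words, otherwise 0.0."""
    return 1.0 if w1 == w2 else 0.0


value = opinion_value([("quake", "fake")], ["quake"], ["true"], ["aid"], ["fake"], exact_match)
```

In the usage line, a = 1, b = 0, x = 0 and y = 1, so s1 = 0.5 and s2 = 0.5, and the single sub-value gives an opinion value of 1.0.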
5. The method according to any one of claims 1 to 4, characterized in that the calculating an uncertainty score of each piece of the to-be-evaluated information comprises:
determining categories of uncertain content contained in each piece of the to-be-evaluated information;
calculating a category score of each category of uncertain content contained in each piece of the to-be-evaluated information; and
multiplying the category score of each category of uncertain content contained in each piece of the to-be-evaluated information by a preset weight, and summing the products to obtain the uncertainty score of each piece of the to-be-evaluated information.
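The weighted sum in claim 5 can be written as a short sketch. The category names and the weight values below are hypothetical placeholders; the claim only requires that each category score be multiplied by a preset weight and that the products be summed.

```python
def uncertainty_score(category_scores, weights):
    """Weighted sum of per-category uncertainty scores.

    category_scores: dict mapping category name -> category score for one post.
    weights: dict mapping category name -> preset weight (hypothetical values).
    """
    return sum(score * weights[category] for category, score in category_scores.items())


score = uncertainty_score(
    {"hedge": 2.0, "hearsay": 1.0},   # hypothetical categories and scores
    {"hedge": 0.5, "hearsay": 2.0},   # hypothetical preset weights
)
```

With these example numbers the result is 2.0 × 0.5 + 1.0 × 2.0 = 3.0.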
6. The method according to any one of claims 1 to 4, characterized in that, in the process of inputting the uncertainty score of each piece of the to-be-evaluated information, the credibility of the publisher of each piece of the to-be-evaluated information, and the proportion of supporting opinions in the comments on each piece of the to-be-evaluated information into the pre-trained quantitative evaluation model for calculation: a higher uncertainty score of the to-be-evaluated information indicates lower credibility of the to-be-evaluated information; lower credibility of the publisher of the to-be-evaluated information indicates lower credibility of the to-be-evaluated information; and a smaller proportion of supporting opinions in the comments on the to-be-evaluated information, and/or a proportion of supporting opinions in the comments on the to-be-evaluated information that becomes smaller and smaller over time, indicates lower credibility of the to-be-evaluated information.
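The monotonic relationships in claim 6 can be illustrated with a simple stand-in for the quantitative evaluation model. This is not the pre-trained model of the claims, whose form is not specified here; it is a hypothetical linear score with made-up weights, chosen only so that a higher uncertainty score, lower publisher credibility, or a smaller proportion of supporting opinions each lowers the result. The change of the supporting proportion over time is not modeled in this sketch.

```python
def credibility_score(uncertainty, publisher_credibility, support_ratio,
                      w_u=1.0, w_p=1.0, w_s=1.0):
    """Hypothetical linear stand-in: decreasing in uncertainty, increasing in the others."""
    return -w_u * uncertainty + w_p * publisher_credibility + w_s * support_ratio


def rank_by_credibility(items):
    """Return item ids ordered from most to least credible.

    items: dict mapping id -> (uncertainty, publisher_credibility, support_ratio).
    """
    return sorted(items, key=lambda item_id: credibility_score(*items[item_id]), reverse=True)


order = rank_by_credibility({
    "a": (0.9, 0.2, 0.1),
    "b": (0.1, 0.8, 0.9),
    "c": (0.5, 0.5, 0.5),
})
```

Any model that preserves these three directions would satisfy the relationships stated in the claim; the linear form is simply the shortest one to write down.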
7. An opinion mining-based social media information credibility evaluation apparatus, characterized by comprising:
a first acquiring unit, configured to acquire to-be-evaluated information;
a first calculating unit, configured to calculate an uncertainty score of each piece of the to-be-evaluated information;
a second calculating unit, configured to calculate credibility of a publisher of each piece of the to-be-evaluated information;
a counting unit, configured to count a proportion of supporting opinions in comments on each piece of the to-be-evaluated information; and
a credibility evaluation unit, configured to input the uncertainty score of each piece of the to-be-evaluated information, the credibility of the publisher of each piece of the to-be-evaluated information, and the proportion of supporting opinions in the comments on each piece of the to-be-evaluated information into a pre-trained quantitative evaluation model for calculation, wherein an output of the quantitative evaluation model is a credibility order of the pieces of to-be-evaluated information.
8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
a dictionary building unit, configured to build a topic dictionary related to a current issue;
a word pair forming unit, configured to combine each topic word in the topic dictionary with each sentiment word in a sentiment dictionary to form opinion word pairs;
a second acquiring unit, configured to acquire social media information related to the current issue;
a third calculating unit, configured to calculate an opinion value of each piece of social media information according to a similarity between each opinion word pair and the piece of social media information and a similarity between each opinion word pair and comments on the piece of social media information; and
an information filtering unit, configured to filter out social media information whose opinion value is less than a preset threshold, and use the remaining social media information as the to-be-evaluated information.
9. The apparatus according to claim 8, characterized in that the dictionary building unit specifically comprises:
a search subunit, configured to search a social network for social media information related to the current issue;
a counting subunit, configured to extract keywords from the social media information and count a frequency of occurrence of each keyword; and
a dictionary building subunit, configured to select a preset number of keywords in descending order of frequency as topic words to build the topic dictionary.
10. The apparatus according to claim 8, characterized in that the third calculating unit is specifically configured to:
calculate a similarity between the topic word of an opinion word pair and each keyword in a piece of social media information, and extract the maximum similarity a; calculate a similarity between the topic word of the opinion word pair and each keyword in the comments on the piece of social media information, and extract the maximum similarity x;
calculate a similarity between the sentiment word of the opinion word pair and each sentiment word in the piece of social media information, and extract the maximum similarity b; calculate a similarity between the sentiment word of the opinion word pair and each sentiment word in the comments on the piece of social media information, and extract the maximum similarity y;
wherein the similarity between the opinion word pair and the piece of social media information is s1 = λa + (1-λ)b, where λ is greater than 0 and less than 1, and the similarity between the opinion word pair and the comments on the piece of social media information is s2 = μx + (1-μ)y, where μ is greater than 0 and less than 1;
add the similarity s1 between the opinion word pair and the piece of social media information to the similarity s2 between the opinion word pair and the comments on the piece of social media information, to obtain an opinion sub-value of the piece of social media information; and
perform the same processing for each opinion word pair to obtain all opinion sub-values of the piece of social media information, and sum all the opinion sub-values to obtain the opinion value of the piece of social media information; and, by analogy, obtain the opinion value of each piece of social media information.
11. The apparatus according to any one of claims 7 to 10, characterized in that the first calculating unit is specifically configured to:
determine categories of uncertain content contained in each piece of the to-be-evaluated information;
calculate a category score of each category of uncertain content contained in each piece of the to-be-evaluated information; and
multiply the category score of each category of uncertain content contained in each piece of the to-be-evaluated information by a preset weight, and sum the products to obtain the uncertainty score of each piece of the to-be-evaluated information.
12. The apparatus according to any one of claims 7 to 10, characterized in that, in the process in which the credibility evaluation unit inputs the uncertainty score of each piece of the to-be-evaluated information, the credibility of the publisher of each piece of the to-be-evaluated information, and the proportion of supporting opinions in the comments on each piece of the to-be-evaluated information into the pre-trained quantitative evaluation model for calculation: a higher uncertainty score of the to-be-evaluated information indicates lower credibility of the to-be-evaluated information; lower credibility of the publisher of the to-be-evaluated information indicates lower credibility of the to-be-evaluated information; and a smaller proportion of supporting opinions in the comments on the to-be-evaluated information, and/or a proportion of supporting opinions in the comments on the to-be-evaluated information that becomes smaller and smaller over time, indicates lower credibility of the to-be-evaluated information.
CN201410436605.5A 2014-08-29 2014-08-29 Opinion mining-based social media information credibility evaluation method and apparatus Active CN105447036B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410436605.5A CN105447036B (en) 2014-08-29 2014-08-29 Opinion mining-based social media information credibility evaluation method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410436605.5A CN105447036B (en) 2014-08-29 2014-08-29 Opinion mining-based social media information credibility evaluation method and apparatus

Publications (2)

Publication Number Publication Date
CN105447036A true CN105447036A (en) 2016-03-30
CN105447036B CN105447036B (en) 2019-08-16

Family

ID=55557224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410436605.5A Active CN105447036B (en) 2014-08-29 2014-08-29 Opinion mining-based social media information credibility evaluation method and apparatus

Country Status (1)

Country Link
CN (1) CN105447036B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101404591A (en) * 2008-11-14 2009-04-08 西安交通大学 Self-adapting dynamic trust weight estimation method
CN101404572A (en) * 2008-11-14 2009-04-08 西安交通大学 Network node total trust degree estimation method based on feedback trust aggregation
CN101466098A (en) * 2009-01-21 2009-06-24 中国人民解放军信息工程大学 Method, device and communication system for evaluating network trust degree
WO2013082395A1 (en) * 2011-12-01 2013-06-06 Google Inc Identifying recommended merchants
US20130227700A1 (en) * 2012-02-28 2013-08-29 Disney Enterprises, Inc. Dynamic Trust Score for Evaulating Ongoing Online Relationships
CN103390194A (en) * 2012-05-07 2013-11-13 北京三星通信技术研究有限公司 Method, device and system for predicating user intention and recommending suggestion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
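Li Yaping (李亚平): "Research on software quality measurement based on fuzzy credibility", Journal of Yangtze University (Natural Science Edition) *
Gao Ya (高雅): "Credibility evaluation of microblog news event information", China Master's Theses Full-text Database, Information Science and Technology *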

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105824805A (en) * 2016-05-09 2016-08-03 腾讯科技(深圳)有限公司 Identification method and device
CN106649433A (en) * 2016-09-05 2017-05-10 东南大学 Topic viewpoint strength calculating method based on viewpoint statement confidence level
CN107741938A (en) * 2016-10-13 2018-02-27 腾讯科技(深圳)有限公司 A kind of network information recognition methods and device
US10805255B2 (en) 2016-10-13 2020-10-13 Tencent Technology (Shenzhen) Company Limited Network information identification method and apparatus
CN106528813A (en) * 2016-11-18 2017-03-22 腾讯科技(深圳)有限公司 Multimedia recommendation method and apparatus
CN108074071B (en) * 2016-11-18 2021-06-18 腾讯科技(深圳)有限公司 Project data processing method and device
CN108074071A (en) * 2016-11-18 2018-05-25 腾讯科技(深圳)有限公司 A kind of project data processing method and processing device
CN106776551A (en) * 2016-12-06 2017-05-31 桂林电子科技大学 A kind of analysis method of english composition emotion viewpoint
CN106776551B (en) * 2016-12-06 2020-05-08 桂林电子科技大学 Method for analyzing emotion viewpoints of English composition
CN110537176A (en) * 2017-02-21 2019-12-03 索尼互动娱乐有限责任公司 Method for determining accuracy of news
CN106951408A (en) * 2017-03-17 2017-07-14 国信优易数据有限公司 A kind of data digging method
CN109299400A (en) * 2018-09-06 2019-02-01 北京奇艺世纪科技有限公司 A kind of viewpoint abstracting method, device and equipment
CN109508370A (en) * 2018-09-28 2019-03-22 北京百度网讯科技有限公司 Opinions Extraction method, equipment and storage medium
CN109508370B (en) * 2018-09-28 2022-07-08 北京百度网讯科技有限公司 Comment extraction method, comment extraction device and storage medium
CN110059190A (en) * 2019-04-18 2019-07-26 东南大学 A kind of user's real-time point of view detection method based on social media content and structure
CN112711650A (en) * 2019-10-24 2021-04-27 富驰律法(北京)科技有限公司 Public welfare litigation clue mining method and system
CN112711650B (en) * 2019-10-24 2024-04-12 富驰律法(北京)科技有限公司 Method and system for mining clues of public welfare litigation
CN111539562A (en) * 2020-04-10 2020-08-14 支付宝(杭州)信息技术有限公司 Data evaluation method and system based on model
CN112000709A (en) * 2020-07-17 2020-11-27 微梦创科网络科技(中国)有限公司 Method and device for batch mining of total exposure of social media information
CN112000709B (en) * 2020-07-17 2023-10-24 微梦创科网络科技(中国)有限公司 Social media information total exposure batch mining method and device

Also Published As

Publication number Publication date
CN105447036B (en) 2019-08-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant