CN104636408A - News authentication early warning method and system based on user generated content - Google Patents


Info

Publication number
CN104636408A
Authority
CN
China
Prior art keywords
news
user
information
content
early warning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410414956.6A
Other languages
Chinese (zh)
Other versions
CN104636408B (en)
Inventor
曹娟
吴波
谢菲
张勇东
苏宇
李锦涛
吕锐
曹学会
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XINHUA NEWS AGENCY
Institute of Computing Technology of CAS
Original Assignee
XINHUA NEWS AGENCY
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XINHUA NEWS AGENCY, Institute of Computing Technology of CAS filed Critical XINHUA NEWS AGENCY
Priority to CN201410414956.6A priority Critical patent/CN104636408B/en
Publication of CN104636408A publication Critical patent/CN104636408A/en
Application granted granted Critical
Publication of CN104636408B publication Critical patent/CN104636408B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a news authentication early warning method and system based on user generated content. The method includes the steps of obtaining reference data by conducting semantic extension on news clues; conducting oriented collection on the reference data so as to obtain information content, transmission modes, user groups and behavior information, attribute information and the like of the user groups, wherein the information content, the transmission modes, the user groups and the behavior information, attribute information and the like of the user groups are related to the news clues; conducting semantic knowledge extraction on the information content; conducting clustering and similarity calculation on semantic knowledge; recognizing the news clues which are not matched with a history news clue database; conducting credibility evaluation on the news clues on the aspects of semantic knowledge comparison, the user groups, the spreading modes, the information content and the like. Finally, the classified measurement and early warning of the UGC news authenticity are achieved, and the decision support is provided for judging whether news is true or not.

Description

News authentication early warning method and system based on user-generated content
Technical field
The present invention relates to the field of news authentication, and in particular to a news authentication early warning method and system based on user-generated content.
Background
With the widespread adoption of Internet technology and the rapid development of Web 2.0, ordinary users have become the main producers of content on the Internet. UGC is the abbreviation of User Generated Content, and UGC news is news event information that users spontaneously upload or share on social media (such as microblogs, blogs and social networks). Because UGC is timely and spreads quickly, it has also become a primary information source for traditional media. However, the barrier to producing UGC is low: any user can upload content to the Internet, and effective supervision of UGC is lacking. As a result, UGC contains a large amount of false news, which also causes difficulties for traditional news agencies when they publish UGC information.
In terms of research progress at home and abroad, on the one hand, current related research mainly addresses the credibility of UGC content (non-news content) or the credibility of contributions from traditional news media (non-UGC), and has established comprehensive, scientific index systems for evaluating information credibility, while research on UGC news remains a blank. On the other hand, these studies all start from the perspectives of communication studies, psychology and sociology, and carry out theoretical analysis by means of surveys. Applied research on UGC news authentication has only just begun, and no mature solution exists yet. Therefore, against the background that government and society rely more and more on Internet news resources while the credibility of Internet news is unsatisfactory, research on key technologies for authenticating Internet UGC news content, starting from the practical needs of UGC news clue authentication, has important research value.
Summary of the invention
To remedy the above deficiencies, the present invention provides a news authentication early warning method and system based on user-generated content. The object of the invention is, after a news clue and a news time point are input, to automatically mine and judge the credibility of the related news clue, and to present rich and intuitive authentication results, data and evidence to the user through visualized results.
To achieve the above object, the present invention extracts key authentication elements from Internet UGC news clues and authenticates their credibility from the aspects of user group, propagation mode and information content (multimedia content and text content). An early warning grading of UGC news authenticity is finally formed, providing decision support for judging whether the news is true.
A news authentication early warning system based on user-generated content provided by the present invention comprises:
A news clue semantic expansion module, for obtaining a news clue and performing semantic expansion on the news clue to obtain reference data;
A reference data targeted collection module, for performing targeted collection on the reference data to obtain the information content, the propagation mode, the user group, and the behavior information and attribute information of the user group;
A semantic knowledge extraction module, for extracting semantic knowledge from the information content;
A semantic knowledge comparison authentication module, for performing clustering and similarity calculation on the semantic knowledge and identifying news clues that do not match the historical news clue database;
A user group authentication module, for analyzing the behavior information and attribute information of the user group and extracting credibility authentication elements from the behavior information and attribute information, to obtain the credibility authentication result R(U) of the user group;
An information content authentication module, for performing logical comparison authentication on the information content and classifying the emotions and news viewpoints reflected by the information content, to obtain the information content authentication result R(M);
A propagation mode authentication module, for mining the propagation mode of the news viewpoints and detecting abnormal propagation modes, to obtain the propagation mode abnormality authentication result R(G);
An early warning grading module, for performing early warning grading on the credibility of the news clue according to these authentication results and their corresponding weights.
The news authentication early warning system based on user-generated content further comprises: an authentication result display module, for organizing and displaying in structured form the credibility authentication result R(U) of the user group, the information content authentication result R(M) and the propagation mode abnormality authentication result R(G).
In the news authentication early warning system based on user-generated content:
The reference data comprises news clue tags, source microblogs, and the corresponding microblog information links and author accounts;
The information content comprises text content and multimedia content;
The semantic knowledge comprises multimedia semantic knowledge and text semantic knowledge, where the multimedia semantic knowledge refers to visual fingerprints with high stability and distinctiveness extracted from the multimedia content;
The early warning grading is computed as $R = w_1 R(U) + w_2 R(G) + w_3 R(M)$, where $w_1$, $w_2$ and $w_3$ are weights.
In the news authentication early warning system based on user-generated content, the user group authentication module further comprises:
An identification module, for first identifying, on the microblog platform, the user group related to the news clue;
A characteristic analysis module, for performing in-depth analysis of the key figures in the user group and summarizing the behavior information and attribute information of the key figures;
A feature extraction module, for extracting from the behavior information and attribute information the user's follower count, following count, favorites count, mutual-follower count, geographic location, profile, followings, followers, tags, proportion of verified mutual followers and microblog count, and constructing a user group authentication analysis model.
In the news authentication early warning system based on user-generated content, the propagation mode authentication module further comprises:
An analysis module, for analyzing the source microblog of the news viewpoint and the user group involved in the propagation of the news viewpoint;
A filtering module, for obtaining the public opinion in comments and the propagation mode of the user group, detecting abnormal or conflicting news viewpoints, and further filtering the user group.
The present invention also provides a news authentication early warning method based on user-generated content, comprising:
S1, obtaining a news clue and performing semantic expansion on the news clue to obtain reference data;
S2, performing targeted collection on the reference data to obtain the information content, the propagation mode, the user group, and the behavior information and attribute information of the user group;
S3, extracting semantic knowledge from the information content;
S4, performing clustering and similarity calculation on the semantic knowledge, and identifying news clues that do not match the historical news clue database;
S5, analyzing the behavior information and attribute information of the user group, and extracting credibility authentication elements from the behavior information and attribute information, to obtain the credibility authentication result R(U) of the user group;
S6, performing logical mining on the information content, and classifying the emotions and news viewpoints reflected by the information content, to obtain the information content authentication result R(M);
S7, mining the propagation mode of the news viewpoints and detecting abnormal propagation modes, to obtain the propagation mode abnormality authentication result R(G);
S8, performing early warning grading on the news clue according to these authentication results and their corresponding weights.
The news authentication early warning method based on user-generated content further comprises:
Organizing and displaying in structured form the credibility authentication result R(U) of the user group, the information content authentication result R(M) and the propagation mode abnormality authentication result R(G).
In the news authentication early warning method based on user-generated content:
The reference data comprises news clue tags, source microblogs, and the corresponding microblog information links and author accounts;
The information content comprises text content and multimedia content;
The semantic knowledge comprises multimedia semantic knowledge and text semantic knowledge, where the multimedia semantic knowledge refers to visual fingerprints with high stability and distinctiveness extracted from the multimedia content;
The early warning grading is computed as $R = w_1 R(U) + w_2 R(G) + w_3 R(M)$, where $w_1$, $w_2$ and $w_3$ are weights.
In the news authentication method based on user-generated content, step S5 further comprises:
S51, first identifying, on the microblog platform, the user group related to the news clue;
S52, performing in-depth analysis of the key figures of the user group, and summarizing the behavior information and attribute information of the key figures;
S53, extracting from the behavior information and attribute information the user's follower count, following count, favorites count, mutual-follower count, geographic location, profile, followings, followers, tags, proportion of verified mutual followers and microblog count, and constructing a user group authentication analysis model.
In the news authentication method based on user-generated content, step S7 further comprises:
S71, analyzing the propagation source of the news viewpoint and the user group involved in the propagation of the news viewpoint;
S72, obtaining the public opinion in comments and the propagation mode of the user group, detecting abnormal information content and news viewpoints, and further filtering the user group.
As can be seen from the above scheme, the advantages of the present invention are:
First, the invention can grade the credibility of UGC news for early warning by deeply mining and analyzing UGC semantic knowledge;
Second, the method or system of the invention can reach an accuracy of about 70% in authenticating news events.
Brief description of the drawings
Fig. 1 is a framework diagram of the news authentication early warning system;
Fig. 2 is a flowchart of the news authentication early warning method.
Detailed description of the embodiments
Embodiments are given below, and the present invention is described in detail with reference to the accompanying drawings; the embodiments do not limit the invention.
The news authentication early warning system based on user-generated content provided by the invention performs semantic expansion on a news clue to obtain reference data; performs targeted collection on the reference data to obtain the information content related to the news clue, the propagation mode, the user group, and the behavior information and attribute information of the user group; extracts semantic knowledge from the information content; performs clustering and similarity calculation on the semantic knowledge to identify news clues that do not match the historical news clue database; and evaluates credibility from three aspects: user group, propagation mode and information content. A graded measurement and early warning of UGC news authenticity is finally formed, providing decision support for judging whether the news is true.
Referring to Fig. 1, which is a schematic diagram of the system framework of one embodiment of the invention, the news authentication early warning system based on user-generated content comprises a news clue semantic expansion module 1, a reference data targeted collection module 2, a knowledge extraction module 3, an authentication analysis module 4 and an authentication display module 5. The authentication analysis module comprises a semantic knowledge comparison authentication module 4a, a user group authentication module 4b, a propagation mode authentication module 4c and an information content authentication module 4d. The authentication display module comprises an early warning grading module 5a and an authentication result display module 5b.
News clue semantic expansion module 1 obtains a news clue from UGC news data sources and performs semantic expansion on it to obtain reference data. A UGC news authentication clue consists of news clue keywords or phrases together with the start time and end time within which the news may have occurred. Based on the clue data, representative clue tags, source microblogs and the corresponding microblog information links, author accounts and so on are obtained as reference data.
Reference data targeted collection module 2 performs targeted collection on the reference data to obtain in-depth information pages, including the information content, the propagation mode, the user group, and the behavior information and attribute information of the user group; the information content comprises text content and multimedia content. Specifically, a simulated login is performed first, and collection permission is obtained by analyzing Sina's information collection authorization. Second, having obtained authorization, the system uses the data services provided by the microblog platform. Finally, targeted collection is performed on the reference data: a microblog information collection system (a disclosed technique, patent application number 2013102981197) is used to gather comprehensive information in a short time, obtaining the information content of the news clue, the user group (disseminators), the behavior and attribute information of the user group, viewpoints, propagation modes and so on. Specifically, static page request links are constructed from the semantic knowledge together with microblog links, microblog IDs, user IDs and the like, and these links are sent to the microblog server to obtain the data in the static pages. The in-depth information pages related to the reference data are thus obtained; they include the microblogs related to the news clue, microblog comments and microblog repost links, as well as the personal information, followings and followers of each user related to the news clue.
The semantic knowledge extraction module extracts semantic knowledge from the information content. Specifically, in order to quickly find conflicting data between the news clue to be authenticated and the reference data, and to compare the two quickly and accurately, effective features need to be extracted from the reference data and refined into semantic knowledge. Further, the module performs relevance analysis on the news clue to be authenticated and the reference data, filters out data with low relevance, and retains data with high relevance for further comparison authentication. For text content, news elements such as keywords, persons, time, place, organizations and event titles are first extracted from the text as the semantic knowledge of the text content. For multimedia content, visual fingerprints with high stability and distinctiveness are extracted from pictures and videos as the semantic knowledge of the multimedia content.
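The relevance filtering described in this paragraph can be sketched as follows. The patent does not name the relevance measure, so this sketch assumes a Jaccard overlap between keyword sets; the function names `jaccard` and `filter_relevant` and the 0.2 threshold are hypothetical and chosen only for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard overlap between two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_relevant(clue_keywords, reference_items, threshold=0.2):
    """Keep the ids of reference items whose keyword overlap with the clue
    reaches the (assumed) threshold; lower-relevance items are dropped.

    reference_items: iterable of (item_id, keywords) pairs.
    """
    clue = set(clue_keywords)
    return [item_id for item_id, keywords in reference_items
            if jaccard(clue, set(keywords)) >= threshold]
```

For instance, `filter_relevant({"flood", "city"}, [("a", {"flood", "city", "rescue"}), ("b", {"football", "match"})])` keeps only item `"a"`, since item `"b"` shares no keyword with the clue.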
The authentication analysis module mainly analyzes the information content, the propagation mode, the user group, and the behavior information and attribute information of the user group, and obtains the authentication evaluation result of each part. The following four modules automatically mine the news clue from different aspects:
Semantic knowledge comparison authentication module 4a performs clustering and similarity calculation on the semantic knowledge, using single-pass clustering (Single Pass Cluster); the semantic knowledge is compared and checked against the historical news clue database to identify semantic knowledge that does not match it. False elements are screened by comparing text semantic knowledge and multimedia semantic knowledge elements, and example evidence for the authentication result is generated on this basis. Taking into account the role of historical news clue data in detecting whether news is false improves the efficiency of authenticating news that contains false elements. Specifically, the basic elements of news include semantic information such as persons, time and place. News elements also include the semantic knowledge extracted by the knowledge extraction module. By clustering and computing similarity over this semantic knowledge, false news that differs markedly from the historical news clue data can be identified more quickly; the historical news clue data refers to news data sources from before the news clue arose. Furthermore, news element comparison is carried out against the historical news clue database, including checking information such as related persons, related places, descriptions of related organizations, event start time, event end time, duration, text semantic knowledge and multimedia semantic knowledge.
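A minimal sketch of single-pass clustering over text semantic knowledge is given below. The patent names the clustering method but not the similarity measure or the threshold, so the cosine similarity over term counts, the summed-count centroid and the 0.5 threshold are assumptions; `cosine` and `single_pass_cluster` are hypothetical names.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def single_pass_cluster(docs, threshold=0.5):
    """Cluster token lists in one pass over the input.

    Each document joins the most similar existing cluster if the similarity
    reaches the threshold; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    centroids = []  # summed term counts per cluster
    clusters = []
    for i, tokens in enumerate(docs):
        vec = Counter(tokens)
        best, best_sim = None, 0.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(i)
            centroids[best].update(vec)  # fold the document into the centroid
        else:
            clusters.append([i])
            centroids.append(vec)
    return clusters
```

Under these assumptions, one way to use the result is to feed the historical items first and the new clue last: a clue that opens its own cluster has no close match in the history and could be flagged for closer checking.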
User group authentication module 4b analyzes the behavior information and attribute information of the user group and extracts credibility authentication elements from them, to obtain the credibility authentication result R(U) of the user group. Specifically, the behavioral characteristics of the user group are predicted from static, visible user social relationship data, and the credibility of the user group is judged accordingly. People are the core of a news event, so the module first identifies, by algorithm, the user group related to the news clue on the microblog platform. In-depth analysis is then performed on the key figures participating in the news event, including detailed analysis of each person's hobbies, online opinions, social circle and regional distribution, and the behavior information and attribute information of these persons are summarized, providing important clues for confirming whether the news event is true. The features extracted from the behavior information and attribute information include the user's follower count, following count, favorites count, mutual-follower count, geographic location, profile, followings, followers, tags, proportion of verified mutual followers and microblog count, and are used to evaluate user credibility. An evaluation model based on user group authentication analysis is finally formed, and the evaluation result is denoted R(U).
Information content authentication module 4c performs logical comparison authentication on the information content. Information content authentication includes logical authentication of attributes contained in the text content, such as time, place, organization and amount of information, and also logical comparison authentication of the multimedia content (pictures, videos and so on). The emotions and news viewpoints reflected by the information content are classified to obtain the information content authentication result R(M). Specifically, the module performs opinion mining, by sub-event, on all microblogs involved in the information content to obtain the attribute classification of the information content. The public opinion analysis and content classification of the information content on microblogs are finally formed and used to guide the credibility evaluation of the information content. An evaluation model based on information content authentication analysis is finally formed, yielding the information content evaluation result.
Propagation mode authentication module 4d mines the propagation mode of the news viewpoints and detects abnormal propagation modes, to obtain the propagation mode abnormality authentication result R(G). The credibility of the news-related event is judged by mining the interpersonal communication and information propagation related to the news clue. Specifically, the propagation source of the news viewpoint and the user group involved in its propagation are analyzed first. Then the public opinion in comments and the propagation mode of the user group are obtained, whether the propagation mode is abnormal is detected, and the user group is further filtered. An evaluation model based on propagation mode abnormality authentication analysis is finally formed, and the evaluation result is denoted R(G).
Early warning grading module 5a performs early warning grading on the news clue according to these evaluation results and their corresponding weights. Further, it integrates the evaluation results of the other modules, and summarizes and issues warnings on abnormal data such as information content, news viewpoints, propagation modes, user groups, and the behavior information and attribute information of user groups, providing data analysis, support and presentation for subsequent decisions. The warning takes the form of graded early warning with 10 levels; the higher the level, the lower the credibility of the information. Levels 1 to 5 are normal, and levels 6 to 10 indicate abnormal credibility. The evaluation result of each part is displayed separately; in addition, the overall evaluation result is expressed as $R = w_1 R(U) + w_2 R(G) + w_3 R(M)$, where $w_i$ is the weight of each part.
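The weighted combination and the ten-level grading can be sketched as below. The patent gives the weighted sum and the level ranges, but not the weight values or how R maps to a level. This sketch therefore assumes that each partial result has been normalized to a risk score in [0, 1] (higher meaning less credible) and that levels are ten equal-width bins; `combined_score`, `warning_level`, `is_abnormal` and the default weights are hypothetical.

```python
def combined_score(r_u, r_g, r_m, weights=(1 / 3, 1 / 3, 1 / 3)):
    """R = w1*R(U) + w2*R(G) + w3*R(M); the default weights are an assumption."""
    w1, w2, w3 = weights
    return w1 * r_u + w2 * r_g + w3 * r_m


def warning_level(r):
    """Map a risk score in [0, 1] to a level from 1 to 10 (assumed equal bins).

    A higher level means lower credibility, as in the description.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return min(10, int(r * 10) + 1)


def is_abnormal(level):
    """Levels 1 to 5 are normal; levels 6 to 10 indicate abnormal credibility."""
    return level >= 6
```

With weights `(0.5, 0.25, 0.25)` and partial scores 0.2, 0.4 and 0.6, the combined score is about 0.35, which this binning places at level 4, in the normal range.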
Authentication result display module 5b, based on the credibility authentication result R(U) of the user group, the information content authentication result R(M) and the propagation mode abnormality authentication result R(G), on the one hand displays the news elements and background, and on the other hand organizes the reference data and presents it in visual and structured form; the authentication result is output to the application service window for display.
As shown in Fig. 2, the invention also relates to a news authentication early warning method based on user-generated content, the method comprising:
Step S1, news clue semantic expansion: a news clue is obtained and semantic expansion is performed on it to obtain reference data;
Step S2, reference data targeted collection: targeted collection is performed on the reference data to obtain the information content, the propagation mode, the user group, and the behavior information and attribute information of the user group. The information content comprises text content and multimedia content. Specifically, this step comprises the following steps:
Step S21, simulated login. Collection permission is obtained through Sina's information collection authorization.
Step S22, after a successful login, the system, having obtained authorization, uses the data services provided by the microblog platform;
Step S23, targeted query expansion and reference data collection are performed on the news clue from three entry points (information content, propagation mode and user group) to obtain the in-depth information pages related to the news clue. These include the microblogs related to the news clue, microblog comments and microblog repost links, as well as the personal information, followings and followers of each user related to the news.
Step S3, semantic knowledge extraction: semantic knowledge is extracted from the information content. Specifically, in order to quickly find conflicting data between the news clue to be authenticated and the reference data, and to compare the two quickly and accurately, effective features need to be extracted from the reference data and refined into high-level event news elements.
Step S31, relevance analysis is performed on the news clue to be authenticated and the reference data; data with low relevance is filtered out, and data with high relevance is retained for further comparison authentication. For example, pictures related to the news clue to be authenticated are compared with reference data pictures, to discover whether a picture related to the news clue was obtained by modifying other historical pictures.
Step S32, high-level text semantic knowledge and multimedia semantic knowledge are refined from the reference data.
The information content comprises text content and multimedia content. For text content, news elements such as keywords, persons, time, place, organizations and event titles are first extracted from the text as the semantic knowledge of the text content. For multimedia content, visual fingerprints with high stability and distinctiveness are extracted from pictures and videos as the semantic knowledge of the multimedia content.
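To illustrate what a visual fingerprint comparison might look like, the sketch below uses an average hash, a common perceptual hashing technique. The patent does not specify its fingerprint algorithm, so this is a stand-in and not the patented method; `average_hash` and `hamming` are hypothetical names, and the input is assumed to be a small grayscale grid that has already been downscaled (for example to 8x8).

```python
def average_hash(pixels):
    """Fingerprint a grayscale grid: one bit per pixel, set when the pixel
    is brighter than the grid's mean brightness."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)


def hamming(a, b):
    """Number of differing bits between two equal-length fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(a, b))
```

Because each bit depends only on brightness relative to the mean, a uniformly brightened copy of a picture keeps the same fingerprint, while a picture with a different layout of light and dark regions yields a large Hamming distance. A small distance between a clue picture and a historical picture would suggest reuse or modification of the older picture.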
Step S4, semantic knowledge contrast certification, cluster and Similarity Measure are carried out to the semantic knowledge of the text and content of multimedia, this clustering method adopts Once-clustering (Single Pass Cluster), this semantic knowledge and history news clue database are contrasted and examines, identify and the unmatched news clue of history news clue data;
Step S5, user group authentication: analyze the behavior information and attribute information of the user group, extract credibility authentication elements from the behavior information and attribute information, and obtain the credibility authentication result R(U) of the user group. Behavior information refers to propagation behavior, comment behavior, relationship behavior, marking behavior, and the like. Attribute information refers to user location, user type, user tags, user profile, user age, user region, occupational history, education history, follower count, following count, favorites count, and mutual-follow count.
Step S51: first identify, on the microblog platform, the user group related to the news clue;
Step S52: perform in-depth analysis of the key figures in the user group, and summarize the behavior information and attribute information of these key figures;
Step S53: from the behavior information and attribute information, extract the follower count, following count, favorites count, mutual-follow count, user geographic location, user profile, the user's followees, the user's followers, user tags, the proportion of verified accounts among the user's mutual follows, and the user's microblog post count, and construct the user group authentication analysis model.
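As a rough illustration of step S53, the numeric attributes listed above can be gathered into a feature vector for a downstream model. The `UserProfile` fields, the `feature_vector` function, and the sample numbers are hypothetical; the text does not describe the model's input format or how R(U) is computed from these features.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserProfile:
    """Hypothetical container for the numeric attributes named in step S53."""
    followers: int
    following: int
    favorites: int
    posts: int
    mutual_follows: int
    verified_mutual_follows: int


def feature_vector(u: UserProfile) -> List[float]:
    """Flatten a profile into numbers, including the verified mutual-follow ratio."""
    ratio = u.verified_mutual_follows / u.mutual_follows if u.mutual_follows else 0.0
    return [
        float(u.followers),
        float(u.following),
        float(u.favorites),
        float(u.mutual_follows),
        ratio,
        float(u.posts),
    ]


# Illustrative values only.
vec = feature_vector(UserProfile(1000, 200, 50, 300, 40, 10))
```

Non-numeric attributes such as location, profile text, and tags would need their own encoding, which is left out of this sketch.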
Step S6, information content authentication: perform logic mining on the information content, classify semantic knowledge such as the sentiment and news viewpoints reflected by the information content, detect the degree of sentiment abnormality and the degree of news viewpoint abnormality, and obtain the information content authentication result R(M);
Information content logic mining includes logic mining of attributes contained in or referred to by the text content, such as time, place, related organizations or groups, and amount of information, and also includes logic comparison mining of the multimedia content (pictures, videos, etc.). Opinion mining is performed on all microblogs and sub-events involved in the news clue to obtain the category to which the information content belongs. This finally yields a public opinion analysis and content classification of the whole news clue on the microblog platform, which guides the reliability assessment of the information content. The end result is an assessment model based on information content authentication analysis, whose assessment result is denoted R(M).
Step S7: mine the propagation modes of the different news viewpoints, detect abnormal propagation modes, and obtain the propagation mode abnormality authentication result R(G). This comprises the following steps:
Step S71: analyze the propagation sources of the different news viewpoints produced for the news clue, and the user groups that promote the propagation of each viewpoint;
Step S72: by obtaining the comment public opinion of these user groups and the propagation modes of the news viewpoints, detect abnormal information content and news viewpoints.
Step S8: according to these assessment results and their corresponding weights, perform early warning grading on the news clue;
Combining the other authentication results, the information content, news viewpoints, propagation modes, user groups, and the behavior information and attribute information of the user groups for which the news clue to be authenticated conflicts with the historical news clue database are summarized, organized, and flagged for early warning, providing data analysis, support, and display for subsequent decision-making. The early warning takes a graded form with 10 levels; the higher the level, the lower the credibility of the information. Levels 1 to 5 are normal, and levels 6 to 10 indicate abnormal credibility. The assessment result of each part is displayed separately; in addition, the comprehensive assessment result is expressed as $R = w_1 R(U) + w_2 R(G) + w_3 R(M)$, where $w_i$ is the weight of each part.
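The weighted combination and the 10-level grading can be sketched as follows. The text gives the formula and the level ranges but not how a score maps to a level; this sketch assumes each partial result lies in [0, 1] with higher values meaning more abnormal, and assumes a simple ceiling mapping. The function names and sample weights are hypothetical.

```python
import math
from typing import Tuple


def combined_score(r_u: float, r_g: float, r_m: float,
                   weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """R = w1*R(U) + w2*R(G) + w3*R(M), as given in the text."""
    w1, w2, w3 = weights
    return w1 * r_u + w2 * r_g + w3 * r_m


def warning_level(score: float) -> int:
    """Map a score in [0, 1] to a level 1..10 (assumed mapping, not from the text)."""
    return min(10, max(1, math.ceil(score * 10)))


def is_abnormal(level: int) -> bool:
    """Levels 1-5 are normal and 6-10 are abnormal, per the text."""
    return level >= 6


# Illustrative partial results and weights only.
score = combined_score(0.8, 0.6, 0.7, weights=(0.5, 0.25, 0.25))
level = warning_level(score)
```

Keeping the per-part results alongside the combined score matches the text's requirement that each part's assessment also be displayed separately.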
Step S9: based on the credibility authentication result R(U) of the user group, the information content authentication result R(M), and the propagation mode abnormality authentication result R(G), present the news elements and background on the one hand, and organize the reference data on the other, displaying both in a visual and structured form. The authentication result is transmitted to the application service window for display;
To describe the implementation of the above method in more detail, an embodiment is given from the perspective of information content authentication. Suppose there is the following news clue: time period, December 5, 2010 to December 9, 2010; clue keyword, "report of the death of a certain Jin".
Suppose the news clue to be authenticated is whether Jin has in fact died. How, then, does the present invention authenticate the credibility of this news clue and determine its early warning grade through large-scale data processing?
Step S1: obtain the news clue from the UGC data source and perform semantic extension on it to obtain reference data. The news clue comprises the clue keyword or phrase "report of the death of a certain Jin" and the time period in which the news may have occurred, "December 5, 2010 to December 9, 2010". From this clue data, representative clue tags, source microblogs, the corresponding microblog links, author accounts, and the like are obtained as reference data.
Step S2: perform directed information collection on the reference data, such as source microblogs and author accounts, obtaining the information content "Jin, born March 22, 19XX, died of a certain illness at 19:07 on December 6, 2010, at so-and-so hospital in a certain harbor city."
Step S2: extract semantic knowledge from this news report (the information content), obtaining the person "Jin", the times "March 22, 19XX" and "December 6, 2010", the place "a certain harbor city", the organization "so-and-so hospital", and other semantic knowledge;
Step S3: perform directed information collection on this semantic knowledge, that is, on "person", "time", "place", and "organization", obtaining deep-page information such as the person "Jin's basic profile", the time "Jin's activities before December 6, 2010", the place "information relating Jin to the harbor city", and the organization "information relating Jin to so-and-so hospital";
Step S4: the semantic knowledge and deep-page information extracted in the steps above are logically compared against the historical news clue database for authentication. The comparison finds that Jin's date of birth is wrong (it should be March 10, 19XX); the time is wrong (Jin attended an honorary doctorate award ceremony at a certain university on December 5, 2010); the organization is wrong (there is no so-and-so hospital); and the place is correct (Jin resides in the harbor city);
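The field-by-field comparison in this step can be sketched as a lookup against reference records. This is a simplified illustration: the `compare_fields` function, the dictionary keys, and the three outcome labels are hypothetical, and the real system compares against a historical news clue database rather than an in-memory dictionary. The placeholder values mirror the example above.

```python
from typing import Dict, Optional


def compare_fields(claimed: Dict[str, str],
                   reference: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Label each claimed field as 'match', 'conflict', or 'no record'."""
    result: Dict[str, str] = {}
    for field, value in claimed.items():
        ref = reference.get(field)
        if ref is None:
            result[field] = "no record"   # e.g. the hospital cannot be found
        elif ref == value:
            result[field] = "match"
        else:
            result[field] = "conflict"
    return result


# Values taken from the example, with the source's own placeholders kept.
claimed = {
    "date_of_birth": "19XX-03-22",
    "place": "certain harbor city",
    "organization": "so-and-so hospital",
}
reference = {
    "date_of_birth": "19XX-03-10",
    "place": "certain harbor city",
    "organization": None,
}
report = compare_fields(claimed, reference)
```

Fields marked "conflict" or "no record" are the ones that would feed the abnormality assessment for the information content.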
Step S5: the early warning system judges this news report to be false news, displays the information content authentication result in visual form, and obtains the early warning level.
Embodiment two: an embodiment from the perspective of news clue propagation mode authentication.
For example, suppose there is the following news report: "On March 25, 2013, Wen, a post-90s Shenzhen girl who came to be called Shenzhen's most beautiful girl, moved passers-by by feeding a disabled beggar in the street."
The news clue of this report comprises news clue data such as the keyword "Shenzhen's most beautiful girl" and the clue time period "March 25, 2013 to March 29, 2013". From this news clue data, representative clue tags, source microblog IDs, user IDs, and the like are obtained.
Semantic knowledge is extracted from this news report, obtaining the persons "Wen (the girl in the story)", "Jin X Shaoxia (reporter)", and "Zheng (vice president of a certain Chinese agency)", the time "March 25, 2013", the place "Shenzhen seashore", the organizations "a certain New Net; Shenzhen Special Zone Daily", the multimedia documents "pictures, videos", and other semantic knowledge;
A comprehensive retrieval is then performed for this news report. A microblog information acquisition system (patent application number: 2013102981197) is used to collect comprehensive information in a short time, obtaining the information content of the news clue, the user group (propagators), the behavior and attribute information of the user group, the viewpoints, the propagation modes, and so on. Specifically, static page request links are constructed from the semantic knowledge together with microblog links, microblog IDs, user IDs, and the like, and these request links are sent to the microblog server to obtain the data in the static pages. The data include the microblogs related to the news clue, microblog comments, and microblog forwarding links, as well as the personal information, followees, and followers of each user related to the news clue.
Logic mining is performed on the information content of the news clue. This includes logic mining of attributes contained in or referred to by the text content, such as time, place, related organizations or groups, and amount of information, and also includes logic comparison mining of the multimedia content (pictures, videos, etc.). Semantic knowledge such as the sentiment and news viewpoints reflected by the information content is classified, and opinion mining is performed on all microblogs and sub-events involved in the news clue to obtain the category to which the information content belongs. This finally yields a public opinion analysis and content classification of the whole news clue on the microblog platform, which guides the reliability assessment of the information content. For example, information content authentication of the news report about "Shenzhen's most beautiful girl" yields the following news viewpoints: a certain New Net and a certain People's Net hold the viewpoint of promoting positive energy; a Shenzhen reader questions the location of the photo; "Writer-Sky" questions whether the reporter "Jin X Shaoxia" is a network promoter; Zheng, the Shenzhen head of the New Net, denies that it is a fake event; the New Net apologizes; microblog users condemn the report.
By mining the logic of the information content, the different news viewpoints or sentiments it reflects are obtained. After viewpoint analysis, the propagation mode of each news viewpoint and the user groups that promote its propagation are analyzed. For example, mining the propagation of the news viewpoints over a period of time gives the following: at 21:07 on March 25, Shenzhen Special Zone Daily reported on "Shenzhen's most beautiful girl", with a certain New Net as the information source and Shi as the reporter; at 8:45 on March 26, a certain People's Net, the New Net, and other media reprinted the report; at 17:06 on March 26, a Shenzhen reader interpreted it as hype; at 23:33 on March 26, "Writer-Sky" exposed reporter Shi as a network promoter; at 8:57 on March 27, North America Net reported that the reporter denied it was a fake event; at 7:55 on March 28, the New Net, Zheng, and Shi apologized. In addition, the user group (propagators) promoting the propagation of this news is analyzed. The propagators of the news clue can be analyzed by propagator role, for example the publishers of the news clue (the New Net, Shenzhen Special Zone Daily) and the key propagators (a Shenzhen reader, "Writer-Sky", a certain People's Net). They can also be analyzed by propagator attribute, for example media-related users (Shenzhen Special Zone Daily, a certain People's Net, the "Online Story" account of a certain Chinese agency) and ordinary verified "big V" users ("Writer-Sky", "Jin X Shaoxia", Zheng, etc.). After a comprehensive analysis of these user groups, the propagators' public opinion discussion and the propagation trend are obtained, and abnormal news viewpoints are detected.
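The timeline above can be represented as time-stamped events so that the point where the dominant viewpoint changes is easy to locate. This is only a sketch: the stance labels ("support", "question", "retract"), the `first_stance_shift` function, and the shortened source names are assumptions added for illustration, while the times follow the example (year 2013 taken from the clue time period).

```python
from datetime import datetime
from typing import List, Optional, Tuple

Event = Tuple[datetime, str, str]  # (time, source, assumed stance label)

# Deliberately unordered, as collected data usually is.
events: List[Event] = [
    (datetime(2013, 3, 26, 17, 6), "Shenzhen reader", "question"),
    (datetime(2013, 3, 25, 21, 7), "Shenzhen Special Zone Daily", "support"),
    (datetime(2013, 3, 26, 8, 45), "reprinting media", "support"),
    (datetime(2013, 3, 28, 7, 55), "original publisher", "retract"),
]


def first_stance_shift(evts: List[Event]) -> Optional[Event]:
    """Return the first event, in time order, whose stance differs from the one before it."""
    ordered = sorted(evts, key=lambda e: e[0])
    for prev, cur in zip(ordered, ordered[1:]):
        if prev[2] != cur[2]:
            return cur
    return None


shift = first_stance_shift(events)
```

A shift from supporting to questioning coverage, followed by a retraction, is the kind of pattern the propagation mode analysis is meant to surface.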
In short, with the method provided by the invention, once a news clue and news time point are input, the relevant data can be mined automatically and the credibility of the related news clue judged. Through result visualization, users are presented with rich and intuitive authentication results, data, and evidence.

Claims (10)

1. A news authentication early warning system based on user-generated content, characterized by comprising:
A news clue semantic extension module, for obtaining a news clue and performing semantic extension on the news clue to obtain reference data;
An auxiliary data directed acquisition module, for performing directed acquisition on the reference data to obtain the information content, the propagation mode, the user group, and the behavior information and attribute information of the user group;
A semantic knowledge extraction module, for extracting semantic knowledge from the information content;
A semantic knowledge comparison authentication module, for performing clustering and similarity computation on the semantic knowledge and identifying news clues that do not match the historical news clue database;
A user group authentication module, for analyzing the behavior information and attribute information of the user group, extracting credibility authentication elements from the behavior information and attribute information, and obtaining the credibility authentication result R(U) of the user group;
An information content authentication module, for performing logic comparison authentication on the information content and classifying the sentiment and news viewpoints reflected by the information content, to obtain the information content authentication result R(M);
A propagation mode authentication module, for mining the propagation mode of the news viewpoints and detecting abnormal propagation modes, to obtain the propagation mode abnormality authentication result R(G);
An early warning grading module, for performing early warning grading on the credibility of the news clue according to these authentication results and their corresponding weights.
2. The news authentication early warning system based on user-generated content as claimed in claim 1, characterized by further comprising: an authentication result display module, for organizing the credibility authentication result R(U) of the user group, the information content authentication result R(M), and the propagation mode abnormality authentication result R(G), and displaying them in structured form.
3. The news authentication early warning system based on user-generated content as claimed in claim 1, characterized in that:
The reference data comprise news clue tags, source microblogs, and the corresponding microblog links and author accounts;
The information content comprises text content and multimedia content;
The semantic knowledge comprises multimedia semantic knowledge and text semantic knowledge, where the multimedia semantic knowledge refers to highly stable and discriminative visual fingerprints extracted from the multimedia content;
The early warning grading formula is $R = w_1 R(U) + w_2 R(G) + w_3 R(M)$, where $w_1$, $w_2$, $w_3$ are weights.
4. The news authentication early warning system based on user-generated content as claimed in claim 1, characterized in that the user group authentication module further comprises:
An identification module, for first identifying, on the microblog platform, the user group related to the news clue;
A characteristic analysis module, for performing in-depth analysis of the key figures in the user group and summarizing the behavior information and attribute information of these key figures;
A feature extraction module, for extracting, from the behavior information and attribute information, the follower count, following count, favorites count, mutual-follow count, user geographic location, user profile, the user's followees, the user's followers, user tags, the proportion of verified accounts among the user's mutual follows, and the user's microblog post count, and constructing the user group authentication analysis model.
5. The news authentication early warning system based on user-generated content as claimed in claim 1, characterized in that the propagation mode authentication module further comprises:
An analysis module, for analyzing the source microblogs of the news viewpoints and the user groups involved in the propagation of the news viewpoints;
A filtering module, for obtaining the comment public opinion and propagation modes of the user groups, detecting abnormal or conflicting news viewpoints, and further filtering the user groups.
6. A news authentication early warning method based on user-generated content, characterized by comprising:
S1: obtaining a news clue and performing semantic extension on the news clue to obtain reference data;
S2: performing directed acquisition on the reference data to obtain the information content, the propagation mode, the user group, and the behavior information and attribute information of the user group;
S3: extracting semantic knowledge from the information content;
S4: performing clustering and similarity computation on the semantic knowledge, and identifying news clues that do not match the historical news clue database;
S5: analyzing the behavior information and attribute information of the user group, extracting credibility authentication elements from the behavior information and attribute information, and obtaining the credibility authentication result R(U) of the user group;
S6: performing logic mining on the information content, and classifying the sentiment and news viewpoints reflected by the information content, to obtain the information content authentication result R(M);
S7: mining the propagation mode of the news viewpoints and detecting abnormal propagation modes, to obtain the propagation mode abnormality authentication result R(G);
S8: performing early warning grading on the news clue according to these assessment results and their corresponding weights.
7. The news authentication early warning method based on user-generated content as claimed in claim 6, characterized by further comprising:
Organizing the credibility authentication result R(U) of the user group, the information content authentication result R(M), and the propagation mode abnormality authentication result R(G), and displaying them in structured form.
8. The news authentication early warning method based on user-generated content as claimed in claim 6, characterized in that:
The reference data comprise news clue tags, source microblogs, and the corresponding microblog links and author accounts;
The information content comprises text content and multimedia content;
The semantic knowledge comprises multimedia semantic knowledge and text semantic knowledge, where the multimedia semantic knowledge refers to highly stable and discriminative visual fingerprints extracted from the multimedia content;
The early warning grading formula is $R = w_1 R(U) + w_2 R(G) + w_3 R(M)$, where $w_1$, $w_2$, $w_3$ are weights.
9. The user-generated content news authentication method as claimed in claim 6, characterized in that step S5 further comprises:
S51: first identifying, on the microblog platform, the user group related to the news clue;
S52: performing in-depth analysis of the key figures in the user group, and summarizing the behavior information and attribute information of these key figures;
S53: extracting, from the behavior information and attribute information, the follower count, following count, favorites count, mutual-follow count, user geographic location, user profile, the user's followees, the user's followers, user tags, the proportion of verified accounts among the user's mutual follows, and the user's microblog post count, and constructing the user group authentication analysis model.
10. The user-generated content news authentication method as claimed in claim 6, characterized in that step S7 further comprises:
S71: analyzing the propagation sources of the news viewpoints and the user groups involved in the propagation of the news viewpoints;
S72: by obtaining the comment public opinion and propagation modes of the user groups, detecting abnormal information content and news viewpoints, and further filtering the user groups.
CN201410414956.6A 2014-08-21 2014-08-21 News certification method for early warning and system based on user-generated content Active CN104636408B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410414956.6A CN104636408B (en) 2014-08-21 2014-08-21 News certification method for early warning and system based on user-generated content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410414956.6A CN104636408B (en) 2014-08-21 2014-08-21 News certification method for early warning and system based on user-generated content

Publications (2)

Publication Number Publication Date
CN104636408A true CN104636408A (en) 2015-05-20
CN104636408B CN104636408B (en) 2017-08-08

Family

ID=53215168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410414956.6A Active CN104636408B (en) 2014-08-21 2014-08-21 News certification method for early warning and system based on user-generated content

Country Status (1)

Country Link
CN (1) CN104636408B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080201285A1 (en) * 2005-12-21 2008-08-21 Tencent Technology (Shenzhen) Company Ltd. Method and apparatus for delivering network information
CN101751458A (en) * 2009-12-31 2010-06-23 暨南大学 Network public sentiment monitoring system and method
CN101763401A (en) * 2009-12-30 2010-06-30 暨南大学 Network public sentiment hotspot prediction and analysis method
CN102194001A (en) * 2011-05-17 2011-09-21 杭州电子科技大学 Internet public opinion crisis early-warning method
CN102708096A (en) * 2012-05-29 2012-10-03 代松 Network intelligence public sentiment monitoring system based on semantics and work method thereof

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682532A (en) * 2016-12-08 2017-05-17 宇龙计算机通信科技(深圳)有限公司 Method and device for processing message
CN110537176A (en) * 2017-02-21 2019-12-03 索尼互动娱乐有限责任公司 method for determining the authenticity of news
US12072943B2 (en) 2017-02-21 2024-08-27 Sony Interactive Entertainment LLC Marking falsities in online news
CN108829656A (en) * 2017-05-03 2018-11-16 腾讯科技(深圳)有限公司 The data processing method and data processing equipment of the network information
CN109345075A (en) * 2018-08-31 2019-02-15 深圳市轱辘汽车维修技术有限公司 A kind of professional person, which authenticates, investigates method, apparatus and terminal device
CN111125588B (en) * 2018-10-30 2023-04-07 北京国双科技有限公司 Method and device for drawing and evaluating propagation effect graph, storage medium and processor
CN111125588A (en) * 2018-10-30 2020-05-08 北京国双科技有限公司 Propagation effect graph drawing and evaluating method and device, storage medium and processor
CN111414496A (en) * 2020-03-27 2020-07-14 腾讯科技(深圳)有限公司 Artificial intelligence-based multimedia file detection method and device
CN111414496B (en) * 2020-03-27 2023-04-07 腾讯科技(深圳)有限公司 Artificial intelligence-based multimedia file detection method and device
CN111611561A (en) * 2020-06-09 2020-09-01 中国电子科技集团公司第二十八研究所 Edge-hierarchical-user-oriented unified management and control method for authentication and authorization
CN111611561B (en) * 2020-06-09 2022-09-06 中国电子科技集团公司第二十八研究所 Edge-hierarchical-user-oriented unified management and control method for authentication and authorization
CN111881881A (en) * 2020-08-10 2020-11-03 晶璞(上海)人工智能科技有限公司 Machine intelligent text recognition credibility judgment method based on multiple dimensions
CN112583804A (en) * 2020-12-05 2021-03-30 星极实业(深圳)有限公司 Monitoring management system capable of tracking and evidence obtaining of network illegal behaviors in real time
CN115688707A (en) * 2022-12-08 2023-02-03 中国传媒大学 Multi-language mixed news value sorting method

Also Published As

Publication number Publication date
CN104636408B (en) 2017-08-08

Similar Documents

Publication Publication Date Title
CN104636408A (en) News authentication early warning method and system based on user generated content
Zannettou et al. On the origins of memes by means of fringe web communities
Hou et al. Survey on data analysis in social media: A practical application aspect
Flatow et al. On the accuracy of hyper-local geotagging of social media content
Barbier et al. Provenance data in social media
Hristakieva et al. The spread of propaganda by coordinated communities on social media
Kovacs-Gyori et al. # London2012: Towards citizen-contributed urban planning through sentiment analysis of twitter data
CN106354845A (en) Microblog rumor recognizing method and system based on propagation structures
CN104536956A (en) A Microblog platform based event visualization method and system
Kalampokis et al. Combining social and government open data for participatory decision-making
CN103927297A (en) Evidence theory based Chinese microblog credibility evaluation method
CN102945268A (en) Method and system for excavating comments on characteristics of product
CN103577404A (en) Microblog-oriented discovery method for new emergencies
CN106529492A (en) Video topic classification and description method based on multi-image fusion in view of network query
CN105447144A (en) Microblog forwarding visualization analysis method and system based on big data analysis technology
CN105677906A (en) Automatic collecting and analyzing system and method for network events
Waghmare et al. Fake news detection of social media news in blockchain framework
Liang et al. An integrated approach of sensing tobacco-oriented activities in online participatory media
Firmansyah et al. Enhancing disaster response with automated text information extraction from social media images
KR20190019589A (en) System and Method for Checking Fact
Li et al. Vandalism detection in OpenStreetMap via user embeddings
KR102025813B1 (en) Device and method for chronological big data curation system
Mouty et al. Survey on steps of truth detection on Arabic tweets
Kim et al. Recognition using Cyber bullying in view of Semantic-Enhanced Minimized Auto-Encoder
Brenner et al. Multimodal detection, retrieval and classification of social events in web photo collections

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant