CN107222381A - Method and apparatus for determining the propagation path of microblog data - Google Patents

Method and apparatus for determining the propagation path of microblog data

Info

Publication number
CN107222381A
Authority
CN
China
Prior art keywords
microblog data
forwarding
mark
microblog
identified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610162128.7A
Other languages
Chinese (zh)
Other versions
CN107222381B (en)
Inventor
王文文
杨建武
赵增峰
郑孙雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Peking University
Beijing Founder Electronics Co Ltd
Original Assignee
Peking University
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, Peking University Founder Group Co Ltd, Beijing Founder Electronics Co Ltd
Priority to CN201610162128.7A
Publication of CN107222381A
Application granted
Publication of CN107222381B
Legal status: Expired - Fee Related (current)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/216 Handling conversation history, e.g. grouping of messages in sessions or threads
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/52 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail for supporting social networking services

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention relates to a method and apparatus for determining the propagation path of microblog data. Microblog data is collected and each collected microblog data item is parsed to determine the forwarded microblog data it contains, and the original author identifier and original content identifier of the forwarded microblog data are determined from it. It is then determined whether the content information of the microblog data contains forwarding user identifiers between the publisher identifier and the original author identifier, so as to form a forwarding relation chain for that microblog data item. Next, according to the original content identifier, all forwarding relation chains corresponding to that identifier are determined across all the microblog data. A deduplication operation is performed on all forwarding relation chains corresponding to each original content identifier, yielding the propagation path of the microblog data corresponding to each original content identifier. Microblog data transmitted over the Internet can thus be traced to its source and its propagation path determined, safeguarding the information security interests of the state and the public.

Description

Method and apparatus for determining the propagation path of microblog data
Technical field
The present invention relates to the field of communications, and in particular to a method and apparatus for determining the propagation path of microblog data.
Background
With the rapid development of the Internet, people repost events reported by various media every day through Internet platforms such as Sina Weibo and Tencent Weibo.
Internet platforms spread information quickly, and users propagate microblog data under virtual user names. As a result, the path along which reposted microblog data propagates and diffuses is difficult to grasp in full. If reposted microblog data contains negative, inciting, or threatening speech, it can pose a threat to public safety.
There is therefore an urgent need for a method that can track the propagation and diffusion path of microblog data, so that information can be traced to its source and its propagation path controlled, safeguarding national information security and ensuring that the public receives positive and constructive online information.
Summary of the invention
The present invention provides a method and apparatus for determining the propagation path of microblog data, to solve the problem in the prior art that the propagation path of microblog data, which spreads quickly and carries a large amount of information, is difficult to control. By analyzing microblog data, the present invention can extract the forwarding relation chains of reposted content from the microblog data, so that microblog data transmitted over the Internet can be traced to its source and its propagation path obtained, safeguarding the information security interests of the state and the public.
The present invention provides a method for determining the propagation path of microblog data, including:
Collecting microblog data, where the microblog data includes the content information of the microblog data and the attribute information of the microblog data, and the attribute information includes the publisher identifier of the microblog data and a content identifier uniquely corresponding to the content information of the microblog data;
Parsing each collected microblog data item, and determining from the content information of each item whether it contains forwarded microblog data;
Obtaining the original author identifier of the forwarded microblog data, and obtaining the original content identifier uniquely corresponding to the content information of the forwarded microblog data; determining whether the content information of the microblog data contains forwarding user identifiers between the publisher identifier and the original author identifier, and forming a forwarding relation chain;
Determining, according to the original content identifier, all forwarding relation chains corresponding to the original content identifier across all the microblog data;
Performing a deduplication operation on all forwarding relation chains corresponding to each original content identifier, to obtain the propagation path of the microblog data corresponding to each original content identifier.
Optionally, determining whether the content information of the microblog data contains forwarding user identifiers between the publisher identifier and the original author identifier, and forming a forwarding relation chain, includes:
Determining whether the content information of the microblog data contains forwarding user identifiers between the publisher identifier and the original author identifier;
If so, forming a forwarding sequence according to the order in which the forwarding user identifiers are arranged, placing the original author identifier at the start of the forwarding sequence and the publisher identifier at the end of the forwarding sequence, to form the forwarding relation chain;
If not, forming a forwarding relation chain that contains only the original author identifier followed by the publisher identifier.
Optionally, determining whether the content information of the microblog data contains forwarding user identifiers between the publisher identifier and the original author identifier includes:
Locating the text editing field in the content information of the microblog data;
Determining whether a forwarding marker is present in the text editing field;
If so, extracting the forwarding user identifier indicated by the forwarding marker.
Optionally, the attribute information of the microblog data further includes:
The publication time of the microblog data, the source website of the microblog data, and the URL of the microblog data;
Correspondingly, before parsing each collected microblog data item, the method further includes:
Classifying and sorting the collected microblog data according to at least one of the publication time of the microblog data, the source website of the microblog data, and the URL of the microblog data;
Parsing each collected microblog data item then includes:
Parsing the collected microblog data item by item, in the order obtained after the classification and sorting.
Optionally, performing a deduplication operation on all forwarding relation chains corresponding to each original content identifier, to obtain the propagation path of the microblog data corresponding to each original content identifier, includes:
Comparing all forwarding relation chains corresponding to each original content identifier pairwise, and removing any forwarding relation chain whose forwarding user identifiers, and the order in which they are arranged, starting from the first position in the chain, are completely contained in another forwarding relation chain.
The present invention also provides an apparatus for determining the propagation path of microblog data, including: a collection module, configured to collect microblog data, where the microblog data includes the content information of the microblog data and the attribute information of the microblog data, and the attribute information includes the publisher identifier of the microblog data and a content identifier uniquely corresponding to the content information of the microblog data;
A parsing module, configured to parse each collected microblog data item;
A determining module, configured to determine from the content information of each microblog data item whether it contains forwarded microblog data;
An obtaining module, configured to obtain the original author identifier of the forwarded microblog data, and to obtain the original content identifier uniquely corresponding to the content information of the forwarded microblog data;
The determining module is further configured to determine whether the content information of the microblog data contains forwarding user identifiers between the publisher identifier and the original author identifier, and to form a forwarding relation chain; and to determine, according to the original content identifier, all forwarding relation chains corresponding to the original content identifier across all the microblog data;
A deduplication module, configured to perform a deduplication operation on all forwarding relation chains corresponding to each original content identifier, to obtain the propagation path of the microblog data corresponding to each original content identifier.
Optionally, the determining module includes:
An identifier determination sub-module, configured to determine whether the content information of the microblog data contains forwarding user identifiers between the publisher identifier and the original author identifier;
A sequence determination sub-module, configured to, after the identifier determination sub-module determines that forwarding user identifiers between the publisher identifier and the original author identifier are present, form a forwarding sequence according to the order in which the forwarding user identifiers are arranged, place the original author identifier at the start of the forwarding sequence and the publisher identifier at the end of the forwarding sequence, and form the forwarding relation chain;
The sequence determination sub-module is further configured to, after the identifier determination sub-module determines that no forwarding user identifiers between the publisher identifier and the original author identifier are present, form a forwarding relation chain that contains only the original author identifier followed by the publisher identifier.
Optionally, the determining module includes:
A locating sub-module, configured to locate the text editing field in the content information of the microblog data;
A marker determination sub-module, configured to determine whether a forwarding marker is present in the text editing field;
An extraction sub-module, configured to extract the forwarding user identifier indicated by the forwarding marker after the marker determination sub-module determines that the forwarding marker is present.
Optionally, the attribute information of the microblog data further includes:
The publication time of the microblog data, the source website of the microblog data, and the URL of the microblog data;
Correspondingly, the apparatus further includes:
A classification and sorting module, configured to classify and sort the collected microblog data according to at least one of the publication time of the microblog data, the source website of the microblog data, and the URL of the microblog data;
The parsing module is specifically configured to parse the collected microblog data item by item, in the order obtained after the classification and sorting.
Optionally, the deduplication module is specifically configured to compare all forwarding relation chains corresponding to each original content identifier pairwise, and to remove any forwarding relation chain whose forwarding user identifiers, and the order in which they are arranged, starting from the first position in the chain, are completely contained in another forwarding relation chain.
The method and apparatus for determining the propagation path of microblog data provided by the present invention collect microblog data and parse each collected item, determine from the content information of each item the forwarded microblog data it contains, and determine from the forwarded microblog data its original author identifier and original content identifier. They then determine whether the content information of the microblog data contains forwarding user identifiers between the publisher identifier and the original author identifier, so as to form a forwarding relation chain for that microblog data item. Next, according to the original content identifier, all forwarding relation chains corresponding to that identifier are determined across all the microblog data. A deduplication operation is performed on all forwarding relation chains corresponding to each original content identifier, yielding the propagation path of the microblog data corresponding to each original content identifier. Microblog data transmitted over the Internet can thus be traced to its source and its propagation path determined, safeguarding the information security interests of the state and the public.
Brief description of the drawings
Fig. 1A is a flow chart of Embodiment 1 of the method for determining the propagation path of microblog data according to the present invention;
Fig. 1B is a schematic diagram of a propagation path distribution for Embodiment 1 shown in Fig. 1A;
Fig. 2 is a flow chart of Embodiment 2 of the method for determining the propagation path of microblog data according to the present invention;
Fig. 3 is a schematic structural diagram of Embodiment 1 of the apparatus for determining the propagation path of microblog data according to the present invention;
Fig. 4 is a schematic structural diagram of Embodiment 2 of the apparatus for determining the propagation path of microblog data according to the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the embodiments of the present invention. It should be noted that, in the drawings and the specification, similar or identical elements use the same reference numerals.
Fig. 1A is a flow chart of Embodiment 1 of the method for determining the propagation path of microblog data according to the present invention. As shown in Fig. 1A, the method includes:
Step 101: collect microblog data.
In this step, the microblog data includes the content information of the microblog data and the attribute information of the microblog data. The attribute information includes the publisher identifier of the microblog data and a content identifier that uniquely corresponds to the content information of the microblog data. The microblog data may be electronic data in any form on any Internet platform, for example pictures, text, or video. The publisher identifier may be the user ID, or the user name corresponding to that user ID, of the user on the Internet platform that published the microblog data; for example, the user name may be the microblog user "Zhang San", and Zhang San's user ID may be "80651236". The content identifier is identification information that identifies the content of each microblog data item sent by a user. It can be obtained by generating, for each microblog data item, a data string that uniquely corresponds to it, for example a code produced by Message Digest Algorithm 5 (MD5). The content identifier has a unique correspondence with the content of the microblog data it identifies, so the corresponding microblog content can be found from the content identifier.
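As a minimal sketch of step 101, a collected item and its MD5-based content identifier might be represented as follows. The names `MicroblogRecord`, `make_content_id`, and `make_record` are illustrative assumptions and do not appear in the original; hashing the UTF-8 text of the post is likewise only one possible way to derive the data string.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroblogRecord:
    """One collected microblog item (field names are hypothetical)."""
    publisher_id: str  # user ID such as "80651236", or the matching user name
    content: str       # content information of the post
    content_id: str    # identifier derived from the content


def make_content_id(content: str) -> str:
    """Derive a content identifier as the MD5 hex digest of the UTF-8 text."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def make_record(publisher_id: str, content: str) -> MicroblogRecord:
    """Attach the derived content identifier to a collected post."""
    return MicroblogRecord(publisher_id, content, make_content_id(content))


record = make_record("80651236", "Example post text")
print(record.content_id)  # a 32-character hexadecimal string
```

Identical content always yields the same identifier, which is what later allows chains from different posts to be matched by original content identifier.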
Step 102: parse each collected microblog data item, and determine from the content information of each item whether it contains forwarded microblog data.
In this step, all the collected microblog data is analyzed item by item to build an attribute information table for each microblog data item, which records the individual characteristics of that item. The attribute information table may include the microblog ID of the microblog (equivalent to the content identifier uniquely corresponding to the content information of the microblog data described above), the microblog content (equivalent to the content information of the microblog data), the microblog user ID (equivalent to the publisher identifier of the microblog data), the publication time, the source website (the platform on which the microblog was published, such as Sina or Tencent), the forwarded-microblog ID (the identifier of the forwarder who forwarded the microblog content), the Uniform Resource Locator (URL), and other information. If, while parsing a microblog data item, it is found to contain reposted or forwarded microblog content, the item is marked so that the forwarded microblog data and its propagation path information can be extracted from it later.
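The attribute information table described above could be sketched as one typed row per post. All field names, and the assumption that the raw record exposes any forwarded part under a `retweeted_content` key, are hypothetical; real platforms expose this differently.

```python
from typing import Optional, TypedDict


class AttributeRow(TypedDict):
    microblog_id: str            # content identifier of this post
    content: str                 # content information
    user_id: str                 # publisher identifier
    publish_time: str            # publication time
    source_site: str             # publishing platform
    forwarded_id: Optional[str]  # forwarded-microblog ID, if any
    url: str
    contains_forward: bool       # mark set while parsing


def build_row(raw: dict) -> AttributeRow:
    """Build one attribute row and mark posts that carry forwarded content."""
    return AttributeRow(
        microblog_id=raw["microblog_id"],
        content=raw["content"],
        user_id=raw["user_id"],
        publish_time=raw["publish_time"],
        source_site=raw["source_site"],
        forwarded_id=raw.get("forwarded_id"),
        url=raw["url"],
        # Hypothetical key: a non-empty forwarded part marks the post.
        contains_forward=bool(raw.get("retweeted_content")),
    )


raw_post = {
    "microblog_id": "m1", "content": "my comment", "user_id": "80651236",
    "publish_time": "2016-03-21 10:15:00", "source_site": "sina",
    "url": "https://example.com/m1", "retweeted_content": "@Zhang San: original",
}
row = build_row(raw_post)
print(row["contains_forward"])
```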
Step 103: obtain the original author identifier of the forwarded microblog data, and obtain the original content identifier uniquely corresponding to the content information of the forwarded microblog data; determine whether the content information of the microblog data contains forwarding user identifiers between the publisher identifier and the original author identifier, and form a forwarding relation chain.
In this step, the original author identifier of the forwarded microblog data is extracted from the microblog data. In general, when a microblog data item is forwarded, its original author information is bound to its content information, so the original author identifier can be obtained from the forwarded microblog data. For example, if the identifier "@Zhang San" appears at the beginning of the forwarded post, then Zhang San is the original author identifier of that forwarded post. The content information of the microblog data also contains two parts: one is the publisher's own commentary, and the other is the original post by someone else that the publisher has reposted. The original content identifier is the identifier uniquely corresponding to the content of that forwarded post. In addition, many platforms provide the propagation path information of the forwarded post. Based on the forwarding user markers preset by each platform, the forwarding user identifiers between the publisher identifier and the original author identifier can be determined in the content information of the microblog data, forming a forwarding relation chain of the form: original author identifier → forwarding user identifier 1 → forwarding user identifier 2 → forwarding user identifier 3 → publisher identifier.
Step 104: according to the original content identifier, determine all forwarding relation chains corresponding to the original content identifier across all the microblog data.
In this step, according to the original content identifier determined in step 103, the other microblog data is searched for further forwarding relation chains in which the forwarded microblog corresponding to the same original content identifier was forwarded, so as to find all forwarding relation chains of the original microblog, for example the one sent by "Homeway.com" as shown in Fig. 1B.
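One way to picture step 104 is as a grouping of already extracted chains by original content identifier. The sketch below assumes each chain arrives as an `(original_content_id, chain)` pair; the function name `group_chains` and that input shape are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Chain = Tuple[str, ...]


def group_chains(
    pairs: Iterable[Tuple[str, Sequence[str]]],
) -> Dict[str, List[Chain]]:
    """Collect every forwarding relation chain under its original content ID."""
    grouped: Dict[str, List[Chain]] = defaultdict(list)
    for original_content_id, chain in pairs:
        grouped[original_content_id].append(tuple(chain))
    return dict(grouped)


extracted = [
    ("c1", ["author", "A", "B"]),
    ("c2", ["other", "X"]),
    ("c1", ["author", "C"]),
]
chains_by_content = group_chains(extracted)
print(chains_by_content["c1"])
```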
Step 105: perform a deduplication operation on all forwarding relation chains corresponding to each original content identifier, to obtain the propagation path of the microblog data corresponding to each original content identifier.
In this step, among the forwarding relation chains of different lengths that have been obtained, if some chains are duplicated or contained in others, the contained chains can be removed and the longer chains retained. Because the purpose of the present invention is to determine the propagation path of microblog data, for repeated paths only the single most complete path from beginning to end is kept and the repeated paths are removed, which reduces the amount of statistics needed for the propagation paths. For example, if the forwarding relation chain of one forwarded microblog is A → B → C → D and another obtained chain is A → B → C → D → E, then the chain A → B → C → D → E is retained and the chain A → B → C → D is removed. The chain A → B → C → D → E already contains the forwarding path A → B → C → D, so chains such as A → B → C → D, A → B → C, and A → B can all be removed.
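The containment rule in this step can be read as a prefix test: a chain is dropped when another chain begins with exactly the same identifiers in the same order. The following sketch implements that reading with a pairwise comparison; `is_prefix` and `deduplicate` are illustrative names, and treating containment as a prefix relation is an interpretation of the text rather than something the original spells out in code.

```python
from typing import Iterable, List, Sequence, Tuple

Chain = Tuple[str, ...]


def is_prefix(short: Chain, long: Chain) -> bool:
    """True if `long` starts with all of `short`, in the same order."""
    return len(short) <= len(long) and long[: len(short)] == short


def deduplicate(chains: Iterable[Sequence[str]]) -> List[Chain]:
    """Drop exact duplicates and chains fully contained in a longer chain."""
    # dict.fromkeys removes exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(tuple(c) for c in chains))
    return [
        c for c in unique
        if not any(c != other and is_prefix(c, other) for other in unique)
    ]


paths = deduplicate([
    ["A", "B", "C", "D"],
    ["A", "B", "C", "D", "E"],
    ["A", "B"],
    ["A", "B", "C"],
    ["A", "X"],
])
print(paths)
```

The pairwise comparison is quadratic in the number of chains, which is acceptable for a sketch; sorting the chains first would be one way to reduce the work for large inputs.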
The method for determining the propagation path of microblog data provided by this embodiment collects microblog data and parses each collected item, determines from the content information of each item the forwarded microblog data it contains, and determines from the forwarded microblog data its original author identifier and original content identifier. It then determines whether the content information of the microblog data contains forwarding user identifiers between the publisher identifier and the original author identifier, so as to form a forwarding relation chain for that microblog data item. Next, according to the original content identifier, all forwarding relation chains corresponding to that identifier are determined across all the microblog data. A deduplication operation is performed on all forwarding relation chains corresponding to each original content identifier, yielding the propagation path of the microblog data corresponding to each original content identifier. Microblog data transmitted over the Internet can thus be traced to its source and its propagation path determined, safeguarding the information security interests of the state and the public.
Fig. 2 is a flow chart of Embodiment 2 of the method for determining the propagation path of microblog data according to the present invention. As shown in Fig. 2, building on Embodiment 1 above, the method of this embodiment includes:
Step 201: collect microblog data.
In this step, the microblog data includes the content information of the microblog data and the attribute information of the microblog data. The attribute information includes the publisher identifier of the microblog data and a content identifier uniquely corresponding to the content information of the microblog data. The attribute information of the collected microblog data may also include the publication time of the microblog data, the source website of the microblog data, the URL of the microblog data, and so on.
Step 202: classify and sort the collected microblog data according to at least one of the publication time of the microblog data, the source website of the microblog data, and the URL of the microblog data.
In this step, the way in which the collected microblog data is classified and sorted can be set by those skilled in the art according to the analysis target. For example, if the propagation path of microblog data published on a particular network platform is to be analyzed, the microblog data can be classified by source website; the microblog data can also be sorted chronologically, or processed in segments by time period.
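A minimal sketch of one such choice, classifying by source website and then sorting each class chronologically, is given below. The record keys `source_site` and `publish_time` and the timestamp format are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def classify_and_sort(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group records by source website, then sort each group by time."""
    by_site: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        by_site[record["source_site"]].append(record)
    for site_records in by_site.values():
        site_records.sort(
            key=lambda r: datetime.strptime(r["publish_time"], TIME_FORMAT)
        )
    return dict(by_site)


collected = [
    {"id": "m2", "source_site": "sina", "publish_time": "2016-03-21 12:00:00"},
    {"id": "m3", "source_site": "tencent", "publish_time": "2016-03-20 09:30:00"},
    {"id": "m1", "source_site": "sina", "publish_time": "2016-03-21 08:00:00"},
]
ordered = classify_and_sort(collected)
print([r["id"] for r in ordered["sina"]])
```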
Step 203: parse the collected microblog data item by item, in the order obtained after classification and sorting, to determine from the content information of each item whether it contains forwarded microblog data.
In this step, a collected microblog data item generally takes one of three forms. It may contain only content A originated by the publisher, which can be electronic data of any form, such as pictures, video, or text. It may contain only original content B by someone else that the publisher has forwarded. Or it may contain both the forwarded content B and the publisher's comment on that content, where the comment can be regarded as the publisher's original content A. The three forms are therefore: 1) content A only; 2) content B only; 3) both content A and content B.
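The three forms can be expressed as a small classifier. The sketch assumes the publisher's own text (A) and the forwarded text (B) have already been separated into two strings; how that separation is done is platform-specific and not shown. `ContentForm` and `classify_form` are hypothetical names.

```python
from enum import Enum


class ContentForm(Enum):
    ORIGINAL_ONLY = 1        # form 1: content A only
    FORWARD_ONLY = 2         # form 2: content B only
    COMMENT_AND_FORWARD = 3  # form 3: both A and B


def classify_form(own_text: str, forwarded_text: str) -> ContentForm:
    """Classify a post by which of the two content parts are non-empty."""
    has_a = bool(own_text.strip())
    has_b = bool(forwarded_text.strip())
    if has_a and has_b:
        return ContentForm.COMMENT_AND_FORWARD
    if has_b:
        return ContentForm.FORWARD_ONLY
    if has_a:
        return ContentForm.ORIGINAL_ONLY
    raise ValueError("post contains neither original nor forwarded content")


form = classify_form("well said", "@Zhang San: original text")
print(form.name)
```

Only forms 2 and 3 contain forwarded microblog data and need to go through the later steps.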
Step 204: obtain the original author identifier of the forwarded microblog data, and obtain the original content identifier uniquely corresponding to the content information of the forwarded microblog data.
In this step, each network platform generally uses a specific marker to identify forwarded microblog data. For example, forwarded content on Sina Weibo contains an "@XX" marker, and forwarded content on Tencent Weibo also contains an "@XX" marker, where "XX" represents the original author identifier of the forwarded content. The marker is located at the beginning of the forwarded content, so by recognizing the platform-specific marker and locating where it occurs, the original author identifier of the forwarded content can be determined. The original content identifier is determined in the same way: according to the conventions of each network platform, the location of the original content identifier uniquely corresponding to the content information of the forwarded microblog data is found and the identifier is obtained. For example, many network platforms place the original content identifier in the URL of the original content, so it can be obtained by parsing that URL. It should be noted that each network platform may define its own standard for the original author identifier and the original content identifier, and this application places no limitation on this.
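As a hedged illustration of this step, the sketch below reads the original author identifier from a leading "@XX" marker and takes the last path segment of a URL as the original content identifier. The regular expression, the assumption that a colon or whitespace ends the user name, and the URL layout are all illustrative; each platform defines its own conventions.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed convention: "@name" at the start, ended by whitespace or a colon.
AUTHOR_MARKER = re.compile(r"^@([^\s:：]+)")


def extract_author_id(forwarded_text: str) -> Optional[str]:
    """Return the name after a leading '@' marker, or None if absent."""
    match = AUTHOR_MARKER.match(forwarded_text.lstrip())
    return match.group(1) if match else None


def extract_original_content_id(url: str) -> Optional[str]:
    """Return the last path segment of the URL (hypothetical layout)."""
    path = urlparse(url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    return last_segment or None


author = extract_author_id("@ZhangSan: original post text")
content_id = extract_original_content_id("https://example.com/u/80651236/Dk3a9xQ")
print(author, content_id)
```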
Step 205: determine whether the content information of the microblog data contains forwarding user identifiers between the publisher identifier and the original author identifier. If so, perform step 206; if not, perform step 207.
In this step, the content information of the microblog data, in particular the publisher's original content part A, may contain the path along which the forwarded original content travelled from the original author to the publisher. For example, the Sina Weibo platform marks the forwarding path as "//@AXX//@BXX//@CXX", and the Tencent Weibo platform marks it as "||@AXX||@BXX||@CXX". The "AXX", "BXX", and "CXX" after each "//@" or "||@" form the chain of people who forwarded the original content. This can be implemented by locating the text editing field in the content information of the microblog data, determining whether a forwarding marker is present in that field, and, if so, extracting the forwarding user identifier indicated by the forwarding marker. The information indicating the propagation path of the original content is generally contained in the publisher's original content part A, that is, the part in which the publisher can comment or edit text. The publisher can therefore choose whether to disclose the chain of forwarders, and can also modify or delete it. Accordingly, to locate the chain, the text editing field in the content information of the microblog data, such as the "text" field, is found first; the forwarding markers, such as "//@" or "||@", are then found in that field, and the forwarding user identifiers following each marker are extracted to obtain the chain of forwarders.
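The locate, detect, and extract sequence described here can be sketched with a single regular expression over the "text" field. The pattern below recognizes the two markers named in the text, "//@" and "||@"; the rule that a user name ends at whitespace, a colon, a slash, or a vertical bar is an assumption, as is the function name `extract_forwarders`.

```python
import re
from typing import List

# Either "//@" or "||@", followed by the forwarding user identifier.
FORWARD_MARKER = re.compile(r"(?://|\|\|)@([^\s:：/|]+)")


def extract_forwarders(record: dict) -> List[str]:
    """Return forwarding user identifiers found in the record's text field."""
    text = record.get("text")  # locate the text editing field
    if not text:
        return []
    # An empty result means no forwarding marker is present.
    return FORWARD_MARKER.findall(text)


sina_style = extract_forwarders({"text": "//@AXX//@BXX//@CXX"})
tencent_style = extract_forwarders({"text": "||@AXX||@BXX||@CXX"})
print(sina_style, tencent_style)
```

Identifiers are returned in the order they appear in the text; whether that matches propagation order depends on the platform.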
Step 206: form a forwarding sequence according to the order in which the forwarding user identifiers are arranged, place the original author identifier at the start of the forwarding sequence and the publisher identifier at the end of the forwarding sequence, and form the forwarding relation chain.
In this step, the chain of forwarders obtained in the previous step generally contains only the forwarders between the original author and the publisher. To complete the chain, the original author identifier is placed at the start of the forwarding sequence and the publisher identifier at the end, forming the complete forwarding relation chain.
Step 207: form a forwarding relation chain that contains only the original author identifier followed by the publisher identifier.
In this step, refer in step 205 because forwarding relation chain is commonly included in publisher's original The content part A of wound, that is, publisher can be commented on or text editing part;Therefore issue Person oneself can choose whether to disclose above-mentioned forwarding character relation chain, meanwhile, publisher can also be to the people Thing relation chain is modified or deletion action.Therefore, it is more likely that it is original interior to get this in part A The propagation path information of appearance, then propagation path now is most short propagation path, that is, directly from original Person then forms this and only includes the forwarding relation chain that publisher's mark is identified to from author to publisher.
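Steps 206 and 207 can be illustrated together with a short Python sketch. The function name `build_forwarding_chain` and the use of plain strings as identifiers are assumptions made for the example. The sketch keeps the forwarding user identifiers in the order supplied by the caller.

```python
from typing import List, Sequence


def build_forwarding_chain(
    author_id: str,
    publisher_id: str,
    forwarding_user_ids: Sequence[str],
) -> List[str]:
    """Place the author at the start and the publisher at the end of the chain.

    With an empty forwarding sequence (step 207) the result is the shortest
    chain, running directly from the author to the publisher.
    """
    return [author_id, *forwarding_user_ids, publisher_id]


if __name__ == "__main__":
    print(build_forwarding_chain("author", "publisher", ["AXX", "BXX"]))
    print(build_forwarding_chain("author", "publisher", []))
```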
Step 208: according to the original content identifier, determine, among all microblog data, all forwarding relation chains corresponding to the original content identifier.
In this step, because the original content identifier uniquely corresponds to the content, it can be used to find all microblog data that contain the original content identifier. All forwarding relation chains associated with the original content identifier are then extracted from those microblog data, and a forwarding relation topology graph corresponding to the original content identifier can be formed from these chains, for example in the form shown in Fig. 1B.
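As a rough illustration of step 208, the following Python sketch groups forwarding relation chains by original content identifier and derives the edges of a forwarding relation topology graph from them. The names `group_chains_by_original` and `topology_edges`, and the representation of each record as an `(original_content_id, chain)` pair, are assumptions introduced for this example only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def group_chains_by_original(
    records: Iterable[Tuple[str, Sequence[str]]],
) -> Dict[str, List[List[str]]]:
    """Collect every forwarding relation chain under its original content identifier."""
    grouped: Dict[str, List[List[str]]] = defaultdict(list)
    for original_content_id, chain in records:
        grouped[original_content_id].append(list(chain))
    return dict(grouped)


def topology_edges(chains: Iterable[Sequence[str]]) -> Set[Tuple[str, str]]:
    """Turn chains into directed (from_user, to_user) edges of a topology graph."""
    edges: Set[Tuple[str, str]] = set()
    for chain in chains:
        edges.update(zip(chain, chain[1:]))  # consecutive users form one edge
    return edges


if __name__ == "__main__":
    grouped = group_chains_by_original(
        [("c1", ["A", "B", "D"]), ("c2", ["X", "Y"]), ("c1", ["A", "E"])]
    )
    print(grouped["c1"])
    print(sorted(topology_edges(grouped["c1"])))
```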
Step 209: compare the forwarding relation chains corresponding to each original content identifier pairwise, and remove any forwarding relation chain whose forwarding user identifiers, taken in order from the first position of the chain, are completely contained in another forwarding relation chain.
In this step, a deduplication operation is performed on all forwarding relation chains to reduce the complexity of the forwarding relation topology graph. The deduplication rule may be set by a person skilled in the art according to actual statistical needs, or may be to remove any forwarding relation chain whose forwarding user identifiers and their order, starting from the first position of the chain, are completely contained in another forwarding relation chain. For example, if one forwarded microblog yields the forwarding relation chain A → B → C → D and another yields A → B → C → D → E, the chain A → B → C → D → E is retained and A → B → C → D is removed. Since A → B → C → D → E already contains the forwarding path A → B → C → D, forwarding relation chains such as A → B → C → D, A → B → C and A → B can all be removed.
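The prefix-based deduplication rule in this step can be sketched in Python as below. The function name `deduplicate_chains` is an assumption, and the sketch additionally drops exact duplicate chains, which the description does not state explicitly. The pairwise comparison is quadratic in the number of chains, which is acceptable for a sketch but may need a different approach, such as a prefix tree, for large data sets.

```python
from typing import Iterable, List, Sequence


def deduplicate_chains(chains: Iterable[Sequence[str]]) -> List[List[str]]:
    """Remove every chain that is a proper prefix of another chain.

    A chain is removed when, starting from its first position, all of its
    identifiers appear in the same order at the start of a longer chain.
    """
    # Drop exact duplicates while preserving first-seen order (an added assumption).
    unique = list(dict.fromkeys(tuple(chain) for chain in chains))
    kept: List[List[str]] = []
    for chain in unique:
        is_prefix_of_longer = any(
            len(other) > len(chain) and other[: len(chain)] == chain
            for other in unique
        )
        if not is_prefix_of_longer:
            kept.append(list(chain))
    return kept


if __name__ == "__main__":
    chains = [
        ["A", "B", "C", "D"],
        ["A", "B", "C", "D", "E"],
        ["A", "B", "C"],
        ["A", "B"],
        ["A", "X"],
    ]
    print(deduplicate_chains(chains))
```

With the chains from the example in the text, only A → B → C → D → E survives among the chains that share the A → B prefix, while the unrelated branch A → X is kept.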
Fig. 3 is a schematic structural diagram of embodiment one of the apparatus for determining the propagation path of microblog data according to the present invention. As shown in Fig. 3, the apparatus of this embodiment includes: a collection module 31, configured to collect microblog data, where the microblog data include content information of the microblog data and attribute information of the microblog data, and the attribute information includes a publisher identifier of the microblog data and a content identifier uniquely corresponding to the content information of the microblog data; a parsing module 32, configured to parse each piece of collected microblog data; a determining module 33, configured to determine, from the content information of each piece of microblog data, whether the microblog data include forwarded microblog data; an obtaining module 34, configured to obtain the author identifier of the forwarded microblog data and the original content identifier uniquely corresponding to the content information of the forwarded microblog data; the determining module 33 being further configured to determine whether the content information of the microblog data contains forwarding user identifiers between the publisher identifier and the author identifier, to form a forwarding relation chain, and to determine, according to the original content identifier and among all microblog data, all forwarding relation chains corresponding to the original content identifier; and a deduplication module 35, configured to perform a deduplication operation on all forwarding relation chains corresponding to each original content identifier, to obtain the propagation path of the microblog data corresponding to each original content identifier.
The apparatus of this embodiment can be used to carry out the technical solution of method embodiment one shown in Fig. 1A. Its implementation principle and technical effect are similar and are not repeated here.
The apparatus for determining the propagation path of microblog data provided by this embodiment collects microblog data and parses each piece of collected microblog data, so as to identify, from the content information of each piece of microblog data, the forwarded microblog data it contains, and to determine from the forwarded microblog data the author identifier and the original content identifier of the microblog data being forwarded. It then determines whether the content information of the microblog data contains forwarding user identifiers between the publisher identifier and the author identifier, so as to form a forwarding relation chain for that piece of microblog data. Next, according to the original content identifier, it determines among all microblog data all forwarding relation chains corresponding to the original content identifier, and performs a deduplication operation on all forwarding relation chains corresponding to each original content identifier, thereby obtaining the propagation path of the microblog data corresponding to each original content identifier. In this way, microblog data transmitted over the Internet can be traced to their source and their propagation paths tracked, which helps safeguard the information security interests of the state and the public.
Fig. 4 is a schematic structural diagram of embodiment two of the apparatus for determining the propagation path of microblog data according to the present invention. As shown in Fig. 4, the apparatus of this embodiment builds on the apparatus shown in Fig. 3. Further, the determining module 33 includes: an identifier determination submodule 331, configured to determine whether the content information of the microblog data contains forwarding user identifiers between the publisher identifier and the author identifier; and a sequence determination submodule 332, configured to, after the identifier determination submodule 331 determines that such forwarding user identifiers exist, form a forwarding sequence according to the order in which the forwarding user identifiers are arranged, place the author identifier at the start of the forwarding sequence and the publisher identifier at the end, and form the forwarding relation chain. The sequence determination submodule 332 is further configured to, after the identifier determination submodule 331 determines that no such forwarding user identifiers exist, form a forwarding relation chain running only from the author identifier to the publisher identifier.
Optionally, the determining module 33 includes: a positioning submodule 333, configured to locate the text editing field in the content information of the microblog data; a marker determination submodule 334, configured to determine whether a forwarding marker exists in the text editing field; and an extraction submodule 335, configured to extract, after the marker determination submodule 334 determines that a forwarding marker exists, the forwarding user identifier that the forwarding marker identifies.
Optionally, the attribute information of the microblog data further includes: the publication time of the microblog data, the source website of the microblog data, and the URL of the microblog data. Correspondingly, the apparatus further includes a classification and sorting module 36, configured to classify and sort the collected microblog data according to at least one of the publication time, the source website and the URL of the microblog data. The parsing module 32 is specifically configured to parse the collected microblog data one by one in the order obtained after classification and sorting.
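One possible reading of the classification and sorting described above is sketched below in Python: posts are classified by source website and then sorted by publication time within each class. The description does not fix the exact criteria or their priority, so the function name `classify_and_sort`, the field names `source_site` and `publish_time`, and the choice of ISO 8601 time strings are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def classify_and_sort(posts: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Group posts by source website, then order each group by publication time.

    Assumption: "publish_time" holds ISO 8601 strings, which sort
    chronologically when compared as plain text.
    """
    by_site: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for post in posts:
        by_site[post["source_site"]].append(post)  # classification step

    ordered: List[Dict[str, str]] = []
    for site in sorted(by_site):
        ordered.extend(sorted(by_site[site], key=lambda p: p["publish_time"]))
    return ordered  # the parser would then process posts in this order


if __name__ == "__main__":
    posts = [
        {"source_site": "weibo.com", "publish_time": "2016-03-02T10:00:00", "url": "u2"},
        {"source_site": "t.qq.com", "publish_time": "2016-03-01T09:00:00", "url": "u3"},
        {"source_site": "weibo.com", "publish_time": "2016-03-01T08:00:00", "url": "u1"},
    ]
    print([p["url"] for p in classify_and_sort(posts)])
```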
Optionally, the deduplication module 35 is specifically configured to compare the forwarding relation chains corresponding to each original content identifier pairwise, and to remove any forwarding relation chain whose forwarding user identifiers and their order, starting from the first position of the chain, are completely contained in another forwarding relation chain.
The apparatus of this embodiment can be used to carry out the technical solution of method embodiment two shown in Fig. 2. Its implementation principle and technical effect are similar and are not repeated here.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for determining a propagation path of microblog data, characterized by comprising:
collecting microblog data, wherein the microblog data comprise content information of the microblog data and attribute information of the microblog data, and the attribute information of the microblog data comprises a publisher identifier of the microblog data and a content identifier uniquely corresponding to the content information of the microblog data;
parsing each piece of collected microblog data, and determining, from the content information of each piece of microblog data, whether the microblog data comprise forwarded microblog data;
obtaining an author identifier of the forwarded microblog data, and obtaining an original content identifier uniquely corresponding to the content information of the forwarded microblog data; determining whether the content information of the microblog data contains a forwarding user identifier between the publisher identifier and the author identifier, and forming a forwarding relation chain;
determining, according to the original content identifier and among all the microblog data, all forwarding relation chains corresponding to the original content identifier; and
performing a deduplication operation on all the forwarding relation chains corresponding to each original content identifier, to obtain the propagation path of the microblog data corresponding to each original content identifier.
2. The method according to claim 1, characterized in that determining whether the content information of the microblog data contains a forwarding user identifier between the publisher identifier and the author identifier, and forming a forwarding relation chain, comprises:
determining whether the content information of the microblog data contains a forwarding user identifier between the publisher identifier and the author identifier;
if so, forming a forwarding sequence according to the order in which the forwarding user identifiers are arranged, placing the author identifier at the start of the forwarding sequence and the publisher identifier at the end of the forwarding sequence, and forming the forwarding relation chain; and
if not, forming a forwarding relation chain running only from the author identifier to the publisher identifier.
3. The method according to claim 1, characterized in that determining whether the content information of the microblog data contains a forwarding user identifier between the publisher identifier and the author identifier comprises:
locating a text editing field in the content information of the microblog data;
determining whether a forwarding marker exists in the text editing field; and
if so, extracting the forwarding user identifier that the forwarding marker identifies.
4. The method according to any one of claims 1 to 3, characterized in that the attribute information of the microblog data further comprises:
a publication time of the microblog data, a source website of the microblog data, and a URL of the microblog data;
correspondingly, before parsing each piece of collected microblog data, the method further comprises:
classifying and sorting the collected microblog data according to at least one of the publication time of the microblog data, the source website of the microblog data, and the URL of the microblog data;
and parsing each piece of collected microblog data comprises:
parsing the collected microblog data one by one in the order obtained after the classifying and sorting.
5. The method according to claim 1, characterized in that performing a deduplication operation on all the forwarding relation chains corresponding to each original content identifier, to obtain the propagation path of the microblog data corresponding to each original content identifier, comprises:
comparing the forwarding relation chains corresponding to each original content identifier pairwise, and removing any forwarding relation chain whose forwarding user identifiers and their order, starting from the first position of the forwarding relation chain, are completely contained in another forwarding relation chain.
6. An apparatus for determining a propagation path of microblog data, characterized by comprising:
a collection module, configured to collect microblog data, wherein the microblog data comprise content information of the microblog data and attribute information of the microblog data, and the attribute information of the microblog data comprises a publisher identifier of the microblog data and a content identifier uniquely corresponding to the content information of the microblog data;
a parsing module, configured to parse each piece of collected microblog data;
a determining module, configured to determine, from the content information of each piece of microblog data, whether the microblog data comprise forwarded microblog data;
an obtaining module, configured to obtain an author identifier of the forwarded microblog data, and to obtain an original content identifier uniquely corresponding to the content information of the forwarded microblog data;
the determining module being further configured to determine whether the content information of the microblog data contains a forwarding user identifier between the publisher identifier and the author identifier, and to form a forwarding relation chain; and to determine, according to the original content identifier and among all the microblog data, all forwarding relation chains corresponding to the original content identifier; and
a deduplication module, configured to perform a deduplication operation on all the forwarding relation chains corresponding to each original content identifier, to obtain the propagation path of the microblog data corresponding to each original content identifier.
7. The apparatus according to claim 6, characterized in that the determining module comprises:
an identifier determination submodule, configured to determine whether the content information of the microblog data contains a forwarding user identifier between the publisher identifier and the author identifier; and
a sequence determination submodule, configured to, after the identifier determination submodule determines that a forwarding user identifier between the publisher identifier and the author identifier exists, form a forwarding sequence according to the order in which the forwarding user identifiers are arranged, place the author identifier at the start of the forwarding sequence and the publisher identifier at the end of the forwarding sequence, and form the forwarding relation chain;
the sequence determination submodule being further configured to, after the identifier determination submodule determines that no forwarding user identifier between the publisher identifier and the author identifier exists, form a forwarding relation chain running only from the author identifier to the publisher identifier.
8. The apparatus according to claim 6, characterized in that the determining module comprises:
a positioning submodule, configured to locate a text editing field in the content information of the microblog data;
a marker determination submodule, configured to determine whether a forwarding marker exists in the text editing field; and
an extraction submodule, configured to extract, after the marker determination submodule determines that the forwarding marker exists, the forwarding user identifier that the forwarding marker identifies.
9. The apparatus according to any one of claims 6 to 8, characterized in that the attribute information of the microblog data further comprises:
a publication time of the microblog data, a source website of the microblog data, and a URL of the microblog data;
correspondingly, the apparatus further comprises:
a classification and sorting module, configured to classify and sort the collected microblog data according to at least one of the publication time of the microblog data, the source website of the microblog data, and the URL of the microblog data;
the parsing module being specifically configured to parse the collected microblog data one by one in the order obtained after the classifying and sorting.
10. The apparatus according to claim 6, characterized in that
the deduplication module is specifically configured to compare the forwarding relation chains corresponding to each original content identifier pairwise, and to remove any forwarding relation chain whose forwarding user identifiers and their order, starting from the first position of the forwarding relation chain, are completely contained in another forwarding relation chain.
CN201610162128.7A 2016-03-21 2016-03-21 Microblog data propagation path determining method and device Expired - Fee Related CN107222381B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610162128.7A CN107222381B (en) 2016-03-21 2016-03-21 Microblog data propagation path determining method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610162128.7A CN107222381B (en) 2016-03-21 2016-03-21 Microblog data propagation path determining method and device

Publications (2)

Publication Number Publication Date
CN107222381A true CN107222381A (en) 2017-09-29
CN107222381B CN107222381B (en) 2020-03-06

Family

ID=59928361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610162128.7A Expired - Fee Related CN107222381B (en) 2016-03-21 2016-03-21 Microblog data propagation path determining method and device

Country Status (1)

Country Link
CN (1) CN107222381B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001094600A (en) * 1999-09-24 2001-04-06 Oki Electric Ind Co Ltd Message transfer node and network
WO2010145958A1 (en) * 2009-06-15 2010-12-23 Deutsche Telekom Ag Personalized speech bubbles
CN102375866A (en) * 2010-08-24 2012-03-14 腾讯科技(深圳)有限公司 Rebroadcasting message presenting method and system
CN102387126A (en) * 2010-09-01 2012-03-21 腾讯科技(深圳)有限公司 Method, server, client and system for converging single microblog message
CN102622443A (en) * 2012-03-13 2012-08-01 北京邮电大学 Customized screening system and method for microblog
CN103379019A (en) * 2012-04-20 2013-10-30 腾讯科技(深圳)有限公司 Microblog message pushing method, device and system
CN103885993A (en) * 2012-12-24 2014-06-25 北大方正集团有限公司 Public opinion monitoring method and device for microblog
CN105227425A (en) * 2014-05-26 2016-01-06 腾讯科技(北京)有限公司 The method of syndication message, equipment and network social intercourse system
CN104092598A (en) * 2014-07-03 2014-10-08 厦门欣欣信息有限公司 Message propagation path extraction method and system
CN104243234A (en) * 2014-09-11 2014-12-24 清华大学 Method and system for constructing user relationship in social network communication topology
CN104778210A (en) * 2015-03-13 2015-07-15 国家计算机网络与信息安全管理中心 Microblog forwarding tree and forwarding forest building method
CN104954236A (en) * 2015-06-19 2015-09-30 百度在线网络技术(北京)有限公司 Method and device for generating information of propagation path for theme event

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108170842A (en) * 2018-01-16 2018-06-15 重庆邮电大学 Hot microblog topic source tracing method based on tripartite graph model
CN108170842B (en) * 2018-01-16 2021-12-14 重庆邮电大学 Microblog hot topic tracing method based on three-part graph model
CN108737252A (en) * 2018-05-17 2018-11-02 立旃(上海)科技有限公司 Information-pushing method based on block chain and device
CN108737252B (en) * 2018-05-17 2021-02-26 立旃(上海)科技有限公司 Information pushing method and device based on block chain
CN108874625A (en) * 2018-05-31 2018-11-23 泰康保险集团股份有限公司 Information processing method and device, electronic equipment, storage medium
CN108874625B (en) * 2018-05-31 2021-09-10 泰康保险集团股份有限公司 Information processing method and device, electronic equipment and storage medium
CN113536092A (en) * 2018-06-26 2021-10-22 创新先进技术有限公司 Retrieval method and device for propagation content
CN110110974A (en) * 2019-04-17 2019-08-09 福建天泉教育科技有限公司 The recognition methods of crucial leader of opinion and computer readable storage medium
CN110110974B (en) * 2019-04-17 2022-03-29 福建天泉教育科技有限公司 Key opinion leader identification method and computer readable storage medium
CN110912809A (en) * 2019-12-23 2020-03-24 京东数字科技控股有限公司 Information sharing chain generation method and device, electronic equipment and storage medium
WO2021129379A1 (en) * 2019-12-23 2021-07-01 京东数字科技控股股份有限公司 Information sharing chain generation method and apparatus, electronic device, and storage medium
CN111447137A (en) * 2020-02-29 2020-07-24 中国平安人寿保险股份有限公司 Browsing condition data analysis method and device, server and storage medium

Also Published As

Publication number Publication date
CN107222381B (en) 2020-03-06

Similar Documents

Publication Publication Date Title
CN107222381A (en) The propagation path of microblog data determines method and apparatus
Gharge et al. An integrated approach for malicious tweets detection using NLP
US7730409B2 (en) Method and system for visualizing weblog social network communities
Hogan Analyzing social networks
CN110912889B (en) Network attack detection system and method based on intelligent threat intelligence
CN104462547B (en) A kind of method and system of configurable collecting webpage data
US9563770B2 (en) Spammer group extraction apparatus and method
Solorio et al. Sockpuppet detection in wikipedia: A corpus of real-world deceptive writing for linking identities
CN101340308A (en) Network rubbish information filtering architecture, Network rubbish information cleaning system and method thereof
CN104915438B (en) A method of obtaining PCU associated data in specific topics microblogging
CN109213858B (en) Automatic identification method and system for network water army
CN108920955B (en) Webpage backdoor detection method, device, equipment and storage medium
CN103984747B (en) Method and device for screen information processing
CN104572874B (en) A kind of abstracting method and device of webpage information
CN103593397B (en) A kind of method and apparatus of acquisition content of microblog
CN108985059B (en) Webpage backdoor detection method, device, equipment and storage medium
CN106897287A (en) Homepage Publishing decimation in time method and the device for Homepage Publishing decimation in time
CN104063491B (en) A kind of method and device that the detection page is distorted
Han et al. Construction on framework of rumor detection and warning system based on web mining technology
CN109791563B (en) Information collection system, information collection method, and recording medium
CN104965929B (en) A kind of data processing method and device
CN104714933B (en) A kind for the treatment of method and apparatus of documents editing
CN110069628A (en) A kind of accurate software design approach collecting meeting summary
Perera et al. Evaluation of Online Learning Activities of Accountancy Students Based on LMS Logs.
Ranga et al. Relationship between Foreign Direct Investment and Exchange Rate

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220622

Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: Peking University

Patentee after: BEIJING FOUNDER ELECTRONICS Co.,Ltd.

Address before: 100871, Beijing, Haidian District, Cheng Fu Road, No. 298, Zhongguancun Fangzheng building, 9 floor

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: Peking University

Patentee before: BEIJING FOUNDER ELECTRONICS Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200306
