CN110287322B - Water flow processing method, system and equipment for social media flow - Google Patents

Publication number
CN110287322B
Authority
CN
China
Prior art keywords
blogger
content
list
analyzed
bloggers
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910567614.0A
Other languages
Chinese (zh)
Other versions
CN110287322A (en)
Inventor
孔晓晴
李百川
蔡锐涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Youmi Technology Co ltd
Original Assignee
Youmi Technology Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention provides a method, a system and equipment for processing the water flow of social media flow. The method comprises the following steps: obtaining account information of a plurality of bloggers to be analyzed and the flow data corresponding to each piece of content published within a preset period; classifying the bloggers to be analyzed based on their account information to obtain a plurality of groups of blogger lists; for each blogger list, performing a normality test on all contents in the blogger list to determine whether they conform to a normal distribution; according to the test result, performing unary outlier detection or anomaly detection on all the contents in the blogger list to obtain the water flow in the flow data of each content; and calculating the water flow of each blogger to be analyzed in each blogger list according to the water flow of each content in that list. In this scheme, the water flow of a blogger is evaluated from the blogger's content and the flow data of that content, so the relevant information is easy to obtain, the evaluation cost is low, and the accuracy is high.

Description

Water flow processing method, system and equipment for social media flow
Technical Field
The invention relates to the technical field of data processing, in particular to a method, a system and equipment for processing water flow of social media flow.
Background
With the development of the internet, social media software is becoming an important part of people's daily life. The user publishes various types of contents on the social media software for other users to browse, so that the information can be rapidly spread and popularized through the social media software.
In social media software, some users hire what is commonly called an "internet water army" to publish and spread a large number of specific messages, for example to raise forwarding and comment counts, in order to increase their influence or achieve some other purpose. However, most water army accounts are robot accounts: when they forward and spread information, few real users actually receive it, the quality of information spreading is low, and the real influence of the user cannot be judged accurately. Therefore, the proportion of water army accounts among a user's fans needs to be evaluated, so that the water flow in the user's network flow can be evaluated.
At present, the proportion of water army accounts among a user's fans is evaluated as follows: various water army features are preset, detailed information about the user's fans and reviewers is acquired, and a fan or reviewer whose information matches a preset number of those features is judged to be a water army account. However, first, the strategies for producing water army accounts change continuously, so accuracy can be maintained only by manually and continuously updating the preset features, and the updating cost is high. Second, evaluating the water army requires the detailed information of all fans and reviewers of the user, which is difficult to acquire. Third, the daily behavior of some real users may match several water army features, so real users are easily misjudged as water army accounts and the evaluation accuracy is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, a system, and a device for processing social media traffic water flow, so as to solve the problems of the existing method for evaluating a water army, such as high cost for updating water army features, difficulty in obtaining detailed information of fans and reviewers, and low evaluation accuracy.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
the embodiment of the invention discloses a water flow processing method of social media flow in a first aspect, which comprises the following steps:
acquiring account information of a plurality of bloggers to be analyzed;
acquiring flow data corresponding to each piece of content issued by each blogger to be analyzed in a preset period, wherein the flow data at least comprises: the click number of each content and the number of fans of the bloggers to be analyzed when the click number is obtained, wherein the click number comprises a forwarding number or a playing amount;
classifying the plurality of bloggers to be analyzed based on the account information of each blogger to be analyzed to obtain a plurality of groups of blogger lists, wherein one group of blogger lists corresponds to one type of blogger;
for each group of the blogger lists, performing normality test on all contents of all bloggers to be analyzed in the blogger lists, and determining whether all contents in the blogger lists conform to normal distribution or not;
if all the contents in the blogger list accord with normal distribution, performing unary outlier detection on all the contents in the blogger list to obtain the water flow in the flow data of each content;
if all the contents in the blogger list do not accord with normal distribution, carrying out abnormal detection on all the contents in the blogger list to obtain the water flow in the flow data of each content;
and calculating the water flow of each blogger to be analyzed in each group of the blogger lists according to the water flow of each content in each group of the blogger lists.
Preferably, the obtaining of the account information of the plurality of bloggers to be analyzed includes:
acquiring, from social media within a preset period, the account names and profiles of a plurality of popular bloggers, active bloggers and common bloggers, wherein the popular bloggers are bloggers on a popular list or in popular recommendations, the active bloggers are bloggers who comment under the content published by the popular bloggers, and the common bloggers are bloggers obtained from non-popular lists of the social media sorted by publishing time.
Preferably, the performing a normality test on all contents of all bloggers to be analyzed in the blogger list for each group of the blogger list to determine whether all contents in the blogger list conform to a normal distribution includes:
for each group of the blogger lists, taking the natural logarithm (base e) of the number of clicks and of the number of fans corresponding to each content;
calculating the ratio of the logarithm of the number of clicks to the logarithm of the number of fans, and taking the ratio as the fan interaction ratio;
and performing the normality test on the fan interaction ratios corresponding to all the contents to determine whether all the contents in the blogger list conform to a normal distribution.
Preferably, the performing a unary outlier detection on all the contents in the blogger list to obtain the water flow rate in the flow rate data of each content includes:
aiming at each content in the blogger list, calculating maximum likelihood estimation corresponding to the mean value and the variance of the flow data of the content by using a maximum likelihood estimation function;
respectively calculating probability distribution values corresponding to the contents and the mean value by using a cumulative distribution function to obtain a first probability distribution value and a second probability distribution value, wherein the first probability distribution value is a probability distribution value corresponding to each content, and the second probability distribution value is a probability distribution value corresponding to each group of mean values;
and calculating the difference value of the first probability distribution value and the second probability distribution value, and taking the difference value as the water content flow rate in the flow rate data of each content.
Preferably, the performing anomaly detection on all the contents in the blogger list to obtain the water flow in the flow data of each content includes:
taking all contents in the blogger list as the detection objects of a density-based anomaly detection method, and calculating the water flow in the flow data of each content according to the local reachability density (lrd) formula and the local outlier factor (LOF) formula.
Preferably, the method further comprises:
and converting the value format of the water flow of each content into a preset format.
Preferably, the step of calculating the water flow rate of each to-be-analyzed blogger in each group of the blogger list according to the water flow rate of each content in each group of the blogger list includes:
according to the water flow of each content in each group of the blogger list, obtaining the water flow corresponding to each content of each blogger to be analyzed;
and calculating the average value of the water flow corresponding to each content of each blogger to be analyzed to obtain the water flow of each blogger to be analyzed.
The second aspect of the embodiment of the invention discloses a social media flow water flow processing system, which comprises:
a first obtaining unit, configured to obtain account information of a plurality of bloggers to be analyzed;
a second obtaining unit, configured to obtain traffic data corresponding to each piece of content issued by each blogger to be analyzed in a preset period, where the traffic data at least includes: the click number of each content and the number of fans of the bloggers to be analyzed when the click number is obtained, wherein the click number comprises a forwarding number or a playing amount;
the classification unit is used for classifying the plurality of bloggers to be analyzed based on the account information of each blogger to be analyzed to obtain a plurality of groups of blogger lists, wherein one group of blogger lists corresponds to one type of blogger;
the checking unit is used for performing a normality check on all contents of all bloggers to be analyzed in the blogger list aiming at each group of the blogger list and determining whether all the contents in the blogger list accord with normal distribution or not;
the first detection unit is used for performing unary outlier detection on all the contents in the blogger list to obtain the water flow in the flow data of each content if all the contents in the blogger list conform to normal distribution;
the second detection unit is used for carrying out abnormal detection on all the contents in the blogger list if all the contents in the blogger list do not conform to normal distribution so as to obtain the water flow in the flow data of each content;
and the calculating unit is used for calculating the water flow of each blogger to be analyzed in each group of the blogger lists according to the water flow of each content in each group of the blogger lists.
The third aspect of the embodiment of the present invention discloses a storage medium, where the storage medium includes a stored program, and when the program runs, a device where the storage medium is located is controlled to execute the social media flow water flow rate processing method disclosed in the first aspect of the embodiment of the present invention.
The fourth aspect of the embodiments of the present invention discloses a processor, where the processor is configured to execute a program, where the program executes a method for processing social media traffic water flow rate as disclosed in the first aspect of the embodiments of the present invention.
Based on the method, system and equipment for processing the water flow of social media flow provided by the embodiments of the present invention, the method comprises: obtaining account information of a plurality of bloggers to be analyzed and the flow data corresponding to each piece of content published within a preset period; classifying the bloggers to be analyzed based on their account information to obtain a plurality of groups of blogger lists; for each blogger list, performing a normality test on all contents of all bloggers to be analyzed in the list to determine whether the contents conform to a normal distribution; according to the test result, performing unary outlier detection or anomaly detection on all the contents in the blogger list to obtain the water flow in the flow data of each content; and calculating the water flow of each blogger to be analyzed in each blogger list according to the water flow of each content in that list. In this scheme, the water flow of a blogger is evaluated from the blogger's content and the flow data of that content, so the evaluation accuracy is high. No preset water army features need to be updated frequently, which reduces the updating cost, and the detailed information of all fans and reviewers need not be acquired, so the information related to the bloggers is easy to obtain.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flow chart of a social media flow water flow processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a normality test provided by an embodiment of the present invention;
FIG. 3 is a flow chart of unary outlier detection provided by embodiments of the present invention;
FIG. 4 is a schematic flow chart of calculating network water flow rate according to an embodiment of the present invention;
FIG. 5 is a block diagram of a social media flow water flow processing system according to an embodiment of the present invention;
FIG. 6 is a block diagram of a social media flow water flow processing system according to an embodiment of the present invention;
FIG. 7 is a block diagram of a social media flow water flow processing system according to an embodiment of the present invention;
FIG. 8 is a block diagram of a social media flow water flow processing system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As described in the background, the existing way of evaluating the proportion of water army accounts among a user's fans has three problems. First, the preset water army features must be updated manually and continuously to keep the evaluation accurate, and the updating cost is high. Second, evaluating the water army requires the detailed information of all fans and reviewers of the user, which is difficult to acquire. Third, the daily behavior of some real users may match several water army features, so real users are easily misjudged as water army accounts and the evaluation accuracy is low.
Therefore, the embodiment of the invention provides a social media flow water flow processing method, system and device, which are used for evaluating the water flow of a blogger through the content of the blogger and flow data of the content so as to improve the evaluation accuracy. The preset water army features do not need to be updated frequently, so that the updating cost is reduced. The detailed information of all fans and reviewers does not need to be acquired, so that the difficulty in acquiring the information related to the bloggers is reduced.
The water flow of a piece of content of a blogger to be analyzed, as calculated in the embodiments of the present invention, specifically refers to the percentage by which its flow data deviates from the flow data of normal contents published by bloggers of the same type on the same platform, where normal contents are contents whose forwarding count and/or play count has not been faked. For example, suppose a blogger to be analyzed has 100 million fans and one of the blogger's videos has a play count of 1000 million, while videos published by bloggers of the same type on the same platform have play counts of 100 million or less; it can then be determined that the video with 1000 million plays contains water flow.
Referring to fig. 1, a flow chart of a social media flow water flow processing method provided by an embodiment of the present invention is shown, where the method for calculating the network water flow includes the following steps:
Step S101: acquire account information of a plurality of bloggers to be analyzed.
In implementing step S101, the account names and profiles of a plurality of popular bloggers, active bloggers and common bloggers are obtained from social media within a preset period. A popular blogger is a blogger on a popular list or in popular recommendations, an active blogger is a blogger who comments under content published by a popular blogger, and a common blogger is a blogger obtained from a non-popular list of the social media sorted by publishing time. For example, assuming the preset period is 1 month, then every day the account information of popular bloggers is acquired from the popular list or popular recommendations published by the social media, the account information of bloggers who comment under the popular bloggers' content is acquired, and the account information of common bloggers is acquired from the non-popular list sorted by publishing time.
It should be noted that after the account information of the popular, active and common bloggers is obtained, the ids of all obtained bloggers need to be merged and bloggers with duplicate ids removed. Social media to which embodiments of the present invention relate include, but are not limited to, Weibo, Xiaohongshu, Douyin, Huoshan Video and Kuaishou; further examples are not listed here.
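The merge-and-deduplicate step described above can be sketched as follows. This is only a minimal illustration: the record layout (dictionaries with `id` and `profile` keys) and the function name `merge_bloggers` are assumptions made for the example and do not come from the patent.

```python
def merge_bloggers(*sources):
    """Merge blogger records from several sources, keeping one record per id."""
    merged = {}
    for source in sources:
        for blogger in source:
            # setdefault keeps the first record seen for a given id,
            # so later duplicates of the same id are ignored.
            merged.setdefault(blogger["id"], blogger)
    return list(merged.values())


# Hypothetical sample data for the three sources named in step S101.
popular = [{"id": "u1", "profile": "food blogger"}]
active = [
    {"id": "u2", "profile": "music blogger"},
    {"id": "u1", "profile": "food blogger"},  # duplicate of a popular blogger
]
common = [{"id": "u3", "profile": ""}]

bloggers = merge_bloggers(popular, active, common)
print([b["id"] for b in bloggers])
```

Keying on the id means a blogger who appears both on a popular list and as a commenter is analyzed only once.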
Step S102: acquire the flow data corresponding to each piece of content published by each blogger to be analyzed within a preset period.
In implementing step S102, the flow data at least includes the number of clicks of each piece of content and the number of fans of the blogger at the time the number of clicks is obtained, where the number of clicks is a forwarding count or a play count. For example, for each blogger to be analyzed, the flow data of each piece of content published by the blogger within three months is acquired.
It should be noted that calculating the water flow of a blogger to be analyzed requires that blogger's flow data, and that the way traffic is faked differs between types of social platform: on Weibo, the forwarding count of the content is usually faked, whereas on short-video platforms such as Kuaishou and Douyin, the play count is usually faked. Therefore, before the water flow of the bloggers to be analyzed is calculated, the type of click count to use is determined according to the type of social platform.
It should also be noted that using the forwarding count for Weibo is only an example; when the content a blogger publishes on Weibo is a video, the click count may instead be the play count. Whether the forwarding count and/or the play count is used is set by a technician according to the actual situation and is not specifically limited in the embodiments of the present invention.
Step S103: classify the plurality of bloggers to be analyzed based on the account information of each blogger to obtain a plurality of groups of blogger lists.
In implementing step S103, the category of each blogger to be analyzed is determined from the id and profile obtained in step S101, and the bloggers are classified to obtain a plurality of groups of blogger lists. For example, bloggers whose profiles are marked "food blogger", "music blogger" or "child-care blogger" are classified into a food blogger list, a music blogger list and a child-care blogger list, respectively.
It should be noted that some bloggers may not have set a profile, so their category cannot be determined from the profile; such bloggers are deleted from the bloggers to be analyzed obtained in step S101.
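As an illustration of step S103, the sketch below groups bloggers by a label found in their profile and drops bloggers whose category cannot be determined. The keyword table, the record layout and the function name `classify_bloggers` are hypothetical; the patent does not specify how the profile is matched.

```python
from collections import defaultdict

# Hypothetical label-to-category table built from the three examples in the text.
CATEGORY_LABELS = {
    "food blogger": "food",
    "music blogger": "music",
    "child-care blogger": "child-care",
}


def classify_bloggers(bloggers):
    """Return {category: [blogger ids]}; bloggers without a usable profile are dropped."""
    groups = defaultdict(list)
    for blogger in bloggers:
        profile = (blogger.get("profile") or "").lower()
        for label, category in CATEGORY_LABELS.items():
            if label in profile:
                groups[category].append(blogger["id"])
                break  # assumption: one category per blogger
        # No match (including an empty profile): the blogger is not analyzed.
    return dict(groups)


sample = [
    {"id": "u1", "profile": "Food blogger sharing daily recipes"},
    {"id": "u2", "profile": "music blogger"},
    {"id": "u3", "profile": ""},
]
blogger_lists = classify_bloggers(sample)
print(blogger_lists)
```

Each resulting list is then tested and scored on its own, so that a blogger is only compared against bloggers of the same type.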
Step S104: for each group of the blogger lists, perform a normality test on all contents of all bloggers to be analyzed in the blogger list to determine whether the contents conform to a normal distribution. If yes, go to step S105; if not, go to step S106.
In implementing step S104, a preset normality test, for example the Kolmogorov-Smirnov test, is applied to all contents of all bloggers to be analyzed in each group of the blogger lists, using the flow data of each content acquired in step S102.
It should be noted that the Kolmogorov-Smirnov test works as follows: based on the cumulative distribution function, the cumulative frequency distribution of the sample data is compared with a specific theoretical distribution, and if the difference between the two is smaller than a threshold value, the sample data is judged to come from that distribution family.
It should be noted that the specific normality test is selected by the skilled person according to the actual situation and is not specifically limited in the embodiments of the present invention.
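The comparison of cumulative distributions described above can be sketched with the Python standard library. This is a simplified illustration, not the patent's implementation: the normal distribution is fitted to the sample itself, and the threshold `coefficient / sqrt(n)` with `coefficient=1.36` is the usual large-sample 5% critical value for a fully specified distribution, used here as an assumption. With estimated parameters that threshold is conservative, and a Lilliefors-type correction would normally be preferred.

```python
import math
from statistics import NormalDist, fmean, pstdev


def ks_statistic_normal(samples):
    """Largest gap between the empirical CDF and the CDF of a normal fitted to the samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    sigma = pstdev(samples)
    if sigma == 0:
        raise ValueError("samples must not all be equal")
    dist = NormalDist(fmean(samples), sigma)
    d = 0.0
    for i, x in enumerate(sorted(samples), start=1):
        cdf = dist.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def looks_normal(samples, coefficient=1.36):
    """True if the K-S statistic is below the (assumed) threshold."""
    return ks_statistic_normal(samples) < coefficient / math.sqrt(len(samples))


# Evenly spaced normal quantiles versus a strongly skewed sample.
bell_shaped = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [0.1] * 45 + [10.0] * 5
print(looks_normal(bell_shaped), looks_normal(skewed))
```

A list for which `looks_normal` returns `True` would go to step S105, and any other list to step S106.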
Step S105: if all the contents in the blogger list conform to a normal distribution, perform unary outlier detection on all the contents in the blogger list to obtain the water flow in the flow data of each content.
In implementing step S105, unary outlier detection is performed on all the contents in the blogger list using a maximum likelihood estimation function and a cumulative distribution function (CDF) to obtain the water flow in the flow data of each content.
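A minimal sketch of this unary outlier detection is given below, assuming the input is one value per content (for example the fan interaction ratio). The mean and variance are the maximum likelihood estimates for a normal distribution (the variance divides by n), and the score is the difference between the CDF at the content's value and the CDF at the mean. Taking the absolute value of that difference is an assumption made here so that the score falls in [0, 0.5]; the function name is hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def unary_outlier_scores(values):
    """Score each value by |CDF(value) - CDF(mean)| under a fitted normal distribution."""
    mu = fmean(values)      # maximum likelihood estimate of the mean
    sigma = pstdev(values)  # square root of the MLE variance (divides by n)
    if sigma == 0:
        # All values identical: nothing deviates from the mean.
        return [0.0 for _ in values]
    dist = NormalDist(mu, sigma)
    center = dist.cdf(mu)   # second probability value; 0.5 for a normal distribution
    # First probability value is dist.cdf(v); the score is its distance from the center.
    return [abs(dist.cdf(v) - center) for v in values]


ratios = [0.40, 0.50, 0.60, 0.50, 0.90]
scores = unary_outlier_scores(ratios)
print([round(s, 3) for s in scores])
```

The value furthest from the fitted mean (0.90 in the sample) receives the highest score.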
Step S106: if all the contents in the blogger list do not conform to a normal distribution, perform anomaly detection on all the contents in the blogger list to obtain the water flow in the flow data of each content.
In implementing step S106, all the contents in the blogger list are taken as the detection objects of a density-based anomaly detection method, and the water flow in the flow data of each content is calculated according to the local reachability density (lrd) formula and the local outlier factor (LOF) formula. The specific calculation for each content is as follows:
The lrd value of each content is calculated with formula (1), and the lrd values are then substituted into formula (2) to calculate the water flow corresponding to each content.
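$$\mathrm{lrd}_k(p) = 1 \Big/ \left( \frac{\sum_{o \in N_k(p)} \text{reach-dist}_k(p, o)}{|N_k(p)|} \right) \tag{1}$$
$$\mathrm{LOF}_k(p) = \frac{\sum_{o \in N_k(p)} \dfrac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}}{|N_k(p)|} \tag{2}$$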
In formula (1), N_k(p) is the k-th distance neighborhood of point p, i.e., all points whose distance from p does not exceed the k-th distance of p, and reach-dist_k(p, o) is the k-th reachable distance from point o to point p.
It should be noted that the closer LOF(p) is to 1, the more similar the density of object p is to that of its neighborhood points, and p belongs to the same cluster as its neighborhood. If LOF(p) is less than 1, then the further it is below 1, the higher the density of p relative to its neighborhood points, and p is a dense point. If LOF(p) is greater than 1, then the further it is above 1, the lower the density of p relative to its neighborhood points, and p is an abnormal point. Therefore, the value of LOF(p) is defined as the water flow of object p.
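The lrd and LOF calculation can be sketched in plain Python for one-dimensional data, following the standard definitions of the k-distance neighborhood, reachability distance, local reachability density and local outlier factor. The choice of one value per content, the default `k=2`, the small `EPS` guard against duplicate values and the function name are assumptions for illustration only.

```python
EPS = 1e-12  # assumed guard so that duplicate values do not cause division by zero


def lof_scores(values, k=2):
    """Local outlier factor of each value in a 1-D sample."""
    n = len(values)
    if n <= k:
        raise ValueError("need more than k values")

    def dist(i, j):
        return abs(values[i] - values[j])

    k_distance = []  # distance from each point to its k-th nearest neighbor
    neighbors = []   # N_k(p): every point within the k-distance (ties included)
    for p in range(n):
        others = sorted((dist(p, o), o) for o in range(n) if o != p)
        kd = others[k - 1][0]
        k_distance.append(kd)
        neighbors.append([o for d, o in others if d <= kd])

    def reach_dist(p, o):
        # Reachability distance: never smaller than the k-distance of o.
        return max(k_distance[o], dist(p, o))

    # Formula (1): inverse of the mean reachability distance to the neighbors.
    lrd = []
    for p in range(n):
        total = sum(reach_dist(p, o) for o in neighbors[p])
        lrd.append(len(neighbors[p]) / max(total, EPS))

    # Formula (2): mean ratio of the neighbors' lrd to the point's own lrd.
    return [
        sum(lrd[o] / lrd[p] for o in neighbors[p]) / len(neighbors[p])
        for p in range(n)
    ]


ratios = [0.50, 0.52, 0.49, 0.51, 0.53, 0.95]
scores = lof_scores(ratios, k=2)
print([round(s, 2) for s in scores])
```

In the sample, the five clustered values get scores close to 1, while the isolated value 0.95 gets a score far above 1, matching the interpretation of LOF(p) given above.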
It should be noted that the water flow calculated with the unary outlier detection of step S105 takes values in [0, 0.5], whereas the water flow calculated with the density-based anomaly detection of step S106 takes values in [0, +∞). To unify the value formats, for each blogger list the water flow values calculated by the two detection methods are normalized with formula (3), so that the normalized water flow lies in the interval [0, 1]. In formula (3), x is the water flow of each content in each group of blogger lists, X_min is the minimum water flow in the blogger list, X_max is the maximum water flow in the blogger list, and x' is the water flow of the content after normalization.
$$x' = \frac{x - X_{\min}}{X_{\max} - X_{\min}} \tag{3}$$
Therefore, preferably, after steps S105 and S106 are performed, the value format of the water flow of each content is converted into a preset format.
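The min-max normalization of formula (3) can be sketched as below. Mapping a list whose values are all equal to 0.0 is an assumption made here to avoid division by zero; the text does not cover that case.

```python
def min_max_normalize(scores):
    """Rescale the scores of one blogger list to [0, 1] using formula (3)."""
    x_min, x_max = min(scores), max(scores)
    if x_max == x_min:
        # Assumption: with no spread, every content is treated as having no water flow.
        return [0.0 for _ in scores]
    return [(x - x_min) / (x_max - x_min) for x in scores]


print(min_max_normalize([0.0, 0.25, 0.5]))  # prints [0.0, 0.5, 1.0]
```

Because the minimum and maximum are taken per blogger list, scores from the two detection methods become comparable within a list, but the normalized values of different lists are each relative to their own range.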
Step S107: calculate the water flow of each blogger to be analyzed in each group of the blogger lists according to the water flow of each content in each group of the blogger lists.
In implementing step S107, the water flow corresponding to each content of each blogger to be analyzed is obtained from the water flow of each content in each group of blogger lists, combined with the correspondence between contents and bloggers. The average of the water flow over a blogger's contents is then calculated to obtain the water flow of that blogger. For example, suppose blogger A in a blogger list has published 3 pieces of content whose water flows are 0.5, 0.4 and 0.6 respectively. The water flow of blogger A is then (0.5 + 0.4 + 0.6) / 3 = 0.5 = 50%.
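The per-blogger averaging of step S107 can be sketched as follows, reusing the figures of the blogger A example. The input layout (pairs of blogger id and content water flow) and the second blogger "B" are assumptions added for illustration.

```python
from collections import defaultdict
from statistics import fmean


def blogger_water_flow(content_scores):
    """Average the per-content water flow of each blogger.

    content_scores is an iterable of (blogger_id, water_flow) pairs.
    """
    by_blogger = defaultdict(list)
    for blogger_id, score in content_scores:
        by_blogger[blogger_id].append(score)
    return {blogger_id: fmean(scores) for blogger_id, scores in by_blogger.items()}


result = blogger_water_flow([("A", 0.5), ("A", 0.4), ("A", 0.6), ("B", 0.2)])
print(f"{result['A']:.0%}")  # blogger A from the example above
```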
In the embodiment of the invention, the account information of a plurality of bloggers to be analyzed and the flow data corresponding to each piece of content released in a preset period are obtained. The bloggers to be analyzed are classified based on their account information to obtain a plurality of groups of blogger lists. For each blogger list, a normality test is performed on all contents of all bloggers to be analyzed in the list to determine whether the contents conform to a normal distribution. According to the test result, unary outlier detection or density-based anomaly detection is performed on all contents in the blogger list to obtain the water flow rate in the flow data of each content. The water flow rate of each blogger to be analyzed in each blogger list is then calculated from the water flow rate of each content in that list. Preset water army characteristics do not need to be updated frequently, and detailed information on all fans and commenters does not need to be acquired, so the evaluation accuracy is high, the updating cost is low, and the relevant blogger information is easy to obtain.
For the process of performing a normality test on all contents of all bloggers to be analyzed in the blogger list, involved in step S104 of fig. 1 in the above embodiment of the present invention, refer to fig. 2, which shows a flowchart of the normality test provided in the embodiment of the present invention. The process includes the following steps:
Step S201: for each group of blogger lists, take the logarithm of the number of clicks and of the number of fans corresponding to each content, with a natural constant as the base.
In the specific implementation of step S201, for content issued by a blogger to be analyzed, the number of fans is usually much larger than the number of clicks. For example, on a microblog platform, a blogger to be analyzed may have 100 million fans when a piece of content is issued, while the content is forwarded only 10,000 times. In the normality test, the distribution of the ratio of the forwarding number to the number of fans is then left-skewed, which penalizes data with a large number of fans or a large forwarding number and easily makes the normality test result abnormal. Therefore, the number of fans and the number of clicks need to be processed, in a manner including but not limited to taking the logarithm of the number of fans and of the number of clicks corresponding to each piece of content with the natural constant e as the base.
It should be noted that, as can be seen from the content in step S102 in fig. 1 in the embodiment of the present invention described above, the type of the selected number of clicks is determined according to different types of social platforms, and the specifically selected number of clicks is a forwarding number and/or a playing amount, which is set by a technician according to an actual situation, which is not specifically limited in the embodiment of the present invention.
Step S202: calculate the ratio of the number of clicks to the number of fans after taking logarithms, and take the ratio as the fan interaction ratio.
Step S203: perform a normality test on the fan interaction ratios corresponding to all the contents to determine whether all contents in the blogger list conform to a normal distribution.
In the specific implementation of step S203, for each blogger list, a normality test is performed on the fan interaction ratios of all contents in the list. For example, a Kolmogorov-Smirnov test is performed on the fan interaction ratios of all contents in the blogger list to obtain the probability a that the contents follow the theoretical distribution, with the significance level set to 0.05. If a is less than 0.05, the contents in the blogger list deviate from the theoretical distribution, that is, they do not conform to the normal distribution. If a is greater than 0.05, the contents in the blogger list do not deviate from the theoretical distribution, that is, they conform to the normal distribution.
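Steps S201 to S203 can be sketched with the Python standard library as follows. This is an illustrative sketch under stated assumptions: the fan interaction ratio is taken to be ln(clicks)/ln(fans), the normal distribution is fitted from the sample itself, and the p-value uses the asymptotic Kolmogorov series. The function names are hypothetical. Estimating the mean and standard deviation from the same sample makes the plain Kolmogorov-Smirnov p-value too lenient, so a production implementation would likely use a statistics library or a Lilliefors-type correction.

```python
import math
from statistics import NormalDist, fmean, pstdev


def fan_interaction_ratio(clicks, fans):
    """Steps S201/S202 (assumed form): ln(clicks) / ln(fans); needs clicks >= 1, fans > 1."""
    return math.log(clicks) / math.log(fans)


def ks_normality_pvalue(sample):
    """Step S203: one-sample Kolmogorov-Smirnov test against a fitted normal."""
    n = len(sample)
    sigma = pstdev(sample)
    if sigma == 0:
        raise ValueError("sample has zero variance")
    dist = NormalDist(fmean(sample), sigma)
    d = 0.0
    for i, x in enumerate(sorted(sample)):
        c = dist.cdf(x)
        # Largest gap between the empirical CDF (which jumps from i/n to
        # (i + 1)/n at x) and the fitted normal CDF.
        d = max(d, (i + 1) / n - c, c - i / n)
    lam = math.sqrt(n) * d
    # Asymptotic Kolmogorov distribution, truncated after 100 terms.
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return min(max(p, 0.0), 1.0)


def conforms_to_normal(sample, alpha=0.05):
    """True if the blogger list would be routed to unary outlier detection."""
    return ks_normality_pvalue(sample) > alpha


# The example above: 10,000 forwards against 100 million fans.
ratio = fan_interaction_ratio(10_000, 100_000_000)
```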
Preferably, after step S203 is executed, each group of blogger lists is marked, and the mark indicates whether all contents of the blogger list conform to a normal distribution. For example, a blogger list whose contents all conform to the normal distribution is marked 1, and a blogger list whose contents do not conform to the normal distribution is marked 2. Step S105 or step S106 of fig. 1 in the above embodiment of the present invention is then performed according to the mark of each blogger list.
In the embodiment of the invention, the logarithm of the number of fans and of the number of clicks corresponding to each content in each blogger list is taken with a natural constant as the base, and the ratio of the number of clicks to the number of fans after taking logarithms is used as the fan interaction ratio. A normality test is performed on all contents in each blogger list based on the fan interaction ratio of each content in the list. The water flow rate in the flow data of each content is calculated based on the result of the normality test, and the water flow rate of each blogger to be analyzed in each blogger list is calculated from the water flow rate of each content in that list, so the water flow rate is evaluated with high accuracy.
For the unary outlier detection process involved in step S105 of fig. 1 in the above embodiment of the present invention, refer to fig. 3, which shows a flowchart of unary outlier detection provided in the embodiment of the present invention. The process includes the following steps:
Step S301: for each content in the blogger list, calculate the maximum likelihood estimates of the mean and variance of the flow data of the content by using a maximum likelihood estimation function.
In the specific implementation of step S301, the maximum likelihood estimation function is shown in formula (4), the maximum likelihood estimate of the mean μ of the content flow data is shown in formula (5), and the maximum likelihood estimate of the variance σ² of the content flow data is shown in formula (6).
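[Formula (4): image BDA0002109964040000111, not reproduced in this text]
[Formula (5): image BDA0002109964040000112, not reproduced in this text]
[Formula (6): image BDA0002109964040000113, not reproduced in this text]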
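The images of formulas (4) to (6) are not available in this text. As a hedged aid to the reader, the standard maximum likelihood results for n observations x_1, ..., x_n assumed to follow a normal distribution are given below; the notation and exact form used in the original formula images may differ.

```latex
% Likelihood function of a normal sample (standard textbook form, assumed to correspond to formula (4))
L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)

% Maximum likelihood estimate of the mean (assumed to correspond to formula (5))
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i

% Maximum likelihood estimate of the variance (assumed to correspond to formula (6))
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \hat{\mu} \right)^2
```

Note that the maximum likelihood estimate of the variance divides by n rather than n - 1.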
It should be noted that the average is a location parameter of the normal distribution, and is used to describe a centralized location of the normal distribution. The abnormal condition of a certain value can be judged by comparing with the average value, namely if the difference value of the certain value and the average value is in the threshold range, the value is a normal value. And if the difference value between a certain value and the average value is out of the threshold range, the value is indicated as an abnormal value. Therefore, for each blogger list, all contents in the blogger list are compared with the average value, for example, the difference between the probability density area of each content and the average value is calculated, and the abnormal condition of each content in all contents in the blogger list is judged.
Step S302: calculate the probability distribution values corresponding to the content and to the mean, respectively, by using a cumulative distribution function, to obtain a first probability distribution value and a second probability distribution value.
It should be noted that the cumulative distribution function is an integral of a probability density function, and is used to describe a probability distribution of a real random variable, where the first probability distribution value is a probability distribution value corresponding to each piece of the content, and the second probability distribution value is a probability distribution value corresponding to each group of the mean values.
In the specific implementation of step S302, the first probability distribution value cdf(x) is calculated by taking the content as the variable of the cumulative distribution function, and the second probability distribution value cdf(μ) is calculated by taking the mean as the variable of the cumulative distribution function.
Step S303: calculate the difference between the first probability distribution value and the second probability distribution value, and take the difference as the water flow rate in the flow data of each content.
In the process of implementing step S303, a difference between cdf (x) and cdf (μ) is calculated to obtain f (x), and f (x) is used as the water flow rate in the flow rate data of each content.
It should be noted that f(x) reflects how close each content is to the mean, and hence how likely the content is to be a normal value. The larger f(x) is, the farther the content lies from the mean and the less likely it is to be a normal value. The smaller f(x) is, the closer the content lies to the mean and the more likely it is to be a normal value. Therefore, f(x) can be taken as the water flow rate in the flow data of each content.
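Steps S301 to S303 can be sketched as follows. This is a minimal sketch under assumptions: `values` stands for whatever per-content statistic is tested (for example the fan interaction ratio), and the absolute difference |cdf(x) - cdf(μ)| is used so that the result lies in [0, 0.5], consistent with the value range stated for step S105. The function name is hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def unary_outlier_scores(values):
    """Score each content by |cdf(x) - cdf(mu)| under a fitted normal."""
    mu = fmean(values)            # maximum likelihood estimate of the mean
    sigma = pstdev(values, mu)    # square root of the MLE variance (divides by n)
    if sigma == 0:
        # Assumption: identical values carry no deviation from the mean.
        return [0.0 for _ in values]
    dist = NormalDist(mu, sigma)
    base = dist.cdf(mu)           # second probability distribution value, 0.5
    return [abs(dist.cdf(x) - base) for x in values]


scores = unary_outlier_scores([1.0, 2.0, 3.0, 4.0, 5.0])
```

A content exactly at the mean scores 0, and the score approaches 0.5 as the content moves far from the mean in either direction.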
In the embodiment of the invention, the water flow rates of all contents in a blogger list whose contents conform to a normal distribution are calculated by unary outlier detection. The water flow rate of each blogger to be analyzed in each blogger list is then calculated from the water flow rate of each content in that list, so the evaluation accuracy is high. Preset water army characteristics do not need to be updated frequently, which reduces the updating cost, and detailed information on all fans and commenters does not need to be acquired, so the relevant blogger information is easy to obtain.
To better explain the contents of the steps shown in fig. 1 to fig. 3 in the above-described embodiment of the present invention, the contents shown in fig. 1 to fig. 3 are combined and exemplified by fig. 4. Referring to fig. 4, a flow chart for calculating the network water flow rate according to the embodiment of the invention is shown, which includes the following steps:
Step S401: acquire the ids of popular bloggers, active bloggers and common bloggers, respectively, from the social media.
Step S402: integrate the acquired popular bloggers, active bloggers and common bloggers to obtain a target blogger list.
In the specific implementation of step S402, duplicate ids among the ids of the popular bloggers, active bloggers and common bloggers are deleted.
Step S403: acquire the flow data of each piece of content released in the last three months by each target blogger in the target blogger list.
Step S404: group all the target bloggers according to their categories to obtain a plurality of groups of blogger lists.
Step S405: for each group of blogger lists, perform a normality test on all contents of all target bloggers in the list.
Step S406: judge whether all contents of each group of blogger lists conform to a normal distribution. If yes, go to step S407; otherwise, go to step S410.
Step S407: perform unary outlier detection on all contents in the blogger list.
Step S408: calculate the degree of deviation of each content in the blogger list from the mean.
Step S409: obtain the water flow rate of each content in the blogger list.
Step S410: perform density-based anomaly detection on all contents in the blogger list.
Step S411: calculate the degree of abnormality of each content in the blogger list.
Step S412: obtain the water flow rate of each content in the blogger list.
Step S413: calculate the average of the water flow rates of all contents of each target blogger to obtain the water flow rate of each target blogger.
It should be noted that the target blogger involved in steps S401 to S413 is equivalent to the blogger to be analyzed shown in fig. 1 to fig. 3 in the above embodiments of the present invention. For the execution principle of steps S401 to S413, refer to the corresponding content shown in fig. 1 to fig. 3 in the above embodiments of the present invention, which is not repeated here.
It should be noted that the content shown in fig. 4 of the embodiment of the present invention is for illustration only.
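The bookkeeping parts of the flow in fig. 4 (merging ids in step S402, grouping in step S404, and the branch in step S406) can be sketched as follows. All function names, the `category_of` mapping and the returned labels are hypothetical and serve only to illustrate the control flow.

```python
from collections import defaultdict


def merge_blogger_ids(popular, active, common):
    """Step S402: merge the three id lists, dropping duplicate ids.

    dict.fromkeys keeps the first occurrence of each id in order.
    """
    return list(dict.fromkeys([*popular, *active, *common]))


def group_by_category(blogger_ids, category_of):
    """Step S404: one blogger list per category (category_of: id -> category)."""
    groups = defaultdict(list)
    for blogger_id in blogger_ids:
        groups[category_of[blogger_id]].append(blogger_id)
    return dict(groups)


def choose_detector(is_normal):
    """Step S406: branch to S407 (unary outlier) or S410 (density-based)."""
    return "unary_outlier" if is_normal else "density_based"


targets = merge_blogger_ids(["a", "b"], ["b", "c"], ["c", "d"])
groups = group_by_category(
    targets, {"a": "food", "b": "tech", "c": "food", "d": "tech"}
)
```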
In the embodiment of the invention, the account information of a plurality of target bloggers and the flow data corresponding to each piece of content released in a preset period are obtained. The target bloggers are classified based on their account information to obtain a plurality of groups of blogger lists. For each blogger list, a normality test is performed on all contents of all target bloggers in the list to determine whether the contents conform to a normal distribution. According to the test result, unary outlier detection or density-based anomaly detection is performed on all contents in the blogger list to obtain the water flow rate in the flow data of each content. The water flow rate of each target blogger in each blogger list is then calculated from the water flow rate of each content in that list. Preset water army characteristics do not need to be updated frequently, and detailed information on all fans and commenters does not need to be acquired, so the evaluation accuracy is high, the updating cost is low, and the relevant blogger information is easy to obtain.
Corresponding to the method for processing the water flow rate of social media flow provided by the above embodiment of the present invention, an embodiment of the present invention further provides a system for processing the water flow rate of social media flow. Referring to fig. 5, which shows a structural block diagram of the system, the system includes: a first obtaining unit 501, a second obtaining unit 502, a classifying unit 503, a checking unit 504, a first detecting unit 505, a second detecting unit 506, and a calculating unit 507.
A first obtaining unit 501, configured to obtain account information of a plurality of bloggers to be analyzed.
In a specific implementation, the first obtaining unit 501 is specifically configured to obtain, from social media within a preset period, the account names and profiles of a plurality of popular bloggers, active bloggers and common bloggers, wherein the popular bloggers are bloggers on a popular list or in popular recommendations, the active bloggers are bloggers who comment under the content published by the popular bloggers, and the common bloggers are bloggers obtained from a non-popular list sorted by publishing time in the social media.
A second obtaining unit 502, configured to obtain traffic data corresponding to each piece of content issued by each blogger to be analyzed in a preset period, where the traffic data at least includes: the number of clicks of each piece of content and the number of fans of the bloggers to be analyzed when the number of clicks is obtained, wherein the number of clicks comprises a forwarding number or a playing amount.
The classifying unit 503 is configured to classify the multiple bloggers to be analyzed based on the account information of each blogger to be analyzed, so as to obtain multiple groups of blogger lists, where one group of blogger list corresponds to one class of blogger. The specific classification process is described in the above embodiment of the present invention in correspondence with step S103 in fig. 1.
A checking unit 504, configured to perform a normality check on all contents of all bloggers to be analyzed in the blogger list for each group of the blogger list, and determine whether all contents in the blogger list conform to a normal distribution.
A first detecting unit 505, configured to perform unary outlier detection on all the contents in the blogger list if all the contents in the blogger list conform to the normal distribution, so as to obtain the water flow rate in the flow data of each content.
Preferably, the first detecting unit 505 is further configured to convert a value format of the water flow rate of each content into a preset format.
A second detecting unit 506, configured to perform anomaly detection on all the contents in the blogger list if all the contents in the blogger list do not conform to the normal distribution, so as to obtain a water flow rate in the flow rate data of each content.
In a specific implementation, the second detecting unit 506 is specifically configured to take all contents in the blogger list as the detection objects of density-based anomaly detection, and to calculate the water flow rate in the flow data of each content according to formula (1) and formula (2). For details, refer to the content corresponding to step S106 of fig. 1 in the foregoing embodiment of the present invention.
Preferably, the second detecting unit 506 is further configured to convert the value format of the water flow rate of each content into a preset format.
A calculating unit 507, configured to calculate, according to the water flow rate of each content in each group of the blogger lists, the water flow rate of each blogger to be analyzed in each group of the blogger lists.
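For orientation, density-based anomaly detection with local reachability density (lrd) and the local outlier factor (LOF) can be sketched as follows. Formulas (1) and (2) are not reproduced in this part of the text, so the sketch follows the commonly used LOF definitions and may differ in detail from them. The function name, the one-dimensional input and the neighbourhood size `k` are assumptions.

```python
import math


def lof_scores(values, k=2):
    """Local outlier factor of each value in a 1-D sample (illustrative)."""
    n = len(values)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(values)")
    k_dist, neigh = [], []
    for i in range(n):
        dists = sorted((abs(values[i] - values[j]), j) for j in range(n) if j != i)
        kd = dists[k - 1][0]                      # distance to the k-th neighbour
        k_dist.append(kd)
        neigh.append([j for d, j in dists if d <= kd])  # ties are included
    lrd = []
    for i in range(n):
        # Reachability distance of i from each neighbour o: max(k_dist(o), d(i, o)).
        reach = [max(k_dist[j], abs(values[i] - values[j])) for j in neigh[i]]
        total = sum(reach)
        lrd.append(len(reach) / total if total > 0 else math.inf)
    scores = []
    for i in range(n):
        if math.isinf(lrd[i]):
            # Assumption: a point with infinite density (exact duplicates)
            # is treated as an inlier with LOF 1.0.
            scores.append(1.0)
        else:
            scores.append(sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i]))
    return scores


scores = lof_scores([1.0, 1.1, 1.2, 1.3, 10.0], k=2)
```

Values close to 1 indicate a content whose local density matches that of its neighbours, while values well above 1 indicate an outlier. Because the score is unbounded above, it would be normalized into [0, 1] before being compared with scores from unary outlier detection.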
In the embodiment of the invention, the account information of a plurality of bloggers to be analyzed and the flow data corresponding to each piece of content released in a preset period are obtained. The bloggers to be analyzed are classified based on their account information to obtain a plurality of groups of blogger lists. For each blogger list, a normality test is performed on all contents of all bloggers to be analyzed in the list to determine whether the contents conform to a normal distribution. According to the test result, unary outlier detection or density-based anomaly detection is performed on all contents in the blogger list to obtain the water flow rate in the flow data of each content. The water flow rate of each blogger to be analyzed in each blogger list is then calculated from the water flow rate of each content in that list. Preset water army characteristics do not need to be updated frequently, and detailed information on all fans and commenters does not need to be acquired, so the evaluation accuracy is high, the updating cost is low, and the relevant blogger information is easy to obtain.
Referring to fig. 6 in conjunction with fig. 5, a structural block diagram of a system for processing the water flow rate of social media flow according to an embodiment of the present invention is shown, where the checking unit 504 includes: a conversion module 5041, a calculation unit 5042 and a determination module 5043.
A conversion module 5041, configured to, for each group of the blogger list, take a logarithm of the number of clicks and the number of fans corresponding to each piece of content, respectively, based on a natural constant. For specific contents, reference may be made to the contents corresponding to step S201 in fig. 2 in the foregoing embodiment of the present invention.
A calculating unit 5042, configured to calculate the ratio of the number of clicks to the number of fans after taking logarithms, and to take the ratio as the fan interaction ratio.
A determining module 5043, configured to perform a normality test on the fan-interaction ratios corresponding to all the contents, and determine whether all the contents in the blogger list conform to a normal distribution. For a specific determination process, reference may be made to the content corresponding to step S203 in fig. 2 in the embodiment of the present invention.
In the embodiment of the invention, the logarithm of the number of fans and of the number of clicks corresponding to each content in each blogger list is taken with a natural constant as the base, and the ratio of the number of clicks to the number of fans after taking logarithms is used as the fan interaction ratio. A normality test is performed on all contents in each blogger list based on the fan interaction ratio of each content in the list. The water flow rate in the flow data of each content is calculated based on the result of the normality test, and the water flow rate of each blogger to be analyzed in each blogger list is calculated from the water flow rate of each content in that list, so the water flow rate is evaluated with high accuracy.
Referring to fig. 7 in conjunction with fig. 5, a structural block diagram of a social media traffic water flow rate processing system according to an embodiment of the present invention is shown, where the first detection unit 505 includes: a first calculation module 5051, a second calculation module 5052, and a third calculation module 5053.
A first calculating module 5051 is configured to calculate, for each content in the blogger list, a maximum likelihood estimate corresponding to a mean and a variance of flow data of the content by using a maximum likelihood estimating function. The specific calculation process is described in the embodiment of the present invention in fig. 3 corresponding to step S301.
A second calculating module 5052, configured to calculate probability distribution values corresponding to the content and the mean value respectively by using a cumulative distribution function, so as to obtain a first probability distribution value and a second probability distribution value. For a specific calculation process, refer to the content corresponding to step S302 in fig. 3 of the embodiment of the present invention.
A third calculation module 5053 is used for calculating a difference between the first probability distribution value and the second probability distribution value, and the difference is taken as the moisture flow rate in the flow rate data of each content. For a specific calculation process, see the content corresponding to step S303 in fig. 3 of the embodiment of the present invention.
In the embodiment of the invention, the water flow rates of all contents in a blogger list whose contents conform to a normal distribution are calculated by unary outlier detection. The water flow rate of each blogger to be analyzed in each blogger list is then calculated from the water flow rate of each content in that list, so the evaluation accuracy is high. Preset water army characteristics do not need to be updated frequently, which reduces the updating cost, and detailed information on all fans and commenters does not need to be acquired, so the relevant blogger information is easy to obtain.
Referring to fig. 8 in conjunction with fig. 5, a block diagram of a social media traffic water flow processing system according to an embodiment of the present invention is shown, where the calculating unit 507 includes:
An obtaining module 5071, configured to obtain the water flow rate corresponding to each content of each blogger to be analyzed according to the water flow rate of each content in each group of the blogger lists.
An averaging module 5072, configured to calculate the average of the water flow rates corresponding to the contents of each blogger to be analyzed, so as to obtain the water flow rate of each blogger to be analyzed.
In the embodiment of the invention, the water flow rate corresponding to each content of each blogger to be analyzed is obtained, the average of these water flow rates is calculated, and the average is used as the water flow rate of that blogger, so the evaluation accuracy is high. Preset water army characteristics do not need to be updated frequently, which reduces the updating cost, and detailed information on all fans and commenters does not need to be acquired, so the relevant blogger information is easy to obtain.
In the system for processing the water flow rate of social media flow disclosed by the embodiment of the invention, the modules can be realized by a hardware device consisting of a processor and a memory. Specifically, the modules are stored in the memory as program units, and the processor executes the program units stored in the memory to calculate the water flow rate of the blogger to be analyzed.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the water flow rate of the blogger to be analyzed is calculated by adjusting kernel parameters.
The memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
Further, the embodiment of the present invention provides a processor, where the processor is configured to execute a program, where the program executes the method for processing the social media traffic water flow rate when running.
Further, an embodiment of the present invention provides a storage medium having a program stored thereon, where the program is executed by a processor to calculate the water flow rate of the blogger to be analyzed.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the steps of the above method for processing the water flow rate of social media flow.
To sum up, the embodiments of the present invention provide a method, a system, and a device for processing the water flow rate of social media flow. In the method, the account information of a plurality of bloggers to be analyzed and the flow data corresponding to each piece of content released in a preset period are obtained. The bloggers to be analyzed are classified based on their account information to obtain a plurality of groups of blogger lists. For each blogger list, a normality test is performed on all contents of all bloggers to be analyzed in the list to determine whether the contents conform to a normal distribution. According to the test result, unary outlier detection or density-based anomaly detection is performed on all contents in the blogger list to obtain the water flow rate in the flow data of each content. The water flow rate of each blogger to be analyzed in each blogger list is then calculated from the water flow rate of each content in that list. In this scheme, the water flow rate of a blogger is evaluated through the blogger's contents and the flow data of those contents, so the evaluation accuracy is high. Preset water army characteristics do not need to be updated frequently, which reduces the updating cost, and detailed information on all fans and commenters does not need to be acquired, so the relevant blogger information is easy to obtain.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments are substantially similar to the method embodiments and therefore are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described system and system embodiments are only illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A method for water flow processing of social media flow, the method comprising:
acquiring account information of a plurality of bloggers to be analyzed;
acquiring flow data corresponding to each piece of content issued by each blogger to be analyzed in a preset period, wherein the flow data at least comprises: the click number of each content and the number of fans of the bloggers to be analyzed when the click number is obtained, wherein the click number comprises a forwarding number or a playing amount;
classifying the plurality of bloggers to be analyzed based on the account information of each blogger to be analyzed to obtain a plurality of groups of blogger lists, wherein one group of blogger lists corresponds to one type of blogger;
for each group of the blogger lists, taking the logarithm of the number of clicks and of the number of fans corresponding to each content, with a natural constant as the base;
calculating the ratio of the number of clicks to the number of fans after taking logarithms, and taking the ratio as the fan interaction ratio;
performing a normality test on the fan interaction ratios corresponding to all contents, and determining whether all contents in the blogger list conform to a normal distribution;
if all the contents in the blogger list conform to the normal distribution, calculating, for each content in the blogger list, the maximum likelihood estimates of the mean and variance of the flow data of the content by using a maximum likelihood estimation function;
respectively calculating probability distribution values corresponding to the contents and the mean value by using a cumulative distribution function to obtain a first probability distribution value and a second probability distribution value, wherein the first probability distribution value is a probability distribution value corresponding to each content, and the second probability distribution value is a probability distribution value corresponding to each group of mean values;
calculating a difference between the first probability distribution value and the second probability distribution value, and taking the difference as a water flow rate in the flow rate data of each content;
if all the contents in the blogger list do not accord with normal distribution, all the contents in the blogger list are used as detection objects of density-based abnormal detection modes, and the water flow in the flow data of each content is calculated according to a local reachable density lrd formula and a local outlier LOF formula;
and calculating the water flow of each blog owner to be analyzed in each group of the blog owner lists according to the water flow of each content in each group of the blog owner lists.
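As a non-limiting illustration, the normal-distribution branch of claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: all function names are hypothetical; the claim does not name a normality test, so a Jarque-Bera test with an assumed significance level of 0.05 is swapped in; and the maximum likelihood estimates are taken over the fan interaction ratios of one blogger list.

```python
import math
from statistics import NormalDist


def fan_interaction_ratio(clicks: int, fans: int) -> float:
    """Return ln(clicks) / ln(fans); fans must be >= 2 so ln(fans) != 0."""
    if clicks < 1 or fans < 2:
        raise ValueError("need clicks >= 1 and fans >= 2")
    return math.log(clicks) / math.log(fans)


def normal_mle(values):
    """Maximum likelihood estimates (mean, variance) of a normal sample.

    The MLE of the variance divides by n, not by n - 1.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


def jarque_bera_pvalue(values):
    """p-value of the Jarque-Bera normality test (assumed choice of test).

    The statistic is asymptotically chi-square with 2 degrees of freedom,
    whose survival function is exp(-x / 2).
    """
    n = len(values)
    mean, variance = normal_mle(values)
    if variance == 0:
        raise ValueError("values are constant")
    skew = sum((v - mean) ** 3 for v in values) / n / variance ** 1.5
    kurt = sum((v - mean) ** 4 for v in values) / n / variance ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return math.exp(-jb / 2)


def is_normal(values, alpha=0.05):
    """Treat the list as normal if the test does not reject at level alpha."""
    return jarque_bera_pvalue(values) > alpha


def water_flow_normal(ratios):
    """Per-content water flow: CDF(ratio) - CDF(mean) under the fitted normal."""
    mean, variance = normal_mle(ratios)
    if variance == 0:
        raise ValueError("ratios are constant")
    dist = NormalDist(mean, math.sqrt(variance))
    second = dist.cdf(mean)  # second probability distribution value
    return [dist.cdf(r) - second for r in ratios]
```

In this sketch, a piece of content whose fan interaction ratio lies above the group mean receives a positive value and one below the mean receives a negative value. The Jarque-Bera approximation is only reliable for reasonably large samples, so a different normality test may be preferable for short blogger lists.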
2. The method of claim 1, wherein the acquiring account information of a plurality of bloggers to be analyzed comprises:
acquiring, from social media within a preset period, the account names and profiles of a plurality of popular bloggers, active bloggers and common bloggers, wherein the popular bloggers are bloggers appearing in a popular list or in popular recommendations, the active bloggers are bloggers who comment under the content published by the popular bloggers, and the common bloggers are bloggers obtained from a non-popular list of the social media sorted by publishing time.
3. The method of claim 1, further comprising:
converting the numerical format of the water flow of each piece of content into a preset format.
4. The method of claim 1, wherein the calculating the water flow of each blogger to be analyzed in each blogger list according to the water flow of each piece of content in each blogger list comprises:
obtaining, according to the water flow of each piece of content in each blogger list, the water flow corresponding to each piece of content of each blogger to be analyzed;
and calculating the average value of the water flow corresponding to each piece of content of each blogger to be analyzed to obtain the water flow of each blogger to be analyzed.
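As a non-limiting illustration, the averaging step of claim 4 can be sketched as follows. The function name and the input layout (pairs of blogger identifier and per-content water flow) are assumptions made for this example only.

```python
from collections import defaultdict
from statistics import fmean


def blogger_water_flow(content_flows):
    """Average the per-content water flow for each blogger.

    content_flows: iterable of (blogger_id, water_flow) pairs, one pair
    per piece of content (hypothetical layout).
    """
    grouped = defaultdict(list)
    for blogger_id, flow in content_flows:
        grouped[blogger_id].append(flow)
    return {blogger_id: fmean(flows) for blogger_id, flows in grouped.items()}
```

For example, a blogger with two pieces of content scored 0.25 and 0.75 would be assigned a water flow of 0.5 in this sketch.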
5. A water flow processing system for social media flow, the system comprising:
a first acquisition unit, configured to acquire account information of a plurality of bloggers to be analyzed;
a second acquisition unit, configured to acquire traffic data corresponding to each piece of content published by each blogger to be analyzed within a preset period, wherein the traffic data at least comprises: the click count of each piece of content and the fan count of the blogger to be analyzed at the time the click count is obtained, wherein the click count comprises a repost count or a play count;
a classification unit, configured to classify the plurality of bloggers to be analyzed based on the account information of each blogger to be analyzed to obtain a plurality of blogger lists, wherein each blogger list corresponds to one type of blogger;
a conversion module, configured to take, for each blogger list, the natural logarithm (base e) of the click count and of the fan count corresponding to each piece of content;
a calculating unit, configured to calculate the ratio of the logarithm of the click count to the logarithm of the fan count, and to take the ratio as the fan interaction ratio;
a determining module, configured to perform a normality test on the fan interaction ratios corresponding to all content, and to determine whether all content in the blogger list conforms to a normal distribution;
a first calculating module, configured to calculate, for each piece of content in the blogger list, the maximum likelihood estimates of the mean and the variance of the traffic data of the content by using a maximum likelihood estimation function if all content in the blogger list conforms to a normal distribution;
a second calculating module, configured to calculate, by using a cumulative distribution function, the probability distribution values corresponding to each piece of content and to the mean, to obtain a first probability distribution value and a second probability distribution value;
a third calculating module, configured to calculate the difference between the first probability distribution value and the second probability distribution value, and to take the difference as the water flow in the traffic data of each piece of content;
a second detection unit, configured to, if all content in the blogger list does not conform to a normal distribution, take all content in the blogger list as detection objects of a density-based anomaly detection method, and calculate the water flow in the traffic data of each piece of content according to the local reachability density (lrd) formula and the local outlier factor (LOF) formula;
and a calculating unit, configured to calculate the water flow of each blogger to be analyzed in each blogger list according to the water flow of each piece of content in each blogger list.
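As a non-limiting illustration, the density-based branch handled by the second detection unit can be sketched with the standard local reachability density (lrd) and local outlier factor (LOF) definitions. This is a minimal sketch under stated assumptions: the function name, the neighbourhood size k, and the use of one-dimensional fan interaction ratios as input are choices made for this example, and the claims do not state how the LOF score is mapped to a final water flow value.

```python
import math


def local_outlier_factors(values, k=3):
    """Return the LOF score of each value (1-D data, absolute distance).

    k-distance(p): distance from p to its k-th nearest neighbour.
    N_k(p):        all other points within k-distance(p) (ties included).
    reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd(p) = |N_k(p)| / sum of reach-dist(p, o) over o in N_k(p).
    LOF(p) = mean of lrd(o) / lrd(p) over o in N_k(p).
    """
    n = len(values)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < len(values)")
    dist = [[abs(a - b) for b in values] for a in values]

    k_dist, neighbors = [], []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        kd = others[k - 1]
        k_dist.append(kd)
        neighbors.append([j for j in range(n) if j != i and dist[i][j] <= kd])

    lrd = []
    for i in range(n):
        total = sum(max(k_dist[j], dist[i][j]) for j in neighbors[i])
        # More than k identical values give a zero sum; treat density as infinite.
        lrd.append(len(neighbors[i]) / total if total > 0 else math.inf)

    lof = []
    for i in range(n):
        if math.isinf(lrd[i]):
            lof.append(1.0)  # assumption: duplicates count as inliers
            continue
        lof.append(sum(lrd[j] / lrd[i] for j in neighbors[i]) / len(neighbors[i]))
    return lof
```

Scores close to 1 indicate content whose fan interaction ratio is about as densely surrounded as its neighbours, while scores well above 1 indicate content that sits far from the rest of the blogger list and is therefore a candidate for inflated traffic.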
6. A storage medium comprising a stored program, wherein the program, when executed, controls a device on which the storage medium resides to perform the water flow processing method for social media flow of any one of claims 1-4.
7. A processor, configured to run a program, wherein the program, when run, executes the water flow processing method for social media flow of any one of claims 1-4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910567614.0A CN110287322B (en) 2019-06-27 2019-06-27 Water flow processing method, system and equipment for social media flow

Publications (2)

Publication Number Publication Date
CN110287322A (en) 2019-09-27
CN110287322B (en) 2021-04-16

Family

ID=68019285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910567614.0A Active CN110287322B (en) 2019-06-27 2019-06-27 Water flow processing method, system and equipment for social media flow

Country Status (1)

Country Link
CN (1) CN110287322B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106651458A (en) * 2016-12-29 2017-05-10 腾讯科技(深圳)有限公司 Advertisement anti-cheating method and device
CN107491970A (en) * 2017-08-17 2017-12-19 北京三快在线科技有限公司 Anti- cheating detection monitoring method and system and computing device in real time
CN108156131A (en) * 2017-10-27 2018-06-12 上海观安信息技术股份有限公司 Webshell detection methods, electronic equipment and computer storage media
CN109558555A (en) * 2018-08-20 2019-04-02 湖北大学 Microblog water army detection method and detection system based on artificial immunity danger theory
CN109600345A (en) * 2017-09-30 2019-04-09 北京国双科技有限公司 Abnormal data flow rate testing methods and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815452A (en) * 2015-11-27 2017-06-09 苏宁云商集团股份有限公司 A kind of cheat detection method and device
CN106338981A (en) * 2016-09-23 2017-01-18 沈阳化工大学 Batch process online fault detection method of dynamic multi-direction local outlier factor algorithm
US10338982B2 (en) * 2017-01-03 2019-07-02 International Business Machines Corporation Hybrid and hierarchical outlier detection system and method for large scale data protection
CN106875277A (en) * 2017-01-16 2017-06-20 星云纵横(北京)大数据信息技术有限公司 A kind of determination methods of social media account influence power
US10999247B2 (en) * 2017-10-24 2021-05-04 Nec Corporation Density estimation network for unsupervised anomaly detection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Centroid-Based Outlier Detection Method;Xiaochun Wang 等;《2017 International Conference on Computational Science and Computational Intelligence (CSCI)》;20181206;1411-1416 *
僵尸网络异常流量分析与检测;王新良;《中国博士学位论文全文数据库 信息科技辑》;20120715(第07期);I139-31 *
网络"水军"探测方法研究;王烁 等;《现代图书情报技术》;20140825(第7/8期);92-100 *

Also Published As

Publication number Publication date
CN110287322A (en) 2019-09-27

Similar Documents

Publication Publication Date Title
EP2811441A1 (en) System and method for detecting spam using clustering and rating of e-mails
CN109598414B (en) Risk assessment model training, risk assessment method and device and electronic equipment
CN110648180B (en) Method and device for adjusting delivery channel and electronic equipment
CN109753807B (en) Security detection method and device
CN110930218B (en) Method and device for identifying fraudulent clients and electronic equipment
CN108134944B (en) Identification method and device for anchor user with abnormal income and electronic equipment
CN110189165B (en) Channel abnormal user and abnormal channel identification method and device
CN109543940B (en) Activity evaluation method, activity evaluation device, electronic equipment and storage medium
Ma et al. Selection of the maximum spatial cluster size of the spatial scan statistic by using the maximum clustering set-proportion statistic
CN110503566A (en) Air control method for establishing model, device, computer equipment and storage medium
Morgan et al. A new mixture model for capture heterogeneity
CN108875018B (en) News influence evaluation method and device and electronic equipment
CN110287322B (en) Water flow processing method, system and equipment for social media flow
CN116701772B (en) Data recommendation method and device, computer readable storage medium and electronic equipment
CN111582722B (en) Risk identification method and device, electronic equipment and readable storage medium
Kronenfeld Validating the historical record: a relative distance test and correction formula for selection bias in presettlement land surveys
CN106656943B (en) A kind of matching process and device of network user's attribute
CN109657852B (en) Insurance business processing method and system based on big data
CN106682516A (en) Detection method, detection device and server of application programs
CN114511409A (en) User sample processing method and device and electronic equipment
CN112241820A (en) Risk identification method and device for key nodes in fund flow and computing equipment
Shao et al. Bayesian hierarchical structure for quantifying population variability to inform probabilistic health risk assessments
CN113344469B (en) Fraud identification method and device, computer equipment and storage medium
CN107545347B (en) Attribute determination method and device for risk prevention and control and server
CN112633948A (en) Target audience proportion calculation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant