KR101804020B1 - Method for sns bot detection using geographic information - Google Patents
- Publication number: KR101804020B1 (KR1020150156970A)
- Authority: KR (South Korea)
- Prior art keywords
- sns
- bot
- user
- distance information
- message
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
-
- G06F17/218—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G06Q50/30—
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/16—Threshold monitoring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2463/00—Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
- H04L2463/144—Detection or countermeasures against botnets
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Tourism & Hospitality (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Marketing (AREA)
- Computer Hardware Design (AREA)
- Operations Research (AREA)
- Economics (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- General Engineering & Computer Science (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Information Transfer Between Computers (AREA)
Abstract
A tweet bot detection method using spatial information is disclosed. The method determines whether the entropy value of the inter-tweet time information is smaller than a threshold value for the time information; if it is, the method determines whether the entropy value of the inter-tweet distance information is smaller than a threshold value for the distance information; and if that is also the case, the user who consecutively sent the tweets is identified as an SNS bot.
Description
The present invention relates to an SNS bot detection method using spatial information and, more particularly, to an SNS bot detection method that detects malicious tweet bots using geo-tagged tweet data from a Twitter server.
The term "tweet bot" combines "Twitter," a social network service, with "bot," short for "robot." For example, the tweet bot @NDSL_kr, provided by the Korea Institute of Science and Technology Information (KISTI), replies with a web page address (URL) where the related content can be viewed immediately after a user sends the desired data type and a search term in a mention. The earthquake bot (@earthquakebot) reports in real time on earthquakes of magnitude 5.0 or greater around the world. The Seoul weather bot (@seoul_wt) posts the Seoul weather every hour, and @KBO posts scores every 10 minutes.
Tweet bots have a positive side in that they provide information and entertainment, but they are also frequently used maliciously and cause harm. In particular, tweet bots are often anonymous accounts, and a growing number of users are offended by the obscene, profane, and abusive content they post. To address this problem, techniques for detecting tweet bots have been developed.
Although users increasingly disclose geospatial information (for example, through check-in services), conventional tweet bot detection techniques do not use this information at all. Nor do they use the smart device information provided in the source field of the data set. That is, the conventional technology cannot detect a tweet bot unless time information and tweet text information are available.
SUMMARY OF THE INVENTION An object of the present invention, devised to solve the above problems, is to provide a tweet bot detection method using spatial information that uses geo-tagged tweets to calculate the entropy values of two variables, the inter-tweet time and the inter-tweet distance, and detects tweet bots by comparing the temporal and spatial patterns of humans and tweet bots.
Another object of the present invention is to provide a tweet bot detection method using spatial information that detects tweet bots from the entropy value of the inter-tweet distance variable together with a set of selected devices.
According to one aspect of the present invention, there is provided a method for detecting SNS bots using spatial information, comprising the steps of: constructing a data set comprising geo-tagged tweets; setting a threshold value for time information that enables a specified reliability, using the inter-tweet time information of tweets consecutively sent by the same user in the data set; setting a threshold value for distance information that enables the specified reliability, using the inter-tweet distance information of tweets consecutively sent by the same user in the data set; determining whether the entropy value of the inter-tweet time information is smaller than the threshold value for the time information; if so, determining whether the entropy value of the inter-tweet distance information is smaller than the threshold value for the distance information; and, if the entropy value of the inter-tweet distance information is smaller than the threshold value for the distance information, determining that the user who consecutively sent the tweets is an SNS bot rather than an SNS user.
In constructing the data set, the data set is collected and configured through a streaming API.
In the step of constructing the data set, the following fields are adopted from the metadata of each tweet: the user ID, the latitude of the device location from which the tweet was sent, the longitude of that location, the time at which the tweet was sent, and the device that generated the tweet.
In the step of setting the threshold value for the time information, the threshold value for the time information means an entropy value for the time information enabling the specified reliability.
In the step of setting the threshold value for the distance information, the threshold value for the distance information means an entropy value for the distance information that enables the specified reliability.
The step of determining the SNS bot may further include increasing an SNS bot count (b_count) to obtain the SNS bot detection probability (Bot DP).
The step of determining the SNS bot further includes increasing an SNS user count (h_count) to obtain the false alarm probability (FAP) when an SNS user is mistakenly identified as an SNS bot.
In the step of determining whether the entropy value of the inter-tweet time information is smaller than the threshold value for the time information, if the entropy value is not smaller than the threshold value, the sender of the tweets is determined to be an SNS user.
In the step of determining whether the entropy value of the inter-tweet distance information is smaller than the threshold value for the distance information, if the entropy value is not smaller than the threshold value, the sender of the tweets is determined to be an SNS user.
According to another aspect of the present invention, there is provided a method for detecting SNS bots using spatial information, comprising the steps of: constructing a data set comprising geo-tagged tweets; setting a threshold value for distance information that enables a specified reliability, using the inter-tweet distance information of tweets consecutively sent by the same user in the data set; selecting the devices that SNS users use to send tweets and setting them as an SNS user device set; determining whether the entropy value of the inter-tweet distance information is smaller than the threshold value for the distance information; if so, determining whether the device used for the tweets belongs to the SNS user device set; and, if the device does not belong to the SNS user device set, determining that the user who consecutively sent the tweets is an SNS bot rather than an SNS user.
In constructing the data set, the data set is collected and configured through a streaming API.
In the step of constructing the data set, the following fields are adopted from the metadata of each tweet: the user ID, the latitude of the device location from which the tweet was sent, the longitude of that location, the time at which the tweet was sent, and the device that generated the tweet.
In the step of setting the threshold value for the distance information, the threshold value for the distance information means an entropy value for the distance information that enables the specified reliability.
The step of determining the SNS bot may further include increasing an SNS bot count (b_count) to obtain the SNS bot detection probability (Bot DP).
The step of determining the SNS bot further includes increasing an SNS user count (h_count) to obtain the false alarm probability (FAP) when an SNS user is mistakenly identified as an SNS bot.
In the step of determining whether the entropy value of the inter-tweet distance information is smaller than the threshold value for the distance information, if the entropy value is not smaller than the threshold value, the sender of the tweets is determined to be an SNS user.
In the step of determining whether the device used for the tweets belongs to the SNS user device set, if the device belongs to the set, the user who sent the tweets is determined to be an SNS user.
According to the present invention, geo-tagged tweets are used to calculate the entropy values of two variables, the inter-tweet time and the inter-tweet distance, and the temporal and spatial patterns of humans and tweet bots are compared, so that tweet bots can be detected more precisely.
In addition, tweet bots can be detected more accurately through the entropy value of the inter-tweet distance variable and the selected device set, using the smart device information of each user provided in the source field of the data set.
In addition, by constructing a spatial database (DB) for tweet bots, the invention can be used in the future to identify and detect the spatial patterns of malicious bots in various social network services.
FIG. 1 is a flowchart showing an embodiment of a tweet bot detection method using spatial information according to the present invention.
FIG. 2 is a flowchart showing another embodiment of a tweet bot detection method using spatial information according to the present invention.
FIG. 3 is a graph illustrating the tweet bot detection probability as a function of reliability when the tweet bot detection method using spatial information according to an embodiment of the present invention is used.
FIG. 4 is a graph showing the correlation between the bot detection probability (Bot DP) and the false alarm probability (FAP) when the tweet bot detection method using spatial information according to an embodiment of the present invention is used.
FIG. 5 is a graph illustrating the correlation between the tweet bot detection probability and the false alarm probability when the tweet bot detection method using spatial information according to another embodiment of the present invention is used.
FIG. 6 is a block diagram illustrating one embodiment of a smart device that performs methods in accordance with the present invention.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the invention is not intended to be limited to the particular embodiments, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.
The terms first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.
It is to be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. On the other hand, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that there are no other elements in between.
The terminology used in this application is used only to describe a specific embodiment and is not intended to limit the invention. The singular expressions include plural expressions unless the context clearly dictates otherwise. In the present application, the terms "comprises" or "having" and the like are used to specify that there is a feature, a number, a step, an operation, an element, a component or a combination thereof described in the specification, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or combinations thereof.
Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and are not to be interpreted in an ideal or overly formal sense unless explicitly defined in the present application.
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In order to facilitate the understanding of the present invention, the same reference numerals are used for the same constituent elements in the drawings and redundant explanations for the same constituent elements are omitted.
FIG. 1 is a flowchart showing an embodiment of a tweet bot detection method using spatial information according to the present invention.
Referring to FIG. 1, the tweet bot detection method using spatial information according to an embodiment of the present invention is a two-step tweet bot detection method that uses time and distance information.
Twitter accounts are divided into humans and tweet bots; tweet bots are accounts that mass-produce tweets delivering news and other information. However, there are also malicious tweet bots that spread spam or malicious information, so humans and tweet bots must be distinguished.
The present invention detects malicious tweet bots using geo-tagged tweet data from a Twitter server, and calculates for each user an entropy, which indicates uncertainty, from the distances between that user's tweets.
The present invention first uses a data set collected through the Twitter streaming API. The data set consists of a large number of geo-tagged tweets recorded from Twitter users in specific geographic areas (e.g., Seoul, London, Los Angeles).
Each tweet includes a number of elements, each identified by the name of the field to which it belongs. For the tweet bot detection technique of the present invention, the following five important fields are adopted from the metadata of the tweet.
user_id_str: String representation of the unique ID of a particular user
lat: The latitude of the device location that sent the tweet
lot: The longitude of the device location that sent the tweet
created_at: UTC / GMT time that tweet was sent
source: The smart device that created the Tweet
The user set and the tweet bot set are classified using a ground-truth method. The content of the tweets posted on a user's Twitter page is analyzed, and a user who regularly and repeatedly tweets the same message or URL is classified as a tweet bot. In addition, if the analyzed tweet text contains spam content, the user who sent it is regarded as a tweet bot.
The user set is divided into two: the first part is the training set, and the second part, together with the entire tweet bot set, is the test set. Users are first classified into the user set and the tweet bot set as described above. However, users whose travel speed between consecutive tweets exceeds 300 km/h are reclassified as tweet bots. Under these classification criteria, the data set of the present invention consists of 892 human accounts and 115 tweet bot accounts, out of 1,007 user accounts in total.
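The 300 km/h reclassification rule can be illustrated with a minimal Python sketch. It assumes the distance between consecutive tweets is the great-circle distance from the spherical law of cosines, which the description names for inter-tweet distances. The function names, the tuple layout, the Earth radius value, and the choice to skip pairs with a zero time gap are assumptions made for illustration, not details taken from the patent.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the source gives no value


def great_circle_km(lat1, lon1, lat2, lon2):
    """Distance in km between two points via the spherical law of cosines."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    # Clamp to [-1, 1] to guard against rounding error for near-identical points.
    return EARTH_RADIUS_KM * math.acos(max(-1.0, min(1.0, cos_angle)))


def exceeds_speed_limit(tweets, limit_kmh=300.0):
    """tweets: list of (sent_at, lat, lon) for one user, sorted by time.

    Returns True if any pair of consecutive tweets implies a speed above
    limit_kmh. Pairs with a zero time gap are skipped (an assumption).
    """
    for (t1, la1, lo1), (t2, la2, lo2) in zip(tweets, tweets[1:]):
        hours = (t2 - t1).total_seconds() / 3600.0
        if hours > 0 and great_circle_km(la1, lo1, la2, lo2) / hours > limit_kmh:
            return True
    return False


# Hypothetical usage: Seoul to Busan within one hour.
t0 = datetime(2015, 1, 1, 0, 0)
fast_user = [(t0, 37.5665, 126.9780),
             (t0 + timedelta(hours=1), 35.1796, 129.0756)]
reclassify_as_bot = exceeds_speed_limit(fast_user)
```

A user flagged this way would be moved from the user set to the tweet bot set before the training and test sets are formed.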
Tweet bots tend to post tweets at more regular time intervals than humans. Therefore, it can be inferred that the entropy of the inter-tweet time information is much smaller for tweet bots than for humans.
Let $(\phi_1, \lambda_1)$ be the location of a user's first tweet and $(\phi_2, \lambda_2)$ the location of the second tweet. The geographical distance $d$ between the two points can be obtained using the spherical law of cosines.
The distance $d$ is limited to a maximum of 800 km; $d = 0$ is separated out first, and the range is divided into 101 sections as shown in Table 1 below.
The time $t$ between two consecutive tweets is handled in the same way: $t$ is limited to a maximum of 144 hours, $t = 0$ is separated out first, and the range is divided into 145 sections in units of 1 hour.
Table 1 below shows how the geographical distance $d$ between two points, when a user sends tweets from different points, is divided into sections.
The entropy sets of the training set and the test set are denoted $H^{\mathrm{train}}$ and $H^{\mathrm{test}}$. In the training set, the entropy values of the inter-tweet time information and the inter-tweet distance information of the $i$-th user are $H^{\mathrm{train}}_{T,i}$ and $H^{\mathrm{train}}_{D,i}$. Likewise, in the test set, the entropy values of the inter-tweet time information and the inter-tweet distance information of the $i$-th user are $H^{\mathrm{test}}_{T,i}$ and $H^{\mathrm{test}}_{D,i}$. Also, $N_{\mathrm{train}}$ and $N_{\mathrm{test}}$ denote the total number of users included in the training set and the test set, respectively. Here $N_{\mathrm{train}} = 446$ and $N_{\mathrm{test}} = 561$.
The following is the basic formula for the entropy, which represents uncertainty. Using the records in the training set, the entropy values of the inter-tweet time information and the inter-tweet distance information of the $i$-th user are calculated by Equation (1):
$$H_{T,i} = -\sum_{k=1}^{n} p_T(k)\,\log p_T(k), \qquad H_{D,i} = -\sum_{k=1}^{n} p_D(k)\,\log p_D(k) \tag{1}$$
Here $p_T(k)$ and $p_D(k)$ are the probability distributions of the time variable and the distance variable for the $k$-th index, and $n$ represents the total number of data.
The method of detecting tweet bots using the entropy value of each user is as follows.
First, the entropy values of the training set are used to obtain the threshold value for each reliability. Here, the reliability is the probability of being included in a range specified by the user, and the threshold is the entropy value that enables the specified reliability.
The higher the reliability, the more people are within the specified range. For example, a variable that makes the
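The binning and entropy calculation described above can be sketched in Python as follows. Several details are assumptions, because Table 1 and the original symbols are not available in this text: the 100 non-zero distance sections are taken to be equal-width (8 km each), values above the stated maximum are placed in the last section, and the natural logarithm is used. The function names are hypothetical.

```python
import math
from collections import Counter


def bin_index(value, max_value, n_positive_bins):
    """Section 0 holds exact zeros; sections 1..n_positive_bins split
    (0, max_value] into equal-width sections (assumed layout)."""
    if value <= 0:
        return 0
    value = min(value, max_value)  # cap at the stated maximum
    width = max_value / n_positive_bins
    return min(n_positive_bins, math.ceil(value / width))


def entropy(bin_indices):
    """Shannon entropy (natural log, an assumption) of the empirical
    distribution of section indices for one user."""
    n = len(bin_indices)
    if n == 0:
        return 0.0
    counts = Counter(bin_indices)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def distance_entropy(distances_km):
    # 1 zero section + 100 positive sections = 101 sections, max 800 km
    return entropy([bin_index(d, 800.0, 100) for d in distances_km])


def time_entropy(gaps_hours):
    # 1 zero section + 144 one-hour sections = 145 sections, max 144 hours
    return entropy([bin_index(t, 144.0, 144) for t in gaps_hours])
```

A user who always tweets at the same interval or from the same place falls into a single section and gets an entropy of zero, which is the pattern the description associates with tweet bots.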
In the testing process, tweet bots are detected in two stages based on the previously set threshold values. In the first-stage detection, only the entropy of the time information is used: a user whose entropy value for the time information is smaller than the threshold value is classified as a tweet bot at this stage.
Next, the second-stage detection uses the entropy of the distance information proposed in the present invention. The smaller the variation in the inter-tweet distance information, the more likely the user is a tweet bot. Therefore, an entropy threshold value is specified, and a user whose entropy is smaller than this value is identified as a tweet bot.
A minimum value of the entropy corresponding to the distance is designated in the test set, and a user having an entropy value smaller than this value is detected as a tweet bot.
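One way to read "the entropy value that enables the specified reliability" is as a lower quantile of the human entropies in the training set: the threshold is chosen so that at least the given fraction of training humans lies at or above it. The sketch below implements that reading. It is an interpretation rather than the patent's stated procedure, and the function name is hypothetical.

```python
import math


def entropy_threshold(human_entropies, reliability):
    """Return a threshold such that at least `reliability` of the training
    humans have an entropy greater than or equal to it (assumed reading)."""
    if not human_entropies:
        raise ValueError("need at least one training entropy")
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    ordered = sorted(human_entropies)
    n = len(ordered)
    # Number of training humans allowed to fall below the threshold;
    # the small epsilon absorbs floating-point error in (1 - reliability) * n.
    allowed_below = math.floor((1.0 - reliability) * n + 1e-9)
    return ordered[min(allowed_below, n - 1)]
```

Under this reading, a higher reliability gives a lower threshold, so fewer humans are misclassified and fewer bots are caught, which matches the trade-off the description reports for FIG. 3.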
First, a device (terminal) that detects tweet bots using geo-tagged tweet data from a Twitter server constructs a data set consisting of a large number of geo-tagged tweets recorded from Twitter users, including SNS users and SNS tweet bots (S10).
The terminal sets a threshold value, which is the entropy value for the time information that enables the specified reliability, using the inter-tweet time information of tweets consecutively sent by the same user in the constructed data set (S11).
The terminal sets a threshold value, which is the entropy value for the distance information that enables the specified reliability, using the inter-tweet distance information of tweets consecutively sent by the same user in the constructed data set (S12).
The terminal determines whether the entropy value of the inter-tweet time information of a user who consecutively sends tweets is smaller than the threshold value for the inter-tweet time information (S13).
For a user whose entropy value for the time information is smaller than the threshold value for the time information, the terminal determines whether the entropy value of the inter-tweet distance information is smaller than the threshold value for the inter-tweet distance information (S14).
If the entropy value for the distance information is smaller than the threshold value for the distance information, the terminal determines that the user is an SNS tweet bot (S15).
However, if the entropy value of the inter-tweet time information is not smaller than its threshold value in step S13, or if the entropy value of the inter-tweet distance information is not smaller than its threshold value in step S14, the terminal determines that the user is an SNS user (S16).
When the user is determined to be an SNS tweet bot, the terminal increases the tweet bot count (b_count) to obtain the tweet bot detection probability (S17). When an SNS user is mistakenly identified as a bot, the terminal increases the user count (h_count) to obtain the false alarm probability (S18).
That is, the terminal determines that a user is a tweet bot when the entropy value of the inter-tweet time information is smaller than the threshold value for the inter-tweet time information and the entropy value of the inter-tweet distance information is smaller than the threshold value for the inter-tweet distance information.
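Steps S13 to S18 can be sketched in Python as below. The input layout (one tuple of time entropy, distance entropy, and ground-truth label per test user) is hypothetical. The sketch also assumes that the bot detection probability is b_count divided by the number of labelled bots and that the false alarm probability is h_count divided by the number of labelled humans; the source names the counters but does not spell out these ratios.

```python
def is_bot_two_stage(h_time, h_dist, thr_time, thr_dist):
    """S13-S16: a user is a bot only if both entropies are below their thresholds."""
    if h_time >= thr_time:      # S13 fails: SNS user (S16)
        return False
    return h_dist < thr_dist    # S14: below threshold means bot (S15)


def evaluate(users, thr_time, thr_dist):
    """users: iterable of (h_time, h_dist, is_bot_label) from the test set.

    Returns (bot_dp, fap) under the assumed definitions described above.
    """
    b_count = h_count = n_bots = n_humans = 0
    for h_time, h_dist, is_bot_label in users:
        flagged = is_bot_two_stage(h_time, h_dist, thr_time, thr_dist)
        if is_bot_label:
            n_bots += 1
            if flagged:
                b_count += 1    # S17: correctly detected tweet bot
        else:
            n_humans += 1
            if flagged:
                h_count += 1    # S18: SNS user mistaken for a bot
    bot_dp = b_count / n_bots if n_bots else 0.0
    fap = h_count / n_humans if n_humans else 0.0
    return bot_dp, fap
```

Sweeping the thresholds over the values obtained for different reliabilities would yield one (FAP, Bot DP) pair per setting, which is the kind of curve the description attributes to FIG. 4.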
FIG. 2 is a flowchart showing another embodiment of a tweet bot detection method using spatial information according to the present invention.
Referring to FIG. 2, the tweet bot detection method using spatial information according to another embodiment of the present invention is a tweet bot detection method that uses the inter-tweet distance information and a user device set.
Tweet bots tend either to move a distance close to zero or to move relatively regularly over larger distances than humans. Therefore, it can be inferred that the entropy of the inter-tweet distance is much smaller for tweet bots than for humans.
The entropy sets of the training set and the test set are denoted $H^{\mathrm{train}}$ and $H^{\mathrm{test}}$, respectively. In the training set, the entropy value of the inter-tweet distance information of the $i$-th user is $H^{\mathrm{train}}_{D,i}$. Likewise, in the test set, the entropy value of the distance information of the $i$-th user is $H^{\mathrm{test}}_{D,i}$. Using the records in the training set, the entropy of the distance information of the $i$-th user is calculated by Equation (2):
$$H_{D,i} = -\sum_{k=1}^{n} p_D(k)\,\log p_D(k) \tag{2}$$
Here $p_D(k)$ is the probability distribution of the distance information for the $k$-th index, and $n$ represents the total number of data.
First, a device (terminal) that detects tweet bots using geo-tagged tweet data from a Twitter server constructs a data set consisting of a large number of geo-tagged tweets recorded from Twitter users, including users and tweet bots (S20).
The terminal sets a threshold value, which is the entropy value for the distance information that enables the specified reliability, using the inter-tweet distance information of tweets consecutively sent by the same user in the constructed data set (S21).
The terminal selects the devices that SNS users use to send tweets and sets them as the SNS user device set (S22). The device set DV of the selected SNS users is defined as follows: DV includes clients of social network services (SNS) such as Twitter, Foursquare, and Instagram on devices such as the iPhone, iPad, Windows, and Android phones.
The terminal selects only the devices whose probability value is 0.5% or more in the distribution of the devices that SNS users use to send tweets, and the selected devices are treated as the devices used by SNS users.
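The 0.5% selection rule for the device set DV can be sketched in Python as follows. The sketch assumes the distribution is computed over the `source` values of tweets sent by human accounts in the training data, and that the probability value is the share of tweets per source; the function name and the sample source strings are hypothetical.

```python
from collections import Counter


def select_device_set(human_sources, min_share=0.005):
    """Return the set of source values whose share among human tweets is
    at least min_share (0.5% by default)."""
    counts = Counter(human_sources)
    total = sum(counts.values())
    if total == 0:
        return set()
    return {src for src, c in counts.items() if c / total >= min_share}


# Hypothetical usage with made-up source strings.
sources = ["client A"] * 600 + ["client B"] * 399 + ["rare script"] * 1
device_set = select_device_set(sources)
```

Here the rarely seen source accounts for 0.1% of tweets and is left out of the set, so a user who posts only from it would fail the device check in step S24.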
The terminal determines whether the entropy value of the inter-tweet distance information of a user who consecutively sends tweets is smaller than the threshold value for the inter-tweet distance information (S23).
If the entropy value for the distance information is smaller than the threshold value for the distance information, the terminal determines whether the device of the user who consecutively sent the tweets belongs to the previously selected user device set DV (S24).
If the device of the user who consecutively sent the tweets does not belong to the previously selected user device set DV, the terminal determines that the user is a tweet bot (S25).
However, if the entropy value of the inter-tweet distance information is not smaller than the threshold value for the inter-tweet distance information in step S23, or if the device of the user who consecutively sent the tweets belongs to the user device set DV in step S24, the terminal determines that the user is an SNS user (S26).
When the user is determined to be a tweet bot, the terminal increases the tweet bot count (b_count) to obtain the tweet bot detection probability (S27). When an SNS user is mistakenly identified as a bot, the terminal increases the user count (h_count) to obtain the false alarm probability (S28).
That is, the terminal determines that a user is a tweet bot when the entropy value of the inter-tweet distance information is smaller than the threshold value for the inter-tweet distance information and the device of the user who consecutively sent the tweets does not belong to the selected device set DV.
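The decision in steps S23 to S26 can be written as a short Python function. It assumes each user is represented by a single `source` value, for example the source used most often; the description does not say how users with several devices are handled. The function name and sample values are hypothetical.

```python
def is_bot_distance_device(h_dist, thr_dist, source, device_set):
    """S23-S26: bot only if the distance entropy is below the threshold
    and the user's device is not in the SNS user device set DV."""
    if h_dist >= thr_dist:            # S23 fails: SNS user (S26)
        return False
    return source not in device_set   # S24: outside DV means bot (S25)


# Hypothetical usage.
dv = {"client A", "client B"}
verdict = is_bot_distance_device(0.2, 1.0, "rare script", dv)
```

The same counters b_count and h_count described for steps S27 and S28 can then be accumulated over the test set from this decision.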
FIG. 3 is a graph illustrating the tweet bot detection probability as a function of reliability when the tweet bot detection method using spatial information according to an embodiment of the present invention is used.
Referring to FIG. 3, the higher the reliability, the higher the probability that a human is recognized as a human, so tweet bots are detected more stably. On the other hand, as the reliability decreases, detection becomes less stable, but the probability of detecting tweet bots increases accordingly.
In addition, the present invention shows a tweet bot detection probability about 10 to 15% higher than the existing technology over the entire reliability range, so the probability of detecting tweet bots at the same reliability is increased.
FIG. 4 is a graph showing the correlation between the bot detection probability (Bot DP) and the false alarm probability (FAP) when the tweet bot detection method using spatial information according to an embodiment of the present invention is used.
Referring to FIG. 4, in the present invention, in which a user whose entropy value is smaller than a specific threshold value is identified as a tweet bot, the tweet bot detection probability increases as the false alarm probability increases. Compared with the conventional method, the present invention shows a higher tweet bot detection probability at the same false alarm probability.
FIG. 5 is a graph illustrating the correlation between the tweet bot detection probability and the false alarm probability when the tweet bot detection method using spatial information according to another embodiment of the present invention is used.
Referring to FIG. 5, as with the tweet bot detection method according to the first embodiment, the present invention shows a higher tweet bot detection probability than the conventional technology at the same false alarm probability.
FIG. 6 is a block diagram illustrating one embodiment of a smart device that performs methods in accordance with the present invention.
Referring to FIG. 6, the
The
The
The methods according to the present invention can be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer readable medium. The computer readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer readable medium may be those specially designed and constructed for the present invention or may be available to those skilled in the computer software.
Examples of computer readable media include hardware devices that are specially configured to store and execute program instructions, such as ROM, RAM, flash memory, and the like. Examples of program instructions include machine language code such as those generated by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate with at least one software module to perform the operations of the present invention, and vice versa.
It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
100: Smart device
110: Processor
120: Memory
130: Network interface device
140: Input interface device
150: Output interface device
160: Storage device
170: Bus
Claims (17)
Constructing a data set comprising a large number of geo-tagged messages;
Setting a threshold value for time information that enables a specified reliability, using time information between messages consecutively posted by the same user in the data set;
Setting a threshold value for distance information that enables the specified reliability, using distance information between transmission positions of messages consecutively posted by the same user in the data set;
Determining whether an entropy value for the message-to-message time information is smaller than the threshold value for the time information;
Determining, when the entropy value for the message-to-message time information is smaller than the threshold value for the time information, whether an entropy value for the distance information between the transmission positions is smaller than the threshold value for the distance information; and
Determining, if the entropy value for the distance information between the transmission positions is smaller than the threshold value for the distance information, that the user who consecutively posted the messages is an SNS bot.
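The two-stage decision recited above can be sketched as follows. This is a minimal illustration, not the patented implementation: the claims do not state here how the entropy is computed, so the sketch assumes Shannon entropy over fixed-width bins. The function names, the bin widths (`time_bin`, `dist_bin`) and the units (seconds, kilometres) are all assumptions.

```python
from collections import Counter
from math import log2


def shannon_entropy(values, bin_width):
    """Shannon entropy in bits of values bucketed into fixed-width bins."""
    if not values:
        return 0.0
    counts = Counter(int(v // bin_width) for v in values)
    total = len(values)
    return sum(-(c / total) * log2(c / total) for c in counts.values())


def is_sns_bot(time_gaps, distances, time_threshold, dist_threshold,
               time_bin=60.0, dist_bin=1.0):
    """Apply the time-entropy test first, then the distance-entropy test.

    time_gaps: seconds between consecutively posted messages of one user.
    distances: kilometres between consecutive transmission positions.
    """
    # Stage 1: a user whose posting intervals are not regular enough
    # is treated as an ordinary SNS user.
    if not shannon_entropy(time_gaps, time_bin) < time_threshold:
        return False
    # Stage 2: only users that pass stage 1 are checked for low
    # variation in the distance between transmission positions.
    return shannon_entropy(distances, dist_bin) < dist_threshold
```

Under these assumptions, an account that posts at a fixed interval from a fixed position has zero entropy in both stages and is flagged, whereas an account with regular timing but widely varying positions is not.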
In constructing the data set,
Wherein the data set is collected and configured through a streaming API.
In constructing the data set,
Wherein the following fields are adopted from the metadata of each message: a user ID, the latitude of the device location from which the message is transmitted, the longitude of the device location from which the message is transmitted, and the device that generated the message.
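To make the role of these metadata fields concrete, the sketch below derives the time and distance information between consecutive messages of one user. It rests on several assumptions: the record type `GeoMessage` and the function names are hypothetical, the `timestamp` field is added because the time information requires it, and the haversine formula with a mean Earth radius is one common way to measure distance between two latitude/longitude points, not necessarily the one used in the invention.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


@dataclass(frozen=True)
class GeoMessage:
    user_id: str
    lat: float        # latitude of the transmitting device, in degrees
    lon: float        # longitude of the transmitting device, in degrees
    timestamp: float  # posting time in seconds (assumed field)
    device: str       # device or client that generated the message


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def consecutive_features(messages):
    """Return (time gaps in seconds, distances in km) for one user's messages."""
    ordered = sorted(messages, key=lambda m: m.timestamp)
    gaps, dists = [], []
    for prev, cur in zip(ordered, ordered[1:]):
        gaps.append(cur.timestamp - prev.timestamp)
        dists.append(haversine_km(prev.lat, prev.lon, cur.lat, cur.lon))
    return gaps, dists
```

The two returned lists are the inputs over which the time and distance entropy values would then be computed.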
In setting the threshold value for the time information,
Wherein the threshold value for the time information means an entropy value for time information enabling the specified reliability.
In the step of setting the threshold value for the distance information,
Wherein the threshold value for the distance information means an entropy value for distance information that enables the specified reliability.
In the step of determining the SNS bot,
Further comprising incrementing an SNS bot counter (b_count), used to obtain the SNS bot detection probability (Bot DP), when an SNS bot is determined to be an SNS bot.
In the step of determining the SNS bot,
Further comprising incrementing an SNS user counter (h_count), used to obtain the false alarm probability (FAP), when an SNS user is mistaken for an SNS bot.
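The counters b_count and h_count can be sketched as a single pass over labelled users. This is an illustrative sketch only: the function name `evaluate` is hypothetical, and the denominators (the total number of labelled bots and of labelled SNS users) are an assumption, since the claims name the counters but not the normalisation.

```python
def evaluate(decisions, labels):
    """Compute (Bot DP, FAP) from per-user decisions and ground-truth labels.

    decisions: dict mapping user ID to True if the method judged a bot.
    labels:    dict mapping user ID to True if the user really is a bot.
    """
    b_count = 0  # bots correctly determined to be bots
    h_count = 0  # SNS users mistaken for bots
    for user_id, judged_bot in decisions.items():
        if not judged_bot:
            continue
        if labels[user_id]:
            b_count += 1
        else:
            h_count += 1
    n_bots = sum(1 for is_bot in labels.values() if is_bot)
    n_humans = len(labels) - n_bots
    # Assumed normalisation: divide by the size of each labelled group.
    bot_dp = b_count / n_bots if n_bots else 0.0
    fap = h_count / n_humans if n_humans else 0.0
    return bot_dp, fap
```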
In determining whether the entropy value for the message-to-message time information is smaller than the threshold value for the time information,
If the entropy value for the message-to-message time information is not smaller than the threshold value for the time information, the user who consecutively posted the messages is determined to be an SNS user.
In determining whether the entropy value for the distance information between the transmission positions is smaller than the threshold value for the distance information,
If the entropy value for the distance information between the transmission positions is not smaller than the threshold value for the distance information, the user who consecutively posted the messages is determined to be an SNS user.
Constructing a data set comprising a large number of geo-tagged messages;
Setting a threshold value for distance information that enables a specified reliability, using distance information between transmission positions of messages consecutively posted by the same user in the data set;
Selecting the devices used by SNS users to send messages and setting them as a set of SNS user devices;
Determining whether an entropy value for the distance information between the transmission positions is smaller than the threshold value for the distance information;
Determining, when the entropy value for the distance information between the transmission positions is smaller than the threshold value for the distance information, whether the device used for message posting belongs to the set of SNS user devices; and
Determining, if the device used for message posting does not belong to the set of SNS user devices, that the user who consecutively posted the messages is an SNS bot.
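The device-based variant recited above can be sketched in the same way. This is again a sketch under assumptions: the entropy is taken to be Shannon entropy over 1 km bins, each user is represented by a single posting device, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def distance_entropy(distances_km, bin_km=1.0):
    """Shannon entropy in bits of distances bucketed into bin_km-wide bins."""
    if not distances_km:
        return 0.0
    counts = Counter(int(d // bin_km) for d in distances_km)
    total = len(distances_km)
    return sum(-(c / total) * log2(c / total) for c in counts.values())


def build_user_device_set(human_messages):
    """Collect the devices seen in (user_id, device) pairs from SNS users."""
    return {device for _user_id, device in human_messages}


def is_sns_bot(distances_km, device, user_devices, dist_threshold):
    """Distance-entropy test followed by the device-set membership test."""
    # A user whose transmission positions vary enough is an SNS user.
    if not distance_entropy(distances_km) < dist_threshold:
        return False
    # Low distance entropy alone is not decisive: the user is flagged
    # only if the posting device is not one used by SNS users.
    return device not in user_devices
```

The membership test lets a stationary human who posts from an ordinary client avoid being flagged on distance entropy alone.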
In constructing the data set,
Wherein the data set is collected and configured through a streaming API.
In constructing the data set,
Wherein the following fields are adopted from the metadata of each message: a user ID, the latitude of the device location from which the message is transmitted, the longitude of the device location from which the message is transmitted, and the device that generated the message.
In the step of setting the threshold value for the distance information,
Wherein the threshold value for the distance information means an entropy value for distance information that enables the specified reliability.
In the step of determining the SNS bot,
Further comprising incrementing an SNS bot counter (b_count), used to obtain the SNS bot detection probability (Bot DP), when an SNS bot is determined to be an SNS bot.
In the step of determining the SNS bot,
Further comprising incrementing an SNS user counter (h_count), used to obtain the false alarm probability (FAP), when an SNS user is misidentified as an SNS bot.
In determining whether the entropy value for the distance information between the transmission positions is smaller than the threshold value for the distance information,
If the entropy value for the distance information between the transmission positions is not smaller than the threshold value for the distance information, the user who consecutively posted the messages is determined to be an SNS user.
In determining whether the device used for message posting belongs to the set of SNS user devices,
If the device used for message posting belongs to the set of SNS user devices, the user who consecutively posted the messages is determined to be an SNS user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150156970A KR101804020B1 (en) | 2015-11-09 | 2015-11-09 | Method for sns bot detection using geographic information |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20170054167A KR20170054167A (en) | 2017-05-17 |
KR101804020B1 KR101804020B1 (en) | 2017-12-28 |
Family
ID=59048678
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150156970A KR101804020B1 (en) | 2015-11-09 | 2015-11-09 | Method for sns bot detection using geographic information |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101804020B1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102085593B1 (en) * | 2019-09-16 | 2020-03-06 | 포항공과대학교 산학협력단 | Method and device for detecting posting bot for blockchain SNS based on machine learning |
CN111817923B (en) * | 2020-07-28 | 2021-09-14 | 城云科技(中国)有限公司 | Early warning analysis method and device for sudden change of flow of switch port |
2015-11-09 KR KR1020150156970A patent/KR101804020B1/en active IP Right Grant
Non-Patent Citations (3)
Title |
---|
Detecting Automation of Twitter Accounts: Are You a Human, Bot, or Cyborg?, IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING VOL. 9 NO. 6, 2012.08.23.* |
RESEARCH ARTICLE, Understanding Human Mobility from Twitter, 2015.07.* |
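Relationship between tweet interval and user velocity on Twitter, Journal of the Korea Institute of Information and Communication Engineering, 2015.06.* |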
Also Published As
Publication number | Publication date |
---|---|
KR20170054167A (en) | 2017-05-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant |