WO2022228371A1 - Malicious traffic account detection method, apparatus, device, and storage medium - Google Patents
- Publication number
- WO2022228371A1 (application PCT/CN2022/088944)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- account
- behavior
- accounts
- node
- similarity
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
Definitions
- the embodiments of the present application relate to the technical field of data processing, and in particular, to a malicious traffic account detection method, apparatus, device, and storage medium.
- Malicious traffic accounts are accounts operated by the black and gray industries to engage in activities such as scouring the Internet, diverting traffic, and brushing orders. During live webcasts, malicious traffic accounts engage in risky behaviors such as inflating follower counts, inflating room popularity, and maliciously diverting traffic, which prevents live webcasts from running normally.
- Existing aggregation detection of malicious traffic accounts mainly relies on aggregating accounts at the registration and login stages around nodes such as the device identifier, the International Mobile Equipment Identity (IMEI), the MAC address, and the advertising identifier.
- The above detection content is not comprehensive, its detection efficiency and accuracy are low, and further useful information cannot be reasonably mined and utilized.
- Malicious traffic accounts also exhibit login characteristics such as simulator login, modified device identifiers, group control, and box control, which cannot be detected by aggregation on the login device, so the detection needs to be improved.
- The embodiments of the present application provide a malicious traffic account detection method, apparatus, device, and storage medium.
- The account behavior sequence similarity between accounts is calculated from the account behavior sequences, the number of associated accounts, and the total number of accounts, and malicious traffic accounts are then determined according to the account behavior sequence similarity. The identification process does not rely on the device information of the logged-in device, so malicious traffic accounts such as group-control and box-control accounts can be identified, improving the efficiency and accuracy of identification.
- an embodiment of the present application provides a malicious traffic account detection method, including the following steps:
- the malicious traffic account is determined according to the similarity of the account behavior sequence.
- an embodiment of the present application provides a malicious traffic account detection device, the device comprising:
- the behavior node generation module is configured to generate corresponding account behavior nodes according to the respective associated information of each account;
- a sequence determination module configured to determine the account behavior sequence corresponding to each account according to the account behavior node
- an associated account determination module configured to determine the number of associated accounts of each account behavior node according to the account behavior node
- a similarity determination module configured to calculate the similarity of the account behavior sequence between each account according to the account behavior sequence, the number of associated accounts, and the total number of accounts;
- the malicious account determination module is configured to determine the malicious traffic account according to the similarity of the account behavior sequence.
- an embodiment of the present application provides a malicious traffic account detection device, the device including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the malicious traffic account detection method according to the first aspect.
- an embodiment of the present application provides a storage medium for storing computer-executable instructions, where the computer-executable instructions are configured to execute the malicious traffic account detection method according to the first aspect when executed by a computer processor.
- an embodiment of the present application further provides a malicious traffic account detection program, which, when executed, can implement operations related to the malicious traffic account detection method described in the first aspect.
- Corresponding account behavior nodes are generated according to the associated information of each account. The account behavior sequence corresponding to each account and the number of associated accounts of each account behavior node are then determined according to the account behavior nodes. The account behavior sequence similarity between accounts is calculated from the account behavior sequences, the number of associated accounts, and the total number of accounts, and malicious traffic accounts are determined according to that similarity.
- The scheme identifies malicious traffic accounts based on the similarity of account behavior sequences and does not rely on the device information of the device the account logs in from, so malicious traffic accounts such as group-control and box-control accounts can be identified, and the efficiency and accuracy of identifying malicious traffic accounts are improved.
- FIG. 1 is a flowchart of a malicious traffic account detection method provided by an embodiment of the present application
- FIG. 2 is a flowchart of another malicious traffic account detection method provided by an embodiment of the present application.
- FIG. 3 is a flowchart of another malicious traffic account detection method provided by an embodiment of the present application.
- FIG. 4 is an exemplary directed graph provided by an embodiment of the present application.
- FIG. 5 is an exemplary strongly connected component effect diagram provided by an embodiment of the present application.
- FIG. 6 is a distribution diagram of a strongly connected component based on an account number provided by an embodiment of the present application.
- FIG. 7 is a structural block diagram of a malicious traffic account detection device provided by an embodiment of the present application.
- FIG. 8 is a schematic structural diagram of a malicious traffic account detection device according to an embodiment of the present application.
- FIG. 1 is a flowchart of a method for detecting malicious traffic accounts provided by an embodiment of the present application. This embodiment can be configured to detect malicious traffic accounts.
- the method can be executed by a computing device such as a server, and specifically includes the following steps:
- Step S101 generating a corresponding account behavior node according to the respective associated information of each account.
- the associated information of the account refers to information associated with the specific operations and behaviors of the account.
- the associated information of the account includes the action execution content of the account, so the associated information can be obtained by acquiring the action execution content of the account.
- the account behavior node is a node that records the execution content of one action of the account. Exemplarily, if the actions of an account within a certain period of time include following an account, watching a live broadcast, recharging, and rewarding, then the account has four account behavior nodes for that period, and the nodes respectively record the action execution content of following an account, watching a live broadcast, recharging, and rewarding.
- the account action execution content may be recorded in the account behavior node directly in words, or it may be recorded with a specific code.
- the specific method of recording can be set according to actual needs, which is not limited in this solution.
- Exemplarily, if the action execution content of account A is watching the live broadcast of account C, and the action execution content of account B is rewarding account D, then the content of watching the live broadcast of account C is recorded in the account behavior node of account A, and the content of rewarding account D is recorded in the account behavior node of account B.
- step S101 specifically includes:
- the corresponding account behavior node is generated according to the action occurrence time, action occurrence node and action execution content recorded corresponding to each account.
- the account behavior node of each account is generated by obtaining the action occurrence time, action occurrence node, and action execution content of each account. Exemplarily, if an account follows account A at 3 o'clock, one behavior node generated for the account records the content "following account A at 3 o'clock", and if the account watches the live broadcast of account B at 5 o'clock, another behavior node generated for the account records the content "watching account B's live broadcast at 5 o'clock".
- the action execution content is recorded by looking up a table to determine the action index. As shown in Table 1, each action index corresponds to a type of action execution content.
- the account behavior node takes the form a_nm = action occurrence time_action occurrence node_action index.
- Exemplarily, if an account follows account A at 3 o'clock, the generated action node is 03_account A_code 8; if an account watches the live broadcast of account B at 5 o'clock, the generated action node is 05_account B_code 9.
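The node encoding described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Action` type, the `make_behavior_node` helper, and the `ACTION_INDEX` mapping are hypothetical names, and the two index values are taken only from the example codes quoted above because Table 1 is not reproduced in this text.

```python
from dataclasses import dataclass

# Hypothetical action index table (Table 1 is not available here);
# only the two codes that appear in the example above are included.
ACTION_INDEX = {"follow": 8, "watch_live": 9}


@dataclass(frozen=True)
class Action:
    hour: int     # action occurrence time (hour of day)
    target: str   # action occurrence node, e.g. the account acted upon
    kind: str     # action execution content, looked up in ACTION_INDEX


def make_behavior_node(action: Action) -> str:
    """Encode one action as 'time_node_index', zero-padding the hour."""
    return f"{action.hour:02d}_{action.target}_code{ACTION_INDEX[action.kind]}"


example_nodes = [
    make_behavior_node(Action(3, "accountA", "follow")),
    make_behavior_node(Action(5, "accountB", "watch_live")),
]
```

Encoding each node as a single string lets later steps treat a node exactly like a word in a document.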
- Step S102 Determine, according to the account behavior node, an account behavior sequence corresponding to each account, and the number of associated accounts of each account behavior node.
- the account behavior sequence refers to a sequence including multiple account behavior nodes of the same account. After obtaining the account behavior node corresponding to each account, the account behavior sequence of each account can be determined according to the account behavior node of each account. Exemplarily, if there are three account behavior nodes of account A, the account behavior sequence of account A can be determined according to the three account behavior nodes of account A.
- the method of generating the account behavior sequence according to the account behavior nodes may be to sort the account behavior nodes according to the occurrence time of the action execution content to generate the account behavior sequence, or to generate the account behavior sequence according to the random ordering of the account behavior nodes.
- the number of associated accounts of each account behavior node is determined accordingly.
- the associated accounts of an account behavior node are the accounts that are related to that node or share the same characteristics. Obtaining the number of associated accounts of each account behavior node facilitates the subsequent calculation of the similarity between account behavior sequences.
- the method of determining the number of associated accounts of each account behavior node is specifically: determining the number of other accounts that contain the same action execution content as each account behavior node.
- the associated accounts of each account behavior node may be determined as the other accounts whose action execution content matches that of the node. For example, if the action execution content in an account behavior node is rewarding account D, other accounts whose action execution content is also rewarding account D are taken as the associated accounts of that node, and the number of associated accounts is obtained.
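Step S102 can be sketched as below, under stated assumptions: the function names and the `(account, node, hour)` record layout are hypothetical, sequences are ordered by occurrence time (one of the two orderings the text allows), and the count returned for each node includes every account that contains the node, so the number of *other* accounts for a given account is that count minus one.

```python
from collections import defaultdict


def build_sequences(records):
    """records: iterable of (account, node, hour).

    Returns {account: [node, ...]} with nodes ordered by occurrence time.
    """
    by_account = defaultdict(list)
    for account, node, hour in records:
        by_account[account].append((hour, node))
    return {acc: [node for _, node in sorted(items)]
            for acc, items in by_account.items()}


def associated_account_counts(sequences):
    """Return {node: number of distinct accounts whose sequence contains it}."""
    accounts_per_node = defaultdict(set)
    for account, seq in sequences.items():
        for node in seq:
            accounts_per_node[node].add(account)
    return {node: len(accs) for node, accs in accounts_per_node.items()}


records = [
    ("u1", "reward_D", 3),
    ("u2", "reward_D", 4),
    ("u1", "watch_C", 1),
    ("u3", "reward_D", 9),
]
sequences = build_sequences(records)
counts = associated_account_counts(sequences)
```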
- Step S103 Calculate the similarity of the account behavior sequence between each account according to the account behavior sequence, the number of associated accounts, and the total number of accounts.
- the account behavior sequence similarity is the similarity between account behavior sequences. The higher the similarity between two account behavior sequences, the more consistent the action execution content of the behavior nodes in the two sequences, which means that the accounts corresponding to the two sequences are more likely to be controlled by the same person and more likely to be malicious traffic accounts.
- the total number of accounts is obtained, and the account behavior sequence similarity between accounts is then calculated from the account behavior sequences, the number of associated accounts, and the total number of accounts.
- Step S104 Determine the malicious traffic account according to the similarity of the account behavior sequence.
- the malicious account can be determined according to the account behavior sequence similarity between each account.
- a similarity threshold is preset. After the account behavior sequence similarity between accounts is calculated, the accounts are filtered according to the similarity threshold so that accounts with relatively high account behavior sequence similarity are retained, and malicious traffic accounts are then determined from the retained accounts.
- FIG. 2 is a flowchart of another malicious traffic account detection method provided by an embodiment of the present application, and a method for calculating the similarity of account behavior sequences between each account according to the frequency value of each account behavior node is given.
- the technical solution is as follows:
- Step S201 generating a corresponding account behavior node according to the respective associated information of each account.
- Step S202 Determine, according to the account behavior node, an account behavior sequence corresponding to each account, and the number of associated accounts of each account behavior node.
- Step S203 Calculate the frequency value of each account behavior node according to the account behavior sequence, the number of associated accounts, and the total number of accounts.
- the frequency value may be a TF-IDF value, a numerical statistic that reflects the importance of a word to a document in a collection or corpus.
- the main idea of TF-IDF is that if a word or phrase appears frequently in one article (high TF) and rarely appears in other articles, the word or phrase is considered to distinguish well between categories and is suitable for classification.
- TF-IDF is TF × IDF, where TF represents the frequency of term t in document d. The main idea of IDF is that the fewer documents contain term t, the larger the IDF, and the better term t distinguishes between categories.
- the TF-IDF value of each account behavior node is calculated according to the account behavior sequence, the number of associated accounts, and the total number of accounts, and the frequency of occurrence of each account
- step S203 can be specifically implemented by step S2031-step S2033, and the details are as follows:
- Step S2031 Determine the number of account behavior nodes in the account behavior sequence corresponding to each account, and determine the behavior frequency value of each account according to the number of account behavior nodes.
- the number of account behavior nodes in the account behavior sequence corresponding to each account is obtained, each account behavior node is treated as a word, the number of account behavior nodes in the account behavior sequence is treated as the total word count of the article, and the behavior frequency value of each account is calculated according to the calculation formula of the TF value. Exemplarily, if the number of account behavior nodes in the account behavior sequence of account A is 5, then the calculation formula for determining the behavior frequency value of account A according to the number of account behavior nodes is:
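A minimal sketch, assuming the standard term-frequency definition with each account behavior node treated as a word. The symbols $n_{t,a}$ (number of occurrences of node $t$ in the sequence of account $a$) and $\mathrm{TF}(t,a)$ are introduced here for illustration only:

```latex
\mathrm{TF}(t, a) = \frac{n_{t,a}}{\sum_{t'} n_{t',a}}
```

Under this assumption, a node that occurs once in the five-node sequence of account A would have $\mathrm{TF} = 1/5 = 0.2$.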
- Step S2032 Calculate and obtain the reverse behavior frequency index of each account behavior node according to the number of associated accounts of each account behavior node under each account and the total number of accounts.
- the number of associated accounts of each account behavior node under each account is taken as the number of documents containing a certain word, the total number of accounts is taken as the total number of documents in the corpus, and the IDF value of each account behavior node is calculated according to the calculation formula of the IDF value.
- the formula for calculating the reverse behavior frequency index of each account behavior node is:
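A minimal sketch, assuming the standard inverse-document-frequency definition, where $N$ is the total number of accounts and $n_t$ is the number of associated accounts of node $t$ (both symbols are introduced here for illustration; smoothed variants such as $\log\frac{N}{n_t + 1}$ are also common, and which one applies is not stated here):

```latex
\mathrm{IDF}(t) = \log\frac{N}{n_t},
\qquad
\text{TF-IDF}(t, a) = \mathrm{TF}(t, a) \times \mathrm{IDF}(t)
```

Nodes shared by few accounts receive a large IDF, while nodes shared by many accounts receive a small one.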
- Step S2033 Calculate the frequency value of each account behavior node according to the behavior frequency value and the reverse behavior frequency index.
- the TF-IDF value, that is, the frequency value of each account behavior node, can then be calculated.
- Exemplarily, if the numbers of associated accounts of the account behavior nodes of account A are [9, 13, 360, 761, 115, 1445, 1582], the corresponding IDF values are [11.6, 11.3, 7.9, 7.2, 6.8, 6.5, 6.4]; multiplying the TF value by the IDF value gives the frequency value of each account behavior node of account A.
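Steps S2031 to S2033 can be sketched as follows. This is an illustration under assumptions, not the patent's exact formula: it uses the unsmoothed natural-log IDF, counts the account itself among the associated accounts, and the function name `tf_idf` and its parameters are hypothetical.

```python
import math
from collections import Counter


def tf_idf(sequences, total_accounts=None):
    """sequences: {account: [node, ...]} -> {account: {node: tf-idf value}}.

    TF  = occurrences of the node / length of the account's sequence
    IDF = ln(total accounts / accounts whose sequence contains the node)
    """
    total = total_accounts or len(sequences)
    # Number of distinct accounts containing each node (document frequency).
    doc_freq = Counter(node for seq in sequences.values() for node in set(seq))
    scores = {}
    for account, seq in sequences.items():
        counts = Counter(seq)
        scores[account] = {
            node: (count / len(seq)) * math.log(total / doc_freq[node])
            for node, count in counts.items()
        }
    return scores


demo_scores = tf_idf({"a": ["x", "y"], "b": ["x"], "c": ["z"]})
```

A node that every account shares gets a score of zero under this definition, which matches the intent of down-weighting very common targets.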
- Step S204 Calculate, according to the frequency value of each account behavior node, the account behavior sequence similarity between each account.
- the account behavior sequence similarity between each account can be calculated according to the frequency value of each account behavior node.
- LSI (latent semantic indexing) is used to calculate the account behavior sequence similarity between accounts.
- the LSI algorithm obtains the topics of a text by means of singular value decomposition (SVD). After the dimension is reduced to k, the SVD can be approximately written in the following form:
- A_ij corresponds to the feature value of the i-th word of the j-th text, which is generally the preprocessed, normalized TF-IDF value
- k is the assumed number of topics, which is generally less than the number of texts
- U_il corresponds to the correlation between the i-th word and the l-th word sense
- V_jm corresponds to the correlation between the j-th text and the m-th topic
- Σ_lm corresponds to the correlation between the l-th word sense and the m-th topic.
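A minimal sketch of the approximate decomposition, assuming the standard truncated-SVD form used by LSI. The dimension labels $P$ (number of words, i.e. rows) and $Q$ (number of texts, i.e. columns) are introduced here for illustration:

```latex
A_{P \times Q} \approx U_{P \times k}\, \Sigma_{k \times k}\, V^{T}_{k \times Q},
\qquad
A_{ij} \approx \sum_{l=1}^{k} \sum_{m=1}^{k} U_{il}\, \Sigma_{lm}\, V_{jm}
```

Because $\Sigma$ is diagonal in an SVD, only the terms with $l = m$ contribute to the double sum.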
- in this solution, each account behavior node is treated as a word and each account behavior sequence as a text. Decomposing the TF-IDF values of the account behavior nodes using LSI yields the degree of correlation between each account behavior sequence and each topic, and on this basis the account behavior sequence similarity between accounts is calculated.
- step S204 can be specifically implemented by step S2041 and step S2043, as follows:
- Step S2041 reducing the dimension of the frequency value matrix through the matrix decomposition formula, and obtaining the correlation matrix of each account behavior sequence and behavior theme;
- Behavior themes refer to the similar types of account behavior sequences.
- the number of behavior themes can be set according to actual needs. For example, set the number of behavioral themes to 4.
- a frequency value matrix, that is, the TF-IDF matrix, is generated according to the frequency value of each account behavior node.
- using the matrix decomposition formula, that is, the SVD decomposition formula, the TF-IDF matrix is reduced to k dimensions to obtain the correlation matrix of each account's behavior sequence and the behavior topics.
- V.T represents the transpose of V, that is, the correlation matrix of the behavior sequence and behavior topics of each account.
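The dimension reduction in step S2041 can be sketched with NumPy's SVD. This is an illustration under assumptions: the TF-IDF matrix is laid out with one row per account behavior node and one column per account, the function name `account_topic_matrix` is hypothetical, and the returned matrix is the transpose of the first k rows of the `Vt` factor that `numpy.linalg.svd` returns, so each row describes one account.

```python
import numpy as np


def account_topic_matrix(tfidf: np.ndarray, k: int) -> np.ndarray:
    """tfidf: (nodes x accounts) matrix -> (accounts x k) correlation matrix."""
    # Economy-size SVD: tfidf = U @ diag(s) @ Vt, singular values descending.
    _, s, vt = np.linalg.svd(tfidf, full_matrices=False)
    k = min(k, len(s))
    # Keep the k strongest topics; transpose so rows index accounts.
    return vt[:k].T


# Toy matrix: 4 behavior nodes x 3 accounts; accounts 0 and 2 behave identically.
toy = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 2.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])
topics = account_topic_matrix(toy, k=2)
```

For very large, sparse matrices a dense SVD is impractical; a sparse truncated solver would be the usual substitute, which is a design choice outside what the text specifies.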
- Step S2043 Calculate the account behavior sequence similarity between each account based on the correlation matrix.
- the similarity of the account behavior sequence between each account can be calculated based on the correlation matrix V.T.
- the similarity between row vectors in the correlation matrix V.T may be calculated as the account behavior sequence similarity.
- step S2043 is specifically as follows: a similarity calculation formula is applied to the row vectors of the correlation matrix in pairs to obtain the account behavior sequence similarity between accounts.
- V_i and V_j respectively represent row vectors in the correlation matrix V.T.
- the resulting account behavior sequence similarities are shown in Table 4.
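The pairwise comparison of row vectors can be sketched as below. The similarity formula itself is not preserved in this text, so cosine similarity is used here purely as an assumption; the function name `pairwise_cosine` is hypothetical.

```python
import numpy as np


def pairwise_cosine(rows: np.ndarray) -> np.ndarray:
    """Return the matrix of cosine similarities between all pairs of rows."""
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    # Avoid division by zero: an all-zero row stays all-zero.
    unit = rows / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


vectors = np.array([
    [1.0, 0.0],
    [2.0, 0.0],   # same direction as row 0
    [0.0, 3.0],   # orthogonal to row 0
])
sim = pairwise_cosine(vectors)
```

Entry `sim[i, j]` is the similarity between accounts i and j; the matrix is symmetric with ones on the diagonal for non-zero rows.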
- Step S205 Determine the malicious traffic account according to the similarity of the account behavior sequence.
- the similarity of the account behavior sequence between each account is calculated according to the frequency value of the retained account behavior nodes.
- Exemplarily, after the account behavior sequences whose length is less than 1 and the account behavior nodes whose frequency value is less than 0.5 are eliminated, P account behavior nodes and Q accounts remain, and the behavior nodes form a P × Q TF-IDF numerical matrix. The correlation matrix of the accounts' behavior sequences and the behavior topics is obtained from this matrix, and the row vectors of the correlation matrix are then compared in pairs to obtain the account behavior sequence similarity between accounts.
- In this solution, the frequency value of each account behavior node is calculated, and the account behavior sequence similarity between accounts is calculated from these frequency values. Because LSI includes a dimensionality reduction step, it is suited to large-scale computation, so even when the total number of accounts is relatively large, this solution can accurately calculate the account behavior sequence similarity between accounts, which improves the efficiency and accuracy of identifying malicious traffic accounts.
- step S204 can be implemented by step S2044 and step S2045, as follows:
- Step S2044 filtering out the account and the account behavior node according to the length of the account behavior sequence and the frequency value of each account behavior node.
- some accounts and account behavior nodes that do not meet the calculation requirements can be eliminated to reduce the amount of calculation for subsequent calculation of the similarity of account behavior sequences.
- accounts whose account behavior sequence length does not reach a preset length value are eliminated, and account behavior nodes whose frequency value is less than a preset frequency value are eliminated.
- Exemplarily, if the length of the account behavior sequence is equal to 1, it is not enough to calculate the behavior sequence similarity between two accounts, so accounts whose account behavior sequence length is less than 1 are eliminated; and because the frequency values of nodes such as popular anchors and public IPs are relatively small, account behavior nodes whose frequency value is less than 0.5 are eliminated.
- Step S2045 calculating the sequence similarity of account behavior between each account according to the frequency value of the filtered account behavior nodes.
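The filtering in step S2044 can be sketched as follows. This is an illustration under assumptions: the function name `filter_scores` is hypothetical, the sequence length is taken as the number of scored nodes of an account, and the defaults mirror the example thresholds quoted above (length 1 and frequency value 0.5), both of which are meant to be tuned.

```python
def filter_scores(scores, min_length=1, min_value=0.5):
    """scores: {account: {node: frequency value}} -> filtered copy.

    Drops accounts whose sequence has fewer than min_length nodes, then
    drops nodes whose frequency value is below min_value.
    """
    kept = {}
    for account, node_scores in scores.items():
        if len(node_scores) < min_length:
            continue  # sequence too short to compare
        kept[account] = {node: value
                         for node, value in node_scores.items()
                         if value >= min_value}
    return kept


filtered = filter_scores({
    "a": {"x": 0.2, "y": 0.9},   # 'x' is a common, low-value node
    "b": {},                      # empty sequence, removed
    "c": {"z": 0.5},
})
```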
- FIG. 3 is a flowchart of another malicious traffic account detection method provided by an embodiment of the present application, which shows a method for calculating strongly connected components by using a connected subgraph algorithm, and filtering out malicious traffic accounts according to the strongly connected components.
- the technical solution is as follows:
- Step S301 generating a corresponding account behavior node according to the respective associated information of each account.
- Step S302 Determine, according to the account behavior node, an account behavior sequence corresponding to each account, and the number of associated accounts of each account behavior node.
- Step S303 Calculate the similarity of the account behavior sequence between each account according to the account behavior sequence, the number of associated accounts, and the total number of accounts.
- Step S304 according to the similarity of the account behavior sequence between each account, screen out the account relationship pair with strong correlation.
- the account behavior sequence similarity threshold may be preset, and the strongly correlated account relationship pairs are screened according to the account behavior sequence similarity threshold.
- Exemplarily, the filter threshold of the account behavior sequence similarity is set to, for example, 30°.
- a strongly associated account relationship pair can be expressed as (account A code, account B code, behavior sequence similarity between account A and account B), which indicates that account node A and account node B are connected and that the connection weight is the behavior similarity between the two accounts.
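Screening strongly associated pairs (step S304) can be sketched as below. It is a minimal illustration assuming a similarity score where larger means more similar and where a pair is kept when its score is at least the threshold; the function name `strong_pairs` is hypothetical. If the similarity is expressed as an angle instead, the comparison would be reversed.

```python
from itertools import combinations


def strong_pairs(accounts, sim, threshold):
    """Return (account_i, account_j, similarity) for pairs at or above threshold.

    accounts: list of account codes; sim: square matrix (list of lists)
    where sim[i][j] is the similarity between accounts[i] and accounts[j].
    """
    return [
        (accounts[i], accounts[j], sim[i][j])
        for i, j in combinations(range(len(accounts)), 2)
        if sim[i][j] >= threshold
    ]


codes = [10, 11, 14, 15]
matrix = [
    [1.00, 0.82, 0.82, 0.10],
    [0.82, 1.00, 1.00, 0.20],
    [0.82, 1.00, 1.00, 0.30],
    [0.10, 0.20, 0.30, 1.00],
]
pairs = strong_pairs(codes, matrix, threshold=0.8)
```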
- Step S305 Input the strongly associated account relationship pair into the connected subgraph, and calculate the strongly connected component of the connected subgraph based on a preset similarity threshold.
- the strongly correlated account relation pair is substituted into the connected subgraph algorithm to obtain the strongly connected component.
- in the connected subgraph algorithm, as shown in FIG. 4, which is an exemplary directed graph provided by this embodiment of the application, two vertices of a directed graph G are called strongly connected if there is at least one path from each of them to the other.
- a directed graph G is said to be strongly connected if every two of its vertices are strongly connected.
- a maximal strongly connected subgraph of a directed graph that is not itself strongly connected is called a strongly connected component.
- the subgraph {1, 2, 3, 4} is a strongly connected component because vertices 1, 2, 3, and 4 are pairwise reachable, and {5} and {6} are also strongly connected components.
- the usual solution algorithm is Tarjan's algorithm, whose time complexity is O(N+M), where N is the number of vertices and M is the number of edges.
- the strongly connected components of the connected subgraph are filtered out based on the preset similarity threshold.
- Exemplarily, the account behavior sequence similarity threshold is set to 0.8, and the strongly connected components of the connected subgraph are filtered out.
- filtering is performed according to the account behavior sequence similarity threshold of 0.8, and the strongly correlated account relationship pairs that remain are (account code 10, account code 11, 0.82), (account code 10, account code 14, 0.82), and (account code 11, account code 14, 1). The size of the strongly connected component is 3, and the corresponding strongly connected component is shown in FIG. 5, an exemplary strongly connected component effect diagram provided by an embodiment of the present application.
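The component computation can be sketched with a recursive version of Tarjan's algorithm. This is an illustration under assumptions: each similarity pair is treated as two directed edges because the similarity is symmetric, the function name is hypothetical, and the recursion would need to be replaced by an explicit stack (or Python's recursion limit raised) for very large graphs.

```python
from collections import defaultdict


def strongly_connected_components(pairs):
    """pairs: iterable of (account_a, account_b, similarity) -> list of sets."""
    graph = defaultdict(set)
    for a, b, _similarity in pairs:
        graph[a].add(b)
        graph[b].add(a)  # symmetric similarity: add both directions

    index, low = {}, {}
    stack, on_stack, components = [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)  # discovery order of v
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for vertex in list(graph):
        if vertex not in index:
            visit(vertex)
    return components


example_pairs = [(10, 11, 0.82), (10, 14, 0.82), (11, 14, 1.0), (20, 21, 0.9)]
example_components = strongly_connected_components(example_pairs)
```

With the three pairs from the example above, accounts 10, 11, and 14 end up in one component of size 3.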
- Step S306 Determine the accounts corresponding to strongly connected components whose size is greater than the strongly connected component threshold as malicious traffic accounts.
- malicious traffic accounts are screened out according to the size of the strongly connected components, using a preset strongly connected component threshold.
- the strongly connected component threshold can be set according to actual needs, and its size is not limited in this embodiment. Exemplarily, the strongly connected component threshold is set to 8, and the strongly connected components corresponding to 4 actual accounts are extracted as shown in Table 5.
- the risk of accounts 1-4 gradually increases with the size of their strongly connected components. Because the size of the strongly connected component of account 1 is less than the strongly connected component threshold, that is, its batch aggregation behavior is not obvious, account 1 is not recognized as a malicious traffic account.
- test data are as follows:
- FIG. 6 is an account-based distribution diagram of strongly connected components provided by an embodiment of the present application.
- the abscissa is the size of the strongly connected component.
- the primary ordinate is the number of strongly connected components.
- the secondary ordinate is the total number of accounts corresponding to the current strongly connected component size.
- S1 is the curve of the number of strongly connected components.
- S2 is the account quantity curve.
- 8220 accounts are distributed among 193 gangs, and the 3059 accounts whose number of strongly connected components is greater than 100 are distributed among 19 strongly connected components; no single strongly connected component is associated with most of the accounts, which indicates that the edge relationships were chosen reasonably.
- the strongly connected components of the connected subgraph are calculated by the connected subgraph algorithm, and the malicious traffic accounts are screened out according to the number of strongly connected components, so that the malicious traffic accounts can be accurately screened out.
- the process does not rely on the device information of the devices on which the accounts log in, which improves the identification efficiency and accuracy for malicious traffic accounts.
- FIG. 7 is a structural block diagram of a malicious traffic account detection apparatus provided by an embodiment of the present application, the apparatus is configured to execute the malicious traffic account detection method provided by the above embodiment, and has corresponding functional modules and beneficial effects of the execution method.
- the device specifically includes: a behavior node generation module 401, a sequence determination module 402, an associated account determination module 403, a similarity determination module 404, and a malicious account determination module 405, wherein,
- the behavior node generation module 401 is configured to generate corresponding account behavior nodes according to the respective associated information of each account;
- the sequence determination module 402 is configured to determine the account behavior sequence corresponding to each account according to the account behavior node;
- the associated account determination module 403 is configured to determine the number of associated accounts of each account behavior node according to the account behavior node;
- the similarity determination module 404 is configured to calculate the similarity of the account behavior sequence between each account according to the account behavior sequence, the number of associated accounts and the total number of accounts;
- the malicious account determination module 405 is configured to determine the malicious traffic account according to the similarity of the account behavior sequence.
- the similarity determination module 404 includes a frequency value calculation sub-module and a similarity calculation sub-module, wherein,
- the frequency value calculation sub-module is configured to calculate the frequency value of each account behavior node according to the account behavior sequence, the number of associated accounts and the total number of accounts;
- the similarity calculation sub-module is configured to calculate the account behavior sequence similarity between each account according to the frequency value of each account behavior node.
- the frequency value calculation sub-module includes a behavior frequency value calculation unit, an inverse behavior frequency index calculation unit, and a frequency value calculation unit, wherein,
- the behavior frequency value calculation unit is configured to determine the number of account behavior nodes in the account behavior sequence corresponding to each account, and determine the behavior frequency value of each account according to the number of account behavior nodes;
- the inverse behavior frequency index calculation unit is configured to calculate the inverse behavior frequency index of each account behavior node according to the number of associated accounts of each account behavior node under each account and the total number of accounts;
- the frequency value calculation unit is configured to calculate the frequency value of each account behavior node according to the behavior frequency value and the inverse behavior frequency index.
- the similarity calculation submodule further includes a frequency value matrix construction unit, a dimension reduction unit, and an account behavior sequence similarity calculation unit, wherein,
- the frequency value matrix construction unit is configured to construct a frequency value matrix according to the frequency value of each account behavior node and the account behavior sequence;
- the dimension reduction unit is configured to reduce the dimension of the frequency value matrix through the matrix decomposition formula, and obtain the correlation matrix of each account behavior sequence and behavior theme;
- the account behavior sequence similarity calculation unit is configured to calculate the account behavior sequence similarity between each account based on the correlation matrix.
- the account behavior sequence similarity calculation unit is specifically configured to apply a similarity calculation formula to pairs of row vectors of the correlation matrix to obtain the account behavior sequence similarity between each account.
- the similarity calculation submodule further includes a filtering unit and a calculation unit, wherein,
- the filtering unit is configured to filter out accounts and account behavior nodes according to the length of the account behavior sequence and the frequency value of each account behavior node;
- the calculation unit is configured to calculate the similarity of the account behavior sequence between each account according to the frequency value of the filtered account behavior nodes.
- the malicious account determination module 405 includes a screening submodule, a strongly connected component calculation submodule, and a malicious traffic account determination submodule, wherein,
- the screening sub-module is configured to screen out strongly correlated account relationship pairs according to the sequence similarity of account behavior between each account;
- the strongly connected component calculation submodule is configured to input the strongly associated account relationship pairs into the connected subgraph, and calculate the strongly connected component of the connected subgraph based on the preset similarity threshold;
- the malicious traffic account determination submodule is configured to determine an account corresponding to a strongly connected component whose number of strongly connected components is greater than the threshold of the strongly connected component as a malicious traffic account.
- the behavior node generating module 401 is configured to generate a corresponding account behavior node according to the action occurrence time, the action occurrence node and the action execution content recorded corresponding to each account.
- the behavior node generation module 401 is configured to determine the number of associated accounts of each account behavior node by determining the number of other accounts that contain action execution content identical to that in the account behavior node.
- FIG. 8 is a schematic structural diagram of a malicious traffic account detection device provided by an embodiment of the present application.
- the device includes a processor 501, a memory 502, an input device 503, and an output device 504; the number of processors 501 in the device may be one or more.
- one processor 501 is taken as an example; the processor 501, memory 502, input device 503, and output device 504 in the device can be connected by a bus or in other ways, and connection by a bus is taken as an example.
- the memory 502 can be configured to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the malicious traffic account detection method in the embodiment of the present application.
- the processor 501 executes various functional applications and data processing of the device by running the software programs, instructions and modules stored in the memory 502 , that is, to implement the above malicious traffic account detection method.
- the input device 503 may be configured to receive input numerical or character information, and to generate key signal input related to user settings and function control of the device.
- the output device 504 may include a display device such as a display screen.
- Embodiments of the present application further provide a storage medium containing computer-executable instructions, where the computer-executable instructions are configured to execute the malicious traffic account detection method when executed by a computer processor.
- in the above apparatus embodiment, the units and modules included are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for the convenience of distinguishing them from each other, and are not used to limit the protection scope of the embodiments of the present application.
- various aspects of the methods provided by the present application can also be implemented in the form of a program product that includes program code; when the program product runs on a computer device, the program code causes the computer device to execute the steps of the methods described above in this specification according to various exemplary embodiments of the present application.
- the computer device may execute the malicious traffic account detection method described in the embodiments of the present application.
- the program product may be implemented using any combination of one or more readable media.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Security & Cryptography (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
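Embodiments of the present application provide a malicious traffic account detection method, apparatus, device, and storage medium. The method includes: after generating corresponding account behavior nodes according to the associated information of each account, determining, according to the account behavior nodes, the account behavior sequence corresponding to each account and the number of associated accounts of each account behavior node; calculating the account behavior sequence similarity between accounts according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts; and then determining malicious traffic accounts according to the account behavior sequence similarity. The solution identifies malicious traffic accounts based on account behavior sequence similarity and does not rely on the device information of the devices on which accounts log in, so it can identify malicious traffic accounts operated through group control, box control, and the like, improving the efficiency and accuracy of identifying malicious traffic accounts.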
Description
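This application claims priority to Chinese Patent Application No. 202110470331.1, filed with the China Patent Office on April 28, 2021, the entire contents of which are incorporated herein by reference.
Embodiments of the present application relate to the technical field of data processing, and in particular to a malicious traffic account detection method, apparatus, device, and storage medium.
With the development of network and communication technologies, live streaming has gradually entered everyday life. However, the existence of malicious traffic accounts hinders the healthy development of the live streaming industry. Malicious traffic accounts are accounts used by the black and gray market on the Internet for activities such as exploiting promotions, diverting traffic, and brushing orders. During live streams, such accounts engage in risky behaviors such as inflating follower counts, inflating room popularity, and maliciously diverting traffic, which leads to a falsely prosperous ecosystem, streamers fraudulently collecting commissions, and competitors poaching paying users.
At present, aggregation-based detection of malicious traffic accounts relies mainly on how accounts aggregate, at registration and login, on nodes such as device identifiers, International Mobile Equipment Identity (IMEI) numbers, MAC addresses, and advertising identifiers. However, this detection is not comprehensive, its efficiency and accuracy are both low, and it cannot reasonably mine and use more of the useful information available. Moreover, besides aggregating on login devices, malicious traffic accounts also exhibit login characteristics such as emulator logins with modified device identifiers, group control, and box control, which cannot be detected through aggregation on a single login device, so improvement is needed.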
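Summary of the Invention
Embodiments of the present application provide a malicious traffic account detection method, apparatus, device, and storage medium. The account behavior sequence similarity between accounts is calculated from the account behavior sequences, the numbers of associated accounts, and the total number of accounts, and malicious traffic accounts are then determined according to that similarity. The identification process does not rely on the device information of login devices, can identify malicious traffic accounts operated through group control, box control, and the like, and improves identification efficiency and accuracy.
In a first aspect, an embodiment of the present application provides a malicious traffic account detection method, including the following steps:
generating corresponding account behavior nodes according to the associated information of each account;
determining, according to the account behavior nodes, the account behavior sequence corresponding to each account and the number of associated accounts of each account behavior node;
calculating the account behavior sequence similarity between accounts according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts;
determining malicious traffic accounts according to the account behavior sequence similarity.
In a second aspect, an embodiment of the present application provides a malicious traffic account detection apparatus, the apparatus including:
a behavior node generation module, configured to generate corresponding account behavior nodes according to the associated information of each account;
a sequence determination module, configured to determine the account behavior sequence corresponding to each account according to the account behavior nodes;
an associated account determination module, configured to determine the number of associated accounts of each account behavior node according to the account behavior nodes;
a similarity determination module, configured to calculate the account behavior sequence similarity between accounts according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts;
a malicious account determination module, configured to determine malicious traffic accounts according to the account behavior sequence similarity.
In a third aspect, an embodiment of the present application provides a malicious traffic account detection device, the device including: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the malicious traffic account detection method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a storage medium storing computer-executable instructions which, when executed by a computer processor, are configured to perform the malicious traffic account detection method of the first aspect.
In a fifth aspect, an embodiment of the present application further provides a program for malicious traffic account detection which, when executed, can implement operations related to the malicious traffic account detection method of the first aspect.
In the embodiments of the present application, corresponding account behavior nodes are first generated according to the associated information of each account; the account behavior sequence corresponding to each account and the number of associated accounts of each account behavior node are then determined according to the account behavior nodes; the account behavior sequence similarity between accounts is calculated according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts; and malicious traffic accounts are determined according to the account behavior sequence similarity. The solution identifies malicious traffic accounts based on account behavior sequence similarity and does not rely on the device information of the devices on which accounts log in, so it can identify malicious traffic accounts operated through group control, box control, and the like, improving the efficiency and accuracy of identifying malicious traffic accounts.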
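FIG. 1 is a flowchart of a malicious traffic account detection method provided by an embodiment of the present application;
FIG. 2 is a flowchart of another malicious traffic account detection method provided by an embodiment of the present application;
FIG. 3 is a flowchart of another malicious traffic account detection method provided by an embodiment of the present application;
FIG. 4 is an exemplary directed graph provided by an embodiment of the present application;
FIG. 5 is an exemplary strongly connected component diagram provided by an embodiment of the present application;
FIG. 6 is an account-based distribution diagram of strongly connected components provided by an embodiment of the present application;
FIG. 7 is a structural block diagram of a malicious traffic account detection apparatus provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a malicious traffic account detection device provided by an embodiment of the present application.
The embodiments of the present application are described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are intended to explain the embodiments of the present application, not to limit them. It should also be noted that, for ease of description, the drawings show only the parts related to the embodiments of the present application rather than the entire structure.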
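FIG. 1 is a flowchart of a malicious traffic account detection method provided by an embodiment of the present application. This embodiment can be used to detect malicious traffic accounts, and the method may be executed by a computing device such as a server. It specifically includes the following steps:
Step S101: Generate corresponding account behavior nodes according to the associated information of each account.
In one embodiment, the associated information of an account is information associated with the account's specific operations and behaviors. Exemplarily, it includes the account's action execution content, so the associated information can be obtained by acquiring that content. An account behavior node is a node that records the content of one action executed by the account. Exemplarily, if an account's actions in a given time period include following an account, watching a live stream, recharging, and rewarding, then the account has four account behavior nodes in that period, which respectively record the action execution content of following, watching, recharging, and rewarding. The action execution content may be recorded in the account behavior node directly as text or as a specific code. The specific recording method can be set according to actual needs and is not limited in this solution.
For example, if the action execution content of account A is watching account C's live stream and that of account B is rewarding account D, then when account behavior nodes are generated, an account behavior node of account A and an account behavior node of account B are generated respectively from these two pieces of action execution content. The account behavior node of account A records the action execution content of watching account C's live stream, and the account behavior node of account B records the action execution content of rewarding account D.
In one embodiment, step S101 specifically includes:
generating corresponding account behavior nodes according to the action occurrence time, action occurrence node, and action execution content recorded for each account.
The account behavior nodes of each account are generated by acquiring the action occurrence time, action occurrence node, and action execution content of each account. Exemplarily, if an account follows account A at 3 o'clock, one behavior node generated for the account records the content "followed account A at 3 o'clock"; if the account watches account B's live stream at 5 o'clock, another behavior node generated for the account records the content "watched account B's live stream at 5 o'clock".
In one embodiment, the action execution content is recorded as an action index determined by table lookup. As shown in Table 1, each action index corresponds to one kind of action execution content.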
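Table 1
Action index | Action execution content | Action index | Action execution content |
1 | Registration IP | 7 | Recharge IP |
2 | Login IP | 8 | Followed UID |
3 | Password-change device ID | 9 | Watched UID |
4 | Phone-change device ID | 10 | Rewarded UID |
5 | Bound account openid | 11 | Stranger private-message UID |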
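For any behavior node m of each account n, each action node of each account is recorded as: $a_{nm}$ = action occurrence time_action occurrence node_action index. Exemplarily, if an account follows account A at 3 o'clock, the generated action node is 03_AccountA_Code8; if an account watches account B's live stream at 5 o'clock, the generated action node is 05_AccountB_Code9.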
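The node encoding described above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the `ACTION_INDEX` keys, the `ActionRecord` type, and the function names are hypothetical, and only the `time_node_index` layout and the index numbers come from the text and Table 1.

```python
from typing import NamedTuple

# Hypothetical English keys for a subset of Table 1 (action content -> action index).
ACTION_INDEX = {
    "register_ip": 1,
    "login_ip": 2,
    "recharge_ip": 7,
    "follow_uid": 8,
    "watch_uid": 9,
    "reward_uid": 10,
}


class ActionRecord(NamedTuple):
    hour: int     # action occurrence time (hour of day, as in the examples)
    target: str   # action occurrence node, e.g. the account being followed
    action: str   # key into ACTION_INDEX


def make_behavior_node(record: ActionRecord) -> str:
    """Encode one action as 'time_node_actionindex'."""
    return f"{record.hour:02d}_{record.target}_{ACTION_INDEX[record.action]}"


def build_sequence(records):
    """Order an account's nodes by occurrence time to form its behavior sequence."""
    return [make_behavior_node(r) for r in sorted(records, key=lambda r: r.hour)]
```

Sorting by time is only one of the two orderings the description allows; a random ordering would serve equally well for the bag-of-nodes similarity computed later.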
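Step S102: Determine, according to the account behavior nodes, the account behavior sequence corresponding to each account and the number of associated accounts of each account behavior node.
An account behavior sequence is a sequence containing multiple account behavior nodes of the same account. After the account behavior nodes of each account are obtained, the account behavior sequence of each account can be determined from them. Exemplarily, if account A has 3 account behavior nodes, the account behavior sequence of account A can be determined from those 3 nodes. The sequence may be generated by ordering the account behavior nodes by the occurrence time of the action execution content, or by ordering them randomly.
While the account behavior sequence of each account is determined, the number of associated accounts of each account behavior node is determined as well. The associated accounts of an account behavior node are the accounts that have some connection to, or share a feature with, that node. Obtaining the number of associated accounts of each account behavior node facilitates the subsequent calculation of similarity between account behavior sequences.
In one embodiment, the number of associated accounts of each account behavior node is determined by counting the other accounts that contain action execution content identical to that in the account behavior node. Specifically, the associated accounts of an account behavior node may be the other accounts whose action execution content is identical to that of the node. For example, if the action execution content in an account behavior node is rewarding account D, the other accounts whose action execution content is also rewarding account D are taken as the associated accounts of that node, and their number is obtained.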
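Counting associated accounts can be sketched as an inverted index from node to accounts. The function name is hypothetical, and one assumption is made explicit: the count below includes every account whose sequence contains the node (the document-frequency convention used later for IDF); subtract one if only the other accounts should be counted.

```python
from collections import defaultdict


def count_associated_accounts(sequences):
    """Map each behavior node to the number of distinct accounts containing it.

    sequences: dict of account id -> list of encoded behavior nodes.
    """
    accounts_per_node = defaultdict(set)
    for account, nodes in sequences.items():
        for node in nodes:
            accounts_per_node[node].add(account)  # a set ignores repeats within one account
    return {node: len(accounts) for node, accounts in accounts_per_node.items()}
```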
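Step S103: Calculate the account behavior sequence similarity between accounts according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts.
The account behavior sequence similarity is the similarity between account behavior sequences. The higher the similarity between two account behavior sequences, the more similar the action nodes in the two sequences. If the action execution content in two account behavior sequences is always the same, the two corresponding accounts are likely operated by the same person, and the more likely it is that they are malicious traffic accounts.
After the account behavior sequence of each account and the number of associated accounts of each account behavior node are determined according to the account behavior nodes, the total number of accounts is obtained, and the calculation is then performed on the account behavior sequences, the numbers of associated accounts, and the total number of accounts to obtain the account behavior sequence similarity between accounts.
Step S104: Determine malicious traffic accounts according to the account behavior sequence similarity.
After the account behavior sequence similarity between accounts is obtained, malicious accounts can be determined from it. Exemplarily, in one embodiment, a similarity threshold is set in advance; after the account behavior sequence similarity between accounts is calculated, the accounts are filtered by the similarity threshold so that accounts with relatively high account behavior sequence similarity are kept, and malicious traffic accounts are then determined from the accounts kept.
As can be seen from the above solution, in order to detect malicious traffic accounts, corresponding account behavior nodes are first generated according to the associated information of each account; the account behavior sequence corresponding to each account and the number of associated accounts of each account behavior node are then determined according to the account behavior nodes; the account behavior sequence similarity between accounts is calculated according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts; and malicious traffic accounts are determined according to the account behavior sequence similarity. The solution determines the account behavior sequence of each account and identifies malicious traffic accounts by calculating the account behavior sequence similarity between accounts. The identification process does not rely on the device information of the devices on which accounts log in, so it can identify malicious traffic accounts operated through group control, box control, and the like, improving the efficiency and accuracy of identifying malicious traffic accounts.
FIG. 2 is a flowchart of another malicious traffic account detection method provided by an embodiment of the present application, which gives a method of calculating the account behavior sequence similarity between accounts from the frequency value of each account behavior node. As shown in FIG. 2, the technical solution is as follows:
Step S201: Generate corresponding account behavior nodes according to the associated information of each account.
Step S202: Determine, according to the account behavior nodes, the account behavior sequence corresponding to each account and the number of associated accounts of each account behavior node.
Step S203: Calculate the frequency value of each account behavior node according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts.
In one embodiment, the frequency value may be a TF-IDF value. TF-IDF is a numerical statistic used to reflect how important a word is to a document in a collection or corpus. The main idea of TF-IDF is that if a word or phrase appears with a high frequency TF in one article and rarely appears in other articles, the word or phrase is considered to have good power to distinguish categories and to be suitable for classification. TF-IDF is in fact TF×IDF, where TF is the frequency with which term t appears in document d. The main idea of IDF is that the fewer documents d contain term t, the larger the IDF, which indicates that term t has good power to distinguish categories. In one embodiment, the TF-IDF value of each account behavior node is calculated according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts, and the TF-IDF value is used to measure how frequently each account behavior node appears.
In one embodiment, step S203 may be implemented by steps S2031 to S2033, as follows:
Step S2031: Determine the number of account behavior nodes in the account behavior sequence corresponding to each account, and determine the behavior frequency value of each account according to the number of account behavior nodes.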
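Determining the behavior frequency value of each account means determining the TF value of each account. In the TF-IDF method, the TF value is calculated as:
$$\mathrm{TF} = \frac{\text{number of occurrences of a word in the document}}{\text{total number of words in the document}}$$
In one embodiment, the number of account behavior nodes in the account behavior sequence corresponding to each account is obtained, each account behavior node is treated as one word, and the number of account behavior nodes in the account behavior sequence is treated as the total number of words in the article; the behavior frequency value of each account is then calculated by the TF formula. Exemplarily, if the account behavior sequence of account A contains 5 account behavior nodes, the behavior frequency value of account A determined from the number of account behavior nodes is:
$$\mathrm{TF} = \frac{1}{5} = 0.2$$
Step S2032: Calculate the inverse behavior frequency index of each account behavior node according to the number of associated accounts of each account behavior node under each account and the total number of accounts.
Calculating the inverse behavior frequency index of each account behavior node means calculating the IDF value of each account behavior node. In the TF-IDF method, the IDF value is calculated as:
$$\mathrm{IDF} = \log\frac{\text{total number of documents in the corpus}}{\text{number of documents containing the word}}$$
In one embodiment, the number of associated accounts of each account behavior node under each account is treated as the number of documents containing a given word, and the total number of accounts is treated as the total number of documents in the corpus; the IDF value of each account behavior node is then calculated by the IDF formula. Exemplarily, if the number of associated accounts of an account behavior node is 80 and the total number of accounts is 500, the inverse behavior frequency index of that account behavior node is:
$$\mathrm{IDF} = \log\frac{500}{80}$$
Step S2033: Calculate the frequency value of each account behavior node according to the behavior frequency value and the inverse behavior frequency index.
After the TF value and the IDF value are calculated, in the TF-IDF method, the formula
TF-IDF=TF×IDF
gives the TF-IDF value, that is, the frequency value of each account behavior node. Exemplarily, in one embodiment, if the behavior frequency value of an account behavior node is 0.2, that is, TF is 0.2, and the inverse behavior frequency index is 5.2, that is, IDF is 5.2, then TF-IDF=0.2×5.2=1.04, which is the frequency value of that account behavior node.
In one embodiment, the length of account A's account behavior sequence is 7, so the TF value of each of its behavior nodes is 1/7=0.143. The total number of accounts is 1 million, and the 7 behavior nodes of account A's account behavior sequence [19_AccountCode1_9, 20_AccountCode1_9, 05_AccountCode2_9, 06_AccountCode2_9, 06_AccountCode3_9, 12_AccountCode4_9, 07_AccountCode5_9] have associated account totals of [9, 13, 360, 761, 115, 1445, 1582] respectively, so their IDF values are [11.6, 11.3, 7.9, 7.2, 6.8, 6.5, 6.4] respectively. Multiplying the TF value by the IDF values gives the TF-IDF value of each account behavior node of account A.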
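Steps S2031 to S2033 can be checked numerically with a short sketch. The function names are hypothetical, and the IDF form (natural logarithm, no smoothing term) is an assumption inferred from the worked numbers: it reproduces six of the seven listed IDF values to one decimal place, while the listed count of 115 gives about 9.1 rather than 6.8 under this form.

```python
import math


def behavior_frequency(sequence_length: int) -> float:
    """TF of a node that occurs once in a sequence of the given length."""
    return 1.0 / sequence_length


def inverse_behavior_frequency(total_accounts: int, associated_accounts: int) -> float:
    """IDF as ln(total accounts / associated accounts); the log base is an assumption."""
    return math.log(total_accounts / associated_accounts)


def node_frequency_values(associated_counts, total_accounts):
    """TF-IDF of every node in one sequence, given each node's associated-account count."""
    tf = behavior_frequency(len(associated_counts))
    return [tf * inverse_behavior_frequency(total_accounts, n) for n in associated_counts]


# Figures from the account A example: 7 nodes, 1,000,000 accounts in total.
counts = [9, 13, 360, 761, 115, 1445, 1582]
idf = [inverse_behavior_frequency(1_000_000, n) for n in counts]
tfidf = node_frequency_values(counts, 1_000_000)
```

Rare nodes (few associated accounts) receive the largest frequency values, which is what lets coordinated accounts that share unusual targets stand out from accounts that merely watch a popular streamer.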
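Step S204: Calculate the account behavior sequence similarity between accounts according to the frequency value of each account behavior node.
After the frequency value of each account behavior node is calculated, the account behavior sequence similarity between accounts can be calculated from these frequency values. In one embodiment, LSI is used to calculate the account behavior sequence similarity between accounts. LSI stands for latent semantic indexing; the LSI algorithm obtains the topics of texts through singular value decomposition (SVD). After reduction to k dimensions, the SVD can be written approximately as:
$$A_{m \times n} \approx U_{m \times k}\,\Sigma_{k \times k}\,V^{T}_{k \times n}$$
For m input words and n texts, $A_{ij}$ is the feature value of the i-th word in the j-th text, commonly the normalized TF-IDF value after preprocessing; k is the assumed number of topics, which is generally smaller than the number of texts. After the SVD, $U_{il}$ is the correlation between the i-th word and the l-th word sense, $V_{jm}$ is the correlation between the j-th text and the m-th topic, and $\Sigma_{lm}$ is the correlation between the l-th word sense and the m-th topic.
In one embodiment, after the number of topics is set, account behavior nodes are treated as words and account behavior sequences as texts; the TF-IDF values of the account behavior nodes are decomposed with LSI to calculate the correlation between account behavior sequences and topics, and on that basis the account behavior sequence similarity between accounts is calculated.
In one embodiment, step S204 may be implemented by steps S2041 and S2043, as follows:
Step S2041: Reduce the dimension of the frequency value matrix by the matrix decomposition formula to obtain the correlation matrix between each account behavior sequence and the behavior themes.
A behavior theme is a type of similarity among account behavior sequences; the number of behavior themes can be set according to actual needs, for example 4. After the frequency value of each account behavior node is calculated, a frequency value matrix, that is, a TF-IDF matrix, is generated from these values. The TF-IDF matrix is then reduced to k dimensions by the matrix decomposition formula, that is, the SVD formula, to obtain the correlation matrix between each account behavior sequence and the behavior themes.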
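Assembling the frequency value matrix that feeds the decomposition can be sketched as below. The function name and input layout are hypothetical; rows are behavior nodes (the words) and columns are accounts (the texts), matching the words-by-texts convention of the LSI description. The SVD itself would normally come from a linear algebra library and is not shown here.

```python
def build_frequency_matrix(tfidf_by_account):
    """Return (nodes, accounts, matrix) with matrix[p][q] = TF-IDF of node p for account q.

    tfidf_by_account: dict of account id -> dict of behavior node -> TF-IDF value.
    Nodes absent from an account's sequence contribute 0.0.
    """
    accounts = sorted(tfidf_by_account)
    nodes = sorted({node for values in tfidf_by_account.values() for node in values})
    matrix = [[tfidf_by_account[a].get(n, 0.0) for a in accounts] for n in nodes]
    return nodes, accounts, matrix
```

In practice this matrix is very sparse, since each account touches only a handful of nodes, so a sparse representation would be the natural choice at the scale of a million accounts.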
In one embodiment, the TF-IDF value matrix is shown in Table 2. LSI is used to reduce the TF-IDF value matrix of Table 2 to 4 dimensions with the decomposition formula A=U·Sigma·V, where Sigma represents the themes. The decomposition result is shown in Table 3, in which V.T denotes the transpose of V, that is, the correlation matrix between each account behavior sequence and the behavior themes.
Table 2
Table 3
U | 0.0000 | -0.4722 | 0.7071 | 0.3553 |
 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
 | -0.9997 | 0.0021 | 0.0000 | -0.0176 |
 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
 | -0.0257 | -0.0804 | 0.0000 | 0.6846 |
 | 0.0000 | -0.7400 | 0.0000 | -0.5278 |
 | 0.0000 | -0.4722 | -0.7071 | 0.3553 |
Sigma | 1.5005 | | | |
 | 0.2906 | | | |
 | 0.0673 | | | |
 | 0.0671 | | | |
V.T | -1.0000 | 0.0004 | 0.0000 | -0.0008 |
 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
 | 0.0000 | -0.8394 | 0.0000 | -0.2797 |
 | -0.0004 | -0.3658 | 0.0000 | 0.1048 |
 | -0.0005 | -0.1179 | -0.7071 | 0.6707 |
 | -0.0005 | -0.1179 | 0.7071 | 0.6707 |
 | -0.0004 | -0.3658 | 0.0000 | 0.1048 |
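Step S2043: Calculate the account behavior sequence similarity between accounts based on the correlation matrix.
After the correlation matrix V.T in the decomposition result of the TF-IDF value matrix is obtained, the account behavior sequence similarity between accounts can be calculated based on V.T. In one embodiment, the similarity between row vectors of the correlation matrix V.T may be calculated and used as the account behavior sequence similarity.
In one embodiment, step S2043 is specifically: applying a similarity calculation formula to pairs of row vectors of the correlation matrix to obtain the account behavior sequence similarity between accounts.
It should be noted that the similarity calculation formula is:
$$\mathrm{sim}(i, j) = \frac{V_{i\cdot} \cdot V_{j\cdot}}{\lVert V_{i\cdot} \rVert\,\lVert V_{j\cdot} \rVert}$$
where $V_{i\cdot}$ and $V_{j\cdot}$ denote row vectors of the correlation matrix V.T.
Exemplarily, taking Table 3 as an example, the similarity calculation formula is used to calculate the behavior sequence similarity between account code 10 and account code 11. The third and fourth rows of V.T in Table 3 are selected: the third row is [0.0000, -0.8394, 0.0000, -0.2797] and the fourth row is [-0.0004, -0.3658, 0.0000, 0.1048]. The two rows are rounded to two decimal places, and the cosine similarity formula is then applied to them to obtain the behavior sequence similarity of the two accounts.
Thus, applying the similarity calculation formula to pairs of row vectors of the correlation matrix yields the account behavior sequence similarity between accounts. From the data in Table 3, the calculated account behavior sequence similarities are shown in Table 4.
Table 4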
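The pairwise row comparison can be reproduced with a few lines of cosine similarity over the V.T rows printed in Table 3. The function name and the zero-vector handling are assumptions made for illustration; the row values are copied from Table 3 without rounding.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero row is treated as unrelated to everything
    return dot / (norm_u * norm_v)


row3 = [0.0000, -0.8394, 0.0000, -0.2797]   # third row of V.T (account code 10)
row4 = [-0.0004, -0.3658, 0.0000, 0.1048]   # fourth row of V.T (account code 11)
row7 = [-0.0004, -0.3658, 0.0000, 0.1048]   # seventh row of V.T, identical to the fourth

sim_3_4 = cosine_similarity(row3, row4)     # about 0.825
sim_4_7 = cosine_similarity(row4, row7)     # 1 up to floating-point rounding
```

With the unrounded values the third and fourth rows come out at about 0.825, in line with the 0.82 quoted for the pair (account code 10, account code 11), and identical rows give a similarity of 1.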
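Step S205: Determine malicious traffic accounts according to the account behavior sequence similarity.
After the accounts and account behavior nodes that do not meet the calculation requirements are filtered out, the account behavior sequence similarity between accounts is calculated from the frequency values of the remaining account behavior nodes. In one embodiment, after account behavior sequences whose length is less than 1 and account behavior nodes whose frequency value is less than 0.5 are removed, P account behavior nodes and Q accounts remain. A P×Q TF-IDF value matrix is formed from the account behavior nodes of each account; after the matrix is reduced in dimension, the correlation matrix between user behavior sequences and behavior themes is obtained, and its row vectors are then compared pairwise to obtain the account behavior sequence similarity between accounts.
As can be seen from the above solution, on the basis of the behavior frequency value of each account and the inverse behavior frequency index of each account behavior node calculated according to LSI, the frequency value of each account behavior node is calculated, and the account behavior sequence similarity between accounts is calculated from these frequency values. Because LSI includes a dimension reduction step, it is suited to large-scale computation, so even when the total number of accounts is large, the solution can accurately calculate the account behavior sequence similarity between accounts, improving the efficiency and accuracy of identifying malicious traffic accounts.
In one embodiment, data filtering is included to optimize the overall processing flow for malicious accounts. Specifically, step S204 may be implemented by steps S2044 and S2045, as follows:
Step S2044: Filter out accounts and account behavior nodes according to the length of the account behavior sequence and the frequency value of each account behavior node.
Because computing resources are limited, accounts and account behavior nodes that do not meet the calculation requirements can be removed to reduce the amount of computation in the subsequent similarity calculation. For example, accounts whose account behavior sequence length does not reach a preset length are removed, and account behavior nodes whose frequency value is less than a preset frequency are removed. In one embodiment, when the length of an account behavior sequence equals 1, it is insufficient for calculating pairwise behavior sequence similarity between accounts, so accounts whose account behavior sequence length is less than 1 are removed; because the frequency values of some popular streamers and public IPs are relatively small, account behavior nodes whose frequency value is less than 0.5 are removed.
Step S2045: Calculate the account behavior sequence similarity between accounts according to the frequency values of the account behavior nodes remaining after filtering.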
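The filtering of steps S2044 and S2045 can be sketched as a single pass over per-account TF-IDF values. The function name, the order of the two filters, and the default `min_length=2` are assumptions: the text says a sequence of length 1 is insufficient while stating the cutoff as "less than 1", so the cutoff is left as a parameter.

```python
def filter_accounts_and_nodes(tfidf_by_account, min_length=2, min_value=0.5):
    """Drop accounts with short sequences, then drop low-frequency-value nodes.

    tfidf_by_account: dict of account id -> dict of behavior node -> TF-IDF value.
    Accounts left with no nodes after filtering are dropped as well (an assumption).
    """
    kept = {}
    for account, values in tfidf_by_account.items():
        if len(values) < min_length:
            continue  # too few nodes to compare this account with others
        strong = {node: v for node, v in values.items() if v >= min_value}
        if strong:
            kept[account] = strong
    return kept
```

Nodes tied to very popular streamers or shared public IPs have many associated accounts and therefore low TF-IDF, so the value cutoff removes exactly the nodes that say little about coordination.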
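FIG. 3 is a flowchart of another malicious traffic account detection method provided by an embodiment of the present application, which gives a method of calculating strongly connected components with the connected subgraph algorithm and screening out malicious traffic accounts according to the strongly connected components. As shown in FIG. 3, the technical solution is as follows:
Step S301: Generate corresponding account behavior nodes according to the associated information of each account.
Step S302: Determine, according to the account behavior nodes, the account behavior sequence corresponding to each account and the number of associated accounts of each account behavior node.
Step S303: Calculate the account behavior sequence similarity between accounts according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts.
Step S304: Screen out strongly associated account relationship pairs according to the account behavior sequence similarity between accounts.
Because the account behavior sequence similarity expresses how similar the accounts' action nodes are, the higher the account behavior sequence similarity between two accounts, the stronger the association between them. In one embodiment, an account behavior sequence similarity threshold may be set in advance, and strongly associated account relationship pairs are screened out according to it. Exemplarily, because most of the nodes selected for account behavior sequences are weakly associated nodes, when the cosine similarity formula is used to calculate the account behavior sequence similarity, a value corresponding to a relatively small angle, such as 30°, may be chosen as the filtering threshold. After the account behavior sequence similarities are filtered, the strongly associated account relationship pairs are obtained. Exemplarily, for a strongly associated pair of account A and account B, the pair may be expressed as (account A code, account B code, behavior sequence similarity of account A and account B), meaning that account node A and account node B are connected and the connection weight is the behavior similarity between the accounts.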
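Turning an angle such as 30° into a similarity cutoff and applying it can be sketched as follows. The function names are hypothetical, and whether the comparison is inclusive is an assumption, since the text does not say.

```python
import math


def similarity_threshold_from_angle(max_angle_degrees: float) -> float:
    """Cosine similarity corresponding to the largest angle still counted as similar."""
    return math.cos(math.radians(max_angle_degrees))


def strongly_associated_pairs(similarities, threshold):
    """Keep (account_a, account_b, similarity) triples at or above the threshold."""
    return [(a, b, s) for a, b, s in similarities if s >= threshold]
```

A 30° limit corresponds to a cosine of about 0.866, slightly stricter than the 0.8 threshold used in the Table 4 example.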
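Step S305: Input the strongly associated account relationship pairs into the connected subgraph, and calculate the strongly connected components of the connected subgraph based on the preset similarity threshold.
After the strongly associated account relationship pairs are obtained, substituting them into the connected subgraph algorithm yields the strongly connected components. For the connected subgraph algorithm, FIG. 4 shows an exemplary directed graph provided by an embodiment of the present application. In a directed graph G, two vertices are said to be strongly connected if there is at least one path between them in each direction. If every two vertices of G are strongly connected, G is called a strongly connected graph. A maximal strongly connected subgraph of a directed graph that is not strongly connected is called a strongly connected component. For example, in FIG. 4 the subgraph {1, 2, 3, 4} is a strongly connected component because vertices 1, 2, 3, and 4 are pairwise reachable, and {5} and {6} are each strongly connected components as well. The usual algorithm for finding strongly connected components is Tarjan's algorithm, whose time complexity is O(N+M).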
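Tarjan's algorithm, named above, can be sketched in recursive form as below. The function name and the example edge list are hypothetical: FIG. 4 is not reproduced in the text, so the edges were chosen only so that the components match the ones described, namely {1, 2, 3, 4}, {5}, and {6}.

```python
def tarjan_scc(graph):
    """Return the strongly connected components of a directed graph.

    graph: dict mapping each vertex to an iterable of its successor vertices.
    """
    index_of, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index_of[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index_of:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:
            # v is the root of a component: pop the stack down to v.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index_of:
            visit(v)
    return components


# Hypothetical edges consistent with the components described for FIG. 4.
example = {1: [2, 3], 2: [4], 3: [4, 5], 4: [1, 6], 5: [6], 6: []}
components = tarjan_scc(example)
```

Each vertex and edge is handled once, which gives the O(N+M) cost quoted above for N vertices and M edges. The recursion depth grows with the longest search path, so an iterative variant would be the safer choice for graphs with many thousands of accounts.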
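In one embodiment, after the strongly associated account relationship pairs are input into the connected subgraph, the strongly connected components of the connected subgraph are screened out based on the preset similarity threshold. Exemplarily, the account behavior sequence similarity threshold is set to 0.8 and the strongly connected components of the connected subgraph are screened out. For example, for the data in Table 4, filtering with an account behavior sequence similarity threshold of 0.8 yields the strongly associated account relationship pairs (account code 10, account code 11, 0.82), (account code 10, account code 14, 0.82), and (account code 11, account code 14, 1); the size of the strongly connected component is 3, and the corresponding strongly connected component is shown in FIG. 5, which is an exemplary strongly connected component diagram provided by an embodiment of the present application.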
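Because each similarity pair links two accounts symmetrically, treating every pair as an edge in both directions makes the strongly connected components coincide with ordinary connected components. Under that assumption, a union-find pass is a compact alternative to Tarjan's algorithm for this step; the function name is hypothetical and the three pairs are the ones quoted in the example.

```python
def connected_groups(pairs):
    """Group accounts linked by (account_a, account_b, similarity) pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, _similarity in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for account in list(parent):
        groups.setdefault(find(account), set()).add(account)
    return list(groups.values())


pairs = [(10, 11, 0.82), (10, 14, 0.82), (11, 14, 1.0)]
groups = connected_groups(pairs)
```

For the three example pairs this yields a single group containing account codes 10, 11, and 14, that is, a component of size 3.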
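Step S306: Determine the accounts corresponding to a strongly connected component whose number of strongly connected components is greater than the strongly connected component threshold as malicious traffic accounts.
After the strongly connected components of the connected subgraph are obtained, malicious traffic accounts are screened out according to the number of strongly connected components; the larger the number of strongly connected components, the stronger the aggregation of the accounts and the higher the account risk. In one embodiment, malicious traffic accounts may be screened out according to a preset strongly connected component threshold. It can be understood that the strongly connected component threshold can be set according to actual needs, and its size is not specifically limited in this embodiment. Exemplarily, the strongly connected component threshold is set to 8, and the strongly connected components corresponding to 4 actual accounts are extracted as shown in Table 5. The risk of accounts 1-4 increases with the number of strongly connected components; because the number of strongly connected components of account 1 is less than the strongly connected component threshold, that is, its batch aggregation behavior is not obvious, account 1 is not identified as a malicious traffic account.
Table 5
Strongly connected component code | Account code | Number of strongly connected components |
A | 1 | 5 |
B | 2 | 10 |
C | 3 | 31 |
D | 4 | 423 |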
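Applying the threshold of 8 to the rows of Table 5 can be sketched as below. The names are hypothetical, and the third column is read as the size of the component the account belongs to, which is an interpretation of the table rather than something the text states outright.

```python
COMPONENT_THRESHOLD = 8

# Rows of Table 5: (component code, account code, number of strongly connected components).
table_5 = [("A", 1, 5), ("B", 2, 10), ("C", 3, 31), ("D", 4, 423)]


def flag_malicious(rows, threshold):
    """Return the account codes whose component count exceeds the threshold."""
    return [account for _component, account, count in rows if count > threshold]


flagged = flag_malicious(table_5, COMPONENT_THRESHOLD)
```

Accounts 2, 3, and 4 exceed the threshold and are flagged, while account 1, with a count of 5, is not.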
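A specific example and test data are given below by way of illustration:
First, 5 account relationship pairs are randomly sampled, and the account behavior sequence similarity of each pair is calculated; the results are shown in Table 6.
Table 6
One of the smallest strongly connected components is chosen at random; as shown in Table 7, the account behavior sequence similarity between its accounts is very high.
Table 7
The gang accounts detected within a certain hour are sampled, and their distribution is shown in FIG. 6, which is an account-based distribution diagram of strongly connected components provided by an embodiment of the present application. In FIG. 6, the abscissa is the size of the strongly connected component, the primary ordinate is the number of strongly connected components, the secondary ordinate is the total number of accounts corresponding to the current strongly connected component size, S1 is the curve of the number of strongly connected components, and S2 is the account quantity curve. In FIG. 6, 8220 accounts are distributed among 193 gangs, and the 3059 accounts whose number of strongly connected components is greater than 100 are distributed among 19 strongly connected components. No single strongly connected component is associated with most of the accounts, which indicates that the edge relationships were chosen reasonably.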
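The two curves of FIG. 6 can be derived from a list of component sizes, as in the sketch below. The function name and the sample sizes in the usage note are hypothetical; no figures from the test data are reproduced.

```python
from collections import Counter


def size_distribution(component_sizes):
    """Return {size: (number of components, total accounts)} ordered by size.

    The first value of each pair corresponds to curve S1 and the second to curve S2.
    """
    counts = Counter(component_sizes)
    return {size: (n, size * n) for size, n in sorted(counts.items())}
```

For example, `size_distribution([3, 3, 5])` reports two components of size 3 holding 6 accounts and one component of size 5 holding 5 accounts. A distribution dominated by one very large component would suggest that the edges are too permissive, which is the check the paragraph above describes.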
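As can be seen from the above solution, in the process of determining malicious traffic accounts, the strongly connected components of the connected subgraph are calculated by the connected subgraph algorithm, and malicious traffic accounts are screened out according to the number of strongly connected components, so that malicious traffic accounts can be screened out accurately. The process does not rely on the device information of the devices on which accounts log in, improving the efficiency and accuracy of identifying malicious traffic accounts.
FIG. 7 is a structural block diagram of a malicious traffic account detection apparatus provided by an embodiment of the present application. The apparatus is configured to execute the malicious traffic account detection method provided by the above embodiments and has the functional modules and beneficial effects corresponding to the method. As shown in FIG. 7, the apparatus specifically includes: a behavior node generation module 401, a sequence determination module 402, an associated account determination module 403, a similarity determination module 404, and a malicious account determination module 405, wherein,
the behavior node generation module 401 is configured to generate corresponding account behavior nodes according to the associated information of each account;
the sequence determination module 402 is configured to determine the account behavior sequence corresponding to each account according to the account behavior nodes;
the associated account determination module 403 is configured to determine the number of associated accounts of each account behavior node according to the account behavior nodes;
the similarity determination module 404 is configured to calculate the account behavior sequence similarity between accounts according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts;
the malicious account determination module 405 is configured to determine malicious traffic accounts according to the account behavior sequence similarity.
In one embodiment, the similarity determination module 404 includes a frequency value calculation submodule and a similarity calculation submodule, wherein,
the frequency value calculation submodule is configured to calculate the frequency value of each account behavior node according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts;
the similarity calculation submodule is configured to calculate the account behavior sequence similarity between accounts according to the frequency value of each account behavior node.
In one embodiment, the frequency value calculation submodule includes a behavior frequency value calculation unit, an inverse behavior frequency index calculation unit, and a frequency value calculation unit, wherein,
the behavior frequency value calculation unit is configured to determine the number of account behavior nodes in the account behavior sequence corresponding to each account, and determine the behavior frequency value of each account according to the number of account behavior nodes;
the inverse behavior frequency index calculation unit is configured to calculate the inverse behavior frequency index of each account behavior node according to the number of associated accounts of each account behavior node under each account and the total number of accounts;
the frequency value calculation unit is configured to calculate the frequency value of each account behavior node according to the behavior frequency value and the inverse behavior frequency index.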
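In one embodiment, the similarity calculation submodule further includes a frequency value matrix construction unit, a dimension reduction unit, and an account behavior sequence similarity calculation unit, wherein,
the frequency value matrix construction unit is configured to construct a frequency value matrix according to the frequency value of each account behavior node and the account behavior sequences;
the dimension reduction unit is configured to reduce the dimension of the frequency value matrix by the matrix decomposition formula to obtain the correlation matrix between each account behavior sequence and the behavior themes;
the account behavior sequence similarity calculation unit is configured to calculate the account behavior sequence similarity between accounts based on the correlation matrix.
In one embodiment, the account behavior sequence similarity calculation unit is specifically configured to apply a similarity calculation formula to pairs of row vectors of the correlation matrix to obtain the account behavior sequence similarity between accounts.
In one embodiment, the similarity calculation submodule further includes a filtering unit and a calculation unit, wherein,
the filtering unit is configured to filter out accounts and account behavior nodes according to the length of the account behavior sequence and the frequency value of each account behavior node;
the calculation unit is configured to calculate the account behavior sequence similarity between accounts according to the frequency values of the account behavior nodes remaining after filtering.
In one embodiment, the malicious account determination module 405 includes a screening submodule, a strongly connected component calculation submodule, and a malicious traffic account determination submodule, wherein,
the screening submodule is configured to screen out strongly associated account relationship pairs according to the account behavior sequence similarity between accounts;
the strongly connected component calculation submodule is configured to input the strongly associated account relationship pairs into the connected subgraph and calculate the strongly connected components of the connected subgraph based on the preset similarity threshold;
the malicious traffic account determination submodule is configured to determine the accounts corresponding to a strongly connected component whose number of strongly connected components is greater than the strongly connected component threshold as malicious traffic accounts.
In one embodiment, the behavior node generation module 401 is configured to generate corresponding account behavior nodes according to the action occurrence time, action occurrence node, and action execution content recorded for each account.
In one embodiment, the behavior node generation module 401 is configured to determine the number of associated accounts of each account behavior node by determining the number of other accounts that contain action execution content identical to that in the account behavior node.
FIG. 8 is a schematic structural diagram of a malicious traffic account detection device provided by an embodiment of the present application. As shown in FIG. 8, the device includes a processor 501, a memory 502, an input device 503, and an output device 504. The number of processors 501 in the device may be one or more, and one processor 501 is taken as an example in FIG. 8. The processor 501, memory 502, input device 503, and output device 504 in the device may be connected by a bus or in other ways, and connection by a bus is taken as an example in FIG. 8. The memory 502, as a computer-readable storage medium, may be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the malicious traffic account detection method in the embodiments of the present application. The processor 501 executes the various functional applications and data processing of the device by running the software programs, instructions, and modules stored in the memory 502, thereby implementing the above malicious traffic account detection method. The input device 503 may be configured to receive input numeric or character information and to generate key signal input related to user settings and function control of the device. The output device 504 may include a display device such as a display screen.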
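An embodiment of the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are configured to perform a malicious traffic account detection method, the method including:
generating corresponding account behavior nodes according to the associated information of each account;
determining, according to the account behavior nodes, the account behavior sequence corresponding to each account and the number of associated accounts of each account behavior node;
calculating the account behavior sequence similarity between accounts according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts;
determining malicious traffic accounts according to the account behavior sequence similarity.
It is worth noting that, in the above embodiment of the malicious traffic account detection apparatus, the units and modules included are divided only according to functional logic and are not limited to the above division, as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for the convenience of distinguishing them from each other and are not used to limit the protection scope of the embodiments of the present application.
In some possible implementations, various aspects of the method provided by the present application may also be implemented in the form of a program product that includes program code; when the program product runs on a computer device, the program code causes the computer device to execute the steps of the methods described above in this specification according to various exemplary embodiments of the present application. For example, the computer device may execute the malicious traffic account detection method described in the embodiments of the present application. The program product may be implemented using any combination of one or more readable media.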
Claims (12)
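- A malicious traffic account detection method, configured in a server, including: generating corresponding account behavior nodes according to the associated information of each account; determining, according to the account behavior nodes, the account behavior sequence corresponding to each account and the number of associated accounts of each account behavior node; calculating the account behavior sequence similarity between accounts according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts; and determining malicious traffic accounts according to the account behavior sequence similarity.
- The malicious traffic account detection method according to claim 1, wherein calculating the account behavior sequence similarity between accounts according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts includes: calculating the frequency value of each account behavior node according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts; and calculating the account behavior sequence similarity between accounts according to the frequency value of each account behavior node.
- The malicious traffic account detection method according to claim 2, wherein calculating the frequency value of each account behavior node according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts includes: determining the number of account behavior nodes in the account behavior sequence corresponding to each account, and determining the behavior frequency value of each account according to the number of account behavior nodes; calculating the inverse behavior frequency index of each account behavior node according to the number of associated accounts of each account behavior node under each account and the total number of accounts; and calculating the frequency value of each account behavior node according to the behavior frequency value and the inverse behavior frequency index.
- The malicious traffic account detection method according to claim 2, wherein calculating the account behavior sequence similarity between accounts according to the frequency value of each account behavior node includes: constructing a frequency value matrix according to the frequency value of each account behavior node and the account behavior sequences; reducing the dimension of the frequency value matrix by a matrix decomposition formula to obtain the correlation matrix between each account behavior sequence and the behavior themes; and calculating the account behavior sequence similarity between accounts based on the correlation matrix.
- The malicious traffic account detection method according to claim 4, wherein calculating the account behavior sequence similarity between accounts based on the correlation matrix includes: applying a similarity calculation formula to pairs of row vectors of the correlation matrix to obtain the account behavior sequence similarity between accounts.
- The malicious traffic account detection method according to claim 2, wherein calculating the account behavior sequence similarity between accounts according to the frequency value of each account behavior node includes: filtering out accounts and account behavior nodes according to the length of the account behavior sequence and the frequency value of each account behavior node; and calculating the account behavior sequence similarity between accounts according to the frequency values of the account behavior nodes remaining after filtering.
- The malicious traffic account detection method according to any one of claims 1-6, wherein determining malicious traffic accounts according to the account behavior sequence similarity includes: screening out strongly associated account relationship pairs according to the account behavior sequence similarity between accounts; inputting the strongly associated account relationship pairs into a connected subgraph, and calculating the strongly connected components of the connected subgraph based on a preset similarity threshold; and determining the accounts corresponding to a strongly connected component whose number of strongly connected components is greater than a strongly connected component threshold as malicious traffic accounts.
- The malicious traffic account detection method according to any one of claims 1-6, wherein generating corresponding account behavior nodes according to the associated information of each account includes: generating corresponding account behavior nodes according to the action occurrence time, action occurrence node, and action execution content recorded for each account.
- The malicious traffic account detection method according to any one of claims 1-8, wherein the number of associated accounts of each account behavior node is determined by: determining the number of other accounts that contain action execution content identical to that in each account behavior node.
- A malicious traffic account detection apparatus, wherein the apparatus includes: a behavior node generation module, configured to generate corresponding account behavior nodes according to the associated information of each account; a sequence determination module, configured to determine the account behavior sequence corresponding to each account according to the account behavior nodes; an associated account determination module, configured to determine the number of associated accounts of each account behavior node according to the account behavior nodes; a similarity determination module, configured to calculate the account behavior sequence similarity between accounts according to the account behavior sequences, the numbers of associated accounts, and the total number of accounts; and a malicious account determination module, configured to determine malicious traffic accounts according to the account behavior sequence similarity.
- A malicious traffic account detection device, the device including: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the malicious traffic account detection method according to any one of claims 1-9.
- A storage medium storing computer-executable instructions which, when executed by a computer processor, are configured to perform the malicious traffic account detection method according to any one of claims 1-9.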
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110470331.1A CN113297840B (zh) | 2021-04-28 | 2021-04-28 | 恶意流量账号检测方法、装置、设备和存储介质 |
CN202110470331.1 | 2021-04-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022228371A1 true WO2022228371A1 (zh) | 2022-11-03 |
Family
ID=77320443
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/088944 WO2022228371A1 (zh) | 2021-04-28 | 2022-04-25 | 恶意流量账号检测方法、装置、设备和存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113297840B (zh) |
WO (1) | WO2022228371A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117061254A (zh) * | 2023-10-12 | 2023-11-14 | 之江实验室 | 异常流量检测方法、装置和计算机设备 |
CN117235654A (zh) * | 2023-11-15 | 2023-12-15 | 中译文娱科技(青岛)有限公司 | 一种人工智能的数据智能处理方法及系统 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113297840B (zh) * | 2021-04-28 | 2024-05-24 | 百果园技术(新加坡)有限公司 | 恶意流量账号检测方法、装置、设备和存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105373614A (zh) * | 2015-11-24 | 2016-03-02 | 中国科学院深圳先进技术研究院 | 一种基于用户账号的子用户识别方法及系统 |
US10009358B1 (en) * | 2014-02-11 | 2018-06-26 | DataVisor Inc. | Graph based framework for detecting malicious or compromised accounts |
CN110427999A (zh) * | 2019-07-26 | 2019-11-08 | 武汉斗鱼网络科技有限公司 | 一种账号相关性评估方法、装置、设备及介质 |
CN112116007A (zh) * | 2020-09-18 | 2020-12-22 | 四川长虹电器股份有限公司 | 基于图算法和聚类算法的批量注册账号检测方法 |
CN113297840A (zh) * | 2021-04-28 | 2021-08-24 | 百果园技术(新加坡)有限公司 | 恶意流量账号检测方法、装置、设备和存储介质 |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104917643B (zh) * | 2014-03-11 | 2019-02-01 | 腾讯科技(深圳)有限公司 | 异常账号检测方法及装置 |
US10699009B2 (en) * | 2018-02-28 | 2020-06-30 | Microsoft Technology Licensing, Llc | Automatic malicious session detection |
CN108984721A (zh) * | 2018-07-10 | 2018-12-11 | 阿里巴巴集团控股有限公司 | 垃圾账号的识别方法和装置 |
CN109376354A (zh) * | 2018-09-26 | 2019-02-22 | 出门问问信息科技有限公司 | 欺诈行为识别方法、装置、电子设备及可读存储介质 |
CN112182520B (zh) * | 2019-07-03 | 2024-01-26 | 腾讯科技(深圳)有限公司 | 非法账号的识别方法、装置、可读介质及电子设备 |
CN111031017B (zh) * | 2019-11-29 | 2021-12-14 | 腾讯科技(深圳)有限公司 | 一种异常业务账号识别方法、装置、服务器及存储介质 |
CN111371767B (zh) * | 2020-02-20 | 2022-05-13 | 深圳市腾讯计算机系统有限公司 | 恶意账号识别方法、恶意账号识别装置、介质及电子设备 |
CN111695019B (zh) * | 2020-06-11 | 2023-08-08 | 腾讯科技(深圳)有限公司 | 一种识别关联账号的方法及装置 |
CN111865925A (zh) * | 2020-06-24 | 2020-10-30 | 国家计算机网络与信息安全管理中心 | 基于网络流量的诈骗团伙识别方法、控制器和介质 |
CN112468523B (zh) * | 2021-02-02 | 2021-07-06 | 北京明略昭辉科技有限公司 | 异常流量检测方法、装置、设备及存储介质 |
-
2021
- 2021-04-28 CN CN202110470331.1A patent/CN113297840B/zh active Active
-
2022
- 2022-04-25 WO PCT/CN2022/088944 patent/WO2022228371A1/zh active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10009358B1 (en) * | 2014-02-11 | 2018-06-26 | DataVisor Inc. | Graph based framework for detecting malicious or compromised accounts |
CN105373614A (zh) * | 2015-11-24 | 2016-03-02 | 中国科学院深圳先进技术研究院 | 一种基于用户账号的子用户识别方法及系统 |
CN110427999A (zh) * | 2019-07-26 | 2019-11-08 | 武汉斗鱼网络科技有限公司 | 一种账号相关性评估方法、装置、设备及介质 |
CN112116007A (zh) * | 2020-09-18 | 2020-12-22 | 四川长虹电器股份有限公司 | 基于图算法和聚类算法的批量注册账号检测方法 |
CN113297840A (zh) * | 2021-04-28 | 2021-08-24 | 百果园技术(新加坡)有限公司 | 恶意流量账号检测方法、装置、设备和存储介质 |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117061254A (zh) * | 2023-10-12 | 2023-11-14 | 之江实验室 | 异常流量检测方法、装置和计算机设备 |
CN117061254B (zh) * | 2023-10-12 | 2024-01-23 | 之江实验室 | 异常流量检测方法、装置和计算机设备 |
CN117235654A (zh) * | 2023-11-15 | 2023-12-15 | 中译文娱科技(青岛)有限公司 | 一种人工智能的数据智能处理方法及系统 |
CN117235654B (zh) * | 2023-11-15 | 2024-03-22 | 中译文娱科技(青岛)有限公司 | 一种人工智能的数据智能处理方法及系统 |
Also Published As
Publication number | Publication date |
---|---|
CN113297840B (zh) | 2024-05-24 |
CN113297840A (zh) | 2021-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022228371A1 (zh) | 恶意流量账号检测方法、装置、设备和存储介质 | |
US10268952B2 (en) | Image searching | |
US9460117B2 (en) | Image searching | |
US8370278B2 (en) | Ontological categorization of question concepts from document summaries | |
US9158772B2 (en) | Partial and parallel pipeline processing in a deep question answering system | |
US20160196490A1 (en) | Method for Recommending Content to Ingest as Corpora Based on Interaction History in Natural Language Question and Answering Systems | |
WO2017148267A1 (zh) | 一种文本信息聚类方法和文本信息聚类系统 | |
US10657186B2 (en) | System and method for automatic document classification and grouping based on document topic | |
Baeza-Yates | Big data or right data? | |
Hsu et al. | Integrating machine learning and open data into social Chatbot for filtering information rumor | |
CN108021651A (zh) | 一种网络舆情风险评估方法及装置 | |
Kershaw et al. | Towards modelling language innovation acceptance in online social networks | |
Deng et al. | Exploring and inferring user–user pseudo‐friendship for sentiment analysis with heterogeneous networks | |
Kim et al. | Two applications of clustering techniques to twitter: Community detection and issue extraction | |
US20120226695A1 (en) | Classifying documents according to readership | |
Singh et al. | Event detection from real-time twitter streaming data using community detection algorithm | |
TW201820173A (zh) | 去識別化資料產生裝置、方法及其電腦程式產品 | |
Shah et al. | Artificial intelligence as a service for immoral content detection and eradication | |
Karimi et al. | Evaluation methods for statistically dependent text | |
Ruan et al. | Prediction of topic volume on twitter | |
CN111966920B (zh) | 舆情传播的稳定条件的预测方法、装置及设备 | |
CN113392200A (zh) | 基于用户学习行为的推荐方法及装置 | |
CN103491074A (zh) | 僵尸网络检测方法及装置 | |
Mannan et al. | An Empirical study on theories of sentiment analysis in relation to fake news detection | |
CN111723349A (zh) | 一种用户识别方法、装置、设备及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22794841 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 22794841 Country of ref document: EP Kind code of ref document: A1 |