CN113449112A - Abnormal consignment behavior identification method and device, computer equipment and storage medium - Google Patents


Publication number
CN113449112A
Authority
CN
China
Prior art keywords
node
consignment
abnormal
data
nodes
Prior art date
Legal status
Pending
Application number
CN202010212199.XA
Other languages
Chinese (zh)
Inventor
马敏
李杏
胡泽柱
张硕硕
陈春璐
苗圣法
Current Assignee
SF Technology Co Ltd
SF Tech Co Ltd
Original Assignee
SF Technology Co Ltd
Application filed by SF Technology Co Ltd
Priority to CN202010212199.XA
Publication of CN113449112A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Primary Health Care (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiment of the invention discloses a method and a device for identifying abnormal consignment behaviors, computer equipment and a storage medium. The method comprises the following steps: obtaining consignment relation data, wherein the consignment relation data comprises consignment data from consignment nodes to receiving nodes; pruning the consignment relation data to obtain target consignment relation data; constructing a consignment map according to the target consignment relation data; carrying out map analysis on the consignment map to obtain an analysis result; and determining, according to the analysis result, the target nodes with abnormal consignment behaviors among the consignment nodes and the receiving nodes. The embodiment of the invention can improve the efficiency and the accuracy of identifying abnormal consignment behaviors.

Description

Abnormal consignment behavior identification method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of data processing, in particular to an abnormal consignment behavior identification method and device, computer equipment and a storage medium.
Background
Abnormal behaviors such as counterfeiting affect a very wide range of fields and touch many aspects of daily life. For example, counterfeiters may manufacture imitations and sell them as genuine products, such as making and selling fake wine or selling fake certificates. Such products are highly mobile and cover large areas, and are often moved across regions, provinces and even national borders. With the rapid development of the logistics industry, the persons involved trade online and then send the products to their destinations by express delivery. Existing identification of abnormal consignment behavior relies mainly on manual identification, for example by combining blacklists and waybill data with experience. Most manual approaches can only identify directly associated suspicious nodes (such as suspicious senders), or they produce too many suspicious nodes, so that operability is poor and accuracy is low.
Disclosure of Invention
The embodiment of the invention provides a method and a device for identifying an abnormal consignment behavior, a computer device and a storage medium, which can improve the efficiency and accuracy of identifying the abnormal consignment behavior.
The embodiment of the invention provides an abnormal consignment behavior identification method, which comprises the following steps:
obtaining consignment relation data, wherein the consignment relation data comprise consignment data from consignment nodes to receiving nodes;
pruning the consignment relation data to obtain target consignment relation data;
constructing a consignment map according to the target consignment relation data;
carrying out map analysis on the consignment map to obtain an analysis result;
and determining, according to the analysis result, a target node with abnormal consignment behavior among the consignment nodes and the receiving nodes.
The embodiment of the present invention further provides an abnormal consignment behavior identification apparatus, including:
an acquisition unit, configured to obtain consignment relation data, wherein the consignment relation data comprises consignment data from consignment nodes to receiving nodes;
a pruning unit, configured to prune the consignment relation data to obtain target consignment relation data;
a construction unit, configured to construct a consignment map according to the target consignment relation data;
an analysis unit, configured to carry out map analysis on the consignment map to obtain an analysis result;
and a determining unit, configured to determine, according to the analysis result, the target node with abnormal consignment behavior among the consignment nodes and the receiving nodes.
Wherein the nodes comprise the consignment nodes and the receiving nodes, and the pruning unit comprises:
a category acquisition unit, configured to obtain a risk category list;
a category pruning unit, configured to prune, from the consignment relation data, the consignment data corresponding to nodes that are unrelated to the risk categories in the risk category list, to obtain candidate consignment relation data;
a white list determining unit, configured to determine white list nodes among the consignment nodes and the receiving nodes;
and a white list pruning unit, configured to prune, from the candidate consignment relation data, the consignment data corresponding to the white list nodes, to obtain the target consignment relation data.
The white list determining unit is specifically configured to: determine the total number of consignment contacts of each node and a number threshold, and determine the frequency of occurrence of the total consignment volume of each node and a frequency threshold; determine first candidate white list nodes according to the total number of consignment contacts and the number threshold; determine second candidate white list nodes according to the frequency of occurrence of the total consignment volume and the frequency threshold; and determine the nodes that are both first candidate white list nodes and second candidate white list nodes as white list nodes.
Wherein the analysis unit comprises:
the community division unit is used for carrying out community iterative division on the consignment map to obtain a community division result;
the candidate node determining unit is used for determining candidate abnormal nodes with abnormal consignment behaviors according to the community division result;
and the risk level determining unit is used for determining the risk level corresponding to each candidate abnormal node according to a preset scorecard strategy, and taking the risk level corresponding to the candidate abnormal node as the analysis result.
The community division unit is specifically configured to: carry out community iterative division on the consignment map to obtain a community division intermediate result; detect whether every community in the community division intermediate result either has a star structure or has fewer nodes than a preset node number; if so, take the community division intermediate result as the community division result; if not, continue the community iterative division on the communities that have a non-star structure and whose node number is not less than the preset node number, until every community obtained either has a star structure or has fewer nodes than the preset node number.
The candidate node determining unit is specifically configured to: determine the central nodes of all communities in the community division result, and determine the number of neighbors of each node in all communities; and take the central nodes and the nodes with the largest number of neighbors as candidate abnormal nodes with abnormal consignment behaviors.
The risk level determining unit is specifically configured to: calculate, according to the preset scorecard strategy, a scorecard score corresponding to each candidate abnormal node; obtain the score corresponding to each risk level; and determine the risk level corresponding to the candidate abnormal node according to the scorecard score and the scores corresponding to the risk levels.
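As a non-limiting illustration, the stopping condition of the community iterative division can be sketched in Python as follows. The embodiment does not define the star structure formally; this sketch assumes that a community is a star when one hub node is linked to every other node and there are no other links. The function names `is_star` and `needs_further_division` are hypothetical, and the community division algorithm itself is not shown.

```python
def is_star(nodes, edges):
    """True if one hub is linked to every other node and no other links exist."""
    nodes = set(nodes)
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        # Ignore self-loops and edges that leave the community.
        if a in neighbors and b in neighbors and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    degrees = sorted(len(v) for v in neighbors.values())
    return (
        len(nodes) >= 2
        and degrees[-1] == len(nodes) - 1       # the hub touches every other node
        and all(d == 1 for d in degrees[:-1])   # every other node touches only the hub
    )


def needs_further_division(nodes, edges, min_nodes):
    """A community is divided again only if it is large enough and not a star."""
    return len(set(nodes)) >= min_nodes and not is_star(nodes, edges)
```

Under these assumptions, the iteration stops once `needs_further_division` returns `False` for every community in the intermediate result.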
An embodiment of the present invention further provides a computer device, where the computer device includes: one or more processors; a memory; and one or more applications, wherein the processor is coupled to the memory, and the one or more applications are stored in the memory and configured to be executed by the processor to implement the above abnormal consignment behavior identification method.
The embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is loaded by a processor to execute the above-mentioned method for identifying an abnormal consignment behavior.
The embodiment of the invention prunes the consignment relation data, constructs the consignment map from the target consignment relation data, and performs map analysis on the consignment map so as to identify the target nodes with abnormal consignment behavior in the consignment relation data. Redundant nodes are found and removed through the pruning strategy, so that the size of the consignment map is greatly reduced and the efficiency of identifying abnormal consignment behaviors is improved; by analyzing the map structure and determining the target nodes with abnormal consignment behavior according to the analysis result, the nodes, communities and the like suspected of abnormal consignment are accurately located, and the accuracy of identifying abnormal consignment behaviors is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic flow chart of an abnormal consignment behavior identification method according to an embodiment of the present invention;
Fig. 2 is a flow chart of a community iteration policy provided by an embodiment of the present invention;
Fig. 3 is a flow chart illustrating an abnormal consignment behavior identification method according to an embodiment of the present invention;
Fig. 4 is a flow chart illustrating an abnormal consignment behavior identification method according to an embodiment of the present invention;
Fig. 5 is a schematic block diagram of an abnormal consignment behavior identification apparatus provided by an embodiment of the present invention;
Fig. 6 is another schematic block diagram of an abnormal consignment behavior identification apparatus provided by an embodiment of the present invention;
Fig. 7 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, features defined as "first", "second", may explicitly or implicitly include one or more of the described features. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise. In addition, the terms "first" and "second" are used to distinguish a plurality of elements from each other. For example, a first constraint may be referred to as a second constraint, and similarly, a second constraint may be referred to as a first constraint, without departing from the scope of the invention. The first constraint and the second constraint are both constraints, but they are not the same constraint.
In the present disclosure, the word "exemplary" is used to mean "serving as an example, instance, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the invention. In the following description, details are set forth for the purpose of explanation. It will be apparent to one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and processes are not shown in detail to avoid obscuring the description of the invention with unnecessary detail. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
The embodiment of the invention provides a method and a device for identifying abnormal consignment behaviors, computer equipment and a storage medium. The abnormal consignment behavior identification method runs on a device, which may be a server or a terminal such as a mobile phone, a tablet or a desktop computer. Details are given below.
Fig. 1 is a flowchart illustrating an abnormal consignment behavior identification method according to an embodiment of the present invention. As shown in fig. 1, the method includes the following specific processes:
101, obtaining consignment relation data, wherein the consignment relation data comprises consignment data between nodes.
Specifically, step 101 includes: obtaining consignment source data; and determining consignment relation data according to the consignment source data.
The consignment source data is the consignment data from consignment nodes to receiving nodes within a certain period of time, for example half a year or a quarter. The consignment source data includes sender information (information of the consignment node), recipient information (information of the receiving node), waybill data (including the sender address, the recipient address, and the like), and routing information such as the transit stations through which the waybill passes. The sender information comprises at least one of a telephone number, an identification number or a two-dimensional code that identifies the sender as an independent individual, and the recipient information comprises at least one of a telephone number, an identification number or a two-dimensional code that identifies the recipient as an independent individual. Each sender or recipient is taken as a node: the sender is a consignment node and the recipient is a receiving node. Hereinafter, the consignment nodes and the receiving nodes are also collectively referred to as nodes. Specifically, a node is represented by any one of a telephone number, an identification number, or a two-dimensional code representing an independent individual.
The consignment source data is very comprehensive and very large. Performing statistical analysis and anomaly identification directly on it would involve a very large workload and reduce efficiency. Therefore, the data related to the identification of abnormal consignment behavior, such as the consignment relation data, is counted and extracted from the consignment source data, and a consignment relation data table can be constructed from the consignment relation data; further, node data can be extracted and a node data table can be constructed from the node data. Abnormal consignment behavior is then identified on the basis of this extracted data, which improves the efficiency of the identification.
The node data records the basic information of all nodes together with statistical fields such as the total number of consignment contacts and the total consignment volume, so that these statistics can be used directly in the subsequent identification of abnormal consignment behavior. It should be noted that the values of the statistical fields may instead be computed later, at the time they are needed.
The consignment relation data comprises consignment data from consignment nodes to receiving nodes. For example, if A sends an article to B, a new piece of consignment data is added to the consignment relation data to record that A sent to B. In addition to the consignment node and the receiving node, the consignment relation data includes the consignment category, the sending address, the receiving address, and statistical fields such as the total consignment volume of the nodes. The consignment category refers to the category of the consigned item, such as the category entered on the waybill, for example documents, clothes or vegetables.
In some cases, when consignment data needs to be added later, the corresponding data is added to the consignment source data and to the consignment relation data, and the node data can also be updated. The node data and the consignment relation data are thus kept up to date in real time, so that they can be used at any time to identify abnormal consignment behavior, which improves identification efficiency.
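As a non-limiting illustration, the node statistics described above can be derived from the consignment relation data as in the following Python sketch. The `Consignment` record, the `node_statistics` function, and the choice to count each record toward both its consignment node and its receiving node are assumptions made for this example and are not specified by the embodiment.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Consignment:
    sender: str    # consignment node identifier, e.g. a telephone number
    receiver: str  # receiving node identifier
    category: str  # consignment category entered on the waybill


def node_statistics(records):
    """Return {node: (total consignment contacts, total consignment volume)}."""
    contacts = defaultdict(set)
    volume = defaultdict(int)
    for r in records:
        # Assumption: a record counts for both endpoints of the relation.
        contacts[r.sender].add(r.receiver)
        contacts[r.receiver].add(r.sender)
        volume[r.sender] += 1
        volume[r.receiver] += 1
    return {node: (len(contacts[node]), volume[node]) for node in contacts}
```

In this sketch a node that sends four items to three distinct contacts has a total of 3 consignment contacts and a total consignment volume of 4.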
102, pruning the consignment relation data to obtain target consignment relation data.
The consignment relation data contains a large amount of consignment data, much of which is normal consignment data (also called positive samples). The normal consignment data therefore needs to be filtered out, that is, the consignment relation data needs to be pruned to obtain the target consignment relation data. This reduces the influence of normal consignment data on the identification result, reduces the amount of data to be processed subsequently, and improves the efficiency of identifying abnormal consignment behavior.
Pruning the consignment relation data is mainly divided into two steps: the first step is risk category pruning; the second step is white list pruning.
Specifically, step 102 includes: obtaining a risk category list; pruning, from the consignment relation data, the consignment data corresponding to nodes unrelated to the risk categories in the risk category list, to obtain candidate consignment relation data; determining white list nodes among the nodes (including the consignment nodes and the receiving nodes); and pruning, from the candidate consignment relation data, the consignment data corresponding to the white list nodes, to obtain the target consignment relation data.
The risk category list is extracted according to the business characteristics of the actual application scenario, wherein the risk categories are categories frequently consigned by nodes on a blacklist, or related risk categories identified from business experience. The risk category list can be extracted on demand, or extracted in advance and stored so that it can be obtained directly when needed.
Pruning the consignment data corresponding to nodes unrelated to the risk categories in the risk category list means filtering out, or deleting, that consignment data from the consignment relation data, so as to obtain the candidate consignment relation data. Specifically: obtain all the consignment data corresponding to each node in the consignment relation data; detect whether any consignment category in the node's consignment data is a risk category in the risk category list; if a risk category in the risk category list is present among the consignment categories of the node's consignment data, determine that the node is related to the risk categories; if no risk category in the risk category list is present, determine that the node is unrelated to the risk categories; delete all consignment data corresponding to the nodes unrelated to the risk categories. The candidate consignment relation data is thus obtained.
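As a non-limiting illustration, risk category pruning can be sketched in Python as follows. The tuple layout `(sender, receiver, category)` and the function name `prune_by_risk_category` are assumptions made for this example. The sketch also assumes that a piece of consignment data "corresponds to" both of its endpoints, so a record is kept only when both its consignment node and its receiving node are related to a risk category.

```python
def prune_by_risk_category(records, risk_categories):
    """Drop every record that touches a node unrelated to any risk category.

    records: iterable of (sender, receiver, category) tuples.
    risk_categories: set of category names taken from the risk category list.
    """
    records = list(records)
    # A node is related to the risk categories if any of its records,
    # sent or received, carries a category from the risk category list.
    related = set()
    for sender, receiver, category in records:
        if category in risk_categories:
            related.update((sender, receiver))
    # Deleting all data of unrelated nodes leaves only the records whose
    # two endpoints are both related nodes.
    return [r for r in records if r[0] in related and r[1] in related]
```

Under a looser reading, a record could be kept when either endpoint is related; which reading applies is not settled by the description above.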
The white list nodes among the nodes (including the consignment nodes and the receiving nodes) can be determined according to data of a plurality of different dimensions. Specifically: count the frequency corresponding to each dimension of data, and determine a threshold corresponding to each dimension; determine, according to the frequency of each dimension and the corresponding threshold, the candidate white list nodes corresponding to each dimension; and either determine the nodes that are candidate white list nodes in every dimension as white list nodes, or determine the nodes that are candidate white list nodes in n dimensions at the same time as white list nodes, wherein n is smaller than the total number of dimensions N.
In the first case, assuming that the plurality of different dimensions are N dimensions, that is, the total number of dimensions is N, the nodes that are candidate white list nodes in all N dimensions are determined as white list nodes. For example, with four dimensions N1, N2, N3 and N4, if a certain node is a candidate white list node only for the dimensions N1, N2 and N3, the node is determined not to be a white list node; if a certain node is a candidate white list node for the dimensions N1, N2, N3 and N4, the node is determined to be a white list node.
In the second case, assuming that the plurality of different dimensions are N dimensions, that is, the total number of dimensions is N, the nodes that are candidate white list nodes in n dimensions at the same time are determined as white list nodes, wherein N > n. For example, if N is 4 and n is 3, and a certain node is a candidate white list node for the dimensions N1, N2 and N3, the node is determined to be a white list node; other nodes that are candidate white list nodes for the dimensions N1, N2 and N4, or N2, N3 and N4, or N1, N3 and N4, are likewise all determined to be white list nodes; a node that is a candidate white list node only for the dimensions N1 and N3 is determined not to be a white list node.
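As a non-limiting illustration, combining the per-dimension candidate white list nodes can be sketched in Python as follows. The function name `white_list_nodes` and the parameter `min_hits` are hypothetical. The sketch treats "hitting n dimensions" as hitting at least n dimensions, which is an assumption; the description does not say whether a node hitting more than n dimensions is also accepted.

```python
from collections import Counter


def white_list_nodes(candidate_sets, min_hits=None):
    """Nodes that are candidates in at least `min_hits` dimensions.

    candidate_sets: one set of candidate white list nodes per dimension.
    min_hits: required number of dimensions n; None means all N dimensions.
    """
    required = len(candidate_sets) if min_hits is None else min_hits
    # Count, for every node, in how many dimensions it is a candidate.
    hits = Counter(node for s in candidate_sets for node in set(s))
    return {node for node, count in hits.items() if count >= required}
```

With N = 4 candidate sets, `white_list_nodes(sets)` corresponds to requiring all four dimensions, and `white_list_nodes(sets, min_hits=3)` corresponds to the example with n = 3.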
The plurality of different dimensions can be chosen according to the specific scenario. The following analysis uses four dimensions as an example: the total number of consignment contacts of a node, the total consignment volume of a node, the category length corresponding to the consignment data of a node, and the sending address and receiving address of a node, where a node may be a consignment node or a receiving node. Candidate white list nodes are obtained for each of these dimensions. It should be noted that more or fewer dimensions may be used.
(1) Total number of consignment contacts of a node
Obtain (or count) the total number of consignment contacts of each node, and determine a number threshold; if the total number of consignment contacts of a node exceeds the number threshold, determine that the node is a first candidate white list node.
For example, if node A sends items to its father, mother and a colleague, the total number of consignment contacts of node A is 3. The number threshold may be determined from the actual totals of consignment contacts; for example, the number threshold may be determined to be 100. It will be appreciated that merchants are generally regarded as white list nodes and that the total number of consignment contacts of a merchant is relatively large. The purpose of this dimension is to filter out merchants.
The number threshold may be determined as follows. First, count the number of times each value of the total number of consignment contacts occurs across the nodes. If the ratio of the number of occurrences of a value to the sum of the occurrences of all values exceeds P1, where P1 is a percentage or the corresponding decimal with 0% < P1 < 100% (for example 80% or 50%), remove the nodes having that value. Then, among the remaining nodes, if the ratio of the number of occurrences of a value to the sum of the occurrences over the remaining nodes exceeds P1, remove the nodes having that value, and repeat until no value's share of the remaining occurrences exceeds P1. Finally, determine the number threshold from the remaining data by a quantile; for example, the 3/4 quantile is determined as the number threshold.
For example, assume that P1 is 50% and there are 100 nodes in total, 60 of which have a total number of consignment contacts of 5. The value 5 then occurs 60 times, and the ratio of its occurrences to the sum of all occurrences is 60/100 = 0.6, which is greater than 50%. The 60 nodes whose total is 5 are removed, leaving 40 nodes, and the above step is repeated on these 40 nodes until no value's share of the remaining occurrences exceeds P1. The number threshold is then determined by a quantile from the remaining data; for example, the 3/4 quantile is determined as the number threshold.
In other words, determining the number threshold involves three steps: counting how many times each total number of mail contacts occurs; removing the outliers among these counts; and determining the number threshold by a quantile from the counts that remain. Removing the outliers reduces their influence on the whole, and a quantile measures the data as a whole, so a threshold determined in this way makes the final result more accurate.
In the above example, the count of 60 for a total number of mail contacts of 5 can be regarded as an outlier. If it were not removed, the threshold determined by the quantile method would be higher, because the count of 60 affects the whole, and it might then happen that no node has a total number of mail contacts greater than the determined number threshold.
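The threshold procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `number_threshold`, the use of Python's inclusive quartile method, and the handling of an empty remainder are all choices made here for illustration.

```python
from collections import Counter
from statistics import quantiles


def number_threshold(contact_totals, p1=0.5):
    """Return the 3/4 quantile of the occurrence counts left after outlier removal.

    contact_totals: one total number of mail contacts per node.
    p1: share above which an occurrence count is treated as an outlier.
    """
    # Occurrence count of each distinct total (value -> number of nodes).
    counts = Counter(contact_totals)
    while counts:
        total = sum(counts.values())
        outliers = [value for value, c in counts.items() if c / total > p1]
        if not outliers:
            break
        for value in outliers:
            del counts[value]
    remaining = sorted(counts.values())
    if len(remaining) < 2:
        return None  # assumption: no threshold when too little data remains
    # quantiles(..., n=4) returns three quartile cut points; index 2 is 3/4.
    return quantiles(remaining, n=4, method="inclusive")[2]


# 100 nodes, 60 of which have a total number of mail contacts of 5.
totals = [5] * 60 + [1] * 10 + [2] * 10 + [3] * 10 + [4] * 10
threshold = number_threshold(totals, p1=0.5)
```

In this example the count of 60 is removed in the first pass (60/100 exceeds 50%), and the quantile is then taken over the four remaining counts.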
(2) Total mail volume of the node
Acquire (or count) the total mail volume of each node; count the frequency with which each total mail volume occurs, and determine a frequency threshold; if the frequency of occurrence of a node's total mail volume exceeds the frequency threshold, the node is determined as a second candidate white list node. The frequency threshold may be determined according to the frequencies of occurrence of the total mail volumes of the nodes.
For example, the total mail volume of a certain node is 5, and the total mail volumes observed across the nodes are 2, 3, 4, 5, 6, 100, 500, and 1000, which occur 700, 1000, 800, 600, 900, 500, 200, and 10 times, respectively. The frequency of occurrence of a total mail volume of 5 is then 600/(700+1000+800+600+900+500+200+10) = 600/4710 ≈ 0.127. If the frequency threshold is 0.3, a node whose total mail volume has a frequency of occurrence exceeding 0.3 is determined as a second candidate white list node. It can be understood that the higher the frequency of occurrence of a node's total mail volume, the more often that volume appears; most mail data cannot be abnormal data, so a node whose total mail volume has a frequency of occurrence exceeding the frequency threshold is determined as a candidate white list node.
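The frequency calculation in this example can be reproduced with a short sketch. The function names and the dictionary layout are hypothetical and chosen only for illustration.

```python
def volume_frequencies(occurrences):
    """Map each total mail volume to its frequency of occurrence.

    occurrences: dict of total mail volume -> number of times it occurs.
    """
    total = sum(occurrences.values())
    return {volume: count / total for volume, count in occurrences.items()}


def second_candidates(node_volumes, occurrences, frequency_threshold):
    """Nodes whose total mail volume occurs more often than the threshold."""
    freq = volume_frequencies(occurrences)
    return {node for node, volume in node_volumes.items()
            if freq[volume] > frequency_threshold}


# Occurrence counts taken from the example in the text.
occurrences = {2: 700, 3: 1000, 4: 800, 5: 600, 6: 900,
               100: 500, 500: 200, 1000: 10}
freq = volume_frequencies(occurrences)
```

With these figures the largest frequency is 1000/4710, so a threshold of 0.3 selects no node; a lower threshold such as 0.2 would select only the nodes whose total mail volume is 3.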
The frequency threshold may be determined as follows. First, acquire (or count) the frequency of occurrence of each total mail volume. If the ratio of a frequency to the sum of all frequencies (which is 1) exceeds P2, the nodes corresponding to that frequency are removed, where P2 is a percentage or the corresponding decimal with 0% < P2 < 100%, for example 80% or 50%. Among the frequencies of the remaining nodes, if the ratio of a frequency to the sum of the remaining frequencies exceeds P2, the corresponding nodes are removed, until no remaining frequency has a ratio exceeding P2. The frequency threshold is then determined from the remaining frequencies by a quantile, for example the 3/4 quantile of the remaining frequencies is taken as the frequency threshold.
In other words, determining the frequency threshold involves three steps: calculating the frequency of occurrence of each total mail volume; removing the outliers among these frequencies; and determining the frequency threshold by a quantile from the frequencies that remain. Removing the outliers reduces their influence on the whole, and a quantile measures the data as a whole, so a threshold determined in this way makes the final result more accurate.
(3) Category length corresponding to the mail data of the node
The category length refers to the length of the character string describing the category of the consigned item. Generally, when a merchant sends items the category length is small, for example the category may be written directly as "file", "clothes", or "shoes", whereas for other senders the category length is relatively long.
Acquire (or count) the category length of each consigned category of a node; count the total number of consignments of the items corresponding to each category length; calculate the ratio of that total to the total number of consignments of all items; if the ratio exceeds a preset ratio, the node is determined as a third candidate white list node. The preset ratio may be determined in the same way as the frequency threshold.
For example, if the ratio of the total number of consignments of items with a category length of 2 to the total number of consignments of all items exceeds 50%, the nodes corresponding to the mail data with a category length of 2 are regarded as third candidate white list nodes. It is understood that it is unlikely that more than 50% of the data is abnormal.
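A minimal sketch of this dimension follows, assuming each record is a hypothetical `(node, category_string, consignment_count)` tuple; the record layout and function names are not from the source.

```python
from collections import Counter


def category_length_ratios(shipments):
    """Share of all consignments contributed by each category length."""
    by_length = Counter()
    for _node, category, count in shipments:
        by_length[len(category)] += count
    total = sum(by_length.values())
    return {length: c / total for length, c in by_length.items()}


def third_candidates(shipments, preset_ratio=0.5):
    """Nodes whose category length accounts for more than preset_ratio."""
    ratios = category_length_ratios(shipments)
    return {node for node, category, _count in shipments
            if ratios[len(category)] > preset_ratio}


shipments = [
    ("a", "bag", 60),                # category length 3
    ("b", "hat", 10),                # category length 3
    ("c", "mobile phone case", 30),  # category length 17
]
ratios = category_length_ratios(shipments)
candidates = third_candidates(shipments, preset_ratio=0.5)
```

Here category length 3 accounts for 70 of 100 consignments, so nodes "a" and "b" become third candidate white list nodes.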
The purpose of this dimension is to filter merchants.
(4) Sending address and receiving address of node
It is understood that, for each node, the consignment relationship data contains a plurality of corresponding sending addresses and receiving addresses.
For each node, acquire (or count) the number of times each sending address is used for sending and the number of times each receiving address is used for receiving; determine whether a sending usage count exceeds a sending usage threshold or a receiving usage count exceeds a receiving usage threshold; if so, the node is determined as a fourth candidate white list node. For example, if the volume received at the same receiving address exceeds the receiving usage threshold, the node may be a merchant, which corresponds to the scenario of goods being returned to a merchant.
The fourth candidate white list node may also be determined in combination with other ways. For example, for each node, the sending usage count of each sending address, the receiving usage count of each receiving address, the sending address repetition rate, and the receiving address repetition rate may be acquired (or calculated); it is then determined whether a sending usage count exceeds the sending usage threshold, or a receiving usage count exceeds the receiving usage threshold, or the sending address repetition rate exceeds a sending repetition rate threshold, or the receiving address repetition rate exceeds a receiving repetition rate threshold; if so, the node is determined as a fourth candidate white list node.
Here the sending address repetition rate is taken as an example. For each node, acquire (or count) the sum N of the mail volumes over all of the node's sending addresses and the number Nunique of the node's sending addresses after deduplication, and calculate the sending address repetition rate from them, for example according to the formula (N - Nunique)/N. For example, if the numbers of times a node has sent from its sending addresses are [2605, 763, 732, 1, 1], then N = 2605 + 763 + 732 + 1 + 1 = 4102 and Nunique = 5, so the sending address repetition rate is (4102 - 5)/4102 ≈ 0.999. The sending repetition rate threshold, the receiving repetition rate threshold, and the like may be determined in the manner described for the frequency threshold; the sending usage threshold and the like may be determined in the manner described for the number threshold.
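The repetition rate formula (N - Nunique)/N can be checked with a few lines of code. The function name is hypothetical, and returning 0.0 for a node with no mail is an assumption made here.

```python
def address_repetition_rate(usage_counts):
    """Compute (N - Nunique) / N for one node.

    usage_counts: number of times each distinct address was used.
    """
    n = sum(usage_counts)         # N: total mail volume over all addresses
    n_unique = len(usage_counts)  # Nunique: number of distinct addresses
    return (n - n_unique) / n if n else 0.0


rate = address_repetition_rate([2605, 763, 732, 1, 1])
```

For the figures in the text, N is 4102 and Nunique is 5, which gives a rate that rounds to 0.999.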
The above is the process of determining the corresponding candidate white list nodes according to the data of the four dimensions. The candidate white list nodes corresponding to the four dimensions may also be determined in other manners, and candidate white list nodes may be determined using data of other dimensions. It should be noted that the processes of determining the candidate white list nodes for the four dimensions may be executed synchronously or asynchronously, and in parallel or in series.
After the candidate white list nodes corresponding to each dimension are determined, the white list nodes are determined from them. For example, the nodes that appear in the candidate white list nodes of every dimension are determined as white list nodes, or the nodes that hit the candidate white list nodes of n dimensions at the same time are determined as white list nodes, where n is smaller than the total number of dimensions N. For example, nodes that appear in all of the first, second, third, and fourth candidate white list nodes are determined as white list nodes; or nodes that appear in any n of the four groups of candidate white list nodes are determined as white list nodes, where n ≤ 3. Reference is made in particular to what is described above.
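The combination rule can be sketched as below. The function name and the `min_hits` parameter are hypothetical, and reading "hits n dimensions" as "appears in at least n candidate sets" is an interpretation made here, not something the text states.

```python
from collections import Counter


def whitelist_nodes(candidate_sets, min_hits=None):
    """Combine per-dimension candidate sets into white list nodes.

    min_hits=None requires a node to appear in every dimension;
    otherwise a node must appear in at least min_hits dimensions.
    """
    if min_hits is None:
        min_hits = len(candidate_sets)
    hits = Counter(node for s in candidate_sets for node in set(s))
    return {node for node, h in hits.items() if h >= min_hits}


candidate_sets = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"a"}]
strict = whitelist_nodes(candidate_sets)              # all four dimensions
relaxed = whitelist_nodes(candidate_sets, min_hits=2)
```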
The mail data corresponding to the white list nodes in the candidate consignment relationship data are pruned according to the determined white list nodes; that is, the mail data corresponding to the white list nodes are filtered out of, or deleted from, the candidate consignment relationship data to obtain the target consignment relationship data. Specifically, all mail data corresponding to each node in the candidate consignment relationship data are obtained; whether the node is a white list node is detected; if so, all mail data corresponding to the node are deleted. Other ways may also be used to delete the mail data corresponding to the white list nodes in the candidate consignment relationship data. The target consignment relationship data is thus obtained.
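As a rough illustration of this pruning step, the sketch below drops every record that touches a white list node. The record layout (`sender`, `receiver`, `count` keys) is hypothetical, as is the choice to drop a record when either endpoint is white-listed.

```python
def prune_whitelist(records, whitelist):
    """Keep only records whose sender and receiver are both off the white list."""
    return [r for r in records
            if r["sender"] not in whitelist and r["receiver"] not in whitelist]


records = [
    {"sender": "a", "receiver": "b", "count": 3},
    {"sender": "c", "receiver": "d", "count": 1},
]
target = prune_whitelist(records, whitelist={"b"})
```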
The above is the process of pruning the consignment relationship data using the risk category list and the white list nodes to obtain the target consignment relationship data. It reduces the influence of normal mail data on the identification result of abnormal consignment behavior, that is, it reduces the influence of redundant nodes; it also greatly reduces the data volume of subsequent processing and the size of the map structure, and improves the efficiency of identifying abnormal consignment behavior.
103. Construct a consignment map according to the target consignment relationship data.
According to the target consignment relationship data, any one of a telephone number, an identity card number, or a two-dimensional code representing an independent individual is taken as a node, and each consignment relationship is drawn as an edge, so as to construct a directed graph corresponding to the target consignment relationship data; this directed graph is taken as the consignment map (also called the consignment graph network). The attributes of a node include whether the node is a blacklist node, and the attributes of an edge include the total mail volume. The directed graph may be constructed using any existing method for constructing a directed graph.
104. Perform map analysis on the consignment map to obtain an analysis result.
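One way to hold such a directed graph in plain Python is sketched below. The record keys and the function name are hypothetical; the node attribute (blacklist flag) and edge attribute (total mail volume) follow the description above.

```python
from collections import defaultdict


def build_consignment_graph(records, blacklist=frozenset()):
    """Build a directed graph as (node attributes, edge attributes).

    Each edge (sender, receiver) carries the total mail volume;
    each node carries a flag saying whether it is a blacklist node.
    """
    edges = defaultdict(int)
    for r in records:
        edges[(r["sender"], r["receiver"])] += r["count"]
    nodes = {n: {"blacklisted": n in blacklist}
             for edge in edges for n in edge}
    return nodes, dict(edges)


records = [
    {"sender": "a", "receiver": "b", "count": 3},
    {"sender": "a", "receiver": "b", "count": 2},
    {"sender": "b", "receiver": "c", "count": 1},
]
nodes, edges = build_consignment_graph(records, blacklist={"c"})
```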
Carrying out community iterative division on the consignment map by using a community division algorithm to obtain a community division result; and further analyzing the community division result to obtain a risk grade corresponding to the node, and taking the risk grade corresponding to the node as an analysis result.
Specifically, step 104 includes: carrying out community iterative division on the consignment map to obtain a community division result; determining candidate abnormal nodes with abnormal consignment behaviors according to the community division result; and determining a risk grade corresponding to the candidate abnormal node according to a preset scoring card strategy, and taking the risk grade corresponding to the candidate abnormal node as an analysis result.
Community iterative division of the consignment map is performed using a community division algorithm, for example a label propagation algorithm (LPA) or a random walk algorithm (Walktrap). The label propagation algorithm is a local community division method based on label propagation: the propagation of labels is guided by the structure of the network itself, no optimization function is needed, the logic is simple, and division is very fast even for networks with many nodes and a complex structure. Preferably, the label propagation algorithm is used to perform community iterative division on the consignment map.
The flow of the label propagation algorithm is as follows:
(1) Initialize each node in the consignment graph network with its own community label; for example, for node x, initialize its community label as $C_x(0) = x$;
(2) Set the iteration number t;
(3) Set the traversal order of the nodes in the consignment graph network and the node set X;
(4) For each node $x \in X$, let $C_x(t) = f(C_{x_1}(t-1), \ldots, C_{x_m}(t-1), C_{x_{m+1}}(t-1), \ldots, C_{x_k}(t-1))$, where $C_x(t)$ denotes the community label of node x at the t-th iteration, $x_1, \ldots, x_k$ are the neighbors of x, and the function f returns the community label that occurs most often among its arguments;
(5) Determine whether the iteration can end; if not, set t = t + 1 and traverse again, that is, re-execute step (3).
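The flow above can be sketched as a synchronous label propagation loop, in which every update at iteration t reads only labels from iteration t-1. Several details are assumptions made here for illustration: the graph is treated as undirected, ties are broken by keeping the current label if it is among the most frequent and otherwise taking the smallest label, and a `max_iters` cap is used as the end condition.

```python
from collections import Counter


def label_propagation(adj, max_iters=20):
    """adj: dict mapping each node to the set of its neighbors."""
    labels = {x: x for x in adj}              # step (1): C_x(0) = x
    for _ in range(max_iters):                # steps (2) and (5)
        new_labels = {}
        for x in sorted(adj):                 # step (3): fixed traversal order
            if not adj[x]:
                new_labels[x] = labels[x]     # isolated node keeps its label
                continue
            # Step (4): most frequent label among neighbors at iteration t-1.
            counts = Counter(labels[y] for y in adj[x])
            best = max(counts.values())
            top = [label for label, c in counts.items() if c == best]
            new_labels[x] = labels[x] if labels[x] in top else min(top)
        if new_labels == labels:              # no label changed: stop
            break
        labels = new_labels
    return labels


# Two disjoint triangles should end up as two communities.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2},
       4: {5, 6}, 5: {4, 6}, 6: {4, 5}}
labels = label_propagation(adj)
```

Synchronous updates can oscillate on some graphs (for example bipartite ones), which is one reason the iteration cap is included in this sketch.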
Based on a label propagation algorithm, a set of iteration strategies for community division is formulated, and as shown in fig. 2, the method comprises the following steps:
201. Perform community iterative division on the consignment map to obtain a community division intermediate result.
Specifically, the community iterative division may be performed on the consignment map using a community division algorithm; if a label propagation algorithm is used, reference may be made to steps (1), (3), and (4) of the label propagation algorithm.
202. Detect whether all communities in the community division intermediate result are communities with a star structure.
If a community with a non-star structure exists, step 203 is executed; if all communities in the community division intermediate result are communities with a star structure, step 205 is executed.
203. Detect whether the community with the non-star structure is a community whose number of nodes is smaller than the preset number of nodes.
The specific value of the preset number of nodes may be set according to the actual application scenario; for example, the preset number of nodes is set to 15. If the communities with a non-star structure all have fewer nodes than the preset number of nodes, step 205 is executed; if a community with a non-star structure has a number of nodes not less than the preset number of nodes, step 204 is executed.
204. Perform community iterative division on each non-star community whose number of nodes is not less than the preset number of nodes to obtain a community division intermediate result (see steps (3) and (4) of the label propagation algorithm), and then execute step 202 again, until every community obtained either has a star structure or has fewer nodes than the preset number of nodes.
205. Take the community division intermediate result as the community division result.
After each division by the community division algorithm, whether the next iteration is required or the iteration can end is decided by checking whether every community in the community division intermediate result is either a community with a star structure or a community whose number of nodes is smaller than the preset number of nodes. Specifically, a community division intermediate result is obtained using the community division algorithm; if every community in it is either a star community or a community with fewer nodes than the preset number of nodes, the iteration ends. Otherwise, if there is a community that has a non-star structure and a number of nodes not less than the preset number of nodes, the community division algorithm is applied again to that community in the next iteration. In the final community division result, every community is either a star community or a community with fewer nodes than the preset number of nodes.
It can be understood that, when the label propagation algorithm is used in the embodiment of the present application, whether the iteration can end is no longer determined by the number of iterations but by the state of the communities obtained through iteration, so that all communities obtained satisfy the condition, which is beneficial to the subsequent analysis of the communities.
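The iteration strategy of steps 201 to 205 can be sketched as a work queue. Everything specific here is an assumption: the star test (one hub joined to every other node and no other edges), the pluggable `divide` function, the stand-in divider based on connected components, and the guard that accepts a community when a further division no longer splits it.

```python
def is_star(community, adj):
    """Star test (assumed definition): n-1 edges and one hub of degree n-1."""
    degrees = {x: len(adj[x] & community) for x in community}
    edge_count = sum(degrees.values()) // 2
    n = len(community)
    return edge_count == n - 1 and max(degrees.values()) == n - 1


def connected_components(adj):
    """Stand-in community divider, used only to keep this sketch runnable."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            x = stack.pop()
            if x not in part:
                part.add(x)
                stack.extend(adj[x] - part)
        seen |= part
        parts.append(part)
    return parts


def iterative_division(adj, divide, preset_nodes=15):
    pending = [set(c) for c in divide(adj)]          # step 201
    final = []
    while pending:
        community = pending.pop()
        # Steps 202 and 203: star communities and small communities are final.
        if is_star(community, adj) or len(community) < preset_nodes:
            final.append(community)
            continue
        # Step 204: divide the large non-star community again.
        sub_adj = {x: adj[x] & community for x in community}
        parts = [set(c) for c in divide(sub_adj)]
        if len(parts) <= 1:
            final.append(community)  # guard (assumption): no further split
        else:
            pending.extend(parts)
    return final                                      # step 205


# A 20-node star (hub 0) plus a triangle.
adj = {0: set(range(1, 20)), **{i: {0} for i in range(1, 20)},
       100: {101, 102}, 101: {100, 102}, 102: {100, 101}}
result = iterative_division(adj, connected_components)
```

In practice `divide` would be a real community division algorithm such as label propagation; the guard exists only so that a deterministic divider cannot loop forever on a community it is unable to split.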
After the community division result is obtained, candidate abnormal nodes with abnormal consignment behavior are determined according to the community division result. Specifically, this includes: determining the central node of each community in the community division result and the number of neighbors of each node in each community; and taking the central nodes and the nodes with the largest number of neighbors (or the nodes whose number of neighbors exceeds a preset number of neighbors) as candidate abnormal nodes with abnormal consignment behavior. The central node of a community may be determined using any existing scheme for determining community central nodes. This step further narrows the range of nodes subject to abnormal consignment behavior identification.
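A minimal sketch of candidate selection follows. The text leaves the choice of central node open, so taking the node with the most neighbors inside its community as the center is an assumption, as are the function name and the optional `preset_neighbors` parameter.

```python
def candidate_abnormal_nodes(communities, adj, preset_neighbors=None):
    """Collect each community's center, plus nodes with many neighbors.

    adj: dict mapping each node to the set of its neighbors.
    """
    candidates = set()
    for community in communities:
        degree = {x: len(adj[x] & community) for x in community}
        # Assumed center: highest in-community degree, smallest id on ties.
        center = max(sorted(degree), key=degree.get)
        candidates.add(center)
        if preset_neighbors is not None:
            candidates |= {x for x, d in degree.items()
                           if d > preset_neighbors}
    return candidates


adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
found = candidate_abnormal_nodes([{0, 1, 2, 3}], adj)
```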
After the candidate abnormal node with the abnormal consignment behavior is determined, the risk level corresponding to the candidate abnormal node can be determined according to a preset scoring card strategy, and the risk level corresponding to the candidate abnormal node is used as an analysis result. Specifically, calculating a score of a scoring card corresponding to the candidate abnormal node according to a preset scoring card strategy; acquiring a score corresponding to the risk level; and determining the risk grade corresponding to the candidate abnormal node according to the score of the scoring card and the score corresponding to the risk grade, and taking the risk grade corresponding to the candidate abnormal node as an analysis result.
The scoring dimensionality and the scoring method in the preset scoring card strategy are as follows:
a scoring dimension comprising: the number of neighbors of the node, the contact times of the contact blacklist node, whether the node is a central node or not and the like. It should be noted that the scoring dimensions in the embodiments of the present application are only a few dimensions, and the scoring dimensions may also include more dimensions, and in particular, the scoring may also be implemented according to fewer or more dimensions.
When the scoring dimensionality is the number of neighbors of the node: acquiring (or counting) the number of neighbors corresponding to all the nodes of the candidate abnormal consignment behaviors; and normalizing the acquired neighbor number to obtain a neighbor number normalized value of the node.
For example, the number of neighbors of a certain node is 2, the number of neighbors of another node is 3.
When the scoring dimension is the contact times of the contact blacklist nodes: acquiring (or counting) the contact times of contact blacklist nodes corresponding to all candidate nodes with abnormal consignment behaviors; and normalizing the acquired contact times to obtain a contact time normalization value of the node.
For example, the number of contacts between one node and the blacklist node is 2, the number of contacts between another node and the blacklist node is 1000, and if 1000 is the maximum value of the number of contacts, the number of contacts 1000 contacting the blacklist node is normalized to 1, and the number of contacts 2 is normalized to 2/1000.
When the scoring dimension is whether the node is a central node: if the node is a central node, its node normalized value is directly set to 1; if the node is not a central node, its node normalized value is directly set to 0.
Finally, the scoring card score of the node is determined according to the neighbor number normalized value, the contact time normalized value, the node normalized value, the.
And determining the risk level corresponding to the candidate abnormal node according to the score of the scoring card of the candidate abnormal node. The score corresponding to the risk level may be predefined, and the higher the risk level is, the higher the corresponding score is. Acquiring a score corresponding to the risk level; and determining the risk level corresponding to the candidate abnormal node according to the score of the scoring card of the node and the score corresponding to the risk level, and determining the risk level of the candidate abnormal node as an analysis result, wherein the analysis result is an analysis result obtained by analyzing the consignment map. A higher score for the scorecard means a higher risk for the node.
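The scoring steps can be illustrated as follows. The text gives division by the maximum as the normalization for blacklist contact counts; applying the same normalization to neighbor counts, combining the three values with a weighted sum, and the cutoff scores for the risk levels are all assumptions made for this sketch.

```python
def max_normalize(values):
    """Scale each value by the maximum, so the largest becomes 1."""
    peak = max(values.values(), default=0)
    return {k: (v / peak if peak else 0.0) for k, v in values.items()}


def scorecard_scores(neighbors, blacklist_contacts, centers,
                     weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three normalized values (assumed combination)."""
    nn = max_normalize(neighbors)
    cn = max_normalize(blacklist_contacts)
    w_n, w_c, w_center = weights
    return {x: w_n * nn[x]
               + w_c * cn.get(x, 0.0)
               + w_center * (1.0 if x in centers else 0.0)
            for x in neighbors}


def risk_level(score, cutoffs=(1.0, 2.0)):
    """Map a score to a level: 0 below all cutoffs, up to len(cutoffs)."""
    return sum(score >= c for c in cutoffs)


scores = scorecard_scores(
    neighbors={"a": 2, "b": 3},
    blacklist_contacts={"a": 2, "b": 1000},
    centers={"b"},
)
levels = {x: risk_level(s) for x, s in scores.items()}
```

With these inputs the contact count 2 is normalized to 2/1000 and 1000 to 1, matching the example in the text, and node "b" receives the higher score and the higher risk level.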
105. Determine, according to the analysis result, the target node with abnormal consignment behavior among the sending nodes and the receiving nodes.
Specifically, the node with the highest risk level may be determined as the target node with abnormal consignment behavior among the sending nodes and receiving nodes in the consignment relationship data. That is, the node with the highest risk level is output as the target node with abnormal consignment behavior.
For the target node with the abnormal consignment behavior, after verification, the label data can be accumulated to be used as the blacklist node, so that the identification of the subsequent abnormal consignment behavior is more accurate.
The above method embodiment prunes the consignment relationship data, constructs the consignment map from the target consignment relationship data, and performs community division and scoring on the consignment map, so as to identify the nodes corresponding to abnormal consignment behavior in the consignment relationship data. Redundant nodes are found through the pruning strategy, so that the map structure of the consignment map can be greatly reduced and the efficiency of identifying abnormal consignment behavior is improved; communities are divided by the community division method, the graph structure is analyzed, and the scoring card is formulated, so that the nodes and communities suspected of abnormal consignment can be accurately located and the accuracy of identifying abnormal consignment behavior is improved.
Fig. 3 is a flowchart illustrating an abnormal consignment behavior identification method according to an embodiment of the present application. With reference to fig. 1 and fig. 3, the method for identifying abnormal consignment behavior in an embodiment of the present application includes: counting and extracting the consignment source data to obtain consignment relationship data; pruning the consignment relationship data, which mainly comprises two aspects: pruning the mail data corresponding to the nodes irrelevant to the risk categories in the risk category list to obtain candidate consignment relationship data, and pruning the mail data corresponding to the white list nodes in the candidate consignment relationship data to obtain target consignment relationship data; constructing a consignment map according to the target consignment relationship data; performing map analysis on the consignment map to obtain an analysis result, which comprises performing community iterative division on the consignment map to obtain a community division result, determining candidate abnormal nodes with abnormal consignment behavior according to the community division result, calculating the scoring card score of each candidate abnormal node according to a preset scoring card strategy, determining the risk level corresponding to the candidate abnormal node according to the scoring card score, and taking the risk level corresponding to the candidate abnormal node as the analysis result; and determining, according to the analysis result, the target node with abnormal consignment behavior among the sending nodes and receiving nodes in the consignment relationship data.
Fig. 4 is a schematic flowchart of an abnormal consignment behavior identification method according to an embodiment of the present application, and as shown in fig. 4, the flowchart of the abnormal consignment behavior identification method includes the following steps:
301. Obtain the consignment relationship data, where the consignment relationship data includes mail data from a sending node to a receiving node.
302. Obtain a list of risk categories.
The list of risk categories may be obtained in advance or may be obtained as needed.
303. Prune the mail data corresponding to the nodes in the consignment relationship data that are irrelevant to the risk categories in the risk category list to obtain candidate consignment relationship data, where the nodes include sending nodes and receiving nodes.
304. Determine the white list nodes among the sending nodes and the receiving nodes.
Analysis is performed using data of four dimensions, namely the total number of mail contacts of the nodes (sending nodes and receiving nodes), the total mail volume of the nodes, the category lengths corresponding to the mail data of the nodes, and the sending addresses and receiving addresses of the nodes, so as to obtain the candidate white list nodes corresponding to the four dimensions. The nodes that appear in the candidate white list nodes of every dimension are determined as white list nodes, or the nodes that hit the candidate white list nodes of n dimensions at the same time are determined as white list nodes, where n is smaller than the total number of dimensions N.
305. Prune the mail data corresponding to the white list nodes in the candidate consignment relationship data to obtain target consignment relationship data.
306. Construct a consignment map according to the target consignment relationship data.
307. Perform community iterative division on the consignment map to obtain a community division result.
Community iterative division is performed on the consignment map using a community division algorithm to obtain the community division result. For example, the consignment map may be divided by community iteration using a label propagation algorithm, as shown in fig. 2.
308. Determine candidate abnormal nodes with abnormal consignment behavior according to the community division result.
309. Determine the risk level corresponding to each candidate abnormal node according to a preset scoring card strategy, and take the risk level corresponding to the candidate abnormal node as the analysis result.
Specifically, the scoring card score corresponding to the candidate abnormal node is calculated according to the preset scoring card strategy; the score corresponding to each risk level is acquired; and the risk level corresponding to the candidate abnormal node is determined according to the scoring card score and the scores corresponding to the risk levels, and is taken as the analysis result.
310. Determine, according to the analysis result, the target node with abnormal consignment behavior among the sending nodes and the receiving nodes.
The node with the highest risk level may be determined as the target node with abnormal consignment behavior among the sending nodes and the receiving nodes. That is, the node with the highest risk level is output as the target node.
For more details of the above steps, please refer to the detailed description in the embodiment of fig. 1, which is not repeated herein.
In order to better implement the method for identifying abnormal consignment behavior in the embodiment of the present invention, an apparatus for identifying abnormal consignment behavior is further provided in the embodiment of the present invention on the basis of that method. The apparatus for identifying abnormal consignment behavior is applied to a device, and the device may be a server or a terminal, such as a mobile phone, a tablet, or a desktop computer.
Fig. 5 is a schematic block diagram of an abnormal consignment behavior recognition apparatus according to an embodiment of the present invention. The abnormal consignment behavior identification device includes an acquisition unit 401, a pruning unit 402, a construction unit 403, an analysis unit 404, and a determination unit 405.
The obtaining unit 401 is configured to obtain consignment relationship data, where the consignment relationship data includes mail data from a sending node to a receiving node.
Specifically, consignment source data is obtained, and the consignment relationship data is determined according to the consignment source data. From the consignment source data, the data related to the identification of abnormal consignment behavior is counted and extracted; for example, the consignment relationship data is extracted and a consignment relationship data table is constructed from it. Further, node data may be extracted, and a node data table may be constructed from the node data.
A pruning unit 402, configured to prune the forwarding relation data to obtain target forwarding relation data.
As shown in fig. 6, the pruning unit 402 includes: a category obtaining unit 4021, a category pruning unit 4022, a white list determining unit 4023, and a white list pruning unit 4024, wherein:
The category obtaining unit 4021 is configured to obtain a list of risk categories.
The category pruning unit 4022 is configured to prune the mail data corresponding to the nodes in the consignment relationship data that are irrelevant to the risk categories in the risk category list to obtain candidate consignment relationship data, where the nodes include sending nodes and receiving nodes.
Pruning here means filtering out, or deleting, the mail data corresponding to the nodes in the consignment relationship data that are irrelevant to the risk categories in the risk category list, so as to obtain the candidate consignment relationship data.
The white list determining unit 4023 is configured to determine white list nodes among the consignment nodes and the receiving nodes.
The white list nodes are determined according to data of a plurality of different dimensions. Specifically, the frequency corresponding to each dimension of data is counted and a threshold is determined for each dimension; candidate white list nodes for each dimension are determined according to that dimension's frequency and threshold; and the nodes that appear among the candidate white list nodes of every dimension are determined as white list nodes. Alternatively, the candidate nodes that are hit in n dimensions at the same time are determined as white list nodes, where n is smaller than the total number of dimensions N.
For example, the total number of mail contacts of each node and a number threshold are determined, together with the frequency of occurrence of each node's total mail volume and a frequency threshold; first candidate white list nodes are determined according to the total number of mail contacts and the number threshold, and second candidate white list nodes are determined according to the frequency of the mail volume and the frequency threshold; and the nodes present among both the first and the second candidate white list nodes are determined as white list nodes.
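The multi-dimension rule above can be sketched as follows. This is only an illustration under stated assumptions: the per-dimension statistics are supplied by the caller, a node "hits" a dimension when its value is at or above that dimension's threshold (the direction of the comparison is not stated in the text), and the parameter name `min_hits` is hypothetical.

```python
from collections import Counter


def whitelist_nodes(dimension_values, thresholds, min_hits=None):
    """Return the nodes that hit enough dimensions to be white-listed.

    dimension_values: {dimension name: {node: statistic}}
    thresholds:       {dimension name: threshold}
    min_hits:         number of dimensions a node must hit; by default
                      every dimension (the intersection of candidates).
    """
    hits = Counter()
    for dim, values in dimension_values.items():
        for node, value in values.items():
            # Assumption: a value at or above the threshold is a hit.
            if value >= thresholds[dim]:
                hits[node] += 1
    required = len(dimension_values) if min_hits is None else min_hits
    return {node for node, count in hits.items() if count >= required}
```

Calling the function without `min_hits` corresponds to taking the nodes present in every candidate list; passing a smaller value corresponds to the n-of-N variant.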
The white list pruning unit 4024 is configured to prune the consignment data corresponding to the white list nodes from the candidate consignment relationship data to obtain the target consignment relationship data.
That is, the consignment data corresponding to the white list nodes in the candidate consignment relationship data is filtered out or deleted to obtain the target consignment relationship data.
The construction unit 403 is configured to construct a consignment map according to the target consignment relationship data.
According to the target consignment relationship data, any one of a telephone number, an identity card number, or a two-dimensional code that represents an independent individual is taken as a node, and each consignment relationship is drawn as an edge, so as to construct the directed graph corresponding to the target consignment relationship data. This directed graph is used as the consignment map (also called a consignment graph network). The attributes of a node include whether the node is a blacklist node, and the attributes of an edge include the total mail volume.
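A minimal sketch of this graph construction, using only plain Python dictionaries, is given below. The input format `(sender, receiver, pieces)` and the attribute name `blacklisted` are assumptions for illustration.

```python
from collections import defaultdict


def build_consignment_graph(edges, blacklist):
    """Build a directed graph from (sender, receiver, pieces) records.

    Returns (nodes, volume) where
      nodes:  {node id: {"blacklisted": bool}}          node attribute
      volume: {(sender, receiver): total mail volume}   edge attribute
    """
    nodes = {}
    volume = defaultdict(int)
    for sender, receiver, pieces in edges:
        for node in (sender, receiver):
            nodes.setdefault(node, {"blacklisted": node in blacklist})
        # Repeated records between the same ordered pair are merged into
        # one directed edge whose attribute is the summed mail volume.
        volume[(sender, receiver)] += pieces
    return nodes, dict(volume)
```

Because the key of `volume` is an ordered pair, an edge from A to B is distinct from an edge from B to A, which preserves the direction of consignment.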
The analysis unit 404 is configured to perform map analysis on the consignment map to obtain an analysis result.
As shown in fig. 6, the analysis unit 404 includes a community dividing unit 4041, a candidate node determining unit 4042, and a risk level determining unit 4043, wherein:
The community dividing unit 4041 is configured to perform iterative community division on the consignment map to obtain a community division result. Specifically, the community dividing unit 4041 performs community division on the consignment map to obtain an intermediate community division result, and detects whether every community in the intermediate result either has a star structure or has fewer nodes than a preset node number. If so, the intermediate result is taken as the community division result. If not, community division is performed again on each community that neither has a star structure nor has fewer nodes than the preset node number, yielding a new intermediate result, and the detection step is repeated until every community obtained has a star structure or has fewer nodes than the preset node number.
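The stopping rule of the iterative division can be sketched as a work list. Several points are assumptions rather than part of the embodiment: the graph is treated as undirected for the star test, `adj` maps every node to the set of its neighbors, and the splitter is a caller-supplied function. `connected_components` is only a stand-in splitter so the sketch runs; a real system would plug in a community detection algorithm. The sketch also tests the whole graph before the first split, a small simplification.

```python
def is_star(community, adj):
    """True if one hub links to every other member and no other links exist."""
    n = len(community)
    degrees = [len(adj[v] & community) for v in community]
    # n - 1 edges in total, all incident to a single node of degree n - 1.
    return n >= 2 and max(degrees) == n - 1 and sum(degrees) == 2 * (n - 1)


def connected_components(community, adj):
    """Stand-in splitter (hypothetical): connected components of the subgraph."""
    seen, parts = set(), []
    for start in community:
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend((adj[v] & community) - part)
        seen |= part
        parts.append(part)
    return parts


def iterative_partition(nodes, adj, split, min_size):
    """Split repeatedly until each community is a star or smaller than min_size."""
    final, pending = [], [frozenset(nodes)]
    while pending:
        community = pending.pop()
        if len(community) < min_size or is_star(community, adj):
            final.append(community)
            continue
        parts = [frozenset(p) for p in split(community, adj) if p]
        if len(parts) <= 1:
            # Guard added for the sketch: stop if the splitter makes no progress.
            final.append(community)
        else:
            pending.extend(parts)
    return final
```

The no-progress guard is not described in the text; it is included so that the loop terminates even when the chosen splitter cannot divide a community further.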
The candidate node determining unit 4042 is configured to determine, according to the community division result, candidate abnormal nodes with abnormal consignment behavior. Specifically, it determines the central node of each community in the community division result and the number of neighbors of each node in all communities, and takes the central nodes and the node with the largest number of neighbors as the candidate abnormal nodes.
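A short sketch of candidate selection follows. How a central node is identified is not specified in this passage, so the sketch assumes it is the member with the most neighbors inside its own community; the neighbor number used for the second criterion is assumed to be the node's neighbor count in the whole graph. `adj` is a hypothetical mapping from each node to the set of its neighbors.

```python
def candidate_abnormal_nodes(communities, adj):
    """Return central nodes plus the node(s) with the most neighbors."""
    centers = set()
    neighbor_count = {}
    for community in communities:
        members = set(community)
        for v in members:
            neighbor_count[v] = len(adj[v])
        # Assumed definition of "central": highest in-community degree.
        # Iterating over sorted members makes tie-breaking deterministic.
        centers.add(max(sorted(members), key=lambda v: len(adj[v] & members)))
    top = max(neighbor_count.values())
    busiest = {v for v, k in neighbor_count.items() if k == top}
    return centers | busiest
```

The function assumes that every community is non-empty and that at least one community is supplied.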
The risk level determining unit 4043 is configured to determine a risk level corresponding to the candidate abnormal node according to a preset score card policy, and use the risk level corresponding to the candidate abnormal node as an analysis result.
The risk level determination unit 4043 is specifically configured to calculate, according to a preset score card policy, a score card score corresponding to the candidate abnormal node; acquiring a score corresponding to the risk level; and determining the risk grade corresponding to the candidate abnormal node according to the score of the scoring card and the score corresponding to the risk grade, and taking the risk grade corresponding to the candidate abnormal node as an analysis result.
Calculating the score card score of the candidate abnormal nodes according to the preset score card policy includes: acquiring, for each node, its number of neighbors, its number of contacts with blacklist nodes, and whether it is a central node; normalizing the number of neighbors to obtain a neighbor number normalized value, normalizing the number of contacts to obtain a contact number normalized value, and normalizing the central-node indicator to obtain a node normalized value; and determining the score card score of each node according to the neighbor number normalized value, the contact number normalized value, and the node normalized value.
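The score card calculation can be sketched as below. The normalization method, the way the three normalized values are combined, and the scores attached to each risk level are not given in this passage; the sketch assumes min-max normalization, a weighted sum with equal default weights, and a caller-supplied list of minimum scores per level. All function and parameter names are hypothetical.

```python
def min_max(values):
    """Min-max normalize a {node: number} mapping into [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def scorecard(neighbors, blacklist_contacts, centers, weights=(1.0, 1.0, 1.0)):
    """Score each node from its three normalized features (assumed weighted sum)."""
    n_norm = min_max(neighbors)
    c_norm = min_max(blacklist_contacts)
    w_n, w_c, w_center = weights
    return {
        node: w_n * n_norm[node]
        + w_c * c_norm[node]
        # The central-node indicator is normalized to 1.0 or 0.0.
        + w_center * (1.0 if node in centers else 0.0)
        for node in neighbors
    }


def risk_level(score, level_scores):
    """Map a score to the highest level whose minimum score it reaches.

    level_scores: list of (minimum score, level name) pairs.
    """
    level = None
    for minimum, name in sorted(level_scores):
        if score >= minimum:
            level = name
    return level
```

Under these assumptions, the node with the highest resulting level would be the one reported as the target node by the determining unit.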
The determining unit 405 is configured to determine, according to the analysis result, a target node with abnormal consignment behavior among the consignment nodes and the receiving nodes.
For example, the node with the highest risk level may be determined as the target node with abnormal consignment behavior among the consignment nodes and receiving nodes in the consignment relationship data. That is, the node with the highest risk level is output as the target node.
It should be noted that, as will be clear to those skilled in the art, specific implementation procedures and achieved beneficial effects of the above-mentioned apparatus and units may refer to corresponding descriptions in the foregoing method embodiments, and for convenience and brevity of description, no further description is provided herein.
The embodiment of the present invention further provides a computer device, which integrates any one of the methods for identifying an abnormal consignment behavior provided by the embodiment of the present invention, where the computer device includes:
one or more processors;
a memory; and
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the processor to perform the steps of the abnormal consignment behavior identification method described in any of the above method embodiments.
The embodiment of the invention also provides computer equipment, which integrates any one of the abnormal consignment behavior identification devices provided by the embodiment of the invention. Fig. 7 is a schematic diagram showing a structure of a computer device according to an embodiment of the present invention, specifically:
the computer device may include components such as a processor 501 of one or more processing cores, memory 502 of one or more computer-readable storage media, a power supply 503, and an input unit 504. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 7 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the processor 501 is a control center of the computer device, connects various parts of the entire computer device by using various interfaces and lines, and performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 502 and calling data stored in the memory 502, thereby monitoring the computer device as a whole. Optionally, processor 501 may include one or more processing cores; preferably, the processor 501 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 501.
The memory 502 may be used to store software programs and modules, and the processor 501 executes various functional applications and data processing by operating the software programs and modules stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the computer device, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 502 may also include a memory controller to provide the processor 501 with access to the memory 502.
The computer device further comprises a power supply 503 for supplying power to the various components, and preferably, the power supply 503 may be logically connected to the processor 501 through a power management system, so that functions of managing charging, discharging, power consumption, and the like are realized through the power management system. The power supply 503 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The computer device may also include an input unit 504, and the input unit 504 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 501 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 502 according to the following instructions, and the processor 501 runs the application programs stored in the memory 502, so as to implement various functions as follows:
obtaining consignment relation data, wherein the consignment relation data comprise consignment data from consignment nodes to receiving nodes;
pruning the consignment relation data to obtain target consignment relation data;
constructing a consignment map according to the target consignment relation data;
carrying out map analysis on the consignment map to obtain an analysis result;
and determining, according to the analysis result, a target node with abnormal consignment behavior among the consignment nodes and the receiving nodes.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present invention provides a computer-readable storage medium, which may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like. The storage medium stores a computer program, and the computer program is loaded by a processor to execute the steps of any one of the methods for identifying abnormal consignment behavior provided by the embodiments of the present invention. For example, the computer program may be loaded by a processor to perform the steps of:
obtaining consignment relation data, wherein the consignment relation data comprise consignment data from consignment nodes to receiving nodes;
pruning the consignment relation data to obtain target consignment relation data;
constructing a consignment map according to the target consignment relation data;
carrying out map analysis on the consignment map to obtain an analysis result;
and determining, according to the analysis result, a target node with abnormal consignment behavior among the consignment nodes and the receiving nodes.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and parts that are not described in detail in a certain embodiment may refer to the above detailed descriptions of other embodiments, and are not described herein again.
In a specific implementation, each unit or structure may be implemented as an independent entity, or may be combined arbitrarily to be implemented as one or several entities, and the specific implementation of each unit or structure may refer to the foregoing method embodiment, which is not described herein again.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
The method, the apparatus, the computer device and the storage medium for identifying an abnormal posting behavior provided by the embodiment of the present invention are described in detail above, a specific example is applied in the present document to explain the principle and the implementation of the present invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the present invention; meanwhile, for those skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. An abnormal consignment behavior identification method, comprising:
obtaining consignment relation data, wherein the consignment relation data comprise consignment data from consignment nodes to receiving nodes;
pruning the consignment relation data to obtain target consignment relation data;
constructing a consignment map according to the target consignment relation data;
carrying out map analysis on the consignment map to obtain an analysis result;
and determining, according to the analysis result, a target node with abnormal consignment behavior among the consignment nodes and the receiving nodes.
2. The abnormal consignment behavior identification method according to claim 1, wherein the consignment node and the recipient node are collectively referred to as a node, and the pruning of the consignment relationship data to obtain the target consignment relationship data comprises:
acquiring a risk category list;
pruning the consignment data corresponding to the nodes irrelevant to the risk categories in the risk category list in the consignment relationship data to obtain candidate consignment relationship data;
determining a white list node of the nodes;
and pruning the consignment data corresponding to the white list nodes in the candidate consignment relationship data to obtain the target consignment relationship data.
3. The abnormal consignment behavior identification method as in claim 2, wherein said determining a white list node comprises:
determining a total number of mail contacts of each node and a number threshold, and a frequency of occurrence of the total mail volume of each node and a frequency threshold;
determining first candidate white list nodes according to the total number of the mail contacts and the number threshold;
determining second candidate white list nodes according to the frequency of the total mail volume and the frequency threshold;
and determining nodes existing in both the first candidate white list node and the second candidate white list node as white list nodes.
4. The abnormal consignment behavior identification method according to claim 1, wherein the performing map analysis on the consignment map to obtain an analysis result comprises:
carrying out community iterative division on the consignment map to obtain a community division result;
determining candidate abnormal nodes with abnormal consignment behaviors according to the community division result;
and determining the risk level corresponding to the candidate abnormal node according to a preset scoring card strategy, and taking the risk level corresponding to the candidate abnormal node as an analysis result.
5. The abnormal consignment behavior identification method according to claim 4, wherein the performing community iterative partitioning on the consignment graph to obtain a community partitioning result comprises:
carrying out community iterative division on the consignment map to obtain a community division intermediate result;
detecting whether the community in the community division intermediate result is in a star structure or is a community with the node number smaller than a preset node number;
if so, taking the community division intermediate result as a community division result;
if not, carrying out community iterative division on the communities with non-star structures and the node number not less than the preset node number until the communities obtained after community division are in star structures or the node number is less than the preset node number.
6. The abnormal consignment behavior identification method according to claim 4, wherein the determining candidate abnormal nodes with abnormal consignment behaviors according to the community division result comprises:
determining central nodes of all communities in the community division result and determining the number of neighbors of each node in all communities;
and taking the central nodes and the node with the largest number of neighbors as candidate abnormal nodes with abnormal consignment behaviors.
7. The abnormal consignment behavior identification method according to claim 4, wherein said determining the risk level corresponding to the candidate abnormal node according to a preset scorecard policy comprises:
calculating score of a score card corresponding to the candidate abnormal node according to a preset score card strategy;
acquiring a score corresponding to the risk level;
and determining the risk level corresponding to the candidate abnormal node according to the score of the scoring card and the score corresponding to the risk level.
8. An abnormal consignment behavior identification device, comprising:
an acquisition unit, configured to obtain consignment relationship data, wherein the consignment relationship data comprises consignment data from a consignment node to a receiving node;
a pruning unit, configured to prune the consignment relationship data to obtain target consignment relationship data;
a construction unit, configured to construct a consignment map according to the target consignment relationship data;
an analysis unit, configured to perform map analysis on the consignment map to obtain an analysis result;
and a determining unit, configured to determine, according to the analysis result, a target node with abnormal consignment behavior among the consignment nodes and the receiving nodes.
9. A computer device, characterized in that the computer device comprises:
one or more processors; a memory; and one or more applications, wherein the processor is coupled to the memory, and the one or more applications are stored in the memory and configured to be executed by the processor to implement the abnormal consignment behavior identification method of any one of claims 1 to 7.
10. A computer storage medium having a computer program stored thereon, the computer program being loaded by a processor to perform the steps of the abnormal consignment behavior identification method of any one of claims 1 to 7.
CN202010212199.XA 2020-03-24 2020-03-24 Abnormal consignment behavior identification method and device, computer equipment and storage medium Pending CN113449112A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010212199.XA CN113449112A (en) 2020-03-24 2020-03-24 Abnormal consignment behavior identification method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010212199.XA CN113449112A (en) 2020-03-24 2020-03-24 Abnormal consignment behavior identification method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113449112A true CN113449112A (en) 2021-09-28

Family

ID=77807402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010212199.XA Pending CN113449112A (en) 2020-03-24 2020-03-24 Abnormal consignment behavior identification method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113449112A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037395A (en) * 2022-01-07 2022-02-11 国家邮政局邮政业安全中心 Abnormal consignment data identification method and system, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105159922A (en) * 2015-08-03 2015-12-16 同济大学 Label propagation algorithm-based posting data-oriented parallelized community discovery method
WO2018077039A1 (en) * 2016-10-27 2018-05-03 腾讯科技(深圳)有限公司 Community discovery method, apparatus, server, and computer storage medium
CN108764917A (en) * 2018-05-04 2018-11-06 阿里巴巴集团控股有限公司 It is a kind of fraud clique recognition methods and device
CN110795562A (en) * 2019-10-29 2020-02-14 腾讯科技(深圳)有限公司 Map optimization method, device, terminal and storage medium

Similar Documents

Publication Publication Date Title
CN110009174B (en) Risk recognition model training method and device and server
CN111291816B (en) Method and device for carrying out feature processing aiming at user classification model
CN107169768B (en) Method and device for acquiring abnormal transaction data
CN107563757B (en) Data risk identification method and device
US8799193B2 (en) Method for training and using a classification model with association rule models
CN111368147B (en) Graph feature processing method and device
CN113568368B (en) Self-adaptive determination method for industrial control data characteristic reordering algorithm
CN111460315B (en) Community portrait construction method, device, equipment and storage medium
Nalepa et al. Adaptive guided ejection search for pickup and delivery with time windows
CN113449112A (en) Abnormal consignment behavior identification method and device, computer equipment and storage medium
CN115689407A (en) Account abnormity detection method and device and terminal equipment
CN111046947B (en) Training system and method of classifier and recognition method of abnormal sample
CN112686312A (en) Data classification method, device and system
CN112446660A (en) Network point clustering method, device, server and storage medium
CN113065892B (en) Information pushing method, device, equipment and storage medium
CN114021716A (en) Model training method and system and electronic equipment
CN108334488A (en) A kind of work order classification processing method and server
CN115409226A (en) Data processing method and data processing system
CN112529319A (en) Grading method and device based on multi-dimensional features, computer equipment and storage medium
CN112308419A (en) Data processing method, device, equipment and computer storage medium
CN112560433A (en) Information processing method and device
CN113837325B (en) Unsupervised algorithm-based user anomaly detection method and unsupervised algorithm-based user anomaly detection device
CN108537654A (en) Rendering intent, device, terminal device and the medium of customer relationship network
CN111552790B (en) Method and device for identifying article form
CN111159398B (en) Method and device for identifying merchant types

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination