CN112699217B - Behavior abnormal user identification method based on user text data and communication data - Google Patents

Info

Publication number
CN112699217B
Authority
CN
China
Prior art keywords
user
node
abnormal
nodes
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011588924.XA
Other languages
Chinese (zh)
Other versions
CN112699217A (en)
Inventor
程鹏飞
敬好青
何芳
刘敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Jiusuo Data Technology Co ltd
Original Assignee
Xi'an Jiusuo Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Jiusuo Data Technology Co ltd filed Critical Xi'an Jiusuo Data Technology Co ltd
Priority to CN202011588924.XA priority Critical patent/CN112699217B/en
Publication of CN112699217A publication Critical patent/CN112699217A/en
Application granted granted Critical
Publication of CN112699217B publication Critical patent/CN112699217B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for identifying users with abnormal behavior based on user text data and communication data. The method comprises: establishing user text data according to the user's mobile phone number, and filtering the user text data by keywords and expansion words to obtain suspected abnormal texts; when the number of suspected abnormal texts is greater than zero, constructing a user abnormal behavior recognition model based on the user text content; when the number of suspected abnormal texts is not greater than zero, constructing an abnormal behavior user identification model based on social network analysis; judging whether the user's behavior is abnormal using the recognition model based on the user text content, and if so, adding the user to an abnormal-behavior personnel information base according to the owner information; if not, proceeding to the abnormal behavior user identification model based on social network analysis; and judging whether the user's behavior is abnormal using the social network analysis model, and if so, adding the user to the abnormal-behavior personnel information base according to the owner information.

Description

Behavior abnormal user identification method based on user text data and communication data
Technical Field
The invention relates to the application of computer technology in the field of public safety, and in particular to a method for identifying users with abnormal behavior based on user text data and communication data.
Background
With the development of deep learning and computer vision, behavior recognition has made significant progress and is widely applied in the field of public security. At present, most human behavior recognition methods extract features directly from raw video frames and perform recognition with a deep learning network model. This approach carries a large amount of redundant information, which introduces considerable noise into the neural network model, degrades the accuracy and speed of behavior recognition, and leads to many missed detections.
Criminal activities involving terrorism, explosives, and viruses seriously endanger public security, damage the national economy, greatly hinder the development and progress of human society, and pose great challenges to social harmony and stability. Analyzing the behavioral characteristics of abnormal users, proactively discovering such users, and placing them under focused monitoring can effectively prevent socially harmful activities and is of great significance for maintaining social stability.
Disclosure of Invention
The invention aims to provide a method for identifying users with abnormal behavior based on user text data and communication data. The method comprehensively considers users' text data and communication data, constructs abnormal user identification models from multiple aspects, and proactively discovers users with abnormal behavior. On the one hand, this allows criminal behavior that damages social stability to be targeted precisely, reducing the cost of maintaining social stability; on the other hand, by monitoring users with abnormal behavior, socially harmful behavior can be nipped in the bud, at its source, before the activities occur.
To achieve this, the invention adopts the following technical solution:
A method for identifying users with abnormal behavior based on user text data and communication data, characterized by comprising the following steps:
First step: establishing user text data according to the user's mobile phone number, and filtering the user text data by keywords and expansion words to obtain suspected abnormal texts;
Second step: when the number of suspected abnormal texts is greater than zero, constructing a user abnormal behavior recognition model based on the user text content;
when the number of suspected abnormal texts is not greater than zero, constructing an abnormal behavior user identification model based on social network analysis;
Third step: judging whether the user's behavior is abnormal using the recognition model based on the user text content; if so, adding the user to the abnormal-behavior personnel information base according to the owner information; if not, proceeding to the abnormal behavior user identification model based on social network analysis;
Fourth step: judging whether the user's behavior is abnormal using the abnormal behavior user identification model based on social network analysis; if so, adding the user to the abnormal-behavior personnel information base according to the owner information.
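The four steps above amount to a two-stage decision. The following Python sketch only illustrates that control flow; the names `screen`, `text_model`, and `network_model` are hypothetical placeholders for the screening step and the two recognition models, and none of them appear in the source.

```python
from typing import Callable, List


def identify_abnormal_user(
    texts: List[str],
    screen: Callable[[List[str]], List[str]],
    text_model: Callable[[List[str]], bool],
    network_model: Callable[[], bool],
) -> bool:
    """Return True if the user should be added to the abnormal-behavior information base."""
    # First step: keyword and expansion-word filtering yields the suspected texts.
    suspected = screen(texts)
    # Second and third steps: the text model is used only when suspected texts exist.
    if len(suspected) > 0 and text_model(suspected):
        return True
    # Fourth step: otherwise fall back to the social network analysis model.
    return network_model()
```

A caller would add the user to the information base, using the owner information, whenever the function returns True.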
According to the invention, the user abnormal behavior recognition model based on the user text content is constructed as follows:
Step 1: establishing an abnormal behavior keyword lexicon $\omega_1$ by combining various common abnormal behavior words, and using $\omega_1$ to preliminarily screen out the suspected abnormal texts $\Omega_1$;
Step 2: expanding words using the abnormal behavior keywords as root words, establishing an expanded lexicon $\omega_2$ of words that contain a keyword but do not indicate abnormal behavior, and using $\omega_2$ to filter out of $\Omega_1$ the texts that contain words in $\omega_2$, obtaining $\Omega_2$;
Step 3: constructing an abnormal text recognition model based on the BERT pre-trained text classification method, the model structure consisting of five layers: a text input layer, a word embedding layer, a multi-layer Transformer layer, a fully connected layer, and a Softmax layer; each input unit $E_i$ of the text input layer corresponds to the $i$-th token of a sentence sequence in $\Omega_2$, the first token of each sequence is the special classification embedding, and the word embedding layer converts each $E_i$ into the corresponding word vector $V_i$.
The word vector $V_i$ is composed as follows:
$$V_i = \text{token embeddings}(E_i) + \text{segmentation embeddings}(E_i) + \text{position embeddings}(E_i);$$
where $\text{token embeddings}(E_i)$ performs the word vector conversion using WordPiece embeddings with a 30,000-token vocabulary; $\text{position embeddings}(E_i)$ is derived from the position of each $E_i$, with a maximum supported sequence length of 512, which limits the sentence length; for $\text{segmentation embeddings}(E_i)$, sentence pairs are packed into one sequence so that several sentences can be input together: first, the sentences are separated by a special token; second, a learned sentence A embedding is added to each token of the first sentence and a sentence B embedding to each token of the second sentence. The multi-layer Transformer performs feature extraction on the input word vectors and outputs the context feature $T_i$ of each token; $C_i$ denotes the hidden state representing the classification category of the input sentence; $C_i$ is followed by a fully connected layer that extracts category features, and the probability of each category is computed by the Softmax layer, namely $P(C_i) = \mathrm{Softmax}(C_i W)$, where $W$ is the weight of the fully connected layer; the probability values of all categories are compared and the text is assigned to the category with the highest probability;
The model training parameters include: the learning rate $l_1$, the maximum text length max_length, the number of training epochs epochs, and the amount of training data per batch batch_size. A portion of $\Omega_2$, denoted $\Omega_3$, is extracted as training data and labeled by category; the model parameters are adjusted and the model is trained on the training set, and a test set $\Omega_4$, a subset of $\Omega_3$, is used to test the stability and accuracy of the model in order to determine the optimal model parameters;
Step 4: inputting $\Omega_2$ into the trained model to label the category of each text in $\Omega_2$, and, for each abnormal text, marking the number of the user it belongs to as an abnormal user number;
Step 5: associating the user owner information table to obtain detailed information on the abnormal users, such as certificate numbers and addresses.
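Steps 1 and 2 above describe a two-pass lexical filter. The sketch below is one minimal reading of it, assuming plain substring matching; the example words ("bomb", "bombastic", "photobomb") and the function name `screen_texts` are hypothetical and are not taken from the source, which does not list its word banks.

```python
from typing import Iterable, List, Tuple


def screen_texts(
    texts: Iterable[str],
    keywords: Iterable[str],
    expansion_words: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Return (texts hit by a keyword, texts remaining after the expansion-word filter)."""
    keywords = list(keywords)
    expansion_words = list(expansion_words)
    # Pass 1: keep texts containing at least one abnormal-behavior keyword.
    first_pass = [t for t in texts if any(k in t for k in keywords)]
    # Pass 2: drop texts containing an expansion word, i.e. a word that
    # includes a keyword but does not indicate abnormal behavior.
    second_pass = [t for t in first_pass if not any(e in t for e in expansion_words)]
    return first_pass, second_pass


if __name__ == "__main__":
    sample = ["he built a bomb", "that speech was bombastic", "lunch at noon"]
    hits, suspected = screen_texts(sample, ["bomb"], ["bombastic", "photobomb"])
    print(hits)
    print(suspected)
```

Only the texts that survive the second pass would be handed to the classifier in step 3.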
Further, the abnormal behavior user identification model based on social network analysis is constructed as follows:
Step 1: according to the short message data and communication data of the abnormal user, determining the service numbers of the abnormal user's first-degree and second-degree relation users, counting the number of communications between them, and establishing the input node relation set S1 ($P_I \to P_O \to num$), where $P_I$ is an input node, $P_O$ is the peer (output) node, and $num$ is the number of communications between the two node users; a first-degree relation user is a user who has communicated directly with the abnormal user, and a second-degree relation user is a user who has not communicated directly with the abnormal user but has communicated with one of the abnormal user's first-degree relation users;
Step 2: computing data labels for all users in the input node relation set S1 ($P_I \to P_O \to num$), including owner name and social attributes;
Step 3: on the basis of the input node relation set S1 ($P_I \to P_O \to num$), establishing the input node relation and node label table S2, which contains the input node service number, input node owner name, input node social attributes, output node service number, output node owner name, output node social attributes, and the number of communications;
Step 4: according to the node label table S2, drawing the relation network of the input nodes using a force-directed graph layout, with the number of communications between users as the edge weight between nodes; the attraction between nodes varies with the edge weight, so that the larger the edge weight, the stronger the attraction and the more tightly the nodes cluster; all input nodes are traversed to build the complete abnormal user social network;
Step 5: constructing an output index to control the content displayed in the social network.
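Step 1 above can be sketched as follows. This is an illustration under stated assumptions: communication records are given as (number, number) pairs, each pair is treated as undirected, and a relation is stored once per pair in sorted order. The function name `build_s1` and the record format are hypothetical; the source does not specify how the input side and the peer side of a relation are assigned.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def build_s1(
    records: Iterable[Tuple[str, str]],
    abnormal: Set[str],
) -> Tuple[Set[str], Set[str], List[Tuple[str, str, int]]]:
    """Return (first-degree users, second-degree users, S1 rows as (node, peer, num))."""
    records = list(records)
    neighbours: Dict[str, Set[str]] = {}
    for a, b in records:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # First-degree users communicated directly with an abnormal user.
    first: Set[str] = set()
    for user in abnormal:
        first |= neighbours.get(user, set())
    first -= abnormal

    # Second-degree users communicated with a first-degree user, but not
    # directly with an abnormal user.
    second: Set[str] = set()
    for user in first:
        second |= neighbours.get(user, set())
    second -= first | abnormal

    # Count communications between the retained numbers.
    keep = abnormal | first | second
    counts: Counter = Counter()
    for a, b in records:
        if a in keep and b in keep:
            counts[tuple(sorted((a, b)))] += 1
    s1 = [(p_i, p_o, num) for (p_i, p_o), num in sorted(counts.items())]
    return first, second, s1
```

Joining each row of the result with an owner table (name, social attributes) would give the label table S2 described in steps 2 and 3.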
Specifically, the output index that controls the content displayed in the social network is constructed as follows:
Step 1: collecting the peer nodes in S1 and removing the input nodes among them to form the peer node set S3;
Step 2: calculating the number of friends of each input node and of each peer node, where a user's number of friends is defined as the number of users who communicate with that user;
Step 3: removing from the peer node set S3 the nodes whose number of input-node friends is 1 to form the candidate node set S4;
Step 4: counting the number of input-node friends of every candidate node to form the candidate node statistics table S5, which contains the number of input-node friends, the number of candidate nodes, and the cumulative ratio, sorted by number of input-node friends in descending order;
Step 5: using the cumulative ratio as the output index for displaying the social network: when the output index is 0, only the input nodes are displayed, and when the output index is 1, the input nodes and all candidate nodes are displayed.
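The construction of S3, S4, and S5 can be sketched as below. The sketch assumes S1 rows are (input node, peer node, num) tuples and that a candidate is displayed when the cumulative ratio of its row in S5 does not exceed the chosen output index; that thresholding rule, and the names `candidate_table` and `visible_nodes`, are assumptions, since the source only fixes the behavior at the end points 0 and 1.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def candidate_table(
    s1: Iterable[Tuple[str, str, int]],
    input_nodes: Set[str],
) -> Tuple[Dict[str, int], List[Tuple[int, int, float]]]:
    """Return (S4 as peer -> input-node friend count, S5 rows)."""
    # S3: peer nodes that are not themselves input nodes, with the set of
    # input nodes each one communicates with.
    input_friends: Dict[str, Set[str]] = defaultdict(set)
    for p_in, p_out, _num in s1:
        if p_in in input_nodes and p_out not in input_nodes:
            input_friends[p_out].add(p_in)
    # S4: drop peers connected to only one input node.
    s4 = {p: len(f) for p, f in input_friends.items() if len(f) > 1}
    # S5: (input-node friend count, candidate count, cumulative ratio),
    # sorted by friend count in descending order.
    counts = Counter(s4.values())
    total = sum(counts.values())
    table, running = [], 0
    for k in sorted(counts, reverse=True):
        running += counts[k]
        table.append((k, counts[k], running / total))
    return s4, table


def visible_nodes(
    s4: Dict[str, int],
    table: List[Tuple[int, int, float]],
    input_nodes: Set[str],
    output_index: float,
) -> Set[str]:
    """Input nodes plus the candidates whose S5 row falls within the output index."""
    allowed = {k for k, _n, ratio in table if ratio <= output_index}
    return set(input_nodes) | {p for p, k in s4.items() if k in allowed}
```

Raising the output index from 0 to 1 therefore reveals candidates in order of how many input nodes they are connected to, the most connected first.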
Preferably, the abnormal user identification flow comprises:
Step 1: inputting the user's mobile phone number, extracting and parsing all of the user's text data within a time range, and using the abnormal behavior lexicon $\omega_1$ and the expanded lexicon $\omega_2$ to screen out the suspected abnormal texts $\Omega_2$;
Step 2: determining the number of suspected abnormal texts in $\Omega_2$ and selecting the recognition model accordingly: if the number of texts in $\Omega_2$ is greater than zero, the abnormal user identification model based on text content is used for identification; otherwise, the abnormal user identification method based on social network analysis is used to judge whether the user exhibits abnormal behavior.
The method for screening out, from the social network of users with abnormal behavior, the key persons among other users with abnormal behavior by means of a network analysis algorithm comprises the following steps:
Step 1: K-shell value calculation
Calculating K-shell values for the network formed by all nodes and counting the K-shell values of all nodes to form an all-node statistics table, which contains the K-shell value, the number of nodes, and the cumulative ratio; the cumulative ratio is used as the display index (the core index): when the core index is 0, no node is displayed, and when the core index is 1, all nodes are displayed;
The K-shell value is calculated as follows: first, all nodes of degree 1 are found and deleted; nodes of degree 1 among the remaining nodes are then found and deleted, and this is repeated until no node of degree 1 remains in the network; all nodes deleted in this way are assigned a K-shell value of 1, that is, KS = 1; nodes of higher degree are then processed in the same way;
Step 2: betweenness centrality calculation
1) Selecting a source node $s$, performing a breadth-first search, and calculating $\sigma_{st}$, where $\sigma_{st}$ denotes the number of shortest paths from $s$ to $t$;
2) During the traversal, recording for each node the set of its predecessors; the predecessor set is defined as $P_s(v) = \{u \in V : \{u, v\} \in E,\ d(s, v) = d(s, u) + d(u, v)\}$, that is, if a shortest path from $s$ to $v$ contains the edge $(u, v)$, then $u$ belongs to the predecessor set of $v$, where $V$ is the set of all nodes in the network, $E$ is the set of edges (node pairs), and $d(s, v)$ is the length of the shortest path from node $s$ to node $v$;
3) Calculating the betweenness centrality $\delta_s(v) = \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \left(1 + \delta_s(w)\right)$, where $v$ is a predecessor of $w$ on shortest paths from $s$; betweenness centrality reflects a node's capacity to relay information, and the higher a node's betweenness centrality, the more important its position in the network;
Step 3: screening users with abnormal behavior: the K-shell value, the betweenness centrality value, the output index, and the node degree are adjusted to filter the social network, and a user whose node appears in the remaining network is identified as a user with abnormal behavior.
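The K-shell calculation in step 1 can be sketched as an iterative peeling procedure. The sketch assumes an undirected graph given as an adjacency dictionary and follows the usual k-shell decomposition, in which shell k collects the nodes removed while peeling nodes of remaining degree at most k; isolated nodes therefore receive the value 0, a case the source does not discuss. The function name `k_shell` is hypothetical.

```python
from typing import Dict, Hashable, Set


def k_shell(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Return the K-shell value of every node of an undirected graph."""
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    shell: Dict[Hashable, int] = {}
    k = 0
    while remaining:
        # Nodes whose remaining degree is at most k are peeled in this round.
        stack = [n for n in remaining if degree[n] <= k]
        if not stack:
            k += 1  # nothing left to peel at this level, move to the next shell
            continue
        while stack:
            node = stack.pop()
            if node not in remaining:
                continue  # already peeled via another neighbour
            remaining.discard(node)
            shell[node] = k
            # Removing a node lowers the degree of its remaining neighbours,
            # which may pull them into the same shell.
            for nbr in adj[node]:
                if nbr in remaining:
                    degree[nbr] -= 1
                    if degree[nbr] <= k:
                        stack.append(nbr)
    return shell


if __name__ == "__main__":
    # Triangle a-b-c with a tail a-d-e: the tail peels off first.
    graph = {
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a", "e"},
        "e": {"d"},
    }
    print(k_shell(graph))
```

Counting nodes per K-shell value and accumulating the ratio would give the all-node statistics table used as the core index.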
The method for identifying users with abnormal behavior based on user text data and communication data uses a network analysis algorithm to mine from the network the key persons among other users with abnormal behavior, and associates the user owner information table to obtain detailed information on the users with abnormal behavior. Users' text data and communication data are comprehensively considered, abnormal user identification models are constructed from multiple aspects, and users with abnormal behavior are proactively discovered. On the one hand, criminal behavior that damages social stability is targeted precisely, reducing the cost of maintaining social stability; on the other hand, by monitoring users with abnormal behavior, socially harmful behavior is nipped in the bud, at its source, before the activities occur.
Drawings
FIG. 1 is a flow chart of a method for identifying a user with abnormal behavior based on user text data and communication data according to the present invention;
FIG. 2 is a flow chart of building a user abnormal behavior recognition model based on user text content;
FIG. 3 is a diagram of an abnormal text recognition model structure;
FIG. 4 is a schematic diagram of an anomalous user social network.
The present invention will be described in further detail with reference to the following drawings and examples.
Detailed Description
Referring to FIG. 1, this embodiment provides a method for identifying users with abnormal behavior based on user text data and communication data, which specifically comprises the following steps:
First step: establishing user text data according to the user's mobile phone number, and filtering the user text data by keywords and expansion words to obtain suspected abnormal texts;
Second step: when the number of suspected abnormal texts is greater than zero, constructing a user abnormal behavior recognition model based on the user text content;
when the number of suspected abnormal texts is not greater than zero, constructing an abnormal behavior user identification model based on social network analysis;
Third step: judging whether the user's behavior is abnormal using the recognition model based on the user text content; if so, adding the user to the abnormal-behavior personnel information base according to the owner information; if not, proceeding to the abnormal behavior user identification model based on social network analysis;
Fourth step: judging whether the user's behavior is abnormal using the abnormal behavior user identification model based on social network analysis; if so, adding the user to the abnormal-behavior personnel information base according to the owner information.
1. Constructing the user abnormal behavior recognition model based on the user text content, as shown in FIG. 2, comprises the following steps:
Step 1: establishing an abnormal behavior keyword lexicon $\omega_1$ by combining various common abnormal behavior words, and using $\omega_1$ to preliminarily screen out the suspected abnormal texts $\Omega_1$.
Step 2: expanding words using the abnormal behavior keywords as root words, establishing an expanded lexicon $\omega_2$ of words that contain a keyword but do not indicate abnormal behavior, and using $\omega_2$ to filter out of $\Omega_1$ the texts that contain words in $\omega_2$, obtaining $\Omega_2$.
Step 3: constructing an abnormal text recognition model based on the BERT pre-trained text classification method (see FIG. 3 for the structure), consisting of five layers: a text input layer, a word embedding layer, a multi-layer Transformer layer, a fully connected layer, and a Softmax layer; each input unit $E_i$ of the text input layer corresponds to the $i$-th token of a sentence sequence in $\Omega_2$, the first token of each sequence is the special classification embedding, and the word embedding layer converts each $E_i$ into the corresponding word vector $V_i$. The word vector $V_i$ is composed as follows:
$$V_i = \text{token embeddings}(E_i) + \text{segmentation embeddings}(E_i) + \text{position embeddings}(E_i);$$
where $\text{token embeddings}(E_i)$ performs the word vector conversion using WordPiece embeddings with a 30,000-token vocabulary; $\text{position embeddings}(E_i)$ is derived from the position of each $E_i$, with a maximum supported sequence length of 512, which limits the sentence length; for $\text{segmentation embeddings}(E_i)$, sentence pairs are packed into one sequence so that several sentences can be input together: first, the sentences are separated by a special token; second, a learned sentence A embedding is added to each token of the first sentence and a sentence B embedding to each token of the second sentence. The multi-layer Transformer performs feature extraction on the input word vectors and outputs the context feature $T_i$ of each token; $C_i$ denotes the hidden state representing the classification category of the input sentence; $C_i$ is followed by a fully connected layer that extracts category features, and the probability of each category is computed by the Softmax layer, namely $P(C_i) = \mathrm{Softmax}(C_i W)$, where $W$ is the weight of the fully connected layer; the probability values of all categories are compared and the text is assigned to the category with the highest probability;
The model training parameters include: the learning rate $l_1$, the maximum text length max_length, the number of training epochs epochs, and the amount of training data per batch batch_size. A portion of $\Omega_2$, denoted $\Omega_3$, is extracted as training data and labeled by category; the model parameters are adjusted and the model is trained on the training set, and a test set $\Omega_4$ (a subset of $\Omega_3$) is used to test the stability and accuracy of the model in order to determine the optimal model parameters.
Step 4: inputting $\Omega_2$ into the trained model to label the category of each text in $\Omega_2$, and, for each abnormal text, marking the number of the user it belongs to as an abnormal user number.
Step 5: associating the user owner information table to obtain detailed information on the abnormal users, such as certificate numbers and addresses.
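The input embedding sum and the classification head described in step 3 can be illustrated with a few lines of dependency-free Python. This is a toy sketch, not the BERT implementation: the lookup tables are tiny hand-written lists rather than trained parameters, the Transformer layers between the embedding and the classification vector are omitted, and the names `embed`, `classify`, and `softmax` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def embed(
    token_ids: Sequence[int],
    segment_ids: Sequence[int],
    token_table: Dict[int, List[float]],
    segment_table: Dict[int, List[float]],
    position_table: List[List[float]],
) -> List[List[float]]:
    """V_i = token embedding + segment embedding + position embedding, per token."""
    vectors = []
    for i, (tid, sid) in enumerate(zip(token_ids, segment_ids)):
        vectors.append([
            t + s + p
            for t, s, p in zip(token_table[tid], segment_table[sid], position_table[i])
        ])
    return vectors


def classify(c: Sequence[float], w: Sequence[Sequence[float]]) -> List[float]:
    """P = Softmax(C W) for a hidden vector C (length H) and weights W (H x K)."""
    n_classes = len(w[0])
    logits = [sum(c[h] * w[h][k] for h in range(len(c))) for k in range(n_classes)]
    return softmax(logits)


if __name__ == "__main__":
    vectors = embed(
        token_ids=[0, 1],
        segment_ids=[0, 0],
        token_table={0: [1.0, 0.0], 1: [0.0, 1.0]},
        segment_table={0: [0.5, 0.5]},
        position_table=[[0.0, 0.0], [1.0, 1.0]],
    )
    print(vectors)
    probs = classify([1.0, 0.0], [[2.0, 0.0], [0.0, 0.0]])
    print(probs.index(max(probs)))  # index of the most probable category
```

In practice the hidden vector passed to `classify` would be the Transformer output for the first (classification) token rather than a hand-written list.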
2. Constructing the abnormal behavior user identification model based on social network analysis comprises the following steps:
Step 1: according to the short message data and communication data of the abnormal user, determining the service numbers of the abnormal user's first-degree and second-degree relation users, counting the number of communications between them, and establishing the input node relation set S1 ($P_I \to P_O \to num$), where $P_I$ is an input node, $P_O$ is the peer (output) node, and $num$ is the number of communications between the two node users; a first-degree relation user is a user who has communicated directly with the abnormal user, and a second-degree relation user is a user who has not communicated directly with the abnormal user but has communicated with one of the abnormal user's first-degree relation users;
Step 2: computing data labels for all users in S1, including owner names, social attributes, and the like;
Step 3: on the basis of S1, establishing the input node relation and node label table S2, which contains the input node service number, input node owner name, input node social attributes, output node service number, output node owner name, output node social attributes, and the number of communications;
Step 4: according to S2, drawing the relation network of the input nodes using a force-directed graph layout, with the number of communications between users as the edge weight between nodes; the attraction between nodes varies with the edge weight, so that the larger the edge weight, the stronger the attraction and the more tightly the nodes cluster; all input nodes are traversed to build the complete abnormal user social network, as shown in FIG. 4.
Step 5: constructing an output index to control the content displayed in the social network, with the following specific steps:
Step 5.1: collecting the peer nodes in S1 and removing the input nodes among them to form the peer node set S3;
Step 5.2: calculating the number of friends of each input node and of each peer node, where a user's number of friends is defined as the number of users who communicate with that user;
Step 5.3: removing from the peer node set S3 the nodes whose number of input-node friends is 1 to form the candidate node set S4;
Step 5.4: counting the number of input-node friends of every candidate node to form the candidate node statistics table S5, which contains the number of input-node friends, the number of candidate nodes, and the cumulative ratio, sorted by number of input-node friends in descending order;
Step 5.5: using the cumulative ratio as the output index for displaying the social network: when the output index is 0, only the input nodes are displayed, and when the output index is 1, the input nodes and all candidate nodes are displayed.
3. Based on the social network of the abnormal users, a key person analysis method is established to screen out other abnormal users from the network, characterized by comprising the following steps:
Step 1: K-shell value calculation
Calculating K-shell values for the network formed by all nodes and counting the K-shell values of all nodes to form an all-node statistics table, which contains the K-shell value, the number of nodes, and the cumulative ratio; the cumulative ratio is used as the display index (the core index): when the core index is 0, no node is displayed, and when the core index is 1, all nodes are displayed. The K-shell value is calculated as follows: first, all nodes of degree 1 are found and deleted; nodes of degree 1 among the remaining nodes are then found and deleted, and this is repeated until no node of degree 1 remains in the network. All nodes deleted in this way are assigned a K-shell value of 1, that is, KS = 1; nodes of higher degree are then processed in the same way.
Step 2: betweenness centrality calculation
Step 2.1: selecting a source node $s$, performing a breadth-first search, and calculating $\sigma_{st}$, where $\sigma_{st}$ denotes the number of shortest paths from $s$ to $t$;
Step 2.2: during the traversal, recording for each node the set of its predecessors; the predecessor set is defined as $P_s(v) = \{u \in V : \{u, v\} \in E,\ d(s, v) = d(s, u) + d(u, v)\}$, that is, if a shortest path from $s$ to $v$ contains the edge $(u, v)$, then $u$ belongs to the predecessor set of $v$, where $V$ is the set of all nodes of the network, $E$ is the set of edges (node pairs), and $d(s, v)$ is the length of the shortest path from node $s$ to node $v$;
Step 2.3: calculating the betweenness centrality $\delta_s(v) = \sum_{w : v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \left(1 + \delta_s(w)\right)$, where $v$ is a predecessor of $w$ on shortest paths from $s$. Betweenness centrality reflects a node's capacity to relay information, and the higher a node's betweenness centrality, the more important its position in the network.
Step 3: screening users with abnormal behavior: the K-shell value, the betweenness centrality value, the output index, and the node degree are adjusted to filter the social network, and a user whose node appears in the remaining network is identified as a user with abnormal behavior.
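The three sub-steps of the betweenness calculation correspond to the accumulation scheme commonly known as Brandes' algorithm, although the source does not name it. The sketch below assumes an unweighted, undirected graph given as an adjacency dictionary. Note that the recursion yields the dependency of a single source $s$ on $v$; the sketch sums it over all sources and halves the result because each unordered pair is visited from both ends. The function name `betweenness` is hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Set


def betweenness(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, float]:
    """Betweenness centrality of every node of an unweighted, undirected graph."""
    centrality = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording predecessor sets P_s(w).
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds: Dict[Hashable, List[Hashable]] = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in order of decreasing distance:
        # delta_s(v) = sum over w with v in P_s(w) of sigma_sv / sigma_sw * (1 + delta_s(w)).
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    # Each unordered pair (s, t) was counted once from each end.
    return {v: c / 2 for v, c in centrality.items()}


if __name__ == "__main__":
    star = {"h": {"x", "y", "z"}, "x": {"h"}, "y": {"h"}, "z": {"h"}}
    print(betweenness(star))
```

Nodes with a high value sit on many shortest paths, which is the property step 3 uses when filtering the network.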
4. The abnormal user identification flow comprises the following steps:
Step 1: inputting the user's mobile phone number, extracting and parsing all of the user's text data within a time range, and using the abnormal behavior lexicon $\omega_1$ and the expanded lexicon $\omega_2$ to screen out the suspected abnormal texts $\Omega_2$.
Step 2: determining the number of suspected abnormal texts in $\Omega_2$ and selecting the recognition model accordingly. If the number of texts in $\Omega_2$ is greater than zero, the abnormal user identification model based on text content is used for identification; otherwise, the abnormal user identification method based on social network analysis is used to judge whether the user exhibits abnormal behavior.

Claims (4)

1. A behavior abnormal user identification method based on user text data and communication data is characterized by comprising the following steps:
the first step: establishing user text data according to the mobile phone number of the user, and filtering the user text data by keywords and expansion words to obtain suspected abnormal text;
the second step: when the number of the suspected abnormal texts is greater than zero, constructing a user abnormal behavior recognition model based on the user text content;
the method for constructing the user abnormal behavior recognition model based on the user text content comprises the following steps:
step 1: establishing an abnormal behavior keyword lexicon $\Omega_1$ by combining various common abnormal behavior words, and using $\Omega_1$ to preliminarily screen out the suspected abnormal text $\omega_1$;
step 2: expanding words using the abnormal behavior keywords as root words, establishing an expanded lexicon $\Omega_2$ of words that contain a keyword but do not indicate abnormal behavior, and using $\Omega_2$ to filter out the texts in $\omega_1$ that contain words from $\Omega_2$, to obtain $\omega_2$;
step 3: constructing an abnormal text recognition model based on the BERT pre-trained text classification method, wherein the model structure consists of five layers, namely a text input layer, a word embedding layer, a multi-level Transformer layer, a fully connected layer and a Softmax layer; wherein the content $E_i$ of each input unit of the text input layer corresponds to the i-th token of each sentence sequence in $\omega_2$, the first token of each sequence is a special classification embedding, and the word embedding layer converts each $E_i$ into the corresponding word vector $V_i$;
the word vector $V_i$ is composed as follows:
$V_i = \mathrm{token\ embeddings}(E_i) + \mathrm{segmentation\ embeddings}(E_i) + \mathrm{position\ embeddings}(E_i)$;
wherein token embeddings($E_i$) performs the word vector conversion using WordPiece embeddings with a 30,000-token vocabulary; position embeddings($E_i$) is converted according to the position of each $E_i$ in the sentence, the supported sequence length being at most 512, which limits the sentence length; segmentation embeddings($E_i$) packs sentence pairs into one sequence so that multiple sentences can be input simultaneously, firstly separating the sentences with a special token, and secondly adding a learned sentence A embedding to each token of the first sentence and a sentence B embedding to each token of the second sentence; the multi-level Transformer extracts features from the input word vectors and outputs the contextual feature $T_i$ of each token; $C_i$ represents the hidden state for the classification category of the input sentence; $C_i$ is then connected to a fully connected layer to extract class features, and the probability of each class is calculated by the Softmax layer, namely $P(C_i) = \mathrm{Softmax}(C_i W)$, where $W$ is the weight of the fully connected layer; the probability values of all classes are compared, and the text is assigned to the class with the highest probability;
wherein: the learned sentence A refers to a sentence A among the learned sentences;
sentence B refers to a sentence B that has not been learned;
the model training parameters include: learning rate l1, maximum text length max_length, number of training epochs epochs, and amount of training data per batch batch_size; a portion $\omega_3$ of the data in $\omega_2$ is extracted as training data and labeled by class, the model parameters are adjusted, the model is trained using the training set, and the stability and accuracy of the model are tested on the test set $\omega_4$, a subset of $\omega_3$, to determine the optimal parameters of the model;
step 4: inputting $\omega_2$ into the trained model to label the class of each text in $\omega_2$, and, for each text classified as abnormal, marking the number of the corresponding user as an abnormal user number;
step 5: associating the user owner information table to obtain the certificate number and address of the abnormal user;
when the number of the suspected abnormal texts is not larger than zero, constructing an abnormal behavior user identification model of social network analysis;
the method for constructing the abnormal behavior user identification model of the social network analysis comprises the following steps:
step 1: according to the short message data and the communication data of the abnormal user, calculating the service numbers of the first-degree relation users and the second-degree relation users of the abnormal user, counting the number of communications between them, and establishing an input node relation set S1 ($P_I \to P_O \to num$), where $P_I$ is an input node, $P_O$ is the output node at the opposite end, and $num$ is the number of communications between the two node users; a first-degree relation user refers to a user who communicates directly with the abnormal user, and a second-degree relation user refers to a user who does not communicate directly with the abnormal user but communicates with a first-degree relation user of the abnormal user;
step 2: computing the node labels of the input node relation set S1 ($P_I \to P_O \to num$), including owner name and social attribute;
step 3: on the basis of the input node relation set S1 ($P_I \to P_O \to num$), establishing an input node relation and node label table S2, which comprises the input node service number, input node owner name, input node social attribute, output node service number, output node owner name, output node social attribute and number of communications;
step 4: according to the node label table S2, drawing the relation network of the input nodes with a force-directed graph layout, wherein the number of communications between users is used as the weight of the edges between nodes and the attraction between nodes varies with the edge weight, so that the larger the edge weight, the stronger the attraction between the nodes and the more closely the nodes gather; and traversing all input nodes to establish the whole abnormal user social network;
step 5: constructing an output index to control the display content of the social network;
the third step: judging whether the user behavior is abnormal by using the user abnormal behavior recognition model based on the user text content; if so, listing the user in a behavior-abnormal personnel information base according to the owner information; if not, proceeding to the abnormal behavior user identification model of social network analysis;
the fourth step: judging whether the user behavior is abnormal by using the abnormal behavior user identification model of social network analysis; if so, listing the user in the behavior-abnormal personnel information base according to the owner information.
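As an illustration of step 1 of the social network model in claim 1, the relation set S1 and the first-degree and second-degree relation users could be derived from raw communication records roughly as below. This is a sketch under stated assumptions: each record is a pair of service numbers, direction is ignored when counting, and the names `build_relation_set` and `relation_degrees` are hypothetical.

```python
from collections import Counter


def build_relation_set(records, input_numbers):
    """Build S1 as {(P_I, P_O): num} from (number_a, number_b) records."""
    inputs = set(input_numbers)
    counts = Counter()
    for a, b in records:
        # Count the communication from the point of view of each input node.
        if a in inputs:
            counts[(a, b)] += 1
        if b in inputs:
            counts[(b, a)] += 1
    return dict(counts)


def relation_degrees(records, seed):
    """Return (first-degree users, second-degree users) of `seed`."""
    neighbours = {}
    for a, b in records:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    first = set(neighbours.get(seed, ()))
    second = set()
    for u in first:
        second |= neighbours[u]
    # Second-degree users have no direct communication with the seed.
    second -= first | {seed}
    return first, second


# Placeholder records for illustration only.
records = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "D")]
s1 = build_relation_set(records, ["A"])
first, second = relation_degrees(records, "A")
```

Here B is a first-degree relation user of A and C is a second-degree relation user; D is beyond the second degree and is left out.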
2. The method of claim 1, wherein the step of constructing the output index to control the display content of the social network comprises:
step 1: collecting the opposite-end nodes of S1 and removing the input nodes among them to form an opposite-end node set S3;
step 2: calculating the number of friends of each input node and the number of friends of each opposite-end node, wherein the number of friends of a user is defined as the number of users communicating with the user;
step 3: removing from the opposite-end node set S3 the nodes whose number of input-node friends is 1, to form a candidate node set S4;
step 4: counting the number of input-node friends of all candidate nodes to form a candidate node statistical table S5, which comprises the number of input-node friends of the candidate nodes, the number of candidate nodes and the cumulative ratio, sorted by the number of input-node friends from large to small;
step 5: using the cumulative ratio as the specific output index for displaying the social network, wherein when the output index is 0, only the input nodes are displayed, and when the output index is 1, the input nodes and all candidate nodes are displayed.
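The output index of claim 2 can be sketched as follows. The sketch assumes S1 is held as a dictionary keyed by (input node, opposite-end node) pairs and reads "cumulative ratio" as the running share of candidate nodes when groups are sorted by input-node friend count from large to small; `candidate_table` and `nodes_to_display` are hypothetical names.

```python
from collections import Counter


def candidate_table(s1, input_nodes):
    """Return (S4 as {candidate: input-friend count}, S5 as table rows)."""
    inputs = set(input_nodes)
    friends = {}
    for p_in, p_out in s1:
        # S3: opposite-end nodes that are not themselves input nodes.
        if p_in in inputs and p_out not in inputs:
            friends.setdefault(p_out, set()).add(p_in)
    # S4: drop opposite-end nodes linked to only one input node.
    s4 = {p: len(f) for p, f in friends.items() if len(f) > 1}
    groups = Counter(s4.values())
    total = sum(groups.values())
    table, running = [], 0
    for k in sorted(groups, reverse=True):
        running += groups[k]
        # Row: (input-friend count, number of candidates, cumulative ratio)
        table.append((k, groups[k], running / total))
    return s4, table


def nodes_to_display(s4, table, input_nodes, output_index):
    """Input nodes plus candidates whose cumulative ratio <= output_index."""
    allowed = {k for k, _, ratio in table if ratio <= output_index}
    shown = set(input_nodes)
    shown |= {p for p, k in s4.items() if k in allowed}
    return shown


# Placeholder data for illustration only.
s1 = {("A", "x"): 3, ("B", "x"): 1, ("C", "x"): 2,
      ("A", "y"): 1, ("B", "y"): 1, ("A", "z"): 5}
s4, table = candidate_table(s1, ["A", "B", "C"])
```

With an output index of 0 only the input nodes A, B and C are shown; raising it to 0.5 adds the best-connected candidate x, and 1 adds every candidate.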
3. The method of claim 1, wherein the first step and the second step specifically comprise the steps of:
step 1: inputting the mobile phone number of the user, extracting and analyzing all text data of the user within a time range, and using the abnormal behavior lexicon $\Omega_1$ and the expanded lexicon $\Omega_2$ to screen out the suspected abnormal text $\omega_2$;
step 2: selecting the recognition model according to the number of suspected abnormal texts in $\omega_2$; if the number is greater than zero, selecting the abnormal user identification model based on the text content for identification; otherwise, selecting the abnormal user identification method based on social network analysis, and judging whether the user has abnormal behaviors.
4. The method of claim 1, wherein the fourth step specifically comprises the steps of:
step 1: K-shell value calculation
calculating K-shell values for the network formed by all the nodes, and counting the K-shell values of all the nodes to form an all-node statistical table, which comprises the K-shell value, the number of nodes and the cumulative ratio; the cumulative ratio is used as the specific display index (core index), wherein when the core index is 0, no node is displayed, and when the core index is 1, all the nodes are displayed;
the K-shell value calculation method comprises the following steps: firstly, finding all nodes of degree 1 and deleting them, then continuing to search the remaining nodes for nodes of degree 1 and deleting them, until no node of degree 1 remains in the network; all nodes deleted in this way are assigned a K-shell value of 1, namely KS = 1; the nodes of higher degree values are then processed by the same method;
step 2: betweenness centrality calculation
1) selecting a source node s to perform breadth-first search and calculating $\sigma_{st}$, where $\sigma_{st}$ represents the number of shortest paths from s to t;
2) during the traversal, recording for each node its set of predecessor nodes, the predecessor set being defined as $P_s(v) = \{u \in V : \{u, v\} \in E,\ d(s, v) = d(s, u) + d(u, v)\}$; that is, if a shortest path from s to v contains the edge (u, v), then u belongs to the predecessor set of v; where V is the set of all nodes of the network, E is the set of edges (node pairs), and d(s, v) is the shortest-path length from s to v (the minimum number of nodes passed from s to v);
d(s, u) represents the distance from s to u, and d(u, v) represents the distance from u to v;
3) calculating the betweenness centrality $\delta_s(v) = \sum_{w:\, v \in P_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \left(1 + \delta_s(w)\right)$, where v is a predecessor of w on a shortest path from s; the betweenness centrality reflects the information transmission capability of a node, and the higher the betweenness centrality of a node, the more important the position of the node in the network;
$\sum$ denotes summation, $P_s(w)$ represents the set of predecessor nodes of w on the shortest paths from s to w, $\sigma_{sv}$ represents the number of shortest paths from s to v, $\sigma_{sw}$ represents the number of shortest paths from s to w, and $\delta_s(v)$ represents the betweenness centrality of v;
step 3: screening users with abnormal behaviors by adjusting the K-shell value, the betweenness centrality value, the output index and the node degree to filter the social network, wherein when the user node still appears in the remaining network, the user is identified as a user with abnormal behaviors.
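The peeling procedure described for the K-shell value in claim 4 can be sketched as below. The sketch assumes an undirected graph stored as an adjacency dictionary and generalizes the degree-1 peeling to successive values of k; the function name `k_shell` is hypothetical, and assigning 0 to isolated nodes is a choice of this sketch that the claim does not address.

```python
def k_shell(adj):
    """Return {node: K-shell value} for an undirected graph.

    adj maps each node to an iterable of its neighbours (assumed symmetric).
    """
    neighbours = {v: set(adj[v]) - {v} for v in adj}  # ignore self-loops
    degree = {v: len(neighbours[v]) for v in adj}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1  # nothing left to delete at this level: try the next one
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already deleted through a duplicate entry
            remaining.discard(v)
            shell[v] = k
            # Deleting v lowers the degree of its surviving neighbours,
            # which may push them into the same shell.
            for w in neighbours[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] <= k:
                        peel.append(w)
    return shell
```

For a triangle a, b, c with one extra node d attached to a, the node d is peeled first with KS = 1, after which the triangle forms the KS = 2 shell.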

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011588924.XA CN112699217B (en) 2020-12-29 2020-12-29 Behavior abnormal user identification method based on user text data and communication data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011588924.XA CN112699217B (en) 2020-12-29 2020-12-29 Behavior abnormal user identification method based on user text data and communication data

Publications (2)

Publication Number Publication Date
CN112699217A CN112699217A (en) 2021-04-23
CN112699217B true CN112699217B (en) 2023-04-18

Family

ID=75513097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011588924.XA Active CN112699217B (en) 2020-12-29 2020-12-29 Behavior abnormal user identification method based on user text data and communication data

Country Status (1)

Country Link
CN (1) CN112699217B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9185095B1 (en) * 2012-03-20 2015-11-10 United Services Automobile Association (Usaa) Behavioral profiling method and system to authenticate a user
WO2019174393A1 (en) * 2018-03-14 2019-09-19 阿里巴巴集团控股有限公司 Graph structure model training and junk account identification
CN110995643A (en) * 2019-10-10 2020-04-10 中国人民解放军国防科技大学 Abnormal user identification method based on mail data analysis
CN111814064A (en) * 2020-06-24 2020-10-23 平安科技(深圳)有限公司 Abnormal user processing method and device based on Neo4j, computer equipment and medium
CN111915086A (en) * 2020-08-06 2020-11-10 上海连尚网络科技有限公司 Abnormal user prediction method and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Data Mining Approach for Anomaly Detection in Social networks Analysis;M.Swarna Sudha等;《IEEE Xplore》;20180927;第1862-1866页 *
社交网络异常用户检测技术研究;袁丽欣;《中国优秀博硕士学位论文全文数据库(硕士)社会科学Ⅰ辑》;20190915(第09期);第G113-291 页 *

Also Published As

Publication number Publication date
CN112699217A (en) 2021-04-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant