CN114491003A - User behavior analysis device, method and equipment based on domain knowledge graph - Google Patents

Info

Publication number
CN114491003A
Authority
CN
China
Prior art keywords
behavior
user
data
knowledge graph
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111624264.0A
Other languages
Chinese (zh)
Inventor
王晓林
周鹏飞
马亮
张新壮
于静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huichen Capital Information Co ltd
Original Assignee
Beijing Huichen Capital Information Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huichen Capital Information Co ltd
Priority to CN202111624264.0A
Publication of CN114491003A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, and in particular discloses a user behavior analysis device, method, and equipment based on a domain knowledge graph. The user behavior analysis device comprises: a knowledge graph construction unit, used for performing fusion analysis on the multi-account heterogeneous structured and unstructured data of a user to construct a knowledge graph; a behavior pattern library construction unit, used for analyzing massive historical case data and mining the common behavior patterns in the data of case-related personnel to construct a behavior pattern library; and a user anomaly analysis unit, used for applying the user data and the behavior pattern library data, extracting multi-modal features through analysis of behavior rules and behavior content, training a deep neural network model, and analyzing the probability that the user's behavior is abnormal in a specified time period. The constructed domain knowledge graph improves the efficiency of data coordination. By combining user data with historical case data and automatically extracting multi-modal features for comprehensive analysis, the results are more accurate and convincing, and the working efficiency of staff can be improved.

Description

User behavior analysis device, method and equipment based on domain knowledge graph
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a user behavior analysis device, a user behavior analysis method and user behavior analysis equipment based on a domain knowledge graph.
Background
The 21st century is an era of rapid development of the internet and communication technology. The boundaries and application range of the internet have greatly expanded in fields such as telecommunication, network communication, search and browsing, shopping, fund exchange, entertainment and audio-visual media, and travel and lodging, and data of all kinds are growing rapidly in volume. Against this background, intelligent terminal devices such as smartphones, smart watches, and tablet computers have penetrated into every corner of people's lives and support the many conscious or unconscious, repetitive, and trivial activities of daily life.
In recent years, electronic devices represented by smartphones have become essential tools for modern life and work, and illegal and criminal activities leave a large amount of process data on such devices. In the field of public security, user behavior analysis based on electronic evidence data, and especially user behavior anomaly analysis, can provide important clues for advancing a case. However, the relevant data are complex, traditional manual analysis consumes a large amount of labor, and its efficiency is very low.
As a tool for describing large-scale data associations, the knowledge graph has great application value in the analysis of personal relationships and behavior. Meanwhile, in the field of electronic evidence, mining massive historical case data can provide important guidance for judging user behavior anomalies. Developing a method and device that integrate electronic data with massive historical case data and apply artificial intelligence algorithms and knowledge graph technology to analyze and find user behavior anomalies quickly and automatically can greatly improve the efficiency with which electronic data are used together, improve analysis accuracy, and in turn improve the work efficiency of the relevant staff.
Disclosure of Invention
The invention aims to provide a user behavior analysis device, a user behavior analysis method and user behavior analysis equipment based on a domain knowledge graph, so as to solve the problems in the background technology.
In order to achieve the purpose, the invention provides the following technical scheme:
A user behavior analysis device based on a domain knowledge graph, the device comprising:
a knowledge graph construction unit, used for performing fusion analysis on the user's multi-device, multi-account heterogeneous structured and unstructured data, identifying person-account relationships and person-person relationship information, and constructing a knowledge graph; identifying the person-person relationship information comprises identifying personnel relationships based on session context information and identifying personnel relationships based on the information flow pattern; the knowledge graph comprises a social relationship knowledge graph and an application usage behavior knowledge graph;
a behavior pattern library construction unit, used for analyzing historical case data through machine learning and data mining algorithms, mining the common behavior patterns in the data of case-related personnel, sending the behavior patterns to an auditing terminal, and receiving the behavior pattern library fed back by the auditing terminal;
a behavior anomaly analysis unit, used for extracting multi-modal features from the knowledge graph at two levels, behavior rules and behavior content, based on the behavior pattern library, training an artificial intelligence algorithm, and analyzing the probability of user behavior anomaly;
a visualization unit, used for displaying the knowledge graphs constructed respectively from all of the user's data and from part of the data, together with the relevant features and results of the user behavior anomaly analysis unit;
and a storage unit, used for storing the user application usage behavior knowledge graph, the user social relationship knowledge graph, the behavior pattern library, and the features and results of the user behavior anomaly analysis.
1. Knowledge graph construction unit:
In this unit, the multiple terminal devices include electronic devices such as mobile phones, smart watches, tablet computers, and computers. Based on the multi-terminal behavior data, fields are mapped under a unified rule according to behavior content and field meaning, so that data from different sources are unified into structured data. The unified fields mainly comprise: device ID, behavior category (broad class), behavior subcategory (detail class), application name, behavior start time, behavior duration, user ID, contact account ID, contact ID, behavior content, message type, behavior geographic location, and whether the information was deleted.
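The field mapping described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `BehaviorRecord` field names are English renderings of the unified fields listed above, and the source names and mapping rules in `FIELD_MAPS` are hypothetical examples of what a per-source "unified rule" might look like.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class BehaviorRecord:
    """Unified structured record; one attribute per unified field."""
    device_id: Optional[str] = None
    behavior_category: Optional[str] = None      # broad class
    behavior_subcategory: Optional[str] = None   # detail class
    app_name: Optional[str] = None
    start_time: Optional[str] = None
    duration_seconds: Optional[float] = None
    user_id: Optional[str] = None
    contact_account_id: Optional[str] = None
    contact_id: Optional[str] = None
    content: Optional[str] = None
    message_type: Optional[str] = None
    location: Optional[str] = None
    deleted: Optional[bool] = None


# Hypothetical mapping rules: source field name -> unified field name.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "phone_sms": {
        "imei": "device_id",
        "sender": "user_id",
        "receiver": "contact_account_id",
        "body": "content",
        "sent_at": "start_time",
    },
}


def to_unified(source: str, raw: Dict[str, Any],
               defaults: Optional[Dict[str, Any]] = None) -> BehaviorRecord:
    """Map one raw record from a named source into the unified schema.

    Source fields without a mapping rule are dropped; `defaults` supplies
    values that are constant for the source (e.g. the behavior category).
    """
    mapping = FIELD_MAPS[source]
    values: Dict[str, Any] = dict(defaults or {})
    for src_field, value in raw.items():
        target = mapping.get(src_field)
        if target is not None:
            values[target] = value
    return BehaviorRecord(**values)
```

Unmapped unified fields stay `None`, so records from sources that lack, say, a geographic location still fit the same schema.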
Four fields in the structured data, namely user ID, behavior category, application name, and contact ID, are marked as nodes, and the entities in them are extracted. The behavior categories are the following 8 types: telecommunication, network communication, fund exchange, shopping, search and browsing, entertainment and audio-visual media, travel and lodging, and other behaviors.
Relationship extraction for constructing the application usage behavior knowledge graph means extracting (user, behavior type, application) triples from the data. Entity attributes are also extracted; the attributes of each application entity are: behavior start time, behavior duration, contact account ID, behavior content, message type, behavior geographic location, and whether the information was deleted.
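As an illustration of the triple extraction above, the sketch below groups unified records by (user, behavior type, application) and keeps the listed attributes with each occurrence. The dictionary keys (`user_id`, `behavior_category`, `app_name`, and the attribute names) are assumed English field names, not identifiers from the patent.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (user, behavior type, application)

# Attributes attached to each application entity (assumed key names).
ATTRIBUTE_FIELDS = (
    "start_time", "duration_seconds", "contact_account_id",
    "content", "message_type", "location", "deleted",
)


def extract_app_usage_triples(records: Iterable[dict]) -> Dict[Triple, List[dict]]:
    """Group records by triple; each occurrence keeps its attribute values."""
    graph: Dict[Triple, List[dict]] = defaultdict(list)
    for rec in records:
        triple = (rec["user_id"], rec["behavior_category"], rec["app_name"])
        graph[triple].append({k: rec.get(k) for k in ATTRIBUTE_FIELDS})
    return dict(graph)
```

The number of occurrences stored under each triple could then serve as an edge weight or node size when the graph is visualized.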
Personnel relationships are identified both from session context information and from the information flow pattern. Identification based on session context information means using natural language processing to analyze the identities of multiple persons relative to one another from the context of their conversations. Identification based on the information flow pattern means identifying superior-subordinate relationships among multiple persons from the way information is passed among them. Fusing the two identification methods makes the analysis result more accurate.
Relationship extraction for constructing the social relationship knowledge graph means extracting (user, contact) binary relations from the data. Entity attributes are also extracted; the attributes of each contact entity are: behavior start time, behavior duration, contact account ID, behavior content, message type, behavior geographic location, and whether the information was deleted.
Semantic analysis methods from natural language processing are applied to the account attributes and interaction content of contacts, so that the accounts belonging to the same friend in different applications are found and identified, achieving entity disambiguation. Symbols such as "()" and the explanatory content they enclose are removed from the application field, for example converting "WeChat (official version)" into "WeChat", achieving entity name normalization. The user application usage behavior knowledge graph and the user social relationship knowledge graph are then constructed.
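The entity name normalization step can be illustrated with a short sketch. It assumes that explanatory content appears in ASCII or full-width parentheses and does not handle nested brackets; the function name is hypothetical.

```python
import re

# Matches one parenthesised span, ASCII "()" or full-width "()".
_PAREN = re.compile(r"[((][^(())]*[))]")


def normalize_app_name(name: str) -> str:
    """Remove parenthesised explanatory content and tidy whitespace."""
    stripped = _PAREN.sub(" ", name)
    return " ".join(stripped.split())
```

For example, `normalize_app_name("WeChat (official version)")` returns `"WeChat"`, so records from both spellings map to the same application node.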
2. Behavior pattern library construction unit:
Massive historical case data are analyzed through methods such as machine learning, data mining algorithms, and expert summarization to construct a behavior pattern library. The library contains the multi-stage behavior patterns of case-related personnel, such as planning and preparation before the case, methods of implementation during the case, and means of concealment after the case. Specifically:
By analyzing a large amount of real historical case data, the following are extracted to construct the behavior pattern library: case characteristics, such as case type and case severity; attributes of case-related personnel, such as sex, education level, and first or repeat offense; and behavior patterns in the data of case-related personnel, such as the common behavior patterns at each stage before, during, and after a case.
3. User behavior anomaly analysis unit:
The unit comprises a user behavior anomaly analysis method based on artificial intelligence. By analyzing the user application usage behavior knowledge graph, the user social relationship knowledge graph, and the historical behavior pattern library, multi-modal features are extracted at two levels, behavior rules and behavior content; a deep neural network model is trained; and the probability of user behavior anomaly is analyzed. The features for behavior rules and behavior content are as follows:
1) Behavior rules:
A machine learning algorithm is applied to judge whether the application usage behavior pattern on the user-specified date is abnormal compared with the historical daily application usage behavior pattern. The 20 friends in closest contact with the user are determined, and a machine learning algorithm is applied to judge whether the user's contact with each of these friends in the specified analysis time period is abnormal compared with history. The proportion of overlap between the user's behavior pattern in the specified time period and the behavior patterns of similar cases in the behavior pattern library is determined through comparative analysis. The analysis results of these three parts are integrated to obtain the learned feature vector of the user's behavior rules.
2) Behavior content:
A deep learning algorithm is applied to the text data and voice data in the user's active behavior data in the specified time period, and the distribution of emotional states in the data is analyzed to obtain an emotion feature vector.
4. Visualization unit, displaying:
1) the complete knowledge graph constructed from all of the user's data and the knowledge graph constructed from the behavior data in the specified time period, which together show visually whether the user's overall behavior rules differ from the behavior rules in the specified time period;
2) the multi-modal features extracted in the user behavior anomaly analysis, comprising the behavior rule features and the emotion feature vector obtained by analyzing the behavior content;
3) the user behavior anomaly analysis result, comprising the time period specified for analysis and the corresponding anomaly probability.
Furthermore, the user behavior anomaly analysis unit provides a user behavior analysis method based on the domain knowledge graph in which multi-modal features are extracted from two angles, behavior rules and behavior content, and an optimal model is trained to perform user behavior anomaly analysis, which can greatly improve analysis efficiency and the accuracy of the analysis result.
The behavior rule feature extraction strategy obtains the behavior rule features of the specified time period by analyzing the user's historical behavior data and the behavior pattern library data. The behavior content feature extraction strategy analyzes the emotional states contained in behavior content by learning from the text and voice data in behaviors such as publishing, sending, and searching during the user's social and entertainment use of the device.
The optimal model is trained and updated by periodically learning from the feature data of different users in the field, so that the generalization ability and analysis accuracy of the model improve continuously.
5. Storage unit, used for storing:
1) the data of the user social relationship knowledge graph and the user application usage behavior knowledge graph constructed by the knowledge graph construction unit;
2) the behavior pattern data constructed from historical cases by the behavior pattern library construction unit;
3) the strategies and results of multi-modal feature extraction and analysis in the user behavior anomaly analysis unit, specifically: the multi-modal feature extraction model, the user behavior anomaly analysis model, and the multi-modal features and anomaly analysis result corresponding to each analysis task.
The technical scheme of the invention also provides a user behavior analysis method based on the domain knowledge graph, comprising the following steps:
Step 1: Acquiring data;
1) the information data of each application node in the user application usage behavior knowledge graph, specifically the data related to the user's active behaviors;
2) the information data of the relevant contact nodes (by default the 20 closest contacts) in the user social relationship knowledge graph;
3) the behavior data in the user's device data for one month before and after the specified time, taking the specified time as the center.
According to how a behavior is generated, the data can be divided into active behavior data and passive behavior data. Active behavior refers to the user actively contacting others through the device, as well as uses of the device other than communication, such as sending, publishing, browsing, and searching. Analysis based on active behavior data is more accurate, because the behavior of people other than the user and the information pushed by third-party applications are uncertain and would reduce the accuracy of the analysis.
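The data selection in Step 1 can be sketched as below. Several points are assumptions made only for illustration: "one month" is approximated as 30 days, "closest" contacts are ranked by interaction count, and each record carries a hypothetical boolean `is_active` flag plus `start_time` (a `datetime`) and `contact_id` keys.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List


def window_records(records: Iterable[dict], center: datetime,
                   days: int = 30) -> List[dict]:
    """Keep records within `days` before and after the specified time."""
    lo, hi = center - timedelta(days=days), center + timedelta(days=days)
    return [r for r in records if lo <= r["start_time"] <= hi]


def active_only(records: Iterable[dict]) -> List[dict]:
    """Keep behaviors initiated by the user (assumed `is_active` flag)."""
    return [r for r in records if r.get("is_active")]


def closest_contacts(records: Iterable[dict], top_n: int = 20) -> List[str]:
    """Rank contacts by number of interactions and return the top `top_n`."""
    counts = Counter(r["contact_id"] for r in records if r.get("contact_id"))
    return [contact for contact, _ in counts.most_common(top_n)]
```

Other closeness measures, such as total contact duration, would fit the same interface; the text does not say which one is used.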
Step 2: Multi-modal feature extraction strategy;
Using the acquired behavior data, strategies are learned at two levels, behavior rules and behavior content, and a model is constructed for each. For behavior rules, multiple behavior features are learned from the data and an algorithm is applied to analyze them. For behavior content, the emotional features in the text and voice data are learned and an algorithm is applied to classify emotion. The specific steps are as follows:
1) Behavior rules:
a) To analyze the difference between the device usage behavior pattern in the user-specified time period and that in the corresponding historical time periods, the data acquired in Step 1 are used. First, features such as the total number of behaviors, the average interval between behaviors, the number of active behaviors of each behavior type, and the date are extracted for the specified time period and for the other corresponding time periods. Two unsupervised anomaly detection algorithms, Isolation Forest and One-Class SVM, are then applied to judge whether the behavior pattern in the specified time period is abnormal; the result is part of the learned behavior rule features. For example, if the specified time period is 2020/10/20, the other corresponding time periods are the remaining days.
b) To analyze the difference between the user's social behavior in the specified time period and in the corresponding historical time periods, the data acquired in item 2) of Step 1 are used. For the relationship between the user and a given contact, features are extracted for a given period (by default one week) and for each historical week, such as the number of times the user contacted the contact, the number of times the contact contacted the user, the average interval between contacts, and the duration of each contact. The Isolation Forest algorithm is applied to judge whether the user's social behavior pattern with that contact in the specified time period is abnormal. The same analysis is performed for the other contacts, and the analysis results are part of the learned behavior rule features.
c) The user's behavior data in the specified time period (the data acquired in item 3) of Step 1) are analyzed to determine the proportion of overlap between the user's behavior pattern and the behavior patterns of similar cases in the behavior pattern library; the result is part of the learned behavior rule features.
The features obtained from parts a), b), and c) are integrated to form the behavior rule feature vector used in the user anomaly analysis.
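The behavior rule features of parts a) to c) can be sketched as follows. This is an illustration under stated assumptions, not the method as claimed: the anomaly detectors themselves (Isolation Forest, One-Class SVM) are not implemented here and their outputs are passed in as plain values; the overlap proportion is assumed to be the fraction of a library pattern's behavior labels that also appear in the user's data; and the way the three parts are combined into one vector is hypothetical, since the text does not specify it.

```python
from collections import Counter
from datetime import datetime
from typing import List, Sequence, Set

# The 8 behavior categories named in the text (English renderings).
CATEGORIES = (
    "telecommunication", "network communication", "fund exchange",
    "shopping", "search and browsing", "entertainment",
    "travel and lodging", "other",
)


def usage_features(records: Sequence[dict]) -> List[float]:
    """Part a) input features for one time period (e.g. one day):
    total count, mean gap in seconds, per-category counts, weekday."""
    times = sorted(r["start_time"] for r in records)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    per_category = Counter(r["behavior_category"] for r in records)
    weekday = float(times[0].weekday()) if times else -1.0
    return [float(len(times)), mean_gap,
            *[float(per_category.get(c, 0)) for c in CATEGORIES], weekday]


def overlap_ratio(observed: Set[str], pattern: Set[str]) -> float:
    """Part c): share of a library pattern's labels seen in the user's data."""
    if not pattern:
        return 0.0
    return len(observed & pattern) / len(pattern)


def behavior_rule_vector(usage_abnormal: bool, abnormal_contacts: int,
                         total_contacts: int, overlap: float) -> List[float]:
    """Combine the results of parts a), b), c) (hypothetical layout)."""
    contact_ratio = abnormal_contacts / total_contacts if total_contacts else 0.0
    return [float(usage_abnormal), contact_ratio, overlap]
```

In practice the vectors from `usage_features` for the specified period and for each historical period would be passed to the anomaly detectors, and `usage_abnormal` would be their verdict for the specified period.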
2) Behavior content:
Behavior content analysis covers the content searched for and viewed, the content of messages sent, and other content published by the device holder. Because such content may be related to the holder's living or upbringing environment, working habits, and hobbies, it is highly individual, and keyword matching is not accurate enough. The emotion expressed by the content, however, is more general, and negative emotions such as anger and sadness are a better indicator of anomaly. For the voice and text data acquired in Step 1, an emotion analysis model is constructed through an artificial intelligence algorithm, and the emotion classification result is taken as the feature vector representing the behavior content.
The specific steps are as follows:
a) For text data and text converted from voice, the general-purpose language model BERT, pre-trained on a large-scale corpus, is fine-tuned on labeled in-domain corpora. The trained model is applied to analyze the emotion of the text and the voice-converted text, and the resulting emotion vector is the learned text data feature.
b) For voice data, a model pre-trained on publicly available labeled emotion data is fine-tuned with labeled in-domain voice data. The trained model is applied to analyze the emotion of the voice data, and the resulting emotion vector is the learned voice data feature.
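One plausible way to turn per-message emotion predictions into the emotion feature vector is a normalized label distribution, sketched below. The classifier itself (the fine-tuned text or voice model) is outside this sketch; the label set `EMOTIONS` and the choice of a distribution as the vector are assumptions, since the text names only anger and sadness as examples.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical emotion label set; only anger and sadness appear in the text.
EMOTIONS = ("anger", "sadness", "neutral", "joy")


def emotion_vector(labels: Sequence[str]) -> List[float]:
    """Share of messages per emotion label, in the order of EMOTIONS."""
    counts = Counter(labels)
    total = sum(counts[e] for e in EMOTIONS)
    if total == 0:
        return [0.0] * len(EMOTIONS)
    return [counts[e] / total for e in EMOTIONS]
```

Applying the same function separately to text predictions and voice predictions gives two vectors that can be concatenated with the behavior rule features.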
The feature extraction method that combines behavior rules and behavior content constitutes the strategy for extracting multi-modal features.
Step 3: Training a deep neural network model;
For the acquired raw data of multiple users and their devices, a user application usage behavior knowledge graph and a user social relationship knowledge graph are constructed for each user and stored in the storage unit. Time periods in which a user is known to have clearly behaved abnormally and time periods in which the user is known not to have behaved abnormally are set. For each, Steps 1 and 2 of the method are executed: the multi-modal feature extraction strategy learned in Step 2 is applied to the data acquired in Step 1, and feature vector sets are constructed.
A deep neural network (DNN) is trained with the obtained feature sets and the corresponding labels to obtain a general model for analyzing user behavior anomaly.
Step 4: Analyzing the abnormal behavior of a new user;
For the data from the multiple devices of a new user to be analyzed, structured data are constructed, and from them the user application usage behavior knowledge graph and the user social relationship knowledge graph. Steps 1 and 2 of the method are executed: the multi-modal feature extraction strategy learned in Step 2 is applied to the data acquired in Step 1, and a feature vector set is constructed. The DNN model trained in Step 3 is loaded and used to analyze the probability of abnormal behavior in the time period set by the user.
In addition, the application provides user behavior analysis equipment based on the domain knowledge graph, comprising:
one or more processors; a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to execute the user behavior analysis method based on the domain knowledge graph.
Compared with the prior art, the invention has the following beneficial effects:
1) In the knowledge graph construction unit, multi-device heterogeneous behavior data are processed into unified structured data, contact accounts are linked through knowledge extraction and semantic analysis to achieve knowledge fusion, and a user application usage behavior knowledge graph and a user social relationship knowledge graph are then constructed. This resolves the isolation of data about the same contact across applications and across devices, displays the user's behavior characteristics more intuitively from the perspectives of social relationships and entertainment behavior, and improves the efficiency of data coordination and data use.
2) In the user behavior anomaly analysis unit, a method based on the knowledge graph and the behavior pattern library is provided. From two aspects, behavior rules and behavior content, a strategy for extracting multi-modal features is learned through artificial intelligence algorithms, and the probability of behavior anomaly in the time period specified by the user is then analyzed. Multiple artificial intelligence algorithms targeting different scenarios are applied, which adds angles of analysis, makes the result more accurate and comprehensive, makes fuller use of the data, improves analysis efficiency, and provides staff with a substantive reference.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention.
FIG. 1 is a flow chart of the overall apparatus.
FIG. 2 is a flow diagram of domain knowledge graph construction.
FIG. 3 is a flow chart of user behavior anomaly analysis.
FIG. 4 is a user application usage behavior knowledge graph.
Detailed Description
The technical scheme of the disclosure is clearly and completely described in the following with reference to the accompanying drawings. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not intended to be limiting of the present application.
The following explains some of the terms and keywords used in this application.
Telecommunication: communication services provided by telecommunication operators, specifically telephone calls, short messages, and multimedia messages, which work without a network connection.
Network communication: communication carried out over a network connection, commonly through applications such as WeChat, QQ, e-mail, and microblogs.
Fund exchange: records of fund transactions made through mobile banking apps, Alipay, WeChat, and other applications capable of fund transactions.
Shopping: purchases made through shopping platforms, commonly Taobao, Jingdong, Meituan, Shuduo, and the like.
Search and browsing: searching for and browsing information through browsers, and searching and browsing within shopping applications.
Entertainment and audio-visual media: behaviors related to using video or music software.
Travel and lodging: behaviors related to booking travel and accommodation.
Other: other behavior records generated by using the device, such as taking photos, memos, and GPS positioning.
Example 1:
This example focuses on the details of constructing the knowledge graph and the behavior pattern library. In particular, the details of the knowledge graph construction unit are explained with reference to FIG. 2. FIG. 4 shows a local visualization of a user application usage behavior knowledge graph, in which the size of each entity is proportional to its total number of behaviors.
First, constructing the knowledge graph
The implementation is divided into 3 steps: constructing structured data from the multi-device user data, performing knowledge acquisition and fusion, and constructing the knowledge graph. The specific steps are as follows:
(1) Constructing structured data
The application behavior data of the multiple terminal devices are the usage behavior data of relatively private mobile terminals such as mobile phones, smart watches, and tablet computers. The analysis in this application presumes that the devices in question are used by the device holder.
Based on the multi-device behavior data, fields are mapped under a unified rule according to behavior content and field meaning, so that data from different sources are unified into structured data. The unified fields mainly comprise: device ID, behavior category (broad class), behavior subcategory (detail class), application name, behavior start time, behavior duration, user ID, contact account ID, contact ID, behavior content, message type, behavior geographic location, and whether the information was deleted.
(2) Knowledge acquisition and fusion
The four fields user ID, behavior category, application name and contact ID in the structured data are marked as nodes, and the entities in them are extracted. The behavior categories are specifically the following 8 types: telecommunication, network communication, fund transactions, shopping and consumption, search and browsing, audio-visual entertainment, travel and accommodation, and other behaviors.
Relation extraction for constructing the application usage behavior knowledge graph refers to extracting (user, behavior category, application) triples from the data. Entity attributes are also extracted; the attributes of each application entity are: behavior start time, behavior duration, contact account ID, behavior content, message type, behavior geographic location, and whether the information was deleted.
Relation extraction for constructing the social relationship knowledge graph refers to extracting (user, contact) pairs from the data. Entity attributes are also extracted; the attributes of each contact entity are: behavior start time, behavior duration, contact account ID, behavior content, message type, behavior geographic location, and whether the information was deleted.
A natural language processing model is trained to analyze the content of communications between persons and to identify possible identity relationships with their corresponding probability values. Information flow pattern analysis is performed on the data exchanged among multiple persons to identify how information is transmitted among them and thereby determine the superior-subordinate relationships among them. The selectable information flow patterns include top-down tree-shaped information diffusion and bottom-up tree-shaped information collection.
A semantic analysis method from natural language processing is applied to analyze the account attributes and interaction content of contacts, so as to find and identify the accounts belonging to the same friend in different applications, thereby achieving entity disambiguation. Explanatory content marked by symbols such as "()" in the application name field is removed, for example converting "WeChat (official edition)" to "WeChat", thereby achieving entity name normalization. The user application usage behavior knowledge graph and the user social relationship knowledge graph are then constructed.
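The entity name normalization described above can be sketched with a regular expression. The sketch assumes that the explanatory content always appears as a trailing parenthesised note (ASCII or full-width brackets); the function name `normalize_app_name` is hypothetical.

```python
import re

# Matches one trailing parenthesised note, with ASCII "( )" or full-width brackets.
_TRAILING_NOTE = re.compile(r"\s*[(（][^()（）]*[)）]\s*$")


def normalize_app_name(name):
    """Strip trailing parenthesised notes until none remain."""
    cleaned = name.strip()
    previous = None
    while cleaned != previous:
        previous = cleaned
        cleaned = _TRAILING_NOTE.sub("", cleaned).strip()
    return cleaned
```

With this sketch, `normalize_app_name("WeChat (official edition)")` yields `"WeChat"`, and names without a note are returned unchanged.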
(3) Building knowledge graph
Based on the results of knowledge extraction and fusion in step (2), the relationship strength between entities is calculated, and the user application usage behavior knowledge graph and the user social relationship knowledge graph are then constructed. The relationship strength is calculated as follows:
a) For the entities used to construct the user application usage behavior knowledge graph, the relationship strength between a behavior category and each application is the average daily number of behaviors of that APP within the category; the relationship strength between the user and each behavior category is the average daily number of behaviors of that category. Taking telecommunication as an example:
User -> Telecommunication:
Figure RE-GDA0003580658460000131
Telecommunication -> Telephone:
Figure RE-GDA0003580658460000132
where i is the number of days spanned by the user's behavior, and j is the number of applications contained in the current behavior category;
X_i is the total number of telephone behaviors on day i, including dialing, answering, rejecting and missing calls;
X_ij is the number of behaviors of the j-th application on day i.
b) For the entities used to construct the user social relationship knowledge graph, the relationship strength between the user and a contact is divided into in-degree and out-degree, because relationships between people are bidirectional. The out-degree between the user and a contact is the total number of behaviors in which the user actively contacts that contact, and the in-degree between the user and a contact is the total number of behaviors in which that contact contacts the user. Correspondingly, the in-degree and out-degree recorded for each contact are the in-degree and out-degree between the user and that contact.
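The two kinds of relationship strength in a) and b) can be sketched as follows. This is a sketch under stated assumptions rather than the formulas of the figures, which are not reproduced in this text: "days spanned" is taken to mean the calendar days from the first to the last event inclusive, and the names `relation_strengths` and `contact_degrees` and the event layout are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def relation_strengths(events):
    """events: list of (day, behavior_category, application) tuples."""
    if not events:
        return {}, {}
    days = [day for day, _, _ in events]
    span = (max(days) - min(days)).days + 1  # assumed meaning of "days spanned"
    by_category = Counter(category for _, category, _ in events)
    by_app = Counter((category, app) for _, category, app in events)
    # Average daily behavior counts serve as edge weights.
    user_to_category = {c: n / span for c, n in by_category.items()}
    category_to_app = {key: n / span for key, n in by_app.items()}
    return user_to_category, category_to_app


def contact_degrees(interactions, user_id):
    """interactions: iterable of (initiator_id, receiver_id) pairs."""
    degrees = defaultdict(lambda: {"out": 0, "in": 0})
    for initiator, receiver in interactions:
        if initiator == user_id and receiver != user_id:
            degrees[receiver]["out"] += 1   # user actively contacts the contact
        elif receiver == user_id and initiator != user_id:
            degrees[initiator]["in"] += 1   # contact contacts the user
    return dict(degrees)


events = [
    (date(2020, 10, 20), "telecommunication", "Telephone"),
    (date(2020, 10, 20), "telecommunication", "Telephone"),
    (date(2020, 10, 20), "telecommunication", "SMS"),
    (date(2020, 10, 21), "telecommunication", "Telephone"),
]
user_to_category, category_to_app = relation_strengths(events)
```

Interactions that do not involve the user are ignored, since only the edges between the user and each contact carry a weight in the social relationship graph.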
Second, constructing the behavior pattern library
A large volume of historical case data is analyzed by methods such as machine learning, data mining algorithms and expert summarization to construct the behavior pattern library, which contains multi-stage behavior patterns such as the planning and preparation of the persons involved before a case, the way the plan is carried out during the case, and the manner of concealment after the case.
Specifically, association rule and similarity calculation methods are applied:
1) applying association rule mining algorithms oriented to various case types to mine the behavior patterns of the persons involved before, during and after a case;
2) applying similarity calculation and clustering algorithms to analyze the behavior pattern sets of various types of persons;
3) applying semantic analysis to extract case features and personal attribute information from a large amount of real case information, optionally associating case features such as case type and case severity with personal attributes such as gender, education level and first-time/repeat offender status, and combining them with the behavior pattern sets obtained in step 2) to construct the behavior pattern library.
Example 2:
In this embodiment, the strategy for extracting multi-modal features and the method for analyzing user behavior anomaly according to the present invention are described with reference to FIG. 3. Multi-modal features are extracted by training several sub-models, and a deep neural network is then trained to analyze the probability that the user's behavior is abnormal. Specifically, features of the behavior rules and of the behavior content are extracted for analysis.
The module is divided into three main parts, as shown in FIG. 3: data acquisition, multi-modal feature extraction, and deep neural network model training and prediction.
1) Data acquisition:
a) acquiring the attribute data of each application node in the user application usage behavior knowledge graph, specifically the user's active behavior data;
b) acquiring the node attribute data of the related contacts (by default the 20 closest) in the user social relationship knowledge graph;
c) acquiring, from the user behavior data, the behavior data for one month before and after the specified time, taken as the center.
According to how a behavior is generated, the data can be divided into active behavior data and passive behavior data. Active behavior specifically refers to behavior in which the user actively contacts others through the device, and device usage behavior other than communication, such as sending, publishing, browsing and searching. Analysis based on active behavior data is more accurate, because the behavior of persons other than the user and the information pushed by third-party applications are uncertain and would reduce the accuracy of the analysis.
2) Feature extraction: multi-modal features are extracted from two aspects, behavior rules and behavior content, each modeled separately, and the results of the several models are integrated as the feature vector for analyzing the probability of user behavior anomaly.
Behavior rule analysis models and compares the user's behavior pattern in the specified time period with the user's historical behavior pattern, compares the interaction patterns between the user and the 20 closest contacts in the specified time period with the historical interaction patterns, and computes the proportion of overlap between the behavior pattern of the specified time period and the behavior patterns of similar cases in the behavior pattern library; together these yield the feature vector representing the behavior rules.
Behavior content analysis covers the content searched and viewed in the data, the content of messages sent, and other content published by the device holder. Because such content may be related to the holder's living or upbringing environment, working habits and hobbies, it is highly personalized, and keyword matching is not accurate enough. The emotion expressed by the content is more general, and negative emotions such as anger and sadness better reflect abnormality. Features are therefore extracted from the text data and voice data to build models for emotion analysis, and the emotion classification results are used as the feature vector representing the behavior content. Because most emotions attached to behavior content are persistent, the data range is set by default to one week before and after the specified time period, taken as the center, and can be customized for different situations.
The specific steps are as follows:
Step 1: extracting the behavior rule features of the device holder.
a) To analyze the difference between the device usage behavior pattern in the user's specified time period and that in the corresponding historical time periods, the relevant features of the device holder in the specified time period and in the historical behavior are extracted from the data in step (1)(a) of this embodiment, and unsupervised anomaly detection algorithms are applied to determine whether the behavior pattern in the specified time period is abnormal.
In particular, different people use devices with different preferences and characteristics. Within the same time period, device usage differs greatly between people because of occupation, habits, identity and so on. It is therefore more accurate to judge whether device usage in a given period is abnormal against the individual's own historical behavior data.
Based on the data in step (1)(a) of this embodiment, the total number of behaviors, the average interval between behaviors, the number of active behaviors of each behavior category, and the date features are extracted for the user's specified time period and for the other corresponding time periods. Two unsupervised anomaly detection algorithms, Isolation Forest and One-Class SVM, are applied to determine whether the behavior pattern in the specified time period is abnormal, and the result forms part of the learned behavior rule features. The specified time period is, for example, 2020/10/20; the other corresponding time periods are, for example, the remaining days.
Further, the calculation of the average interval duration of the active behavior follows the following formula.
Figure RE-GDA0003580658460000171
where:
S_{j+1} is the start time of the (j+1)-th active behavior;
E_j is the end time of the j-th active behavior;
n is the number of active behaviors within the specified time period.
Based on the extracted features, the two anomaly detection methods Isolation Forest and One-Class SVM are used for analysis, and the model outputs are used as part of the features representing the behavior rules. Both are unsupervised anomaly detection methods. Because each person's behavior has individual characteristics that differ from those of the group, labeling each person's historical behavior data would be unrealistic and inaccurate. Unsupervised anomaly detection requires no prior labeling. The two algorithms rest on different assumptions: the former assumes that the historical behavior data contain a small number of anomalies, while the latter assumes that they contain none, so combining the two better fits the application scenario of this embodiment.
The anomaly detection algorithms are applied to determine whether the behavior pattern of the specified time period is abnormal: -1 is returned if it is abnormal, and 1 otherwise. The result forms part of the features representing the behavior rules.
b) Based on the data in step (1)(b), to analyze the difference between social behavior in the user's specified time period and in the corresponding historical time periods, and specifically the relationship between the user and a given contact, the following are extracted for the specified time period (by default one week) and for each historical week: the number of times the user contacted the contact, the number of times the contact contacted the user, the average interval between contacts, and the duration of each contact. As in step (a), the Isolation Forest algorithm is applied to determine whether the social behavior between the user and the current contact is abnormal in the specified time period (by default one week). According to the theory proposed by Dunbar, each person has on average 5 closest intimates and 15 close friends; the 20 friends with whom the user is in closest contact are therefore analyzed in turn, and all the results form part of the learned behavior rule features.
c) Based on the data in step (1)(c) and the historical behavior pattern library, the user's behavior patterns are derived from the user behavior data, and the proportion of overlap between them and all behavior patterns of similar cases in the behavior pattern library is calculated; this proportion forms part of the learned behavior rule features.
d) The analysis results of (a), (b) and (c) are integrated to form the learned user behavior rule feature vector.
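Two of the Step 1 quantities, the average interval between active behaviors in (a) and the overlap proportion in (c), can be sketched in a few lines. Both functions are assumptions made for illustration: the formula figure is not reproduced in this text, so the interval is taken here as the mean of the n - 1 gaps S_{j+1} - E_j, and the overlap is taken as the share of library patterns that also occur for the user. The function names and the representation of a pattern as a tuple are hypothetical.

```python
def average_interval(behaviors):
    """behaviors: list of (start, end) times in seconds, sorted by start time.

    Assumed reading of the formula: the mean of the gaps S_{j+1} - E_j
    over the n - 1 consecutive pairs of active behaviors.
    """
    n = len(behaviors)
    if n < 2:
        return 0.0
    gaps = [behaviors[j + 1][0] - behaviors[j][1] for j in range(n - 1)]
    return sum(gaps) / (n - 1)


def overlap_proportion(user_patterns, library_patterns):
    """Share of the library patterns that also appear among the user's patterns."""
    user = set(user_patterns)
    library = set(library_patterns)
    if not library:
        return 0.0
    return len(user & library) / len(library)
```

Dividing by the number of library patterns is a design choice for this sketch; dividing by the number of user patterns would be an equally plausible reading of "coincidence proportion".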
Step 2: feature extraction for text data.
The analysis of text data covers two parts: data that is originally text, and text converted from voice. In this application, the corresponding emotional features are obtained by performing emotion analysis on the text.
The text data includes the text in chat interactions, the text generated by searches, and other text published by the device holder, such as friend-circle posts and blogs. The emotion implicit in the data is more general and more interpretable than the subject matter of the content. For example, if the emotional polarity of every sentence of all text data within 6 hours is analyzed and the proportion of negative sentences is high (for example 80%), the person's state during that period may be problematic.
For text emotion analysis, BERT, a general language model pre-trained on large-scale corpora, is adopted. The emotional polarity of relevant text data is labeled for this purpose and classified into 3 classes: positive, negative and neutral. The labeled corpus is used to fine-tune the pre-trained BERT model, yielding an emotion classifier for text data in this domain.
The emotion classifier is used to process the current text data, and the proportions of positive, negative and neutral data among all current text data are calculated as the emotional features of the text data in the behavior content, for the final analysis of user behavior anomaly.
Specifically, in this embodiment, the ratio of the sentences of each of the 3 emotion classes to all analyzed sentences is calculated as the feature vector. The calculation formulas are as follows:
Figure RE-GDA0003580658460000191
Figure RE-GDA0003580658460000192
Figure RE-GDA0003580658460000193
where n is the number of sentences in the text data;
Pos_i indicates whether the emotional polarity of the i-th sentence is positive, Neg_i indicates whether it is negative, and Neu_i indicates whether it is neutral.
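The three ratios can be sketched directly from the definitions above: each is the count of sentences with a given polarity divided by n. The sketch assumes the classifier output is already available as one label per sentence; the label strings and the function name `emotion_ratios` are hypothetical.

```python
from collections import Counter

# Fixed label order so that the feature vector layout is stable.
LABELS = ("positive", "negative", "neutral")


def emotion_ratios(sentence_labels):
    """Return [positive, negative, neutral] shares over all analyzed sentences."""
    n = len(sentence_labels)
    if n == 0:
        return [0.0, 0.0, 0.0]
    counts = Counter(sentence_labels)
    return [counts[label] / n for label in LABELS]
```

For five sentences of which four are negative and one is positive, the sketch returns a negative share of 0.8, matching the 80% example given earlier.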
Step 3: feature extraction for voice data.
Emotion analysis of voice data mainly refers to emotion classification of the voice data generated by sending voice messages in chats and of other voice data stored on the device.
For voice emotion analysis, publicly available speech emotion datasets and some labeled data from this domain are used; features are extracted for training, and the emotion of each voice segment is analyzed. Specifically, a CNN is used to extract features from the voice data and an LSTM is trained on them, yielding the final emotion analyzer.
Specifically, the emotion of voice data is classified into 6 categories: sadness, happiness, fear, anger, disgust and surprise. Similarly, in this embodiment, the proportion of each emotion among all current voice data is calculated as the emotion feature vector of the voice data, for analyzing user behavior anomaly. The formulas are analogous to those for the emotion ratios of text data.
Step 4: the feature vectors extracted in Steps 1-3 are fused to obtain the feature set for analyzing abnormal user behavior in the specified time period.
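One simple way to fuse the features of Steps 1-3 is plain concatenation, sketched below. The text does not state the fusion method or the vector layout, so both are assumptions: one flag per anomaly detector for device usage, one flag per close contact, the overlap proportion, three text emotion ratios and six voice emotion ratios. The function name `fuse_features` is hypothetical.

```python
def fuse_features(usage_flags, contact_flags, overlap, text_ratios, voice_ratios):
    """Concatenate all partial features into one flat feature vector.

    usage_flags:   detector outputs for device usage (1 normal, -1 abnormal)
    contact_flags: one detector output per analyzed contact
    overlap:       overlap proportion with the behavior pattern library
    text_ratios:   shares of positive, negative and neutral sentences
    voice_ratios:  shares of the six voice emotion categories
    """
    vector = [*usage_flags, *contact_flags, overlap, *text_ratios, *voice_ratios]
    return [float(value) for value in vector]
```

With two usage flags, 20 contact flags, one overlap value, three text ratios and six voice ratios, this layout gives a 32-dimensional input for the downstream model.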
3) Model training and prediction:
For the acquired raw data of multiple users and their devices, a user application usage behavior knowledge graph and a user social relationship knowledge graph are built for each user and stored in the storage unit.
Time periods in which the user is known to be abnormal and time periods in which the user is known to be normal are set, and steps (1) and (2) of this embodiment are executed for each: the multi-modal feature extraction strategy learned in step (2) is applied to the relevant data from step (1) to build the feature vector set. A deep neural network (DNN) is trained on the resulting feature set and the corresponding labels to obtain a general model for analyzing user behavior anomaly.
For the multi-device data of a new user to be analyzed, structured data is constructed and the user application usage behavior knowledge graph and user social relationship knowledge graph are built. Steps (1) and (2) of this method are then executed: the multi-modal feature extraction strategy learned in step (2) is applied to the relevant data from step (1) to build the feature vector set, the trained DNN model is loaded, and the probability of abnormal behavior in the user's specified time period is analyzed.
In the knowledge graph construction unit, heterogeneous multi-device behavior data is processed into unified structured data; after knowledge extraction, semantic analysis is applied to link the accounts of the same contact, achieving knowledge fusion; the user application usage behavior knowledge graph and the user social relationship knowledge graph are then built. This resolves the isolation of data for the same contact across applications and across devices, presents the user's behavior characteristics more intuitively from the perspectives of social relationships and entertainment behavior, and improves the efficiency of data coordination and use.
In the user behavior anomaly analysis unit, this patent provides a strategy, based on the knowledge graph and the behavior pattern library, for learning and extracting multi-modal features with artificial intelligence algorithms from the two aspects of behavior rules and behavior content, and thereby analyzes the probability of behavior anomaly in the user's specified time period. Applying several algorithms suited to different scenarios broadens the angles of analysis, makes the results more accurate and comprehensive, makes fuller use of the data, improves analysis efficiency, and provides a substantive reference for staff.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A domain knowledge graph-based user behavior analysis apparatus, the apparatus comprising:
a knowledge graph construction unit, configured to perform fusion analysis on the user's multi-device, multi-account heterogeneous structured data and unstructured data, identify the relationships between persons and accounts and the relationship information between persons, and construct a knowledge graph; wherein identifying the relationship information between persons comprises identifying personal relationships based on session context information and identifying personal relationships based on information flow patterns; and the knowledge graph comprises a social relationship knowledge graph and an application usage behavior knowledge graph;
a behavior pattern library construction unit, configured to analyze historical case data by a machine learning algorithm and a data mining algorithm, mine common behavior patterns in the data of persons involved in cases, send the behavior patterns to an auditing terminal, and receive the behavior pattern library fed back by the auditing terminal;
a behavior anomaly analysis unit, configured to extract, based on the behavior pattern library, multi-modal features from the knowledge graph at the two levels of behavior rules and behavior content, train an artificial intelligence algorithm, and analyze the probability of user behavior anomaly;
a visualization unit, configured to display the knowledge graphs constructed respectively from all of the user's data and from part of the user's data, together with the relevant features and results of the user behavior anomaly analysis unit;
and a storage unit, configured to store the user application usage behavior knowledge graph, the user social relationship knowledge graph, the behavior pattern library, and the features and results of the user behavior anomaly analysis.
2. The domain knowledge graph-based user behavior analysis device of claim 1, wherein identifying personal relationships based on the session context information comprises analyzing the identity relationships among multiple persons from the session context using natural language processing techniques; and identifying personal relationships based on information flow patterns comprises identifying the superior-subordinate relationships among multiple persons from the way information is transmitted among them.
3. The domain knowledge graph-based user behavior analysis device according to claim 1, wherein, in the knowledge graph construction unit, performing fusion analysis on the user's multi-device, multi-account heterogeneous structured data and unstructured data, identifying the relationships between persons and accounts and the relationship information between persons, and constructing the knowledge graph comprises:
analyzing the account attributes and interaction content of contacts by a semantic analysis method from natural language processing, and identifying the accounts belonging to the same friend in different applications;
identifying explanatory symbols in the application name field, and removing the explanatory content marked by those symbols;
and constructing the user application usage behavior knowledge graph and the user social relationship knowledge graph.
4. The domain knowledge graph-based user behavior analysis device according to claim 1, wherein, in the behavior pattern library construction unit, analyzing historical case data by a machine learning algorithm and a data mining algorithm, mining common behavior patterns in the data of persons involved in cases, sending the behavior patterns to the auditing terminal, and receiving the behavior pattern library fed back by the auditing terminal comprises:
analyzing historical case features based on an association rule algorithm, a similarity calculation algorithm and a clustering algorithm, wherein the case features comprise the case type, the case level and the attributes of the persons involved in the case, and the attributes of the persons involved comprise gender, education level and first-time/repeat offender status;
mining, from the multi-device behavior data of the persons involved, the common behavior patterns of the stages before, during and after the case under different case features and attributes of the persons involved;
and sending the behavior patterns to the auditing terminal, and receiving the behavior pattern library fed back by the auditing terminal.
5. The domain knowledge graph-based user behavior analysis device according to claim 1, wherein, in the behavior anomaly analysis unit, extracting multi-modal features from the knowledge graph at the level of behavior rules based on the behavior pattern library comprises:
analyzing, through the application usage behavior knowledge graph, the user's active behavior in a specified time period and the user's historical active behavior, and determining whether the user's behavior pattern in the specified time period is abnormal;
analyzing, through the social relationship knowledge graph, the relationship between the user and the user's friends in the specified time period and historically, and determining whether the social relationship between the user and the friends in the specified time period is abnormal;
and calculating, by analyzing the user's attribute features and the user's behavior patterns in the specified time period, the proportion of overlap with the behavior patterns in the behavior pattern library that correspond to similar cases and personal attributes.
6. The domain knowledge graph-based user behavior analysis apparatus according to claim 5, wherein the active behavior comprises behavior in which the user actively contacts others through the device and device usage behavior other than communication.
7. The domain knowledge graph-based user behavior analysis device according to claim 6, wherein, in the behavior anomaly analysis unit, extracting multi-modal features from the knowledge graph at the level of behavior content based on the behavior pattern library comprises:
training artificial intelligence algorithms separately for the text data and the voice data in the user's active behavior data, and analyzing the emotional state distribution of each, to obtain an emotion vector representing the behavior content features.
8. The domain knowledge graph-based user behavior analysis device according to claim 1, wherein, in the visualization unit, displaying the knowledge graphs constructed respectively from all of the user's data and from part of the user's data, together with the relevant features and results of the user behavior anomaly analysis unit, comprises:
displaying a complete knowledge graph constructed from all of the user's data and a knowledge graph constructed from the user's behavior data in a specified time period, so as to visually show the differences between the user's overall behavior rules and the behavior rules in the specified time period;
the multi-modal features extracted in the user behavior anomaly analysis, comprising the results of the three-part analysis of the behavior rules of claim 5 and the emotional state distribution of the text and voice data in the behavior content of claim 6;
and the user behavior anomaly analysis result, specifically comprising the time period specified for analysis and the behavior anomaly probability.
9. A user behavior analysis method based on a domain knowledge graph is characterized by comprising the following steps:
S1: acquiring data, specifically comprising: the knowledge graph constructed from the current user's data, the historical behavior pattern library, and the current user's behavior data in the specified time range;
S2: multi-modal feature extraction strategy: from the two aspects of behavior rules and behavior content, analyzing with artificial intelligence algorithms the user's behavior patterns and the emotion distribution of the text and voice data in the behavior content, and extracting feature vectors for each;
S3: model training: applying the device data and case information of multiple persons involved in cases from a historical case library, obtaining the required data and extracting multi-modal features through S1-S2, and training a deep neural network model for analyzing the probability of behavior anomaly of a user in a specified time period;
S4: obtaining the current user's data and extracting features by applying S1-S2, loading the model trained in S3, and analyzing the probability of behavior anomaly of the current user in the specified time period.
10. A domain knowledge graph-based user behavior analysis apparatus, the apparatus comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the user behavior analysis method according to claim 9.
CN202111624264.0A 2021-12-28 2021-12-28 User behavior analysis device, method and equipment based on domain knowledge graph Pending CN114491003A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111624264.0A CN114491003A (en) 2021-12-28 2021-12-28 User behavior analysis device, method and equipment based on domain knowledge graph

Publications (1)

Publication Number Publication Date
CN114491003A true CN114491003A (en) 2022-05-13

Family

ID=81495838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111624264.0A Pending CN114491003A (en) 2021-12-28 2021-12-28 User behavior analysis device, method and equipment based on domain knowledge graph

Country Status (1)

Country Link
CN (1) CN114491003A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114867052A (en) * 2022-06-10 2022-08-05 中国电信股份有限公司 Wireless network fault diagnosis method and device, electronic equipment and medium
CN114867052B (en) * 2022-06-10 2023-11-07 中国电信股份有限公司 Wireless network fault diagnosis method, device, electronic equipment and medium
CN115391670A (en) * 2022-11-01 2022-11-25 南京嘉安网络技术有限公司 Knowledge graph-based internet behavior analysis method and system

Similar Documents

Publication Publication Date Title
Jeong et al. Social media mining for product planning: A product opportunity mining approach based on topic modeling and sentiment analysis
Gurari et al. Captioning images taken by people who are blind
Sudhir et al. Comparative study of various approaches, applications and classifiers for sentiment analysis
CN107515873B (en) Junk information identification method and equipment
CN110909165A (en) Data processing method, device, medium and electronic equipment
CN108139918A (en) Using every user as basic custom program feature
França et al. An overview of deep learning in big data, image, and signal processing in the modern digital age
CN114491003A (en) User behavior analysis device, method and equipment based on domain knowledge graph
CN113051916A (en) Interactive microblog text emotion mining method based on emotion offset perception in social network
Choi et al. Residual-based graph convolutional network for emotion recognition in conversation for smart Internet of Things
CN110737811B (en) Application classification method and device and related equipment
CN113111264B (en) Interface content display method and device, electronic equipment and storage medium
CN115114439B (en) Method and device for multi-task model reasoning and multi-task information processing
CN113051324A (en) User portrait construction method and device based on big data and storage medium
CN114090755A (en) Reply sentence determination method and device based on knowledge graph and electronic equipment
Ogudo et al. Sentiment analysis application and natural language processing for mobile network operators’ support on social media
CN113392920B (en) Method, apparatus, device, medium, and program product for generating cheating prediction model
CN112464106B (en) Object recommendation method and device
Abu-Salih et al. DAO-LGBM: dual annealing optimization with light gradient boosting machine for advocates prediction in online customer engagement
CN111506718A (en) Session message determining method, device, computer equipment and storage medium
CN110197196B (en) Question processing method and device, electronic equipment and storage medium
CN114048294B (en) Similar population extension model training method, similar population extension method and device
Rauniyar A survey on deep learning based various methods analysis of text summarization
CN116484085A (en) Information delivery method, device, equipment, storage medium and program product
Alghalibi et al. Deep Tweets Analyzer Model for Twitter Mood Visualization and Prediction Based Deep Learning Approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination