CN116561668A - Chat session risk classification method, device, equipment and storage medium - Google Patents

Info

Publication number
CN116561668A
Authority
CN
China
Prior art keywords
dialogue
session
suspicious
user
risk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310846524.1A
Other languages
Chinese (zh)
Inventor
郭健
刘星星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Chuanqu Network Technology Co ltd
Original Assignee
Shenzhen Chuanqu Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Chuanqu Network Technology Co ltd
Priority to CN202310846524.1A
Publication of CN116561668A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/30 Scenes; Scene-specific elements in albums, collections or shared content, e.g. social network photos or video
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the field of artificial intelligence and discloses a risk classification method, apparatus, and device for chat sessions, and a storage medium. The method comprises: acquiring a plurality of rounds of historical dialogue records generated by a chat platform within a preset time period; constructing a bipartite network topology graph from the rounds of historical dialogue records and performing suspicious session object detection on it to obtain a suspicious session user group; monitoring the session behavior of each user in the suspicious group and acquiring that user's dialogue content information; performing word segmentation on the dialogue text in that information, followed by grammatical analysis and entity recognition, to obtain dialogue content information containing entity recognition results; extracting target dialogue features from the dialogue content information; and inputting the target dialogue features into a session risk recognition network model to obtain a session risk level. By determining suspicious user groups from the network topology graph and detecting multiple dialogue features of those groups to determine risk levels, the application realizes risk monitoring of sessions and improves security.

Description

Chat session risk classification method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to a risk classification method, apparatus, device, and storage medium for chat sessions.
Background
In today's digital age, chat platforms have become one of the main ways people communicate every day. These platforms provide convenient and fast communication channels through which users can share information, exchange ideas, and establish contact. However, as more communication moves onto chat platforms, the risks associated with privacy-sensitive data are becoming more prominent.
Chat platforms currently carry a number of security hazards: users often share privacy-sensitive content, including personal identity information, financial data, and sensitive pictures or videos. Once such data falls into the hands of criminals, it may lead to privacy disclosure, identity theft, property loss, and other problems, so chat sessions need corresponding risk identification and monitoring.
Disclosure of Invention
The present application provides a risk classification method, apparatus, and device for chat sessions, and a storage medium. A network topology graph among users is established from the users' historical chat records; a suspicious session user group is determined from the graph using an abnormal session object detection algorithm; multiple dialogue features in the dialogue content information of that group are monitored and detected; and the corresponding risk level is determined from those features, thereby realizing risk monitoring during chat sessions.
In a first aspect, an embodiment of the present application provides a risk classification method for a chat session, including:
acquiring a plurality of rounds of historical dialogue records generated by a chat platform within a preset time period; wherein each round of historical dialogue records involves at least two user dialogue objects;
constructing a bipartite network topology graph from the rounds of historical dialogue records; wherein the bipartite network topology graph comprises a plurality of network nodes and undirected edges connecting the network nodes, each network node corresponds to one user dialogue object included in the historical dialogue records, and each undirected edge represents a dialogue relationship established between two user dialogue objects;
performing suspicious session object detection on the bipartite network topology graph based on a preset session object anomaly detection algorithm to obtain a suspicious session user group;
monitoring the session behavior of each suspicious user in the suspicious session user group and, in response to detecting that any suspicious user initiates a session, acquiring the dialogue content information of that suspicious user;
performing word segmentation on the dialogue text in the dialogue content information, performing grammatical analysis on the word segmentation result, and performing entity recognition on the grammatical analysis result, to obtain dialogue content information containing entity recognition results;
performing feature extraction on the dialogue content information containing entity recognition results to obtain target dialogue features; wherein the target dialogue features comprise intention features, emotion features, and psychological behavior features, and the psychological behavior features comprise at least character selection features and wording habit features of the text input during the dialogue, and visual cue features in the input images;
inputting the target dialogue features into a preset session risk recognition network model to obtain a session risk level; wherein the session risk recognition network model is a hierarchical neural network model, learned on the basis of a convolutional neural network, that outputs hierarchical classifications.
In a first possible implementation manner of the first aspect, performing suspicious session object detection on the bipartite network topology graph based on the preset session object anomaly detection algorithm to obtain the suspicious session user group includes:
performing multiple rounds of iterative suspicious session object detection on the bipartite network topology graph, according to a preset number of iterations, to obtain a plurality of suspicious session user sub-groups in sequence; wherein each round of iterative detection comprises: performing suspicious session object detection on the current bipartite network topology graph, removing the detected suspicious session user sub-group from the current bipartite network topology graph, and saving the graph that remains after the removal;
and calculating the union of the plurality of suspicious session user sub-groups to obtain the suspicious session user group.
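The round structure described above (detect, remove, repeat, then take the union) can be sketched as follows. This is only an illustrative sketch: the function names `detect_in_rounds` and `toy_detector`, the adjacency-dictionary representation, and the early stop when a round finds nothing are assumptions, and `toy_detector` is a deliberately simple stand-in for the preset anomaly detection algorithm, not the detector the application uses.

```python
def toy_detector(adjacency):
    """Hypothetical stand-in detector: the highest-degree node plus its neighbours."""
    if not adjacency:
        return set()
    hub = max(adjacency, key=lambda node: (len(adjacency[node]), node))
    if not adjacency[hub]:
        return set()  # only isolated nodes are left, nothing to report
    return {hub} | adjacency[hub]


def detect_in_rounds(adjacency, detect_once, max_rounds):
    """Run up to max_rounds detections, removing each detected sub-group before the next round."""
    current = {node: set(nbrs) for node, nbrs in adjacency.items()}
    subgroups = []
    for _ in range(max_rounds):
        group = set(detect_once(current))
        if not group:
            break
        subgroups.append(group)
        # Drop the detected users and every edge that touches them.
        current = {
            node: nbrs - group
            for node, nbrs in current.items()
            if node not in group
        }
    # The suspicious session user group is the union of all sub-groups.
    return subgroups, set().union(*subgroups)


graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D", "E", "F"},
    "D": {"C"}, "E": {"C"}, "F": {"C"},
    "G": {"H", "I"}, "H": {"G", "I"}, "I": {"G", "H"},
}
subgroups, suspicious_users = detect_in_rounds(graph, toy_detector, max_rounds=3)
```

Because each round works on the graph left over from the previous round, the sub-groups are disjoint and their order records the round in which each user was found.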
In a second possible implementation manner of the first aspect, the plurality of suspicious session user sub-groups obtained in sequence are arranged in descending order of user risk level;
monitoring the session behavior of each suspicious user in the suspicious session user group and, in response to detecting that any suspicious user initiates a session, acquiring the dialogue content information of that suspicious user includes:
monitoring the session behavior of each suspicious user in the suspicious session user group and, in response to detecting that any suspicious user initiates a session, determining the user risk level of that suspicious user from the suspicious session user sub-group to which the user belongs;
determining a target data acquisition strategy according to the user risk level of the suspicious user; wherein the target data acquisition strategy comprises at least a data acquisition time period and a data acquisition type;
and acquiring the dialogue content information of the suspicious user according to the target data acquisition strategy.
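A minimal sketch of this lookup, assuming the sub-groups are held in a list ordered from highest to lowest risk so that a user's position in the list serves as the risk level. The `AcquisitionStrategy` type, the `STRATEGIES` table, and every concrete window length and data type below are hypothetical values chosen for illustration; the application only states that a strategy contains at least a time period and a data type.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class AcquisitionStrategy:
    window_minutes: int           # data acquisition time period (hypothetical unit)
    data_types: Tuple[str, ...]   # data acquisition types


# Hypothetical table: index 0 applies to the highest-risk sub-group.
STRATEGIES = (
    AcquisitionStrategy(window_minutes=120, data_types=("text", "image")),
    AcquisitionStrategy(window_minutes=60, data_types=("text", "image")),
    AcquisitionStrategy(window_minutes=30, data_types=("text",)),
)


def risk_rank(user: str, subgroups: Sequence[Set[str]]) -> Optional[int]:
    """Return the index of the sub-group containing the user (0 = highest risk)."""
    for rank, group in enumerate(subgroups):
        if user in group:
            return rank
    return None


def strategy_for(user: str, subgroups: Sequence[Set[str]]) -> Optional[AcquisitionStrategy]:
    """Pick the acquisition strategy for a user; users outside every sub-group get none."""
    rank = risk_rank(user, subgroups)
    if rank is None:
        return None
    # Ranks beyond the table fall back to the lowest-risk strategy.
    return STRATEGIES[min(rank, len(STRATEGIES) - 1)]


subgroups = [{"C", "B"}, {"G", "H"}, {"X"}, {"Y"}]
```

For example, `strategy_for("C", subgroups)` selects the first table entry, while a user who appears in no sub-group is not monitored at all.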
In a third possible implementation manner of the first aspect, performing suspicious session object detection on the current bipartite network topology graph includes:
calculating the node suspicion degree of each network node in the current bipartite network topology graph;
determining, from the node suspicion degrees of the network nodes, a target network node with the maximum node suspicion degree;
iteratively removing the target network node and the target undirected edges connected to it from the current bipartite network topology graph until the graph is empty, to obtain a plurality of candidate bipartite network topology graphs;
and calculating the global average suspicion degree of each candidate bipartite network topology graph, and determining the suspicious session user group detected in this round from the candidate bipartite network topology graph with the largest global average suspicion degree.
In a fourth possible implementation manner of the first aspect, the dialogue content information containing entity recognition results includes dialogue text and dialogue images;
performing feature extraction on the dialogue content information containing entity recognition results to obtain the target dialogue features includes:
determining text word vector matrix data corresponding to the dialogue text and image pixel matrix data corresponding to the dialogue image;
performing convolution calculation on the text word vector matrix data and the image pixel matrix data respectively to obtain a calculation set;
and processing the calculation set with a feature extraction network to obtain the target dialogue features; wherein the feature extraction network comprises multiple rows of convolution layers, or of convolution layers and pooling layers.
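To make the convolution step concrete, the sketch below slides a small kernel over a word vector matrix (one row per word) in the way convolutional neural networks usually do, that is, as a sliding dot product without flipping the kernel, and then applies max pooling over positions. The helper names, the toy matrix, and the kernel values are all assumptions for illustration; the application does not specify kernel sizes, strides, or weights, and the same helper would apply unchanged to an image pixel matrix.

```python
def conv2d_valid(matrix, kernel):
    """Slide kernel over matrix with stride 1 and no padding (CNN-style cross-correlation)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(matrix) - k_rows + 1
    out_cols = len(matrix[0]) - k_cols + 1
    return [
        [
            sum(
                matrix[i + di][j + dj] * kernel[di][dj]
                for di in range(k_rows)
                for dj in range(k_cols)
            )
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


def max_pool_over_rows(feature_map):
    """Keep the strongest response in each column of the feature map."""
    return [max(column) for column in zip(*feature_map)]


# Toy "text word vector matrix": three words, two embedding dimensions each.
word_vectors = [
    [1, 2],
    [3, 4],
    [5, 6],
]
# Hypothetical 2x2 kernel spanning two adjacent words.
kernel = [
    [1, 0],
    [0, 1],
]
feature_map = conv2d_valid(word_vectors, kernel)   # [[5], [9]]
pooled = max_pool_over_rows(feature_map)           # [9]
```

In a trained network the kernel weights are learned and many kernels run in parallel; the collection of their outputs would play the role of the calculation set passed on to the feature extraction network.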
In a fifth possible implementation manner of the first aspect, the feature extraction network includes a first network and a second network; the first network comprises, from left to right, a first row with one pooling layer, a second row with two convolution layers, and a third row with four convolution layers; and the second network comprises, from left to right, a first row with a convolution layer, a second row with two convolution layers, and a third row with three convolution layers;
processing the calculation set with the feature extraction network to obtain the target dialogue features includes:
outputting, by the first network operating on the calculation set, a feature set comprising the intention features and the emotion features;
and outputting, by the second network operating on the feature set, the psychological behavior features.
In a sixth possible implementation manner of the first aspect, after inputting the target dialogue features into the preset session risk recognition network model to obtain the session risk level, the method further includes:
determining the risk form of the suspicious user according to the session risk level corresponding to the dialogue content information; wherein the risk form comprises the current system state, the behavior of the user, and the running environment of the chat system;
determining a corresponding risk decision strategy according to the risk form, and executing risk control according to the risk decision strategy; wherein the risk decision strategy comprises responding to the user's request, issuing a warning, and rejecting the user's request.
In a second aspect, the present application provides a risk classification apparatus for chat sessions, including:
a dialogue record acquisition module, configured to acquire a plurality of rounds of historical dialogue records generated by a chat platform within a preset time period; wherein each round of historical dialogue records involves at least two user dialogue objects;
a topology graph construction module, configured to construct a bipartite network topology graph from the rounds of historical dialogue records; wherein the bipartite network topology graph comprises a plurality of network nodes and undirected edges connecting the network nodes, each network node corresponds to one user dialogue object included in the historical dialogue records, and each undirected edge represents a dialogue relationship established between two user dialogue objects;
a session detection module, configured to perform suspicious session object detection on the bipartite network topology graph based on a preset session object anomaly detection algorithm to obtain a suspicious session user group;
a dialogue content acquisition module, configured to monitor the session behavior of each suspicious user in the suspicious session user group and, in response to detecting that any suspicious user initiates a session, acquire the dialogue content information of that suspicious user;
a data processing module, configured to perform word segmentation on the dialogue text in the dialogue content information, perform grammatical analysis on the word segmentation result, and perform entity recognition on the grammatical analysis result, to obtain dialogue content information containing entity recognition results;
a feature extraction module, configured to perform feature extraction on the dialogue content information containing entity recognition results to obtain target dialogue features; wherein the target dialogue features comprise intention features, emotion features, and psychological behavior features, and the psychological behavior features comprise at least character selection features and wording habit features of the text input during the dialogue, and visual cue features in the input images;
a risk identification module, configured to input the target dialogue features into a preset session risk recognition network model to obtain a session risk level; wherein the session risk recognition network model is a hierarchical neural network model, learned on the basis of a convolutional neural network, that outputs hierarchical classifications.
In a first possible implementation manner of the second aspect, the session detection module specifically includes:
an iteration detection unit, configured to perform multiple rounds of iterative suspicious session object detection on the bipartite network topology graph, according to a preset number of iterations, to obtain a plurality of suspicious session user sub-groups in sequence; wherein each round of iterative detection comprises: performing suspicious session object detection on the current bipartite network topology graph, removing the detected suspicious session user sub-group from the current bipartite network topology graph, and saving the graph that remains after the removal;
and a union calculation unit, configured to calculate the union of the plurality of suspicious session user sub-groups to obtain the suspicious session user group.
In a second possible implementation manner of the second aspect, the plurality of suspicious session user sub-groups obtained in sequence are arranged in descending order of user risk level; the dialogue content acquisition module specifically includes:
a user risk identification unit, configured to monitor the session behavior of each suspicious user in the suspicious session user group and, in response to detecting that any suspicious user initiates a session, determine the user risk level of that suspicious user from the suspicious session user sub-group to which the user belongs;
an acquisition strategy determining unit, configured to determine a target data acquisition strategy according to the user risk level of the suspicious user; wherein the target data acquisition strategy comprises at least a data acquisition time period and a data acquisition type;
and an information acquisition unit, configured to acquire the dialogue content information of the suspicious user according to the target data acquisition strategy.
In a third possible implementation manner of the second aspect, the iteration detection unit is specifically configured to:
perform multiple rounds of iterative suspicious session object detection on the bipartite network topology graph, according to a preset number of iterations, to obtain a plurality of suspicious session user sub-groups in sequence; wherein each round of iterative detection comprises: performing suspicious session object detection on the current bipartite network topology graph, removing the detected suspicious session user sub-group from the current bipartite network topology graph, and saving the graph that remains after the removal;
wherein performing suspicious session object detection on the current bipartite network topology graph comprises: calculating the node suspicion degree of each network node in the current bipartite network topology graph; determining, from the node suspicion degrees of the network nodes, a target network node with the maximum node suspicion degree; iteratively removing the target network node and the target undirected edges connected to it from the current bipartite network topology graph until the graph is empty, to obtain a plurality of candidate bipartite network topology graphs; and calculating the global average suspicion degree of each candidate bipartite network topology graph, and determining the suspicious session user group detected in this round from the candidate bipartite network topology graph with the largest global average suspicion degree.
In a fourth possible implementation manner of the second aspect, the dialogue content information containing entity recognition results includes dialogue text and dialogue images, and the feature extraction module specifically includes:
a matrix determining unit, configured to determine text word vector matrix data corresponding to the dialogue text and image pixel matrix data corresponding to the dialogue image;
a convolution calculation unit, configured to perform convolution calculation on the text word vector matrix data and the image pixel matrix data respectively to obtain a calculation set;
a feature calculation unit, configured to process the calculation set with a feature extraction network to obtain the target dialogue features; wherein the feature extraction network comprises multiple rows of convolution layers, or of convolution layers and pooling layers.
In a fifth possible implementation manner of the second aspect, the feature extraction network includes a first network and a second network; the first network comprises, from left to right, a first row with one pooling layer, a second row with two convolution layers, and a third row with four convolution layers; and the second network comprises, from left to right, a first row with a convolution layer, a second row with two convolution layers, and a third row with three convolution layers;
the feature calculation unit is specifically configured to:
output, by the first network operating on the calculation set, a feature set comprising the intention features and the emotion features;
and output, by the second network operating on the feature set, the psychological behavior features.
In a sixth possible implementation manner of the second aspect, the risk classification apparatus for chat sessions further includes:
a risk form determining module, configured to determine the risk form of the suspicious user according to the session risk level corresponding to the dialogue content information; wherein the risk form comprises the current system state, the behavior of the user, and the running environment of the chat system;
a risk control module, configured to determine a corresponding risk decision strategy according to the risk form and execute risk control according to the risk decision strategy; wherein the risk decision strategy comprises responding to the user's request, issuing a warning, and rejecting the user's request.
In a third aspect, the present application provides a risk classification device for chat sessions, including: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the risk classification device to perform the steps of the risk classification method for chat sessions described above.
In a fourth aspect, the present application provides a computer-readable storage medium having instructions stored therein that, when run on a computer, cause the computer to perform the steps of the risk classification method for chat sessions described above.
In the method, the network topology graph among users is established according to the historical chat records of the users, and the suspicious conversation user group is determined from the network topology graph by combining an abnormal conversation object detection algorithm, so that conversation characteristics including intention characteristics, emotion characteristics and psychological behavior characteristics in conversation content information of the group are monitored and detected, corresponding risk grades are determined according to the conversation characteristics, risk monitoring in the chat conversation process is realized, and the safety of the chat conversation is improved.
Drawings
Fig. 1 is a flowchart of a first embodiment of the risk classification method for chat sessions provided in an embodiment of the present application;
Fig. 2 is a flowchart of a second embodiment of the risk classification method for chat sessions provided in an embodiment of the present application;
Fig. 3 is a flowchart of a third embodiment of the risk classification method for chat sessions provided in an embodiment of the present application;
Fig. 4 is a flowchart of a fourth embodiment of the risk classification method for chat sessions provided in an embodiment of the present application;
Fig. 5 is a flowchart of a fifth embodiment of the risk classification method for chat sessions provided in an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a risk classification device for chat sessions according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of another risk classification apparatus for chat sessions according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a risk classification device for chat sessions according to an embodiment of the present application;
Fig. 9 is a schematic diagram of a bipartite network topology graph provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. Wherein the terms "first," "second," "third," "fourth," and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that any part related to data acquisition or collection in the present application is authorized by the user; the executing body of the application may be a risk classification device of a chat session, and may also be a mobile terminal or a server, which is not limited herein.
For ease of understanding, the embodiments of the present application describe the risk classification method for chat sessions with a server as the execution body. The specific flow is described below. Referring to Fig. 1, which is a flowchart of a first embodiment of the risk classification method for chat sessions provided in an embodiment of the present application, the method includes:
101. Acquiring a plurality of rounds of historical dialogue records generated by a chat platform within a preset time period; wherein each round of historical dialogue records involves at least two user dialogue objects;
It will be appreciated that a chat platform typically persists the dialogue records between users on a server, so the server can obtain the historical dialogue records of all users of the chat platform over, for example, the last week or month. One round of historical dialogue records may be the chat information exchanged between two users within the preset time period, or the chat information exchanged among multiple users within that period, as in a group chat, where every user who receives chat information sent by any other member of the group is a user dialogue object.
102. Constructing a bipartite network topology graph from the rounds of historical dialogue records; wherein the bipartite network topology graph comprises a plurality of network nodes and undirected edges connecting the network nodes, each network node corresponds to one user dialogue object included in the historical dialogue records, and each undirected edge represents a dialogue relationship established between two user dialogue objects;
referring to fig. 9, fig. 9 shows a schematic diagram of a two-part network topology diagram, which is composed of a plurality of network nodes and undirected edges connecting the network nodes, wherein the network nodes include A, B, C, D, E, F nodes, each network node represents a user session object, and any two network nodes with connection relations represent that session relations are established between the two user session objects.
As can be seen from the first example in fig. 9, the two network nodes A, B have a connection relationship, and the two network nodes are in a one-to-one chat relationship, specifically, the chat information is sent to B by a, the chat information is sent to a by B, or the chat information is sent to a by A, B;
as can be seen from the second example in fig. 9, the B, D, E, F five network nodes establish a connection relationship through the network node C, and the five network nodes together establish a group chat, and establish a dialogue relationship, but the B, D, E, F four network nodes do not have a connection relationship with each other, which means that only C in the group chat sends chat information, but does not get replies from any other members in the group chat;
As can be seen from the third example in fig. 9, the three network nodes G, H, I have a connection relationship therebetween, and together construct a group chat, a session relationship is established, and a session interaction is performed between the three network nodes.
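As an illustration only, the construction of the graph described for fig. 9 can be sketched in Python as follows. The record shape (a sender plus the list of users who received the message), the function name `build_graph`, and the sample records are assumptions made for this sketch and are not taken from the application.

```python
from collections import defaultdict


def build_graph(records):
    """Build an undirected graph: node = user, edge = dialogue relationship.

    Each record is a (sender, receivers) pair. This record shape is an
    assumption made for illustration only.
    """
    adjacency = defaultdict(set)
    for sender, receivers in records:
        adjacency[sender]  # touch the key so the sender always appears as a node
        for receiver in receivers:
            if receiver == sender:
                continue
            # Undirected edge: store it in both directions.
            adjacency[sender].add(receiver)
            adjacency[receiver].add(sender)
    return {node: set(neighbours) for node, neighbours in adjacency.items()}


# Hypothetical records mirroring the three examples of fig. 9.
records = [
    ("A", ["B"]),                 # one-to-one chat
    ("C", ["B", "D", "E", "F"]),  # group chat in which only C speaks
    ("G", ["H", "I"]),            # group chat in which everyone speaks
    ("H", ["G", "I"]),
    ("I", ["G", "H"]),
]
graph = build_graph(records)
```

In this sketch B, D, E, and F are linked only to C (B is also linked to A), while G, H, and I are pairwise linked, which matches the second and third examples above.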
103. Performing suspicious session object detection on the bipartite network topology graph based on a preset session object anomaly detection algorithm to obtain a suspicious session user group;
It should be understood that this application uses the Fraudar algorithm as the session object anomaly detection algorithm. The server defines a global metric that expresses the average suspicion degree of the nodes, then greedily removes the network node with the minimum suspicion degree one step at a time. Among the node sets retained during this iterative removal, the one that maximizes the average node suspicion degree forms the dense subgraph with the highest suspicion degree, that is, the suspicious session user group.
104. Monitoring the session behavior of each suspicious user in the suspicious session user group and, in response to detecting that any suspicious user initiates a session, acquiring the dialogue content information of that suspicious user;
It can be understood that each suspicious user has a unique ID in the chat platform. The server adds the ID of each suspicious user to a monitoring list and continuously monitors the session behavior of every suspicious user in the list. The dialogue content information includes, but is not limited to, text and images, which is not specifically limited in this embodiment.
105. Performing word segmentation on the dialogue text in the dialogue content information, performing grammar analysis on the word segmentation result, and performing entity recognition on the grammar analysis result, to obtain dialogue content information containing an entity recognition result;
The server performs word segmentation on the original text in the dialogue content information, dividing it into individual words or tokens for further processing. The word segmentation method includes, but is not limited to, rule-based word segmentation, statistical word segmentation, and machine-learning-based word segmentation;
Further, the server analyzes the word segmentation result according to the grammar rules of the language to obtain the sentence structure, so as to understand the constituent parts of each sentence and the relationships between them. The grammar analysis method includes, but is not limited to, context-free grammar (CFG) parsing and dependency parsing. Based on these methods, the server can identify phrases, subject-predicate relationships, modifier relationships, and the like in the dialogue text;
Still further, the server performs named entity recognition on the basis of the grammar analysis result, so as to identify and classify entities with a specific meaning in the dialogue text, thereby extracting the key information in the dialogue text and the important entities in its context, such as person names, place names, organizations, dates, and times. It should be noted that using the grammar analysis result as the basis not only makes named entity recognition more accurate, but also makes it easier to recognize risk-sensitive entity information such as personal identity information, account and financial information, medical and health information, and trade secrets.
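The first and last stages of this pipeline can be illustrated with a deliberately simplified Python sketch. It uses rule-based word segmentation (a regular expression that only suits space-delimited text) and pattern-based tagging of two risk-sensitive entity types; the grammar analysis stage is omitted, and the pattern table, labels, and function names are hypothetical. A real system would more likely rely on a trained segmenter, parser, and named-entity recognizer.

```python
import re

# Hypothetical patterns for risk-sensitive entities. They stand in for a
# trained recogniser and are assumptions made for this sketch only.
ENTITY_PATTERNS = {
    "DATE": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "ACCOUNT_NUMBER": re.compile(r"^\d{12,19}$"),
}


def segment(text):
    """Rule-based word segmentation: words, numbers, and hyphenated runs."""
    return re.findall(r"\w+(?:-\w+)*", text)


def recognise_entities(tokens):
    """Return (token index, token, label) for every token matching a pattern."""
    entities = []
    for index, token in enumerate(tokens):
        for label, pattern in ENTITY_PATTERNS.items():
            if pattern.match(token):
                entities.append((index, token, label))
                break  # first matching label wins
    return entities


text = "Send it to 6222020012345678 before 2023-07-11"
tokens = segment(text)
entities = recognise_entities(tokens)
```

Here `entities` holds the 16-digit number tagged as an account number and the ISO-style date tagged as a date, each with its token position so that later stages can refer back to the dialogue text.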
106. Performing feature extraction on the dialogue content information containing the entity recognition result to obtain target dialogue features, wherein the target dialogue features comprise intention features, emotion features, and psycho-behavioral features, and the psycho-behavioral features comprise at least text selection features and word habit features of the text input during the dialogue and visual cue features in the input image;
It can be understood that the server can perform feature extraction on the dialogue content information containing the entity recognition result based on one or more of an n-gram model, a bag-of-words model, an attention model, and other network models, to obtain the target dialogue features.
In the embodiments of the present application, the intention feature represents the intention or purpose of the user, and the emotion feature represents the emotional tone of the dialogue content and the emotional state of the user. It should be noted that the emotional tone covers not only the dialogue text but also, when the dialogue content includes images and audio, the emotional tone of those images and that audio.
In the embodiments of the present application, the text selection feature represents specific words or terms that appear in the text and from which the presence of sensitive information can be implied or inferred. For example, words related to bank accounts, credit cards, or payment passwords may imply the presence of financial and account information; words related to medical records, medical conditions, medicines, or diagnoses may imply the presence of medical and health information; and words related to personal criminal records, judicial decisions, or legal disputes may imply the presence of legal and regulatory information.
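One simple way to picture a text selection feature is a per-category count of sensitive terms. The Python sketch below is a minimal illustration under that assumption; the lexicon contents, category names, and the function `text_selection_features` are hypothetical and are not described in the application.

```python
# Hypothetical lexicon of sensitive terms per category (illustration only).
SENSITIVE_LEXICON = {
    "finance_account": {"bank", "account", "credit", "card", "password", "payment"},
    "medical_health": {"medical", "diagnosis", "medicine", "illness"},
    "legal": {"crime", "court", "verdict", "lawsuit"},
}


def text_selection_features(tokens):
    """Count, per category, how many tokens belong to that category's lexicon."""
    lowered = [token.lower() for token in tokens]
    return {
        category: sum(1 for token in lowered if token in words)
        for category, words in SENSITIVE_LEXICON.items()
    }


features = text_selection_features("Please send your bank card password".split())
```

The resulting dictionary can be flattened into a fixed-order vector and concatenated with other features before classification.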
In the embodiments of the present application, the word habit feature represents the wording style, word choice, and usage habits of an individual when expressing themselves, which may suggest certain sensitive information or personal characteristics. For example, word choices in the text that express personal preferences, hobbies, or political views may suggest sensitive personal information, and word choices related to a specific geographic location may suggest sensitive information related to that region.
In the embodiments of the present application, the visual cue features are features, patterns, or elements in an image that can be used to judge whether sensitive information is present, for example:
Text or logos: explicit text or logos appearing in the image can serve as a visual cue for sensitive information, for example text or logos containing personal identity information, account information, or other sensitive identifiers;
Face recognition: facial features in the image, particularly those that can be linked to a person's identity, may suggest the presence of sensitive information, as with personal identification cards, driver's license photographs, or other material involving personal identification;
Two-dimensional codes or bar codes: a two-dimensional code or bar code appearing in the image may contain sensitive information such as personal account information, a product serial number, or other privacy-related data;
Sensitive scenes or objects: a particular scene or object in the image may be associated with sensitive information, for example medical instruments, legal documents, or financial transactions;
Image processing traces: an image may have been processed or modified while still retaining identifiable traces or marks, which may be evidence of the origin or modification of the sensitive information.
107. Inputting the target dialogue features into a preset session risk recognition network model for processing to obtain a session risk level, wherein the session risk recognition network model is a hierarchical neural network model that is learned based on a convolutional neural network and has hierarchical classification output.
It will be appreciated that the session risk recognition network model is a neural network model based on a convolutional neural network (CNN) and is used to process the target dialogue and identify its risk level. The model evaluates and classifies the target dialogue by learning the features of the input dialogue and mapping them to classification outputs for the different risk levels.
With the method provided by the embodiments of the present application, a network topology graph among users is built from the users' historical chat records, and a suspicious session user group is determined from the graph with the session object anomaly detection algorithm. The dialogue content information of that group is then monitored, dialogue features including intention features, emotion features, and psycho-behavioral features are extracted from it, and the corresponding risk level is determined from those features. Risk monitoring during chat sessions is thereby achieved, and the security of chat sessions is improved.
Referring to fig. 2, fig. 2 is a flowchart of a second embodiment of the chat session risk classification method provided in the embodiments of the present application, which includes:
201. Acquiring a plurality of historical dialogue records generated by a chat platform within a preset time period, wherein each round of historical dialogue record includes at least two user dialogue objects;
202. Constructing a bipartite network topology graph according to each round of historical dialogue record, wherein the bipartite network topology graph comprises a plurality of network nodes and undirected edges connecting the network nodes, each network node corresponds to a user dialogue object included in the historical dialogue records, and each undirected edge represents a dialogue relationship established between user dialogue objects;
Steps 201 to 202 are similar to steps 101 to 102 above and are not repeated here.
203. Performing multiple rounds of iterative suspicious session object detection on the bipartite network topology graph in sequence according to a preset number of iterations, to obtain a plurality of suspicious session user sub-groups in sequence, wherein each round of iterative detection comprises: performing suspicious session object detection on the current bipartite network topology graph, removing the detected suspicious session user group from the current bipartite network topology graph, and storing the bipartite network topology graph after removal;
For example, suppose the bipartite network topology graph is formed by the network nodes corresponding to 100 rounds of dialogue records. If the server finds 20 rounds of dialogue records suspicious in the first round of iterative detection, it removes the network nodes corresponding to those 20 rounds, together with the undirected edges connected to them, from the bipartite network topology graph. The remaining 80 rounds of dialogue records form a new bipartite network topology graph, which is stored, and the next round of iterative detection is executed on it, until the preset number of iterations is reached. In practical applications, the number of iterations can be adjusted as needed, which is not specifically limited in the embodiments of the present application.
It should be appreciated that, in any one round of iterative detection, the server performs suspicious session object detection on the current bipartite network topology graph as follows: the server calculates the node suspicion degree of each network node in the current bipartite network topology graph; determines, from these node suspicion degrees, the target network node with the minimum node suspicion degree; iteratively removes the target network node and the target undirected edges connected to it from the current bipartite network topology graph until the graph is empty, obtaining a plurality of candidate bipartite network topology graphs; and calculates the global average suspicion degree of each candidate bipartite network topology graph, determining the suspicious session user group for this round from the candidate graph with the largest global average suspicion degree.
The server calculates the node suspicion degree of each network node in the current bipartite network topology graph as follows: the server determines the undirected edges connected to each network node in the current bipartite network topology graph, calculates the edge suspicion degree of each undirected edge according to a preset calculation rule, and finally sums the edge suspicion degrees of all undirected edges connected to each network node, thereby obtaining the node suspicion degree of that network node.
In this calculation rule, h is a network node, F(h) is the node suspicion degree of network node h, and x is the total number of undirected edges connected to network node h; F(h) is thus the sum of the edge suspicion degrees of those x undirected edges.
The server calculates the global average suspicion degree of each candidate bipartite network topology graph according to a preset calculation rule.
In this calculation rule, s is the network node set of the candidate bipartite network topology graph, n is the number of nodes in the network node set, and F(s) is the sum of the node suspicion degrees of the network nodes in the set; the global average suspicion degree is therefore F(s) divided by n.
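A single round of detection can be sketched in Python as below. The sketch follows the greedy removal of the least suspicious node described for the Fraudar algorithm and scores each candidate graph by F(s) divided by n. It assumes, purely for illustration, that every undirected edge has an edge suspicion degree of 1.0; any other preset rule can be passed in as `edge_suspicion`. The function names and the sample graph are hypothetical, and the node suspicion degrees are recomputed from scratch at every step for readability rather than efficiency.

```python
def node_suspicion(graph, node, edge_suspicion):
    """F(h): sum of the edge suspicion degrees of all edges incident to node."""
    return sum(edge_suspicion(node, other) for other in graph[node])


def detect_suspicious_group(graph, edge_suspicion=lambda u, v: 1.0):
    """Greedy peeling: repeatedly drop the least suspicious node and keep the
    candidate node set whose global average suspicion F(s) / n is largest."""
    current = {node: set(neighbours) for node, neighbours in graph.items()}
    best_nodes, best_score = set(), float("-inf")
    while current:
        scores = {n: node_suspicion(current, n, edge_suspicion) for n in current}
        average = sum(scores.values()) / len(current)  # F(s) / n
        if average > best_score:
            best_score, best_nodes = average, set(current)
        # sorted() makes tie-breaking deterministic.
        target = min(sorted(current), key=scores.get)
        for neighbour in current.pop(target):
            current[neighbour].discard(target)
    return best_nodes, best_score


# Hypothetical graph: a fully connected block P, Q, R, S plus a sparse tail.
graph = {
    "P": {"Q", "R", "S"},
    "Q": {"P", "R", "S"},
    "R": {"P", "Q", "S"},
    "S": {"P", "Q", "R", "T"},
    "T": {"S", "U"},
    "U": {"T"},
}
group, score = detect_suspicious_group(graph)
```

With unit edge suspicion the tail nodes U and T are peeled first, and the densely connected block P, Q, R, S is returned as the candidate with the largest average.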
204. Calculating the union of the plurality of suspicious session user sub-groups to obtain the suspicious session user group;
205. Monitoring the session behavior of each suspicious user in the suspicious session user group and, in response to detecting that any suspicious user initiates a session, acquiring the dialogue content information of that suspicious user;
206. Performing word segmentation on the dialogue text in the dialogue content information, performing grammar analysis on the word segmentation result, and performing entity recognition on the grammar analysis result, to obtain dialogue content information containing an entity recognition result;
207. Performing feature extraction on the dialogue content information containing the entity recognition result to obtain target dialogue features, wherein the target dialogue features comprise intention features, emotion features, and psycho-behavioral features, and the psycho-behavioral features comprise at least text selection features and word habit features of the text input during the dialogue and visual cue features in the input image;
208. Inputting the target dialogue features into a preset session risk recognition network model for processing to obtain a session risk level, wherein the session risk recognition network model is a hierarchical neural network model that is learned based on a convolutional neural network and has hierarchical classification output.
Steps 205 to 208 are similar to steps 104 to 107 above and are not described in detail here.
With the method provided by the embodiments of the present application, suspicious groups are detected in and removed from the network topology graph round after round, so that every suspicious session user object is obtained and the detection result is more comprehensive and accurate.
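The multi-round detection of step 203 and the union of step 204 can be sketched in Python as follows. The detector is passed in as a function; the stand-in used here (`highest_degree_star`, which returns the busiest node and its neighbours) is a hypothetical placeholder chosen only to keep the example short and is not the detection algorithm of the application. All function names and the sample graph are assumptions.

```python
def remove_nodes(graph, nodes):
    """Return a copy of graph without the given nodes and their edges."""
    return {n: nbrs - nodes for n, nbrs in graph.items() if n not in nodes}


def iterative_detection(graph, rounds, detect):
    """Run up to `rounds` detections, removing each detected sub-group from
    the graph before the next round; return the sub-groups and their union."""
    current = {n: set(nbrs) for n, nbrs in graph.items()}
    sub_groups = []
    for _ in range(rounds):
        if not current:
            break
        group = set(detect(current))
        if not group:
            break
        sub_groups.append(group)
        current = remove_nodes(current, group)  # graph kept for the next round
    return sub_groups, set().union(*sub_groups)


def highest_degree_star(graph):
    """Hypothetical stand-in detector: the busiest node and its neighbours."""
    centre = max(sorted(graph), key=lambda n: len(graph[n]))
    if not graph[centre]:
        return set()
    return {centre} | graph[centre]


graph = {
    "B": {"C"}, "C": {"B", "D", "E"}, "D": {"C"}, "E": {"C"},
    "X": {"Y"}, "Y": {"X"}, "Z": set(),
}
sub_groups, suspicious_users = iterative_detection(graph, 3, highest_degree_star)
```

Because each sub-group is removed before the next round, the sub-groups are disjoint and their order records the round in which each user was detected.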
Referring to fig. 3, fig. 3 is a flowchart of a third embodiment of the chat session risk classification method provided in the embodiments of the present application, which includes:
301. Acquiring a plurality of historical dialogue records generated by a chat platform within a preset time period, wherein each round of historical dialogue record includes at least two user dialogue objects;
302. Constructing a bipartite network topology graph according to each round of historical dialogue record, wherein the bipartite network topology graph comprises a plurality of network nodes and undirected edges connecting the network nodes, each network node corresponds to a user dialogue object included in the historical dialogue records, and each undirected edge represents a dialogue relationship established between user dialogue objects;
Steps 301 to 302 are similar to steps 101 to 102 above and are not repeated here.
303. Performing multiple rounds of iterative suspicious session object detection on the bipartite network topology graph in sequence according to a preset number of iterations, to obtain a plurality of suspicious session user sub-groups in sequence, wherein each round of iterative detection comprises: performing suspicious session object detection on the current bipartite network topology graph, removing the detected suspicious session user group from the current bipartite network topology graph, and storing the bipartite network topology graph after removal;
It should be understood that the suspicion degree of the suspicious session user sub-group obtained in each round of iterative detection decreases as the number of iterations increases; that is, the suspicion degree of a sub-group is negatively correlated with the round in which it is detected. The suspicious session user sub-groups obtained in sequence are therefore arranged in descending order of user risk level.
304. Calculating the union of the plurality of suspicious session user sub-groups to obtain the suspicious session user group;
305. Monitoring the session behavior of each suspicious user in the suspicious session user group and, in response to detecting that any suspicious user initiates a session, determining the user risk level corresponding to that suspicious user according to the suspicious session user sub-group to which the suspicious user belongs;
It can be understood that the server sets, in advance and according to the suspicion degree, a corresponding user risk level for the suspicious session user sub-group detected in each round of iteration. For example, the sub-group detected in the first round has the highest suspicion degree, so the server presets its user risk level to level one; the sub-group detected in the second round has the next highest suspicion degree, so the server presets its user risk level to level two; and so on.
306. Determining a target data acquisition strategy according to the user risk level corresponding to the suspicious user, wherein the target data acquisition strategy comprises at least a data acquisition time period and a data acquisition type;
It should be understood that the server selects the data acquisition strategy differentially according to the user risk level of the suspicious user. Taking the data acquisition time period as an example, for users with a high risk level the server acquires data throughout the day, whereas for users with a low risk level it acquires data periodically. Taking the data acquisition type as an example, for users with a high risk level the server acquires several kinds of chat data, such as chat text, voice, images, and files, whereas for users with a low risk level it acquires only text chat data.
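The mapping from detection round to user risk level, and from user risk level to data acquisition strategy, can be illustrated with the following Python sketch. The `AcquisitionStrategy` type, the string values, and the cutoff parameter `high_risk_cutoff` are hypothetical; the application only states that level one is the most suspicious and that higher-risk users are monitored all day across more data types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquisitionStrategy:
    period: str        # data acquisition time period
    data_types: tuple  # data acquisition types


def user_risk_level(user, sub_groups):
    """Level 1 is the highest risk: the sub-group found in the first round."""
    for round_index, group in enumerate(sub_groups, start=1):
        if user in group:
            return round_index
    return None  # the user is not in any suspicious sub-group


def choose_strategy(level, high_risk_cutoff=1):
    """Pick a strategy; levels at or below the cutoff count as high risk.
    The cutoff value is an assumption made for this sketch."""
    if level is None:
        return None
    if level <= high_risk_cutoff:
        return AcquisitionStrategy("all_day", ("text", "voice", "image", "file"))
    return AcquisitionStrategy("periodic", ("text",))


sub_groups = [{"u1"}, {"u2"}]  # hypothetical result of two detection rounds
strategy_u1 = choose_strategy(user_risk_level("u1", sub_groups))
strategy_u2 = choose_strategy(user_risk_level("u2", sub_groups))
```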
307. Acquiring the dialogue content information of the suspicious user according to the target data acquisition strategy;
308. Performing word segmentation on the dialogue text in the dialogue content information, performing grammar analysis on the word segmentation result, and performing entity recognition on the grammar analysis result, to obtain dialogue content information containing an entity recognition result;
309. Performing feature extraction on the dialogue content information containing the entity recognition result to obtain target dialogue features, wherein the target dialogue features comprise intention features, emotion features, and psycho-behavioral features, and the psycho-behavioral features comprise at least text selection features and word habit features of the text input during the dialogue and visual cue features in the input image;
310. Inputting the target dialogue features into a preset session risk recognition network model for processing to obtain a session risk level, wherein the session risk recognition network model is a hierarchical neural network model that is learned based on a convolutional neural network and has hierarchical classification output.
Steps 308 to 310 are similar to steps 105 to 107 above and are not repeated here.
With the method provided by the embodiments of the present application, the user risk level is determined from the suspicious sub-group to which the user belongs, and a different data acquisition mode is selected for users with different risk levels before the session risk level is identified. This more differentiated processing makes the identification result more accurate.
Referring to fig. 4, fig. 4 is a flowchart of a fourth embodiment of the chat session risk classification method provided in the embodiments of the present application, which includes:
401. Acquiring a plurality of historical dialogue records generated by a chat platform within a preset time period, wherein each round of historical dialogue record includes at least two user dialogue objects;
402. Constructing a bipartite network topology graph according to each round of historical dialogue record, wherein the bipartite network topology graph comprises a plurality of network nodes and undirected edges connecting the network nodes, each network node corresponds to a user dialogue object included in the historical dialogue records, and each undirected edge represents a dialogue relationship established between user dialogue objects;
403. Performing suspicious session object detection on the bipartite network topology graph based on a preset session object anomaly detection algorithm to obtain a suspicious session user group;
404. Monitoring the session behavior of each suspicious user in the suspicious session user group and, in response to detecting that any suspicious user initiates a session, acquiring the dialogue content information of that suspicious user;
405. Performing word segmentation on the dialogue text in the dialogue content information, performing grammar analysis on the word segmentation result, and performing entity recognition on the grammar analysis result, to obtain dialogue content information containing an entity recognition result, wherein the dialogue content information containing the entity recognition result comprises dialogue text and dialogue images;
Steps 401 to 405 are similar to steps 101 to 105 above and are not repeated here.
406. Determining text word vector matrix data corresponding to the dialogue text and image pixel matrix data corresponding to the dialogue image;
In a specific implementation, the server maps each word in the text to a fixed-length vector representation through a pre-trained word embedding model such as Word2Vec, GloVe, or FastText. The server builds the vocabulary to be used, which contains all words in the dialogue text; the vocabulary can be built from the words that appear in the training data and can be filtered according to the specific requirements of the task. Based on the word embedding model and the vocabulary, the server converts each word in the dialogue text into its corresponding word vector and assembles the word vectors in order into the text word vector matrix data.
In a specific implementation, the server reads the image data from an image file, where the image file formats include, but are not limited to, JPEG and PNG. The server performs the necessary preprocessing operations on the image, such as resizing, cropping, and graying, so as to obtain a consistent input size and number of channels. The server then converts each pixel in the image into its numerical representation, thereby forming the image pixel matrix data. For a grayscale image, each pixel can be represented as a gray value; for a color image, each pixel can be represented as a vector containing the red, green, and blue channel values.
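The two matrix representations can be illustrated with a small Python sketch. A hash-based function stands in for a pre-trained embedding model such as Word2Vec, GloVe, or FastText; it is a hypothetical placeholder that carries no semantic information and only guarantees that the same word always maps to the same fixed-length vector. The graying step uses the common ITU-R BT.601 luma weights, which is an assumption about the preprocessing and is not stated in the application.

```python
import hashlib

EMBEDDING_DIM = 4  # hypothetical, far smaller than a real embedding size


def toy_embedding(word, dim=EMBEDDING_DIM):
    """Deterministic stand-in for a pre-trained embedding lookup."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def text_to_matrix(tokens):
    """Build the vocabulary, then stack one word vector per token, in order."""
    vocabulary = sorted(set(tokens))
    table = {word: toy_embedding(word) for word in vocabulary}
    return [table[token] for token in tokens]


def rgb_to_gray(image):
    """Convert rows of (R, G, B) pixels into rows of gray values in 0..255."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


text_matrix = text_to_matrix(["send", "money", "send"])
gray_matrix = rgb_to_gray([[(255, 255, 255), (0, 0, 0)]])
```

The text matrix has one row per token and one column per embedding dimension, while the gray matrix keeps the row and column layout of the image.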
407. Performing convolution calculation on the text word vector matrix data and on the image pixel matrix data respectively to obtain a calculation set;
The server performs convolution calculation on the text word vector matrix data and on the image pixel matrix data respectively through a preset convolutional neural network, whose number of convolution layers can be set as required. Processing of dialogue text typically relies on a recurrent neural network (RNN) to capture deep information; in the embodiments of the present application, the server instead captures local features in the text through convolution operations. In natural language processing tasks, text usually carries local structure and semantic information, such as the way words and phrases combine. The server can capture these local features effectively through convolution and share weights across the whole text, which reduces the number of model parameters. Meanwhile, because convolution layers can learn features of different sizes, the server processes the dialogue text from local to global scope through a convolutional neural network in which several convolution layers and pooling layers are stacked, and extracts a high-level feature representation of the dialogue text. These features capture semantic, grammatical, and syntactic information of the text and provide a richer representation.
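To make the convolution over a word vector matrix concrete, the following pure-Python sketch slides a single kernel over consecutive word vectors, applies a ReLU activation, and then takes the maximum over positions. The kernel weights, the toy matrix, and the function names are hypothetical; a real model would learn many kernels and would normally be written with a deep learning framework.

```python
def conv1d(matrix, kernel, bias=0.0):
    """Slide a (width x dim) kernel over a (length x dim) matrix of word
    vectors and return one ReLU-activated value per window position."""
    width = len(kernel)
    outputs = []
    for start in range(len(matrix) - width + 1):
        total = bias
        for offset in range(width):
            row = matrix[start + offset]
            total += sum(x * w for x, w in zip(row, kernel[offset]))
        outputs.append(max(0.0, total))  # ReLU
    return outputs


def max_pool(values):
    """Max-over-time pooling: keep the strongest response of the kernel."""
    return max(values) if values else 0.0


# Hypothetical 4-token text with 2-dimensional word vectors.
matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # covers two consecutive tokens
feature_map = conv1d(matrix, kernel)
pooled = max_pool(feature_map)
```

The same kernel weights are reused at every window position, which is the weight sharing that keeps the parameter count low.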
408. Processing the calculation set with a feature extraction network to obtain the target dialogue features, wherein the feature extraction network comprises a plurality of lines of convolution layers, or of convolution layers and pooling layers, the target dialogue features comprise intention features, emotion features, and psycho-behavioral features, and the psycho-behavioral features comprise at least text selection features and word habit features of the text input during the dialogue and visual cue features in the input image;
The feature extraction network comprises a first network and a second network. From left to right, the first network comprises a first line with one pooling layer, a second line with two convolution layers, and a third line with four convolution layers; from left to right, the second network comprises a first line with one convolution layer, a second line with two convolution layers, and a third line with three convolution layers. The convolution layers differ from line to line. The number of convolution-layer lines used to extract the psycho-behavioral features is greater than the number used to extract the intention features and emotion features.
Under this network structure, the server outputs a feature set containing the intention features and emotion features from the calculation of the first network, and outputs the psycho-behavioral features from the second network's calculation on that feature set.
409. Inputting the target dialogue features into a preset session risk recognition network model for processing to obtain a session risk level, wherein the session risk recognition network model is a hierarchical neural network model that is learned based on a convolutional neural network and has hierarchical classification output.
Step 409 is similar to step 107 above and is not described in detail here.
With the method provided by the embodiments of the present application, the text selection features and word habit features of the text input during the dialogue and the visual cue features in the input image are each extracted through a feature extraction network comprising a plurality of lines of convolution layers, or of convolution layers and pooling layers, so that risk classification is performed accurately on the basis of these multiple features and an accurate session risk level is obtained.
Referring to fig. 5, fig. 5 is a flowchart of an embodiment of a risk classification method for a fifth chat session according to an embodiment of the present application, including:
501. acquiring a plurality of historical dialogue records generated by a chat platform in a preset time period; wherein, each round of history dialogue record at least comprises two user dialogue objects;
502. Constructing a bipartite network topology graph according to each round of history dialogue records; the bipartite network topology graph comprises a plurality of network nodes and undirected edges connecting the network nodes, each network node corresponds to a user dialogue object included in the history dialogue records, and each undirected edge represents a dialogue relation established between user dialogue objects;
503. Performing suspicious session object detection on the bipartite network topology graph based on a preset session object anomaly detection algorithm to obtain a suspicious session user group;
504. Monitoring the session behavior of each suspicious user in the suspicious session user group, and, in response to detecting that any suspicious user initiates a session, acquiring dialogue content information of that suspicious user;
505. Performing word segmentation on the dialogue text in the dialogue content information, performing syntactic parsing according to the word segmentation result, and performing entity recognition according to the parsing result, to obtain dialogue content information containing an entity recognition result;
506. Performing feature extraction on the dialogue content information containing the entity recognition result to obtain target dialogue features; the target dialogue features comprise intention features, emotion features and psychological behavior features, and the psychological behavior features at least comprise character selection features and wording habit features of the input text in the dialogue process and visual cue features in the input image;
507. Inputting the target dialogue features into a preset session risk identification network model for processing to obtain a session risk level; the session risk identification network model is a hierarchical neural network model that is learned based on a convolutional neural network and has hierarchical classification output;
Steps 501 to 507 are similar to steps 101 to 107 and are not repeated here.
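As a rough illustration of the graph construction in step 502, the following minimal sketch builds an undirected graph from history dialogue records. The record layout (one list of participant IDs per round of dialogue), the function name `build_dialogue_graph` and the adjacency-set representation are assumptions made for illustration; the application does not specify them, and the sketch does not show how nodes are divided between the two sides of the bipartite graph.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, Iterable, Sequence, Set


def build_dialogue_graph(
    records: Iterable[Sequence[Hashable]],
) -> Dict[Hashable, Set[Hashable]]:
    """Build an undirected graph from history dialogue records.

    Each record is assumed to list the user dialogue objects (at least two)
    that took part in one round of dialogue.
    """
    graph: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for participants in records:
        users = set(participants)
        if len(users) < 2:
            continue  # a dialogue relation needs two distinct users
        for a, b in combinations(users, 2):
            graph[a].add(b)  # undirected edge: store both directions
            graph[b].add(a)
    return dict(graph)


# Hypothetical records: u1-u2 talked twice, u2-u3 once.
records = [["u1", "u2"], ["u2", "u3"], ["u1", "u2"]]
graph = build_dialogue_graph(records)
```

Repeated dialogues between the same pair collapse into a single edge here; an edge weight could be kept instead if dialogue frequency matters for the detection step.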
508. Determining the risk form of the suspicious user according to the session risk level corresponding to the dialogue content information; the risk form reflects the current system state, the behavior of the user and the operating environment of the chat system;
It should be understood that the server classifies users into different risk forms according to the evaluated session risk level. For example, a low-risk form indicates that the user is behaving normally and poses no apparent risk, and a high-risk form indicates that the user may pose a risk or is behaving suspiciously. The risk form may take into account the current system state, the behavior of the user, the operating environment of the chat system, and other factors.
509. Determining a corresponding risk decision strategy according to the risk form, and executing risk control according to the risk decision strategy; wherein the risk decision strategy comprises responding to the user, warning the user, and rejecting the user's request.
It should be appreciated that the server determines the corresponding risk decision strategy based on the risk form of the user. The strategy may cover how to respond to the user, warnings and alerts, restrictions on permissions or feature usage, and so on. For example, stricter measures may be taken for users in a high-risk form, such as monitoring their conversations or limiting the transmission of sensitive content.
It should be appreciated that the server may execute the corresponding risk control measures in accordance with the risk decision strategy. This may involve automated system operations, such as automatically issuing a warning or temporarily disabling or blocking a user account. In addition, the system may notify an administrator or auditor so that a manual intervention and decision can be made.
Based on the method provided by the embodiment of the present application, the risk form of the suspicious user's current dialogue content information is determined, the corresponding risk decision strategy is determined from that risk form, and risk control is carried out in time, which improves the security of chat sessions and reduces session risk.
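Steps 508 and 509 can be pictured with the following minimal sketch, which maps a session risk level to a risk form and then to a list of control actions. The numeric 0 to 9 scale, the thresholds, the `RiskForm` names and the action strings are all hypothetical; the application does not fix any of them, and the sketch ignores the system state and operating environment that the risk form may also take into account.

```python
from enum import Enum
from typing import Dict, List


class RiskForm(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical mapping from risk form to risk control actions.
STRATEGIES: Dict[RiskForm, List[str]] = {
    RiskForm.LOW: ["respond"],
    RiskForm.MEDIUM: ["respond", "warn"],
    RiskForm.HIGH: ["reject_request", "notify_auditor"],
}


def determine_risk_form(session_risk_level: int) -> RiskForm:
    """Map a session risk level (assumed to range from 0 to 9) to a risk form."""
    if session_risk_level >= 7:
        return RiskForm.HIGH
    if session_risk_level >= 4:
        return RiskForm.MEDIUM
    return RiskForm.LOW


def decide(session_risk_level: int) -> List[str]:
    """Return the control actions for the risk form of the given level."""
    return list(STRATEGIES[determine_risk_form(session_risk_level)])
```

Keeping the strategy table separate from the thresholds lets either one be tuned without touching the other.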
The risk classification method for a chat session in the embodiment of the present application has been described above; the risk classification apparatus for a chat session is described below. Referring to fig. 6, fig. 6 is a schematic structural diagram of a risk classification apparatus for a chat session provided in the embodiment of the present application, the apparatus including:
the dialogue record obtaining module 601 is configured to obtain a plurality of historical dialogue records generated by the chat platform in a preset time period; wherein, each round of history dialogue record at least comprises two user dialogue objects;
the topology graph construction module 602 is configured to construct a bipartite network topology graph according to each round of history dialogue records; the bipartite network topology graph comprises a plurality of network nodes and undirected edges connecting the network nodes, each network node corresponds to a user dialogue object included in the history dialogue records, and each undirected edge represents a dialogue relation established between user dialogue objects;
the session detection module 603 is configured to perform suspicious session object detection on the bipartite network topology graph based on a preset session object anomaly detection algorithm, so as to obtain a suspicious session user group;
a session content obtaining module 604, configured to monitor session behavior of each suspicious user in the suspicious session user group, and obtain session content information of the suspicious user in response to detecting that any suspicious user initiates a session;
the data processing module 605 is configured to perform word segmentation on the dialogue text in the dialogue content information, perform syntactic parsing according to the word segmentation result, and perform entity recognition according to the parsing result, so as to obtain dialogue content information containing an entity recognition result;
the feature extraction module 606 is configured to perform feature extraction on dialogue content information that includes an entity recognition result, so as to obtain a target dialogue feature; the target dialogue features comprise intention features, emotion features and psychological behavior features, and the psychological behavior features at least comprise character selection features and word habit features of the input text in the dialogue process and visual prompt features in the input image;
The risk identification module 607 is configured to input the target dialogue feature into a preset session risk identification network model for processing, so as to obtain a session risk level; the session risk recognition network model is a hierarchical neural network model with hierarchical classification output based on convolutional neural network learning.
Based on the apparatus provided by the embodiment of the present application, a network topology graph among users is built from the users' historical chat records, and a suspicious session user group is determined from the graph using the session object anomaly detection algorithm. The dialogue content information of this group is then monitored, dialogue features including intention features, emotion features and psychological behavior features are extracted from it, and a corresponding risk level is determined from these features. This enables risk monitoring during chat sessions and improves their security.
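The text does not detail what "hierarchical classification output" looks like. One common reading, shown here only as a sketch under that assumption, is a coarse class chosen first and a fine class chosen within it. The label names in `COARSE` and `FINE`, the function `hierarchical_decode` and the logit inputs are hypothetical; in a real model the logits would come from the output layers of the network.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical two-level label hierarchy.
COARSE: List[str] = ["low", "high"]
FINE: Dict[str, List[str]] = {
    "low": ["normal", "watch"],
    "high": ["warn", "block"],
}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_decode(
    coarse_logits: List[float],
    fine_logits: Dict[str, List[float]],
) -> Tuple[str, str, float]:
    """Pick the most likely coarse class, then the most likely fine class in it."""
    coarse_probs = softmax(coarse_logits)
    c = max(range(len(COARSE)), key=coarse_probs.__getitem__)
    coarse = COARSE[c]
    fine_probs = softmax(fine_logits[coarse])
    f = max(range(len(fine_probs)), key=fine_probs.__getitem__)
    # Joint probability of the chosen path through the hierarchy.
    return coarse, FINE[coarse][f], coarse_probs[c] * fine_probs[f]


coarse, fine, prob = hierarchical_decode(
    [0.2, 1.5], {"low": [0.0, 0.0], "high": [2.0, 0.5]}
)
```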
Referring to fig. 7, fig. 7 is a schematic structural diagram of another risk classification apparatus for chat session according to the embodiment of the present application, including:
the dialogue record obtaining module 601 is configured to obtain a plurality of historical dialogue records generated by the chat platform in a preset time period; wherein, each round of history dialogue record at least comprises two user dialogue objects;
the topology graph construction module 602 is configured to construct a bipartite network topology graph according to each round of history dialogue records; the bipartite network topology graph comprises a plurality of network nodes and undirected edges connecting the network nodes, each network node corresponds to a user dialogue object included in the history dialogue records, and each undirected edge represents a dialogue relation established between user dialogue objects;
the session detection module 603 is configured to perform suspicious session object detection on the bipartite network topology graph based on a preset session object anomaly detection algorithm, so as to obtain a suspicious session user group;
a session content obtaining module 604, configured to monitor session behavior of each suspicious user in the suspicious session user group, and obtain session content information of the suspicious user in response to detecting that any suspicious user initiates a session;
the data processing module 605 is configured to perform word segmentation on the dialogue text in the dialogue content information, perform syntactic parsing according to the word segmentation result, and perform entity recognition according to the parsing result, so as to obtain dialogue content information containing an entity recognition result;
the feature extraction module 606 is configured to perform feature extraction on dialogue content information that includes an entity recognition result, so as to obtain a target dialogue feature; the target dialogue features comprise intention features, emotion features and psychological behavior features, and the psychological behavior features at least comprise character selection features and word habit features of the input text in the dialogue process and visual prompt features in the input image;
The risk identification module 607 is configured to input the target dialogue feature into a preset session risk identification network model for processing, so as to obtain a session risk level; the session risk recognition network model is a hierarchical neural network model with hierarchical classification output based on convolutional neural network learning.
In one possible embodiment, the risk classification device for chat session further includes:
the risk form determining module 608 is configured to determine a risk form of the suspicious user according to the session risk level corresponding to the session content information; the risk forms comprise the current system state, the behavior of the user and the running environment of the chat system;
the risk control module 609 is configured to determine a corresponding risk decision policy according to the risk morphology, and execute risk control according to the risk decision policy; wherein the risk decision strategy comprises responding, warning and rejecting the request to the user.
In one possible implementation, the session detection module 603 specifically includes:
the iteration detection unit 6031 is configured to perform multiple rounds of iterative suspicious session object detection on the bipartite network topology graph according to a preset number of iterations, so as to sequentially obtain a plurality of suspicious session user sub-groups; wherein each round of iterative detection comprises: performing suspicious session object detection on the current bipartite network topology graph, removing the detected suspicious session user group from the current bipartite network topology graph, and saving the bipartite network topology graph after removal;
a union calculating unit 6032, configured to calculate the union of the plurality of suspicious session user sub-groups, so as to obtain the suspicious session user group.
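The interplay of units 6031 and 6032 can be sketched as a loop that detects a sub-group, removes it from the graph, and finally takes the union of all sub-groups. This is a minimal sketch: the adjacency-set graph type, the helper names, and in particular `toy_detect` (a placeholder detector that returns the highest-degree node with its neighbours) are assumptions for illustration, not the detection algorithm of the application.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def remove_nodes(graph: Graph, nodes: Set[Hashable]) -> Graph:
    """Return a copy of the graph without the given nodes and their edges."""
    return {n: nbrs - nodes for n, nbrs in graph.items() if n not in nodes}


def iterative_detection(
    graph: Graph,
    detect: Callable[[Graph], Set[Hashable]],
    rounds: int,
) -> Tuple[List[Set[Hashable]], Set[Hashable]]:
    """Run up to `rounds` detections, removing each detected sub-group."""
    sub_groups: List[Set[Hashable]] = []
    current = graph
    for _ in range(rounds):
        if not current:
            break
        group = detect(current)
        if not group:
            break  # nothing suspicious left in the remaining graph
        sub_groups.append(group)
        current = remove_nodes(current, group)  # graph kept for the next round
    return sub_groups, set().union(*sub_groups)


def toy_detect(graph: Graph) -> Set[Hashable]:
    """Placeholder detector: the highest-degree node plus its neighbours."""
    if not graph:
        return set()
    hub = max(sorted(graph), key=lambda n: len(graph[n]))
    return ({hub} | graph[hub]) if graph[hub] else set()


example = {
    "a": {"b", "c"}, "b": {"a"}, "c": {"a"},
    "d": {"e"}, "e": {"d"}, "f": set(),
}
sub_groups, suspicious = iterative_detection(example, toy_detect, rounds=3)
```

Because each round works on the graph left over from the previous one, the sub-groups are disjoint and their order records the round in which each was found.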
In one possible implementation, the sequentially obtained suspicious session user sub-groups are arranged in descending order of user risk level; the dialogue content obtaining module 604 specifically includes:
the user risk identification unit 6041 is configured to monitor the session behavior of each suspicious user in the suspicious session user group and, in response to detecting that any suspicious user initiates a session, determine the user risk level corresponding to the suspicious user according to the suspicious session user sub-group to which the suspicious user belongs;
an acquisition policy determining unit 6042, configured to determine a target data acquisition policy according to the user risk level corresponding to the suspicious user; the target data acquisition policy at least comprises a data acquisition time period and a data acquisition type;
an information acquisition unit 6043, configured to acquire the dialogue content information of the suspicious user according to the target data acquisition policy.
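A minimal sketch of units 6041 and 6042 follows, assuming the sub-groups are held in a list ordered from highest to lowest user risk level. The `AcquisitionPolicy` fields, the concrete time windows and data types, and the fallback `DEFAULT_POLICY` are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Hashable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class AcquisitionPolicy:
    window_minutes: int          # data acquisition time period
    data_types: Tuple[str, ...]  # data acquisition types


# Hypothetical policies, indexed by sub-group rank (0 = highest risk).
POLICIES: List[AcquisitionPolicy] = [
    AcquisitionPolicy(60, ("text", "image")),
    AcquisitionPolicy(30, ("text",)),
]
DEFAULT_POLICY = AcquisitionPolicy(10, ("text",))


def user_risk_rank(
    user: Hashable, sub_groups: List[Set[Hashable]]
) -> Optional[int]:
    """Rank of the sub-group containing the user, or None if not suspicious."""
    for rank, group in enumerate(sub_groups):
        if user in group:
            return rank
    return None


def acquisition_policy(
    user: Hashable, sub_groups: List[Set[Hashable]]
) -> Optional[AcquisitionPolicy]:
    """Choose the data acquisition policy from the user's sub-group rank."""
    rank = user_risk_rank(user, sub_groups)
    if rank is None:
        return None  # not in any suspicious sub-group: nothing is acquired
    return POLICIES[rank] if rank < len(POLICIES) else DEFAULT_POLICY


sub_groups = [{"a", "b"}, {"c"}, {"d"}]
```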
In one possible implementation, the iteration detecting unit 6031 is specifically configured to:
perform multiple rounds of iterative suspicious session object detection on the bipartite network topology graph according to a preset number of iterations, so as to sequentially obtain a plurality of suspicious session user sub-groups; wherein each round of iterative detection comprises: performing suspicious session object detection on the current bipartite network topology graph, removing the detected suspicious session user group from the current bipartite network topology graph, and saving the bipartite network topology graph after removal;
wherein the suspicious session object detection performed on the current bipartite network topology graph comprises: calculating the node suspiciousness of each network node in the current bipartite network topology graph; determining, according to the node suspiciousness of each network node, the target network node with the maximum node suspiciousness; iteratively removing the target network node and the target undirected edges connected to it from the current bipartite network topology graph until the graph is empty, to obtain a plurality of candidate bipartite network topology graphs; and calculating the global average suspiciousness of each candidate bipartite network topology graph, and determining the suspicious session user group detected in this round according to the candidate bipartite network topology graph with the largest global average suspiciousness.
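The single-round detection described above can be sketched as a node-removal loop that records every intermediate graph as a candidate. The sketch makes several assumptions the text does not settle: node suspiciousness is taken to be the node degree, global average suspiciousness is taken to be the edge count divided by the node count, and the graph before each removal is treated as a candidate. The text says the node with maximum suspiciousness is removed each time, so `pick=max` is the default; `pick` is exposed as a parameter only so that other removal orders can be tried.

```python
from typing import Callable, Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def degree_suspicion(graph: Graph, node: Hashable) -> float:
    """Placeholder node suspiciousness: the node's current degree."""
    return float(len(graph[node]))


def global_average_suspicion(graph: Graph) -> float:
    """Placeholder global metric: undirected edge count / node count."""
    if not graph:
        return 0.0
    edge_count = sum(len(nbrs) for nbrs in graph.values()) / 2
    return edge_count / len(graph)


def detect_suspicious_group(
    graph: Graph,
    node_score: Callable[[Graph, Hashable], float] = degree_suspicion,
    pick: Callable = max,
) -> Set[Hashable]:
    """Remove one target node at a time until the graph is empty, then return
    the node set of the candidate graph with the largest global average."""
    current: Graph = {n: set(nbrs) for n, nbrs in graph.items()}
    candidates: List[Graph] = []
    while current:
        # Snapshot the graph before this removal as a candidate.
        candidates.append({n: set(nbrs) for n, nbrs in current.items()})
        # Sorting by str only makes tie-breaking deterministic.
        target = pick(sorted(current, key=str),
                      key=lambda n: node_score(current, n))
        for nbr in current.pop(target):  # drop the node and its edges
            current[nbr].discard(target)
    if not candidates:
        return set()
    return set(max(candidates, key=global_average_suspicion))


# A 4-clique {a, b, c, d} with one loosely attached node e.
example = {
    "a": {"b", "c", "d", "e"}, "b": {"a", "c", "d"},
    "c": {"a", "b", "d"}, "d": {"a", "b", "c"}, "e": {"a"},
}
group_default = detect_suspicious_group(example)
group_min_first = detect_suspicious_group(example, pick=min)
```

With these placeholder metrics the default removal order keeps the whole example graph as the best candidate, whereas removing the lowest-scoring node first isolates the clique; which behaviour is intended depends on how node suspiciousness is actually defined.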
In one possible implementation, the dialogue content information including the entity recognition result includes dialogue text and dialogue images, and the feature extraction module 606 specifically includes:
a matrix determining unit 6061 for determining text word vector matrix data corresponding to the dialogue text and image picture pixel point matrix data corresponding to the dialogue image;
the convolution calculation unit 6062 is configured to perform convolution calculation on the text word vector matrix data and on the image pixel matrix data respectively, to obtain a calculation set;
a feature calculation unit 6063, configured to perform calculation on the calculation set based on the feature extraction network, to obtain the target dialogue features; the feature extraction network comprises a plurality of lines of convolution layers, or of convolution layers and pooling layers.
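The convolution step of units 6061 and 6062 can be illustrated in plain Python as follows. This is a minimal sketch: the toy matrices, the kernels and the dictionary layout of `calculation_set` are assumptions for illustration, and a real system would use learned kernels in a deep learning framework.

```python
from typing import Dict, List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, kernel: Matrix) -> Matrix:
    """2-D 'valid' cross-correlation, the operation CNN convolution layers use."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(x[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy inputs: 3 words with 2-dimensional word vectors, and a 3x3 grayscale image.
text_matrix: Matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_matrix: Matrix = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]

text_kernel: Matrix = [[1.0, 1.0], [1.0, 1.0]]    # spans two consecutive words
image_kernel: Matrix = [[1.0, 0.0], [0.0, -1.0]]  # diagonal-difference filter

# The two convolution results together form the "calculation set".
calculation_set: Dict[str, Matrix] = {
    "text": conv2d_valid(text_matrix, text_kernel),
    "image": conv2d_valid(image_matrix, image_kernel),
}
```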
In one possible implementation, the feature extraction network includes a first network and a second network; the first network includes, from left to right, a first line with one pooling layer, a second line with two convolution layers, and a third line with four convolution layers, and the second network includes, from left to right, a first line with a convolution layer, a second line with two convolution layers, and a third line with three convolution layers;
the feature calculation unit 6063 is specifically configured to:
perform calculation on the calculation set based on the first network, and output a feature set comprising intention features and emotion features;
and perform calculation on the feature set based on the second network, and output the psychological behavior features.
Based on the apparatus provided by the embodiment of the present application, a network topology graph among users is built from the users' historical chat records, and a suspicious session user group is determined from the graph using the session object anomaly detection algorithm. The dialogue content information of this group is then monitored, dialogue features including intention features, emotion features and psychological behavior features are extracted from it, and a corresponding risk level is determined from these features. This enables risk monitoring during chat sessions and improves their security. In addition, the modular design lets the hardware of each part of the apparatus focus on one function, so that hardware performance is used as fully as possible; it also reduces coupling between the modules, which makes the apparatus easier to maintain.
The risk classification apparatus for a chat session in the embodiment of the present application has been described in detail above with reference to fig. 6 and fig. 7 from the perspective of modular functional entities; the risk classification device for a chat session in the embodiment of the present application is described in detail below from the perspective of hardware processing.
Fig. 8 is a schematic structural diagram of a risk classification device for a chat session according to an embodiment of the present application. The risk classification device 800 for a chat session may vary considerably depending on configuration or performance, and may include one or more processors 810, a memory 820, and one or more storage media 830 storing application programs 833 or data 832. The memory 820 and the storage medium 830 may be transitory or persistent storage. The program stored on the storage medium 830 may include one or more modules (not shown), each of which may include a series of instruction operations for the risk classification device 800 for a chat session. Further, the processor 810 may be configured to communicate with the storage medium 830 and to execute, on the risk classification device 800 for a chat session, the series of instruction operations in the storage medium 830.
The risk classification device 800 for a chat session may also include one or more power supplies 840, one or more wired or wireless network interfaces 850, one or more input/output interfaces 860, and/or one or more operating systems 831, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art will appreciate that the device structure shown in fig. 8 does not limit the risk classification device for a chat session, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
The present application also provides a risk classification device for a chat session, which is a computer device including a memory and a processor, where the memory stores computer readable instructions that, when executed by the processor, cause the processor to perform the steps of the risk classification method for a chat session in the embodiments described above. The present application also provides a computer readable storage medium, which may be a non-volatile or a volatile computer readable storage medium, storing instructions that, when run on a computer, cause the computer to perform the steps of the risk classification method for a chat session.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The above embodiments are merely for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A risk classification method for chat sessions, comprising:
acquiring a plurality of historical dialogue records generated by a chat platform in a preset time period; wherein, each round of history dialogue record at least comprises two user dialogue objects;
constructing a bipartite network topology graph according to each round of history dialogue records; the bipartite network topology graph comprises a plurality of network nodes and undirected edges connecting the network nodes, each network node corresponds to a user dialogue object included in the history dialogue records, and each undirected edge represents a dialogue relation established between user dialogue objects;
performing suspicious session object detection on the bipartite network topology graph based on a preset session object anomaly detection algorithm to obtain a suspicious session user group;
monitoring the session behavior of each suspicious user in the suspicious session user group, and, in response to detecting that any suspicious user initiates a session, acquiring the dialogue content information of the suspicious user;
word segmentation processing is carried out on dialogue texts in the dialogue content information, grammar analysis is carried out according to the word segmentation processing result, and entity recognition is carried out according to the grammar analysis result, so that dialogue content information containing entity recognition results is obtained;
Extracting features of the dialogue content information containing the entity identification result to obtain target dialogue features; the target dialogue features comprise intention features, emotion features and psychological behavior features, wherein the psychological behavior features at least comprise character selection features and word habit features of input texts in the dialogue process and visual prompt features in input images;
inputting the target dialogue characteristics into a preset dialogue risk identification network model for processing to obtain a dialogue risk grade; the session risk identification network model is a hierarchical neural network model which is formed by learning based on a convolutional neural network and has hierarchical classification output.
2. The risk classification method for chat sessions according to claim 1, wherein the performing suspicious session object detection on the bipartite network topology graph based on a preset session object anomaly detection algorithm to obtain a suspicious session user group comprises:
performing multiple rounds of iterative suspicious session object detection on the bipartite network topology graph according to a preset number of iterations, to sequentially obtain a plurality of suspicious session user sub-groups; wherein each round of iterative detection comprises: performing suspicious session object detection on the current bipartite network topology graph, removing the detected suspicious session user group from the current bipartite network topology graph, and saving the bipartite network topology graph after removal;
and calculating the union of the plurality of suspicious session user sub-groups to obtain the suspicious session user group.
3. The risk classification method for chat sessions according to claim 2, wherein the sequentially obtained suspicious session user sub-groups are arranged in descending order of user risk level;
the monitoring the session behavior of each suspicious user in the suspicious session user group, and, in response to detecting that any suspicious user initiates a session, acquiring the dialogue content information of the suspicious user comprises:
monitoring the session behavior of each suspicious user in the suspicious session user group, and, in response to detecting that any suspicious user initiates a session, determining the user risk level corresponding to the suspicious user according to the suspicious session user sub-group to which the suspicious user belongs;
determining a target data acquisition strategy according to the user risk level corresponding to the suspicious user; the target data acquisition strategy at least comprises a data acquisition time period and a data acquisition type;
and acquiring dialogue content information of the suspicious user according to the target data acquisition strategy.
4. The risk classification method for chat sessions according to claim 2, wherein the performing suspicious session object detection on the current bipartite network topology graph comprises:
calculating the node suspiciousness of each network node in the current bipartite network topology graph;
determining, according to the node suspiciousness of each network node, the target network node with the maximum node suspiciousness;
iteratively removing the target network node and the target undirected edges connected to the target network node from the current bipartite network topology graph until the bipartite network topology graph is empty, to obtain a plurality of candidate bipartite network topology graphs;
and calculating the global average suspiciousness of each candidate bipartite network topology graph, and determining the suspicious session user group detected in this round according to the candidate bipartite network topology graph with the largest global average suspiciousness.
5. The risk ranking method of chat session according to claim 1, wherein the dialogue content information containing the entity recognition result includes dialogue text and dialogue image;
the step of extracting the characteristics of the dialogue content information containing the entity identification result, and the step of obtaining the target dialogue characteristics comprises the following steps:
determining text word vector matrix data corresponding to the dialogue text and image picture pixel point matrix data corresponding to the dialogue image;
respectively carrying out convolution calculation on the text word vector matrix data and the image picture pixel point matrix data to obtain a calculation set;
performing calculation on the calculation set based on a feature extraction network to obtain the target dialogue features; the feature extraction network comprises a plurality of lines of convolution layers, or of convolution layers and pooling layers.
6. The risk classification method for chat sessions according to claim 5, wherein the feature extraction network comprises a first network and a second network, the first network comprising, from left to right, a first line with one pooling layer, a second line with two convolution layers, and a third line with four convolution layers, and the second network comprising, from left to right, a first line with a convolution layer, a second line with two convolution layers, and a third line with three convolution layers;
the performing calculation on the calculation set based on the feature extraction network to obtain the target dialogue features comprises:
outputting a feature set comprising intent features and emotion features based on the calculation of the first network;
and outputting psychological behavioral characteristics based on the calculation of the characteristic set by the second network.
7. The risk classification method of a chat session according to any one of claims 1-6, wherein the inputting the target dialogue feature into a preset session risk identification network model for processing, after obtaining a session risk class, further includes:
Determining the risk form of the suspicious user according to the session risk level corresponding to the session content information; the risk forms comprise the current system state, the behavior of the user and the running environment of the chat system;
determining a corresponding risk decision strategy according to the risk form, and executing risk control according to the risk decision strategy; wherein the risk decision strategy comprises responding to the user, warning the user, and rejecting the user's request.
8. A risk ranking apparatus for chat sessions, comprising:
the dialogue record acquisition module is used for acquiring a plurality of historical dialogue records generated by the chat platform in a preset time period; wherein, each round of history dialogue record at least comprises two user dialogue objects;
the topology graph construction module is used for constructing a bipartite network topology graph according to each round of history dialogue records; the bipartite network topology graph comprises a plurality of network nodes and undirected edges connecting the network nodes, each network node corresponds to a user dialogue object included in the history dialogue records, and each undirected edge represents a dialogue relation established between user dialogue objects;
the session detection module is used for performing suspicious session object detection on the bipartite network topology graph based on a preset session object anomaly detection algorithm to obtain a suspicious session user group;
the dialogue content acquisition module is used for monitoring the dialogue behaviors of each suspicious user in the suspicious dialogue user group, and acquiring dialogue content information of the suspicious user in response to detecting that any suspicious user initiates a dialogue;
the data processing module is used for performing word segmentation on the dialogue text in the dialogue content information, performing syntactic parsing according to the word segmentation result, and performing entity recognition according to the parsing result, to obtain the dialogue content information containing the entity recognition result;
the feature extraction module is used for extracting features of the dialogue content information containing the entity identification result to obtain target dialogue features; the target dialogue features comprise intention features, emotion features and psychological behavior features, wherein the psychological behavior features at least comprise character selection features and word habit features of input texts in the dialogue process and visual prompt features in input images;
the risk identification module is used for inputting the target dialogue characteristics into a preset session risk identification network model for processing to obtain a session risk level; the session risk identification network model is a hierarchical neural network model which is formed by learning based on a convolutional neural network and has hierarchical classification output.
9. A risk classification device for a chat session, the risk classification device for a chat session comprising: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invoking the instructions in the memory to cause the risk classification device for a chat session to perform the steps of the risk classification method for a chat session of any of claims 1-7.
10. A computer readable storage medium having instructions stored thereon which, when executed by a processor, implement the steps of the risk classification method for a chat session of any of claims 1-7.
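The "hierarchical classification output" of the session risk identification network model in claim 8 can be read as a coarse-to-fine decision. The sketch below only decodes such an output from raw scores; it does not implement the convolutional network itself. The two-level hierarchy, the level names and the function `decode_risk_level` are assumptions for illustration, since the claims do not enumerate the risk levels.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# HYPOTHETICAL label hierarchy: coarse class -> fine-grained risk levels.
COARSE = ["safe", "risky"]
FINE_LEVELS = {"safe": ["no_risk"], "risky": ["low", "medium", "high"]}


def decode_risk_level(coarse_logits, fine_logits):
    """Pick the coarse class first, then the best fine level beneath it.

    coarse_logits: scores over COARSE.
    fine_logits: dict mapping each coarse class to scores over its levels.
    Returns (coarse_class, fine_level, joint_probability).
    """
    coarse_probs = softmax(coarse_logits)
    c_idx = max(range(len(COARSE)), key=coarse_probs.__getitem__)
    coarse = COARSE[c_idx]
    fine_probs = softmax(fine_logits[coarse])
    f_idx = max(range(len(fine_probs)), key=fine_probs.__getitem__)
    joint = coarse_probs[c_idx] * fine_probs[f_idx]
    return coarse, FINE_LEVELS[coarse][f_idx], joint
```

Decoding level by level keeps the fine-grained risk level consistent with the coarse decision, which is one common reason for giving a classifier a hierarchical output.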
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310846524.1A CN116561668A (en) 2023-07-11 2023-07-11 Chat session risk classification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310846524.1A CN116561668A (en) 2023-07-11 2023-07-11 Chat session risk classification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116561668A true CN116561668A (en) 2023-08-08

Family

ID=87488389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310846524.1A Pending CN116561668A (en) 2023-07-11 2023-07-11 Chat session risk classification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116561668A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108418809A (en) * 2018-02-07 2018-08-17 平安科技(深圳)有限公司 Chat data processing method, device, computer equipment and storage medium
CN110660074A (en) * 2019-10-10 2020-01-07 北京同创信通科技有限公司 Method for establishing steel scrap grade division neural network model
CN110717455A (en) * 2019-10-10 2020-01-21 北京同创信通科技有限公司 Method for classifying and detecting grades of scrap steel in storage
CN111241822A (en) * 2020-01-03 2020-06-05 北京搜狗科技发展有限公司 Emotion discovery and dispersion method and device under input scene
CN112988991A (en) * 2021-02-04 2021-06-18 支付宝(杭州)信息技术有限公司 Method and system for anti-fraud intervention through man-machine conversation
CN113656652A (en) * 2021-08-31 2021-11-16 平安医疗健康管理股份有限公司 Method, device and equipment for detecting medical insurance violation and storage medium
CN114399379A (en) * 2022-01-11 2022-04-26 平安普惠企业管理有限公司 Artificial intelligence-based collection behavior recognition method, device, equipment and medium
CN114626731A (en) * 2022-03-22 2022-06-14 平安普惠企业管理有限公司 Risk identification method and device, electronic equipment and computer readable storage medium
CN114722199A (en) * 2022-04-06 2022-07-08 平安科技(深圳)有限公司 Risk identification method and device based on call recording, computer equipment and medium
CN114978474A (en) * 2022-05-13 2022-08-30 上海辉禹科技有限公司 Method and system for automatically handling user chat risk level
CN116010551A (en) * 2022-12-12 2023-04-25 百果园技术(新加坡)有限公司 Chat text detection method and device, equipment and medium thereof
CN116156056A (en) * 2022-08-12 2023-05-23 马上消费金融股份有限公司 Call risk processing method and device, electronic equipment and storage medium
CN116320139A (en) * 2023-02-08 2023-06-23 号百信息服务有限公司 Method and device for analyzing risk control management of conversation, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110222140B (en) Cross-modal retrieval method based on adversarial learning and asymmetric hashing
CN105426356B (en) Target information recognition method and device
CN109284371B (en) Anti-fraud method, electronic device, and computer-readable storage medium
CN106874253A (en) Method and device for recognizing sensitive information
CN111325319B (en) Neural network model detection method, device, equipment and storage medium
CN110401545B (en) Chat group creation method, chat group creation device, computer equipment and storage medium
CN108319888B (en) Video type identification method and device and computer terminal
CN112800225B (en) Microblog comment emotion classification method and system
CN109325422A (en) Expression recognition method, device, terminal and computer readable storage medium
CN114330966A (en) Risk prediction method, device, equipment and readable storage medium
CN111783903A (en) Text processing method, text model processing method and device and computer equipment
CN113094478A (en) Expression reply method, device, equipment and storage medium
CN116561570A (en) Training method, device and equipment for multi-mode model and readable storage medium
CN115588193A (en) Visual question-answering method and device based on graph attention neural network and visual relation
CN113128526B (en) Image recognition method and device, electronic equipment and computer-readable storage medium
CN110674370A (en) Domain name identification method and device, storage medium and electronic equipment
CN114491003A (en) User behavior analysis device, method and equipment based on domain knowledge graph
Thandaga Jwalanaiah et al. Effective deep learning based multimodal sentiment analysis from unstructured big data
CN111859925B (en) Emotion analysis system and method based on probability emotion dictionary
CN114118398A (en) Method and system for detecting target type website, electronic equipment and storage medium
CN110222187B (en) Common activity detection and data sharing method for protecting user privacy
Shome et al. A generalized mechanism beyond NLP for real-time detection of cyber abuse through facial expression analytics
CN116561668A (en) Chat session risk classification method, device, equipment and storage medium
CN116094971A (en) Industrial control protocol identification method and device, electronic equipment and storage medium
CN111611774B (en) Operation and maintenance operation instruction safety analysis method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination