CN114612246A - Object set identification method and device, computer equipment and storage medium - Google Patents

Object set identification method and device, computer equipment and storage medium

Info

Publication number
CN114612246A
Authority
CN
China
Prior art keywords
objects
behavior
clustering
time sequence
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111441117.XA
Other languages
Chinese (zh)
Inventor
叶志豪
李晓雯
赵瑞辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Cyber Tianjin Co Ltd
Original Assignee
Tencent Cyber Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Cyber Tianjin Co Ltd filed Critical Tencent Cyber Tianjin Co Ltd
Priority to CN202111441117.XA priority Critical patent/CN114612246A/en
Publication of CN114612246A publication Critical patent/CN114612246A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to an object set identification method, an object set identification device, computer equipment and a storage medium, in the technical field of data processing. The method comprises the following steps: obtaining a heterogeneous graph corresponding to each object; extracting the relationship features of each object based on the heterogeneous graph; extracting the behavior time sequence features of each object based on the behavior sequence of each object; clustering the objects based on their relationship features and behavior time sequence features to obtain at least one object set; and identifying a target object set from the at least one object set. The scheme can cluster the objects accurately, thereby improving the accuracy with which an object set having the specified interactive behavior is identified from the clustering result.

Description

Object set identification method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to an object set identification method, an object set identification device, a computer device, and a storage medium.
Background
With the continuous improvement of medical insurance systems, medical insurance has become an important part of people's lives. At the same time, abnormal medical insurance behaviors occur from time to time, and identifying them effectively has become an urgent need in the industry.
In the related art, usually, based on a user who is found to have abnormal medical insurance behaviors, the attribute feature of the user is extracted, similarity calculation is performed on the attribute feature of the user and the attribute features of other users, and the other users similar to the attribute feature of the user are used as users who may have abnormal medical insurance behaviors, so that the purpose of screening the users who may have abnormal medical insurance behaviors is achieved.
However, according to the scheme, abnormal medical insurance user identification is performed only through the user attribute features, and the situation of false identification is easy to occur, so that the identification accuracy is low.
Disclosure of Invention
The embodiment of the application provides an object set identification method, an object set identification device, computer equipment and a storage medium, which can improve the accuracy of object screening of abnormal behaviors, and the technical scheme is as follows:
in one aspect, a method for identifying a set of objects is provided, where the method includes:
obtaining a heterogeneous graph corresponding to each object, wherein the heterogeneous graph is used for indicating the relationship between each object and each service mechanism;
extracting respective relation features of the objects based on the heterogeneous graph;
acquiring respective behavior sequences of the objects, wherein the behavior sequences comprise behavior records of interaction behaviors between the objects and service organizations; and the behavior records in the behavior sequence are arranged according to a time sequence;
extracting the behavior time sequence characteristics of each object based on the behavior sequence of each object;
clustering the objects based on the respective relationship characteristics of the objects and the respective behavior time sequence characteristics of the objects to obtain at least one object set;
identifying a target set of objects from the at least one set of objects; the target set of objects is a set of objects for which specified interaction behavior exists.
In another aspect, an object set recognition apparatus is provided, the apparatus including:
the heterogeneous graph acquisition module is used for acquiring a heterogeneous graph corresponding to each object, and the heterogeneous graph is used for indicating the relationship between each object and each service mechanism;
the first characteristic acquisition module is used for extracting the respective relation characteristics of each object based on the heterogeneous graph;
the sequence acquisition module is used for acquiring respective behavior sequences of the objects, wherein the behavior sequences comprise behavior records of interaction behaviors between the objects and the service mechanism; and the behavior records in the behavior sequence are arranged according to time sequence;
the second characteristic acquisition module is used for extracting the respective behavior time sequence characteristics of each object based on the respective behavior sequence of each object;
the clustering module is used for clustering each object based on the respective relationship characteristic of each object and the respective behavior time sequence characteristic of each object to obtain at least one object set;
an identification module for identifying a target set of objects from the at least one set of objects; the target object set is a set of objects for which specified interaction behavior exists.
In one possible implementation, the service organization comprises a first type of organization, and a second type of organization; the first type mechanism is used for providing target services corresponding to the interaction behaviors for the object, and the second type mechanism is used for providing resource compensation for resources required by the object to receive the target services;
the heterogeneous graph acquisition module is configured to,
acquiring respective object information of each object, mechanism information of each first type mechanism and mechanism information of each second type mechanism;
generating the heterogeneous graph based on the respective object information of the respective objects, the mechanism information of the respective first type mechanisms, and the mechanism information of the respective second type mechanisms;
wherein the heterogeneous graph comprises object nodes corresponding to objects, first type mechanism nodes corresponding to the first type mechanisms and second type mechanism nodes corresponding to the second type mechanisms; an edge between the object node and the first type mechanism node is used to indicate the number of times the first type mechanism provides the target service to the object; an edge between the object node and the second type mechanism node is used to indicate that an affiliation exists between the object and the second type mechanism.
In one possible implementation, the behavior record is used for indicating that the object accepts the behavior information of the target service provided by the first type mechanism;
wherein the behavior information includes: an organization identification of the first type of organization, an occurrence time of the target service, and a quantity of resources corresponding to the target service.
In one possible implementation, the clustering module is configured to,
splicing the respective relationship characteristics of each object with the respective behavior time sequence characteristics of each object to obtain the respective clustering characteristics of each object;
and clustering based on the respective clustering characteristics of the objects to obtain the at least one object set.
In one possible implementation, the clustering module is configured to,
acquiring a similar object set corresponding to each object through a similar node acceleration library, wherein the similar object set comprises n other objects with the closest similarity distance with the corresponding object; the similarity distance is used for identifying the similarity between the clustering characteristics of the two objects;
and clustering based on the respective similar object sets of the objects and the similarity distances between the objects and the objects in the respective similar object sets to obtain the at least one object set.
In one possible implementation, the clustering module is configured to,
acquiring respective object attribute characteristics of each object;
and splicing the respective relationship characteristics of the objects, the respective behavior time sequence characteristics of the objects and the respective object attribute characteristics of the objects to obtain the respective clustering characteristics of the objects.
In a possible implementation manner, the second feature obtaining module is configured to,
inputting the behavior sequence of a target object into a time sequence mining model to obtain the behavior time sequence characteristics of the target object output by the time sequence mining model; the target object is any one of the objects;
wherein the time sequence mining model comprises at least one of a word vector model and a Bidirectional Encoder Representations from Transformers (BERT) model.
In one possible implementation manner, the first feature obtaining module is configured to,
inputting the heterogeneous graph into a graph neural network model to obtain the respective relationship characteristics of each object output by the graph neural network;
the graph neural network includes a HinSAGE model.
In one possible implementation, the graph neural network is an attention-based graph neural network.
In one possible implementation manner, the identification module is configured to,
performing anomaly detection on the at least one object set to obtain an abnormal community set in the at least one object set;
matching the at least one object set based on a target rule to obtain a rule matching community set in the at least one object set; the target rule includes a rule satisfied by object sets in which the specified interaction behavior exists;
and acquiring the intersection of the abnormal community set and the rule matching community set to obtain the target object set.
In one possible implementation manner, the identification module is configured to,
extracting respective community features of the at least one object set;
and inputting the community characteristics of the at least one object set into a community detection model to obtain a community detection result of the community detection model, wherein the community detection result is used for indicating the target object set.
In another aspect, a computer device is provided, the computer device comprising a processor and a memory, the memory having stored therein at least one computer instruction, the at least one computer instruction being loaded and executed by the processor to implement the above object set identification method.
In another aspect, a computer-readable storage medium is provided, in which at least one computer instruction is stored, and the computer instruction is loaded and executed by a processor to implement the above object set identification method.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the object set identification method provided in the various alternative implementations described above.
The technical scheme provided by the application can comprise the following beneficial effects:
The relationship features describing the relationship between the objects and the service mechanisms are extracted through a heterogeneous graph, the behavior time sequence features of the objects are extracted from the time sequence formed by the behavior records between the objects and the mechanisms, the objects are clustered by combining the two kinds of features, and an object set in which the specified interactive behavior exists is identified from the object sets obtained by clustering. The heterogeneous graph can effectively integrate the relationships between the objects and the service mechanisms, so that representations of different objects can be learned more effectively; meanwhile, the time sequence formed by the behavior records of the objects can better represent the behavior similarity between objects.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 illustrates a flow chart of an object set identification method illustrated in an exemplary embodiment of the present application;
FIG. 2 illustrates a flow chart of an object set identification method provided by an exemplary embodiment of the present application;
FIG. 3 is a schematic illustration of two types of dots to which the embodiment of FIG. 2 relates;
FIG. 4 is a schematic diagram of an acceleration process involved in the embodiment of FIG. 2;
FIG. 5 is a schematic diagram of an object collection detection framework to which the embodiment shown in FIG. 2 relates;
FIG. 6 is a graph of the visual clustering effect according to the embodiment shown in FIG. 2;
FIG. 7 is a block diagram illustrating an object set recognition apparatus according to an exemplary embodiment of the present application;
FIG. 8 is a block diagram illustrating the structure of a computer device according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
It should be understood that reference to "a plurality" herein means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The embodiment of the application provides a target object set identification method, which can improve the efficiency and accuracy of obtaining abnormal interaction services. The scheme shown in each embodiment of the application can be applied to various scenes such as cloud technology, artificial intelligence and intelligent traffic. For ease of understanding, several terms referred to in this application are explained below.
1) Graph Embedding
Graph embedding is the process of mapping graph data (typically a high-dimensional dense matrix) into low-dimensional dense vectors. It aims to represent the nodes of the graph in a low-dimensional vector space while preserving the topology and node information of the network, so that existing machine learning algorithms can be used directly in subsequent graph analysis tasks.
2) Heterogeneous Graph
A heterogeneous graph is a graph that contains different types of nodes and edges (at least one of nodes and edges has multiple types); such graphs are common in knowledge graphs. The simplest way to handle heterogeneous information is to one-hot encode the type information and concatenate it to the original representation of the node. In real life, heterogeneous graphs are more common than homogeneous graphs; alternatively, a heterogeneous graph can be regarded as a homogeneous graph whose nodes are connected by multiple types of edges (relationships), where the different attributes of each edge also influence the distance and closeness between nodes.
The purpose of heterogeneous graph representation learning is to find a meaningful vector representation for each node, to facilitate downstream applications such as link prediction, personalized recommendation and node classification. However, this task is difficult, because it must not only integrate the information of the various types of nodes and edges that make up the heterogeneous structure, but also consider the heterogeneous attributes and heterogeneous content associated with each node. Although much work has been done on homogeneous graph embedding, attributed graph embedding and graph neural networks, few approaches can effectively consider heterogeneous structure (graph) information together with the heterogeneous content information of each node.
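The one-hot approach mentioned above can be illustrated with a minimal sketch. The node type names (`person`, `hospital`, `employer`) and the feature values are assumptions chosen only for illustration; they do not come from the application.

```python
# Hypothetical node types; the names are illustrative only.
NODE_TYPES = ["person", "hospital", "employer"]


def one_hot(node_type):
    """Return a one-hot list marking the position of node_type in NODE_TYPES."""
    return [1.0 if t == node_type else 0.0 for t in NODE_TYPES]


def typed_representation(features, node_type):
    """Append the one-hot type code to the node's original feature vector."""
    return list(features) + one_hot(node_type)


# Two nodes with the same feature length but different types.
person_vec = typed_representation([0.3, 0.7], "person")
hospital_vec = typed_representation([0.9, 0.1], "hospital")
```

After concatenation, nodes of different types share one vector layout, and the trailing positions tell a downstream model which type each node has.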
3) Artificial Intelligence (AI)
Artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
4) Machine Learning (Machine Learning, ML)
Machine learning is a multi-disciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Fig. 1 shows a flowchart of an object set identification method according to an exemplary embodiment of the present application, where the object set identification method may be executed by a computer device, and the computer device may be implemented as a server or a terminal, and as shown in fig. 1, the object set identification method includes:
Step 110, acquiring a heterogeneous graph corresponding to each object, wherein the heterogeneous graph is used for indicating the relationship between each object and each service mechanism.
The object may refer to a user, such as a medical insurance user.
In this embodiment, the heterogeneous graph includes nodes corresponding to the service mechanisms in addition to nodes corresponding to the objects, and an edge between a node corresponding to an object and a node corresponding to a service mechanism may indicate that a certain relationship exists between the object and the service mechanism.
Step 120, extracting the respective relationship characteristics of each object based on the heterogeneous graph.
In the embodiment of the present application, based on the above heterogeneous graph, features related to the relationship between each object and each service organization, that is, the above relationship features, may be extracted.
Step 130, acquiring respective behavior sequences of each object, wherein the behavior sequences comprise behavior records of interaction behaviors between the objects and the service mechanism; and the behavior records in the behavior sequence are arranged according to time sequence.
In the embodiment of the application, one interaction behavior occurring between one object and one service mechanism can be identified by one behavior record, and the behavior records corresponding to the past interaction behaviors occurring between the same object and each service mechanism can be arranged according to the time sequence to obtain the behavior sequence of the object.
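As a minimal sketch of how such a behavior sequence might be assembled, the following groups behavior records by object and sorts each group by occurrence time. The record layout (object identifier, organization identifier, date, resource amount) and all sample values are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical behavior records:
# (object_id, organization_id, occurrence date, resource amount).
records = [
    ("u1", "h2", date(2021, 3, 5), 120.0),
    ("u2", "h1", date(2021, 1, 9), 80.0),
    ("u1", "h1", date(2021, 1, 20), 45.5),
    ("u1", "h2", date(2021, 2, 11), 300.0),
]


def build_behavior_sequences(records):
    """Group records by object and sort each group by occurrence time."""
    sequences = defaultdict(list)
    for object_id, org_id, when, amount in records:
        sequences[object_id].append((when, org_id, amount))
    for seq in sequences.values():
        seq.sort(key=lambda record: record[0])
    return dict(sequences)


sequences = build_behavior_sequences(records)
```

Each value in `sequences` is one object's chronologically ordered list of records, which is the form a sequence model would consume in step 140.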
Step 140, extracting the behavior time sequence characteristics of each object based on the behavior sequence of each object.
In the embodiment of the present application, the behavior time sequence feature may simultaneously represent the interactive behavior occurring between the object and each service mechanism, and the time when the interactive behavior occurs.
Step 150, clustering the objects based on the respective relationship characteristics of the objects and the respective behavior time sequence characteristics of the objects to obtain at least one object set.
Each object set may be a user group obtained by clustering.
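One possible implementation described earlier splices the two kinds of features into a clustering feature and then clusters using each object's n nearest neighbors and their similarity distances. The sketch below follows that outline under loud assumptions: the feature values are invented, the neighbor search is brute force rather than an acceleration library, and the grouping rule (connected components of the nearest-neighbor graph after discarding links longer than `max_distance`) is a stand-in, since the application does not fix a specific clustering algorithm here.

```python
import math

# Hypothetical per-object features. In the scheme these would come from the
# graph neural network (relationship) and the sequence model (behavior timing).
relation_features = {"u1": [0.1, 0.9], "u2": [0.2, 0.8],
                     "u3": [0.9, 0.1], "u4": [0.8, 0.2]}
temporal_features = {"u1": [0.5], "u2": [0.6], "u3": [0.1], "u4": [0.0]}


def concat_features(relation, temporal):
    """Splice the two feature vectors of each object into one clustering feature."""
    return {obj: relation[obj] + temporal[obj] for obj in relation}


def nearest_neighbors(features, n):
    """Brute-force n nearest neighbors (Euclidean distance) for every object."""
    result = {}
    for obj, vec in features.items():
        dists = [(math.dist(vec, other_vec), other)
                 for other, other_vec in features.items() if other != obj]
        result[obj] = sorted(dists)[:n]
    return result


def cluster(features, n=1, max_distance=0.5):
    """Assumed rule: connected components of the thresholded n-NN graph."""
    parent = {obj: obj for obj in features}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for obj, neighbors in nearest_neighbors(features, n).items():
        for dist, other in neighbors:
            if dist <= max_distance:
                parent[find(obj)] = find(other)
    groups = {}
    for obj in features:
        groups.setdefault(find(obj), set()).add(obj)
    return sorted(groups.values(), key=lambda g: sorted(g))


clusters = cluster(concat_features(relation_features, temporal_features))
```

With these sample values, `u1` and `u2` are close to each other and far from `u3` and `u4`, so the sketch yields two object sets.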
Step 160, identifying a target object set from at least one object set; the set of target objects is a set of objects for which there is a specified interaction behavior.
The specified interactive behavior may be an abnormal interactive behavior, for example, taking an abnormal behavior group mining scene in the field of medical insurance as an example, and the specified interactive behavior may be an abnormal medical insurance reimbursement behavior.
In summary, in the solution shown in the embodiment of the present application, the relationship features describing the relationship between the objects and the service organizations are extracted through the heterogeneous graph, the behavior time sequence features of the objects are extracted from the time sequence formed by the behavior records between the objects and the organizations, the objects are clustered by combining the two kinds of features, and the object set having the specified interactive behavior is then identified from the object sets obtained by clustering. The heterogeneous graph can effectively integrate the relationships between the objects and the service mechanisms, so that representations of different objects can be learned more effectively; meanwhile, the time sequence formed by the behavior records of the objects can better represent the behavior similarity between objects.
The scheme shown in the embodiment of the application can be applied to various recognition scenes of object sets with abnormal behaviors.
For example, in a possible implementation manner, the object set identification method provided by the present application may be applied to the field of medical insurance. In this field, each object is a medical insurance participant, each service organization is an organization providing medical insurance consumption services, and the interaction behavior may be a medical visit or medical insurance consumption behavior. The object set identification method provided by the present application can screen out a set of medical insurance objects having abnormal interaction behavior, for example, an insurance fraud group, thereby enabling analysis and judgment of abnormal behavior groups.
For another example, the scheme shown in the embodiments of the present application is also well suited to real-time supervision of electronic certificates. For example, for real-time electronic certificate data, the framework provided by the scheme can use the newly added behavior information and personal information of objects to mine abnormal groups effectively.
Alternatively, the object set identification method provided by the present application may also be applied to other fields involving abnormal behavior, such as the monitoring of fake online orders (order brushing). In this field, each object is a user with online shopping behavior, each service organization may be an online store, and the interaction behavior may be online shopping behavior. The object set identification method provided by the present application can screen out a user group with abnormal behavior (such as order brushing), thereby enabling analysis and judgment of suspected order-brushing groups.
Fig. 2 shows a flowchart of an object set identification method provided in an exemplary embodiment of the present application, where the object set identification method may be executed by a computer device, and the computer device may be implemented as a server or a terminal. As shown in fig. 2, the object set identification method includes:
Step 210, obtaining a heterogeneous graph corresponding to each object, where the heterogeneous graph is used to indicate a relationship between each object and each service mechanism.
In one possible implementation, the service organization comprises a first type of organization, and a second type of organization; the first type mechanism is used for providing target services corresponding to the interactive behaviors for the object, and the second type mechanism is used for providing resource compensation for resources required by the object to receive the target services;
obtaining a heterogeneous graph corresponding to each object, including:
acquiring respective object information of each object, mechanism information of each first type mechanism and mechanism information of each second type mechanism;
generating the heterogeneous graph based on the respective object information of each object, the mechanism information of each first type mechanism and the mechanism information of each second type mechanism;
the heterogeneous graph comprises object nodes corresponding to objects, first type mechanism nodes corresponding to first type mechanisms and second type mechanism nodes corresponding to second type mechanisms; the edge between the object node and the first type mechanism node is used for indicating the number of times that the first type mechanism provides the target service to the object; an edge between the object node and the second type mechanism node is used to indicate that there is an affiliation between the object and the second type mechanism.
Taking an abnormal behavior group mining scene in the field of medical insurance as an example, in the scheme shown in the embodiment of the application, the heterogeneous graph representation is used for integrating the interaction information among individuals, hospitals/doctors and insurance application units, wherein the individuals correspond to the objects, the hospitals/doctors correspond to the first type mechanisms, the interaction behavior corresponds to the treatment behavior, the target service corresponds to the examination and treatment service, the insurance application units correspond to the second type mechanisms, and the resource compensation corresponds to the medical insurance reimbursement service.
Step 220, inputting the heterogeneous graph into the graph neural network model to obtain the respective relationship characteristics of each object output by the graph neural network model.
Taking an abnormal group identification scheme in the medical insurance field as an example, since the heterogeneous graph representation can relatively effectively integrate the connections between different types of nodes, different individual representations (corresponding to the relationship characteristics) can be more effectively learned on a graph with personal information, doctor information/hospital information and institution information through the heterogeneous graph neural network and are used for subsequent group clustering.
In one possible implementation, the graph neural network includes a HinSAGE model.
In the embodiment of the present application, taking an abnormal group identification scheme in the field of medical insurance as an example, the heterogeneous graph is constructed as follows. The nodes are first divided into individual nodes, hospital nodes and unit nodes (corresponding to insurance units), whose node attributes are personal information, hospital information and unit information respectively. The edges between nodes represent the number of times an individual visits a hospital and the affiliation between an individual and a unit.
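The graph construction described above can be sketched with plain Python containers. The input layout (a list of visit pairs and a person-to-unit mapping), the identifiers and the edge type names `visits` and `belongs_to` are assumptions made for illustration; node attributes are omitted to keep the sketch short.

```python
from collections import Counter

# Hypothetical inputs: visit records (person, hospital) and person -> unit.
visits = [("p1", "hosp_a"), ("p1", "hosp_a"), ("p1", "hosp_b"), ("p2", "hosp_a")]
affiliation = {"p1": "unit_x", "p2": "unit_x"}


def build_heterogeneous_graph(visits, affiliation):
    """Return typed node sets and typed edges for a person/hospital/unit graph."""
    nodes = {"person": set(), "hospital": set(), "unit": set()}
    visit_counts = Counter(visits)  # (person, hospital) -> number of visits
    for person, hospital in visit_counts:
        nodes["person"].add(person)
        nodes["hospital"].add(hospital)
    for person, unit in affiliation.items():
        nodes["person"].add(person)
        nodes["unit"].add(unit)
    edges = {
        # Weighted edge: how many times the person visited the hospital.
        "visits": dict(visit_counts),
        # Unweighted edge: the person is affiliated with the unit.
        "belongs_to": set(affiliation.items()),
    }
    return nodes, edges


nodes, edges = build_heterogeneous_graph(visits, affiliation)
```

Keeping nodes and edges separated by type is what distinguishes this structure from a homogeneous graph, and it is the form a heterogeneous graph neural network would expect as input.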
The scheme shown in the embodiment of the application can use a HinSAGE model as a graph neural network for heterogeneous characterization extraction, and like the GraphSAGE isomorphic neural network model, HinSAGE also comprises two processes of sampling and feature aggregation, which utilize vertex features (such as text information, vertex information and degrees of vertices), and utilize the topology of each vertex neighbor and the distribution of the vertex features in the neighbor, and finally learn a function, which can be used to generate a feature representation of a vertex that has not been seen (i.e. has not been used as training data). In addition, the HinSAGE model trains a set of aggregation functions (aggregators functions) that can learn how to aggregate feature information from local neighbors of a vertex. During the inference process, the model may utilize these aggregation functions to generate embedded representations (Embedding) for unseen vertices.
Among them, the HinSAGE model employs Heterogeneous mean aggregation (Heterogeneous mean aggregators) in feature aggregation, which is an extension of mean aggregation on a Heterogeneous graph. The HinSAGE model can mainly comprise the following steps when feature aggregation is carried out:
1) firstly, respectively carrying out mean value aggregation on neighbor nodes of different types, multiplying the neighbor nodes by different weight matrixes, and converting the neighbor nodes into the same dimensionality;
2) then summing the characteristics of different types of nodes, and then averaging to obtain a result R1 after neighbor characteristic aggregation;
3) multiplying the feature matrix of the father node by the corresponding weight matrix to obtain a result R2 of the father node;
4) and then splicing the result R1 after neighbor feature aggregation and the result R2 of the parent node to form a new feature matrix of the parent node.
Through the process, the HinSAGE model not only considers the relevant characteristics of the node of the HinSAGE model, but also considers the characteristic information of the neighbor node and the network structure topology information.
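The four aggregation steps can be illustrated for a single parent node with a minimal pure-Python sketch. It omits sampling, bias terms and non-linearities, and the helper names, weight matrices and feature values are hypothetical; it is not the HinSAGE implementation itself.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def hetero_mean_aggregate(parent_feat, neighbors_by_type,
                          neighbor_weights, parent_weight):
    # Step 1: mean over the neighbors of each type, then project each mean
    # with the weight matrix of that type into a shared dimensionality.
    projected = [
        mat_vec(neighbor_weights[node_type], mean_vectors(feats))
        for node_type, feats in neighbors_by_type.items()
    ]
    # Step 2: sum over the types and average, giving R1.
    r1 = mean_vectors(projected)
    # Step 3: project the parent node's own features, giving R2.
    r2 = mat_vec(parent_weight, parent_feat)
    # Step 4: concatenate R1 and R2 into the new parent representation.
    return r1 + r2

# Hypothetical person node with two hospital neighbors and one unit neighbor.
new_feat = hetero_mean_aggregate(
    parent_feat=[1.0, 2.0],
    neighbors_by_type={
        "hospital": [[1.0, 3.0], [3.0, 5.0]],
        "unit": [[4.0]],
    },
    neighbor_weights={
        "hospital": [[1.0, 0.0], [0.0, 1.0]],  # 2 -> 2 dimensions
        "unit": [[1.0], [2.0]],                # 1 -> 2 dimensions
    },
    parent_weight=[[1.0, 1.0]],                # 2 -> 1 dimension
)
```

With these values, R1 is `[3.0, 6.0]` and R2 is `[3.0]`, so `new_feat` is `[3.0, 6.0, 3.0]`.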
In one possible implementation, the graph neural network is an attention-based graph neural network.
In the embodiment of the application, the graph neural network can be improved through an attention mechanism, so that the graph neural network can better learn the relationship between the nodes.
Step 230, acquiring respective behavior sequences of each object, wherein the behavior sequences comprise behavior records of interaction behaviors between the objects and the service organizations; and the behavior records in the behavior sequence are arranged according to time sequence.
In a possible implementation manner, the behavior record is used for indicating that the object receives behavior information of a target service provided by the first type mechanism;
wherein the behavior information includes: an organization identification of the first type of organization, an occurrence time of the target service, and a quantity of resources corresponding to the target service.
For example, taking an abnormal behavior object set mining scene in the medical insurance field as an example, one behavior record of an object may include a hospital where the object visits, the time of the visit, the amount of medical consumption, and the like.
Optionally, the behavior information may also include other information according to the characteristics of the application scenario, for example, also in the field of medical insurance, a behavior record of the subject may further include a diagnosis result, a department, a doctor, and the like.
In this embodiment of the application, the computer device may obtain behavior records of each object in a certain time period (for example, within a year or within a half year), and arrange the behavior records according to a time sequence to obtain the behavior sequence.
For example, taking the mining scene of the abnormal behavior user group in the medical insurance field as an example, the computer device may obtain annual diagnosis and treatment statistical information of each user from a medical institution, extract diagnosis and treatment records at different times from the annual diagnosis and treatment statistical information, arrange the diagnosis and treatment records into behavior records at different time points, and arrange the behavior records according to the time sequence, so as to obtain the behavior sequence of the user.
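As a rough illustration of how behavior records within a time window might be grouped per user and ordered by time, consider the sketch below. The field names (`person_id`, `hospital_id`, `time`, `amount`) and the use of ISO date strings, which sort chronologically as plain text, are assumptions made for the example.

```python
def build_behavior_sequences(records, start, end):
    """Group records by person and sort each group chronologically.

    records: iterable of dicts with the keys person_id, hospital_id,
    time (ISO date string) and amount. Only records whose time lies in
    [start, end] are kept.
    """
    sequences = {}
    for rec in records:
        if start <= rec["time"] <= end:
            sequences.setdefault(rec["person_id"], []).append(rec)
    for seq in sequences.values():
        seq.sort(key=lambda r: r["time"])
    return sequences

# Hypothetical records; the last one falls outside the one-year window.
records = [
    {"person_id": "p1", "hospital_id": "h2", "time": "2021-06-01", "amount": 80.0},
    {"person_id": "p1", "hospital_id": "h1", "time": "2021-03-15", "amount": 250.0},
    {"person_id": "p2", "hospital_id": "h1", "time": "2020-12-30", "amount": 40.0},
]
sequences = build_behavior_sequences(records, "2021-01-01", "2021-12-31")
```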
Step 240, extracting the behavior time sequence characteristics of each object based on the behavior sequence of each object.
In a possible implementation manner, extracting the behavior time sequence feature of each object based on the behavior sequence of each object includes:
inputting the behavior sequence of the target object into a time sequence mining model to obtain the behavior time sequence characteristics of the target object output by the time sequence mining model; the target object is any one of the objects;
wherein the time sequence mining model comprises at least one of a word-vector (word2vec) model and a Bidirectional Encoder Representations from Transformers (BERT) model.
Taking the mining scene of the abnormal behavior user group in the medical insurance field as an example, the computer device can acquire the visit behaviors or the personal annual statistical information of a user and use word2vec or BERT to model the individual's visit behavior information along the time dimension, so that the time-sequence visit information of different individuals can be effectively combined and individuals with similar visit behaviors can be mined.
Taking an abnormal behavior user group mining scene in the field of medical insurance as an example, for the time sequence information of the user's visits (i.e. the behavior sequence), the scheme shown in the present application can perform feature mining in two ways, in a manner similar to text processing. First, the scheme defines a token in the form psn_id: hospital time amount, and the sequence is the visit sequence of the user. Training with the word2vec model or the BERT model is equivalent to clustering the time-sequence visit behaviors, which helps to find abnormal behavior information in the set. In order to obtain the time-sequence visit representation of an individual, the scheme can adopt the following two ways:
1) word2vec based visit sequence representation:
The user behavior sequence can be modeled with word2vec as follows: the user's visit behavior sequence is regarded as a document, each behavior record in the sequence is regarded as a word in the document, and the co-occurrence relation between behaviors is used as the context. A skip-gram model (a model that predicts the words in the context given a target word, i.e. the center word, which corresponds to a behavior record in the embodiment of the application) is trained to obtain a representation vector for each behavior record. Finally, the embeddings corresponding to the user's sequence of behavior records are summed element-wise and averaged to obtain the representation vector of the user (corresponding to the behavior time sequence characteristic).
2) BERT-based visit sequence representation:
In view of the excellent performance of BERT on text, the scheme shown in the embodiment of the present application may also use BERT to model the behavior sequence of the user.
In the scheme of the embodiment of the application, the Next Sentence Prediction loss is removed from the BERT loss function. Next Sentence Prediction is a task introduced in BERT to train the model to understand the relationship between sentences. For user behavior sequences in the abnormal behavior community mining scene, this loss would amount to judging whether two behavior sequences were generated by the same user, which contributes little in this scene, so the loss can be removed when training the BERT model.
In a possible implementation manner, in step 240, the word2vec model and the BERT model may be used in combination; for example, for each user, the computer device fuses the features output by the word2vec model and by the BERT model for that user (for example, by concatenation or weighted summation) to obtain the behavior time sequence characteristic of the user.
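For the word2vec-based representation in 1) above, the final averaging step can be sketched as follows. Training the skip-gram model is omitted: the `embeddings` table stands in for trained record vectors and its values are made up. The token layout (hospital, month, bucketed amount) and the bucket width are assumptions for illustration, since the exact token format is not spelled out beyond its fields.

```python
def make_token(record, amount_bucket=100):
    """Turn one behavior record into a token string.

    The amount is rounded down to a multiple of amount_bucket so that
    similar amounts share a token (an assumed design choice).
    """
    bucket = int(record["amount"] // amount_bucket) * amount_bucket
    return f'{record["hospital_id"]}_{record["month"]}_{bucket}'

def user_vector(tokens, embeddings):
    """Element-wise average of the embeddings of the known tokens."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return []
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Hypothetical visit sequence of one user, already in time order.
visits = [
    {"hospital_id": "h1", "month": "03", "amount": 250.0},
    {"hospital_id": "h2", "month": "04", "amount": 80.0},
]
tokens = [make_token(v) for v in visits]

# Placeholder vectors standing in for trained skip-gram embeddings.
embeddings = {"h1_03_200": [1.0, 3.0], "h2_04_0": [3.0, 5.0]}
vec = user_vector(tokens, embeddings)
```

For this user the tokens are `h1_03_200` and `h2_04_0`, and the averaged user vector is `[2.0, 4.0]`.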
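The two fusion options mentioned above, concatenation and weighted summation, can be sketched as below. The function names and the default weight `alpha` are illustrative assumptions; weighted summation additionally assumes both vectors have the same length.

```python
def fuse_concat(w2v_vec, bert_vec):
    """Fuse two feature vectors by concatenation."""
    return list(w2v_vec) + list(bert_vec)

def fuse_weighted_sum(w2v_vec, bert_vec, alpha=0.5):
    """Fuse two equal-length vectors as alpha * w2v + (1 - alpha) * bert."""
    if len(w2v_vec) != len(bert_vec):
        raise ValueError("weighted summation needs vectors of equal length")
    return [alpha * x + (1 - alpha) * y for x, y in zip(w2v_vec, bert_vec)]

# Hypothetical feature vectors for one user.
concat_feat = fuse_concat([1.0, 2.0], [3.0, 4.0])
summed_feat = fuse_weighted_sum([1.0, 2.0], [3.0, 4.0], alpha=0.5)
```

Concatenation keeps both representations intact at the cost of a longer vector, while weighted summation keeps the dimensionality fixed but requires the two models to share an output size.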
Step 250, splicing the respective relationship characteristics of each object with the respective behavior time sequence characteristics of each object to obtain the respective clustering characteristics of each object.
In this embodiment of the present application, for any object in each object, the computer device may splice the relationship characteristic of the object and the behavior time series characteristic of the object, so as to obtain a clustering characteristic of the object. And performing the above processing on each object to obtain respective clustering characteristics of each object, so as to be used in subsequent clustering.
In a possible implementation manner, the obtaining of the respective clustering feature of each object by splicing the respective relationship feature of each object with the respective behavior time sequence feature of each object includes:
acquiring respective object attribute characteristics of each object;
and splicing the respective relationship characteristics of the objects, the respective behavior time sequence characteristics of the objects and the respective object attribute characteristics of the objects to obtain the respective clustering characteristics of the objects.
In the embodiment of the present application, in addition to considering the relationship between the object and the mechanisms and the behavior sequence of the object, the personal attributes of the object may be considered. For example, taking an abnormal behavior user group mining scenario in the field of medical insurance as an example, the personal attributes of the user may include: the insured unit, the personnel category (e.g., enterprise employee, self-employed person, farmer, etc.), age, gender, etc. When the clustering characteristics are obtained, the object attribute characteristics corresponding to the personal attributes can be spliced together with the relationship characteristics and the behavior time sequence characteristics of the user.
In another possible implementation manner, the object attribute characteristics may also be fused into the relationship characteristics; that is, the computer device may input the personal attributes, as part of the personal information of each object node in the heterogeneous graph, into the graph neural network for processing to obtain the respective relationship characteristics of each object, in which case the relationship characteristics contain not only the relationship between the object and the mechanisms but also features related to the personal attributes of the object.
Step 260, clustering based on the respective clustering characteristics of the objects to obtain at least one object set.
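The splicing of per-object features can be sketched as a simple concatenation. The function name and the assumption that all three inputs are dictionaries keyed by the same object identifiers are illustrative only.

```python
def build_clustering_features(relation, timing, attributes=None):
    """Concatenate the feature vectors of each object.

    relation / timing / attributes: dict mapping object id -> list of floats.
    attributes is optional, matching the implementation in which the
    object attribute characteristics are spliced in as a third part.
    """
    features = {}
    for oid, rel_vec in relation.items():
        parts = list(rel_vec) + list(timing[oid])
        if attributes is not None:
            parts += list(attributes[oid])
        features[oid] = parts
    return features

# Hypothetical feature vectors for two objects.
cluster_feats = build_clustering_features(
    relation={"p1": [0.1, 0.2], "p2": [0.3, 0.4]},
    timing={"p1": [1.0], "p2": [2.0]},
    attributes={"p1": [40.0, 1.0], "p2": [35.0, 0.0]},
)
```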
In this embodiment of the present application, the clustering of each object by the computer device may refer to dividing a plurality of objects with high similarity between clustering features in each object into one object set.
For example, the computer device may perform clustering by the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, which may proceed as follows:
First, three types of points are defined:
1) core point: a point whose neighborhood of radius Eps contains more than MinPts points;
2) boundary point: a point whose neighborhood of radius Eps contains fewer than MinPts points, but which falls within the neighborhood of a core point;
3) noise point: a point that is neither a core point nor a boundary point.
Please refer to fig. 3, which shows a schematic diagram of two types of points related to the embodiments of the present application. As shown in fig. 3, the black points in the figure are boundary points 31: each lies within the radius Eps of a core point, while the number of points in its own neighborhood does not exceed MinPts, where MinPts can be set by the user, for example to 5; the white point in the middle is the core point 32: the number of points in its neighborhood exceeds MinPts (5), and the points in its neighborhood are the black boundary points.
The algorithm flow of DBSCAN is as follows:
1) marking all points as core points, boundary points or noise points;
2) deleting noise points;
3) assigning an edge between all core points having a distance within the Eps;
4) each group of connected core points form a cluster;
5) each boundary point is assigned to a cluster of core points associated therewith (i.e. within the radius of the corresponding core point).
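The five steps above can be sketched in pure Python as follows. This is a minimal, quadratic-time illustration rather than a production implementation. It uses the common convention that a point is a core point when its neighborhood, counting the point itself, holds at least `min_pts` points, which is an assumption about how the threshold is applied; noise points are kept with the label -1 instead of being physically deleted.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    n = len(points)
    # Neighborhood of every point within radius eps (includes the point itself).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Step 1: mark core points; the rest are boundary or noise points.
    core = [len(neighbors[i]) >= min_pts for i in range(n)]
    # Step 2: noise points simply keep the label -1.
    labels = [-1] * n
    cluster = 0
    # Steps 3 and 4: core points within eps of each other are connected,
    # and each connected group of core points forms one cluster.
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if core[q] and labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    # Step 5: each boundary point joins the cluster of a core point in range.
    for i in range(n):
        if core[i]:
            continue
        for q in neighbors[i]:
            if core[q]:
                labels[i] = labels[q]
                break
    return labels

# Two dense groups, one boundary point (2.2, 1) and one isolated noise point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

In this example the point `(2.2, 1)` has only one other point within `eps`, so it is not a core point, but it lies within `eps` of the core point `(1, 1)` and therefore joins the first cluster.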
In a possible implementation manner, clustering based on respective clustering features of each object to obtain at least one object set includes:
acquiring a similar object set corresponding to each object through a similar node acceleration library, wherein the similar object set comprises n other objects with the closest similarity distance with the corresponding object; the similarity distance is used for identifying the similarity between the clustering characteristics of the two objects;
clustering is carried out on the basis of the respective similar object set of each object and the similarity distance between each object and the objects in the respective similar object set, and at least one object set is obtained.
Taking the mining scene of the abnormal behavior user group in the medical insurance field as an example, the volume of user data monitored for the medical insurance fund is huge. In order to accelerate the clustering algorithm, the scheme shown in the embodiment of the application can accelerate DBSCAN with an acceleration library (such as the Faiss library): when searching for core points and boundary points in DBSCAN, Faiss can be used to speed up the search, and the number of results returned by a range search (range_search) is counted.
Faiss is a library for clustering and similarity search, and is currently a mature nearest neighbor search library. The implementation process can include: obtaining vectors, constructing an index (brute-force, inverted-file, product quantization, etc.), and retrieving the Top K results most similar to a query. Please refer to fig. 4, which illustrates a schematic diagram of an acceleration process according to an embodiment of the present application.
As shown in fig. 4, on one hand, the clustering characteristics of the target object are used as the search query and processed by the distilled BERT model 41 to obtain vectorized query information; on the other hand, the clustering characteristics of the other objects in the database are input into the distilled BERT model 42 to obtain the clustering feature vectors of the other objects. The vectorized query information and the clustering feature vectors of the other objects are then input into the Faiss acceleration library 43, a Faiss index is constructed, the K feature vectors closest to the query are retrieved through the index, and the search results (that is, the K objects with the smallest similarity distance to the target object) are output according to these K feature vectors.
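The Top K retrieval can be illustrated with an exhaustive search in pure Python. This brute-force scan is a stand-in used only to show what the query returns; it is not the Faiss API, and at the data volumes described above an indexed library would replace it. The function name and sample vectors are hypothetical, and Euclidean distance is assumed as the similarity distance.

```python
import heapq
import math

def top_k_neighbors(features, k):
    """For each object, return its k nearest other objects.

    features: dict mapping object id -> feature vector.
    Returns a dict mapping object id -> list of (distance, other_id),
    ordered from nearest to farthest.
    """
    result = {}
    for oid, vec in features.items():
        candidates = (
            (math.dist(vec, other_vec), other_id)
            for other_id, other_vec in features.items()
            if other_id != oid
        )
        result[oid] = heapq.nsmallest(k, candidates)
    return result

# Hypothetical clustering features of three objects.
features = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 0.0]}
nearest = top_k_neighbors(features, k=1)
```

The resulting neighbor lists, together with their distances, are the kind of input that a neighbor-based clustering step can consume instead of recomputing all pairwise distances.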
Step 270, identifying a target object set from at least one object set; the set of target objects is a set of objects for which there is a specified interaction behavior.
In a possible implementation manner, the identifying a target object set from at least one object set includes:
performing anomaly detection on at least one object set to obtain an abnormal group set in the at least one object set;
matching at least one object set based on a target rule to obtain a rule matching group set in the at least one object set; the target rules include rules satisfied by object sets in which the specified interaction behavior exists;
taking the intersection of the abnormal community set and the rule matching community set to obtain the target object set.
In the embodiment of the application, when identifying a target object set with the specified interaction behavior, abnormal communities are detected separately by rule-based detection and by an anomaly detection algorithm; the communities detected by both are then taken as the target object set, so that object sets with abnormal behavior are identified more accurately.
The target rule may include that a relevant attribute/parameter of an object set with the specified interaction behavior meets a parameter threshold, for example that the amount of resources compensated or applied for (e.g., the annual reimbursement amount) is greater than a resource amount threshold, and so on.
For example, in the solution shown in the embodiment of the present application, the computer device first detects object sets with abnormal behavior by using an anomaly detection algorithm (such as the isolation forest (iForest) algorithm) and an algorithm based on rule statistics respectively, and then takes the intersection of the communities detected by the anomaly detection algorithm and the communities detected by the rules to obtain the final, highly suspicious target object set. After the highly suspicious target object set is obtained, the object information of the detected target object set can be further submitted to a corresponding department or manager for subsequent processing.
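The intersection logic can be sketched as below. To keep the example self-contained, a simple z-score test is used as a stand-in for the anomaly detection algorithm; it is not an isolation forest, and the thresholds, function names and community statistics are all hypothetical.

```python
from statistics import mean, pstdev

def zscore_anomalies(values, threshold=1.5):
    """Flag communities whose value deviates strongly from the mean.

    This z-score test is only a placeholder for a real anomaly detector.
    """
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        return set()
    return {cid for cid, v in values.items() if abs(v - mu) / sigma > threshold}

def rule_matches(values, amount_threshold):
    """Flag communities whose value exceeds a fixed rule threshold."""
    return {cid for cid, v in values.items() if v > amount_threshold}

def target_sets(values, amount_threshold, z_threshold=1.5):
    """Keep only the communities flagged by both detectors."""
    return zscore_anomalies(values, z_threshold) & rule_matches(values, amount_threshold)

# Hypothetical average annual reimbursement per community.
reimbursement = {"c1": 1000, "c2": 1200, "c3": 1100, "c4": 900, "c5": 9000}
suspicious = target_sets(reimbursement, amount_threshold=1150)
```

Here the rule alone flags `c2` and `c5`, the anomaly test alone flags only `c5`, and the intersection leaves `c5` as the single highly suspicious community.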
In one possible implementation, identifying a target object set from at least one object set includes:
extracting respective community features of at least one object set;
and inputting the respective community characteristics of at least one object set into the community detection model to obtain a community detection result of the community detection model, wherein the community detection result is used for indicating the target object set.
In the solution shown in the embodiment of the present application, for target object set detection, other machine learning algorithms (such as the eXtreme Gradient Boosting (XGB) algorithm) may also be used to identify the target object set.
Optionally, in another possible implementation manner, the group detection model, the anomaly detection algorithm, and the rule detection algorithm may also be used in combination, that is, an intersection is taken from a set of objects respectively detected by the group detection model, the anomaly detection algorithm, and the rule detection algorithm.
Please refer to fig. 5, which illustrates a schematic diagram of a user community detection framework according to an embodiment of the present application. Taking the mining scenario of abnormal behavior user community in the medical insurance field as an example, as shown in fig. 5, the user community detection framework includes a feature extraction component 51, a clustering component 52, and an identification component 53.
The feature extraction component 51 is configured to extract an embedded representation of the heterogeneous graph, an embedded representation of the user attributes, and a visit time sequence feature representation of the user.
For example, the feature extraction component 51 includes a feature extraction network model such as a graph neural network model, an attribute extraction model, and a time series mining model; the graph neural network model is used for processing the heterogeneous graph and outputting the embedded representation of the heterogeneous graph; the attribute extraction model is used for processing the attribute characteristics of the user and outputting the embedded representation of the user attribute; the time sequence mining model is used for processing the clinic time sequence of the user and outputting the clinic time sequence characteristic representation of the user.
The clustering component 52 is configured to cluster the users according to the user feature information, such as the embedded representation of the heterogeneous graph for each user, the embedded representation of the user attributes, and the visit time sequence feature representation of the user, to obtain a clustering result, i.e., a plurality of user groups (object sets).
The above-described identifying component 53 identifies a user group of abnormal behavior in the medical insurance field in combination with abnormality detection and rule judgment.
In the embodiment of the application, indexes such as the silhouette coefficient, the CH (Calinski-Harabasz) value and the DBI (Davies-Bouldin Index) can be used to quantify the clustering effect. Taking the mining scene of the abnormal behavior user group in the medical insurance field as an example, calculation shows that the scheme shown in the embodiment of the application obtains acceptable scores on the silhouette coefficient, the CH value and the DBI for the clustering result of the personal and visit data.
t-SNE dimension reduction and visualization are performed on the clustering result of the scheme shown in the embodiment of the application; that is, after clustering is completed, the features are reduced to three dimensions by t-SNE (t-distributed Stochastic Neighbor Embedding) and the clustering effect is displayed visually. Please refer to fig. 6, which illustrates a visual clustering effect graph according to an embodiment of the present application. Because there are many clusters, for clarity only clusters 61 and 62 and a few other small samples are displayed in fig. 6. Visual observation shows that, in the scheme shown in the embodiment of the application, the clustering algorithm can effectively cluster the personal and visit data.
For different target object sets, the scheme shown in the embodiment of the present application may also label the target object sets by means of word clouds (word cloud analysis).
In the solution shown in the embodiment of the present application, after the target object sets are obtained, they may be reviewed from a business perspective to select some typical examples as highly suspected abnormal behavior groups, and the relevant information of these groups is then submitted to the next-level handler/department for further investigation/confirmation.
In summary, in the solution shown in the embodiment of the present application, the relationship features related to the relationship between the object and the service organization are extracted through the heterogeneous graph, the behavior time sequence features of the object are extracted through the time sequence formed by the behavior records between the object and the organization, the two features are combined to cluster each object, and then the object set with the specified interactive behavior is identified from the object set obtained by clustering. The heterogeneous graph can effectively integrate the relation between the objects and the service mechanism, so that different object representations can be more effectively learned, and meanwhile, the time sequence formed by the behavior records of the objects can better represent the behavior similarity between the objects.
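Of the indexes mentioned, the silhouette coefficient is the simplest to compute by hand. The sketch below uses Euclidean distance and assigns a score of 0 to points in single-member clusters; both are assumed conventions. When the labels come from a density-based clustering, noise points would normally be filtered out before calling it.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # single-member cluster
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Two well separated hypothetical clusters.
sample_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
sample_labels = [0, 0, 1, 1]
score = silhouette_score(sample_points, sample_labels)
```

Values close to 1 indicate compact, well separated clusters, which is the case for this toy example.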
Fig. 7 is a block diagram illustrating an object set recognition apparatus according to an exemplary embodiment of the present application, where the object set recognition apparatus may be applied to a computer device, where the computer device may be implemented as a server or a terminal, and as shown in fig. 7, the object set recognition apparatus includes:
a heterogeneous graph obtaining module 701, configured to obtain a heterogeneous graph corresponding to each object, where the heterogeneous graph is used to indicate a relationship between each object and each service mechanism;
a first feature obtaining module 702, configured to extract respective relationship features of the objects based on the heterogeneous graph;
a sequence obtaining module 703, configured to obtain respective behavior sequences of the objects, where the behavior sequences include behavior records of interaction behaviors between the objects and the service mechanism; and the behavior records in the behavior sequence are arranged according to time sequence;
a second feature obtaining module 704, configured to extract, based on the respective behavior sequence of each object, a behavior time sequence feature of each object;
a clustering module 705, configured to cluster the objects based on the respective relationship characteristics of the objects and the respective behavior time sequence characteristics of the objects to obtain at least one object set;
an identifying module 706 configured to identify a target set of objects from the at least one set of objects; the target set of objects is a set of objects for which specified interaction behavior exists.
In one possible implementation, the service organization comprises a first type of organization, and a second type of organization; the first type mechanism is used for providing target services corresponding to the interaction behaviors for the object, and the second type mechanism is used for providing resource compensation for resources required by the object to receive the target services;
the heterogeneous map obtaining module 701 is configured to,
acquiring respective object information of each object, mechanism information of each first type mechanism and mechanism information of each second type mechanism;
generating the heterogeneous graph based on the respective object information of the respective objects, the mechanism information of the respective first type mechanisms, and the mechanism information of the respective second type mechanisms;
wherein the heterogeneous graph comprises object nodes corresponding to the objects, first type mechanism nodes corresponding to the first type mechanisms and second type mechanism nodes corresponding to the second type mechanisms; an edge between an object node and a first type mechanism node is used to indicate the number of times the first type mechanism provides the target service to the object; an edge between an object node and a second type mechanism node is used to indicate that an affiliation exists between the object and the second type mechanism.
In one possible implementation, the behavior record is used for indicating that the object accepts the behavior information of the target service provided by the first type mechanism;
wherein the behavior information includes: an organization identification of the first type of organization, an occurrence time of the target service, and a quantity of resources corresponding to the target service.
In one possible implementation, the clustering module 705 is configured to,
splicing the respective relation characteristics of each object with the respective behavior time sequence characteristics of each object to obtain respective clustering characteristics of each object;
and clustering based on the respective clustering characteristics of the objects to obtain the at least one object set.
In one possible implementation, the clustering module 705 is configured to,
acquiring a similar object set corresponding to each object through a similar node acceleration library, wherein the similar object set comprises n other objects with the closest similarity distance with the corresponding object; the similarity distance is used for identifying the similarity between the clustering characteristics of the two objects;
and clustering based on the respective similar object sets of the objects and the similarity distances between the objects and the objects in the respective similar object sets to obtain the at least one object set.
In one possible implementation, the clustering module 705 is configured to,
acquiring respective object attribute characteristics of each object;
and splicing the respective relationship characteristics of the objects, the respective behavior time sequence characteristics of the objects and the respective object attribute characteristics of the objects to obtain the respective clustering characteristics of the objects.
In one possible implementation manner, the second feature obtaining module 704 is configured to,
inputting the behavior sequence of a target object into a time sequence mining model to obtain the behavior time sequence characteristics of the target object output by the time sequence mining model; the target object is any one of the objects;
wherein the time sequence mining model comprises at least one of a word-vector (word2vec) model and a Bidirectional Encoder Representations from Transformers (BERT) model.
In one possible implementation manner, the first feature obtaining module 702 is configured to,
inputting the heterogeneous graph into a graph neural network model to obtain the respective relationship characteristics of each object output by the graph neural network;
the graph neural network includes a HinSAGE model.
In one possible implementation, the graph neural network is an attention-based graph neural network.
In one possible implementation, the identifying module 706 is configured to,
performing anomaly detection on the at least one object set to obtain an abnormal community set in the at least one object set;
matching the at least one object set based on a target rule to obtain a rule matching group set in the at least one object set; the target rules include rules satisfied by object sets in which the specified interaction behavior exists;
and taking the intersection of the abnormal community set and the rule matching community set to obtain the target object set.
In one possible implementation, the identifying module 706 is configured to,
extracting respective community features of the at least one object set;
and inputting the respective community characteristics of the at least one object set into a community detection model to obtain a community detection result of the community detection model, wherein the community detection result is used for indicating the target object set.
In summary, in the solution shown in the embodiment of the present application, the relationship characteristic related to the relationship between the object and the service organization is extracted through the heterogeneous graph, the behavior time sequence characteristic of the object is extracted through the time sequence formed by the behavior records between the object and the service organization, the two characteristics are combined to cluster each object, and then the object set with the specified interactive behavior is identified from the object set obtained by clustering. The heterogeneous graph can effectively integrate the relation between the objects and the service mechanism, so that different object representations can be more effectively learned, and meanwhile, the time sequence formed by the behavior records of the objects can better represent the behavior similarity between the objects.
Fig. 8 illustrates a block diagram of a computer device 800 according to an exemplary embodiment of the present application. The computer device may be implemented as a server in the above-mentioned aspects of the present application. The computer apparatus 800 includes a Central Processing Unit (CPU) 801, a system Memory 804 including a Random Access Memory (RAM) 802 and a Read-Only Memory (ROM) 803, and a system bus 805 connecting the system Memory 804 and the CPU 801. The computer device 800 further includes a mass storage device 806 for storing an operating system 809, application programs 810 and other program modules 811.
The mass storage device 806 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805. The mass storage device 806 and its associated computer-readable media provide non-volatile storage for the computer device 800. That is, the mass storage device 806 may include a computer-readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM) drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash Memory or other solid state Memory technology, CD-ROM, Digital Versatile Disks (DVD), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 804 and mass storage device 806 as described above may be collectively referred to as memory.
According to various embodiments of the present disclosure, the computer device 800 may also operate through a remote computer connected to a network, such as the Internet. That is, the computer device 800 may be connected to the network 808 through the network interface unit 807 attached to the system bus 805, or the network interface unit 807 may be used to connect to another type of network or remote computer system (not shown).
The memory further includes at least one computer instruction, which may be at least one instruction, at least one program, a code set, or an instruction set. The at least one computer instruction is stored in the memory, and the central processing unit 801 executes it to implement all or part of the steps of the object set identification method shown in the above embodiments.
In an exemplary embodiment, a computer-readable storage medium is further provided for storing at least one computer instruction, which is loaded and executed by a processor to implement all or part of the steps of the above-mentioned object set identification method. For example, the computer readable storage medium may be a read-only memory, a random-access memory, a read-only optical disc, a magnetic tape, a floppy disk, an optical data storage device, and so forth.
In an exemplary embodiment, a computer program product or a computer program is also provided, which comprises computer instructions, which are stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to perform all or part of the steps of the method shown in the above embodiments.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for identifying a set of objects, the method comprising:
obtaining a heterogeneous graph corresponding to each object, wherein the heterogeneous graph is used for indicating the relationship between each object and each service mechanism;
extracting respective relationship features of the objects based on the heterogeneous graph;
acquiring respective behavior sequences of the objects, wherein the behavior sequences comprise behavior records of interaction behaviors between the objects and the service mechanisms; and the behavior records in the behavior sequence are arranged according to time sequence;
extracting the respective behavior time sequence characteristics of the objects based on the respective behavior sequences of the objects;
clustering the objects based on the respective relationship characteristics of the objects and the respective behavior time sequence characteristics of the objects to obtain at least one object set;
identifying a target set of objects from the at least one set of objects; the target object set is a set of objects for which specified interaction behavior exists.
2. The method of claim 1, wherein the service mechanism comprises a first type mechanism and a second type mechanism; the first type mechanism is used for providing the object with the target service corresponding to the interaction behavior, and the second type mechanism is used for providing resource compensation for the resources required by the object to receive the target service;
the obtaining of the heterogeneous graph corresponding to each object includes:
acquiring respective object information of each object, mechanism information of each first type mechanism and mechanism information of each second type mechanism;
generating the heterogeneous graph based on the respective object information of the respective objects, the mechanism information of the respective first type mechanisms, and the mechanism information of the respective second type mechanisms;
wherein the heterogeneous graph comprises object nodes corresponding to the objects, first type mechanism nodes corresponding to the first type mechanisms and second type mechanism nodes corresponding to the second type mechanisms; an edge between the object node and the first type mechanism node is used to indicate the number of times the first type mechanism provides the target service to the object; an edge between the object node and the second type mechanism node is used to indicate that an affiliation exists between the object and the second type mechanism.
3. The method of claim 2, wherein the behavior record is used to indicate behavior information of the object receiving the target service provided by the first type mechanism;
wherein the behavior information includes: a mechanism identification of the first type mechanism, an occurrence time of the target service, and a quantity of resources corresponding to the target service.
4. The method according to claim 1, wherein the clustering the objects based on the relationship features of the objects and the behavior time-series features of the objects to obtain at least one object set comprises:
splicing the respective relationship characteristics of each object with the respective behavior time sequence characteristics of each object to obtain the respective clustering characteristics of each object;
and clustering based on the respective clustering characteristics of the objects to obtain the at least one object set.
5. The method according to claim 4, wherein the obtaining the clustering feature of each object by splicing the relationship feature of each object with the behavior time sequence feature of each object comprises:
acquiring respective object attribute characteristics of each object;
and splicing the respective relationship characteristics of the objects, the respective behavior time sequence characteristics of the objects and the respective object attribute characteristics of the objects to obtain the respective clustering characteristics of the objects.
6. The method according to claim 1, wherein the extracting the behavior time-series feature of each object based on the behavior sequence of each object comprises:
inputting the behavior sequence of a target object into a time sequence mining model to obtain the behavior time sequence characteristics of the target object output by the time sequence mining model; the target object is any one of the objects;
wherein the time sequence mining model comprises at least one of a word vector model and a Bidirectional Encoder Representations from Transformers (BERT) model.
7. An apparatus for identifying a set of objects, the apparatus comprising:
the heterogeneous graph acquisition module is used for acquiring a heterogeneous graph corresponding to each object, and the heterogeneous graph is used for indicating the relationship between each object and each service mechanism;
the first characteristic acquisition module is used for extracting the respective relationship characteristics of each object based on the heterogeneous graph;
the sequence acquisition module is used for acquiring respective behavior sequences of the objects, wherein the behavior sequences comprise behavior records of interaction behaviors between the objects and the service mechanism; and the behavior records in the behavior sequence are arranged according to time sequence;
the second characteristic acquisition module is used for extracting the respective behavior time sequence characteristics of each object based on the respective behavior sequence of each object;
the clustering module is used for clustering each object based on the respective relationship characteristic of each object and the respective behavior time sequence characteristic of each object to obtain at least one object set;
an identification module for identifying a target set of objects from the at least one set of objects; the target object set is a set of objects for which specified interaction behavior exists.
8. A computer device comprising a processor and a memory, the memory storing at least one computer instruction that is loaded and executed by the processor to implement the object set identification method of any of claims 1 to 6.
9. A computer-readable storage medium, in which at least one computer program is stored, which is loaded and executed by a processor to implement the object set identification method according to any one of claims 1 to 6.
10. A computer program product comprising computer instructions for execution by a processor for implementing an object set identification method as claimed in any one of claims 1 to 6.
CN202111441117.XA 2021-11-30 2021-11-30 Object set identification method and device, computer equipment and storage medium Pending CN114612246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111441117.XA CN114612246A (en) 2021-11-30 2021-11-30 Object set identification method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111441117.XA CN114612246A (en) 2021-11-30 2021-11-30 Object set identification method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114612246A true CN114612246A (en) 2022-06-10

Family

ID=81858044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111441117.XA Pending CN114612246A (en) 2021-11-30 2021-11-30 Object set identification method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114612246A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116541731A (en) * 2023-05-26 2023-08-04 北京百度网讯科技有限公司 Processing method, device and equipment of network behavior data

Similar Documents

Publication Publication Date Title
CN113822494B (en) Risk prediction method, device, equipment and storage medium
CN112949786B (en) Data classification identification method, device, equipment and readable storage medium
US20240203599A1 (en) Method and system of for predicting disease risk based on multimodal fusion
CN113420152A (en) Service processing method, device and system based on fuzzy logic
CN111242948A (en) Image processing method, image processing device, model training method, model training device, image processing equipment and storage medium
CN111696656B (en) Doctor evaluation method and device of Internet medical platform
CN113821668A (en) Data classification identification method, device, equipment and readable storage medium
CN112258250A (en) Target user identification method and device based on network hotspot and computer equipment
CN116340793A (en) Data processing method, device, equipment and readable storage medium
CN116188953A (en) Medical image data processing method, system and electronic equipment for realizing data security
CN115222443A (en) Client group division method, device, equipment and storage medium
CN112580616B (en) Crowd quantity determination method, device, equipment and storage medium
CN114612246A (en) Object set identification method and device, computer equipment and storage medium
CN113762973A (en) Data processing method and device, computer readable medium and electronic equipment
CN113705293A (en) Image scene recognition method, device, equipment and readable storage medium
CN116741396A (en) Article classification method and device, electronic equipment and storage medium
CN114973107B (en) Unsupervised cross-domain video action identification method based on multi-discriminator cooperation and strong and weak sharing mechanism
CN116958622A (en) Data classification method, device, equipment, medium and program product
Townsend et al. Discovering visual concepts and rules in convolutional neural networks
CN116910341A (en) Label prediction method and device and electronic equipment
CN114357242A (en) Training evaluation method and device based on recall model, equipment and storage medium
CN113888265A (en) Product recommendation method, device, equipment and computer-readable storage medium
CN114022698A (en) Multi-tag behavior identification method and device based on binary tree structure
Weber et al. Less is more: The influence of pruning on the explainability of cnns
CN111552827A (en) Labeling method and device, and behavior willingness prediction model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination