CN112347316A - GraphSAGE-based bad preference behavior detection method and device and electronic equipment - Google Patents

Info

Publication number
CN112347316A
Authority
CN
China
Prior art keywords
node
user
graphsage
neighbor
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011134335.4A
Other languages
Chinese (zh)
Inventor
陈雪清
孙涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Qiyue Information Technology Co Ltd
Original Assignee
Shanghai Qiyue Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Qiyue Information Technology Co Ltd
Priority to CN202011134335.4A
Publication of CN112347316A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a GraphSAGE-based method and device for detecting bad preference behavior, and electronic equipment. The method comprises: generating node information of a graph data structure based on historical user data, wherein the node information represents behavior characteristics of a user, and the behavior characteristics comprise a specified label indicating whether the user has a bad preference; adding a new user without the specified label to the graph data structure as a new node, and training a GraphSAGE model with the GraphSAGE algorithm to predict the value of the new user's specified label; and predicting whether the new user has a bad preference according to the value of the specified label. The invention identifies customer groups with bad preferences such as gambling well, and can improve precision and reduce the false rejection rate. In addition, the invention does not require time-consuming retraining on the huge relationship network, which reduces computation time and server load and makes real-time computation on a large relationship network possible.

Description

GraphSAGE-based bad preference behavior detection method and device and electronic equipment
Technical Field
The invention relates to the technical field of computer information processing, and in particular to a GraphSAGE-based bad preference behavior detection method and device, electronic equipment, and a computer-readable medium.
Background
In financial risk control systems, users with bad preferences, such as gambling behavior or gambling tendencies, are high-risk, because most of them are very likely to become delinquent and generate bad debts.
The prior art identifies users with a gambling-type bad preference in two main ways. First, users with bad preferences are identified and labeled through manual telephone verification and investigation. Second, the probability that a person has a bad preference is inferred through graph computation and a model algorithm, and high-probability users are labeled.
With manual verification and investigation, only a small number of cases can be processed per day, which hurts the timeliness of review and approval. With the second approach, a conventional graph neural network must recompute the whole graph whenever a new node is added, so real-time computation is impossible and decisions about a user cannot be made in real time. Moreover, neither approach identifies users with bad preferences with high accuracy.
Disclosure of Invention
The invention aims to solve the technical problems of poor timeliness and low accuracy in identifying users with bad preferences in the prior art.
In order to solve this technical problem, a first aspect of the present invention provides a GraphSAGE-based bad preference behavior detection method, including:
generating node information of a graph data structure based on historical user data, wherein the node information represents behavior characteristics of a user, and the behavior characteristics comprise a specified label indicating whether the user has a bad preference;
adding a new user without the specified label to the graph data structure as a new node, and training a GraphSAGE model with the GraphSAGE algorithm to predict the value of the new user's specified label;
predicting whether the new user has a bad preference according to the value of the specified label.
According to a preferred embodiment of the present invention, training the GraphSAGE model with the GraphSAGE algorithm includes:
sampling, in each search layer, a plurality of neighbor nodes of a node, the sampled neighbor nodes including the node's new neighbor nodes;
obtaining the embedding of the node at the kth layer from the embeddings, at the (k-1)th layer, of the node's sampled neighbor nodes;
where k denotes the search layer.
According to a preferred embodiment of the present invention, sampling a plurality of neighbor nodes including the new neighbor nodes of a node in each search layer comprises:
searching for the new neighbor nodes of the node in each search layer;
sampling a plurality of neighbor nodes in each search layer;
wherein the plurality of neighbor nodes includes all new neighbor nodes in the search layer.
According to a preferred embodiment of the present invention, obtaining the embedding of the node at the kth layer from the embeddings of the node's sampled neighbor nodes at the (k-1)th layer includes:
aggregating, through an aggregation function, the embeddings of the node's neighbor nodes at the (k-1)th layer to obtain the neighbor aggregation feature of the node at the kth layer;
concatenating the neighbor aggregation feature of the node at the kth layer with the node's own embedding at the (k-1)th layer, and obtaining the embedding of the node at the kth layer through a fully connected layer.
According to a preferred embodiment of the present invention, the aggregation function is an LSTM, and during aggregation the set of vectors of the neighbor nodes is input to the LSTM in a predetermined order.
According to a preferred embodiment of the present invention, training the GraphSAGE model with the GraphSAGE algorithm further comprises:
performing back propagation through gradient descent to optimize the model parameters and the parameters of the aggregation function.
According to a preferred embodiment of the present invention, a new neighbor node is a newly added neighbor node or a neighbor node whose node information has been updated.
In order to solve the above technical problem, a second aspect of the present invention provides a GraphSAGE-based bad preference behavior detection device, the device comprising:
a generation module, configured to generate node information of the graph data structure based on historical user data, wherein the node information represents the behavior characteristics of a user, and the behavior characteristics comprise a specified label indicating whether the user has a bad preference;
a training module, configured to add a new user without the specified label to the graph data structure as a new node, and to train a GraphSAGE model with the GraphSAGE algorithm to predict the value of the new user's specified label;
a prediction module, configured to predict whether the new user has a bad preference according to the value of the specified label.
According to a preferred embodiment of the invention, the training module comprises:
a sampling module, configured to sample, in each search layer, a plurality of neighbor nodes of a node, the sampled neighbor nodes including the node's new neighbor nodes;
an aggregation module, configured to obtain the embedding of the node at the kth layer from the embeddings, at the (k-1)th layer, of the node's sampled neighbor nodes;
where k denotes the search layer.
According to a preferred embodiment of the present invention, the sampling module comprises:
a searching module, configured to search for the new neighbor nodes of the node in each search layer;
a sub-sampling module, configured to sample a plurality of neighbor nodes in each search layer;
wherein the plurality of neighbor nodes includes all new neighbor nodes in the search layer.
According to a preferred embodiment of the present invention, the aggregation module comprises:
a first aggregation module, configured to aggregate, through the aggregation function, the embeddings of the node's neighbor nodes at the (k-1)th layer to obtain the neighbor aggregation feature of the node at the kth layer;
a second aggregation module, configured to concatenate the neighbor aggregation feature of the node at the kth layer with the node's own embedding at the (k-1)th layer, and to obtain the embedding of the node at the kth layer through a fully connected layer.
According to a preferred embodiment of the present invention, the aggregation function is an LSTM, and during aggregation the set of vectors of the neighbor nodes is input to the LSTM in a predetermined order.
According to a preferred embodiment of the present invention, the training module further comprises:
an optimization module, configured to perform back propagation through gradient descent to optimize the model parameters and the parameters of the aggregation function.
According to a preferred embodiment of the present invention, a new neighbor node is a newly added neighbor node or a neighbor node whose node information has been updated.
To solve the above technical problem, a third aspect of the present invention provides an electronic device, comprising:
a processor; and
a memory storing computer executable instructions that, when executed, cause the processor to perform the method described above.
In order to solve the above technical problem, a fourth aspect of the present invention proposes a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs that, when executed by a processor, implement the above method.
The invention generates node information of a graph data structure based on historical user data, wherein the node information represents behavior characteristics of a user, and the behavior characteristics comprise a specified label indicating whether the user has a bad preference; adds a new user without the specified label to the graph data structure as a new node, and trains a GraphSAGE model with the GraphSAGE algorithm to predict the value of the new user's specified label; and predicts whether the new user has a bad preference according to the value of the specified label. Because the aggregation information of the sampled nodes up to search depth K is stored after the GraphSAGE model is trained, no time-consuming retraining on the huge relationship network is needed, the load on the server is greatly reduced, and real-time computation on the user relationship graph becomes possible. In addition, when the GraphSAGE algorithm is applied, neighbor nodes whose data has been updated are given priority in node sampling, and node aggregation is performed by an LSTM. The advantage of the LSTM is that it takes the temporal order of the data into account, which matches the way the bad preference behavior being predicted spreads among users, so nodes are aggregated in a defined order. With these improvements, fresher data is used, which yields higher precision and a lower misjudgment rate.
In conclusion, the invention identifies customer groups with bad preferences such as gambling well, and can improve precision, reduce the false rejection rate, and reduce traffic loss. In addition, the invention does not require time-consuming retraining on the huge relationship network, which greatly reduces computation time and server load and makes real-time computation on a large relationship network possible.
Drawings
In order to make the technical problems solved by the present invention, the technical means adopted, and the technical effects obtained clearer, embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted, however, that the drawings described below only illustrate exemplary embodiments of the invention, from which those skilled in the art can derive other embodiments without inventive effort.
FIG. 1 is a schematic flow chart of the GraphSAGE-based bad preference behavior detection method according to the present invention;
FIG. 2 is a schematic flow chart of the step of training the GraphSAGE model using the GraphSAGE algorithm according to the present invention;
FIG. 3 is a schematic structural diagram of the GraphSAGE-based bad preference behavior detection device according to the present invention;
FIG. 4 is a block diagram of an exemplary embodiment of an electronic device in accordance with the present invention;
FIG. 5 is a schematic diagram of one embodiment of a computer-readable medium of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The invention may be embodied in many specific forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art.
The structures, properties, effects or other characteristics described in a certain embodiment may be combined in any suitable manner in one or more other embodiments, while still complying with the technical idea of the invention.
In describing particular embodiments, specific details of structures, properties, effects, or other features are set forth in order to provide a thorough understanding of the embodiments by one skilled in the art. However, it is not excluded that a person skilled in the art may implement the invention in a specific case without the above-described structures, performances, effects or other features.
The flow chart in the drawings is only an exemplary flow demonstration, and does not represent that all the contents, operations and steps in the flow chart are necessarily included in the scheme of the invention, nor does it represent that the execution is necessarily performed in the order shown in the drawings. For example, some operations/steps in the flowcharts may be divided, some operations/steps may be combined or partially combined, and the like, and the execution order shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.
The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The same reference numerals denote the same or similar elements, components, or parts throughout the drawings, and thus a repetitive description thereof may be omitted hereinafter. It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, or sections, these elements, components, or sections should not be limited by these terms. That is, these terms are used only to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention. Furthermore, the term "and/or" is intended to include all combinations of any one or more of the listed items.
In the present invention, GraphSAGE (Graph SAmple and aggreGatE) is a graph convolutional neural network, that is, a neural network that operates directly on the graph structure. GraphSAGE learns a function that generates embeddings by sampling and aggregating features from the local neighborhood of a node. Specifically, rather than training a separate embedding for every node, GraphSAGE trains an aggregation function that generates a node's embedding by sampling and aggregating features from the node's neighbors.
Referring to Fig. 1, which is a flowchart of the GraphSAGE-based bad preference behavior detection method according to the present invention, the method includes:
S1, generating node information of the graph data structure based on the historical user data.
In the invention, the node information represents the behavior characteristics of the user, and the behavior characteristics comprise a specified label indicating whether the user has a bad preference. The bad preference may be one that tends to lead to debt, such as gambling and similar obsessions, or one that harms the body, such as smoking and alcoholism.
Specifically, an adjacency list of the graph data structure may be obtained from the relationship data between users in the historical user data, in order to store the neighbor node information of all nodes. The adjacency list is recorded as a Python dictionary; for example, {'a': ['b', 'c', 'd']} denotes that the first-order neighbors of node a are b, c, and d.
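As a minimal sketch of the adjacency list described above, the following Python builds the dictionary from pairs of related users. The function name `build_adjacency`, the sample user identifiers, and the choice to treat relationships as undirected are illustrative assumptions; the text only specifies the dictionary form.

```python
from collections import defaultdict


def build_adjacency(relations):
    """Build an adjacency list from (user, user) relationship pairs.

    Relationships are treated as undirected here, which is an assumption
    made for illustration only.
    """
    adjacency = defaultdict(list)
    for u, v in relations:
        if v not in adjacency[u]:
            adjacency[u].append(v)
        if u not in adjacency[v]:
            adjacency[v].append(u)
    return dict(adjacency)


relations = [("a", "b"), ("a", "c"), ("a", "d")]
adjacency = build_adjacency(relations)
print(adjacency["a"])  # first-order neighbors of node a
```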
S2, adding a new user without the specified label to the graph data structure as a new node, and training a GraphSAGE model with the GraphSAGE algorithm to predict the value of the new user's specified label;
Illustratively, new users may be added as nodes to the graph data structure based on the relationship between the new user data and the historical user data.
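One possible way to add a new user to a dictionary-based adjacency list is sketched below. The helper `add_new_user` and the `new_nodes` set, which records nodes that should later be given priority in sampling, are hypothetical names introduced for illustration and are not taken from the text.

```python
def add_new_user(adjacency, new_nodes, user_id, related_users):
    """Add user_id as a new node linked to related users already in the graph.

    new_nodes is a set that tracks newly added (or updated) nodes so that
    a later sampling step can give them priority.
    """
    adjacency.setdefault(user_id, [])
    for other in related_users:
        if other not in adjacency:
            continue  # skip relations to users that are absent from the graph
        if other not in adjacency[user_id]:
            adjacency[user_id].append(other)
        if user_id not in adjacency[other]:
            adjacency[other].append(user_id)
    new_nodes.add(user_id)
    return adjacency


adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
new_nodes = set()
add_new_user(adjacency, new_nodes, "e", ["a", "c", "z"])
print(adjacency["e"], sorted(new_nodes))
```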
In the method, when the GraphSAGE model is trained with the GraphSAGE algorithm, neighbor nodes whose data has been updated are given priority in node sampling, and node aggregation is performed by an LSTM. The advantage of the LSTM is that it takes the temporal order of the data into account, which matches the way the bad preference behavior being predicted spreads among users, so nodes are aggregated in a defined order. With these improvements, fresher data is used, which yields higher precision and a lower misjudgment rate.
Specifically, as shown in fig. 2, training the GraphSAGE model with the GraphSAGE algorithm includes:
S20, setting the node search depth.
In the present invention, the search depth is the number of layers searched in the graph data structure, and is denoted by K. A depth of 2 hops improves model performance compared with 1 hop; if K is set above 2, however, the further improvement is limited while the computation time grows exponentially. The node search depth K is therefore set to 2.
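The exponential growth mentioned above can be illustrated with a short calculation. The sketch below counts an upper bound on the number of neighbor nodes sampled for one target node; the uniform fan-out of 10 per layer and the function name are assumptions chosen only to show the trend.

```python
def sampled_nodes_upper_bound(fanouts):
    """Upper bound on the neighbor nodes sampled for a single target node.

    fanouts[i] is the number of neighbors sampled per node at hop i + 1.
    """
    total = 0
    frontier = 1  # start from the single target node
    for fanout in fanouts:
        frontier *= fanout  # nodes reached at this hop
        total += frontier
    return total


for depth in (1, 2, 3, 4):
    print(depth, sampled_nodes_upper_bound([10] * depth))
```

With a fixed fan-out, each additional layer multiplies the size of the outermost hop by that fan-out, which is why the cost rises quickly once K exceeds 2.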
S21, sampling, in each search layer, a plurality of neighbor nodes of a node, the sampled neighbor nodes including the node's new neighbor nodes;
In the invention, not all neighbor nodes participate in the computation. Instead, neighbor nodes are sampled: new neighbor nodes are taken first, and random sampling is then performed. A new neighbor node is a newly added neighbor node or a neighbor node whose node information has been updated.
Illustratively, in this step, the new neighbor nodes of a node in each search layer are searched for first, and a plurality of neighbor nodes are then sampled in each search layer, wherein the plurality of neighbor nodes includes all new neighbor nodes in the search layer. Specifically, in the first search layer, i.e., among the one-hop neighbor nodes, the new neighbor nodes are searched for first, and then a first preset number of neighbor nodes including the new neighbor nodes are sampled; in the second search layer, i.e., among the two-hop neighbor nodes, the new neighbor nodes are searched for first, and then a second preset number of neighbor nodes including the new neighbor nodes are sampled. Preferably, the first preset number is 20, and the second preset number is 10.
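The priority sampling described in this step might be sketched as follows. The function names, the `new_nodes` set, and the handling of the case where there are more new neighbors than the preset number (all of them are kept) are assumptions; only the preset numbers 20 and 10 come from the text.

```python
import random


def sample_neighbors(neighbors, new_nodes, num_samples, rng=random):
    """Sample up to num_samples neighbors, taking new neighbors first.

    All new (newly added or updated) neighbors are kept; the remaining
    slots are filled by random sampling without replacement.
    """
    fresh = [n for n in neighbors if n in new_nodes]
    rest = [n for n in neighbors if n not in new_nodes]
    remaining = max(num_samples - len(fresh), 0)
    return fresh + rng.sample(rest, min(remaining, len(rest)))


def sample_two_hops(adjacency, new_nodes, node, fanouts=(20, 10), rng=random):
    """Sample one-hop neighbors of node, then two-hop neighbors of each."""
    hop1 = sample_neighbors(adjacency.get(node, []), new_nodes, fanouts[0], rng)
    hop2 = {
        u: sample_neighbors(adjacency.get(u, []), new_nodes, fanouts[1], rng)
        for u in hop1
    }
    return hop1, hop2


adjacency = {
    "a": ["b", "c", "d", "e"],
    "b": ["a", "f"], "c": ["a"], "d": ["a", "g"], "e": ["a"],
}
new_nodes = {"e"}
# Small fan-outs are used here only so the toy graph shows the effect.
hop1, hop2 = sample_two_hops(adjacency, new_nodes, "a", fanouts=(2, 1),
                             rng=random.Random(0))
print(hop1, hop2)
```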
S22, obtaining the embedding of the node at the kth layer from the embeddings, at the (k-1)th layer, of the node's sampled neighbor nodes;
Specifically, the embeddings of the node's neighbor nodes at the (k-1)th layer are aggregated through an aggregation function to obtain the neighbor aggregation feature of the node at the kth layer; the neighbor aggregation feature of the node at the kth layer is then concatenated with the node's own embedding at the (k-1)th layer, and the embedding of the node at the kth layer is obtained through a fully connected layer. Here k denotes the search layer.
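The two operations of this step can be written compactly as below. The notation is an assumption borrowed from the standard GraphSAGE formulation and does not appear in the text: N(v) is the set of sampled neighbors of node v, h_v^k is the embedding of v at layer k (with h_v^0 the input features), W^k is the weight matrix of the fully connected layer, and σ is an activation function.

```latex
h_{N(v)}^{k} = \mathrm{AGGREGATE}_{k}\left(\left\{\, h_{u}^{k-1} : u \in N(v) \,\right\}\right)

h_{v}^{k} = \sigma\left( W^{k} \cdot \mathrm{CONCAT}\left( h_{v}^{k-1},\; h_{N(v)}^{k} \right) \right)
```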
In the present invention, the aggregation function is preferably a Long Short-Term Memory (LSTM) network. An LSTM is a recurrent neural network suited to processing and predicting important events separated by long intervals and delays in a time series.
LSTM aggregation in the original algorithm requires that the set of vectors of the neighbor nodes be randomly shuffled before being used as input to the LSTM, so that the aggregation is symmetric with respect to the order of the neighbors. However, studying the spread of bad preferences such as gambling shows that the contagion is in fact directional and ordered. Therefore, during aggregation, the set of vectors of the neighbor nodes is input to the LSTM in a predetermined order. The predetermined order may be the chronological order of the applications. Specifically, the nodes are sorted by the time at which they were created in the graph data structure, and are fed to the LSTM in that order during aggregation.
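The effect of feeding neighbors in time order, rather than shuffled, can be sketched as follows. The recurrent update used here is a toy order-sensitive aggregator standing in for a trained LSTM, and the `created_at` timestamps, function names, and decay value are all hypothetical.

```python
def order_neighbors_by_time(neighbor_ids, created_at):
    """Return neighbor ids sorted from oldest to newest creation time."""
    return sorted(neighbor_ids, key=lambda n: created_at[n])


def sequential_aggregate(vectors, decay=0.5):
    """Toy order-sensitive aggregator standing in for an LSTM.

    The state is updated as decay * state + (1 - decay) * x for each
    input in sequence, so later vectors weigh more than earlier ones.
    """
    state = [0.0] * len(vectors[0])
    for vec in vectors:
        state = [decay * s + (1 - decay) * x for s, x in zip(state, vec)]
    return state


created_at = {"b": 3, "c": 1, "d": 2}
embeddings = {"b": [1.0, 0.0], "c": [0.0, 1.0], "d": [0.0, 0.0]}
ordered = order_neighbors_by_time(["b", "c", "d"], created_at)
aggregated = sequential_aggregate([embeddings[n] for n in ordered])
print(ordered, aggregated)
```

Because the aggregator is order-sensitive, reversing the input sequence produces a different result, which is the property the chronological ordering relies on.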
Further, to optimize the GraphSAGE model, back propagation may be performed through gradient descent to optimize the model parameters and the parameters of the aggregation function.
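As a simplified illustration of gradient descent, the sketch below fits only a logistic output layer on fixed input vectors using the binary cross-entropy loss. In the full model the gradients would also flow back through the fully connected layers and the aggregation function; that part is omitted here, and the function names, learning rate, and toy data are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on cross-entropy loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - y  # derivative of the loss w.r.t. the logit
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


weights, bias = train_logistic([[0.0], [1.0]], [0, 1])
print(sigmoid(bias), sigmoid(weights[0] + bias))
```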
After the GraphSAGE model is trained, the aggregation information of the sampled nodes up to search depth K is stored. When a node is newly added, only the features of the new node and of its neighbor nodes need to be input to obtain its vector representation, and no time-consuming retraining on the huge relationship network is needed. After the graph data is updated, the value of the target user's specified label can be predicted by forward propagation through the trained GraphSAGE model.
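A minimal sketch of how stored information could be reused for a new node is given below. It assumes a cache that maps each existing node to an embedding stored after training, uses a mean aggregator in place of the trained aggregation function, and uses made-up weights; none of these names or values come from the text.

```python
import math


def embed_new_node(features, neighbor_ids, cached, weights, bias):
    """Forward pass for a new node using cached neighbor embeddings.

    cached maps node id -> embedding stored after training (non-empty).
    A mean aggregator stands in for the trained aggregation function.
    """
    dim = len(next(iter(cached.values())))
    known = [cached[n] for n in neighbor_ids if n in cached]
    if known:
        agg = [sum(col) / len(known) for col in zip(*known)]
    else:
        agg = [0.0] * dim
    concat = list(features) + agg
    # Fully connected layer followed by a ReLU activation.
    return [max(0.0, sum(w * x for w, x in zip(row, concat)) + b)
            for row, b in zip(weights, bias)]


def predict_probability(embedding, out_weights, out_bias):
    """Sigmoid output, read as the value of the specified label."""
    z = sum(w * x for w, x in zip(out_weights, embedding)) + out_bias
    return 1.0 / (1.0 + math.exp(-z))


cached = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]  # illustrative values
embedding = embed_new_node([1.0, 2.0], ["a", "b"], cached, weights, [0.0, 0.0])
probability = predict_probability(embedding, [1.0, -1.0], 0.0)
print(embedding, probability)
```

Only a forward pass over the new node and its neighbors is needed; no parameter is updated.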
S3, predicting whether the new user has a bad preference according to the value of the specified label.
In one example, the value of the specified label represents the probability that the new user has a bad preference, and the new user is determined to have a bad preference when that probability is greater than a preset value.
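The decision rule in this example amounts to a threshold comparison, as in the sketch below. The preset value of 0.5 and the sample probabilities are assumptions; the text does not state a specific preset value.

```python
def has_bad_preference(probability, preset_value=0.5):
    """Return True when the predicted label value exceeds the preset value."""
    return probability > preset_value


predicted = {"user_1": 0.83, "user_2": 0.12, "user_3": 0.5}
flagged = [user for user, p in predicted.items() if has_bad_preference(p)]
print(flagged)
```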
Fig. 3 is a schematic structural diagram of the GraphSAGE-based bad preference behavior detection device according to the present invention. As shown in fig. 3, the device includes:
a generating module 31, configured to generate node information of the graph data structure based on historical user data, where the node information represents the behavior characteristics of a user, and the behavior characteristics include a specified label indicating whether the user has a bad preference;
a training module 32, configured to add a new user without the specified label to the graph data structure as a new node, and to train a GraphSAGE model with the GraphSAGE algorithm to predict the value of the new user's specified label;
a prediction module 33, configured to predict whether the new user has a bad preference according to the value of the specified label.
In one embodiment, the training module 32 includes:
a sampling module 321, configured to sample, in each search layer, a plurality of neighbor nodes of a node, the sampled neighbor nodes including the node's new neighbor nodes, where a new neighbor node is a newly added neighbor node or a neighbor node whose node information has been updated;
an aggregation module 322, configured to obtain the embedding of the node at the kth layer from the embeddings, at the (k-1)th layer, of the node's sampled neighbor nodes;
where k denotes the search layer.
The sampling module 321 includes:
a searching module, configured to search for the new neighbor nodes of the node in each search layer;
a sub-sampling module, configured to sample a plurality of neighbor nodes in each search layer;
wherein the plurality of neighbor nodes includes all new neighbor nodes in the search layer.
The aggregation module 322 includes:
a first aggregation module, configured to aggregate, through the aggregation function, the embeddings of the node's neighbor nodes at the (k-1)th layer to obtain the neighbor aggregation feature of the node at the kth layer;
a second aggregation module, configured to concatenate the neighbor aggregation feature of the node at the kth layer with the node's own embedding at the (k-1)th layer, and to obtain the embedding of the node at the kth layer through a fully connected layer.
In a preferred embodiment, the aggregation function is an LSTM, and during aggregation the set of vectors of the neighbor nodes is input to the LSTM in a predetermined order.
Further, the training module 32 further includes:
an optimization module 323, configured to perform back propagation through gradient descent to optimize the model parameters and the parameters of the aggregation function.
Those skilled in the art will appreciate that the modules in the above-described embodiments of the apparatus may be distributed as described in the apparatus, and may be correspondingly modified and distributed in one or more apparatuses other than the above-described embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
In the following, embodiments of the electronic device of the present invention are described, which may be regarded as an implementation in physical form for the above-described embodiments of the method and apparatus of the present invention. Details described in the embodiments of the electronic device of the invention should be considered supplementary to the embodiments of the method or apparatus described above; for details which are not disclosed in embodiments of the electronic device of the invention, reference may be made to the above-described embodiments of the method or the apparatus.
Fig. 4 is a block diagram of an exemplary embodiment of an electronic device according to the present invention. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 4, the electronic device 400 of the exemplary embodiment is represented in the form of a general-purpose data processing device. The components of electronic device 400 may include, but are not limited to: at least one processing unit 410, at least one memory unit 420, a bus 430 connecting different electronic device components (including the memory unit 420 and the processing unit 410), a display unit 440, and the like.
The storage unit 420 stores a computer-readable program, which may be a code of a source program or a read-only program. The program may be executed by the processing unit 410 such that the processing unit 410 performs the steps of various embodiments of the present invention. For example, the processing unit 410 may perform the steps as shown in fig. 1.
The storage unit 420 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 4201 and/or a cache memory unit 4202, and may further include a read only memory unit (ROM) 4203. The storage unit 420 may also include a program/utility 4204 having a set (at least one) of program modules 4205, such program modules 4205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 430 may be any bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 400 may also communicate with one or more external devices 300 (e.g., keyboard, display, network device, Bluetooth device, etc.), enable a user to interact with the electronic device 400 via the external devices 300, and/or enable the electronic device 400 to communicate with one or more other data processing devices (e.g., router, modem, etc.). Such communication may occur via input/output (I/O) interfaces 450, and may also occur via a network adapter 460 with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet). The network adapter 460 may communicate with other modules of the electronic device 400 via the bus 430. It should be appreciated that although not shown in FIG. 4, other hardware and/or software modules may be used in the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
FIG. 5 is a schematic diagram of one computer-readable medium embodiment of the present invention. As shown in FIG. 5, the computer program may be stored on one or more computer-readable media. The computer-readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. The computer program, when executed by one or more data processing devices, enables the computer-readable medium to implement the above-described method of the invention, namely: generating node information of a graph data structure based on historical user data, wherein the node information represents behavior characteristics of a user, and the behavior characteristics comprise a specified label representing whether the user has bad preferences; adding a new user without a specified label to the graph data structure as a new node, and training a GraphSAGE model using the GraphSAGE algorithm to predict the value of the specified label of the new user; and predicting whether the new user has bad preferences according to the value of the specified label.
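The graph data structure named in the method can be pictured with a small sketch. It is illustrative only and is not taken from the patent: the class names `UserNode` and `UserGraph`, the 0/1/`None` encoding of the specified label, the two-element feature vectors, and the way edges are supplied are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Set


@dataclass
class UserNode:
    features: List[float]        # behavior characteristics of the user
    label: Optional[int] = None  # assumed encoding: 1 = bad preference, 0 = none, None = unknown


@dataclass
class UserGraph:
    nodes: Dict[str, UserNode] = field(default_factory=dict)
    edges: Dict[str, Set[str]] = field(default_factory=dict)

    def add_user(self, uid: str, features: Sequence[float],
                 label: Optional[int] = None,
                 linked_to: Sequence[str] = ()) -> None:
        """Add a user node and undirected edges to already known users."""
        missing = [other for other in linked_to if other not in self.nodes]
        if missing:
            raise KeyError(f"unknown users: {missing}")
        self.nodes[uid] = UserNode(list(features), label)
        self.edges.setdefault(uid, set())
        for other in linked_to:
            self.edges[uid].add(other)
            self.edges[other].add(uid)

    def unlabeled(self) -> List[str]:
        """Users whose specified label still has to be predicted."""
        return [uid for uid, node in self.nodes.items() if node.label is None]


# Historical users carry a label; the new user is added without one.
graph = UserGraph()
graph.add_user("u1", [0.9, 0.1], label=1)
graph.add_user("u2", [0.2, 0.8], label=0, linked_to=["u1"])
graph.add_user("new_user", [0.7, 0.3], linked_to=["u1"])
to_predict = graph.unlabeled()
```

In this sketch `to_predict` holds the nodes whose label a trained GraphSAGE model would be asked to fill in; the training itself is outside the scope of the example.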
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments of the present invention described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present invention can be embodied in the form of a software product, which can be stored in a computer-readable storage medium (such as a CD-ROM, a USB disk, or a removable hard disk) or on a network, and which includes several instructions that cause a data processing device (such as a personal computer, a server, or a network device) to execute the above-mentioned method according to the present invention.
The computer-readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
In summary, the present invention can be implemented as a method, an apparatus, an electronic device, or a computer-readable medium executing a computer program. Some or all of the functions of the present invention may be implemented in practice using a general purpose data processing device such as a microprocessor or a Digital Signal Processor (DSP).
While the foregoing embodiments have described the objects, aspects and advantages of the present invention in further detail, it should be understood that the present invention is not inherently related to any particular computer, virtual machine or electronic device, and various general-purpose machines may be used to implement the present invention. The invention is not limited to the specific embodiments described; all modifications, changes and equivalents that come within the spirit and scope of the invention are intended to be covered.

Claims (10)

1. A GraphSAGE-based bad preference behavior detection method, characterized by comprising the following steps:
generating node information of a graph data structure based on historical user data, wherein the node information represents behavior characteristics of a user, and the behavior characteristics comprise specified labels for representing whether the user has bad preferences;
adding a new user without a specified label as a new node into a graph data structure, and training a GraphSAGE model by adopting a GraphSAGE algorithm to predict the value of the specified label of the user;
predicting whether the new user has bad preferences according to the value of the specified label.
2. The method of claim 1, wherein training the GraphSAGE model using the GraphSAGE algorithm comprises:
sampling a plurality of neighbor nodes containing new neighbor nodes of the nodes in each search layer;
obtaining the embedding of the node at the kth layer according to the embeddings of the neighbor nodes sampled by the node at the (k-1)th layer;
wherein k is the number of the search layers.
3. The method of any of claims 1-2, wherein the sampling a plurality of neighbor nodes in each search layer that contain a new neighbor node of a node comprises:
searching a new neighbor node of the node in each search layer;
sampling a plurality of neighbor nodes in each search layer;
wherein the plurality of neighbor nodes includes all new neighbor nodes in the search layer.
4. The method according to any one of claims 1-3, wherein the obtaining the embedding of the node at the kth layer according to the embeddings of the neighbor nodes sampled by the node at the (k-1)th layer comprises:
aggregating, through an aggregation function, the embeddings of the neighbor nodes of the node at the (k-1)th layer to obtain the neighbor aggregation feature of the node at the kth layer;
and concatenating the neighbor aggregation feature of the node at the kth layer with the embedding of the node at the (k-1)th layer, and obtaining the embedding of the node at the kth layer through a fully connected layer transformation.
5. The method of any of claims 1-4, wherein the aggregation function is an LSTM, and wherein, during aggregation, the set of vectors of the neighbor nodes is input into the LSTM in a predetermined order.
6. The method of any one of claims 1-5, wherein training the GraphSAGE model using the GraphSAGE algorithm further comprises:
performing back propagation through gradient descent to optimize the model parameters and the parameters in the aggregation function.
7. The method according to any of claims 1-6, characterized in that the new neighbor node is a newly added neighbor node or a neighbor node with updated node information.
8. A GraphSAGE-based bad preference behavior detection device, characterized in that the device comprises:
a generation module, configured to generate node information of a graph data structure based on historical user data, wherein the node information represents behavior characteristics of a user, and the behavior characteristics comprise a specified label representing whether the user has bad preferences;
a training module, configured to add a new user without a specified label to the graph data structure as a new node, and to train a GraphSAGE model using the GraphSAGE algorithm to predict the value of the specified label of the new user;
and a prediction module, configured to predict whether the new user has bad preferences according to the value of the specified label.
9. An electronic device, comprising:
a processor; and
a memory storing computer-executable instructions that, when executed, cause the processor to perform the method of any of claims 1-7.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-7.
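The sampling and embedding steps of claims 2 to 4 can be illustrated with a short sketch. It is a simplified reading rather than the claimed implementation: it assumes a mean aggregator (claim 5 names an LSTM instead), a ReLU activation after the fully connected layer, a hand-picked weight matrix, and that new neighbors are kept even if they exceed the sample size. The function names and toy vectors are hypothetical.

```python
import random


def sample_neighbors(neighbors, new_nodes, sample_size, rng):
    """Sample neighbors so that every new neighbor node is included."""
    fresh = [n for n in neighbors if n in new_nodes]
    rest = [n for n in neighbors if n not in new_nodes]
    remaining = max(0, sample_size - len(fresh))
    # Fill the remaining slots with a random sample of the other neighbors.
    return fresh + rng.sample(rest, min(remaining, len(rest)))


def mean_aggregate(vectors, dim):
    """Element-wise mean of neighbor embeddings (zeros if there are none)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def layer_embedding(own, sampled, weight):
    """Layer-k embedding from the node's and its neighbors' layer k-1 embeddings."""
    agg = mean_aggregate(sampled, len(own))
    concat = own + agg  # list concatenation: [own embedding, aggregated neighbors]
    # Fully connected layer (one weight row per output unit) followed by ReLU.
    return [max(0.0, sum(w * x for w, x in zip(row, concat))) for row in weight]


# Layer k-1 embeddings of node "v" and its neighbors; "c" is a new neighbor.
prev = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "v": [0.5, 0.5]}
rng = random.Random(0)
picked = sample_neighbors(["a", "b", "c"], new_nodes={"c"}, sample_size=2, rng=rng)
weight = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 1.0]]
h_v = layer_embedding(prev["v"], [prev[n] for n in picked], weight)
```

Here `picked` always starts with the new neighbor `"c"`, and the remaining slot is filled at random from the older neighbors. In a trained model the weight matrix and any aggregator parameters would be learned by back propagation, as claim 6 describes.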
CN202011134335.4A 2020-10-21 2020-10-21 GraphSAGE-based bad preference behavior detection method and device and electronic equipment Pending CN112347316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011134335.4A CN112347316A (en) 2020-10-21 2020-10-21 GraphSAGE-based bad preference behavior detection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011134335.4A CN112347316A (en) 2020-10-21 2020-10-21 GraphSAGE-based bad preference behavior detection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112347316A (en) 2021-02-09

Family

ID=74359614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011134335.4A Pending CN112347316A (en) 2020-10-21 2020-10-21 GraphSAGE-based bad preference behavior detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112347316A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113515568A (en) * 2021-07-13 2021-10-19 北京百度网讯科技有限公司 Graph relation network construction method, graph neural network model training method and device
CN114168799A (en) * 2021-11-26 2022-03-11 四川云从天府人工智能科技有限公司 Method, device and medium for acquiring characteristics of node adjacency relation in graph data structure
CN115048535A (en) * 2022-06-30 2022-09-13 支付宝(杭州)信息技术有限公司 Method and device for recognizing abnormity
CN114168799B (en) * 2021-11-26 2024-06-11 四川云从天府人工智能科技有限公司 Method, device and medium for acquiring characteristics of node adjacency in graph data structure

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781407A (en) * 2019-10-21 2020-02-11 腾讯科技(深圳)有限公司 User label generation method and device and computer readable storage medium
CN110995810A (en) * 2019-11-25 2020-04-10 腾讯科技(深圳)有限公司 Object identification method based on artificial intelligence and related device

Similar Documents

Publication Publication Date Title
CN107844560B (en) Data access method and device, computer equipment and readable storage medium
CN110807515A (en) Model generation method and device
AU2019280855A1 (en) Detecting suitability of machine learning models for datasets
CN107609185B (en) Method, device, equipment and computer-readable storage medium for similarity calculation of POI
CN111143226B (en) Automatic test method and device, computer readable storage medium and electronic equipment
US11093857B2 (en) Method and apparatus for generating information
CN110708285B (en) Flow monitoring method, device, medium and electronic equipment
CN112347316A (en) GraphSAGE-based bad preference behavior detection method and device and electronic equipment
CN113239173B (en) Question-answer data processing method and device, storage medium and electronic equipment
CN111598678A (en) Incremental learning-based user financial risk identification method and device and electronic equipment
CN111353601A (en) Method and apparatus for predicting delay of model structure
CN112819099A (en) Network model training method, data processing method, device, medium and equipment
US10733537B2 (en) Ensemble based labeling
CN112925785A (en) Data cleaning method and device
CN113392920B (en) Method, apparatus, device, medium, and program product for generating cheating prediction model
CN111179055A (en) Credit limit adjusting method and device and electronic equipment
CN105447032A (en) Method and system for processing message and subscription information
CN113298634A (en) User risk prediction method and device based on time sequence characteristics and graph neural network
CN112347776B (en) Medical data processing method and device, storage medium and electronic equipment
CN112765398A (en) Information recommendation method and device and storage medium
CN114141236B (en) Language model updating method and device, electronic equipment and storage medium
CN112149623B (en) Self-adaptive multi-sensor information fusion system, method and storage medium
US20230230081A1 (en) Account identification method, apparatus, electronic device and computer readable medium
CN114897183A (en) Problem data processing method, and deep learning model training method and device
CN115017385A (en) Article searching method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Room 1109, No. 4, Lane 800, Tongpu Road, Putuo District, Shanghai, 200062

Applicant after: Shanghai Qiyue Information Technology Co.,Ltd.

Address before: Room a2-8914, 58 Fumin Branch Road, Hengsha Township, Chongming District, Shanghai, 201500

Applicant before: Shanghai Qiyue Information Technology Co.,Ltd.

Country or region before: China