CN116150341B - Method for detecting claim event, computer device and storage medium - Google Patents

Method for detecting claim event, computer device and storage medium

Info

Publication number
CN116150341B
Authority
CN
China
Prior art keywords
node
nodes
word
event
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310440660.0A
Other languages
Chinese (zh)
Other versions
CN116150341A (en)
Inventor
潘怡君
那崇宁
张泷
胡汉一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202310440660.0A priority Critical patent/CN116150341B/en
Publication of CN116150341A publication Critical patent/CN116150341A/en
Application granted granted Critical
Publication of CN116150341B publication Critical patent/CN116150341B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to a method for detecting a claim settlement event, a computer device and a storage medium. Under the condition that a graph network is changed, a first node sequence is updated to obtain a second node sequence of the current timestamp, wherein the graph network is constructed based on data of the claim settlement event and the first node sequence is obtained by associating a plurality of nodes in the graph network; in a word vector model, reverse iterative training is performed on the word vector model according to the second node sequence and the weight parameters of the related change nodes, and the node vector of the current timestamp is obtained from the trained word vector model, wherein the related change nodes comprise nodes whose own state has changed and/or whose edges to other nodes have changed in the graph network; whether the claim settlement event belongs to a target type event is determined according to the node vector of the current timestamp. In this way only the weight parameters of part of the nodes need to be updated iteratively, which reduces the amount of calculation, so that the claim settlement event can be detected accurately and efficiently.

Description

Method for detecting claim event, computer device and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method for detecting an event of a claim, a computer device, and a storage medium.
Background
With the popularity of distributed systems, the amount of claim event data has grown by orders of magnitude. Identifying fraud in these data requires considerable manpower and material resources, and labeling fraud with expert knowledge has the disadvantage of being highly subjective.
For this reason, the related art proposes a method of detecting a claim event: storing data of the claim event by using an unstructured carrier, constructing and training a word vector model to represent words of the unstructured carrier, and obtaining a target vector; and constructing and training a fraud detection model, and inputting the target vector into the fraud detection model for prediction, so as to obtain a prediction result of whether the claim settlement event belongs to the fraud event.
However, the existing method for detecting the claim event is applied to static homogeneous data, whereas the unstructured data evolves dynamically over time, so the existing model cannot adapt to dynamically changing unstructured data and the detection result of the claim event is not accurate enough. If the existing model is retrained by inputting the training samples again, the amount of calculation is large and the time cost is high.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a claim event detection method, a computer device, and a storage medium that can accurately and efficiently detect claim events.
In a first aspect, the present application provides a method for detecting a claim event, the method comprising:
under the condition that a graph network is changed, updating a first node sequence to obtain a second node sequence of a current time stamp, wherein the graph network is constructed based on data of claim settlement events, and the first node sequence is obtained by associating a plurality of nodes in the graph network;
in a word vector model, performing reverse iterative training on the word vector model according to the second node sequence and the weight parameters of the related change nodes, and obtaining a node vector of the current timestamp according to the trained word vector model, wherein the related change nodes comprise nodes whose own state has changed and/or whose edges to other nodes have changed in the graph network;
and determining whether the claim event belongs to a target type event according to the node vector of the current timestamp.
In one embodiment, the case where the graph network is changed includes at least one of:
at the current timestamp, a node is added;
at the current timestamp, a node is deleted;
a node is newly added in the current timestamp, and an association relationship is formed between the newly added node and other nodes;
At the current timestamp, one node is deleted, and the association relationship between the deleted node and other nodes is released.
In one embodiment, the weight parameters of the related change node include a central word matrix and a surrounding word matrix, and performing reverse iterative training on the word vector model according to the second node sequence and the weight parameters of the related change node includes:
encoding words contained in the change-related nodes according to a preset encoding rule to obtain target word vectors of the change-related nodes;
multiplying the target word vector with the central word matrix to obtain central word vectors of all the change-related nodes, and multiplying the target word vector with the surrounding word matrix to obtain surrounding word vectors of all the change-related nodes;
and carrying out normalization processing on the surrounding word vectors, and adjusting the central word matrix and the surrounding word matrix of the related change node according to the normalized probability of the surrounding word vectors so as to enable the word vector model to meet convergence conditions.
In one embodiment, obtaining the node vector of the current timestamp according to the trained word vector model includes:
And taking the central word matrix of the word vector model meeting the convergence condition as a node vector of the current timestamp.
In one embodiment, the normalizing the surrounding word vectors, and adjusting the central word matrix and the surrounding word matrix of the related and modified node according to the normalized probability of the surrounding word vectors, so that the word vector model meets a convergence condition, includes:
taking the negative logarithm of the initial loss function of the word vector, dividing the obtained value by the total number of preset words to obtain a new loss function, wherein the initial loss function comprises a maximum likelihood function;
and adjusting the central word matrix and the surrounding word matrix of the change-related node to enable the new loss function to be converged.
In one embodiment, determining whether the claim event belongs to a target type event according to the node vector of the current timestamp includes:
merging node vectors belonging to the same claim settlement event in the node vectors of the current time stamp;
predicting the combined node vectors to obtain a prediction label of the claim settlement event;
and determining whether the claim event belongs to the target type event according to the prediction label of the claim event.
In one embodiment, the obtaining the first node sequence includes:
selecting a plurality of nodes in the graph network which are adaptive according to an identification target, and determining a random walk path among the plurality of nodes, wherein the identification target comprises a claim settlement event of the identification target type;
and associating the selected plurality of nodes in the graph network according to the random walk path.
In one embodiment, the attributes of each node include at least one of: time, location, personnel information, claims object identification.
In a second aspect, the present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the claim event detection method of the first aspect when the processor executes the computer program.
In a third aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the first aspect described above.
In the method for detecting the claim settlement event, the computer device and the storage medium, under the condition that the graph network is changed, the first node sequence is updated to obtain the second node sequence of the current timestamp, wherein the graph network is constructed based on the data of the claim settlement event and the first node sequence is obtained by associating a plurality of nodes in the graph network; in the word vector model, reverse iterative training is performed on the word vector model according to the second node sequence and the weight parameters of the related change nodes, and the node vector of the current timestamp is obtained according to the trained word vector model, wherein the related change nodes comprise nodes whose own state has changed and/or whose edges to other nodes have changed in the graph network; whether the claim settlement event belongs to the target type event is determined according to the node vector of the current timestamp. In this way only the weight parameters of part of the nodes need to be updated iteratively, which reduces the amount of calculation, so that the claim settlement event can be detected accurately and efficiently.
Drawings
FIG. 1 is a block diagram of a hardware architecture of a terminal of a method of detecting a claim event in one embodiment;
FIG. 2 is a flow chart of a method of detecting an event of a claim in one embodiment;
FIG. 3 is a schematic diagram of a graph network constructed based on vehicle insurance claim data in one embodiment;
FIG. 4 is a schematic diagram of a word vector model in one embodiment;
FIG. 5 is a schematic diagram of a design idea of a method for detecting a vehicle insurance claim event in an embodiment;
FIG. 6 is a flow chart of a method of detecting a vehicle insurance claim event in one embodiment;
FIG. 7 is a diagram of simulation results of anti-fraud recognition of a vehicle insurance claim of 1500 events in one embodiment;
FIG. 8 is a diagram of simulation results of anti-fraud recognition of a car insurance claim of 100 events in one embodiment;
fig. 9 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
For a clearer understanding of the objects, technical solutions and advantages of the present application, the present application is described and illustrated below with reference to the accompanying drawings and examples.
Unless defined otherwise, technical or scientific terms used herein shall have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these," and the like in this application are not intended to be limiting in number, but rather are singular or plural. The terms "comprising," "including," "having," and any variations thereof, as used in the present application, are intended to cover a non-exclusive inclusion; for example, a process, method, and system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the list of steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference to "a plurality" in this application means two or more. "and/or" describes an association relationship of an association object, meaning that there may be three relationships, e.g., "a and/or B" may mean: a exists alone, A and B exist together, and B exists alone. Typically, the character "/" indicates that the associated object is an "or" relationship. The terms "first," "second," "third," and the like, as referred to in this application, merely distinguish similar objects and do not represent a particular ordering of objects.
The method embodiments provided in the present embodiment may be executed in a terminal, a computer, or similar computing device. For example, the method runs on a terminal, and fig. 1 is a block diagram of a hardware structure of the terminal of the method for detecting a claim event according to an embodiment of the present application. As shown in fig. 1, the terminal may include one or more (only one is shown in fig. 1) processors 102 and a memory 104 for storing data, wherein the processors 102 may include, but are not limited to, a microprocessor MCU, a programmable logic device FPGA, or the like. The terminal may also include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and is not intended to limit the structure of the terminal. For example, the terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
The memory 104 may be used to store computer programs, such as software programs of application software and modules, such as computer programs corresponding to the method of detecting an event of claim in the present embodiment, and the processor 102 executes the computer programs stored in the memory 104 to perform various functional applications and data processing, that is, to implement the method described above. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. The network includes a wireless network provided by a communication provider of the terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
When an event occurs, the claim data is recorded in the service system. The claim data is usually unstructured and unordered when analyzed, and characterization learning techniques from the natural language processing field are difficult to migrate and use directly, so characterization learning techniques for unstructured data need to be studied. In this embodiment, a graph is used as the unstructured carrier to store the claim data, yielding a graph network. Node characterization learning is a technique aimed specifically at graphs; the idea behind it is to use the topological structure of the graph to learn a mapping that represents massive, high-dimensional, heterogeneous, complex and dynamic data as uniform, low-dimensional, dense vectors, and to optimize the mapping so as to preserve the structure and properties of the graph, so that the learned vectors truly reflect the topology of the original space. However, most node characterization learning methods are currently applied to static homogeneous graphs, whereas an actual service system evolves dynamically over time: the number of nodes, their attributes and the edges between them can change, that is, the graph network changes. The existing model cannot adapt to such a dynamically changing service system, so the detection result of the claim settlement event is inaccurate. If the existing model is retrained by inputting the training samples again, the amount of calculation is large and the time cost is high.
Based on the foregoing, in one embodiment, a method for detecting a claim event is provided, and an example of application of the method to the terminal in fig. 1 is described, and fig. 2 shows a schematic flow chart of the method for detecting a claim event in this embodiment, where the flow chart includes the following steps:
step S201, under the condition that the graph network is changed, updating the first node sequence to obtain a second node sequence of the current time stamp, wherein the graph network is constructed based on the data of the claim settlement event, and the first node sequence is obtained through a plurality of nodes in the association graph network.
A graph network, which is a data structure and carrier, is represented as a collection of nodes and edges. The attributes of each node include, but are not limited to, time, place, personnel information and claim object identification. The personnel information may be the identity of the damaged party and/or the identity of a person who enjoys the benefit of the insurance claim, and the claim object refers to an article or person that enjoys the benefit of the insurance claim, such as a vehicle, a computer, a financial product consumer, a passenger or a tourist.
The node sequence is data obtained by associating nodes in the graph network based on a random walk path. The random walk path can be determined based on the target type, and different target types correspond to different random walk paths. The random walk path corresponds to a path template that indicates which nodes are associated and how the nodes are associated. Optionally, acquiring the first node sequence includes: selecting a plurality of nodes in the graph network according to the identification target, and determining random walk paths among the nodes, wherein the identification target comprises identifying a claim settlement event of the target type; and associating the plurality of nodes selected in the graph network according to the random walk path. Taking the detection of a vehicle insurance claim event as an example, when the target type is personal vehicle insurance claim fraud, the random walk path may be: time - frame number - time. When the target type is group vehicle insurance claim fraud, the random walk path may be: place - loss adjuster - frame number - loss adjuster - place.
Table 1 shows a piece of vehicle insurance claim data, and fig. 3 shows a schematic diagram of a graph network constructed based on the vehicle insurance claim data in table 1.
TABLE 1 vehicle insurance claim data sheet
In a practical scenario, a plurality of vehicle insurance events are associated with each other: for example, a loss adjuster is responsible for a plurality of events, a vehicle owner may be involved in a plurality of events, and a vehicle frame number may also be involved in a plurality of events. Based on these association relationships, nodes in the graph network are associated along the random walk paths to obtain node sequences. For example, if the random walk path is set as time - frame number - time, the frame numbers of event A1 and event A2 are the same, and the frame numbers of event A2 and event A3 are the same, then the node sequences T1-C1-T2 and T2-C2-T3 can be obtained. If the random walk path is set as place - loss adjuster - frame number - loss adjuster - place, the frame numbers of event A1 and event A2 are the same, and the frame numbers of event A2 and event A3 are the same, then the node sequences L1-S1-C1-S2-L2 and L2-S2-C2-S3-L3 can be obtained.
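To make the walk procedure concrete, the sketch below builds a small heterogeneous graph from claim records and generates a node sequence along a fixed meta-path; it is only an illustration under assumed field names (such as "frame_no" and "adjuster") and helper names, not the patented implementation.

    import random
    from collections import defaultdict

    # Build a heterogeneous graph from claim records (field names are illustrative).
    def build_graph(events):
        adj = defaultdict(set)     # node value -> neighbouring node values
        node_type = {}             # node value -> its type ("time", "place", ...)
        for ev in events:
            nodes = [("time", ev["time"]), ("place", ev["place"]),
                     ("owner", ev["owner"]), ("adjuster", ev["adjuster"]),
                     ("frame_no", ev["frame_no"])]
            for t, n in nodes:
                node_type[n] = t
            for _, a in nodes:            # connect every pair of nodes that
                for _, b in nodes:        # belong to the same claim event
                    if a != b:
                        adj[a].add(b)
        return adj, node_type

    # Follow a meta-path template such as ["place", "adjuster", "frame_no", "adjuster", "place"].
    def meta_path_walk(start, meta_path, adj, node_type):
        walk = [start]
        for wanted in meta_path[1:]:
            candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
            if not candidates:
                break
            walk.append(random.choice(candidates))
        return walk

Running meta_path_walk from every node of the first type in the template, and repeating the walk a fixed number of times per node, yields node sequences such as L1-S1-C1-S2-L2 that can serve as training text for the word vector model.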
Step S202, in the word vector model, performing reverse iterative training on the word vector model according to the second node sequence and the weight parameters of the related change nodes, and obtaining the node vector of the current timestamp according to the trained word vector model, wherein the related change nodes comprise nodes whose own state has changed and/or whose edges to other nodes have changed in the graph network.
The word vector model is a natural language processing model capable of converting words in natural language into dense vectors. The word vector model is trained by nodes in the graph network. The weight parameters related to the changed nodes are network layer parameters in the word vector model. Fig. 4 shows a schematic structure of a word vector model, in which nodes in an input node sequence of an input layer are processed by an intermediate hidden layer, then the output layer outputs a node vector, and the node vector is used as input of the word vector model and is processed again, so that the reverse iterative training of the word vector model is realized.
The change node may be a node whose own attribute changes, a node whose own attribute does not change but whose own association relationship with other nodes changes, or a node whose own attribute changes and whose own association relationship with other nodes changes.
Assume that n = 100, that is, there are 100 events, and each event records 5 kinds of data: time, place, owner, loss adjuster and frame number, so there are 5 x 100 = 500 nodes. Now a newly acquired event contributes 5 new nodes, and these 5 new nodes are assumed to have association relationships with 100 of the original 500 nodes. The node sequences of these 105 nodes are then re-acquired according to the set random walk path; that is, among the 505 nodes in total, the association relationship between each of the 105 nodes and the other nodes is acquired according to the set random walk path. It should be noted that when the selected 105 nodes walk randomly, the walks may pass through any of the 505 nodes and may therefore involve the remaining 400 nodes; in this embodiment, however, only the weight parameters of the 105 nodes are selected for the reverse iterative training of the word vector model, instead of performing the reverse iterative training based on the weight parameters of all 505 nodes.
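A minimal sketch of this partial update is given below (Python with numpy), assuming the two weight matrices are stored with one row per node and that changed_ids indexes the 105 change-related nodes; the function name and the plain SGD update rule are illustrative assumptions rather than the exact update used in the embodiment.

    import numpy as np

    def partial_sgd_step(W_center, W_context, grad_center, grad_context,
                         changed_ids, lr=0.025):
        # Only the rows belonging to change-related nodes are updated; all other
        # rows (the remaining 400 nodes in the example above) keep their values.
        rows = np.asarray(sorted(changed_ids))
        W_center[rows] -= lr * grad_center[rows]
        W_context[rows] -= lr * grad_context[rows]
        return W_center, W_context

Because only the indexed rows are touched, the cost of one update grows with the number of change-related nodes rather than with the full node set.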
Step S203, according to the node vector of the current time stamp, whether the claim event belongs to the target type event is determined.
The node vector of the current timestamp is input into a trained neural network model for prediction, so as to obtain whether an event conforming to the target type exists among the claim settlement events. The target type may be claim fraud; it may be personal claim fraud or group claim fraud; it may also be claim fraud involving cars, computers, financial product consumers, passengers or tourists.
In the above steps S201 to S203, the word vector model is used to represent the nodes of the graph network as words. The graph network changes with time, and the word vector model needs to update its weight parameters in order to adapt to the change, i.e. training samples have to be input again for training; every time the neural network is trained with a round of training samples, the weight parameters of all neurons are adjusted once, which requires a large amount of calculation and a long time. Therefore, when the graph network is changed, only the weight parameters of part of the nodes are iteratively updated and the weight parameters of the other nodes are kept fixed, which reduces the amount of calculation, so that the claim settlement event can be detected accurately and efficiently.
In one embodiment, the case where the graph network is changed includes at least one of: at the current timestamp, a node is added; at the current timestamp, a node is deleted; a node is newly added in the current timestamp, and an association relationship is formed between the newly added node and other nodes; at the current timestamp, one node is deleted, and the association relationship between the deleted node and other nodes is released.
Four kinds of change may occur for nodes and edges: node addition, node deletion, edge addition between nodes, and edge deletion between nodes. The four changes are described below by means of a mathematical model so as to facilitate the selection of the nodes involved in the changes; the principle of negative sampling is then considered according to the change conditions, and a dynamically updated data set is constructed for each timestamp, including the updated sequences and the selection of the weight parameters to be updated. In this embodiment, based on the negative sampling principle, the nodes related to the newly added vehicle insurance nodes (for example, the 105 nodes) are taken as positive samples and the other, irrelevant nodes (for example, the remaining 400 nodes) are taken as negative samples; all positive samples are taken as the relevant nodes, and the node sequences are then acquired again. The specific mathematical model can be expressed as formulas (1) - (4).
Node addition, that is, a new node is added at the current timestamp:
V_add = { v | v ∈ V_{t+1}, v ∉ V_t }    (1)
Node deletion, that is, a node is deleted at the current timestamp:
V_del = { v | v ∈ V_t, v ∉ V_{t+1} }    (2)
Edge addition between nodes, that is, at the current timestamp two originally unconnected nodes become connected due to the newly added nodes:
E_add = { (u, w) | (u, w) ∈ E_{t+1}, (u, w) ∉ E_t }    (3)
Edge deletion between nodes, that is, at the current timestamp two originally connected nodes become disconnected due to the deleted nodes:
E_del = { (u, w) | (u, w) ∈ E_t, (u, w) ∉ E_{t+1} }    (4)
where V_add denotes the newly added node set, V_del the deleted node set, E_add the newly added edge set, E_del the deleted edge set, v a first node, u a second node, w a third node, V_t the set of all nodes at time t, V_{t+1} the set of all nodes at time t+1, E_t the set of all edges at time t, and E_{t+1} the set of all edges at time t+1; t and t+1 respectively denote the times at which the events were acquired, where t+1 corresponds to the current timestamp.
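Between two consecutive snapshots these four sets can be obtained by plain set arithmetic, as in the sketch below (Python; V_t, V_t1, E_t and E_t1 are assumed to be sets of node identifiers and of edges stored as node pairs).

    def snapshot_diff(V_t, V_t1, E_t, E_t1):
        # formulas (1)-(4): set differences between the snapshots at t and t+1
        V_add = V_t1 - V_t
        V_del = V_t - V_t1
        E_add = E_t1 - E_t
        E_del = E_t - E_t1
        # every node touched by any change is treated as a change-related node
        changed = V_add | V_del | {n for e in (E_add | E_del) for n in e}
        return V_add, V_del, E_add, E_del, changed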
In one embodiment, the weight parameters related to the change node include a central word matrix and a surrounding word matrix, and performing reverse iterative training on the word vector model according to the second node sequence and the weight parameters related to the change node includes: encoding words contained in the change-related nodes according to a preset encoding rule to obtain target word vectors of the change-related nodes; multiplying the target word vector by the central word matrix to obtain central word vectors of all the related change nodes, and multiplying the target word vector by the surrounding word matrix to obtain surrounding word vectors of all the related change nodes; and carrying out normalization processing on surrounding word vectors, and adjusting a central word matrix and surrounding word matrices of the related change nodes according to the probability of the normalized surrounding word vectors so as to enable the word vector model to meet convergence conditions. Further, obtaining a node vector of the current timestamp according to the trained word vector model, including: and taking the central word matrix of the word vector model meeting the convergence condition as the node vector of the current time stamp.
The node vector can be obtained by constructing a neural network for node characterization learning based on the selected partial sample data, using the skip-gram method in Word2vec. In each iteration, the skip-gram method takes one word as the central word and tries to predict the context words within a certain range of the central word. In this embodiment, each node in the node sequence is treated as a word in a text, and the node vectors are then obtained based on a maximum likelihood function over the co-occurrences of nodes along the random walk paths.
The manner in which the node vectors are generated will be described further below.
Step S301, determine the target word c and record its one-hot encoding x_c ∈ {0,1}^{|V|}, where x_c is the one-hot vector of the target word c and |V| is the total number of words;
Step S302, construct two parameter matrices, namely the central word matrix W ∈ R^{|V| x d} and the surrounding word matrix W' ∈ R^{|V| x d}, where |V| is the number of all words and d is the dimension of the node vector;
Step S303, multiply the one-hot encoding x_c obtained in step S301 by the central word matrix W to obtain a d-dimensional vector, which can be regarded as the central word vector of the word;
Step S304, multiply the target word vector obtained in step S303 by the surrounding word matrix W'. This step can be understood as taking the inner product of the target word with each word separately, resulting in a |V|-dimensional vector in which each element is the inner product of the word at that position with the target word.
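Steps S301 to S304 correspond to the standard skip-gram forward pass; a minimal numpy sketch, including the softmax normalization discussed next, is given below (the matrix names follow the text above and are otherwise an assumption).

    import numpy as np

    def skip_gram_forward(c, W_center, W_context):
        # c: index of the target (center) word; W_center, W_context: |V| x d matrices
        v_c = W_center[c]                      # step S303: one-hot times W selects row c
        scores = W_context @ v_c               # step S304: inner product with every word
        probs = np.exp(scores - scores.max())  # softmax normalization (numerically stable)
        probs /= probs.sum()
        return v_c, probs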
In one embodiment, the normalization processing is performed on surrounding word vectors, and the central word matrix and the surrounding word matrix of the related change node are adjusted according to the probability of the normalized surrounding word vectors so that the word vector model meets the convergence condition, and the method comprises the following steps: taking the negative logarithm of the initial loss function of the word vector, dividing the obtained value by the total number of the preset words to obtain a new loss function, wherein the initial loss function comprises a maximum likelihood function; the central word matrix and the surrounding word matrices related to the changed node are adjusted to enable the new loss function to be converged.
The finally obtained |V|-dimensional vector is further normalized by softmax; the greater the normalized probability, the greater the correlation between that word and the target word. Based on the principle of maximizing the likelihood function, the probabilities of the surrounding words are used to adjust the central word matrix W and the surrounding word matrix W': the parameter matrices are adjusted with the back-propagation algorithm according to the loss function until convergence is reached, and the central word matrix W is then the final node vector representation. The loss function, also called the likelihood function, here represents the probability that all surrounding words within a 2m window occur given a target word, where T refers to the number of total words. The objective of this embodiment is to maximize this likelihood by adjusting the parameters, because the larger this function is, the better the learned representation fits the actual situation. The formula is as follows:
L(θ) = ∏_{t=1}^{T} ∏_{-m ≤ j ≤ m, j ≠ 0} P(w_{t+j} | w_t)
where T refers to the number of all words, m is the window size, w_i is the i-th word, and P(w_{t+j} | w_t) refers to the probability that the word w_{t+j} appears given the word w_t.
Following the usual practice, it is generally preferable to minimize a loss function rather than maximize it. The negative logarithm of the likelihood is therefore taken and the result is divided by T, giving a new loss function, as shown below:
J(θ) = -(1/T) Σ_{t=1}^{T} Σ_{-m ≤ j ≤ m, j ≠ 0} log P(w_{t+j} | w_t)
where T refers to the number of all words, m is the window size, w_i is the i-th word, and P(w_{t+j} | w_t) refers to the probability that the word w_{t+j} appears given the word w_t.
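A hedged sketch of computing this averaged negative log-likelihood over one node sequence follows (numpy; the window size m = 7 mirrors the window used later in the experiments and is otherwise an arbitrary default).

    import numpy as np

    def nll_loss(sequence, W_center, W_context, m=7):
        # average negative log probability of the surrounding words in a 2m window
        total, count = 0.0, 0
        for t, c in enumerate(sequence):
            scores = W_context @ W_center[c]
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()                 # softmax over the vocabulary
            for j in range(-m, m + 1):
                if j == 0 or not (0 <= t + j < len(sequence)):
                    continue
                total += -np.log(probs[sequence[t + j]] + 1e-12)
                count += 1
        return total / max(count, 1)

Minimizing this quantity with back propagation, while restricting the updates to the rows of the change-related nodes as described above, yields the converged central word matrix that serves as the node vectors.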
In one embodiment, determining whether the claim event belongs to a target type event based on the node vector of the current timestamp includes: merging node vectors belonging to the same claim settlement event in the node vectors of the current time stamp; predicting the combined node vectors to obtain a prediction label of the claim settlement event; and determining whether the claim event belongs to the target type event according to the prediction label of the claim event.
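One simple way to realise the merging and prediction described above is to concatenate the vectors of the nodes belonging to an event and feed the result to whatever classifier was fitted on the labelled history; in the sketch below the concatenation and the generic predict call are illustrative choices rather than the specific operations of the embodiment.

    import numpy as np

    def event_vector(event_node_ids, node_vectors):
        # merge the node vectors that belong to one claim settlement event
        return np.concatenate([node_vectors[i] for i in event_node_ids])

    def detect_target_events(events, node_vectors, classifier, target_label=1):
        X = np.stack([event_vector(ids, node_vectors) for ids in events])
        labels = classifier.predict(X)     # prediction label for each claim event
        return labels == target_label      # True where the event is of the target type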
In one embodiment, a method for detecting a vehicle insurance claim event is provided. Fig. 5 shows the design idea of the method for detecting a vehicle insurance claim event: it takes the vehicle insurance claim as the research object and multi-source heterogeneous data as the research basis, carries out the tasks of path design, weight selection and anomaly detection, and finally achieves the aim: characterization learning of the dynamic heterogeneous network of vehicle insurance claim events, and anomaly detection and anti-fraud identification of vehicle insurance claim events. The path design includes: acquiring the operation flow of the vehicle insurance claim, acquiring the association relationships among the nodes, and designing the random walk paths. The weight selection includes: describing the dynamic network variation trend, constructing the model of each timestamp, and considering the negative sampling implementation process. The anomaly detection includes: obtaining the node vectors, detecting abnormal vehicle insurance claims, and anti-fraud identification. This design idea can be split into two main tasks, namely characterization learning and anomaly detection.
Characterization learning: according to the acquired multisource heterogeneous data of the vehicle insurance claim, based on expert knowledge and an association relation analysis method, a proper graph network node is adaptively selected; and updating the random walk route of part of nodes according to the node change characteristics of each time stamp, selecting the weight matrix of part of nodes based on the negative sampling principle to carry out iterative updating, and finally obtaining the node vector representation of each time stamp.
Abnormality detection: the node vector representation obtained by each time stamp is utilized to carry out unified processing on the node vectors belonging to the same vehicle insurance claim event, so as to obtain the node vector representation of different elements of each event; based on the fraud labels of part of the original data, the online real-time abnormal detection of the vehicle insurance claim settlement is realized by using machine learning methods such as principal component analysis and the like.
On the basis of fig. 5, fig. 6 shows a schematic flow chart of a method for practicing the design concept, please refer to fig. 6, the flow chart includes the following steps:
step S601, acquiring historical car insurance claim data, training a word vector model according to the historical car insurance claim data, acquiring a corresponding node vector according to the trained word vector model, and marking fraud labels according to expert knowledge;
step S602, unifying the node vector of each time stamp into a vector, and constructing a neural network model based on the fraud labels by using a data clustering method such as principal component analysis;
step S603, identifying the change rule of the nodes in the new automobile insurance claim settlement event, selecting the changed nodes, updating the node sequence and iterating part of weight parameters to obtain the node vector in the new automobile insurance claim settlement event;
Step S604, predicting whether the car insurance claim event is fraudulent or not in real time according to the node vector calculated in the step S603 and the neural network model constructed in the step S602, so as to realize anomaly detection of the car insurance claim.
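Steps S602 and S604 can be prototyped with off-the-shelf tools; the sketch below uses scikit-learn, which is an assumption — the embodiment only speaks of machine learning methods such as principal component analysis — and reduces the merged event vectors to 128 principal components before fitting a simple classifier on the fraud labels.

    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression

    def fit_detector(X_train, y_train, n_components=128):
        pca = PCA(n_components=n_components)
        Z_train = pca.fit_transform(X_train)              # step S602: dimensionality reduction
        clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
        return pca, clf

    def predict_online(pca, clf, X_new):
        return clf.predict(pca.transform(X_new))          # step S604: real-time fraud prediction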
In an actual vehicle insurance claim event, nodes are added and deleted with time, and the relationship between the nodes is changed with time, but most of the current algorithms are directed against static networks. The embodiment aims at researching dynamic heterogeneous network characterization learning oriented to the vehicle insurance claim settlement event, only updates part of models when new data is acquired, reduces the calculation complexity and has important application value in practice. The graph network contains a lot of important information, and if the relationship between the nodes and the edges is directly utilized, the graph network is complex to implement due to the multi-source heterogeneous characteristic. The nodes and the edges are represented by vectors, firstly, unification can be realized in dimensions, and secondly, the numerical vector containing semantic and structural information in the graph network can be effectively applied to the traditional machine learning method to realize the anomaly detection of the vehicle insurance claim settlement event, evaluate the risk value of the vehicle insurance claim settlement event, identify some anomaly states, and have important significance in the aspects of vehicle insurance claim settlement anti-fraud and the like.
The embodiment has the following advantages:
1) According to different fraud recognition targets, self-adaptively selecting corresponding nodes and designing a random walk path, and pertinently improving the fraud recognition rate of the vehicle insurance claim settlement;
2) The problem that nodes dynamically change along with time in the vehicle insurance claim settlement event is solved, the operation complexity of the model is reduced, and the fraud detection precision of the model is ensured while the operation speed is improved; based on the idea of negative sampling, the trend of the node changing along with time is classified by using a mathematical model, and the node needing iterative updating is selected with smaller calculation complexity, so that the operation speed of the model can be improved to a greater extent;
3) Under the condition of acquiring the vector representation of the node, the fraud label of the car insurance claim event is acquired on line and in real time by utilizing a machine learning algorithm, so that the cost is reduced, the subjectivity of expert knowledge is reduced, the detection time of car insurance fraud is improved, the car insurance claim event with risk is intercepted in time, and the method has higher economic value.
A specific example is given below in connection with fig. 6 described above. In this embodiment, there are 6080 events in the vehicle insurance claim event set, 4347 events are manually marked as normal events, the rest 1733 events are marked as fraudulent events, and the event fraud rate is 28.50%. Wherein 2000 normal events and 1000 fraud events are used as training matrices and 2347 normal events and 733 fraud events are used as test matrices.
This embodiment mainly identifies personal-behavior fraud events of vehicle owners: statistics are compiled over the frame numbers, the loss adjusters and the accident places in the vehicle insurance claim data, and the probability of vehicle insurance claim fraud is examined from these three aspects. First the random walk path is designed; in this embodiment the path accident place - loss adjuster - frame number - loss adjuster - accident place is adopted to collect the training text. Each node is used as a starting node to generate 100 random walk paths, each path is repeated 10 times according to the set route requirements, and all random walk paths are written to a txt document that serves as the text material for training the subsequent skip-gram model. In this embodiment the embedding dimension is set to 128, the window size is 7 and the number of iterations is 5, and the node vector representations obtained by training are stored in a txt file. During application to the test data, the test nodes are updated in units of 50 samples to search for the best delay interval. Fig. 7 is a simulation result diagram of anti-fraud recognition of vehicle insurance claims for 1500 events in this embodiment, and fig. 8 is a simulation result diagram of anti-fraud recognition of vehicle insurance claims for 100 events. The principal component analysis method is used to reduce the 384-dimensional data, obtaining 128 principal component directions, and the first two dimensions are plotted as shown in fig. 7 and fig. 8. The normal events in the test data set are mainly concentrated inside the dotted-line frame and the abnormal events are mainly concentrated outside the dotted-line frame, which shows that the vector representation of the vehicle insurance claim event nodes makes the identification of fraud events possible. Fig. 7 uses 2000 normal events and 1000 fraud events as the training matrix and 1000 normal events and 500 fraud events as the test matrix; fig. 8 reduces the sampling interval and uses 2000 normal events and 1000 fraud events as the training matrix and 50 normal events and 50 fraud events as the test matrix. From the simulation results, with 1500 events as test data the effect is good and the running time of the model is 3006 s; with 100 events as test data the effect is slightly worse, but most of the data can still be given accurate labels and the running time of the model is only 35 s, which greatly improves the running efficiency of the model. This verifies the effect of the method provided by the application: the operation speed of the model is improved while the accuracy requirement is met.
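For reference, the stated hyperparameters (embedding dimension 128, window size 7, 5 iterations, skip-gram) map directly onto an off-the-shelf trainer such as gensim, as in the hedged sketch below; the embodiment itself uses the incremental, partially updated training described earlier, so this is only a baseline illustration.

    from gensim.models import Word2Vec

    # node sequences produced by the random walks (illustrative placeholder data)
    walks = [["L1", "S1", "C1", "S2", "L2"], ["L2", "S2", "C2", "S3", "L3"]]
    model = Word2Vec(sentences=walks, vector_size=128, window=7,
                     sg=1, epochs=5, min_count=1)
    node_vectors = {node: model.wv[node] for node in model.wv.index_to_key}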
Furthermore, to visually represent the effect of the algorithm, the effects of the two experiments described above are compared using ROC coordinates. The ROC curve has the true positive rate TPR on the vertical axis and the false positive rate FPR on the horizontal axis (the point (0, 1) is optimal and (1, 0) is worst). The calculation formulas of TPR and FPR are as follows:
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
where each case represents an event: TP counts cases predicted as true whose actual observation is also true; FN counts cases predicted as false whose actual observation is true; FP counts cases predicted as true whose actual observation is false; and TN counts cases predicted as false whose actual observation is also false. The ROC curve coordinates of the two experiments were calculated as shown in table 2:
TABLE 2 ROC Curve coordinate calculation results
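A minimal helper for computing the TPR/FPR coordinates reported in Table 2 might look as follows (plain Python over boolean actual/predicted labels; the function name is illustrative).

    def roc_point(y_true, y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
        tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        return fpr, tpr    # (horizontal, vertical) ROC coordinates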
It should be understood that, although the steps in the flowcharts according to the embodiments described above are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure thereof may be as shown in fig. 9. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program when executed by a processor implements a method of claim event detection. The display unit of the computer device is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be a key, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application applies, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
under the condition that a graph network is changed, updating a first node sequence to obtain a second node sequence of a current time stamp, wherein the graph network is constructed based on data of a claim settlement event, and the first node sequence is obtained through a plurality of nodes in a correlation graph network;
in the word vector model, carrying out reverse iterative training on the word vector model according to the second node sequence and the weight parameters of the related change nodes, and obtaining the node vector of the current timestamp according to the trained word vector model, wherein the related change nodes comprise nodes whose own state has changed and/or whose edges to other nodes have changed in the graph network;
And determining whether the claim event belongs to the target type event according to the node vector of the current timestamp.
In one embodiment, the computer program when executed by the processor further performs the steps of:
encoding words contained in the change-related nodes according to a preset encoding rule to obtain target word vectors of the change-related nodes;
multiplying the target word vector by the central word matrix to obtain central word vectors of all the related change nodes, and multiplying the target word vector by the surrounding word matrix to obtain surrounding word vectors of all the related change nodes;
and carrying out normalization processing on surrounding word vectors, and adjusting a central word matrix and surrounding word matrices of the related change nodes according to the probability of the normalized surrounding word vectors so as to enable the word vector model to meet convergence conditions.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and taking the central word matrix of the word vector model meeting the convergence condition as the node vector of the current time stamp.
In one embodiment, the computer program when executed by the processor further performs the steps of:
taking the negative logarithm of the initial loss function of the word vector, dividing the obtained value by the total number of the preset words to obtain a new loss function, wherein the initial loss function comprises a maximum likelihood function;
The central word matrix and the surrounding word matrices related to the changed node are adjusted to enable the new loss function to be converged.
In one embodiment, the computer program when executed by the processor further performs the steps of:
merging node vectors belonging to the same claim settlement event in the node vectors of the current time stamp;
predicting the combined node vectors to obtain a prediction label of the claim settlement event;
and determining whether the claim event belongs to the target type event according to the prediction label of the claim event.
In one embodiment, the computer program when executed by the processor further performs the steps of:
selecting a plurality of nodes in the graph network according to the identification target, and determining random walk paths among the nodes, wherein the identification target comprises a claim settlement event of the identification target type;
a plurality of nodes selected in the graph network are associated according to the random walk path.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high density embedded nonvolatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory can include random access memory (RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can take a variety of forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples only represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the present application. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (9)

1. A method of claim event detection, the method comprising:
under the condition that a graph network is changed, updating a first node sequence to obtain a second node sequence of a current time stamp, wherein the graph network is constructed based on data of claim settlement events, and the first node sequence is obtained by associating a plurality of nodes in the graph network;
In a word vector model, performing reverse iterative training on the word vector model according to the second node sequence and the weight parameters of the related change nodes, and obtaining a node vector of the current timestamp according to the trained word vector model, wherein the related change nodes comprise nodes whose own state has changed and/or whose edges to other nodes have changed in the graph network;
determining whether the claim settlement event belongs to a target type event according to the node vector of the current timestamp;
the weight parameters of the related change nodes comprise a central word matrix and a surrounding word matrix, and the word vector model is subjected to reverse iterative training according to the second node sequence and the weight parameters of the related change nodes, and the method comprises the following steps:
encoding words contained in the change-related nodes according to a preset encoding rule to obtain target word vectors of the change-related nodes;
multiplying the target word vectors by the central word matrix to obtain central word vectors of all the change-related nodes, and multiplying the target word vectors by the surrounding word matrix to obtain surrounding word vectors of all the change-related nodes;
and normalizing the surrounding word vectors, and adjusting the central word matrix and the surrounding word matrix of the change-related nodes according to the normalized probabilities of the surrounding word vectors so that the word vector model meets a convergence condition.
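The training step recited in claim 1 has the shape of a skip-gram-style update: a one-hot target word vector is projected through the central word matrix, scored against the surrounding word matrix, normalized with a softmax, and both matrices are adjusted by backpropagation. The following minimal Python sketch illustrates one such reverse iteration under those assumptions; the function and variable names (train_step, center_matrix, context_matrix) are illustrative and not prescribed by the claim.

    import numpy as np

    def train_step(center_matrix, context_matrix, center_id, context_id, lr=0.05):
        """One reverse-iteration update on a (central word, surrounding word) node pair."""
        one_hot = np.zeros(center_matrix.shape[0])
        one_hot[center_id] = 1.0                      # target word vector of the change-related node
        center_vec = one_hot @ center_matrix          # central word vector
        scores = center_vec @ context_matrix          # raw surrounding-word scores
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()                          # normalized surrounding-word probabilities
        loss = -np.log(probs[context_id])             # negative log-likelihood of the observed pair
        grad_scores = probs.copy()
        grad_scores[context_id] -= 1.0                # gradient of the softmax cross-entropy
        grad_center = context_matrix @ grad_scores    # gradient w.r.t. the central word vector
        context_matrix -= lr * np.outer(center_vec, grad_scores)  # adjust surrounding word matrix
        center_matrix[center_id] -= lr * grad_center               # adjust central word matrix
        return float(loss)

    rng = np.random.default_rng(0)
    V, d = 50, 16                                     # node-word vocabulary size, embedding size
    W_center = rng.normal(scale=0.1, size=(V, d))     # central word matrix
    W_context = rng.normal(scale=0.1, size=(d, V))    # surrounding word matrix
    print(train_step(W_center, W_context, center_id=3, context_id=7))

Repeating such updates over the pairs produced by the second node sequence drives the loss down until the convergence condition of claim 1 is satisfied.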
2. The method of claim 1, wherein the changing of the graph network includes at least one of:
at the current timestamp, a node is added;
at the current timestamp, a node is deleted;
at the current timestamp, a node is newly added, and an association relationship is formed between the newly added node and other nodes;
at the current timestamp, a node is deleted, and the association relationship between the deleted node and other nodes is released.
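A hedged sketch of how the change-related nodes implied by these four cases might be collected, assuming each graph snapshot is stored as a mapping from a node to the set of its neighbours; the helper name changed_nodes and the snapshot representation are assumptions, not part of the claim.

    def changed_nodes(prev: dict[str, set[str]], curr: dict[str, set[str]]) -> set[str]:
        added = curr.keys() - prev.keys()             # nodes newly added at the current timestamp
        deleted = prev.keys() - curr.keys()           # nodes deleted at the current timestamp
        changed = set(added) | set(deleted)
        for node in prev.keys() & curr.keys():
            if prev[node] != curr[node]:              # association relationships formed or released
                changed.add(node)
                changed |= prev[node] ^ curr[node]    # counterpart nodes of the changed edges
        return changed

    # Example: node "C" appears and is linked to "A", so "A" and "C" are change-related.
    prev = {"A": {"B"}, "B": {"A"}}
    curr = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
    print(changed_nodes(prev, curr))                  # {'A', 'C'}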
3. The method of claim 1, wherein obtaining a node vector of a current timestamp from the trained word vector model comprises:
and taking the central word matrix of the word vector model meeting the convergence condition as a node vector of the current timestamp.
4. The method according to claim 1, wherein normalizing the surrounding word vectors, and adjusting the central word matrix and the surrounding word matrix of the change-related nodes according to the normalized probabilities of the surrounding word vectors so that the word vector model meets a convergence condition, comprises:
taking the negative logarithm of an initial loss function of the word vector model and dividing the obtained value by the total number of preset words to obtain a new loss function, wherein the initial loss function comprises a maximum likelihood function;
and adjusting the central word matrix and the surrounding word matrix of the change-related nodes so that the new loss function converges.
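Under the assumption that the initial loss function is the likelihood of the observed surrounding words, the new loss function of claim 4 is simply the average negative log-likelihood. A small Python illustration (all names are illustrative):

    import numpy as np

    def average_negative_log_likelihood(probs: np.ndarray) -> float:
        """probs[t] is the model probability of the t-th observed surrounding word."""
        total_words = len(probs)                      # total number of preset words
        return float(-np.sum(np.log(probs)) / total_words)

    print(average_negative_log_likelihood(np.array([0.9, 0.8, 0.7])))  # ~0.2284

Minimizing this averaged quantity is equivalent to maximizing the original likelihood, which is why adjusting the two matrices until the new loss converges also satisfies the convergence condition of claim 1.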
5. The method of claim 1, wherein determining whether the claim event belongs to a target type event based on the node vector of the current timestamp comprises:
merging, among the node vectors of the current timestamp, the node vectors belonging to the same claim settlement event;
predicting the merged node vectors to obtain a prediction label of the claim settlement event;
and determining whether the claim event belongs to the target type event according to the prediction label of the claim event.
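A hedged sketch of claim 5, assuming mean pooling as the merge operation and a plain logistic classifier for prediction; the claim does not fix either choice, so predict_event_label, the weights and the threshold below are illustrative.

    import numpy as np

    def predict_event_label(node_vectors: np.ndarray, weights: np.ndarray, bias: float,
                            threshold: float = 0.5) -> int:
        event_vector = node_vectors.mean(axis=0)                        # merge vectors of one claim event
        score = 1.0 / (1.0 + np.exp(-(event_vector @ weights + bias)))  # probability of target type
        return int(score >= threshold)                                  # 1 = target-type event, 0 = otherwise

    rng = np.random.default_rng(1)
    vectors = rng.normal(size=(4, 16))        # four node vectors belonging to the same claim event
    weights, bias = rng.normal(size=16), 0.0
    print(predict_event_label(vectors, weights, bias))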
6. The method of claim 1, wherein obtaining the first node sequence comprises:
selecting, according to an identification target, a plurality of adapted nodes in the graph network, and determining a random walk path among the plurality of nodes, wherein the identification target comprises a claim settlement event of the target type;
and associating the selected plurality of nodes in the graph network according to the random walk path.
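The random walk of claim 6 can be illustrated with a DeepWalk-style sequence generator; the walk length, the start node and the graph representation below are assumptions made only for this sketch.

    import random

    def random_walk(graph: dict[str, list[str]], start: str, length: int = 5,
                    seed: int = 0) -> list[str]:
        rng = random.Random(seed)
        path = [start]
        for _ in range(length - 1):
            neighbours = graph.get(path[-1], [])
            if not neighbours:                        # dead end: stop the walk early
                break
            path.append(rng.choice(neighbours))
        return path                                   # nodes associated along the walk form the node sequence

    graph = {"claimant": ["garage", "adjuster"], "garage": ["claimant"], "adjuster": ["claimant"]}
    print(random_walk(graph, "claimant"))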
7. The method of claim 1, wherein the attribute of each node comprises at least one of: time, location, personnel information, claims object identification.
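A purely illustrative record for the node attributes enumerated in claim 7; the field names are assumptions, since the claim only lists the kinds of information a node may carry.

    from dataclasses import dataclass

    @dataclass
    class ClaimNode:
        time: str               # e.g. "2023-04-23T10:15:00"
        location: str           # e.g. the city or road segment of the incident
        personnel_info: str     # e.g. claimant, driver or adjuster identity
        claim_object_id: str    # identifier of the claims object (e.g. a vehicle)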
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the claim event detection method of any of claims 1 to 7.
9. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the claim event detection method of any of claims 1 to 7.
CN202310440660.0A 2023-04-23 2023-04-23 Method for detecting claim event, computer device and storage medium Active CN116150341B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310440660.0A CN116150341B (en) 2023-04-23 2023-04-23 Method for detecting claim event, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310440660.0A CN116150341B (en) 2023-04-23 2023-04-23 Method for detecting claim event, computer device and storage medium

Publications (2)

Publication Number Publication Date
CN116150341A (en) 2023-05-23
CN116150341B (en) 2023-07-18

Family

ID=86358605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310440660.0A Active CN116150341B (en) 2023-04-23 2023-04-23 Method for detecting claim event, computer device and storage medium

Country Status (1)

Country Link
CN (1) CN116150341B (en)

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3752929A4 (en) * 2018-02-16 2021-11-17 Munich Reinsurance America, Inc. Computer-implemented methods, computer-readable media, and systems for identifying causes of loss
CN109636061B (en) * 2018-12-25 2023-04-18 深圳市南山区人民医院 Training method, device and equipment for medical insurance fraud prediction network and storage medium
US20220100857A1 (en) * 2020-09-28 2022-03-31 Elasticsearch B.V. Systems and Methods of Anomalous Pattern Discovery and Mitigation
EP3975092A1 (en) * 2020-09-29 2022-03-30 MasterCard International Incorporated Method and system for detecting fraudulent transactions
US20220300903A1 (en) * 2021-03-19 2022-09-22 The Toronto-Dominion Bank System and method for dynamically predicting fraud using machine learning
CN113837886B (en) * 2021-09-16 2024-05-31 之江实验室 Knowledge-graph-based vehicle insurance claim fraud risk identification method and system
CN114580263A (en) * 2021-12-02 2022-06-03 国家电网有限公司信息通信分公司 Knowledge graph-based information system fault prediction method and related equipment
CN114155009A (en) * 2021-12-06 2022-03-08 华东交通大学 Fraud detection method and device, electronic equipment and storage medium
CN114840745A (en) * 2022-03-30 2022-08-02 达而观信息科技(上海)有限公司 Personalized recommendation method and system based on graph feature learning and deep semantic matching model
CN115063035A (en) * 2022-07-21 2022-09-16 平安健康保险股份有限公司 Customer evaluation method, system, equipment and storage medium based on neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232971A (en) * 2020-10-14 2021-01-15 太平金融科技服务(上海)有限公司 Anti-fraud detection method, anti-fraud detection device, computer equipment and storage medium
CN112417099A (en) * 2020-11-20 2021-02-26 南京邮电大学 Method for constructing fraud user detection model based on graph attention network

Also Published As

Publication number Publication date
CN116150341A (en) 2023-05-23

Similar Documents

Publication Publication Date Title
EP3985578A1 (en) Method and system for automatically training machine learning model
WO2020253358A1 (en) Service data risk control analysis processing method, apparatus and computer device
CN110363449B (en) Risk identification method, device and system
CN108170909B (en) Intelligent modeling model output method, equipment and storage medium
CN112291807B (en) Wireless cellular network traffic prediction method based on deep migration learning and cross-domain data fusion
CN110781970B (en) Classifier generation method, device, equipment and storage medium
CN110287316A (en) A kind of Alarm Classification method, apparatus, electronic equipment and storage medium
CN108241867B (en) Classification method and device
Kass et al. Improving area of occupancy estimates for parapatric species using distribution models and support vector machines
CN111415167B (en) Network fraud transaction detection method and device, computer storage medium and terminal
CN114493052A (en) Multi-model fusion self-adaptive new energy power prediction method and system
CN115545103A (en) Abnormal data identification method, label identification method and abnormal data identification device
CN115618008A (en) Account state model construction method and device, computer equipment and storage medium
CN111582313B (en) Sample data generation method and device and electronic equipment
CA3179311A1 (en) Identifying claim complexity by integrating supervised and unsupervised learning
CN116150341B (en) Method for detecting claim event, computer device and storage medium
CN115758271A (en) Data processing method, data processing device, computer equipment and storage medium
CN116029760A (en) Message pushing method, device, computer equipment and storage medium
CN113409096B (en) Target object identification method and device, computer equipment and storage medium
CN112199434B (en) Data processing method, device, electronic equipment and storage medium
CN111737319B (en) User cluster prediction method, device, computer equipment and storage medium
CN111860655B (en) User processing method, device and equipment
CN114154617A (en) Low-voltage resident user abnormal electricity utilization identification method and system based on VFL
CN117273959A (en) Method for detecting claim event, computer device and storage medium
Zeng et al. Anomaly detection for high‐dimensional dynamic data stream using stacked habituation autoencoder and union kernel density estimator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant