WO2019196552A1 - Data processing method, apparatus and device for insurance fraud identification, and server

Info

Publication number: WO2019196552A1
Application number: PCT/CN2019/074097
Authority: WO
Inventors: 王修坤; 邹晓川
Original assignee: 阿里巴巴集团控股有限公司
Priority date: 2018-04-12
Filing date: 2019-01-31
Publication date: 2019-10-17
Also published as: TWI686760B; TW201944338A; CN108334647A

Abstract

A data processing method, apparatus and device for insurance fraud identification, and a server, wherein multi-scale relationship network graph data of a crowd are built on the basis of multi-scale relationship association data of insurance applicants and insurants, a relationship network among people may be more deeply mined, identification efficiency is improved, and identification range is widened. At the same time, a supervised learning model is jointly built according to characteristic data of a fraudster and is used for learning relationship network characteristics and personal characteristics of the fraudster. Accomplice fraudsters will have obvious multi-scale relationship characteristics in the relationship network, the characteristics of the fraudsters frequently indicate similarities, thus the fraudsters may be more effectively and efficiently identified by using the described method, and identification processing efficiency is improved.

Description

Data processing method, apparatus, processing device and server for insurance fraud identification

Technical field

The embodiments of this specification belong to the technical field of computer data processing for insurance fraud detection, and in particular relate to a data processing method, apparatus, processing device and server for insurance fraud identification.

Background

Insurance provides financial and personal protection in return for payment of the prescribed premiums. As society develops economically and people's awareness of insurance grows, demand for insurance services is also increasing.

However, because insurance has a certain economic leverage effect, there is a large amount of fraudulent behavior in the market. Fraudsters usually stage insured incidents deliberately in order to obtain compensation from the insurance company. Fraud is becoming increasingly specialized and team-based, which seriously harms the healthy development of the insurance industry and damages the interests of insurance companies and the public. At present, the traditional way of identifying insurance fraud relies mainly on manually applying some simple rules to identify historical fraudsters, and on predicting the risk of fraud from the behavior of historical fraudsters. As fraudulent individuals and groups become better concealed, existing methods struggle to detect group fraud quickly, and manual review involves a heavy workload and relatively low identification efficiency.

Therefore, there is a need in the industry for a way to identify fraudsters more effectively and efficiently.

Summary of the invention

The embodiments of this specification aim to provide a data processing method, apparatus, processing device and server for insurance fraud identification that use both the relationship network data among persons and the persons' own characteristics, so that fraudsters can be identified more effectively.

The data processing method, apparatus, processing device and server for insurance fraud identification provided by the embodiments of this specification are implemented in the following ways:

Obtaining relationship association data of a to-be-identified group;

Constructing multi-degree relationship network graph data of the to-be-identified group based on the relationship association data, and extracting person characteristic data of the to-be-identified group;

Identifying the multi-degree relationship network graph data and the person characteristic data of the to-be-identified group by using a constructed supervised learning algorithm, and determining a fraud output result for the to-be-identified group; the supervised learning algorithm includes a data relationship model obtained by training with the multi-degree relationship network graph data and person characteristic data of a selected target group, with labeled historical fraudsters as sample data.

A data processing device for insurance fraud identification, comprising:

a data acquisition module, configured to acquire relationship association data of the to-be-identified group;

a feature calculation module, configured to construct multi-degree relationship network graph data of the to-be-identified group based on the relationship association data, and to extract person characteristic data of the to-be-identified group;

a fraud identification module, configured to identify the multi-degree relationship network graph data and the person characteristic data of the to-be-identified group by using a constructed supervised learning algorithm, and to determine a fraud output result for the to-be-identified group; the supervised learning algorithm includes a data relationship model obtained by training with the multi-degree relationship network graph data and person characteristic data of a selected target group, with labeled historical fraudsters as sample data.

A processing device, including a processor and a memory for storing processor-executable instructions, the processor implementing the following when executing the instructions:

Obtaining relationship-related data of the people to be identified;

A server, comprising at least one processor and a memory for storing processor-executable instructions, the processor implementing the following when executing the instructions:

Obtaining relationship-related data of the people to be identified;

The data processing method, apparatus, processing device and server for insurance fraud identification provided by the embodiments of this specification construct multi-degree relationship network graph data of a group from the multi-dimensional relationship association data of policyholders and insured persons, so that the relationship network among persons can be mined more deeply, improving identification efficiency and widening its scope. At the same time, a supervised learning model is built in combination with the fraudsters' own characteristic data to learn both the relationship network characteristics and the personal characteristics of fraudsters. Fraudsters acting as a gang not only show obvious multi-degree relationship characteristics in the relationship network, but their own characteristics also often show similarities. The methods provided in the embodiments of this specification can therefore identify fraudsters more effectively and efficiently.

Description of the drawings

In order to illustrate the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments of this specification, and a person of ordinary skill in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic flowchart of an embodiment of a data processing method for insurance fraud identification provided by this specification;

FIG. 2 is a schematic diagram of a processing procedure for constructing a supervised recognition model provided by this specification;

FIG. 3 is a block diagram showing the hardware structure of an insurance fraud identification processing server provided by this specification;

FIG. 4 is a block diagram showing the structure of a data processing apparatus for insurance fraud identification provided by this specification;

FIG. 5 is a block diagram showing the structure of a fraud identification module in a data processing apparatus for insurance fraud identification provided by this specification.

Detailed description

In order to help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this specification, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of one or more embodiments of this specification, without inventive effort, should fall within the scope of protection of the embodiments of this specification.

Birds of a feather flock together. Insurance fraudsters usually need several people to cooperate in order to better disguise the fraud. In many cases, fraudsters also cluster on the basis of acquaintance relationships, or show relationship network characteristics with obvious commonalities along some dimension. Examples include fraud committed jointly by relatives, fraud by pyramid-scheme-like organizations with an obvious hierarchy, and fraud by social or student groups recruited by an experienced historical fraudster acting as leader. The embodiments of this specification start from multi-dimensional relationship association data of a target group that includes insured persons and claim applicants, construct a multi-degree relationship network (the data of the relationship network graph may be referred to as multi-degree relationship network graph data), and mine the relationship network among the target group in depth. This addresses the low coverage and low recognition rate of conventional approaches, which identify only historical fraudsters and persons in a direct, one-degree relationship with them. At the same time, the solution provided by the embodiments of this specification also considers the characteristic attributes of the fraudsters themselves: for example, a fraudster often registers an account with false information, the account has been registered only for a short time, or the account was registered just to use the insurance service.
The implementation scheme provided by this specification combines the relationship characteristic data of fraud groups with the fraudsters' own characteristic data, uses labeled historical fraudsters, and trains a supervised model, so that an output result indicating whether a person to be identified is committing insurance fraud can be computed.

The following describes an embodiment of this specification using a specific insurance fraud identification scenario as an example. Specifically, FIG. 1 is a schematic flowchart of an embodiment of a data processing method for insurance fraud identification provided by this specification. Although this specification provides method operation steps or apparatus structures as shown in the following embodiments or drawings, the method or apparatus may, on the basis of conventional practice or without inventive effort, include more operation steps or module units, or fewer after some are merged. For steps or structures that have no logically necessary causal relationship, the execution order of the steps or the module structure of the apparatus is not limited to that shown in the embodiments or drawings. When the method or module structure is applied in an actual apparatus, server or terminal product, it may be executed sequentially or in parallel according to the embodiments or drawings (for example, in a parallel-processor or multi-threaded environment, or even in a distributed processing or server cluster environment).

Of course, the description of the following embodiments does not constitute a limitation on other expandable technical solutions based on the present specification. For example, in other implementation scenarios, the embodiments provided in this specification can also be applied to implementation scenarios of fund fraud identification, product transactions, service transactions, and the like. For a specific implementation, as shown in FIG. 1 , the data processing method for insurance fraud identification provided by the present specification may include:

S0: obtaining relationship association data of a to-be-identified group;

S2: constructing multi-degree relationship network graph data of the to-be-identified group based on the relationship association data, and extracting person characteristic data of the to-be-identified group;

S4: identifying the multi-degree relationship network graph data and the person characteristic data of the to-be-identified group by using a constructed supervised learning algorithm, and determining a fraud output result for the to-be-identified group; the supervised learning algorithm includes a data relationship model obtained by training with the multi-degree relationship network graph data and person characteristic data of a selected target group, with labeled historical fraudsters as sample data.

In the application scenario of this embodiment, the insuring, accounting and claim settlement stages of insurance mainly concern the claim applicant. Some embodiments of this specification consider the case in which the motive for fraud exists from the moment the policy is taken out, the fraudster's main purpose being to claim insurance benefits; there are of course also cases in which the motive arises only after the policy is taken out. The insured is the main object of the insurance; for example, members of the same gang may deliberately stage an accident involving the insured. Therefore, in this embodiment, the set of claim applicants and insured persons is selected as the target group when identifying fraud. Accordingly, in some embodiments of the method described in this specification, when a target group is selected for learning relationship feature data, the target group may include the set of claim applicants and insured persons. It should be noted that in some situations the claim applicant may be the policyholder, for example when a father takes out a policy for his son with himself as beneficiary and applies for the claim after an accident; in other situations the claim applicant may be the insured, for example when a person takes out a policy on himself with himself as beneficiary. The terms claim applicant and insured used above should be understood as names for different roles in the insurance business, not necessarily as different persons; in some implementation scenarios the claim applicants and insured persons may be wholly or partly the same.

Of course, in other embodiments, the target group may also be selected as one or more of the claim applicant, the policyholder, the insured and the beneficiary.

The relationship association data may include data information associated with personnel in the target group in various dimensions, such as household registration, age, relative/classmate relationship between personnel, insurance data, insurance risk data, and the like. The specific relationship-related data may be selected according to the actual application scenario to determine which categories of data are used. Generally, the operator may use the data information that may be involved in the fraudulent behavior as the basis for collecting the relationship-related data. In an embodiment provided by the present specification, the relationship-related data may include at least one of the following:

Social relationship data, terminal data, terminal application and application account operation information, behavior data associated with insurance behavior, personnel basic attribute data, and geographic location data.

The social relationship data may include social relationships between persons in the target group, such as cousins, teacher and student, family members, classmates, and superior and subordinate. The terminal data may include the brand, model and category of the communication device a person uses; in some fraud scenarios, for example, the persons involved use mobile phones of the same brand. The terminal application and application account operation information may be used to determine whether persons use the same application or use the same account to log in to different terminal applications to carry out insurance fraud; in some scenarios, multiple followers act on their terminals under unified command. The behavior data associated with insurance behavior may include data on a target person's insuring behavior, claim behavior, compensation amounts, and the like. The personnel basic attribute data may include the age, gender, occupation, household registration and the like of the policyholder or claim applicant. The geographic location data may include the current geographic location of persons in the target group, or regions they have historically passed through or stayed in. Of course, the relationship association data of each dimension described above may be defined differently or contain more or fewer data categories and items, and relationship association data of other dimensions may also be included, such as consumption information, or even credit records or administrative penalty information. One or more of the above kinds of data may be collected in a specific implementation.

The person characteristic data may include data associated with a single person, such as gender, age, registration time of the insurance service account or terminal application account, credit history and consumption status. It may also include behavior data associated with insurance behavior, such as repeated insuring, frequent claims, and whether compensation amounts are normal. It can further include transaction data for other goods or services, such as long-term large expenditures, multiple vehicle insurance policies, multiple mobile phones, and multiple communication or social accounts.

The person characteristic data used for the specific person feature calculation may adopt a combination of one or more of the above to realize the identification of the person's own characteristics. Therefore, in another embodiment, the person characteristic data may include feature data extracted by at least one of a user registration account, transaction data, and behavior data associated with the insurance behavior.

There is usually a relatively close relationship network among the members of a fraud group. In this embodiment, the multi-dimensional relationship association data obtained above can be used to construct multi-degree relationship network graph data of the target group. The multi-degree relationship network graph data may include a relationship network graph generated from the relationship chains between different persons established from the relationship association data; the relationship chain data between persons on the relationship network graph is the multi-degree relationship network graph data. A relationship chain can represent the relationship between any two persons, for example A and B are superior and subordinate, A and C are family members, and so on. A direct relationship between two persons may be referred to as a one-degree relationship. The “multi-degree” in the multi-degree relationship network graph data of this embodiment may include associations between persons that are newly established on the basis of one-degree relationships: for example, a second-degree relationship between a first person and a third person is established from the one-degree relationship between the first person and a second person and the one-degree relationship between the second person and the third person, and a third-degree relationship between the first person and a fourth person may be further established from other one-degree relationships, and so on.

As an example, suppose A is an individual and B is A's brother-in-law; A and B then have a one-degree social relationship. A has no direct social relationship with C, the boss of B's company, but in this embodiment, because B is both A's brother-in-law and C's subordinate, a second-degree relationship is established between A and C.
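The brother-in-law example can be sketched as a breadth-first search over one-degree relationships. This is only an illustrative sketch: the function name `relationship_degrees`, the edge tuple layout and the extra person D are assumptions made for the example, not part of the specification.

```python
from collections import deque


def relationship_degrees(edges, start):
    """Return {person: degree} for every person reachable from `start`.

    `edges` is an iterable of (person_a, person_b, relation_label) tuples,
    each describing a one-degree relationship (hypothetical layout).
    """
    # Build an undirected adjacency list from the one-degree relationships.
    neighbors = {}
    for a, b, _label in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search: the hop count is the degree of the relationship.
    degrees = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for other in neighbors.get(person, ()):
            if other not in degrees:
                degrees[other] = degrees[person] + 1
                queue.append(other)
    del degrees[start]
    return degrees


# A and B are in-laws, B works for C; D is an extra, made-up classmate of C.
edges = [
    ("A", "B", "brother-in-law"),
    ("B", "C", "boss-subordinate"),
    ("C", "D", "classmate"),
]
degrees_from_a = relationship_degrees(edges, "A")
```

Here `degrees_from_a["C"]` is 2, matching the second-degree relationship between A and C described in the example.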

In addition to the social relationships between persons described above, other types of multi-degree relationship network graph data may be formed according to the relationship association data used or the requirements for relationship construction, such as whether persons are fellow townspeople, whether they use the same communication tool, whether multiple persons log in to an application on their terminals during a fixed time period, and so on. Of course, when constructing the relationship network from the relationship association data in a specific implementation, rules for establishing relationship chains may be designed in advance to determine the relationships.

Based on the constructed multi-degree relationship network graph data and the extracted person characteristic data, this embodiment can use a supervised learning algorithm to learn the relationship characteristics and self-characteristics of fraudsters, so that an effective recognition model can be established.

In general, common machine learning methods are mainly divided into supervised learning and unsupervised learning. Supervised learning is a kind of classification processing. Usually, for a labeled data set, existing training samples (i.e., known data and their corresponding outputs) are used to train an optimal model (the model belongs to some set of functions, and optimal means best under a certain evaluation criterion). The model then maps every input to a corresponding output, and a simple decision on the output achieves classification, so that the model can also classify unknown data. Typical examples of supervised learning are KNN (k-nearest neighbors) and SVM (support vector machine). Given a certain number of training samples, a supervised learning algorithm can generally obtain more accurate output results than an unsupervised algorithm.
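To make the idea of learning from labeled samples concrete, the following is a minimal k-nearest-neighbors sketch. The two features (account age in days, number of claims) and all sample values are invented for illustration and are not taken from the specification.

```python
import math
from collections import Counter


def knn_predict(samples, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled samples."""
    # Rank training samples by Euclidean distance to the query point.
    order = sorted(range(len(samples)), key=lambda i: math.dist(samples[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: (account age in days, number of claims).
samples = [(5, 4), (10, 5), (7, 6), (900, 0), (1200, 1), (800, 1)]
labels = [1, 1, 1, 0, 0, 0]  # 1 = labeled fraudster, 0 = normal person

new_account_label = knn_predict(samples, labels, (8, 5))
old_account_label = knn_predict(samples, labels, (1000, 0))
```

In practice the features would need scaling so that one dimension does not dominate the distance; that step is omitted here for brevity.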

Depending on the supervised learning algorithm selected, the specific processing of relationship features and self-features may be designed according to the type of algorithm and the identification requirements. For example, a supervised graph algorithm such as Structure2vec can be used. In one embodiment, constructing the supervised learning algorithm includes:

S40: using the selected supervised learning algorithm to perform first relationship network learning on the relationship characteristics between each target person and other persons in the multi-degree relationship network graph data of the target group, and to perform second self-attribute learning based on the target person's own characteristic data;

S42: using the feature data obtained by the first relationship network learning and the second self-attribute learning as independent variables of the supervised learning algorithm, and establishing a relationship model with the labeled historical fraudsters as the dependent variable;

S44: determining the constructed supervised learning algorithm when the output of the relationship model reaches a preset accuracy.

FIG. 2 is a schematic diagram of a processing procedure of an embodiment of a supervised learning algorithm provided by this specification.

In the example shown in FIG. 2, a supervised graph algorithm such as Structure2vec can be used. On the one hand, it learns the relationship characteristics of the target person and his or her neighbors (such as how many people the person is related to, and whether the person is related to fraudsters); on the other hand, it learns the target person's own characteristics (such as gender and age). These features serve as the x variables of the model. Next, the historical labels indicating whether each person is a fraudster serve as the y variable. Finally, a model relating y to x is established, so that y can be predicted from x alone.
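The x/y setup can be illustrated with a small sketch. Note that this is not Structure2vec: a plain logistic regression trained by stochastic gradient descent stands in for the supervised model, and the toy graph, the `new_account` self-feature and all function names are assumptions made for the example. Training stops once a preset training accuracy is reached, in the spirit of step S44.

```python
import math


def build_features(person, neighbors, labeled_fraudsters, new_account):
    """x variables: two relationship features plus one self feature."""
    linked = neighbors.get(person, set())
    return [
        float(len(linked)),                       # how many people the person is related to
        float(len(linked & labeled_fraudsters)),  # how many of those are labeled fraudsters
        float(new_account[person]),               # self feature: 1 if recently registered
    ]


def fraud_probability(weights, bias, x):
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(weights, bias, xs, ys):
    hits = sum((fraud_probability(weights, bias, x) >= 0.5) == bool(y)
               for x, y in zip(xs, ys))
    return hits / len(xs)


def train(xs, ys, target_accuracy=1.0, lr=0.1, max_epochs=5000):
    """Logistic regression via SGD; stops at the preset training accuracy."""
    weights, bias = [0.0] * len(xs[0]), 0.0
    for _ in range(max_epochs):
        for x, y in zip(xs, ys):
            err = fraud_probability(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
        if accuracy(weights, bias, xs, ys) >= target_accuracy:
            break
    return weights, bias


# Toy, made-up target group: P1..P3 are labeled historical fraudsters.
neighbors = {
    "P1": {"P2", "P3"}, "P2": {"P1", "P3"}, "P3": {"P1", "P2", "P4"},
    "P4": {"P3", "P5"}, "P5": {"P4", "P6"}, "P6": {"P5"},
}
labeled_fraudsters = {"P1", "P2", "P3"}
new_account = {"P1": 1, "P2": 1, "P3": 1, "P4": 0, "P5": 0, "P6": 0}

people = sorted(neighbors)
xs = [build_features(p, neighbors, labeled_fraudsters, new_account) for p in people]
ys = [1 if p in labeled_fraudsters else 0 for p in people]  # y variable
weights, bias = train(xs, ys)
train_accuracy = accuracy(weights, bias, xs, ys)
```

A real system would measure accuracy on held-out data rather than on the training set; the training-set check is used here only to keep the sketch short.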

In the application scenario of this embodiment, the final identification may be made for a single person. That is, after the supervised learning algorithm has learned the relationship characteristics of gang fraud and combined them with the fraudsters' own characteristics, it can directly produce a fraud output result indicating whether the person to be identified is a fraudster or a suspected fraudster, for example by labeling a person as a fraudster or a normal person, or by giving the probability that the person is a fraudster.

Of course, the label described here is an identification result based on relationship features and self-features, and can serve as a basis and reference for an initial determination of whether these persons are fraudsters. The final determination of whether fraud has occurred may be made by an operator's judgment, or in combination with other calculation methods.

The data processing method for insurance fraud identification provided by this embodiment constructs multi-degree relationship network graph data of a group from the multi-dimensional relationship association data of policyholders and insured persons, so that the relationship network among persons can be mined more deeply, improving identification efficiency and widening its scope. At the same time, a supervised learning model is built in combination with the fraudsters' own characteristic data to learn both the relationship network characteristics and the personal characteristics of fraudsters. Fraudsters acting as a gang not only show obvious multi-degree relationship characteristics in the relationship network, but their own characteristics also often show similarities. The methods provided in the embodiments of this specification can therefore identify fraudsters more effectively and efficiently.

In another embodiment of the method provided by the present specification, the data information of the historical fraudster may be combined with the multi-degree relationship network map data for identification of the fraudster. Specifically, in another embodiment of the method provided by the present specification, the relationship association data may further include: historical fraudulent personnel list data.

In this embodiment, data on the historical fraud population is added, and the degree of participation of historical fraudsters is considered when a classified community is analyzed. In general, if historical fraudsters have a high relationship concentration in a classified community, persons in that community are more likely to commit fraud. The relationship concentration described in this embodiment may include the degree of participation of historical fraudsters, and may specifically include the number of historical fraudsters in the classified community, the proportion of historical fraudsters, the density of relationships between historical fraudsters and other persons, and so on. An example of relationship density is as follows: in a risk community of 10 people, 2 historical fraudsters each have kinship or other relationships with one or more other members, and 2 members are classmates, which may indicate a pyramid-scheme-style fraud gang. The relationship concentration can be calculated in different ways, for example from the number of historical fraudsters, their proportion, and the relationship network. In another embodiment, the relationship concentration may be calculated from two indicators, the size of the to-be-identified group and the number of historical fraudsters in it, and used as a measure of the probability of fraud. Specifically, this may include:

Taking the logarithm of the number of persons to be identified as the first factor;

Taking the proportion of the number of historical fraudsters in the person to be identified as the second factor;

Determining, according to the product of the first factor and the second factor, a group fraud probability of the to-be-identified group.

The group fraud probability can then be combined with the personal fraud probability computed from a person's own characteristics to determine the final output probability that the group, or an individual, is committing fraud. Alternatively, the group fraud probability and the personal fraud probability may each be used separately, without being combined.

For example, in a specific implementation, the community fraud probability can be calculated as follows:

RiskDegree = log(total number of classified community members) × (number of historical fraudsters / total number of classified community members).

Of course, other calculation methods, variants or transformations, such as taking the natural logarithm, may also be used; they are not limited or detailed here.
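The RiskDegree formula can be written as a short function. This is a sketch under one stated assumption: the base of the logarithm is not given in the text, so base 10 is used as the default and can be swapped for `math.log` to obtain the natural-logarithm variant.

```python
import math


def risk_degree(total_members, historical_fraudsters, log=math.log10):
    """RiskDegree = log(total members) * (historical fraudsters / total members).

    The log base is an assumption (base 10 by default); pass `log=math.log`
    for the natural-logarithm variant mentioned in the text.
    """
    if total_members <= 0:
        raise ValueError("total_members must be positive")
    if not 0 <= historical_fraudsters <= total_members:
        raise ValueError("historical_fraudsters must be between 0 and total_members")
    first_factor = log(total_members)                      # size of the group
    second_factor = historical_fraudsters / total_members  # proportion of historical fraudsters
    return first_factor * second_factor


small_community = risk_degree(10, 2)    # 10 members, 2 historical fraudsters
large_community = risk_degree(100, 20)  # same proportion, larger community
```

With the same 20% proportion, the larger community scores higher (0.4 versus 0.2 with base 10), which is the effect of the logarithmic size factor. The value is not bounded by 1 (a 1000-member community made up entirely of historical fraudsters would score 3 with base 10), so it is better read as a risk score than as a calibrated probability.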

The above embodiments identify fraud groups using data on historical fraudsters. In another embodiment provided by this specification, the relationship network characteristics among the members of a group can be used to determine whether the group is a fraud group. Specifically, this may include: determining the network structure characteristics of the relationships among persons in the group;

If the network structure characteristics conform to a preset fraud network structure, marking the group as a fraud group.

The approach described above can be used when training the supervised learning algorithm, in which case the group is the target group. When identifying persons to be identified, the group is the to-be-identified group.

The network structure characteristics may be determined from information about the persons in a group, relationship network information between persons, and the like. The relationship network information here may be the one-degree relationship information described above, and may also include the constructed multi-degree relationship information.

An algorithm can be used to identify the characteristics of the relationship network in the community being analyzed. If the network structure characteristics match those of a fraud group, the community can be marked as a fraud group. For example, the relationship network in a group may have a structure such as a "spherical network" or a "pyramid network". A "pyramid network" resembles a pyramid-scheme organization, with a layer-by-layer hierarchical structure, and is more likely to be a fraud group. In a "spherical network" the members are related to one another, which may indicate a decentralized fraud organization.
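The specification does not say how a "pyramid" or "spherical" structure is detected. The sketch below uses two simple heuristics that are purely assumptions for illustration: a group whose relationship graph is a connected tree is treated as pyramid-like, and a group whose edge density is at or above a hypothetical threshold is treated as spherical.

```python
def is_connected(members, adjacency):
    """Depth-first search check that every member is reachable from the first."""
    members = list(members)
    seen, stack = {members[0]}, [members[0]]
    while stack:
        node = stack.pop()
        for other in adjacency.get(node, ()):
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return len(seen) == len(members)


def classify_structure(members, edges, dense_threshold=0.8):
    """Label a group's relationship network (heuristics are assumptions).

    `edges` are (person_a, person_b) pairs between members of the group.
    """
    members = set(members)
    n = len(members)
    if n < 3:
        return "other"
    # Deduplicate undirected edges and ignore self-loops.
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}
    adjacency = {}
    for a, b in (tuple(e) for e in edge_set):
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    density = len(edge_set) / (n * (n - 1) / 2)
    if density >= dense_threshold:
        return "spherical"  # almost everyone is related to everyone else
    if len(edge_set) == n - 1 and is_connected(members, adjacency):
        return "pyramid"    # connected tree: layer-by-layer hierarchy
    return "other"


# A leader (1) with two recruits, each with two recruits of their own.
pyramid_label = classify_structure(
    range(1, 8), [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7)])
# Four people who are all related to one another.
spherical_label = classify_structure(
    "abcd", [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")])
```

A group matching either label would then be a candidate for marking as a fraud group, subject to the preset fraud network structures actually chosen by the operator.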

The data processing method for insurance fraud identification provided by the embodiments of this specification uses relationship data that is close to the actual relationship network to support mining by relationship network algorithms and the computation of multi-degree relationship network data. By constructing multi-degree relationship network graph data of a group from the multi-dimensional relationship association data of policyholders and insured persons, the relationship network among persons can be mined more deeply, improving identification efficiency and widening its scope. At the same time, a supervised learning model is built in combination with the fraudsters' own characteristic data to learn both the relationship network characteristics and the personal characteristics of fraudsters. Fraudsters acting as a gang not only show obvious multi-degree relationship characteristics in the relationship network, but their own characteristics also often show similarities. The methods provided in the embodiments of this specification can therefore identify fraudsters more effectively and efficiently.

The method described above can be used for insurance fraud identification on the client side, for example in an anti-fraud application installed on a mobile terminal or in an insurance service provided by a payment application. The client can be a PC (personal computer), a server, an industrial control computer, a mobile smartphone, a tablet electronic device, a portable computer (such as a laptop), a personal digital assistant (PDA), a desktop computer, a smart wearable device, or the like, and more generally a mobile communication terminal, handheld device, in-vehicle device, wearable device, television device, or computing device. The method can also be applied to a system server of the insurance service provider, a service provider, or a third-party institution. The system server may include a standalone server, a server cluster, a distributed system server, or a combination of a server that processes device request data with other associated data processing system servers.

The method embodiments provided by the embodiments of the present specification can be executed in a mobile terminal, a computer terminal, a server, or a similar computing device. Taking execution on a server as an example, FIG. 3 is a block diagram of the hardware structure of a server for insurance fraud identification according to an embodiment of the present specification. As shown in FIG. 3, the server 10 may include one or more processors 102 (only one is shown; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. Those skilled in the art will understand that the structure shown in FIG. 3 is merely illustrative and does not limit the structure of the above electronic device. For example, the server 10 may include more or fewer components than shown in FIG. 3, may include other processing hardware such as a database or a multi-level cache, or may have a configuration different from that shown in FIG. 3.

The memory 104 can be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the data processing method for insurance fraud identification in the embodiments of the present specification. The processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, thereby implementing the data processing method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the server 10 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission module 106 is configured to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by a communication provider of the server 10. In one example, the transmission module 106 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission module 106 can be a Radio Frequency (RF) module that communicates with the Internet wirelessly.

Based on the data processing method for insurance fraud identification described above, the present specification also provides a data processing apparatus for insurance fraud identification. The apparatus may include systems (including distributed systems), software (applications), modules, components, servers, clients, and the like that use the methods described in the embodiments of the present specification, combined with the necessary implementation hardware. Based on the same inventive concept, the processing apparatus in one embodiment provided by this specification is as described in the following embodiments. For the implementation of the specific processing apparatus in the embodiments of the present specification, reference may be made to the implementation of the foregoing method, and repeated details are not described again. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. Specifically, FIG. 4 is a schematic structural diagram of the modules of an embodiment of a data processing apparatus for insurance fraud identification provided by the present specification. As shown in FIG. 4, the apparatus may include:

The data obtaining module 101 may be configured to obtain relationship association data of the group to be identified;

The feature calculation module 102 may be configured to construct multi-degree relationship network graph data of the group to be identified based on the relationship association data, and to extract person characteristic data of the group to be identified;

The fraud identification module 103 may be configured to use the constructed supervised learning algorithm to identify the multi-degree relationship network graph data and the person characteristic data of the group to be identified, and to determine a fraud output result for the group to be identified. The supervised learning algorithm includes a data relationship model obtained by training with the multi-degree relationship network data and person characteristic data of a selected target group, and with labeled historical fraudsters, as sample data.

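In a specific embodiment of the apparatus, the relationship association data may include at least one of the following:

Social relationship data, terminal data, terminal application and application account operation information, behavior data associated with insurance behavior, personnel basic attribute data, and geographic location data.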

In another embodiment of the apparatus, the fraud output result determined by the fraud identification module 103 for the group to be identified includes an output of whether a single target person to be identified is a fraudster, or the probability that the person is a fraudster.

In another embodiment of the apparatus, the selected target group includes the set of insurance applicants and insured persons.

In another embodiment of the apparatus, the person characteristic data includes feature data extracted from at least one of user registration account data, transaction data, and behavior data associated with insurance behavior.
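To make the idea of person characteristic data concrete, the following sketch derives a small feature dictionary from account, transaction, and behavior records. The record fields (`age_days`, `amount`, `type`) and the chosen features are hypothetical; the specification only states which data sources the features may be extracted from.

```python
from statistics import mean


def extract_person_features(account, transactions, behaviors):
    """Derive illustrative per-person features from three assumed record types."""
    amounts = [t["amount"] for t in transactions]
    claims = [b for b in behaviors if b["type"] == "claim"]
    policies = [b for b in behaviors if b["type"] == "purchase_policy"]
    return {
        # From the user registration account (assumed field).
        "account_age_days": account["age_days"],
        # From transaction data.
        "transaction_count": len(amounts),
        "mean_transaction_amount": mean(amounts) if amounts else 0.0,
        # From behavior data associated with insurance behavior.
        "claim_count": len(claims),
        "claims_per_policy": len(claims) / len(policies) if policies else 0.0,
    }


# Hypothetical records for one person.
features = extract_person_features(
    account={"age_days": 30},
    transactions=[{"amount": 100}, {"amount": 300}],
    behaviors=[
        {"type": "purchase_policy"},
        {"type": "purchase_policy"},
        {"type": "claim"},
    ],
)
```

Features of this kind would form the personal half of the model input, alongside features derived from the relationship network.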

FIG. 5 is a schematic structural diagram of another embodiment of the apparatus. As shown in FIG. 5, the fraud identification module 103 includes:

The feature learning module 1031 may be configured to use the selected supervised learning algorithm to perform first relationship network learning on the relationship features between a target person and other persons in the multi-degree relationship network data of the target group, and to perform second self-attribute learning based on the target person's own characteristic data;

The relationship establishing module 1032 may be configured to use the feature data obtained by the first relationship network learning and the second self-attribute learning as independent variables of the supervised learning algorithm, and to establish a relationship model with the labeled historical fraudsters as the dependent variable;

The model training module 1033 may be configured to determine the constructed supervised learning algorithm when the output of the relationship model reaches a preset accuracy rate. After iterative training of the model parameters, the model can be put into online use once its output meets the accuracy requirement.
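The three modules above can be sketched as a small training loop. The example below is a minimal illustration only and assumes logistic regression trained by stochastic gradient descent as the supervised learning algorithm, which the specification does not mandate. Each input row combines one assumed relationship network feature with one assumed self-attribute feature (the independent variables), the label marks a historical fraudster (the dependent variable), and training stops once a preset accuracy is reached. All names, data, and hyperparameters are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability that the person described by x is a fraudster."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def accuracy(weights, bias, X, y):
    """Share of samples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(weights, bias, x) >= 0.5) == bool(label)
               for x, label in zip(X, y))
    return hits / len(y)


def train_until_accurate(X, y, target_accuracy=0.9, lr=0.5, max_epochs=1000):
    """Iterate over the parameters until the preset accuracy is reached."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        for x, label in zip(X, y):
            error = predict_proba(weights, bias, x) - label
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
        if accuracy(weights, bias, X, y) >= target_accuracy:
            return weights, bias, epoch
    raise RuntimeError("preset accuracy not reached within max_epochs")


# Each row: [share of known fraudsters within two degrees, claims per policy].
# Both features and all values are made up for illustration.
X = [[0.8, 0.9], [0.7, 0.6], [0.9, 0.8], [0.1, 0.1], [0.0, 0.2], [0.2, 0.0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = labeled historical fraudster

weights, bias, epochs_used = train_until_accurate(X, y, target_accuracy=0.9)
risk_high = predict_proba(weights, bias, [0.9, 0.9])
risk_low = predict_proba(weights, bias, [0.05, 0.05])
```

The model output can be used either as a probability or, after thresholding, as a fraudster/non-fraudster decision for a single target person. A real system would measure accuracy on held-out data rather than on the training samples used here.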

The server or client provided by the embodiments of the present specification may be implemented by a processor executing corresponding program instructions in a computer, for example implemented on a PC or server in C++ on a Windows operating system, implemented in a corresponding application design language on another system such as Linux together with the necessary hardware, or implemented as processing logic based on a quantum computer. Accordingly, the present specification also provides a data processing device for insurance fraud identification, which may specifically include a processor and a memory for storing processor-executable instructions, where the processor, when executing the instructions, implements:

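Obtaining relationship association data of the group to be identified;

Constructing multi-degree relationship network graph data of the group to be identified based on the relationship association data, and extracting person characteristic data of the group to be identified;

Identifying the multi-degree relationship network graph data and the person characteristic data of the group to be identified by using the constructed supervised learning algorithm, and determining a fraud output result for the group to be identified, where the supervised learning algorithm includes a data relationship model obtained by training with the multi-degree relationship network data and person characteristic data of a selected target group, and with labeled historical fraudsters, as sample data.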

The above instructions may be stored in a variety of computer-readable storage media. The computer-readable storage medium may include a physical device for storing information, typically by digitizing the information and then storing it in a medium that uses electrical, magnetic, or optical means. The computer-readable storage medium of this embodiment may include: devices that store information using electrical energy, such as various types of memory (e.g., RAM and ROM); devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tape, magnetic core memory, bubble memory, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Of course, there are other kinds of readable storage media, such as quantum memory and graphene memory. The instructions in the above apparatus, server, client, or processing device are as described above.

The processing device may specifically be an insurance anti-fraud identification server provided for an insurance server or a third-party service organization. The server may be a standalone server, a server cluster, a distributed system server, or a combination of a server that processes device request data with other associated data processing system servers. Accordingly, embodiments of the present specification also provide a specific server product, the server including at least one processor and a memory for storing processor-executable instructions, where the processor, when executing the instructions, implements:

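Obtaining relationship association data of the group to be identified;

Constructing multi-degree relationship network graph data of the group to be identified based on the relationship association data, and extracting person characteristic data of the group to be identified;

Identifying the multi-degree relationship network graph data and the person characteristic data of the group to be identified by using the constructed supervised learning algorithm, and determining a fraud output result for the group to be identified, where the supervised learning algorithm includes a data relationship model obtained by training with the multi-degree relationship network data and person characteristic data of a selected target group, and with labeled historical fraudsters, as sample data.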

It should be noted that the apparatus, the processing device, and the server described in the foregoing embodiments of the present specification may further include other embodiments according to the description of the related method embodiments. For a specific implementation, reference may be made to the description of the method embodiments, and details are not described herein.

The various embodiments in the specification are described in a progressive manner, and the same or similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the hardware + program type embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment.

The foregoing describes specific embodiments of the specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve the desired results. In addition, the processes depicted in the figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

Although the present application provides method operational steps as described in the embodiments or flowcharts, more or fewer operational steps may be included based on conventional or non-creative means. The order of the steps recited in the embodiments is only one of many possible execution orders and does not represent the only execution order. When an actual device or system server product executes the method, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or the drawings (for example, in a parallel-processor or multi-threaded environment).
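As a simple illustration of sequential versus parallel execution, the sketch below runs two steps that do not depend on each other, building relationship network features and extracting person features, first one after the other and then concurrently on a thread pool. The two step functions are simplified stand-ins written for this example and are not the method's actual computations.

```python
from concurrent.futures import ThreadPoolExecutor


def build_network_features(relation_pairs):
    """Stand-in step: count direct relations per person."""
    counts = {}
    for a, b in relation_pairs:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


def extract_person_features(profiles):
    """Stand-in step: take one scalar feature per person."""
    return {name: profile["claim_count"] for name, profile in profiles.items()}


def run_sequential(relation_pairs, profiles):
    """Execute the two steps one after the other."""
    return build_network_features(relation_pairs), extract_person_features(profiles)


def run_parallel(relation_pairs, profiles):
    """Execute the two independent steps concurrently on two worker threads."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        network_future = pool.submit(build_network_features, relation_pairs)
        person_future = pool.submit(extract_person_features, profiles)
        return network_future.result(), person_future.result()


# Hypothetical inputs.
pairs = [("A", "B"), ("B", "C")]
profiles = {"A": {"claim_count": 0}, "B": {"claim_count": 3}, "C": {"claim_count": 1}}

sequential_result = run_sequential(pairs, profiles)
parallel_result = run_parallel(pairs, profiles)
```

Both execution orders yield the same result because neither step reads the other's output.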

Although the embodiments of the present specification mention operations and data descriptions such as the types of relationship association data collected, the range of the target population selected during training, and the calculation of the probability of insurance fraud, as well as data acquisition, storage, interaction, calculation, and judgment, the embodiments of the present specification are not limited to cases that must conform to industry communication standards, standard supervised or unsupervised model processing, communication protocols, standard data models/templates, or the situations described in the embodiments of the specification. Implementations that are slightly modified on the basis of certain industry standards, of a custom approach, or of the embodiments described above can also achieve effects that are the same as, equivalent to, similar to, or predictable variations of those of the above embodiments. Embodiments obtained by applying such modified or varied ways of data acquisition, storage, judgment, and processing may still fall within the scope of the alternative embodiments of the present specification.

In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). However, as technology has advanced, many of today's method flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by hardware entity modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system to "integrate" it onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled must also be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It should also be apparent to those skilled in the art that a hardware circuit implementing the logical method flow can easily be obtained by lightly logic-programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.

The controller can be implemented in any suitable manner. For example, the controller can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, or the form of logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, ASICs, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component. Alternatively, the means for implementing various functions can be regarded both as software modules that implement a method and as structures within a hardware component.

The processing device, apparatus, module, or unit set forth in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer can be, for example, a personal computer, a laptop computer, an in-vehicle human-machine interaction device, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

Although embodiments of the present specification provide method operational steps as described in the embodiments or flowcharts, more or fewer operational steps may be included based on conventional or non-creative means. The order of the steps recited in the embodiments is only one of many possible execution orders and does not represent the only execution order. When an actual device or terminal product executes the method, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or the drawings (for example, in a parallel-processor or multi-threaded environment, or even a distributed data processing environment). The terms "comprising", "including", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, product, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements that are inherent to such a process, method, product, or device. In the absence of further limitations, it is not excluded that there are additional identical or equivalent elements in the process, method, product, or device that includes the stated elements.

For the convenience of description, the above apparatus is described as being divided into various modules by function. Of course, when implementing the embodiments of the present specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware, or a module that implements a given function may be implemented by a combination of multiple sub-modules or sub-units. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function. In actual implementation, there may be other ways of dividing them; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be in an electrical, mechanical, or other form.

Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, ASICs, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component. Alternatively, the means for implementing various functions can be regarded both as software modules that implement a method and as structures within a hardware component.

The present invention has been described with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory. Memory is an example of a computer-readable medium.

Computer-readable media include both persistent and non-persistent, removable and non-removable media, and information storage can be implemented by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

Those skilled in the art will appreciate that embodiments of the present specification can be provided as a method, system, or computer program product. Thus, embodiments of the present specification can take the form of an entirely hardware embodiment, an entirely software embodiment or a combination of software and hardware. Moreover, embodiments of the present specification can take the form of a computer program product embodied on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

Embodiments of the present description can be described in the general context of computer-executable instructions executed by a computer, such as a program module. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. Embodiments of the present specification can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media including storage devices.

The various embodiments in the specification are described in a progressive manner, and the same or similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment. In the description of the present specification, a description with reference to the terms "one embodiment", "some embodiments", "example", "specific example", "some examples", and the like means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the embodiments of the specification. In the present specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, the various embodiments or examples described in the specification, as well as the features of the various embodiments or examples, may be combined with one another.

The above descriptions are only examples of the embodiments of the present specification, and are not intended to limit the embodiments of the present specification. Various modifications and changes may be made to the embodiments of the present disclosure. Any modifications, equivalents, improvements, etc. made within the spirit and scope of the embodiments of the present invention are intended to be included within the scope of the appended claims.

Claims

A data processing method for insurance fraud identification, the method comprising:

Obtaining relationship association data of the group to be identified;

Constructing multi-degree relationship network graph data of the group to be identified based on the relationship association data, and extracting person characteristic data of the group to be identified;

Identifying the multi-degree relationship network graph data and the person characteristic data of the group to be identified by using the constructed supervised learning algorithm, and determining a fraud output result for the group to be identified; the supervised learning algorithm includes a data relationship model obtained by training with the multi-degree relationship network data and person characteristic data of a selected target group, and with labeled historical fraudsters, as sample data.
The method of claim 1, wherein the relationship association data comprises at least one of the following:

Social relationship data, terminal data, terminal application and application account operation information, behavior data associated with insurance behavior, personnel basic attribute data, and geographic location data.
The method of claim 1, wherein determining the fraud output result for the group to be identified comprises determining whether a single target person to be identified is a fraudster, or the probability that the person is a fraudster.
The method of claim 1, wherein the selected target group comprises the set of insurance applicants and insured persons.
The method according to any one of claims 1 to 3, wherein the person characteristic data comprises feature data extracted from at least one of a user registration account number, transaction data, and behavior data associated with the insurance behavior.
The method according to any one of claims 1 to 3, wherein the supervised learning algorithm is constructed in the following manner:

Using the selected supervised learning algorithm to perform first relationship network learning on the relationship features between a target person and other persons in the multi-degree relationship network data of the target group, and to perform second self-attribute learning based on the target person's own characteristic data;

Using the feature data obtained by the first relationship network learning and the second self-attribute learning as independent variables of the supervised learning algorithm, and establishing a relationship model with the labeled historical fraudsters as the dependent variable;

Determining the constructed supervised learning algorithm when the output of the relationship model reaches a preset accuracy rate.
A data processing device for insurance fraud identification, comprising:

a data acquisition module, configured to acquire relationship association data of the group to be identified;

a feature calculation module, configured to construct multi-degree relationship network graph data of the group to be identified based on the relationship association data, and to extract person characteristic data of the group to be identified;

a fraud identification module, configured to identify the multi-degree relationship network graph data and the person characteristic data of the group to be identified by using the constructed supervised learning algorithm, and to determine a fraud output result for the group to be identified; the supervised learning algorithm includes a data relationship model obtained by training with the multi-degree relationship network data and person characteristic data of a selected target group, and with labeled historical fraudsters, as sample data.
The apparatus of claim 7, wherein the relationship-related data comprises at least one of the following:

Social relationship data, terminal data, terminal application and application account operation information, behavior data associated with insurance behavior, personnel basic attribute data, and geographic location data.
The apparatus of claim 7, wherein the fraud output result determined by the fraud identification module for the group to be identified comprises an output of whether a single target person to be identified is a fraudster, or the probability that the person is a fraudster.
The apparatus of claim 7, wherein the selected target group comprises the set of insurance applicants and insured persons.
The apparatus according to claim 7 or 9, wherein the person characteristic data comprises feature data extracted from at least one of a user registration account number, transaction data, and behavior data associated with the insurance behavior.
The apparatus of claim 7 or 9, wherein the fraud identification module comprises:

a feature learning module, configured to use the selected supervised learning algorithm to perform first relationship network learning on the relationship features between a target person and other persons in the multi-degree relationship network data of the target group, and to perform second self-attribute learning based on the target person's own characteristic data;

a relationship establishing module, configured to use the feature data obtained by the first relationship network learning and the second self-attribute learning as independent variables of the supervised learning algorithm, and to establish a relationship model with the labeled historical fraudsters as the dependent variable;

a model training module, configured to determine the constructed supervised learning algorithm when the output of the relationship model reaches a preset accuracy rate.
A processing device comprising a processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, implements:

Obtaining relationship association data of the group to be identified;

Constructing multi-degree relationship network graph data of the group to be identified based on the relationship association data, and extracting person characteristic data of the group to be identified;

Identifying the multi-degree relationship network graph data and the person characteristic data of the group to be identified by using the constructed supervised learning algorithm, and determining a fraud output result for the group to be identified; the supervised learning algorithm includes a data relationship model obtained by training with the multi-degree relationship network data and person characteristic data of a selected target group, and with labeled historical fraudsters, as sample data.
A server comprising at least one processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, implements:

Obtaining relationship association data of the group to be identified;

Constructing multi-degree relationship network graph data of the group to be identified based on the relationship association data, and extracting person characteristic data of the group to be identified;

Identifying the multi-degree relationship network graph data and the person characteristic data of the group to be identified by using the constructed supervised learning algorithm, and determining a fraud output result for the group to be identified; the supervised learning algorithm includes a data relationship model obtained by training with the multi-degree relationship network data and person characteristic data of a selected target group, and with labeled historical fraudsters, as sample data.