CN109345399B - Method, device, computer equipment and storage medium for evaluating risk of claim settlement - Google Patents


Info

Publication number
CN109345399B
Authority
CN
China
Prior art keywords
entity
data
settlement
knowledge
knowledge graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811238812.4A
Other languages
Chinese (zh)
Other versions
CN109345399A (en)
Inventor
邢欣来
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811238812.4A
Publication of CN109345399A
Application granted
Publication of CN109345399B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a device, computer equipment and a storage medium for evaluating claim settlement risk. The method comprises the following steps: acquiring historical claim settlement data and constructing a knowledge graph from it to obtain an initial knowledge graph; receiving the currently reported claim data and parsing the claim settlement factors it contains; and importing the claim settlement factors into the initial knowledge graph, calculating a correlation value between the claim settlement factors and each entity in the initial knowledge graph, and displaying an entity as a related entity if its correlation value with the claim settlement factors is greater than a preset correlation threshold. By using knowledge graph technology, the method exploits the associations captured in the graph data to discover new related factors.

Description

Method, device, computer equipment and storage medium for evaluating risk of claim settlement
Technical Field
The present invention relates to the field of risk control for claims, and in particular, to a method, an apparatus, a computer device, and a storage medium for risk assessment for claims.
Background
Currently, because of limited data and the lack of a large-scale computing platform, traditional insurance enterprises can only assist insurance personnel with underwriting by combining experience with a few simple rules based on limited features (such as age, sex and risk conditions). As society develops, new types of insurance fraud keep emerging, but the risk control function sets its parameters according to experience and is therefore insensitive to new fraud types; claim risk factors and claim rules depend largely on manual summarization, so their accuracy is difficult to control.
Disclosure of Invention
The embodiment of the invention provides a method, a device, computer equipment and a storage medium for evaluating claim risk, aiming to solve the problem in the prior art that claim risk factors and claim rules depend largely on manual summarization, which makes their accuracy difficult to control.
In a first aspect, an embodiment of the present invention provides a method for evaluating risk of claim settlement, including:
acquiring historical claim settlement data, and constructing a knowledge graph according to the historical claim settlement data to obtain an initial knowledge graph;
receiving the current reported claim data and analyzing claim factors included in the current reported claim data;
and importing the claim settlement factors into an initial knowledge graph, calculating a correlation value between the claim settlement factors and each entity in the initial knowledge graph, and displaying corresponding correlation entities if the correlation value between the entity and the claim settlement factors is larger than a preset correlation threshold.
In a second aspect, an embodiment of the present invention provides a risk assessment apparatus for claims, including:
the initial knowledge graph construction unit is used for acquiring historical claim settlement data, constructing a knowledge graph according to the historical claim settlement data and obtaining an initial knowledge graph;
The claim factor analysis unit is used for receiving the current reported claim data and analyzing the claim factors included in the current reported claim data;
the related entity obtaining unit is used for importing the claim settlement factors into an initial knowledge graph, calculating a related value between the claim settlement factors and each entity in the initial knowledge graph, and displaying the corresponding related entity if the related value between the entity and the claim settlement factors is larger than a preset related threshold.
In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the method for evaluating claim risk according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a storage medium, where the storage medium stores a computer program, where the computer program when executed by a processor causes the processor to perform the method for evaluating risk of claims according to the first aspect.
The embodiment of the invention provides a method, a device, computer equipment and a storage medium for evaluating claim risk. The method imports the claim settlement factors included in the currently reported claim data into an initial knowledge graph, calculates a correlation value between the claim settlement factors and each entity in the initial knowledge graph, and displays an entity as a related entity if its correlation value with the claim settlement factors is greater than a preset correlation threshold. The method thereby exploits the associations captured in the knowledge graph data to discover new related factors.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart illustrating a method for evaluating risk of claims according to an embodiment of the present invention;
FIG. 2 is a schematic sub-flowchart of a method for evaluating risk of claims according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another sub-flowchart of the method for evaluating risk of claims according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another sub-flowchart of the method for evaluating risk of claims according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another sub-flowchart of the method for evaluating risk of claims according to an embodiment of the present invention;
FIG. 6 is a schematic block diagram of an apparatus for evaluating risk of claims according to an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a subunit of the claim risk assessment apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic block diagram of another subunit of the apparatus for risk assessment of claims according to an embodiment of the present invention;
FIG. 9 is a schematic block diagram of another subunit of the apparatus for risk assessment of claims according to an embodiment of the present invention;
FIG. 10 is a schematic block diagram of another subunit of the apparatus for risk assessment of claims according to an embodiment of the present invention;
FIG. 11 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1, fig. 1 is a flowchart of an embodiment of a method for evaluating risk of claims, where the method is applied to a management server, and the method is executed by application software installed in the management server, and the management server is an enterprise terminal for evaluating risk of claims.
As shown in fig. 1, the method includes steps S101 to S103.
S101, acquiring historical claim settlement data, and constructing a knowledge graph according to the historical claim settlement data to obtain an initial knowledge graph.
In this embodiment, the historical claim data includes at least a policy number, the applicant, the insured amount, the type of insurance, the effective period of the insurance, and the applicant's address, contact number and certificate number; these data can serve as risk factors related to claims. Once a large volume of historical claim data has been obtained, a knowledge graph can be constructed from it.
The logical structure of the knowledge graph is divided into two layers: a data layer and a schema layer.
At the data layer, knowledge is stored in a graph database in units of facts. If [entity-relationship-entity] or [entity-attribute-value] triples are used as the basic representation of facts, all the data stored in the graph database forms a large network of entity relationships, which constitutes the knowledge graph.
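As an illustration of the triple representation, the following is a minimal in-memory sketch rather than the graph database the embodiment would use. The class name `TripleStore`, the identifiers such as `policy:P001`, and the relation names are hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory store for (subject, predicate, object) facts."""

    def __init__(self):
        self.facts = set()
        self.by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Record one fact and index it by its subject."""
        fact = (subject, predicate, obj)
        self.facts.add(fact)
        self.by_subject[subject].add(fact)

    def about(self, subject):
        """Return all facts whose subject matches, in sorted order."""
        return sorted(self.by_subject[subject])


store = TripleStore()
store.add("policy:P001", "held_by", "person:A")      # entity-relationship-entity
store.add("policy:P001", "insurance_type", "auto")   # entity-attribute-value
store.add("person:A", "phone", "555-0100")           # entity-attribute-value

print(store.about("policy:P001"))
```

Both kinds of triple share one shape, so a single store can hold the whole entity relationship network.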
The schema layer sits above the data layer and is the core of the knowledge graph; it stores refined knowledge. An ontology library is generally used to manage the schema layer, and its support for axioms, rules and constraints is used to normalize the relationships among entities, relations, entity types, attributes and other objects. The ontology library acts as a mold for the knowledge base: a knowledge base built with an ontology library contains less redundant knowledge.
Knowledge graph construction is the process of extracting knowledge elements (that is, facts) from raw data by a series of automatic or semi-automatic techniques and storing them in the data layer and schema layer of the knowledge base. It is an iterative update process, and following the logic of knowledge acquisition each iteration contains three phases: information extraction, knowledge fusion and knowledge processing.
The knowledge graph is constructed by extracting entities and relations from the historical claim data. Risk factors in the knowledge graph are associated not only with other risk factors but also with other data, so a graph mining algorithm can extract the data neighborhood, or set of neighborhoods, most relevant to the currently reported claim data; that is, new related factors are discovered from the associated data.
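The source does not name a specific graph mining algorithm. As one simple possibility, a neighborhood can be collected by breadth-first search up to a fixed number of hops. The function names, the hop limit and the sample triples below are assumptions for illustration only.

```python
from collections import defaultdict, deque


def build_adjacency(triples):
    """Treat every triple as an undirected edge between subject and object."""
    adj = defaultdict(set)
    for subject, _predicate, obj in triples:
        adj[subject].add(obj)
        adj[obj].add(subject)
    return adj


def neighborhood(adj, start, max_hops):
    """Return {node: hop distance} for nodes within max_hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    seen.pop(start)
    return seen


triples = [
    ("claim:C1", "filed_by", "person:A"),
    ("person:A", "phone", "555-0100"),
    ("claim:C2", "contact_phone", "555-0100"),
    ("claim:C2", "filed_by", "person:B"),
]
adj = build_adjacency(triples)
near = neighborhood(adj, "claim:C1", 3)
print(near)
```

In this toy data the shared phone number links two otherwise separate claims, which is the kind of association a graph neighborhood can surface.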
In one embodiment, as shown in fig. 2, step S101 includes:
S1011, extracting entities, attributes and interrelationships among the entities included in the historical claim data to obtain extracted knowledge expression information;
S1012, sequentially carrying out entity linking and knowledge merging on the extracted knowledge expression information to obtain the fused knowledge expression information;
S1013, carrying out ontology construction, knowledge reasoning and quality assessment on the fused knowledge expression information in sequence to obtain an initial knowledge graph.
A knowledge graph can be constructed in two ways: bottom-up and top-down. Top-down construction uses structured data sources such as encyclopedia websites to extract ontology and schema information from high-quality data and add it to the knowledge base. Bottom-up construction uses technical means to extract resource patterns from publicly collected data, selects the new patterns with higher confidence, and adds them to the knowledge base after manual review.
At present, most knowledge graphs are built bottom-up, and the embodiments of the present application mainly adopt a bottom-up construction technique, which is divided into 3 stages according to the process of knowledge acquisition: information extraction, knowledge fusion and knowledge processing. The initial knowledge graph is obtained by applying these 3 stages to the historical claim settlement data.
The process of constructing the knowledge graph in a bottom-up mode is an iterative updating process, and each round of updating comprises 3 steps:
a1) Information extraction: entities (concepts), attributes and the relationships among entities are extracted from various types of data sources, and an ontological knowledge representation is formed on this basis;
a2) Knowledge fusion: newly acquired knowledge is integrated to eliminate contradictions and ambiguities, for example where one entity has several names or one name corresponds to several different entities;
a3) Knowledge processing: the fused knowledge undergoes quality evaluation (part of which requires manual screening), and only the qualified part is added to the knowledge base to ensure its quality; after new data is added, knowledge reasoning can be performed to extend the existing knowledge and derive new knowledge.
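One update round of steps a1) to a3) can be sketched as three small functions. This is a simplified illustration under stated assumptions: the record layout, the alias table, the confidence field and the function names are all hypothetical, and knowledge reasoning is omitted.

```python
def extract(records):
    """a1) Information extraction: turn raw records into candidate facts.

    Each fact is (subject, predicate, object, confidence).
    """
    facts = []
    for rec in records:
        for key, value in rec["fields"].items():
            facts.append((rec["id"], key, value, rec["confidence"]))
    return facts


def fuse(facts, aliases):
    """a2) Knowledge fusion: map alternative names to one canonical name."""
    def canon(name):
        return aliases.get(name, name)
    return [(canon(s), p, canon(o), c) for s, p, o, c in facts]


def process(facts, knowledge_base, min_confidence):
    """a3) Knowledge processing: add only facts that pass a quality check."""
    accepted = {(s, p, o) for s, p, o, c in facts if c >= min_confidence}
    return knowledge_base | accepted


records = [
    {"id": "claim:C1", "fields": {"insurer": "ACME Ins."}, "confidence": 0.9},
    {"id": "claim:C2", "fields": {"insurer": "ACME Insurance"}, "confidence": 0.4},
]
aliases = {"ACME Ins.": "ACME Insurance"}

kb = set()
for batch in [records]:  # each batch corresponds to one update round
    kb = process(fuse(extract(batch), aliases), kb, min_confidence=0.5)
print(kb)
```

The confidence threshold stands in for the quality evaluation step; in practice part of that screening is manual, as the text notes.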
In one embodiment, as shown in fig. 3, step S1011 includes:
S10111, extracting a named entity from the historical claim data through a conditional random field to obtain first processing data;
S10112, extracting entity attributes of the first processing data to obtain second processing data;
S10113, extracting attributes of the second processing data to obtain extracted knowledge expression information.
In this embodiment, when information extraction is performed, a key issue is how to automatically extract information from heterogeneous data sources to obtain candidate knowledge units. Information extraction is a technique for automatically extracting structured information such as entities, relationships, and entity attributes from semi-structured and unstructured data. The key technologies involved include: named entity identification, relationship extraction, and attribute extraction.
Named entity recognition (NER), also known as entity extraction, refers to automatically recognizing named entities in a text dataset; the usual approach is entity boundary recognition based on conditional random fields. A conditional random field (CRF) is a discriminative probabilistic model, a type of random field commonly used to label or parse sequence data such as natural language text or biological sequences. A conditional random field models the conditional probability distribution of the random variable Y given the observed random variable X.
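For reference, one widely used form is the linear-chain CRF. The source does not state which CRF variant is intended, so the notation below (feature functions f_k, weights \lambda_k, sequence length T) is an assumption for illustration.

```latex
P(Y \mid X) = \frac{1}{Z(X)} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, X, t) \Big),
\qquad
Z(X) = \sum_{Y'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, X, t) \Big)
```

Here Z(X) sums over all candidate label sequences Y', so the probabilities of all labelings of a given X sum to one.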
Entity extraction from the historical claim data yields a series of discrete named entities. To obtain semantic information, the association relations between entities must also be extracted from the relevant corpus, so that the entities (concepts) are connected through these relations into a networked knowledge structure. For this purpose, the embodiments of the present application may use TextRunner, a prototype open information extraction system based on self-supervised learning. The system uses a small amount of manually labeled data as a training set to obtain an entity relationship classification model, classifies open data with that model, and trains a naive Bayes model on the classification results to identify [entity-relationship-entity] triples; in this way entity attributes are extracted from the first processing data to obtain the second processing data.
The goal of attribute extraction is to collect attribute information for a particular entity from different information sources. For a public figure, for example, information such as nickname, birthday, nationality and educational background can be obtained from publicly available information on the web. Attribute extraction gathers such information from multiple data sources to give a complete picture of an entity's attributes. Since an attribute of an entity can be regarded as a nominal relationship between the entity and the attribute value, attribute extraction can also be treated as a relation extraction problem. In the embodiments of the present application, a training corpus is generated by automatic extraction from the semi-structured data of encyclopedia websites and used to train an entity attribute labeling model, and the trained model is then applied to entity attribute extraction from unstructured data.
In one embodiment, as shown in fig. 4, step S10111 includes:
S10111a, acquiring the summarized entity categories;
S10111b, comparing the historical claim data with the summarized entity categories through the conditional random field and recognizing entity boundaries to obtain the first processing data.
In this embodiment, entity extraction may adopt 112 existing entity categories: entity boundaries are recognized based on a conditional random field, and the entities are then classified automatically by an adaptive perceptron. In this way the historical claim data is compared with the summarized entity categories and entity boundaries are recognized through the conditional random field, yielding the first processing data.
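Sequence labelers such as CRFs commonly emit one tag per token, and entity boundaries are then read off the tag sequence. The sketch below assumes the BIO tagging convention (B- begins an entity, I- continues it, O is outside), which the source does not specify. The tags are written by hand here in place of real model output, and the category names are hypothetical.

```python
def bio_to_spans(tokens, tags):
    """Group tokens into (entity text, category) pairs from BIO tags."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag ends the open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


tokens = ["Zhang", "Wei", "filed", "a", "claim", "in", "Shenzhen"]
tags = ["B-PERSON", "I-PERSON", "O", "O", "O", "O", "B-LOCATION"]
entities = bio_to_spans(tokens, tags)
print(entities)
```

An I- tag that does not continue an open entity of the same category is treated as outside any entity, which is one of several possible conventions.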
S102, receiving the current reported claim data and analyzing the claim factor included in the current reported claim data.
In this embodiment, the claim data currently reported by the user through an intelligent terminal generally includes the insurance type (e.g. car insurance, maternity insurance, life insurance), the insurance time, the insurance address and similar data; information in the reported data such as the insurance type and insurance address can be regarded as claim settlement factors. From the claim settlement factors in the currently reported claim data and the initial knowledge graph already obtained, the related factors in the initial knowledge graph whose correlation values with the claim settlement factors exceed the preset correlation threshold can be calculated.
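Parsing claim settlement factors out of a report can be as simple as selecting the relevant fields. The sketch below assumes the report arrives as a dictionary; the field names in `FACTOR_FIELDS` are hypothetical and mirror the examples named in the text.

```python
# Hypothetical field names for the factors mentioned in the text.
FACTOR_FIELDS = ("insurance_type", "insurance_time", "insurance_address")


def parse_claim_factors(report):
    """Keep only non-empty factor fields, with whitespace trimmed."""
    factors = {}
    for field in FACTOR_FIELDS:
        value = report.get(field)
        if value is not None and str(value).strip():
            factors[field] = str(value).strip()
    return factors


report = {
    "insurance_type": " auto ",
    "insurance_address": "Shenzhen",
    "insurance_time": "",
    "claimant_note": "n/a",
}
factors = parse_claim_factors(report)
print(factors)
```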
S103, importing the claim settlement factors into an initial knowledge graph, calculating a correlation value between the claim settlement factors and each entity in the initial knowledge graph, and if the correlation value between the entity and the claim settlement factors is larger than a preset correlation threshold, displaying the corresponding correlation entity.
In this embodiment, the related entities whose correlation values with the claim settlement factors exceed the preset correlation threshold may be found with a distance-based model, that is, by using a distance measure between the claim settlement factors and each entity in the initial knowledge graph as the correlation value. Because every entity in the initial knowledge graph can be vectorized, the correlation value between an entity and the claim settlement factors can be obtained by calculating the Pearson correlation coefficient between the entity's word vector and the semantic vector corresponding to the claim settlement factors. The correlation value is calculated for each entity in the initial knowledge graph, and an entity is displayed as a related entity if its correlation value is greater than the preset correlation threshold, thereby recommending related entities that can be selected to construct claim rules.
In one embodiment, as shown in fig. 5, step S103 includes:
S1031, obtaining the semantic vector corresponding to the claim settlement factors;
S1032, obtaining the word vector corresponding to each entity included in the initial knowledge graph;
S1033, obtaining the Pearson correlation coefficient between the semantic vector and each word vector;
S1034, if the Pearson correlation coefficient between the word vector of an entity and the semantic vector exceeds the preset correlation threshold, obtaining that word vector and the related entity corresponding to it.
In the embodiments of the present application, the claim settlement factors include a plurality of keywords, each corresponding to a word vector; the word vectors of the keywords are multiplied by their respective weights and summed to obtain the semantic vector corresponding to the claim settlement factors. The Pearson correlation coefficient between this semantic vector and the word vector of each entity in the initial knowledge graph is then calculated, the word vectors whose Pearson correlation coefficient with the semantic vector exceeds the preset correlation threshold are selected, and the related entities corresponding to those word vectors are obtained; these related entities can serve as candidate claim settlement factors for constructing claim rules. The Pearson correlation coefficient between two vectors is defined as their covariance divided by the product of their standard deviations.
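Steps S1031 to S1034 can be sketched as follows. The vectors, weights, entity names and the threshold of 0.9 are made-up illustrative values, and returning 0.0 for a constant vector (where the coefficient is undefined) is a convention assumed here, not something the source specifies.

```python
import math


def weighted_sum(vectors, weights):
    """S1031: combine keyword word vectors into one semantic vector."""
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumed convention: undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


def related_entities(semantic_vector, entity_vectors, threshold):
    """S1032 to S1034: keep entities whose correlation exceeds the threshold."""
    scores = {name: pearson(semantic_vector, vec)
              for name, vec in entity_vectors.items()}
    return {name: score for name, score in scores.items() if score > threshold}


keyword_vectors = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0]]
weights = [0.5, 0.5]
semantic = weighted_sum(keyword_vectors, weights)

entity_vectors = {
    "repair_shop": [2.0, 4.0, 6.0, 8.0],
    "policy_age": [4.0, 3.0, 2.0, 1.0],
    "region": [1.0, 3.0, 2.0, 4.0],
}
hits = related_entities(semantic, entity_vectors, threshold=0.9)
print(sorted(hits))
```

The 1/n factors in the covariance and the standard deviations cancel, which is why the function works directly with sums of deviations.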
The method imports the claim settlement factors included in the currently reported claim data into the initial knowledge graph, calculates a correlation value between the claim settlement factors and each entity in the initial knowledge graph, and displays an entity as a related entity if its correlation value with the claim settlement factors is greater than a preset correlation threshold. The method thereby exploits the associations captured in the knowledge graph data to discover new related factors.
The embodiment of the invention also provides a device for evaluating the risk of the claim, which is used for executing any embodiment of the method for evaluating the risk of the claim. Specifically, referring to fig. 6, fig. 6 is a schematic block diagram of an apparatus for evaluating risk of claims according to an embodiment of the present invention. The claim risk assessment apparatus 100 may be configured in a management server.
As shown in fig. 6, the claim risk assessment apparatus 100 includes an initial knowledge-graph construction unit 101, a claim factor analysis unit 102, and a related entity acquisition unit 103.
The initial knowledge graph construction unit 101 is configured to obtain historical claim settlement data, and construct a knowledge graph according to the historical claim settlement data to obtain an initial knowledge graph.
In this embodiment, the historical claim data includes at least a policy number, the applicant, the insured amount, the type of insurance, the effective period of the insurance, and the applicant's address, contact number and certificate number; these data can serve as risk factors related to claims. Once a large volume of historical claim data has been obtained, a knowledge graph can be constructed from it.
The knowledge graph is constructed by extracting entities and relations from the historical claim data. Risk factors in the knowledge graph are associated not only with other risk factors but also with other data, so a graph mining algorithm can extract the data neighborhood, or set of neighborhoods, most relevant to the currently reported claim data; that is, new related factors are discovered from the associated data.
In an embodiment, as shown in fig. 7, the initial knowledge graph construction unit 101 includes:
an entity extraction unit 1011, configured to extract the entities, attributes and interrelationships among the entities included in the historical claim data to obtain extracted knowledge expression information;
a knowledge fusion unit 1012, configured to sequentially perform entity linking and knowledge merging on the extracted knowledge expression information to obtain fused knowledge expression information;
a knowledge processing unit 1013, configured to sequentially perform ontology construction, knowledge reasoning and quality assessment on the fused knowledge expression information to obtain an initial knowledge graph.
At present, most knowledge graphs are built bottom-up, and the embodiments of the present application mainly adopt a bottom-up construction technique, which is divided into 3 stages according to the process of knowledge acquisition: information extraction, knowledge fusion and knowledge processing. The initial knowledge graph is obtained by applying these 3 stages to the historical claim settlement data.
In one embodiment, as shown in fig. 8, the entity extraction unit 1011 includes:
a first data processing unit 10111, configured to extract a named entity from the historical claim data through the conditional random field, so as to obtain first processed data;
a second data processing unit 10112, configured to perform entity attribute extraction on the first processing data to obtain second processing data;
and an attribute extraction unit 10113, configured to perform attribute extraction on the second processing data, so as to obtain extracted knowledge expression information.
In this embodiment, when information extraction is performed, a key issue is how to automatically extract information from heterogeneous data sources to obtain candidate knowledge units. Information extraction is a technique for automatically extracting structured information such as entities, relationships, and entity attributes from semi-structured and unstructured data. The key technologies involved include: named entity identification, relationship extraction, and attribute extraction.
Named entity recognition (named entity recognition, NER), also known as entity extraction, refers to the automatic recognition of named entities from a text dataset, with the usual approach being to perform entity boundary recognition based on conditional random fields. The conditional random field (conditional random fields, abbreviated as CRF, or CRFs) is a discriminant probability model, which is a type of random field, and is commonly used for labeling or analyzing sequence data, such as natural language text or biological sequences. In the conditional random field, the distribution of random variable Y is conditional probability, and the given observation value is random variable X.
The historical claim data is extracted by the entity to obtain a series of discrete named entities, and in order to obtain semantic information, the association relation between the entities is extracted from the related corpus, and the entities (concepts) are connected through the association relation to form a net-shaped knowledge structure. In this case, in the embodiment of the present application, an open information extraction prototype system (TextRunner) based on a self-supervision (self-supervision) learning manner may be used, where the system uses a small amount of manual labeled data as a training set, thereby obtaining an entity relationship classification model, classifying the open data according to the entity relationship classification model, and training a naive bayes model according to the classification result to identify an [ entity-relationship-entity ] triplet, so as to extract entity attributes of the first processing data, and obtain the second processing data.
The goal of attribute extraction is to collect attribute information for a particular entity from different information sources. For example, for a public figure, information such as nickname, birthday, nationality, and educational background can be obtained from publicly available network information. Attribute extraction technology can collect this information from various data sources to build a complete profile of the entity's attributes. Since an attribute of an entity can be regarded as a nominal relationship between the entity and the attribute value, the attribute extraction problem can also be regarded as a relationship extraction problem. In the embodiment of the application, a training corpus is generated by automatic extraction from the semi-structured data of encyclopedia websites and used to train an entity attribute labeling model, and the trained model is then applied to entity attribute extraction from unstructured data.
In an embodiment, as shown in fig. 9, the first data processing unit 10111 includes:
a historical entity category acquiring unit 10111a for acquiring the summarized entity category;
an entity boundary recognition unit 10111b, configured to compare the historical claim data with the summarized entity categories through the conditional random field and recognize entity boundaries, so as to obtain the first processing data.
In this embodiment, when entity extraction is performed, 112 existing entity categories may be adopted. Entity boundary recognition is performed based on a conditional random field, and an adaptive perceptron is then used to classify the entities automatically. In this way, the historical claim data is compared with the summarized entity categories and entity boundaries are recognized through the conditional random field, yielding the first processing data.
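To illustrate what entity boundary recognition produces, the sketch below converts per-token labels into entity spans. It assumes the labels follow the common BIO scheme (B- begins an entity, I- continues it, O is outside), which this application does not specify. In practice the labels would come from a trained conditional random field; here they are hard-coded, and the category names and the function `bio_to_spans` are hypothetical.

```python
def bio_to_spans(tokens, tags):
    """Convert BIO tags to (start, end, category, text) spans; end is exclusive."""
    spans = []
    start, category = None, None
    # A trailing "O" sentinel flushes an entity that ends at the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and category == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, category, " ".join(tokens[start:i])))
            start, category = None, None
        if tag.startswith("B-"):
            start, category = i, tag[2:]
    return spans


# Hypothetical tagger output, for illustration only.
tokens = ["rear", "collision", "on", "Shennan", "Road"]
tags = ["B-ACCIDENT", "I-ACCIDENT", "O", "B-ADDRESS", "I-ADDRESS"]
spans = bio_to_spans(tokens, tags)
```

The resulting spans give both the boundary (start and end token positions) and the category of each recognized entity.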
The claim factor analyzing unit 102 is configured to receive the currently reported claim data and analyze the claim factors included in the currently reported claim data.
In this embodiment, the claim data currently reported by the user through an intelligent terminal generally includes data such as the insurance type (e.g., car insurance, maternity insurance, life insurance), the insurance time, and the insurance address. Information such as the insurance type and the insurance address in the reported data can be regarded as claim factors. Based on the claim factors in the currently reported claim data and the obtained initial knowledge graph, the related entities in the initial knowledge graph whose correlation values with the claim factors exceed the preset correlation threshold can be calculated.
The related entity obtaining unit 103 is configured to import the claim settlement factors into the initial knowledge graph, calculate a correlation value between the claim settlement factors and each entity in the initial knowledge graph, and, if the correlation value between an entity and the claim settlement factors is greater than a preset correlation threshold, display the corresponding related entity.
In this embodiment, a distance-based model may be used to find the related entities whose correlation values with the claim settlement factors exceed the preset correlation threshold. Because each entity in the initial knowledge graph can be vectorized, the correlation value between an entity and the claim settlement factors can be obtained by calculating the Pearson correlation between the word vector of that entity and the semantic vector corresponding to the claim settlement factors. The correlation value is calculated for each entity in the initial knowledge graph, and any entity whose correlation value with the claim settlement factors is greater than the preset correlation threshold is displayed as a related entity, thereby recommending which related entities can be selected to construct claim settlement rules.
In one embodiment, as shown in fig. 10, the related entity obtaining unit 103 includes:
a semantic vector acquiring unit 1031, configured to acquire the semantic vector corresponding to the claim factor;
a word vector obtaining unit 1032, configured to obtain the word vector corresponding to each entity included in the initial knowledge graph;
a Pearson correlation calculation unit 1033, configured to obtain the Pearson correlation between the semantic vector corresponding to the claim factor and each word vector;
and a related entity judging unit 1034, configured to, if the Pearson correlation between the word vector of an entity and the semantic vector exceeds a preset correlation threshold, obtain the word vector of that entity and the related entity corresponding to the word vector.
In the embodiment of the present application, the claim factor includes a plurality of keywords, and each keyword corresponds to a word vector. The word vectors of the keywords are multiplied by their corresponding weight values and summed to obtain the semantic vector corresponding to the claim factor. The Pearson correlation between this semantic vector and the word vector of each entity included in the initial knowledge graph is then calculated, the word vectors whose Pearson correlation with the semantic vector exceeds the preset correlation threshold are selected, and the related entities corresponding to those word vectors are obtained. These related entities can be used as candidate claim settlement factors for constructing claim settlement rules.
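The weighted-sum and thresholding steps described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of this application: the vectors, weights, entity names, and the threshold of 0.9 are made-up values, and treating a constant vector as uncorrelated is a choice made here because the Pearson correlation is undefined in that case.

```python
import math


def weighted_sum(vectors, weights):
    """Semantic vector: keyword word vectors scaled by their weights and summed."""
    dim = len(vectors[0])
    return [sum(w * v[k] for v, w in zip(vectors, weights)) for k in range(dim)]


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0  # undefined for constant vectors; treated as uncorrelated here
    return cov / (std_x * std_y)


def related_entities(semantic_vector, entity_vectors, threshold):
    """Names of entities whose correlation with the semantic vector exceeds the threshold."""
    return [name for name, vec in entity_vectors.items()
            if pearson(semantic_vector, vec) > threshold]


# Hypothetical keyword vectors, weights, and entity vectors, for illustration only.
keyword_vectors = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0]]
semantic = weighted_sum(keyword_vectors, [0.5, 0.5])
entity_vectors = {
    "rear-end collision": [2.0, 4.0, 6.0, 8.0],
    "water damage": [4.0, 3.0, 2.0, 1.0],
    "theft": [1.0, 3.0, 2.0, 4.0],
}
related = related_entities(semantic, entity_vectors, threshold=0.9)
```

With these sample values only "rear-end collision" passes the threshold; "theft" has a correlation of 0.8 and "water damage" is negatively correlated, so both are filtered out.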
The device imports the claim settlement factors included in the currently reported claim settlement data into the initial knowledge graph, calculates the correlation value between the claim settlement factors and each entity in the initial knowledge graph, and displays the corresponding related entity if the correlation value between an entity and the claim settlement factors is greater than the preset correlation threshold. The device thus exploits the associations captured in the knowledge graph data to discover new related factors.
The above-described risk assessment apparatus may be implemented in the form of a computer program that is executable on a computer device as shown in fig. 11.
Referring to fig. 11, fig. 11 is a schematic block diagram of a computer device according to an embodiment of the present invention.
With reference to FIG. 11, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, can cause the processor 502 to perform a claim risk assessment method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of a computer program 5032 in the non-volatile storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform the claim risk assessment method.
The network interface 505 is used for network communication, such as providing for transmission of data information, etc. It will be appreciated by those skilled in the art that the structure shown in FIG. 11 is merely a block diagram of some of the structures associated with the present inventive arrangements and does not constitute a limitation of the computer device 500 to which the present inventive arrangements may be applied, and that a particular computer device 500 may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
Wherein the processor 502 is configured to execute a computer program 5032 stored in a memory to perform the following functions: acquiring historical claim settlement data, and constructing a knowledge graph according to the historical claim settlement data to obtain an initial knowledge graph; receiving the current reported claim data and analyzing claim factors included in the current reported claim data; and importing the claim settlement factors into an initial knowledge graph, calculating a correlation value between the claim settlement factors and each entity in the initial knowledge graph, and displaying corresponding correlation entities if the correlation value between the entity and the claim settlement factors is larger than a preset correlation threshold.
In one embodiment, when performing the step of constructing a knowledge graph according to the historical claim settlement data to obtain an initial knowledge graph, the processor 502 performs the following operations: extracting the entities, the attributes, and the interrelationships among the entities included in the historical claim data to obtain extracted knowledge expression information; sequentially performing entity linking and knowledge merging on the extracted knowledge expression information to obtain fused knowledge expression information; and sequentially performing ontology construction, knowledge reasoning, and quality assessment on the fused knowledge expression information to obtain the initial knowledge graph.
In one embodiment, the processor 502 performs the following operations when performing the step of identifying and extracting the entities, attributes, and interrelationships between the entities included in the historical claims data to obtain the extracted knowledge representation information: extracting named entities from the historical claim data through a conditional random field to obtain first processing data; extracting entity attributes from the first processing data to obtain second processing data; and extracting the attribute of the second processing data to obtain extracted knowledge expression information.
In one embodiment, the processor 502, when performing the step of extracting named entities from the historical claims data via conditional random fields to obtain first processed data, performs the following: acquiring the summarized entity category; and comparing the historical claim data with the generalized entity categories through the conditional random field and identifying entity boundaries to obtain first processing data.
In one embodiment, when executing the step of obtaining the related entities whose correlation values with the claim settlement factors exceed a preset correlation threshold, the processor 502 performs the following operations: obtaining the semantic vector corresponding to the claim settlement factors; acquiring the word vector corresponding to each entity included in the initial knowledge graph; acquiring the Pearson correlation between the semantic vector corresponding to the claim settlement factors and each word vector; and, if the Pearson correlation between the word vector of an entity and the semantic vector exceeds the preset correlation threshold, acquiring the word vector of that entity and the related entity corresponding to the word vector.
Those skilled in the art will appreciate that the embodiment of the computer device shown in fig. 11 does not limit the specific construction of the computer device. In other embodiments, the computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components. For example, in some embodiments the computer device may include only a memory and a processor; in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 11 and are not described again.
It should be appreciated that in embodiments of the present invention, the processor 502 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
In another embodiment of the present invention, a storage medium is provided. The storage medium may be a non-volatile computer readable storage medium. The storage medium stores a computer program, wherein the computer program when executed by a processor performs the steps of: acquiring historical claim settlement data, and constructing a knowledge graph according to the historical claim settlement data to obtain an initial knowledge graph; receiving the current reported claim data and analyzing claim factors included in the current reported claim data; and importing the claim settlement factors into an initial knowledge graph, calculating a correlation value between the claim settlement factors and each entity in the initial knowledge graph, and displaying corresponding correlation entities if the correlation value between the entity and the claim settlement factors is larger than a preset correlation threshold.
In an embodiment, the step of constructing a knowledge graph according to the historical claim data to obtain an initial knowledge graph includes: extracting the entities, the attributes, and the interrelationships among the entities included in the historical claim data to obtain extracted knowledge expression information; sequentially performing entity linking and knowledge merging on the extracted knowledge expression information to obtain fused knowledge expression information; and sequentially performing ontology construction, knowledge reasoning, and quality assessment on the fused knowledge expression information to obtain the initial knowledge graph.
In an embodiment, the step of extracting the entity, the attribute and the interrelationship between the entities included in the historical claim data to obtain the extracted knowledge representation information includes: extracting named entities from the historical claim data through a conditional random field to obtain first processing data; extracting entity attributes from the first processing data to obtain second processing data; and extracting the attribute of the second processing data to obtain extracted knowledge expression information.
In one embodiment, the step of extracting the named entity from the historical claim data by the conditional random field to obtain the first processed data includes: acquiring the summarized entity category; and comparing the historical claim data with the generalized entity categories through the conditional random field and identifying entity boundaries to obtain first processing data.
In an embodiment, the step of obtaining the related entities whose correlation values with the claim settlement factors exceed a preset correlation threshold includes: obtaining the semantic vector corresponding to the claim settlement factors; acquiring the word vector corresponding to each entity included in the initial knowledge graph; acquiring the Pearson correlation between the semantic vector corresponding to the claim settlement factors and each word vector; and, if the Pearson correlation between the word vector of an entity and the semantic vector exceeds the preset correlation threshold, acquiring the word vector of that entity and the related entity corresponding to the word vector.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, device and unit described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein. Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division of the units is merely a division by logical function, and other divisions are possible in actual implementation: units having the same function may be integrated into one unit, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present invention.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units may be stored in a storage medium if implemented in the form of software functional units and sold or used as stand-alone products. Based on such understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, an optical disk, or other various media capable of storing program code.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various equivalent modifications and substitutions may readily be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (9)

1. A method of claim risk assessment, comprising:
acquiring historical claim settlement data, and constructing a knowledge graph according to the historical claim settlement data to obtain an initial knowledge graph;
receiving the current reported claim data and analyzing claim factors included in the current reported claim data;
importing the claim settlement factors into an initial knowledge graph, calculating a correlation value between the claim settlement factors and each entity in the initial knowledge graph, and if the correlation value between an entity and the claim settlement factors is larger than a preset correlation threshold, displaying the corresponding correlation entity;
wherein each entity in the initial knowledge graph corresponds to a claim risk factor;
the claim settlement factors comprise a risk report type and a risk report address;
wherein acquiring a related entity whose correlation value with the claim settlement factors exceeds the preset correlation threshold comprises:
obtaining the semantic vector corresponding to the claim settlement factors;
acquiring the word vector corresponding to each entity included in the initial knowledge graph;
acquiring the Pearson correlation between the semantic vector corresponding to the claim settlement factors and each word vector;
if the Pearson correlation between the word vector of an entity and the semantic vector exceeds the preset correlation threshold, acquiring the word vector of that entity and the related entity corresponding to the word vector.
2. The claim risk assessment method according to claim 1, wherein the constructing a knowledge graph according to the historical claim settlement data to obtain an initial knowledge graph comprises:
extracting the entities, the attributes, and the interrelationships among the entities included in the historical claim data to obtain extracted knowledge expression information;
sequentially performing entity linking and knowledge merging on the extracted knowledge expression information to obtain fused knowledge expression information;
and sequentially performing ontology construction, knowledge reasoning, and quality assessment on the fused knowledge expression information to obtain the initial knowledge graph.
3. The claim risk assessment method according to claim 2, wherein the extracting the entities, the attributes, and the interrelationships among the entities included in the historical claim data to obtain the extracted knowledge expression information comprises:
extracting named entities from the historical claim data through a conditional random field to obtain first processing data;
extracting entity attributes from the first processing data to obtain second processing data;
and extracting the attribute of the second processing data to obtain extracted knowledge expression information.
4. The claim risk assessment method according to claim 3, wherein the extracting named entities from the historical claim data through a conditional random field to obtain first processing data comprises:
acquiring the summarized entity categories;
and comparing the historical claim data with the summarized entity categories through the conditional random field and identifying entity boundaries to obtain the first processing data.
5. A risk assessment apparatus for claims, comprising:
the initial knowledge graph construction unit is used for acquiring historical claim settlement data, constructing a knowledge graph according to the historical claim settlement data and obtaining an initial knowledge graph;
the claim factor analysis unit is used for receiving the current reported claim data and analyzing the claim factors included in the current reported claim data;
the related entity obtaining unit is used for importing the claim settlement factors into an initial knowledge graph, calculating a related value between the claim settlement factors and each entity in the initial knowledge graph, and displaying the corresponding related entity if the related value between the entity and the claim settlement factors is larger than a preset related threshold;
wherein each entity in the initial knowledge graph corresponds to a claim risk factor;
the claim settlement factors comprise a risk report type and a risk report address;
the related entity obtaining unit is specifically configured for:
obtaining the semantic vector corresponding to the claim settlement factors;
acquiring the word vector corresponding to each entity included in the initial knowledge graph;
acquiring the Pearson correlation between the semantic vector corresponding to the claim settlement factors and each word vector;
if the Pearson correlation between the word vector of an entity and the semantic vector exceeds the preset correlation threshold, acquiring the word vector of that entity and the related entity corresponding to the word vector.
6. The claim risk assessment device according to claim 5, wherein the initial knowledge-graph construction unit includes:
the entity extraction unit is used for extracting the entities, the attributes and the interrelationships among the entities included in the historical claim data to obtain extracted knowledge expression information;
the knowledge fusion unit is used for carrying out entity linking and knowledge combination on the extracted knowledge expression information in sequence to obtain fused knowledge expression information;
and the knowledge processing unit is used for sequentially carrying out ontology construction, knowledge reasoning and quality evaluation on the fused knowledge expression information to obtain an initial knowledge graph.
7. The claim risk assessment device according to claim 6, wherein the entity extraction unit includes:
the first data processing unit is used for extracting named entities from the historical claim data through the conditional random field to obtain first processing data;
the second data processing unit is used for extracting entity attributes of the first processing data to obtain second processing data;
and the attribute extraction unit is used for extracting the attributes of the second processing data to obtain extracted knowledge expression information.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the claim risk assessment method of any one of claims 1 to 4 when the computer program is executed.
9. A storage medium storing a computer program which, when executed by a processor, causes the processor to perform the claim risk assessment method of any one of claims 1 to 4.
CN201811238812.4A 2018-10-23 2018-10-23 Method, device, computer equipment and storage medium for evaluating risk of claim settlement Active CN109345399B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811238812.4A CN109345399B (en) 2018-10-23 2018-10-23 Method, device, computer equipment and storage medium for evaluating risk of claim settlement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811238812.4A CN109345399B (en) 2018-10-23 2018-10-23 Method, device, computer equipment and storage medium for evaluating risk of claim settlement

Publications (2)

Publication Number Publication Date
CN109345399A CN109345399A (en) 2019-02-15
CN109345399B true CN109345399B (en) 2024-03-26

Family

ID=65311336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811238812.4A Active CN109345399B (en) 2018-10-23 2018-10-23 Method, device, computer equipment and storage medium for evaluating risk of claim settlement

Country Status (1)

Country Link
CN (1) CN109345399B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110060166A (en) * 2019-03-13 2019-07-26 平安科技(深圳)有限公司 Intelligence Claims Resolution method, apparatus, computer equipment and storage medium
US11966928B2 (en) * 2019-05-08 2024-04-23 International Business Machines Corporation Intelligent learning and application of operational rules
CN110322216A (en) * 2019-05-30 2019-10-11 阿里巴巴集团控股有限公司 The case checking method and device of knowledge based map
CN110503236A (en) * 2019-07-08 2019-11-26 中国平安人寿保险股份有限公司 Risk Forecast Method, device, equipment and the storage medium of knowledge based map
CN110689322A (en) * 2019-09-27 2020-01-14 成都知识视觉科技有限公司 Artificial intelligence auxiliary claims checking system suitable for insurance claims settlement process
CN110866836B (en) * 2019-11-14 2022-12-06 支付宝(杭州)信息技术有限公司 Computer-implemented medical insurance scheme auditing method and device
CN111159431A (en) * 2019-12-30 2020-05-15 深圳Tcl新技术有限公司 Knowledge graph-based information visualization method, device, equipment and storage medium
CN113434627A (en) * 2020-03-18 2021-09-24 中国电信股份有限公司 Work order processing method and device and computer readable storage medium
CN111652704A (en) * 2020-06-09 2020-09-11 唐松 Financial credit risk assessment method based on knowledge graph and graph deep learning
CN111797406A (en) * 2020-07-15 2020-10-20 智博云信息科技(广州)有限公司 Medical fund data analysis processing method and device and readable storage medium
CN112069808A (en) * 2020-09-28 2020-12-11 深圳壹账通智能科技有限公司 Financing wind control method and device, computer equipment and storage medium
CN112215711B (en) * 2020-10-13 2023-09-19 中国银行股份有限公司 Product risk assessment method and device
CN113379053A (en) * 2020-12-17 2021-09-10 中国人民公安大学 Emergency response decision-making method and device and electronic equipment
CN112508745B (en) * 2021-02-05 2021-08-27 北京肇祺信息科技有限公司 Document evaluation method and device
CN113643141A (en) * 2021-08-30 2021-11-12 平安医疗健康管理股份有限公司 Method, device and equipment for generating explanatory conclusion report and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012133735A (en) * 2010-12-24 2012-07-12 Kddi Corp Social graph updating system, social graph updating method, and program
CN105373590A (en) * 2015-10-22 2016-03-02 百度在线网络技术(北京)有限公司 Knowledge data processing method and knowledge data processing device
WO2017212268A1 (en) * 2016-06-08 2017-12-14 Blippar.Com Limited Data processing system and data processing method
CN108305175A (en) * 2017-12-30 2018-07-20 上海栈略数据技术有限公司 Settlement of insurance claim air control assisted verification system based on intellectual medical knowledge mapping

Also Published As

Publication number Publication date
CN109345399A (en) 2019-02-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant