CN112417167A

CN112417167A - Construction method and device of insurance knowledge graph, computer equipment and storage medium

Info

Publication number: CN112417167A
Application number: CN202011313478.1A
Authority: CN
Inventors: 陈岳峰
Original assignee: Ping An Life Insurance Company of China Ltd
Current assignee: Ping An Life Insurance Company of China Ltd
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2021-02-26

Abstract

The invention discloses a method and an apparatus for constructing an insurance knowledge graph, a computer device, and a storage medium. The method comprises: receiving website information of a third-party insurance platform and website information of an official filing platform input by a user; acquiring, based on the website information and a web crawler program, a first data set and a second data set for constructing the insurance knowledge graph from the third-party insurance platform and the official filing platform respectively; obtaining a public praise score for each insurance product from the first data set according to a public praise score model; capturing data from the first data set and the second data set according to an ontology model to obtain the sales state information and the product attribute information of all insurance products in the insurance knowledge graph; and constructing the insurance knowledge graph according to the sales state information, the product attribute information and the public praise score. Based on knowledge graph technology, the method records insurance products accurately and comprehensively while taking into account both the timeliness and the authority of the information.

Description

Construction method and device of insurance knowledge graph, computer equipment and storage medium

Technical Field

The invention relates to the technical field of knowledge graphs, and in particular to a method and an apparatus for constructing an insurance knowledge graph, a computer device, and a storage medium.

Background

The insurance industry has long sought to integrate innovative thinking into its traditional business model, and the knowledge graph, a technology with great potential, has long been a hot topic in the industry. However, because the related technology is not yet mature and the fit between the technology and the business is not yet clear, the practice of most insurance companies and insurance technology start-ups with knowledge graph technology remains at a very early stage. No complete scheme has been provided for accurately extracting and recording the attributes of insurance products, and for recording their public praise on the market, from multiple information sources over the period from the filing of an insurance product to the end of its sale.

Disclosure of Invention

In view of the above technical problems, embodiments of the present invention provide a method and an apparatus for constructing an insurance knowledge graph, a computer device, and a storage medium, which not only accurately record insurance products in an all-around manner, but also consider timeliness and information authority by collecting data from multiple information sources.

In a first aspect, an embodiment of the present invention provides a method for constructing an insurance knowledge graph, including:

receiving website information of a third-party insurance platform and website information of an official filing platform input by a user;

respectively acquiring a first data set and a second data set for constructing an insurance knowledge graph from the third-party insurance platform and the official filing platform based on the website information and a preset web crawler program;

obtaining a public praise score for each insurance product from the first data set according to a preset public praise score model;

capturing data from the first data set and the second data set according to a preset ontology model to obtain the sales state information and the product attribute information of all insurance products in the insurance knowledge graph;

and constructing the insurance knowledge graph according to the sales state information, the product attribute information and the public praise score.

In a second aspect, an embodiment of the present invention provides an insurance knowledge graph constructing apparatus, including:

the first receiving unit is used for receiving the website information of the third-party insurance platform and the website information of the official filing platform input by a user;

the first acquisition unit is used for acquiring a first data set and a second data set for constructing an insurance knowledge graph from the third-party insurance platform and the official filing platform respectively based on the website information and a preset web crawler program;

a second obtaining unit, configured to obtain a public praise score of each insurance product from the first data set according to a preset public praise score model;

the third acquisition unit is used for performing data capture from the first data set and the second data set according to a preset ontology model to obtain the sales state information and the product attribute information of all insurance products in the insurance knowledge graph;

and the construction unit is used for constructing the insurance knowledge graph according to the sales state information, the product attribute information and the public praise score.

In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for constructing an insurance knowledge graph according to the first aspect.

In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the method for constructing an insurance knowledge graph according to the first aspect.

The embodiment of the invention provides a method and an apparatus for constructing an insurance knowledge graph, a computer device, and a storage medium. The method comprises: receiving website information of a third-party insurance platform and website information of an official filing platform input by a user; respectively acquiring a first data set and a second data set for constructing an insurance knowledge graph from the third-party insurance platform and the official filing platform based on the website information and a preset web crawler program; obtaining a public praise score for each insurance product from the first data set according to a preset public praise score model; capturing data from the first data set and the second data set according to a preset ontology model to obtain the sales state information and the product attribute information of all insurance products in the insurance knowledge graph; and constructing the insurance knowledge graph according to the sales state information, the product attribute information and the public praise score. The method improves the efficiency and accuracy of building the insurance knowledge graph. The resulting insurance knowledge graph takes both timeliness and information authority into account, shows the public praise and the sales state of each insurance product, and describes the complete life cycle of the insurance products.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a schematic flow chart of a method for constructing an insurance knowledge graph according to an embodiment of the present invention;

FIG. 2 is a sub-flow diagram of a method for building an insurance knowledge graph according to an embodiment of the present invention;

FIG. 3 is a schematic view of another sub-flow of a method for building an insurance knowledge graph according to an embodiment of the present invention;

FIG. 4 is a schematic view of another sub-flow of a method for building an insurance knowledge graph according to an embodiment of the present invention;

FIG. 5 is a schematic view of another sub-flow of a method for building an insurance knowledge graph according to an embodiment of the present invention;

FIG. 6 is a schematic view of another sub-flow of a method for building an insurance knowledge graph according to an embodiment of the present invention;

FIG. 7 is another schematic flow chart diagram of a method for building an insurance knowledge graph according to an embodiment of the present invention;

FIG. 8 is a schematic block diagram of an insurance knowledge graph building apparatus provided in accordance with an embodiment of the invention;

FIG. 9 is a schematic block diagram of a sub-unit of an insurance knowledge graph building apparatus provided by an embodiment of the present invention;

FIG. 10 is a schematic block diagram of another sub-unit of an insurance knowledge graph building apparatus provided in an embodiment of the invention;

FIG. 11 is a schematic block diagram of another sub-unit of an insurance knowledge graph building apparatus provided in an embodiment of the invention;

FIG. 12 is a schematic block diagram of another sub-unit of an insurance knowledge graph building apparatus provided in an embodiment of the invention;

FIG. 13 is a schematic block diagram of another sub-unit of an insurance knowledge graph building apparatus provided in an embodiment of the invention;

FIG. 14 is a schematic block diagram of a computer device provided by an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

Referring to FIG. 1, FIG. 1 is a schematic flow chart of a method for constructing an insurance knowledge graph according to an embodiment of the present invention. The method is applied to a terminal device and is executed by application software installed on the terminal device. The terminal device is a device with an internet access function, such as a desktop computer, a notebook computer, a tablet computer or a mobile phone.

The method for constructing the insurance knowledge graph is described in detail below. As shown in FIG. 1, the method includes the following steps S110 to S150.

S110, receiving the website information of the third-party insurance platform and the website information of the official filing platform input by the user.

Specifically, the third-party insurance platform is a platform on which an insurance company sells its insurance products by relying on a website provided by a technically mature third party. It may be an industry website that provides an electronic marketplace for insurance intermediaries and sideline (concurrent-business) agencies, that is, an e-commerce platform built by a third party that provides information, transaction and other services to many buyers and sellers, operates the three core flows of information, funds and logistics, and establishes an efficient information exchange platform for enterprises. The official filing platform is the official platform of the China Banking and Insurance Regulatory Commission (CBIRC). Any insurance product sold on the market must be filed with the CBIRC, and a third-party insurance platform usually cannot sell insurance products that have not been filed. In an embodiment of the invention, the third-party insurance platform is an industry website that provides insurance intermediary and sideline agency services to a plurality of users.

S120, acquiring a first data set and a second data set for constructing an insurance knowledge graph from the third-party insurance platform and the official filing platform respectively based on the website information and a preset web crawler program.

Specifically, a web crawler is a program that traverses a set of web documents by following links; given URLs, it crawls the corresponding web pages using standard protocols such as HTTP. In the embodiment of the invention, the web crawler program crawls data using the website information of the third-party insurance platform to obtain the first data set for constructing the insurance knowledge graph. The first data set comprises texts and pictures crawled from the third-party insurance platform: the texts contain users' evaluations of the corresponding insurance products, the sales state information of those products and part of their attribute information, and the pictures contain the attribute information with which the third-party insurance platform promotes the corresponding insurance products. The web crawler program likewise crawls data using the website information of the official filing platform to obtain the second data set, which comprises text data on the attribute information and the sales state information of insurance products crawled from the official filing platform.
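The link-following behaviour described above can be illustrated with a minimal sketch. It is not the crawler of the embodiment: the class `LinkAndTextParser`, the function `crawl`, the `fetch` callback and the `max_pages` limit are all hypothetical names introduced here, and the sketch only collects page text and picture addresses from pages under the start URL.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects link targets, picture sources and visible text from one page."""

    def __init__(self):
        super().__init__()
        self.links, self.images, self.text = [], [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl from start_url.

    fetch(url) is an assumed callback that returns the page HTML as a
    string, or None if the page cannot be retrieved.
    """
    seen, queue, dataset = {start_url}, deque([start_url]), []
    while queue and len(dataset) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        dataset.append({
            "url": url,
            "text": parser.text,
            "images": [urljoin(url, src) for src in parser.images],
        })
        for href in parser.links:
            target = urljoin(url, href)
            # Only follow links that stay under the start URL.
            if target.startswith(start_url) and target not in seen:
                seen.add(target)
                queue.append(target)
    return dataset
```

Passing `fetch` in as a parameter keeps the traversal logic separate from the HTTP layer; in practice it might wrap `urllib.request.urlopen`, with rate limiting and error handling added.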

S130, acquiring a public praise score of each insurance product from the first data set according to a preset public praise score model.

Specifically, the public praise score model is a model for scoring the public praise of each insurance product. The texts of the first data set include the text in which users evaluate the corresponding insurance products. The evaluation text of each insurance product is obtained from the first data set based on the insurance list and is then input into the public praise score model, which outputs the public praise score of the corresponding insurance product.

In another embodiment, as shown in fig. 2, step S130 includes sub-steps S131 and S132.

S131, obtaining the text of each insurance product in the first data set and classifying the text of each insurance product in the first data set according to a preset text classification model to obtain a plurality of evaluation information of each insurance product in the first data set.

Specifically, the evaluation information is the category of a user's evaluation of the corresponding insurance product, and the categories are positive evaluation, negative evaluation and neutral evaluation. The text of each insurance product in the first data set records the evaluations of that insurance product by a plurality of users. Because different users comment on the same insurance product in different ways and follow no unified standard, the text classification model classifies the evaluations made by different users in the text of each insurance product, thereby obtaining a plurality of standardized evaluations, namely a plurality of evaluation information, for each insurance product in the first data set. In the embodiment of the present invention, the text classification model classifies the text of each insurance product in the first data set on the basis of the TextCNN text classification algorithm to obtain the plurality of evaluation information of each insurance product in the first data set.

In another embodiment, as shown in FIG. 3, step S131 includes substeps S1311 through S1314.

S1311, performing word segmentation processing on the text of each insurance product in the first data set to obtain each word in the text of each insurance product in the first data set.

Specifically, the text of each insurance product in the first data set can be segmented with any of four kinds of methods: rule-based, statistics-based, semantics-based or understanding-based. In the embodiment of the present invention, a statistics-based word segmentation method is adopted. Its main principle is as follows: the probability that characters appear adjacent to each other in the text of each insurance product in the first data set is obtained through a preset N-gram model; the frequency of each combination of adjacent characters in the text is then counted and the mutual information between the characters is calculated; the text is segmented accordingly, yielding each word in the text of each insurance product in the first data set.
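As an illustration of the statistical principle above, the following sketch counts adjacent character pairs (a 2-gram model), computes the pointwise mutual information of each adjacent pair, and cuts the text wherever that value does not exceed a threshold. The function names and the threshold are assumptions made for this example; the source does not specify the N-gram order or the cutting rule.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Count single characters and adjacent character pairs in the corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams


def mutual_information(a, b, unigrams, bigrams):
    """Pointwise mutual information log(p(ab) / (p(a) * p(b)))."""
    if bigrams[(a, b)] == 0:
        return float("-inf")  # the pair was never seen adjacent
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    p_ab = bigrams[(a, b)] / n_bi
    return math.log(p_ab / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))


def segment(sentence, unigrams, bigrams, threshold=0.0):
    """Join adjacent characters whose mutual information exceeds threshold."""
    if not sentence:
        return []
    words, current = [], sentence[0]
    for a, b in zip(sentence, sentence[1:]):
        if mutual_information(a, b, unigrams, bigrams) > threshold:
            current += b
        else:
            words.append(current)
            current = b
    words.append(current)
    return words
```

A higher threshold produces shorter words; a production segmenter would normally combine these statistics with a dictionary or a sequence model.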

S1312, processing each word in the text of each insurance product in the first data set according to a preset word embedding model to obtain a word vector of the text of each insurance product in the first data set.

Specifically, the word embedding model is a model for mapping each word in the text of each insurance product in the first data set to a word vector. In other words, the word embedding model converts each word into numerical form, so that convolution can subsequently be performed on the words in the text of each insurance product in the first data set.

S1313, performing convolution and pooling on the word vectors of the texts of the insurance products in the first data set in sequence to obtain the feature vectors of the texts of the insurance products in the first data set.

Specifically, one-dimensional convolution is applied to the word vectors of the text of each insurance product in the first data set to extract features, yielding a shallow feature vector of the text. A max pooling operation is then applied to the shallow feature vector, finally yielding the feature vector of the text of each insurance product in the first data set.

S1314, inputting the feature vector of the text of each insurance product in the first data set into a classifier for classification processing to obtain a plurality of evaluation information of each insurance product in the first data set.

In the embodiment of the present invention, the feature vector of the text of each insurance product in the first data set is input into a Softmax classifier, and the classification performed by the Softmax classifier yields the plurality of evaluation information of each insurance product in the first data set.
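Steps S1312 to S1314 (embedding lookup, one-dimensional convolution, max pooling and Softmax classification) can be sketched as a single forward pass. This is only an illustration in plain Python: the weights are random and untrained, and the vocabulary, vector dimension, filter heights and all function names are assumptions, not values from the source.

```python
import math
import random


def conv1d_relu(vectors, kernel, bias):
    """Slide one filter of height len(kernel) over the word-vector sequence."""
    height = len(kernel)
    outputs = []
    for start in range(len(vectors) - height + 1):
        total = bias
        for k in range(height):
            total += sum(w * x for w, x in zip(kernel[k], vectors[start + k]))
        outputs.append(max(0.0, total))  # ReLU activation
    return outputs


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(words, embedding, kernels, biases, dense, labels):
    """Embedding lookup, convolution, max pooling, then Softmax."""
    dim = len(next(iter(embedding.values())))
    vectors = [embedding.get(w, [0.0] * dim) for w in words]
    # Pad short texts with zero vectors so every filter fits at least once.
    longest = max(len(k) for k in kernels)
    while len(vectors) < longest:
        vectors.append([0.0] * dim)
    # Max-over-time pooling keeps one feature per filter.
    features = [max(conv1d_relu(vectors, k, b)) for k, b in zip(kernels, biases)]
    logits = [sum(w * f for w, f in zip(row, features)) for row in dense]
    probs = softmax(logits)
    return labels[probs.index(max(probs))], probs


# Toy setup with random (untrained) parameters, for illustration only.
random.seed(0)
DIM = 4
vocab = ["claim", "fast", "slow", "refused", "ok"]
embedding = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in vocab}
kernels = [[[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(h)]
           for h in (2, 3)]
biases = [0.0, 0.0]
labels = ["positive", "negative", "neutral"]
dense = [[random.uniform(-1, 1) for _ in kernels] for _ in labels]
label, probs = classify(["claim", "fast"], embedding, kernels, biases, dense, labels)
```

With trained parameters the same forward pass would assign each user evaluation to one of the three categories; training itself is outside the scope of this sketch.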

S132, according to the evaluation information of each insurance product in the first data set, a public praise score of each insurance product in the first data set is obtained.

Specifically, after the plurality of evaluation information of each insurance product in the first data set is acquired, it is counted to obtain the number of positive evaluations, the number of negative evaluations and the number of neutral evaluations of each insurance product. The public praise score of each insurance product in the first data set is then calculated with a preset public praise score calculation formula:

$$P = \frac{(a_1 V_1 + a_2 V_2 + a_3 V_3) \times 100}{V_1 + V_2 + V_3}$$

where $P$ is the public praise score, $a_1$, $a_2$ and $a_3$ are preset parameters, $V_1$ is the number of positive evaluations of the insurance product, $V_2$ is the number of negative evaluations of the insurance product, and $V_3$ is the number of neutral evaluations of the insurance product.
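The score calculation can be written out as a short sketch. The default weights below (1.0 for positive, 0.0 for negative, 0.5 for neutral) are assumptions chosen for illustration; the source only states that the three parameters are preset.

```python
from collections import Counter


def public_praise_score(v_pos, v_neg, v_neu, a1=1.0, a2=0.0, a3=0.5):
    """Weighted share of evaluations, scaled to 0-100 for the assumed weights."""
    total = v_pos + v_neg + v_neu
    if total == 0:
        return None  # no evaluations, so no score can be computed
    return (a1 * v_pos + a2 * v_neg + a3 * v_neu) * 100 / total


def score_from_labels(labels, **weights):
    """Count the classifier's category labels, then apply the formula."""
    counts = Counter(labels)
    return public_praise_score(
        counts["positive"], counts["negative"], counts["neutral"], **weights
    )
```

For example, 6 positive, 2 negative and 2 neutral evaluations give (6 × 1.0 + 2 × 0.0 + 2 × 0.5) × 100 / 10 = 70 under the assumed weights.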

S140, data capture is carried out from the first data set and the second data set according to a preset ontology model so as to obtain the sales state information and the product attribute information of all insurance products in the insurance knowledge graph.

Specifically, the ontology model is a model for constructing the schema layer of the insurance knowledge graph. Because the characteristics of insurance products in the insurance field are relatively simple and fixed, an existing insurance knowledge graph ontology from the prior art can be used directly. The sales state information and the product attribute information of all insurance products are the entity layer information used to construct the insurance knowledge graph. The sales state information comprises the time at which each insurance product in the insurance knowledge graph first went on sale and whether its sale has ended; the product attribute information comprises the company that sells each insurance product and the introductory attribute information of each insurance product. Data is captured from the first data set and the second data set through the ontology model to obtain this entity layer information, namely the sales state information and the product attribute information of all insurance products in the insurance knowledge graph.

In the embodiment of the invention, the top-level concept of the ontology in the ontology model is insurance, and property insurance, personal insurance and liability insurance are sub-concepts of insurance. The insurance types under property insurance (such as marine insurance, fire insurance, transportation insurance and engineering insurance), under personal insurance (such as life insurance, health insurance and accidental injury insurance) and under liability insurance (such as employer liability insurance, professional liability insurance and product liability insurance) serve as sub-concepts of property insurance, personal insurance and liability insurance respectively.
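The concept hierarchy described above can be represented, as a rough sketch, by a nested dictionary. The English concept names are one possible rendering of the insurance types listed in the text, and `ONTOLOGY` and `concept_path` are hypothetical names introduced for this example.

```python
# Each key is a concept; its value holds that concept's sub-concepts.
ONTOLOGY = {
    "insurance": {
        "property insurance": {
            "marine insurance": {},
            "fire insurance": {},
            "transportation insurance": {},
            "engineering insurance": {},
        },
        "personal insurance": {
            "life insurance": {},
            "health insurance": {},
            "accidental injury insurance": {},
        },
        "liability insurance": {
            "employer liability insurance": {},
            "professional liability insurance": {},
            "product liability insurance": {},
        },
    }
}


def concept_path(concept, tree=ONTOLOGY, prefix=()):
    """Return the chain of concepts from the root down to concept, or None."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == concept:
            return path
        found = concept_path(concept, children, path)
        if found:
            return found
    return None
```

Looking up the path of a product's insurance type gives the chain of concepts under which the product entity would be attached in the graph.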

In another embodiment, as shown in fig. 4, step S140 includes sub-steps S141 and S142.

S141, respectively processing the first data set and the second data set according to a preset data processing model to obtain a structured data set.

The data processing model is a model for converting the unstructured data in the first data set and the second data set into structured data. The data sets crawled by the web crawler program from the third-party insurance platform and the official filing platform contain both unstructured data and structured data. Unstructured data has an irregular or incomplete structure and no predefined data model, and is inconvenient to represent in the two-dimensional logical tables of a database. The unstructured data in the first data set and the second data set therefore needs to be processed so that all data in both data sets is structured.

In another embodiment, as shown in fig. 5, step S141 includes sub-steps S1411 and S1412.

S1411, respectively performing text conversion on the unstructured texts in the first data set and the second data set according to a preset text conversion model to obtain structured texts.

Specifically, the text conversion model is a model for converting the unstructured texts in the first data set and the second data set into structured texts. The conversion proceeds as follows: the unstructured texts in the first data set and the second data set are acquired and converted into semi-structured texts, and the semi-structured texts are then converted into the structured texts. In the embodiment of the present invention, the semi-structured text is a text in XML format.
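The two-stage conversion (unstructured text to XML, then XML to a structured record) might look like the following sketch. The field names and regular expressions in `FIELD_PATTERNS` are hypothetical; the source does not describe how fields are located in the raw text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical extraction patterns, one per field of interest.
FIELD_PATTERNS = {
    "name": r"Product:\s*(.+?)(?:;|$)",
    "company": r"Company:\s*(.+?)(?:;|$)",
    "on_sale_since": r"On sale since:\s*(\d{4}-\d{2}-\d{2})",
}


def to_semi_structured(raw_text):
    """Stage 1: wrap each field found in the raw text in an XML element."""
    product = ET.Element("product")
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, raw_text)
        if match:
            ET.SubElement(product, field).text = match.group(1).strip()
    return ET.tostring(product, encoding="unicode")


def to_structured(xml_text):
    """Stage 2: flatten the XML into a record that fits a database row."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}
```

Keeping XML as an intermediate form means fields that are missing from a page are simply absent from the record instead of breaking the conversion.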

S1412, recognizing the pictures in the first data set based on an OCR (optical character recognition) technology to obtain character information in the pictures.

Specifically, OCR (Optical Character Recognition) is a technology in which printed characters in a paper document are optically converted into a black-and-white dot-matrix image file, and recognition software then converts the characters in the image into text format for further editing and processing by word processing software. In the embodiment of the invention, the picture is first preprocessed, and the preprocessed picture is then input into a pre-trained convolutional neural network model to obtain the character information in the picture.

In another embodiment, as shown in fig. 6, step S1412 includes sub-steps S14121, S14122, and S14123.

S14121, normalizing the picture according to a nonlinear normalization rule to enlarge or reduce the picture.

Specifically, the nonlinear normalization rule changes the size of the picture while keeping its overall shape unchanged, so that the original shape of the characters in the picture is largely preserved and distortion is small. The conversion formula is as follows:

$$m = \frac{W'}{W}, \qquad n = \frac{H'}{H}$$

where $W$ represents the original width of the character, $H$ represents the original height of the character, $W'$ represents the normalized width of the character, $H'$ represents the normalized height of the character, $m$ represents the conversion ratio of the width of the character, and $n$ represents the conversion ratio of the height of the character.

Assuming that the coordinates of a character point in the picture are $(x, y)$, the corresponding linear normalization formula for the point coordinates is as follows:

$$x' = m \cdot x, \qquad y' = n \cdot y$$

where $(x, y)$ represents the original point coordinates of the character, $(x', y')$ represents the point coordinates of the character after normalization, $m$ represents the conversion ratio of the width of the character, and $n$ represents the conversion ratio of the height of the character.
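A minimal sketch of the coordinate scaling follows. It assumes that the conversion ratios are the quotients of the normalized and original sizes (m = W'/W and n = H'/H), which matches the symbol definitions above, and that each point is scaled by these ratios; `normalize_points` is a hypothetical name.

```python
def normalize_points(points, width, height, new_width, new_height):
    """Scale character point coordinates to the normalized size.

    Assumes m = new_width / width and n = new_height / height.
    """
    m = new_width / width
    n = new_height / height
    return [(x * m, y * n) for x, y in points]
```

For instance, a character 16 wide and 64 high normalized to 32 by 32 has m = 2 and n = 0.5, so the point (8, 32) maps to (16, 16).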

S14122, carrying out interpolation processing on the normalized picture according to a character piecewise interpolation rule to generate a preprocessed picture.

Specifically, the character piecewise interpolation rule performs piecewise interpolation on the characters in the normalized picture according to a specific function. A piecewise linear interpolation function is constructed to approximate the track of each character, so as to improve the recognition rate of the characters in the picture.

Suppose that on the interval $[a, b]$ there are points $x_0, x_1, x_2, \ldots, x_n$ with $a = x_0 < x_1 < x_2 < \cdots < x_n = b$, and that $f(x)$ is a function defined on the interval whose values at these points are $y_0, y_1, y_2, \ldots, y_n$. If a function $\varphi(x)$ satisfies the following conditions: (a) $\varphi(x)$ is continuous on the interval $[a, b]$; (b) on each subinterval $[x_i, x_{i+1}]$ ($i = 0, 1, 2, \ldots, n-1$), $\varphi(x)$ is a polynomial of degree $k$; then $\varphi(x)$ is the piecewise $k$-th degree interpolation polynomial of $f(x)$ on the interval $[a, b]$. When $k = 1$ this is piecewise linear interpolation, and when $k = 2$ it is piecewise parabolic interpolation.

In piecewise linear interpolation, on each subinterval $[x_i, x_{i+1}]$ ($i = 0, 1, 2, \ldots, n-1$), $\varphi(x)$ is a first-order interpolation polynomial of the form:

$$\varphi(x) = \frac{x - x_{i+1}}{x_i - x_{i+1}}\, y_i + \frac{x - x_i}{x_{i+1} - x_i}\, y_{i+1}$$

where $x_i$ and $x_{i+1}$ are points of the interval $[a, b]$, $y_i$ and $y_{i+1}$ are the values of the function $f(x)$ at $x_i$ and $x_{i+1}$, and $\varphi(x)$ is the piecewise linear interpolation polynomial.
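The first-order interpolation on each subinterval can be sketched in a few lines. The sketch below uses the standard two-point (Lagrange) form of a linear interpolant; the function name `piecewise_linear` and the use of `bisect` to locate the subinterval are illustrative choices, not details given in the source.

```python
from bisect import bisect_right


def piecewise_linear(xs, ys, x):
    """Evaluate the piecewise linear interpolant through (xs[i], ys[i]) at x.

    xs must be strictly increasing and x must lie in [xs[0], xs[-1]].
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two nodes and matching lengths")
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the interpolation interval")
    # Locate the subinterval [x_i, x_{i+1}] that contains x.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    x0, x1, y0, y1 = xs[i], xs[i + 1], ys[i], ys[i + 1]
    # First-order interpolation polynomial on that subinterval.
    return (x - x1) / (x0 - x1) * y0 + (x - x0) / (x1 - x0) * y1
```

Sampling this function at evenly spaced positions between consecutive stroke points is one way a character track could be densified before recognition.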

S14123, inputting the preprocessed picture into a pre-trained convolutional neural network model to obtain the character information in the picture.

The preprocessed picture is input into a pre-trained convolutional neural network model to obtain the character information in the picture. Specifically, the convolutional neural network model is a neural network model that has been trained in advance to recognize characters in pictures containing characters. The convolutional neural network model sequentially performs convolution, pooling and classification on the preprocessed picture to obtain the character information in the picture.
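The convolution, pooling and classification stages can be illustrated with a toy forward pass. This is only a sketch: it uses a single kernel and a single linear layer, and the kernel, weights and labels passed in are placeholders that a real system would obtain from training. None of the names below come from the source.

```python
import math


def conv2d(image, kernel):
    """Valid 2D cross-correlation of a grayscale image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [max(feature_map[r + i][c + j]
             for i in range(size) for j in range(size))
         for c in cols]
        for r in rows
    ]


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_picture(image, kernel, weights, labels):
    """Convolution -> ReLU -> pooling -> linear classifier -> softmax."""
    fmap = [[max(0.0, v) for v in row] for row in conv2d(image, kernel)]
    pooled = max_pool(fmap)
    features = [v for row in pooled for v in row]
    scores = [sum(w * f for w, f in zip(ws, features)) for ws in weights]
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs
```

A production model would stack several convolution and pooling layers and would normally be built with a deep learning framework; the pure-Python version is kept only to make the three stages visible.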

S142, capturing data from the structured data set according to the ontology model to obtain the sales state information and the product attribute information of all insurance products in the insurance knowledge graph.

S150, constructing the insurance knowledge graph according to the sales state information, the product attribute information and the public praise score.

The insurance knowledge graph is constructed according to the sales state information, the product attribute information and the public praise score. Specifically, the sales state information and the product attribute information include both data captured from the first data set and data captured from the second data set. Therefore, for each insurance product, the sales state information and product attribute information originating from the first data set and from the second data set are fused by an entity alignment technique, yielding fused sales state information and fused product attribute information for that product. Finally, the public praise score, the fused sales state information and the fused product attribute information of each insurance product are stored in a structured database, thereby obtaining the insurance knowledge graph.
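The fusion and storage step might look like the following sketch. Several points are assumptions made for illustration and are not specified in the source: products are aligned by a normalized product name, conflicting fields are resolved in favor of the second (official filing) data set, and the structured database is an SQLite table of (product, attribute, value) rows. All function, field and table names are hypothetical.

```python
import sqlite3


def normalize_name(name):
    """Alignment key: case-folded name with whitespace removed (an assumption)."""
    return "".join(name.split()).casefold()


def fuse(first_records, second_records):
    """Merge records that describe the same product in the two data sets.

    Assumption: on conflicting fields the second data set overrides the first.
    """
    fused = {}
    for record in list(first_records) + list(second_records):
        key = normalize_name(record["name"])
        merged = fused.setdefault(key, {})
        merged.update({k: v for k, v in record.items() if v is not None})
    return fused


def store(fused, scores, conn):
    """Persist fused attributes and public praise scores as fact rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts (product TEXT, attribute TEXT, value TEXT)"
    )
    for key, attrs in fused.items():
        rows = [(attrs["name"], k, str(v)) for k, v in attrs.items() if k != "name"]
        if key in scores:
            rows.append((attrs["name"], "public_praise_score", str(scores[key])))
        conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)
    conn.commit()
```

Real entity alignment usually needs more than name normalization, for example comparing insurer, product code and filing number, so the key function is the part most likely to change in practice.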

In another embodiment, as shown in fig. 6, after step S150, steps S161 and S162 are further included.

S161, if a query sentence input by a user is received, obtaining a term with an entity name in the query sentence according to a preset named entity recognition model.

If a query sentence input by a user is received, a term with an entity name in the query sentence is obtained according to a preset named entity recognition model. Specifically, the named entity recognition model is a model for performing named entity recognition on each term in the query sentence. After the query sentence input by the second user is received, word segmentation is performed on the query sentence to obtain the individual terms in it, and named entity recognition is then performed on those terms to obtain the term with an entity name. For example, when the query sentence input by the second user is "I want to know the relevant condition of health insurance", word segmentation yields the seven terms "I", "want", "know", "health insurance", "of", "relevant" and "condition"; named entity recognition is then performed on these seven terms, and the term with an entity name in the query sentence is found to be "health insurance".
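The two stages of this step, segmentation followed by entity recognition, can be sketched as below. The sketch substitutes simple stand-ins for the models the text refers to: greedy forward maximum matching over a small phrase lexicon replaces the word segmenter, and a lookup in a list of known entity names replaces the named entity recognition model. The function names and the lexicon are hypothetical.

```python
def segment(sentence, lexicon, max_len=3):
    """Greedy forward maximum matching over whitespace-separated words.

    Multi-word phrases found in `lexicon` are kept together as one term.
    """
    words = sentence.lower().split()
    terms, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shrink to one word.
        for span in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + span])
            if span == 1 or candidate in lexicon:
                terms.append(candidate)
                i += span
                break
    return terms


def find_entities(terms, entity_names):
    """Stand-in for the named entity recognition model: a gazetteer lookup."""
    return [t for t in terms if t in entity_names]
```

With a lexicon containing a product category name, `find_entities(segment(query, lexicon), lexicon)` returns that category whenever it appears in the query; a trained sequence-labeling model would replace `find_entities` in a real deployment.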

S162, performing entity linking in the insurance knowledge graph according to the terms with entity names in the query sentence to obtain a query result.

Entity linking is performed in the insurance knowledge graph according to the terms with entity names in the query sentence to obtain a query result. Specifically, a plurality of candidate entities similar to a term with an entity name in the query sentence are first retrieved from the insurance knowledge graph. The similarity between the term and each candidate entity is then calculated, and the candidate with the highest similarity is taken as the entity that the term refers to. Entity linking is performed on that entity, so that the query result matching the query sentence is obtained from the insurance knowledge graph.
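Selecting the most similar entity can be sketched as follows. The source does not say which similarity measure is used, so this example assumes a character-level ratio from Python's `difflib.SequenceMatcher`; the `min_similarity` threshold and the dictionary representation of the graph are likewise assumptions.

```python
from difflib import SequenceMatcher


def link_entity(term, graph, min_similarity=0.5):
    """Return (entity, similarity, facts) for the graph entity closest to `term`.

    `graph` maps entity names to their stored facts. Returns None when no
    entity reaches `min_similarity`.
    """
    best_name, best_score = None, 0.0
    for name in graph:
        score = SequenceMatcher(None, term.lower(), name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < min_similarity:
        return None
    return best_name, best_score, graph[best_name]
```

Scanning every entity is acceptable for a small graph; a larger graph would typically narrow the candidates first, for example through an inverted index on entity names, before computing similarities.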

In the construction method of the insurance knowledge graph provided by the embodiment of the invention, the website information of the third-party insurance platform and the website information of the official filing platform input by a user are received; a first data set and a second data set for constructing an insurance knowledge graph are respectively acquired from the third-party insurance platform and the official filing platform based on the website information and a preset web crawler program; a public praise score of each insurance product is obtained from the first data set according to a preset public praise score model; data is captured from the first data set and the second data set according to a preset ontology model to obtain the sales state information and the product attribute information of all insurance products in the insurance knowledge graph; and the insurance knowledge graph is constructed according to the sales state information, the product attribute information and the public praise score. This method improves the efficiency and accuracy of building the insurance knowledge graph. The resulting graph balances timeliness with the authority of its information, presents the public praise and sales state of insurance products, and describes the full life cycle of each insurance product.

The embodiment of the invention also provides an insurance knowledge graph construction device 100, which is used for executing any embodiment of the construction method of the insurance knowledge graph. Specifically, referring to fig. 8, fig. 8 is a schematic block diagram of the insurance knowledge graph construction device 100 according to an embodiment of the present invention.

As shown in fig. 8, the insurance knowledge graph construction device 100 includes a first receiving unit 110, a first obtaining unit 120, a second obtaining unit 130, a third obtaining unit 140, and a construction unit 150.

The first receiving unit 110 is configured to receive website information of the third-party insurance platform and website information of the official filing platform, which are input by a user.

A first obtaining unit 120, configured to obtain a first data set and a second data set for constructing an insurance knowledge graph from the third-party insurance platform and the official filing platform, respectively, based on the website information and a preset web crawler program.

A second obtaining unit 130, configured to obtain a public praise score of each insurance product from the first data set according to a preset public praise score model.

In another embodiment of the present invention, as shown in fig. 9, the second obtaining unit 130 includes: a first classification unit 131 and a fourth acquisition unit 132.

The first classifying unit 131 is configured to acquire a text of each insurance product in the first data set and classify the text of each insurance product in the first data set according to a preset text classification model to obtain multiple evaluation information of each insurance product in the first data set.

In another embodiment of the present invention, as shown in fig. 10, the first classification unit 131 includes: a word segmentation unit 1311, a processing unit 1312, a convolution unit 1313 and a second classification unit 1314.

A word segmentation unit 1311, configured to perform word segmentation on the text of each insurance product in the first data set, so as to obtain each word in the text of each insurance product in the first data set.

The processing unit 1312 is configured to process each word in the text of each insurance product in the first data set according to a preset word embedding model, so as to obtain a word vector of the text of each insurance product in the first data set.

A convolution unit 1313, configured to perform convolution and pooling on the word vector of the text of each insurance product in the first data set in sequence to obtain a feature vector of the text of each insurance product in the first data set.

The second classification unit 1314 is configured to input the feature vector of the text of each insurance product in the first data set into a classifier for classification processing, so as to obtain multiple evaluation information of each insurance product in the first data set.
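The chain formed by units 1311 to 1314 (word vectors, convolution, pooling, classifier) can be illustrated with a toy forward pass. This sketch assumes the text has already been segmented into words; the embedding table, convolution filters and classifier weights are placeholders that would come from training, and all names are hypothetical rather than taken from the source.

```python
import math


def embed(words, table, dim):
    """Look up one vector per word; unknown words map to a zero vector."""
    return [table.get(w, [0.0] * dim) for w in words]


def conv1d(vectors, filt):
    """Slide a filter spanning len(filt) consecutive word vectors over the text."""
    width = len(filt)
    return [
        sum(filt[k][d] * vectors[start + k][d]
            for k in range(width) for d in range(len(filt[k])))
        for start in range(len(vectors) - width + 1)
    ]


def text_features(vectors, filters):
    """One feature per filter: ReLU followed by max-over-time pooling."""
    return [
        max((max(0.0, v) for v in conv1d(vectors, f)), default=0.0)
        for f in filters
    ]


def classify_text(features, weights, labels):
    """Linear layer plus softmax; returns the most probable label."""
    scores = [sum(w * f for w, f in zip(ws, features)) for ws in weights]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return labels[probs.index(max(probs))]
```

Each label produced this way would correspond to one piece of evaluation information for a product; how those labels are combined into a public praise score is a separate step and is not modeled here.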

A fourth obtaining unit 132, configured to obtain a public praise score of each insurance product in the first data set according to the multiple evaluation information of each insurance product in the first data set.

A third obtaining unit 140, configured to capture data from the first data set and the second data set according to a preset ontology model to obtain the sales state information and the product attribute information of all insurance products in the insurance knowledge graph.

In another embodiment of the present invention, as shown in fig. 11, the third obtaining unit 140 includes: a data processing unit 141 and a data capturing unit 142.

The data processing unit 141 is configured to process the first data set and the second data set according to a preset data processing model to obtain a structured data set.

In another embodiment of the present invention, as shown in fig. 12, the data processing unit 141 includes: a conversion unit 1411 and a first recognition unit 1412.

A conversion unit 1411, configured to perform text conversion on the unstructured texts in the first data set and the second data set respectively according to a preset text conversion model to obtain structured texts.

The first recognition unit 1412 is configured to recognize the picture in the first data set based on an OCR recognition technology, so as to obtain the text information in the picture.

In another embodiment of the present invention, as shown in fig. 13, the first recognition unit 1412 includes: a regularization processing unit 14121, an interpolation processing unit 14122, and a second recognition unit 14123.

A regularization processing unit 14121, configured to regularize the picture according to a non-linear regularization rule to enlarge or reduce the picture.

An interpolation processing unit 14122, configured to perform interpolation processing on the normalized picture according to the character piecewise interpolation processing rule to generate a preprocessed picture.

The second recognition unit 14123 is configured to input the preprocessed picture into a pre-trained convolutional neural network model, so as to obtain the text information in the picture.

The data capturing unit 142 is configured to capture data from the structured data set according to the ontology model to obtain the sales state information and the product attribute information of all insurance products in the insurance knowledge graph.

The construction unit 150 is configured to construct the insurance knowledge graph according to the sales status information, the product attribute information, and the public praise score.

In another embodiment of the present invention, the apparatus 100 for constructing an insurance knowledge graph further comprises: a second receiving unit 161 and a fifth acquiring unit 162.

The second receiving unit 161 is configured to, if a query sentence input by a user is received, obtain a term with an entity name in the query sentence according to a preset named entity recognition model.

A fifth obtaining unit 162, configured to perform entity linking from the insurance knowledge graph according to the terms with entity names in the query statement, so as to obtain a query result.

The insurance knowledge graph construction device 100 provided by the embodiment of the invention is used for: receiving website information of a third-party insurance platform and website information of an official filing platform input by a user; respectively acquiring a first data set and a second data set for constructing an insurance knowledge graph from the third-party insurance platform and the official filing platform based on the website information and a preset web crawler program; obtaining a public praise score for each insurance product from the first data set according to a preset public praise score model; capturing data from the first data set and the second data set according to a preset ontology model to obtain the sales state information and the product attribute information of all insurance products in the insurance knowledge graph; and constructing the insurance knowledge graph according to the sales state information, the product attribute information and the public praise score.

Referring to fig. 14, fig. 14 is a schematic block diagram of a computer device according to an embodiment of the present invention.

Referring to fig. 14, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.

The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, causes the processor 502 to perform the construction method of the insurance knowledge graph.

The processor 502 is used to provide computing and control capabilities that support the operation of the overall device 500.

The internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 can be caused to execute the method for constructing the insurance knowledge graph.

The network interface 505 is used for network communication, such as providing transmission of data information. It will be appreciated by those skilled in the art that the configuration shown in fig. 14 is a block diagram of only the part of the configuration relevant to the solution of the present invention and does not limit the device 500 to which the solution is applied; a particular device 500 may include more or fewer components than those shown, may combine some components, or may have a different arrangement of components.

The processor 502 is configured to run the computer program 5032 stored in the memory to implement the following functions: receiving website information of a third-party insurance platform and website information of an official filing platform input by a user; respectively acquiring a first data set and a second data set for constructing an insurance knowledge graph from the third-party insurance platform and the official filing platform based on the website information and a preset web crawler program; obtaining a public praise score for each insurance product from the first data set according to a preset public praise score model; capturing data from the first data set and the second data set according to a preset ontology model to obtain the sales state information and the product attribute information of all insurance products in the insurance knowledge graph; and constructing the insurance knowledge graph according to the sales state information, the product attribute information and the public praise score.

Those skilled in the art will appreciate that the embodiment of the apparatus 500 shown in fig. 14 does not constitute a limitation on the specific construction of the apparatus 500, and in other embodiments, the apparatus 500 may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the apparatus 500 may only include the memory and the processor 502, and in such embodiments, the structure and function of the memory and the processor 502 are the same as those of the embodiment shown in fig. 14, and are not repeated herein.

It should be understood that in the present embodiment, the processor 502 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor.

In another embodiment of the present invention, a computer storage medium is provided. The storage medium may be a non-volatile computer-readable storage medium. The storage medium stores a computer program 5032, wherein the computer program 5032, when executed by the processor 502, performs the steps of: receiving website information of a third-party insurance platform and website information of an official filing platform input by a user; respectively acquiring a first data set and a second data set for constructing an insurance knowledge graph from the third-party insurance platform and the official filing platform based on the website information and a preset web crawler program; obtaining a public praise score for each insurance product from the first data set according to a preset public praise score model; capturing data from the first data set and the second data set according to a preset ontology model to obtain the sales state information and the product attribute information of all insurance products in the insurance knowledge graph; and constructing the insurance knowledge graph according to the sales state information, the product attribute information and the public praise score.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is only a logical division, and other divisions are possible in an actual implementation: units having the same function may be grouped into one unit, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a device 500 (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.

While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A construction method of an insurance knowledge graph is characterized by comprising the following steps:

2. The construction method of an insurance knowledge graph according to claim 1, wherein the obtaining of the public praise score of each insurance product from the first data set according to a preset public praise score model comprises:

acquiring a text of each insurance product in the first data set and classifying the text of each insurance product in the first data set according to a preset text classification model to obtain a plurality of evaluation information of each insurance product in the first data set;

and acquiring a public praise score of each insurance product in the first data set according to the plurality of evaluation information of each insurance product in the first data set.

3. The construction method of an insurance knowledge graph according to claim 2, wherein the classifying the text of each insurance product in the first data set according to a preset text classification model to obtain a plurality of evaluation information of each insurance product in the first data set comprises:

performing word segmentation on the text of each insurance product in the first data set to obtain each word in the text of each insurance product in the first data set;

processing each word in the text of each insurance product in the first data set according to a preset word embedding model to obtain a word vector of the text of each insurance product in the first data set;

performing convolution and pooling on word vectors of texts of each insurance product in the first data set in sequence to obtain feature vectors of the texts of each insurance product in the first data set;

and inputting the feature vector of the text of each insurance product in the first data set into a classifier for classification processing to obtain a plurality of evaluation information of each insurance product in the first data set.

4. The construction method of an insurance knowledge graph according to claim 1, wherein the capturing data from the first data set and the second data set according to a preset ontology model to obtain the sales state information and the product attribute information of all insurance products in the insurance knowledge graph comprises:

respectively processing the first data set and the second data set according to a preset data processing model to obtain a structured data set;

and capturing data from the structured data set according to the ontology model to obtain the sales state information and the product attribute information of all insurance products in the insurance knowledge graph.

5. The construction method of an insurance knowledge graph according to claim 4, wherein the processing the first data set and the second data set according to a preset data processing model to obtain a structured data set comprises:

respectively performing text conversion on the unstructured texts in the first data set and the second data set according to a preset text conversion model to obtain structured texts;

and identifying the pictures in the first data set based on an OCR (optical character recognition) technology to obtain the character information in the pictures.

6. The construction method of an insurance knowledge graph according to claim 4, wherein identifying the picture in the first data set based on OCR recognition technology to obtain the text information in the picture comprises:

regularizing the picture according to a nonlinear regularization rule to enlarge or reduce the picture;

performing interpolation processing on the normalized picture according to the character piecewise interpolation processing rule to generate a preprocessed picture;

and inputting the preprocessed picture into a pre-trained convolutional neural network model to obtain the character information in the picture.

7. The construction method of an insurance knowledge graph according to claim 1, wherein after constructing the insurance knowledge graph according to the sales state information, the product attribute information, and the public praise score, the construction method further comprises:

if a query sentence input by a user is received, acquiring a term with an entity name in the query sentence according to a preset named entity recognition model;

and performing entity linking in the insurance knowledge graph according to the terms with entity names in the query sentence to obtain a query result.

8. An insurance knowledge graph building device, comprising:

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the construction method of an insurance knowledge graph according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the construction method of an insurance knowledge graph according to any one of claims 1 to 7.