CN116542783A - Risk assessment method, device, equipment and storage medium based on artificial intelligence

Info

Publication number: CN116542783A
Application number: CN202310506373.5A
Authority: CN
Inventors: 卢金金; 潘劲松; 陈少琼
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2023-05-05
Filing date: 2023-05-05
Publication date: 2023-08-04

Abstract

The embodiment of the application belongs to the field of artificial intelligence, and relates to a risk assessment method based on artificial intelligence, which comprises the following steps: acquiring case data of insurance claim cases; obtaining structured data, document data, voice data and picture data from the case data; text extraction is carried out on the document data and the voice data to obtain first text information and second text information; obtaining a first risk score based on the first text information and the second text information; obtaining voiceprint features based on the voice data; obtaining a second risk score based on the picture data; performing risk assessment on the structured data, the first risk score, the voiceprint feature and the second risk score to obtain a risk score; and generating a risk assessment result based on the risk score. The application also provides an artificial intelligence-based risk assessment device, computer equipment and a storage medium. In addition, the present application relates to blockchain technology, where target risk scores may be stored in the blockchain. The method and the device improve the efficiency and the accuracy of case risk assessment.

Description

Risk assessment method, device, equipment and storage medium based on artificial intelligence

Technical Field

The present disclosure relates to the field of artificial intelligence technologies, and in particular, to an artificial intelligence-based risk assessment method, apparatus, computer device, and storage medium.

Background

With continued opening-up and rapid economic development, insurance, as a risk-protection product for persons and property, is receiving more and more attention. At the same time, insurance fraud is becoming a growing area of economic crime. Although insurance companies invest more and more manpower and material resources in anti-fraud work, new fraud techniques still emerge one after another.

Current approaches to identifying the risk of an insurance case mainly fall into two schemes: 1) manually judging, from experience or from the accident scene, whether a case carries a certain kind of risk; 2) manually configuring rules in the system and initiating a manual investigation when a case meets the rule conditions, where common rules include whether the loss occurred at night, whether the case was reported late, and the like. These existing approaches consume considerable manpower and material resources, involve a heavy workload, are inefficient, and cannot guarantee the accuracy of the resulting risk identification.

Disclosure of Invention

An object of the embodiments of the present application is to provide a risk assessment method, apparatus, computer device and storage medium based on artificial intelligence, so as to solve the technical problems that the existing processing method for identifying the risk of the insurance case needs to consume more manpower and material resources, has large workload, has low working efficiency, and cannot guarantee the accuracy of the obtained risk identification result.

In order to solve the above technical problems, the embodiments of the present application provide an artificial intelligence based risk assessment method, which adopts the following technical scheme:

acquiring case data of an insurance claim case to be processed;

acquiring target structured data, target document data, target voice data and target picture data from the case data based on preset data screening conditions;

text information extraction is carried out on the target document data to obtain corresponding first text information, and text information extraction is carried out on the target voice data to obtain corresponding second text information;

performing risk scoring processing on the first text information and the second text information based on a preset target text classification model to obtain corresponding first risk scores;

performing voiceprint extraction on the target voice data based on a preset voiceprint recognition model to obtain target voiceprint characteristics corresponding to a preset information type;

performing risk scoring processing on the target picture data based on a preset picture classification model to obtain a corresponding second risk score;

performing risk assessment processing on the target structured data, the first risk score, the target voiceprint feature and the second risk score based on a preset risk assessment model to obtain a target risk score corresponding to the insurance claim case;

and generating a risk assessment result of the insurance claim case based on the target risk score.

Further, the step of extracting text information from the target document data to obtain corresponding first text information specifically includes:

performing text position detection on the target document data based on a preset text detection algorithm to obtain a feature map corresponding to the target document data;

acquiring a text block in the feature map;

and carrying out text recognition on the text block based on a preset text recognition algorithm to obtain the first text information.

Further, before the step of performing risk scoring processing on the first text information and the second text information based on the preset target text classification model to obtain the corresponding first risk score, the method further includes:

collecting historical sample text data;

preprocessing the historical sample text data to construct a training set and a testing set; the training set comprises training texts and text label information corresponding to the training texts;

performing word vectorization on the training set based on a preset algorithm to map the training set into corresponding word vectors;

constructing a word vector matrix corresponding to the word vectors;

inputting the word vector matrix into a preset initial text classification model for training to obtain a trained first initial text classification model;

testing the first initial text classification model based on the test set to obtain a second initial classification model;

and taking the second initial classification model as the target text classification model.

Further, before the step of inputting the word vector matrix into a preset initial text classification model to train and obtaining a trained first initial text classification model, the method further includes:

acquiring a plurality of text classification models;

respectively acquiring cost information and performance information of each text classification model;

and determining the initial text classification model from all the text classification models based on the cost information and the performance information.
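The selection step above can be sketched as follows. The source does not say how cost information and performance information are combined, so the linear trade-off, the `cost_weight` parameter, the candidate names and all numbers below are illustrative assumptions rather than the method of the application.

```python
def select_initial_model(candidates, cost_weight=0.3):
    """Pick the candidate with the best performance-minus-cost trade-off.

    candidates: list of dicts with hypothetical keys "name", "cost" and
    "performance", both normalised to [0, 1]. The linear trade-off is an
    assumption made for illustration only.
    """
    if not candidates:
        raise ValueError("at least one candidate model is required")
    best = max(candidates, key=lambda m: m["performance"] - cost_weight * m["cost"])
    return best["name"]


# Made-up example values, not measurements.
candidates = [
    {"name": "model_a", "cost": 0.2, "performance": 0.88},
    {"name": "model_b", "cost": 0.9, "performance": 0.93},
    {"name": "model_c", "cost": 0.1, "performance": 0.80},
]
chosen = select_initial_model(candidates)
```

With these example values the cheaper `model_a` is preferred over the slightly more accurate but much more expensive `model_b`; a different `cost_weight` would shift that balance.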

Further, before the step of performing risk scoring processing on the target picture data based on the preset picture classification model to obtain the corresponding second risk score, the method further includes:

obtaining a pre-training model obtained after pre-training a preset convolutional neural network;

acquiring historical case data;

acquiring training data from the historical case data; the training data comprises training pictures and risk tag information corresponding to the training pictures;

fine tuning the pre-training model based on the training data to obtain a target pre-training model meeting preset convergence conditions;

and taking the target pre-training model as the picture classification model.

Further, the step of performing risk assessment processing on the target structured data, the first risk score, the target voiceprint feature and the second risk score based on a preset risk assessment model to obtain a target risk score corresponding to the insurance claim case includes:

classifying the target structured data based on a preset dimension to obtain corresponding dynamic characteristic data and static characteristic data;

coding the category characteristics contained in the static characteristic data to obtain corresponding numerical value data;

and inputting the dynamic characteristic data, the numerical data, the first risk score, the voiceprint characteristic and the second risk score into the risk assessment model for score calculation processing to obtain a target risk score corresponding to the insurance claim case.
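A minimal sketch of how the inputs to the risk assessment model might be assembled is given below. One-hot encoding is assumed for the category features, and the feature names, category vocabularies and values are hypothetical; the source does not specify the encoding scheme or which fields are dynamic or static.

```python
def one_hot(value, categories):
    """Encode a category value as a one-hot list over a fixed category order."""
    return [1.0 if value == c else 0.0 for c in categories]


def build_feature_vector(dynamic, static, vocab, first_score, voiceprint, second_score):
    """Concatenate dynamic features, encoded static features and model scores.

    dynamic:    list of numeric dynamic features
    static:     dict of category feature name -> category value
    vocab:      dict of category feature name -> ordered list of categories
    voiceprint: list of floats (voiceprint-derived features)
    """
    encoded = []
    for name in sorted(static):  # fixed order keeps the layout stable
        encoded.extend(one_hot(static[name], vocab[name]))
    return list(dynamic) + encoded + [first_score] + list(voiceprint) + [second_score]


# Hypothetical example values.
vocab = {"vehicle_type": ["sedan", "suv", "truck"], "report_channel": ["phone", "app"]}
vector = build_feature_vector(
    dynamic=[3.0, 12000.0],
    static={"vehicle_type": "sedan", "report_channel": "phone"},
    vocab=vocab,
    first_score=0.7,
    voiceprint=[0.2, -0.1],
    second_score=0.4,
)
```

The resulting flat vector is what a downstream scoring model would consume; any real model would need the same feature order at training and at inference time.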

Further, the step of generating the risk assessment result of the insurance claim case based on the target risk score specifically includes:

judging whether the target risk score is smaller than a preset risk threshold value or not;

if the target risk score is smaller than the risk threshold value, generating a first risk assessment result indicating that the insurance claim case is a non-risk case;

and if the target risk score is not smaller than the risk threshold value, generating a second risk assessment result indicating that the insurance claim case is a risk case.
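The threshold rule above can be written as a short function. The default threshold of 0.5 and the result strings are assumptions chosen for illustration; the source only states that the threshold is preset.

```python
def generate_assessment(target_risk_score, risk_threshold=0.5):
    """Map a target risk score to a risk assessment result.

    Scores strictly below the threshold give a non-risk result; scores
    equal to or above it give a risk result, matching "not smaller than".
    """
    if target_risk_score < risk_threshold:
        return "non-risk case"
    return "risk case"


low = generate_assessment(0.3)
boundary = generate_assessment(0.5)
```

Note that a score exactly equal to the threshold is treated as a risk case, because the second branch covers every score that is not smaller than the threshold.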

In order to solve the above technical problems, the embodiments of the present application further provide an artificial intelligence based risk assessment device, which adopts the following technical scheme:

the first acquisition module is used for acquiring case data of the insurance claim case to be processed;

the second acquisition module is used for acquiring target structured data, target document data, target voice data and target picture data from the case data based on preset data screening conditions;

the first extraction module is used for extracting text information from the target document data to obtain corresponding first text information, and extracting text information from the target voice data to obtain corresponding second text information;

the first processing module is used for carrying out risk scoring processing on the first text information and the second text information based on a preset target text classification model to obtain corresponding first risk scores;

the second extraction module is used for carrying out voiceprint extraction on the target voice data based on a preset voiceprint recognition model to obtain target voiceprint characteristics corresponding to a preset information type;

the second processing module is used for carrying out risk scoring processing on the target picture data based on a preset picture classification model to obtain a corresponding second risk score;

the evaluation module is used for performing risk evaluation processing on the target structured data, the first risk score, the target voiceprint feature and the second risk score based on a preset risk evaluation model to obtain a target risk score corresponding to the insurance claim case;

and the generation module is used for generating a risk assessment result of the insurance claim case based on the target risk score.

In order to solve the above technical problems, the embodiments of the present application further provide a computer device, which adopts the following technical schemes:

acquiring case data of an insurance claim case to be processed;

In order to solve the above technical problems, embodiments of the present application further provide a computer readable storage medium, which adopts the following technical solutions:

acquiring case data of an insurance claim case to be processed;

Compared with the prior art, the embodiment of the application has the following main beneficial effects:

according to the embodiment of the application, the text classification model, the voiceprint recognition model, the picture classification model and the risk assessment model are used for carrying out risk assessment on the multidimensional data which are contained in the insurance claim case and are relevant to the risk assessment, so that the risk assessment result corresponding to the insurance claim case can be rapidly and accurately generated, the processing efficiency of carrying out risk assessment on the insurance claim case is improved, and the accuracy of the generated risk assessment result is ensured.

Drawings

For a clearer description of the solution in the present application, a brief description will be given below of the drawings that are needed in the description of the embodiments of the present application, it being obvious that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow chart of one embodiment of an artificial intelligence based risk assessment method according to the present application;

FIG. 3 is a schematic diagram of one embodiment of an artificial intelligence based risk assessment device according to the present application;

FIG. 4 is a schematic structural diagram of one embodiment of a computer device according to the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the description of the figures above are intended to cover non-exclusive inclusions. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

In order to better understand the technical solutions of the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings.

As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103. It should be noted that, the risk assessment method based on artificial intelligence provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the risk assessment device based on artificial intelligence is generally disposed in the server/terminal device. It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow chart of one embodiment of an artificial intelligence based risk assessment method according to the present application is shown. The risk assessment method based on artificial intelligence comprises the following steps:

step S201, acquiring case data of an insurance claim case to be processed.

In this embodiment, the electronic device (for example, the server/terminal device shown in fig. 1) on which the artificial intelligence-based risk assessment method runs may acquire the case data through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G/5G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connections now known or developed in the future. The case data of the insurance claim case to be processed can be obtained by querying case data created in advance.

Step S202, acquiring target structured data, target document data, target voice data and target picture data from the case data based on preset data screening conditions.

In this embodiment, the data screening conditions include a structured data screening condition corresponding to a preset first information category, a document data screening condition corresponding to a second information category, a voice data screening condition corresponding to a third information category, and a picture data screening condition corresponding to a fourth information category. The first information category includes: the underwriting dimension, the case report dimension, the survey dimension, the license plate number dimension, the frame number dimension, the reporter's mobile phone number dimension, customer information, and the like. The target structured data includes: underwriting information, i.e. available underwriting information such as the total insured amount, the vehicle damage insured amount, the policy start and end dates, the policy's historical loss record, and the insurance policy; case report information, such as the report time, the time of loss, the location of loss, the course of the loss event, the cause of loss, and the case-report voice; survey information, i.e. available information such as the survey address, the surveyor, the survey description, and the driving license information; loss assessment information, i.e. available information such as the repair shop, vehicle damage pictures, accident scene pictures, and claim documents; license plate number information of the insured vehicle and the third-party vehicle, i.e. available information such as historical loss records; frame number information of the insured vehicle and the third-party vehicle, i.e. available information such as historical loss records; the reporter's mobile phone number information, i.e. available information such as historical insurance applications and case reports; and applicant, insured, and owner information, such as historical insurance applications and loss records.
The second information category may include the underwriting document, claim document, and case-report voice categories; the third information category may include the case-report voice category; the fourth information category may include the vehicle damage picture type, the accident scene picture type, and the like.
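The screening step can be sketched as a lookup from an item's type tag to one of the four target groups. The `kind` tags, the rule table and the sample items below are hypothetical; the source does not describe how case data items are tagged or stored.

```python
from collections import defaultdict

# Hypothetical screening rules: target group -> item kinds it accepts.
SCREENING_RULES = {
    "structured": {"underwriting", "case_report", "survey", "loss_assessment"},
    "document": {"underwriting_document", "claim_document"},
    "voice": {"case_report_voice"},
    "picture": {"vehicle_damage_picture", "accident_scene_picture"},
}


def screen_case_data(items):
    """Group case items by the first rule whose kinds contain the item's kind.

    Items whose kind matches no rule are dropped as irrelevant to the
    risk assessment.
    """
    groups = defaultdict(list)
    for item in items:
        for group, kinds in SCREENING_RULES.items():
            if item["kind"] in kinds:
                groups[group].append(item)
                break
    return dict(groups)


case_items = [
    {"kind": "underwriting", "payload": {"total_insured_amount": 200000}},
    {"kind": "claim_document", "payload": "claim_form.png"},
    {"kind": "case_report_voice", "payload": "report.wav"},
    {"kind": "vehicle_damage_picture", "payload": "damage.jpg"},
    {"kind": "marketing_note", "payload": "not used"},
]
targets = screen_case_data(case_items)
```

Each group then feeds a different branch of the method: structured data goes to the risk assessment model directly, while documents, voice and pictures pass through their own extraction or scoring models first.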

Step S203, extracting text information from the target document data to obtain corresponding first text information, and extracting text information from the target voice data to obtain corresponding second text information.

In this embodiment, an end-to-end speech recognition scheme may be used to extract text information from the target voice data to obtain the corresponding second text information. Specifically, the case-report speech is transcribed with a standard end-to-end speech recognition scheme, and text data is output. The details are as follows. Model structure: the basic speech recognition model is an encoder-decoder framework, in which the encoder is a standard multi-layer encoder structure and the decoder combines a CTC decoder with an Attention decoder; the CTC decoder is a single linear layer, and the Attention decoder is a standard multi-layer Transformer structure. During training, CTC loss and Attention loss are used jointly to accelerate convergence; during decoding, the output scores of the CTC beam search and the Attention decoder are combined by weighted summation, and the final recognition result is determined by score ranking. Model training: the training corpus comes from open-source public labeled data sets and internal desensitized labeled data sets. The data set is divided into a training set and a validation set before training. During training, 1) a character-level vocabulary is generated from the texts in the training set; 2) a global CMVN is generated from the audio sequences in the training set to eliminate differences between recordings; 3) the training set and validation set data are arranged in a standard format and input into the encoder-decoder framework for training, and after multiple iterations a recognition model that can be used for decoding is output.
The decoding flow is as follows: 1) after the audio signal is acquired, the audio is first checked to determine whether its format meets the input requirements, and it is converted if it does not; 2) a frequency-domain feature sequence (MFCC or fbank) of the audio is obtained with an audio feature extraction tool; 3) after a downsampling operation, the frequency-domain feature sequence is input into the encoder, which outputs a frame-token posterior probability map; 4) based on the probability map, CTC beam search decoding is first performed to obtain the N-best paths and their CTC scores, then the Attention decoder rescores the N-best paths to obtain their Attention scores, the CTC scores and Attention scores are combined by weighted summation to obtain the final score of each path, and the path with the highest score is taken as the final output, i.e. the recognized text information. In addition, the specific implementation process of extracting the text information from the target document data to obtain the first text information will be described in further detail in the following specific embodiments, which will not be described herein.
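Step 4) of the decoding flow, the weighted combination of CTC and Attention scores over the N-best paths, can be sketched as below. The source does not give the weights, so the convex combination with a hypothetical `ctc_weight` parameter is an assumption, and the candidate texts and log-probability scores are made up for illustration.

```python
def rescore_nbest(nbest, attention_scores, ctc_weight=0.5):
    """Return the best path after combining CTC and Attention scores.

    nbest:            list of (text, ctc_score) from CTC beam search
    attention_scores: Attention decoder score for each path, same order
    Scores are treated as log-probabilities (higher is better).
    """
    if len(nbest) != len(attention_scores):
        raise ValueError("one attention score is required per N-best path")
    scored = []
    for (text, ctc_score), att_score in zip(nbest, attention_scores):
        final = ctc_weight * ctc_score + (1.0 - ctc_weight) * att_score
        scored.append((final, text))
    scored.sort(reverse=True)  # rank paths by final score, highest first
    return scored[0][1], scored


# Made-up scores: the second path wins on CTC alone but loses after rescoring.
nbest = [("rear end collision", -4.0), ("rear and collision", -3.5)]
best_text, ranking = rescore_nbest(nbest, attention_scores=[-2.0, -6.0])
```

In this example the Attention decoder corrects a path that CTC beam search ranked first, which is the purpose of the rescoring pass.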

Step S204, performing risk scoring processing on the first text information and the second text information based on a preset target text classification model to obtain a corresponding first risk score.

In this embodiment, after the first text information and the second text information are respectively input into the target text classification model, the target text classification model predicts a risk probability value of the word vector matrix of each text, that is, a risk score of each text.

Step S205, voiceprint extraction is performed on the target voice data based on a preset voiceprint recognition model, and target voiceprint features corresponding to the preset information type are obtained.

In this embodiment, the voiceprint recognition model may be an x-vector model. The information types, which relate to the case-report voice, include insurance application information, case report information and loss event information. Voiceprint features are extracted from the case-report voice with the x-vector algorithm, and a voiceprint library is built in which a voiceprint serves as a unique identifier of a person, so that the historical insurance application, case report and loss event information associated with the reporter's voiceprint can be retrieved for an insurance claim case. The x-vector model adds a statistics pooling layer and an embedding layer to a TDNN network and converts input of any length into a fixed-length feature vector that represents the voiceprint. The technical details are as follows. Model structure: the deep neural network used by the x-vector system mainly consists of multiple frame-level layers, a single pooling layer, two segment-level layers and a softmax layer. Cross entropy is used as the loss function. Model training: the corpus comes from open-source public labeled data sets and internal desensitized labeled data sets. The data set is divided into a training set and a validation set before training. During training, 1) conventional audio preprocessing, such as data augmentation and VAD-based removal of silent segments, is applied to the data; 2) a frequency-domain feature sequence (MFCC) of the audio produced in the previous step is obtained with an audio feature extraction tool; 3) the training set and validation set data are arranged in a standard format and input into the x-vector system, and the vectors output by the segment-level layer are passed through the softmax classifier to predict the target speaker class for discriminative training.
Voiceprint library construction: all historical voiceprints are registered from the audio in historical insurance applications, case reports and loss events. Before each voiceprint is stored, the audio is first preprocessed and its features are extracted; the first 10 s audio segment is input into the x-vector model, which outputs the corresponding feature vector; this vector is matched against the existing voiceprints in the voiceprint library using PLDA, a classical similarity algorithm. If the similarity score is higher than a preset threshold, the input audio is assigned to the existing voiceprint category; if it is lower than the threshold, a new voiceprint category is created. PLDA is a channel compensation algorithm. Given a corpus segment u, the PLDA model can be written as $\omega(u) = \mu + V\,y(u) + \varepsilon(u)$, where $\mu$ is the mean of all training-audio x-vectors, $V$ is the loading matrix whose columns form a basis of the speaker subspace, $y(u)$ is the hidden variable onto which $\omega(u)$ maps in the speaker subspace, and $\varepsilon(u)$ is the residual noise term. To improve the PLDA result, the x-vectors are mean-variance normalized using $\mu$ and $\Sigma$, where $\Sigma$ is the covariance matrix of all training-audio x-vectors. In the test stage, the PLDA model is used to calculate the similarity score of two x-vectors. Let H1 denote the hypothesis that two corpus segments come from the same speaker and H0 the hypothesis that they come from different speakers, and let the two x-vectors be $\omega(u_1)$ and $\omega(u_2)$; the similarity score is calculated from the likelihoods of $\omega(u_1)$ and $\omega(u_2)$ under H1 and under H0.
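The enrollment logic of the voiceprint library (match against existing voiceprints, otherwise create a new category) can be sketched as follows. Cosine similarity is used here as a simple stand-in for PLDA scoring, which is a deliberate simplification and not the scoring described above; the threshold value, the identifier format and the toy three-dimensional embeddings are likewise assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def enroll(library, embedding, threshold=0.8):
    """Assign an embedding to an existing voiceprint category or create one.

    library: dict of voiceprint id -> list of embeddings already stored.
    Returns the id the embedding was stored under.
    """
    best_id, best_score = None, -1.0
    for voiceprint_id, stored in library.items():
        score = max(cosine(embedding, e) for e in stored)
        if score > best_score:
            best_id, best_score = voiceprint_id, score
    if best_id is not None and best_score > threshold:
        library[best_id].append(embedding)  # same speaker: extend category
        return best_id
    new_id = f"voiceprint_{len(library) + 1}"  # hypothetical id scheme
    library[new_id] = [embedding]
    return new_id


library = {}
first = enroll(library, [1.0, 0.0, 0.0])
second = enroll(library, [0.9, 0.1, 0.0])  # close to the first embedding
third = enroll(library, [0.0, 1.0, 0.0])   # clearly different direction
```

Once a case-report voice is mapped to an existing category, the historical application, report and loss records stored under that identifier can be looked up for the current claim.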

Step S206, performing risk scoring processing on the target picture data based on a preset picture classification model to obtain a corresponding second risk score.

In this embodiment, the risk scoring processing is performed on the target picture data based on the preset picture classification model, so as to obtain a corresponding second risk score.

Step S207, performing risk assessment processing on the target structured data, the first risk score, the target voiceprint feature and the second risk score based on a preset risk assessment model, so as to obtain a target risk score corresponding to the insurance claim case.

In this embodiment, the implementation process of performing risk assessment processing on the target structured data, the first risk score, the target voiceprint feature, and the second risk score based on the preset risk assessment model to obtain the target risk score corresponding to the insurance claim case will be described in further detail in the following specific embodiments, which will not be described herein.

Step S208, generating a risk assessment result of the insurance claim case based on the target risk score.

In this embodiment, the specific implementation process of generating the risk assessment result of the insurance claim case based on the target risk score is described in further detail in the following specific embodiments, which will not be described herein.

According to the risk assessment method and the risk assessment device, the text classification model, the voiceprint recognition model, the picture classification model and the risk assessment model are used for carrying out risk assessment on multidimensional data which are contained in the insurance claim case and are relevant to risk assessment, so that a risk assessment result corresponding to the insurance claim case can be rapidly and accurately generated, the processing efficiency of carrying out risk assessment on the insurance claim case is improved, and the accuracy of the generated risk assessment result is guaranteed.

In some optional implementations, the text information extraction of the target document data in step S203 obtains corresponding first text information, including the following steps:

and detecting the text position of the target document data based on a preset text detection algorithm to obtain a feature map corresponding to the target document data.

In this embodiment, the document data is typically picture data containing text information. The text detection algorithm may specifically be the PSENet algorithm. PSENet is based on the idea of image segmentation: a CNN first generates four feature maps, from large to small, from the target document data; the feature maps are then upsampled and finally merged into one feature map F. The text box is shrunk proportionally into n segmentation kernels, and kernels of different sizes are regressed on the feature map F. The loss function is $L = \lambda L_c + (1-\lambda) L_s$. It has two parts: $L_c$ is the loss for the complete text instance ($S_n$), $L_s$ is the loss for the shrunk text instances ($S_1$ to $S_{n-1}$), and $\lambda$ balances $L_c$ and $L_s$. PSENet handles touching text well and is suitable for insurance documents in a variety of scenarios.

And acquiring the text blocks in the feature map.

In this embodiment, after the text position detection for the target document data is completed, text blocks in the feature map may be respectively scratched out of the feature map.

Performing text recognition on the text blocks based on a preset text recognition algorithm to obtain the first text information.

In this embodiment, the text recognition algorithm may specifically be the MASTER algorithm. The MASTER algorithm mainly comprises 2 core modules: (1) an encoder based on a Multi-Aspect global context attention mechanism; (2) a Transformer-based decoder. Attention-based algorithms have achieved very good results in the field of natural language processing, and the same idea is carried over to the OCR algorithm: the Multi-Aspect-based encoder achieves better results in context modeling and effectively reduces the problem of attention confusion.

In the present application, the target document data is processed with the text detection algorithm and the text recognition algorithm, so that the text information of the target document data is extracted quickly and accurately to obtain the required first text information.

In some optional implementations of this embodiment, before step S204, the electronic device may further perform the following steps:

Collecting historical sample text data.

In this embodiment, the above-mentioned historical sample text data may refer to text information contained in data such as an underwriting document, a claim document, and a case reporting voice acquired from prestored historical insurance claim case data.

Preprocessing the historical sample text data to construct a training set and a testing set; the training set comprises training texts and text label information corresponding to the training texts.

In this embodiment, the preprocessing may include word segmentation, stop-word removal, and the like. The test set comprises test texts and test text label information corresponding to the test texts. The training texts may be labeled according to the loss type, for example whether the case is at risk, whether it involves drunk driving, whether the accident was intentionally staged, or whether it involves personal-injury fraud. For example, in the task of judging whether a case is at risk, each training text is labeled as at risk or not at risk; in the task of judging whether a case involves drunk driving, each text is labeled as drunk driving or non-drunk driving. Other loss types are handled similarly, and a text can also be labeled directly with a specific loss type.
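The preprocessing and data-set construction described above can be sketched as follows. This is only an illustration: whitespace tokenization stands in for a real word segmenter (Chinese text would need one), and the stop-word list, the 80/20 split and the function names are assumptions, not details taken from the application.

```python
import random

# Hypothetical stop-word list; a real system would load a full list.
STOP_WORDS = {"the", "a", "an", "of", "was", "is", "and", "at"}


def preprocess(text):
    """Lower-case, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def build_datasets(samples, test_ratio=0.2, seed=0):
    """samples: list of (raw_text, label). Returns (training_set, test_set)."""
    processed = [(preprocess(text), label) for text, label in samples]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(processed)
    n_test = max(1, int(len(processed) * test_ratio))
    return processed[n_test:], processed[:n_test]
```

Each element of the training set is a pair of a token list and its label, which matches the "training text plus text label information" structure described above.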

Carrying out word vectorization on the training set based on a preset algorithm so as to map the training set into corresponding word vectors.

In this embodiment, the preset algorithm may be the skip-gram algorithm or the CBOW algorithm. The training texts contained in the training set are first segmented into words and stop words are removed; each word obtained from the training texts is then converted into a word vector using the preset algorithm.
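As a small illustration of the skip-gram idea, the sketch below only shows how (center word, context word) training pairs are formed from a segmented text; learning the vectors themselves is out of scope here. The window size and the function name are assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```

CBOW uses the same windows but predicts the center word from its context words rather than the other way round.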

And constructing a word vector matrix corresponding to the word vector.

In this embodiment, the word vector matrix is constructed as follows: assuming that there are m words and that each word vector has dimension n, the word vector matrix has dimensions m×n. Both m and n can be adjusted according to the actual conditions.
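A minimal sketch of building the m×n matrix is given below. Truncating long texts, padding short ones with zero rows, and mapping unknown words to a zero vector are assumptions of this sketch; the `embeddings` dictionary stands in for the output of the word vectorization step.

```python
def build_word_vector_matrix(tokens, embeddings, m, n):
    """Return an m x n matrix (list of rows): one n-dimensional vector per word.

    Texts longer than m words are truncated; shorter texts are padded with
    zero rows. Words missing from `embeddings` also map to a zero vector.
    """
    zero = [0.0] * n
    rows = [list(embeddings.get(tok, zero)) for tok in tokens[:m]]
    rows += [list(zero) for _ in range(m - len(rows))]
    return rows
```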

And inputting the word vector matrix into a preset initial text classification model for training to obtain a trained first initial text classification model.

In this embodiment, the initial text classification model may be a TextCNN model, and the word vector matrix is fed to the input layer of the TextCNN model. The convolution layer convolves the input word vector matrix with a number of learnable convolution kernels. The max pooling layer performs a maximum sampling operation on the features output by the convolution layer and extracts the largest feature. The last layer is a fully connected layer, which joins all the extracted features together and outputs the probability of each category through a softmax function, thereby yielding the trained first initial text classification model. The model parameters of the TextCNN model can be adjusted according to how the model performs on actual data.
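The layer sequence described above (convolution, max pooling, fully connected layer, softmax) can be sketched as a forward pass in plain Python. This is an illustrative sketch only: the ReLU after the convolution, the single feature per kernel and all names are assumptions, and no training is shown.

```python
import math


def conv_max_pool(matrix, kernel, bias=0.0):
    """Slide an (h x n) kernel over the (m x n) word vector matrix, apply ReLU,
    then max-pool over all positions to obtain a single feature."""
    h = len(kernel)
    best = 0.0  # ReLU output is never below 0
    for start in range(len(matrix) - h + 1):
        total = bias
        for r in range(h):
            total += sum(x * w for x, w in zip(matrix[start + r], kernel[r]))
        best = max(best, total)
    return best


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def textcnn_forward(matrix, kernels, fc_weights, fc_bias):
    """kernels: list of (h x n) kernels; fc_weights: one weight row per class."""
    features = [conv_max_pool(matrix, k) for k in kernels]
    logits = [sum(f * w for f, w in zip(features, row)) + b
              for row, b in zip(fc_weights, fc_bias)]
    return softmax(logits)
```

The returned list holds one probability per category, for example "at risk" and "not at risk".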

And testing the first initial text classification model based on the test set to obtain a second initial classification model.

In this embodiment, the test set is used to test the first initial text classification model in order to verify its classification effect. If the classification effect meets the expected effect, the first initial text classification model is used directly as the second initial classification model. If the classification effect does not meet the expected effect, the training process is re-executed for the first initial text classification model based on the training set until a second initial classification model whose classification effect meets the expected effect is obtained.

Taking the second initial classification model as the target text classification model.

According to the method and the device, the training set and the testing set are constructed through the collected historical sample text data, and then the initial text classification model is trained and tested based on the training set and the testing set, so that a final target text classification model is obtained, risk scoring processing can be achieved on the first text information and the second text information based on the obtained target text classification model, and the generation efficiency and the accuracy of the obtained first risk score are guaranteed.

In some optional implementations, before the step of inputting the word vector matrix into a preset initial text classification model for training to obtain a trained first initial text classification model, the electronic device may further perform the following steps:

Obtaining a plurality of text classification models.

In this embodiment, the text classification models may include at least a TextCNN model, a BERT model, and the like.

And respectively acquiring cost information and performance information of each text classification model.

In this embodiment, the performance information includes model performance information and stability information. The cost information and the performance information can be obtained by checking model performance information of various pre-stored text classification models in an actual application scene.

Determining the initial text classification model from all the text classification models based on the cost information and the performance information.

In this embodiment, a first weight corresponding to the cost information is acquired, together with a second weight and a third weight corresponding respectively to the model performance information and the stability information in the performance information. The model score of each text classification model is then obtained based on the calculation formula s= (n+x+c)/c+a, wherein s is the model score, N is the model performance information, X is the stability information, C is the cost information, a is the first weight, b is the second weight, and c is the third weight. The text classification model with the highest model score is then used as the initial text classification model.
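The selection step can be sketched as below. Because the formula as printed here cannot be reliably reconstructed, the scoring rule in this sketch is an assumption rather than the application's formula: it rewards model performance and stability with weights b and c and penalises cost with weight a. The function names and default weights are likewise hypothetical.

```python
def model_score(cost, performance, stability, a, b, c):
    """Assumed scoring rule: reward performance and stability, penalise cost.

    a, b and c are the weights for cost, model performance and stability.
    """
    return b * performance + c * stability - a * cost


def select_initial_model(candidates, a=0.2, b=0.5, c=0.3):
    """candidates: dict mapping model name -> (cost, performance, stability)."""
    return max(candidates, key=lambda name: model_score(*candidates[name], a, b, c))
```

Whatever the exact formula, the final step is the same: the candidate with the highest model score becomes the initial text classification model.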

According to the method and the device, the cost information and the performance information of each text classification model are obtained through obtaining various text classification models, and then the initial text classification model is determined from all the text classification models based on the cost information and the performance information, so that the obtained initial text classification model can be guaranteed to have lower cost information and higher performance information, and the selection intelligence of the initial text classification model is improved.

In some alternative implementations, before step S206, the electronic device may further perform the following steps:

Obtaining a pre-training model obtained after pre-training a preset convolutional neural network.

In this embodiment, the convolutional neural network may be any one of ResNet-50, VGGNet, GoogLeNet, ResNeXt-50, DeiT-S, etc. ResNet-50 is preferred as the convolutional neural network described above. The ResNet-50 network has a stronger and deeper architecture, achieves a good classification effect and has superior performance. The convolutional neural network ResNet-50 can be pre-trained using the ImageNet dataset to arrive at the pre-training model described above.

Historical case data is obtained.

In this embodiment, the history case data may refer to pre-stored history insurance claim case data.

Acquiring training data from the historical case data; the training data comprises training pictures and risk tag information corresponding to the training pictures.

In this embodiment, the training pictures may include vehicle damage pictures of historical cases, such as vehicle loss pictures and accident scene pictures. The obtained training pictures may contain some invalid pictures that have little to do with case risk information, such as logos or other pictures uploaded by an operator by mistake. These need to be marked as non-vehicle-loss pictures, while the real vehicle loss and accident scene pictures are marked as vehicle loss pictures; the invalid pictures are then filtered out of the training data. In addition, the labeling process of the risk tag information may include: marking each vehicle damage picture as at risk or not at risk according to whether the surveyor or investigator recorded the case as risky and according to what the picture shows, such as traces inconsistent with the loss, no traces, or multiple incidents at the same location.

And fine tuning the pre-training model based on the training data to obtain a target pre-training model meeting preset convergence conditions.

In this embodiment, fine-tuning the pre-training model based on the training data to obtain the target pre-training model satisfying the preset convergence condition may include the following. A preset number of layers of the pre-training model are frozen, so that the parameters of those layers do not participate in training; only the output layer and the convolution layers close to the output layer are trained. The value of the preset number is not limited and may be, for example, 40. During fine-tuning, stochastic gradient descent is used for optimization: once the loss function is determined, the goal is to minimize it, so the gradient for the current round is obtained by taking the partial derivative with respect to each parameter, the parameters are updated in the direction opposite to the gradient, and the update is iterated to approach the optimal parameter values. The activation function may be the ReLU function. The pre-training model extracts the feature information of each training picture, a softmax classifier then classifies the training picture, and the final classification result is obtained from the output probabilities. Based on the error between the prediction result (the classification result) for the training picture and the real label (the risk tag information corresponding to the training picture), the network parameters of the pre-training model are learned by back propagation: the error is passed back layer by layer towards the front of the network, so that the parameters of the pre-training model are continuously updated until training is complete and the target pre-training model is obtained.
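The two mechanics described above, freezing a preset number of layers and updating the remaining parameters against the gradient, can be sketched without any deep learning framework. The layer representation (a dictionary with a flat weight list), the learning rate and the function names are assumptions made for illustration; gradients are taken as given rather than computed by back propagation.

```python
def freeze_layers(layers, num_frozen):
    """Mark the first `num_frozen` layers as non-trainable."""
    for index, layer in enumerate(layers):
        layer["trainable"] = index >= num_frozen
    return layers


def sgd_step(layers, grads, lr=0.01):
    """Move each trainable parameter against its gradient; frozen layers are skipped."""
    for layer, grad in zip(layers, grads):
        if layer["trainable"]:
            layer["weights"] = [w - lr * g for w, g in zip(layer["weights"], grad)]
    return layers
```

In a real fine-tuning run, `sgd_step` would be called once per mini-batch with gradients obtained by back propagation, and iteration would stop when the preset convergence condition is met.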

And taking the target pre-training model as the picture classification model.

The present application obtains a pre-training model produced by pre-training a preset convolutional neural network; then acquires training data from the obtained historical case data; and then fine-tunes the pre-training model based on the training data to obtain a target pre-training model satisfying the preset convergence condition, which is used as the picture classification model. Because the required picture classification model is trained by pre-training and then fine-tuning a convolutional neural network, overfitting can be avoided to a certain extent, and a good classification effect and computational efficiency can be achieved when the picture classification model is subsequently used to process the target picture data.

In some alternative implementations of the present embodiment, step S207 includes the steps of:

and classifying the target structured data based on a preset dimension to obtain corresponding dynamic characteristic data and static characteristic data.

In this embodiment, the preset dimensions include an underwriting dimension, a case reporting dimension, a survey dimension, a license plate number dimension, a frame number dimension, a case reporting mobile phone number dimension, and a customer information dimension.

And carrying out coding processing on the category characteristics contained in the static characteristic data to obtain corresponding numerical value data.

In this embodiment, one-hot encoding may be used to encode the category feature data. This avoids the artificial ordering of values introduced by label encoding, and it is applicable to both machine learning algorithms and deep neural network algorithms.
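A minimal one-hot encoding sketch is shown below; the function name and the choice to sort the categories alphabetically are assumptions. Each category becomes its own 0/1 column, so no category is numerically "larger" than another.

```python
def one_hot_encode(values):
    """Return (categories, encoded rows) for a list of categorical values."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one position is set per value
        encoded.append(row)
    return categories, encoded
```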

Inputting the dynamic characteristic data, the numerical data, the first risk score, the voiceprint characteristic and the second risk score into the risk assessment model for score calculation processing to obtain the target risk score corresponding to the insurance claim case.

In this embodiment, the training process of the risk assessment model may include the following. Specified training data is obtained from historical insurance claim cases: based on the preset data screening conditions, specified structured data, specified document data, specified voice data and specified picture data are obtained from the case data of a specified historical insurance claim case, together with the real score of that case. For the step of obtaining the specified structured data, the specified document data, the specified voice data and the specified picture data, refer to the foregoing process of obtaining the target structured data, the target document data, the target voice data and the target picture data, which is not repeated here. A preset initial model is trained in a stacking manner based on the specified training data: the specified training data is input into the initial model to obtain a risk score for the specified historical insurance claim case, and the model parameters of the initial model are then adjusted according to the risk score and the real score until the initial model converges, yielding the risk assessment model. The initial model may be any of DeepFM, LightGBM, CatBoost and the like, and is preferably a DeepFM model. In addition, the loss value of the initial model can be determined from the risk score and the real score: if the loss value is smaller than or equal to a preset threshold, the initial model is determined to have converged; if the loss value is larger than the preset threshold, the initial model is determined not to have converged. The preset threshold may be set according to the actual situation and is not specifically limited here.

Further, the manner of determining the loss value of the initial model may include: obtaining a preset weight parameter, subtracting the real score from the risk score to obtain a score difference, and multiplying the score difference by the preset weight parameter to obtain the loss value. The preset weight parameter may be set according to the actual situation, which is not specifically limited in this embodiment. In addition, the risk assessment model can run online continuously and be updated iteratively by periodically applying new data, so that it can cope with new fraud patterns as they emerge and continue to help insurance enterprises reduce the substantial cost of paying out on risky claims.
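The loss and convergence check described above can be sketched as follows. Taking the absolute value of the score difference is an assumption of this sketch (the text only mentions a difference operation), made so that a negative difference is not mistaken for convergence; the function names are hypothetical.

```python
def loss_value(risk_score, real_score, weight=1.0):
    """Weighted score difference; the absolute value is an assumption of this sketch."""
    return weight * abs(risk_score - real_score)


def has_converged(risk_score, real_score, threshold, weight=1.0):
    """The model is treated as converged when the loss is <= the preset threshold."""
    return loss_value(risk_score, real_score, weight) <= threshold
```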

Classifying target structured data based on preset dimensions to obtain corresponding dynamic characteristic data and static characteristic data; coding the category characteristics contained in the static characteristic data to obtain corresponding numerical data; and further inputting the dynamic characteristic data, the numerical data, the first risk score, the voiceprint characteristic and the second risk score into a risk assessment model for score calculation processing to obtain a target risk score corresponding to the insurance claim case. By using the risk assessment model, the target risk score corresponding to the insurance claim case to be processed can be rapidly and accurately generated, and the processing efficiency and accuracy of risk assessment on the insurance claim case are improved.

In some alternative implementations of the present embodiment, step S208 includes the steps of:

and judging whether the target risk score is smaller than a preset risk threshold value.

In this embodiment, the risk threshold may be set according to actual situations, which is not specifically limited in this embodiment.

If the target risk score is smaller than the risk threshold, generating a first risk assessment result that the insurance claim case is a non-risk case.

In this embodiment, if it is detected that the target risk score is smaller than the risk threshold, this indicates that the insurance claim case to be processed is not at risk, and the next step of calculating the claim cost may be entered.

If the target risk score is not smaller than the risk threshold, generating a second risk assessment result that the insurance claim case is a risk case.

In this embodiment, if the target risk score is detected to be not less than the risk threshold, it indicates that the risk exists in the insurance claim case to be processed, and then related personnel need to be notified to initiate investigation on the insurance claim case with the risk.

The present application judges whether the target risk score is smaller than a preset risk threshold; if the target risk score is smaller than the risk threshold, a first risk assessment result is generated indicating that the insurance claim case is a non-risk case; if the target risk score is not smaller than the risk threshold, a second risk assessment result is generated indicating that the insurance claim case is a risk case. By using the risk assessment model to detect automatically and intelligently whether an insurance claim case is at risk, the risk control capability of claim settlement is enhanced and the security of claim settlement is safeguarded.
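The threshold decision can be written as a short sketch; the function name, the dictionary keys and the wording of the follow-up actions are assumptions chosen for illustration.

```python
def assess(target_risk_score, risk_threshold):
    """Return the risk assessment result and the follow-up action for a claim case."""
    if target_risk_score < risk_threshold:
        # First risk assessment result: the case proceeds to claim cost calculation.
        return {"result": "non-risk case", "next_step": "calculate claim cost"}
    # Second risk assessment result: a score equal to the threshold counts as risky.
    return {"result": "risk case", "next_step": "notify staff to investigate"}
```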

It should be emphasized that, to further ensure the privacy and security of the risk assessment results, the risk assessment results may also be stored in a blockchain node.

The blockchain referred to in the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralised database: a chain of data blocks generated and linked to one another by cryptographic methods, where each data block contains a batch of network transaction information that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
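How cryptographically linked blocks make stored risk assessment results tamper-evident can be sketched in a single process. This sketch covers only the hash linking; it has no consensus mechanism, networking or distributed storage, and the block layout, the all-zero genesis hash and the function names are assumptions.

```python
import hashlib
import json


def make_block(payload, prev_hash):
    """Create a block whose hash covers both its payload and the previous block's hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    return {"payload": payload, "prev_hash": prev_hash,
            "hash": hashlib.sha256(body.encode("utf-8")).hexdigest()}


def verify_chain(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    prev_hash = "0" * 64  # assumed placeholder hash for the first block
    for block in chain:
        expected = make_block(block["payload"], block["prev_hash"])["hash"]
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Changing a stored result after the fact alters the recomputed hash of its block, so `verify_chain` returns False for the modified chain.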

The embodiments of the application can acquire and process the related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain an optimal result.

Those skilled in the art will appreciate that all or part of the processes of the methods of the embodiments described above may be implemented by computer readable instructions that instruct the associated hardware. The instructions may be stored in a computer readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk or a Read-Only Memory (ROM), or a Random Access Memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.

With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an artificial intelligence-based risk assessment apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.

As shown in fig. 3, the artificial intelligence based risk assessment apparatus 300 according to the present embodiment includes:

A first obtaining module 301, configured to obtain case data of an insurance claim case to be processed;

the second obtaining module 302 is configured to obtain, from the case data, target structured data, target document data, target voice data, and target picture data based on a preset data screening condition;

the first extraction module 303 is configured to perform text information extraction on the target document data to obtain corresponding first text information, and perform text information extraction on the target voice data to obtain corresponding second text information;

the first processing module 304 is configured to perform risk scoring processing on the first text information and the second text information based on a preset target text classification model, so as to obtain a corresponding first risk score;

the second extraction module 305 is configured to perform voiceprint extraction on the target voice data based on a preset voiceprint recognition model, so as to obtain a target voiceprint feature corresponding to a preset information type;

the second processing module 306 is configured to perform risk scoring processing on the target picture data based on a preset picture classification model, so as to obtain a corresponding second risk score;

the evaluation module 307 is configured to perform risk evaluation processing on the target structured data, the first risk score, the target voiceprint feature, and the second risk score based on a preset risk evaluation model, so as to obtain a target risk score corresponding to the insurance claim case;

A generating module 308, configured to generate a risk assessment result of the insurance claim case based on the target risk score.

In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the artificial intelligence based risk assessment method in the foregoing embodiment one by one, and are not described in detail.

In some alternative implementations of the present embodiment, the first extraction module 303 includes:

the detection sub-module is used for detecting the text position of the target document data based on a preset text detection algorithm to obtain a feature map corresponding to the target document data;

the obtaining submodule is used for obtaining text blocks in the feature map;

and the recognition sub-module is used for carrying out text recognition on the text block based on a preset text recognition algorithm to obtain the first text information.

In some optional implementations of the present embodiment, the artificial intelligence based risk assessment apparatus further includes:

the acquisition module is used for acquiring historical sample text data;

The first construction module is used for preprocessing the historical sample text data and constructing a training set and a testing set; the training set comprises training texts and text label information corresponding to the training texts;

the third processing module is used for carrying out word vectorization on the training set based on a preset algorithm so as to map the training set into corresponding word vectors;

the second construction module is used for constructing a word vector matrix corresponding to the word vector;

the first training module is used for inputting the word vector matrix into a preset initial text classification model for training to obtain a trained first initial text classification model;

the optimizing module is used for testing the first initial text classification model based on the test set to obtain a second initial classification model;

and the first determining module is used for taking the second initial classification model as the target text classification model.

In this embodiment, the operations performed by the modules or units correspond one by one to the steps of the artificial intelligence based risk assessment method in the foregoing embodiment, and are not described in detail here.

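In some optional implementations of the present embodiment, the artificial intelligence based risk assessment apparatus further includes:

the third acquisition module is used for acquiring various text classification models;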

a fourth module, configured to obtain cost information and performance information of each text classification model respectively;

and the second determining module is used for determining the initial text classification model from all the text classification models based on the cost information and the performance information.

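In some optional implementations of the present embodiment, the artificial intelligence based risk assessment apparatus further includes:

the fourth acquisition module is used for acquiring a pre-training model obtained after pre-training a preset convolutional neural network;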

a fifth acquisition module for acquiring historical case data;

a sixth acquisition module, configured to acquire training data from the historical case data; the training data comprises training pictures and risk tag information corresponding to the training pictures;

the adjustment module is used for fine adjustment of the pre-training model based on the training data to obtain a target pre-training model meeting preset convergence conditions;

And the third determining module is used for taking the target pre-training model as the picture classification model.

In some alternative implementations of the present embodiment, the evaluation module 307 includes:

the first processing sub-module is used for classifying the target structured data based on a preset dimension to obtain corresponding dynamic characteristic data and static characteristic data;

the second processing sub-module is used for carrying out coding processing on the category characteristics contained in the static characteristic data to obtain corresponding numerical value data;

and the third processing sub-module is used for inputting the dynamic characteristic data, the numerical value data, the first risk score, the voiceprint characteristic and the second risk score into the risk assessment model for score calculation processing to obtain a target risk score corresponding to the insurance claim case.

In some optional implementations of the present embodiment, the generating module 308 includes:

the judging sub-module is used for judging whether the target risk score is smaller than a preset risk threshold value or not;

the first generation sub-module is used for generating a first risk assessment result that the insurance claim case is a non-risk case if the target risk score is smaller than the risk threshold;

and the second generation sub-module is used for generating a second risk assessment result that the insurance claim case is a risk case if the target risk score is not smaller than the risk threshold.

In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.

The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 that are communicatively connected to each other via a system bus. It should be noted that only the computer device 4 having components 41-43 is shown in the figure; it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead. In addition, as will be appreciated by those skilled in the art, a computer device herein is a device capable of automatically performing numerical calculations and/or information processing in accordance with predetermined or stored instructions, the hardware of which includes, but is not limited to, microprocessors, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, and the like.

The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.

The memory 41 includes at least one type of readable storage medium including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card or the like provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is typically used to store an operating system and various application software installed on the computer device 4, such as computer readable instructions of an artificial intelligence-based risk assessment method. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.

The processor 42 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as executing computer readable instructions of the artificial intelligence based risk assessment method.

The network interface 43 may comprise a wireless network interface or a wired network interface, the network interface 43 being used to establish a communication connection between the computer device 4 and other electronic devices.

The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of an artificial intelligence-based risk assessment method as described above.

From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or alternatively by means of hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disk) and comprises several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the embodiments of the present application.

It is apparent that the embodiments described above are only some, and not all, of the embodiments of the present application. The drawings show preferred embodiments of the present application, but they do not limit its patent scope. The present application may be embodied in many different forms; the embodiments herein are provided so that the disclosure of the present application will be understood more thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the technical solutions described in the foregoing embodiments may be modified, or that some of their technical features may be replaced by equivalents. All equivalent structures made using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the present application.

Claims

1. A risk assessment method based on artificial intelligence, comprising the steps of:

acquiring case data of an insurance claim case to be processed;

2. The risk assessment method based on artificial intelligence according to claim 1, wherein the step of extracting text information from the target document data to obtain corresponding first text information specifically includes:

acquiring a text block in the feature map;

3. The risk assessment method according to claim 1, wherein before the step of performing risk scoring processing on the first text information and the second text information based on the preset target text classification model to obtain a corresponding first risk score, the risk assessment method further comprises:

collecting historical sample text data;

constructing a word vector matrix corresponding to the word vector;

4. The artificial intelligence based risk assessment method according to claim 3, wherein before the step of inputting the word vector matrix into a preset initial text classification model for training to obtain a trained first initial text classification model, the method further comprises:

acquiring a plurality of text classification models;

5. The risk assessment method according to claim 1, wherein before the step of performing risk scoring processing on the target picture data based on the preset picture classification model to obtain a corresponding second risk score, the risk assessment method further comprises:

acquiring historical case data;

and taking the target pre-training model as the picture classification model.

6. The artificial intelligence based risk assessment method according to claim 1, wherein the step of performing risk assessment processing on the target structured data, the first risk score, the target voiceprint feature, and the second risk score based on a preset risk assessment model to obtain a target risk score corresponding to the insurance claim case comprises:

7. The artificial intelligence based risk assessment method according to claim 1, wherein the step of generating the risk assessment result of the insurance claim case based on the target risk score specifically comprises:

8. An artificial intelligence based risk assessment device, comprising:

9. A computer device comprising a memory having stored therein computer readable instructions which when executed implement the steps of the artificial intelligence based risk assessment method of any of claims 1 to 7.

10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the artificial intelligence based risk assessment method according to any of claims 1 to 7.