CN116561183B

CN116561183B - Intelligent information retrieval system for massive medical insurance data

Info

Publication number: CN116561183B
Application number: CN202310833085.0A
Authority: CN
Inventors: 刘利锋
Original assignee: Beijing Universal Medical Rescue Co ltd
Current assignee: Beijing Universal Medical Rescue Co ltd
Priority date: 2023-07-10
Filing date: 2023-07-10
Publication date: 2023-09-19
Anticipated expiration: 2043-07-10
Also published as: CN116561183A

Abstract

The invention relates to the technical field of electronic digital data processing, and in particular to an intelligent information retrieval system for massive medical insurance data. The system obtains the risk rate and the retrieval probability of insurance data from the relationship between the numbers of insurance records at different ages and for different sexes when different past cases are present, and encodes and compresses the insurance data according to the retrieval probability. By linking the relationships between different features in the insurance data, the invention improves the stability and accuracy of the risk rate evaluated by the logistic regression model. By encoding and compressing the insurance data according to the retrieval probability, it avoids assigning overlong codes to insurance data with a high retrieval probability, which greatly improves the efficiency and speed of data retrieval.

Description

Intelligent information retrieval system for massive medical insurance data

Technical Field

The invention relates to the technical field of electronic digital data processing, in particular to an intelligent information retrieval system for massive medical insurance data.

Background

With the development of society and the growing trend of population aging, the medical insurance industry plays an increasingly important role. Data is at the core of medical insurance, and processing and managing that data is a very important task for insurance companies. However, conventional data management methods struggle to process and retrieve huge volumes of insurance data efficiently: processing massive insurance data takes a great deal of time and resources, problems such as data redundancy and duplication arise easily, and retrieval efficiency is low. A new intelligent information retrieval system for massive medical insurance data is therefore needed, one that analyzes and encodes medical insurance data with a machine learning algorithm so as to compress and structurally encode the data while improving retrieval efficiency and accuracy, thereby meeting the needs of the medical insurance industry.

Currently, medical insurance information is retrieved using existing character matching technology, which has the following defects: (1) Storage space is wasted: in a conventional relational database, each piece of data must store the values of all of its attributes, so a large amount of redundant data exists. (2) Search efficiency is low: when the data volume is huge, the efficiency problems caused by traditional string matching and fuzzy queries become increasingly apparent.

Disclosure of Invention

The invention provides an intelligent information retrieval system for massive medical insurance data, which aims to solve the existing problems.

The invention relates to an intelligent information retrieval system for massive medical insurance data, which adopts the following technical scheme:

the invention provides an intelligent information retrieval system for massive medical insurance data, which comprises the following modules:

a data preparation module: acquiring insurance data in a medical insurance information database to obtain a first data set and a second data set;

a data dividing module: used for dividing the first data set to obtain a training set and a verification set;

a probability analysis module: used for acquiring a plurality of past cases in the first data set, and acquiring correlation factors between the past cases and ages according to the number of people of different ages under any past case; acquiring the contact parameters between the past cases and the ages by combining the correlation factors; further combining the contact parameters to obtain the characteristic parameters of the past cases; acquiring the risk rate of the insurance data in the second data set according to the characteristic parameters, and acquiring the retrieval probability of the insurance data by combining the risk rate;

a data storage module: used for obtaining primary coding data according to the magnitude of the retrieval probability, and encoding, compressing and storing the primary coding data, thereby realizing quick retrieval of the insurance data.

Further, the first data set and the second data set are acquired by the following steps:

recording a set formed by all insurance data in the medical insurance database as an insurance data set;

recording a set formed by insurance data corresponding to the medical insurance information paid for in the insurance data set as a first data set;

and recording a set formed by all data corresponding to the medical insurance being used by the applicant in the insurance data set as a second data set.
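A minimal sketch of this partition follows. The record type `InsuranceRecord` and its field names (`claim_paid`, `active`) are hypothetical and chosen only for illustration; the text does not say how a record that is both paid and still in use should be handled, so the sketch simply applies each condition independently.

```python
from dataclasses import dataclass


@dataclass
class InsuranceRecord:
    # All field names are illustrative assumptions, not taken from the source.
    policy_id: str
    age: int
    sex: str            # "M" or "F"
    past_cases: tuple   # case names of the applicant's past cases
    claim_paid: bool    # True if a claim has already been paid
    active: bool        # True if the policy is currently in use


def split_data_sets(records):
    """Return (first_data_set, second_data_set) from the full insurance data set."""
    first = [r for r in records if r.claim_paid]   # paid claims
    second = [r for r in records if r.active]      # policies currently in use
    return first, second


records = [
    InsuranceRecord("p1", 45, "M", ("hypertension",), True, False),
    InsuranceRecord("p2", 30, "F", (), False, True),
    InsuranceRecord("p3", 62, "F", ("diabetes",), True, False),
]
first_set, second_set = split_data_sets(records)
```

In this example `first_set` holds the two paid records and `second_set` holds the single policy in use.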

Further, the training set and the verification set are obtained by the following steps:

firstly, clustering all insurance data in the first data set by using the K-means++ algorithm, according to distances in the three dimensions of the corresponding applicant's age, sex and past cases, to obtain a plurality of clusters;

then, shuffling all clusters by using a random shuffling algorithm;

finally, each cluster is divided according to a preset proportion to respectively obtain a training set and a verification set.
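The shuffle-and-split step can be sketched as below, assuming the cluster label of each record is already available from the clustering step. The function name `split_by_cluster`, the fixed seed, and the 0.8 training fraction are illustrative assumptions; the text only says that the proportion is preset.

```python
import random


def split_by_cluster(labels, train_fraction=0.8, seed=0):
    """Shuffle the members of each cluster, then split every cluster by the same ratio.

    labels[i] is the cluster label of record i. Returns two lists of record indices.
    """
    rng = random.Random(seed)
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    train, validation = [], []
    for label in sorted(clusters):
        members = clusters[label]
        rng.shuffle(members)                       # random shuffling within the cluster
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        validation.extend(members[cut:])
    return train, validation


labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
train_idx, val_idx = split_by_cluster(labels)
```

Splitting inside each cluster keeps the mix of ages, sexes and past cases similar in both sets, which appears to be the purpose of clustering before the split.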

Further, the probability analysis module comprises the following units:

a multi-data set unit: extracting case names of past cases of the applicant corresponding to different insurance data in the training set, obtaining a set formed by all the past cases, and marking the set as a multi-element data set;

a contact parameter unit: used for obtaining the correlation factor between the past cases and the ages according to the difference between the number of people at different ages having the past cases and the mean of the number of people over all the past cases; obtaining the reimbursement probabilities of the past cases at different ages, and obtaining the contact parameters between the past cases and the ages by combining the differences among the numbers of people having the past cases at different ages with the correlation factors;

a characteristic parameter unit: used for obtaining the characteristic parameters of the past cases according to the reimbursement probabilities and the contact parameters;

a risk rate unit: taking the characteristic parameters of all past cases in the training set as independent variables, training a logistic regression model, and optimizing the trained logistic regression model by using the characteristic parameters of all past cases in the verification set, to obtain a logistic regression model for risk rate assessment of insurance data; acquiring the characteristic parameters of all past cases in the second data set and feeding them to the logistic regression model as input, the output being the risk rates of the insurance data corresponding to those past cases;

a retrieval probability unit: acquiring the retrieval times of the insurance data and the update time of the medical insurance information database, and obtaining the retrieval probability of the insurance data in the second data set by combining the risk rate.

Further, the correlation factor is obtained by the following steps:

The correlation factor between a given past case and age is computed, using the hyperbolic tangent function, from the following quantities: the total number of people of each age in the training set who have that past case, where the age ranges over the age interval of the applicants corresponding to all insurance data in the training set; the maximum age of the applicants corresponding to the insurance data in the training set; and the mean of the number of people over all past cases.

Further, the contact parameter obtaining method includes the following steps:

firstly, acquiring, in the first data set, the number of people of each age and each sex who have any given past case and have been paid a claim;

then, for each of the two age intervals considered, recording as the reimbursement probability the ratio between the total number of people in the first data set within that age interval who have a given past case and were paid a claim, and the total number of people who have that past case;

finally, the contact parameter between a given past case and age is computed from the following quantities: the total number of people of each age in the training set who have that past case; the maximum age of the applicants corresponding to the insurance data in the training set; the reimbursement probability of that past case within each of the two age intervals; and the correlation factor between that past case and age.
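The ratio used in the second step can be illustrated with a short sketch. The record layout (dictionaries with `age`, `past_cases` and `paid` keys) and the function name are hypothetical, and the sketch assumes that the denominator is restricted to the same age interval as the numerator, which the text does not state explicitly.

```python
def reimbursement_probability(people, case, age_low, age_high):
    """Share of people with `case` in [age_low, age_high] who were paid a claim."""
    in_interval = [
        p for p in people
        if case in p["past_cases"] and age_low <= p["age"] <= age_high
    ]
    if not in_interval:
        return 0.0  # no one in this interval has the past case
    paid = sum(1 for p in in_interval if p["paid"])
    return paid / len(in_interval)


people = [
    {"age": 30, "past_cases": {"hypertension"}, "paid": True},
    {"age": 35, "past_cases": {"hypertension"}, "paid": False},
    {"age": 60, "past_cases": {"hypertension"}, "paid": True},
    {"age": 65, "past_cases": {"diabetes"}, "paid": True},
]
younger = reimbursement_probability(people, "hypertension", 0, 40)
older = reimbursement_probability(people, "hypertension", 41, 100)
```

Here the younger interval gives 0.5 and the older interval gives 1.0, the two values that the contact parameter would then combine with the correlation factor.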

Further, the characteristic parameters are obtained by the following steps:

firstly, respectively acquiring the numbers of men and of women in the first data set who have any given past case and have been paid a claim, and recording the ratios of these numbers to the total number of people paid a claim in the first data set as the male reimbursement probability and the female reimbursement probability, respectively;

then, recording the product of (1 plus the male or female reimbursement probability) and the contact parameter as the characteristic parameter of the past case.
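As a worked illustration of these two steps, the sketch below computes the sex-specific reimbursement probability and the characteristic parameter. The function names and the dictionary layout are assumptions, and the contact parameter value 2.0 is an arbitrary placeholder rather than a value computed by the method.

```python
def sex_reimbursement_probability(first_data_set, case, sex):
    """Paid claimants of the given sex who have `case`, divided by all paid claimants.

    Every record in the first data set is assumed to be a paid claim.
    """
    matching = sum(
        1 for r in first_data_set
        if r["sex"] == sex and case in r["past_cases"]
    )
    return matching / len(first_data_set)


def characteristic_parameter(contact_parameter, reimbursement_prob):
    """(1 + reimbursement probability) multiplied by the contact parameter."""
    return (1.0 + reimbursement_prob) * contact_parameter


first_data_set = [
    {"sex": "M", "past_cases": {"hypertension"}},
    {"sex": "F", "past_cases": {"hypertension"}},
    {"sex": "F", "past_cases": {"hypertension", "diabetes"}},
    {"sex": "M", "past_cases": set()},
]
male_p = sex_reimbursement_probability(first_data_set, "hypertension", "M")
female_p = sex_reimbursement_probability(first_data_set, "hypertension", "F")
feature_male = characteristic_parameter(2.0, male_p)      # placeholder contact parameter
feature_female = characteristic_parameter(2.0, female_p)
```

The sex of the applicant decides which of the two probabilities is used, so the same past case can yield different characteristic parameters for men and women.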

Further, the retrieval probability is obtained by the following steps:

The retrieval probability of each piece of insurance data in the second data set is computed from the following quantities: the risk rate of that insurance data; the time of each retrieval of that insurance data; the time at which that insurance data was last retrieved; the last update time of the medical insurance information database; and the natural constant e.

Further, according to the size of the retrieval probability, primary encoded data is obtained, the primary encoded data is encoded, compressed and stored, and the quick retrieval of insurance data is further realized, and the method comprises the following specific steps:

firstly, carrying out linear normalization processing on the retrieval probabilities of all insurance data in a second data set to obtain normalized retrieval probabilities, and presetting a retrieval probability threshold according to experience;

then, recording insurance data whose normalized retrieval probability is larger than the retrieval probability threshold as primary coding data, and insurance data whose normalized retrieval probability is smaller than the threshold as non-primary coding data; obtaining the repeated characters in all primary coding data by using a character statistics method, and encoding the primary coding data and the repeated characters with the short codes of a variable-length code; encoding the non-repeated characters and the non-primary coding data with the long codes of the variable-length code, to obtain the coded compressed data corresponding to all insurance data;

and finally, storing all the coded compressed data in a medical insurance information database to realize quick retrieval of insurance data.
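The normalization and thresholding in the first two steps can be sketched as follows. Min-max scaling is assumed as the linear normalization, the threshold 0.4 is an arbitrary illustrative value, and records exactly at the threshold are placed in the non-primary group because the text does not specify that case.

```python
def min_max_normalize(values):
    """Linearly rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # all values equal; nothing to rank
    return [(v - lo) / (hi - lo) for v in values]


def partition_by_threshold(probabilities, threshold):
    """Return (primary, non_primary) index lists after normalization."""
    normalized = min_max_normalize(probabilities)
    primary = [i for i, p in enumerate(normalized) if p > threshold]
    non_primary = [i for i, p in enumerate(normalized) if p <= threshold]
    return primary, non_primary


retrieval_probabilities = [2.0, 4.0, 6.0, 10.0]
primary_idx, non_primary_idx = partition_by_threshold(retrieval_probabilities, 0.4)
```

With these inputs the normalized values are 0, 0.25, 0.5 and 1, so the last two records become primary coding data and would receive the short codes.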

The technical scheme of the invention has the beneficial effects that:

(1) Compared with a machine learning algorithm that uses a single feature, the invention combines features through the relationships between different features in the insurance data, so that the results obtained when machine learning analyzes the risk rate of the insurance data are more stable, the risk rate evaluation is more accurate, resistance to noise interference is stronger, and robustness to abnormal data is higher.

(2) The retrieval probability of the insurance data is obtained from the risk rates of different insurance data and the frequency statistics of retrievals of the insurance data, and variable-length coding with different code lengths is applied according to the magnitude of the retrieval probability, so that insurance data with a higher retrieval probability has a sufficiently small data volume after encoding and compression and is retrieved more quickly, thereby improving retrieval efficiency.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a block flow diagram of an intelligent information retrieval system for massive medical insurance data according to the present invention;

fig. 2 is a schematic diagram of a module refinement structure of the probability analysis module.

Detailed Description

In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following detailed description refers to the specific implementation, structure, characteristics and effects of an intelligent information retrieval system for mass medical insurance data according to the invention by combining the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of the intelligent information retrieval system for massive medical insurance data provided by the invention with reference to the accompanying drawings.

Referring to fig. 1, a block flow diagram of an intelligent information retrieval system for massive medical insurance data according to an embodiment of the present invention is shown. The system includes the following modules:

a data preparation module: the method is used for acquiring insurance data in the medical insurance information database and obtaining a first data set and a second data set.

In this embodiment, when medical insurance information is retrieved, the risk rate is calculated from the characteristics of the medical insurance so that the insurance data can subsequently be encoded and compressed more effectively. The insurance data and the corresponding data sets therefore need to be acquired first:

acquiring all insurance data in a medical insurance database, wherein any one insurance data comprises personal information of an applicant and corresponding insurance information, the insurance data comprises age, sex and past cases of the applicant, and a set formed by all insurance data is recorded as an insurance data set;

dividing the insurance data set into two parts, namely a first data set and a second data set, wherein the specific dividing method comprises the following steps:

recording a set formed by insurance data corresponding to medical insurance information which has been paid for in the insurance data set as a first data set, wherein the first data set contains M pieces of insurance data;

recording the set formed by all data corresponding to medical insurance currently in use by applicants in the insurance data set as a second data set, and recording the number of pieces of insurance data that the second data set contains.

In addition, the number of people of each age and each sex in the first data set who have any given past case and have been paid a claim is acquired.

So far, insurance data sets corresponding to all insurance data are obtained.

A data dividing module: used for dividing the first data set to obtain a training set and a verification set for training the machine learning model.

Insurance data in the medical insurance information database is typically retrieved for the following reasons:

(1) Insurance data that has been retrieved many times recently is in high demand for consultation in the latest period;

(2) Insurance data with a relatively high risk is likely to require a claim, so the probability of retrieving it is high.

Therefore, the embodiment analyzes the risk rate of the insurance data of different insurance applicant by machine learning, and obtains the probability of the insurance data being searched according to the risk rate of the insurance data and the search times of the recently corresponding insurance data, and marks the probability as the search probability; and then, according to the similarity relation between the insurance data with higher retrieval probability and the difference between the insurance data corresponding to the lower retrieval probability, the insurance data is subjected to variable length data coding, so that the insurance data with higher retrieval probability is in a shorter code length state as much as possible, and the corresponding insurance data can be quickly retrieved when the data is retrieved, thereby improving the retrieval efficiency.

In addition, when machine learning is used to analyze the risk rate of an applicant's insurance data, overfitting often occurs; that is, the model generalizes poorly and cannot accurately estimate the risk rate of new insurance data. The subsequent retrieval probability is then misjudged, which changes the code assignment of the variable-length coding: insurance data that actually has a high retrieval probability is assigned a longer code, and retrieval takes longer.

The risk rate refers to a risk of disease occurrence of the applicant or a risk that the insurance company needs to pay for the applicant.

Therefore, the first data set needs to be divided into a training set and a verification set, and the verification set is used to check and tune the model learned from the training set, which improves the generalization capability of the machine learning. The specific dividing method is as follows:

Firstly, features of the insurance data are extracted. Since this embodiment uses machine learning to analyze the risk rate of insurance data, features related to the risk rate need to be extracted. The factors that most commonly affect the risk rate of medical insurance are the applicant's age, sex and past cases, so the invention uses these as the features of the corresponding insurance data.

Then, all insurance data in the first data set is clustered by using the K-means++ algorithm, according to distances in the three dimensions of the corresponding applicant's age, sex and past cases, to obtain a plurality of clusters.

Finally, this embodiment adopts a conventional dividing ratio between the numbers of insurance data in the training set and in the verification set. The specific dividing method is as follows: all clusters are shuffled by using a random shuffling algorithm, and each cluster is then divided according to this ratio to obtain the training set and the verification set.

So far, the training set and the verification set have been obtained by dividing the first data set of the acquired insurance data.
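For readers unfamiliar with K-means++, its distinguishing step is the seeding of the initial cluster centers, sketched below in plain Python. Encoding sex and past cases as numbers in a three-component tuple is an assumption made for illustration, as are the function names; the text does not describe how these dimensions are encoded or scaled.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial centers: the first uniformly at random, each later one with
    probability proportional to its squared distance from the nearest chosen center.

    Assumes `points` contains at least k distinct points.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# (age, sex code, past-case code): the numeric codes are hypothetical
points = [(25, 0, 0), (30, 1, 0), (62, 0, 1), (70, 1, 2), (45, 0, 1)]
seeds = kmeans_pp_seeds(points, k=3)
```

Because an already chosen point has zero weight, the seeds are distinct and tend to be spread out, after which the ordinary K-means iterations proceed from these centers.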

Probability analysis module: the method is used for training and verifying machine learning, acquiring the risk rate of insurance data and further acquiring the retrieval probability of the insurance data.

Specifically, as shown in fig. 2, a schematic diagram of a module refinement structure of the probability analysis module includes: a multi-data set unit, a contact parameter unit, a characteristic parameter unit, a risk rate unit and a retrieval probability unit.

A multi-data set unit: In order to make retrieval of the insurance data faster, this embodiment analyzes the relationships among the basic characteristics of medical insurance for all insurance data in the training set, evaluates the risk rate of the insurance data with a machine learning model, and then uses the risk rate to obtain the retrieval probability of different insurance data.

When selecting the machine learning model, this embodiment chooses a logistic regression model to evaluate the risk rate of insurance data, because assessing the risk of medical insurance is essentially a binary classification problem. Existing logistic regression models usually evaluate the risk of insurance data using single low-level features, and the result is not accurate enough; this embodiment therefore uses multi-feature fusion so that the risk rate evaluation is more accurate.

Among the characteristics of medical insurance, the past case is a multi-valued parameter and a direct influence on the risk rate of insurance data, so this embodiment constructs contact parameters linking the different past cases with age and sex, and uses them as independent variables to build the logistic regression model.

The case names of the past cases of the applicants corresponding to different insurance data in the training set are extracted to obtain the multi-element data set of past cases. Each element of the set is the case name of one past case in the training set, and the number of elements is the total number of case names of all past cases in the training set.

Contact parameter unit: used for analyzing the relationships between the insurance data and obtaining the contact parameters.

Firstly, according to the relationship between each past case and different ages, the correlation factor between the past cases and age in the first data set is obtained. It is computed, using the hyperbolic tangent function, from the following quantities: the total number of people of each age in the training set who have a given past case, where the age ranges over the age interval of the applicants corresponding to all insurance data in the training set; the maximum age of the applicants corresponding to the insurance data in the training set; and the mean of the number of people over all past cases.

Then, for each of the two age intervals considered, the ratio between the total number of people in the first data set within that age interval who have a given past case and were paid a claim, and the total number of people who have that past case, is recorded as the reimbursement probability. The contact parameter between the past case and age is then obtained by combining the correlation factor with the following quantities: the total number of people of each age in the training set who have that past case; the maximum age of the applicants corresponding to the insurance data in the training set; and the reimbursement probability of that past case within each of the two age intervals.

When obtaining the contact parameter between a given past case and age, this embodiment introduces three logical relationships: "age is irrelevant to the past case", "the incidence of the past case is linked to younger ages relative to older ages", and "the incidence of the past case is linked to older ages relative to younger ages".

The logical relationship "age is irrelevant to the past case" is represented by the correlation factor, which acts as a constraint between age and the incidence of the past case. It is calculated by normalizing the variance of the number of occurrences of each past case across the different age stages: the smaller the variance, the more evenly the past case occurs at every age stage, and hence the weaker the correlation between the past case and age;

the larger the variance, the more the incidence of the past case is concentrated in a certain age group, that is, the stronger the correlation between the past case and age.

Two different values of the corresponding quantity are then used to represent the relationships "the incidence of the past case is linked to younger ages relative to older ages" and "the incidence of the past case is linked to older ages relative to younger ages".

Taking the first of these relationships as an example, if the past case occurs more at younger ages than at older ages and the correlation factor is larger, the relationship between the past case and age is stronger. Multiplying by the reimbursement probability then gives the contact parameter between the past case and age: the larger the contact parameter, the greater the probability that a person at that age stage who has the past case will be paid a claim, and the greater the probability that the corresponding data will be retrieved from the database.

In the subsequent machine learning training and verification, the corresponding logical relationship is selected according to the age of the applicant corresponding to each insurance data.
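The idea of a correlation factor obtained by squashing a variance with the hyperbolic tangent can be sketched as follows. This is only an illustration of that idea: the function name is hypothetical, and the sketch assumes the variance is taken around the mean of one past case's per-age counts, whereas the exact formula and the mean actually used by the method are not recoverable from this text.

```python
import math


def correlation_factor(counts_by_age):
    """tanh of the variance of the per-age head counts for one past case.

    A value near 0 means the case is spread evenly over ages (weak link to age);
    a value near 1 means the case is concentrated at some ages (strong link).
    """
    mean = sum(counts_by_age) / len(counts_by_age)
    variance = sum((c - mean) ** 2 for c in counts_by_age) / len(counts_by_age)
    return math.tanh(variance)


even = correlation_factor([5, 5, 5, 5])         # same count at every age
mild = correlation_factor([0, 0, 1, 1])         # slightly more cases at older ages
strong = correlation_factor([0, 0, 2, 2])       # clearly more cases at older ages
```

The hyperbolic tangent maps any non-negative variance into the interval from 0 to 1, which is the normalization role described above.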

Characteristic parameter unit: used for obtaining the characteristic parameters by combining the contact parameters, and training the logistic regression model with the characteristic parameters to obtain the logistic regression model for the risk rate of the insurance data.

Firstly, the contact parameter between age and the past case is linked with sex to obtain the characteristic parameter of the past case. The characteristic parameter of a given past case is computed from the contact parameter between that past case and age, together with the male reimbursement probability and the female reimbursement probability of that past case.

For either sex, the greater the reimbursement probability of a given past case, the greater the likelihood that the corresponding insurance data will be retrieved.

For example: hypertension is a common disorder, but the likelihood of making an insurance claim is different for different sexes of different ages, i.e., the higher the risk, the higher the likelihood of making the claim, the higher the probability that the corresponding insurance data will be retrieved.

Finally, taking the characteristic parameters of all past cases in the training set as independent variables, a logistic regression model is trained, and the trained logistic regression model is optimized by using the characteristic parameters of all past cases in the verification set, to obtain the logistic regression model for risk rate assessment of insurance data.

It should be noted that, training and optimization of the logistic regression model are performed in the prior art, and are not repeated in this embodiment.
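For completeness, a minimal sketch of training a logistic regression on a single characteristic parameter is given below. It uses plain batch gradient descent with arbitrary learning rate and epoch count, made-up feature values and labels, and a simple accuracy check standing in for the verification step; none of these details are specified in the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit risk = sigmoid(w * x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def risk_rate(x, w, b):
    """Model output for one characteristic parameter, a value between 0 and 1."""
    return sigmoid(w * x + b)


def accuracy(xs, ys, w, b):
    """Fraction of records whose thresholded risk matches the label."""
    hits = sum(1 for x, y in zip(xs, ys) if (risk_rate(x, w, b) > 0.5) == bool(y))
    return hits / len(xs)


# Illustrative data: characteristic parameter -> 1 if a claim was paid, else 0
train_x, train_y = [0.1, 0.3, 0.5, 1.5, 1.8, 2.2], [0, 0, 0, 1, 1, 1]
val_x, val_y = [0.2, 2.0], [0, 1]
w, b = train_logistic(train_x, train_y)
val_accuracy = accuracy(val_x, val_y, w, b)
```

The verification-set accuracy gives a simple signal for adjusting the training settings when the model fits the training set but generalizes poorly.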

Risk rate unit: the characteristic parameters of all past cases in the second data set are acquired and used as the input of the logistic regression model; the output is the risk rate corresponding to each past case, which is taken as the risk rate of the insurance data having that past case and recorded as the risk rate of the corresponding piece of insurance data in the second data set.

Retrieval probability unit: used for obtaining the retrieval probability of the insurance data in the second data set.

Acquiring the corresponding time when each insurance data in the medical insurance database is searched and the corresponding time when the medical insurance database is updated; the retrieval probability of the insurance data in the second data set is obtained by combining the risk rate of the insurance data, and the specific obtaining method is as follows:

The retrieval probability is obtained from the interaction of two parts:

(1) The first part is the risk rate of each piece of insurance data in the second data set as evaluated by the machine learning model: the greater the risk rate, the more likely the insurance data is to require a claim payment, and the more likely it is to be retrieved, that is, the greater the retrieval probability;

(2) The second part is the retrieval history of each piece of insurance data in the second data set: the more frequently the insurance data has been retrieved, and the closer its retrievals are to the last update time of the medical insurance information database, the more likely it is to be retrieved again.

So far, the retrieval probability of the insurance data in the second data set is obtained.
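The qualitative behaviour of the two parts can be illustrated with the sketch below. It is not the formula of the method, which is not recoverable from this text: the function name, the use of a sum of exponential decay terms, and measuring times as plain numbers in a common unit are all assumptions chosen only so that the score grows with the risk rate, with the number of retrievals, and with their closeness to the last database update.

```python
import math


def retrieval_score(risk_rate, retrieval_times, last_update_time):
    """Illustrative score: risk rate times a recency-weighted retrieval count.

    Each retrieval contributes exp(-|gap|), where gap is its distance in time
    from the last database update, so recent retrievals count almost fully
    and distant ones count for little.
    """
    recency = sum(math.exp(-abs(last_update_time - t)) for t in retrieval_times)
    return risk_rate * recency


base = retrieval_score(0.5, [10.0], 10.0)              # one retrieval at the update time
more_frequent = retrieval_score(0.5, [9.0, 10.0], 10.0)
less_recent = retrieval_score(0.5, [5.0], 10.0)
higher_risk = retrieval_score(0.9, [10.0], 10.0)
```

Scores of this kind are not yet probabilities; the linear normalization applied by the data storage module brings them into a common range before the threshold is used.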

A data storage module: used for intelligently encoding the insurance data and storing the encoded compressed data, thereby realizing intelligent information retrieval of the insurance data.

Classifying the insurance data according to the retrieval probability, and performing variable length coding on the insurance data corresponding to the higher retrieval probability, wherein the specific method comprises the following steps:

firstly, linear normalization is applied to the retrieval probabilities of all insurance data in the second data set to obtain normalized retrieval probabilities, and a retrieval probability threshold is preset according to experience; its empirical value is defined in terms of the amount of insurance data in the second data set;

finally, storing all the coded compressed data in a medical insurance information database, so that the staff can conveniently search insurance data;

it should be noted that, the character statistics method and the variable length coding are both the prior art, and this embodiment is not repeated.

All insurance data is encoded and compressed in combination with the retrieval probability. Because insurance data with a higher retrieval probability is variable-length coded according to the frequency with which its data occurs, its code length after encoding and compression is shorter and the overall data volume is smaller, so a retrieval can be completed by sentence matching in a shorter time.
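The text does not name a particular variable-length code. As one well-known example, the sketch below builds a Huffman code, in which symbols with larger weights receive shorter bit strings. Using Huffman coding here, and weighting characters by their frequency, are assumptions made for illustration; the function name is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(weights):
    """Map each symbol to a prefix-free bit string; heavier symbols get shorter codes."""
    if len(weights) == 1:
        return {next(iter(weights)): "0"}
    # Heap entries: (weight, tie-breaker, {symbol: partial code})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)   # two lightest subtrees
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


# Character statistics over some primary coding data (illustrative string)
frequencies = Counter("aaaabbc")
codes = huffman_code(frequencies)
encoded = "".join(codes[ch] for ch in "aaaabbc")
```

In this example the most frequent character is coded with a single bit and the rarer ones with two bits, so the seven characters are stored in ten bits.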

The model used in this embodiment serves only to express a negative correlation and to constrain the model output to a fixed interval. In a specific implementation it may be replaced by other models serving the same purpose; this embodiment describes that model only as an example and places no specific limitation on it. The argument of the model denotes its input.

The foregoing is merely a description of the preferred embodiments of the invention and is not intended to limit the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. An intelligent information retrieval system for massive medical insurance data is characterized by comprising the following modules:

a data storage module: primary encoded data are obtained according to the magnitude of the retrieval probability, and the primary encoded data are encoded, compressed, and stored, thereby realizing rapid retrieval of insurance data;

the probability analysis module comprises the following units:

2. The intelligent information retrieval system for massive medical insurance data according to claim 1, wherein the first data set and the second data set are obtained by the following steps:

3. The intelligent information retrieval system for massive medical insurance data according to claim 1, wherein the training set and the verification set are obtained by the following steps:

then, shuffling all clusters using a random shuffling algorithm;

4. The intelligent information retrieval system for massive medical insurance data according to claim 1, wherein the correlation factor is obtained by the following method:

wherein the symbols denote, in order: the correlation factor between a given past case and age; the total number of applicants of a given age in the training set who have that past case, where the age lies within the age interval of the applicants corresponding to all insurance data in the training set; the maximum age of the applicants corresponding to the insurance data in the training set; the mean of the number of people over all past cases; and the hyperbolic tangent function.

5. The intelligent information retrieval system for massive medical insurance data according to claim 1, wherein the contact parameters are obtained by the following steps:

wherein the symbols denote, in order: the contact parameter between a given past case and age; the total number of applicants of a given age in the training set who have that past case; the maximum age of the applicants corresponding to the insurance data in the training set; the probability of reimbursement for that past case within a given age interval; the odds for that past case within that age interval; and the correlation factor between that past case and age.

6. The intelligent information retrieval system for massive medical insurance data according to claim 1, wherein the characteristic parameters are obtained by the following steps:

7. The intelligent information retrieval system for massive medical insurance data according to claim 1, wherein the retrieval probability is obtained by the following steps:

wherein the symbols denote, in order: the retrieval probability of a given item of insurance data in the second data set; the risk rate of that item of insurance data; the time of a given retrieval of that item of insurance data; the time when that item of insurance data was last retrieved; the last update time of the medical insurance information database; and the natural constant.

8. The intelligent information retrieval system for massive medical insurance data according to claim 1, wherein obtaining the primary encoded data according to the magnitude of the retrieval probability, and encoding, compressing, and storing the primary encoded data to realize rapid retrieval of the insurance data, comprises the following specific steps: