WO2020177365A1 - Data mining-based social insurance data processing method and apparatus, and computer device

Info

Publication number: WO2020177365A1
Application number: PCT/CN2019/116126
Authority: WO
Inventors: 陈娴娴; 阮晓雯; 徐亮
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-03-07
Filing date: 2019-11-07
Publication date: 2020-09-10
Also published as: CN110008250A; CN110008250B

Abstract

A data mining-based social insurance data processing method, comprising: receiving a resource obtaining request sent by a terminal, wherein the resource obtaining request comprises a request type and request information; obtaining multiple pieces of social insurance data according to the resource obtaining request and the request information, wherein the social insurance data comprises multiple pieces of field data; inputting the social insurance data into a vector training model, and vectorizing the multiple pieces of field data corresponding to the social insurance data by means of the vector training model to output feature vectors corresponding to the multiple pieces of field data; extracting dimension values of the multiple feature vectors, calculating the similarities between the multiple feature vectors by a preset algorithm according to the dimension values, and extracting feature vectors having a similarity reaching a preset threshold; obtaining a data analysis model according to the request type, and analyzing the extracted feature vectors by means of the data analysis model to obtain multiple types of index data and corresponding values; and generating analysis result data on the basis of the multiple types of index data and the corresponding values, and pushing the analysis result data to the terminal.

Description

Social security data processing method, device and computer equipment based on data mining

Cross-references to related applications:

This application claims priority to Chinese patent application No. 2019101716064, filed with the Chinese Patent Office on March 7, 2019 and titled "Social Security Data Processing Methods, Devices and Computer Equipment Based on Data Mining", the entire contents of which are incorporated in this application by reference.

Technical field

This application relates to a social security data processing method, device and computer equipment based on data mining.

Background technique

With the rapid economic development, social insurance has become an important part of the people's livelihood economy. With the continuous development of computer technology, various business processes such as registration of social insurance personnel, collection of social insurance funds, and payment of social insurance funds have all been networked and informatized, and the social insurance business system has also accumulated a large amount of social insurance data.

However, most existing methods of mining social security data only query the data and perform simple processing, without deeper analysis and mining of these large amounts of social security data. Moreover, the volume of social security data is large, and the information it contains is complicated and redundant. When a large amount of social security data is mined and analyzed, the mining depth tends to be insufficient and the process chaotic, so the efficiency and accuracy of data mining are low.

Summary of the invention

According to various embodiments disclosed in the present application, a social security data processing method, device and computer equipment based on data mining are provided.

A social security data processing method based on data mining includes:

Receiving a resource acquisition request sent by the terminal, where the resource acquisition request includes a request type and request information;

Acquiring multiple social security data according to the resource acquisition request and request information, the social security data including multiple field data;

Inputting the social security data into a vector training model, performing vector processing on multiple field data corresponding to the social security data through the vector training model, and outputting feature vectors corresponding to the multiple field data;

Extracting the dimension values of the multiple feature vectors, calculating the similarity between the multiple feature vectors by using a preset algorithm according to the dimension value, and extracting the feature vectors whose similarity reaches a preset threshold;

Obtaining a preset data analysis model according to the request type, and analyzing the extracted feature vectors through the data analysis model to obtain multiple types of index data and corresponding values; and

Generating analysis result data according to the multiple types of index data and corresponding values, and pushing the analysis result data to the terminal.

A social security data processing device based on data mining includes:

The request receiving module is configured to receive a resource acquisition request sent by the terminal, where the resource acquisition request includes the request type and request information;

A data acquisition module, configured to acquire multiple social security data according to the resource acquisition request and request information, the social security data including multiple field data;

The feature extraction module is configured to input the social security data into a vector training model, perform vector processing on multiple field data corresponding to the social security data through the vector training model, and output feature vectors corresponding to the multiple field data; and to extract the dimension values of the multiple feature vectors, calculate the similarity between the multiple feature vectors by using a preset algorithm according to the dimension values, and extract the feature vectors whose similarity reaches a preset threshold;

The data analysis module is used to obtain a preset data analysis model according to the request type, and analyze the extracted feature vectors through the data analysis model to obtain multiple types of index data and corresponding values; and

The data push module is configured to generate analysis result data according to the multiple types of index data and corresponding values, and push the analysis result data to the terminal.

A computer device, including a memory and one or more processors, where the memory stores computer-readable instructions and, when the computer-readable instructions are executed by the one or more processors, the one or more processors execute the following steps:

One or more non-volatile computer-readable storage media storing computer-readable instructions, where, when the computer-readable instructions are executed by one or more processors, the one or more processors execute the following steps:

The details of one or more embodiments of the application are set forth in the following drawings and description. Other features and advantages of this application will become apparent from the description, drawings and claims.

Description of the drawings

In order to more clearly describe the technical solutions in the embodiments of the present application, the following will briefly introduce the drawings needed in the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.

FIG. 1 is an application scenario diagram of a social security data processing method based on data mining according to one or more embodiments;

FIG. 2 is a schematic flowchart of a social security data processing method based on data mining according to one or more embodiments;

FIG. 3 is a schematic flowchart of the step of vectorizing multiple field data corresponding to social security data according to one or more embodiments;

FIG. 4 is a schematic flowchart of the step of analyzing the extracted feature vectors through the data analysis model according to one or more embodiments;

FIG. 5 is a block diagram of a social security data processing device based on data mining according to one or more embodiments;

FIG. 6 is a block diagram of a computer device according to one or more embodiments.

Detailed description

In order to make the technical solutions and advantages of the present application clearer, the following further describes the present application in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application, and not used to limit the application.

The social security data processing method based on data mining provided in this application can be applied to the application environment shown in FIG. 1. The terminal 102 communicates with the server 104 through a network. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, or portable wearable device. The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers. The terminal 102 may send a resource acquisition request to the server 104, the resource acquisition request including a request type and request information. After the server 104 receives the resource acquisition request sent by the terminal 102, it acquires multiple pieces of social security data according to the resource acquisition request and the request information it carries, each piece of social security data including multiple field data. The server 104 then vectorizes the multiple field data corresponding to the social security data to obtain feature vectors corresponding to the multiple field data. The server 104 calculates the similarity between the multiple feature vectors according to a preset algorithm and extracts the feature vectors whose similarity reaches a preset threshold. The server 104 further obtains a preset data analysis model, analyzes the extracted feature vectors through the data analysis model, obtains the corresponding analysis result data, and pushes the analysis result data to the corresponding terminal 102. By performing feature extraction and screening on a large amount of social security data and analyzing the extracted valuable feature vectors with the data analysis model, the valuable information in the social security data can be mined effectively, thereby improving the efficiency and accuracy of social security data analysis.

In one of the embodiments, as shown in FIG. 2, a method for processing social insurance data based on data mining is provided. Taking the method applied to the server in FIG. 1 as an example for description, the method includes the following steps:

Step 202: Receive a resource acquisition request sent by a terminal, where the resource acquisition request includes the request type and request information.

The user can input the relevant field information through the corresponding terminal and send a resource acquisition request to the server. The resource acquisition request may be a request for the result data obtained by analyzing the social security data. The resource acquisition request carries the request type and request information, where the request type may be the type of resource data to be acquired, such as social security analysis data. The request information may be field information input by the user, such as the scope and time interval of the social security data.

Step 204: Acquire multiple social security data according to the resource acquisition request and the request information, and the social security data includes multiple field data.

The social security data may be social insurance data, which may include, for example, endowment insurance data, medical insurance data, unemployment insurance data, work injury insurance data, and maternity insurance data. After the server receives the resource acquisition request sent by the terminal, it acquires multiple pieces of social security data from a local database or a third-party database according to the resource acquisition request and the request information. For example, when the scope of the social security data specified in the request information is a certain company, the server obtains the social security data corresponding to that company. The social security data includes multiple field data, such as name, gender, age, region, affiliated company, payment duration, payment amount and other field information.
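As a minimal sketch of Steps 202 and 204, the request and the record selection might look like the following. The class name `ResourceRequest`, the function `acquire_social_security_data`, the field names, and the in-memory `RECORDS` list are all hypothetical stand-ins for the request format and the local or third-party database, which the application does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceRequest:
    """Hypothetical shape of a resource acquisition request."""
    request_type: str                                   # e.g. "social_security_analysis"
    request_info: dict = field(default_factory=dict)    # field information entered by the user


# Hypothetical in-memory stand-in for the local or third-party database.
RECORDS = [
    {"name": "A", "age": 34, "company": "Acme", "payment_months": 60, "payment_amount": 1200.0},
    {"name": "B", "age": 45, "company": "Acme", "payment_months": 120, "payment_amount": 1500.0},
    {"name": "C", "age": 29, "company": "Other", "payment_months": 24, "payment_amount": 900.0},
]


def acquire_social_security_data(request, records):
    """Return the records whose fields match every entry in request_info."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in request.request_info.items())
    ]


request = ResourceRequest("social_security_analysis", {"company": "Acme"})
selected = acquire_social_security_data(request, RECORDS)
```

Here the request scope is a single company, so only the records of that company are returned, each carrying its own field data.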

Step 206: Input the social security data into the vector training model, perform vector processing on the multiple field data corresponding to the social security data through the vector training model, and output feature vectors corresponding to the multiple field data.

After obtaining multiple pieces of social security data, the server vectorizes the multiple field data corresponding to the social security data. Specifically, the server may obtain a preset corpus and obtain associated corpus data from the corpus according to the social security data. The server further obtains a preset vector training model. For example, the vector training model may be a neural network model based on word2vec. The server inputs the social security data and the obtained associated corpus data into the vector training model, uses the vector training model together with the associated corpus data to perform calculation and training on the social security data to obtain multiple word vectors corresponding to the social security data, and converts the word vectors into the corresponding feature vectors according to a preset algorithm. In this way, feature vectors corresponding to the multiple field data can be obtained.

Step 208: Extract the dimensional values of the multiple feature vectors, calculate the similarity between the multiple feature vectors according to the dimensional values using a preset algorithm, and extract the feature vectors whose similarity reaches the preset threshold.

After obtaining the feature vectors corresponding to the multiple field data, the server calculates the similarity between the multiple feature vectors according to a preset algorithm. Specifically, the server may first calculate the dimension values of the multiple feature vectors according to a preset objective function, where the dimension values may be the feature values of the different dimensions of each feature vector. The server then calculates the similarity between the multiple feature vectors according to a preset distance algorithm and the dimension values of the feature vectors, and extracts the feature vectors whose similarity reaches the preset threshold.

Step 210: Obtain a preset data analysis model according to the request type, and analyze the extracted feature vector through the data analysis model to obtain multiple types of index data and corresponding values.

After the server extracts the feature vectors, it obtains the corresponding preset data analysis model according to the request type. The data analysis model can include multiple different types of indicator data analysis modules, such as insurance payment rate, payment base analysis, and business operation status. The server analyzes the extracted feature vectors through the data analysis model.

Specifically, the server may first use the data analysis model to calculate the distribution value and field saturation of the multiple feature vectors, where the distribution value may be the value of the field data corresponding to the feature vector, and the field saturation may be the degree of saturation of the values of the multiple preset index data corresponding to the feature vector and field data. The server further performs statistical screening on the multiple feature vectors through the data analysis model and extracts the feature vectors that reach a preset saturation value. The server performs semantic analysis on the extracted feature vectors according to a preset semantic analysis algorithm and obtains the weight of each feature vector, that is, the importance value of the feature vector. The server then analyzes the multiple feature vectors according to the distribution value, field saturation, and weight of each feature vector to obtain multiple types of index data and the values corresponding to the feature vectors.

Step 212: Generate analysis result data according to multiple types of index data and corresponding values, and push the analysis result data to the terminal.

After the server obtains the multiple types of index data and corresponding values for each feature vector, it generates the analysis result data from them and pushes the analysis result data to the corresponding terminal. Further, the server can also generate view data in a preset format from the analysis result data and push the generated view data to the corresponding terminal, so that the user can clearly understand the analysis result data.

For example, when the social security data obtained is the social security data of a certain company or a certain area, mining and analyzing the multiple pieces of social security data obtained can effectively yield indicator data such as the insurance payment rate, payment base analysis, and business operation status. By performing feature extraction and screening on a large amount of social security data and analyzing the extracted valuable feature vectors with the data analysis model, the valuable information in the social security data can be mined and analyzed effectively, thereby improving the efficiency and accuracy of social security data analysis.

In the aforementioned data mining-based social security data processing method, after the server receives the resource acquisition request sent by the terminal, it acquires multiple pieces of social security data according to the resource acquisition request and the request information it carries, each piece of social security data including multiple field data. The server then inputs the social security data into the vector training model, vectorizes the multiple field data corresponding to the social security data through the vector training model, and outputs feature vectors corresponding to the multiple field data. The dimension values of the multiple feature vectors are extracted, the similarity between the multiple feature vectors is calculated from the dimension values using a preset algorithm, and the feature vectors whose similarity reaches the preset threshold are extracted. The server further obtains the preset data analysis model, analyzes the extracted feature vectors through the data analysis model to obtain multiple types of indicator data and corresponding values, generates analysis result data based on the multiple types of indicator data and corresponding values, and pushes the analysis result data to the corresponding terminal. By performing feature extraction and screening on a large amount of social security data and analyzing the extracted valuable feature vectors with the data analysis model, the valuable information in the social security data can be mined effectively, thereby improving the efficiency and accuracy of social security data analysis.

In one of the embodiments, as shown in FIG. 3, the steps of performing vector processing on multiple field data corresponding to social insurance data through the vector training model specifically include the following:

Step 302: Obtain a preset corpus, and obtain associated corpus data from the corpus according to the social security data.

The terminal may send a resource acquisition request to the server, and the resource acquisition request carries the request type and request information. After receiving the resource acquisition request sent by the terminal, the server acquires the corresponding pieces of social security data from a local database or a third-party database according to the resource acquisition request and the request information, each piece of social security data including multiple field data.

After the server obtains multiple social security data, it then obtains a preset corpus. The corpus can be a pre-set corpus that includes a variety of words or sentences related to social insurance.

Step 304: Obtain a preset vector training model, and perform word vector calculation and training on the social security data and corpus data through the vector training model to obtain multiple corresponding word vectors.

Step 306: Convert multiple word vectors into corresponding feature vectors according to a preset algorithm.

The server further obtains a preset vector training model and inputs the social security data and the corpus data into the vector training model. For example, the vector training model may be a neural network model based on word2vec. The vector training model performs word vector calculation and training on the social security data and the corpus data to obtain the word vectors corresponding to the multiple pieces of social security data. Through word vector training, each word is mapped to a vector in n-dimensional space. For example, when n is 2, the word vector for "body" may be [0.5365654, 0.726268] and the word vector for "part" may be [0.52222458, 0.7511456]. The cosine value of these two vectors is very close to 1, so they are very close in the semantic space, which indicates that "body" and "part" together form one word, "identity". If n is 100, each word is transformed into a 100-dimensional vector. Vectorizing the social security data with the word vector model makes it possible to extract the word vectors in the social security data accurately and effectively.
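The closeness of the two example word vectors can be checked with a short cosine computation. This is only an illustrative sketch; the helper name `cosine_similarity` is an assumption and not part of the application.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


body = [0.5365654, 0.726268]      # example word vector for "body" from the text
part = [0.52222458, 0.7511456]    # example word vector for "part" from the text
similarity = cosine_similarity(body, part)
```

For these two vectors the cosine is above 0.99, which matches the statement that they lie close together in the semantic space.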

After the server extracts the word vector in the social security data, it further converts the word vector into a corresponding feature vector according to a preset algorithm. For example, a preset vector representation method can be used to convert a word vector into a corresponding feature vector. This can effectively extract the feature vector corresponding to the social security data.

In one of the embodiments, calculating the similarity between multiple feature vectors by using a preset algorithm according to the dimension values, and extracting the feature vectors whose similarity reaches the preset threshold, includes: calculating the dimension values of the multiple feature vectors according to a preset objective function; calculating the similarity between the multiple feature vectors according to a preset distance algorithm and the dimension values; and extracting the feature vectors whose similarity reaches the preset threshold.

The server vectorizes the multiple field data corresponding to the social security data, thereby obtaining the feature vectors corresponding to the multiple field data. The server further calculates the similarity between the multiple feature vectors according to a preset algorithm. Specifically, the server may calculate the dimension values of the multiple feature vectors according to a preset objective function, calculate the similarity between the multiple feature vectors according to a preset distance algorithm and the dimension values, and then extract the feature vectors whose similarity reaches the preset threshold. For example, the preset distance algorithm may be the Euclidean distance algorithm.

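For example, the calculation formula of the Euclidean distance function can take the standard form:

$$P(A_i, B_i) = \sqrt{\sum_{j=1}^{m} \left(a_{ij} - b_{ij}\right)^2}$$

where $a_{ij}$ and $b_{ij}$ are the $j$-th dimension values of the vectors $A_i$ and $B_i$, and $m$ is the number of dimensions.

The expression of the objective function can be:

$$B_k = \arg\min P(A_i, B_i)$$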

The objective function is used to minimize the value of P(A_i, B_i). The values of three statistics, Max, Min, and Mean, are extracted across the word vectors of a text. For example, when n is 2, "body" is represented as [0.5, 0.2], "part" is represented as [0.1, 0.7], and "certificate" is represented as [0.2, 0.5]. Max is the maximum value in each dimension of the extracted vectors: since 0.5 > 0.2 > 0.1 in the first dimension and 0.7 > 0.5 > 0.2 in the second dimension, Max corresponds to [0.5, 0.7]. In the same way, Min corresponds to [0.1, 0.2] and Mean corresponds to [0.8/3, 1.4/3]. These three vectors are then concatenated horizontally, so the short text "ID" can be represented by the 6-dimensional vector [0.5, 0.7, 0.1, 0.2, 0.8/3, 1.4/3]. Similarly, if the short text is "insurance statement", it can also be represented by a 6-dimensional vector. Therefore, no matter how long the short text is, it can be represented by a 3*n-dimensional vector. The similarity between texts can then be obtained by calculating the Euclidean distance between the vectors that represent them.
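The Max, Min, and Mean pooling and the Euclidean distance described above can be sketched as follows. The function names `pool_word_vectors` and `euclidean_distance`, and the word vectors of the second text, are hypothetical; only the three vectors for "body", "part", and "certificate" come from the text.

```python
import math


def pool_word_vectors(word_vectors):
    """Concatenate per-dimension Max, Min and Mean into one 3*n vector."""
    columns = list(zip(*word_vectors))                  # one tuple of values per dimension
    maxima = [max(col) for col in columns]
    minima = [min(col) for col in columns]
    means = [sum(col) / len(col) for col in columns]
    return maxima + minima + means


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Word vectors for "body", "part" and "certificate" from the text (n = 2).
id_text = [[0.5, 0.2], [0.1, 0.7], [0.2, 0.5]]
id_vector = pool_word_vectors(id_text)

# A second short text with hypothetical word vectors, for comparison only.
other_text = [[0.4, 0.3], [0.2, 0.6]]
distance = euclidean_distance(id_vector, pool_word_vectors(other_text))
```

Because the pooled vector always has 3*n entries, texts of different lengths (three words and two words here) can be compared directly.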

After the server calculates the similarity between the multiple feature vectors, it extracts the feature vectors whose similarity reaches the preset threshold. Calculating the similarity between the feature vectors through the preset objective function and distance algorithm, and then extracting the feature vectors whose similarity reaches the preset threshold, makes it possible to extract the features of the social security data effectively.
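A minimal sketch of the threshold step follows. The application does not state how a distance is converted into a similarity or which vectors of a similar pair are kept, so two assumptions are made here: similarity is taken as 1 / (1 + distance), and a vector is kept when it reaches the threshold with at least one other vector. The field names and vector values are hypothetical.

```python
import math
from itertools import combinations


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def similarity(u, v):
    """Assumed mapping from distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(u, v))


def extract_similar(vectors, threshold):
    """Keep the names of vectors that reach the threshold with at least one other vector."""
    kept = set()
    for (name_a, vec_a), (name_b, vec_b) in combinations(vectors.items(), 2):
        if similarity(vec_a, vec_b) >= threshold:
            kept.update((name_a, name_b))
    return sorted(kept)


# Hypothetical feature vectors keyed by field name.
feature_vectors = {
    "payment_amount": [0.50, 0.70],
    "payment_base": [0.52, 0.72],
    "region": [0.05, 0.10],
}
selected = extract_similar(feature_vectors, threshold=0.9)
```

With these values only the two payment-related vectors are close enough to each other to be kept; a different similarity mapping or threshold would change the selection.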

In one of the embodiments, before obtaining the preset data analysis model according to the request type, the method further includes: obtaining multiple pieces of sample social security data, the sample social security data including multiple field data; vectorizing the sample social security data to obtain the feature vectors corresponding to the multiple field data; clustering the multiple feature vectors, calculating the correlation between feature vectors and the weight of each feature vector according to the clustering results, and extracting the feature vectors that meet a condition threshold; and constructing a data analysis model according to a preset algorithm by using the extracted feature vectors and the corresponding weights.

Before the server obtains the preset data analysis model, it needs to construct the data analysis model. Specifically, the server can obtain a large amount of sample social security data in advance and first perform vector processing on it, so that feature vectors corresponding to the multiple field data in the sample social security data are obtained. After the server vectorizes the sample social security data, it performs feature extraction on the data. Specifically, the server can perform cluster analysis on the multiple feature vectors through a preset clustering algorithm, calculate the correlation between feature vectors and the weight of each feature vector, and then extract the feature vectors that meet the preset condition threshold. The server then constructs a data analysis model according to a preset algorithm from the extracted feature vectors and the corresponding weights. The data analysis model may include multiple different types of indicator data analysis modules, such as insurance premium payment rate, payment base analysis, and business operation status. By analyzing and extracting features from a large amount of sample social security data and using the extracted valuable feature vectors to construct the data analysis model, the accuracy of the data analysis model can be effectively improved.

In one of the embodiments, as shown in FIG. 4, the step of analyzing the extracted feature vector through the data analysis model specifically includes the following content:

Step 402: Calculate the distribution values and field saturations of multiple feature vectors through the data analysis model.

Step 404: Perform feature field screening on multiple feature vectors, and extract feature vectors that reach a preset saturation value.

The server vectorizes the multiple field data corresponding to the social security data, thereby obtaining feature vectors corresponding to the multiple field data. The server further calculates the similarity between the multiple feature vectors according to the preset algorithm, and extracts the feature vectors whose similarity reaches the preset threshold.

The server performs feature extraction on the social security data and, after extracting the corresponding feature vectors, obtains a preset data analysis model according to the request type in the resource acquisition request and analyzes the extracted feature vectors through the data analysis model. Specifically, after the server obtains the preset data analysis model, it inputs the feature vectors corresponding to the extracted field data into the data analysis model, calculates the distribution value and field saturation of the field data through the data analysis model, and performs feature field screening on the field data; for example, it may perform statistical screening on the field data to extract the feature vectors that reach a preset saturation value. The distribution value may be the value of the field data corresponding to the feature vector.

For example, when a certain field is age, the distribution value of the field data can be the distribution of the number of people in each age group, such as 10-20, 20-30, and 30-40. Field saturation can be the degree to which the values of the multiple preset index data corresponding to the feature vector and field data are filled in. For example, the input data may be partly unsaturated: if a field is empty in some records, the saturation of that field's data is relatively low. Therefore, the server needs to perform statistical exploration on the feature vectors corresponding to the field data in order to perform a secondary field screening.
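The distribution value and field saturation can be illustrated with a small sketch. It assumes that saturation is the fraction of records in which a field is non-empty, which is one plausible reading of the description; the helper names, the records, and the preset saturation value of 0.7 are hypothetical.

```python
from collections import Counter


def age_distribution(records, width=10):
    """Count people per age bucket such as '20-30', skipping empty ages."""
    buckets = Counter()
    for record in records:
        age = record.get("age")
        if age is None:
            continue
        low = (age // width) * width
        buckets[f"{low}-{low + width}"] += 1
    return dict(buckets)


def field_saturation(records, field_name):
    """Fraction of records in which the field is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field_name) not in (None, ""))
    return filled / len(records)


# Hypothetical records; None or "" marks an empty field.
records = [
    {"age": 15, "region": "east"},
    {"age": 24, "region": ""},
    {"age": 27, "region": "west"},
    {"age": None, "region": ""},
]
distribution = age_distribution(records)
saturation = {name: field_saturation(records, name) for name in ("age", "region")}
# Secondary field screening: keep fields that reach the assumed preset saturation value.
kept_fields = [name for name, value in saturation.items() if value >= 0.7]
```

In this sample the age field is filled in three of four records and the region field in two of four, so only the age field passes the screening.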

Step 406: Analyze the extracted feature vector according to the preset semantic analysis algorithm to obtain the weight of the feature vector.

Step 408: Perform analysis according to the distribution value of the feature vector and the field saturation and weight to obtain multiple types of index data and corresponding values corresponding to the feature vector.

Step 410: Generate analysis result data according to multiple types of index data and corresponding values.

The server performs statistical screening on the multiple feature vectors and, after extracting the feature vectors that reach the preset saturation value, analyzes the extracted field data according to the preset semantic analysis algorithm to obtain the weight corresponding to the field data, that is, its importance value.

The server analyzes the distribution value, field saturation, and importance value of the field data to obtain multiple types of indicator data and corresponding values, and generates the corresponding analysis result data based on the multiple types of indicator data and corresponding values. Analyzing the extracted field data through the data analysis model in this way effectively yields the analysis result data corresponding to the social security data.

For example, semantic analysis may be based on the matching relationship between the fields input by the user and the real fields, where the request information includes the fields input by the user. Social security big data may contain fields in thousands of dimensions, including desensitized ID number, height, weight, desensitized social security account number, social security attributes, etc., while a user may be interested in only a few specific fields. Therefore, the user only needs to enter the fields of interest; the server analyzes the feature vectors corresponding to the extracted social security data to find the field information in the data set that is related to the fields of interest entered by the user, calculates the weights of the corresponding feature vectors, and then obtains the associated field information. If the user enters a relatively vague field of interest, such as "Payment", the associated field information for "Payment" may include the number of annual claims, the amount of compensation, and the reason for the compensation.

The data analysis model can include multiple different types of data analysis modules, such as insurance payment rate, payment base analysis, business operation status and other types of indicator data analysis modules. The server then analyzes the multiple feature vectors according to the distribution value, field saturation, and weight of the feature vector to obtain multiple types of index data and values corresponding to the feature vector. The server further generates analysis result data according to multiple types of index data and corresponding numerical values corresponding to each feature vector. After the server generates the analysis result data, it pushes the analysis result data to the corresponding terminal. Through feature extraction and screening of a large amount of social security data, and by analyzing the extracted valuable feature vectors with the data analysis model, valuable information in social security data can be effectively mined and analyzed, thereby effectively improving the analysis efficiency and accuracy of social security data.

In one of the embodiments, the analysis result data includes multiple types of index data and corresponding values, and the method further includes: generating corresponding index analysis data according to the index data and the corresponding values; generating corresponding analysis view data from the index analysis data in a preset manner; adding an event type identifier and corresponding interface call parameters to the analysis view data; and pushing the analysis view data to the terminal.

After receiving the resource acquisition request sent by the terminal, the server acquires multiple pieces of corresponding social security data from the local database or a third-party database according to the resource acquisition request and the request information, and the social security data includes multiple field data. The server vectorizes the multiple field data corresponding to the social security data, thereby obtaining feature vectors corresponding to the multiple field data. The server further calculates the similarity between the multiple feature vectors according to the preset algorithm, and extracts the feature vectors whose similarity reaches the preset threshold.
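The similarity-based extraction can be illustrated with a short sketch. The "preset algorithm" is not fixed by this passage, so cosine similarity is assumed here, and the rule of keeping every vector that has at least one sufficiently similar partner is likewise an assumption; `cosine_similarity` and `select_similar` are illustrative names.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(vectors, threshold=0.9):
    """Keep the feature vectors whose similarity to some other vector reaches the threshold.

    vectors: dict mapping a field name to its feature vector.
    """
    keep = set()
    for (name_1, vec_1), (name_2, vec_2) in combinations(vectors.items(), 2):
        if cosine_similarity(vec_1, vec_2) >= threshold:
            keep.update((name_1, name_2))
    # Preserve the original ordering of the input.
    return {name: vec for name, vec in vectors.items() if name in keep}
```

A distance-based measure (for example Euclidean distance converted to a similarity) could be swapped in for `cosine_similarity` without changing the selection logic.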

The server performs feature extraction on the social security data, and after extracting the corresponding feature vector, it further obtains a preset data analysis model according to the request type in the resource acquisition request, and analyzes the extracted feature vector through the data analysis model. The data analysis model can include multiple different types of data analysis modules, such as insurance payment rate, payment base analysis, business operation status and other types of indicator data analysis modules. The server then analyzes the multiple feature vectors according to the distribution value, field saturation, and weight of the feature vector to obtain multiple types of index data and values corresponding to the feature vector. The server further generates analysis result data according to multiple types of index data and corresponding numerical values corresponding to each feature vector.

The analysis result data that the server obtains by mining and analyzing the social security data includes multiple types of index data and corresponding values. The server may further generate index analysis data corresponding to multiple index types from the analysis result data according to the index data type. The server may also generate corresponding visual analysis view data from the index analysis data of the multiple indicator types in a preset manner. Specifically, the server can obtain a preset integration function according to the request type, integrate corresponding view resource data through the integration function according to multiple preset timing parameters and corresponding predicted values in the analysis result data, and add an event type identifier and corresponding interface call parameters to the view resource data. For example, the preset integration function can be a Python visualization function; visualization functions such as histogram, distribution density, and heat map functions can be used to embed and integrate the corresponding view data, and the corresponding visualization image can be drawn through nested functions.
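As one possible reading of an integration function that prepares histogram view data, the sketch below bins indicator values using only the standard library. It produces the data a plotting function would draw rather than the image itself; the function name `histogram_view` and the returned dictionary layout are assumptions.

```python
def histogram_view(values, bins=5):
    """Bin numeric indicator values into equal-width buckets for a histogram view."""
    if not values or bins < 1:
        raise ValueError("values must be non-empty and bins must be at least 1")
    low, high = min(values), max(values)
    # Fall back to a unit width when all values are identical.
    width = (high - low) / bins or 1.0
    counts = [0] * bins
    for value in values:
        # The maximum value belongs to the last bucket, not a bucket past the end.
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return {"chart": "histogram", "edges": edges, "counts": counts}
```

The resulting dictionary could then be handed to a drawing routine of whatever visualization library the server uses.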

After the server integrates the corresponding analysis view data through the integration function based on multiple types of indicator data and corresponding values in the analysis result data, it further adds an event type identifier and corresponding interface call parameters to the analysis view data, and integrates them into a corresponding class for storage. This makes it convenient for the server or terminal to call the generated analysis view data: when the server or terminal needs the associated social security analysis data or analysis view data again, it can directly call the mined analysis data based on the event type identifier and the corresponding interface call parameters, which in turn improves the analysis efficiency and utilization value of social security data.
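Storing view data under an event type identifier and interface call parameters, so that it can be retrieved later without repeating the analysis, might look like the following sketch. The classes `AnalysisView` and `ViewStore` and the in-memory dictionary are assumptions; the specification does not describe the storage class.

```python
from dataclasses import dataclass


@dataclass
class AnalysisView:
    event_type: str    # event type identifier attached to the view data
    call_params: dict  # interface call parameters (values must be hashable)
    payload: dict      # the generated analysis view data


class ViewStore:
    """In-memory store keyed by event type and interface call parameters."""

    def __init__(self):
        self._views = {}

    @staticmethod
    def _key(event_type, call_params):
        # Sort the parameters so that argument order does not affect the lookup.
        return (event_type, tuple(sorted(call_params.items())))

    def save(self, view):
        self._views[self._key(view.event_type, view.call_params)] = view

    def fetch(self, event_type, call_params):
        """Return the stored view, or None if nothing was generated for this call."""
        return self._views.get(self._key(event_type, call_params))
```

A later request with the same event type and parameters would then return the previously generated view directly.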

After the server generates the corresponding analysis view data, it sends the analysis view data to the corresponding terminal, so that the terminal can perform further analysis based on the mined social insurance data in combination with the corresponding business. The analysis data obtained by mining and analysis is thereby put to effective use, which effectively improves the mining efficiency and analysis efficiency of social security data.

It should be understood that although the various steps in the flowcharts of FIGS. 2-4 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless specifically stated herein, the execution of these steps is not strictly limited in order, and these steps can be executed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times. The execution order of these sub-steps or stages is not necessarily sequential; they may be performed in turn or alternately with other steps, or with at least a part of the sub-steps or stages of other steps.

In one of the embodiments, as shown in FIG. 5, a social security data processing device based on data mining is provided, including: a request receiving module 502, a data acquisition module 504, a feature extraction module 506, a data analysis module 508, and a data push module 510, where:

The request receiving module 502 is configured to receive a resource acquisition request sent by the terminal, and the resource acquisition request includes the request type and request information;

The data acquisition module 504 is configured to acquire multiple social security data according to the resource acquisition request and the request information, and the social security data includes multiple field data;

The feature extraction module 506 is configured to input social security data into the vector training model, perform vector processing on multiple field data corresponding to the social security data through the vector training model, and output feature vectors corresponding to the multiple field data; and to extract the dimension values of the multiple feature vectors, calculate the similarity between the multiple feature vectors using a preset algorithm according to the dimension values, and extract the feature vectors whose similarity reaches the preset threshold;

The data analysis module 508 is configured to obtain a preset data analysis model according to the request type, analyze the extracted feature vectors through the data analysis model, and obtain multiple types of indicator data and corresponding values;

The data push module 510 is configured to generate analysis result data according to multiple types of index data and corresponding values, and push the analysis result data to the terminal.
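How a request type might select a preset data analysis model can be sketched as a simple dispatch table. The sketch covers only request handling, model selection, and result generation (it omits the feature extraction stage), and every name in it, including `ResourceRequest`, `ANALYSIS_MODELS`, `payment_rate_model`, and the `paid` record field, is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ResourceRequest:
    request_type: str   # selects the data analysis model
    request_info: dict  # describes which social security data to fetch


def payment_rate_model(records):
    """Toy indicator: share of records flagged as paid."""
    paid = sum(1 for record in records if record.get("paid"))
    return {"payment_rate": paid / len(records) if records else 0.0}


# Hypothetical registry of preset data analysis models, keyed by request type.
ANALYSIS_MODELS = {"payment_rate": payment_rate_model}


def handle_request(request, fetch):
    """Fetch the data, run the model chosen by request type, and build the result.

    fetch: callable taking request_info and returning a list of records;
    it stands in for the local or third-party database lookup.
    """
    records = fetch(request.request_info)
    model = ANALYSIS_MODELS.get(request.request_type)
    if model is None:
        raise ValueError(f"no analysis model for request type {request.request_type!r}")
    return {"request_type": request.request_type, "indicators": model(records)}
```

The returned dictionary plays the role of the analysis result data that the data push module would send to the terminal.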

In one of the embodiments, the feature extraction module 506 is also used to obtain a preset corpus and obtain related corpus data from the corpus according to the social security data; obtain a preset vector training model, and perform word vector calculation and training on the social security data and the corpus data through the vector training model to obtain multiple corresponding word vectors; and convert the word vectors into corresponding feature vectors according to a preset algorithm.
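The conversion of word vectors into a feature vector can be illustrated by averaging, which is one common choice. The specification only says "a preset algorithm", so averaging is an assumption, as are the names `average_vectors` and `field_to_feature` and the toy embedding table standing in for a trained word vector model.

```python
def average_vectors(word_vectors):
    """Element-wise mean of equal-length word vectors (empty list if none given)."""
    if not word_vectors:
        return []
    dimension = len(word_vectors[0])
    count = len(word_vectors)
    return [sum(vector[i] for vector in word_vectors) / count for i in range(dimension)]


def field_to_feature(tokens, embedding):
    """Build a feature vector for one field from the word vectors of its tokens.

    embedding: dict mapping a token to its word vector; tokens missing from
    the embedding are skipped.
    """
    known = [embedding[token] for token in tokens if token in embedding]
    return average_vectors(known)
```

For instance, a field tokenized as "annual" and "claims" would map to the midpoint of the two word vectors.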

In one of the embodiments, the feature extraction module 506 is further configured to calculate multiple dimension values of multiple feature vectors according to a preset objective function; calculate the similarity between multiple feature vectors according to a preset distance algorithm and the dimension values; and extract the feature vectors whose similarity reaches the preset threshold.

In one of the embodiments, the device further includes a model building module for acquiring a plurality of sample social insurance data, where the sample social insurance data includes a plurality of field data; vectorizing the sample social insurance data to obtain the feature vectors corresponding to the plurality of field data; clustering the multiple feature vectors, calculating the correlation between feature vectors and the weight of each feature vector according to the clustering results, and extracting feature vectors that meet the conditional threshold; and constructing a data analysis model from the extracted feature vectors and corresponding weights according to a preset algorithm.

In one of the embodiments, the data analysis module 508 is also used to calculate the distribution value and field saturation of multiple feature vectors through the data analysis model; perform feature field screening on the multiple feature vectors, and extract the feature vectors that reach the preset saturation value; perform semantic analysis on the extracted feature vectors according to the preset semantic analysis algorithm to obtain the weight of each feature vector; perform analysis according to the distribution value, field saturation and weight of the feature vectors to obtain multiple types of index data and corresponding values corresponding to the feature vectors; and generate analysis result data based on the multiple types of index data and corresponding values.

In one of the embodiments, the analysis result data includes multiple types of index data and corresponding numerical values. The device further includes a view data generating module for generating corresponding index analysis data according to the index data and the corresponding numerical values; integrating the index analysis data into analysis view data corresponding to the social insurance data according to a preset integration function; adding an event type identifier and corresponding interface call parameters to the analysis view data, where the interface call parameters are used to call the generated analysis view data according to the event type identifier; and pushing the analysis view data to the terminal.

Regarding the specific limitation of the social security data processing device based on data mining, please refer to the above limitation on the social security data processing method based on data mining, which will not be repeated here. The various modules in the above-mentioned data mining-based social security data processing device can be implemented in whole or in part by software, hardware, and combinations thereof. The foregoing modules may be embedded in the form of hardware or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure diagram may be as shown in FIG. 6. The computer equipment includes a processor, a memory, a network interface and a database connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium. The database of the computer equipment is used to store data such as social security data, corpus and analysis result data. The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer-readable instructions are executed by the processor, the steps of the data mining-based social security data processing method provided in any embodiment of the present application are realized.

Those skilled in the art can understand that the structure shown in FIG. 6 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer parts than shown in the figure, combine some parts, or have a different arrangement of parts.

A computer device includes a memory and one or more processors. The memory stores computer readable instructions. When the computer readable instructions are executed by the processor, the one or more processors execute the following steps:

Receive the resource acquisition request sent by the terminal, the resource acquisition request includes the request type and request information;

Obtain multiple social security data according to the resource acquisition request and request information, and the social security data includes multiple field data;

Input social security data into the vector training model, and perform vector processing on multiple field data corresponding to the social security data through the vector training model, and output feature vectors corresponding to multiple field data;

Extract the dimension values of multiple feature vectors, use a preset algorithm to calculate the similarity between multiple feature vectors according to the dimension value, and extract the feature vectors whose similarity reaches the preset threshold;

Obtain a preset data analysis model according to the request type, analyze the extracted feature vector through the data analysis model, and obtain multiple types of indicator data and corresponding values; and

Generate analysis result data according to multiple types of indicator data and corresponding values, and push the analysis result data to the terminal.

A person of ordinary skill in the art can understand that all or part of the processes in the above-mentioned embodiment methods can be implemented by instructing relevant hardware through computer-readable instructions, which can be stored in a non-volatile computer-readable storage medium. When the computer-readable instructions are executed, they may include the processes of the above-mentioned method embodiments. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. As an illustration and not a limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

The technical features of the above embodiments can be combined arbitrarily. In order to make the description concise, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction between the combinations of these technical features, they should be considered to be within the scope described in this specification.

The above-mentioned embodiments only express several implementation manners of the present application, and the description is relatively specific and detailed, but it should not be understood as a limitation on the scope of the invention patent. It should be pointed out that for those of ordinary skill in the art, without departing from the concept of this application, several modifications and improvements can be made, and these all fall within the protection scope of this application. Therefore, the scope of protection of the patent of this application shall be subject to the appended claims.

Claims

A social security data processing method based on data mining, the method comprising:

Receiving a resource acquisition request sent by the terminal, where the resource acquisition request includes a request type and request information;

Acquiring multiple social security data according to the resource acquisition request and request information, the social security data including multiple field data;

Inputting the social security data into a vector training model, performing vector processing on multiple field data corresponding to the social security data through the vector training model, and outputting feature vectors corresponding to the multiple field data;

Extracting the dimension values of the multiple feature vectors, calculating the similarity between the multiple feature vectors by using a preset algorithm according to the dimension value, and extracting the feature vectors whose similarity reaches a preset threshold;

Obtain a preset data analysis model according to the request type, analyze the extracted feature vector through the data analysis model, and obtain multiple types of index data and corresponding values; and

The analysis result data is generated according to the multiple types of index data and corresponding values, and the analysis result data is pushed to the terminal.
The method according to claim 1, wherein the step of performing vector processing on multiple field data corresponding to the social insurance data through a vector training model comprises:

Obtaining a preset corpus, and obtaining associated corpus data from the corpus according to the social security data;

Obtaining a preset vector training model, and performing word vector calculation and training on the social insurance data and the corpus data through the vector training model to obtain multiple corresponding word vectors; and

The multiple word vectors are converted into corresponding feature vectors according to a preset algorithm.
The method according to claim 1, wherein the step of using a preset algorithm to calculate the similarity between a plurality of feature vectors according to the dimension value, and extracting the feature vector whose similarity reaches a preset threshold, comprises:

Calculate multiple dimension values of multiple feature vectors according to preset objective functions;

Calculating the similarity between multiple feature vectors according to the preset distance algorithm and the dimension value; and

The feature vector whose similarity reaches a preset threshold is extracted.
The method according to claim 1, wherein before obtaining a preset data analysis model according to the request type, the method further comprises:

Acquiring multiple sample social security data, where the sample social security data includes multiple field data;

Perform vectorization processing on the sample social security data to obtain feature vectors corresponding to multiple field data;

Clustering multiple feature vectors, calculating the correlation between feature vectors and the weight of each feature vector according to the clustering results, and extracting feature vectors that meet the conditional threshold; and

Use the extracted feature vectors and corresponding weights to construct a data analysis model according to a preset algorithm.
The method according to claim 1, wherein the step of analyzing the extracted feature vectors through the data analysis model comprises:

Calculate the distribution value and field saturation of multiple feature vectors through the data analysis model;

Perform feature field screening on multiple feature vectors, and extract feature vectors that reach preset saturation values;

According to the preset semantic analysis algorithm, analyze the extracted feature vector to obtain the weight of the feature vector;

Perform analysis according to the distribution value, field saturation and weight of the feature vector to obtain multiple types of index data and corresponding values corresponding to the feature vector; and

The analysis result data is generated according to the multiple types of index data and corresponding values.
The method according to any one of claims 1 to 5, wherein the method further comprises:

Generating corresponding index analysis data according to the index data and corresponding values;

Integrating the indicator analysis data into the analysis view data corresponding to the social insurance data according to a preset integration function;

Adding an event type identifier and corresponding interface call parameters to the analysis view data, where the interface call parameters are used to call the generated analysis view data according to the event type identifier; and

Push the analysis view data to the terminal.
A social security data processing device based on data mining, the device comprising:

The request receiving module is configured to receive a resource acquisition request sent by the terminal, where the resource acquisition request includes the request type and request information;

A data acquisition module, configured to acquire multiple social security data according to the resource acquisition request and request information, the social security data including multiple field data;

The feature extraction module is used to input the social security data into a vector training model, perform vector processing on multiple field data corresponding to the social security data through the vector training model, and output feature vectors corresponding to the multiple field data; and to extract the dimension values of the multiple feature vectors, calculate the similarity between the multiple feature vectors using a preset algorithm according to the dimension values, and extract feature vectors whose similarity reaches a preset threshold;

The data analysis module is used to obtain a preset data analysis model according to the request type, and analyze the extracted feature vectors through the data analysis model to obtain multiple types of index data and corresponding values; and

The data push module is configured to generate analysis result data according to the multiple types of index data and corresponding values, and push the analysis result data to the terminal.
The device according to claim 7, wherein the feature extraction module is further configured to obtain a preset corpus, and obtain associated corpus data from the corpus according to the social security data; obtain a preset vector training model, and perform word vector calculation and training on the social insurance data and the corpus data through the vector training model to obtain corresponding multiple word vectors; and convert the multiple word vectors into corresponding feature vectors according to a preset algorithm.
The device according to claim 7, wherein the feature extraction module is further configured to calculate multiple dimension values of multiple feature vectors according to a preset objective function; calculate the similarity between the multiple feature vectors according to a preset distance algorithm and the dimension values; and extract the feature vectors whose similarity reaches a preset threshold.
The device according to claim 7, wherein the device further comprises a model building module for obtaining a plurality of sample social security data, the sample social security data including a plurality of field data; performing vectorization processing on the sample social security data to obtain feature vectors corresponding to the multiple field data; clustering the multiple feature vectors, calculating the correlation between feature vectors and the weight of each feature vector according to the clustering results, and extracting feature vectors that meet the conditional threshold; and using the extracted feature vectors and corresponding weights to construct a data analysis model according to a preset algorithm.
The device according to claim 7, wherein the data analysis module is further configured to calculate the distribution value and field saturation of multiple feature vectors through the data analysis model; perform feature field screening on the multiple feature vectors, and extract the feature vectors that reach the preset saturation value; analyze the extracted feature vectors according to the preset semantic analysis algorithm to obtain the weight of the feature vector; perform analysis according to the distribution value, field saturation and weight of the feature vector to obtain multiple types of index data and corresponding values corresponding to the feature vector; and generate analysis result data according to the multiple types of indicator data and corresponding values.
The device according to claim 7, wherein the device further comprises a view data generating module, configured to generate corresponding indicator analysis data according to the indicator data and corresponding values; integrate the indicator analysis data into the analysis view data corresponding to the social security data according to a preset integration function; add an event type identifier and corresponding interface call parameters to the analysis view data, where the interface call parameters are used to call the generated analysis view data according to the event type identifier; and push the analysis view data to the terminal.
A computer device includes a memory and one or more processors, the memory stores at least one computer readable instruction, and the computer readable instruction is loaded by the processor and executes the following steps:

Receiving a resource acquisition request sent by the terminal, where the resource acquisition request includes a request type and request information;

Acquiring multiple social security data according to the resource acquisition request and request information, the social security data including multiple field data;

Inputting the social security data into a vector training model, performing vector processing on multiple field data corresponding to the social security data through the vector training model, and outputting feature vectors corresponding to the multiple field data;

Extracting the dimension values of the multiple feature vectors, calculating the similarity between the multiple feature vectors by using a preset algorithm according to the dimension value, and extracting the feature vectors whose similarity reaches a preset threshold;

Obtain a preset data analysis model according to the request type, analyze the extracted feature vector through the data analysis model, and obtain multiple types of index data and corresponding values; and

The analysis result data is generated according to the multiple types of index data and corresponding values, and the analysis result data is pushed to the terminal.
The computer device according to claim 13, wherein the processor further executes the following steps when executing the computer-readable instructions: obtaining a preset corpus, and obtaining associated corpus data from the corpus according to the social security data; obtaining a preset vector training model, and performing word vector calculation and training on the social insurance data and the corpus data through the vector training model to obtain a plurality of corresponding word vectors; and converting the word vectors into corresponding feature vectors according to the preset algorithm.
The computer device according to claim 13, wherein the processor further executes the following steps when executing computer-readable instructions: calculating the distribution value and field saturation of a plurality of feature vectors through the data analysis model; performing feature field screening on the multiple feature vectors, and extracting feature vectors that reach a preset saturation value; analyzing the extracted feature vectors according to a preset semantic analysis algorithm to obtain the weight of the feature vector; performing analysis according to the distribution value, field saturation and weight of the feature vector to obtain multiple types of index data and corresponding numerical values corresponding to the feature vector; and generating analysis result data according to the multiple types of indicator data and corresponding numerical values.
The computer device according to claim 13, wherein the processor further executes the following steps when executing the computer-readable instructions: generating corresponding index analysis data according to the index data and corresponding values; integrating the index analysis data into the analysis view data corresponding to the social security data according to a preset integration function; adding an event type identifier and corresponding interface call parameters to the analysis view data, where the interface call parameters are used to call the generated analysis view data according to the event type identifier; and pushing the analysis view data to the terminal.
A non-volatile computer-readable storage medium in which at least one computer-readable instruction is stored, and the computer-readable instruction is loaded by a processor and executes the following steps:

Receiving a resource acquisition request sent by the terminal, where the resource acquisition request includes a request type and request information;

Acquiring multiple social security data according to the resource acquisition request and request information, the social security data including multiple field data;

Inputting the social security data into a vector training model, performing vector processing on multiple field data corresponding to the social security data through the vector training model, and outputting feature vectors corresponding to the multiple field data;

Extracting the dimension values of the multiple feature vectors, calculating the similarity between the multiple feature vectors by using a preset algorithm according to the dimension value, and extracting the feature vectors whose similarity reaches a preset threshold;

Obtain a preset data analysis model according to the request type, analyze the extracted feature vector through the data analysis model, and obtain multiple types of index data and corresponding values; and

The analysis result data is generated according to the multiple types of index data and corresponding values, and the analysis result data is pushed to the terminal.
The storage medium according to claim 17, wherein when the computer-readable instructions are executed by the processor, the following steps are further executed: obtaining a preset corpus, and obtaining associated corpus data from the corpus according to the social security data; obtaining a preset vector training model, and performing word vector calculation and training on the social security data and the corpus data through the vector training model to obtain multiple corresponding word vectors; and converting the multiple word vectors into corresponding feature vectors.
The storage medium according to claim 17, wherein when the computer-readable instructions are executed by the processor, the following steps are further executed: calculating the distribution values and field saturation of a plurality of feature vectors through the data analysis model; performing feature field screening on the multiple feature vectors, and extracting feature vectors that reach a preset saturation value; analyzing the extracted feature vectors according to a preset semantic analysis algorithm to obtain the weight of the feature vectors; performing analysis according to the distribution value, field saturation and weight of the feature vector to obtain multiple types of index data and corresponding values corresponding to the feature vector; and generating analysis result data according to the multiple types of index data and corresponding values.
The storage medium according to claim 17, wherein when the computer-readable instructions are executed by the processor, the following steps are further executed: generating corresponding index analysis data according to the index data and corresponding values; integrating the indicator analysis data into the analysis view data corresponding to the social security data according to a preset integration function; adding an event type identification and corresponding interface call parameters to the analysis view data, where the interface call parameters are used to call the generated analysis view data according to the event type identification; and pushing the analysis view data to the terminal.