WO2020177365A1 - Data mining-based social insurance data processing method and apparatus, and computer device - Google Patents

Info

Publication number
WO2020177365A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
feature vectors
analysis
preset
vector
Application number
PCT/CN2019/116126
Other languages
French (fr)
Chinese (zh)
Inventor
陈娴娴
阮晓雯
徐亮
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2020177365A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to a social security data processing method, device and computer equipment based on data mining.
  • a social security data processing method based on data mining includes:
  • the social security data including multiple field data
  • the analysis result data is generated according to the multiple types of index data and corresponding values, and the analysis result data is pushed to the terminal.
  • a social security data processing device based on data mining includes:
  • the request receiving module is configured to receive a resource acquisition request sent by the terminal, where the resource acquisition request includes the request type and request information;
  • a data acquisition module configured to acquire multiple social security data according to the resource acquisition request and request information, the social security data including multiple field data;
  • the feature extraction module is used to input the social security data into a vector training model, perform vector processing on multiple field data corresponding to the social security data through the vector training model, and output feature vectors corresponding to the multiple field data; extract the dimension values of the multiple feature vectors, calculate the similarity between the multiple feature vectors according to the dimension values using a preset algorithm, and extract the feature vectors whose similarity reaches a preset threshold;
  • the data analysis module is used to obtain a preset data analysis model according to the request type, and analyze the extracted feature vectors through the data analysis model to obtain multiple types of index data and corresponding values;
  • the data push module is configured to generate analysis result data according to the multiple types of index data and corresponding values, and push the analysis result data to the terminal.
  • a computer device including a memory and one or more processors, where the memory stores computer-readable instructions and, when the computer-readable instructions are executed by the one or more processors, the one or more processors execute the following steps:
  • the social security data including multiple field data
  • the analysis result data is generated according to the multiple types of index data and corresponding values, and the analysis result data is pushed to the terminal.
  • One or more non-volatile computer-readable storage media storing computer-readable instructions.
  • the one or more processors execute the following steps:
  • the social security data including multiple field data
  • the analysis result data is generated according to the multiple types of index data and corresponding values, and the analysis result data is pushed to the terminal.
  • Fig. 1 is an application scenario diagram of a social security data processing method based on data mining according to one or more embodiments.
  • Fig. 2 is a schematic flow chart of a social security data processing method based on data mining according to one or more embodiments.
  • FIG. 3 is a schematic flowchart of a step of vectorizing multiple field data corresponding to social insurance data according to one or more embodiments.
  • Fig. 4 is a schematic flow chart of the steps of analyzing the extracted feature vectors through the data analysis model according to one or more embodiments.
  • Fig. 5 is a block diagram of a social security data processing device based on data mining according to one or more embodiments.
  • Figure 6 is a block diagram of a computer device according to one or more embodiments.
  • the social security data processing method based on data mining provided in this application can be applied to the application environment as shown in FIG. 1.
  • The terminal 102 communicates with the server 104 through the network.
  • the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 104 may be implemented by an independent server or a server cluster composed of multiple servers.
  • the terminal 102 may send a resource acquisition request to the server, and the resource acquisition request includes the request type and request information.
  • After the server 104 receives the resource acquisition request sent by the terminal, it acquires multiple social security data according to the resource acquisition request and the request information it carries; the social security data includes multiple field data.
  • the server 104 further vectorizes the multiple field data corresponding to the social security data to obtain feature vectors corresponding to the multiple field data.
  • the server 104 calculates the similarity between the multiple feature vectors according to the preset algorithm, and extracts the feature vectors whose similarity reaches the preset threshold.
  • the server further obtains a preset data analysis model, analyzes the extracted feature vector through the data analysis model, obtains corresponding analysis result data, and pushes the analysis result data to the corresponding terminal 102.
  • a method for processing social insurance data based on data mining is provided. Taking the method applied to the server in FIG. 1 as an example for description, the method includes the following steps:
  • Step 202 Receive a resource acquisition request sent by a terminal, where the resource acquisition request includes the request type and request information.
  • the user can input the relevant field information through the corresponding terminal and send a data analysis request to the server.
  • The resource acquisition request may be a request for the result data obtained by analyzing the social security data.
  • the resource acquisition request carries the request type and request information, where the request type may be the type of the acquired resource data, such as social security analysis data.
  • the request information may be field information input by the user, for example, it may be field information such as the range and time interval of social insurance data.
  • Step 204 Acquire multiple social security data according to the resource acquisition request and the request information, and the social security data includes multiple field data.
  • The social security data may be social insurance data; for example, it may include endowment insurance data, medical insurance data, unemployment insurance data, work injury insurance data, maternity insurance data, etc.
  • After the server receives the resource acquisition request sent by the terminal, it acquires multiple social security data from the local database or a third-party database according to the resource acquisition request and the request information. For example, when the scope of the social security data specified in the request information is a certain company, the server obtains the social security data corresponding to that company.
  • the social security data includes multiple field data, such as name, gender, age, region, affiliated company, payment duration, payment amount and other field information.
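  • Steps 202 and 204 can be pictured with a minimal sketch. It assumes the request is a small structure holding a request type and user-entered request information, and that acquisition is a plain filter over records. The names `ResourceRequest`, `acquire_records`, and `DATABASE`, the field names, and the sample values are all hypothetical; the source does not specify a storage or query mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceRequest:
    """Hypothetical shape of the resource acquisition request (Step 202)."""
    request_type: str                                            # e.g. "social_security_analysis"
    request_info: Dict[str, str] = field(default_factory=dict)   # fields entered by the user


def acquire_records(request: ResourceRequest,
                    database: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Return the records whose fields match every entry in request_info (Step 204)."""
    return [
        record for record in database
        if all(record.get(key) == value for key, value in request.request_info.items())
    ]


# Toy in-memory stand-in for the local or third-party database (made-up values).
DATABASE = [
    {"name": "A", "age": 34, "company": "Acme", "payment_months": 60, "payment_amount": 1200.0},
    {"name": "B", "age": 41, "company": "Acme", "payment_months": 96, "payment_amount": 1500.0},
    {"name": "C", "age": 29, "company": "Globex", "payment_months": 24, "payment_amount": 900.0},
]

request = ResourceRequest("social_security_analysis", {"company": "Acme"})
records = acquire_records(request, DATABASE)
```

  • Here the scope in the request information is a single company, so only that company's records are returned, mirroring the example given for Step 204.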
  • Step 206 Input the social security data into the vector training model, perform vector processing on the multiple field data corresponding to the social security data through the vector training model, and output feature vectors corresponding to the multiple field data.
  • the server vectorizes multiple field data corresponding to the social security data.
  • the server may obtain a preset corpus, and obtain associated corpus data from the corpus according to the social security data.
  • the server further obtains a preset vector training model.
  • the vector training model may be a neural network model based on word2vec.
  • The server inputs the social security data and the obtained associated corpus data into the vector training model, which uses the associated corpus data to calculate and train on the social security data, yielding multiple word vectors corresponding to the social security data. The word vectors are then converted into corresponding feature vectors according to a preset algorithm. In this way, feature vectors corresponding to the multiple field data can be obtained.
  • Step 208 Extract the dimensional values of the multiple feature vectors, calculate the similarity between the multiple feature vectors according to the dimensional values using a preset algorithm, and extract the feature vectors whose similarity reaches the preset threshold.
  • The server calculates the similarity between the multiple feature vectors according to a preset algorithm. Specifically, the server may first calculate multiple dimension values of the multiple feature vectors according to a preset objective function; the dimension values may be the feature values of the different dimensions of each feature vector. The server then calculates the similarity between the multiple feature vectors according to the preset distance algorithm and the dimension values of the feature vectors, and extracts the feature vectors whose similarity reaches the preset threshold.
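  • Step 208 can be sketched as follows, under stated assumptions. The source names a distance algorithm and a similarity threshold but does not say how distance is turned into similarity, so the mapping `1 / (1 + distance)` used here is an assumption, as are the function names, the field names, and the sample vectors.

```python
import math
from itertools import combinations
from typing import Dict, List, Set


def euclidean_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a: List[float], b: List[float]) -> float:
    """Assumed mapping from distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


def extract_similar(vectors: Dict[str, List[float]], threshold: float) -> Set[str]:
    """Keep every vector that takes part in at least one pair reaching the threshold."""
    kept: Set[str] = set()
    for (name_a, vec_a), (name_b, vec_b) in combinations(vectors.items(), 2):
        if similarity(vec_a, vec_b) >= threshold:
            kept.update((name_a, name_b))
    return kept


# Made-up two-dimensional feature vectors keyed by hypothetical field names.
feature_vectors = {
    "payment_amount": [0.50, 0.70],
    "payment_base": [0.52, 0.75],
    "gender": [0.05, 0.10],
}
selected = extract_similar(feature_vectors, threshold=0.9)
```

  • With these sample values only the two payment-related vectors are close enough to each other to be kept.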
  • Step 210 Obtain a preset data analysis model according to the request type, and analyze the extracted feature vector through the data analysis model to obtain multiple types of index data and corresponding values.
  • After the server extracts the feature vectors, it further obtains the corresponding preset data analysis model according to the request type.
  • The data analysis model can include multiple different types of indicator data analysis modules, such as insurance payment rate, payment base analysis, and business operation status. The extracted feature vectors are analyzed through the data analysis model.
  • The server may first use the data analysis model to calculate the distribution values and field saturation of the multiple feature vectors, where the distribution value may be the value of the field data corresponding to the feature vector, and the field saturation may be the degree of saturation of the values of the multiple preset index data corresponding to the feature vector and the field data.
  • The server further performs statistical screening on the multiple feature vectors through the data analysis model, and extracts the feature vectors that reach a preset saturation value.
  • the server performs semantic analysis on the extracted feature vectors according to a preset semantic analysis algorithm, and obtains the weight of each feature vector, that is, the importance value of the feature vector.
  • the server analyzes the multiple feature vectors according to the distribution value, field saturation, and weight of the feature vector to obtain multiple types of index data and values corresponding to the feature vector.
  • Step 212 Generate analysis result data according to multiple types of index data and corresponding values, and push the analysis result data to the terminal.
  • After the server obtains the multiple types of index data and corresponding values for each feature vector, it generates the analysis result data from them and then pushes the analysis result data to the corresponding terminal. Further, the server can also generate view data in a preset format from the analysis result data and push the generated view data to the corresponding terminal, so that the user can clearly understand the analysis result data.
  • For example, when the social security data obtained is that of a certain company or a certain area, mining and analyzing the multiple social security data obtained can effectively yield indicator data such as the insurance payment rate, the payment base, and the business operation status.
  • After the server receives the resource acquisition request sent by the terminal, it acquires multiple social security data according to the resource acquisition request and the request information it carries; the social security data includes multiple field data.
  • the server then inputs the social security data into the vector training model, and vectorizes multiple field data corresponding to the social security data through the vector training model, and outputs feature vectors corresponding to the multiple field data.
  • the dimension values of multiple feature vectors are extracted, and the similarity between the multiple feature vectors is calculated according to the dimension value using a preset algorithm, and the feature vectors whose similarity reaches the preset threshold are extracted.
  • The server further obtains the preset data analysis model, analyzes the extracted feature vectors through the data analysis model to obtain multiple types of indicator data and corresponding values, generates analysis result data based on the multiple types of indicator data and corresponding values, and pushes the analysis result data to the corresponding terminal.
  • the steps of performing vector processing on multiple field data corresponding to social insurance data through the vector training model specifically include the following:
  • Step 302 Obtain a preset corpus, and obtain associated corpus data from the corpus according to the social security data.
  • the terminal may send a resource acquisition request to the server, and the resource acquisition request carries the request type and request information.
  • After receiving the resource acquisition request sent by the terminal, the server acquires the corresponding multiple social security data from the local database or a third-party database according to the resource acquisition request and the request information; the social security data includes multiple field data.
  • After the server obtains the multiple social security data, it obtains a preset corpus.
  • the corpus can be a pre-set corpus that includes a variety of words or sentences related to social insurance.
  • Step 304 Obtain a preset vector training model, and perform word vector calculation and training on the social security data and corpus data through the vector training model to obtain multiple corresponding word vectors.
  • Step 306 Convert multiple word vectors into corresponding feature vectors according to a preset algorithm.
  • the server further obtains a preset vector training model, and inputs social security data and corpus data into the vector training model.
  • the vector training model may be a neural network model based on word2vec.
  • The vector training model is used to calculate and train on the social security data and the corpus data to obtain word vectors corresponding to the multiple social security data.
  • Each word can be trained to obtain a vector in n-dimensional space. For example, when n is 2, the word vector for "body" may be [0.5365654, 0.726268] and the word vector for "part" may be [0.52222458, 0.7511456].
  • The cosine values of these two vectors are very close, so the two lie very close together in the semantic space, which indicates that together they form the single word "identity". If n is 100, each word is transformed into a 100-dimensional vector.
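  • The closeness of the two example vectors can be checked directly. The sketch below computes their cosine similarity using only the two vectors quoted above; the function name `cosine_similarity` is illustrative.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


body = [0.5365654, 0.726268]      # example vector quoted for "body"
part = [0.52222458, 0.7511456]    # example vector quoted for "part"
cos_value = cosine_similarity(body, part)
```

  • The result is just below 1, meaning the two vectors point in almost the same direction.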
  • the word vector model is used to vectorize the social security data, which can accurately and effectively extract the word vector in the social security data.
  • After the server extracts the word vectors in the social security data, it further converts the word vectors into corresponding feature vectors according to a preset algorithm.
  • a preset vector representation method can be used to convert a word vector into a corresponding feature vector. This can effectively extract the feature vector corresponding to the social security data.
  • Using a preset algorithm to calculate the similarity between multiple feature vectors according to the dimension values, and extracting the feature vectors whose similarity reaches the preset threshold, includes: calculating multiple dimension values of the multiple feature vectors according to the preset objective function; calculating the similarity between the multiple feature vectors according to the preset distance algorithm and the dimension values; and extracting the feature vectors whose similarity reaches the preset threshold.
  • the terminal may send a resource acquisition request to the server, and the resource acquisition request carries the request type and request information.
  • After receiving the resource acquisition request sent by the terminal, the server acquires the corresponding multiple social security data from the local database or a third-party database according to the resource acquisition request and the request information; the social security data includes multiple field data.
  • the server vectorizes the multiple field data corresponding to the social security data, thereby obtaining the feature vector corresponding to the multiple field data.
  • The server further calculates the similarity between the multiple feature vectors according to a preset algorithm. Specifically, the server may calculate the multiple dimension values of the multiple feature vectors according to a preset objective function, calculate the similarity between the multiple feature vectors according to the preset distance algorithm and the dimension values, and then extract the feature vectors whose similarity reaches the preset threshold.
  • The distance algorithm may be the Euclidean distance algorithm.
  • the calculation formula of the Euclidean distance function can be as follows:
  • the expression of the objective function can be:
  • Through the objective function, a short text is represented by a 3*n-dimensional vector. For example, when n is 2, "body" is represented as [0.5, 0.2], "part" as [0.1, 0.7], and "certificate" as [0.2, 0.5].
  • Max is the maximum value of the extracted vectors in each dimension: since 0.5 > 0.2 > 0.1 in the first dimension and 0.7 > 0.5 > 0.2 in the second dimension, Max is [0.5, 0.7]. Similarly, Min is [0.1, 0.2] and Mean is [0.8/3, 1.4/3]. Connecting these three vectors horizontally, the short text "ID" can be represented by the 6-dimensional vector [0.5, 0.7, 0.1, 0.2, 0.8/3, 1.4/3]. Similarly, the short text "insurance statement" can also be represented by a 6-dimensional vector. Therefore, however long the short text is, it can be represented by a 3*n-dimensional vector.
  • the similarity between the texts can be calculated by calculating the Euclidean distance of the vectors corresponding to the multiple dimensions of the multiple texts, and the text similarity results can be obtained.
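  • The max/min/mean representation described above can be reproduced with a short sketch. The first input uses the three example vectors from the text; the second text and its vectors are made up purely to show that a longer text yields a vector of the same length. The function name `pool_word_vectors` is illustrative, and `math.dist` (Python 3.8+) computes the Euclidean distance.

```python
import math
from typing import List


def pool_word_vectors(word_vectors: List[List[float]]) -> List[float]:
    """Concatenate the per-dimension max, min and mean of n-dimensional word vectors."""
    dimensions = list(zip(*word_vectors))              # regroup values by dimension
    maxima = [max(values) for values in dimensions]
    minima = [min(values) for values in dimensions]
    means = [sum(values) / len(values) for values in dimensions]
    return maxima + minima + means                     # 3 * n values


# The three example word vectors from the text (n = 2).
id_text = [[0.5, 0.2], [0.1, 0.7], [0.2, 0.5]]
id_feature = pool_word_vectors(id_text)

# A longer text with made-up word vectors still maps to 3 * n = 6 values.
longer_text = [[0.4, 0.3], [0.6, 0.1], [0.2, 0.9], [0.3, 0.3]]
longer_feature = pool_word_vectors(longer_text)

# Euclidean distance between the two text representations.
distance = math.dist(id_feature, longer_feature)
```

  • Because both texts end up with vectors of the same length, their distance can be computed regardless of how many words each text contains.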
  • After the server calculates the similarity between the multiple feature vectors, it further extracts the feature vectors whose similarity reaches a preset threshold.
  • The similarity between multiple feature vectors is calculated through the preset objective function and distance algorithm, and the feature vectors whose similarity reaches the preset threshold are then extracted, which allows the features of the social security data to be extracted effectively.
  • Before obtaining the preset data analysis model according to the request type, the method further includes: obtaining multiple sample social security data, the sample social security data including multiple field data; vectorizing the sample social security data to obtain feature vectors corresponding to the multiple field data; clustering the multiple feature vectors, calculating the correlation between feature vectors and the weight of each feature vector according to the clustering results, and extracting the feature vectors that meet the condition threshold; and constructing a data analysis model from the extracted feature vectors and the corresponding weights according to a preset algorithm.
  • Before the server obtains the preset data analysis model, it also needs to construct the data analysis model. Specifically, the server can obtain a large amount of sample social security data in advance and first vectorize it, so that feature vectors corresponding to the multiple field data in the sample social security data are obtained. After vectorizing the sample social security data, the server performs feature extraction on it: the server can perform cluster analysis on the multiple feature vectors through a preset clustering algorithm, calculate the correlation between feature vectors and the weight of each feature vector, and then extract the feature vectors that meet the preset condition threshold. The server then constructs a data analysis model from the extracted feature vectors and the corresponding weights according to a preset algorithm.
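  • The clustering and screening step can be sketched as follows. The source does not name the clustering algorithm or define the weight, so this is only an illustration under assumptions: k-means is swapped in as the clustering algorithm, the weight of a vector is taken to be `1 / (1 + distance to its cluster centroid)`, and the sample vectors, initial centroids, and threshold are made up.

```python
import math
from typing import List, Tuple


def kmeans(points: List[List[float]],
           initial_centroids: List[List[float]],
           iterations: int = 10) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means with fixed initial centroids so the result is deterministic."""
    centroids = [list(c) for c in initial_centroids]
    assignments: List[int] = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        # Move every centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


def vector_weights(points: List[List[float]],
                   assignments: List[int],
                   centroids: List[List[float]]) -> List[float]:
    """Assumed weight: vectors closer to their cluster centroid weigh more."""
    return [1.0 / (1.0 + math.dist(p, centroids[a])) for p, a in zip(points, assignments)]


# Made-up sample feature vectors.
sample_vectors = [[0.10, 0.20], [0.15, 0.22], [0.90, 0.80], [0.88, 0.85], [0.50, 0.10]]
assignments, centroids = kmeans(sample_vectors, [[0.0, 0.0], [1.0, 1.0]])
weights = vector_weights(sample_vectors, assignments, centroids)

# Keep the vectors whose weight reaches a hypothetical condition threshold.
kept_indices = [i for i, w in enumerate(weights) if w >= 0.85]
```

  • The last sample vector sits far from its cluster centroid, so its weight falls below the threshold and it is screened out.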
  • the data analysis model may include multiple different types of data analysis modules, such as insurance premium payment rate, payment base analysis, business operation status and other types of indicator data analysis modules.
  • the step of analyzing the extracted feature vector through the data analysis model specifically includes the following content:
  • Step 402 Calculate the distribution values and field saturations of multiple feature vectors through the data analysis model.
  • Step 404 Perform feature field screening on multiple feature vectors, and extract feature vectors that reach a preset saturation value.
  • the terminal may send a resource acquisition request to the server, and the resource acquisition request carries the request type and request information.
  • After receiving the resource acquisition request sent by the terminal, the server acquires the corresponding multiple social security data from the local database or a third-party database according to the resource acquisition request and the request information; the social security data includes multiple field data.
  • the server vectorizes the multiple field data corresponding to the social security data, thereby obtaining feature vectors corresponding to the multiple field data.
  • the server further calculates the similarity between the multiple feature vectors according to the preset algorithm, and extracts the feature vectors whose similarity reaches the preset threshold.
  • The server performs feature extraction on the social security data and, after extracting the corresponding feature vectors, obtains a preset data analysis model according to the request type in the resource acquisition request and analyzes the extracted feature vectors through the data analysis model. Specifically, after the server obtains the preset data analysis model, it inputs the feature vectors corresponding to the extracted field data into the data analysis model, calculates the distribution values and field saturation of the field data through the data analysis model, and performs feature field screening on the field data, for example by statistically screening the field data to extract the feature vectors that reach a preset saturation value. The distribution value may be the value of the field data corresponding to the feature vector.
  • the distribution value of the field data can be the distribution of the number of people in each age group such as 10-20, 20-30, 30-40.
  • Field saturation can be the saturation degree of the value of multiple preset index data corresponding to the feature vector and field data.
  • The input data may be partly unsaturated: if some fields are empty, the saturation of the field data for those fields is relatively low. Therefore, the server needs to perform statistical exploration on the feature vectors corresponding to the field data in order to perform secondary field screening.
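  • Steps 402 and 404 can be pictured with a minimal sketch that works directly on field data rather than on feature vectors. It assumes that field saturation is the share of records in which a field is non-empty and that the distribution value of the age field is a count per 10-year group. The function names, sample records, and the 0.8 saturation threshold are hypothetical, and an age exactly on a boundary (such as 20) is placed in the upper group.

```python
from collections import Counter
from typing import Dict, List


def field_saturation(records: List[Dict[str, object]], field_name: str) -> float:
    """Share of records in which the field is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if record.get(field_name) not in (None, ""))
    return filled / len(records)


def age_distribution(records: List[Dict[str, object]], width: int = 10) -> Dict[str, int]:
    """Number of people per age group, e.g. "20-30"."""
    groups: Counter = Counter()
    for record in records:
        age = record.get("age")
        if age is None:
            continue
        low = (age // width) * width
        groups[f"{low}-{low + width}"] += 1
    return dict(groups)


# Made-up records; two of the four have an empty region field.
records = [
    {"age": 15, "region": "Shenzhen"},
    {"age": 25, "region": ""},
    {"age": 27, "region": None},
    {"age": 34, "region": "Shanghai"},
]

saturation = {name: field_saturation(records, name) for name in ("age", "region")}
kept_fields = [name for name, value in saturation.items() if value >= 0.8]
distribution = age_distribution(records)
```

  • In this toy data the region field is only half filled, so it is dropped by the saturation screening, while the age field is kept and summarized per age group.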
  • Step 406 Analyze the extracted feature vector according to the preset semantic analysis algorithm to obtain the weight of the feature vector.
  • Step 408 Perform analysis according to the distribution value of the feature vector and the field saturation and weight to obtain multiple types of index data and corresponding values corresponding to the feature vector.
  • Step 410 Generate analysis result data according to multiple types of index data and corresponding values.
  • The server performs statistical screening on the multiple feature vectors and, after extracting the feature vectors that reach the preset saturation value, further analyzes the extracted field data according to the preset semantic analysis algorithm to obtain the weight corresponding to the field data, that is, its importance value.
  • The server analyzes the distribution value, field saturation, and importance value of the field data to obtain multiple types of indicator data and corresponding values, and generates the corresponding analysis result data based on the multiple types of indicator data and corresponding values. Analyzing the extracted field data through the data analysis model allows the analysis result data corresponding to the social insurance data to be obtained effectively.
  • semantic analysis may be based on the matching relationship between the fields input by the user and the real fields, and the requested information includes the fields input by the user.
  • Social security big data may contain fields in thousands of dimensions, including desensitized ID number, height, weight, desensitized social security account number, social security attributes, etc., while users may be interested in only a few specific fields. Therefore, the user only needs to enter the field of interest; the server analyzes the feature vectors corresponding to the extracted social insurance data to find the field information in the data set that is related to the field of interest entered by the user, calculates the weight of the corresponding feature vector, and then obtains the associated field information. If the user enters a relatively vague field of interest, such as "Payment", the associated information for "Payment" includes the number of annual claims, the amount of compensation, and the reason for the compensation.
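  • One way to picture the matching between a user-entered field and the real fields is sketched below. The source does not specify the semantic analysis algorithm, so this sketch assumes the weight is the cosine similarity between a vector for the user's input and the vector of each real field. All vectors, field names, and the 0.9 threshold are made up for illustration.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def related_fields(query_vector: List[float],
                   field_vectors: Dict[str, List[float]],
                   threshold: float) -> Dict[str, float]:
    """Weight every real field against the query and keep those reaching the threshold."""
    weights = {name: cosine(query_vector, vector) for name, vector in field_vectors.items()}
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return {name: weight for name, weight in ranked if weight >= threshold}


# Hypothetical feature vectors for real fields in the data set.
FIELD_VECTORS = {
    "annual_claim_count": [0.9, 0.1, 0.2],
    "compensation_amount": [0.8, 0.2, 0.1],
    "compensation_reason": [0.7, 0.3, 0.2],
    "height": [0.1, 0.9, 0.1],
    "body_weight": [0.1, 0.8, 0.3],
}
payment_query = [0.85, 0.15, 0.15]   # hypothetical vector for the user input "Payment"

associated = related_fields(payment_query, FIELD_VECTORS, threshold=0.9)
```

  • With these made-up vectors, the vague input "Payment" is associated with the three claim and compensation fields, while height and body weight are left out.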
  • the data analysis model can include multiple different types of data analysis modules, such as insurance payment rate, payment base analysis, business operation status and other types of indicator data analysis modules.
  • the server then analyzes the multiple feature vectors according to the distribution value, field saturation, and weight of the feature vector to obtain multiple types of index data and values corresponding to the feature vector.
  • the server further generates analysis result data according to multiple types of index data and corresponding numerical values corresponding to each feature vector. After the server generates the analysis result data, it pushes the analysis result data to the corresponding terminal.
  • The analysis result data includes multiple types of index data and corresponding values, and the method further includes: generating corresponding index analysis data according to the index data and the corresponding values; generating corresponding analysis view data from the index analysis data in a preset manner; adding an event type identifier and corresponding interface call parameters to the analysis view data; and pushing the analysis view data to the terminal.
  • After receiving the resource acquisition request sent by the terminal, the server acquires the corresponding multiple social security data from the local database or a third-party database according to the resource acquisition request and the request information; the social security data includes multiple field data.
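  • The view-data step can be sketched as assembling a serializable payload. The source does not define the structure of the analysis view data, so the keys used here (`chart_type`, `series`, `event_type`, `interface_call_params`), the function name `build_view_data`, the endpoint, and the index values are all hypothetical.

```python
import json
from typing import Dict


def build_view_data(index_data: Dict[str, float],
                    chart_type: str,
                    event_type: str,
                    interface_params: Dict[str, str]) -> Dict[str, object]:
    """Wrap index data in a view payload with an event type and interface call parameters."""
    return {
        "chart_type": chart_type,                       # e.g. histogram or heat map
        "series": [{"index": name, "value": value} for name, value in index_data.items()],
        "event_type": event_type,                       # event type identifier
        "interface_call_params": interface_params,      # parameters for the interface call
    }


# Made-up index data and parameters.
index_data = {"insurance_payment_rate": 0.93, "average_payment_base": 5200.0}
view_data = build_view_data(
    index_data,
    chart_type="histogram",
    event_type="on_click_drilldown",
    interface_params={"endpoint": "/hypothetical/detail", "method": "GET"},
)
payload = json.dumps(view_data, ensure_ascii=False)     # what would be pushed to the terminal
```

  • Keeping the payload as plain JSON means the terminal can render the view and use the attached event type and interface parameters without depending on the server's plotting code.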
  • the server vectorizes the multiple field data corresponding to the social security data, thereby obtaining feature vectors corresponding to the multiple field data.
  • the server further calculates the similarity between the multiple feature vectors according to the preset algorithm, and extracts the feature vectors whose similarity reaches the preset threshold.
  • the server performs feature extraction on the social security data, and after extracting the corresponding feature vector, it further obtains a preset data analysis model according to the request type in the resource acquisition request, and analyzes the extracted feature vector through the data analysis model.
  • the data analysis model can include multiple different types of data analysis modules, such as insurance payment rate, payment base analysis, business operation status and other types of indicator data analysis modules.
  • the server analyzes the multiple feature vectors according to the distribution value, field saturation, and weight of the feature vector to obtain multiple types of index data and values corresponding to the feature vector.
  • the server further generates analysis result data according to multiple types of index data and corresponding numerical values corresponding to each feature vector.
  • the analysis result data includes multiple types of index data and corresponding values.
  • the server may further generate index analysis data corresponding to multiple index types from the analysis result data according to the index data type.
  • the server may also generate corresponding visual analysis view data in a preset manner from the module data of the multiple indicator types.
  • the server can obtain a preset integration function according to the request type, integrate corresponding view resource data through the integration function according to multiple preset timing parameters and corresponding predicted values in the analysis result data, and add an event type identifier and corresponding interface call parameters to the view resource data.
  • the preset integration function can be a Python visualization function; visualization functions such as histogram, distribution density, and heat map functions can be used to embed and integrate the corresponding view data, and the corresponding visualization image can be drawn through nested functions.
  • after the server integrates the corresponding analysis view data through the integration function based on the multiple types of indicator data and corresponding values in the analysis result data, it further adds an event type identifier and corresponding interface call parameters to the analysis view data and integrates them into a corresponding class for storage.
  • the event type identifier and corresponding interface call parameters make it easier for the server or terminal to call the generated analysis view data: when the server or terminal obtains the associated social security analysis data or analysis view data again, it can directly call the mined analysis data based on the event type identifier and the corresponding interface call parameters, which in turn improves the analysis efficiency and utilization value of social security data.
  • after the server generates the corresponding analysis view data, it sends the analysis view data to the corresponding terminal, so that the terminal can perform further analysis based on the mined social security data in combination with the corresponding business and make effective use of the mined and analyzed data, thereby improving the mining efficiency and analysis efficiency of social security data.
  • a social security data processing device based on data mining is provided, including: a request receiving module 502, a data acquisition module 504, a feature extraction module 506, a data analysis module 508, and a data push module 510, where:
  • the request receiving module 502 is configured to receive a resource acquisition request sent by the terminal, and the resource acquisition request includes the request type and request information;
  • the data acquisition module 504 is configured to acquire multiple social security data according to the resource acquisition request and the request information, and the social security data includes multiple field data;
  • the feature extraction module 506 is used to input social security data into the vector training model, perform vector processing on multiple field data corresponding to the social security data through the vector training model, and output feature vectors corresponding to the multiple field data; and to extract the dimension values of the multiple feature vectors, use a preset algorithm to calculate the similarity between the multiple feature vectors according to the dimension values, and extract the feature vectors whose similarity reaches the preset threshold;
  • the data analysis module 508 is configured to obtain a preset data analysis model according to the request type, analyze the extracted feature vectors through the data analysis model, and obtain multiple types of indicator data and corresponding values;
  • the data push module 510 is configured to generate analysis result data according to multiple types of index data and corresponding values, and push the analysis result data to the terminal.
  • the feature extraction module 506 is also used to obtain a preset corpus and obtain related corpus data from the corpus according to the social security data; obtain a preset vector training model and perform word vector calculation and training on the social security data and corpus data through the vector training model to obtain multiple corresponding word vectors; and convert the word vectors into corresponding feature vectors according to a preset algorithm.
  • the feature extraction module 506 is further configured to calculate multiple dimension values of multiple feature vectors according to a preset objective function; calculate the similarity between the multiple feature vectors according to a preset distance algorithm and the dimension values; and extract the feature vectors whose similarity reaches the preset threshold.
  • the device further includes a model building module for acquiring multiple pieces of sample social security data, the sample social security data including multiple field data; vectorizing the sample social security data to obtain feature vectors corresponding to the multiple field data; clustering the multiple feature vectors, calculating the correlation between feature vectors and the weight of each feature vector according to the clustering results, and extracting feature vectors that meet the condition threshold; and constructing a data analysis model from the extracted feature vectors and corresponding weights according to a preset algorithm.
  • the data analysis module 508 is also used to calculate the distribution values and field saturation of multiple feature vectors through the data analysis model; perform feature field screening on the multiple feature vectors and extract the feature vectors that reach the preset saturation value; perform semantic analysis on the extracted feature vectors according to a preset semantic analysis algorithm to obtain the weights of the feature vectors; analyze according to the distribution values, field saturation, and weights of the feature vectors to obtain multiple types of index data and corresponding values for the feature vectors; and generate analysis result data based on the multiple types of index data and corresponding values.
  • the analysis result data includes multiple types of index data and corresponding numerical values.
  • the device further includes a view data generating module for generating corresponding index analysis data according to the index data and the corresponding numerical values; integrating the index analysis data through an integration function to generate corresponding analysis view data; adding an event type identifier and corresponding interface call parameters to the analysis view data, the interface call parameters being used to call the generated analysis view data according to the event type identifier; and pushing the analysis view data to the terminal.
  • the various modules in the above-mentioned data mining-based social security data processing device can be implemented in whole or in part by software, hardware, and combinations thereof.
  • the foregoing modules may be embedded in, or independent of, the processor of the computer device in hardware form, or may be stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the foregoing modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 6.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer equipment is used to store data such as social security data, corpus and analysis result data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • FIG. 6 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer parts than shown in the figure, combine some parts, or have a different arrangement of parts.
  • a computer device includes a memory and one or more processors.
  • the memory stores computer readable instructions.
  • the one or more processors execute the following steps:
  • the resource acquisition request includes the request type and request information
  • the social security data includes multiple field data
  • Extract the dimension values of multiple feature vectors, use a preset algorithm to calculate the similarity between the multiple feature vectors according to the dimension values, and extract the feature vectors whose similarity reaches the preset threshold;
  • One or more non-volatile computer-readable storage media storing computer-readable instructions.
  • the one or more processors execute the following steps:
  • the resource acquisition request includes the request type and request information
  • the social security data includes multiple field data
  • Extract the dimension values of multiple feature vectors, use a preset algorithm to calculate the similarity between the multiple feature vectors according to the dimension values, and extract the feature vectors whose similarity reaches the preset threshold;
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

A data mining-based social insurance data processing method, comprising: receiving a resource obtaining request sent by a terminal, wherein the resource obtaining request comprises a request type and request information; obtaining multiple pieces of social insurance data according to the resource obtaining request and the request information, wherein the social insurance data comprises multiple pieces of field data; inputting the social insurance data into a vector training model, and vectorizing the multiple pieces of field data corresponding to the social insurance data by means of the vector training model to output feature vectors corresponding to the multiple pieces of field data; extracting dimension values of the multiple feature vectors, calculating the similarities between the multiple feature vectors by a preset algorithm according to the dimension values, and extracting feature vectors having a similarity reaching a preset threshold; obtaining a data analysis model according to the request type, and analyzing the extracted feature vectors by means of the data analysis model to obtain multiple types of index data and corresponding values; and generating analysis result data on the basis of the multiple types of index data and the corresponding values, and pushing the analysis result data to the terminal.

Description

Social security data processing method, apparatus, and computer device based on data mining
Cross-reference to related applications:
This application claims priority to Chinese patent application No. 2019101716064, filed with the Chinese Patent Office on March 7, 2019 and entitled "Social security data processing method, apparatus, and computer device based on data mining", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to a social security data processing method, apparatus, and computer device based on data mining.
Background
With rapid economic development, social insurance has become an important part of the people's livelihood economy. With the continuous development of computer technology, business processes such as the registration of insured persons, the collection of social insurance contributions, and the payment of social insurance benefits have all been networked and computerized, and social security business systems have accumulated a large amount of social security data.
However, most existing approaches to mining social security data only query the data and perform simple processing, without deeper analysis and mining of this large volume of data. Moreover, the data is voluminous and its information is complicated and redundant. When such data is mined and analyzed, insufficient mining depth and disorderly processes easily arise, resulting in low efficiency and accuracy of data mining.
Summary of the invention
According to various embodiments disclosed in this application, a social security data processing method, apparatus, and computer device based on data mining are provided.
A social security data processing method based on data mining includes:
receiving a resource acquisition request sent by a terminal, the resource acquisition request including a request type and request information;
acquiring multiple pieces of social security data according to the resource acquisition request and the request information, the social security data including multiple pieces of field data;
inputting the social security data into a vector training model, vectorizing the multiple pieces of field data corresponding to the social security data through the vector training model, and outputting feature vectors corresponding to the multiple pieces of field data;
extracting dimension values of the multiple feature vectors, calculating the similarity between the multiple feature vectors according to the dimension values using a preset algorithm, and extracting the feature vectors whose similarity reaches a preset threshold;
obtaining a preset data analysis model according to the request type, and analyzing the extracted feature vectors through the data analysis model to obtain multiple types of index data and corresponding values; and
generating analysis result data according to the multiple types of index data and the corresponding values, and pushing the analysis result data to the terminal.
A social security data processing apparatus based on data mining includes:
a request receiving module, configured to receive a resource acquisition request sent by a terminal, the resource acquisition request including a request type and request information;
a data acquisition module, configured to acquire multiple pieces of social security data according to the resource acquisition request and the request information, the social security data including multiple pieces of field data;
a feature extraction module, configured to input the social security data into a vector training model, vectorize the multiple pieces of field data corresponding to the social security data through the vector training model, and output feature vectors corresponding to the multiple pieces of field data; and to extract dimension values of the multiple feature vectors, calculate the similarity between the multiple feature vectors according to the dimension values using a preset algorithm, and extract the feature vectors whose similarity reaches a preset threshold;
a data analysis module, configured to obtain a preset data analysis model according to the request type and analyze the extracted feature vectors through the data analysis model to obtain multiple types of index data and corresponding values; and
a data push module, configured to generate analysis result data according to the multiple types of index data and the corresponding values, and push the analysis result data to the terminal.
A computer device includes a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
receiving a resource acquisition request sent by a terminal, the resource acquisition request including a request type and request information;
acquiring multiple pieces of social security data according to the resource acquisition request and the request information, the social security data including multiple pieces of field data;
inputting the social security data into a vector training model, vectorizing the multiple pieces of field data corresponding to the social security data through the vector training model, and outputting feature vectors corresponding to the multiple pieces of field data;
extracting dimension values of the multiple feature vectors, calculating the similarity between the multiple feature vectors according to the dimension values using a preset algorithm, and extracting the feature vectors whose similarity reaches a preset threshold;
obtaining a preset data analysis model according to the request type, and analyzing the extracted feature vectors through the data analysis model to obtain multiple types of index data and corresponding values; and
generating analysis result data according to the multiple types of index data and the corresponding values, and pushing the analysis result data to the terminal.
One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
receiving a resource acquisition request sent by a terminal, the resource acquisition request including a request type and request information;
acquiring multiple pieces of social security data according to the resource acquisition request and the request information, the social security data including multiple pieces of field data;
inputting the social security data into a vector training model, vectorizing the multiple pieces of field data corresponding to the social security data through the vector training model, and outputting feature vectors corresponding to the multiple pieces of field data;
extracting dimension values of the multiple feature vectors, calculating the similarity between the multiple feature vectors according to the dimension values using a preset algorithm, and extracting the feature vectors whose similarity reaches a preset threshold;
obtaining a preset data analysis model according to the request type, and analyzing the extracted feature vectors through the data analysis model to obtain multiple types of index data and corresponding values; and
generating analysis result data according to the multiple types of index data and the corresponding values, and pushing the analysis result data to the terminal.
The details of one or more embodiments of this application are set forth in the drawings and description below. Other features and advantages of this application will become apparent from the description, the drawings, and the claims.
Description of the drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed for the embodiments are briefly introduced below. The drawings described below show only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an application scenario diagram of a social security data processing method based on data mining according to one or more embodiments.
FIG. 2 is a schematic flowchart of a social security data processing method based on data mining according to one or more embodiments.
FIG. 3 is a schematic flowchart of the step of vectorizing multiple pieces of field data corresponding to social security data according to one or more embodiments.
FIG. 4 is a schematic flowchart of the step of analyzing the extracted feature vectors through the data analysis model according to one or more embodiments.
FIG. 5 is a block diagram of a social security data processing apparatus based on data mining according to one or more embodiments.
FIG. 6 is a block diagram of a computer device according to one or more embodiments.
Detailed description
To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain this application and do not limit it.
The social security data processing method based on data mining provided in this application can be applied in the application environment shown in FIG. 1. The terminal 102 communicates with the server 104 through a network. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers. The terminal 102 may send a resource acquisition request to the server 104, the resource acquisition request including a request type and request information. After receiving the resource acquisition request sent by the terminal 102, the server 104 acquires multiple pieces of social security data according to the resource acquisition request and the request information it carries, the social security data including multiple pieces of field data. The server 104 then vectorizes the multiple pieces of field data corresponding to the social security data to obtain feature vectors corresponding to the field data. The server 104 calculates the similarity between the multiple feature vectors according to a preset algorithm and extracts the feature vectors whose similarity reaches a preset threshold. The server 104 further obtains a preset data analysis model, analyzes the extracted feature vectors through the data analysis model to obtain corresponding analysis result data, and pushes the analysis result data to the corresponding terminal 102. By performing feature extraction and screening on a large amount of social security data and analyzing the valuable feature vectors thus extracted with the data analysis model, valuable information in the social security data can be mined effectively, which improves the efficiency and accuracy of social security data analysis.
In one embodiment, as shown in FIG. 2, a social security data processing method based on data mining is provided. The method is described using its application to the server in FIG. 1 as an example and includes the following steps:
Step 202: receive a resource acquisition request sent by a terminal, where the resource acquisition request includes a request type and request information.
A user can enter the relevant field information through the corresponding terminal and send a data analysis request to the server. The resource acquisition request may be a request to obtain the result data produced by analyzing social security data. The resource acquisition request carries a request type and request information. The request type may be the type of resource data to be acquired, for example social security analysis data. The request information may be field information entered by the user, for example the scope or time interval of the social security data.
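The two parts of the request described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the class name `ResourceAcquisitionRequest`, the function `parse_request`, and the payload keys (`request_type`, `request_info`, `scope`, `start`, `end`) are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class ResourceAcquisitionRequest:
    """Hypothetical container for the two parts of a resource acquisition request."""
    request_type: str  # kind of resource data wanted, e.g. social security analysis data
    request_info: Dict[str, Any] = field(default_factory=dict)  # user-entered field information


def parse_request(payload: Dict[str, Any]) -> ResourceAcquisitionRequest:
    """Validate a decoded payload and build the request object."""
    request_type = payload.get("request_type")
    if not isinstance(request_type, str) or not request_type:
        raise ValueError("request_type is required")
    request_info = payload.get("request_info") or {}
    if not isinstance(request_info, dict):
        raise ValueError("request_info must be a mapping")
    return ResourceAcquisitionRequest(request_type, dict(request_info))


# Illustrative payload; the key names and values are assumptions.
example = parse_request({
    "request_type": "social_security_analysis",
    "request_info": {"scope": "Enterprise A", "start": "2018-01", "end": "2018-12"},
})
```

Keeping the request type separate from the request information mirrors the text: the type later selects the data analysis model, while the information narrows which data is acquired.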
Step 204: acquire multiple pieces of social security data according to the resource acquisition request and the request information, where the social security data includes multiple pieces of field data.
The social security data may be social insurance data, for example pension insurance data, medical insurance data, unemployment insurance data, work injury insurance data, and maternity insurance data. After receiving the resource acquisition request sent by the terminal, the server acquires multiple pieces of social security data from a local database or a third-party database according to the resource acquisition request and the request information. For example, when the request information limits the scope of the social security data to a certain enterprise, the server obtains the social security data of that enterprise. The social security data includes multiple pieces of field data, such as name, gender, age, region, affiliated enterprise, payment duration, and payment amount.
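The enterprise example above amounts to filtering records by the request information. The sketch below assumes records are plain dictionaries held in memory rather than rows in a database; the field names (`enterprise`, `payment_month`) and the request keys are hypothetical, and the sample rows are made up purely for illustration.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def acquire_records(records: Iterable[Record], request_info: Dict[str, Any]) -> List[Record]:
    """Return the records that match the enterprise scope and month window, if given."""
    enterprise = request_info.get("scope")
    start = request_info.get("start")
    end = request_info.get("end")
    selected = []
    for record in records:
        if enterprise is not None and record.get("enterprise") != enterprise:
            continue
        # "YYYY-MM" strings compare correctly in lexicographic order.
        month = record.get("payment_month", "")
        if start is not None and month < start:
            continue
        if end is not None and month > end:
            continue
        selected.append(record)
    return selected


# Made-up rows for illustration only.
sample = [
    {"name": "P1", "enterprise": "Enterprise A", "payment_month": "2018-03", "payment_amount": 500.0},
    {"name": "P2", "enterprise": "Enterprise B", "payment_month": "2018-03", "payment_amount": 450.0},
    {"name": "P3", "enterprise": "Enterprise A", "payment_month": "2019-02", "payment_amount": 520.0},
]
matched = acquire_records(sample, {"scope": "Enterprise A", "start": "2018-01", "end": "2018-12"})
```

In a real deployment the same conditions would more likely be pushed down into the database query; the in-memory loop only makes the selection logic visible.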
Step 206: input the social security data into a vector training model, vectorize the multiple pieces of field data corresponding to the social security data through the vector training model, and output feature vectors corresponding to the multiple pieces of field data.
After acquiring the multiple pieces of social security data, the server vectorizes the multiple pieces of field data corresponding to the social security data. Specifically, the server may obtain a preset corpus and obtain associated corpus data from the corpus according to the social security data. The server further obtains a preset vector training model; for example, the vector training model may be a neural network model based on word2vec. The server inputs the social security data and the associated corpus data into the vector training model, which performs word vector calculation and training on the social security data in combination with the associated corpus data to obtain multiple word vectors corresponding to the social security data, and converts the word vectors into corresponding feature vectors according to a preset algorithm. Feature vectors corresponding to the multiple pieces of field data are thereby obtained.
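The application does not say which preset algorithm converts word vectors into feature vectors. One common choice, used here purely as an assumption, is to average the word vectors of the tokens that make up a field. The sketch takes already-trained word vectors as given (a toy lookup table stands in for a word2vec model), and the names `field_to_feature_vector` and `toy_vectors` are hypothetical.

```python
from typing import Dict, List, Sequence

WordVectors = Dict[str, Sequence[float]]


def field_to_feature_vector(tokens: Sequence[str], word_vectors: WordVectors, dim: int) -> List[float]:
    """Average the word vectors of the known tokens; return zeros if none are known."""
    known = [word_vectors[token] for token in tokens if token in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vector[i] for vector in known) / len(known) for i in range(dim)]


# Toy 3-dimensional vectors standing in for trained word2vec output (assumed values).
toy_vectors: WordVectors = {
    "payment": [1.0, 0.0, 0.0],
    "amount": [0.0, 1.0, 0.0],
    "duration": [0.0, 0.0, 1.0],
}

features = {
    "payment amount": field_to_feature_vector(["payment", "amount"], toy_vectors, 3),
    "payment duration": field_to_feature_vector(["payment", "duration"], toy_vectors, 3),
    "region": field_to_feature_vector(["region"], toy_vectors, 3),
}
```

Averaging keeps every field in the same vector space regardless of how many tokens it contains, which is what the later similarity step needs; other pooling schemes would serve equally well as the preset algorithm.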
Step 208: Extract the dimension values of the multiple feature vectors, calculate the similarity between the feature vectors from the dimension values using a preset algorithm, and extract the feature vectors whose similarity reaches a preset threshold.
After obtaining the feature vectors corresponding to the multiple field data, the server calculates the similarity between them according to a preset algorithm. Specifically, the server may first calculate multiple dimension values of the feature vectors according to a preset objective function, where the dimension values may be the feature values of each feature vector in its different dimensions. The server then calculates the similarity between the feature vectors according to a preset distance algorithm and the dimension values, and extracts the feature vectors whose similarity reaches the preset threshold.
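As a minimal sketch of Step 208, assuming Euclidean distance as the preset distance algorithm and an illustrative mapping similarity = 1 / (1 + distance) (the text does not say how a distance is turned into a similarity), the thresholding could look as follows. The field names, vectors, and the threshold 0.9 are hypothetical.

```python
import math
from itertools import combinations


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    """Assumed mapping from distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


def similar_pairs(vectors, threshold):
    """Return the pairs of field names whose similarity reaches the threshold."""
    return [
        (name_a, name_b)
        for (name_a, vec_a), (name_b, vec_b) in combinations(vectors.items(), 2)
        if similarity(vec_a, vec_b) >= threshold
    ]


# Hypothetical feature vectors for three fields.
feature_vectors = {
    "age": [0.5, 0.7],
    "payment_duration": [0.52, 0.71],
    "region": [0.1, 0.2],
}
pairs = similar_pairs(feature_vectors, threshold=0.9)
print(pairs)
```

Any monotonically decreasing mapping from distance to similarity would serve here; the reciprocal form is chosen only because it keeps the result between 0 and 1.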
Step 210: Obtain a preset data analysis model according to the request type, and analyze the extracted feature vectors through the data analysis model to obtain multiple types of index data and corresponding values.
After extracting the feature vectors, the server obtains the corresponding preset data analysis model according to the request type. The data analysis model may include multiple data analysis modules of different types, for example index data analysis modules for the insurance payment rate, payment base analysis, and enterprise operating status. The extracted feature vectors are analyzed through the data analysis model.
Specifically, the server may first use the data analysis model to calculate the distribution value and field saturation of the multiple feature vectors. The distribution value may be the value of the field data corresponding to a feature vector, and the field saturation may be the degree of saturation of the values of the feature vector and field data with respect to multiple preset index data. The server then performs statistical screening on the feature vectors through the data analysis model and extracts those that reach a preset saturation value. Next, the server performs semantic analysis on the extracted feature vectors according to a preset semantic analysis algorithm to obtain the weight of each feature vector, that is, its importance value. Finally, the server analyzes the feature vectors according to their distribution values, field saturation, and weights to obtain the multiple types of index data and values corresponding to the feature vectors.
Step 212: Generate analysis result data according to the multiple types of index data and corresponding values, and push the analysis result data to the terminal.
After obtaining the index data, the server generates the analysis result data from the multiple types of index data and corresponding values for each feature vector, and pushes the analysis result data to the corresponding terminal. Further, the server may generate view data in a preset format from the analysis result data and push the view data to the corresponding terminal, so that the user can clearly understand the analysis result data.
For example, when the acquired social insurance data belongs to a particular enterprise or region, mining and analyzing it can effectively yield index data such as the insurance payment rate, payment base analysis, and enterprise operating status. By performing feature extraction and screening on a large amount of social insurance data and using the data analysis model to analyze the valuable feature vectors thus extracted, valuable information in the social insurance data can be effectively mined and analyzed, which improves the efficiency and accuracy of social insurance data analysis.
In the above data mining-based social insurance data processing method, after receiving the resource acquisition request sent by the terminal, the server acquires multiple pieces of social insurance data according to the resource acquisition request and the request information it carries, where the social insurance data includes multiple field data. The server inputs the social insurance data into the vector training model, vectorizes the multiple field data through the model, and outputs the feature vectors corresponding to the field data. The dimension values of the feature vectors are extracted, the similarity between the feature vectors is calculated from the dimension values using a preset algorithm, and the feature vectors whose similarity reaches the preset threshold are extracted. The server then obtains the preset data analysis model, analyzes the extracted feature vectors through it to obtain multiple types of index data and corresponding values, generates analysis result data from them, and pushes the analysis result data to the corresponding terminal. By performing feature extraction and screening on a large amount of social insurance data and using the data analysis model to analyze the valuable feature vectors thus extracted, valuable information in the social insurance data can be effectively mined, which improves the efficiency and accuracy of social insurance data analysis.
In one embodiment, as shown in FIG. 3, the step of performing vector processing on the multiple field data corresponding to the social insurance data through the vector training model specifically includes the following:
Step 302: Obtain a preset corpus, and retrieve from the corpus the corpus data associated with the social insurance data.
The terminal may send a resource acquisition request to the server, where the resource acquisition request carries a request type and request information. After receiving the resource acquisition request sent by the terminal, the server acquires the corresponding social insurance data from a local database or a third-party database according to the resource acquisition request and the request information, where the social insurance data includes multiple field data.
After acquiring the social insurance data, the server obtains the preset corpus. The corpus may be a pre-built corpus containing a variety of words or sentences related to social insurance.
Step 304: Obtain a preset vector training model, and perform word vector calculation and training on the social insurance data and the corpus data through the vector training model to obtain multiple corresponding word vectors.
Step 306: Convert the multiple word vectors into corresponding feature vectors according to a preset algorithm.
The server further obtains the preset vector training model and inputs the social insurance data and the corpus data into it; for example, the vector training model may be a neural network model based on word2vec. The model performs calculation and training on the social insurance data and the corpus data to obtain the word vectors corresponding to the social insurance data. For example, through word vector training, each character can be mapped to a vector in an n-dimensional space. When n is 2, the vector for the character "身" might be [0.5365654, 0.726268] and the vector for "份" might be [0.52222458, 0.7511456]. The cosine of the angle between these two vectors is close to 1, so the two characters lie very close together in the semantic space, which indicates that "身份" (identity) forms a word. If n is 100, each character is converted into a 100-dimensional vector. Vectorizing the social insurance data through the word vector model allows the word vectors in the data to be extracted accurately and effectively.
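The closeness of the two example vectors can be checked directly. The sketch below computes their cosine similarity in plain Python; the two vectors are the ones given in the text, and the function name is chosen here for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


shen = [0.5365654, 0.726268]    # example vector for "身" from the text
fen = [0.52222458, 0.7511456]   # example vector for "份" from the text

score = cosine_similarity(shen, fen)
print(f"cosine similarity: {score:.4f}")
```

A value near 1 means the vectors point in almost the same direction, which is the sense in which the text treats the two characters as belonging to one word.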
After extracting the word vectors from the social insurance data, the server converts them into corresponding feature vectors according to a preset algorithm. For example, a preset vector representation method may be used to convert the word vectors into feature vectors. In this way, the feature vectors corresponding to the social insurance data can be effectively extracted.
In one embodiment, the step of calculating the similarity between the multiple feature vectors from the dimension values using a preset algorithm and extracting the feature vectors whose similarity reaches the preset threshold includes: calculating multiple dimension values of the feature vectors according to a preset objective function; calculating the similarity between the feature vectors according to a preset distance algorithm and the dimension values; and extracting the feature vectors whose similarity reaches the preset threshold.
The terminal may send a resource acquisition request to the server, where the resource acquisition request carries a request type and request information. After receiving the resource acquisition request sent by the terminal, the server acquires the corresponding social insurance data from a local database or a third-party database according to the resource acquisition request and the request information, where the social insurance data includes multiple field data.
The server vectorizes the multiple field data corresponding to the social insurance data, thereby obtaining the feature vectors corresponding to the field data. The server then calculates the correlation between the feature vectors according to a preset algorithm. Specifically, the server may calculate multiple dimension values of the feature vectors according to a preset objective function, calculate the similarity between the feature vectors according to a preset distance algorithm and the dimension values, and extract the feature vectors whose similarity reaches the preset threshold. For example, the preset distance algorithm may be the Euclidean distance algorithm.
For example, the Euclidean distance function may be calculated as follows:
[Formula image PCTCN2019116126-appb-000001: Euclidean distance formula, not reproduced in this text]
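The formula is an image in the source and its exact notation is not available here. Assuming it is the standard Euclidean distance between two n-dimensional feature vectors, with A and B denoting the vectors and a_j, b_j their components (notation chosen for illustration only), it would read:

```latex
P(A, B) = \sqrt{\sum_{j=1}^{n} \left( a_j - b_j \right)^{2}}
```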
The objective function may be expressed as:
$$B_k = \arg\min \left( P(A_i, B_i) \right)$$
The objective function minimizes the value of $P(A_i, B_i)$. Three statistics, Max, Min, and Mean, are then extracted per dimension across the character vectors of a text. For example, when n is 2, suppose "身" is represented as [0.5, 0.2], "份" as [0.1, 0.7], and "证" as [0.2, 0.5]. Max takes the maximum of each dimension: 0.5 > 0.2 > 0.1 in the first dimension and 0.7 > 0.5 > 0.2 in the second, so Max is [0.5, 0.7]. Likewise, Min is [0.1, 0.2] and Mean is [0.8/3, 1.4/3]. Concatenating these three vectors horizontally, the short text "身份证" (ID card) is represented by the 6-dimensional vector [0.5, 0.7, 0.1, 0.2, 0.8/3, 1.4/3]. A short text such as "保险说明" (insurance statement) can be represented by a 6-dimensional vector in the same way. A short text of any length can therefore be represented by a vector of 3*n dimensions. The similarity between texts is then obtained by calculating the Euclidean distance between the vectors that represent them.
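The Max/Min/Mean pooling described above can be sketched in a few lines of Python. The three character vectors are the ones from the worked example; the function names and the four-character comparison text are assumptions made for illustration.

```python
import math


def pool_text_vector(char_vectors):
    """Concatenate the per-dimension max, min, and mean of the character vectors.

    For n-dimensional character vectors the result has 3 * n dimensions,
    regardless of how many characters the text contains.
    """
    columns = list(zip(*char_vectors))
    maxima = [max(column) for column in columns]
    minima = [min(column) for column in columns]
    means = [sum(column) / len(column) for column in columns]
    return maxima + minima + means


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Character vectors for "身", "份", "证" from the worked example (n = 2).
id_card = pool_text_vector([[0.5, 0.2], [0.1, 0.7], [0.2, 0.5]])

# Hypothetical vectors for a four-character text; the pooled length is still 6.
other_text = pool_text_vector([[0.3, 0.4], [0.6, 0.1], [0.2, 0.9], [0.5, 0.5]])

distance = euclidean_distance(id_card, other_text)
print(id_card, len(other_text), distance)
```

Because the pooled vector always has 3 * n dimensions, texts of different lengths can be compared with a single distance computation.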
After calculating the similarity between the multiple feature vectors, the server extracts the feature vectors whose similarity reaches the preset threshold. Calculating the similarity through the preset objective function and distance algorithm, and then extracting the feature vectors that reach the threshold, enables effective feature extraction from the social insurance data.
In one embodiment, before the preset data analysis model is obtained according to the request type, the method further includes: obtaining multiple pieces of sample social insurance data, where the sample social insurance data includes multiple field data; vectorizing the sample social insurance data to obtain the feature vectors corresponding to the field data; clustering the feature vectors, calculating the correlation between the feature vectors and the weight of each feature vector according to the clustering result, and extracting the feature vectors that satisfy a condition threshold; and constructing the data analysis model from the extracted feature vectors and their weights according to a preset algorithm.
Before the server can obtain the preset data analysis model, the model must first be constructed. Specifically, the server may obtain a large amount of sample social insurance data in advance and vectorize it to obtain the feature vectors corresponding to the multiple field data in the samples. After vectorization, the server performs feature extraction on the sample data: it may perform cluster analysis on the feature vectors through a preset clustering algorithm, calculate the correlation between the feature vectors and the weight of each feature vector, and extract the feature vectors that satisfy the preset condition threshold. The server then constructs the data analysis model from the extracted feature vectors and their weights according to a preset algorithm. The data analysis model may include multiple data analysis modules of different types, for example index data analysis modules for the insurance payment rate, payment base analysis, and enterprise operating status. Analyzing and extracting features from a large amount of social insurance data, and building the data analysis model from the valuable feature vectors thus extracted, effectively improves the accuracy of the data analysis model.
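The text leaves the clustering algorithm and the weight definition open. The following sketch is one possible reading, not the method of the source: it assumes k-means as the preset clustering algorithm, and defines a purely hypothetical weight as the share of samples that fall in a vector's cluster. The sample vectors and the threshold 0.5 are invented for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic start (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_weights(labels, k):
    """Hypothetical weight: fraction of all samples in the vector's cluster."""
    counts = [labels.count(c) for c in range(k)]
    return [counts[label] / len(labels) for label in labels]


def select_by_weight(vectors, k, threshold):
    """Keep the vectors whose (hypothetical) weight reaches the threshold."""
    labels, _ = kmeans(vectors, k)
    weights = cluster_weights(labels, k)
    return [(v, w) for v, w in zip(vectors, weights) if w >= threshold]


# Invented sample feature vectors forming two obvious groups.
samples = [[0.1, 0.1], [0.9, 0.9], [0.12, 0.08], [0.88, 0.95], [0.11, 0.12]]
selected = select_by_weight(samples, k=2, threshold=0.5)
print(selected)
```

With this weight definition, vectors in the dominant cluster are retained; a real implementation would substitute whatever correlation and weight measures the preset algorithm prescribes.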
In one embodiment, as shown in FIG. 4, the step of analyzing the extracted feature vectors through the data analysis model specifically includes the following:
Step 402: Calculate the distribution values and field saturation of the multiple feature vectors through the data analysis model.
Step 404: Perform feature field screening on the multiple feature vectors, and extract the feature vectors that reach a preset saturation value.
The terminal may send a resource acquisition request to the server, where the resource acquisition request carries a request type and request information. After receiving the resource acquisition request sent by the terminal, the server acquires the corresponding social insurance data from a local database or a third-party database according to the resource acquisition request and the request information, where the social insurance data includes multiple field data.
The server vectorizes the multiple field data corresponding to the social insurance data, thereby obtaining the feature vectors corresponding to the field data. The server then calculates the similarity between the feature vectors according to a preset algorithm and extracts the feature vectors whose similarity reaches the preset threshold.
After performing feature extraction on the social insurance data and extracting the corresponding feature vectors, the server obtains the preset data analysis model according to the request type in the resource acquisition request and analyzes the extracted feature vectors through it. Specifically, after obtaining the preset data analysis model, the server inputs the feature vectors corresponding to the extracted field data into the model, calculates the distribution value and field saturation of the field data through the model, and performs feature field screening on the field data, for example statistical screening, to extract the feature vectors that reach the preset saturation value. The distribution value may be the value of the field data corresponding to a feature vector.
For example, when a field is age, the distribution value of the field data may be the number of people in each age group, such as 10-20, 20-30, and 30-40. The field saturation may be the degree of saturation of the values of the feature vector and field data with respect to multiple preset index data. Input data may be unsaturated in places: if some fields are empty, for instance, the field saturation of that field data is relatively low. The server therefore needs to perform statistical exploration on the feature vectors corresponding to the field data as a second round of field screening.
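The age example can be made concrete with a short sketch. It assumes that the distribution value is a count per 10-year bucket and that field saturation is the fraction of non-empty values in a field; both definitions, along with the sample records and the 0.8 saturation threshold, are illustrative assumptions rather than definitions from the source.

```python
def age_distribution(ages, bucket_size=10):
    """Count people per age bucket, e.g. '20-30'; empty values are skipped."""
    counts = {}
    for age in ages:
        if age is None:
            continue
        low = (age // bucket_size) * bucket_size
        label = f"{low}-{low + bucket_size}"
        counts[label] = counts.get(label, 0) + 1
    return counts


def field_saturation(values):
    """Assumed definition: share of values that are neither None nor empty."""
    if not values:
        return 0.0
    filled = sum(1 for v in values if v not in (None, ""))
    return filled / len(values)


def screen_fields(columns, min_saturation):
    """Keep the names of the fields whose saturation reaches the preset value."""
    return [
        name for name, values in columns.items()
        if field_saturation(values) >= min_saturation
    ]


# Hypothetical field data for five insured persons.
columns = {
    "age": [15, 22, 28, None, 35],
    "region": ["A", "", None, "B", ""],
}
distribution = age_distribution(columns["age"])
kept = screen_fields(columns, min_saturation=0.8)
print(distribution, kept)
```

Here the sparsely filled `region` field is dropped by the screening step while `age` is kept, mirroring the second round of field screening described above.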
Step 406: Analyze the extracted feature vectors according to a preset semantic analysis algorithm to obtain the weights of the feature vectors.
Step 408: Perform analysis according to the distribution values, field saturation, and weights of the feature vectors to obtain the multiple types of index data and corresponding values for the feature vectors.
Step 410: Generate analysis result data according to the multiple types of index data and corresponding values.
After performing statistical screening on the feature vectors and extracting those that reach the preset saturation value, the server analyzes the extracted field data according to the preset semantic analysis algorithm to obtain the weight of the field data, that is, its importance value.
The server then performs analysis according to the distribution value, field saturation, and importance value of the field data to obtain multiple types of index data and corresponding values, and generates the corresponding analysis result data from them. Analyzing the extracted field data through the data analysis model effectively yields the analysis result data corresponding to the social insurance data.
For example, the semantic analysis may be based on the matching relationship between the fields entered by the user and the actual fields, where the request information includes the fields entered by the user. Social insurance big data may contain fields across thousands of dimensions, including desensitized ID number, height, weight, desensitized social insurance account number, social insurance attributes, and so on, while a user may be interested in only a few specific fields. The user therefore only needs to enter the fields of interest. The server performs semantic analysis on the feature vectors corresponding to the extracted social insurance data, identifies the field information in the data set that is related to the fields entered by the user, calculates the weights of the corresponding feature vectors, and retrieves the associated field information. A user may also enter a fairly vague field of interest, such as "compensation", which covers information such as the number of compensations per year, the compensation amount, and the reason for compensation.
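The matching of a vague user-entered field to actual fields can be illustrated with a deliberately simple stand-in. The sketch below is not a semantic analysis algorithm: it substitutes substring matching plus the standard library's `difflib.SequenceMatcher` string similarity, and treats the resulting score as the weight. The field names and the 0.3 cut-off are hypothetical.

```python
from difflib import SequenceMatcher


def rank_fields(query, field_names, min_score=0.3):
    """Return (field, weight) pairs related to the query, best match first.

    A field that contains the query gets weight 1.0; otherwise the weight is
    the character-level similarity ratio between the query and the field name.
    """
    scored = []
    for name in field_names:
        if query in name:
            score = 1.0
        else:
            score = SequenceMatcher(None, query, name).ratio()
        if score >= min_score:
            scored.append((name, score))
    # Sort by descending weight; ties keep their original order.
    return sorted(scored, key=lambda item: -item[1])


# Hypothetical field names in the data set.
fields = [
    "annual compensation count",
    "compensation amount",
    "compensation reason",
    "height",
    "weight",
]
matches = rank_fields("compensation", fields)
print(matches)
```

A production system would replace the scoring function with an embedding-based or otherwise genuinely semantic measure; the surrounding ranking and thresholding logic would stay the same.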
The data analysis model may include multiple data analysis modules of different types, for example index data analysis modules for the insurance payment rate, payment base analysis, and enterprise operating status. The server analyzes the feature vectors according to their distribution values, field saturation, and weights to obtain the multiple types of index data and values corresponding to the feature vectors, and then generates analysis result data from the index data and values for each feature vector. After generating the analysis result data, the server pushes it to the corresponding terminal. By performing feature extraction and screening on a large amount of social insurance data and using the data analysis model to analyze the valuable feature vectors thus extracted, valuable information in the social insurance data can be effectively mined and analyzed, which improves the efficiency and accuracy of social insurance data analysis.
In one embodiment, the analysis result data includes multiple types of index data and corresponding values, and the method further includes: generating corresponding index analysis data according to the index data and corresponding values; generating corresponding analysis view data from the index analysis data in a preset manner; adding an event type identifier and corresponding interface call parameters to the analysis view data; and pushing the analysis view data to the terminal.
After receiving the resource acquisition request sent by the terminal, the server acquires the corresponding social insurance data from a local database or a third-party database according to the resource acquisition request and the request information, where the social insurance data includes multiple field data. The server vectorizes the multiple field data to obtain the corresponding feature vectors, calculates the similarity between the feature vectors according to a preset algorithm, and extracts the feature vectors whose similarity reaches the preset threshold.
After performing feature extraction on the social insurance data and extracting the corresponding feature vectors, the server obtains the preset data analysis model according to the request type in the resource acquisition request and analyzes the extracted feature vectors through it. The data analysis model may include multiple data analysis modules of different types, for example index data analysis modules for the insurance payment rate, payment base analysis, and enterprise operating status. The server analyzes the feature vectors according to their distribution values, field saturation, and weights to obtain the multiple types of index data and values corresponding to the feature vectors, and then generates analysis result data from the index data and values for each feature vector.
The analysis result data that the server obtains by mining and analyzing the social insurance data includes multiple types of index data and corresponding values. The server may further generate, according to the index data type, index analysis data for multiple index types from the analysis result data, and may generate corresponding visual analysis view data from the module data of each index type in a preset manner. Specifically, the server may obtain a preset integration function according to the request type, use the integration function to assemble the corresponding view resource data from the multiple preset timing parameters and corresponding predicted values in the analysis result data, and add an event type identifier and corresponding interface call parameters to the view resource data. For example, the preset integration function may be a Python visualization function; visualization functions such as histogram, distribution density, and heat map functions can be embedded to assemble the corresponding view data, and the corresponding visual images can be drawn through nested functions.
After assembling the analysis view data through the integration function from the multiple types of index data and corresponding values in the analysis result data, the server adds an event type identifier and corresponding interface call parameters to the analysis view data and stores it in a corresponding class. This makes it easy for the server or terminal to call the generated analysis view data: when the server or terminal needs the associated social insurance analysis data or analysis view data again, it can call the mined and analyzed data directly through the event type identifier and interface call parameters, which improves the analysis efficiency and utilization value of the social insurance data.
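One way to picture view data that carries an event type identifier and interface call parameters is the sketch below. The class name, the `view.<indicator>` identifier scheme, the in-memory registry, and the sample payment-rate values are all hypothetical; the source does not specify any of them, and actual chart rendering is left out.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AnalysisView:
    event_type: str    # hypothetical event type identifier
    call_params: dict  # hypothetical interface call parameters
    chart: str         # e.g. "histogram", "density", "heatmap"
    series: dict       # index values to be drawn


# In-memory store standing in for the class-based storage mentioned in the text.
VIEW_REGISTRY = {}


def build_view(indicator, values, chart="histogram"):
    """Wrap index values as view data and register it for later calls."""
    view = AnalysisView(
        event_type=f"view.{indicator}",
        call_params={"indicator": indicator, "chart": chart},
        chart=chart,
        series=dict(values),
    )
    VIEW_REGISTRY[view.event_type] = view
    return view


def fetch_view(event_type, **call_params):
    """Look up stored view data by event type and matching call parameters."""
    view = VIEW_REGISTRY.get(event_type)
    if view is None:
        return None
    if any(view.call_params.get(k) != v for k, v in call_params.items()):
        return None
    return view


# Invented index values for illustration.
view = build_view("payment_rate", {"2017": 0.90, "2018": 0.92})
payload = json.dumps(asdict(view), ensure_ascii=False)
print(payload)
```

The serialized payload is what would be pushed to the terminal, while the registry lets a later request retrieve the same view without repeating the mining and analysis.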
After generating the analysis view data, the server sends it to the corresponding terminal, so that the terminal can carry out further analysis of the mined social insurance data in combination with the relevant business. The mined and analyzed data can thus be put to effective use, which improves the mining and analysis efficiency of the social insurance data.
应该理解的是,虽然图2-4的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,图2-4中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。It should be understood that although the various steps in the flowcharts of FIGS. 2-4 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in sequence in the order indicated by the arrows. Unless specifically stated in this article, the execution of these steps is not strictly limited in order, and these steps can be executed in other orders. Moreover, at least some of the steps in Figures 2-4 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times. These sub-steps or stages The execution order of is not necessarily performed sequentially, but may be performed alternately or alternately with at least a part of other steps or sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 5, a data mining-based social security data processing apparatus is provided, including a request receiving module 502, a data acquisition module 504, a feature extraction module 506, a data analysis module 508, and a data push module 510, wherein:
The request receiving module 502 is configured to receive a resource acquisition request sent by a terminal, the resource acquisition request including a request type and request information;
The data acquisition module 504 is configured to acquire multiple pieces of social security data according to the resource acquisition request and the request information, the social security data including multiple field data;
The feature extraction module 506 is configured to input the social security data into a vector training model, perform vector processing on the multiple field data corresponding to the social security data through the vector training model, and output feature vectors corresponding to the multiple field data; and to extract dimension values of the multiple feature vectors, calculate the similarity between the feature vectors from the dimension values using a preset algorithm, and extract the feature vectors whose similarity reaches a preset threshold;
The data analysis module 508 is configured to obtain a preset data analysis model according to the request type and analyze the extracted feature vectors through the data analysis model to obtain multiple types of indicator data and corresponding values;
The data push module 510 is configured to generate analysis result data according to the multiple types of indicator data and corresponding values and push the analysis result data to the terminal.
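The way the five modules hand data to one another can be sketched as below. This is only an illustrative outline: the class name `Apparatus`, the request dictionary layout, and the toy stand-ins for acquisition, feature extraction, and the `"summary"` analysis model are assumptions introduced here, not details taken from the application.

```python
class Apparatus:
    """Chains the receive, acquire, extract, analyze, and push stages."""

    def __init__(self, acquire, extract, models, push):
        self.acquire = acquire    # data acquisition module (504)
        self.extract = extract    # feature extraction module (506)
        self.models = models      # request type -> data analysis model (508)
        self.push = push          # data push module (510)

    def handle(self, request):
        # Request receiving module (502): read the request type and request information.
        request_type = request["type"]
        info = request["info"]
        records = self.acquire(info)
        features = self.extract(records)
        model = self.models[request_type]  # model chosen by request type
        indicators = model(features)
        result = {"request_type": request_type, "indicators": indicators}
        self.push(result)
        return result


# Toy stand-ins so the sketch runs end to end.
def acquire(info):
    return [{"age": 30, "paid_months": 12}, {"age": 41, "paid_months": 60}]


def extract(records):
    return [[r["age"], r["paid_months"]] for r in records]


def summary_model(features):
    months = [f[1] for f in features]
    return {"record_count": len(features), "avg_paid_months": sum(months) / len(months)}


pushed = []
apparatus = Apparatus(acquire, extract, {"summary": summary_model}, pushed.append)
result = apparatus.handle({"type": "summary", "info": {"region": "A"}})
```

Passing the stages in as callables keeps each module replaceable, which matches the statement that the modules may be realized in software, hardware, or a combination.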
In one embodiment, the feature extraction module 506 is further configured to obtain a preset corpus and obtain associated corpus data from the corpus according to the social security data; obtain a preset vector training model and perform word vector calculation and training on the social security data and the corpus data through the vector training model to obtain multiple corresponding word vectors; and convert the word vectors into corresponding feature vectors according to a preset algorithm.
In one embodiment, the feature extraction module 506 is further configured to calculate multiple dimension values of the multiple feature vectors according to a preset objective function; calculate the similarity between the multiple feature vectors according to a preset distance algorithm and the dimension values; and extract the feature vectors whose similarity reaches a preset threshold.
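The similarity screening step can be illustrated with a short sketch. The application does not name the distance algorithm or say how pairwise similarities map to the extracted set, so two assumptions are made here: cosine similarity is used as the measure, and a vector is kept if it reaches the threshold with at least one other vector. The function names are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(vectors, threshold):
    """Return indices of vectors whose similarity to some other vector reaches the threshold."""
    keep = set()
    for (i, a), (j, b) in combinations(enumerate(vectors), 2):
        if cosine_similarity(a, b) >= threshold:
            keep.update((i, j))
    return sorted(keep)


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
selected = select_similar(vectors, threshold=0.95)  # the first two vectors are near-parallel
```

The pairwise loop is quadratic in the number of vectors, which is fine for a sketch but would need an index or batching for large field sets.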
In one embodiment, the apparatus further includes a model building module configured to acquire multiple pieces of sample social security data, the sample social security data including multiple field data; vectorize the sample social security data to obtain feature vectors corresponding to the multiple field data; cluster the multiple feature vectors, calculate the correlation between the feature vectors and the weight of each feature vector according to the clustering result, and extract the feature vectors that satisfy a condition threshold; and construct the data analysis model from the extracted feature vectors and corresponding weights according to a preset algorithm.
In one embodiment, the data analysis module 508 is further configured to calculate the distribution values and field saturation of the multiple feature vectors through the data analysis model; screen the feature fields of the multiple feature vectors and extract the feature vectors that reach a preset saturation value; perform semantic analysis on the extracted feature vectors according to a preset semantic analysis algorithm to obtain the weights of the feature vectors; analyze the feature vectors according to their distribution values, field saturation, and weights to obtain multiple types of indicator data and corresponding values; and generate analysis result data according to the multiple types of indicator data and corresponding values.
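As a rough illustration of the saturation screening, the sketch below treats field saturation as the fraction of records in which a field holds a non-empty value. That definition, the function names, and the sample records are assumptions made for the example; the application does not define how saturation is computed.

```python
def field_saturation(records, field_name):
    """Fraction of records in which the field holds a non-empty value."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field_name) not in (None, ""))
    return filled / len(records)


def screen_fields(records, field_names, min_saturation):
    """Keep only the fields whose saturation reaches the preset value."""
    return [f for f in field_names if field_saturation(records, f) >= min_saturation]


records = [
    {"employer": "X", "paid_months": 12, "note": ""},
    {"employer": "Y", "paid_months": None, "note": ""},
    {"employer": "Z", "paid_months": 30, "note": "late"},
    {"employer": "", "paid_months": 48, "note": ""},
]
kept = screen_fields(records, ["employer", "paid_months", "note"], min_saturation=0.75)
```

Sparse fields such as `note` are dropped before weighting, so later analysis works only on fields with enough data behind them.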
In one embodiment, the analysis result data includes multiple types of indicator data and corresponding values, and the apparatus further includes a view data generation module configured to generate corresponding indicator analysis data according to the indicator data and corresponding values; integrate the indicator analysis data, through a preset integration function, into analysis view data corresponding to the social security data; add an event type identifier and corresponding interface call parameters to the analysis view data, the interface call parameters being used to call the generated analysis view data according to the event type identifier; and push the analysis view data to the terminal.
For specific limitations on the data mining-based social security data processing apparatus, refer to the limitations on the data mining-based social security data processing method above, which are not repeated here. Each module of the apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and the database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database stores data such as the social security data, the corpus, and the analysis result data. The network interface is used to communicate with external terminals over a network connection. When executed by the processor, the computer-readable instructions implement the steps of the data mining-based social security data processing method provided in any embodiment of the present application.
Those skilled in the art will understand that the structure shown in FIG. 6 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
receiving a resource acquisition request sent by a terminal, the resource acquisition request including a request type and request information;
acquiring multiple pieces of social security data according to the resource acquisition request and the request information, the social security data including multiple field data;
inputting the social security data into a vector training model, performing vector processing on the multiple field data corresponding to the social security data through the vector training model, and outputting feature vectors corresponding to the multiple field data;
extracting dimension values of the multiple feature vectors, calculating the similarity between the multiple feature vectors from the dimension values using a preset algorithm, and extracting the feature vectors whose similarity reaches a preset threshold;
obtaining a preset data analysis model according to the request type, and analyzing the extracted feature vectors through the data analysis model to obtain multiple types of indicator data and corresponding values; and
generating analysis result data according to the multiple types of indicator data and corresponding values, and pushing the analysis result data to the terminal.
One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
receiving a resource acquisition request sent by a terminal, the resource acquisition request including a request type and request information;
acquiring multiple pieces of social security data according to the resource acquisition request and the request information, the social security data including multiple field data;
inputting the social security data into a vector training model, performing vector processing on the multiple field data corresponding to the social security data through the vector training model, and outputting feature vectors corresponding to the multiple field data;
extracting dimension values of the multiple feature vectors, calculating the similarity between the multiple feature vectors from the dimension values using a preset algorithm, and extracting the feature vectors whose similarity reaches a preset threshold;
obtaining a preset data analysis model according to the request type, and analyzing the extracted feature vectors through the data analysis model to obtain multiple types of indicator data and corresponding values; and
generating analysis result data according to the multiple types of indicator data and corresponding values, and pushing the analysis result data to the terminal.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by computer-readable instructions that direct the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art may make several modifications and improvements without departing from the concept of the present application, and these all fall within its scope of protection. The scope of protection of this patent application shall therefore be determined by the appended claims.

Claims (20)

  1. A data mining-based social security data processing method, the method comprising:
    receiving a resource acquisition request sent by a terminal, the resource acquisition request including a request type and request information;
    acquiring multiple pieces of social security data according to the resource acquisition request and the request information, the social security data including multiple field data;
    inputting the social security data into a vector training model, performing vector processing on the multiple field data corresponding to the social security data through the vector training model, and outputting feature vectors corresponding to the multiple field data;
    extracting dimension values of the multiple feature vectors, calculating the similarity between the multiple feature vectors from the dimension values using a preset algorithm, and extracting the feature vectors whose similarity reaches a preset threshold;
    obtaining a preset data analysis model according to the request type, and analyzing the extracted feature vectors through the data analysis model to obtain multiple types of indicator data and corresponding values; and
    generating analysis result data according to the multiple types of indicator data and corresponding values, and pushing the analysis result data to the terminal.
  2. The method according to claim 1, wherein the step of performing vector processing on the multiple field data corresponding to the social security data through the vector training model comprises:
    obtaining a preset corpus, and obtaining associated corpus data from the corpus according to the social security data;
    obtaining a preset vector training model, and performing word vector calculation and training on the social security data and the corpus data through the vector training model to obtain multiple corresponding word vectors; and
    converting the multiple word vectors into corresponding feature vectors according to a preset algorithm.
  3. The method according to claim 1, wherein the step of calculating the similarity between the multiple feature vectors from the dimension values using a preset algorithm and extracting the feature vectors whose similarity reaches a preset threshold comprises:
    calculating multiple dimension values of the multiple feature vectors according to a preset objective function;
    calculating the similarity between the multiple feature vectors according to a preset distance algorithm and the dimension values; and
    extracting the feature vectors whose similarity reaches the preset threshold.
  4. The method according to claim 1, wherein before the obtaining of the preset data analysis model according to the request type, the method further comprises:
    acquiring multiple pieces of sample social security data, the sample social security data including multiple field data;
    vectorizing the sample social security data to obtain feature vectors corresponding to the multiple field data;
    clustering the multiple feature vectors, calculating the correlation between the feature vectors and the weight of each feature vector according to the clustering result, and extracting the feature vectors that satisfy a condition threshold; and
    constructing the data analysis model from the extracted feature vectors and corresponding weights according to a preset algorithm.
  5. The method according to claim 1, wherein the step of analyzing the extracted feature vectors through the data analysis model comprises:
    calculating distribution values and field saturation of the multiple feature vectors through the data analysis model;
    screening the feature fields of the multiple feature vectors, and extracting the feature vectors that reach a preset saturation value;
    analyzing the extracted feature vectors according to a preset semantic analysis algorithm to obtain weights of the feature vectors;
    analyzing the feature vectors according to their distribution values, field saturation, and weights to obtain multiple types of indicator data and corresponding values for the feature vectors; and
    generating analysis result data according to the multiple types of indicator data and corresponding values.
  6. The method according to any one of claims 1 to 5, wherein the method further comprises:
    generating corresponding indicator analysis data according to the indicator data and corresponding values;
    integrating the indicator analysis data, according to a preset integration function, into analysis view data corresponding to the social security data;
    adding an event type identifier and corresponding interface call parameters to the analysis view data, the interface call parameters being used to call the generated analysis view data according to the event type identifier; and
    pushing the analysis view data to the terminal.
  7. A data mining-based social security data processing apparatus, the apparatus comprising:
    a request receiving module, configured to receive a resource acquisition request sent by a terminal, the resource acquisition request including a request type and request information;
    a data acquisition module, configured to acquire multiple pieces of social security data according to the resource acquisition request and the request information, the social security data including multiple field data;
    a feature extraction module, configured to input the social security data into a vector training model, perform vector processing on the multiple field data corresponding to the social security data through the vector training model, and output feature vectors corresponding to the multiple field data; and to extract dimension values of the multiple feature vectors, calculate the similarity between the multiple feature vectors from the dimension values using a preset algorithm, and extract the feature vectors whose similarity reaches a preset threshold;
    a data analysis module, configured to obtain a preset data analysis model according to the request type and analyze the extracted feature vectors through the data analysis model to obtain multiple types of indicator data and corresponding values; and
    a data push module, configured to generate analysis result data according to the multiple types of indicator data and corresponding values and push the analysis result data to the terminal.
  8. The apparatus according to claim 7, wherein the feature extraction module is further configured to obtain a preset corpus and obtain associated corpus data from the corpus according to the social security data; obtain a preset vector training model and perform word vector calculation and training on the social security data and the corpus data through the vector training model to obtain multiple corresponding word vectors; and convert the multiple word vectors into corresponding feature vectors according to a preset algorithm.
  9. The apparatus according to claim 7, wherein the feature extraction module is further configured to calculate multiple dimension values of the multiple feature vectors according to a preset objective function; calculate the similarity between the multiple feature vectors according to a preset distance algorithm and the dimension values; and extract the feature vectors whose similarity reaches the preset threshold.
  10. The apparatus according to claim 7, wherein the apparatus further comprises a model building module configured to acquire multiple pieces of sample social security data, the sample social security data including multiple field data; vectorize the sample social security data to obtain feature vectors corresponding to the multiple field data; cluster the multiple feature vectors, calculate the correlation between the feature vectors and the weight of each feature vector according to the clustering result, and extract the feature vectors that satisfy a condition threshold; and construct the data analysis model from the extracted feature vectors and corresponding weights according to a preset algorithm.
  11. The apparatus according to claim 7, wherein the data analysis module is further configured to calculate distribution values and field saturation of the multiple feature vectors through the data analysis model; screen the feature fields of the multiple feature vectors and extract the feature vectors that reach a preset saturation value; analyze the extracted feature vectors according to a preset semantic analysis algorithm to obtain weights of the feature vectors; analyze the feature vectors according to their distribution values, field saturation, and weights to obtain multiple types of indicator data and corresponding values for the feature vectors; and generate analysis result data according to the multiple types of indicator data and corresponding values.
  12. The apparatus according to claim 7, wherein the apparatus further comprises a view data generation module configured to generate corresponding indicator analysis data according to the indicator data and corresponding values; integrate the indicator analysis data, according to a preset integration function, into analysis view data corresponding to the social security data; add an event type identifier and corresponding interface call parameters to the analysis view data, the interface call parameters being used to call the generated analysis view data according to the event type identifier; and push the analysis view data to the terminal.
  13. A computer device, comprising a memory and one or more processors, the memory storing at least one computer-readable instruction, the computer-readable instruction being loaded by the processor to perform the following steps:
    receiving a resource acquisition request sent by a terminal, the resource acquisition request including a request type and request information;
    acquiring multiple pieces of social security data according to the resource acquisition request and the request information, the social security data including multiple field data;
    inputting the social security data into a vector training model, performing vector processing on the multiple field data corresponding to the social security data through the vector training model, and outputting feature vectors corresponding to the multiple field data;
    extracting dimension values of the multiple feature vectors, calculating the similarity between the multiple feature vectors from the dimension values using a preset algorithm, and extracting the feature vectors whose similarity reaches a preset threshold;
    obtaining a preset data analysis model according to the request type, and analyzing the extracted feature vectors through the data analysis model to obtain multiple types of indicator data and corresponding values; and
    generating analysis result data according to the multiple types of indicator data and corresponding values, and pushing the analysis result data to the terminal.
  14. 根据权利要求13所述的计算机设备,其特征在于,所述处理器执行计算机可读指令时还执行以下步骤:获取预设的语料库,根据所述社保数据从所述语料库中获取相关联的语料数据;获取预设的向量训练模型,通过所述向量训练模型对所述社保数据和所述语料数据进行词向量计算和训练,得到对应的多个词向量;及根据预设算法将所述多个词向量转换为对应的特征向量。The computer device according to claim 13, wherein the processor further executes the following steps when executing the computer-readable instructions: obtaining a preset corpus, and obtaining an associated corpus from the corpus according to the social security data Data; obtain a preset vector training model, and perform word vector calculation and training on the social insurance data and the corpus data through the vector training model to obtain a plurality of corresponding word vectors; and according to the preset algorithm The word vectors are converted into corresponding feature vectors.
  15. 根据权利要求13所述的计算机设备,其特征在于,所述处理器执行计算机可读指令时还执行以下步骤:通过所述数据分析模型计算出多个特征向量的分布值和字段饱和度;对多个特征向量进行特征字段筛查,提取达到预设饱和值的特征向量;根据预设的语义分析算法,对提取出的特征向量进行分析,得到特征向量的权重;根据所述特征向量的分布值和字段饱和度以及权重进行分析,得到所述特征向量对应多个类型的指标数据和对应的数值;及根据所述多个类型的指标数据和对应的数值生成分析结果数据。The computer device according to claim 13, wherein the processor further executes the following steps when executing computer-readable instructions: calculating the distribution value and field saturation of a plurality of feature vectors through the data analysis model; Multiple feature vectors are screened for feature fields, and feature vectors that reach a preset saturation value are extracted; according to a preset semantic analysis algorithm, the extracted feature vectors are analyzed to obtain the weight of the feature vector; according to the distribution of the feature vector Value and field saturation and weight are analyzed to obtain multiple types of index data and corresponding numerical values corresponding to the feature vector; and analysis result data is generated according to the multiple types of indicator data and corresponding numerical values.
  16. The computer device according to claim 13, wherein the processor, when executing the computer-readable instructions, further performs the following steps: generating corresponding index analysis data according to the index data and corresponding values; integrating the index analysis data into analysis view data corresponding to the social security data according to a preset integration function; adding an event type identifier and corresponding interface call parameters to the analysis view data, the interface call parameters being used to call the generated analysis view data according to the event type identifier; and pushing the analysis view data to the terminal.
  17. A non-volatile computer-readable storage medium storing at least one computer-readable instruction, the computer-readable instruction being loaded by a processor to perform the following steps:
    receiving a resource acquisition request sent by a terminal, the resource acquisition request including a request type and request information;
    acquiring a plurality of social security data according to the resource acquisition request and the request information, the social security data including a plurality of field data;
    inputting the social security data into a vector training model, performing vector processing on the plurality of field data corresponding to the social security data through the vector training model, and outputting feature vectors corresponding to the plurality of field data;
    extracting dimension values of the plurality of feature vectors, calculating similarities between the plurality of feature vectors from the dimension values using a preset algorithm, and extracting the feature vectors whose similarity reaches a preset threshold;
    obtaining a preset data analysis model according to the request type, and analyzing the extracted feature vectors through the data analysis model to obtain multiple types of index data and corresponding values; and
    generating analysis result data according to the multiple types of index data and corresponding values, and pushing the analysis result data to the terminal.
  18. The storage medium according to claim 17, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: obtaining a preset corpus, and obtaining associated corpus data from the corpus according to the social security data; obtaining a preset vector training model, and performing word vector calculation and training on the social security data and the corpus data through the vector training model to obtain a plurality of corresponding word vectors; and converting the plurality of word vectors into corresponding feature vectors according to a preset algorithm.
  19. The storage medium according to claim 17, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: calculating distribution values and field saturation of the plurality of feature vectors through the data analysis model; performing feature field screening on the plurality of feature vectors, and extracting the feature vectors that reach a preset saturation value; analyzing the extracted feature vectors according to a preset semantic analysis algorithm to obtain weights of the feature vectors; performing analysis according to the distribution values, field saturation, and weights of the feature vectors to obtain multiple types of index data and corresponding values for the feature vectors; and generating analysis result data according to the multiple types of index data and corresponding values.
  20. The storage medium according to claim 17, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: generating corresponding index analysis data according to the index data and corresponding values; integrating the index analysis data into analysis view data corresponding to the social security data according to a preset integration function; adding an event type identifier and corresponding interface call parameters to the analysis view data, the interface call parameters being used to call the generated analysis view data according to the event type identifier; and pushing the analysis view data to the terminal.
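
For illustration only, the similarity screening recited in claim 17 and the field saturation screening recited in claims 15 and 19 might be sketched as follows. The claims leave the "preset algorithm" and the definition of field saturation open, so this sketch assumes cosine similarity, keeps every vector that reaches the threshold against at least one other vector, and treats saturation as the fraction of populated fields in a record. All function names and sample values are hypothetical and do not come from the application.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def screen_by_similarity(vectors, threshold):
    """Return indices of vectors whose similarity to some other vector reaches the threshold.

    Assumption: the claims do not say how pairwise similarities map to
    "extracted" vectors; here both members of a qualifying pair are kept.
    """
    keep = set()
    for i, j in combinations(range(len(vectors)), 2):
        if cosine_similarity(vectors[i], vectors[j]) >= threshold:
            keep.update((i, j))
    return sorted(keep)


def field_saturation(record):
    """Fraction of populated fields in a record (an assumed definition)."""
    if not record:
        return 0.0
    filled = sum(1 for value in record.values() if value not in (None, ""))
    return filled / len(record)


# Hypothetical feature vectors: the first two point in the same direction,
# the third is orthogonal to both.
vectors = [
    [1.0, 0.0, 1.0],
    [2.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
]
selected = screen_by_similarity(vectors, threshold=0.9)

# Hypothetical record with two of four fields populated.
saturation = field_saturation(
    {"name": "A", "city": "", "amount": 120, "employer": None}
)
```

A different similarity measure (for example Euclidean distance) or a different saturation definition could be substituted without changing the overall flow: vectorize the fields, screen by similarity, then screen by saturation before the analysis model is applied.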
PCT/CN2019/116126 2019-03-07 2019-11-07 Data mining-based social insurance data processing method and apparatus, and computer device WO2020177365A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910171606.4 2019-03-07
CN201910171606.4A CN110008250B (en) 2019-03-07 2019-03-07 Social security data processing method and device based on data mining and computer equipment

Publications (1)

Publication Number Publication Date
WO2020177365A1 (en) 2020-09-10

Family

ID=67166603

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116126 WO2020177365A1 (en) 2019-03-07 2019-11-07 Data mining-based social insurance data processing method and apparatus, and computer device

Country Status (2)

Country Link
CN (1) CN110008250B (en)
WO (1) WO2020177365A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008250B (en) * 2019-03-07 2024-03-15 平安科技(深圳)有限公司 Social security data processing method and device based on data mining and computer equipment
CN110610196B (en) * 2019-08-14 2023-04-28 平安科技(深圳)有限公司 Desensitization method, system, computer device and computer readable storage medium
CN112528315A (en) * 2019-09-19 2021-03-19 华为技术有限公司 Method and device for identifying sensitive data
CN110674320B (en) * 2019-09-27 2022-03-18 百度在线网络技术(北京)有限公司 Retrieval method and device and electronic equipment
CN111178064B (en) * 2019-12-13 2022-11-29 深圳平安医疗健康科技服务有限公司 Information pushing method and device based on field word segmentation processing and computer equipment
CN111222585A (en) * 2020-01-15 2020-06-02 深圳前海微众银行股份有限公司 Data processing method, device, equipment and medium
CN112085469B (en) * 2020-09-08 2023-04-28 中国平安财产保险股份有限公司 Data approval method, device, equipment and storage medium based on vector machine model
CN113157788B (en) * 2021-04-13 2024-02-13 福州外语外贸学院 Big data mining method and system
CN117314163B (en) * 2023-09-27 2024-04-12 吉贝克信息技术(北京)有限公司 Social security data processing method and system based on big data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120321202A1 (en) * 2011-06-20 2012-12-20 Michael Benjamin Selkowe Fertik Identifying information related to a particular entity from electronic sources, using dimensional reduction and quantum clustering
CN105786711A (en) * 2016-03-25 2016-07-20 广州华多网络科技有限公司 Data analysis method and device
CN108520324A (en) * 2018-04-13 2018-09-11 北京京东金融科技控股有限公司 Method and apparatus for generating information
CN110008250A (en) * 2019-03-07 2019-07-12 平安科技(深圳)有限公司 Social security data processing method, device and computer equipment based on data mining

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10817527B1 (en) * 2016-04-12 2020-10-27 Tableau Software, Inc. Systems and methods of using natural language processing for visual analysis of a data set
CN109325781A (en) * 2018-09-04 2019-02-12 中国平安人寿保险股份有限公司 Client's Quality Analysis Methods, device, computer equipment and storage medium
CN109388675A (en) * 2018-10-12 2019-02-26 平安科技(深圳)有限公司 Data analysing method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110008250A (en) 2019-07-12
CN110008250B (en) 2024-03-15

Similar Documents

Publication Publication Date Title
WO2020177365A1 (en) Data mining-based social insurance data processing method and apparatus, and computer device
CN110021439B (en) Medical data classification method and device based on machine learning and computer equipment
CN110765265B (en) Information classification extraction method and device, computer equipment and storage medium
WO2021169111A1 (en) Resume screening method and apparatus, computer device and storage medium
WO2021068321A1 (en) Information pushing method and apparatus based on human-computer interaction, and computer device
WO2020177366A1 (en) Data processing method and apparatus based on time sequence data, and computer device
CN108427707B (en) Man-machine question and answer method, device, computer equipment and storage medium
WO2021027553A1 (en) Micro-expression classification model generation method, image recognition method, apparatus, devices, and mediums
WO2020057022A1 (en) Associative recommendation method and apparatus, computer device, and storage medium
CN111444723B (en) Information extraction method, computer device, and storage medium
WO2020057021A1 (en) Data table processing method and device, computer device and storage medium
CN109815333B (en) Information acquisition method and device, computer equipment and storage medium
WO2020147395A1 (en) Emotion-based text classification method and device, and computer apparatus
CN110569500A (en) Text semantic recognition method and device, computer equipment and storage medium
US10445623B2 (en) Label consistency for image analysis
CN110377558B (en) Document query method, device, computer equipment and storage medium
CN108491406B (en) Information classification method and device, computer equipment and storage medium
CN109886719B (en) Data mining processing method and device based on grid and computer equipment
CN109325118B (en) Unbalanced sample data preprocessing method and device and computer equipment
CN110362798B (en) Method, apparatus, computer device and storage medium for judging information retrieval analysis
CN110427612B (en) Entity disambiguation method, device, equipment and storage medium based on multiple languages
CN110674131A (en) Financial statement data processing method and device, computer equipment and storage medium
WO2020034801A1 (en) Medical feature screening method and apparatus, computer device, and storage medium
CN107766498B (en) Method and apparatus for generating information
CN111191446B (en) Interactive information processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19918286

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 19918286

Country of ref document: EP

Kind code of ref document: A1