CN116756147A

CN116756147A - Data classification method, device, computer equipment and storage medium

Info

Publication number: CN116756147A
Application number: CN202310774925.0A
Authority: CN
Inventors: 黎晓宇
Original assignee: Ping An Property and Casualty Insurance Company of China Ltd
Current assignee: Ping An Property and Casualty Insurance Company of China Ltd
Priority date: 2023-06-27
Filing date: 2023-06-27
Publication date: 2023-09-15

Abstract

The application discloses a data classification method, a data classification device, computer equipment and a storage medium, and belongs to the technical fields of big data and financial property insurance. The method analyzes the data to be classified to obtain its data type; determines, based on the data type, a data classification model that matches the data to be classified, yielding a matching classification model; imports the data to be classified into the matching classification model to obtain initial classification data; obtains business scene data corresponding to the data to be classified; imports the business scene data and the initial classification data into a classification prediction model to obtain a classification prediction result; and combines the initial classification data based on the classification prediction result to obtain the data classification result. The application also relates to blockchain technology: the data to be classified is stored in a blockchain network. By classifying data in this way, the application improves data quality and accuracy, reduces the cost to developers of using the data, and improves research and development efficiency.

Description

Data classification method, device, computer equipment and storage medium

Technical Field

The application belongs to the technical fields of big data and financial property insurance, and particularly relates to a data classification method, a data classification device, computer equipment and a storage medium.

Background

Data processing in the financial property insurance field involves a large amount of data cleaning, conversion, analysis and classification. Because there is no uniform processing approach, each developer may process the data with his or her own method. In insurance claims, for example, different insurance companies have different claim settlement processes and rules, so each company processes its claim data differently, and this lack of consistency and standardization can degrade data quality and accuracy. As new lines of business are added, insurance data processing becomes increasingly complex. During collaborative project development, developers must repeatedly familiarize themselves with the various data processing approaches in order to understand and maintain existing code, which consumes considerable time and effort and reduces development efficiency.

Disclosure of Invention

The embodiments of the application aim to provide a data classification method, a data classification device, computer equipment and a storage medium, so as to solve two technical problems of existing data processing schemes in the financial property insurance field: data quality and accuracy suffer because there is no uniform processing approach, and development efficiency falls because developers must be familiar with many different data processing approaches.

In order to solve the above technical problems, the embodiment of the present application provides a data classifying method, which adopts the following technical scheme:

a data classification method, comprising:

receiving a data classification instruction uploaded by a client, acquiring data to be classified, and analyzing the data to be classified to acquire the data type of the data to be classified;

determining a data classification model matched with the data to be classified based on the data type to obtain a matched classification model;

importing data to be classified into a matching classification model to obtain initial classification data;

acquiring business scene data corresponding to the data to be classified, and importing the business scene data and the initial classification data into a classification prediction model to obtain a classification prediction result;

sending the classification prediction result to the client and receiving feedback information returned by the client;

and combining the initial classification data based on the feedback information and the classification prediction result to obtain a data classification result.
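The six steps above can be read as a simple pipeline. The following is a minimal sketch under stated assumptions, not the application's implementation: the function name `classify_data` and every parameter name are hypothetical, and each step is passed in as a callable so that the sketch takes no position on how any individual step, in particular the final combination, is carried out.

```python
def classify_data(data, data_type_of, model_for, scene_data_of,
                  predict, ask_client, merge):
    """Run the six steps in order; every callable is supplied by the caller.

    All parameter names are hypothetical and chosen only for illustration.
    """
    data_type = data_type_of(data)            # step 1: determine the data type
    model = model_for(data_type)              # step 2: match a classification model
    initial = model(data)                     # step 3: initial classification
    scene = scene_data_of(data)               # step 4a: business scene data
    prediction = predict(scene, initial)      # step 4b: classification prediction
    feedback = ask_client(prediction)         # step 5: client feedback
    return merge(initial, prediction, feedback)  # step 6: final classification
```

Passing the steps as callables keeps the control flow visible while leaving the model choice and the combination rule open.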

Further, receiving the data classification instruction uploaded by the client, acquiring the data to be classified, and analyzing the data to be classified to acquire its data type specifically comprises:

receiving a data classification instruction uploaded by a client, and acquiring data to be classified based on the data classification instruction;

analyzing the data to be classified, and determining a data source of the data to be classified;

determining the data type of the data to be classified based on the data source of the data to be classified.

Further, the data classification model comprises a decision tree model and a support vector machine; when the matching classification model is the decision tree model, importing the data to be classified into the matching classification model to obtain the initial classification data specifically comprises:

preprocessing the data to be classified, wherein the preprocessing comprises data deduplication, missing value processing and numerical value standardization;

extracting features of the preprocessed data to be classified to obtain first data features;

loading a pre-trained decision tree model, importing the first data features into the decision tree model, and obtaining a classification label output by the decision tree model;

classifying the data to be classified according to the classification labels to obtain initial classification data.

Further, the data classification model comprises a decision tree model and a support vector machine; when the matching classification model is the support vector machine, importing the data to be classified into the matching classification model to obtain the initial classification data specifically comprises:

extracting features of the preprocessed data to be classified to obtain second data features;

loading a pre-trained support vector machine, and importing the second data features into the support vector machine to obtain a decision boundary output by the support vector machine;

classifying the data to be classified according to the decision boundary to obtain initial classification data.

Further, the classification prediction model is a pre-trained Transformer model comprising an input layer, a self-attention layer, a feedforward neural network layer and an output layer; acquiring the business scene data corresponding to the data to be classified and importing the business scene data and the initial classification data into the classification prediction model to obtain the classification prediction result specifically comprises:

extracting features of the business scene data and the initial classification data to obtain scene data features;

importing the scene data features into the classification prediction model through the input layer, and encoding the scene data features and assigning self-attention weights through the self-attention layer;

performing vector mapping on the encoded and weighted scene data features through the feedforward neural network layer;

and obtaining a vector mapping result through the output layer, decoding the vector mapping result to obtain a classification prediction result, and outputting the classification prediction result.

Further, before acquiring the business scene data corresponding to the data to be classified and importing the business scene data and the initial classification data into the classification prediction model to obtain the classification prediction result, the method further comprises:

acquiring training data, wherein the training data comprises historical scene data and historical classification data;

constructing a training data set and a verification data set based on the training data;

training a preset initial prediction model through a training data set, and verifying the trained initial prediction model through a verification data set to obtain a classification prediction model.

Further, the initial prediction model includes an input layer, a self-attention layer, a feedforward neural network layer and an output layer; training the preset initial prediction model on the training data set and verifying the trained initial prediction model on the verification data set to obtain the classification prediction model specifically comprises:

extracting features of the training samples in the training data set to obtain training sample features;

importing the training sample features into the initial prediction model through an input layer of the initial prediction model;

encoding the training sample features and assigning self-attention weights through a self-attention layer of the initial prediction model;

performing vector mapping on the encoded and weighted training sample features through a feedforward neural network layer of the initial prediction model;

obtaining a vector mapping result of the training sample features through an output layer of the initial prediction model, and decoding the vector mapping result of the training sample features to obtain a training prediction result;

iterating the initial prediction model through the training prediction result and a preset standard result to obtain a trained initial prediction model;

and verifying the initial prediction model after training through the verification data set, and obtaining the classification prediction model after the initial prediction model passes the verification.

In order to solve the above technical problems, the embodiment of the present application further provides a data classifying device, which adopts the following technical scheme:

a data classification device, comprising:

the data type determining module is used for receiving the data classification instruction uploaded by the client, obtaining data to be classified, analyzing the data to be classified, and obtaining the data type of the data to be classified;

the classification model matching module is used for determining a data classification model matched with the data to be classified based on the data type to obtain a matched classification model;

the initial data classification module is used for importing data to be classified into the matched classification model to obtain initial classification data;

the data classification prediction module is used for acquiring service scene data corresponding to the data to be classified, and importing the service scene data and the initial classification data into the classification prediction model to obtain a classification prediction result;

the data classification feedback module is used for sending the classification prediction result to the client and receiving feedback information returned by the client;

and the data final classification module is used for combining the initial classification data based on the feedback information and the classification prediction result to obtain a data classification result.

In order to solve the above technical problems, the embodiment of the present application further provides a computer device, which adopts the following technical schemes:

a computer device comprising a memory and a processor, wherein the memory stores computer readable instructions which, when executed by the processor, implement the steps of the data classification method described in any one of the above.

In order to solve the above technical problems, an embodiment of the present application further provides a computer readable storage medium, which adopts the following technical schemes:

a computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, implement the steps of the data classification method described in any one of the above.

Compared with the prior art, the embodiment of the application has the following main beneficial effects:

The application discloses a data classification method, a data classification device, computer equipment and a storage medium, and belongs to the technical fields of big data and financial property insurance. The method receives a data classification instruction uploaded by a client and acquires the data to be classified; analyzes the data to be classified to obtain its data type; determines, based on the data type, a data classification model that matches the data to be classified, yielding a matching classification model; imports the data to be classified into the matching classification model to obtain initial classification data; obtains business scene data corresponding to the data to be classified; imports the business scene data and the initial classification data into a classification prediction model to obtain a classification prediction result; sends the classification prediction result to the client and receives the feedback information returned by the client; and combines the initial classification data based on the feedback information and the classification prediction result to obtain the data classification result. The data is thus first classified by the data classification model that matches it, and is then reclassified by a machine learning classification prediction model according to the business scene. This achieves accurate, scene-aware classification of the data and is particularly suitable for data classification in the financial property insurance field.

Drawings

To illustrate the solution of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without inventive effort.

FIG. 1 illustrates an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 illustrates a flow chart of one embodiment of a data categorization method in accordance with the present application;

FIG. 3 shows a schematic diagram of an embodiment of a data classification device according to the present application;

FIG. 4 shows a schematic structural diagram of an embodiment of a computer device according to the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.

As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 players, laptop computers and desktop computers.

The server 105 may be a server that provides various services, such as a background server that supports the pages displayed on the terminal devices 101, 102, 103. It may be a stand-alone server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

It should be noted that, the data classifying method provided in the embodiment of the present application is generally executed by a server, and accordingly, the data classifying device is generally disposed in the server.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow chart of one embodiment of a data classification method according to the application is shown. The embodiments of the application can acquire and process the relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

Because there is no uniform processing approach, each developer may process data with his or her own method. In insurance claims, for example, different insurance companies have different claim settlement processes and rules, so each company processes its claim data differently, and this lack of consistency and standardization can degrade data quality and accuracy. As new lines of business are added, insurance data processing becomes increasingly complex. During collaborative project development, developers must repeatedly familiarize themselves with the various data processing approaches in order to understand and maintain existing code, which consumes considerable time and effort and reduces development efficiency.

To address this, the application discloses a data classification method, a device, computer equipment and a storage medium, which belong to the technical fields of big data and financial property insurance.

The data classification method comprises the following steps:

S201, receiving a data classification instruction uploaded by a client, acquiring data to be classified, analyzing the data to be classified, and acquiring the data type of the data to be classified.

In this embodiment, after receiving a data classification instruction uploaded by a client, the server obtains the data to be classified corresponding to that instruction, analyzes it to determine its data source, and determines the data type of the data to be classified based on the data source. In policy sales, for example, the data sources may include a sales record database, a customer information database, a policy database and so on; if the data source is the sales record database, the data to be classified is determined to be of the sales data type.

In this embodiment, the electronic device on which the data classification method runs (such as the server shown in FIG. 1) may receive the instruction or acquire the data through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee and UWB (ultra wideband) connections, and other wireless connection means now known or later developed.

In this embodiment, the server analyzes the received data to be classified to determine its data source and data type. The data source can be determined by analyzing the structure, field information and the like of the data, and the data type is then determined from the data source. The purpose is to select an appropriate data classification model and thereby improve the accuracy of data classification.
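One way to picture how a data source might be inferred from field information, and a data type from the data source, is the following minimal sketch. It is an illustration only: the field names, the source names, the overlap-counting rule and the source-to-type mapping are all assumptions and do not come from the application.

```python
# Hypothetical field signatures for each data source; the field names are
# illustrative and do not come from the application.
SOURCE_SIGNATURES = {
    "sales_record_db": {"order_id", "premium", "sale_date"},
    "customer_info_db": {"customer_id", "age", "gender"},
    "policy_db": {"policy_id", "coverage", "effective_date"},
}

# Hypothetical mapping from data source to data type.
SOURCE_TO_TYPE = {
    "sales_record_db": "sales_data",
    "customer_info_db": "customer_data",
    "policy_db": "policy_data",
}


def infer_data_source(record):
    """Return the source whose signature fields overlap most with the record."""
    fields = set(record)
    best_source, best_overlap = None, 0
    for source, signature in SOURCE_SIGNATURES.items():
        overlap = len(fields & signature)
        if overlap > best_overlap:
            best_source, best_overlap = source, overlap
    if best_source is None:
        raise ValueError("no known data source matches the record fields")
    return best_source


def infer_data_type(record):
    """Determine the data type from the inferred data source."""
    return SOURCE_TO_TYPE[infer_data_source(record)]
```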

S202, determining a data classification model matched with the data to be classified based on the data type, and obtaining a matched classification model.

In this embodiment, the server determines, based on the data type, the data classification model in a preset model library that matches the data to be classified, yielding the matching classification model. The model library includes at least a decision tree model and a support vector machine. Matching an appropriate data classification model improves the accuracy of data classification.
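The lookup in a preset model library can be sketched as a plain dictionary keyed by data type. This is an assumption-laden illustration: the library contents, the names `MODEL_LIBRARY` and `match_classification_model`, and the choice of which data type maps to which model are hypothetical.

```python
# Hypothetical model library: maps a data type to the name of the
# classification model registered for it. Which type uses which model is
# an assumption made for illustration.
MODEL_LIBRARY = {
    "sales_data": "decision_tree",
    "customer_data": "support_vector_machine",
}


def match_classification_model(data_type, library=MODEL_LIBRARY):
    """Return the model name registered for data_type in the model library."""
    try:
        return library[data_type]
    except KeyError:
        raise LookupError(
            f"no classification model registered for {data_type!r}"
        ) from None
```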

S203, importing the data to be classified into a matching classification model to obtain initial classification data.

In this embodiment, after extracting the data features of the data to be classified, the server imports the data features of the data to be classified into the matching classification model to obtain the initial classification data.

In this embodiment, when the matching classification model is the decision tree model, the data to be classified is first preprocessed; preprocessing includes data deduplication, missing value processing and numerical standardization. Features are then extracted from the data to be classified, that is, the attributes that describe the data. These features are matched against the features in the decision tree model: starting from the root node, the feature values of the data to be classified are compared with the condition at each decision tree node, and the corresponding child node is selected according to the result, until a leaf node is reached. The leaf node gives the final classification result of the decision tree, namely the classification label. The data to be classified is assigned to the classification label determined by its decision path, which completes the classification and yields the initial classification data.

It should be noted that the decision tree model consists of a series of feature conditions and decision rules learned from training data, so the features of the data to be classified must be consistent with those of the decision tree model. The decision tree model applies these feature conditions and decision rules to assign the data to be classified to different categories or to make classification decisions.
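The three preprocessing operations named above can be sketched as follows. This is a minimal illustration under assumptions the application does not state: missing values are filled with the mean of the observed values, standardization is z-score standardization, and records are plain dictionaries with hashable values.

```python
from statistics import mean, pstdev


def preprocess(rows, numeric_fields):
    """Deduplicate rows, fill missing numeric values, and z-score standardize."""
    # 1. Deduplication: keep the first occurrence of each identical row.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input is not modified

    for field in numeric_fields:
        present = [r[field] for r in unique if r.get(field) is not None]
        fill = mean(present)
        # 2. Missing values: replace None with the mean of the observed values.
        for r in unique:
            if r.get(field) is None:
                r[field] = fill
        # 3. Standardization: subtract the mean, divide by the standard deviation.
        values = [r[field] for r in unique]
        mu, sigma = mean(values), pstdev(values)
        for r in unique:
            r[field] = (r[field] - mu) / sigma if sigma else 0.0
    return unique
```

Mean imputation and z-scores are common defaults; other strategies (median fill, min-max scaling) would fit the same description.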

In a specific embodiment of the present application, assume there is a set of insurance sales data containing the following fields: age, gender, vehicle model, premium, number of claims, and whether insurance was purchased. The decision tree model classifies this insurance sales data to predict whether a customer will purchase insurance. Assume one record reads: age 30, gender female, vehicle model SUV. After the attributes that describe the record are extracted, they are compared against the conditions of the decision tree model, starting from the root node. If the record satisfies the age condition, it follows that branch into the left child node; it then follows the gender branch into the next left child node; finally it follows the vehicle model branch for SUV and reaches a leaf node. The path to that leaf node yields the classification label "age greater than or equal to X; female; SUV model", where X is a preset age threshold. Based on this label, the record can be assigned to the corresponding category, such as "purchase insurance".
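The root-to-leaf traversal in this example can be sketched with a small hand-written tree. The tree below is a stand-in for a trained model: the threshold X = 25, the branch layout, the field names and the "no purchase" label are all invented for illustration.

```python
# A hand-written stand-in for a trained decision tree. Internal nodes hold a
# test on one feature; leaves hold a classification label. The thresholds
# and labels are invented for illustration (X = 25 is a hypothetical age).
TREE = {
    "test": lambda r: r["age"] >= 25,
    "left": {
        "test": lambda r: r["gender"] == "female",
        "left": {
            "test": lambda r: r["vehicle_model"] == "SUV",
            "left": {"label": "purchase insurance"},
            "right": {"label": "no purchase"},
        },
        "right": {"label": "no purchase"},
    },
    "right": {"label": "no purchase"},
}


def classify(record, node=TREE):
    """Walk from the root to a leaf, going left when the node's test passes."""
    while "label" not in node:
        node = node["left"] if node["test"](record) else node["right"]
    return node["label"]
```

With this tree, the record from the example (age 30, female, SUV) takes the left branch three times and lands on the "purchase insurance" leaf.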

In this embodiment, when the matching classification model is the support vector machine, the data to be classified is first preprocessed; preprocessing includes data deduplication, missing value processing and numerical standardization. The data to be classified is then converted into feature vectors so that the support vector machine can process it. The converted data is input into the trained support vector machine model, which determines the category of the data to be classified from its position in the feature space relative to the decision boundary. This completes the classification and yields the initial classification data.

It should be noted that, when data is converted into numerical vectors, the dimensions of the vectors must match the feature dimensions used to train the model. The support vector machine model learns the decision boundary and the positions of the support vectors from the training data, and classifies the data to be classified according to this information.

In the above specific embodiment, when the support vector machine is used for classification, the age may be standardized, the gender may be encoded (for example, 0 for male and 1 for female), the vehicle model may be one-hot encoded, and so on, so that each record to be classified is represented as a feature vector that can be input into the trained support vector machine model. The support vector machine determines the category of the record from the position of its feature vector in the feature space relative to the decision boundary. For example, assume a record to be classified has age 30, gender female and vehicle model SUV. The age is standardized, the gender is encoded as 1, and SUV is one-hot encoded as [0,1,0]. These features are assembled into a feature vector and input into the trained support vector machine model, which determines from the position of the vector in the feature space which category the data point belongs to, for example the "purchase insurance" category.
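The encoding and the decision-boundary test in this example can be sketched as below. The sketch assumes a linear kernel, so the decision boundary is the hyperplane w·x + b = 0; the category order, the age mean and standard deviation, and the weights in the usage note are hypothetical values standing in for what a trained model would supply.

```python
VEHICLE_MODELS = ["sedan", "SUV", "truck"]  # hypothetical category order


def encode(record, age_mean=40.0, age_std=10.0):
    """Build a feature vector: standardized age, binary gender, one-hot model."""
    age = (record["age"] - age_mean) / age_std
    gender = 1.0 if record["gender"] == "female" else 0.0
    one_hot = [1.0 if record["vehicle_model"] == m else 0.0
               for m in VEHICLE_MODELS]
    return [age, gender] + one_hot


def svm_predict(x, weights, bias):
    """Classify by which side of the hyperplane w.x + b = 0 the point falls on."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return "purchase insurance" if score >= 0 else "no purchase"
```

For the record in the example, `encode` places SUV at position [0,1,0] of the one-hot part, and with illustrative weights such as `[0.2, 0.5, -0.3, 0.8, -0.4]` and bias `-0.5` the score is positive, so the record falls on the "purchase insurance" side of the boundary.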

S204, acquiring business scene data corresponding to the data to be classified, and importing the business scene data and the initial classification data into a classification prediction model to obtain a classification prediction result.

The classification prediction model is a pre-trained Transformer model. The Transformer is a deep learning model based on the self-attention mechanism; its basic components include multiple self-attention layers and feedforward neural network layers. The self-attention layer computes attention weights and encodes the input sequence, and the feedforward neural network layer further maps and transforms the encoded sequence.

In this embodiment, a classification prediction model is trained in advance based on a Transformer model. After the initial classification of the data to be classified is completed, the business scene data corresponding to the data to be classified is obtained, and the business scene data and the initial classification data are imported into the classification prediction model to obtain a classification prediction result.
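How a self-attention layer computes attention weights and encodes a sequence can be sketched with scaled dot-product attention, the standard formulation used in Transformer models. The sketch is deliberately simplified and is not the application's model: it uses a single head and identity query, key and value projections, whereas a trained model would apply learned projection matrices first.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    A trained Transformer would first multiply the inputs by learned
    query, key, and value matrices; those are omitted here for brevity.
    """
    dim = len(sequence[0])
    outputs, all_weights = [], []
    for query in sequence:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in sequence]
        weights = softmax(scores)
        all_weights.append(weights)
        # Encoded output: attention-weighted sum of the value vectors.
        outputs.append([sum(w * value[j] for w, value in zip(weights, sequence))
                        for j in range(dim)])
    return outputs, all_weights
```

Each row of the returned weights sums to 1 and shows how strongly one feature vector attends to the others.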

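Further, the classification prediction model comprises an input layer, a self-attention layer, a feedforward neural network layer and an output layer; acquiring the business scene data corresponding to the data to be classified and importing the business scene data and the initial classification data into the classification prediction model to obtain the classification prediction result specifically comprises:

extracting features of the business scene data and the initial classification data to obtain scene data features;

importing the scene data features into the classification prediction model through the input layer, and encoding the scene data features and assigning self-attention weights through the self-attention layer;

performing vector mapping on the encoded and weighted scene data features through the feedforward neural network layer;

and obtaining a vector mapping result through the output layer, decoding the vector mapping result to obtain a classification prediction result, and outputting the classification prediction result.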

In this embodiment, data features are extracted from the business scene data and the initial classification data. Feature extraction may include text processing, image processing and statistical feature extraction; an appropriate method is chosen according to the specific business scene and data type. The self-attention layer then encodes the scene data features, captures the correlations among them and their relative importance, and uses the assigned self-attention weights to determine how much each feature contributes to the classification result. The feedforward neural network layer further maps and transforms the encoded and weighted scene data features, applying a series of nonlinear transformations to build deeper, more abstract representations and extract higher-level feature information. The output layer extracts the relevant information from the vector mapping result output by the feedforward neural network layer and decodes it into a specific classification prediction result, processing it through several subsequent neural network layers (such as fully connected layers) and activation functions before finally outputting the classification prediction result.
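The feedforward mapping and the decoding in the output layer can be sketched as follows. This is an illustrative assumption, not the application's architecture: the feedforward layer is taken to be two linear maps with a ReLU between them, the output layer a linear projection followed by softmax, and all weights and labels are supplied by the caller.

```python
import math


def relu(vector):
    """Element-wise ReLU nonlinearity."""
    return [max(0.0, v) for v in vector]


def linear(vector, weights, bias):
    """Compute weights @ vector + bias for a weight matrix given as rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def feed_forward(vector, w1, b1, w2, b2):
    """Two linear maps with a ReLU in between (a common Transformer FFN shape)."""
    return linear(relu(linear(vector, w1, b1)), w2, b2)


def decode(vector, w_out, b_out, labels):
    """Output layer: one logit per label, softmax, then pick the most likely."""
    logits = linear(vector, w_out, b_out)
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs
```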

In this embodiment, the historical scene data is known business scene data, and the historical classification data is the known classification results for that scene data. Both may come from internal company databases, data warehouses, or external data sources, and a labeled training data set can be created by collecting and collating them. The data is divided into a training data set and a verification data set so that the generalization ability of the model can be evaluated and overfitting during training can be avoided; in general, the training data set takes up most of the total data volume and the verification data set a smaller part. Finally, a preset initial prediction model is trained with the training data set, and the trained initial prediction model is verified with the verification data set to obtain the classification prediction model.
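The division into a training data set and a verification data set might look like the following sketch. The 80/20 ratio, the fixed seed, and the sample layout `(scene, label)` are assumptions; the text only says that the training set takes the larger share.

```python
import random

def split_dataset(samples, validation_ratio=0.2, seed=42):
    """Shuffle labeled samples and split them into (training, verification) lists."""
    if not 0.0 < validation_ratio < 1.0:
        raise ValueError("validation_ratio must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = max(1, int(len(shuffled) * validation_ratio))
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical historical data: (historical scene data, historical classification data).
HISTORY = [(f"scene-{i}", f"class-{i % 3}") for i in range(10)]
train_set, verification_set = split_dataset(HISTORY)
print(len(train_set), len(verification_set))
```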

In the above embodiment, by acquiring the history data and constructing the training data set and the verification data set based on the history data, one classification prediction model is trained in advance by the training data set and the verification data set so that the data classification prediction is performed subsequently using the classification prediction model.

In this embodiment, features are extracted from the training samples in the training data set to obtain training sample features, and the training sample features are imported into the initial prediction model through its input layer. The self-attention layer of the initial prediction model encodes the training sample features and assigns self-attention weights. The feedforward neural network layer of the initial prediction model vector-maps the encoded and weighted training sample features, deepening and abstracting the feature representation through a series of nonlinear transformations to extract higher-level feature information. The output layer of the initial prediction model obtains the vector mapping result of the training sample features and decodes it to obtain a training prediction result. The prediction error of the initial prediction model is calculated from the training prediction result and a preset standard result, and the model is iterated based on the prediction error until it is fitted, giving a trained initial prediction model. The trained initial prediction model is then verified with the verification data set, and the classification prediction model is obtained once the initial prediction model passes verification.
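The iterate-on-error-then-verify loop can be sketched as follows. To stay short, the sketch swaps in a logistic regression for the attention-based initial prediction model; the learning rate, the loss tolerance used as the "fitted" criterion, the accuracy threshold used as the "passes verification" criterion, and the toy data are all assumptions.

```python
import math

def forward(weights, bias, x):
    """Prediction of the stand-in model: sigmoid of a linear score."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train(train_set, lr=0.5, max_epochs=2000, tol=0.05):
    """Iterate on the prediction error until the mean loss falls below tol."""
    weights, bias = [0.0] * len(train_set[0][0]), 0.0
    for _ in range(max_epochs):
        loss = 0.0
        for x, y in train_set:
            p = forward(weights, bias, x)
            # Cross-entropy between the training prediction and the standard result.
            loss += -(y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12))
            err = p - y  # prediction error drives the update
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
        if loss / len(train_set) < tol:
            break  # treated here as "model fitted"
    return weights, bias

def verify(weights, bias, verification_set, min_accuracy=0.9):
    """The model passes verification if its accuracy reaches min_accuracy."""
    correct = sum((forward(weights, bias, x) >= 0.5) == bool(y)
                  for x, y in verification_set)
    return correct / len(verification_set) >= min_accuracy

# Hypothetical, linearly separable training sample features with standard results.
TRAIN_SET = [([0.0, 0.1], 0), ([0.2, 0.0], 0), ([0.9, 1.0], 1), ([1.0, 0.8], 1)]
VERIFICATION_SET = [([0.1, 0.1], 0), ([0.9, 0.9], 1)]

weights, bias = train(TRAIN_SET)
print("passes verification:", verify(weights, bias, VERIFICATION_SET))
```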

In the above embodiment, the application rapidly trains the classification prediction model based on the Transformer model architecture, so that the classification prediction model can subsequently be used for data classification prediction.

S205, the classification prediction result is sent to the client and feedback information returned by the client is received.

In this embodiment, after the classification prediction result output by the classification prediction model is obtained, it is sent back to the client and displayed there, prompting the user to confirm the classification prediction result, and the user's feedback operation data is obtained.

S206, combining the initial classification data based on the feedback information and the classification prediction result to obtain a data classification result.

In this embodiment, the server combines the initial classification data based on the feedback information and the classification prediction result to obtain the data classification result. The feedback information is either "agree with the classification prediction result" or "disagree with the classification prediction result". When the feedback information is "agree with the classification prediction result", the initial classification data is classified and combined according to the data classification scheme in the classification prediction result to obtain the data classification result.
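One way to picture the combination step is shown below. The representation is an assumption: the initial classification data is taken to be a mapping from initial label to records, and the data classification scheme a mapping from initial label to final category. All label and record names are hypothetical.

```python
from collections import defaultdict

def combine(initial_classification, scheme):
    """Merge initial classes into the categories named by the classification scheme."""
    merged = defaultdict(list)
    for label, records in initial_classification.items():
        target = scheme.get(label, label)  # labels absent from the scheme are kept as-is
        merged[target].extend(records)
    return dict(merged)

# Hypothetical initial classification data and predicted scheme.
INITIAL = {"vehicle": ["r1", "r2"], "driver": ["r3"], "property": ["r4"]}
SCHEME = {"vehicle": "auto_insurance", "driver": "auto_insurance"}

print(combine(INITIAL, SCHEME))
```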

When the feedback information is "disagree with the classification prediction result", the steps S201 to S204 are re-executed to obtain a new classification prediction result, and the new classification prediction result is sent to the client, and the feedback information returned by the client is received, and the above steps are repeated until the feedback information is "agree with the classification prediction result".
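The repeat-until-agreement behaviour can be sketched as a small loop. The two callables are hypothetical stand-ins for steps S201 to S204 and for the client round trip of step S205, and the `max_rounds` cap is an added safeguard that the text does not mention.

```python
def classify_with_feedback(run_s201_to_s204, ask_client, max_rounds=5):
    """Re-run prediction until the client agrees; return (prediction, rounds used)."""
    for round_no in range(1, max_rounds + 1):
        prediction = run_s201_to_s204()          # obtain a new classification prediction result
        if ask_client(prediction) == "agree":    # feedback returned by the client
            return prediction, round_no
    raise RuntimeError("client did not agree within max_rounds")

# Hypothetical stubs: the client rejects the first two predictions.
_predictions = iter(["scheme-A", "scheme-B", "scheme-C"])
_answers = iter(["disagree", "disagree", "agree"])
final, rounds = classify_with_feedback(lambda: next(_predictions),
                                       lambda prediction: next(_answers))
print(final, rounds)
```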

In the above embodiment, the application discloses a data classification method belonging to the technical fields of big data and financial property insurance. The method receives a data classification instruction uploaded by a client, obtains the data to be classified, and parses it to obtain its data type. Based on the data type, a data classification model matching the data to be classified is determined, yielding a matching classification model, and the data to be classified is imported into the matching classification model to obtain initial classification data. Business scene data corresponding to the data to be classified is then obtained, and the business scene data and the initial classification data are imported into the classification prediction model to obtain a classification prediction result. The classification prediction result is sent to the client, feedback information returned by the client is received, and the initial classification data is combined based on the feedback information and the classification prediction result to obtain the data classification result. The data is thus first classified by the data classification model matched to it and then reclassified by the machine-learning classification prediction model according to the business scene, which achieves accurate, usage-scene-based classification of the data and is particularly suitable for data classification in the financial property insurance field.

It should be emphasized that, to further ensure the privacy and security of the data to be classified, the data to be classified may also be stored in a node of a blockchain.

A blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, each data block containing a batch of network transaction information that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
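The idea of data blocks linked by cryptographic means can be illustrated with a toy hash chain. This is only a sketch of the linking principle, assuming SHA-256 and a JSON-serialized block body; it has no network, consensus mechanism, or encryption, and the payloads are hypothetical.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64

def block_hash(index, prev_hash, payload):
    """SHA-256 over a canonical JSON encoding of the block body."""
    body = json.dumps({"index": index, "prev_hash": prev_hash, "payload": payload},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_block(chain, payload):
    """Append a block whose hash covers the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    chain.append({"index": index, "prev_hash": prev_hash, "payload": payload,
                  "hash": block_hash(index, prev_hash, payload)})

def is_valid(chain):
    """Check every block's own hash and its link to the previous block."""
    prev_hash = GENESIS_PREV_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["prev_hash"], block["payload"]):
            return False
        prev_hash = block["hash"]
    return True

chain = []
append_block(chain, {"record": "data to be classified, batch 1"})
append_block(chain, {"record": "data to be classified, batch 2"})
print(is_valid(chain))
```

Changing the payload of any stored block makes `is_valid` fail, which is the tamper-evidence property the paragraph refers to as anti-counterfeiting.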

Those skilled in the art will appreciate that all or part of the processes of the methods in the above embodiments may be implemented by computer-readable instructions stored on a computer-readable storage medium, and that the instructions, when executed, may include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or it may be a random access memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of the steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a data classifying apparatus, which corresponds to the embodiment of the method shown in fig. 2, and the apparatus is particularly applicable to various electronic devices.

As shown in fig. 3, the data classifying device 300 according to the present embodiment includes:

the data type determining module 301 is configured to receive a data classification instruction uploaded by a client, obtain data to be classified, and parse the data to be classified to obtain a data type of the data to be classified;

the classification model matching module 302 is configured to determine a data classification model that matches the data to be classified based on the data type, and obtain a matching classification model;

the initial data classification module 303 is configured to import data to be classified into the matching classification model to obtain initial classification data;

the data classification prediction module 304 is configured to obtain service scene data corresponding to the data to be classified, and import the service scene data and the initial classification data into the classification prediction model to obtain a classification prediction result;

the data classification feedback module 305 is configured to send the classification prediction result to the client, and receive feedback information returned by the client;

the data final classification module 306 is configured to combine the initial classification data based on the feedback information and the classification prediction result to obtain a data classification result.
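The way modules 301 to 306 hand data to one another can be sketched as a single pass through injected callables. The class name `DataClassifier`, its field names, and the stub functions are hypothetical; the sketch performs one pass only and leaves re-execution after a "disagree" answer to the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class DataClassifier:
    determine_type: Callable[[Any], str]                 # module 301
    match_model: Callable[[str], Callable[[Any], Any]]   # module 302
    predict: Callable[[Any, Any], Any]                   # module 304
    get_feedback: Callable[[Any], str]                   # module 305
    combine: Callable[[Any, Any], Any]                   # module 306

    def run(self, data, scene):
        data_type = self.determine_type(data)
        model = self.match_model(data_type)
        initial = model(data)                            # module 303: initial classification
        prediction = self.predict(scene, initial)
        if self.get_feedback(prediction) != "agree":
            return None                                  # caller would re-run the pass
        return self.combine(initial, prediction)

# Hypothetical stubs standing in for the real modules.
classifier = DataClassifier(
    determine_type=lambda data: "structured",
    match_model=lambda data_type: (lambda data: {"initial": data}),
    predict=lambda scene, initial: {"scheme": scene},
    get_feedback=lambda prediction: "agree",
    combine=lambda initial, prediction: {**initial, **prediction},
)
print(classifier.run([1, 2, 3], "claims-review"))
```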

Further, the data type determining module 301 specifically includes:

the instruction receiving unit is used for receiving the data classification instruction uploaded by the client and acquiring data to be classified based on the data classification instruction;

the data source determining unit is used for analyzing the data to be classified and determining the data source of the data to be classified;

and the data type determining unit is used for determining the data type of the data to be classified based on the data source of the data to be classified.
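Deriving the data type from the data source could be as simple as a lookup table. The source names, the type names, and the `source` field on the record are hypothetical; the text does not enumerate them.

```python
# Hypothetical mapping from data source to data type.
SOURCE_TO_TYPE = {
    "policy_database": "structured",
    "claims_notes": "text",
}

def determine_data_type(record):
    """Return the data type implied by the record's data source."""
    source = record.get("source")
    if source not in SOURCE_TO_TYPE:
        raise ValueError(f"unknown data source: {source!r}")
    return SOURCE_TO_TYPE[source]

print(determine_data_type({"source": "claims_notes", "body": "..."}))
```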

Further, the data classification model includes a decision tree model and a support vector machine, and when the matching classification model is the decision tree model, the initial data classification module 303 specifically includes:

the preprocessing unit is used for preprocessing the data to be classified, wherein the preprocessing comprises data deduplication, missing value processing and numerical value standardization;

the first feature extraction unit is used for carrying out feature extraction on the data to be classified after preprocessing is completed, so as to obtain first data features;

the decision tree unit is used for loading a pre-trained decision tree model, importing the first data features into the decision tree model and obtaining a classification label output by the decision tree model;

the first initial classification unit is used for classifying the data to be classified according to the classification labels to obtain initial classification data.
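The units above can be sketched end to end on numeric rows. Several choices are assumptions: missing values are filled with the column mean, standardization is done by min-max scaling, the preprocessed values are used directly as the first data features, and the "pre-trained" decision tree is a hand-written nested dictionary rather than a trained model.

```python
def preprocess(rows):
    """Deduplicate rows, fill missing values with the column mean, min-max scale."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:            # data deduplication, keeping first occurrences
            seen.add(key)
            unique.append(list(row))
    for c in range(len(unique[0])):
        present = [r[c] for r in unique if r[c] is not None]
        mean = sum(present) / len(present)   # assumes each column has at least one value
        for r in unique:
            if r[c] is None:           # missing value processing
                r[c] = mean
        lo, hi = min(r[c] for r in unique), max(r[c] for r in unique)
        for r in unique:               # numerical value standardization
            r[c] = 0.0 if hi == lo else (r[c] - lo) / (hi - lo)
    return unique

# Hypothetical stand-in for a pre-trained decision tree.
TREE = {"feature": 0, "threshold": 0.5, "left": "low_value",
        "right": {"feature": 1, "threshold": 0.5,
                  "left": "medium_value", "right": "high_value"}}

def tree_predict(node, features):
    """Walk the tree until a leaf (a classification label) is reached."""
    while isinstance(node, dict):
        branch = "left" if features[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node

def classify(rows):
    """Group preprocessed rows by the label the decision tree outputs."""
    groups = {}
    for features in preprocess(rows):
        groups.setdefault(tree_predict(TREE, features), []).append(features)
    return groups

ROWS = [[10, 1], [10, 1], [20, None], [30, 5]]
print(classify(ROWS))
```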

Further, the data classification model includes a decision tree model and a support vector machine, and when the matching classification model is the support vector machine, the initial data classification module 303 further includes:

the second feature extraction unit is used for carrying out feature extraction on the data to be classified after preprocessing is completed, so as to obtain second data features;

the support vector machine unit is used for loading a pre-trained support vector machine, importing the second data features into the support vector machine and obtaining a decision boundary output by the support vector machine;

the second initial classification unit is used for classifying the data to be classified according to the decision boundary to obtain initial classification data.
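Classifying by a decision boundary can be illustrated with a linear boundary w·x + b = 0. The weights, bias, and label names below are hypothetical stand-ins for what a pre-trained linear support vector machine would provide; kernel SVMs would need a different decision function.

```python
def decision_value(weights, bias, features):
    """Signed score of a sample relative to the boundary w.x + b = 0."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def classify_by_boundary(samples, weights, bias, positive="class_a", negative="class_b"):
    """Assign each sample to the side of the decision boundary it falls on."""
    groups = {positive: [], negative: []}
    for sample in samples:
        side = positive if decision_value(weights, bias, sample) >= 0 else negative
        groups[side].append(sample)
    return groups

# Hypothetical boundary x0 - x1 = 0 and second data features.
WEIGHTS, BIAS = [1.0, -1.0], 0.0
SAMPLES = [[2.0, 1.0], [1.0, 3.0], [0.5, 0.5]]
print(classify_by_boundary(SAMPLES, WEIGHTS, BIAS))
```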

Further, the classification prediction model is a pre-trained Transformer model, the classification prediction model includes an input layer, a self-attention layer, a feedforward neural network layer, and an output layer, and the data classification prediction module 304 specifically includes:

the scene feature extraction unit is used for extracting features of the business scene data and the initial classification data to obtain scene data features;

the self-attention unit is used for importing the scene data features into the classification prediction model through the input layer, and for encoding the scene data features and assigning self-attention weights through the self-attention layer;

the vector mapping unit is used for carrying out vector mapping on the coded and weighted scene data features through the feedforward neural network layer;

and the decoding output unit is used for obtaining the vector mapping result through the output layer, decoding the vector mapping result to obtain a classification prediction result and outputting the classification prediction result.

Further, the data classifying apparatus 300 further includes:

the training data acquisition module is used for acquiring training data, wherein the training data comprises historical scene data and historical classification data;

a data set construction module for constructing a training data set and a verification data set based on the training data;

the model training verification module is used for training a preset initial prediction model through the training data set, and verifying the trained initial prediction model by utilizing the verification data set to obtain a classified prediction model.

Further, the initial prediction model includes an input layer, a self-attention layer, a feedforward neural network layer and an output layer, and the model training verification module specifically includes:

the training feature extraction unit is used for extracting features of training samples in the training data set to obtain training sample features;

the training feature importing unit is used for importing training sample features into the initial prediction model through an input layer of the initial prediction model;

the self-attention layer unit is used for coding the training sample characteristics and assigning self-attention weights through the self-attention layer of the initial prediction model;

the vector mapping unit is used for carrying out vector mapping on the training sample characteristics after coding and weighting through a feedforward neural network layer of the initial prediction model;

the output decoding unit is used for obtaining the vector mapping result of the training sample characteristics through the output layer of the initial prediction model, and decoding the vector mapping result of the training sample characteristics to obtain a training prediction result;

the model iteration unit is used for iterating the initial prediction model through the training prediction result and a preset standard result to obtain a trained initial prediction model;

the model verification unit is used for verifying the initial prediction model after training through the verification data set, and obtaining the classification prediction model after the initial prediction model passes the verification.

In the above embodiment, the application discloses a data classifying device belonging to the technical fields of big data and financial property insurance. The device receives a data classification instruction uploaded by a client, obtains the data to be classified, and parses it to obtain its data type. Based on the data type, a data classification model matching the data to be classified is determined, yielding a matching classification model, and the data to be classified is imported into the matching classification model to obtain initial classification data. Business scene data corresponding to the data to be classified is then obtained, and the business scene data and the initial classification data are imported into the classification prediction model to obtain a classification prediction result. The classification prediction result is sent to the client, feedback information returned by the client is received, and the initial classification data is combined based on the feedback information and the classification prediction result to obtain the data classification result. The data is thus first classified by the data classification model matched to it and then reclassified by the machine-learning classification prediction model according to the business scene, which achieves accurate, usage-scene-based classification of the data and is particularly suitable for data classification in the financial property insurance field.

In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.

The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It should be noted that only a computer device 4 having components 41-43 is shown in the figure, but it should be understood that not all of the illustrated components are required and that more or fewer components may be implemented instead. Those skilled in the art will appreciate that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and that its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like.

The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.

The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is typically used to store the operating system and the various application software installed on the computer device 4, such as the computer-readable instructions of the data classification method. The memory 41 may also be used to temporarily store various types of data that have been output or are to be output.

The processor 42 may, in some embodiments, be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the computer-readable instructions stored in the memory 41 or to process data, for example to execute the computer-readable instructions of the data classification method.

The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.

In the above embodiment, the application discloses a computer device belonging to the technical fields of big data and financial property insurance. The computer device receives a data classification instruction uploaded by a client, obtains the data to be classified, and parses it to obtain its data type. Based on the data type, a data classification model matching the data to be classified is determined, yielding a matching classification model, and the data to be classified is imported into the matching classification model to obtain initial classification data. Business scene data corresponding to the data to be classified is then obtained, and the business scene data and the initial classification data are imported into the classification prediction model to obtain a classification prediction result. The classification prediction result is sent to the client, feedback information returned by the client is received, and the initial classification data is combined based on the feedback information and the classification prediction result to obtain the data classification result. The data is thus first classified by the data classification model matched to it and then reclassified by the machine-learning classification prediction model according to the business scene, which achieves accurate, usage-scene-based classification of the data and is particularly suitable for data classification in the financial property insurance field.

The present application also provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the data classification method described above.

In the above embodiment, the application discloses a computer-readable storage medium belonging to the technical fields of big data and financial property insurance. When the stored instructions are executed, a data classification instruction uploaded by a client is received, the data to be classified is obtained, and it is parsed to obtain its data type. Based on the data type, a data classification model matching the data to be classified is determined, yielding a matching classification model, and the data to be classified is imported into the matching classification model to obtain initial classification data. Business scene data corresponding to the data to be classified is then obtained, and the business scene data and the initial classification data are imported into the classification prediction model to obtain a classification prediction result. The classification prediction result is sent to the client, feedback information returned by the client is received, and the initial classification data is combined based on the feedback information and the classification prediction result to obtain the data classification result. The data is thus first classified by the data classification model matched to it and then reclassified by the machine-learning classification prediction model according to the business scene, which achieves accurate, usage-scene-based classification of the data and is particularly suitable for data classification in the financial property insurance field.

From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) and comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods according to the embodiments of the present application.

The application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

It is apparent that the embodiments described above are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the application but do not limit the scope of the patent claims. The application may be embodied in many different forms; these embodiments are provided so that the disclosure will be understood thoroughly and completely. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. All equivalent structures made using the content of the specification and drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.

Claims

1. A method of classifying data, comprising:

receiving a data classification instruction uploaded by a client, obtaining data to be classified, and analyzing the data to be classified to obtain the data type of the data to be classified;

importing the data to be classified into the matching classification model to obtain initial classification data;

the classification prediction result is sent to the client, and feedback information returned by the client is received;

2. The method for classifying data according to claim 1, wherein receiving a data classification instruction uploaded by a client, obtaining data to be classified, and analyzing the data to be classified, and obtaining a data type of the data to be classified, specifically comprises:

receiving a data classification instruction uploaded by a client, and acquiring the data to be classified based on the data classification instruction;

and determining the data type of the data to be classified based on the data source of the data to be classified.

3. The method for classifying data according to claim 1, wherein the data classification model includes a decision tree model and a support vector machine, and when the matching classification model is the decision tree model, the data to be classified is imported into the matching classification model to obtain initial classification data, specifically including:

extracting the characteristics of the preprocessed data to be classified to obtain first data characteristics;

loading a pre-trained decision tree model, and importing the first data features into the decision tree model to obtain a classification label output by the decision tree model;

and classifying the data to be classified according to the classification label to obtain the initial classification data.

4. The method for classifying data according to claim 1, wherein the data classification model includes a decision tree model and a support vector machine, and when the matching classification model is the support vector machine, the data to be classified is imported into the matching classification model to obtain initial classification data, and the method specifically includes:

extracting the characteristics of the preprocessed data to be classified to obtain second data characteristics;

and classifying the data to be classified according to the decision boundary to obtain the initial classification data.

5. The method for classifying data according to any one of claims 1 to 4, wherein the classification prediction model is a pre-trained Transformer model, the classification prediction model includes an input layer, a self-attention layer, a feedforward neural network layer, and an output layer, and the acquiring the business scene data corresponding to the data to be classified and importing the business scene data and the initial classification data into the classification prediction model to obtain a classification prediction result specifically includes:

importing the scene data features into the classification prediction model through the input layer, and encoding the scene data features and assigning self-attention weights through the self-attention layer;

vector mapping is carried out on the coded and weighted scene data features through the feedforward neural network layer;

6. The method for classifying data according to claim 5, wherein before the acquiring the service scene data corresponding to the data to be classified, importing the service scene data and the initial classification data into a classification prediction model to obtain a classification prediction result, further comprising:

training a preset initial prediction model through the training data set, and verifying the trained initial prediction model by utilizing the verification data set to obtain the classification prediction model.

7. The method for classifying data according to claim 6, wherein the initial prediction model includes an input layer, a self-attention layer, a feedforward neural network layer, and an output layer, the training data set is used to train a preset initial prediction model, and the verification data set is used to verify the trained initial prediction model to obtain the classification prediction model, and the method specifically includes:

importing the training sample characteristics into an initial prediction model through an input layer of the initial prediction model;

coding and self-attention weight assignment are carried out on the training sample characteristics through the self-attention layer of the initial prediction model;

and verifying the initial prediction model after training through the verification data set, and obtaining the classification prediction model after the initial prediction model passes verification.

8. A data classifying device, comprising:

the data type determining module is used for receiving a data classification instruction uploaded by a client, acquiring data to be classified, analyzing the data to be classified and acquiring the data type of the data to be classified;

the initial data classification module is used for importing the data to be classified into the matching classification model to obtain initial classification data;

the data classification prediction module is used for acquiring the service scene data corresponding to the data to be classified, and importing the service scene data and the initial classification data into a classification prediction model to obtain a classification prediction result;

9. A computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions which, when executed by the processor, implement the steps of the data classification method of any one of claims 1 to 7.

10. A computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor, implement the steps of the data classification method of any one of claims 1 to 7.