CN112667569B - Feature derivation method, feature derivation system, computer device, and computer-readable storage medium

Feature derivation method, feature derivation system, computer device, and computer-readable storage medium

Info

Publication number
CN112667569B
CN112667569B
Authority
CN
China
Prior art keywords
data
feature
target data
user
data set
Prior art date
Legal status
Active
Application number
CN202011542660.4A
Other languages
Chinese (zh)
Other versions
CN112667569A (en)
Inventor
季德志
Current Assignee
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Bank Co Ltd filed Critical Ping An Bank Co Ltd
Priority to CN202011542660.4A priority Critical patent/CN112667569B/en
Publication of CN112667569A publication Critical patent/CN112667569A/en
Application granted granted Critical
Publication of CN112667569B publication Critical patent/CN112667569B/en

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a feature derivation method, which comprises the following steps: receiving a data acquisition request of a user, wherein the data acquisition request comprises user information and data set information of the user; verifying the user information to determine whether the user has the right to acquire data; when verification passes, querying target data corresponding to the data set information based on the data acquisition request, and constructing a target data set from the queried target data; establishing a feature derivation task, and performing multi-process feature calculation on the target data set based on the feature derivation task to obtain preprocessed data; and performing feature derivation processing on the preprocessed data based on a preset machine learning algorithm to obtain model data. The beneficial effect of the invention is that the effectiveness and efficiency of feature mining are improved.

Description

Feature derivation method, feature derivation system, computer device, and computer-readable storage medium
Technical Field
Embodiments of the present invention relate to the field of big data processing, and in particular to a feature derivation method, a feature derivation system, a computer device, and a computer-readable storage medium.
Background
The AI algorithm model is an important tool for supporting the strategic direction of the banking and finance industry and can help the business adjust its strategy quickly and efficiently. Feature engineering is crucial to constructing the whole AI algorithm model, but existing feature engineering has the following problems:
Feature screening is inefficient: code is written by hand to analyze many variables and mine effective features, which is slow. Feature mining is difficult: mining multi-way combined features is hard, and the results cannot be reused across scenarios. Feature mining is not comprehensive enough: when the number of feature types is large, the mining of weak data is easily overlooked.
Disclosure of Invention
Accordingly, it is an object of embodiments of the present invention to provide a feature derivation method, system, computer device and computer-readable storage medium to solve the problems of difficult and inefficient feature mining.
To achieve the above object, an embodiment of the present invention provides a feature deriving method, including:
receiving a data acquisition request of a user, wherein the data acquisition request comprises user information and data set information of the user;
verifying the user information to determine whether the user has rights to acquire data;
when verification is passed, inquiring target data corresponding to the data set information based on the data acquisition request, and constructing a target data set according to the inquired target data;
establishing a feature derivative task, and performing multi-process feature calculation on the target data set based on the feature derivative task to obtain preprocessing data;
and performing feature derivation processing on the preprocessed data based on a preset machine learning algorithm to obtain model data.
Further, when the verification is passed, querying target data corresponding to the data set information based on the data acquisition request, and constructing a target data set according to the queried target data includes:
generating a plurality of query statements based on the dataset information in the data acquisition request when verification passes;
querying target data corresponding to the data set information through the plurality of query statements;
embedding tracking points in advance in the plurality of query statements to capture the data query process of the target data, and generating a log file based on the query process;
and storing the target data into a data file, and associating the data file with the log file to construct the target data set.
Further, the establishing a feature deriving task, performing multi-process feature computation on the target data set based on the feature deriving task, and obtaining preprocessing data includes:
establishing a feature derivative task according to the target data set;
and carrying out multi-process feature calculation on the target data set based on the feature derivative task so as to sort the target data set according to a calculation result and obtain preprocessing data.
Further, the performing feature derivation processing on the preprocessed data based on a preset machine learning algorithm to obtain model data includes:
discretizing the preprocessed data to obtain a plurality of characteristic data;
and performing cross operation on the characteristic data through the machine learning algorithm to obtain a plurality of model data.
Further, the performing the cross operation on the feature data to obtain model data includes:
calculating the weight value and the information quantity of each characteristic data through the machine learning algorithm to obtain a first-order derivative result;
calculating the weight value and the information quantity of the feature data combined pairwise through the machine learning algorithm to obtain a second-order derivative result;
and screening the preprocessing data based on the first-order derivative result and the second-order derivative result to obtain model data.
Further, after the preprocessing data is filtered based on the first-order derivative result and the second-order derivative result to obtain model data, the method further includes:
and respectively drawing effect graphs of the first-order derivative result and the second-order derivative result.
Further, the method further comprises:
the model data is stored into a blockchain.
To achieve the above object, an embodiment of the present invention provides a feature deriving system, including:
the receiving module is used for receiving a data acquisition request of a user, wherein the data acquisition request comprises user information and data set information of the user;
the verification module is used for verifying the user information to determine whether the user has permission to acquire data;
the query module is used for querying target data corresponding to the data set information based on the data acquisition request when verification passes, and constructing a target data set according to the queried target data;
the computing module is used for establishing a feature derivative task, and carrying out multi-process feature computation on the target data set based on the feature derivative task to obtain preprocessing data;
and the processing module is used for carrying out feature derivation processing on the preprocessing data based on a preset machine learning algorithm to obtain model data.
To achieve the above object, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory, where the computer program is executable on the processor to implement the steps of the feature derivation method as described in any one of the above.
To achieve the above object, an embodiment of the present invention provides a computer-readable storage medium having stored therein a computer program executable by at least one processor to cause the at least one processor to perform the steps of the feature derivation method as described in any one of the above.
With the feature derivation method, system, computer device and computer-readable storage medium provided by the embodiments of the invention, the feature derivation method runs on the feature derivation system: login permission first determines whether a user may perform feature processing, then the data set identified by the data set information undergoes multi-process feature calculation and derivation processing, and the first-order and second-order derivation processing applied during derivation yields the model data. Development manpower is saved, and the efficiency of feature screening is improved.
Drawings
FIG. 1 is a flow chart of a first embodiment of the feature deriving method of the present invention.
FIG. 2 is a schematic diagram of the program modules of a second embodiment of the feature derivation system according to the present invention.
Fig. 3 is a schematic diagram of a hardware structure of a third embodiment of the computer device of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Referring to FIG. 1, a flow chart of the steps of the feature derivation method according to a first embodiment of the invention is shown. It will be appreciated that the flow charts in the method embodiments are not intended to limit the order in which the steps are performed. An exemplary description is given below with the computer device 2 as the execution subject. The details are as follows.
Step S100, a data acquisition request of a user is received, wherein the data acquisition request comprises user information and data set information of the user.
Specifically, a first node of the feature derivation system receives the data acquisition request of the user. The first node is an AspectJ framework: the data acquisition request of the user is captured through the AspectJ framework, an aspect-oriented framework used to perform a security check on the user information when a data acquisition request is made, so as to ensure the security of the data. The user information comprises the user's user name, user level, user password and the like, and the data set information is identification information of the data set on which the user wants to perform feature derivation. The user performs a data acquisition operation and selects information about the required target data, such as name field information of the target data or field information of the data set in which it resides, and a data acquisition request is generated from the data set information and the user information. The data acquisition request is used to acquire the corresponding data set from a data repository such as a database or a Hadoop data warehouse according to the data set information; before the request is made, a plurality of data sets have been stored in the database in advance.
Step S120, verifying the user information to determine whether the user has permission to acquire data.
Specifically, whether the user has permission to acquire data is determined from the user name or user level in the user information. If the user passes verification, the data acquisition request of the user is accepted and the operation continues. If the user's authority is insufficient, the data acquisition request of the user is rejected and corresponding similar data sets are recommended instead, where the similar data sets are data sets queried according to the user's authority so that other feature derivation operations can still be performed.
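As an illustration only, a minimal Python sketch of such a permission check is given below; the names (User, DATASET_ACL, find_similar_datasets, the data set identifiers and the level numbers) are hypothetical and do not appear in the patent.

from dataclasses import dataclass

@dataclass
class User:
    name: str
    level: int            # user grade from the user information
    password_hash: str

# Hypothetical access-control list: data set identifier -> minimum user level.
DATASET_ACL = {"loan_features_2020": 3, "deposit_features_2020": 2}

def find_similar_datasets(user: User) -> list[str]:
    """Data sets the user is allowed to query, offered as recommendations."""
    return [ds for ds, lvl in DATASET_ACL.items() if user.level >= lvl]

def verify_request(user: User, dataset_id: str):
    """Return (granted, payload): the data set id if permitted, otherwise
    a list of similar data sets the user may still derive features from."""
    required = DATASET_ACL.get(dataset_id)
    if required is None or user.level < required:
        return False, find_similar_datasets(user)
    return True, dataset_id

# Example: a level-2 user asking for a level-3 data set is rejected
# and offered "deposit_features_2020" instead.
granted, payload = verify_request(User("alice", 2, "hash"), "loan_features_2020")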
And step S140, when the verification is passed, inquiring target data corresponding to the data set information based on the data acquisition request, and constructing a target data set according to the inquired target data.
Specifically, when the user information passes verification, a query statement is generated based on the data set information to perform the data query. When the volume of target data to be queried is very large, the data query is carried out through a plurality of query statements that are spliced into one overall query, so that each query statement runs in its own thread, which speeds up the query and yields the target data set. The AspectJ framework connects directly to the Hadoop data warehouse and can automatically pull large-scale data sets.
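A minimal sketch of running several spliced query statements concurrently is given below, assuming a placeholder run_query function in place of a real database or Hadoop connection; the table and column names are hypothetical.

from concurrent.futures import ThreadPoolExecutor

def run_query(sql: str) -> list[dict]:
    """Placeholder: execute one query statement against the database or
    Hadoop data warehouse and return its rows."""
    raise NotImplementedError

def query_target_data(statements: list[str]) -> list[dict]:
    """Run each spliced query statement in its own thread and merge the
    partial results into one collection of target data."""
    rows: list[dict] = []
    with ThreadPoolExecutor(max_workers=len(statements)) as pool:
        for partial in pool.map(run_query, statements):
            rows.extend(partial)
    return rows

# One statement per attribute group, spliced into the overall query.
statements = [
    "SELECT * FROM raw_features WHERE attr_type = 'amount'",
    "SELECT * FROM raw_features WHERE attr_type = 'count'",
]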
In an exemplary embodiment, the step S140 specifically includes:
step S141, when the verification passes, generating a plurality of query sentences based on the data set information in the data acquisition request.
Specifically, the target data can be classified according to its attributes, and one query statement is generated for the target data sharing the same attribute, so that the data query can be performed quickly. For example, target data with the monetary-amount attribute is queried through the same query statement.
Step S142, querying target data corresponding to the data set information through the plurality of query statements.
Specifically, the queried target data is saved in a data center so that the data can be traced, for example to record which database each piece of target data was queried from.
Step S143, embedding tracking points in advance in the plurality of query statements to capture the data query process of the target data, and generating a log file based on the query process.
Specifically, the log files generated during the query of the target data can be stored, recording, for example, what the query statement corresponding to each piece of target data was; if an error appears in the queried results, the cause of the error can then be located conveniently.
And step S144, storing the target data into a data file, and associating the data file with the log file to construct the target data set.
Specifically, the target data is stored in the form of a data file that also records the source of the target data. The data file is associated with the log file to construct the target data set, which is stored in a data center; the data center can also be a blockchain, so that the data is better preserved and its source can be traced.
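The following sketch illustrates, under the same assumptions as above (a caller-supplied query function, hypothetical file paths), how tracking points around each query statement can be logged and how the resulting data file can be associated with the log file.

import json
import logging
import time

logging.basicConfig(filename="query.log", level=logging.INFO)
log = logging.getLogger("feature_derivation")

def traced_query(run_query, sql: str, source_db: str) -> list[dict]:
    """Tracking point around one query statement: record the statement, its
    source database and the elapsed time so error causes can be located later."""
    start = time.time()
    rows = run_query(sql)
    log.info("sql=%s source=%s rows=%d elapsed=%.3fs",
             sql, source_db, len(rows), time.time() - start)
    return rows

def save_target_dataset(rows: list[dict], data_path: str, log_path: str) -> dict:
    """Store the target data as a data file and associate it with the log file."""
    with open(data_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False)
    # The manifest pairing data file and log file stands in for the target data set record.
    return {"data_file": data_path, "log_file": log_path}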
And step S160, establishing a feature derivative task, and performing multi-process feature calculation on the target data set based on the feature derivative task to obtain preprocessing data.
Specifically, a feature derivation task is generated from the target data set on which feature derivation is to be performed, and multi-process feature calculation is applied to the target data set to obtain the preprocessed data. The feature derivation task can be dispatched by a message distribution platform, and the target data set is fetched from the data center, so that multi-process feature calculation is performed at the feature-calculation level to derive the features broadly, for example computing the maximum, minimum, average and other statistics of the relevant features across multiple processes in preparation for data discretization. For example, when the maximum value in the target data is 280 and the minimum value is 20, the target data can be discretized into the intervals [20-50], ..., [241-270], [270+] for feature derivation.
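A minimal sketch of this multi-process feature calculation and discretization, using Python's multiprocessing and pandas; the bin count is an assumption, and on platforms that spawn processes the function should be called from under an if __name__ == "__main__": guard.

import multiprocessing as mp
import pandas as pd

def summarize(series: pd.Series) -> dict:
    """Feature calculation for one target data set: maximum, minimum, average."""
    return {"max": series.max(), "min": series.min(), "mean": series.mean()}

def preprocess(datasets: dict[str, pd.Series], n_bins: int = 9) -> dict[str, pd.Series]:
    """One process per target data set for the statistics, then discretize each
    set into intervals, with the last interval open-ended (e.g. [270+])."""
    with mp.Pool(processes=len(datasets)) as pool:
        stats = dict(zip(datasets, pool.map(summarize, datasets.values())))
    binned = {}
    for name, series in datasets.items():
        lo, hi = stats[name]["min"], stats[name]["max"]
        step = (hi - lo) / n_bins
        edges = [lo + i * step for i in range(n_bins)] + [float("inf")]
        binned[name] = pd.cut(series, bins=edges, include_lowest=True)
    return binned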
In an exemplary embodiment, the step S160 specifically includes:
step S161, establishing a feature derivative task according to the target data set.
And step S162, performing multi-process feature calculation on the target data set based on the feature deriving task so as to sort the target data set according to a calculation result and obtain preprocessing data.
Specifically, a feature derivation task is established for each target data set, and multi-process feature calculation is performed on the target data sets based on those tasks, with each target data set corresponding to one process. The feature calculation computes the maximum, minimum, average and other statistics of each target data set; each target data set is then ordered according to the calculation result and the target data within each set is sorted, yielding the preprocessed data and facilitating the subsequent feature derivation processing.
And step S180, performing feature derivation processing on the preprocessed data based on a preset machine learning algorithm to obtain model data.
Specifically, the feature derivation processing combines the features of the existing preprocessed data to generate new, meaningful features, that is, the features of the model data, so that the data can be mined through the model.
In an exemplary embodiment, the step S180 specifically includes:
step S181, performing discretization processing on the preprocessed data, to obtain a plurality of feature data.
And step S182, performing feature derivation processing on the feature data through the machine learning algorithm to obtain a plurality of model data.
Specifically, the preprocessed data is discretized to obtain a plurality of feature data, the weight value and information quantity of the feature data are calculated, the feature data whose weight value and information quantity exceed a preset threshold are selected, and those feature data are input into the machine learning algorithm for feature derivation processing to obtain the model data.
In an exemplary embodiment, the step S182 specifically includes:
step S182A, calculating a weight value and an information amount of each feature data by the machine learning algorithm, so as to obtain a first-order derivative result.
Specifically, the first-order derivation processing uses a machine learning algorithm to rank the features of the preprocessed data: it embeds a random forest, a leading machine learning algorithm, and takes the importance of the model variables as a feature index to help judge the effectiveness of the target data. The random forest model is itself a predictive model, but during prediction the importance of the feature data can be ranked, and feature screening can then be performed from this ranking. The approach is to randomly rearrange a given column of feature data in the ordered preprocessed data and observe how much the model accuracy drops, where that column can be understood as a column of feature data with a larger information quantity, the information quantity being calculated from the weight value.
The method for calculating the importance of a certain feature X in a random forest is as follows:
and obtaining the frequency of occurrence of the characteristic X in the tree node in each tree of the random forest, calculating the duty ratio of the characteristic X in each tree as a weight value, calculating the information quantity according to the weight value, and further reducing the influence value of the characteristic on the model to the minimum according to the information quantity.
WOE is short for "Weight of Evidence", i.e. the weight value. WOE is an encoded form of the original independent variable. To WOE-encode a variable, each target data set first needs to be grouped (also called discretization or binning). After grouping, for group i, i <= r, the WOE is calculated as follows:
WOE_i = ln(py_i / pn_i) = ln((y_i / y_r) / (n_i / n_r)),
where py_i is the share of responsive target data y and pn_i the share of non-responsive target data n falling in the current group, relative to all responsive (y_r) and all non-responsive (n_r) target data. WOE therefore expresses the difference between the responder/non-responder ratio in the current group and that ratio over all target data, as the logarithm of the ratio of the two ratios. The larger the WOE, the more likely the data in the group is to respond; the smaller the WOE, the smaller this difference and the less likely the data in the group is to respond. It can be understood that the weaker the correlation between the feature data in the target data set, the smaller the weight value, and the stronger the correlation, the larger the weight value.
IV is short for Information Value, i.e. the information value or information quantity. When a classification model is built using regression models, decision trees and the like, the feature data often needs to be screened, and the predictive power of the feature data can be measured by the IV indicator.
IV_i = (py_i - pn_i) * WOE_i = (py_i - pn_i) * ln(py_i / pn_i) = (py_i - pn_i) * ln((y_i / y_r) / (n_i / n_r)),
Feature screening of different dimension variables can be completed through the formula, and the screened features are used as model features of a machine learning algorithm to obtain a first-order derivative result.
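A minimal sketch of the WOE/IV computation per group, following the formulas above; treating the feature's total IV as the sum over groups and the handling of zero counts are standard conventions rather than details taken from the patent.

import numpy as np
import pandas as pd

def woe_iv(binned_feature: pd.Series, target: pd.Series) -> pd.DataFrame:
    """Per group i of an already-binned feature, compute
    WOE_i = ln(py_i / pn_i) and IV_i = (py_i - pn_i) * WOE_i,
    where target is 1 for responsive and 0 for non-responsive records.
    A feature's total IV is commonly taken as the sum of IV_i over its groups."""
    counts = pd.crosstab(binned_feature, target)   # rows: groups, columns: 0 / 1
    y_i, n_i = counts[1], counts[0]                # assumes both classes occur
    py = y_i / y_i.sum()                           # y_i / y_r
    pn = n_i / n_i.sum()                           # n_i / n_r
    woe = np.log(py / pn)                          # zero counts need smoothing in practice
    iv = (py - pn) * woe
    return pd.DataFrame({"WOE": woe, "IV": iv})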
And step S182B, calculating the weight value and the information quantity of the feature data combined pairwise through the machine learning algorithm to obtain a second-order derivative result.
Specifically, second-order derivation is achieved by randomly combining two or more feature data. The combined feature data is then scored again using the method described above for calculating the importance of a feature X in the random forest, giving the second-order derivative result.
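A sketch of the pairwise (second-order) combination step; concatenating the binned codes of two columns is just one illustrative way to cross features.

from itertools import combinations
import pandas as pd

def second_order_features(X: pd.DataFrame) -> pd.DataFrame:
    """Cross the feature data pairwise (here simply by concatenating the binned
    codes of two columns); the crossed columns can then be re-scored with the
    same importance / WOE-IV procedure used for the first-order result."""
    crossed = {}
    for a, b in combinations(X.columns, 2):
        crossed[f"{a}__x__{b}"] = X[a].astype(str) + "_" + X[b].astype(str)
    return pd.DataFrame(crossed, index=X.index)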
And step S182C, screening the preprocessing data based on the first-order derivative result and the second-order derivative result to obtain model data.
Specifically, the preprocessed data is screened according to its weight value and information quantity in the first-order derivative result and the second-order derivative result, yielding the model data. Applying the derived model data to the model improves the model's accuracy; when the model is applied to data mining, development manpower is saved and the efficiency of feature screening is improved.
In an exemplary embodiment, after the step S182C, the method further includes:
and respectively drawing effect graphs of the first-order derivative result and the second-order derivative result.
Specifically, a 2-dimensional effect diagram is drawn for the first-order derivative result and a 3-dimensional effect diagram for the second-order derivative result. Establishing the 2-dimensional and 3-dimensional effect diagrams helps the business intuitively understand the shape of the data features, so that targeted strategies can be formulated from the diagrams while the interpretability principle is satisfied.
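For illustration, the sketch below draws a 2-dimensional bar chart for the first-order result and a 3-dimensional bar chart for the pairwise second-order result with matplotlib; the choice of IV as the plotted quantity and the matrix layout of the pairwise result are assumptions.

import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection)

def plot_effects(first_order: pd.Series, pairwise: pd.DataFrame) -> None:
    """2-D bar chart of the first-order result (e.g. IV per feature) and a
    3-D bar chart of the second-order result (e.g. IV per feature pair,
    arranged as a feature-by-feature matrix)."""
    fig = plt.figure(figsize=(12, 5))

    ax1 = fig.add_subplot(1, 2, 1)
    first_order.plot.bar(ax=ax1)
    ax1.set_title("First-order derivative result")
    ax1.set_ylabel("IV")

    ax2 = fig.add_subplot(1, 2, 2, projection="3d")
    for row, name in enumerate(pairwise.index):
        ax2.bar(range(len(pairwise.columns)), pairwise.loc[name], zs=row, zdir="y")
    ax2.set_title("Second-order derivative result")
    fig.tight_layout()
    plt.show()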
In an exemplary embodiment, the method further comprises:
the model data is stored into a blockchain.
Specifically, uploading the model data to the blockchain ensures its security and its fairness and transparency to the user. A user device can download the model data from the blockchain to verify whether the model data has been tampered with. The blockchain referred to in this example is a novel application of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated and linked by cryptographic means, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain infrastructure platform, a platform product services layer, an application services layer and the like.
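As a toy illustration of the hash-linking idea only (not of the consensus mechanism or platform layers mentioned above), the sketch below chains model-data records by hashing each block together with the previous block's hash; the field names and IV values are made up.

import hashlib
import json
import time

def make_block(model_data: dict, prev_hash: str) -> dict:
    """Append a block whose hash covers the model data and the previous block's
    hash; re-computing the hashes lets a user device detect tampering."""
    body = {
        "timestamp": time.time(),
        "payload": json.dumps(model_data, sort_keys=True),
        "prev_hash": prev_hash,
    }
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return body

# Hypothetical model data appended to a toy chain (genesis previous hash = all zeros).
chain = [make_block({"feature": "amount_bin__x__count_bin", "iv": 0.42}, "0" * 64)]
chain.append(make_block({"feature": "amount_bin", "iv": 0.31}, chain[-1]["hash"]))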
Example two
With continued reference to fig. 2, a program module schematic of a second embodiment of the feature derivation system of the present invention is shown. In this embodiment, the feature derivation system 20 may include, or be partitioned into, one or more program modules that are stored in a storage medium and executed by one or more processors to carry out the present invention and implement the feature derivation method described above. A program module in the embodiments of the present invention refers to a series of computer program instruction segments capable of performing specified functions and is better suited than the program itself to describing the execution of the feature derivation system 20 in the storage medium. The functions of each program module of the present embodiment are described in detail below:
the receiving module 200 is configured to receive a data acquisition request of a user, where the data acquisition request includes user information and data set information of the user.
Specifically, a first node of the feature derivation system receives the data acquisition request of the user. The first node is an AspectJ framework: the data acquisition request of the user is captured through the AspectJ framework, an aspect-oriented framework used to perform a security check on the user information when a data acquisition request is made, so as to ensure the security of the data. The user information comprises the user's user name, user level, user password and the like, and the data set information is identification information of the data set on which the user wants to perform feature derivation. The user performs a data acquisition operation and selects information about the required target data, such as name field information of the target data or field information of the data set in which it resides, and a data acquisition request is generated from the data set information and the user information. The data acquisition request is used to acquire the corresponding data set from a data repository such as a database or a Hadoop data warehouse according to the data set information; before the request is made, a plurality of data sets have been stored in the database in advance.
And the verification module 202 is configured to verify the user information to determine whether the user has permission to acquire data.
Specifically, whether the user has permission to acquire data is determined from the user name or user level in the user information. If the user passes verification, the data acquisition request of the user is accepted and the operation continues. If the user's authority is insufficient, the data acquisition request of the user is rejected and corresponding similar data sets are recommended instead, where the similar data sets are data sets queried according to the user's authority so that other feature derivation operations can still be performed.
And the query module 204 is configured to query target data corresponding to the data set information based on the data acquisition request when the verification is passed, and construct a target data set according to the queried target data.
Specifically, when the user information passes verification, a query statement is generated based on the data set information to perform the data query. When the volume of target data to be queried is very large, the data query is carried out through a plurality of query statements that are spliced into one overall query, so that each query statement runs in its own thread, which speeds up the query and yields the target data set. The AspectJ framework connects directly to the Hadoop data warehouse and can automatically pull large-scale data sets.
In an exemplary embodiment, the query module 204 is specifically configured to:
when the verification passes, a plurality of query statements are generated based on the data set information in the data acquisition request.
Specifically, the target data can be classified according to its attributes, and one query statement is generated for the target data sharing the same attribute, so that the data query can be performed quickly. For example, target data with the monetary-amount attribute is queried through the same query statement.
And querying target data corresponding to the data set information through the plurality of query statements.
Specifically, the queried target data is saved in a data center so that the data can be traced, for example to record which database each piece of target data was queried from.
Embedding tracking points in advance in the plurality of query statements to capture the data query process of the target data, and generating a log file based on the query process.
Specifically, the log files generated during the query of the target data can be stored, recording, for example, what the query statement corresponding to each piece of target data was; if an error appears in the queried results, the cause of the error can then be located conveniently.
And storing the target data into a data file, and associating the data file with the log file to construct the target data set.
Specifically, the target data is stored in the form of a data file that also records the source of the target data. The data file is associated with the log file to construct the target data set, which is stored in a data center; the data center can also be a blockchain, so that the data is better preserved and its source can be traced.
And the computing module 206 is configured to establish a feature derivation task, and perform multi-process feature computation on the target data set based on the feature derivation task to obtain preprocessed data.
Specifically, a feature derivation task is generated from the target data set on which feature derivation is to be performed, and multi-process feature calculation is applied to the target data set to obtain the preprocessed data. The feature derivation task can be dispatched by a message distribution platform, and the target data set is fetched from the data center, so that multi-process feature calculation is performed at the feature-calculation level to derive the features broadly, for example computing the maximum, minimum, average and other statistics of the relevant features across multiple processes in preparation for data discretization.
In an exemplary embodiment, the computing module 206 is specifically configured to:
and establishing a feature derivative task according to the target data set.
And carrying out multi-process feature calculation on the target data set based on the feature derivative task so as to sort the target data set according to a calculation result and obtain preprocessing data.
Specifically, a feature derivation task is established for each target data set, and multi-process feature calculation is performed on the target data sets based on those tasks, with each target data set corresponding to one process. The feature calculation computes the maximum, minimum, average and other statistics of each target data set; each target data set is then ordered according to the calculation result and the target data within each set is sorted, yielding the preprocessed data and facilitating the subsequent feature derivation processing.
And the processing module 208 is configured to perform feature derivation processing on the preprocessed data based on a preset machine learning algorithm, so as to obtain model data.
Specifically, the feature derivation processing combines the features of the existing preprocessed data to generate new, meaningful features, that is, the features of the model data, so that the data can be mined through the model.
In an exemplary embodiment, the processing module 208 is specifically configured to:
discretizing the preprocessed data to obtain a plurality of characteristic data.
And performing feature derivation processing on the feature data through the machine learning algorithm to obtain a plurality of model data.
Specifically, the preprocessed data is discretized to obtain a plurality of feature data, the weight value and information quantity of the feature data are calculated, the feature data whose weight value and information quantity exceed a preset threshold are selected, and those feature data are input into the machine learning algorithm for feature derivation processing to obtain the model data.
Example III
Referring to fig. 3, a hardware architecture diagram of a computer device according to a third embodiment of the present invention is shown. In this embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing according to a preset or stored instruction. The computer device 2 may be a rack server, a blade server, a tower server, or a rack server (including a stand-alone server, or a server cluster made up of multiple servers), or the like. As shown in fig. 3, the computer device 2 includes, but is not limited to, at least a memory 21, a processor 22, a network interface 23, and a feature derivation system 20, which are communicatively coupled to each other via a system bus. Wherein:
in this embodiment, the memory 21 includes at least one type of computer-readable storage medium including flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or a memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the computer device 2. Of course, the memory 21 may also include both internal storage units of the computer device 2 and external storage devices. In this embodiment, the memory 21 is typically used to store an operating system and various types of application software installed on the computer device 2, such as program codes of the feature deriving system 20 of the second embodiment. Further, the memory 21 may be used to temporarily store various types of data that have been output or are to be output.
The processor 22 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 22 is typically used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is configured to execute the program code stored in the memory 21 or process data, for example, execute the feature deriving system 20, to implement the feature deriving method of the first embodiment.
The network interface 23 may comprise a wireless network interface or a wired network interface, and is typically used for establishing a communication connection between the computer device 2 and other electronic devices. For example, the network interface 23 is used to connect the computer device 2 to an external terminal through a network and to establish a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be an Intranet, the Internet, a Global System for Mobile communications (GSM) network, Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network. It is noted that fig. 3 only shows a computer device 2 having components 20-23, but it is understood that not all of the illustrated components are required to be implemented, and that more or fewer components may alternatively be implemented.
In this embodiment, the feature deriving system 20 stored in the memory 21 can also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to complete the present invention.
For example, fig. 2 shows a schematic diagram of the program modules implementing the second embodiment of the feature derivation system 20, where the feature derivation system 20 may be divided into the receiving module 200, the verifying module 202, the querying module 204, the calculating module 206 and the processing module 208. A program module in the present invention is understood to mean a series of computer program instruction segments capable of performing a specified function and better suited than the program itself to describing the execution of the feature derivation system 20 in the computer device 2. The specific functions of the program modules 200-208 have been described in detail in the second embodiment and are not repeated here.
Example IV
The present embodiment also provides a computer-readable storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application store or the like, on which a computer program is stored that performs the corresponding functions when executed by a processor. The computer-readable storage medium of this embodiment stores a computer program which, when executed by a processor, implements the feature derivation method of the first embodiment.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, and of course may also be implemented by hardware, although in many cases the former is the preferred implementation.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (7)

1. A feature derivation method, comprising:
receiving a data acquisition request of a user, wherein the data acquisition request comprises user information and data set information of the user;
verifying the user information to determine whether the user has rights to acquire data;
when verification is passed, inquiring target data corresponding to the data set information based on the data acquisition request, and constructing a target data set according to the inquired target data;
establishing a feature derivative task, and performing multi-process feature calculation on the target data set based on the feature derivative task to obtain preprocessing data;
performing feature derivation processing on the preprocessed data based on a preset machine learning algorithm to obtain model data;
wherein the performing feature derivation processing on the preprocessed data based on the preset machine learning algorithm to obtain model data comprises the following steps:
discretizing the preprocessed data to obtain a plurality of characteristic data;
performing cross operation on the characteristic data through the machine learning algorithm to obtain a plurality of model data;
the step of performing cross operation on the characteristic data to obtain model data comprises the following steps:
calculating the weight value and the information quantity of each characteristic data through the machine learning algorithm to obtain a first-order derivative result;
calculating the weight value and the information quantity of the feature data combined pairwise through the machine learning algorithm to obtain a second-order derivative result;
screening the preprocessing data based on the first-order derivative result and the second-order derivative result to obtain model data;
the filtering the preprocessing data based on the first-order derivative result and the second-order derivative result to obtain model data further comprises:
and respectively drawing effect graphs of the first-order derivative result and the second-order derivative result.
2. The feature deriving method according to claim 1, wherein when the verification is passed, querying target data corresponding to the dataset information based on the data acquisition request, and constructing a target dataset from the queried target data comprises:
generating a plurality of query statements based on the dataset information in the data acquisition request when verification passes;
querying target data corresponding to the data set information through the plurality of query statements;
embedding tracking points in advance in the plurality of query statements to capture the data query process of the target data, and generating a log file based on the query process;
and storing the target data into a data file, and associating the data file with the log file to construct the target data set.
3. The feature derivation method of claim 1, wherein the establishing a feature derivation task, performing a multi-process feature calculation on the target data set based on the feature derivation task, comprises:
establishing a feature derivative task according to the target data set;
and carrying out multi-process feature calculation on the target data set based on the feature derivative task so as to sort the target data set according to a calculation result and obtain preprocessing data.
4. The feature derivation method of claim 1, further comprising:
the model data is stored into a blockchain.
5. A feature derivation system, comprising:
the receiving module is used for receiving a data acquisition request of a user, wherein the data acquisition request comprises user information and data set information of the user;
the verification module is used for verifying the user information to determine whether the user has permission to acquire data;
the query module is used for querying target data corresponding to the data set information based on the data acquisition request when verification passes, and constructing a target data set according to the queried target data;
the computing module is used for establishing a feature derivative task, and carrying out multi-process feature computation on the target data set based on the feature derivative task to obtain preprocessing data;
the processing module is used for carrying out feature derivation processing on the preprocessing data based on a preset machine learning algorithm to obtain model data;
the processing module is also used for carrying out discretization processing on the preprocessed data to obtain a plurality of characteristic data; performing cross operation on the characteristic data through the machine learning algorithm to obtain a plurality of model data;
the processing module is also used for calculating the weight value and the information quantity of each characteristic data through the machine learning algorithm to obtain a first-order derivative result; calculating the weight value and the information quantity of the feature data combined pairwise through the machine learning algorithm to obtain a second-order derivative result; screening the preprocessing data based on the first-order derivative result and the second-order derivative result to obtain model data; and respectively drawing effect graphs of the first-order derivative result and the second-order derivative result.
6. A computer device comprising a memory, a processor, the memory having stored thereon a computer program executable on the processor, the computer program being executable by the processor to perform the steps of the feature derivation method of any one of claims 1-4.
7. A computer-readable storage medium, in which a computer program is stored, the computer program being executable by at least one processor to cause the at least one processor to perform the steps of the feature derivation method of any one of claims 1-4.
CN202011542660.4A 2020-12-23 2020-12-23 Feature derivation method, feature derivation system, computer device, and computer-readable storage medium Active CN112667569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011542660.4A CN112667569B (en) 2020-12-23 2020-12-23 Feature derivation method, feature derivation system, computer device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011542660.4A CN112667569B (en) 2020-12-23 2020-12-23 Feature derivation method, feature derivation system, computer device, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN112667569A CN112667569A (en) 2021-04-16
CN112667569B (en) 2024-04-16

Family

ID=75409401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011542660.4A Active CN112667569B (en) 2020-12-23 2020-12-23 Feature method, feature system, computer device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN112667569B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114064976A (en) * 2021-10-20 2022-02-18 同盾科技有限公司 Data feature calculation method, system, electronic device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109935338A (en) * 2019-03-07 2019-06-25 平安科技(深圳)有限公司 Data prediction processing method, device and computer equipment based on machine learning
CN110232292A (en) * 2019-05-06 2019-09-13 平安科技(深圳)有限公司 Data access authority authentication method, server and storage medium
CN110442654A (en) * 2019-07-08 2019-11-12 深圳壹账通智能科技有限公司 Promise breaking information query method, device, computer equipment and storage medium
CN111414407A (en) * 2020-02-13 2020-07-14 中国平安人寿保险股份有限公司 Data query method and device of database, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
User Behavior Prediction Based on Feature Engineering with Quadratic Combinations and the XGBoost Model; Yang Lihong; Bai Zhaoqiang; Science Technology and Engineering (科学技术与工程); 2018-05-18 (No. 14); full text *

Also Published As

Publication number Publication date
CN112667569A (en) 2021-04-16

Similar Documents

Publication Publication Date Title
CN110609759B (en) Fault root cause analysis method and device
CN111563016B (en) Log collection and analysis method and device, computer system and readable storage medium
CN111866016A (en) Log analysis method and system
CN113726784A (en) Network data security monitoring method, device, equipment and storage medium
CN111274202B (en) Electronic contract generation method, device, computer equipment and storage medium
CN114493255A (en) Enterprise abnormity monitoring method based on knowledge graph and related equipment thereof
CN112667569B (en) Feature method, feature system, computer device, and computer-readable storage medium
CN112446637A (en) Building construction quality safety online risk detection method and system
CN114218174B (en) Industrial internet data storage method, system and storage medium based on block chain
CN113420887A (en) Prediction model construction method and device, computer equipment and readable storage medium
CN112529543A (en) Method, device and equipment for verifying mutual exclusion relationship of workflow and storage medium
CN117435480A (en) Binary file detection method and device, electronic equipment and storage medium
CN111738356A (en) Object feature generation method, device, equipment and storage medium for specific data
CN112181836A (en) Test case generation method, system, device and storage medium
CN113704624B (en) Policy recommendation method, device, equipment and medium based on user distribution
CN111209158B (en) Mining monitoring method and cluster monitoring system for server cluster
CN112732925A (en) Method for determining investment data based on atlas, storage medium and related equipment
CN112287663A (en) Text parsing method, equipment, terminal and storage medium
CN112287005A (en) Data processing method, device, server and medium
CN112750047A (en) Behavior relation information extraction method and device, storage medium and electronic equipment
CN113688049B (en) Retrospective detection method, retrospective detection device, retrospective detection equipment and retrospective detection medium based on input information
CN113724065B (en) Auxiliary collecting method, device, equipment and storage medium based on flow guidance
CN114741673B (en) Behavior risk detection method, clustering model construction method and device
CN113886343A (en) Transaction data abnormity monitoring method, system, equipment and medium
CN114663211A (en) Cloud computing based financial report analysis method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant