CN114138864A

CN114138864A - Feature data extraction method and device based on artificial intelligence and related equipment

Info

Publication number: CN114138864A
Application number: CN202111435425.1A
Authority: CN
Inventors: 季德志
Original assignee: Ping An Bank Co Ltd
Current assignee: Ping An Bank Co Ltd
Priority date: 2021-11-29
Filing date: 2021-11-29
Publication date: 2022-03-04

Abstract

The invention relates to the technical field of big data, and provides a feature data extraction method, a device and related equipment based on artificial intelligence, wherein the method comprises the following steps: performing logic processing on the flow type data set according to the type of the data warehouse and a preset processing engine to generate a target flow type data set; performing first preprocessing and second preprocessing on a target stream data set to obtain a second characteristic data set; performing first-order derivation on the second characteristic data set to obtain a third characteristic data set; and performing feature scoring on each feature data in the third feature data set by adopting a plurality of preset evaluation systems to obtain a plurality of feature scoring values of each feature data, and then performing feature data extraction to obtain a feature data extraction result. According to the invention, each feature data is subjected to feature scoring by adopting a plurality of preset evaluation systems, and then feature data extraction is carried out, so that effective features are screened out, and the accuracy and effectiveness of feature data extraction are improved.

Description

Feature data extraction method and device based on artificial intelligence and related equipment

Technical Field

The invention relates to the technical field of big data, in particular to a method and a device for extracting feature data based on artificial intelligence and related equipment.

Background

With the development of artificial intelligence, machine learning methods have come into wide use. The quality of the feature engineering performed on a data set directly affects the final performance of a machine learning model. Conventional feature engineering often relies on a large amount of manual intervention to obtain feature data sets.

However, manual intervention confines the extraction of the feature data set to the thinking of individual developers, so the extraction cannot be truly 'wide and deep', and the completeness and accuracy of the extracted feature data set are low.

Therefore, it is necessary to provide a method for extracting a feature data set quickly and accurately.

Disclosure of Invention

In view of the above, it is necessary to provide a feature data extraction method, device and related apparatus based on artificial intelligence, in which feature scoring is performed on each feature data by using a plurality of preset evaluation systems, and then feature data extraction is performed, so as to screen out effective features, thereby improving accuracy and effectiveness of feature data extraction.

The first aspect of the present invention provides a feature data extraction method based on artificial intelligence, the method comprising:

analyzing the received characteristic data extraction request to obtain a stream type data set and a data warehouse type corresponding to the stream type data set;

performing logic processing on the stream type data set according to the data warehouse type and a preset processing engine to generate a target stream type data set;

performing first preprocessing on the target stream data set to obtain a first characteristic data set;

performing second preprocessing on the first characteristic data set to obtain a second characteristic data set;

performing first-order derivation on the second characteristic data set by adopting a preset first-order derivation algorithm to obtain a third characteristic data set;

performing feature scoring on each feature data in the third feature data set by adopting a plurality of preset evaluation systems to obtain a plurality of feature scoring values of each feature data;

and extracting the feature data of the third feature data set based on the plurality of feature scoring values of each feature data to obtain a feature data extraction result of the feature data extraction request.

Optionally, the performing logic processing on the stream-type data set according to the data warehouse type and a preset processing engine to generate a target stream-type data set includes:

acquiring a template identification code from the configuration requirement in the characteristic data extraction request;

acquiring a corresponding configuration template based on the template identification code, and selecting an aggregation main key, a date field, a numerical value field and a character field according to the configuration template;

configuring the stream type data set based on the aggregation main key, the date field, the numerical value field and the character field to obtain a configured stream type data set;

and automatically generating a processing logic according to the data warehouse type and a preset processing engine, and logically processing the configured stream type data set by adopting the processing logic to generate a target stream type data set.

Optionally, the performing a first preprocessing on the target pipeline data set to obtain a first feature data set includes:

dividing the target pipeline type data set for the first time according to a plurality of preset first field types to obtain a sub data set of each preset first field type;

performing operator processing on each sub data in the sub data set of each preset first field type by adopting the operator corresponding to that first field type to obtain a feature data set of each preset first field type;

and merging the feature data sets of the plurality of preset first field types, and determining the merged feature data sets as the first feature data set.

Optionally, the second preprocessing on the first feature data set to obtain a second feature data set includes:

and performing second division on the first characteristic data set according to a plurality of preset second field types to obtain a second characteristic sub data set of each preset second field type, and determining the plurality of second characteristic sub data sets corresponding to the plurality of preset second field types as the second characteristic data set.

Optionally, the performing first-order derivation on the second feature data set by using a preset first-order derivation algorithm to obtain a third feature data set includes:

extracting the second feature sub data set of each preset second field type from the second feature data set;

performing first-order derivation on each feature sub data in the second feature sub data set of each preset second field type by adopting the preset first-order derivation algorithm corresponding to that second field type to obtain a third feature sub data set of each preset second field type;

and merging the third feature sub data sets of the plurality of preset second field types, and determining the merged third feature sub data sets as the third feature data set.

Optionally, the extracting the feature data of the third feature data set based on the plurality of feature score values of each feature data, and obtaining the feature data extraction result of the feature data extraction request includes:

performing weighted calculation on a plurality of feature score values of each feature data in the third feature data set by adopting a preset weighting algorithm to obtain a weighted value of each feature data;

and selecting a plurality of feature data with larger weighted values from the calculated weighted values, and determining the feature data as the feature data extraction result of the feature data extraction request.

Optionally, the preset first field type includes a value field type, a category field type and a date field type, and the preset second field type includes a value type and a character type.

A second aspect of the present invention provides an artificial intelligence-based feature data extraction apparatus, the apparatus comprising:

the analysis and acquisition module is used for analyzing the received characteristic data extraction request and acquiring a stream type data set and a data warehouse type corresponding to the stream type data set;

the logic processing module is used for performing logic processing on the stream type data set according to the data warehouse type and a preset processing engine to generate a target stream type data set;

the first preprocessing module is used for performing first preprocessing on the target stream data set to obtain a first characteristic data set;

the second preprocessing module is used for performing second preprocessing on the first characteristic data set to obtain a second characteristic data set;

the first-order derivation module is used for performing first-order derivation on the second characteristic data set by adopting a preset first-order derivation algorithm to obtain a third characteristic data set;

the scoring module is used for performing feature scoring on each feature data in the third feature data set by adopting a plurality of preset evaluation systems to obtain a plurality of feature scoring values of each feature data;

and the extraction module is used for extracting the feature data of the third feature data set based on the plurality of feature scoring values of each feature data to obtain a feature data extraction result of the feature data extraction request.

A third aspect of the invention provides an electronic device comprising a processor and a memory, the processor being configured to implement the artificial intelligence based feature data extraction method when executing a computer program stored in the memory.

A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the artificial intelligence based feature data extraction method.

In summary, the feature data extraction method, device and related equipment based on artificial intelligence logically process the pipeline data set according to the data warehouse type and a preset processing engine to generate the target pipeline data set, which avoids manual intervention and improves the efficiency and accuracy of generating the target pipeline data set. The target pipeline data set undergoes first preprocessing to obtain a first feature data set, and the second feature data set undergoes first-order derivation with a preset first-order derivation algorithm to obtain a third feature data set; the data derived from the target pipeline data set is thereby expanded along different dimensions, which enriches the data dimensions of the third feature data set and improves its completeness. Each feature data in the third feature data set is scored with a plurality of preset evaluation systems to obtain a plurality of feature score values, and feature data extraction is then performed. Because the feature score values are obtained after model training on the third feature data set, their accuracy is improved; effective features are screened out based on these values, which further improves the accuracy and effectiveness of the feature data extraction result.

Drawings

Fig. 1 is a flowchart of a feature data extraction method based on artificial intelligence according to an embodiment of the present invention.

Fig. 2 is a structural diagram of an artificial intelligence-based feature data extraction device according to a second embodiment of the present invention.

Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.

Detailed Description

In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

Example one

In this embodiment, the method for extracting feature data based on artificial intelligence may be applied to an electronic device, and for an electronic device that needs to perform feature data extraction based on artificial intelligence, the function of extracting feature data based on artificial intelligence provided by the method of the present invention may be directly integrated on the electronic device, or may be run in the electronic device in the form of a Software Development Kit (SDK).

The embodiment of the invention can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.

The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning, deep learning and the like.

As shown in fig. 1, the method for extracting feature data based on artificial intelligence specifically includes the following steps, and the order of the steps in the flowchart may be changed, and some steps may be omitted according to different requirements.

S11, analyzing the received characteristic data extraction request, and acquiring a stream type data set and a data warehouse type corresponding to the stream type data set.

In this embodiment, when a user performs feature data extraction, the user initiates a feature data extraction request to a server through a client. Specifically, the client may be a smartphone, an iPad, or another existing intelligent device, and the server may be a feature data extraction subsystem. During feature data extraction, for example, the client sends the feature data extraction request to the feature data extraction subsystem, which receives it.

In this embodiment, when the feature data extraction subsystem receives the feature data extraction request, it parses the request to obtain a pipeline data set and the data warehouse type corresponding to the pipeline data set. Specifically, the data warehouse type may be a Hive data warehouse.

In this embodiment, the pipelined data set refers to a data set of transaction pipeline (transaction flow) records, for example the transaction pipelined data of a credit card, which includes transaction records, transaction times, payment information, and other transaction pipelined data.

In an optional embodiment, the parsing the received feature data extraction request to obtain the pipelined data set includes:

analyzing the received characteristic data extraction request to obtain a plurality of data calling requests;

acquiring a corresponding calling interface according to each data calling request;

forming a calling interface list for the query link according to the data calling requests and the corresponding calling interfaces;

and sequentially calling each calling interface, starting from the head of the calling interface list for the query link, to acquire the pipelined data corresponding to each calling interface, and determining the plurality of pipelined data corresponding to the plurality of data calling requests as the pipelined data set.
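The interface-list step above can be illustrated with a minimal Python sketch. It is not taken from the patent: the two fetch functions, the `INTERFACES` registry and the request layout are hypothetical stand-ins for real database call interfaces, chosen only to show the calls being made in order from the head of a queue.

```python
from collections import deque
from typing import Callable, Dict, List


# Hypothetical stand-ins for database call interfaces; each returns pipeline records.
def fetch_from_mysql(call: dict) -> List[dict]:
    return [{"source": "mysql", "card_no": call["card_no"], "amount": 120.0}]


def fetch_from_hive(call: dict) -> List[dict]:
    return [{"source": "hive", "card_no": call["card_no"], "amount": 35.5}]


# Hypothetical registry mapping a data source name to its call interface.
INTERFACES: Dict[str, Callable[[dict], List[dict]]] = {
    "mysql": fetch_from_mysql,
    "hive": fetch_from_hive,
}


def collect_pipeline_data(extraction_request: dict) -> List[dict]:
    """Resolve each data call request to its interface, then call them in queue order."""
    call_queue = deque(
        (INTERFACES[call["source"]], call)
        for call in extraction_request["data_calls"]
    )
    pipeline_data_set: List[dict] = []
    while call_queue:
        interface, call = call_queue.popleft()  # always take the head of the queue
        pipeline_data_set.extend(interface(call))
    return pipeline_data_set


request = {
    "data_calls": [
        {"source": "mysql", "card_no": "6222-0001"},
        {"source": "hive", "card_no": "6222-0001"},
    ]
}
records = collect_pipeline_data(request)
```

The combined `records` list plays the role of the pipelined data set; a real deployment would replace the stub functions with actual database clients.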

In this embodiment, the pipelined data set is obtained from a relational database and a Hive data warehouse, where the relational database may be a MySQL database, an Oracle database, or another relational database from which the pipelined data set can be obtained.

In this embodiment, the feature extraction request may include a data call interface. Specifically, the data call interface refers to a call interface of a database, and different databases have different call interfaces. Obtaining the pipeline data from the call interface of the corresponding database makes the acquisition targeted and improves the accuracy and efficiency of obtaining the pipeline data set.

S12, performing logic processing on the stream type data set according to the data warehouse type and a preset processing engine to generate a target stream type data set.

In this embodiment, the feature extraction request further includes a configuration requirement, where the configuration requirement covers requirements on the pipeline data, such as an aggregation dimension requirement and a pipeline time requirement.

In an optional embodiment, the performing logic processing on the stream-type data set according to the data warehouse type and a preset processing engine to generate a target stream-type data set includes:

In this embodiment, the template identification code uniquely identifies a configuration template, where the configuration template is a template preset by a user for configuring data. According to the configuration template, the corresponding aggregation main key, date field, numerical value field and character field can be selected. The aggregation main key is the target field of the aggregation statistics for each configuration template; for example, if aggregation statistics on the pipeline data set of customer A are performed by customer A's card number, the aggregation main key is customer A's card number.

In this embodiment, the preset processing engine may be a Spark calculation engine, a Hive calculation engine, a Presto calculation engine, or another offline big data calculation engine.

In this embodiment, the access range of the pipeline data supports partition access and custom filter conditions, for example, when data is selected according to a date field, partition access or access through the custom filter conditions may be adopted.

For example, when aggregation statistics need to be performed on the pipeline data of a time period, the time period may be the last week, the last month, the last 3 months, and so on, according to the configuration requirement. If the timestamps of the pipeline data are accurate to the hour and minute, the selection can be divided more finely; for example, the fields of each type between 9:00 and 10:00 on each day of the last month may be selected for logic processing.
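As an illustration of such a custom filter condition, the following Python sketch keeps only the records from a recent window that also fall inside a daily hour range. The record layout (`ts`, `amount`) and the function name are assumptions made for this example, and the reference time is passed in explicitly so the result is reproducible.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def filter_window(records: Iterable[dict], now: datetime, days: int,
                  start_hour: int, end_hour: int) -> List[dict]:
    """Keep records from the last `days` days whose hour lies in [start_hour, end_hour)."""
    earliest = now - timedelta(days=days)
    return [
        r for r in records
        if earliest <= r["ts"] <= now and start_hour <= r["ts"].hour < end_hour
    ]


now = datetime(2021, 11, 29, 12, 0)
records = [
    {"ts": datetime(2021, 11, 20, 9, 30), "amount": 50.0},  # in window, in hour range
    {"ts": datetime(2021, 11, 20, 14, 0), "amount": 20.0},  # in window, outside hour range
    {"ts": datetime(2021, 9, 1, 9, 15), "amount": 75.0},    # outside the 30-day window
]
selected = filter_window(records, now, days=30, start_hour=9, end_hour=10)
```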

In this embodiment, the aggregation main key, the date field, the numerical value field and the character field are selected through the configuration template to configure the pipeline data. After the configuration is completed, the preset processing engine is invoked according to the data warehouse type and the corresponding processing logic is generated automatically. For example, for a Hive data warehouse, the Hive calculation engine is called, Hive SQL processing logic is generated automatically, and a processing task is submitted to the cluster of the Hive data warehouse to process the data.
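The patent does not disclose the generated processing logic itself. The sketch below is therefore only one possible reading: it builds a grouped Hive SQL statement from a configuration template. The `ConfigTemplate` class, the table and column names, and the choice of `SUM`, `MAX` and `COUNT(DISTINCT ...)` are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConfigTemplate:
    """Hypothetical configuration template; field names are illustrative only."""
    table: str
    aggregation_key: str
    date_field: str
    numeric_fields: List[str]
    character_fields: List[str]


def build_processing_sql(template: ConfigTemplate, warehouse_type: str,
                         start_date: str, end_date: str) -> str:
    """Generate an aggregation statement for the given warehouse type (Hive only here)."""
    if warehouse_type != "hive":
        raise ValueError(f"unsupported data warehouse type: {warehouse_type}")
    select_parts = [template.aggregation_key]
    for col in template.numeric_fields:
        select_parts.append(f"SUM({col}) AS {col}_sum")
        select_parts.append(f"MAX({col}) AS {col}_max")
    for col in template.character_fields:
        select_parts.append(f"COUNT(DISTINCT {col}) AS {col}_n_categories")
    # Identifiers are interpolated directly; a real system should validate them first.
    return (
        "SELECT " + ", ".join(select_parts)
        + f" FROM {template.table}"
        + f" WHERE {template.date_field} BETWEEN '{start_date}' AND '{end_date}'"
        + f" GROUP BY {template.aggregation_key}"
    )


template = ConfigTemplate(
    table="card_txn",
    aggregation_key="card_no",
    date_field="txn_date",
    numeric_fields=["amount"],
    character_fields=["channel"],
)
sql = build_processing_sql(template, "hive", "2021-10-29", "2021-11-29")
```

Submitting the generated statement to the Hive cluster is left out, since it depends on the client library in use.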

In the embodiment, the stream type data set is automatically and efficiently processed logically according to the type of the data warehouse and a preset processing engine, so that manual intervention is avoided, and the efficiency and the accuracy of generating the target stream type data set are improved.

S13, performing first preprocessing on the target pipeline type data set to obtain a first characteristic data set.

In this embodiment, the first preprocessing includes dividing the target pipeline data and performing operator processing.

In an optional embodiment, the performing the first preprocessing on the target pipeline type data set to obtain a first feature data set includes:

In this embodiment, a plurality of first field types may be preset, and specifically, the preset first field types may include a numeric field type, a category field type, and a date field type.

In this embodiment, each first field type corresponds to different operators. Specifically, the operators for the numerical field type include: maximum, minimum, median, mean, standard deviation, coefficient of variation, sum, kurtosis, and the like; the operators for the category field type include: the number of categories, the most frequent category, top2, top3, the least frequent category, the total count, the null count, and the like; the operators for the date field type include: the day of the month, the day of the week, the month, the most frequent day of the week, the most frequent month, and the like.

In this embodiment, the target pipeline data set is subjected to first preprocessing. During the first preprocessing, each sub data in the sub data set of each preset first field type is processed with the operators corresponding to that field type, so the data derived from the target pipeline data set is expanded along different dimensions. The first characteristic data set is thereby obtained, and its completeness is improved.
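A minimal sketch of this first preprocessing, using only the Python standard library, is given below. It covers a subset of the operators listed above (kurtosis, top2 and top3 are omitted for brevity), and the column names, the field-type labels and the population form of the standard deviation are assumptions rather than details from the patent.

```python
import statistics
from collections import Counter
from datetime import date
from typing import Dict, List, Optional


def numeric_operators(values: List[float]) -> Dict[str, float]:
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (an assumption)
    return {
        "max": max(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": mean,
        "std": std,
        "cv": std / mean if mean else float("nan"),  # coefficient of variation
        "sum": sum(values),
    }


def category_operators(values: List[Optional[str]]) -> Dict[str, object]:
    non_null = [v for v in values if v is not None]
    ranked = Counter(non_null).most_common()  # most frequent category first
    return {
        "n_categories": len(ranked),
        "top1": ranked[0][0] if ranked else None,
        "lowest": ranked[-1][0] if ranked else None,
        "total": len(values),
        "n_null": len(values) - len(non_null),
    }


def date_operators(values: List[date]) -> Dict[str, int]:
    weekdays = Counter(d.isoweekday() for d in values)  # 1 = Monday ... 7 = Sunday
    months = Counter(d.month for d in values)
    return {
        "most_frequent_weekday": weekdays.most_common(1)[0][0],
        "most_frequent_month": months.most_common(1)[0][0],
    }


OPERATORS = {
    "numeric": numeric_operators,
    "category": category_operators,
    "date": date_operators,
}


def first_preprocessing(columns: Dict[str, list],
                        field_types: Dict[str, str]) -> Dict[str, object]:
    """Apply the operators of each column's field type and merge the results."""
    features: Dict[str, object] = {}
    for name, values in columns.items():
        for op_name, value in OPERATORS[field_types[name]](values).items():
            features[f"{name}_{op_name}"] = value
    return features


columns = {
    "amount": [10.0, 20.0, 30.0],
    "channel": ["pos", "online", "pos", None],
    "txn_date": [date(2021, 11, 1), date(2021, 11, 8), date(2021, 10, 5)],
}
field_types = {"amount": "numeric", "channel": "category", "txn_date": "date"}
features = first_preprocessing(columns, field_types)
```

The merged `features` dictionary corresponds to one row of the first characteristic data set for a single aggregation main key.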

S14, performing second preprocessing on the first characteristic data set to obtain a second characteristic data set.

In this embodiment, the second preprocessing includes performing second division on the data of the first feature data set, specifically, the performing second preprocessing on the first feature data set to obtain a second feature data set includes:

In this embodiment, the second field type may be preset, and specifically, the preset second field type may include a numeric type and a character type.

In this embodiment, since the feature data corresponding to the numeric field type, the category field type, and the date field type in the first feature data set includes both numeric-type data and character-type data, the first feature data set is divided a second time according to the plurality of preset second field types, which facilitates the subsequent first-order derivation on the second feature data set.
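The second division can be sketched as a simple split by value type. The function name and the example feature names are assumptions; the rule used here (any non-boolean number is numeric, everything else is character) is only one plausible way to assign the two second field types.

```python
from numbers import Number
from typing import Dict


def second_preprocessing(first_feature_set: Dict[str, object]) -> Dict[str, Dict[str, object]]:
    """Split the first feature data set into numeric-type and character-type subsets."""
    numeric: Dict[str, object] = {}
    character: Dict[str, object] = {}
    for name, value in first_feature_set.items():
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, Number) and not isinstance(value, bool):
            numeric[name] = value
        else:
            character[name] = value
    return {"numeric": numeric, "character": character}


first_feature_set = {
    "amount_mean": 20.0,
    "amount_max": 30.0,
    "channel_top1": "pos",
    "txn_date_most_frequent_month": 11,
}
second_feature_set = second_preprocessing(first_feature_set)
```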

S15, performing first-order derivation on the second characteristic data set by adopting a preset first-order derivation algorithm to obtain a third characteristic data set.

In this embodiment, the first-order derivation refers to performing further dimension expansion on the second feature data set.

In an optional embodiment, the performing first-order derivation on the second feature data set by using a preset first-order derivation algorithm to obtain a third feature data set includes:

In this embodiment, each second field type has its own first-order derivation algorithms. Specifically, the first-order derivation algorithms for the numerical type include: logarithmic transformation, exponential transformation, square transformation, cubic transformation, upper and lower truncation (clipping), and the like; the first-order derivation algorithms for the character type include: WOE encoding, one-hot encoding, and the like, where WOE (weight of evidence) is commonly used for feature transformation.
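The transformations named above can be sketched as follows. The patent does not fix any formula, so several choices here are assumptions: `log1p` is used for the logarithmic transformation, the clipping bounds are passed in by the caller, and WOE is computed as ln(share of non-events / share of events) with additive smoothing, which is one common convention among several.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def safe_exp(x: float) -> float:
    """Exponential that returns infinity instead of raising on overflow."""
    try:
        return math.exp(x)
    except OverflowError:
        return float("inf")


def derive_numeric(name: str, x: float, lower: float, upper: float) -> Dict[str, float]:
    """First-order derived features for one numeric value."""
    return {
        f"{name}_log": math.log1p(x) if x > -1 else float("nan"),
        f"{name}_exp": safe_exp(x),
        f"{name}_square": x ** 2,
        f"{name}_cube": x ** 3,
        f"{name}_clipped": min(max(x, lower), upper),
    }


def one_hot(name: str, value: str, categories: Sequence[str]) -> Dict[str, int]:
    """One indicator feature per known category."""
    return {f"{name}_is_{c}": int(value == c) for c in categories}


def woe_table(values: Sequence[str], labels: Sequence[int],
              smoothing: float = 0.5) -> Dict[str, float]:
    """WOE per category, with label 1 treated as the event (an assumed convention)."""
    events = Counter(v for v, y in zip(values, labels) if y == 1)
    non_events = Counter(v for v, y in zip(values, labels) if y == 0)
    categories = set(values)
    k = len(categories)
    n_events = sum(events.values())
    n_non_events = sum(non_events.values())
    table = {}
    for c in categories:
        p_event = (events[c] + smoothing) / (n_events + smoothing * k)
        p_non_event = (non_events[c] + smoothing) / (n_non_events + smoothing * k)
        table[c] = math.log(p_non_event / p_event)
    return table


numeric_derived = derive_numeric("amount", 2.0, lower=0.0, upper=1.5)
channel_one_hot = one_hot("channel", "pos", ["pos", "online"])
channel_woe = woe_table(["pos", "pos", "online", "online"], [0, 0, 1, 1])
```

With this convention, a category that occurs mostly among non-events receives a positive WOE and one that occurs mostly among events receives a negative WOE.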

In this embodiment, different first-order derivations are performed on the feature data in the second feature data set for the different second field types, so the data derived from the target pipeline data set is expanded a second time along different dimensions. This enriches the data dimensions of the third feature data set and improves its completeness.

S16, performing feature scoring on each feature data in the third feature data set by adopting a plurality of preset evaluation systems to obtain a plurality of feature scoring values of each feature data.

In this embodiment, the evaluation systems may be preset and may include a random forest algorithm, an information value algorithm and a correlation coefficient algorithm. The third feature data set and the target label value in the feature data extraction request are input into the random forest algorithm, the information value algorithm and the correlation coefficient algorithm respectively for model training. After the training is completed, the random forest coefficient of the random forest algorithm, the IV (information value) coefficient of the information value algorithm and the correlation coefficient of the correlation coefficient algorithm are obtained. The training processes of these three algorithms are prior art and are not described in detail here.

In this embodiment, the feature score values refer to a random forest coefficient, an iv (information value) value coefficient, and a correlation coefficient.
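Two of the three score types can be sketched with the standard library alone; the random forest score is left out because it would need a third-party library. The sketch assumes a binary target label, uses the Pearson correlation as the correlation coefficient, and computes the information value on a median split of each numeric feature with additive smoothing. None of these choices is specified by the patent.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def median_bins(xs: Sequence[float]) -> List[str]:
    """Crude two-bin discretisation around the (upper) median."""
    median = sorted(xs)[len(xs) // 2]
    return ["high" if x >= median else "low" for x in xs]


def information_value(bins: Sequence[str], labels: Sequence[int],
                      smoothing: float = 0.5) -> float:
    """IV = sum over bins of (p_non_event - p_event) * ln(p_non_event / p_event)."""
    events = Counter(b for b, y in zip(bins, labels) if y == 1)
    non_events = Counter(b for b, y in zip(bins, labels) if y == 0)
    categories = set(bins)
    k = len(categories)
    n_events = sum(events.values())
    n_non_events = sum(non_events.values())
    iv = 0.0
    for c in categories:
        p_event = (events[c] + smoothing) / (n_events + smoothing * k)
        p_non_event = (non_events[c] + smoothing) / (n_non_events + smoothing * k)
        iv += (p_non_event - p_event) * math.log(p_non_event / p_event)
    return iv


def score_features(features: Dict[str, List[float]],
                   labels: Sequence[int]) -> Dict[str, Dict[str, float]]:
    """Return several score values per feature, keyed by score name."""
    return {
        name: {
            "corr": abs(pearson(values, labels)),
            "iv": information_value(median_bins(values), labels),
        }
        for name, values in features.items()
    }


features = {
    "amount_sum": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # tracks the label closely
    "noise": [3.0, 1.0, 3.0, 1.0, 3.0, 1.0],       # weakly related to the label
}
labels = [0, 0, 0, 1, 1, 1]
scores = score_features(features, labels)
```

In this toy data the informative feature receives the higher value under both scores, which is the property the later selection step relies on.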

S17, extracting the feature data of the third feature data set based on the plurality of feature score values of each feature data to obtain the feature data extraction result of the feature data extraction request.

In this embodiment, the feature data extraction result includes effective feature data screened based on the plurality of feature score values.

In an optional embodiment, the extracting the feature data of the third feature data set based on the plurality of feature score values of each feature data, and obtaining the feature data extraction result of the feature data extraction request includes:

In this embodiment, a weighting algorithm may be preset, and specifically, the preset weighting algorithm may be a weighted random algorithm, a weighted average method, and the like, where the weighted random algorithm and the weighted average method are both prior art, and details are not described in this embodiment.
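A minimal sketch of the weighted-average variant follows. The weights, score names and feature names are hypothetical, and the scores are assumed to be already on comparable scales; in practice some normalisation across evaluation systems would probably be needed first.

```python
from typing import Dict, List, Tuple


def select_features(scores: Dict[str, Dict[str, float]],
                    weights: Dict[str, float],
                    top_k: int) -> Tuple[List[str], Dict[str, float]]:
    """Weighted average of each feature's scores, then keep the top_k features."""
    weighted = {
        name: sum(weights[s] * v for s, v in per_feature.items())
        / sum(weights[s] for s in per_feature)
        for name, per_feature in scores.items()
    }
    ranked = sorted(weighted, key=weighted.get, reverse=True)
    return ranked[:top_k], weighted


# Hypothetical score values from three evaluation systems.
scores = {
    "amount_sum": {"rf": 0.6, "iv": 0.4, "corr": 0.8},
    "channel_top1": {"rf": 0.3, "iv": 0.2, "corr": 0.1},
    "noise": {"rf": 0.1, "iv": 0.05, "corr": 0.02},
}
weights = {"rf": 0.5, "iv": 0.3, "corr": 0.2}
selected, weighted = select_features(scores, weights, top_k=2)
```

The `selected` list corresponds to the feature data extraction result: the features with the larger weighted values.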

In this embodiment, different first-order derivations are performed on the feature data in the second feature data set according to the different second field types, which enriches the third feature data set. When the feature score values are subsequently calculated, model training is performed with the third feature data set and the plurality of feature score values are obtained after training is completed, which improves the accuracy of the feature score values. Effective features are then screened out based on the feature score values, which improves the accuracy and effectiveness of the feature data extraction result.

In summary, the feature data extraction method based on artificial intelligence according to this embodiment logically processes the pipeline data set according to the data warehouse type and a preset processing engine to generate the target pipeline data set, which avoids manual intervention and improves the efficiency and accuracy of generating the target pipeline data set. The target pipeline data set undergoes first preprocessing to obtain a first feature data set, and the second feature data set undergoes first-order derivation with a preset first-order derivation algorithm to obtain a third feature data set; the data derived from the target pipeline data set is thereby expanded along different dimensions, which enriches the data dimensions of the third feature data set and improves its completeness. Each feature data in the third feature data set is scored with a plurality of preset evaluation systems to obtain a plurality of feature score values, and feature data extraction is then performed. Because the feature score values are obtained after model training on the third feature data set, their accuracy is improved; effective features are screened out based on these values, which further improves the accuracy and effectiveness of the feature data extraction result.

Example two

In some embodiments, the artificial intelligence based feature data extraction apparatus 20 may include a plurality of functional modules comprised of program code segments. Program code of the various program segments in the artificial intelligence based feature data extraction apparatus 20 may be stored in a memory of the electronic device and executed by the at least one processor to perform (see detailed description of fig. 1) the functions of artificial intelligence based feature data extraction.

In this embodiment, the artificial intelligence based feature data extraction device 20 may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a parsing and obtaining module 201, a logic processing module 202, a first preprocessing module 203, a second preprocessing module 204, a first-order derivation module 205, a scoring module 206 and an extraction module 207. A module, as referred to herein, is a series of computer-readable instruction segments that is stored in a memory, can be executed by at least one processor, and performs a fixed function. The functions of the modules are described in detail below.
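One way to picture this module division is a device object that runs its modules in a fixed order, each consuming the previous module's output. The class name, the placeholder modules and the returned trace are assumptions for illustration; the patent does not prescribe any particular code structure.

```python
from typing import Any, Callable, List, Tuple

Module = Callable[[Any], Any]


class FeatureExtractionDevice:
    """Runs a fixed sequence of named functional modules."""

    def __init__(self, modules: List[Tuple[str, Module]]):
        self.modules = modules

    def run(self, request: Any) -> Tuple[Any, List[str]]:
        data = request
        trace: List[str] = []
        for name, module in self.modules:
            data = module(data)   # output of one module feeds the next
            trace.append(name)    # record the execution order
        return data, trace


def passthrough(data: Any) -> Any:
    """Placeholder standing in for a real module implementation."""
    return data


MODULE_NAMES = [
    "parsing_and_obtaining_201",
    "logic_processing_202",
    "first_preprocessing_203",
    "second_preprocessing_204",
    "first_order_derivation_205",
    "scoring_206",
    "extraction_207",
]
device = FeatureExtractionDevice([(name, passthrough) for name in MODULE_NAMES])
result, trace = device.run({"request_id": "demo"})
```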

The analyzing and acquiring module 201 is configured to analyze the received feature data extraction request, and acquire a pipeline data set and a data warehouse type corresponding to the pipeline data set.

In an optional embodiment, the parsing and obtaining module 201 parses the received feature data extraction request, and obtaining the pipelined data set includes:

And the logic processing module 202 is configured to perform logic processing on the stream type data set according to the data warehouse type and a preset processing engine to generate a target stream type data set.

In an optional embodiment, the logic processing module 202 performs logic processing on the pipeline data set according to the data warehouse type and a preset processing engine, and generating a target pipeline data set includes:

The first preprocessing module 203 is configured to perform first preprocessing on the target pipeline data set to obtain a first feature data set.

In an optional embodiment, the first preprocessing module 203 performs first preprocessing on the target pipeline data set to obtain a first feature data set, and includes:

The second preprocessing module 204 is configured to perform second preprocessing on the first feature data set to obtain a second feature data set.

In this embodiment, the second preprocessing includes performing second division on the data of the first feature data set, specifically, the second preprocessing module 204 performs second preprocessing on the first feature data set to obtain a second feature data set, where the second preprocessing includes:

The first-order derivation module 205 is configured to perform first-order derivation on the second feature data set by using a preset first-order derivation algorithm to obtain a third feature data set.
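The following is only an illustrative sketch of what deriving additional features from each field could look like. The concrete derivation rules (square and log for numerical fields, year and month for date fields, an indicator for category fields) and all function names are assumptions made for the example; the preset first-order derivation algorithm itself is not reproduced here.

```python
import math
from datetime import date
from typing import Any, Dict


def derive_first_order(name: str, value: Any) -> Dict[str, Any]:
    """Return derived features for one field, chosen by its assumed field type."""
    if isinstance(value, date):            # date field type (assumed rules)
        return {f"{name}_year": value.year, f"{name}_month": value.month}
    if isinstance(value, (int, float)):    # numerical field type (assumed rules)
        return {
            f"{name}_squared": value * value,
            f"{name}_log1p": math.log1p(abs(value)),
        }
    if isinstance(value, str):             # category field type (assumed rule)
        return {f"{name}={value}": 1}
    raise TypeError(f"unsupported field type for {name!r}")


def derive_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the original fields and append the derived ones."""
    derived = dict(record)
    for name, value in record.items():
        derived.update(derive_first_order(name, value))
    return derived


# Hypothetical record from the second feature data set.
row = {"amount": 3.0, "channel": "app", "opened": date(2021, 11, 29)}
third = derive_record(row)
```

The point of the sketch is only that derivation widens each record with new columns computed from existing ones, so the third feature data set has more dimensions than the second.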

In an optional embodiment, the first-order derivation module 205 performs first-order derivation on the second feature data set by using a preset first-order derivation algorithm, and obtaining a third feature data set includes:

The scoring module 206 is configured to perform feature scoring on each feature data in the third feature data set by using a plurality of preset evaluation systems, so as to obtain a plurality of feature score values of each feature data.

The extraction module 207 is configured to perform feature data extraction on the third feature data set based on the plurality of feature score values of each feature data, so as to obtain a feature data extraction result of the feature data extraction request.
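A minimal sketch of scoring each feature under several evaluation systems and then extracting features from the resulting score values might look as follows. The two scorers (variance and absolute correlation with a label) are simple stand-ins chosen to keep the example self-contained, not the preset evaluation systems of this embodiment, and the voting rule with thresholds is likewise an assumption.

```python
from statistics import mean, pvariance
from typing import Callable, Dict, List, Sequence

Scorer = Callable[[Sequence[float], Sequence[float]], float]


def variance_score(values: Sequence[float], labels: Sequence[float]) -> float:
    """Stand-in evaluation system 1: population variance of the feature."""
    return pvariance(values)


def abs_correlation_score(values: Sequence[float], labels: Sequence[float]) -> float:
    """Stand-in evaluation system 2: absolute Pearson correlation with the label."""
    mv, ml = mean(values), mean(labels)
    cov = sum((v - mv) * (l - ml) for v, l in zip(values, labels))
    norm = (sum((v - mv) ** 2 for v in values)
            * sum((l - ml) ** 2 for l in labels)) ** 0.5
    return abs(cov / norm) if norm else 0.0


def score_features(features: Dict[str, Sequence[float]],
                   labels: Sequence[float],
                   scorers: Dict[str, Scorer]) -> Dict[str, Dict[str, float]]:
    """Give every feature one score value per evaluation system."""
    return {
        name: {system: scorer(values, labels) for system, scorer in scorers.items()}
        for name, values in features.items()
    }


def extract_features(scores: Dict[str, Dict[str, float]],
                     thresholds: Dict[str, float],
                     min_votes: int) -> List[str]:
    """Assumed rule: keep a feature that passes at least min_votes thresholds."""
    kept = []
    for name, per_system in scores.items():
        votes = sum(per_system[s] >= thresholds[s] for s in thresholds)
        if votes >= min_votes:
            kept.append(name)
    return kept


# Hypothetical third feature data set with three features and a binary label.
features = {
    "balance": [1.0, 2.0, 3.0, 4.0],
    "constant": [5.0, 5.0, 5.0, 5.0],
    "noise": [2.0, 1.0, 2.0, 1.0],
}
labels = [0.0, 0.0, 1.0, 1.0]
scorers = {"variance": variance_score, "abs_correlation": abs_correlation_score}
scores = score_features(features, labels, scorers)
selected = extract_features(
    scores, {"variance": 0.1, "abs_correlation": 0.5}, min_votes=2
)
```

Requiring agreement between several evaluation systems is one way to make the selection less sensitive to the bias of any single score; model-based importances could be plugged in as further scorers with the same signature.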

In an optional embodiment, the extracting module 207 performs feature data extraction on the third feature data set based on a plurality of feature score values of each feature data, and obtaining the feature data extraction result of the feature data extraction request includes:

In summary, in the artificial intelligence-based feature data extraction apparatus of this embodiment, the pipeline data set is logically processed according to the data warehouse type and a preset processing engine to generate a target pipeline data set, which avoids manual intervention and improves the efficiency and accuracy of generating the target pipeline data set. The target pipeline data set undergoes first preprocessing to obtain a first feature data set, the first feature data set undergoes second preprocessing to obtain a second feature data set, and the second feature data set undergoes first-order derivation with a preset first-order derivation algorithm to obtain a third feature data set; this expands the target pipeline data set along different dimensions, enriches the data dimensions of the third feature data set, and improves its completeness. Each feature data in the third feature data set is then scored by a plurality of preset evaluation systems to obtain a plurality of feature score values for each feature data, and feature data extraction is performed on that basis. When the feature score values are calculated, model training is performed with the third feature data set and the feature score values are obtained after the training is completed, which improves their accuracy; effective features are then screened out based on the feature score values, further improving the accuracy and effectiveness of the feature data extraction result.

EXAMPLE III

Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. In the preferred embodiment of the present invention, the electronic device 3 comprises a memory 31, at least one processor 32, at least one communication bus 33 and a transceiver 34.

It will be appreciated by those skilled in the art that the configuration of the electronic device shown in fig. 3 does not limit the embodiments of the present invention; the configuration may be of a bus type or a star type, and the electronic device 3 may include more or fewer hardware or software components than those shown, or a different arrangement of components.

In some embodiments, the electronic device 3 is an electronic device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware thereof includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The electronic device 3 may also include a client device, which includes, but is not limited to, any electronic product that can interact with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, and the like.

It should be noted that the electronic device 3 is only an example; other existing or future electronic products that can be adapted to the present invention should also fall within the scope of protection of the present invention and are incorporated herein by reference.

In some embodiments, the memory 31 is used for storing program code and various data, such as the artificial intelligence based feature data extraction device 20 installed in the electronic device 3, and provides high-speed, automatic access to programs or data during the operation of the electronic device 3. The memory 31 includes a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-Time Programmable Read-Only Memory (OTPROM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disk memory, a magnetic disk memory, a tape memory, or any other computer-readable medium capable of carrying or storing data.

In some embodiments, the at least one processor 32 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The at least one processor 32 is a Control Unit (Control Unit) of the electronic device 3, connects various components of the electronic device 3 by using various interfaces and lines, and executes various functions and processes data of the electronic device 3 by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31.

In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.

Although not shown, the electronic device 3 may further include a power supply (such as a battery) for supplying power to each component, and optionally, the power supply may be logically connected to the at least one processor 32 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 3 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.

It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.

The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, an electronic device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.

In a further embodiment, in conjunction with fig. 2, the at least one processor 32 may execute the operating system of the electronic device 3 as well as various installed applications (such as the artificial intelligence based feature data extraction device 20), program code, and the like, for example the modules described above.

The memory 31 has program code stored therein, and the at least one processor 32 can call the program code stored in the memory 31 to perform related functions. For example, the modules illustrated in fig. 2 are program codes stored in the memory 31 and executed by the at least one processor 32, so as to implement the functions of the modules for the purpose of artificial intelligence-based feature data extraction.

Illustratively, the program code may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the processor 32 to accomplish the present application. The one or more modules/units may be a series of computer readable instruction segments capable of performing certain functions, which are used for describing the execution process of the program code in the electronic device 3. For example, the program code may be partitioned into a parsing and acquisition module 201, a logic processing module 202, a first preprocessing module 203, a second preprocessing module 204, a first order derivation module 205, a scoring module 206, and an extraction module 207.

In one embodiment of the present invention, the memory 31 stores a plurality of computer-readable instructions that are executed by the at least one processor 32 to implement artificial intelligence based feature data extraction functionality.

Specifically, for the way in which the at least one processor 32 implements the above instructions, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1; details are not repeated here.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is only one logical functional division, and other divisions may be used in an actual implementation.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or that the singular does not exclude the plural. A plurality of units or means recited in the present invention may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A feature data extraction method based on artificial intelligence is characterized by comprising the following steps:

2. The artificial intelligence based feature data extraction method of claim 1, wherein the performing logic processing on the pipelined data set according to the data warehouse type and a preset processing engine to generate a target pipelined data set comprises:

3. The artificial intelligence based feature data extraction method of claim 1, wherein the performing a first preprocessing on the target pipeline data set to obtain a first feature data set comprises:

4. The artificial intelligence based feature data extraction method of claim 1, wherein the second preprocessing the first feature data set to obtain a second feature data set comprises:

5. The artificial intelligence based feature data extraction method according to any one of claims 1 to 4, wherein the performing first-order derivation on the second feature data set by using a preset first-order derivation algorithm to obtain a third feature data set comprises:

6. The artificial intelligence based feature data extraction method according to claim 1, wherein the extracting feature data of the third feature data set based on a plurality of feature score values of each of the feature data, and obtaining the feature data extraction result of the feature data extraction request includes:

7. The artificial intelligence based feature data extraction method of claim 5, wherein the preset first field types include a numerical field type, a category field type and a date field type, and the preset second field types include a numerical type and a character type.

8. An artificial intelligence-based feature data extraction device, characterized in that the device comprises:

9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the processor being configured to implement the artificial intelligence based feature data extraction method according to any one of claims 1 to 7 when executing the computer program stored in the memory.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the artificial intelligence based feature data extraction method according to any one of claims 1 to 7.