CN110442417A - Feature Extraction Method, machine learning method and its device - Google Patents

Feature Extraction Method, machine learning method and its device

Info

Publication number
CN110442417A
Authority
CN
China
Prior art keywords
item
feature extraction
data record
field
configuration item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910743847.1A
Other languages
Chinese (zh)
Inventor
白杨
陈雨强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority to CN201910743847.1A
Publication of CN110442417A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/451 Execution arrangements for user interfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

A feature extraction method, a machine learning method, and corresponding devices are provided. The feature extraction method includes: obtaining data records; obtaining feature extraction configuration items that define how predetermined features are extracted from the data records, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item defines the fields of the data record involved in that predetermined feature as source fields, and the processing method item specifies a reference to a data processing function programmed in advance as executable code, the data processing function performing, on the field values of the source fields defined by the source field item, the data processing for extracting that predetermined feature; and performing data processing on the field values of the data records based on the feature extraction configuration items to obtain the feature values of the predetermined features. The feature extraction and machine learning techniques according to embodiments of the present invention enhance programming flexibility and code reusability, and are particularly suitable for big-data applications.

Description

Feature Extraction Method, machine learning method and its device
This application is a divisional application of the patent application filed on January 8, 2016, with application No. 201610011587.5, entitled "Feature extraction method, machine learning method and its device".
Technical field
The present invention relates generally to the field of information technology, and more specifically to a feature extraction method, a machine learning method, and corresponding devices.
Background art
In information technology fields such as data mining and machine learning, the objects being processed are data. Before massive amounts of data are processed, feature extraction is usually performed on the data.
Features serve as the raw material of data processing. In brief, each data record may include multiple fields, and a feature may represent a field itself, part of a field, a combination of fields, a transformation of a field, or the result of other processing, so as to better reflect the internal associations and latent meaning of the data distribution. Taking data mining as an example, features are the raw material of a machine learning system and have a significant impact on the final model: extracting features efficiently and accurately helps the learning process better distill the regularities in the data and analyze, from multiple angles, the internal associations and implicit meaning in the data distribution. In machine learning this process is known as feature engineering. The output of feature engineering is the material for machine learning, and its quality directly determines how accurately the machine learning problem is characterized, which in turn affects the quality of the model.
In fact, this is not limited to feature engineering in the machine learning field. Any existing data processing system usually requires feature extraction, and to extract the corresponding features from the contents of each field, programmers generally need to write executable program code for each type of feature.
For example, to obtain the year information in the time field of every record in given data ("data"), the following Python program can be executed:
# param data: list of dicts, one dict of fields per record;
#             rec['time'] is a 'YYYY-MM-DD' formatted date string
# return: list holding the year string of each record
def getYearOf(data):
    timeFields = [rec['time'] for rec in data]
    # list() keeps the documented list return type under Python 3
    years = list(map(lambda x: x.split('-')[0], timeFields))
    return years
The above program defines code that takes the year part of the time field of each data record (rec) in the data source (data) and uses it as a year feature. It first extracts the time field from each record of the data source, then, according to the specific format of the time field (yyyy-mm-dd), splits the value on "-" and takes the yyyy part (the part at index 0), maps it to the feature years, and returns the extracted year values.
As can be seen, this program imposes strong constraints on both the format of the data (the time field) and the output of the feature extraction. That is, this feature extraction code is customized for data in a specific format and for a specific output. In general, therefore, if the given data format is different and/or the feature output to be obtained is different, entirely different code must be written for the specific format and the algorithm used. Even if only the input order of the fields of the data record or the order of the feature output differs, a completely customized set of code has to be rewritten. This not only imposes a heavy workload on programmers, but also incurs considerable overhead when the program runs. Given the diversity of practical application scenarios and data specifications, this labor-intensive approach is difficult to extend and reuse.
Therefore, the existing approach of developing a separate processing flow for every data format and every kind of extracted content amounts to traversing the whole problem space. As a result, the development complexity of feature extraction grows nonlinearly, and the run-time complexity is also hard to bound.
Summary of the invention
The present invention has been made in view of the foregoing.
According to one aspect of the present invention, there is provided a method for performing feature extraction on data records, which may include: a data record obtaining step of obtaining data records; a feature extraction configuration item obtaining step of obtaining feature extraction configuration items that define how predetermined features are extracted from the data records, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item defines the fields of the data record involved in that predetermined feature as source fields, and the processing method item specifies a reference to a data processing function programmed in advance as executable code, the data processing function performing, on the field values of the source fields defined by the source field item, the data processing for extracting that predetermined feature; and a feature value obtaining step of performing data processing on the field values of the data records based on the feature extraction configuration items to obtain the feature values of the predetermined features.
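The three steps above can be illustrated with a minimal Python sketch. The registry name `PROCESSING_FUNCTIONS`, the dictionary keys `feature`, `source_fields`, and `method`, and the two sample functions are hypothetical names chosen for illustration; the source does not prescribe a concrete layout for configuration items.

```python
# Hypothetical data processing functions, written once in advance.
def identity(value):
    """Use the field value itself as the feature value."""
    return value

def year_of(date_text):
    """Take the year part of a 'YYYY-MM-DD' style date string."""
    return date_text.split("-")[0]

# Hypothetical registry: a configuration item refers to a function by name.
PROCESSING_FUNCTIONS = {"identity": identity, "year_of": year_of}

# Feature extraction configuration items (assumed layout, not from the source):
# each one names its source fields and references a processing function.
CONFIG_ITEMS = [
    {"feature": "age", "source_fields": ["age"], "method": "identity"},
    {"feature": "birth_year", "source_fields": ["birthday"], "method": "year_of"},
]

def extract_features(record, config_items, functions=PROCESSING_FUNCTIONS):
    """Apply every configuration item to one data record."""
    features = {}
    for item in config_items:
        func = functions[item["method"]]
        args = [record[name] for name in item["source_fields"]]
        features[item["feature"]] = func(*args)
    return features

record = {"age": 37, "job": "teacher", "birthday": "1979-3-1"}
print(extract_features(record, CONFIG_ITEMS))
```

In this sketch, adding a new feature only requires a new entry in `CONFIG_ITEMS`; `extract_features` itself stays unchanged, which is the kind of reuse the method aims at.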
Further, in the feature extraction method according to an embodiment of the present invention, the feature extraction configuration item obtaining step may include: reading the feature extraction configuration items from a configuration file in which they are set, or obtaining the feature extraction configuration items according to an input operation of a user, wherein the configuration file is stored locally or received remotely.
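Reading configuration items from a file might look like the following sketch. The JSON layout, the key names, and the helper names `parse_config_items` and `load_config_items` are assumptions made for illustration; the source does not state which file format is used.

```python
import json

# Example configuration text in an assumed JSON layout.
CONFIG_TEXT = """
[
  {"feature": "age", "source_fields": ["age"], "method": "identity"},
  {"feature": "job_contact", "source_fields": ["job", "contact"], "method": "combine"}
]
"""

REQUIRED_KEYS = {"feature", "source_fields", "method"}

def parse_config_items(text):
    """Parse configuration items and check that each has the required keys."""
    items = json.loads(text)
    for item in items:
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"configuration item lacks keys: {sorted(missing)}")
    return items

def load_config_items(path):
    """Read configuration items from a locally stored configuration file."""
    with open(path, encoding="utf-8") as handle:
        return parse_config_items(handle.read())

items = parse_config_items(CONFIG_TEXT)
print([item["feature"] for item in items])
```

A remotely received configuration could be passed to `parse_config_items` as text in the same way, so both sources share one validation path.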
Further, in the feature extraction method according to an embodiment of the present invention, the feature extraction configuration item obtaining step may include: displaying to a user an interface for setting feature extraction configuration items; generating, according to input operations performed by the user on the interface, a configuration file in which the feature extraction configuration items are set; and reading the feature extraction configuration items from the generated configuration file.
Further, in the feature extraction method according to an embodiment of the present invention, the interface for setting feature extraction configuration items may be a graphical user interface, and the graphical user interface may include a text editing interface for manually editing the configuration file and/or a selection-input interface that presents the content options of the feature extraction configuration items for manual selection.
Further, in the feature extraction method according to an embodiment of the present invention, in the feature extraction configuration item obtaining step, the display may be switched between the text editing interface and the selection-input interface in response to an interface switching operation by the user, and the feature extraction configuration item settings made under the interface before the switch are synchronously displayed under the interface after the switch.
Further, in the feature extraction method according to an embodiment of the present invention, the selection-input interface displays at least the fields of the data record that can serve as source fields and the feature extraction configuration items of the predetermined features that have been set.
Further, in the feature extraction method according to an embodiment of the present invention, where the graphical user interface includes the selection-input interface, the step of displaying to the user the interface for setting feature extraction configuration items may include: displaying the field selected by the user from among the fields as a set source field; displaying, when the source field is selected, a processing method list near the source field; and displaying the processing method selected by the user from the processing method list as the set processing method.
Further, in the feature extraction method according to an embodiment of the present invention, the processing method list includes all processing methods, all of which are in an active state; or the processing method list includes all processing methods, but only those applicable to the source field item are in an active state; or the processing method list includes only the processing methods applicable to the source field item.
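The three variants of the processing method list can be sketched as below. The catalogue `METHOD_APPLICABILITY`, the field type names, and the mode names `all_active`, `gray_out`, and `filter` are hypothetical and only serve to make the three alternatives concrete.

```python
# Hypothetical catalogue: method name -> field types it can be applied to.
METHOD_APPLICABILITY = {
    "identity": {"number", "text", "date"},
    "discretize": {"number"},
    "year_of": {"date"},
}

def build_method_list(field_type, mode):
    """Return (method, is_active) pairs for the list shown near a source field.

    "all_active": every method is listed and selectable.
    "gray_out":   every method is listed, only applicable ones are selectable.
    "filter":     only applicable methods are listed.
    """
    entries = []
    for name in sorted(METHOD_APPLICABILITY):
        applicable = field_type in METHOD_APPLICABILITY[name]
        if mode == "all_active":
            entries.append((name, True))
        elif mode == "gray_out":
            entries.append((name, applicable))
        elif mode == "filter":
            if applicable:
                entries.append((name, True))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return entries

for mode in ("all_active", "gray_out", "filter"):
    print(mode, build_method_list("number", mode))
```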
Further, in the feature extraction method according to an embodiment of the present invention, the feature extraction configuration item of each predetermined feature may further include a processing parameter item corresponding to the processing method item, the processing parameter item defining the parameters involved in the data processing function.
Further, in the feature extraction method according to an embodiment of the present invention, the feature extraction configuration item of each predetermined feature may further include a storage location identifier, which indicates the storage region in memory of the calculation coefficients corresponding to the feature values of that predetermined feature.
Further, in the feature extraction method according to an embodiment of the present invention, in the feature value obtaining step, data processing may be performed in parallel on the individual data records or on groups each consisting of multiple data records.
Further, in the feature extraction method according to an embodiment of the present invention, in the feature value obtaining step, data processing may be performed in parallel by a distributed computing cluster.
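Processing groups of data records in parallel can be sketched on a single machine as follows. The thread pool is only a local stand-in for a distributed computing cluster, and the function names and the group size are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def year_of(date_text):
    """Take the year part of a 'YYYY-MM-DD' style date string."""
    return date_text.split("-")[0]

def extract_group(records):
    """Extract features for one group of data records."""
    return [{"birth_year": year_of(rec["birthday"])} for rec in records]

def split_into_groups(records, group_size):
    """Cut the record list into consecutive groups of at most group_size."""
    return [records[i:i + group_size] for i in range(0, len(records), group_size)]

def extract_parallel(records, group_size=2, workers=4):
    """Run extract_group on every group concurrently and flatten the results."""
    groups = split_into_groups(records, group_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map() yields results in the order of the input groups.
        results = pool.map(extract_group, groups)
        return [features for group in results for features in group]

records = [{"birthday": "1979-3-1"}, {"birthday": "1985-10-2"}, {"birthday": "1990-1-15"}]
print(extract_parallel(records))
```

Because each group is processed independently of the others, the same split could in principle be handed to separate worker nodes instead of threads.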
According to another aspect of the present invention, there is provided a computer-executed machine learning method, which may include: a data record obtaining step of obtaining data records; a feature extraction configuration item obtaining step of obtaining feature extraction configuration items that define how predetermined features are extracted from the data records, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item defines the fields of the data record involved in that predetermined feature as source fields, and the processing method item specifies a reference to a data processing function programmed in advance as executable code, the data processing function performing, on the field values of the source fields defined by the source field item, the data processing for extracting that predetermined feature; a feature value obtaining step of performing data processing on the field values of the data records based on the feature extraction configuration items to obtain the feature values of the predetermined features; a sample obtaining step of forming feature vectors, based at least in part on the feature values obtained in the feature value obtaining step, as samples for machine learning; and a machine learning step of performing machine learning based on the samples.
Further, in the machine learning method according to an embodiment of the present invention, in the machine learning step, at least one of model training, model testing, and model application is performed based on the samples.
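The sample obtaining step can be pictured with the short sketch below, which arranges extracted feature values in a fixed order to form a feature vector. The function name `to_sample`, the feature names, and the numeric values are made up for illustration and do not come from the source.

```python
def to_sample(feature_values, feature_order, label=None):
    """Arrange feature values in a fixed order to form one sample.

    The label is optional, since samples used for model application
    need not carry one.
    """
    vector = [feature_values[name] for name in feature_order]
    return vector, label

# Fixed ordering of the features inside every feature vector (assumed).
FEATURE_ORDER = ["age", "owns_house"]

# Illustrative feature values paired with illustrative labels.
extracted = [
    ({"age": 37, "owns_house": 1}, 1),
    ({"age": 52, "owns_house": 0}, 0),
]

samples = [to_sample(values, FEATURE_ORDER, label) for values, label in extracted]
print(samples)
```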
According to another aspect of the present invention, there is provided a computing device for performing feature extraction on data records, including a storage unit and a processor, wherein a set of computer-executable instructions is stored in the storage unit and, when the set of computer-executable instructions is executed by the processor, the following steps are performed: a data record obtaining step of obtaining data records; a feature extraction configuration item obtaining step of obtaining feature extraction configuration items that define how predetermined features are extracted from the data records, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item defines the fields of the data record involved in that predetermined feature as source fields, and the processing method item specifies a reference to a data processing function programmed in advance as executable code, the data processing function performing, on the field values of the source fields defined by the source field item, the data processing for extracting that predetermined feature; and a feature value obtaining step of performing data processing on the field values of the data records based on the feature extraction configuration items to obtain the feature values of the predetermined features.
According to another aspect of the present invention, there is provided a computing device for performing machine learning, including a storage unit and a processor, wherein a set of computer-executable instructions is stored in the storage unit and, when the set of computer-executable instructions is executed by the processor, the following steps are performed: a data record obtaining step of obtaining data records; a feature extraction configuration item obtaining step of obtaining feature extraction configuration items that define how predetermined features are extracted from the data records, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item defines the fields of the data record involved in that predetermined feature as source fields, and the processing method item specifies a reference to a data processing function programmed in advance as executable code, the data processing function performing, on the field values of the source fields defined by the source field item, the data processing for extracting that predetermined feature; a feature value obtaining step of performing data processing on the field values of the data records based on the feature extraction configuration items to obtain the feature values of the predetermined features; a sample obtaining step of forming feature vectors, based at least in part on the feature values obtained in the feature value obtaining step, as samples for machine learning; and a machine learning step of performing machine learning based on the samples.
According to another aspect of the present invention, there is provided a feature extraction device for performing feature extraction on data records, which may include: a data record obtaining unit configured to obtain data records; a feature extraction configuration item obtaining unit configured to obtain feature extraction configuration items that define how predetermined features are extracted from the data records, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item defines the fields of the data record involved in that predetermined feature as source fields, and the processing method item specifies a reference to a data processing function programmed in advance as executable code, the data processing function performing, on the field values of the source fields defined by the source field item, the data processing for extracting that predetermined feature; and a feature value obtaining unit configured to perform data processing on the field values of the data records based on the feature extraction configuration items to obtain the feature values of the predetermined features.
Further, in the feature extraction device according to an embodiment of the present invention, the feature extraction configuration item obtaining unit may read the feature extraction configuration items from a configuration file in which they are set, or obtain the feature extraction configuration items according to an input operation of a user, wherein the configuration file is stored locally or received remotely.
Further, in the feature extraction device according to an embodiment of the present invention, the feature extraction configuration item obtaining unit may display to a user an interface for setting feature extraction configuration items, generate, according to input operations performed by the user on the interface, a configuration file in which the feature extraction configuration items are set, and read the feature extraction configuration items from the generated configuration file.
Further, in the feature extraction device according to an embodiment of the present invention, the interface for setting feature extraction configuration items may be a graphical user interface, and the graphical user interface may include a text editing interface for manually editing the configuration file and/or a selection-input interface that presents the content options of the feature extraction configuration items for manual selection.
Further, in the feature extraction device according to an embodiment of the present invention, the feature extraction configuration item obtaining unit may switch between the text editing interface and the selection-input interface in response to an interface switching operation by the user, and the feature extraction configuration item settings made under the interface before the switch are synchronously displayed under the interface after the switch.
Further, in the feature extraction device according to an embodiment of the present invention, the selection-input interface may display at least the fields of the data record that can serve as source fields and the feature extraction configuration items of the predetermined features that have been set.
Further, in the feature extraction device according to an embodiment of the present invention, where the graphical user interface includes the selection-input interface, the feature extraction configuration item obtaining unit may display the field selected by the user from among the fields as a set source field, display, when the source field is selected, a processing method list near the source field, and display the processing method selected by the user from the processing method list as the set processing method.
Further, in the feature extraction device according to an embodiment of the present invention, the processing method list may include all processing methods, all of which are in an active state; or the processing method list may include all processing methods, but only those applicable to the source field item are in an active state; or the processing method list may include only the processing methods applicable to the source field item.
Further, in the feature extraction device according to an embodiment of the present invention, the feature extraction configuration item of each predetermined feature may further include a processing parameter item corresponding to the processing method item, and the processing parameter item may be used to define the parameters involved in the data processing function.
Further, in the feature extraction device according to an embodiment of the present invention, the feature extraction configuration item of each predetermined feature may further include a storage location identifier, which indicates the storage region in memory of the calculation coefficients corresponding to the feature values of that predetermined feature.
Further, in the feature extraction device according to an embodiment of the present invention, the feature value obtaining unit may perform data processing in parallel on the individual data records or on groups each consisting of multiple data records.
Further, in the feature extraction device according to an embodiment of the present invention, the feature value obtaining unit may perform data processing in parallel by means of a distributed computing cluster.
According to another aspect of the present invention, there is provided a machine learning device, which may include: a data record obtaining unit configured to obtain data records; a feature extraction configuration item obtaining unit configured to obtain feature extraction configuration items that define how predetermined features are extracted from the data records, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item defines the fields of the data record involved in that predetermined feature as source fields, and the processing method item specifies a reference to a data processing function programmed in advance as executable code, the data processing function performing, on the field values of the source fields defined by the source field item, the data processing for extracting that predetermined feature; a feature value obtaining unit configured to perform data processing on the field values of the data records based on the feature extraction configuration items to obtain the feature values of the predetermined features; a sample obtaining unit configured to form feature vectors, based at least in part on the feature values obtained by the feature value obtaining unit, as samples for machine learning; and a machine learning unit configured to perform machine learning based on the samples.
Further, in the machine learning device according to an embodiment of the present invention, the machine learning unit may perform at least one of model training, model testing, and model application based on the samples.
With the feature extraction and machine learning techniques according to embodiments of the present invention, each feature extraction configuration item can be changed as needed, independently of the feature extraction main program, so that feature extraction can be effectively "abstracted" and "expressed" according to the scenario. The feature extraction main program does not need substantial modification, while data processing functions can be flexibly written or added independently, which enhances programming flexibility and code reusability. Accordingly, for different databases, as long as the feature extraction configuration items are defined as needed, the same feature extraction main program and the corresponding data processing functions can be used, which enhances programming flexibility, ease of maintenance, and code reusability.
Brief description of the drawings
These and/or other aspects and advantages of the present invention will become clearer and easier to understand from the following detailed description of embodiments of the present invention taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows an overall flowchart of a feature extraction method according to an embodiment of the present invention.
Fig. 2 shows an example of the contents of a feature extraction configuration file.
Fig. 3 shows an example of performing the feature extraction process in a distributed manner when the data records are sample data for machine learning.
Fig. 4A shows an example of a graphical user interface for configuring feature extraction according to an exemplary embodiment of the present invention.
Fig. 4B shows an example of part of the graphical user interface in which a processing method list is displayed when the user selects a single field (for example, the "age" field) in the left area.
Fig. 4C shows an example of part of the graphical user interface in which a processing method list is displayed when the user selects multiple fields in the left area.
Fig. 5 shows an exemplary graphical user interface having an area in which the feature extraction configuration items can be edited as text.
Fig. 6 shows an overall flowchart of a machine learning method according to an embodiment of the present invention to which the feature extraction method of the above embodiment is applied.
Fig. 7 shows a configuration block diagram of a computing device according to an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the present invention, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 shows an overall flowchart of a feature extraction method 100 according to an embodiment of the present invention. The method may be executed by a computer program or by a dedicated feature extraction device. As an example, the program implementing the method may be packaged as a dedicated software package (for example, a lib library), so that a consistent feature extraction service is obtained regardless of whether the software package is called online or offline. This overcomes the defect in the prior art that feature extraction results are inconsistent because the online and offline environments differ.
In step S110, data records are obtained. Data may be presented in the form of data records, where a data record is a complete set of related information corresponding to one row of a data source. For example, all the information about a given customer in a customer mailing list constitutes one data record.
The data records here are the raw material of the subsequent feature extraction, and each data record may have fields of various types and their corresponding field values. As an example, the following shows a single data record describing a customer's personal information in relation to loan repayment; the features extracted from such data records will be used to train a model of customer lending risk:
No. | Age | Job     | Owns a house | Contact person | Birthday | Loan repayment mark (label)
1   | 37  | Teacher | Yes ("y")    | Spouse         | 1979-3-1 | Yes ("y")
As listed in the table above, the data record describes some basic information about the customer, for example age (age), job (job), whether the customer owns a house (housing), contact person (contact), and birthday (birthday), and further includes a mark (label) concerning the customer's loan repayment. Specifically, a label of "y" indicates that the customer has a positive loan repayment record, and a label of "n" indicates that the customer has a negative loan repayment record. As an example, the features extracted from such data records can be used as training samples to train, based on a machine learning algorithm, a model for predicting customer lending risk.
It should be understood that data records may have a variety of different fields describing various aspects of information, and the content and format of the fields are not limited. Moreover, a data record need not carry a label concerning the prediction target; it may have no label at all.
Here, the way in which data records are obtained is not limited and may include various ways of obtaining online or offline data. For example, one or more data files may be stored in advance on a local hard disk or in a distributed file storage system, and the data records may be obtained by reading the data files; alternatively, the data records may be obtained by accessing a local or remote database; alternatively, the data records may be generated in real time, for example obtained in real time, through a specific communication protocol (for example, an interface description language, IDL), from a device that calls the feature extraction service. As an example, a complete data record may be generated by splicing multiple data records together.
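The splicing mentioned at the end of the paragraph above might be sketched as follows. The function name `splice_records`, the `id` field, and the rule of rejecting conflicting field values are assumptions for illustration; the source does not say how the partial records are combined.

```python
def splice_records(*partial_records):
    """Merge partial records about the same entity into one complete record.

    Assumed policy: a field may appear in several parts only if its value
    is identical in all of them; otherwise the merge is rejected.
    """
    complete = {}
    for part in partial_records:
        overlap = complete.keys() & part.keys()
        conflicts = sorted(key for key in overlap if complete[key] != part[key])
        if conflicts:
            raise ValueError(f"conflicting values for fields: {conflicts}")
        complete.update(part)
    return complete

# Two partial records, e.g. one read from a file and one received in real time.
basic_info = {"id": 1, "age": 37, "job": "teacher"}
contact_info = {"id": 1, "contact": "spouse", "birthday": "1979-3-1"}
print(splice_records(basic_info, contact_info))
```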
Here, obtaining data records may mean obtaining one data record at a time or obtaining multiple data records at once.
In step S120, feature extraction configuration items that define how to extract predetermined features from the data record are obtained, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item. The source field item designates the field or fields of the data record involved in that predetermined feature as source fields, and the processing method item specifies a reference to a data processing function that has been programmed in advance as executable code, wherein the data processing function performs, on the field values of the source fields designated by the source field item, the data processing for extracting that predetermined feature.
In one example, the feature extraction configuration item of each predetermined feature further includes a processing parameter item corresponding to the processing method item; the processing parameter item defines the parameters that the data processing function involves. Examples of processing parameter items include a format parameter, an extraction interval parameter, a division threshold parameter, a mapping rule parameter, and so on. By providing the parameter item separately from the processing method item, similar data processing functions can be consolidated without writing a separate function for every parameter variant, which further improves the coding efficiency of data processing.
In one example, the feature extraction configuration item of each predetermined feature may further include a storage location identifier, which indicates the storage region in memory of the calculation coefficients corresponding to the feature values of that predetermined feature. Taking machine learning as an example, the calculation coefficient (for example, a weight in a machine learning algorithm) corresponding to each feature value of each predetermined feature is stored at a corresponding position in memory. However, as the dimensionality of sample features keeps growing (even reaching hundreds of millions) while memory space remains very limited, it is difficult to allocate a one-to-one storage address for every feature value. The storage addresses of the calculation coefficients therefore need to be mapped into a limited address space to make lookup feasible; to this end, the storage location of the calculation coefficient corresponding to each feature value is hash-transformed to obtain a mapped memory address. Hashing, however, introduces address collisions: different sample features can be confused with one another, which can introduce considerable error into machine learning computations. To address this, the memory space can be partitioned based on the storage location identifier so that at least the calculation coefficients of different kinds of predetermined features are stored separately. For example, the storage location identifier can be designed as N bits (N being an integer) corresponding to the feature kind, serving as the high-order part of the memory address, with the hashed address serving as the low-order part. The combined memory address is then separated in the memory space by feature kind, so that the calculation coefficients of different kinds of features do not erroneously overwrite one another.
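The address composition described above can be illustrated with a minimal Python sketch. The bit widths (TYPE_BITS, HASH_BITS), the function name coefficient_slot, and the use of CRC32 as the hash are all assumptions made for illustration; the source does not name a hash function or a concrete value of N.

```python
import zlib

TYPE_BITS = 8    # assumed N: bits reserved for the feature-kind identifier
HASH_BITS = 24   # assumed number of low-order bits for the hashed position


def coefficient_slot(feature_kind_id: int, feature_value: str) -> int:
    """Compose a slot address: kind identifier in the high bits, hash in the low bits."""
    if not 0 <= feature_kind_id < (1 << TYPE_BITS):
        raise ValueError("feature_kind_id does not fit in TYPE_BITS")
    # CRC32 is only a stand-in; any hash truncated to HASH_BITS would do here.
    low = zlib.crc32(feature_value.encode("utf-8")) & ((1 << HASH_BITS) - 1)
    return (feature_kind_id << HASH_BITS) | low


# The same raw value hashed under two feature kinds lands in two different regions.
age_slot = coefficient_slot(0, "37")
other_slot = coefficient_slot(1, "37")
```

Collisions can still occur between two values of the same feature kind, but values of different kinds can no longer overwrite each other, because their high-order bits differ.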
In one example, the feature extraction configuration items are set in advance in a configuration file; correspondingly, in step S120, the feature extraction configuration items are read from that configuration file, for example by parsing the configuration file. The configuration file may be stored locally or received remotely and, as an example, may be written manually by a software programmer. The pre-stored configuration file written by the programmer may be read from a local database, or the configuration file may be received from another device over a network. Here, assuming the extracted features will be used for model training in machine learning, the programmer, when writing the configuration file, determines the features the model requires according to the model being built and the actual modeling scenario, and then designs a configuration item for each feature to obtain the configuration file. Alternatively, an interface (such as a graphical user interface) for setting feature extraction configuration items may be presented to the software user, and the configuration file generated automatically from the input operations the user performs on the interface. An exemplary method by which a user customizes feature extraction configuration items through an interface is described in detail later with reference to the drawings.
In another example, the feature extraction configuration items can be obtained from the user's input operations. As an example, the feature extraction configuration items may be obtained directly in real time during the feature extraction process without forming any configuration file; for example, a graphical user interface pops up in real time while the program is running and guides the user through selecting the feature extraction configuration items.
According to an exemplary embodiment of the present invention, each feature extraction configuration item can be changed as needed independently of the feature extraction main program, so that feature extraction can be effectively "abstracted" and "expressed" according to the scenario: the feature extraction main program need not be substantially changed, while data processing functions can be flexibly written or added independently. Accordingly, for different databases, it suffices to define the feature extraction configuration items as needed in order to reuse the same feature extraction main program and corresponding data processing functions, which enhances programming flexibility, ease of maintenance, and code reusability.
Fig. 2 shows an example of feature extraction configuration items stored in a configuration file.
The configuration file shown in Fig. 2 has 10 rows in total. The first row defines the 6 fields of the data record: age (age), job (work), housing (house), contact (contact person), birthday (birthday), and y (label). The second to tenth rows define the feature extraction configuration item for each feature, each of which may include a source field item and a processing method item. In addition, to manage the extraction of each feature more effectively, a feature name may also be set for each feature, and a corresponding processing parameter item may be set for certain processing methods.
As shown in Fig. 2, each of the second to tenth rows is divided into three or four columns. The first column specifies the name of the extracted feature; as can be seen from Fig. 2, the 9 feature names are "F_AGE", "F_JOB", "F_HOUSING", "F_CONTACT", "F_YEAR", "F_MONTH", "F_DATE", "F_PROFILE", and ".label". The second column specifies, for each feature, the corresponding source field item, that is, which field or fields of the data record the extracted feature is derived from. The third column specifies the processing method item, that is, a reference to the intermediate processing that leads from the source fields to the output feature; this reference designates the data processing function to call, which may be an already programmed software module, routine, library function, and so on. The fourth column (if present) specifies the parameter corresponding to the processing method item. Specifically, in the example shown in Fig. 2, "feature" sets the name of the extracted feature; "depends" sets the source field or fields of the extracted feature; "method" specifies which data processing should be performed on the source fields given by "depends" (that is, which data processing function is called) to obtain the predetermined feature value; and "args" sets the format of the data involved in the data processing method.
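To make the row structure concrete, the following Python sketch parses a small configuration text into configuration items. The key=value text layout, the class name FeatureConfigItem, and the function parse_config are hypothetical: Fig. 2 is not reproduced here, so only the four keys (feature, depends, method, args) come from the description, and pairing F_YEAR with GetYearOfDate and F_PROFILE with Combine is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FeatureConfigItem:
    feature: str                 # name of the extracted feature
    depends: List[str]           # source field(s) in the data record
    method: str                  # reference to a data processing function
    args: Optional[str] = None   # optional processing parameter


# Hypothetical layout: first row lists the fields, later rows hold one item each.
CONFIG_TEXT = """\
fields=age,job,housing,contact,birthday,y
feature=F_AGE depends=age method=Direct
feature=F_YEAR depends=birthday method=GetYearOfDate args=YYYY-mm-dd
feature=F_PROFILE depends=age,job method=Combine
"""


def parse_config(text: str) -> Tuple[List[str], List[FeatureConfigItem]]:
    """Parse the field row and the per-feature rows of the assumed layout."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    fields = lines[0].split("=", 1)[1].split(",")
    items = []
    for ln in lines[1:]:
        cols = dict(col.split("=", 1) for col in ln.split())
        item = FeatureConfigItem(
            feature=cols["feature"],
            depends=cols["depends"].split(","),
            method=cols["method"],
            args=cols.get("args"),
        )
        # Reject items that depend on a field the first row did not declare.
        unknown = [f for f in item.depends if f not in fields]
        if unknown:
            raise ValueError(f"{item.feature}: unknown source fields {unknown}")
        items.append(item)
    return fields, items


fields, items = parse_config(CONFIG_TEXT)
```

Keeping args optional mirrors the "three or four columns" of the description: rows without a parameter simply omit the fourth column.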
Here, the processing method item is used to call the corresponding function that performs predetermined extraction processing on the field values of the source fields. By way of example and not limitation, the data processing corresponding to some processing methods is given below, some of which are specific to the machine learning field:
1. Direct (direct extraction): outputs the source field as is. Example: "1" -> "1".
2. ExpNormalizer (exponential discretization): outputs the base-2 logarithm of a numeric source field. Example: "2" -> "1".
3. Combine (field combination): joins multiple source fields with "|" as the separator and outputs the combination. Example: "1", "2" -> "1|2".
4. DataCalc (date interval): computes the time interval between two dates, in days. Example: "1900-01-02", "1900-01-10" -> "9".
5. GetYearOfDate (year of date): extracts the year from a date field. Example: "1900-01-02" -> "1900".
6. GetMonthOfDate (month of date): extracts the month from a date field. Example: "1900-01-02" -> "01".
7. GetDayOfDate (day of date): extracts the day from a date field. Example: "1900-01-02" -> "02".
8. NumberFloor (floor): rounds a numeric field down. Example: "7.89" -> "7".
9. LabelDirect (numeric label): a sample labeling method in machine learning; directly outputs the source field as the label (label), which must be an integer.
10. LabelBeta (field label): a sample labeling method in machine learning; if the source field contains "pos", the sample is labeled positive, otherwise negative.
11. LabelBinary (classification label): a numeric labeling method in machine learning; if the source field is "1", the sample is labeled positive, otherwise negative.
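Several of the processing methods listed above can be sketched as plain Python functions held in a lookup table, so that a processing method item only needs to carry a name. The Python function names and the PROCESSING_METHODS table are illustrative assumptions; flooring the logarithm in ExpNormalizer and encoding positive and negative samples as 1 and 0 are also assumptions, since the description gives only one example per method. DataCalc and LabelDirect are left out of this sketch.

```python
import math


def direct(value: str) -> str:
    return value


def exp_normalizer(value: str) -> str:
    # Base-2 logarithm; flooring to an integer bucket is an assumption.
    return str(math.floor(math.log2(float(value))))


def combine(*values: str) -> str:
    return "|".join(values)


def get_year_of_date(value: str) -> str:
    return value.split("-")[0]


def get_month_of_date(value: str) -> str:
    return value.split("-")[1]


def get_day_of_date(value: str) -> str:
    return value.split("-")[2]


def number_floor(value: str) -> str:
    return str(math.floor(float(value)))


def label_beta(value: str) -> int:
    return 1 if "pos" in value else 0   # 1 = positive sample (assumed encoding)


def label_binary(value: str) -> int:
    return 1 if value == "1" else 0


# The processing method item refers to a function by name through this table.
PROCESSING_METHODS = {
    "Direct": direct,
    "ExpNormalizer": exp_normalizer,
    "Combine": combine,
    "GetYearOfDate": get_year_of_date,
    "GetMonthOfDate": get_month_of_date,
    "GetDayOfDate": get_day_of_date,
    "NumberFloor": number_floor,
    "LabelBeta": label_beta,
    "LabelBinary": label_binary,
}
```

Adding a new processing method then amounts to writing one function and registering it, without touching the code that walks the configuration items.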
It should be noted that the data field names, processing method definitions, and so on described above in conjunction with Fig. 2 are merely examples, and different designs may be adopted as needed.
In one example, executable program code may be obtained based on the configuration items set in a configuration file such as the one shown in Fig. 2. For example, a dedicated parsing program may parse the configuration items set in the configuration file to form corresponding executable program code which, when executed, performs on the source fields the data processing specified by the processing method item and assigns the resulting feature value to the defined feature. In one example, the executable program code obtained by parsing each configuration item may be saved as one complete structure for use in subsequent execution.
Returning to Fig. 1, after the feature extraction configuration item obtaining step S120 is completed, the process proceeds to step S130.
In step S130, data processing is performed on the field values of the data record based on the feature extraction configuration items to obtain the feature values of the predetermined features. Here, as an example, the executable program code obtained by parsing the configuration file may be run; alternatively, the feature extraction main program may be run according to feature extraction configuration items input in real time, so that the predetermined data processing is performed on the field values of the relevant source fields in the data record that was read, yielding the corresponding feature values.
Specifically, still taking the feature extraction configuration items shown in Fig. 2 as an example, executing step S130 outputs the age (age) field of each record as is and assigns it to feature F_AGE; similarly, the work (job) field, house (housing) field, and contact person (contact) field of each record are output as is and assigned to features F_JOB, F_HOUSING, and F_CONTACT, respectively; the year (YYYY), month (mm), and day (dd) of the birthday (birthday) field in YYYY-mm-dd format are extracted and assigned to features F_YEAR, F_MONTH, and F_DATE; the age (age) field and the work (job) field are output together as is and assigned to feature F_PROFILE; and the label (y) field is output directly to the feature label.
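The walk-through above can be sketched in Python by applying a list of configuration items to the sample record from the earlier table. The tuple layout of CONFIG_ITEMS, the METHODS table, and the function extract_features are illustrative assumptions; in particular, joining age and job with "|" for F_PROFILE assumes the Combine method, which the text does not state explicitly.

```python
# Sample record from the table earlier in the description (values kept as strings).
record = {
    "age": "37", "job": "Teacher", "housing": "y",
    "contact": "Spouse", "birthday": "1979-3-1", "y": "y",
}

# Minimal stand-ins for the pre-programmed data processing functions.
METHODS = {
    "Direct": lambda v: v,
    "Combine": lambda *vs: "|".join(vs),
    "GetYearOfDate": lambda v: v.split("-")[0],
    "GetMonthOfDate": lambda v: v.split("-")[1],
    "GetDayOfDate": lambda v: v.split("-")[2],
}

# (feature name, source fields, processing method): an assumed in-memory form.
CONFIG_ITEMS = [
    ("F_AGE", ["age"], "Direct"),
    ("F_JOB", ["job"], "Direct"),
    ("F_HOUSING", ["housing"], "Direct"),
    ("F_CONTACT", ["contact"], "Direct"),
    ("F_YEAR", ["birthday"], "GetYearOfDate"),
    ("F_MONTH", ["birthday"], "GetMonthOfDate"),
    ("F_DATE", ["birthday"], "GetDayOfDate"),
    ("F_PROFILE", ["age", "job"], "Combine"),
    ("label", ["y"], "Direct"),
]


def extract_features(record, config_items, methods):
    """Main loop: look up each method by name and apply it to the source field values."""
    features = {}
    for name, depends, method in config_items:
        func = methods[method]
        features[name] = func(*(record[field] for field in depends))
    return features


features = extract_features(record, CONFIG_ITEMS, METHODS)
```

The main loop never changes when features are added or removed; only CONFIG_ITEMS (and, for a new kind of processing, METHODS) does.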
The features thus extracted can be combined into a feature vector, or combined with other features to form a feature vector. These feature vectors can be used for any subsequent data statistics, analysis, computation, and/or other processing.
As an example, the feature vector can serve as a training sample in machine learning. The above feature extraction is performed on each data record to form a training sample set, which can be applied to a machine learning algorithm or other algorithms for data mining.
According to an exemplary embodiment of the present invention, owing to the compact structure of the abstracted configuration items, data dependencies are confined to the data record currently being processed. Accordingly, the data record table can simply be split by rows, and feature extraction can then be carried out in parallel on each resulting row shard. That is, in the feature value obtaining step, data processing may be performed in parallel on individual data records or on groups of multiple data records. For example, in the feature value obtaining step, feature extraction may be performed row by row on the data records, that is, each column of each data record is traversed to perform data processing according to the configured feature source fields and processing methods. Here, as an example, in an offline application scenario in which feature extraction is performed on historical data, a distributed computing cluster may be used to execute the feature value obtaining step on each row.
Fig. 3 shows an example of executing the feature extraction process in a distributed manner when the data records are sample data for machine learning. The sample data source may be a data record table in which each row is one data record and each column corresponds to one field. Here, the data record table can be split by rows to obtain row shards, and feature value acquisition for the row shards can then be executed in parallel. For example, the feature values of the row shards can be extracted in parallel by the working nodes of a distributed computing cluster.
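As a single-machine illustration of row sharding, the sketch below splits a small record table into row shards and extracts features from the shards concurrently. A thread pool from the Python standard library stands in for the working nodes of a distributed computing cluster; the function names, the shard size, and the two stand-in features are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, shard_size):
    """Cut the record table into consecutive row shards."""
    return [rows[i:i + shard_size] for i in range(0, len(rows), shard_size)]


def extract_row(row):
    # Stand-in per-row extraction: the result depends only on this row.
    return {"F_AGE": row["age"], "F_PROFILE": row["age"] + "|" + row["job"]}


def extract_shard(shard):
    return [extract_row(row) for row in shard]


def extract_parallel(rows, shard_size=2, workers=4):
    shards = split_rows(rows, shard_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in shard order, so the row order is preserved.
        shard_results = list(pool.map(extract_shard, shards))
    return [features for shard in shard_results for features in shard]


rows = [{"age": str(30 + i), "job": "job%d" % i} for i in range(5)]
parallel_result = extract_parallel(rows)
sequential_result = [extract_row(row) for row in rows]
```

Because each row is processed without reference to any other row, the parallel result matches the sequential one regardless of how the table is sharded.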
In another example, in addition to executing feature extraction for the rows in parallel, data processing within a row may be performed in parallel on a per-feature basis to obtain the feature value of each feature.
It should be noted that, in Fig. 1, the data record obtaining step and the feature extraction configuration item obtaining step are listed in sequence spatially, but this does not imply a temporal order. In fact, there is no restriction on the order in which the data record obtaining step and the feature extraction configuration item obtaining step are executed; provided the logical relationships of the context are not violated, the steps may be carried out in parallel or in reverse order.
An example of a method by which a user sets feature extraction configuration items through a graphical user interface according to an embodiment of the present invention is described below with reference to the drawings. It should be noted that the graphical user interface here is only an example; the present invention may also use any other form of input interface. The feature extraction configuration items set through the interface may be used to form a corresponding configuration file from which each feature extraction configuration item is subsequently read, or they may be applied directly to the feature extraction main program without generating any configuration file.
Fig. 4A shows an example of a graphical user interface 200 for configuring feature extraction according to an exemplary embodiment of the present invention. The graphical user interface 200 of Fig. 4A can be applied to a modeling platform that performs model training and, with suitable modification, to any other feature extraction scenario. Here, the input table 201 "bank basic data" may represent a bank's raw data, the target value 202 "y" represents the label of the training samples, and the output table 203 "bankdata_out" represents the extracted feature table.
In the above graphical user interface 200, at least the fields of the data record that can serve as source fields and the feature extraction configuration items of the predetermined features that have been set can be displayed. In addition, as an example, other information about the data source or data output may also be displayed. In particular, as shown in Fig. 4A, the left region shows each field of the data records in the input table, including the field name 204 and the field attribute 205; the right region shows the configuration page for configuring features. As an example, the configuration page may include a selection-based interface that displays the content options of feature extraction configuration items for manual selection, in which each row corresponds to one specific feature and is configured with that feature's source item 206, processing method 207, and feature name 208.
As an example, according to the user's setting operations on the fields shown in the left region, the feature configuration items set by the user are displayed correspondingly in the right region. In one example, the user can manually edit the configuration items shown in the right region.
In particular, the fields of the data record may first be displayed on the graphical user interface (for example, in the left region). When the user selects (for example, by clicking) one or more of the displayed fields, the selected fields are set as source fields in the configuration page and, while the source fields are selected, a processing method list is displayed on the graphical user interface. Here, as an example, the processing method list may be displayed near the source fields selected by the user so that the user can choose from it the processing method to be shown in the configuration page. In the processing method list, all processing methods may be in an active state; alternatively, the list may include only the processing methods applicable to the selected source field item; alternatively, the list may include all processing methods but show the applicable ones as active and the inapplicable ones as disabled.
Fig. 4B shows an example of a partial graphical user interface 300 that displays a processing method list 302 to the user when a single field (for example, the "age" field) 301 in the left region is selected by the user. For example, when the user clicks the "age" field 301, the processing method list 302 pops up to the right of the "age" field for selection. The processing method list 302 may list all processing methods and highlight the one currently selected by the user. Alternatively, the processing method list 302 may show only the processing methods applicable to the selected "age" field, or may activate only the processing methods applicable to the selected "age" field (for example, showing them as selectable or highlighted) while showing the other processing methods as disabled.
Fig. 4C shows an example of a partial graphical user interface 400 that displays a processing method list 404 to the user when multiple fields 401, 402, and 403 in the left region are selected by the user. This indicates that the user can select more than one source field 401, 402, and 403 on the left; correspondingly, the processing method list 404 pops up so that the user can choose the processing method to apply to these source fields. Similarly, the processing method list 404 may be popped up in any appropriate manner, and it need not include all processing methods; the processing methods shown in the list 404 may be adjusted dynamically according to the source fields selected on the left.
Besides the selection-based interface described above, which displays the content options of feature extraction configuration items for manual selection (for example, by mouse click), other forms of interface for setting feature extraction configuration items may be used, for example a text editing interface for manually editing the configuration file, in which the user can write the "configuration file" directly. Because the content of the configuration file is itself repetitive, writing the "configuration file" can be completed quickly through text editing operations (for example, copying, pasting, dragging).
Fig. 5 shows an exemplary graphical user interface 500 having a region in which feature extraction configuration items can be edited as text. The left side of the graphical user interface 500 is similar to the graphical user interfaces shown in Fig. 4B and Fig. 4C, except that the right region of the graphical user interface 500 shows a text editing interface 501 for manually editing the configuration file. In the text editing interface 501, the user can manually edit the feature extraction configuration items, including the feature name item, source field item, processing method item, processing parameter item, and so on. Through text editing operations performed in the text editing interface (such as copying, pasting, dragging), the user can set the feature extraction configuration items efficiently.
The above two graphical user interfaces may be displayed on the screen simultaneously, or displayed separately according to the user's selection. For example, in response to an interface switching input from the user, the display switches between the text editing interface and the selection-based interface (display switching or activation switching), and the feature extraction configuration item settings made in the interface before the switch are displayed synchronously in the interface after the switch. Accordingly, the user can exploit the operational convenience of both configuration interfaces to set multiple feature extraction modes more efficiently. For example, the user may first complete a representative feature extraction configuration in the selection-based interface through selection inputs such as clicks, then switch to the text editing interface; because the earlier settings are displayed synchronously in the text editing interface, the user can quickly complete the extraction settings for a large number of features using operations such as copy and paste.
The above feature extraction approach can be applied to any suitable scenario; it is described below using the machine learning field as an example.
In the existing machine learning field, in order to train, test, or apply a model on a large amount of structured or unstructured data, considerable manpower is often expended in the feature engineering stage; for example, programmers must write extraction code for each feature in advance according to specific feature extraction rules. Consequently, in modeling products intended for use by customers, such as modeling platforms, the input to the modeling platform usually has to be training data that has already been extracted (that is, already extracted feature vectors), and users can hardly set or adjust the objects and rules of feature extraction flexibly, which limits the use of the modeling platform.
According to another embodiment of the present invention, a machine learning method using the above feature extraction method is provided. The machine learning method can be applied to systems, such as a modeling platform, that make it convenient for users (for example, business personnel) to perform data modeling. It is described below with reference to Fig. 6.
Fig. 6 shows an overall flowchart of a machine learning method 600 that applies the feature extraction method of the above embodiment according to an embodiment of the present invention. Here, as an example, the program implementing the method may be packaged as a dedicated software package (for example, a lib library). For example, steps S610, S620, and S630 may be packaged as a separate software package, so that a consistent feature extraction service is provided whether the software package is called online or offline, overcoming the prior-art defect of inconsistent feature extraction results caused by differing online and offline environments. In addition, step S640 may also be packaged as a separate software package, so that machine learning can be carried out based on the extracted features whether the package is called online or offline.
In step S610, a data record obtaining step is executed to obtain a data record.
In step S620, a feature extraction configuration item obtaining step is executed to obtain feature extraction configuration items that define how to extract predetermined features from the data record, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item. The source field item designates the field or fields of the data record involved in that predetermined feature as source fields, and the processing method item specifies a reference to a data processing function that has been programmed in advance as executable code, wherein the data processing function performs, on the field values of the source fields designated by the source field item, the data processing for extracting that predetermined feature.
In step S630, a feature value obtaining step is executed: data processing is performed on the field values of the data record based on the feature extraction configuration items to obtain the feature values of the predetermined features.
For the specific implementation and functions of steps S610, S620, and S630, reference may be made to steps S110, S120, and S130 described in conjunction with Fig. 1; they are not repeated here.
In step S640, a sample obtaining step is executed: a feature vector is formed based at least in part on the feature values obtained in the feature value obtaining step, to serve as a sample for machine learning.
In one example, this feature extraction on the data record obtains all dimensions of the feature vector; that is, for each data record, data processing is performed on the relevant fields of the data record based on the feature extraction configuration items to obtain the feature value of each feature, and the feature values of these dimensions are combined to form a complete machine learning sample.
In another example, the feature values obtained may cover only some of the dimensions and may be combined with features of other dimensions to form the final feature vector. There is no restriction on the form or source of the other features: they may come from outside, or may be obtained locally using a similar or different feature extraction method.
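The two cases above (extracted features forming the whole vector, or only part of it) can be sketched as follows. The function build_sample, the dictionary layout of a sample, and the value "external_score=0.8" are hypothetical and serve only to show where features from another source would be appended.

```python
from typing import Dict, List, Optional, Sequence


def build_sample(
    extracted: Dict[str, str],
    feature_order: Sequence[str],
    extra_features: Optional[Sequence[str]] = None,
    label_name: str = "label",
) -> dict:
    """Assemble a sample: extracted features in a fixed order, then any other features."""
    vector: List[str] = [extracted[name] for name in feature_order]
    vector.extend(extra_features or [])   # features from another source, if any
    return {"features": vector, "label": extracted.get(label_name)}


extracted = {"F_AGE": "37", "F_JOB": "Teacher", "label": "y"}

# Case 1: the extracted features make up every dimension of the vector.
full = build_sample(extracted, ["F_AGE", "F_JOB"])

# Case 2: the extracted features are only part of the vector.
partial = build_sample(extracted, ["F_AGE"], extra_features=["external_score=0.8"])
```

Fixing the feature order explicitly keeps the dimensions aligned across samples, which matters when the same extraction is reused for training, testing, and prediction.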
In step S650, a machine learning step is executed: machine learning is performed based on the samples. Here, at least one of model training, model testing, and model application may be performed based on the samples.
Here, when model training is performed, there is no particular restriction on the machine learning algorithm used; it may be any of various machine learning methods such as neural networks, Bayesian networks, support vector machines, decision trees, genetic algorithms, and expert systems. It should be noted that, after a model has been built from the training data, test samples can be obtained from the data records used for testing model performance using the same feature extraction method, and the test samples input into the trained model to assess its performance. Likewise, the same feature extraction method can be applied to data records for which the model is to make predictions to obtain application samples, which are input into the model to obtain the corresponding prediction results. Here, there is no restriction on the problem the model addresses; depending on the task, it may be, for example, judging whether a factory workpiece is defective, judging whether a factory environment is safe, judging a person's creditworthiness, and so on.
The machine learning method according to an embodiment of the present invention makes use of the feature extraction method according to an embodiment of the present invention; it is particularly suitable for feature extraction and sample set acquisition on big data, and also makes it easier for users of a modeling platform to participate directly in each stage of machine learning, for example building, training, and applying a model.
According to another embodiment of the present invention, a feature extraction device for performing feature extraction on data records is provided, comprising: a data record acquiring unit configured to obtain a data record; a feature extraction configuration item acquiring unit configured to obtain feature extraction configuration items that define how to extract predetermined features from the data record, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item designates the field or fields of the data record involved in that predetermined feature as source fields, and the processing method item specifies a reference to a data processing function that has been programmed in advance as executable code, wherein the data processing function performs, on the field values of the source fields designated by the source field item, the data processing for extracting that predetermined feature; and a feature value acquiring unit configured to perform data processing on the field values of the data record based on the feature extraction configuration items to obtain the feature values of the predetermined features.
According to another embodiment of the present invention, a machine learning device is provided, which may comprise: a data record acquiring unit configured to obtain a data record; a feature extraction configuration item acquiring unit configured to obtain feature extraction configuration items that define how to extract predetermined features from the data record, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item designates the field or fields of the data record involved in that predetermined feature as source fields, and the processing method item specifies a reference to a data processing function that has been programmed in advance as executable code, wherein the data processing function performs, on the field values of the source fields designated by the source field item, the data processing for extracting that predetermined feature; a feature value acquiring unit configured to perform data processing on the field values of the data record based on the feature extraction configuration items to obtain the feature values of the predetermined features; a training sample obtaining unit configured to form, based at least in part on the feature values obtained by the feature value acquiring unit, a feature vector serving as a sample for machine learning; and a machine learning unit configured to perform machine learning based on the sample.
It should be noted that the above feature extraction device and machine learning device may rely entirely on the running of a computer program to realize the corresponding functions; that is, each unit corresponds to a step in the functional architecture of the computer program, so that the entire device can be invoked through a dedicated software package (for example, a lib library) to realize the corresponding feature extraction or machine learning function online or offline.
On the other hand, each of the above units may also be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments for performing the required tasks may be stored in a computer-readable medium such as a storage medium, and a processor may perform the required tasks.
Here, embodiments of the present invention may be implemented as a computing device including a storage unit and a processor, the storage unit storing a set of computer-executable instructions which, when executed by the processor, performs the above feature extraction method and/or machine learning method.
Fig. 7 shows a configuration block diagram of a computing device 1100 according to an embodiment of the present invention.
As shown in Fig. 7, the computing device 1100 includes a central processing unit 1110, a memory 1130, a display 1140, a network interface 1150, and an input device 1200 that may be connected in a wired or wireless manner. The memory 1130, the display 1140, the network interface 1150, and the input device 1200 are connected to the central processing unit 1110 via a bus 1120. The memory 1130 includes an internal memory 1131 and an external memory 1132. During normal operation of the computing device 1100, the operating system and various application programs reside in the internal memory 1131; the external memory 1132 may be a ROM, a hard disk, or a solid-state disk, on which the BIOS, data, programs, and the like may be stored.
The memory stores a set of computer instructions capable of implementing the feature extraction method and/or machine learning method of the embodiments of the present invention; when the set of computer instructions is executed by the central processing unit, the feature extraction method and/or machine learning method according to the embodiments of the present invention is performed. It should be noted that the central processing unit here may be a physically or logically distributed computing cluster and is not limited to a single-machine computing device.
In particular, according to an embodiment of the present invention, a computing device for performing feature extraction on data records is provided, including a storage unit and a processor, the storage unit storing a set of computer-executable instructions which, when executed by the processor, performs the following steps: a data record obtaining step of obtaining a data record; a feature extraction configuration item obtaining step of obtaining feature extraction configuration items that define how predetermined features are extracted from the data record, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item is used to define the fields of the data record involved in that predetermined feature as source fields, and the processing method item is used to specify a reference to a data processing function programmed in advance as executable code, wherein the data processing function performs, on the field values of the source fields defined by the source field item, the data processing for extracting that predetermined feature; and a feature value obtaining step of performing data processing on the field values of the data record based on the feature extraction configuration items to obtain the feature values of the predetermined features.
According to an embodiment of the present invention, a computing device for performing machine learning is provided, including a storage unit and a processor, the storage unit storing a set of computer-executable instructions which, when executed by the processor, performs the following steps: a data record obtaining step of obtaining a data record; a feature extraction configuration item obtaining step of obtaining feature extraction configuration items that define how predetermined features are extracted from the data record, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item is used to define the fields of the data record involved in that predetermined feature as source fields, and the processing method item is used to specify a reference to a data processing function programmed in advance as executable code, wherein the data processing function performs, on the field values of the source fields defined by the source field item, the data processing for extracting that predetermined feature; a feature value obtaining step of performing data processing on the field values of the data record based on the feature extraction configuration items to obtain the feature values of the predetermined features; a sample obtaining step of forming a feature vector, serving as a sample for machine learning, based at least in part on the feature values obtained in the feature value obtaining step; and a machine learning step of performing machine learning based on the sample.
Various embodiments of the present invention have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for performing feature extraction on data records, comprising:
a data record obtaining step of obtaining a data record;
a feature extraction configuration item obtaining step of obtaining feature extraction configuration items that define how predetermined features are extracted from the data record, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item is used to define the fields of the data record involved in that predetermined feature as source fields, and the processing method item is used to specify a reference to a data processing function programmed in advance as executable code, wherein the data processing function performs, on the field values of the source fields defined by the source field item, the data processing for extracting that predetermined feature; and
a feature value obtaining step of performing data processing on the field values of the data record based on the feature extraction configuration items to obtain the feature values of the predetermined features.
2. The method according to claim 1, wherein the feature extraction configuration item obtaining step includes: reading the feature extraction configuration items from a configuration file in which the feature extraction configuration items are set, or obtaining the feature extraction configuration items according to an input operation of a user, wherein the configuration file is stored locally or received remotely.
3. The method according to claim 1, wherein the feature extraction configuration item obtaining step includes:
displaying to a user an interface for setting the feature extraction configuration items;
generating, according to an input operation performed by the user on the interface, a configuration file in which the feature extraction configuration items are set; and
reading the feature extraction configuration items from the generated configuration file.
4. The method according to claim 3, wherein the interface for setting the feature extraction configuration items is a graphical user interface, and the graphical user interface includes a text editing interface for manually editing the configuration file and/or a selection-import interface that displays content options of the feature extraction configuration items for manual selection.
5. The method according to claim 4, wherein, in the feature extraction configuration item obtaining step, switching between the text editing interface and the selection-import interface is performed in response to an interface switching operation input by the user, and the feature extraction configuration item settings made under the interface before switching are synchronously displayed under the interface after switching.
6. The method according to any one of claims 1-5, wherein the feature extraction configuration item of each predetermined feature further includes a storage location identifier, which is used to indicate the storage region in memory of the calculation coefficients corresponding to the feature values of that predetermined feature.
7. A machine learning method executed by a computer, comprising:
a data record obtaining step of obtaining a data record;
a feature extraction configuration item obtaining step of obtaining feature extraction configuration items that define how predetermined features are extracted from the data record, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item is used to define the fields of the data record involved in that predetermined feature as source fields, and the processing method item is used to specify a reference to a data processing function programmed in advance as executable code, wherein the data processing function performs, on the field values of the source fields defined by the source field item, the data processing for extracting that predetermined feature;
a feature value obtaining step of performing data processing on the field values of the data record based on the feature extraction configuration items to obtain the feature values of the predetermined features;
a sample obtaining step of forming a feature vector, serving as a sample for machine learning, based at least in part on the feature values obtained in the feature value obtaining step; and
a machine learning step of performing machine learning based on the sample.
8. A computing device for performing feature extraction on data records, including a storage unit and a processor, the storage unit storing a set of computer-executable instructions which, when executed by the processor, performs the following steps:
a data record obtaining step of obtaining a data record;
a feature extraction configuration item obtaining step of obtaining feature extraction configuration items that define how predetermined features are extracted from the data record, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item is used to define the fields of the data record involved in that predetermined feature as source fields, and the processing method item is used to specify a reference to a data processing function programmed in advance as executable code, wherein the data processing function performs, on the field values of the source fields defined by the source field item, the data processing for extracting that predetermined feature; and
a feature value obtaining step of performing data processing on the field values of the data record based on the feature extraction configuration items to obtain the feature values of the predetermined features.
9. A computing device for performing machine learning, including a storage unit and a processor, the storage unit storing a set of computer-executable instructions which, when executed by the processor, performs the following steps:
a data record obtaining step of obtaining a data record;
a feature extraction configuration item obtaining step of obtaining feature extraction configuration items that define how predetermined features are extracted from the data record, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item is used to define the fields of the data record involved in that predetermined feature as source fields, and the processing method item is used to specify a reference to a data processing function programmed in advance as executable code, wherein the data processing function performs, on the field values of the source fields defined by the source field item, the data processing for extracting that predetermined feature;
a feature value obtaining step of performing data processing on the field values of the data record based on the feature extraction configuration items to obtain the feature values of the predetermined features;
a sample obtaining step of forming a feature vector, serving as a sample for machine learning, based at least in part on the feature values obtained in the feature value obtaining step; and
a machine learning step of performing machine learning based on the sample.
10. A feature extraction device for performing feature extraction on data records, comprising:
a data record acquiring unit configured to acquire a data record;
a feature extraction configuration item acquiring unit configured to acquire feature extraction configuration items that define how predetermined features are extracted from the data record, wherein the feature extraction configuration item of each predetermined feature includes a source field item and a processing method item, the source field item is used to define the fields of the data record involved in that predetermined feature as source fields, and the processing method item is used to specify a reference to a data processing function programmed in advance as executable code, wherein the data processing function performs, on the field values of the source fields defined by the source field item, the data processing for extracting that predetermined feature; and
a feature value acquiring unit configured to perform data processing on the field values of the data record based on the feature extraction configuration items to obtain the feature values of the predetermined features.
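For illustration only, the configuration-file option recited in claim 2 and the storage location identifier of claim 6 might be sketched in Python as follows. The JSON file format, the key names, the `FeatureConfigItem` class and its `slot` field are assumptions made for this example; the claims do not specify a file format or field names.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FeatureConfigItem:
    feature: str
    source_fields: List[str]      # source field item
    method: str                   # processing method item (function reference)
    slot: Optional[int] = None    # hypothetical storage location identifier


def parse_config_items(raw_items: List[Dict[str, Any]]) -> List[FeatureConfigItem]:
    """Validate raw entries, whether read from a file or entered by a user."""
    items = []
    for raw in raw_items:
        missing = [k for k in ("feature", "source_fields", "method") if not raw.get(k)]
        if missing:
            raise ValueError(f"configuration item lacks {missing}: {raw!r}")
        items.append(FeatureConfigItem(
            feature=raw["feature"],
            source_fields=list(raw["source_fields"]),
            method=raw["method"],
            slot=raw.get("slot"),
        ))
    return items


def load_config_file(path: str) -> List[FeatureConfigItem]:
    """Read feature extraction configuration items from a local JSON file."""
    with open(path, encoding="utf-8") as handle:
        return parse_config_items(json.load(handle))


# Write a small example file to a temporary directory and read it back.
config_text = json.dumps([
    {"feature": "age_bucket", "source_fields": ["age"],
     "method": "bucketize_age", "slot": 1},
    {"feature": "city_gender", "source_fields": ["city", "gender"],
     "method": "combine"},
])
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "features.json")
    with open(config_path, "w", encoding="utf-8") as handle:
        handle.write(config_text)
    loaded = load_config_file(config_path)
```

The same `parse_config_items` function could accept entries collected from a user interface, which is one way to keep the file-based and input-based paths of claim 2 consistent.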
CN201910743847.1A 2016-01-08 2016-01-08 Feature Extraction Method, machine learning method and its device Pending CN110442417A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910743847.1A CN110442417A (en) 2016-01-08 2016-01-08 Feature Extraction Method, machine learning method and its device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610011587.5A CN105677353A (en) 2016-01-08 2016-01-08 Feature extraction method and machine learning method and device thereof
CN201910743847.1A CN110442417A (en) 2016-01-08 2016-01-08 Feature Extraction Method, machine learning method and its device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201610011587.5A Division CN105677353A (en) 2016-01-08 2016-01-08 Feature extraction method and machine learning method and device thereof

Publications (1)

Publication Number Publication Date
CN110442417A true CN110442417A (en) 2019-11-12

Family

ID=56299543

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910743847.1A Pending CN110442417A (en) 2016-01-08 2016-01-08 Feature Extraction Method, machine learning method and its device
CN201610011587.5A Pending CN105677353A (en) 2016-01-08 2016-01-08 Feature extraction method and machine learning method and device thereof

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201610011587.5A Pending CN105677353A (en) 2016-01-08 2016-01-08 Feature extraction method and machine learning method and device thereof

Country Status (1)

Country Link
CN (2) CN110442417A (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610239B (en) * 2016-09-27 2024-04-12 第四范式(北京)技术有限公司 Feature processing method and feature processing system for machine learning
CN108154237B (en) * 2016-12-06 2022-04-05 华为技术有限公司 Data processing system and method
CN106779088B (en) * 2016-12-06 2019-04-23 第四范式(北京)技术有限公司 Execute the method and system of machine learning process
CN107169574A (en) * 2017-05-05 2017-09-15 第四范式(北京)技术有限公司 Using nested machine learning model come the method and system of perform prediction
CN107169573A (en) * 2017-05-05 2017-09-15 第四范式(北京)技术有限公司 Using composite machine learning model come the method and system of perform prediction
CN107273131A (en) * 2017-06-22 2017-10-20 艾凯克斯(嘉兴)信息科技有限公司 A kind of machine learning method applied to Configurable BOM
CN113220688A (en) * 2017-07-04 2021-08-06 第四范式(北京)技术有限公司 Method and device for splicing data records
CN107766946B (en) * 2017-09-28 2020-06-23 第四范式(北京)技术有限公司 Method and system for generating combined features of machine learning samples
CN108008942B (en) * 2017-11-16 2020-04-07 第四范式(北京)技术有限公司 Method and system for processing data records
CN108090516A (en) * 2017-12-27 2018-05-29 第四范式(北京)技术有限公司 Automatically generate the method and system of the feature of machine learning sample
CN108228861B (en) * 2018-01-12 2020-09-01 第四范式(北京)技术有限公司 Method and system for performing feature engineering for machine learning
CN108681426B (en) * 2018-05-25 2020-08-11 第四范式(北京)技术有限公司 Method and system for performing feature processing on data
CN110209902B (en) * 2018-08-17 2023-11-14 第四范式(北京)技术有限公司 Method and system for visualizing feature generation process in machine learning process
CN109144648B (en) * 2018-08-21 2020-06-23 第四范式(北京)技术有限公司 Method and system for uniformly performing feature extraction
CN111273953B (en) * 2018-11-19 2021-07-16 Oppo广东移动通信有限公司 Model processing method, device, terminal and storage medium
CN110427222A (en) * 2019-06-24 2019-11-08 北京达佳互联信息技术有限公司 Data load method, device, electronic equipment and storage medium
CN110334131A (en) * 2019-07-09 2019-10-15 西安点告网络科技有限公司 The method and apparatus of feature extraction for machine learning model
CN110569271B (en) * 2019-09-17 2022-11-15 第四范式(北京)技术有限公司 Data processing method and system for extracting features
CN110633078B (en) * 2019-09-20 2020-12-15 第四范式(北京)技术有限公司 Method and device for automatically generating feature calculation codes
CN110795424A (en) * 2019-09-30 2020-02-14 北京淇瑀信息科技有限公司 Feature engineering variable data request processing method and device and electronic equipment
CN110851500B (en) * 2019-11-07 2022-10-28 北京集奥聚合科技有限公司 Method for generating expert characteristic dimension required by machine learning modeling

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060271533A1 (en) * 2005-05-26 2006-11-30 Kabushiki Kaisha Toshiba Method and apparatus for generating time-series data from Web pages
CN101958987A (en) * 2009-07-14 2011-01-26 中国电信股份有限公司 Method and system for dynamically converting telecommunications service data
CN102243649A (en) * 2011-06-07 2011-11-16 上海交通大学 Semi-automatic information extraction processing device of ontology
CN102622354A (en) * 2011-01-27 2012-08-01 北京世纪读秀技术有限公司 Aggregated data quick searching method based on feature vector
CN103914478A (en) * 2013-01-06 2014-07-09 阿里巴巴集团控股有限公司 Webpage training method and system and webpage prediction method and system
CN104424263A (en) * 2013-08-29 2015-03-18 腾讯科技(深圳)有限公司 Data recording method and data recording device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100280990A1 (en) * 2009-04-30 2010-11-04 Castellanos Maria G Etl for process data warehouse
CN101763261B (en) * 2009-12-28 2013-01-23 山东中创软件商用中间件股份有限公司 Method and system for extracting, converting and loading data
CN104881488B (en) * 2015-06-05 2017-04-05 焦点科技股份有限公司 Configurable information extraction method based on relation table

Also Published As

Publication number Publication date
CN105677353A (en) 2016-06-15

Similar Documents

Publication Publication Date Title
CN110442417A (en) Feature Extraction Method, machine learning method and its device
CN105487864B (en) The method and apparatus of Code automatic build
US10769721B2 (en) Intelligent product requirement configurator
US10248720B1 (en) Systems and methods for preparing raw data for use in data visualizations
CN113939829A (en) Data sampling for model exploration
CN107578140A (en) Guide analysis system and method
WO2018079225A1 (en) Automatic prediction system, automatic prediction method and automatic prediction program
Pandey et al. Examining the Role of Enterprise Resource Planning (ERP) in Improving Business Operations in Companies
US20220351004A1 (en) Industry specific machine learning applications
US9298686B2 (en) System and method for simulating discrete financial forecast calculations
CN104541297A (en) Extensibility for sales predictor (SPE)
CN112463986A (en) Information storage method and device
Efford et al. Package ‘secr’
US20220351051A1 (en) Analysis system, apparatus, control method, and program
Winters Practical predictive analytics
CN114692889A (en) Meta-feature training model for machine learning algorithm
CN108701153B (en) Method, system and computer readable storage medium for responding to natural language query
Van Orshoven et al. Upgrading geographic information systems to spatio-temporal decision support systems
WO2020060720A1 (en) Analyzing natural language expressions in a data visualization user interface
CN110333844B (en) Calculation formula processing method and device
Budaev et al. Development of the Web Service «Analysis of Demographic Indicators of the Region»
CN111199287A (en) Feature engineering real-time recommendation method and device and electronic equipment
Gervas Analysis of User Interface design methods
CN113821296B (en) Visual interface generation method, electronic equipment and storage medium
Loureiro et al. Predicting multiple domain queue waiting time via machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination