CN108108820A - Method and system for selecting features of machine learning samples - Google Patents

Method and system for selecting features of machine learning samples

Info

Publication number
CN108108820A
CN108108820A (application CN201711383339.4A)
Authority
CN
China
Prior art keywords
feature
candidate
machine learning
candidate feature
importance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711383339.4A
Other languages
Chinese (zh)
Inventor
戴文渊
杨强
陈雨强
罗远飞
涂威威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd filed Critical 4Paradigm Beijing Technology Co Ltd
Priority to CN202310907091.6A priority Critical patent/CN116882520A/en
Priority to CN201711383339.4A priority patent/CN108108820A/en
Publication of CN108108820A publication Critical patent/CN108108820A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 — Machine learning

Abstract

A method and system for selecting features of machine learning samples are provided. The method includes: (A) dividing a candidate feature set into multiple candidate feature subsets; (B) for each candidate feature subset, obtaining a corresponding feature-pool machine learning model; (C) determining the importance of each candidate feature within the corresponding candidate feature subset according to the difference between the effects of the feature-pool machine learning model on an original test data set and on a converted test data set; and (D) for each candidate feature subset, according to the importance of each of its candidate features, selecting at least one candidate feature of higher importance as a target feature of the machine learning sample. According to the method and system, relatively important sample features can be effectively selected even when computing resources are limited.

Description

Method and system for selecting features of machine learning samples
Technical field
The present invention relates generally to the field of artificial intelligence, and more particularly, to a method and system for selecting features of machine learning samples.
Background technology
With the advent of massive data, artificial intelligence technology has developed rapidly. In order to mine value from massive data records, samples suitable for machine learning must be generated.
Here, each data record can be regarded as a description of an event or object, corresponding to an example or sample. A data record includes items reflecting the performance or properties of the event or object in certain respects; these items may be called "attributes". By applying processing such as feature engineering to the attribute information of data records, machine learning samples including various features can be generated.
In practice, the prediction effect of a machine learning model is related to the choice of model, the available data, and the extraction of sample features. In addition, applying machine learning techniques must also confront practical problems such as limited computing resources and insufficient sample data. Therefore, how to efficiently extract the features of machine learning samples from the attributes of raw data records has a great influence on the effect of the machine learning model. For example, the expected split gain of each feature can be computed from a tree model trained with XGBoost, feature importance can then be calculated, and features can be screened based on that importance. Although this approach can take interactions between features into account, its training cost is high, and different parameter settings strongly affect the resulting feature importance.
In fact, during feature screening, technicians generally need not only to master machine learning knowledge but also to have a deep understanding of the actual prediction problem, and prediction problems are often intertwined with the varied practical experience of different industries, making it extremely difficult to achieve a satisfactory effect.
Summary of the invention
Exemplary embodiments of the present invention are intended to overcome the defect of the prior art that it is difficult to effectively screen out features of machine learning samples.
According to an exemplary embodiment of the present invention, a method for selecting features of machine learning samples is provided, including: (A) dividing a candidate feature set into multiple candidate feature subsets; (B) for each candidate feature subset, obtaining a corresponding feature-pool machine learning model, wherein the feature-pool machine learning model corresponds to said each candidate feature subset; (C) determining the importance of each candidate feature within the corresponding candidate feature subset according to the difference between the effects of the feature-pool machine learning model on an original test data set and on a converted test data set, wherein the converted test data set refers to the data set obtained by replacing, in the original test data set, the original values of the candidate feature whose importance is to be determined with transformed values; and (D) for each candidate feature subset, according to the importance of each of its candidate features, selecting at least one candidate feature of higher importance as a target feature of the machine learning sample.
Optionally, in the method, the transformed values include at least one of the following: zeros, random numbers, and the values obtained by shuffling the order of the original values, in the original test data set, of the candidate feature whose importance is to be determined.
Optionally, the method further includes: (E) removing the target features from the candidate feature set to update the candidate feature set; and, after step (E), performing the method again from step (A) based on the updated candidate feature set, until the selection of all target features is completed.
Optionally, in the method, in step (E), new candidate features are also added to update the candidate feature set at the same time as the target features are removed from it.
Optionally, in the method, the new candidate features are combined features newly generated by performing feature combination among candidate features.
Optionally, in the method, in step (B), the feature-pool machine learning model corresponding to each candidate feature subset is obtained by training multiple feature-pool machine learning models in parallel.
Optionally, the method further includes: (E) taking the selected target features as the updated candidate feature set; and, after step (E), performing the method again from step (A) based on the updated candidate feature set, until a preset termination condition for target feature screening is met.
According to another exemplary embodiment of the present invention, a system for selecting features of machine learning samples is provided, including: a candidate feature subset division device for dividing a candidate feature set into multiple candidate feature subsets; a feature-pool machine learning model acquisition device for obtaining, for each candidate feature subset, a corresponding feature-pool machine learning model, wherein the feature-pool machine learning model corresponds to said each candidate feature subset; a candidate feature importance determination device for determining the importance of each candidate feature within the corresponding candidate feature subset according to the difference between the effects of the feature-pool machine learning model on an original test data set and on a converted test data set, wherein the converted test data set refers to the data set obtained by replacing, in the original test data set, the original values of the candidate feature whose importance is to be determined with transformed values; and a target feature selection device for selecting, for each candidate feature subset and according to the importance of each of its candidate features, at least one candidate feature of higher importance as a target feature of the machine learning sample.
Optionally, in the system, the transformed values include at least one of the following: zeros, random numbers, and the values obtained by shuffling the order of the original values, in the original test data set, of the candidate feature whose importance is to be determined.
Optionally, in the system, the candidate feature subset division device also removes the target features from the candidate feature set to update it, and divides the updated candidate feature set into multiple candidate feature subsets, until the selection of all target features is completed.
Optionally, in the system, the candidate feature subset division device also adds new candidate features to update the candidate feature set at the same time as it removes the target features from it.
Optionally, in the system, the new candidate features are combined features newly generated by performing feature combination among candidate features.
Optionally, in the system, the feature-pool machine learning model acquisition device obtains the feature-pool machine learning model corresponding to each candidate feature subset by training multiple feature-pool machine learning models in parallel.
Optionally, in the system, the candidate feature subset division device also takes the selected target features as the updated candidate feature set, and divides the updated candidate feature set into multiple candidate feature subsets, until a preset termination condition for target feature screening is met.
According to another exemplary embodiment of the present invention, a computer-readable medium for selecting features of machine learning samples is provided, wherein a computer program for executing any of the above methods for selecting features of machine learning samples is recorded on the computer-readable medium.
According to another exemplary embodiment of the present invention, a computing device for selecting features of machine learning samples is provided, including a storage unit and a processor, wherein a set of computer-executable instructions is stored in the storage unit, and when the set of computer-executable instructions is executed by the processor, any of the above methods for selecting features of machine learning samples is performed.
In the method and system for selecting features of machine learning samples according to exemplary embodiments of the present invention, the candidate feature set is divided into subsets, and for each candidate feature subset thus obtained a feature-pool machine learning model is used, with a specific importance measure, to determine the importance of each candidate feature therein, so that relatively important sample features can be effectively selected even when computing resources are limited.
Description of the drawings
These and/or other aspects and advantages of the present invention will become clearer and easier to understand from the following detailed description of embodiments of the present invention in conjunction with the accompanying drawings, in which:
Fig. 1 shows a block diagram of a system for selecting features of machine learning samples according to an exemplary embodiment of the present invention;
Fig. 2 shows a flowchart of a method for selecting features of machine learning samples according to an exemplary embodiment of the present invention;
Fig. 3 shows a flowchart of a method for training a feature-pool machine learning model according to an exemplary embodiment of the present invention; and
Fig. 4 shows a flowchart of a method for selecting features of machine learning samples according to another exemplary embodiment of the present invention.
Detailed description
In order to enable those skilled in the art to better understand the present invention, exemplary embodiments of the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments.
In an exemplary embodiment of the present invention, features of machine learning samples are screened in the following manner: all candidate features are divided into multiple subsets; for each candidate feature subset, the importance of each candidate feature therein is determined from the difference in performance of the corresponding feature-pool machine learning model on different test data sets; and the more important features are selected from each subset as target features of the machine learning sample.
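The screening flow just described can be sketched end to end. The snippet below is a toy illustration under stated assumptions: the nearest-centroid "feature-pool model", the accuracy effect measure, and the synthetic data are all stand-ins invented for the example, not the patented implementation.

```python
import random

# Toy data: 6 candidate features f0..f5; the label depends only on f0 and f2,
# the rest are noise. (All names and the stand-in model are illustrative.)
rng = random.Random(0)
data = [{f"f{i}": rng.random() for i in range(6)} for _ in range(400)]
for row in data:
    row["label"] = 1 if row["f0"] + row["f2"] > 1.0 else 0
train, test = data[:300], data[300:]

def train_model(rows, feats):
    """Stand-in 'feature-pool model': nearest class centroid over feats."""
    mean = {c: {f: 0.0 for f in feats} for c in (0, 1)}
    count = {0: 0, 1: 0}
    for r in rows:
        c = r["label"]
        count[c] += 1
        for f in feats:
            mean[c][f] += r[f]
    for c in (0, 1):
        for f in feats:
            mean[c][f] /= max(count[c], 1)
    return mean

def evaluate(model, rows, feats):
    """Effect measure: plain accuracy (the patent text names AUC / log loss)."""
    correct = 0
    for r in rows:
        d = {c: sum((r[f] - model[c][f]) ** 2 for f in feats) for c in (0, 1)}
        correct += (0 if d[0] < d[1] else 1) == r["label"]
    return correct / len(rows)

# Steps (A)-(D): divide candidates into subsets, fit one model per subset,
# score each feature by the effect drop after shuffling its test values
# (one of the listed transformed-value options), keep the best per subset.
candidates = [f"f{i}" for i in range(6)]
rng.shuffle(candidates)
subsets = [candidates[i::2] for i in range(2)]
selected = []
for subset in subsets:
    model = train_model(train, subset)
    base = evaluate(model, test, subset)
    importance = {}
    for f in subset:
        shuffled = [r[f] for r in test]
        rng.shuffle(shuffled)
        converted = [dict(r, **{f: v}) for r, v in zip(test, shuffled)]
        importance[f] = base - evaluate(model, converted, subset)
    selected.append(max(subset, key=importance.get))
print(sorted(selected))
```

Any real model whose test-set effect can be computed could replace the centroid stand-in; the subset split is what keeps each training run small when computing resources are limited.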
Here, machine learning is an inevitable product of the development of artificial intelligence research to a certain stage; it aims to improve the performance of a system itself, by computational means and using experience. In a computer system, "experience" usually exists in the form of "data", and a "model" can be generated from data by a machine learning algorithm. That is, if empirical data are supplied to a machine learning algorithm, a model can be generated based on these empirical data, and when confronted with a new situation, the model provides a corresponding judgment, i.e., a prediction result. Whether a machine learning model is being trained or a trained machine learning model is being used for prediction, the data must be converted into machine learning samples including various features. Machine learning may be implemented in the form of "supervised learning", "unsupervised learning", or "semi-supervised learning"; it should be noted that exemplary embodiments of the present invention impose no particular limitation on the specific machine learning algorithm. It should also be noted that other means, such as statistical algorithms, may be combined during the training and application of a model.
Fig. 1 shows a block diagram of a system for selecting features of machine learning samples according to an exemplary embodiment of the present invention. The feature selection system shown in Fig. 1 includes a candidate feature subset division device 100, a feature-pool machine learning model acquisition device 200, a candidate feature importance determination device 300, and a target feature selection device 400.
Specifically, the candidate feature subset division device 100 is used to divide a candidate feature set into multiple candidate feature subsets. Here, the candidate feature set may include at least one candidate feature, which may be a feature obtained by applying any feature processing to the attribute information of data records. Accordingly, the candidate feature subset division device 100 may divide the candidate features included in the candidate feature set in any appropriate manner to obtain multiple candidate feature subsets. As an example, the candidate feature subset division device 100 may divide all candidate features randomly so that each candidate feature subset contains the same number of candidate features. Alternatively, the candidate feature subset division device 100 may place candidate features that are consistent and/or associated in some respect into the same candidate feature subset; for example, a candidate feature subset after division may contain only a group of features of the same value type (that is, discrete features, or continuous features, etc.); as another example, a candidate feature subset after division may contain only a group of features with similar business meaning (for example, features about the user itself, or features about transaction properties, etc.). It should be understood that exemplary embodiments of the present invention do not limit the specific division manner of the candidate feature subsets.
Here, the candidate feature subset division device 100 may receive the candidate feature set from another component in the system or from outside the system, and divide the received candidate feature set.
Alternatively, the candidate feature subset division device 100 may additionally be responsible for generating candidate features based on the attribute information of data records. For this purpose, as an example, the candidate feature subset division device 100 may further acquire data records, wherein the data records include multiple pieces of attribute information. For example, the candidate feature subset division device 100 may acquire labeled historical data records for supervised machine learning.
The above historical data records may be data generated online, data previously generated and stored, or data received from outside through an input device or a transmission medium. These data may relate to attribute information of individuals, enterprises, or organizations, such as identity, education, occupation, assets, contact information, liabilities, income, profit, and tax payments. Alternatively, these data may relate to attribute information of business-related items, such as the transaction amount of a contract, the two transacting parties, the subject matter, and the place of transaction. It should be noted that the attribute information mentioned in exemplary embodiments of the present invention may relate to the performance or properties of any object or matter in some respect, and is not limited to defining or describing individuals, objects, organizations, units, institutions, projects, events, etc.
The candidate feature subset division device 100 may acquire structured or unstructured data from different sources, for example, text data or numerical data. The acquired data records can be used to form machine learning samples and to participate in the training/testing process of a machine learning model. These data may come from inside the entity expecting to obtain the model prediction results, for example, from a bank, enterprise, or school expecting to obtain the prediction results; they may also come from outside such entities, for example, from data providers, the Internet (e.g., social networking sites), mobile operators, app operators, express companies, credit institutions, etc. Optionally, the above internal data and external data may be used in combination to form machine learning samples carrying more information.
The above data may be input to the candidate feature subset division device 100 through an input device, may be automatically generated by the candidate feature subset division device 100 from existing data, or may be acquired by the candidate feature subset division device 100 from a network (for example, a storage medium on the network, such as a data warehouse). In addition, an intermediate data exchange device, such as a server, may help the candidate feature subset division device 100 acquire corresponding data from an external data source. Here, the acquired data may be converted into an easily processed form by a data conversion module, such as a text analysis module, in the candidate feature subset division device 100.
Here, the candidate feature subset division device 100 may first generate candidate features based on the multiple pieces of attribute information of the historical data records. In this process, the candidate feature subset division device 100 may use any appropriate feature processing manner to obtain single first-order candidate features or combined candidate features of higher order (for example, second order, third order, etc.), where "order" denotes the number of single features participating in the combination.
As an example, the candidate features generated by the candidate feature subset division device 100 may be continuous features, wherein the candidate feature subset division device 100 generates the continuous features by processing at least one piece of continuous-valued attribute information and/or discrete-valued attribute information among the multiple pieces of attribute information.
Specifically, corresponding continuous features can be generated based on at least part of the attribute information of the historical data records. Here, a continuous feature is a kind of feature opposite to a discrete feature (for example, a categorical feature); its value can be a numerical value with a certain continuity, such as distance, age, or amount of money. In contrast, as an example, the values of a discrete feature have no continuity; for example, they may be unordered categories such as "from Beijing", "from Shanghai", or "from Tianjin", or "gender is male", "gender is female".
For example, certain continuous-valued attribute information in the historical data records may be used directly as the corresponding continuous feature; for instance, attribute information such as distance, age, or amount of money may be used directly as the corresponding continuous feature. That is, a continuous feature may itself be formed by continuous-valued attribute information among the multiple pieces of attribute information. Alternatively, some attribute information (for example, continuous-valued and/or discrete-valued attribute information) in the historical data records may be processed to obtain corresponding continuous features; for example, the ratio of height to weight may be used as a corresponding continuous feature. In particular, the continuous feature may be formed by applying a continuous transform to discrete-valued attribute information among the multiple pieces of attribute information. As an example, the continuous transform may indicate computing statistics over the values of the discrete-valued attribute information. For example, a continuous feature may indicate the statistics of some discrete-valued attribute information with respect to the prediction target of the machine learning model. For instance, in an example of predicting purchase probability, the discrete-valued attribute information "seller merchant ID" may be transformed into a probability-statistics feature of the historical purchase behavior associated with the corresponding seller merchant ID.
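One common way to realize such a continuous transform of a discrete attribute is a smoothed target-statistic encoding. The helper below is an illustration only: the function name, the smoothing scheme, and the toy records are assumptions, not taken from the patent.

```python
from collections import defaultdict

def purchase_rate_encoding(records, key, label, prior=0.5, smoothing=10):
    """Replace a discrete attribute (e.g. a seller merchant ID) with the
    smoothed historical probability of the prediction target for that value."""
    pos = defaultdict(int)    # positive-label count per attribute value
    total = defaultdict(int)  # total count per attribute value
    for r in records:
        total[r[key]] += 1
        pos[r[key]] += r[label]
    # Shrink rare values toward the prior so one lucky sale does not dominate.
    return {k: (pos[k] + prior * smoothing) / (total[k] + smoothing) for k in total}

history = [
    {"merchant": "m1", "bought": 1}, {"merchant": "m1", "bought": 1},
    {"merchant": "m1", "bought": 0}, {"merchant": "m2", "bought": 0},
]
enc = purchase_rate_encoding(history, "merchant", "bought")
print(round(enc["m1"], 3))  # (2 + 0.5*10) / (3 + 10) ≈ 0.538
```

The resulting mapping turns each discrete value into one continuous statistic, which can then participate in arithmetic combinations like any other continuous feature.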
Continuous features as described above may be combined with one another by means such as arithmetic operations, to serve as combined candidate features according to exemplary embodiments of the present invention.
As another example, the candidate features generated by the candidate feature subset division device 100 may be discrete features, wherein the candidate feature subset division device 100 generates the discrete features by processing at least one piece of continuous-valued attribute information and/or discrete-valued attribute information among the multiple pieces of attribute information.
Specifically, corresponding discrete features can be generated based on at least part of the attribute information of the historical data records. For example, certain discrete-valued attribute information in the historical data records may be used directly as the corresponding discrete feature; that is, a discrete feature may itself be formed by discrete-valued attribute information among the multiple pieces of attribute information. Alternatively, some attribute information (for example, continuous-valued and/or discrete-valued attribute information) in the historical data records may be processed to obtain corresponding discrete features.
Here, continuous features (for example, continuous-valued attribute information itself, or continuous features formed from discrete-valued attribute information through a continuous transform) may be discretized to obtain corresponding discrete features. Preferably, when discretizing a continuous feature, the candidate feature subset division device 100 may perform, for each continuous feature, at least one binning operation to generate discrete features composed of at least one binning feature, where each binning operation corresponds to one binning feature, so that multiple discrete features that characterize some attributes of the original data records from different angles and at different scales/levels can be obtained simultaneously.
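As a rough sketch of what such binning operations can look like, the snippet below derives two binning features at different granularities from one continuous feature via equal-width binning. This is one of many possible binning schemes; the helper name and the data are illustrative, not from the patent.

```python
def equal_width_bin(values, n_bins):
    """One binning operation: map each continuous value to a bin index
    over n_bins equally wide intervals spanning the observed range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [18, 22, 35, 47, 51, 64, 70]
# Two binning operations at different scales -> two binning features
coarse = equal_width_bin(ages, 2)
fine = equal_width_bin(ages, 4)
print(coarse)  # [0, 0, 0, 1, 1, 1, 1]
print(fine)    # [0, 0, 1, 2, 2, 3, 3]
```

Running several such operations per continuous feature (varying the bin count or switching to equal-frequency bins) yields the multiple discrete views of one attribute that the text describes.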
After the candidate feature subset division device 100 has divided out the candidate feature subsets, the feature-pool machine learning model acquisition device 200 may further obtain, for each candidate feature subset, a corresponding feature-pool machine learning model, wherein the feature-pool machine learning model corresponds to said each candidate feature subset.
According to an exemplary embodiment of the present invention, a corresponding feature-pool machine learning model must be obtained for each candidate feature subset. Here, the feature-pool machine learning model acquisition device 200 may itself complete the training of the feature-pool machine learning model, or it may obtain a trained feature-pool machine learning model from outside. Here, the sample feature part of the feature-pool machine learning model may include the candidate features included in said each candidate feature subset. Accordingly, as an example, training samples of the feature-pool machine learning model may be constructed, and the feature-pool machine learning model may be trained based on these training samples. It should be noted that exemplary embodiments of the present invention do not limit the algorithm of the feature-pool machine learning model. Preferably, the feature-pool machine learning models corresponding to different candidate feature subsets may be based on the same machine learning algorithm.
The candidate feature importance determination device 300 is used to determine the importance of each candidate feature within the corresponding candidate feature subset according to the difference between the effects of the feature-pool machine learning model on an original test data set and on a converted test data set, where the converted test data set refers to the data set obtained by replacing, in the original test data set, the original values of the candidate feature whose importance is to be determined with transformed values.
Here, the candidate feature importance determination device 300 may use the feature-pool machine learning model corresponding to each candidate feature subset to determine the importance of each candidate feature in the respective subset. As an example, the effect of a feature-pool machine learning model may include its AUC (Area Under the ROC (Receiver Operating Characteristic) Curve) or its log loss (logistic loss), etc.
As an example, suppose some candidate feature subset includes three features {f1, f3, f5} among all candidate features {f1, f2, …, fn}; accordingly, the AUC of the feature-pool machine learning model on the original test data set can reflect the predictive ability of the feature set {f1, f3, f5}. Here, to determine the importance of candidate feature f5, the original values of feature f5 in each test sample included in the original test data set may be processed to obtain a converted test data set, and the AUC of the feature-pool machine learning model on the converted test data set may then be obtained. On this basis, the difference between the above two AUCs can be used to reflect the importance of candidate feature f5. As an example, in the conversion process, the transformed values may include at least one of the following: zeros, random numbers, and the values obtained by shuffling the order of the original values, in the original test data set, of the candidate feature whose importance is to be determined. That is, the original value of feature f5 in each original test sample may be replaced with zero, with a random number, or with a value obtained after shuffling the order of the original values of feature f5. Here, when determining the importance of each candidate feature within a candidate feature subset, it is preferable to use the same original test data set and, correspondingly, each respective converted test data set.
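With AUC as the effect measure, the importance computation reduces to one subtraction. The sketch below computes AUC from scratch via the pairwise-ranking formulation and uses made-up model scores: the scores, and the premise that the second set comes from shuffling f5's values, are illustrative assumptions, not real model output.

```python
def auc(labels, scores):
    """AUC via pairwise comparison: P(pos score > neg score) + 0.5 * P(tie)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels      = [1, 1, 0, 0, 1, 0]
orig_scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4]  # model scores on the original test set
conv_scores = [0.3, 0.8, 0.9, 0.2, 0.4, 0.7]  # scores after shuffling f5's values
importance_f5 = auc(labels, orig_scores) - auc(labels, conv_scores)
print(round(importance_f5, 3))  # 0.556 — a large AUC drop marks f5 as important
```

A feature whose perturbation leaves the AUC nearly unchanged would score close to zero, which is exactly the signal used to rank candidates within the subset.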
The target feature selection device 400 selects, for each candidate feature subset and according to the importance of each of its candidate features, at least one candidate feature of higher importance as a target feature of the machine learning sample. Here, the target feature selection device 400 may select candidate features of higher importance from each candidate feature subset, respectively, to serve as target features. For example, the target feature selection device 400 may automatically select target features according to a preset rule (for example, selecting from each candidate feature subset the predetermined number of candidate features with the highest importance); alternatively, the target feature selection device 400 may select target features from the candidate feature subsets according to a user's instruction. For this purpose, an importance comparison of the candidate features in each candidate feature subset may be displayed to the user; accordingly, the target feature selection device 400 may receive the user's instruction for selecting target features therefrom, and select the target features according to the user's instruction.
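A minimal sketch of the preset-rule selection option (keep the k highest-importance features per subset); the function name, feature names, and importance values here are made up for illustration:

```python
def pick_targets(importance_by_subset, k):
    """From each candidate feature subset, keep the k features whose
    importance scores (effect differences) are highest."""
    targets = []
    for imp in importance_by_subset:
        ranked = sorted(imp, key=imp.get, reverse=True)
        targets.extend(ranked[:k])
    return targets

subset_importances = [
    {"f1": 0.20, "f3": 0.01, "f5": 0.12},  # subset 1
    {"f2": 0.05, "f4": 0.30},              # subset 2
]
print(pick_targets(subset_importances, 1))  # ['f1', 'f4']
```

The user-driven alternative in the text would replace the `sorted(...)[:k]` rule with a selection received from the displayed importance comparison.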
By the above-mentioned means, a part of relatively important target signature can be effectively filtered out among candidate feature.Make For optional mode, these target signatures can form final machine learning sample individually or with reference to other features.
A flowchart of a method for selecting features of machine learning samples according to an exemplary embodiment of the present invention is described below with reference to Fig. 2. Here, as an example, the method shown in Fig. 2 may be performed by the feature selection system shown in Fig. 1, may be implemented entirely in software by a computer program, or may be performed by a computing device of a particular configuration. For convenience of description, it is assumed that the method shown in Fig. 2 is performed by the feature selection system shown in Fig. 1.
Referring to Fig. 2, in step S100, the candidate feature set is divided into a plurality of candidate feature subsets by the candidate feature subset division device 100.
As described above, the candidate feature subset division device 100 may divide the candidate feature set in any appropriate manner. Here, the candidate feature subset division device 100 may receive a candidate feature set provided by another party and divide the received candidate feature set; alternatively, the candidate feature subset division device 100 may generate the candidate feature set itself, in which case, as an example, it may additionally be responsible for performing processing such as feature extraction on the data records to obtain the corresponding candidate feature set.
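Since the division may be performed "in any appropriate manner," the following is purely one illustrative possibility (not prescribed by the disclosure): a round-robin split of the candidate feature set into subsets of roughly equal size.

```python
def divide_candidate_features(candidate_features, num_subsets):
    """Divide a candidate feature set into num_subsets candidate
    feature subsets by round-robin assignment."""
    subsets = [[] for _ in range(num_subsets)]
    for i, feature in enumerate(candidate_features):
        subsets[i % num_subsets].append(feature)
    return subsets

print(divide_candidate_features(["f1", "f2", "f3", "f4", "f5"], 2))
# -> [['f1', 'f3', 'f5'], ['f2', 'f4']]
```

Any other partition rule (by feature type, by value space scale, etc.) would serve equally well as input to the later steps.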
Next, in step S200, for each candidate feature subset, a corresponding feature pool machine learning model is obtained by the feature pool machine learning model acquisition device 200, wherein the feature pool machine learning model corresponds to said each candidate feature subset.
As described above, the feature pool machine learning model acquisition device 200 may itself complete the training of the feature pool machine learning models, or may obtain trained feature pool machine learning models from an external source.
As an example, a flowchart of a method of training a feature pool machine learning model according to an exemplary embodiment of the present invention is described below in conjunction with Fig. 3. This method may be performed by the feature selection system described in Fig. 1, or by other systems or devices.
Referring to Fig. 3, in step S110, historical data records may be obtained. Here, the historical data records may be obtained by the candidate feature subset division device 100 shown in Fig. 1, by other devices not shown in the system of Fig. 1, or by devices external to the system. As an example, a historical data record may include a label regarding the machine learning problem and at least one piece of attribute information used to generate each feature of the machine learning sample.
Here, a historical data record is a true record regarding the machine learning problem to be predicted, and may include two parts: attribute information and a label. Such historical data records can be used to form machine learning samples as the material of machine learning, and the exemplary embodiment of the present invention is intended to screen out the comparatively important machine learning sample features generated based on the attribute information.
In particular, as an example, historical data may be collected in a manual, semi-automatic, or fully automatic manner, or the collected raw historical data may be processed so that the processed historical data records have an appropriate format or form. As an example, historical data may be collected in batches.
Here, historical data records manually entered by a user may be received via an input device (for example, a workstation). In addition, historical data records may be extracted from a data source system in a fully automatic manner, for example, by systematically requesting the data source and obtaining the requested historical data from the response, via a timer mechanism implemented in software, firmware, hardware, or a combination thereof. The data source may include one or more databases or other servers. The fully automatic mode of obtaining data may be realized via an internal network and/or an external network, which may include transmitting encrypted data over the Internet. Where servers, databases, networks, and the like are configured to communicate with one another, data collection may be carried out automatically without manual intervention; note, however, that certain user input operations may still exist in this mode. The semi-automatic mode lies between the manual mode and the fully automatic mode, and differs from the fully automatic mode in that a trigger mechanism activated by the user replaces, for example, the timer mechanism; in this case, a request to extract data is generated only upon receipt of a specific user input. Each time data is obtained, the captured historical data may preferably be stored in non-volatile memory. As an example, a data warehouse may be used to store both the raw data collected during acquisition and the processed data.
The historical data records obtained as described above may come from identical or different data sources; that is to say, each historical data record may also be the result of splicing together different historical data records. For example, in addition to the information data record filled in by a customer when applying to a bank for a credit card (which includes attribute information fields such as income, education, position, and property status), other data records of the customer at that bank may also be obtained as an example, such as loan records and day-to-day transaction data; these obtained data records, together with the label indicating whether the customer is a fraudulent customer, may be spliced into a complete historical data record. In addition, data from other private or public sources may also be obtained, for example, data from data providers, from the Internet (for example, social networking sites), from mobile operators, from APP operators, from courier companies, from credit institutions, and so on.
Optionally, the collected data may be stored and/or processed by means of hardware clusters (Hadoop clusters, Spark clusters, etc.), for example, for storage, sorting, and other offline operations. In addition, the collected data may also be subjected to online stream processing.
As an example, unstructured data such as collected text may be converted, by data conversion modules such as text analysis models, into structured data that is easier to use, for further processing or reference later on. Text-based data may include e-mail, documents, web pages, graphics, spreadsheets, call center logs, transaction reports, and the like.
Next, in step S120, training samples of each feature pool machine learning model may be generated based on the attribute information of the historical data records. Here, as described above, the sample features of each feature pool machine learning model may correspond to a respective candidate feature subset, wherein the candidate feature subsets are obtained by dividing the candidate feature set composed of all the candidate features.
As an example, an original candidate feature set may be generated based on the attribute information of the historical data records. For example, corresponding original candidate features may be obtained by screening, grouping, or otherwise further processing the attribute information of the historical data records. According to an exemplary embodiment of the present invention, original candidate features may be generated by any appropriate feature processing method, for example, taking into account factors such as the content, meaning, value continuity, value range, value space scale, missingness, and importance of the attribute information.
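Purely as an illustration of the kind of attribute processing mentioned above (the concrete rules are not prescribed here), a continuous attribute could be discretized into a candidate feature by binning; the attribute name and bin boundaries below are hypothetical:

```python
def bin_attribute(values, boundaries):
    """Discretize a continuous attribute into interval indices --
    one simple way of deriving a candidate feature from raw
    attribute information, taking value range into account."""
    features = []
    for v in values:
        # Count how many boundaries the value meets or exceeds.
        index = sum(1 for b in boundaries if v >= b)
        features.append(index)
    return features

# Hypothetical "income" attribute with hypothetical bin boundaries.
incomes = [1200, 4500, 9800, 300]
print(bin_attribute(incomes, boundaries=[1000, 5000]))  # -> [1, 1, 2, 0]
```

Other processing methods (grouping discrete values, handling missing values, etc.) would follow the same pattern of mapping raw attribute values to candidate feature values.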
In addition, after the candidate feature set is determined, it may be divided into the respective candidate feature subsets according to a division method of an exemplary embodiment of the present invention. After each candidate feature subset is determined, the training samples of each feature pool machine learning model may accordingly be generated, respectively, based on the historical data records.
In step S230, the feature pool machine learning models may be trained using the generated training samples. In particular, the feature pool machine learning models may be trained according to a preset machine learning algorithm, and the individual feature pool machine learning models may be based on identical or different model algorithms. According to an exemplary embodiment of the present invention, the feature pool machine learning model corresponding to each candidate feature subset may be obtained by training multiple feature pool machine learning models in parallel.
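The parallel training described above can be sketched minimally as follows; the "model" here is a placeholder standing in for whatever learner and training routine an implementation actually uses:

```python
from concurrent.futures import ThreadPoolExecutor

def train_feature_pool_model(subset):
    """Stand-in for training a machine learning model whose sample
    features are exactly the candidate features in `subset`."""
    return {"features": subset, "trained": True}

def train_all_models(candidate_subsets):
    """Train one feature pool model per candidate feature subset,
    concurrently; results keep the order of the input subsets."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(train_feature_pool_model, candidate_subsets))

models = train_all_models([["f1", "f2"], ["f3", "f4"]])
print([m["features"] for m in models])  # -> [['f1', 'f2'], ['f3', 'f4']]
```

Because the subsets are independent, process-based or cluster-based parallelism could be substituted without changing the structure.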
Exemplary training methods of the feature pool machine learning model have been enumerated above; however, it should be understood that exemplary embodiments of the present invention are not limited to the above examples.
Referring back to Fig. 2, after the feature pool machine learning models respectively corresponding to the candidate feature subsets are obtained, in step S300, the importance of each candidate feature in the corresponding candidate feature subset is determined by the candidate feature importance determination device 300 according to the difference between the effects of the feature pool machine learning model on the original test data set and on a transformed test data set, wherein the transformed test data set refers to a data set obtained by replacing, with transformed values, the original values in the original test data set of the candidate feature whose importance is to be determined.
Here, for each feature pool machine learning model, the candidate feature subset may include at least one candidate feature; accordingly, the prediction effect of the feature pool machine learning model on the original test data set can be obtained. In addition, by transforming in turn the values of each candidate feature on the original test data set, the prediction effect of the feature pool machine learning model on the corresponding transformed test data sets can be obtained. The difference between these two prediction effects can be used to measure the importance of each candidate feature.
As an example, suppose the candidate feature subset corresponding to some feature pool machine learning model includes candidate features {f1, f2, …, fn}, and the prediction effect of this feature pool machine learning model on the original test data set is denoted AUC_all. In this example, in order to determine the importance of any candidate feature fi among {f1, f2, …, fn} (where 1 ≤ i ≤ n), the original test data set may be processed correspondingly to obtain a transformed test data set for the candidate feature fi, for example, by replacing the original value of the feature fi in each test sample of the original test data set with another value, such as zero, a random value, or the value obtained after shuffling the order of the values of the feature fi across the test samples. Accordingly, the test effect AUC_i of the above feature pool machine learning model on the transformed test data set can be obtained.
After the effects of the feature pool machine learning model on the original test data set and on the transformed test data set are respectively obtained, the difference between the two effects (that is, AUC_all − AUC_i) can serve as a reference for measuring the importance of the candidate feature fi.
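The importance measure just described can be sketched in a self-contained way as follows. To keep the example dependency-free, accuracy stands in for AUC, the "model" is a trivial stand-in, and the transformed values are obtained by deterministically reordering (reversing) the feature's values across test samples — one instance of the shuffling transformation named above:

```python
def accuracy(model, rows, labels):
    """Effect of the model on a test data set (accuracy here stands
    in for AUC so that the sketch stays dependency-free)."""
    predictions = [model(row) for row in rows]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def feature_importance(model, rows, labels, feature):
    """Importance of `feature`: effect on the original test data set
    minus the effect on the transformed test data set, in which that
    feature's original values are replaced by reordering them across
    the test samples."""
    effect_all = accuracy(model, rows, labels)              # AUC_all analogue
    reordered = [row[feature] for row in rows][::-1]
    transformed = [dict(row, **{feature: v}) for row, v in zip(rows, reordered)]
    effect_i = accuracy(model, transformed, labels)          # AUC_i analogue
    return effect_all - effect_i

# Toy test set: the label equals f1 exactly; f2 is ignored by the model.
rows = [{"f1": i % 2, "f2": i} for i in range(100)]
labels = [row["f1"] for row in rows]
def model(row):
    return row["f1"]

print(feature_importance(model, rows, labels, "f1"))  # -> 1.0
print(feature_importance(model, rows, labels, "f2"))  # -> 0.0
```

As expected, transforming the feature the model relies on degrades the effect sharply, while transforming an ignored feature changes nothing — exactly the signal AUC_all − AUC_i provides.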
Next, in step S400, for each candidate feature subset, at least one candidate feature of relatively high importance is screened out therefrom by the target feature selection device 400 according to the importance of each candidate feature in the subset, to serve as a target feature of the machine learning sample.
Here, the target feature selection device 400 may, for each candidate feature subset, screen out relatively important target features automatically or according to a user's instruction. As an example, the results of the feature importance determination may be shown to the user in graphical form. For example, the importance of each candidate feature subset may be displayed as a chart or table, so that the user can select target features therefrom. To this end, the system shown in Fig. 1 may further include an input device (not shown) for sensing input operations performed by the user, for example to select target features.
It can be seen that, according to the exemplary embodiment of the present invention, for each candidate feature subset, relatively important candidate features can be effectively screened out using the corresponding feature pool machine learning model.
As an example, the above process of screening target features can be applied to situations in which target features are screened sequentially over multiple rounds, for example, continually screening out target features from the original candidate feature set. As another example, combination features may be generated iteratively and the important combination features screened out from them. As yet another example, even more important target features may be further screened out iteratively from the target features already screened out. Those skilled in the art will understand that exemplary embodiments of the present invention are not limited to a specific iterative manner.
Fig. 4 shows a flowchart of a method for selecting features of machine learning samples according to another exemplary embodiment of the present invention. This method may be performed by the feature selection system shown in Fig. 1, may be implemented entirely in software by a computer program, or may be performed by a computing device of a particular configuration.
Referring to Fig. 4, in step S100, the current candidate feature set may be divided into a plurality of candidate feature subsets. Here, the current candidate feature set may be the updated result after each round of feature screening.
Next, in step S200, for each candidate feature subset, a corresponding feature pool machine learning model may be obtained, wherein the sample features of the feature pool machine learning model correspond to said each candidate feature subset.
Then, in step S300, the importance of each candidate feature in the corresponding candidate feature subset may be determined according to the difference between the effects of the feature pool machine learning model on the original test data set and on a transformed test data set, wherein the transformed test data set refers to a data set obtained by replacing, with transformed values, the original values in the original test data set of the candidate feature whose importance is to be determined.
In step S400, for each candidate feature subset, at least one candidate feature of relatively high importance may be screened out therefrom according to the importance of each candidate feature in the subset, to serve as a target feature of the machine learning sample.
In step S500, it may be determined whether the screening of target features needs to continue. Here, this may be determined according to a preset target feature screening termination condition. As an example, the target feature screening termination condition may be that enough target features have been screened out; alternatively, it may be that sufficiently important target features have been screened out.
As an example, suppose the screening of target features needs to continue because not enough target features have been screened out; then step S550 may be performed, in which the target features may be removed from the candidate feature set to update the candidate feature set, so that feature screening can subsequently be performed again based on the updated candidate feature set, until the selection of all target features is completed. Optionally, while the target features are removed from the candidate feature set, new candidate features may additionally be added to the updated candidate feature set. For example, the new candidate features may be combination features newly generated by performing feature combination among candidate features. Here, according to a search strategy for combination features, combination features of the machine learning sample may be generated in each round in an iterative manner, to serve as new candidate features.
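One conceivable (purely illustrative, not prescribed) search strategy for the combination features mentioned above is to pair up the features screened out so far; each pair becomes a new candidate feature for the next round:

```python
from itertools import combinations

def generate_combination_features(target_features):
    """Generate new candidate features by combining pairs of existing
    features; each combination is named by joining its component
    feature names."""
    return ["{}&{}".format(a, b) for a, b in combinations(target_features, 2)]

print(generate_combination_features(["f1", "f5", "f9"]))
# -> ['f1&f5', 'f1&f9', 'f5&f9']
```

Higher-order combinations, or strategies that combine only features of compatible types, would fit into the same slot of the iterative process.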
As another example, suppose it is necessary to continue further screening among the target features screened out in the current round; then step S550 may be performed, in which the screened-out target features may serve as the updated candidate feature set, so that feature screening can subsequently be performed again based on the updated candidate feature set, until sufficiently important target features are screened out.
After step S550, as an example, the method may return to step S100 to divide the updated candidate feature set into subsets. As an example, after the target features are removed from the former candidate feature set, the target features may correspondingly be deleted from each candidate feature subset; alternatively, the updated candidate feature set may be re-divided in an entirely different manner to obtain updated candidate feature subsets. For example, for the new combination candidate features included in the updated candidate feature set, these new combination candidate features may simply be assigned to the previously divided candidate feature subsets, or the updated candidate feature subsets may be repartitioned.
In step S200, new feature pool machine learning models may be obtained corresponding to the updated candidate feature subsets. Next, steps S300 and S400 may continue to be executed to screen out the target features of the current round. This continues until the preset target feature screening termination condition is met, whereupon it is determined in step S500 that the screening process no longer needs to continue; the method then ends, and the selection result can be further used or processed subsequently.
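Putting the rounds together, the control flow of the iterative screening just described could be sketched as follows; `screen_round` is a placeholder for one pass of steps S100–S400, and `enough` stands in for the termination condition checked in step S500:

```python
def iterative_feature_selection(candidate_set, screen_round, enough):
    """Repeat rounds of screening, removing the screened-out target
    features from the candidate set each round (the step S550 update),
    until the termination condition is met or no candidates remain."""
    targets = []
    while candidate_set and not enough(targets):
        round_targets = screen_round(candidate_set)
        if not round_targets:
            break
        targets.extend(round_targets)
        candidate_set = [f for f in candidate_set if f not in round_targets]
    return targets

# Placeholder round: pretend the alphabetically first feature wins each round.
screen = lambda cands: [min(cands)]
print(iterative_feature_selection(["f3", "f1", "f2"], screen,
                                  enough=lambda t: len(t) >= 2))
# -> ['f1', 'f2']
```

The variant in which the screened-out targets themselves become the next candidate set would replace the removal line with `candidate_set = round_targets`.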
The devices illustrated in Fig. 1 may each be configured as software, hardware, firmware, or any combination thereof that performs a specific function. For example, these devices or units may correspond to dedicated integrated circuits, to pure software code, or to modules combining software with hardware. In addition, one or more functions realized by these devices may also be performed uniformly by components in a physical entity device (for example, a processor, a client, a server, etc.).
The method and system for selecting features of machine learning samples according to exemplary embodiments of the present invention have been described above with reference to Figs. 1 to 4. It is to be understood that the above method may be realized by a program recorded on a computer-readable medium. For example, according to an exemplary embodiment of the present invention, a computer-readable medium for selecting features of machine learning samples may be provided, on which a computer program for performing the following method steps is recorded: (A) dividing a candidate feature set into a plurality of candidate feature subsets; (B) for each candidate feature subset, obtaining a corresponding feature pool machine learning model, wherein the feature pool machine learning model corresponds to said each candidate feature subset; (C) determining the importance of each candidate feature in the corresponding candidate feature subset according to the difference between the effects of the feature pool machine learning model on the original test data set and on a transformed test data set, wherein the transformed test data set refers to a data set obtained by replacing, with transformed values, the original values in the original test data set of the candidate feature whose importance is to be determined; and (D) for each candidate feature subset, screening out therefrom at least one candidate feature of relatively high importance according to the importance of each candidate feature, to serve as a target feature of the machine learning sample.
The computer program in the above computer-readable medium may be run in an environment deployed on computer equipment such as a client, a host, an agent device, or a server. It should be noted that the computer program may also be used to perform additional steps beyond the above steps, or to perform more specific processing when performing the above steps; the content of these additional steps and further processing has been described with reference to Figs. 1 to 4 and will not be repeated here.
It should be noted that the feature selection system according to an exemplary embodiment of the present invention may rely entirely on the running of a computer program to realize the corresponding functions; that is, each device corresponds to a step in the functional structure of the computer program, so that the whole system is invoked through a dedicated software package (for example, a lib library) to realize the corresponding functions.
On the other hand, each device shown in Fig. 1 may also be realized by hardware, software, firmware, middleware, microcode, or any combination thereof. When realized in software, firmware, middleware, or microcode, the program code or code segments for performing the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor can perform the corresponding operations by reading and running the corresponding program code or code segments.
For example, an exemplary embodiment of the present invention may also be implemented as a computing device comprising a storage component and a processor, the storage component storing a set of computer-executable instructions which, when executed by the processor, performs the feature selection method.
In particular, the computing device may be deployed in a server or a client, or on a node apparatus in a distributed network environment. In addition, the computing device may be a PC, a tablet device, a personal digital assistant, a smartphone, a web application, or any other device capable of executing the above instruction set.
Here, the computing device need not be a single computing device; it may be any aggregate of devices or circuits capable of executing the above instructions (or instruction set) alone or in combination. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device interconnected, locally or remotely (for example, via wireless transmission), with an interface.
In the computing device, the processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example and not limitation, the processor may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
Some operations in the feature selection method according to an exemplary embodiment of the present invention may be realized in software, some in hardware, and, in addition, these operations may also be realized by a combination of software and hardware.
The processor may run instructions or code stored in one of the storage components, which may also store data. Instructions and data may also be sent and received over a network via a network interface device, which may employ any known transport protocol.
The storage component may be integrated with the processor, for example with RAM or flash memory arranged within an integrated circuit microprocessor or the like. In addition, the storage component may include an independent device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled, or may communicate with each other, for example through an I/O port or a network connection, so that the processor can read files stored in the storage component.
In addition, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse, a touch input device, etc.). All components of the computing device may be connected to each other via a bus and/or a network.
The operations involved in the feature selection method according to an exemplary embodiment of the present invention may be described as various interconnected or coupled functional blocks or function diagrams. However, these functional blocks or function diagrams may equally be integrated into a single logic device or operated according to non-exact boundaries.
For example, as described above, a computing device for selecting features of machine learning samples according to an embodiment of the present invention may include a storage component and a processor, wherein the storage component stores a set of computer-executable instructions which, when executed by the processor, performs the following steps: (A) dividing a candidate feature set into a plurality of candidate feature subsets; (B) for each candidate feature subset, obtaining a corresponding feature pool machine learning model, wherein the feature pool machine learning model corresponds to said each candidate feature subset; (C) determining the importance of each candidate feature in the corresponding candidate feature subset according to the difference between the effects of the feature pool machine learning model on the original test data set and on a transformed test data set, wherein the transformed test data set refers to a data set obtained by replacing, with transformed values, the original values in the original test data set of the candidate feature whose importance is to be determined; and (D) for each candidate feature subset, screening out therefrom at least one candidate feature of relatively high importance according to the importance of each candidate feature, to serve as a target feature of the machine learning sample.
The exemplary embodiments of the present invention have been described above. It should be understood that the foregoing description is merely exemplary and not exhaustive, and that the present invention is not limited to the disclosed exemplary embodiments. Many modifications and changes will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Therefore, the protection scope of the present invention shall be subject to the scope of the claims.

Claims (10)

1. A method for selecting features of machine learning samples, comprising:
(A) dividing a candidate feature set into a plurality of candidate feature subsets;
(B) for each candidate feature subset, obtaining a corresponding feature pool machine learning model, wherein the feature pool machine learning model corresponds to said each candidate feature subset;
(C) determining the importance of each candidate feature in the corresponding candidate feature subset according to the difference between the effects of the feature pool machine learning model on an original test data set and on a transformed test data set, wherein the transformed test data set refers to a data set obtained by replacing, with transformed values, the original values in the original test data set of the candidate feature whose importance is to be determined; and
(D) for each candidate feature subset, screening out therefrom, according to the importance of each candidate feature, at least one candidate feature of relatively high importance to serve as a target feature of the machine learning sample.
2. The method of claim 1, wherein the transformed values include at least one of the following: zero, a random value, and the values obtained after shuffling the order, across samples, of the original values in the original test data set of the candidate feature whose importance is to be determined.
3. The method of claim 1, further comprising:
(E) removing the target features from the candidate feature set to update the candidate feature set;
and, after step (E), performing the method again from step (A) based on the updated candidate feature set, until the selection of all target features is completed.
4. The method of claim 3, wherein, in step (E), while the target features are removed from the candidate feature set, new candidate features are additionally added to update the candidate feature set.
5. The method of claim 4, wherein the new candidate features are combination features newly generated by performing feature combination among candidate features.
6. The method of claim 1, wherein, in step (B), the feature pool machine learning model corresponding to each candidate feature subset is obtained by training multiple feature pool machine learning models in parallel.
7. The method of claim 1, further comprising:
(E) taking the screened-out target features as an updated candidate feature set;
and, after step (E), performing the method again from step (A) based on the updated candidate feature set, until a preset target feature screening termination condition is met.
8. A system for selecting features of machine learning samples, comprising:
a candidate feature subset division device, for dividing a candidate feature set into a plurality of candidate feature subsets;
a feature pool machine learning model acquisition device, for obtaining, for each candidate feature subset, a corresponding feature pool machine learning model, wherein the feature pool machine learning model corresponds to said each candidate feature subset;
a candidate feature importance determination device, for determining the importance of each candidate feature in the corresponding candidate feature subset according to the difference between the effects of the feature pool machine learning model on an original test data set and on a transformed test data set, wherein the transformed test data set refers to a data set obtained by replacing, with transformed values, the original values in the original test data set of the candidate feature whose importance is to be determined; and
a target feature selection device, for screening out from each candidate feature subset, according to the importance of each candidate feature therein, at least one candidate feature of relatively high importance to serve as a target feature of the machine learning sample.
9. A computer-readable medium for selecting features of machine learning samples, wherein a computer program for performing the method for selecting features of machine learning samples according to any one of claims 1 to 7 is recorded on the computer-readable medium.
10. A computing device for selecting features of machine learning samples, comprising a storage component and a processor, wherein a set of computer-executable instructions is stored in the storage component and, when executed by the processor, performs the method for selecting features of machine learning samples according to any one of claims 1 to 7.
CN201711383339.4A 2017-12-20 2017-12-20 For selecting the method and system of the feature of machine learning sample Pending CN108108820A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202310907091.6A CN116882520A (en) 2017-12-20 2017-12-20 Prediction method and system for predetermined prediction problem
CN201711383339.4A CN108108820A (en) 2017-12-20 2017-12-20 For selecting the method and system of the feature of machine learning sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711383339.4A CN108108820A (en) 2017-12-20 2017-12-20 For selecting the method and system of the feature of machine learning sample

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202310907091.6A Division CN116882520A (en) 2017-12-20 2017-12-20 Prediction method and system for predetermined prediction problem

Publications (1)

Publication Number Publication Date
CN108108820A true CN108108820A (en) 2018-06-01

Family

ID=62211434

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310907091.6A Pending CN116882520A (en) 2017-12-20 2017-12-20 Prediction method and system for predetermined prediction problem
CN201711383339.4A Pending CN108108820A (en) 2017-12-20 2017-12-20 Method and system for selecting features of machine learning samples

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202310907091.6A Pending CN116882520A (en) 2017-12-20 2017-12-20 Prediction method and system for predetermined prediction problem

Country Status (1)

Country Link
CN (2) CN116882520A (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169573A (en) * 2017-05-05 2017-09-15 4Paradigm (Beijing) Technology Co., Ltd. Method and system for performing prediction using a composite machine learning model
CN107316082A (en) * 2017-06-15 2017-11-03 4Paradigm (Beijing) Technology Co., Ltd. Method and system for determining feature importance of machine learning samples


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689133A (en) * 2018-06-20 2020-01-14 深信服科技股份有限公司 Method, system and related device for training machine learning engine
CN110689133B (en) * 2018-06-20 2023-09-05 深信服科技股份有限公司 Method, system and related device for training machine learning engine
CN111310930A (en) * 2018-12-11 2020-06-19 富士通株式会社 Optimization device, optimization method, and non-transitory computer-readable storage medium
CN111310930B (en) * 2018-12-11 2023-07-21 富士通株式会社 Optimizing apparatus, optimizing method, and non-transitory computer-readable storage medium
CN112348043A (en) * 2019-08-09 2021-02-09 杭州海康机器人技术有限公司 Feature screening method and device in machine learning
CN112348043B (en) * 2019-08-09 2024-04-02 杭州海康机器人股份有限公司 Feature screening method and device in machine learning
CN111079939A (en) * 2019-11-28 2020-04-28 支付宝(杭州)信息技术有限公司 Machine learning model feature screening method and device based on data privacy protection
CN114268625A (en) * 2020-09-14 2022-04-01 腾讯科技(深圳)有限公司 Feature selection method, device, equipment and storage medium
CN114268625B (en) * 2020-09-14 2024-01-02 腾讯科技(深圳)有限公司 Feature selection method, device, equipment and storage medium
CN113191824A (en) * 2021-05-24 2021-07-30 北京大米科技有限公司 Data processing method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN116882520A (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN108090570A (en) For selecting the method and system of the feature of machine learning sample
CN108021984A (en) Determine the method and system of the feature importance of machine learning sample
CN108108820A (en) For selecting the method and system of the feature of machine learning sample
CN107704871A (en) Generate the method and system of the assemblage characteristic of machine learning sample
CN106779088B (en) Execute the method and system of machine learning process
JP6541868B2 (en) Condition-Satisfied Likelihood Prediction Using Recursive Neural Networks
CN107392319A (en) Generate the method and system of the assemblage characteristic of machine learning sample
CN105229633B (en) It is uploaded for realizing data, system, method and apparatus disclosed in processing and predicted query API
CN107729915A (en) For the method and system for the key character for determining machine learning sample
US10083263B2 (en) Automatic modeling farmer
US11663839B1 (en) Polarity semantics engine analytics platform
CN107679549A (en) Generate the method and system of the assemblage characteristic of machine learning sample
CN107871166A (en) For the characteristic processing method and characteristics processing system of machine learning
US20200159690A1 (en) Applying scoring systems using an auto-machine learning classification approach
CN107909087A (en) Generate the method and system of the assemblage characteristic of machine learning sample
CN107316082A (en) For the method and system for the feature importance for determining machine learning sample
CN109242040A (en) Automatically generate the method and system of assemblage characteristic
CN107578140A (en) Guide analysis system and method
CN107169574A (en) Using nested machine learning model come the method and system of perform prediction
US11461343B1 (en) Prescriptive analytics platform and polarity analysis engine
CN107273979A (en) The method and system of machine learning prediction are performed based on service class
CN113609193A (en) Method and device for training prediction model for predicting customer transaction behavior
CN115345530A (en) Market address recommendation method, device and equipment and computer readable storage medium
US11295325B2 (en) Benefit surrender prediction
CN113569162A (en) Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180601