CN113297337A - Feature dimension selection method, device, medium and electronic equipment - Google Patents

Feature dimension selection method, device, medium and electronic equipment

Publication number
CN113297337A
Authority
CN
China
Prior art keywords
feature
dimension
processed
new
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110847552.6A
Other languages
Chinese (zh)
Other versions
CN113297337B (en)
Inventor
郑玉玲
于龙斌
王凌云
王梓凝
蔺志峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengfang Financial Technology Co ltd
Original Assignee
Chengfang Financial Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengfang Financial Technology Co ltd filed Critical Chengfang Financial Technology Co ltd
Priority to CN202110847552.6A priority Critical patent/CN113297337B/en
Publication of CN113297337A publication Critical patent/CN113297337A/en
Application granted granted Critical
Publication of CN113297337B publication Critical patent/CN113297337B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Finance (AREA)
  • Technology Law (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a feature dimension selection method, device, medium and electronic equipment. The method comprises: selecting a target feature category to be processed from candidate feature categories, wherein each candidate feature category comprises at least two feature dimensions; if an effective feature dimension group exists, splicing each feature dimension group belonging to the target feature category with the effective feature dimension group to obtain new feature dimension groups to be processed, wherein a feature dimension group is a combination of feature dimensions; and evaluating the new feature dimension groups to be processed, determining an effective new feature dimension group according to the evaluation result, and returning to the selection of the target feature category until all candidate feature categories have been processed. The method improves the efficiency of feature dimension selection and thereby the performance of the anti-money laundering model.

Description

Feature dimension selection method, device, medium and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of computer application, in particular to a method, a device, a medium and electronic equipment for selecting a feature dimension.
Background
The money laundering behavior has serious social harmfulness, which not only damages the safety of the financial system and the credit of the financial institution, but also has great damage to the normal economic order and social stability of China. Therefore, the establishment and implementation of anti-money laundering have great significance for the healthy and orderly development of the future economic society of China.
The development of big data and artificial intelligence technology provides new technical means for anti-money laundering analysis, making it possible for anti-money laundering to shift from after-the-fact analysis to advance response. The accuracy with which an anti-money laundering model based on deep learning or machine learning predicts money laundering behavior is largely influenced by the number and kinds of features. The anti-money laundering analysis process involves a large amount of account data, transfer data, and transaction data, which together include a large number of features. Selecting effective features from this large number of features is therefore important for improving the performance of the anti-money laundering model.
Disclosure of Invention
The embodiment of the application provides a feature dimension selection method, a feature dimension selection device, a feature dimension selection medium and electronic equipment, which can effectively reduce a feature search space and improve feature dimension selection efficiency, so that the aim of improving anti-money laundering model performance is fulfilled.
In a first aspect, an embodiment of the present application provides a method for selecting a feature dimension, where the method includes:
selecting a target feature category to be processed from candidate feature categories; wherein each candidate feature category comprises at least two feature dimensions;
if an effective feature dimension group exists, splicing each feature dimension group belonging to the target feature category with the effective feature dimension group respectively to obtain new feature dimension groups to be processed; wherein a feature dimension group is a combination of feature dimensions;
evaluating the new feature dimension groups to be processed, determining an effective new feature dimension group according to the evaluation result, and returning to the selection of the target feature category until all candidate feature categories have been processed;
and if no effective feature dimension group exists, taking the feature dimension groups belonging to the target feature category as the new feature dimension groups to be processed, executing the evaluation operation on them, and determining the effective feature dimension group according to the evaluation result.
In a second aspect, an embodiment of the present application provides an apparatus for selecting a feature dimension, where the apparatus includes:
a target feature category selection module, used for selecting a target feature category to be processed from candidate feature categories; wherein each candidate feature category comprises at least two feature dimensions;
a feature dimension group splicing module, used for splicing each feature dimension group belonging to the target feature category with the effective feature dimension group to obtain new feature dimension groups to be processed if an effective feature dimension group exists; wherein a feature dimension group is a combination of feature dimensions;
a to-be-processed new feature dimension group evaluation module, used for evaluating the new feature dimension groups to be processed, determining an effective new feature dimension group according to the evaluation result, and returning to the selection of the target feature category until all candidate feature categories have been processed;
and a to-be-processed new feature dimension group determining module, used for taking the feature dimension groups belonging to the target feature category as the new feature dimension groups to be processed if no effective feature dimension group exists, executing the evaluation operation on them, and determining the effective feature dimension group according to the evaluation result.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements a method for selecting a feature dimension according to an embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the method for selecting a feature dimension according to the embodiment of the present application when executing the computer program.
According to the technical scheme provided by the embodiments of the application, a target feature category to be processed is selected from the candidate feature categories; if an effective feature dimension group exists, each feature dimension group belonging to the target feature category is spliced with the effective feature dimension group to obtain new feature dimension groups to be processed; the new feature dimension groups to be processed are evaluated, an effective new feature dimension group is determined according to the evaluation result, and the selection of the target feature category is repeated until all candidate feature categories have been processed. By classifying the feature dimensions and determining effective feature dimension combinations category by category, the application reduces the feature dimension search space and improves the efficiency of feature dimension selection.
Drawings
Fig. 1 is a flowchart of a method for selecting a feature dimension according to an embodiment of the present application;
Fig. 2 is a flowchart of another feature dimension selection method provided in the second embodiment of the present application;
Fig. 3 is a flowchart of a method for selecting a feature dimension according to a third embodiment of the present application;
Fig. 4 is a schematic structural diagram of a feature dimension selection device provided in the fourth embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 1 is a flowchart of a feature dimension selection method provided in an embodiment of the present application, where the embodiment is applicable to a case where feature dimensions are selected for feature data when predicting whether a money laundering behavior exists in a user by using an anti-money laundering behavior prediction model. The method can be executed by the selection device of the feature dimension provided by the embodiment of the application, and the device can be realized by software and/or hardware and can be integrated in the electronic equipment running the system.
As shown in fig. 1, the method for selecting the feature dimension includes:
S110, selecting a target feature category to be processed from the candidate feature categories; wherein each candidate feature category comprises at least two feature dimensions.
Feature data to be processed is obtained, the feature dimensions of the feature data are determined, the feature dimensions are classified according to a feature dimension classification rule, and the resulting feature dimension categories are taken as the candidate categories. Classifying the feature dimensions according to the classification rule ensures that the feature dimensions of different candidate categories complement one another. The classification rule is determined by technicians according to actual conditions and is not limited herein.
Optionally, the feature dimensions are classified according to the processing degree of the feature dimensions, so as to obtain candidate categories. In an alternative embodiment, the candidate categories include: original feature class, fused feature class, and advanced feature class; wherein the raw feature class comprises raw feature dimensions; the fusion feature class comprises fusion feature dimensions obtained by performing fusion processing on the original feature dimensions; the high-level feature class comprises high-level feature dimensions obtained by further fusing the fused feature dimensions.
The fused feature dimension is a higher-level, more abstract feature dimension relative to the original feature dimension. The advanced feature dimension is a higher-level, more abstract feature dimension relative to the fused feature dimension. Each candidate feature category includes at least two feature dimensions. For example, in an anti-money laundering application scenario, the original feature class includes feature dimensions such as account information, transaction time, and transaction object; the fused feature class includes feature dimensions such as transfer total, transfer average, and transfer variance; the advanced feature class includes feature dimensions such as whether a loop exists in the transfers, the community to which the account belongs, and information about associations with other accounts.
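As a rough illustration of how fused feature dimensions might be derived from original ones, the sketch below computes the transfer total, average, and variance for one account. The function name `fuse_transfer_features`, the dictionary keys, and the dimension names in `CANDIDATE_CLASSES` are hypothetical and are not taken from the application; the use of population variance is likewise an assumption.

```python
from statistics import mean, pvariance


def fuse_transfer_features(amounts):
    """Derive fused feature dimensions from raw transfer amounts (hypothetical layout)."""
    if not amounts:
        return {"transfer_total": 0.0, "transfer_mean": 0.0, "transfer_variance": 0.0}
    return {
        "transfer_total": float(sum(amounts)),
        "transfer_mean": float(mean(amounts)),
        # Population variance is an assumption; sample variance would also be reasonable.
        "transfer_variance": float(pvariance(amounts)),
    }


# Candidate feature classes as named in the text, with illustrative dimension names.
CANDIDATE_CLASSES = {
    "original": ["account_info", "transaction_time", "transaction_object"],
    "fused": ["transfer_total", "transfer_mean", "transfer_variance"],
    "advanced": ["has_transfer_loop", "account_community", "linked_account_info"],
}

fused = fuse_transfer_features([100.0, 200.0, 300.0])
```

Each class in `CANDIDATE_CLASSES` holds at least two feature dimensions, which matches the constraint stated for candidate feature categories.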
To ensure that the feature information is comprehensive, the feature dimensions of the original feature class, the fused feature class and the advanced feature class could all be used as input to the business model, but the feature dimension search space faced by the business model would then be huge. Because the resources supporting the operation of the business model are limited, it is difficult for the business model to eliminate redundant feature dimensions from a large number of feature dimensions and select the effective ones within a limited time, so feature dimension selection is inefficient.
However, if the feature dimensions of the original feature class are discarded and only the fused feature class or the advanced feature class is used as input to the business model, the performance of the business model is limited. This is because the feature information contained in the feature dimensions of the original, fused and advanced feature classes is mutually complementary. Lacking one class of feature dimensions means lacking feature information, which affects the performance of the business model.
Therefore, for each candidate category in turn, the feature dimensions that are effective for improving the performance of the business model are selected. This reduces the feature dimension search space and improves the efficiency of feature dimension selection.
S120, judging whether an effective feature dimension group exists. If so, step S131 and the following steps are executed; if not, the process jumps to step S132.
S131, splicing each feature dimension group belonging to the target feature category with the effective feature dimension group respectively to obtain a new feature dimension group to be processed; wherein the set of feature dimensions is a combination of the feature dimensions.
If an effective feature dimension group exists, each feature dimension group belonging to the target feature category is spliced with the effective feature dimension group to obtain the new feature dimension groups to be processed. An effective feature dimension group is a feature dimension group that is effective for improving the performance of the business model. A feature dimension group is a combination of feature dimensions.
The effective feature dimension group is obtained by evaluating the effectiveness of combinations of the feature dimensions included in a candidate category and keeping, according to the evaluation result, the groups that can improve the performance of the business model. For example, when the candidate categories are the original feature class, the fused feature class and the advanced feature class, the effective feature dimension group may be obtained by evaluating the effectiveness of combinations of the feature dimensions included in the original feature class.
If there are multiple effective feature dimension groups, each feature dimension group is spliced with each effective feature dimension group respectively.
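The splicing step can be sketched as a Cartesian product of the two sets of groups. This is a minimal illustration, assuming a feature dimension group is represented as a tuple of dimension names; the function name `splice` and the example dimension names are hypothetical.

```python
from itertools import product


def splice(target_groups, effective_groups):
    """Concatenate every effective group with every group of the target category."""
    return [
        tuple(effective) + tuple(group)
        for group, effective in product(target_groups, effective_groups)
    ]


effective_groups = [("account_info",), ("account_info", "transaction_time")]
target_groups = [("transfer_total",), ("transfer_mean", "transfer_variance")]

# 2 target groups x 2 effective groups give 4 new groups to be processed.
pending = splice(target_groups, effective_groups)
```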
S140, evaluating the new feature dimension groups to be processed, and determining an effective new feature dimension group according to the evaluation result. The process continues to step S150.
The effectiveness of the new feature dimension groups to be processed is evaluated, and the groups that are effective for improving the performance of the business model are screened out as the effective new feature dimension groups.
The process then returns to the selection of the target feature category until every candidate category has been processed.
S132, taking the feature dimension groups belonging to the target feature category as the new feature dimension groups to be processed, executing the evaluation operation on them, and determining the effective feature dimension group according to the evaluation result. The process continues to step S150.
If no effective feature dimension group exists, the target feature category is the first of the candidate categories to undergo feature dimension effectiveness evaluation, and the evaluation is performed on the feature dimension groups belonging to the target feature category.
As the original feature dimensions are processed more deeply, the resulting feature dimensions become more abstract. To follow the general feature learning process of the business model and thus safeguard its performance, in an optional embodiment the target feature categories to be processed are selected from the candidate feature categories in the order of original feature class, fused feature class, advanced feature class.
In a specific embodiment, in the case that the candidate categories only include three types, namely, the original feature class, the fused feature class, and the advanced feature class, the feature dimension selection process is as follows:
The original feature class is taken as the first feature class for feature dimension evaluation, and the effectiveness of the feature dimension groups of the original feature class is evaluated to obtain the effective feature dimension groups. Next, the fused feature class is selected as the target feature class. Because effective feature dimension groups now exist, each feature dimension group belonging to the fused feature class is spliced with each previously obtained effective feature dimension group to obtain the new feature dimension groups to be processed, and their effectiveness is evaluated to obtain the effective new feature dimension groups. Finally, the advanced feature class is selected as the target feature class, each feature dimension group belonging to the advanced feature class is spliced with each effective new feature dimension group to obtain the spliced feature dimension groups, and the effectiveness of the spliced groups is evaluated to obtain the final feature dimension group evaluation result.
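The three-stage process above can be sketched as a single loop over the classes. This is an illustration under stated assumptions, not the application's implementation: `toy_score` and its integer `WEIGHTS` are a hypothetical stand-in for evaluation by the business model, all non-empty combinations of a class are enumerated, and keeping the `keep` highest-scoring groups is one possible way to decide which groups count as effective.

```python
from itertools import combinations, product


def all_groups(dims):
    """Every non-empty combination of the given feature dimensions."""
    return [c for r in range(1, len(dims) + 1) for c in combinations(dims, r)]


def select_by_class(classes, score, keep):
    """Process the classes in order, carrying forward the `keep` best groups."""
    effective = []  # no effective group exists before the first class is processed
    for dims in classes:
        groups = all_groups(dims)
        if effective:
            # Splice every group of this class onto every effective group so far.
            pending = [e + g for e, g in product(effective, groups)]
        else:
            pending = groups
        pending.sort(key=score, reverse=True)
        effective = pending[:keep]
    return effective


# Hypothetical per-dimension weights standing in for a real model evaluation.
WEIGHTS = {"o1": 5, "o2": -2, "f1": 3, "f2": 1, "a1": 4, "a2": -3}


def toy_score(group):
    return sum(WEIGHTS[d] for d in group)


classes = [["o1", "o2"], ["f1", "f2"], ["a1", "a2"]]  # original, fused, advanced
best = select_by_class(classes, toy_score, keep=2)
```

With these toy weights the first returned group is `("o1", "f1", "f2", "a1")`, built up one class at a time rather than by searching all six dimensions at once.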
The feature dimension selection method can reduce the feature dimension search space and improve the feature dimension selection efficiency. Continuing with the above example, if the original feature class, the fused feature class and the advanced feature class include $N$, $K$ and $M$ feature dimensions respectively, there are $N+K+M$ feature dimensions in total. An $(N+K+M)$-dimensional feature vector is constructed to represent the selection of each feature dimension; it has $N+K+M$ elements, each taking one of two values, 0 and 1. A 1 indicates that the feature dimension is retained, and a 0 indicates that the feature dimension is not retained. The search space composed of the $N+K+M$ feature dimensions then contains $2^{N+K+M}$ possible feature dimension combinations; in the case of $N=K=M=10$, there are $2^{30}$ possible combinations. Because the available resources are limited, it is difficult for a business model to find a valid set of feature dimensions in a limited amount of time.
By using the feature dimension selection method provided by the application, the search space corresponding to each candidate type has the following size: the original feature class corresponds to a search space of size $2^{10}$; if the number of the selected effective feature dimension groups is 10, the size of the search space corresponding to the fused feature class is $10 \times 2^{10}$; if the number of the selected effective new feature dimension groups is still 10, the size of the search space corresponding to the advanced feature class is $10 \times 2^{10}$. The total search space size is $2^{10} + 10 \times 2^{10} + 10 \times 2^{10} = 21 \times 2^{10}$. The search space is greatly reduced, and the selection efficiency of the feature dimension is improved.
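The sizes in this example can be checked numerically. The sketch below assumes that each class with n feature dimensions contributes 2^n possible 0/1 selection vectors and that a fixed number of groups (`kept`) is carried forward after each stage; the function names are hypothetical.

```python
def full_search_space(n, k, m):
    """Number of 0/1 selection vectors over all n + k + m feature dimensions."""
    return 2 ** (n + k + m)


def staged_search_space(n, k, m, kept):
    """Groups evaluated when the classes are processed one at a time."""
    return 2 ** n + kept * 2 ** k + kept * 2 ** m


full = full_search_space(10, 10, 10)          # 1073741824
staged = staged_search_space(10, 10, 10, 10)  # 21504
```

Under these assumptions the staged search evaluates roughly fifty thousand times fewer combinations than the exhaustive search over all thirty dimensions.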
In an optional embodiment, the business model is an anti-money laundering behavior prediction model, and accordingly, after the processing of each candidate feature class is completed, the method further includes: and predicting whether the money laundering behavior of the user exists or not according to the anti-money laundering behavior prediction model and the effective characteristic dimension group.
The anti-money laundering behavior prediction model is used for predicting whether the money laundering behavior exists in the user, and the optional anti-money laundering behavior prediction model is a machine learning model or a deep learning model. Specifically, the effective characteristic dimension group corresponding characteristic data is input into the anti-money laundering behavior prediction model, and the anti-money laundering behavior prediction model outputs the prediction probability of the money laundering behavior of the user.
S150, judging whether all the candidate feature categories have been processed; if so, the process ends; if not, the process returns to step S110.
According to the technical scheme provided by the embodiments of the application, a target feature category to be processed is selected from the candidate feature categories; if an effective feature dimension group exists, each feature dimension group belonging to the target feature category is spliced with the effective feature dimension group to obtain new feature dimension groups to be processed; the new feature dimension groups to be processed are evaluated, an effective new feature dimension group is determined according to the evaluation result, and the selection of the target feature category is repeated until all candidate feature categories have been processed. By classifying the feature dimensions and determining effective feature dimension combinations category by category, the application reduces the feature dimension search space and improves the efficiency of feature dimension selection.
Example two
Fig. 2 is a flowchart of another feature dimension selection method provided in the second embodiment of the present application. The present embodiment is further optimized on the basis of the above-described embodiments. Specifically, after selecting a target feature category to be processed from the candidate feature categories, the method further includes: determining, according to the target category, the feature dimensions that belong to the target category and taking them as target feature dimensions; and generating at least two feature dimension groups belonging to the target feature category according to the target feature dimensions.
As shown in fig. 2, the method for selecting the feature dimension includes:
S210, selecting a target feature category to be processed from the candidate feature categories; wherein each candidate feature category comprises at least two feature dimensions.
S220, according to the target category, determining a characteristic dimension belonging to the target category in the characteristic dimensions as a target characteristic dimension.
Each candidate feature category comprises at least two feature dimensions, and after the target category is selected, the feature dimension belonging to the target category is selected from all the feature dimensions included in the candidate category to serve as the target feature dimension.
S230, generating at least two feature dimension groups belonging to the target feature category according to the target feature dimensions.
Specifically, the target feature dimensions may be randomly combined to generate at least two feature dimension groups of the target feature category. Alternatively, all possible combinations of the target feature dimensions may be enumerated, ensuring that every feature dimension group has a chance of being selected.
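Both generation strategies can be sketched briefly. The sketch assumes a feature dimension group is a tuple of dimension names and that the empty group is excluded; the function names, the `seed` parameter, and the sorting of names within a group are hypothetical choices made only to keep the example deterministic.

```python
import random
from itertools import combinations


def exhaustive_groups(dims):
    """All non-empty combinations of the target feature dimensions."""
    return [c for r in range(1, len(dims) + 1) for c in combinations(dims, r)]


def random_groups(dims, count, seed=0):
    """Up to `count` distinct randomly combined groups of the target feature dimensions."""
    rng = random.Random(seed)
    limit = 2 ** len(dims) - 1  # number of distinct non-empty groups available
    groups = set()
    while len(groups) < min(count, limit):
        size = rng.randint(1, len(dims))
        # Sort names so the same combination is never counted twice.
        groups.add(tuple(sorted(rng.sample(dims, size))))
    return sorted(groups)


dims = ["transfer_total", "transfer_mean", "transfer_variance"]
every_group = exhaustive_groups(dims)   # 7 groups for 3 dimensions
some_groups = random_groups(dims, 2)    # 2 distinct random groups
```

Exhaustive enumeration grows as 2^n, so random combination is the cheaper option when a class has many feature dimensions.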
S240, judging whether an effective feature dimension group exists. If so, step S251 and the following steps are executed; if not, the process jumps to step S252.
S251, respectively splicing each feature dimension group belonging to the target feature category with the effective feature dimension group to obtain a new feature dimension group to be processed; wherein the set of feature dimensions is a combination of the feature dimensions.
S260, evaluating the new feature dimension groups to be processed, and determining an effective new feature dimension group according to the evaluation result. The process continues to step S270.
S252, taking the feature dimension groups belonging to the target feature category as the new feature dimension groups to be processed, executing the evaluation operation on them, and determining the effective feature dimension group according to the evaluation result. The process continues to step S270.
In order to further improve the selection efficiency of the feature dimension, the search space is further compressed. In an optional embodiment, before evaluating the new feature dimension group to be processed and determining a valid new feature dimension group according to an evaluation result, the method further includes: and selecting a set number of feature dimension groups from a search space formed by the new feature dimension groups to be processed by utilizing a feature combination selection algorithm, and updating the new feature dimension groups to be processed.
The feature combination selection algorithm is used for automatically selecting feature dimension combinations in a search space, where the search space is composed of the new feature dimension groups to be processed. Selecting a set number of feature dimension groups from this full search space by means of a feature combination selection algorithm compresses the search space further.
The feature combination selection algorithm may be grid search, random search, Bayesian optimization (TPE, Spearmint, BNN, etc.), or an evolutionary algorithm. Preferably, the feature combination selection algorithm is a Bayesian optimization method or an evolutionary algorithm, both of which have relatively high search efficiency.
When the feature combination selection algorithm is a Bayesian optimization method or an evolutionary algorithm, the probability that it selects an effective feature dimension group is improved. Optionally, each time the feature combination selection algorithm selects a feature dimension group, the business model is used to evaluate the effectiveness of that group. The relevant parameters of the feature combination selection algorithm are then updated according to the previously selected (historical) feature dimension groups and their effectiveness evaluation results. The selection operation and the update operation are executed in a loop until the number of feature dimension groups selected by the algorithm reaches the set number. Specifically, the feature dimension groups whose evaluation results satisfy the requirements, up to the set number, may be used as the final selection result of the algorithm. The set number is determined by a skilled person according to actual conditions and is not limited herein. The new feature dimension groups to be processed are updated to the selected set number of feature dimension groups, which are then evaluated for effectiveness to determine the effective new feature dimension groups.
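The select, evaluate, update loop can be illustrated with a deliberately simple history-guided heuristic. This is not Bayesian optimization or an evolutionary algorithm; it only shows the feedback structure, in which each evaluation result influences the next selection. The function name `guided_select`, the 0.7 exploitation probability, and the overlap-based preference are all hypothetical, and `score` stands in for evaluation by the business model.

```python
import random


def guided_select(pending, score, set_number, seed=0):
    """Pick `set_number` distinct groups from `pending`, using past scores as feedback."""
    rng = random.Random(seed)
    pending = list(dict.fromkeys(pending))  # drop duplicate groups, keep order
    set_number = min(set_number, len(pending))
    history = {}  # group -> effectiveness score of groups selected so far
    while len(history) < set_number:
        unseen = [g for g in pending if g not in history]
        if history and rng.random() < 0.7:
            # Exploit: prefer the unseen group sharing most dimensions with the best so far.
            best = max(history, key=history.get)
            choice = max(unseen, key=lambda g: len(set(g) & set(best)))
        else:
            # Explore: sample an unseen group uniformly at random.
            choice = rng.choice(unseen)
        history[choice] = score(choice)  # feedback that guides the next iteration
    return sorted(history, key=history.get, reverse=True)


pending = [("a",), ("b",), ("a", "b"), ("a", "c"), ("b", "c")]
picked = guided_select(pending, score=len, set_number=3)
```

In practice the heuristic would be replaced by one of the algorithms named in the text, with the loop's stopping condition (the set number of groups) left unchanged.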
S270, judging whether all the candidate feature categories have been processed; if not, returning to execute step S210, and if so, ending.
According to the technical scheme provided by the embodiment of the application, the feature dimension groups belonging to the target feature categories are generated according to the target feature dimensions, under the condition that the effective feature dimension groups exist, the feature dimension groups belonging to the target feature categories are respectively spliced with the effective feature dimension groups to obtain new feature dimension groups to be processed, and then the evaluation operation is executed on the new feature dimension groups. According to the method and the device, the effective characteristic dimension group is reserved, the characteristic dimension group is selected on the basis of the effective characteristic dimension group, the search space is compressed, and the selection efficiency of the characteristic dimension group is improved.
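The splicing step can be illustrated with a short sketch. It assumes that a feature dimension group is represented as a tuple of dimension names and that splicing means concatenation; `splice_groups` and the dimension names in the usage note are hypothetical.

```python
from itertools import product

def splice_groups(category_groups, effective_groups):
    """Concatenate every group of the target category with every effective group."""
    if not effective_groups:
        # No effective group yet: the category's own groups are the candidates.
        return [tuple(g) for g in category_groups]
    return [
        tuple(e) + tuple(g)
        for e, g in product(effective_groups, category_groups)
    ]
```

For example, `splice_groups([("f1",), ("f1", "f2")], [("r1", "r2")])` yields `("r1", "r2", "f1")` and `("r1", "r2", "f1", "f2")`. As a rough illustration of the compressed search space, assume three categories of four dimensions each, every non-empty subset of a category counted as a group, and two effective groups kept per stage: an exhaustive search over all twelve dimensions would cover 2^12 - 1 = 4095 groups, whereas the staged search evaluates at most 15 + 2 x 15 + 2 x 15 = 75.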
EXAMPLE III
Fig. 3 is a flowchart of another feature dimension selection method according to a third embodiment of the present application. The present embodiment is further optimized on the basis of the above-described embodiments. Specifically, the step of evaluating the new feature dimension group to be processed and determining an effective new feature dimension group according to the evaluation result is refined to include: evaluating the effectiveness of the new feature dimension groups to be processed by using the service model to obtain the effectiveness scores of the new feature dimension groups to be processed as evaluation results; sorting the new feature dimension groups to be processed according to the evaluation results; and determining the new feature dimension groups to be processed that are ranked within the set range as effective new feature dimension groups.
As shown in fig. 3, the method for selecting the feature dimension includes:
S310, selecting a target feature category to be processed from the candidate feature categories; wherein each candidate feature category comprises at least two feature dimensions.
S320, judging whether an effective feature dimension group exists. If so, step S331 and the subsequent steps are executed; if not, the process jumps to step S332.
S331, respectively splicing each feature dimension group belonging to the target feature category with the effective feature dimension group to obtain a new feature dimension group to be processed; wherein the set of feature dimensions is a combination of the feature dimensions.
And S340, evaluating the effectiveness of the new feature dimension groups to be processed by using the service model, and obtaining the effectiveness scores of the new feature dimension groups to be processed as evaluation results.
The business model is used to evaluate the effectiveness of a feature dimension group, where effectiveness refers to whether the feature dimensions help improve the performance of the business model. Specifically, the business model is trained on the new feature dimension group to be processed, and its performance indexes are then computed on a verification data set to evaluate the effectiveness of the group. Performance indexes of the business model include recall and accuracy. Optionally, the accuracy of the business model is used as the effectiveness score of the feature dimension group. Alternatively, a mapping table between performance indexes and effectiveness scores is constructed, and the effectiveness score of the corresponding feature dimension group is determined from this table and the performance indexes of the business model.
Each new feature dimension group to be processed has a corresponding effectiveness score.
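The evaluation in step S340 can be sketched as follows, using validation accuracy as the effectiveness score. This is an illustration only: a toy nearest-centroid classifier stands in for the business model, and `effectiveness_score`, the row layout (dictionaries of numeric dimensions plus a `label` key), and the dimension names in the usage note are all assumptions.

```python
def effectiveness_score(group, train_rows, valid_rows, label="label"):
    """Train a toy nearest-centroid classifier on the dimensions in `group`
    and return its accuracy on the validation rows."""
    # Training: accumulate one centroid per class from the training rows.
    sums, counts = {}, {}
    for row in train_rows:
        y = row[label]
        acc = sums.setdefault(y, [0.0] * len(group))
        for i, dim in enumerate(group):
            acc[i] += row[dim]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(row):
        # Choose the class whose centroid is closest (squared Euclidean distance).
        return min(
            centroids,
            key=lambda y: sum(
                (row[d] - c) ** 2 for d, c in zip(group, centroids[y])
            ),
        )

    # Evaluation: fraction of validation rows classified correctly.
    correct = sum(predict(row) == row[label] for row in valid_rows)
    return correct / len(valid_rows)
```

A group containing an informative dimension (say, a hypothetical `amount`) would score higher than a group containing only an uninformative one, which is what the later ranking step relies on. A recall-based score or a lookup table from performance indexes to scores could be substituted without changing the surrounding flow.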
And S350, sorting the new feature dimension groups to be processed according to the evaluation result.
Optionally, the new feature dimension groups to be processed are sorted in the order of high to low effectiveness scores. The effectiveness score is proportional to the contribution of the feature dimension group to the performance improvement of the service model.
And S360, determining the new feature dimension group to be processed with the ranking within the set range as an effective new feature dimension group.
The set range is determined by technicians according to actual conditions, and the new feature dimension groups whose ranking by effectiveness score falls within the set range are determined as effective new feature dimension groups. The process continues to step S370.
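Steps S350 and S360 amount to a sort followed by a cut-off. The sketch below assumes the set range means "the first `top_k` positions of the ranking"; that interpretation, and the name `select_effective`, are assumptions rather than something the embodiment fixes.

```python
def select_effective(scores, top_k):
    """Sort groups by effectiveness score (high to low) and keep the first `top_k`."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [group for group, _ in ranked[:top_k]]
```

For instance, with scores `{("a",): 0.7, ("a", "b"): 0.9, ("b",): 0.6}` and `top_k=2`, the effective groups are `("a", "b")` and `("a",)`.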
S332, taking the feature dimension group belonging to the target feature category as a new feature dimension group to be processed, executing evaluation operation of the new feature dimension group, and determining effective feature dimensions according to an evaluation result. The process continues to step S370.
The evaluation operation of the new feature dimension group includes step S340, step S350, and step S360 described in the embodiment of the present application. That is, in the case where there is no valid feature dimension group, the feature dimension group belonging to the target feature class is taken as a new feature dimension group to be processed, and then step S340, step S350, and step S360 are sequentially performed.
S370, judging whether all the candidate feature categories have been processed; if not, returning to execute step S310, and if so, ending.
According to the technical scheme provided by the embodiment of the application, the effectiveness of the new feature dimension groups to be processed is evaluated by utilizing the service model, and the effectiveness scores of the new feature dimension groups to be processed are obtained and used as evaluation results; sorting the new feature dimension groups to be processed according to the evaluation result; and determining the new feature dimension group to be processed ranked within the set range as a valid new feature dimension group. The method and the device realize the quantification of the performance improvement degree of the service model by the characteristic dimension group and improve the accuracy of characteristic dimension selection.
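Putting steps S310 to S370 together, the overall flow of this embodiment can be sketched as below. The sketch assumes that every non-empty combination of a category's dimensions is a candidate group, that splicing is tuple concatenation, and that the set range is the top `top_k` ranks; `select_feature_dimensions`, `all_groups`, and `score_fn` are hypothetical names, with `score_fn` standing in for training and evaluating the service model.

```python
from itertools import combinations, product

def all_groups(dims):
    """Every non-empty combination of the dimensions in one category."""
    return [c for r in range(1, len(dims) + 1) for c in combinations(dims, r)]

def select_feature_dimensions(categories, score_fn, top_k):
    effective = []  # effective groups carried from one category to the next
    for dims in categories:                  # S310: select the next target category
        groups = all_groups(dims)
        if effective:                        # S320 yes -> S331: splice
            pending = [e + g for e, g in product(effective, groups)]
        else:                                # S320 no -> S332: use the groups as is
            pending = groups
        scored = {g: score_fn(g) for g in pending}             # S340: score
        ranked = sorted(scored, key=scored.get, reverse=True)  # S350: sort
        effective = ranked[:top_k]                             # S360: keep set range
    return effective                         # S370: all categories processed
```

Because only `top_k` groups survive each stage, the number of groups scored grows with the number of categories rather than with the number of subsets of all dimensions.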
EXAMPLE IV
Fig. 4 is a schematic structural diagram of a feature dimension selection apparatus according to a fourth embodiment of the present application, which is applicable to the case where feature dimensions are selected for feature data when an anti-money laundering behavior prediction model is used to predict whether a user exhibits money laundering behavior. The apparatus can be implemented in software and/or hardware, and can be integrated in electronic equipment such as an intelligent terminal.
As shown in fig. 4, the apparatus may include: a target feature category selection module 410, a feature dimension group splicing module 420, a to-be-processed new feature dimension group evaluation module 430, and a to-be-processed new feature dimension group determination module 440.
A target feature class selection module 410, configured to select a target feature class to be processed from the candidate feature classes; wherein each candidate feature class comprises at least two feature dimensions;
a feature dimension group splicing module 420, configured to, if an effective feature dimension group exists, splice each feature dimension group belonging to a target feature category with the effective feature dimension group, respectively, to obtain a new feature dimension group to be processed; wherein the set of feature dimensions is a combination of the feature dimensions;
a to-be-processed new feature dimension group evaluation module 430, configured to evaluate the to-be-processed new feature dimension group, determine an effective new feature dimension group according to an evaluation result, and return to perform a selection operation of a target feature class until all candidate feature classes are processed;
and the to-be-processed new feature dimension group determining module 440 is configured to, if there is no effective feature dimension group, use the feature dimension group belonging to the target feature category as the to-be-processed new feature dimension group, perform an evaluation operation on the new feature dimension group, and determine an effective feature dimension according to an evaluation result.
According to the technical scheme provided by the embodiment of the application, the target feature category to be processed is selected from the candidate feature categories; if the effective characteristic dimension group exists, splicing each characteristic dimension group belonging to the target characteristic category with the effective characteristic dimension group respectively to obtain a new characteristic dimension group to be processed; and evaluating the new feature dimension group to be processed, determining an effective new feature dimension group according to an evaluation result, and returning to execute the selection operation of the target feature class until all candidate feature classes are processed. According to the method and the device, the characteristic dimensions are classified, and effective characteristic dimension combinations are determined according to the categories of the characteristic dimensions. The search space of the characteristic dimension is reduced, and the selection efficiency of the characteristic dimension is improved.
Optionally, the candidate feature categories include: an original feature category, a fused feature category, and a high-level feature category; wherein the original feature category comprises original feature dimensions; the fused feature category comprises fused feature dimensions obtained by performing fusion processing on the original feature dimensions; and the high-level feature category comprises high-level feature dimensions obtained by further fusing the fused feature dimensions.
Optionally, the apparatus 400 further includes: a target feature dimension determining module, configured to, after the target feature category to be processed is selected from the candidate feature categories, determine the feature dimensions belonging to the target feature category as target feature dimensions; and a feature dimension group determining module, configured to generate at least two feature dimension groups belonging to the target feature category according to the target feature dimensions.
Optionally, the module for evaluating the new feature dimension group to be processed includes: the effectiveness evaluation module is used for evaluating the effectiveness of the new feature dimension groups to be processed by utilizing the service model to obtain the effectiveness scores of the new feature dimension groups to be processed as evaluation results; the effectiveness sorting module of the new feature dimension group to be processed is used for sorting the new feature dimension group to be processed according to the evaluation result; and the effective new characteristic dimension group determining module is used for determining the new characteristic dimension group to be processed, ranked in the set range, as the effective new characteristic dimension group.
Optionally, the business model is an anti-money laundering behavior prediction model, and correspondingly, the apparatus 400 further includes: and the user money laundering behavior prediction module is used for predicting whether the money laundering behavior exists in the user or not according to the anti-money laundering behavior prediction model and the effective characteristic dimension group after all candidate characteristic categories are processed.
Optionally, the target feature categories to be processed are sequentially selected from the candidate feature categories according to the front-back order of the original feature categories, the fused feature categories and the high-level feature categories.
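The three candidate categories and their processing order can be illustrated with a small sketch. Everything concrete here is an assumption: the embodiment does not specify the fusion processing, so ratios and products of hypothetical transaction fields (`txn_amount`, `txn_count`, `account_age_days`) are used purely as stand-ins.

```python
# Processing order: original first, then fused, then high-level.
CATEGORY_ORDER = ("original", "fused", "high_level")

def build_categories(record):
    """Derive hypothetical fused and high-level dimensions from original ones."""
    original = {
        "txn_amount": record["txn_amount"],
        "txn_count": record["txn_count"],
        "account_age_days": record["account_age_days"],
    }
    # Fused dimensions: obtained by combining original dimensions.
    fused = {
        "avg_amount": original["txn_amount"] / original["txn_count"],
        "txn_per_day": original["txn_count"] / original["account_age_days"],
    }
    # High-level dimensions: obtained by further combining fused dimensions.
    high_level = {
        "daily_amount": fused["avg_amount"] * fused["txn_per_day"],
        "amount_to_rate": fused["avg_amount"] / fused["txn_per_day"],
    }
    by_name = {"original": original, "fused": fused, "high_level": high_level}
    # Return the categories in the order in which they are to be processed.
    return {name: by_name[name] for name in CATEGORY_ORDER}
```

Each category here contains at least two feature dimensions, matching the requirement on candidate feature categories, and iterating over the returned dictionary visits the categories in the stated order.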
The feature dimension selection apparatus provided by the embodiment of the present application can execute the feature dimension selection method provided by any embodiment of the present application, and has the functional modules and beneficial effects corresponding to executing that method.
EXAMPLE V
A storage medium containing computer-executable instructions for performing a method for feature dimension selection when executed by a computer processor, the method comprising:
selecting a target feature category to be processed from the candidate feature categories; wherein each candidate feature class comprises at least two feature dimensions;
if the effective characteristic dimension group exists, splicing each characteristic dimension group belonging to the target characteristic category with the effective characteristic dimension group respectively to obtain a new characteristic dimension group to be processed; wherein the set of feature dimensions is a combination of the feature dimensions;
evaluating the new feature dimension group to be processed, determining an effective new feature dimension group according to an evaluation result, and returning to execute the selection operation of the target feature class until all candidate feature classes are processed;
and if the effective characteristic dimension group does not exist, taking the characteristic dimension group belonging to the target characteristic category as a new characteristic dimension group to be processed, executing the evaluation operation of the new characteristic dimension group, and determining the effective characteristic dimension according to the evaluation result.
A storage medium refers to any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., hard disk), or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in the computer system in which the program is executed, or may be located in a different second computer system connected to the first computer system through a network (such as the internet). The second computer system may provide the program instructions to the first computer for execution. The term "storage medium" may include two or more storage media that may reside in different locations (e.g., in different computer systems connected by a network). The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.
Of course, the storage medium provided in the embodiments of the present application contains computer-executable instructions, and the computer-executable instructions are not limited to the above-described selection operation of the feature dimension, and may also perform related operations in the feature dimension selection method provided in any embodiment of the present application.
EXAMPLE VI
An embodiment of the present application provides an electronic device in which the feature dimension selection apparatus provided in the embodiments of the present application may be integrated. The electronic device may be configured in a system, or may be a device that performs part or all of the functions of the system. Fig. 5 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present application. As shown in fig. 5, the present embodiment provides an electronic device 500, which includes: one or more processors 520; and a storage device 510 configured to store one or more programs. When the one or more programs are executed by the one or more processors 520, the one or more processors 520 implement the feature dimension selection method provided in the embodiments of the present application, the method including:
selecting a target feature category to be processed from the candidate feature categories; wherein each candidate feature class comprises at least two feature dimensions;
if the effective characteristic dimension group exists, splicing each characteristic dimension group belonging to the target characteristic category with the effective characteristic dimension group respectively to obtain a new characteristic dimension group to be processed; wherein the set of feature dimensions is a combination of the feature dimensions;
evaluating the new feature dimension group to be processed, determining an effective new feature dimension group according to an evaluation result, and returning to execute the selection operation of the target feature class until all candidate feature classes are processed;
and if the effective characteristic dimension group does not exist, taking the characteristic dimension group belonging to the target characteristic category as a new characteristic dimension group to be processed, executing the evaluation operation of the new characteristic dimension group, and determining the effective characteristic dimension according to the evaluation result.
Of course, those skilled in the art will understand that the processor 520 also implements the technical solution of the feature dimension selection method provided in any embodiment of the present application.
The electronic device 500 shown in fig. 5 is only an example, and should not impose any limitation on the functions or the scope of use of the embodiments of the present application.
As shown in fig. 5, the electronic device 500 includes a processor 520, a storage device 510, an input device 530, and an output device 540; the number of processors 520 in the electronic device may be one or more, and one processor 520 is taken as an example in fig. 5; the processor 520, the storage device 510, the input device 530, and the output device 540 in the electronic device may be connected by a bus or by other means, and connection by a bus 550 is taken as an example in fig. 5.
The storage device 510 is a computer-readable storage medium, and can be used to store software programs, computer-executable programs, and module units, such as program instructions corresponding to the feature dimension selection method in the embodiment of the present application.
The storage device 510 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created according to the use of the terminal, and the like. Further, the storage device 510 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the storage device 510 may further include memory located remotely from the processor 520, which may be connected via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 530 may be used to receive input numbers, character information, or voice information, and to generate key signal inputs related to user settings and function control of the electronic device. The output device 540 may include a display screen, speakers, and other output equipment.
The feature dimension selection apparatus, the medium, and the electronic device provided in the embodiments above may execute the feature dimension selection method provided in any embodiment of the present application, and have the functional modules and beneficial effects corresponding to executing the method. For technical details not described in detail in the above embodiments, reference may be made to the feature dimension selection method provided in any embodiment of the present application.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present application and the technical principles employed. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the appended claims.

Claims (10)

1. A method for selecting a feature dimension, the method comprising:
selecting a target feature category to be processed from the candidate feature categories; wherein each candidate feature class comprises at least two feature dimensions;
if the effective characteristic dimension group exists, splicing each characteristic dimension group belonging to the target characteristic category with the effective characteristic dimension group respectively to obtain a new characteristic dimension group to be processed; wherein the set of feature dimensions is a combination of the feature dimensions;
evaluating the new feature dimension group to be processed, determining an effective new feature dimension group according to an evaluation result, and returning to execute the selection operation of the target feature class until all candidate feature classes are processed;
and if the effective characteristic dimension group does not exist, taking the characteristic dimension group belonging to the target characteristic category as a new characteristic dimension group to be processed, executing the evaluation operation of the new characteristic dimension group, and determining the effective characteristic dimension according to the evaluation result.
2. The method of claim 1, wherein the candidate feature categories comprise: an original feature category, a fused feature category, and a high-level feature category; wherein the original feature category comprises original feature dimensions; the fused feature category comprises fused feature dimensions obtained by performing fusion processing on the original feature dimensions; and the high-level feature category comprises high-level feature dimensions obtained by further fusing the fused feature dimensions.
3. The method of claim 1, wherein after selecting the target feature class to be processed from the candidate feature classes, the method further comprises:
determining a characteristic dimension belonging to the target category in the characteristic dimensions according to the target category, and taking the characteristic dimension as a target characteristic dimension;
and generating at least two characteristic dimension groups belonging to the target characteristic category according to the target characteristic dimension.
4. The method of claim 1, wherein evaluating the new set of feature dimensions to be processed and determining an effective new set of feature dimensions based on the evaluation comprises:
evaluating the effectiveness of the new feature dimension groups to be processed by using the service model to obtain the effectiveness scores of the new feature dimension groups to be processed as evaluation results;
sorting the new feature dimension groups to be processed according to the evaluation result;
and determining the new feature dimension group to be processed ranked within the set range as a valid new feature dimension group.
5. The method of claim 3, wherein before evaluating the new set of feature dimensions to be processed and determining a valid new set of feature dimensions from the evaluation, the method further comprises:
and selecting a set number of feature dimension groups from a search space formed by the new feature dimension groups to be processed by utilizing a feature combination selection algorithm, and updating the new feature dimension groups to be processed.
6. The method of claim 4, wherein the business model is an anti-money laundering behavior prediction model, and wherein the method further comprises, after processing each candidate feature class:
and predicting whether the money laundering behavior of the user exists or not according to the anti-money laundering behavior prediction model and the effective characteristic dimension group.
7. The method according to claim 2, characterized in that the target feature classes to be processed are selected sequentially from the candidate feature classes in the order of the original feature classes, the fused feature classes and the high-level feature classes.
8. An apparatus for selecting a feature dimension, the apparatus comprising:
the target feature class selection module is used for selecting a target feature class to be processed from the candidate feature classes; wherein each candidate feature class comprises at least two feature dimensions;
the characteristic dimension group splicing module is used for splicing each characteristic dimension group belonging to the target characteristic category with the effective characteristic dimension group to obtain a new characteristic dimension group to be processed if the effective characteristic dimension group exists; wherein the set of feature dimensions is a combination of the feature dimensions;
the to-be-processed new feature dimension group evaluation module is used for evaluating the to-be-processed new feature dimension group, determining an effective new feature dimension group according to an evaluation result, and returning to execute the selection operation of the target feature class until all candidate feature classes are processed;
and the to-be-processed new feature dimension group determining module is used for taking the feature dimension group belonging to the target feature category as the to-be-processed new feature dimension group if the effective feature dimension group does not exist, executing the evaluation operation of the new feature dimension group, and determining the effective feature dimension according to the evaluation result.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method for selecting a feature dimension according to any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method for selecting a feature dimension according to any of claims 1-7 when executing the computer program.
CN202110847552.6A 2021-07-27 2021-07-27 Feature dimension selection method, device, medium and electronic equipment Active CN113297337B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110847552.6A CN113297337B (en) 2021-07-27 2021-07-27 Feature dimension selection method, device, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110847552.6A CN113297337B (en) 2021-07-27 2021-07-27 Feature dimension selection method, device, medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113297337A true CN113297337A (en) 2021-08-24
CN113297337B CN113297337B (en) 2021-11-12

Family

ID=77331068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110847552.6A Active CN113297337B (en) 2021-07-27 2021-07-27 Feature dimension selection method, device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113297337B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685107A (en) * 2018-11-22 2019-04-26 东软集团股份有限公司 Feature selection approach, system, computer readable storage medium and electronic equipment
US20200380524A1 (en) * 2019-05-29 2020-12-03 Alibaba Group Holding Limited Transaction feature generation
CN112581261A (en) * 2020-12-22 2021-03-30 北京三快在线科技有限公司 Wind control rule determination method and device
CN112613983A (en) * 2020-12-25 2021-04-06 北京知因智慧科技有限公司 Feature screening method and device in machine modeling process and electronic equipment

Also Published As

Publication number Publication date
CN113297337B (en) 2021-11-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant