CN110738476B - Sample migration method, device and equipment - Google Patents
- Publication number
- CN110738476B (application CN201910905305.XA)
- Authority
- CN
- China
- Prior art keywords
- target domain
- sample
- same
- source
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/08—Payment architectures
- G06Q20/085—Payment architectures involving remote charge determination or related payment systems
- G06Q20/0855—Payment architectures involving remote charge determination or related payment systems involving a third party
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/251—Fusion techniques of input or preprocessed data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Image Analysis (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A sample migration method, apparatus, and device are disclosed. In the scheme provided by the embodiments of this specification, source samples are taken from a source domain in which the business is mature, together with a small number of target domain samples from a target domain. The features shared by the two (the same features) and the features that differ (the different features) are identified; the same features are adapted after being mapped to a high-dimensional space, and the different features are filled in. This yields corrected source samples and target domain samples, which are merged into a fused sample set that can be used in the target domain.
Description
Technical Field
The embodiments of this specification relate to the field of information technology, and in particular to a sample migration method, device, and equipment.
Background
Building a risk control model requires a model training stage, and training requires business data and labeled data accumulated over a certain period. In practice, it is often necessary to launch an already mature service from scratch in a new environment.
For example, third-party payment is mature in China, but when the business is to be launched in a foreign country, the business scenarios are similar while the environments differ, and only a very small number of samples accumulate in the early stage of the business. As a result, it is difficult to train on local data and to establish effective risk control models and strategies for risk prevention and control early in the life of the online business.
Based on this, a reliable sample migration scheme is needed.
Disclosure of Invention
An object of the embodiments of the present application is to provide a reliable sample migration scheme.
In order to solve the above technical problem, the embodiment of the present application is implemented as follows:
a method of sample migration, comprising:
acquiring a source sample set and a target domain sample set, wherein the source samples and the target domain samples contain the same number of features and are applied to similar business fields;
determining the same features and the different features contained in the source samples and the target domain samples;
for the same features, mapping the source samples and the target domain samples to the same high-dimensional space, determining the closest feature value distribution of the source sample set and the target domain sample set under the same features, and changing the feature value of each sample under the same features to its feature value in the high-dimensional space according to the closest feature value distribution;
for the different features, filling in the feature values of the different features in the target domain samples according to the values of the different features in the source sample set;
and merging the source sample set and the target domain sample set whose feature values have been changed, to generate a fused sample set for model training in the target domain.
Correspondingly, embodiments of this specification further provide a sample migration device, including:
a sample acquisition module, which acquires a source sample set and a target domain sample set, wherein the source samples and the target domain samples contain the same number of features and are applied to similar business fields;
a feature determination module, which determines the same features and the different features contained in the source samples and the target domain samples;
a same feature transformation module, which, for the same features, maps the source samples and the target domain samples to the same high-dimensional space, determines the closest feature value distribution of the source sample set and the target domain sample set under the same features, and changes the feature value of each sample under the same features to its feature value in the high-dimensional space according to the closest feature value distribution;
a different feature transformation module, which, for the different features, fills in the feature values of the different features in the target domain samples according to the values of the different features in the source sample set;
and a fusion module, which merges the source sample set and the target domain sample set whose feature values have been changed, to generate a fused sample set for model training in the target domain.
In the scheme provided by the embodiments of this specification, source samples are taken from a source domain in which the business is mature, together with a small number of target domain samples from a target domain. The same features and the different features are identified; the same features are adapted after being mapped to a high-dimensional space, and the different features are filled in. This yields corrected source samples and target domain samples, which are merged into a fused sample set that can be used in the target domain. Because the target domain samples and the source samples belong to similar business scenarios, this method makes their feature value distributions approximately equivalent, which increases both the size of the fused sample set and its reliability in the target domain.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of embodiments of the invention.
In addition, any one of the embodiments in the present specification is not required to achieve all of the effects described above.
Drawings
To illustrate the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some of the embodiments of this specification, and those skilled in the art can obtain other drawings from them.
FIG. 1 is a schematic flow chart of a sample migration method provided in an embodiment of this specification;
FIG. 2 is a schematic diagram illustrating features, provided in an embodiment of this specification;
FIG. 3 is a schematic diagram of filling in the features of a target domain sample, provided in an embodiment of this specification;
FIG. 4 is a schematic structural diagram of a sample migration device provided in an embodiment of this specification;
FIG. 5 is a schematic structural diagram of a device for configuring the method of an embodiment of this specification.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present specification, the technical solutions in the embodiments of the present specification will be described in detail below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of protection.
Some concepts involved in the embodiments of the present description are explained first:
Source domain: the business domain that serves as the source of sample migration, such as the mature third-party payment domain in China. In this domain the number of samples is large, and the trained models are mature and reliable.
Source sample: a sample used in the source domain. Each source sample has been labeled as either a fraudulent transaction or a normal transaction.
Source sample set: the set formed by the source samples. The set contains a large number of elements.
Target domain: the domain into which samples need to be migrated, such as a third-party payment domain being developed in an overseas country. The business in the target domain is similar to the business scenario in the source domain.
Target domain sample: a sample used in the target domain. There are few such samples because the business has only just been launched abroad.
Target domain sample set: the set formed by the target domain samples. The set contains a small number of elements.
The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings. As shown in fig. 1, fig. 1 is a schematic flow chart of a sample migration method provided in an embodiment of the present specification, where the flow chart specifically includes the following steps:
S101: acquire a source sample set and a target domain sample set, wherein the source samples and the target domain samples contain the same number of features.
It should be noted that a sample is composed of a plurality of features, and each feature has a corresponding feature value. In the source samples, both the features and the feature values are well defined because the business is mature, and each source sample may carry an explicit label. In other words, the number of features in the source samples is fixed, as is the value of each feature in each sample.
For the target domain samples, since sample migration is required, the number of features needs to be the same as in the source samples. In one embodiment, the number of features in the target domain samples is set equal to the number of features in the source samples when the target domain samples are created.
S103: determine the same features and the different features contained in the source samples and the target domain samples.
The different features referred to in the embodiments of this specification are features that are present in the source samples but not in the target domain samples.
If a feature is present in the target domain samples but absent from the source samples, it can be regarded as a redundant feature and deleted during processing. In practice, because the business types are similar, the target domain samples, when created, contain the same features as the source samples or fewer. FIG. 2 is a schematic diagram illustrating features, provided in an embodiment of this specification. The overlapping portion in the figure represents the same features; there may be one or more same features.
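The split described in S103 can be sketched with plain set operations. This is a minimal illustration, assuming each sample is a Python dict keyed by feature name; the function names and the example feature names are hypothetical and do not come from the specification.

```python
def partition_features(source_features, target_features):
    """Split feature names into same, different, and redundant groups."""
    source = set(source_features)
    target = set(target_features)
    same = sorted(source & target)       # present in both domains
    different = sorted(source - target)  # source only: to be filled in later
    redundant = sorted(target - source)  # target only: deleted during processing
    return same, different, redundant


def drop_redundant(sample, redundant):
    """Return a copy of a sample (dict) without the redundant features."""
    return {name: value for name, value in sample.items() if name not in redundant}


# Hypothetical feature names, for illustration only.
source_features = ["amount", "monthly_limit", "overdraft_limit"]
target_features = ["amount", "monthly_limit", "device_language"]
same, different, redundant = partition_features(source_features, target_features)
cleaned = drop_redundant({"amount": 12.5, "device_language": "en"}, redundant)
```

Sorting the resulting names is only a convenience so that the feature order is stable across runs.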
S105: for the same features, map the source samples and the target domain samples to the same high-dimensional space, determine the closest feature value distribution of the source sample set and the target domain sample set under the same features, and change the feature value of each sample under the same features to its feature value in the high-dimensional space according to the closest feature value distribution.
Specifically, various function transformations may be used to map the source samples and the target domain samples to the same high-dimensional space at the same time, for example a linear transformation, a polynomial transformation, or a Gaussian transformation.
The purpose of high-dimensional mapping by function transformation is that, in a low-dimensional space, the relationship between the feature values of the same feature is hard to see because the source domain and the target domain have different environments. For example, for the feature "monthly spending limit" or the feature "trusted overdraft limit", directly comparing or migrating feature values between the domestic and foreign environments is clearly not feasible.
However, when the amount of sample data is large enough, the distribution of the "monthly spending limit" or the "trusted overdraft limit" follows a similar pattern even in different domain environments. For example, it may follow a Gaussian distribution in both domains, with different parameters in each. Therefore, with a suitable high-dimensional mapping, the feature value distribution of the "monthly spending limit" in the domestic domain and in the foreign domain can be seen to be very close in some high-dimensional space. Concretely, after high-dimensional mapping of the monthly spending limit, the feature values of the source samples and the target domain samples are relatively close in the high-dimensional space, and the clustering effect is relatively pronounced.
In practice the underlying patterns are varied, so several high-dimensional mapping modes can be applied separately and their clustering effects in the high-dimensional space compared. This gives the closest feature value distribution of the source sample set and the target domain sample set, and, for each same feature, the feature values in the source samples and the target domain samples are then replaced by the feature values in the high-dimensional space.
One practical approach is to set a calculation step size and map the source sample set and the target domain sample set multiple times, using different mapping functions or adjusting the parameters of a mapping function. After each mapping, the difference between the average feature value of the source samples and the average feature value of the target domain samples in the high-dimensional space is calculated. The minimum of the differences obtained over the multiple mappings is then determined. When the difference is at its minimum, the source samples and the target domain samples can be considered sufficiently close in the mapped high-dimensional space, and the feature value distribution corresponding to that minimum is the closest feature value distribution.
Further, the mapping function corresponding to the minimum is the target distribution adaptation mapping function for that same feature. Any sample in the source sample set or the target domain sample set can then be mapped with this function for the same feature to obtain the changed feature value.
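The search over mapping functions can be sketched as follows. This is one possible reading, not the specification's implementation: the three mapping families, the Gaussian centers, the step size of the bandwidth grid, and all function names are assumptions made for illustration, and the "difference of average feature values" is taken here to be the Euclidean distance between the two mean vectors in the mapped space.

```python
import math
from statistics import fmean


def linear_map(scale):
    # Maps a scalar to a 1-dimensional vector.
    return lambda x: (scale * x,)


def polynomial_map(degree):
    # Maps a scalar to (x, x^2, ..., x^degree).
    return lambda x: tuple(x ** k for k in range(1, degree + 1))


def gaussian_map(centers, sigma):
    # Maps a scalar to one Gaussian response per center.
    return lambda x: tuple(
        math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers
    )


def mean_vector(values, mapping):
    """Average of the mapped values, computed coordinate by coordinate."""
    mapped = [mapping(v) for v in values]
    return tuple(fmean(column) for column in zip(*mapped))


def mean_gap(source_values, target_values, mapping):
    """Distance between the source mean and the target mean after mapping."""
    return math.dist(
        mean_vector(source_values, mapping),
        mean_vector(target_values, mapping),
    )


def select_mapping(source_values, target_values, candidates):
    """Return the name and function of the candidate with the smallest gap."""
    gaps = {
        name: mean_gap(source_values, target_values, fn)
        for name, fn in candidates.items()
    }
    best = min(gaps, key=gaps.get)
    return best, candidates[best], gaps


# Hypothetical values of one same feature in each domain.
source_values = [1200.0, 1500.0, 1800.0]
target_values = [300.0, 420.0, 500.0]

candidates = {"linear": linear_map(1.0), "poly2": polynomial_map(2)}
# Adjust the Gaussian bandwidth in fixed steps (the step size is an assumption).
for sigma in range(500, 2501, 500):
    candidates[f"gauss_{sigma}"] = gaussian_map((0.0, 1000.0, 2000.0), float(sigma))

best_name, best_fn, gaps = select_mapping(source_values, target_values, candidates)
```

Note that this criterion alone rewards mappings that compress all values toward a constant, so a practical candidate set would need to be chosen with that in mind; the text does not say how.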
S107: for the different features, fill in the feature values of the different features in the target domain samples according to the values of the different features in the source sample set.
As described above, the different features are features that are present in the source samples but not in the target domain samples. In other words, either the target domain samples were created without any feature that is absent from the source samples, or features present in the target domain samples but absent from the source samples were removed during processing.
Correspondingly, to keep the target domain samples consistent with the source samples, the different features can now be added to the target domain samples. FIG. 3 is a schematic diagram of filling in the features of a target domain sample, provided in an embodiment of this specification.
After the features are added, the feature values of the different features in the target domain samples can be filled in with a preset value (which can be set based on experience). Alternatively, they can be filled in with a statistic of the different feature over the source sample set (for example the mean, median, mode, maximum, or minimum). Since the business scenarios are similar, using the mean is generally sufficient for practical applications.
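Filling in the different features can be sketched as below. This is a minimal illustration, assuming samples are dicts keyed by feature name; the function name, its parameters, and the sample data are hypothetical. The mean is the default statistic, and a preset value, when given for a feature, takes precedence.

```python
from statistics import fmean


def fill_different_features(source_samples, target_samples, different,
                            statistic=fmean, preset=None):
    """Add each different feature to every target domain sample.

    The fill value is the preset for that feature if one is given,
    otherwise the chosen statistic over the source sample set.
    """
    preset = preset or {}
    fill_values = {}
    for name in different:
        if name in preset:
            fill_values[name] = preset[name]
        else:
            fill_values[name] = statistic([s[name] for s in source_samples])
    filled = [{**sample, **fill_values} for sample in target_samples]
    return filled, fill_values


# Hypothetical data: "overdraft_limit" exists only in the source domain.
source_samples = [
    {"amount": 10.0, "overdraft_limit": 1000.0},
    {"amount": 30.0, "overdraft_limit": 3000.0},
]
target_samples = [{"amount": 20.0}]
filled, fill_values = fill_different_features(
    source_samples, target_samples, ["overdraft_limit"]
)
```

Passing `statistics.median` or `statistics.mode` as `statistic` switches to one of the other statistics mentioned in the text.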
S109: merge the source sample set and the target domain sample set whose feature values have been changed, to generate a fused sample set for model training in the target domain.
In the above manner, for the source samples in the source sample set, the feature values of the same features are changed through high-dimensional mapping. For the target domain samples in the target domain sample set, the feature values of the same features are changed through high-dimensional mapping, and the feature values of the different features are obtained by filling. This ensures that the source samples and the target domain samples have the same features and similar feature value distributions.
Therefore, the source sample set and the target domain sample set whose feature values have been changed can be merged and used in the target domain as training samples, to train a risk control model usable in the target domain.
In the scheme provided by the embodiments of this specification, source samples are taken from a source domain in which the business is mature, together with a small number of target domain samples from a target domain. The same features and the different features are identified; the same features are adapted after being mapped to a high-dimensional space, and the different features are filled in. This yields corrected source samples and target domain samples, which are merged into a fused sample set that can be used in the target domain. Because the target domain samples and the source samples belong to similar business scenarios, this method makes their feature value distributions approximately equivalent, which increases both the size of the fused sample set and its reliability in the target domain.
In one embodiment, when the source samples and the target domain samples are mapped to the same high-dimensional space, the mapping may be performed for each same feature one by one. That is, the feature values of each same feature of the source samples and the target domain samples are mapped to the same high-dimensional space in turn, and the closest feature value distribution of the source sample set and the target domain sample set in the high-dimensional space is determined for the current same feature.
In this way, each same feature has its own mapping function, and the forms and parameters of the target distribution adaptation mapping functions of different same features may differ. Changing the feature values of different same features therefore also requires different target distribution adaptation mapping functions. This ensures a sufficiently close feature value distribution for every same feature in the source sample set and the target domain sample set, and the samples obtained after conversion are more accurate.
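Applying a separate mapping function to each same feature can be sketched as follows. This assumes samples are dicts keyed by feature name and that a mapping function has already been chosen for each same feature; the function name and the two example mappings are hypothetical.

```python
def transform_samples(samples, mappings):
    """Replace each same feature's value with its high-dimensional image.

    `mappings` maps a feature name to the mapping function selected for it.
    Features without an entry (labels, filled-in features) are left unchanged.
    """
    transformed = []
    for sample in samples:
        new_sample = dict(sample)  # do not modify the caller's sample
        for name, mapping in mappings.items():
            new_sample[name] = mapping(sample[name])
        transformed.append(new_sample)
    return transformed


# Hypothetical per-feature mappings of different forms.
mappings = {
    "amount": lambda x: (x, x ** 2),       # polynomial form
    "monthly_limit": lambda x: (0.5 * x,), # linear form
}
samples = [{"amount": 2.0, "monthly_limit": 100.0, "label": 1}]
mapped_samples = transform_samples(samples, mappings)
```

The same `mappings` dict would be applied to both the source sample set and the target domain sample set so that both end up in the same space.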
In one embodiment, some or all of the same features may instead be mapped to the same high-dimensional space in a batch using a single function, for example a polynomial mapping function. In this case, the feature value distribution is determined in the high-dimensional space for that subset or for all of the same features together, and, once the distribution is determined and the corresponding target distribution adaptation mapping function is obtained, the feature values of that subset or of all of the same features are converted together. Because multiple same features are transformed as a whole, this improves computing efficiency and saves time when there are many same features.
In one embodiment, since both the source sample set and the target domain sample set consist of labeled samples and the source samples are sufficiently numerous, part or all of the source sample set can be merged with the full target domain sample set after the feature values are changed, to obtain the fused sample set.
Further, when only part of the source sample set is merged, the samples may be selected at random, for example by randomly selecting 50% of the source samples, or selected according to certain conditions, for example by selecting the source samples within the most recent time window. When the number of source samples is large enough, selecting only part of them can improve the efficiency of training the risk model on the fused sample set in the target domain while still guaranteeing a sufficient number of samples.
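Selecting part of the source sample set and merging it with the full target domain sample set can be sketched as below. This is a minimal illustration; the function names, the `timestamp` field, and the fixed random seed are assumptions and are not specified in the text.

```python
import random


def select_source_subset(source_samples, fraction=None, since=None,
                         time_key="timestamp", seed=0):
    """Pick part of the source sample set.

    `since` keeps only samples whose timestamp is at or after the given value
    (a recent time window); `fraction` then keeps a random share of what is left.
    With neither argument, the full source sample set is returned.
    """
    selected = list(source_samples)
    if since is not None:
        selected = [s for s in selected if s[time_key] >= since]
    if fraction is not None:
        count = round(len(selected) * fraction)
        selected = random.Random(seed).sample(selected, count)
    return selected


def fuse(source_samples, target_samples):
    """Merge the chosen source samples with the full target domain sample set."""
    return list(source_samples) + list(target_samples)


# Hypothetical samples whose feature values have already been changed.
source_samples = [{"id": i, "timestamp": i} for i in range(4)]
target_samples = [{"id": 99, "timestamp": 3}]

half = select_source_subset(source_samples, fraction=0.5)
recent = select_source_subset(source_samples, since=2)
fused = fuse(half, target_samples)
```

The target domain samples are never subsampled here, since the text merges the full target domain sample set.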
Correspondingly, an embodiment of this specification further provides a sample migration device. FIG. 4 is a schematic structural diagram of the sample migration device provided in an embodiment of this specification. The device includes:
a sample acquisition module 401, which acquires a source sample set and a target domain sample set, wherein the source samples and the target domain samples contain the same number of features and are applied to similar business fields;
a feature determination module 403, which determines the same features and the different features contained in the source samples and the target domain samples;
a same feature transformation module 405, which, for the same features, maps the source samples and the target domain samples to the same high-dimensional space, determines the closest feature value distribution of the source sample set and the target domain sample set under the same features, and changes the feature value of each sample under the same features to its feature value in the high-dimensional space according to the closest feature value distribution;
a different feature transformation module 407, which, for the different features, fills in the feature values of the different features in the target domain samples according to the values of the different features in the source sample set;
and a fusion module 409, which merges the source sample set and the target domain sample set whose feature values have been changed, to generate a fused sample set for model training in the target domain.
Further, the same feature transformation module 405 maps the feature values of the same feature to the same high-dimensional space by adjusting the mapping function or the parameters of the mapping function; when the difference between the average feature value of the source sample set and the average feature value of the target domain sample set in the high-dimensional space is at its minimum, the feature value distribution at that point is determined to be the closest feature value distribution.
Further, the same feature transformation module 405 maps the feature values of each same feature of the source samples and the target domain samples to the same high-dimensional space one by one, and determines the closest feature value distribution of the source sample set and the target domain sample set in the high-dimensional space for the current same feature.
Further, the same feature transformation module 405 maps the feature values of all of the same features of the source samples and the target domain samples to the same high-dimensional space in a batch, and determines the closest feature value distribution of the source sample set and the target domain sample set in the high-dimensional space for all of the same features together.
Further, the different feature transformation module 407 determines, for any different feature, the average value of that feature in the source sample set, adds the feature to the target domain samples, and sets its value in the target domain samples to that average.
Further, the fusion module 409 merges part or all of the source sample set with the full target domain sample set after the feature values are changed.
Embodiments of the present specification also provide a computer device, which at least includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the sample migration method shown in fig. 1 when executing the program.
Fig. 5 is a schematic diagram illustrating a more specific hardware structure of a computing device according to an embodiment of the present disclosure, where the computing device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used to connect an input/output module for information input and output. The input/output module may be configured as a component within the device (not shown in the drawings) or may be external to the device to provide the corresponding functions. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, and so on, and the output devices may include a display, a speaker, a vibrator, an indicator light, and so on.
The communication interface 1040 is used to connect a communication module (not shown in the drawings) for communication between this device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Embodiments of the present description also provide a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the sample migration method shown in fig. 1.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
From the above description of the embodiments, it is clear to those skilled in the art that the embodiments of the present disclosure can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments of the present specification.
The systems, methods, modules or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in the present specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant points, refer to the corresponding description of the method embodiment. The device embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and when the embodiments of the present specification are implemented, the functions of the modules may be realized in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.
The foregoing is only a specific implementation of the embodiments of the present specification. It should be noted that those skilled in the art can make a number of improvements and refinements without departing from the principle of the embodiments of the present specification, and these improvements and refinements should also be regarded as falling within the protection scope of the embodiments of the present specification.
Claims (13)
1. A method of sample migration, comprising:
acquiring a source sample set and a target domain sample set, wherein the source sample and the target domain sample contain the same number of features, and the source sample and the target domain sample are applied to similar business fields;
determining the same features and the different features contained in the source sample and the target domain sample;
for the same features, mapping the source sample and the target domain sample to the same high-dimensional space, determining the closest feature value distribution of the source sample set and the target domain sample set under the same features, and changing the feature value of each sample under the same features to its feature value in the high-dimensional space according to the closest feature value distribution; wherein, when the difference between the average sample feature values of the source sample set and the target domain sample set in the same high-dimensional space is minimum, the feature value distribution at that moment is determined to be the closest feature value distribution;
for the different features, completing the feature values of the different features in the target domain sample according to the values of the different features in the source sample set;
and merging the source sample set and the target domain sample set after the feature values are changed to generate a fused sample set for model training in the target domain.
2. The method of claim 1, wherein mapping the source sample and the target domain sample to the same high-dimensional space for the same features comprises:
mapping the feature values of the same features to the same high-dimensional space by adjusting a mapping function or parameters in the mapping function.
3. The method of claim 1, wherein mapping the source sample and the target domain sample to the same high-dimensional space for the same features comprises:
mapping the feature values of the same features of the source sample and the target domain sample to the same high-dimensional space one feature at a time; and correspondingly, determining the closest feature value distribution of the source sample set and the target domain sample set under the same features comprises:
determining the closest feature value distribution of the source sample set and the target domain sample set in the high-dimensional space under the current same feature.
4. The method of claim 1, wherein mapping the source sample and the target domain sample to the same high-dimensional space for the same features comprises:
mapping the feature values of some or all of the same features of the source sample and the target domain sample to the same high-dimensional space; and correspondingly, determining the closest feature value distribution of the source sample set and the target domain sample set under the same features comprises:
determining the closest feature value distribution of the source sample set and the target domain sample set in the high-dimensional space under some or all of the same features.
5. The method of claim 1, wherein completing the feature values of the different features in the target domain sample according to the values of the different features in the source sample set comprises:
for any different feature, determining the average value of that feature in the source sample set, adding that feature to the target domain sample, and setting the value of that feature in the target domain sample to the average value.
6. The method of claim 1, wherein merging the source sample set and the target domain sample set after the feature values are changed comprises:
merging part or all of the source sample set after the feature values are changed with the full target domain sample set.
7. A sample migration device, comprising:
a sample acquisition module, configured to acquire a source sample set and a target domain sample set, wherein the source sample and the target domain sample contain the same number of features, and the source sample and the target domain sample are applied to similar business fields;
a feature determination module, configured to determine the same features and the different features contained in the source sample and the target domain sample;
a same feature transformation module, configured to, for the same features, map the source sample and the target domain sample to the same high-dimensional space, determine the closest feature value distribution of the source sample set and the target domain sample set under the same features, and change the feature value of each sample under the same features to its feature value in the high-dimensional space according to the closest feature value distribution; wherein, when the difference between the average sample feature values of the source sample set and the target domain sample set in the same high-dimensional space is minimum, the feature value distribution at that moment is determined to be the closest feature value distribution;
a different feature transformation module, configured to, for the different features, complete the feature values of the different features in the target domain sample according to the values of the different features in the source sample set;
and a fusion module, configured to merge the source sample set and the target domain sample set after the feature values are changed to generate a fused sample set for model training in the target domain.
8. The apparatus of claim 7, wherein the same feature transformation module maps the feature values of the same features to the same high-dimensional space by adjusting a mapping function or parameters in the mapping function; and when the difference between the average sample feature values of the source sample set and the target domain sample set in the high-dimensional space is minimum, determines the feature value distribution at that moment to be the closest feature value distribution.
9. The apparatus of claim 7, wherein the same feature transformation module maps the feature values of the same features of the source sample and the target domain sample to the same high-dimensional space one feature at a time, and determines the closest feature value distribution of the source sample set and the target domain sample set in the high-dimensional space under the current same feature.
10. The apparatus of claim 7, wherein the same feature transformation module maps the feature values of all of the same features of the source sample and the target domain sample to the same high-dimensional space, and determines the closest feature value distribution of the source sample set and the target domain sample set in the high-dimensional space under all of the same features.
11. The apparatus of claim 7, wherein the different feature transformation module determines, for any different feature, the average value of that feature in the source sample set, adds that feature to the target domain sample, and sets the value of that feature in the target domain sample to the average value.
12. The apparatus of claim 7, wherein the fusion module merges part or all of the source sample set after the feature values are changed with the full target domain sample set.
13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 6 when executing the program.
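The steps recited in claims 1, 2 and 5 can be illustrated with a minimal Python sketch. This is not the patented implementation: the claims do not name a mapping function, so random Fourier features with a tunable bandwidth `gamma` are assumed here purely for illustration, and the names `rff_map`, `mean_gap` and `migrate_samples` are hypothetical. The sketch tries several values of `gamma`, keeps the one for which the source and target mean vectors in the mapped space are closest, fills the features missing from the target samples with the source-set mean, and stacks the two sets.

```python
import numpy as np


def rff_map(x, gamma, weights, offsets):
    """Map rows of x to a higher-dimensional space (assumed: random Fourier features)."""
    proj = np.sqrt(2.0 * gamma) * (x @ weights) + offsets
    return np.sqrt(2.0 / weights.shape[1]) * np.cos(proj)


def mean_gap(src_mapped, tgt_mapped):
    """Distance between the source and target mean vectors in the mapped space."""
    return float(np.linalg.norm(src_mapped.mean(axis=0) - tgt_mapped.mean(axis=0)))


def migrate_samples(src_shared, tgt_shared, src_extra, gammas, dim=64, seed=0):
    """Return the fused sample matrix and the selected mapping parameter.

    src_shared / tgt_shared: values of the features both domains have.
    src_extra: values of the features only the source domain has.
    """
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(src_shared.shape[1], dim))
    offsets = rng.uniform(0.0, 2.0 * np.pi, size=dim)

    # Adjust the mapping parameter; keep the value with the smallest mean gap.
    best_gamma = min(
        gammas,
        key=lambda g: mean_gap(
            rff_map(src_shared, g, weights, offsets),
            rff_map(tgt_shared, g, weights, offsets),
        ),
    )
    src_mapped = rff_map(src_shared, best_gamma, weights, offsets)
    tgt_mapped = rff_map(tgt_shared, best_gamma, weights, offsets)

    # Complete the features the target lacks with the source-set average.
    tgt_extra = np.tile(src_extra.mean(axis=0), (tgt_shared.shape[0], 1))

    # Merge: source rows first, then target rows.
    fused = np.vstack([
        np.hstack([src_mapped, src_extra]),
        np.hstack([tgt_mapped, tgt_extra]),
    ])
    return fused, best_gamma


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    src_shared = rng.normal(0.0, 1.0, size=(200, 3))  # synthetic source samples
    tgt_shared = rng.normal(0.5, 1.2, size=(50, 3))   # synthetic target samples
    src_extra = rng.normal(10.0, 2.0, size=(200, 2))  # source-only features
    fused, gamma = migrate_samples(src_shared, tgt_shared, src_extra, [0.01, 0.1, 1.0])
    print(fused.shape)  # (250, 66)
```

One caveat on this design choice: minimizing the mean gap alone can favor degenerate mappings (a very small `gamma` maps all samples to nearly the same point), so a practical variant would likely bound the parameter range or add a term that preserves information useful for the downstream model.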
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910905305.XA CN110738476B (en) | 2019-09-24 | 2019-09-24 | Sample migration method, device and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110738476A (en) | 2020-01-31 |
CN110738476B (en) | 2021-06-29 |
Family
ID=69269377
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910905305.XA Active CN110738476B (en) | 2019-09-24 | 2019-09-24 | Sample migration method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110738476B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111428783B (en) * | 2020-03-23 | 2022-06-21 | 支付宝(杭州)信息技术有限公司 | Method and device for performing sample domain conversion on training samples of recommendation model |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107045640A (en) * | 2017-03-31 | 2017-08-15 | 南京邮电大学 | Method based on neighborhood preservation and kernel space alignment for image recognition |
CN108460523A (en) * | 2018-02-12 | 2018-08-28 | 阿里巴巴集团控股有限公司 | Risk control rule generation method and device |
CN108898181A (en) * | 2018-06-29 | 2018-11-27 | 咪咕文化科技有限公司 | Processing method, device and storage medium for an image classification model |
CN109034080A (en) * | 2018-08-01 | 2018-12-18 | 桂林电子科技大学 | Multi-source domain adaptive face recognition method |
CN109214421A (en) * | 2018-07-27 | 2019-01-15 | 阿里巴巴集团控股有限公司 | Model training method, device and computer equipment |
CN109902393A (en) * | 2019-03-01 | 2019-06-18 | 哈尔滨理工大学 | Rolling bearing fault diagnosis method under variable working conditions based on deep features and transfer learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108399414B (en) * | 2017-02-08 | 2021-06-01 | 南京航空航天大学 | Sample selection method and device applied to cross-modal data retrieval field |
Also Published As
Publication number | Publication date |
---|---|
CN110738476A (en) | 2020-01-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108563548B (en) | Abnormality detection method and apparatus | |
WO2019095782A1 (en) | Data sample label processing method and apparatus | |
CN110163612B (en) | Payment risk control method and device | |
CN109145025B (en) | Multi-data-source integrated data query method and device and service server | |
CN109255486B (en) | Method and device for optimizing policy configuration | |
CN113793071A (en) | Suspicious group identification method and device | |
CN106355391A (en) | Service processing method and device | |
CN111553488A (en) | Risk recognition model training method and system for user behaviors | |
CN105224343A (en) | A kind of renewal reminding method of application program and device | |
WO2018219285A1 (en) | Data object display method and device | |
CN111079944B (en) | Transfer learning model interpretation realization method and device, electronic equipment and storage medium | |
CN111506580B (en) | Transaction storage method based on centralized block chain type account book | |
CN111475853A (en) | Model training method and system based on distributed data | |
CN106326062A (en) | Method and device for controlling running state of application program | |
CN110852754A (en) | Risk identification method, device and equipment | |
CN111126623A (en) | Model updating method, device and equipment | |
CN110738476B (en) | Sample migration method, device and equipment | |
CN112492535A (en) | Short message sending method and device | |
CN109325015B (en) | Method and device for extracting characteristic field of domain model | |
CN108985831B (en) | Offline transaction distinguishing method and device and computer equipment | |
CN109598478B (en) | Wind measurement result description document generation method and device and electronic equipment | |
CN111798263A (en) | Transaction trend prediction method and device | |
CN110781500A (en) | Data risk control system and method | |
CN110020780A (en) | The method, apparatus and electronic equipment of information output | |
CN109656805B (en) | Method and device for generating code link for business analysis and business server | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||