CN117350366A - Network model construction method and related equipment

Network model construction method and related equipment

Info

Publication number
CN117350366A
Authority
CN
China
Prior art keywords
model
sample
expert
training
samples
Prior art date
Legal status
Pending
Application number
CN202311341516.8A
Other languages
Chinese (zh)
Inventor
郭清宇
欧阳天雄
何茂亮
李超
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202311341516.8A
Publication of CN117350366A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Accounting & Taxation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a network model construction method and related equipment. A reference sample required for constructing a network model is acquired; sample selection processing is performed on a plurality of samples in a sample library through a preset selection network model to obtain a plurality of target samples, screened from the sample library, that are similar to the reference sample; sample selection processing is performed on a plurality of expert models in an expert model library through the same preset selection network model to obtain a plurality of target expert models, screened from the expert model library, whose training samples are similar to the reference sample; model information of the plurality of target expert models is learned through knowledge distillation processing to obtain a fused expert model; and, based on the reference sample and the target samples, network model training is performed on the fused expert model to obtain a trained target network model. The method can improve the generalization of the network model and enhance the network model effect.

Description

Network model construction method and related equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method for constructing a network model and related devices.
Background
In actual business, clients have different network model construction requirements, but samples from different scenarios are often insufficient. At present, this problem is generally addressed either by collecting more labeled samples for modeling, or by using transfer learning to migrate useful knowledge from samples of other similar scenarios to the client scenario. However, collecting labeled samples takes a long time, which real business scenarios usually cannot wait for, so it is difficult to achieve; and transfer learning usually trains a general model by directly mixing samples from several similar scenarios, which yields low generalization of the network model and does not help enhance the network model effect.
Disclosure of Invention
The embodiment of the application provides a network model construction method and related equipment, wherein the related equipment can comprise a network model construction device, electronic equipment, a computer readable storage medium and a computer program product, and is used for improving generalization of a network model and enhancing network model effects.
The embodiment of the application provides a network model construction method, which comprises the following steps:
acquiring a reference sample required for constructing a network model;
taking the reference sample as a benchmark, and carrying out sample selection processing on a plurality of samples in a sample library through a preset selection network model to obtain a plurality of target samples which are screened from the sample library and are similar to the reference sample;
taking the reference sample as a benchmark and performing, through the preset selection network model and based on the training samples applied by a plurality of expert models in an expert model library during training, sample selection processing on the plurality of expert models in the expert model library to obtain a plurality of target expert models, screened from the expert model library, whose training samples are similar to the reference sample;
learning model information of a plurality of target expert models through knowledge distillation processing to obtain a fused expert model;
and based on the reference sample and the target sample, performing network model training on the fused expert model to obtain a trained target network model.
Accordingly, an embodiment of the present application provides a network model building apparatus, including:
the acquisition unit is used for acquiring a reference sample required by constructing the network model;
the first selection unit is used for carrying out sample selection processing on a plurality of samples in a sample library by taking the reference sample as a benchmark and through a preset selection network model to obtain a plurality of target samples which are screened from the sample library and are similar to the reference sample;
the second selection unit is used for taking the reference sample as a benchmark and performing, through the preset selection network model and based on the training samples applied by the plurality of expert models in the expert model library during training, sample selection processing on the plurality of expert models to obtain a plurality of target expert models, screened from the expert model library, whose training samples are similar to the reference sample;
The fusion unit is used for learning the model information of a plurality of target expert models through knowledge distillation processing so as to obtain a fused expert model;
and the enhancement unit is used for carrying out network model training on the fused expert model based on the reference sample and the target sample to obtain a trained target network model.
Optionally, in some embodiments of the present application, the first selecting unit may include a first prediction subunit, a first filtering subunit, and a sampling subunit, as follows:
the first prediction subunit is configured to perform similarity distribution prediction processing on a plurality of samples in a sample library through the preset selection network model, so as to obtain first similarity distribution distances between different samples in the sample library and the reference sample;
the first screening subunit is configured to screen a plurality of candidate samples from different samples in the sample library based on the first similarity distribution distance;
the sampling subunit is configured to downsample the candidate samples to obtain a plurality of target samples similar to the reference sample.
Optionally, in some embodiments of the present application, the sampling subunit may specifically be configured to determine the sample ratio between black samples and white samples in each sample set of the candidate samples; and to downsample the candidate samples based on the sample ratio to obtain a plurality of target samples similar to the reference sample.
Optionally, in some embodiments of the present application, the second selecting unit may include a first determining subunit, a second predicting subunit, and a second screening subunit, as follows:
the first determining subunit is configured to determine a training sample applied by each expert model in the expert model library during training;
the second prediction subunit is configured to perform similarity distribution prediction processing on a training sample applied by the expert model during training through the preset selection network model, so as to obtain a second similarity distribution distance between the training sample applied by the expert model during training and the reference sample;
the second screening subunit is configured to screen a plurality of target expert models from the expert models in the expert model library based on the second similarity distribution distance.
Optionally, in some embodiments of the present application, the fusion unit may include a third prediction subunit, a distillation subunit, a fourth prediction subunit, a construction subunit, and a training subunit, as follows:
the third prediction subunit is configured to construct a hybrid model prediction score based on prediction results of the multiple target expert models for the reference sample;
The distillation subunit is used for learning model information of a plurality of target expert models based on a preset student model frame so as to obtain a distilled expert model;
the fourth prediction subunit is configured to perform prediction processing on the reference sample through the post-distillation expert model to obtain a distillation model prediction result;
the construction subunit is used for constructing model loss based on the mixed model prediction score, the real label carried by the reference sample and the distillation model prediction result;
and the training subunit is used for training the post-distillation expert model based on the model loss to obtain a post-fusion expert model.
Optionally, in some embodiments of the present application, the third prediction subunit may include a fifth prediction subunit, a second determination subunit, and a first generation subunit, as follows:
the fifth prediction subunit is configured to respectively perform prediction processing on the reference samples through a plurality of target expert models, so as to obtain a target expert model prediction result corresponding to each target expert model;
the second determining subunit is configured to assign a weight for characterizing model distinction to each target expert model based on the numerical value of the target expert model prediction result and the real label carried by the reference sample;
And the first generation subunit is used for carrying out numerical fusion on the target expert model prediction result and the weight corresponding to the target expert model to obtain a mixed model prediction score.
Optionally, in some embodiments of the present application, the building sub-units may include a first building sub-unit, a second building sub-unit, and a second generating sub-unit, as follows:
the first construction subunit is configured to construct a first model loss based on the difference between the real label carried by the reference sample and the distillation model prediction result;
the second construction subunit is configured to construct a second model loss based on the difference between the hybrid model prediction score and the distillation model prediction result;
and the second generation subunit is used for carrying out loss fusion processing on the first model loss and the second model loss to obtain the model loss.
Optionally, in some embodiments of the present application, the network model building apparatus may further include a first extraction subunit, a second extraction subunit, a selection subunit, a determination subunit, a prediction subunit, and a training subunit, as follows:
the first extraction subunit is configured to select a first training sample set from a network training sample library, where the network training sample library includes a plurality of training sample sets;
the second extraction subunit is configured to select a second training sample set from the network training sample library, where the first training sample set has been returned to the network training sample library (i.e., sampling with replacement);
the selecting subunit is configured to select a first training sample from the first training sample set, and select a second training sample from the second training sample set;
the determining subunit is configured to determine a label value according to whether the first training sample set and the second training sample set are the same training sample set;
the prediction subunit is configured to judge, through an initial selection network model, whether the first training sample and the second training sample come from the same training sample set, and to generate a model prediction value;
the training subunit is configured to train the initial selection network model based on the label value and the model prediction value to obtain the preset selection network model.
The electronic device provided by the embodiment of the application comprises a processor and a memory, wherein the memory stores a plurality of instructions, and the processor loads the instructions to execute the steps in the network model building method provided by the embodiment of the application.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps in the network model construction method provided by the embodiment of the application.
In addition, the embodiment of the application further provides a computer program product, which comprises a computer program or instructions, and the computer program or instructions implement the steps in the network model building method provided by the embodiment of the application when being executed by a processor.
The embodiment of the application provides a network model construction method and related equipment: a reference sample required for constructing a network model is acquired; taking the reference sample as a benchmark, sample selection processing is performed on a plurality of samples in a sample library through a preset selection network model to obtain a plurality of target samples, screened from the sample library, that are similar to the reference sample; taking the reference sample as a benchmark, sample selection processing is performed on a plurality of expert models in an expert model library through the preset selection network model, based on the training samples those expert models applied during training, to obtain a plurality of target expert models, screened from the expert model library, whose training samples are similar to the reference sample; model information of the plurality of target expert models is learned through knowledge distillation processing to obtain a fused expert model; and, based on the reference sample and the target samples, network model training is performed on the fused expert model to obtain a trained target network model. Because the fused expert model learns the model information of multiple target expert models, it absorbs the characteristics of each of them; and because the preset selection network model screens out target samples similar to the reference sample, training the fused expert model on the reference sample together with those target samples further improves the model prediction effect of the target network model and lets it better fit the requirements of the current application, thereby improving the generalization of the network model and enhancing the network model effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of a scenario of a network model construction method provided in an embodiment of the present application;
FIG. 2 is a first flowchart of a network model building method provided in an embodiment of the present application;
FIG. 3 is a second flowchart of a network model building method provided in an embodiment of the present application;
FIG. 4 is a preset selection network training flowchart provided in an embodiment of the present application;
FIG. 5 is a flow chart of the fusion of the target expert model provided by the embodiments of the present application;
FIG. 6 is a flow chart of a model enhancement method provided by an embodiment of the present application;
fig. 7 is an effect diagram of a network model construction method provided in an embodiment of the present application;
FIG. 8 is a third flowchart of a network model building method provided by an embodiment of the present application;
fig. 9 is a schematic structural diagram of a network model building apparatus according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
It should be noted that the principles of the present application are illustrated as implemented in a suitable computing environment. The following description is based on illustrated embodiments of the present application and should not be taken as limiting other embodiments not described in detail herein.
In the following description of the present application, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or a different subset of all possible embodiments and can be combined with each other without conflict.
In the following description of the present application, the terms "first", "second" and "third" are merely used to distinguish similar objects from each other and do not represent a particular ordering of the objects; it should be understood that "first", "second" and "third" may be interchanged in a particular order or sequence, where permitted, so that the embodiments of the present application described herein can be practiced in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
Artificial intelligence (Artificial Intelligence, AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence is thus the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive subject involving a wide range of fields, covering both hardware-level and software-level techniques. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly includes Machine Learning (ML), of which Deep Learning (DL) is a new research direction, introduced to bring machine learning closer to its original goal, namely artificial intelligence. At present, deep learning is mainly applied to fields such as machine vision and natural language processing.
Deep learning learns the inherent regularities and representation hierarchies of sample data, and the information obtained during such learning greatly aids the interpretation of data such as text, images and sound. Using deep learning technology and corresponding training sets, network models realizing different functions can be trained. For example, in natural language processing, a question-and-answer model can be trained based on one training set, a viewpoint extraction model for text viewpoint extraction can be trained based on another training set, and so on.
The application relates to the field of machine learning of artificial intelligence technology, and provides a network model construction method, a network model construction device, electronic equipment, a computer readable storage medium and a computer program product. The network model construction method may be executed by a network model construction device or by an electronic apparatus integrated with the network model construction device.
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Referring to fig. 1, the present application further provides a network model building system. As shown in fig. 1, the network model building system includes an electronic device, and the network model building apparatus provided in the present application is integrated in the electronic device. For example, the electronic device may obtain a reference sample required to build a network model; take the reference sample as a benchmark and perform sample selection processing on a plurality of samples in a sample library through a preset selection network model to obtain a plurality of target samples, screened from the sample library, that are similar to the reference sample; take the reference sample as a benchmark and, through the preset selection network model and based on the training samples applied by a plurality of expert models in an expert model library during training, perform sample selection processing on the plurality of expert models to obtain a plurality of target expert models, screened from the expert model library, whose training samples are similar to the reference sample; learn model information of the plurality of target expert models through knowledge distillation processing to obtain a fused expert model; and, based on the reference sample and the target samples, perform network model training on the fused expert model to obtain a trained target network model.
The electronic device may be any device configured with a processor and having a processing capability, such as a mobile electronic device having a processor, such as a smart phone, a tablet computer, a palm computer, a notebook computer, and a smart speaker, or a stationary electronic device having a processor, such as a desktop computer, a television, a server, and an industrial device.
It should be noted that, the schematic view of the scenario of the network model building system shown in fig. 1 is only an example, and the network model building system and scenario described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided in the embodiments of the present application, and those skilled in the art can know that, with the evolution of the network model building system and the appearance of a new service scenario, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
The following will describe in detail. The numbers of the following examples are not intended to limit the preferred order of the examples.
Referring to fig. 2, fig. 2 is a schematic flow chart of a network model construction method according to an embodiment of the present application, and as shown in fig. 2, the flow chart of the network model construction method according to the present application is as follows:
201. And obtaining a reference sample required for constructing the network model.
The reference sample is a sample used for constructing the network model, and it carries a real label. In the prior art, in actual business, insufficient reference samples lead to a constructed network model with poor effect and low generalization.
For example, if the present application wishes to build a financial risk-control transfer network model, the reference sample required for training that model can be obtained, where the reference sample carries a real label.
202. And taking the reference sample as a reference, and carrying out sample selection processing on a plurality of samples in a sample library through a preset selection network model to obtain a plurality of target samples which are screened from the sample library and are similar to the reference sample.
The preset selection network model is a network model used to adapt similar samples and expert models according to the reference sample. For example, in the embodiment of the present application, the preset selection network model may be a domain-related selector: by learning characteristic information of samples, it can calculate a similarity distribution distance between different samples, where the similarity distribution distance represents the degree of similarity between them; the required samples can then be selected from a sample library, or the required network models from a network model library, according to this distance.
For example, a sample library may be obtained, where the sample library includes a plurality of samples, and then a plurality of target samples similar to the reference sample are screened from the sample library using a preset selection network model.
Optionally, in an embodiment, the step of "taking the reference sample as a reference, performing sample selection processing on a plurality of samples in the sample library by using a preset selection network model to obtain a plurality of target samples similar to the reference sample, where the target samples are screened from the sample library may include:
performing similarity distribution prediction processing on a plurality of samples in a sample library through a preset selection network model to obtain first similarity distribution distances between different samples in the sample library and a reference sample;
screening a plurality of candidate samples from different samples in a sample library based on the first similarity distribution distance;
and downsampling the candidate samples to obtain a plurality of target samples similar to the reference samples.
The similarity distribution distance may be the earth mover's distance (Wasserstein distance), which measures the minimum average distance that data must be moved to transform one distribution into another, and which can be used to characterize the similarity between the two distributions. For example, in the embodiment of the present application, a preset selection network model may be applied to obtain the similarity distribution distance between different samples.

Since the similarity distribution distance measures the similarity between two distributions, it can be used to calculate the similarity between different input samples: the closer the distance is to 0, the more similar the two batches of samples are; the farther it is from 0, the greater the difference between the two batches of samples. An advantage of this distance is that no matter how far apart the two batches of data are distributed, the gradient-vanishing problem does not occur.
For example, the preset selection network model in the embodiment of the present application may be a sample filter based on the earth mover's distance. The reference sample and the different samples in the sample library may be input into the preset selection network model, which predicts the first similarity distribution distance between the reference sample and the different samples in the sample library. The formula for calculating the first similarity distribution distance may be as follows:

$$W_1(P, Q) = \sup_{\|f\|_L \le 1}\left(\mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{x \sim Q}[f(x)]\right)$$

where P and Q are the two sample distributions and $W_1(P, Q)$ represents the first similarity distribution distance, i.e., the difference in expectation after the respective samples are mapped by f(x). A plurality of candidate samples can then be screened out from the different samples of the sample library according to the first similarity distribution distance, and the candidate samples are downsampled to obtain a plurality of target samples similar to the reference sample.
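As an illustration of this screening step, the sketch below ranks candidate sample sets by their distance to the reference sample; scipy's closed-form one-dimensional Wasserstein distance stands in for the learned selector, and all names (screen_similar_sets, reference, pool, top_k) are illustrative assumptions rather than terms from the patent.

```python
# Distance-based sample screening: candidate pools whose distance to the
# reference sample is closest to 0 are kept as similar target samples.
import numpy as np
from scipy.stats import wasserstein_distance

def screen_similar_sets(reference_values, candidate_pool, top_k=5):
    """Rank candidate sample sets by W1 distance to the reference sample.

    reference_values: 1-D array of feature values of the reference sample.
    candidate_pool:   dict mapping set name -> 1-D array of feature values.
    """
    distances = {
        name: wasserstein_distance(reference_values, values)
        for name, values in candidate_pool.items()
    }
    # Smaller distance means the two batches of samples are more similar.
    return sorted(distances, key=distances.get)[:top_k]

# Usage: the pool drawn from a nearby distribution ranks first.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 1000)
pool = {"scene_a": rng.normal(0.1, 1.0, 800), "scene_b": rng.normal(3.0, 2.0, 800)}
print(screen_similar_sets(reference, pool, top_k=1))  # -> ['scene_a']
```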
Optionally, in an embodiment, the step of downsampling the candidate samples to obtain a plurality of target samples similar to the reference sample may include:
determining a sample ratio between a black sample and a white sample in each sample set;
and downsampling the candidate samples based on the sample proportion to obtain a plurality of target samples similar to the reference sample.
The candidate samples comprise a plurality of sample sets, and the sample sets comprise a plurality of black samples and a plurality of white samples. The black sample is a sample marked with problems; white samples are standard correct samples.
For example, the selected candidate samples may include a plurality of customer groups, that is, sample sets; the proportion between black and white samples, and the number of samples, are not necessarily the same across the sample sets. To ensure the effect of sample enhancement and keep the network model output from being biased toward any particular sample set, a dynamic strategy of downsampling white samples may be adopted. Specifically, on the premise that the sample sets keep approximately the same number of samples: if the black-to-white ratio in a sample set is detected to be less than 1:10, 50% downsampling is performed on the white samples of that set, repeatedly, until the black-to-white ratio exceeds 1:10, yielding a plurality of target samples similar to the reference sample; if the black-to-white ratio in the sample set is detected to be not less than 1:10, 50% downsampling can be performed directly on all samples of the set, likewise yielding a plurality of target samples similar to the reference sample. With this strategy, the total number of samples in each sample set is approximately the same, and the black-to-white ratio exceeds 1:10.
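A minimal sketch of this dynamic downsampling strategy follows, assuming each sample set is a list of (features, label) pairs with label 1 for black samples and 0 for white samples; the helper handles one sample set, and keeping totals approximately equal across sets would be the caller's job.

```python
import random

def balance_sample_set(samples, min_black_ratio=0.1, seed=0):
    """Apply the dynamic downsampling strategy to one sample set.

    samples: list of (features, label) with label 1 = black, 0 = white.
    """
    rng = random.Random(seed)
    black = [s for s in samples if s[1] == 1]
    white = [s for s in samples if s[1] == 0]
    if len(black) < min_black_ratio * len(white):
        # Black:white below 1:10 -> halve the white samples until it exceeds 1:10.
        while white and len(black) < min_black_ratio * len(white):
            white = rng.sample(white, len(white) // 2)
        return black + white
    # Ratio already acceptable -> 50% downsample the whole set once.
    return rng.sample(samples, len(samples) // 2)
```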
Optionally, in an embodiment, before the step of performing sample selection processing on the plurality of samples in the sample library by using the reference sample as a reference and through a preset selection network model to obtain the plurality of target samples similar to the reference sample screened from the sample library, the method further includes:
selecting a first training sample set from a network training sample library, wherein the network training sample library comprises a plurality of training sample sets;
selecting a second training sample set from the network training sample library, wherein the network training sample library comprises the first training sample set;
selecting a first training sample from the first training sample set, and selecting a second training sample from the second training sample set;
determining a label value according to whether the first training sample set and the second training sample set are from the same training sample set;
determining whether the first training sample and the second training sample are from the same training sample set through initial selection of a network model, and generating a model predicted value;
training the initial selection network model based on the label value and the model predictive value to obtain a preset selection network model.
The pre-training process of the preset selection network model is shown in fig. 4. The network training sample library comprises a plurality of training sample sets, and the embodiment of the application can select a first training sample set from the plurality of training sample sets in the network training sample library, then put the selected first training sample set back into the network training sample library, and select a second training sample set from the plurality of training sample sets in the network training sample library, wherein the first training sample set and the second training sample set are one training batch.
For this initial selection network model, its loss function may be set as: if the first training sample set and the second training sample set are from the same training sample set in the network training sample library, a label value of 0 can be given; if the first set of training samples and the second set of training samples are from different sets of training samples in the network training sample library, a label value of 1 may be assigned.
Then, as shown in fig. 4, a first training sample may be selected from the first training sample set and a second training sample from the second training sample set. The initial selection network model extracts features from the first and second training samples and performs similarity distribution calculation on the extracted features to judge whether the two samples come from the same training sample set, thereby generating a model prediction value. The initial selection network model is trained on the given label value and the model prediction value; the extraction of a first and second training sample set from the network training sample library and the training of the network model are repeated until the network converges, yielding the preset selection network model.
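As a concrete illustration of this pre-training loop, here is a minimal PyTorch sketch; the encoder architecture, the use of mean-pooled batch embeddings as the learned mapping, and all names are assumptions for illustration, since the patent does not fix the selector's internal structure.

```python
import random
import torch
import torch.nn as nn

class SelectorNet(nn.Module):
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.head = nn.Linear(hidden, 1)  # scores "the two batches differ"

    def forward(self, batch_a, batch_b):
        # Mean-pooled embeddings stand in for the learned mapping f(x).
        za = self.encoder(batch_a).mean(dim=0)
        zb = self.encoder(batch_b).mean(dim=0)
        return self.head(torch.abs(za - zb)).squeeze()

def train_selector(sample_sets, feat_dim, steps=1000, batch=32):
    """sample_sets: list of float tensors, one [N_i, feat_dim] per sample set."""
    model = SelectorNet(feat_dim)
    bce = nn.BCEWithLogitsLoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        # Draw two sets with replacement; label 0 if same set, 1 otherwise.
        i = random.randrange(len(sample_sets))
        j = random.randrange(len(sample_sets))
        a = sample_sets[i][torch.randint(len(sample_sets[i]), (batch,))]
        b = sample_sets[j][torch.randint(len(sample_sets[j]), (batch,))]
        label = torch.tensor(0.0 if i == j else 1.0)
        loss = bce(model(a, b), label)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```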
203. And taking the reference sample as a benchmark, performing, through the preset selection network model and based on the training samples applied by a plurality of expert models in the expert model library during training, sample selection processing on the plurality of expert models to obtain a plurality of target expert models, screened from the expert model library, whose training samples are similar to the reference sample.
Wherein the training sample is a sample used when training the expert model.
For example, an expert model library may be obtained, which includes a plurality of expert models, and then, using a preset selection network model, the degree of similarity between a training sample and a reference sample applied to the expert model during training is detected, and then, a target expert model is selected from the plurality of expert models in the expert model library.
Optionally, in an embodiment, the step of "selecting, by using the reference sample as a reference, the network model through a preset selection, and performing sample selection processing on the multiple expert models in the expert model library based on training samples applied by the multiple expert models in the expert model library during training, to obtain multiple target expert models, where the training samples screened from the expert model library are similar to the reference sample, may include:
Determining a training sample applied by each expert model in the expert model library during training;
performing similarity distribution prediction processing on a training sample applied by the expert model in training through a preset selection network model to obtain a second similarity distribution distance between the training sample applied by the expert model in training and a reference sample;
and screening a plurality of target expert models from the expert models in the expert model library based on the second similarity distribution distance.
For example, the preset selection network model in the embodiment of the present application may be a sample filter based on the earth mover's distance. When a target expert model is screened, the training sample used in training an expert model may be used to represent that expert model's distribution; the reference sample and the training sample applied by each expert model during training are input into the preset selection network model, which predicts the second similarity distribution distance between them. The formula for calculating the second similarity distribution distance may be as follows:

$$W_2(P, Q) = \sup_{\|f\|_L \le 1}\left(\mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{x \sim Q}[f(x)]\right)$$

where P and Q are the two sample distributions and $W_2(P, Q)$ represents the second similarity distribution distance, i.e., the difference in expectation after the respective samples are mapped by f(x). A plurality of training samples whose second similarity distribution distance is closest to 0 can then be selected, and the expert models trained on those training samples are taken as the target expert models.
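A sketch of this expert-model screening step follows, under the assumption that each expert in the library is stored together with the training sample it was fitted on; as in the earlier sketch, scipy's one-dimensional Wasserstein distance stands in for the learned second similarity distribution distance, and all names are illustrative.

```python
from scipy.stats import wasserstein_distance

def screen_expert_models(reference_values, expert_library, top_k=3):
    """expert_library: dict name -> (model, training_sample_values).

    Experts whose training-sample distance to the reference is closest to 0
    are returned as the target expert models.
    """
    ranked = sorted(
        expert_library.items(),
        key=lambda kv: wasserstein_distance(reference_values, kv[1][1]),
    )
    return [(name, model) for name, (model, _) in ranked[:top_k]]
```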
204. And learning model information of a plurality of target expert models through knowledge distillation processing to obtain a fused expert model.
Knowledge distillation (Knowledge Distillation) is a machine learning technique used to transfer the knowledge of a complex model into a simpler model. The process involves training the simpler model (commonly referred to as the "student model") using the predictions generated by the complex model (commonly referred to as the "teacher model") as targets. In this way, the student model can learn knowledge and patterns from the teacher model.
For example, in the embodiment of the present application, a plurality of target expert models may be used as teacher models, a student model frame may be preset, and then, by means of knowledge distillation, the preset student model frame may learn model information of the plurality of target expert models, thereby having the capability of the plurality of target expert models.
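Where the paragraph above describes the teacher-student mechanism in general terms, the following is a minimal multi-teacher sketch; the scikit-learn models, the plain averaging of teacher scores, and all names are illustrative assumptions (the patent's weighted variant appears in the embodiments below).

```python
# Multi-teacher knowledge distillation: the student is fit to the soft
# predictions of several teacher models instead of hard labels. Assumes the
# teachers are binary classifiers exposing predict_proba.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def distill(teachers, X_transfer):
    # Soft targets: mean of the teachers' predicted positive-class scores.
    soft_targets = np.mean(
        [t.predict_proba(X_transfer)[:, 1] for t in teachers], axis=0
    )
    student = GradientBoostingRegressor()
    student.fit(X_transfer, soft_targets)  # student absorbs teacher knowledge
    return student
```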
Optionally, in an embodiment, the step of learning model information of the plurality of target expert models through knowledge distillation processing to obtain the fused expert model may include:
constructing a mixed model prediction score based on the prediction results of the multiple target expert models for the reference samples;
based on a preset student model framework, learning model information of a plurality of target expert models to obtain a distilled expert model;
predicting the reference sample through a post-distillation expert model to obtain a distillation model prediction result;
constructing model loss based on the mixed model prediction score, the real label carried by the reference sample and the distillation model prediction result;
and training the post-distillation expert model based on the model loss to obtain a post-fusion expert model.
For example, as shown in fig. 5, the reference sample is input into each of the plurality of target expert models to obtain the target expert model prediction values; the prediction values and the real labels of the reference sample then enter a weight generator, which generates a corresponding weight for each target expert model; finally, each generated weight is multiplied by the corresponding target expert model prediction value and the products are summed to obtain the hybrid model prediction score.
And then carrying out knowledge distillation treatment on a plurality of target expert models in a distillation learning mode to obtain a distilled expert model for a subsequent sample and model mixing enhancement module. Then, the reference sample can be input into a post-distillation expert model to obtain a distillation model prediction result. And constructing model loss based on the mixed model prediction score, the real label carried by the reference sample and the distillation model prediction result, and finally training the post-distillation expert model based on the model loss to obtain the fused expert model.
Optionally, in an embodiment, the step of "constructing a hybrid model prediction score based on the prediction results of the plurality of target expert models for the reference samples" may include:
respectively carrying out prediction processing on the reference sample through a plurality of target expert models to obtain a target expert model prediction result corresponding to each target expert model;
giving a weight for characterizing model distinction to each target expert model based on the numerical value of the target expert model prediction result and the real label carried by the reference sample;
and carrying out numerical fusion on the target expert model prediction result and the weight corresponding to the target expert model to obtain a mixed model prediction score.
The weight corresponding to each target expert model is determined by the magnitude of its KS index value. The KS index (Kolmogorov-Smirnov statistic) is commonly used in risk-control scenarios to evaluate the discrimination of a model: the larger the KS index value, the greater the model's discrimination and the stronger its risk-ranking capability. For example, in the embodiment of the present application, the larger the KS index value of a target expert model, the stronger its classification of the reference sample's customer group and the stronger its risk-ranking capability, so the weight given to that target expert model should be larger.
For example, as shown in fig. 5, the reference sample is input into each of the plurality of target expert models to obtain the target expert model prediction values; a weight generator then assigns each target expert model a weight characterizing its model discrimination, according to the numerical values of the prediction results and the real labels carried by the reference sample; finally, each weight is multiplied by the corresponding prediction value and the products are summed. The calculation formula of the hybrid model prediction score may be as follows:

$$\hat{y}_{mix} = \sum_{i=1}^{N} w_i \cdot \hat{y}_i$$

where $\hat{y}_{mix}$ represents the hybrid model prediction score, $\hat{y}_i$ represents the prediction result of the i-th target expert model, $w_i$ is the weight the weight generator assigns to that expert model based on the prediction results and the real label value y of the reference sample, and N is the number of target expert models.
In one embodiment, the weights here may be determined by the magnitude of each target expert model's KS index value: the larger the KS value, the stronger the expert model's discrimination over the customer group, and the correspondingly larger the weight.
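A minimal sketch of this KS-weighted hybrid score follows, assuming each expert's weight is its KS statistic on the reference sample normalized to sum to 1; the normalization is an assumption, since the patent only states that a larger KS value yields a larger weight.

```python
import numpy as np
from scipy.stats import ks_2samp

def hybrid_score(expert_preds, y_true):
    """expert_preds: list of 1-D arrays of per-expert scores; y_true: 0/1 labels."""
    y_true = np.asarray(y_true)
    preds = [np.asarray(p) for p in expert_preds]
    # KS of each expert: max separation between black- and white-sample scores.
    ks = np.array([
        ks_2samp(p[y_true == 1], p[y_true == 0]).statistic for p in preds
    ])
    weights = ks / ks.sum()  # assumed normalization: larger KS, larger weight
    return sum(w * p for w, p in zip(weights, preds))
```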
Optionally, in an embodiment, the step of "constructing a model loss based on the mixed model prediction score, the real label carried by the reference sample, and the distillation model prediction result" may include:
constructing a first model loss based on the difference between the real label carried by the reference sample and the distillation model prediction result;
constructing a second model loss based on the hybrid model predictive score and the differences between the distillation model predictive results;
and carrying out loss fusion processing on the first model loss and the second model loss to obtain model loss.
For example, the model loss for distillation learning may include a first model loss and a second model loss, where the first model loss is the loss between the real label carried by the reference sample and the distillation model prediction result of the post-distillation expert model, and the second model loss is the loss between the hybrid model prediction score and that distillation model prediction result. The formula for calculating the model loss may be as follows:
$$\text{loss} = a \cdot L_1(y_d, y) + b \cdot L_2(y_d, \hat{y}_{mix})$$

where loss represents the model loss, $L_1(y_d, y)$ represents the first model loss, $L_2(y_d, \hat{y}_{mix})$ represents the second model loss, a and b are two hyper-parameters, $y_d$ represents the post-distillation expert model prediction score, y represents the real label value of the reference sample, and $\hat{y}_{mix}$ represents the hybrid model prediction score.
In one embodiment, the final model loss may be generated by combining the two hyper-parameters a and b, where a and b may be generated by Gaussian-process optimization or grid search to determine the optimal combination.
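A sketch of the combined loss follows, assuming squared-error forms for L1 and L2 (the patent does not fix their exact form) and externally tuned hyper-parameters a and b.

```python
import numpy as np

def distillation_loss(y_distill, y_true, y_hybrid, a=0.5, b=0.5):
    """loss = a * L1(y_d, y) + b * L2(y_d, y_mix), per the formula above."""
    first = np.mean((y_distill - y_true) ** 2)     # hard-label term
    second = np.mean((y_distill - y_hybrid) ** 2)  # soft hybrid-score term
    return a * first + b * second
```

In practice, a and b would then be tuned by, e.g., grid search over a validation metric.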
205. And training the network model of the fused expert model based on the reference sample and the target sample to obtain a trained target network model.
For example, in the embodiment of the present application, the fused expert model may be trained by keeping its tree structure unchanged and then using the reference sample and the plurality of target samples, obtaining the trained target network model. This not only preserves the capability and generalization of the fused expert model, but also increases the target network model's fraud-recognition capability and generalization on the client samples.
For example, the GBDT (gradient boosted decision tree) model structure is often used, because the features used in practical financial risk-control scenarios are basically structured data. As shown in fig. 6, network model training is performed on the fused expert model using the reference sample and the target samples to obtain the trained target network model, so that the model structure of the target network model is the same as that of the fused expert model, while the information gain brought by the reference sample and the target samples is retained in the target network model. This strategy preserves the capability and generalization of the fused expert model and increases the target network model's fraud-recognition capability and generalization on the client samples, thereby training a more powerful network model.
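A minimal LightGBM sketch of this enhancement step follows: continued training via init_model keeps the fused model's existing trees and grows additional trees on the combined reference and target samples; the parameters and data names are illustrative assumptions.

```python
import lightgbm as lgb
import numpy as np

def enhance_fused_model(fused_model, X_ref, y_ref, X_tgt, y_tgt, extra_trees=100):
    X = np.vstack([X_ref, X_tgt])
    y = np.concatenate([y_ref, y_tgt])
    train_set = lgb.Dataset(X, label=y)
    params = {"objective": "binary", "learning_rate": 0.05}
    # init_model keeps the fused model's trees; the new boosting rounds add
    # trees capturing the information gain from the reference/target samples.
    return lgb.train(params, train_set, num_boost_round=extra_trees,
                     init_model=fused_model)
```

Continued training fits naturally here because boosting is additive: the new rounds only append trees on top of the fused model's frozen ones, which matches keeping the tree structure unchanged while absorbing the information gain from the new samples.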
In an embodiment, the network enhancement method disclosed in the technical solution of the present application may also be applied to other algorithm models such as neural networks; for example, a neural network may use the custom samples and similar enhanced samples to perform fine-tune training on the mixed private base model.
In addition, from the application perspective, the scheme is not limited to the credit anti-fraud scenario in which multiple banks or internet finance enterprises participate; it can be applied to various scenarios with transfer-learning requirements, such as e-commerce and games. From the model perspective, the model architecture algorithm used in reject-inference modeling can be flexibly changed according to the actual application scenario and can also be adapted to neural network models.
At the product level, the scheme can be widely applied to risk-control modeling by any company or institution facing the customer-group migration problem, such as different banks or credit institutions. The method provided by the scheme can effectively solve the biased estimation of model parameters caused by small samples, improve the performance and generalization capability of the fraud-identification model, and improve the usage experience of the product.
The adaptive source-domain-sample and base-expert-model financial risk-control migration scheme provided by this technical solution can effectively enhance the network model's fraud-recognition capability and generalization for the client scenario, and the network model construction method disclosed herein outperforms the traditional hybrid modeling method in both model effect and generalization.
For example, fig. 7 shows an effect comparison between the network model construction method disclosed in this technical solution and the traditional hybrid modeling method, verified separately on 29 customer groups. As can be seen from fig. 7, compared with the traditional hybrid modeling method, the disclosed method performs knowledge distillation on a plurality of target expert models and fuses them into a fused expert model, so that the fused expert model possesses the advantages and capabilities of the multiple target experts; meanwhile, on the basis of the fused expert model's tree structure, training with the reference sample and the plurality of target samples increases the number of trees and improves the capability of the target network model. As a result, all three evaluation dimensions improve to different degrees: the degree of overfitting (the gap between training set and test set), the test-set KS value, and the validation-set KS value.
As shown in fig. 7, the average degree of overfitting after training is reduced from 4.89% to 3.13%, a relative improvement of 35.94%; the test-set KS value rises by 0.55 points (absolute) and the validation-set KS value by 2.82 points (absolute). The network model construction method disclosed in this technical solution is therefore superior to the traditional hybrid modeling method.
Compared with the technical scheme of the application, the traditional hybrid modeling method mainly has the following defects:
firstly, traditional methods often select similar-scenario samples by expert experience, so the selected samples are not necessarily accurate; meanwhile, the sample sizes and black-to-white ratios of the similar sample sets differ, and directly fusing them leads to a poor modeling effect;
secondly, traditional methods consider only the labeled samples, whereas in practical applications similar expert models already exist in the business, and the information contained in those expert models could be exploited;
finally, when facing a customer that needs modeling but lacks samples, traditional methods either collect more labeled samples (whose collection cycle is long because the customer performance period is long, which real business scenarios usually cannot wait for, making this difficult to achieve) or use transfer learning to migrate useful knowledge from samples of other similar scenarios, usually by directly mixing samples of several similar scenarios to train a general model; this addresses the sample shortage, but the resulting model has a relatively poor effect and low generalization.
As shown in fig. 8, the innovation points of the technical scheme of the application are as follows:
firstly, this technical solution designs a preset selection network model which, by learning the characteristic information of the reference sample, can screen target samples similar to the reference sample from the sample library, and can screen, via their training samples, target expert models similar to the reference sample from the expert model library.
Secondly, this technical solution lets the fused expert model learn the model characteristics of a plurality of target expert models through knowledge distillation; fusing similar target expert models into a single fused expert model greatly improves the network model effect. In addition, this technical solution designs a new sampling scheme: on the premise that each sample set keeps approximately the same number of samples, if the black-to-white ratio in a sample set is detected to be less than 1:10, the white samples of that set are 50% downsampled until the black-to-white ratio exceeds 1:10, yielding a plurality of target samples similar to the reference sample; if the black-to-white ratio in the sample set is detected to be not less than 1:10, all samples of the set can be directly 50% downsampled, likewise yielding a plurality of target samples similar to the reference sample. This sampling scheme ensures that the total number of samples in each sample set is approximately the same and that the black-to-white ratio exceeds 1:10, so the model will not be biased toward customer groups with large sample sizes during modeling.
Finally, the technical scheme of the application also designs a training mode based on mixing the tree model and the samples: during model training, the tree structure of the fused expert model is kept unchanged, and the fused expert model is trained with the plurality of target samples and the reference sample so that the number of trees in the tree model increases. In this way the target network model can better adapt to the current requirement, with a better effect and better generalization.
The technical scheme can effectively solve the problems of poor model effect and weak generalization capability caused by biased modeling samples in the credit risk-control field with customer-group migration modeling requirements, and mainly addresses the poor effect and low generalization of customized modeling caused by insufficient samples in customers' different scenes in financial risk-control business. The technical scheme designs a preset selection network to handle the adaptive matching of similar samples and expert models, fuses a plurality of expert models into one fused expert model through distillation learning, obtains target samples by dynamically sampling the candidate samples, performs enhancement training on the fused expert model with the target samples, and finally obtains a financial risk-control model with a better effect and better generalization.
As can be seen from the above, the present embodiment can obtain a reference sample required for constructing a network model; take the reference sample as a benchmark and perform, through a preset selection network model, sample selection processing on a plurality of samples in a sample library to obtain a plurality of target samples screened from the sample library that are similar to the reference sample; take the reference sample as a benchmark and perform, through the preset selection network model and based on the training samples applied by a plurality of expert models in an expert model library during training, sample selection processing on the plurality of expert models to obtain a plurality of target expert models screened from the expert model library whose training samples are similar to the reference sample; learn model information of the plurality of target expert models through knowledge distillation processing to obtain a fused expert model; and perform, based on the reference sample and the target samples, network model training on the fused expert model to obtain a trained target network model. Since the fused expert model learns the model information of the multiple target expert models, it absorbs the characteristics of the multiple target expert models; the target samples similar to the reference sample are screened out through the preset selection network model, and the fused expert model is then trained with the reference sample and the target samples, which further improves the model prediction effect of the target network model, allows the target network model to better adapt to the requirements of the current application, improves the generalization of the network model, and enhances the network model effect.
The method described in the previous embodiment will be described in further detail below by way of an example in which the network model construction apparatus is specifically integrated in an electronic device. The embodiment of the application provides a network model construction method; as shown in fig. 3, the specific flow of the network model construction method may be as follows:
301. The electronic device selects a first training sample set and a second training sample set from the network training sample library.
For example, as shown in fig. 4, to pre-train the preset selection network model, a first training sample set may be selected from a plurality of training sample sets in a network training sample library; the selected first training sample set is then placed back into the network training sample library, and a second training sample set is selected from the plurality of training sample sets in the network training sample library; the first training sample set and the second training sample set form one training batch.
302. The electronic device determines a label value based on whether the first training sample set and the second training sample set are from the same training sample set.
For example, if the first training sample set and the second training sample set are from the same training sample set in the network training sample library, a label value of 0 may be assigned; if the first set of training samples and the second set of training samples are from different sets of training samples in the network training sample library, a label value of 1 may be assigned.
303. The electronic device selects a first training sample from the first training sample set and selects a second training sample from the second training sample set.
304. The electronic equipment judges whether the first training sample and the second training sample are from the same training sample set or not through the initial selection network model, trains the initial selection network model according to the label value, and obtains a preset selection network model.
For example, whether the first training sample and the second training sample are from the same training sample set is judged through the initial selection network model, generating a model predicted value; the initial selection network model is then trained based on the label value and the model predicted value. The extraction of a first training sample set and a second training sample set from the network training sample library and the training of the network model are repeated until the network converges, yielding the preset selection network model.
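The following is a minimal, hypothetical Python sketch of steps 301-304. The pairwise network architecture, the feature dimension, and the `sample_library` structure (a list of training sample sets, each a list of feature tensors) are illustrative assumptions, not the patent's exact design:

```python
import random
import torch
import torch.nn as nn

class InitialSelectionNet(nn.Module):
    """Assumed pairwise network: predicts whether two samples come
    from the same training sample set."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([a, b], dim=-1)).squeeze(-1)

def pretrain(sample_library, feat_dim, steps=1000):
    model = InitialSelectionNet(feat_dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        set_a = random.choice(sample_library)                 # step 301: first set
        set_b = random.choice(sample_library)                 # sampled with replacement
        label = torch.tensor(0.0 if set_a is set_b else 1.0)  # step 302: label value
        x1 = random.choice(set_a)                             # step 303: one sample each
        x2 = random.choice(set_b)
        loss = bce(model(x1, x2), label)                      # step 304: train on prediction
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```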
305. The electronic device obtains a reference sample required to build the network model.
306. And the electronic equipment screens out a plurality of target samples similar to the reference sample from the sample library through a preset selection network model.
For example, the preset selection network model in the embodiment of the present application may be a pre-trained sample filter based on the bulldozer distance (earth mover's distance). The electronic device may learn the characteristic information of the reference sample through the preset selection network model and calculate a first similarity distribution distance between the different samples in the sample library and the reference sample, where the first similarity distribution distance may be expressed as the expected difference of the values obtained after the two samples respectively pass through a learned mapping.
The formula for calculating the first similarity distribution distance may be written, in the dual form consistent with the description above, as:

$$W_1(P, Q) = \sup_{\lVert f \rVert_L \le 1} \; \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{y \sim Q}[f(y)]$$

wherein P and Q are respectively different samples in the sample library, f is the mapping learned by the preset selection network model, and $W_1(P, Q)$ represents the first similarity distribution distance.
A plurality of candidate samples are then screened from the different samples in the sample library according to the first similarity distribution distance. The candidate samples comprise a plurality of sample sets, and each sample set comprises a plurality of black samples and a plurality of white samples, where a black sample is a sample marked as problematic and a white sample is a standard correct sample.
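As a simplified, hypothetical stand-in for the learned filter, candidate sample sets could be ranked with SciPy's one-dimensional Wasserstein distance; the `sample_library` dictionary and the use of a single scalar feature per sample are assumptions for illustration only:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def screen_candidates(reference: np.ndarray, sample_library: dict, top_k: int = 5):
    """Rank candidate sample sets by their 1-D Wasserstein (bulldozer)
    distance to the reference sample's feature distribution and keep
    the top_k closest sets as candidate samples."""
    scored = sorted(
        sample_library.items(),
        key=lambda kv: wasserstein_distance(reference, kv[1]))
    return scored[:top_k]
```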
Because the selected candidate samples may include a plurality of customer groups, that is, sample sets, the proportion between black samples and white samples in the sample sets is not necessarily the same, and the numbers of samples are not necessarily the same. To ensure the effect of sample enhancement, so that the network model output is not biased towards a certain sample set, a dynamic sampling strategy of white-sample downsampling may be adopted: on the premise that the sample numbers are kept approximately the same, if the proportion of black to white samples in a sample set is detected to be less than 1:10, 50% downsampling may be performed on the white samples in the sample set until the proportion of black to white samples is greater than 1:10, so as to obtain a plurality of target samples similar to the reference sample (the customer modeling sample); if the proportion of black to white samples in the sample set is detected to be not less than 1:10, 50% downsampling may be directly performed on the samples in the sample set to obtain a plurality of target samples similar to the reference sample. Through this strategy, the total amount of samples in each sample set is approximately the same, and the proportion of black to white samples is greater than 1:10.
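A minimal sketch of this dynamic downsampling strategy, assuming each sample is a `(features, label)` pair with label 1 for black and 0 for white (these encodings and the helper name are assumptions):

```python
import random

def dynamic_downsample(sample_set, ratio_threshold=0.1, seed=0):
    """If black:white is below 1:10, repeatedly halve the white samples
    until the ratio exceeds 1:10; otherwise halve the whole set."""
    rng = random.Random(seed)
    black = [s for s in sample_set if s[1] == 1]
    white = [s for s in sample_set if s[1] == 0]
    if black and white and len(black) / len(white) < ratio_threshold:
        # 50% downsampling of white samples until black:white > 1:10.
        while len(black) / len(white) <= ratio_threshold:
            white = rng.sample(white, max(1, len(white) // 2))
        return black + white
    # Ratio already acceptable: 50% downsampling of the whole set.
    return rng.sample(sample_set, max(1, len(sample_set) // 2))
```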
307. And the electronic equipment screens out a plurality of target expert models from the expert model library through a preset selection network model.
For example, the preset selection network model in the embodiment of the present application may be a pre-trained sample filter based on the bulldozer distance. When the electronic device screens target expert models, the training sample used when an expert model was trained may be taken to represent the model distribution of that expert model; the reference sample and the training sample applied by each expert model during training may be input into the preset selection network model, which predicts a second similarity distribution distance between the reference sample (the customer modeling sample) and the training sample applied by each expert model during training. The formula for calculating the second similarity distribution distance may be as follows:
$$W_2(P, Q) = \sup_{\lVert f \rVert_L \le 1} \; \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{y \sim Q}[f(y)]$$

wherein P and Q are respectively the reference sample and the training sample applied by an expert model during training, and $W_2(P, Q)$ represents the second similarity distribution distance.
Then, a plurality of training samples whose second similarity distribution distance is closest to 0 may be selected, and the expert models trained with those training samples may be used as the target expert models.
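A hypothetical sketch of this expert screening step; the `train_sample` attribute and the `selector` callable returning $W_2$ are assumed names for illustration:

```python
def select_experts(reference, expert_library, selector, top_k=3):
    """Rank experts by the second similarity distance between the
    reference sample and each expert's training sample, and keep the
    experts whose distance is closest to 0 as target expert models."""
    ranked = sorted(
        expert_library,
        key=lambda expert: abs(selector(reference, expert.train_sample)))
    return ranked[:top_k]
```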
308. The electronic device fuses the plurality of target expert models into a fused expert model based on knowledge distillation processing.
For example, in the embodiment of the present application, a plurality of target expert models may be used as teacher models, a student model frame may be preset, and then, by means of knowledge distillation, the preset student model frame may learn model information of the plurality of target expert models, thereby having the capability of the plurality of target expert models.
As shown in fig. 5, the reference sample is respectively input into the plurality of target expert models to obtain a target expert model predicted value for each. Through a weight generator, a weight characterizing model discrimination is then given to each target expert model according to the numerical value of that target expert model's prediction result and the real label carried by the reference sample; the generated target expert model weights are then multiplied by the corresponding target expert model predicted values and summed to obtain a hybrid model prediction score. The calculation formula of the hybrid model prediction score may be as follows:
$$\hat{y}_{\mathrm{mix}} = \sum_{i=1}^{N} w_i \, \hat{y}_i$$

wherein $\hat{y}_{\mathrm{mix}}$ represents the prediction score of the hybrid model, $\hat{y}_i$ represents the prediction result of the i-th target expert model, $w_i$ is the weight generated for the i-th target expert model from $\hat{y}_i$ and the real label value y of the reference sample, and N is the number of target expert models.
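A minimal sketch of one plausible weight generator (softmax of the negative absolute error, so experts closer to the true label get larger weights); this particular weighting rule is an assumption, since the patent only states that the weights are generated from the prediction values and the real labels:

```python
import torch

def hybrid_score(expert_preds: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """expert_preds: (num_experts, batch) predictions; y: (batch,) labels.
    Returns the weighted hybrid model prediction score per sample."""
    err = (expert_preds - y.unsqueeze(0)).abs()   # per-expert absolute error
    w = torch.softmax(-err, dim=0)                # discrimination weights
    return (w * expert_preds).sum(dim=0)          # mixed prediction score
```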
And then carrying out knowledge distillation treatment on a plurality of target expert models in a distillation learning mode to obtain a distilled expert model for a subsequent sample and model mixing enhancement module. Then, the reference sample can be input into a post-distillation expert model to obtain a distillation model prediction result. And constructing model loss based on the mixed model prediction score, the real label carried by the reference sample and the distillation model prediction result, and finally training the post-distillation expert model based on the model loss to obtain the fused expert model.
For example, the model penalty for distillation learning may include a first model penalty and a second model penalty, wherein the first model penalty is a penalty between the real label carried by the reference sample and the distillation model prediction result of the post-distillation expert model, and the second model penalty is a penalty between the hybrid model prediction score and the distillation model prediction result of the post-distillation expert model. The formula for calculating model loss may be as follows:
$$\mathrm{loss} = a \cdot \mathcal{L}_1\left(y,\, y_d\right) + b \cdot \mathcal{L}_2\left(\hat{y}_{\mathrm{mix}},\, y_d\right)$$

where loss represents the total model loss, $\mathcal{L}_1$ represents the first model loss, $\mathcal{L}_2$ represents the second model loss, a and b represent two hyper-parameters, $y_d$ represents the post-distillation expert model prediction score, y represents the real label value of the reference sample, and $\hat{y}_{\mathrm{mix}}$ represents the prediction score of the hybrid model.
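A minimal sketch of this combined loss, assuming the model outputs probabilities and taking binary cross-entropy for the first term and mean squared error for the second; the concrete loss functions are assumptions, since the patent only specifies the two terms and the hyper-parameters a and b:

```python
import torch.nn.functional as F

def distillation_loss(y_d, y, y_mix, a=0.5, b=0.5):
    """y_d: post-distillation model probabilities; y: true labels;
    y_mix: hybrid model prediction scores (all torch tensors)."""
    l1 = F.binary_cross_entropy(y_d, y)   # first model loss: vs. true labels
    l2 = F.mse_loss(y_d, y_mix)           # second model loss: vs. hybrid score
    return a * l1 + b * l2
```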
309. And the electronic equipment carries out network model training on the fused expert model through the reference sample and the target sample to obtain a target network model.
For example, network model training is performed on the fused expert model with the reference sample and the target samples to obtain a trained target network model, such that the target network model contains the model structure of the fused expert model unchanged, while the information gain brought by the reference sample and the target samples is retained in the target network model as newly added trees. This strategy keeps the capability and generalization of the fused expert model while increasing the target network model's fraud recognition capability and generalization on the customer samples, thereby training a more powerful network model.
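A hypothetical sketch using XGBoost's training-continuation mechanism, which matches the described behavior (existing trees kept unchanged, new trees appended); the parameter values and the assumption that the fused expert model is an `xgboost.Booster` are illustrative:

```python
import xgboost as xgb

def enhance(fused_model: xgb.Booster, X, y, extra_rounds=50):
    """Continue boosting from the fused expert model on the combined
    reference + target samples: the fused model's original trees are
    kept and extra_rounds new trees are appended."""
    dtrain = xgb.DMatrix(X, label=y)
    params = {"objective": "binary:logistic", "eta": 0.05}
    return xgb.train(params, dtrain, num_boost_round=extra_rounds,
                     xgb_model=fused_model)
```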
As can be seen from the above, in this embodiment, the electronic device may select a first training sample set and a second training sample set from the network training sample library; determine a label value according to whether the first training sample set and the second training sample set are from the same training sample set; select a first training sample from the first training sample set and a second training sample from the second training sample set; judge, through the initial selection network model, whether the first training sample and the second training sample are from the same training sample set, and train the initial selection network model according to the label value to obtain the preset selection network model; acquire a reference sample required for constructing a network model; screen, through the preset selection network model, a plurality of target samples similar to the reference sample from the sample library; screen a plurality of target expert models from the expert model library through the preset selection network model; fuse the plurality of target expert models into a fused expert model based on knowledge distillation processing; and perform network model training on the fused expert model with the reference sample and the target samples to obtain the target network model. Since the fused expert model learns the model information of the multiple target expert models, it absorbs their characteristics; the target samples similar to the reference sample are screened out through the preset selection network model, and the fused expert model is then trained with the reference sample and the target samples, which further improves the model prediction effect of the target network model, allows it to better adapt to the requirements of the current application, improves the generalization of the network model, and enhances the network model effect.
In order to better implement the above method, the embodiment of the present application further provides a network model building device, as shown in fig. 9, which may include an acquisition unit 901, a first selection unit 902, a second selection unit 903, a fusion unit 904, and an enhancement unit 905, as follows:
(1) An acquisition unit 901;
and the acquisition unit is used for acquiring a reference sample required by constructing the network model.
(2) A first selection unit 902;
the first selection unit is used for carrying out sample selection processing on a plurality of samples in a sample library by taking the reference sample as a benchmark and through a preset selection network model to obtain a plurality of target samples which are screened from the sample library and are similar to the reference sample.
Optionally, in some embodiments of the present application, the first selecting unit may include a first prediction subunit, a first filtering subunit, and a sampling subunit, as follows:
the first prediction subunit is configured to perform similarity distribution prediction processing on a plurality of samples in a sample library through the preset selection network model, so as to obtain first similarity distribution distances between different samples in the sample library and the reference sample;
The first screening subunit is configured to screen a plurality of candidate samples from different samples in the sample library based on the first similarity distribution distance;
the sampling subunit is configured to downsample the candidate samples to obtain a plurality of target samples similar to the reference sample.
Optionally, in some embodiments of the present application, the sampling subunit may specifically be configured to determine a sample ratio between a black sample and a white sample in each of the sample sets; and downsampling the candidate samples based on the sample proportion to obtain a plurality of target samples similar to the reference samples.
(3) A second selecting unit 903;
and the second selection unit is used for taking the reference sample as a benchmark and performing, through the preset selection network model and based on training samples applied by a plurality of expert models in the expert model library during training, sample selection processing on the plurality of expert models in the expert model library to obtain a plurality of target expert models screened from the expert model library whose training samples are similar to the reference sample.
Optionally, in some embodiments of the present application, the second selecting unit may include a first determining subunit, a second predicting subunit, and a second screening subunit, as follows:
the first determining subunit is configured to determine a training sample applied by each expert model in the expert model library during training;
the second prediction subunit is configured to perform similarity distribution prediction processing on a training sample applied by the expert model during training through the preset selection network model, so as to obtain a second similarity distribution distance between the training sample applied by the expert model during training and the reference sample;
The second screening subunit is configured to screen a plurality of target expert models from the expert models in the expert model library based on the second similarity distribution distance.
(4) A fusion unit 904;
and the fusion unit is used for learning the model information of the plurality of target expert models through knowledge distillation processing so as to obtain a fused expert model.
Optionally, in some embodiments of the present application, the fusion unit may include a third prediction subunit, a distillation subunit, a fourth prediction subunit, a construction subunit, and a training subunit, as follows:
the third prediction subunit is configured to construct a hybrid model prediction score based on prediction results of the multiple target expert models for the reference sample;
the distillation subunit is used for carrying out knowledge distillation treatment on the plurality of target expert models to obtain a distilled expert model;
The fourth prediction subunit is configured to input the reference sample into the post-distillation expert model to obtain a distillation model prediction result;
the construction subunit is used for constructing model loss based on the mixed model prediction score, the real label carried by the reference sample and the distillation model prediction result;
and the training subunit is used for training the post-distillation expert model based on the model loss to obtain a post-fusion expert model.
Optionally, in some embodiments of the present application, the third prediction subunit may include a fifth prediction subunit, a second determination subunit, and a first generation subunit, as follows:
the fifth prediction subunit is configured to respectively perform prediction processing on the reference samples through a plurality of target expert models, so as to obtain a target expert model prediction result corresponding to each target expert model;
the second determining subunit is configured to assign a weight for characterizing model distinction to each target expert model based on the numerical value of the target expert model prediction result and the real label carried by the reference sample;
and the first generation subunit is used for carrying out numerical fusion on the target expert model prediction result and the weight corresponding to the target expert model to obtain a mixed model prediction score.
Optionally, in some embodiments of the present application, the construction subunit may include a first construction subunit, a second construction subunit, and a second generation subunit, as follows:

the first construction subunit is configured to construct a first model loss based on the difference between the real label carried by the reference sample and the distillation model prediction result;

the second construction subunit is configured to construct a second model loss based on the difference between the hybrid model prediction score and the distillation model prediction result;

and the second generation subunit is used for performing loss fusion processing on the first model loss and the second model loss to obtain the model loss.
Optionally, in some embodiments of the present application, the network model construction apparatus may further include a first extraction subunit, a second extraction subunit, a selection subunit, a determination subunit, a prediction subunit, and a training subunit, as follows:
the first extraction subunit is configured to select a first training sample set from a network training sample library, where the network training sample library includes a plurality of training sample sets;
the second extraction subunit is configured to select a second training sample set from the network training sample library, where the network training sample library includes the first training sample set;
The selecting subunit is configured to select a first training sample from the first training sample set, and select a second training sample from the second training sample set;
the determination subunit is configured to determine a label value according to whether the first training sample set and the second training sample set are from the same training sample set;

the prediction subunit is configured to generate a model prediction value by judging, through the initial selection network model, whether the first training sample and the second training sample are from the same training sample set;

and the training subunit is configured to train the initial selection network model based on the label value and the model prediction value to obtain the preset selection network model.
(5) An enhancement unit 905;
and the enhancement unit is used for carrying out network model training on the fused expert model based on the reference sample and the target sample to obtain a trained target network model.
The electronic device provided by the embodiment of the application comprises a processor and a memory, wherein the memory stores a plurality of instructions, and the processor loads the instructions to execute the steps in the network model building method provided by the embodiment of the application.
As can be seen from the above, in this embodiment, the acquisition unit 901 may acquire a reference sample required for constructing a network model; the first selection unit 902, taking the reference sample as a benchmark, performs sample selection processing on a plurality of samples in a sample library through a preset selection network model to obtain a plurality of target samples screened from the sample library that are similar to the reference sample; the second selection unit 903, taking the reference sample as a benchmark, performs, through the preset selection network model and based on training samples applied by a plurality of expert models in an expert model library during training, sample selection processing on the plurality of expert models to obtain a plurality of target expert models screened from the expert model library whose training samples are similar to the reference sample; the fusion unit 904 learns model information of the plurality of target expert models through knowledge distillation processing to obtain a fused expert model; and the enhancement unit 905 performs network model training on the fused expert model based on the reference sample and the target samples to obtain a trained target network model. Since the fused expert model learns the model information of the multiple target expert models, it absorbs their characteristics; the target samples similar to the reference sample are screened out through the preset selection network model, and the fused expert model is then trained with the reference sample and the target samples, which further improves the model prediction effect of the target network model, allows it to better adapt to the requirements of the current application, improves the generalization of the network model, and enhances the network model effect.
The embodiment of the application further provides an electronic device, as shown in fig. 10, which shows a schematic structural diagram of the electronic device according to the embodiment of the application, where the electronic device may be a terminal or a server, specifically:
the electronic device may include one or more processing cores 'processors 1001, one or more computer-readable storage media's memory 1002, a power supply 1003, and an input unit 1004, among other components. It will be appreciated by those skilled in the art that the electronic device structure shown in fig. 10 is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. Wherein:
the processor 1001 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 1002, and by data stored in the memory 1002. Optionally, the processor 1001 may include one or more processing cores; preferably, the processor 1001 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, an application program, and the like, and the modem processor mainly processes wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 1001.
The memory 1002 may be used to store software programs and modules, and the processor 1001 executes various functional applications and data processing by running the software programs and modules stored in the memory 1002. The memory 1002 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system and application programs required for at least one function (such as a sound playing function, an image playing function, etc.), and the storage data area may store data created according to the use of the electronic device, etc. In addition, the memory 1002 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 1002 may also include a memory controller to provide the processor 1001 with access to the memory 1002.
The electronic device further comprises a power supply 1003 for powering the various components, preferably the power supply 1003 is logically connected to the processor 1001 by a power management system, whereby charging, discharging, and power consumption management functions are performed by the power management system. The power supply 1003 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may also include an input unit 1004, which input unit 1004 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 1001 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 1002 according to the following instructions, and the processor 1001 executes the application programs stored in the memory 1002, so as to implement various functions as follows:
acquiring a reference sample required for constructing a network model; taking the reference sample as a benchmark and performing, through a preset selection network model, sample selection processing on a plurality of samples in a sample library to obtain a plurality of target samples screened from the sample library that are similar to the reference sample; taking the reference sample as a benchmark and performing, through the preset selection network model and based on training samples applied by a plurality of expert models in an expert model library during training, sample selection processing on the plurality of expert models to obtain a plurality of target expert models screened from the expert model library whose training samples are similar to the reference sample; learning model information of the plurality of target expert models through knowledge distillation processing to obtain a fused expert model; and performing, based on the reference sample and the target samples, network model training on the fused expert model to obtain a trained target network model.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
As can be seen from the above, the present embodiment can obtain a reference sample required for constructing a network model; take the reference sample as a benchmark and perform, through a preset selection network model, sample selection processing on a plurality of samples in a sample library to obtain a plurality of target samples screened from the sample library that are similar to the reference sample; take the reference sample as a benchmark and perform, through the preset selection network model and based on the training samples applied by a plurality of expert models in an expert model library during training, sample selection processing on the plurality of expert models to obtain a plurality of target expert models screened from the expert model library whose training samples are similar to the reference sample; learn model information of the plurality of target expert models through knowledge distillation processing to obtain a fused expert model; and perform, based on the reference sample and the target samples, network model training on the fused expert model to obtain a trained target network model. Since the fused expert model learns the model information of the multiple target expert models, it absorbs their characteristics; the target samples similar to the reference sample are screened out through the preset selection network model, and the fused expert model is then trained with the reference sample and the target samples, which further improves the model prediction effect of the target network model, allows it to better adapt to the requirements of the current application, improves the generalization of the network model, and enhances the network model effect.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform steps in any of the network model building methods provided by embodiments of the present application. For example, the instructions may perform the steps of:
acquiring a reference sample required for constructing a network model; taking the reference sample as a benchmark and performing, through a preset selection network model, sample selection processing on a plurality of samples in a sample library to obtain a plurality of target samples screened from the sample library that are similar to the reference sample; taking the reference sample as a benchmark and performing, through the preset selection network model and based on training samples applied by a plurality of expert models in an expert model library during training, sample selection processing on the plurality of expert models to obtain a plurality of target expert models screened from the expert model library whose training samples are similar to the reference sample; learning model information of the plurality of target expert models through knowledge distillation processing to obtain a fused expert model; and performing, based on the reference sample and the target samples, network model training on the fused expert model to obtain a trained target network model.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the computer-readable storage medium may include: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
Because the instructions stored in the computer readable storage medium may execute the steps in any of the network model building methods provided in the embodiments of the present application, the beneficial effects that any of the network model building methods provided in the embodiments of the present application may be achieved, which are detailed in the previous embodiments and are not described herein.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in various alternative implementations of the network model building aspects described above.
The foregoing has described in detail a network model construction method and related devices provided by embodiments of the present application, and specific examples have been applied herein to illustrate the principles and embodiments of the present application, where the foregoing examples are provided to assist in understanding the methods of the present application and their core ideas; meanwhile, those skilled in the art will have variations in the specific embodiments and application scope in light of the ideas of the present application, and the present description should not be construed as limiting the present application in view of the above.

Claims (12)

1. A method for constructing a network model, comprising:
acquiring a reference sample required for constructing a network model;
taking the reference sample as a benchmark, and carrying out sample selection processing on a plurality of samples in a sample library through a preset selection network model to obtain a plurality of target samples which are screened from the sample library and are similar to the reference sample;
taking the reference sample as a benchmark and performing, through the preset selection network model and based on training samples applied by a plurality of expert models in an expert model library during training, sample selection processing on the plurality of expert models in the expert model library to obtain a plurality of target expert models screened from the expert model library whose training samples are similar to the reference sample;
learning model information of a plurality of target expert models through knowledge distillation processing to obtain a fused expert model;
and based on the reference sample and the target sample, performing network model training on the fused expert model to obtain a trained target network model.
2. The network model construction method according to claim 1, wherein the taking the reference sample as a benchmark and performing, through the preset selection network model, sample selection processing on the plurality of samples in the sample library to obtain the plurality of target samples screened from the sample library that are similar to the reference sample comprises:
Performing similarity distribution prediction processing on a plurality of samples in a sample library through the preset selection network model to obtain first similarity distribution distances between different samples in the sample library and the reference samples;
screening a plurality of candidate samples from different samples of the sample library based on the first similarity distribution distance;
and downsampling the candidate samples to obtain a plurality of target samples similar to the reference sample.
3. The network model construction method according to claim 2, wherein the candidate samples comprise a plurality of sample sets, and the downsampling the candidate samples to obtain the plurality of target samples similar to the reference sample comprises:
determining a sample ratio between a black sample and a white sample in each of the sample sets;
and downsampling the candidate samples based on the sample proportion to obtain a plurality of target samples similar to the reference samples.
4. The network model construction method according to claim 1, wherein the taking the reference sample as a benchmark and performing, through the preset selection network model and based on the training samples applied by the plurality of expert models in the expert model library during training, sample selection processing on the plurality of expert models to obtain the plurality of target expert models screened from the expert model library whose training samples are similar to the reference sample comprises:
Determining a training sample applied by each expert model in the expert model library during training;
performing similarity distribution prediction processing on a training sample applied by the expert model during training through the preset selection network model to obtain a second similarity distribution distance between the training sample applied by the expert model during training and the reference sample;
and screening a plurality of target expert models from the expert models in the expert model library based on the second similarity distribution distance.
5. The network model construction method according to claim 1, wherein learning model information of a plurality of the target expert models by a knowledge distillation process to obtain a fused expert model comprises:
constructing a mixed model prediction score based on the prediction results of a plurality of target expert models for the reference samples;
based on a preset student model framework, learning model information of a plurality of target expert models to obtain a distilled expert model;
carrying out prediction processing on the reference sample through the post-distillation expert model to obtain a distillation model prediction result;
constructing model loss based on the mixed model predictive score, the real label carried by the reference sample and the distillation model predictive result;
And training the post-distillation expert model based on the model loss to obtain a post-fusion expert model.
6. The network model construction method according to claim 5, wherein constructing a mixed model predictive score based on the predictive results of the plurality of the target expert models for the reference sample includes:
respectively carrying out prediction processing on the reference sample through a plurality of target expert models to obtain a target expert model prediction result corresponding to each target expert model;
assigning a weight for characterizing model distinction to each target expert model based on the numerical value of the target expert model prediction result and the real label carried by the reference sample;
and carrying out numerical fusion on the target expert model prediction result and the weight corresponding to the target expert model to obtain a mixed model prediction score.
7. The network model construction method according to claim 5, wherein the constructing model loss based on the mixed model predictive score, the real label carried by the reference sample, and the distillation model predictive result comprises:
constructing a first model loss based on the difference between the real label carried by the reference sample and the distillation model prediction result;

constructing a second model loss based on the difference between the hybrid model prediction score and the distillation model prediction result;
and carrying out loss fusion processing on the first model loss and the second model loss to obtain model loss.
8. The network model construction method according to claim 1, wherein before the taking the reference sample as a benchmark and performing, through the preset selection network model, sample selection processing on the plurality of samples in the sample library to obtain the plurality of target samples screened from the sample library that are similar to the reference sample, the method further comprises:
selecting a first training sample set from a network training sample library, wherein the network training sample library comprises a plurality of training sample sets;
selecting a second training sample set from the network training sample library, wherein the network training sample library comprises the first training sample set;
selecting a first training sample from the first training sample set, and selecting a second training sample from the second training sample set;
determining a label value according to whether the first training sample set and the second training sample set are from the same training sample set;
Determining whether the first training sample and the second training sample are from the same training sample set through initial selection of a network model, and generating a model predicted value;
training the initial selection network model based on the label value and the model predictive value to obtain a preset selection network model.
9. A network model construction apparatus, comprising:
the acquisition unit is used for acquiring a reference sample required by constructing the network model;
the first selection unit is used for carrying out sample selection processing on a plurality of samples in a sample library by taking the reference sample as a benchmark and through a preset selection network model to obtain a plurality of target samples which are screened from the sample library and are similar to the reference sample;
the second selection unit is used for taking the reference sample as a benchmark and performing, through the preset selection network model and based on training samples applied by a plurality of expert models in the expert model library during training, sample selection processing on the plurality of expert models in the expert model library to obtain a plurality of target expert models screened from the expert model library whose training samples are similar to the reference sample;
The fusion unit is used for learning the model information of a plurality of target expert models through knowledge distillation processing so as to obtain a fused expert model;
and the enhancement unit is used for carrying out network model training on the fused expert model based on the reference sample and the target sample to obtain a trained target network model.
10. An electronic device comprising a processor and a memory, the memory storing an application, the processor being configured to run the application in the memory to perform the steps in the network model building method of any one of claims 1 to 8.
11. A computer program product comprising computer programs/instructions which when executed by a processor implement the steps of the network model building method of any of claims 1 to 8.
12. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps in the network model building method of any one of claims 1 to 8.