CN115688042A - Model fusion method, device, equipment and storage medium - Google Patents


Info

Publication number
CN115688042A
Authority
CN
China
Prior art keywords: models, random, model, parameters, fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110837115.6A
Other languages
Chinese (zh)
Inventor
曾海恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd filed Critical Beijing Zitiao Network Technology Co Ltd
Priority to CN202110837115.6A priority Critical patent/CN115688042A/en
Publication of CN115688042A publication Critical patent/CN115688042A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present disclosure relate to a model fusion method, apparatus, device, and storage medium. A plurality of models to be fused and a plurality of random arrays are obtained, and the models to be fused are weighted and fused using the random weights in each random array to obtain a fusion model corresponding to each random array. The accuracy of each fusion model is then determined based on preset test samples, and the fusion model with the highest accuracy is determined as the target fusion model. This improves model performance without being limited to a particular task scenario, thereby improving the generality of the models.

Description

Model fusion method, device, equipment and storage medium
Technical Field
Embodiments of the present disclosure relate to the technical field of artificial intelligence, and in particular to a model fusion method, apparatus, device, and storage medium.
Background
In the related art, a plurality of models having the same structure, type, and function can be fused to obtain a model with stronger performance. Model fusion methods mainly include two kinds: aggregation methods and weight fusion methods.
An aggregation method fuses the outputs of multiple models to obtain a better prediction result. However, this approach requires running the multiple models simultaneously on the same machine, which consumes substantial computational resources. A weight fusion method fuses the weight parameters of the models to obtain a better model. However, most weight fusion methods add an exponential smoothing step to the model's training process as a training trick; this yields only a weak performance improvement on a few tasks, so its applicability is limited and the performance gain is small.
Disclosure of Invention
In order to solve the technical problem or at least partially solve the technical problem, embodiments of the present disclosure provide a model fusion method, apparatus, device and storage medium.
A first aspect of the embodiments of the present disclosure provides a model fusion method, including: obtaining a plurality of models to be fused and a plurality of random arrays, wherein the random arrays include random weights corresponding to the models to be fused; performing weighted fusion processing on the plurality of models to be fused using the random weights in each random array, respectively, to obtain a fusion model corresponding to each random array; and determining the accuracy corresponding to each fusion model based on preset test samples, and determining the fusion model with the highest accuracy as the target fusion model.
A second aspect of an embodiment of the present disclosure provides a model processing apparatus, including:
an acquisition module, configured to acquire a plurality of models to be fused and a plurality of random arrays, wherein the random arrays include random weights corresponding to the models to be fused;
a fusion module, configured to perform weighted fusion processing on the plurality of models to be fused using the random weights in each random array, respectively, to obtain a fusion model corresponding to each random array;
and a determining module, configured to determine the accuracy corresponding to each fusion model based on preset test samples, and to determine the fusion model with the highest accuracy as the target fusion model.
A third aspect of the embodiments of the present disclosure provides a computer device, which includes a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the method of the first aspect may be implemented.
A fourth aspect of embodiments of the present disclosure provides a computer-readable storage medium having a computer program stored therein, which, when executed by a processor, may implement the method of the first aspect described above.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
according to the model fusion method, the plurality of models to be fused and the plurality of random arrays are obtained, weighted fusion processing is carried out on the plurality of models to be fused according to the random weight in each random array, the fusion model corresponding to each random array is obtained, then the accuracy corresponding to each fusion model is determined based on the preset test sample, the fusion model with the highest accuracy is determined as the target fusion model, the performance of the models can be improved, the models are not limited by task scenes, the universality of the models is improved, and the model fusion method does not need to operate the plurality of models on the same machine like the aggregation method in the prior art, so that the consumption of machine computing resources is reduced.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
To more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; obviously, other drawings can be derived from these drawings by those skilled in the art without inventive effort.
FIG. 1 is a flow chart of a model fusion method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a method for obtaining a model to be fused and a random array according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of another method for obtaining a model to be fused and a random array according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a model fusion method provided by an embodiment of the present disclosure;
FIG. 5 is a flow chart of yet another model fusion method provided by an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a model processing apparatus provided in an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
Fig. 1 is a flowchart of a model fusion method provided by an embodiment of the present disclosure, which may be performed by a computer device. The computer device may be understood as any device having computing and processing capabilities. As shown in fig. 1, the method provided by the embodiment of the present disclosure includes the following steps:
Step 101, obtaining a plurality of models to be fused and a plurality of random arrays, wherein the random arrays include random weights corresponding to the models to be fused.
The plurality of models to be fused referred to in the embodiments of the present disclosure are a plurality of models having the same structure, type, and function.
A random array is an array composed of several random numbers, and the number of random numbers in a random array is greater than or equal to the number of models to be fused. In the disclosed embodiments, each model to be fused corresponds to one random number in the random array; this random number is called the random weight corresponding to that model. The disclosed embodiments use a plurality of random arrays, that is, each model to be fused has a corresponding random weight in each random array.
In the embodiments of the present disclosure, the models to be fused and the random arrays may be acquired in a variety of ways. For example, fig. 2 is a schematic diagram of a method for obtaining the models to be fused and the random arrays according to an embodiment of the present disclosure. As shown in fig. 2, in one implementation of the disclosed embodiments, the plurality of models to be fused and the plurality of random arrays may be obtained from a network server, a database, a readable storage medium, or the like. For another example, fig. 3 is a schematic diagram of another method for obtaining the models to be fused and the random arrays according to an embodiment of the present disclosure. As shown in fig. 3, in another implementation, n models to be fused may be obtained from a preset data source, where n is a positive integer, and a random array containing n random weights (i.e., W1 to Wn in fig. 3) may then be generated according to the number of models obtained. It should be noted that, in the scenario shown in fig. 3, after a fusion model is obtained based on one random array, the next random array is generated and a corresponding fusion model is produced from it; this is repeated until a preset number of random arrays and fusion models have been obtained. Alternatively, after the plurality of models to be fused are obtained, a plurality of random arrays may be generated first, and the models to be fused are then fused based on each random array.
Of course, fig. 2 and fig. 3 show only two exemplary methods and do not limit how the models to be fused and the random arrays may be acquired; in practice, the acquisition method may be set as needed and is not limited to any specific one.
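For illustration only, the following is a minimal Python sketch of generating such random arrays for n models to be fused, in the spirit of fig. 3. The sampling distribution and the normalization of each array so that its weights sum to 1 are assumptions made for this sketch; the embodiment does not prescribe them.

```python
import numpy as np

def generate_random_arrays(n_models: int, n_arrays: int, seed: int = 0):
    """Generate `n_arrays` random arrays, each containing one random
    weight per model to be fused (W1 to Wn in fig. 3)."""
    rng = np.random.default_rng(seed)
    arrays = []
    for _ in range(n_arrays):
        weights = rng.random(n_models)  # one random number per model
        weights /= weights.sum()        # assumed: normalize so the weights sum to 1
        arrays.append(weights)
    return arrays
```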
Step 102, performing weighted fusion processing on the plurality of models to be fused using the random weights in each random array, respectively, to obtain a fusion model corresponding to each random array.
For example, fig. 4 is a schematic diagram of a model fusion method provided by an embodiment of the present disclosure. In fig. 4, model 1 and model 2 are two models with the same structure, type, and function. The random array includes 2 random weights, W1 and W2, where W1 corresponds to model 1 and W2 corresponds to model 2. Model 1 includes m parameters a_1, ..., a_m, where m is a positive integer, and model 2 includes m parameters a_1', ..., a_m'. Here a_i and a_i' denote the same parameter in the two models (their values may differ), where i is an integer with 1 ≤ i ≤ m. As shown in fig. 4, when model 1 and model 2 are fused using the random weights in the random array, the parameters of model 1 may be weighted by W1 to obtain W1*a_1, ..., W1*a_m, and the parameters of model 2 may be weighted by W2 to obtain W2*a_1', ..., W2*a_m'. The weighted results of the same parameter in model 1 and model 2 are then summed to obtain the target parameters (W1*a_1 + W2*a_1'), ..., (W1*a_m + W2*a_m'), and the target parameters are used as the model parameters of the fusion model to generate the fusion model corresponding to the random array. That is, in one implementation of the disclosed embodiments, for each random array, the parameters of the multiple models to be fused may be weighted and summed based on the random weights in that array to obtain target parameters, and the target parameters are used as the model parameters of the fusion model to generate the corresponding fusion model.
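As a minimal sketch of the weighted summation shown in fig. 4, the following Python code assumes that each model is represented as a dictionary mapping parameter names to numpy arrays; the embodiment does not prescribe any particular framework or parameter representation, so this is purely illustrative.

```python
import numpy as np

def fuse_models(models, weights):
    """Weighted fusion of models that share the same parameter names.

    models  -- list of dicts mapping parameter name -> numpy array
    weights -- one random weight per model, e.g. (W1, W2)
    """
    fused = {}
    for name in models[0]:
        # Target parameter: W1*a_i + W2*a_i' + ... for each parameter a_i
        fused[name] = sum(w * m[name] for w, m in zip(weights, models))
    return fused
```

For the two models of fig. 4, fuse_models([model_1, model_2], (W1, W2)) yields exactly the target parameters (W1*a_1 + W2*a_1'), ..., (W1*a_m + W2*a_m') described above.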
It should be noted that fig. 4 is only an illustration and not the only limitation of the disclosed embodiments. Fig. 4 uses the weighted and summed parameters as the parameters of the fusion model, but in other embodiments the average of the weighted parameters may also be used as the parameters of the fusion model. For the parameter a_1 in the above example, the average of W1*a_1 and W2*a_1' may be used as the parameter a_1 in the fusion model, and the values of the remaining parameters of the fusion model can be obtained in the same way. In addition, fig. 4 performs weighted summation on all parameters of the multiple models to be fused, but in other embodiments weighted summation may be performed on only some of the parameters. For example, in one possible implementation, only the weight parameters of the multiple models to be fused are weighted and summed; the new weight parameters obtained by the weighted summation are used as the weight parameters of the fusion model, and the hyperparameters of the fusion model are initialized on this basis to obtain a complete fusion model.
Step 103, determining the accuracy corresponding to each fusion model based on preset test samples, and determining the fusion model with the highest accuracy as the target fusion model.
The test samples in this embodiment may be understood as samples for testing the accuracy of the model, and the types and the number of the test samples may be set according to the task scenario of the fusion model, which is not specifically limited in this disclosure.
In the embodiments of the present disclosure, a corresponding fusion model can be obtained from each random array, and because their model parameters differ, the fusion models exhibit different performance. By testing the accuracy of each fusion model with the test samples, the performance of each fusion model can be obtained, so that the model with the best performance can be selected.
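Combining steps 102 and 103, the selection loop might be sketched as follows. Here evaluate_accuracy stands in for a task-specific routine that measures accuracy on the preset test samples, and fuse_models is the sketch given above; both are illustrative assumptions rather than a prescribed implementation.

```python
def select_target_model(models, random_arrays, test_samples, evaluate_accuracy):
    """Build one fusion model per random array and keep the most accurate one."""
    best_model, best_accuracy = None, float("-inf")
    for weights in random_arrays:
        candidate = fuse_models(models, weights)                # step 102
        accuracy = evaluate_accuracy(candidate, test_samples)   # step 103
        if accuracy > best_accuracy:
            best_model, best_accuracy = candidate, accuracy
    return best_model  # the target fusion model
```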
According to the model fusion method, a plurality of models to be fused and a plurality of random arrays are obtained, and weighted fusion processing is performed on the models to be fused using the random weights in each random array to obtain a fusion model corresponding to each random array. The accuracy of each fusion model is then determined based on preset test samples, and the fusion model with the highest accuracy is determined as the target fusion model. This improves model performance without restricting the models to a particular task scenario, improving their generality. In addition, the model fusion method provided by the disclosed embodiments does not need to run multiple models on the same machine as an aggregation method does, which reduces the consumption of machine computing resources.
Fig. 5 is a flowchart of another model fusion method provided in the embodiment of the present disclosure, and as shown in fig. 5, the method includes:
step 501, obtaining a plurality of models with different parameters and a plurality of random arrays, wherein the random arrays comprise random weights corresponding to the models.
The plurality of models with different parameters referred to in the embodiments of the present disclosure may be understood as a plurality of models that differ in their weight parameters and/or hyperparameters.
For example, in one implementation of the disclosed embodiments, a plurality of models with different weight parameters may be obtained by randomly initializing the weight parameters of an original model multiple times. In this case, the original model can be understood as a model that has the basic model architecture but whose weight parameters have not been initialized; its hyperparameters may or may not already be initialized. If hyperparameter initialization has not been completed, it needs to be performed on each model after the weight parameters are initialized; after hyperparameter initialization, the hyperparameters of the different models may be the same or different.
For another example, in another implementation of the disclosed embodiments, a plurality of models with different hyperparameters may be obtained by randomly initializing the hyperparameters of an original model multiple times. In this case, the original model can be understood as a model that has the basic model architecture but whose hyperparameters have not been initialized; its weight parameters may or may not already be initialized. If weight parameter initialization has not been completed, it needs to be performed after the hyperparameters are initialized. When weight parameter initialization is performed on a model whose hyperparameters have already been initialized, the weight parameters can be initialized multiple times for the same model, yielding a plurality of models with the same hyperparameters but different weight parameters.
Of course, the above two methods are only illustrative and do not limit the embodiments of the present disclosure; in different scenarios, a plurality of models with different parameters may be obtained by other methods as needed, and no specific method is required.
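As a hypothetical illustration of the first implementation described above (repeated random initialization of the weight parameters of an original model), consider the following PyTorch-style sketch; the toy architecture, the use of seeds, and the choice of framework are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

def build_models_with_different_weights(n_models: int):
    """Create several models with the same architecture and hyperparameters
    but differently (randomly) initialized weight parameters."""
    models = []
    for seed in range(n_models):
        torch.manual_seed(seed)       # a different seed yields a different random init
        models.append(nn.Sequential(  # assumed toy architecture
            nn.Linear(16, 32),
            nn.ReLU(),
            nn.Linear(32, 2),
        ))
    return models
```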
Step 502, training a plurality of models with different parameters based on a preset sample group, and taking the trained plurality of models as models to be fused.
The sample group referred to in the embodiments of the present disclosure may include a plurality of samples, and the types and the number of the samples may be set according to needs, which is not specifically limited in the embodiments of the present disclosure.
The method for training the model based on the sample group according to the embodiments of the present disclosure may refer to the prior art, and is not described herein again.
Step 503, performing weighted fusion processing on the multiple models to be fused by using the random weight in each random array respectively, to obtain a fusion model corresponding to each random array.
The disclosed embodiments may use different methods to fuse the plurality of models to be fused. For example, when the weight parameters of the obtained models to be fused differ but their hyperparameters are the same, the weight parameters of the models may be weighted and summed, the new weight parameters obtained by the weighted summation may be used as the weight parameters of the fusion model, and the hyperparameters common to the models to be fused may be used as the hyperparameters of the fusion model to generate the corresponding fusion model. For another example, when both the weight parameters and the hyperparameters of the obtained models to be fused differ, the same parameters across the models (including weight parameters and hyperparameters) may be weighted and summed, and the weight parameters and hyperparameters obtained by the weighted summation may be used as the weight parameters and hyperparameters of the fusion model to generate the corresponding fusion model.
Of course, these two model fusion methods are merely illustrative and do not limit the embodiments of the present disclosure.
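For the first case above (shared hyperparameters, different weight parameters), a minimal sketch is given below; splitting a model into weight_params and hyper_params dictionaries is an assumption made purely for illustration.

```python
def fuse_with_shared_hyperparams(models, weights):
    """Fuse only the weight parameters and carry over the common hyperparameters."""
    fused_weights = {
        name: sum(w * m["weight_params"][name] for w, m in zip(weights, models))
        for name in models[0]["weight_params"]
    }
    return {
        "weight_params": fused_weights,
        "hyper_params": models[0]["hyper_params"],  # identical across the models
    }
```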
Step 504, determining the accuracy corresponding to each fusion model based on preset test samples, and determining the fusion model with the highest accuracy as the target fusion model.
According to the embodiments of the present disclosure, by training a plurality of models with different parameters and fusing the trained models, the problems of low model accuracy and insufficient generality caused by one-sided parameter settings can be alleviated, thereby improving the accuracy and generality of the fusion model.
The model fusion method provided by the embodiments of the present disclosure can be applied in various fields. For example, in the field of image processing, the target fusion model determined by the disclosed embodiments may be trained as a model with certain image processing capabilities (such as painting or beautifying, but not limited to the capabilities listed here). A processing device equipped with the model (e.g., a mobile phone or a computer) may acquire an image to be processed by shooting, downloading, receiving, etc., and then process the image with the pre-trained fusion model to obtain a processed image (e.g., a painted image or a beautified image). Of course, this is merely an example and does not limit the scope or application of the disclosed embodiments.
Fig. 6 is a schematic structural diagram of a model processing apparatus provided in an embodiment of the present disclosure, where the model processing apparatus may be understood as a computer device or a part of functional modules in a computer device in the foregoing embodiments. As shown in fig. 6, the model processing device 60 includes:
the acquiring module 61 is configured to acquire a plurality of models to be fused and a plurality of random arrays, where each random array includes a random weight corresponding to each model to be fused;
the fusion module 62 is configured to perform weighted fusion processing on the multiple models to be fused by using the random weight in each random array, respectively, to obtain a fusion model corresponding to each random array;
and a determining module 63, configured to determine, based on a preset test sample, the accuracy corresponding to each fusion model, and determine the fusion model with the highest accuracy as the target fusion model.
In one embodiment, the obtaining module 61 may include:
an acquisition unit configured to acquire a plurality of models having different parameters;
and the training unit is used for training a plurality of models with different parameters based on a preset sample group, and taking the trained models as the models to be fused.
In one embodiment, the obtaining unit may include:
the first acquiring subunit is used for acquiring a plurality of models with different hyper-parameters.
In one embodiment, the obtaining unit may further include:
and the second acquisition subunit is used for acquiring a plurality of models with different built-in weight parameters.
In one embodiment, the second acquiring subunit may be configured to:
obtaining an original model; and carrying out random initialization processing on the weight parameters in the original model for multiple times to obtain a plurality of models with different weight parameters.
In one embodiment, the fusion module 62 may include:
the weighting processing unit is used for weighting and summing the parameters of the models to be fused according to each random array and based on the random weight in the random array to obtain target parameters;
and the generating unit is used for generating and obtaining the fusion model corresponding to the random array by taking the target parameters as model parameters of the fusion model.
The apparatus provided in this embodiment can execute the method in any one of fig. 1 to fig. 5, and the execution manner and the beneficial effect are similar, which are not described herein again.
The embodiment of the present disclosure further provides a computer device, which includes a processor and a memory, where the memory stores a computer program, and when the computer program is executed by the processor, the method of any one of the above-mentioned fig. 1 to 5 may be implemented.
For example, fig. 7 is a schematic structural diagram of a computer device provided in an embodiment of the present disclosure. Referring now in particular to fig. 7, there is shown a schematic block diagram of a computer device 1000 suitable for use in implementing embodiments of the present disclosure. The computer device 1000 in the embodiments of the present disclosure may be understood as a device having computing and processing functions, such as a notebook computer, a desktop computer, a server, and the like. The computer device shown in fig. 7 is only an example and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the computer apparatus 1000 may include a processing device (e.g., a central processing unit, a graphic processor, etc.) 1001 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage device 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the computer apparatus 1000 are also stored. The processing device 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Generally, the following devices may be connected to the I/O interface 1005: input devices 1006 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 1007 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 1008 including, for example, magnetic tape, hard disk, and the like; and a communication device 1009. The communication means 1009 may allow the computer device 1000 to communicate with other devices wirelessly or by wire to exchange data. While fig. 7 illustrates a computer device 1000 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may be alternatively implemented or provided.
In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 1009, or installed from the storage means 1008, or installed from the ROM 1002. The computer program, when executed by the processing device 1001, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the computer device; or may exist separately and not be incorporated into the computer device.
The computer readable medium carries one or more programs which, when executed by the computer device, cause the computer device to: obtain a plurality of models to be fused and a plurality of random arrays, wherein the random arrays include random weights corresponding to the models to be fused; perform weighted fusion processing on the plurality of models to be fused using the random weights in each random array, respectively, to obtain a fusion model corresponding to each random array; and determine the accuracy corresponding to each fusion model based on preset test samples, and determine the fusion model with the highest accuracy as the target fusion model.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of an element does not in some cases constitute a limitation on the element itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The embodiments of the present disclosure further provide a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the method of any one of the embodiments in fig. 1 to fig. 5 may be implemented, where the execution manner and the beneficial effects are similar, and are not described herein again.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The previous description is only for the purpose of describing particular embodiments of the present disclosure, so as to enable those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A method of model fusion, comprising:
obtaining a plurality of models to be fused and a plurality of random arrays, wherein the random arrays comprise random weights corresponding to the models to be fused;
respectively carrying out weighted fusion processing on the multiple models to be fused according to the random weight in each random array to obtain a fusion model corresponding to each random array;
and determining the accuracy corresponding to each fusion model based on a preset test sample, and determining the fusion model with the highest accuracy as a target fusion model.
2. The method of claim 1, wherein the obtaining a plurality of models to be fused comprises:
obtaining a plurality of models with different parameters;
and training the models with different parameters based on a preset sample group, and taking the trained models as the models to be fused.
3. The method of claim 2, wherein obtaining a plurality of models with different parameters comprises:
and acquiring a plurality of models with different hyper-parameters.
4. The method of claim 2 or 3, wherein the obtaining a plurality of models with different parameters comprises:
and obtaining a plurality of models with different built-in weight parameters.
5. The method of claim 4, wherein obtaining a plurality of models with different built-in weight parameters comprises:
obtaining an original model;
and carrying out random initialization processing on the weight parameters in the original model for multiple times to obtain a plurality of models with different weight parameters.
6. The method according to claim 1, wherein the performing weighted fusion processing on the plurality of models to be fused by using the random weight in each random array respectively to obtain a fusion model corresponding to each random array comprises:
for each random array, based on random weights in the random array, carrying out weighted summation processing on parameters of the multiple models to be fused to obtain target parameters;
and taking the target parameters as model parameters of a fusion model, and generating the fusion model corresponding to the random array.
7. A model processing apparatus, comprising:
an acquisition module, configured to acquire a plurality of models to be fused and a plurality of random arrays, wherein the random arrays comprise random weights corresponding to the models to be fused;
a fusion module, configured to perform weighted fusion processing on the plurality of models to be fused using the random weights in each random array, respectively, to obtain a fusion model corresponding to each random array;
and a determining module, configured to determine the accuracy corresponding to each fusion model based on preset test samples, and to determine the fusion model with the highest accuracy as the target fusion model.
8. The apparatus of claim 7, wherein the obtaining module comprises:
an acquisition unit configured to acquire a plurality of models having different parameters;
and the training unit is used for training the models with different parameters based on a preset sample group and taking the trained models as the models to be fused.
9. The apparatus of claim 8, wherein the obtaining unit comprises:
the first acquiring subunit is used for acquiring a plurality of models with different hyper-parameters.
10. The apparatus according to claim 8 or 9, wherein the obtaining unit further comprises:
and the second acquisition subunit is used for acquiring a plurality of models with different built-in weight parameters.
11. The apparatus of claim 10, wherein the second obtaining subunit is configured to:
obtaining an original model;
and carrying out random initialization processing on the weight parameters in the original model for multiple times to obtain a plurality of models with different weight parameters.
12. The apparatus of claim 7, wherein the fusion module comprises:
the weighting processing unit is used for weighting and summing the parameters of the models to be fused according to each random array and based on the random weight in the random array to obtain target parameters;
and the generating unit is used for generating and obtaining the fusion model corresponding to the random array by taking the target parameters as model parameters of the fusion model.
13. A computer device, comprising:
memory and a processor, wherein the memory has stored therein a computer program which, when executed by the processor, implements the method of any of claims 1-6.
14. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN202110837115.6A 2021-07-23 2021-07-23 Model fusion method, device, equipment and storage medium Pending CN115688042A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110837115.6A CN115688042A (en) 2021-07-23 2021-07-23 Model fusion method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115688042A true CN115688042A (en) 2023-02-03

Family

ID=85044294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110837115.6A Pending CN115688042A (en) 2021-07-23 2021-07-23 Model fusion method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115688042A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680625A (en) * 2023-08-04 2023-09-01 山东华科信息技术有限公司 Cloud edge end cooperation-based distribution network multi-scene matching data processing method and system
CN116680625B (en) * 2023-08-04 2024-01-05 山东华科信息技术有限公司 Cloud edge end cooperation-based distribution network multi-scene matching data processing method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination