CN114547349A - Method, device, equipment and storage medium for model adjustment and business processing - Google Patents


Info

Publication number
CN114547349A
Authority
CN
China
Prior art keywords: target, model, weight, existing, adjustment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210149138.2A
Other languages
Chinese (zh)
Inventor
罗雄文
陈德健
项伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bigo Technology Pte Ltd
Original Assignee
Bigo Technology Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bigo Technology Pte Ltd
Priority application: CN202210149138.2A
Publication: CN114547349A
Related application: PCT/CN2023/076011 (WO2023155783A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/45 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/231 Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Stored Programmes (AREA)

Abstract

The application discloses a method, a device, equipment and a storage medium for model adjustment and service processing. The method includes: determining a target adjustment type of a target model to be adjusted; determining existing capability data of the target model according to the target adjustment type; determining the target weights in the target model that need to be adjusted according to the target adjustment type; acquiring target training data corresponding to the target weights; and adjusting the target model, including adjusting the target weights, based on the existing capability data and the target training data. In this way the target model learns new capabilities while retaining its ability to recognize old patterns, improving the system's ability to adjust flexibly and quickly in response to business changes.

Description

Method, device, equipment and storage medium for model adjustment and business processing
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method for model adjustment, a method for service processing, a device for model adjustment, a device for service processing, a computer-readable storage medium, and a computer program product.
Background
Given the real-time and scalability requirements of short-video/picture auditing and recommendation, most auditing and recommendation is currently performed by a combination of machine and manual processing, with the machine part taking the major share. Taking machine auditing as an example, it can find and handle more than 90% of violations at high speed while maintaining a given level of accuracy.
Currently, mainstream deep-learning auditing/recommendation systems tend to use a flat approach to cover a wide range of complex auditing/recommendation scenarios, i.e., multiple models with different functions run in parallel to meet different business requirements. When the business environment changes, the low reuse capability of such a system makes it difficult to adapt to rapidly changing business requirements and data distributions, and updating and adjusting the system takes a long time, especially when multiple models must cooperate.
Disclosure of Invention
The application provides a method, a device, equipment and a storage medium for model adjustment and service processing, aiming to solve the prior-art problems that, when the business environment changes, the system struggles to adapt to rapidly changing business requirements and data distributions, and that updating and adjusting the system takes a long time.
In a first aspect, an embodiment of the present application provides a method for model adjustment, where the method includes:
determining a target adjustment type of a target model to be adjusted;
determining existing capability data of the target model according to the target adjustment type;
determining the target weight needing to be adjusted in the target model according to the target adjustment type;
acquiring target training data corresponding to the target weight;
adjusting the target model based on the existing capability data and the target training data, the adjusting including adjusting the target weights.
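The five steps above can be sketched as a single toy routine. The following Python is purely illustrative: the function name, the dict-based model representation, and the averaging "training" step are all assumptions for exposition, not part of the claimed method.

```python
def adjust_model(model, requirement_info, training_db):
    """Toy end-to-end sketch of the five steps above; all names and the
    dict-based model representation are illustrative, not from the patent."""
    # Step 1: compare required capabilities with existing ones to pick a direction.
    missing = set(requirement_info) - set(model["capabilities"])
    adjustment_type = "function" if missing else "none"
    # Step 2: existing capability data associated with the adjustment type.
    capability_data = {"capabilities": set(model["capabilities"])}
    # Step 3: the target weights to adjust (here, one new head per missing capability).
    target_weights = {cap: 0.0 for cap in missing}
    # Step 4: target training data keyed by the same capability labels.
    training_data = {cap: training_db.get(cap, []) for cap in missing}
    # Step 5: "train" by averaging labelled samples (a stand-in for real training,
    # which would use capability_data to preserve old recognition ability).
    for cap, samples in training_data.items():
        if samples:
            target_weights[cap] = sum(samples) / len(samples)
    model["weights"].update(target_weights)
    model["capabilities"] |= missing
    return adjustment_type, model
```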
In a second aspect, an embodiment of the present application further provides a service processing method, where the method includes:
acquiring service data to be processed;
loading a target model obtained by the method according to the first aspect;
and processing the service data by using the target model.
In a third aspect, an embodiment of the present application further provides an apparatus for model adjustment, where the apparatus includes:
the adjustment type determining module is used for determining a target adjustment type of a target model to be adjusted;
the existing capability data determining module is used for determining existing capability data of the target model according to the target adjustment type;
the target weight determining module is used for determining the target weight needing to be adjusted in the target model according to the target adjustment type;
a target training data acquisition module for acquiring target training data corresponding to the target weight;
a model adjustment module configured to adjust the target model based on the existing capability data and the target training data, where the adjustment includes an adjustment of the target weight.
In a fourth aspect, an embodiment of the present application further provides a service processing apparatus, where the apparatus includes:
a service data acquisition module, configured to acquire service data to be processed;
a target model loading module, configured to load a target model obtained by the method according to the first aspect;
and a service processing module, configured to process the service data by using the target model.
In a fifth aspect, an embodiment of the present application further provides a service processing device, where the service processing device includes:
one or more processors;
a storage device for storing one or more programs which,
when executed by the one or more processors, cause the one or more processors to implement the method of the first or second aspect described above.
In a sixth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the method of the first aspect or the second aspect.
In a seventh aspect, this application embodiment also provides a computer program product, where the computer program product includes computer-executable instructions, and when executed, the computer-executable instructions are configured to implement the method of the first aspect or the second aspect.
The technical solution provided by this application has the following beneficial effects:
In this embodiment, when the target model needs to be adjusted, the adjustment direction is determined by determining a target adjustment type of the target model; the existing capability data of the target model and the target weights that need to be adjusted are then determined according to that type. The target model is adjusted using the existing capability data and the target training data, so that the target weights are adjusted and the model learns new capabilities while retaining its ability to recognize old patterns. The business system can thus extend new functions for changed business requirements and data distributions while preserving its original capabilities as far as possible, preventing a large drop in performance under new data distributions and business conditions, giving the system a degree of self-optimization and intelligent improvement, and improving its ability to adjust flexibly and quickly as the business changes.
Drawings
FIG. 1 is a flowchart of an embodiment of a method for model adjustment according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of an embodiment of a method for model adjustment according to a second embodiment of the present disclosure;
FIG. 3 is a flowchart of an embodiment of a method for model adjustment according to a third embodiment of the present disclosure;
FIG. 4 is a diagram illustrating a range expansion branch partitioning according to a third embodiment of the present application;
FIG. 5 is a flowchart of an embodiment of a method for model adjustment according to a fourth embodiment of the present disclosure;
fig. 6 is a schematic diagram of an architecture for connecting with a mapping connection layer when a new task function is extended according to a fourth embodiment of the present application;
fig. 7 is a block diagram of an embodiment of a model adjusting apparatus according to a fifth embodiment of the present disclosure;
fig. 8 is a flowchart of an embodiment of a service processing method according to a sixth embodiment of the present application;
fig. 9 is a flowchart of an embodiment of a service processing apparatus according to a seventh embodiment of the present application;
fig. 10 is a schematic structural diagram of a service processing device according to a sixth embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of an embodiment of a method for model adjustment provided in an embodiment of the present application. The embodiment may be applied to a scenario in which a business system adaptively adjusts a model or set of models it uses. As shown in fig. 1, the embodiment may include the following steps:
step 110, determining a target adjustment type of the target model to be adjusted.
The target model may include a model that needs to be updated or adjusted, and may include various models with different functions for meeting different business requirements, which are required by a business system, and may include, for example, a behavior recognition model, a scene classification model, an age regression model, a gender recognition model, a speech recognition model, a user portrait model, a multi-modal integration model, a sensitive word detection model, a gun detection model, and the like.
The target model may be a model that needs to be updated and is specified by a user, or may be a model that needs to be updated and is automatically determined by the system according to a preset model update policy, which is not limited in this embodiment.
The target adjustment type is used to guide the direction of adjustment of the target model, and different target models may have different adjustment types.
In one embodiment, step 110 may further include the steps of:
and step 110-1, obtaining model requirement information input by a user aiming at the target model.
The model requirement information may include business functions that the user desires the target model to be able to implement, or types of tasks that can be performed. For example, assuming that the business function that a user wants a model to implement is task classification and task segmentation, the model requirement information may include: identification information for indicating to perform task classification, and identification information for indicating to perform task segmentation.
In one implementation, an interactive interface can be presented to the user. The interface can include various model functional elements and offer several base models to choose from; the user selects one or more base models as the target model according to actual requirements, then selects one or more model functional elements as the functions the target model is expected to implement, and the model requirement information is generated from the selected elements.
In other embodiments, the user may also input the model requirement information in a manner of an instruction such as a command line, and the model requirement information may also carry an identifier of the target model, which is not limited in this embodiment.
Having the user input the model requirement information makes it easier to configure human-defined conditions flexibly, enhancing the flexibility with which business requirements are handled.
And step 110-2, obtaining the existing model information of the target model.
The existing model information refers to description information used for representing existing capabilities of the target model, and for example, the existing model information may include a model architecture of the target model, model parameters and corresponding parameter values, implemented functions of the model, and the like.
In implementation, after the target model is determined, the data of the target model may be loaded, and the existing model information of the target model may be extracted from the data of the target model.
And step 110-3, comparing the model requirement information with the existing model information to determine model extension information.
After the model requirement information and the existing model information of the target model are determined, the two can be compared, and the capabilities in the requirement information that the current target model does not yet possess are taken as the model extension information. In implementation, the model requirement information is parsed to determine the one or more requirements it contains; each requirement is then checked against the capabilities the target model already implements, and any requirement not yet implemented by the target model is taken as model extension information.
At step 110-4, adjustment logic for one or more preset adjustment types is obtained.
In one implementation, the adjustment type is adjustment information that is abstracted by a developer in advance according to experience and used for guiding the adjustment direction of the model. Different adjustment types can have different adjustment emphasis points, and the adjustment emphasis points of different adjustment types can be described by adopting adjustment logic.
For example, since the adjusted model in the business system involves a great variety of tasks and structural designs, in order to reduce the complexity of the system as much as possible, the various model task types may be organized into the following adjustment types: a mode adjustment type, a range adjustment type, a function adjustment type, and the like.
The mode adjustment type concerns the diversity covered by a data pattern, i.e., the breadth with which the model recognizes data within a category. For example, suppose the model is a gun recognition model that originally recognizes only a certain type of gun from a certain region or from certain works. If the user wants the model to recognize a wide variety of firearms worldwide, or firearms appearing across many kinds of works, that is an adjustment of pattern coverage diversity.
The scope adjustment type relates to adjustment of the scope of data that can be processed within the same task function, for example, increasing the category of classification tasks, increasing the number of keypoint detections, increasing the anchor frame type of detection tasks, and so on. For example, an example of "adding a category for a classification task" may be: assuming that the original model can recognize both cat and dog animals, if the user wants the model to recognize the lion, the adjustment does not change the task type (both classification tasks), but only adds the recognized categories, and thus the adjustment belongs to the model-wide adjustment.
Unlike the pattern adjustment type, the range adjustment type changes the range of types the model can recognize; the pattern adjustment type neither adds nor removes recognized types, but broadens recognition of the same types (extending from a certain region to the whole country or the world, and so on).
The function adjustment type relates to adjustment of the number or combination of task functions that the model can handle, for example, a single-task function model is changed into a multi-task function model, or task functions that the original model does not have are added, for example, a model originally having only a classification capability is expanded into a dual-function model having both classification and detection capabilities, or a simple key point identification capability is upgraded into an association relationship identification capability.
Unlike the above-described range adjustment type, the function adjustment type may increase or decrease the task function. For example, assuming that the original model is able to perform classification tasks, if the user wants the model to also perform segmentation tasks, such adjustments pertain to the adjustment of task functionality.
Of course, besides the above three adjustment types, those skilled in the art may abstract other adjustment types and define an adjustment logic of each adjustment type according to actual needs, which is not limited in this embodiment.
And step 110-5, matching the model extension information with the adjustment logics of the one or more preset adjustment types, and taking the matched adjustment type as the target adjustment type.
By analyzing the model extension information, it is possible to determine which extension the target model belongs to, thereby determining the target adjustment type. In one implementation, the target adjustment type may be determined as follows:
According to the model extension information, it is first judged whether the extension involves adding or removing task functions; if so, the target adjustment type is the function adjustment type. If not, it is further judged whether the extension involves adding or removing task categories within the same task function; if so, the target adjustment type is the range adjustment type. If not, it is further judged whether the extension widens or narrows the patterns (breadth) of the same task categories; if so, the target adjustment type is the pattern adjustment type. Otherwise, the model extension information can be handed to developers for manual judgment.
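The decision cascade above can be sketched in a few lines. The boolean field names (`adds_task_function`, etc.) are hypothetical; only the order of the checks follows the text.

```python
def classify_adjustment(extension_info):
    """Map model-extension info to an adjustment type, following the
    decision order described above (field names are illustrative)."""
    if extension_info.get("adds_task_function"):
        return "function"        # new task functions (e.g. classification -> +segmentation)
    if extension_info.get("adds_task_category"):
        return "range"           # new categories within the same task (e.g. +lion)
    if extension_info.get("widens_pattern_coverage"):
        return "pattern"         # same categories, broader patterns (e.g. region -> worldwide)
    return "manual_review"       # fall through to a developer for manual judgment
```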
And step 120, determining the existing capacity data of the target model according to the target adjustment type.
The existing capability data refers to data existing in the target model and associated with the target adjustment type. The existing model information in step 110-2 refers to all the existing capability data of the target model.
In implementation, the existing capability data can be extracted from the existing model information of the target model according to the adjustment logic of the target adjustment type. Since the adjustment logics of different adjustment types are different, the existing capability data corresponding to different adjustment types may also be different.
And step 130, determining the target weight needing to be adjusted in the target model according to the target adjustment type.
In practice, adjusting the model involves adjusting its weights. Either all weights of the model may be adjusted, or only some of them in a targeted manner. In implementation, the target weights to be adjusted can be determined according to the adjustment logic of the target adjustment type. Since the adjustment logic differs between adjustment types, the target weights corresponding to different adjustment types may also differ.
Step 140, obtaining target training data corresponding to the target weight.
The target training data may be existing training data or newly generated training data, which is not limited in this embodiment.
In implementation, the target training data corresponding to the target weight may be looked up from a database for storing training data. For example, if the target weights are weights for processing layers that identify lions, training data labeled with lions may be used as the target training data.
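A minimal sketch of such a lookup, assuming (purely for illustration) that the training store is a list of label-annotated samples and that target weights are keyed by label:

```python
def lookup_training_data(db, target_labels):
    """Fetch samples whose annotations overlap the labels tied to the target
    weights; `db` and the sample structure are assumed, not from the patent."""
    return [sample for sample in db if target_labels & set(sample["labels"])]
```

For example, with target weights tied to a new "lion" head, only samples annotated with "lion" would be returned.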
Step 150, adjusting the target model based on the existing capability data and the target training data.
In this step, the target model may be adjusted using the existing capability data and the target training data according to the adjustment logic of the target adjustment type, including adjustment of the target weights. The adjusting may include: iteratively updating the target model, expanding it (e.g., extending a model used in an old field to a new field), or reducing it (e.g., removing some redundant functionality), and so on.
In one implementation, when the target model is adjusted, at least one of a transfer learning method and a continuous learning method may be used, based on the existing capability data of the target model. Transfer learning is a model learning method that transfers a model's processing capability in its current field to a new model or extends it to a new field, mainly in order to reuse the pattern recognition capability already learned. Continuous learning is a learning method that avoids catastrophic forgetting, aiming to learn new capabilities while preserving the old pattern recognition capability. Catastrophic forgetting generally refers to the phenomenon in which a model retrained from old weight parameters loses its ability to recognize past data patterns during subsequent training; it occurs mostly when the data distribution changes greatly.
In this embodiment, the dual learning mechanism of transfer learning and continuous learning is used to adjust the capabilities of the business system, which reduces development cost. Seen along the timeline of system development, the two mechanisms also carry different responsibilities at different stages. In the early expansion stage, labeled data is scarce; the self-supervised idea behind transfer learning plays a large role here, and many different business requirements can be transferred from a single task field according to commonalities in data distribution. In the optimization/expansion stage of the middle and later periods, the idea of continuous learning avoids catastrophic forgetting of existing capabilities, so the original capability is maintained while the system expands steadily, its ability to cope with changing human-defined conditions is gradually strengthened, and it maintains high accuracy in a changing environment.
It should be noted that the adjustments to the target model mentioned above are all based on expansion; in practice, adjustment may also involve reducing the target model. In this case, unnecessary branches or whole models can be masked rather than deleted or structurally removed. This mainly avoids the extra adjustment overhead that would be incurred if the reduced functionality were needed again later. Some special reduction requirements can also be met by merging overly fine-grained outputs through a preset post-processing strategy.
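The masking idea can be sketched as a thin wrapper that disables branches without deleting them, so that re-enabling a branch later costs nothing. The class and the callable-per-branch representation are illustrative only:

```python
class MaskedModel:
    """Reduction by masking: redundant branches are disabled, not deleted
    (toy structure; real branches would be network sub-graphs)."""

    def __init__(self, branches):
        self.branches = branches   # branch name -> callable
        self.masked = set()

    def mask(self, name):
        self.masked.add(name)      # disable a branch without deleting it

    def unmask(self, name):
        self.masked.discard(name)  # re-enable with no retraining overhead

    def forward(self, x):
        # Only unmasked branches contribute outputs.
        return {name: fn(x) for name, fn in self.branches.items()
                if name not in self.masked}
```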
In this embodiment, when the target model needs to be adjusted, the adjustment direction is determined by determining a target adjustment type of the target model; the existing capability data of the target model and the target weights that need to be adjusted are then determined according to that type. The target model is adjusted using the existing capability data and the target training data, so that the target weights are adjusted and the model learns new capabilities while retaining its ability to recognize old patterns. The business system can thus extend new functions for changed business requirements and data distributions while preserving its original capabilities as far as possible, preventing a large drop in performance under new data distributions and business conditions, giving the system a degree of self-optimization and intelligent improvement, and improving its ability to adjust flexibly and quickly as the business changes.
Since the adjustment of the target model in this embodiment is completed by reusing the model's existing capability data, assisted by a specific learning mechanism, while ordinary iterative optimization can still be performed periodically using data labels fed back by the running system, the adjustment can be realized stably and at low cost. This allows the system to match the rapidly shifting focus of services such as short video/picture and to adapt quickly to change without a large drop in effectiveness.
Example two
Fig. 2 is a flowchart of an embodiment of a method for model adjustment provided in the second embodiment of the present application, and this embodiment describes a model adjustment process based on a mode adjustment type on the basis of the first embodiment, and as shown in fig. 2, this embodiment may include the following steps:
step 210, determining a target adjustment type of the target model to be adjusted.
Step 220, obtaining a pattern library of the target model.
In this embodiment, the concept of "replay" from continuous learning is applied, and a pattern library is created. Replay is a special continuous learning method whose main operation is to first have the new model recall the recognition capability for the old data distribution from learned data or data prototypes, and then learn the patterns of the new data. The pattern library may include a plurality of pattern entries, which may be generated from specific feature information extracted from the target model. When the pattern entries are later used to adjust the target model, the model's recognition capability for old data patterns can be recovered quickly, preventing the adjusted model from catastrophically forgetting data it could once recognize.
In one implementation, the pattern library may be stored in a server where the business system is located, or may be managed by a dedicated storage server. The corresponding pattern library can be found through the identification of the target model.
And step 230, obtaining the weight of each processing layer of the target model, wherein the processing layers comprise a fixed processing layer and a non-fixed processing layer.
In one implementation, the fixed and non-fixed process layers in the target model may be specified by a developer. For example, the developer can specify which specific processing layers are fixed processing layers and the rest are non-fixed processing layers according to the a priori knowledge. Alternatively, the developer may specify property information provided by the fixed processing layer, and the system may determine the fixed processing layer from among the plurality of processing layers of the target model based on the property information. For example, the feature extraction layer or the attention layer may be used as the fixed processing layer.
Step 240, taking the pattern library and the weights of the processing layers as the existing capability data.
In this embodiment, the pattern library of the target model and the weights of its processing layers may be used as the existing capability data of the target model. The existing capability data refers to model data associated with the target adjustment type.
Step 250, initializing the weights of the non-fixed processing layers, and taking the initialized weights as the target weights to be adjusted in the target model.
When the target model is updated and adjusted, the weights of the fixed processing layers may be held fixed while the weights of the non-fixed processing layers are initialized; the initialized weights of the non-fixed processing layers then serve as the target weights to be adjusted in the target model.
Step 260, acquiring target training data corresponding to the target weight.
The target training data is new training data, that is, training data for updating the target model.
Step 270, determining old pattern feature information of the target model based on the pattern entries in the pattern library.
In this step, all pattern entries may be fetched from the pattern library of the target model, and each pattern entry is processed to generate its old pattern feature information.
In one embodiment, step 270 may further include the steps of:
Step 270-1, generating one or more corresponding perturbation vectors for each pattern entry using a random vector generator, where the random vector generator simulates a multi-mode normal distribution.
Step 270-2, with the pattern entry as the center, perturbing the pattern entry with the one or more perturbation vectors to obtain the old pattern feature information corresponding to the pattern entry.
The perturbation vectors are used to perturb a pattern entry to generate its feature information (i.e., old pattern feature information). For one pattern entry, either one perturbation vector or multiple perturbation vectors may be generated. If multiple perturbation vectors are generated, the pattern entry is perturbed multiple times, and one old pattern embedding (i.e., one piece of old pattern feature information) is generated for each perturbation vector.
The perturbation vectors for each pattern entry may be produced by a random vector generator that simulates a multi-mode normal distribution. Then, with the current pattern entry as the center, the generated perturbation vectors are applied to it, yielding the old pattern feature information corresponding to the pattern entry.
For example, assuming that the feature vector of a pattern entry is (x1, x2, x3, ..., xn) and a perturbation vector is (d1, d2, ..., dn), one perturbation result is (x1+d1, x2+d2, ..., xn+dn). When multiple perturbations are needed, the feature vector of the pattern entry is taken as the center and each perturbation vector is applied to this "center" (for example, added to it), yielding a corresponding set of old pattern feature vectors.
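As an illustration of this perturbation step, the following sketch draws perturbation vectors from a simple two-mode mixture of normal distributions and applies them around a stored pattern entry; the function names, mode locations, and scales are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def multimodal_perturbations(dim, num, means=(-0.1, 0.1), scale=0.02):
    """Draw perturbation vectors from a simple mixture of normal distributions,
    imitating the 'multi-mode normal distribution' generator in the text."""
    modes = rng.choice(means, size=(num, dim))            # pick a mode per component
    return modes + rng.normal(0.0, scale, size=(num, dim))

def restore_old_patterns(center, num=5):
    """Perturb a stored pattern entry (the 'center') to recover several
    old-pattern feature vectors: (x1+d1, x2+d2, ..., xn+dn) for each d."""
    center = np.asarray(center, dtype=float)
    deltas = multimodal_perturbations(center.size, num)
    return center + deltas                                 # broadcast over rows

old_patterns = restore_old_patterns([0.5, -0.2, 0.3], num=4)
```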
Step 280, inputting the target training data into the target model, and performing feature extraction on the target training data in the target model by using the fixed processing layer to obtain new feature information.
In this embodiment, besides computing the old pattern embeddings, the embeddings of the new data (i.e., new feature information) obtained from the new training data (i.e., target training data) also need to be computed. In implementation, the target training data may be input into the target model, where the fixed processing layers compute the new feature information.
Step 290, adjusting the target model by using the old pattern feature information and the new feature information.
In this step, the old pattern feature information and the new feature information are used together to adjust the target model. In one embodiment, the old pattern feature information may be used to update the target weights, while the new feature information is used to update the weights of all processing layers. That is, the old pattern feature information updates only the initialized part of the weights (i.e., the target weights), whereas the new feature information updates all weight parameters, so that the model can quickly recover its recognition of old data patterns while learning the patterns of the new data.
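A toy sketch of this two-track update, with made-up parameter names and constant gradients; it only illustrates that gradients from the old-pattern loss touch the target weights while gradients from the new-data loss touch every layer:

```python
import numpy as np

# Hypothetical parameters: one fixed-layer weight and one target (non-fixed) weight.
weights = {"fixed.w": np.ones(3), "head.w": np.zeros(3)}
target_names = {"head.w"}

def apply_grads(weights, grads, names, lr=0.1):
    """SGD step restricted to the given parameter names."""
    for n in names:
        weights[n] = weights[n] - lr * grads[n]

# Gradients from the old-pattern loss update only the target weights...
old_grads = {"fixed.w": np.full(3, 1.0), "head.w": np.full(3, 1.0)}
apply_grads(weights, old_grads, target_names)

# ...while gradients from the new-data loss update every processing layer.
new_grads = {"fixed.w": np.full(3, 1.0), "head.w": np.full(3, 1.0)}
apply_grads(weights, new_grads, weights.keys())
```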
In one embodiment, so that the target model can quickly recover its memory of old patterns, the loss function may be augmented with an L2 regularization term when the target model is adjusted, with training continuing until the model converges. The L2 regularization term combines Laplace estimation and Fisher coding. Laplace estimation is a posterior probability estimation method that uses a Laplace matrix and maximum likelihood estimation to estimate how closely the new distribution approaches the old one; Fisher coding is a curvature-based coding scheme that can force the model to learn a specific data distribution by controlling the magnitude of the curvature. This lets the model quickly relearn the old data distribution early in training and then eases learning of the new data distribution later in training.
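A hedged sketch of such a penalty in the style of elastic weight consolidation: a Fisher-information term weights how strongly each parameter is pulled back toward its value under the old distribution (the Laplace approximation of the old posterior). This is an illustrative formulation, not the patent's exact regularizer:

```python
import numpy as np

def l2_memory_penalty(w, w_old, fisher, lam=1.0):
    """EWC-style L2 regularization term: per-parameter Fisher information
    scales the squared distance from the old weights, so parameters important
    to the old distribution resist change. Assumed form, for illustration."""
    w, w_old, fisher = map(np.asarray, (w, w_old, fisher))
    return lam / 2.0 * float(np.sum(fisher * (w - w_old) ** 2))

# Parameter 0 moved by 1.0 with Fisher weight 4.0; parameter 1 did not move.
penalty = l2_memory_penalty([1.0, 2.0], [0.0, 2.0], fisher=[4.0, 1.0])
```

The total training loss would then be the task loss plus this penalty, with `lam` trading off memory of old patterns against learning the new data.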
In an embodiment, after the adjustment of the target model is completed, the present embodiment may further include the following steps:
After the target model is adjusted, feature information is extracted from the newly generated training data using the adjusted target model; all the obtained feature information is clustered to obtain cluster centers; and the cluster centers are added to the pattern library as pattern entries.
In this embodiment, each time an adjustment of the model is completed, the newly converged model is used to extract embeddings (feature information) from the newly generated training data; these embeddings may be feature vectors or feature maps. The cluster centers may then be obtained by applying a density clustering method to all embeddings of the newly generated training data, and the cluster centers are stored in the pattern library as pattern entries.
In other implementations, a hierarchical clustering method can be used instead of density clustering to determine the cluster centers, with the hierarchy information used to filter out information that has a negative effect.
By storing only the cluster centers and restoring the old patterns through perturbation, this embodiment greatly reduces the space required to store the "replay prototypes".
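The clustering step can be pictured with the following minimal sketch; it uses a greedy proximity grouping as a stand-in for a real density clustering method, and the radius and sample embeddings are assumptions:

```python
import numpy as np

def cluster_centers(embeddings, radius=0.5):
    """Greedy density-style grouping: each embedding joins the first existing
    group whose running center lies within `radius`; the group means become
    the stored pattern entries. A simplified stand-in for the density
    clustering the text mentions, not a faithful DBSCAN."""
    centers, members = [], []
    for e in np.asarray(embeddings, dtype=float):
        for i, c in enumerate(centers):
            if np.linalg.norm(e - c) <= radius:
                members[i].append(e)
                centers[i] = np.mean(members[i], axis=0)  # update running center
                break
        else:
            centers.append(e.copy())                       # start a new group
            members.append([e])
    return np.array(centers)

# Two nearby embeddings collapse into one pattern entry; the outlier keeps its own.
entries = cluster_centers([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]])
```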
In this embodiment, to meet the requirement of improving data coverage diversity, the replay idea is applied and a pattern library is created so that old pattern feature information can be reused; the target model is then adjusted together with the new feature information obtained from the new training data, so the model quickly recovers its old recognition capability while learning the patterns of the new data. Without expanding the model's weight parameters at all, this improves data coverage, reduces the storage overhead required, and improves the reusability of the developed modules.
Embodiment Three
Fig. 3 is a flowchart of an embodiment of a method for model adjustment provided in a third embodiment of the present application, where this embodiment describes a model adjustment process based on a task range adjustment type on the basis of the first embodiment, and as shown in fig. 3, this embodiment may include the following steps:
step 310, determining a target adjustment type of the target model to be adjusted.
In one embodiment, step 310 may further include the steps of:
acquiring model requirement information input by a user for the target model; obtaining existing model information of the target model; comparing the model requirement information with the existing model information to determine model extension information; acquiring the adjustment logic of one or more preset adjustment types; and matching the model extension information against the adjustment logic of the one or more preset adjustment types, taking the matched adjustment type as the target adjustment type.
Step 320, obtaining the weights of the processing layers of the target model, where the processing layers include a low-level processing layer, a middle-level processing layer, and a high-level processing layer.
In this embodiment, the processing layers of the target model may be divided into a low-level processing layer, a middle-level processing layer, and a high-level processing layer; the division strategy may be specified manually.
Step 330, obtaining the weight of each existing output branch of the target model.
For example, for a classification model that handles classification tasks, the model may have multiple output branches, one output branch corresponding to each class.
Step 340, using the weight of each processing layer and the weight of each existing output branch as existing capability data.
Step 350, generating a corresponding range extension branch for the model extension information.
In this embodiment, when the task range of the target model needs to be extended, model extension information, i.e., information indicating how the target model needs to be extended, may be acquired. After the model extension information is obtained, a small processing branch, called a range extension branch, may be added to the target model for the model extension information.
In practice, two cases can be considered for extending the task range of the target model: one is extending to task types not covered by the current model output, i.e., extending an unknown task range; the other is expanding or splitting existing output task types into finer-grained types, i.e., extending a known task range. In handling both cases, this embodiment applies the idea of transfer learning, migrating the currently learned pattern recognition capability to a wider range of tasks.
To maintain the generalization ability of the model and prevent "negative transfer" (a transfer learning operation that harms learning of the new-domain task) from impairing the existing recognition capability, as shown in Fig. 4, when the unknown task range of the model needs to be extended, a specified output option can be set in the newly built range extension branch (the new branch in Fig. 4); the specified output option may, for example, include an "other" type and may correspond to unlabeled training data. When the known task range needs to be extended, no specified output option needs to be set in the newly built range extension branch, because the output of the range extension branch is actually derived from a certain output of the original branch; the unlabeled data only needs to be concentrated on the specified output option of the original branch, which lets the model preserve its recognition capability for the original type patterns as much as possible.
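A small sketch of building such a range extension branch, appending an "other" output option only for the unknown-range case; the function name, label strings, and initialization scheme are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_range_extension_branch(feat_dim, new_classes, unknown_range):
    """Build a small output branch for the extended task range. For an
    unknown-range extension an extra 'other' output option is appended, so
    unlabeled data has a place to concentrate. Illustrative only, not the
    patent's exact construction."""
    labels = list(new_classes) + (["other"] if unknown_range else [])
    return {
        "labels": labels,
        # one randomly initialized weight row per output option
        "w": rng.normal(0.0, 0.01, size=(len(labels), feat_dim)),
    }

branch = make_range_extension_branch(8, ["cat_subtype_a", "cat_subtype_b"],
                                     unknown_range=True)
```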
Step 360, initializing the weights of the range extension branch.
In this step, the weights of the newly created range extension branch may be set to initial values.
Step 370, taking the initialized weights of the range extension branch and the weights of the high-level processing layer as the target weights.
Step 380, fixing the weights of the low-level processing layer and of the middle-level processing layer respectively.
In one implementation, when the target model is trained for the first time, the low-level and middle-level processing layers of the target model may first be trained on a large-scale general dataset, after which their weights are fixed. In each adjustment, only the weights of the newly added range extension branch and of the high-level processing layer are updated, so that the main feature extraction part retains strong, generalizable feature extraction capability while the training overhead of each adjustment is reduced.
Step 390, obtaining target training data corresponding to the target weight.
For example, in this embodiment, the target training data may include training data before model expansion and training data after model expansion, and the training data after model expansion mainly refers to training data corresponding to the range expansion branch.
Step 3110, adjusting the target weight by using the target training data.
In one implementation, when the target weight is adjusted using the target training data, the loss function used may be one based on domain matching, where domain matching refers to a self-supervised learning scheme for unlabeled data that generally infers labels for unknown data from a portion of the existing data. The domain-matching-based loss function may, for example, include a loss function based on Laplace estimation.
During actual training, the gradients computed by the existing output branches and by the range extension branch act together on the weight parameters to be updated. This continues to strengthen the generalization capability of the feature extraction part while adjusting. At the same time, extending the model as a small branch limits the growth in parameter count as much as possible, preventing the model from becoming too large later, and gives better scalability to services with growth characteristics.
In one embodiment, the existing capability data further includes a specified output option for the existing output branch; step 3110 may further include the steps of:
acquiring first training data corresponding to the specified output option of an existing output branch;
judging whether the range extension branch contains a specified output option;
if so, acquiring second training data for the specified output option of the range extension branch, and adjusting the weights of the range extension branch and of the existing output branch in a self-supervised manner based on the first training data and the second training data;
if not, adjusting the weights of the range extension branch and of the existing output branch in a self-supervised manner based on the first training data.
Specifically, during model adjustment, if the range extension branch is determined to contain the specified output option, the current range extension is an extension of an unknown range. The second training data for the specified output option of the range extension branch may then be obtained and combined with the first training data corresponding to the specified output option of the existing output branch to serve as the training dataset. The training dataset is then learned in a self-supervised manner to adjust the weights of the current range extension branch and of the existing output branch.
It should be noted that for the specified output options of the existing output branches (such as the "other" type), back-propagation covers only the last decision layer to fine-tune the parameters; these specified output options exist to maintain the generalization capability of the model and to prevent "negative transfer" from impairing the existing recognition capability.
On the other hand, if the range extension branch is determined not to contain the specified output option, the current range extension is an extension of a known range. When adjusting the weights of the current range extension branch, the original task type corresponding to the range extension branch can be masked, and training focuses mainly on the weights of the current range extension branch. During training, the weights of the range extension branch and of the existing output branch can be adjusted in a self-supervised manner based on the first training data.
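The branching just described can be sketched as a small decision helper; the function and field names are assumptions, and the returned plan merely records which data would be learned in a self-supervised manner:

```python
def build_training_plan(range_branch_has_other, first_data, second_data=None):
    """Choose the self-supervised training data for a range extension.
    Unknown-range extension (branch has the 'other' option) combines the
    existing branch's 'other'-option data with the new branch's data;
    known-range extension reuses only the existing branch's data."""
    if range_branch_has_other:
        # extending an unknown task range: merge first and second training data
        return {"mode": "self_supervised",
                "data": list(first_data) + list(second_data or [])}
    # extending a known task range: mask the original task type, reuse first_data
    return {"mode": "self_supervised", "data": list(first_data)}

plan = build_training_plan(True, ["x1"], ["x2"])
```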
In this embodiment, to meet the requirement of extending the task range within the same task, the idea of transfer learning is applied to migrate the currently learned pattern recognition capability to a wider range of tasks. After the corresponding range extension branch is generated from the model extension information, adjustment of the target model focuses on the weights of the range extension branch and of the high-level processing layer. Extending in the form of a small branch limits the growth in parameter count as much as possible, prevents the model from becoming too large later, and provides better extension capability for services with growth characteristics.
In addition, this embodiment makes corresponding designs for extending unknown and known task ranges respectively, and through self-supervised learning over data with uncertain labels, leaves room for further extending the task range later.
Embodiment Four
Fig. 5 is a flowchart of an embodiment of a method for model adjustment provided in the fourth embodiment of the present application, where this embodiment describes a model adjustment process based on a task function adjustment type on the basis of the first embodiment, and as shown in fig. 5, this embodiment may include the following steps:
step 510, determining a target adjustment type of the target model to be adjusted.
In one embodiment, step 510 may further include the steps of:
acquiring model requirement information input by a user for the target model; obtaining existing model information of the target model; comparing the model requirement information with the existing model information to determine model extension information; acquiring the adjustment logic of one or more preset adjustment types; and matching the model extension information against the adjustment logic of the one or more preset adjustment types, taking the matched adjustment type as the target adjustment type.
Step 520, obtaining the weight of each existing functional branch of the target model.
Step 530, using the weight of each existing functional branch as the existing capability data.
And 540, generating a corresponding function extension branch aiming at the model extension information.
Step 550, initializing the weight of the function extension branch.
Step 560, constructing a mapping connection layer between the function extension branch and each existing functional branch, where the mapping connection layer is configured to map the feature information of the existing functional branches and determine shared weights from that feature information, the shared weights also serving as existing capability data.
Step 570, taking the initialized weights of the function extension branch and the weights of the mapping connection layer as the target weights.
Step 580, based on the existing capability data and the target training data, making adjustments to the target model, the adjustments including adjustments to the target weights.
Similar to the task range adjustment type in the embodiment of Fig. 3, the task function adjustment type also generates a corresponding extension branch for the model extension information; this embodiment calls it a function extension branch, and it carries the new task function. The newly built small function extension branch is connected to the output corresponding to the extended task function, and the function extension branch is trained with a loss function associated with that task function.
In practice, the output of the new task function may differ from that of the original task function; for example, a segmentation task outputs a complete heat map, while a classification function outputs only confidence sequences for multiple classes. The function extension branches of this embodiment are therefore more complex and diverse in structure than the range extension branches of Fig. 3.
When the target model is adjusted, the weights of the function extension branch are initialized and trained in a targeted way. Meanwhile, the weights of part of the feature extraction layers are shared with the old task function, mainly because this learning mechanism targets changes in task function while the data distribution stays basically consistent or fluctuates only slightly, so the feature extraction capability of the original task function also greatly helps the performance of the new task function.
Specifically, as shown in Fig. 6, when a new task function is extended, a mapping connection layer (the connection layer in Fig. 6) is constructed between the function extension branch and each existing functional branch. This is mainly because decisions of the function extension branch need to draw on the feature extraction capability of the existing functional branches, yet they are strongly influenced by the output of the new task function, so the feature information of the existing functional branches must be mapped before the new function can use it reasonably. Meanwhile, since it is unclear which feature information of the existing functional branches benefits the new function's decisions most, the feature extraction layers of all existing functional branches are connected to the mapping connection layer, which judges and selects suitable shared feature information for use. During training, the mapping connection layer is trained together with the new function extension branch, and the gradient back-propagation of the loss function also covers both the mapping connection layer and the new function extension branch.
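A minimal numeric sketch of a mapping connection layer: learned gates decide how much each existing functional branch's features contribute, and a projection maps the blend for the new branch. The gate logits and projection matrix would be trained jointly with the function extension branch; all shapes and names here are assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mapping_connection_layer(branch_features, gate_logits, proj):
    """Blend the feature vectors of the existing functional branches with
    learned gates (one per branch), then project the blend into the space
    used by the new function extension branch. Illustrative sketch only."""
    feats = np.stack([np.asarray(f, dtype=float) for f in branch_features])
    gates = softmax(np.asarray(gate_logits, dtype=float))  # select shared features
    shared = gates @ feats                                  # weighted blend
    return proj @ shared                                    # mapped for new branch

# Two existing branches contribute equally under uniform gate logits.
out = mapping_connection_layer(
    branch_features=[[1.0, 0.0], [0.0, 1.0]],
    gate_logits=[0.0, 0.0],
    proj=np.eye(2),
)
```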
In this embodiment, the task function is extended by combining the mapping connection layer with the new functional branch, so the feature extraction capability of the old task function is exploited while a suitable mapping is provided for the decisions of the new task function.
In addition, this embodiment integrates the ideas of continual learning and transfer learning, which avoids catastrophic forgetting of previously learned abilities while spreading capability from a single functional domain to multiple functional domains, so the system can keep up with changes in business requirements and data distribution.
Embodiment Five
Fig. 7 is a block diagram of a structure of an embodiment of a model adjustment apparatus according to a fifth embodiment of the present application, which may include the following modules:
an adjustment type determining module 710, configured to determine a target adjustment type of a target model to be adjusted;
an existing capability data determining module 720, configured to determine existing capability data of the target model according to the target adjustment type;
a target weight determining module 730, configured to determine, according to the target adjustment type, a target weight that needs to be adjusted in the target model;
a target training data obtaining module 740, configured to obtain target training data corresponding to the target weight;
a model adjustment module 750 configured to adjust the target model based on the existing capability data and the target training data, wherein the adjustment includes an adjustment of the target weight.
In an embodiment, the adjustment type determining module 710 is specifically configured to:
acquiring model requirement information input by a user for the target model;
obtaining existing model information of the target model;
comparing the model requirement information with the existing model information to determine model extension information;
acquiring the adjustment logic of one or more preset adjustment types;
and matching the model extension information against the adjustment logic of the one or more preset adjustment types, taking the matched adjustment type as the target adjustment type.
In an embodiment, the existing capability data determining module 720 is specifically configured to:
obtaining a pattern library of the target model, wherein the pattern library comprises a plurality of pattern items;
acquiring the weight of each processing layer of the target model;
and taking the pattern library and the weights of the processing layers as the existing capability data.
In one embodiment, the processing layers include fixed processing layers and non-fixed processing layers; the target weight determination module 730 is specifically configured to:
initializing the weight of the non-fixed processing layer;
and taking the initialized weight as a target weight needing to be adjusted in the target model.
In one embodiment, the model adjustment module 750 may further include the following sub-modules:
an old mode feature information determining sub-module, configured to determine old mode feature information of the target model based on the mode entries in the mode library;
a new feature information acquisition sub-module, configured to input the target training data into the target model, and perform feature extraction on the target training data in the target model by using the fixed processing layer to obtain new feature information;
and the model adjusting submodule is used for adjusting the target model by adopting the old mode characteristic information and the new characteristic information.
In an embodiment, the old mode feature information determining submodule is specifically configured to:
generating one or more corresponding perturbation vectors for each pattern entry using a random vector generator, where the random vector generator simulates a multi-mode normal distribution;
and with the pattern entry as the center, perturbing the pattern entry with the one or more perturbation vectors to obtain the old pattern feature information corresponding to the pattern entry.
In one embodiment, the model adaptation submodule is specifically configured to:
updating the target weight by adopting the old mode characteristic information;
and updating the weights of all processing layers by adopting the new characteristic information.
In one embodiment, the apparatus may further include the following modules:
and a loss function expansion module, configured to augment the loss function with an L2 regularization term when the target model is adjusted, where the L2 regularization term combines Laplace estimation and Fisher coding.
In one embodiment, the apparatus may further include the following modules:
a pattern adding module, configured to, after the target model is adjusted, extract feature information from newly generated training data using the adjusted target model; cluster all the obtained feature information to obtain cluster centers; and add the cluster centers to the pattern library as pattern entries.
In another embodiment, the target adjustment type includes a task range adjustment type, and the existing capability data determining module 720 is specifically configured to:
acquiring the weight of each processing layer of the target model;
acquiring the weight of each existing output branch of the target model;
and taking the weights of the processing layers and the weights of the existing output branches as the existing capability data.
In one embodiment, the processing layers include a high-level processing layer; the target weight determination module 730 is specifically configured to:
generating a corresponding range extension branch for the model extension information;
initializing weights of the range extension branches;
and taking the initialized weights of the range extension branch and the weights of the high-level processing layer as the target weights.
In one embodiment, the processing layers further include a low-level processing layer and a middle-level processing layer; the model adjustment module 750 is specifically configured to:
fixing the weights of the low-level processing layer and of the middle-level processing layer respectively;
and adjusting the target weight by using the target training data.
In one embodiment, the existing capability data further includes a specified output option for an existing output branch;
the model adjustment module 750 is specifically configured to:
acquiring first training data corresponding to a specified output option of an existing output branch;
judging whether the range expansion branch contains a specified output option or not;
if so, acquiring second training data for the specified output option of the range extension branch, and adjusting the weights of the range extension branch and of the existing output branch in a self-supervised manner based on the first training data and the second training data;
if not, adjusting the weights of the range extension branch and of the existing output branch in a self-supervised manner based on the first training data.
In another embodiment, the existing capability data determining module 720 is specifically configured to:
acquiring the weight of each existing functional branch of the target model;
and taking the weight of each existing functional branch as the existing capability data.
In an embodiment, the target weight determining module 730 is specifically configured to:
generating a corresponding function extension branch aiming at the model extension information;
initializing the weight of the function extension branch;
constructing a mapping connection layer between the function extension branch and each existing functional branch, wherein the mapping connection layer is configured to map the feature information of the existing functional branches and determine shared weights from that feature information, the shared weights also serving as existing capability data;
and taking the initialized weights of the function extension branch and the weights of the mapping connection layer as the target weights.
The model adjusting device provided by the embodiment of the present application can execute the method for adjusting a model according to any one of the first to fourth embodiments of the present application, and has functional modules and beneficial effects corresponding to the execution method.
Embodiment Six
Fig. 8 is a flowchart of an embodiment of a service processing method according to a sixth embodiment of the present application, where the method includes the following steps:
step 810, acquiring service data to be processed;
step 820, loading the target model obtained according to any one of the first embodiment to the fourth embodiment;
Step 830, processing the service data by using the target model.
In this embodiment, the target model used in service processing is a model adjusted in real time: new functions are extended to match changing business requirements and data distributions while preserving the original model capability as much as possible. Because each adjustment is completed by reusing the existing model with the aid of a specific learning mechanism, and joint iterative optimization is performed periodically with the data labels fed back by the model in real time, the model can be adjusted stably at relatively low cost to match a rapidly shifting business focus, adapting quickly to change without a large drop in effectiveness.
EXAMPLE seven
Fig. 9 is a schematic structural diagram of a service processing apparatus according to a seventh embodiment of the present application, where the service processing apparatus may include the following modules:
a service data obtaining module 910, configured to obtain service data to be processed;
an object model loading module 920, configured to load an object model obtained according to any one of the first to fourth embodiments;
a business processing module 930, configured to process the business data by using the target model.
The service processing device provided by this embodiment of the present application can execute the service processing method according to the sixth embodiment of the present application, and has the functional modules and beneficial effects corresponding to the executed method.
Example eight
Fig. 10 is a schematic structural diagram of a service processing apparatus according to an eighth embodiment of the present application, as shown in fig. 10, the service processing apparatus includes a processor 1010, a memory 1020, an input device 1030, and an output device 1040; the number of the processors 1010 in the service processing device may be one or more, and one processor 1010 is taken as an example in fig. 10; the processor 1010, the memory 1020, the input device 1030, and the output device 1040 in the service processing apparatus may be connected by a bus or other means, and fig. 10 illustrates an example of connection by a bus.
The memory 1020, as a computer-readable storage medium, is used for storing software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the method of any one of the first to fourth embodiments or the sixth embodiment of the present application. The processor 1010 runs the software programs, instructions, and modules stored in the memory 1020, so as to execute various functional applications and data processing of the service processing device, that is, to implement the method of any one of the first to fourth embodiments or the sixth embodiment.
The memory 1020 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the terminal, and the like. Further, the memory 1020 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the memory 1020 may further include memory located remotely from the processor 1010, and the remote memory may be connected to the device/terminal/server over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 1030 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the service processing apparatus. Output device 1040 may include a display device such as a display screen.
Example nine
A ninth embodiment of the present application further provides a storage medium containing computer-executable instructions, which, when executed by a computer processor, are configured to perform the method of any one of the first to fourth embodiments or the sixth embodiment.
Of course, the storage medium provided in the embodiments of the present application contains computer-executable instructions, and the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the method provided in any embodiments of the present application.
EXAMPLE ten
A tenth embodiment of the present application further provides a computer program product, which includes computer-executable instructions, which, when executed by a computer processor, are configured to perform the method of any one of the first to fourth embodiments or the sixth embodiment.
Of course, the computer program product provided in the embodiments of the present application has computer-executable instructions that are not limited to the method operations described above, and may also perform related operations in the method provided in any embodiments of the present application.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by hardware, although the former is a preferred implementation in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk of a computer, and includes several instructions for enabling a service processing device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present application.
It should be noted that, in the embodiment of the apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the application.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present application and the technical principles employed. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the appended claims.

Claims (21)

1. A method of model adjustment, the method comprising:
determining a target adjustment type of a target model to be adjusted;
determining existing capability data of the target model according to the target adjustment type;
determining the target weight needing to be adjusted in the target model according to the target adjustment type;
acquiring target training data corresponding to the target weight;
adjusting the target model based on the existing capability data and the target training data, the adjusting including adjusting the target weights.
2. The method of claim 1, wherein determining a target adjustment type for the target model to be adjusted comprises:
acquiring model demand information input by a user aiming at the target model;
obtaining existing model information of the target model;
comparing the model requirement information with the existing model information to determine model extension information;
acquiring one or more preset adjustment types of adjustment logic;
and matching the model extension information with the adjustment logic of the one or more preset adjustment types, and taking the matched adjustment type as the target adjustment type.
3. The method of claim 1 or 2, wherein the target adjustment type comprises a pattern adjustment type, and wherein determining existing capability data of the target model according to the target adjustment type comprises:
obtaining a pattern library of the target model, wherein the pattern library comprises a plurality of pattern entries;
acquiring the weight of each processing layer of the target model;
and taking the pattern library and the weights of the processing layers as the existing capability data.
4. The method of claim 3, wherein the process layers comprise a fixed process layer and a non-fixed process layer;
determining the target weight needing to be adjusted in the target model according to the target adjustment type comprises the following steps:
initializing the weight of the non-fixed processing layer;
and taking the initialized weight as a target weight needing to be adjusted in the target model.
5. The method of claim 4, wherein the adjusting the target model based on the existing capability data and the target training data comprises:
determining old pattern feature information of the target model based on pattern entries in the pattern library;
inputting the target training data into the target model, and performing feature extraction on the target training data in the target model by adopting the fixed processing layer to obtain new feature information;
and adjusting the target model by using the old pattern feature information and the new feature information.
6. The method of claim 5, wherein determining old pattern feature information for the target model based on pattern entries in the pattern library comprises:
generating one or more corresponding perturbation vectors for each pattern entry by using a random vector generator, wherein the random vector generator simulates a multi-modal normal distribution;
and with the pattern entry as a center, perturbing the pattern entry with the one or more perturbation vectors respectively to obtain the old pattern feature information corresponding to the pattern entry.
7. The method of claim 5, wherein the adjusting the target model by using the old pattern feature information and the new feature information comprises:
updating the target weight by using the old pattern feature information;
and updating the weights of all processing layers by using the new feature information.
8. The method of claim 5, further comprising:
when the target model is adjusted, extending a loss function with an L2 regular term, wherein the L2 regular term combines Laplace estimation and Fisher coding.
9. The method of claim 3, further comprising:
after the target model is adjusted, extracting characteristic information from newly generated training data by adopting the adjusted target model;
clustering all the obtained characteristic information to obtain a clustering center;
adding the cluster center as a pattern entry into the pattern library.
10. The method of claim 2, wherein the target adjustment type comprises a task scope adjustment type, and wherein determining existing capability data for the target model based on the target adjustment type comprises:
acquiring the weight of each processing layer of the target model;
acquiring the weight of each existing output branch of the target model;
and taking the weight of each processing layer and the weight of each existing output branch as the existing capability data.
11. The method of claim 10, wherein the processing layers comprise a high-level processing layer;
determining the target weight needing to be adjusted in the target model according to the target adjustment type comprises the following steps:
generating a corresponding range extension branch for the model extension information;
initializing weights of the range extension branches;
and taking the initialized weight of the range extension branch and the weight of the high-level processing layer as the target weight.
12. The method of claim 11, wherein the processing layers further comprise a low-level processing layer and a middle-level processing layer;
adjusting the target model based on the existing capability data and the target training data includes:
fixing the weight of the low-level processing layer and the weight of the middle-level processing layer respectively;
and adjusting the target weight by adopting the target training data.
13. The method of claim 11, wherein the existing capability data further comprises a specified output option for an existing output branch;
the adjusting the target weight by using the target training data includes:
acquiring first training data corresponding to a designated output option of an existing output branch;
judging whether the range extension branch contains the specified output option;
if so, acquiring second training data of the specified output option of the range extension branch, and adjusting the weight of the range extension branch and the weight of the existing output branch in a self-supervised manner based on the first training data and the second training data;
if not, adjusting the weight of the range extension branch and the weight of the existing output branch in a self-supervised manner based on the first training data.
14. The method of claim 2, wherein the target adjustment type comprises a task function adjustment type, and wherein determining existing capability data for the target model based on the target adjustment type comprises:
acquiring the weight of each existing functional branch of the target model;
and taking the weight of each existing functional branch as the existing capability data.
15. The method of claim 14, wherein determining the target weights to be adjusted in the target model according to the target adjustment type comprises:
generating a corresponding function extension branch aiming at the model extension information;
initializing the weight of the function extension branch;
constructing a mapping connection layer between the function extension branch and each existing functional branch, wherein the mapping connection layer is used for mapping the feature information of the existing functional branches and determining a sharing weight from the feature information of the existing functional branches, and the sharing weight serves as the existing capability data;
and taking the initialized weight of the function extension branch and the weight of the mapping connection layer as the target weight.
16. A method for processing a service, the method comprising:
acquiring service data to be processed;
loading a target model obtained according to the method of any one of claims 1-15;
and processing the service data by adopting the target model.
17. An apparatus for model adjustment, the apparatus comprising:
the adjustment type determining module is used for determining a target adjustment type of a target model to be adjusted;
an existing capability data determining module for determining existing capability data of the target model according to the target adjustment type;
the target weight determining module is used for determining the target weight needing to be adjusted in the target model according to the target adjustment type;
a target training data acquisition module for acquiring target training data corresponding to the target weight;
a model adjustment module configured to adjust the target model based on the existing capability data and the target training data, where the adjustment includes an adjustment of the target weight.
18. A traffic processing apparatus, characterized in that the apparatus comprises:
the service data acquisition module is used for acquiring service data to be processed;
an object model loading module for loading an object model obtained according to the method of any one of claims 1-15;
and the business processing module is used for processing the business data by adopting the target model.
19. A service processing device, characterized in that the service processing device comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-16.
20. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 16.
21. A computer program product comprising computer-executable instructions for implementing the method of any one of claims 1-16 when executed.
CN202210149138.2A 2022-02-18 2022-02-18 Method, device, equipment and storage medium for model adjustment and business processing Pending CN114547349A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210149138.2A CN114547349A (en) 2022-02-18 2022-02-18 Method, device, equipment and storage medium for model adjustment and business processing
PCT/CN2023/076011 WO2023155783A1 (en) 2022-02-18 2023-02-14 Model adjustment method and apparatus, service processing method and apparatus, and device and storage medium

Publications (1)

Publication Number Publication Date
CN114547349A true CN114547349A (en) 2022-05-27

Family

ID=81675391

Country Status (2)

Country Link
CN (1) CN114547349A (en)
WO (1) WO2023155783A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023155783A1 (en) * 2022-02-18 2023-08-24 百果园技术(新加坡)有限公司 Model adjustment method and apparatus, service processing method and apparatus, and device and storage medium

Also Published As

Publication number Publication date
WO2023155783A1 (en) 2023-08-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination