CN116797464A - Computing method, computing device, computer apparatus, and storage medium - Google Patents


Info

Publication number
CN116797464A
Authority
CN
China
Prior art keywords
gradient
norm
target
network model
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210220369.8A
Other languages
Chinese (zh)
Inventor
Name withheld at the applicant's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambrian Kunshan Information Technology Co ltd
Original Assignee
Cambrian Kunshan Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambrian Kunshan Information Technology Co ltd filed Critical Cambrian Kunshan Information Technology Co ltd
Priority to CN202210220369.8A
Publication of CN116797464A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The present disclosure relates to a computing method, a computing device, a computer apparatus, and a storage medium. The method comprises the following steps: restoring a sample image based on a generated network model to generate a first output image; calculating a first gradient from the first output image and a first loss function; calculating a second gradient according to a first discrimination result of a discrimination network model on the first output image; calculating a second gradient bias according to a second discrimination result of the discrimination network model on the label image corresponding to the sample image and a second loss function; taking the difference between the second gradient and the second gradient bias as a third gradient; calculating a target gradient from the first gradient and the third gradient; and updating parameters of the generated network model according to the target gradient to complete model training of the generated network model. The method can improve the model training speed, balance the conflict between the optimization directions of the two loss functions during training, and improve the precision and accuracy of the trained model.

Description

Computing method, computing device, computer apparatus, and storage medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to a computing method, an apparatus, a computer device, and a storage medium.
Background
With the development of neural network technology, neural networks have been widely applied in image reconstruction and restoration. To obtain a better generator capable of restoring an input image and outputting a more realistic image, the required generator is obtained by training the generator and a discriminator synchronously. In the related art, the generator is trained using a reconstruction loss function and an adversarial loss function (adversarial loss), but the output image of the trained generator suffers from problems such as edge deformation, wrong texture, and color cast.
Disclosure of Invention
Based on this, it is necessary to provide a computing method, an apparatus, a computer device, and a storage medium in order to address the above technical problems.
According to an aspect of the present disclosure, there is provided a computing method, the method comprising:
restoring the sample image based on the generated network model to generate a first output image;
calculating a first gradient from the first output image and a first loss function;
calculating a second gradient according to a first discrimination result of the discrimination network model on the first output image;
Calculating a second gradient bias according to a second discrimination result of the discrimination network model on the label image corresponding to the sample image and a second loss function;
taking the difference value between the second gradient and the second gradient bias as a third gradient;
calculating a target gradient according to the first gradient and the third gradient;
and updating parameters of the generated network model according to the target gradient to complete model training of the generated network model.
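The seven steps above can be sketched as follows (a toy NumPy illustration with hypothetical stand-ins: a one-parameter linear generator, MSE as the first loss function, and tanh as a stand-in for the adversarial gradient; this is not the disclosed implementation, only the gradient bookkeeping):

```python
import numpy as np

def adv_grad(x):
    # Stand-in for d(adversarial loss)/d(image); any smooth function
    # of the image works for illustrating the bookkeeping below.
    return np.tanh(x)

def training_step(w, sample, label, lr=0.1):
    """One hypothetical generator update following the seven steps.

    The 'generator' is a toy linear model out = w * sample and the
    first loss is MSE; the point is the gradient bookkeeping, not
    the models themselves."""
    out = w * sample                     # restore sample -> first output image
    g1 = 2.0 * (out - label) * sample    # first gradient (MSE w.r.t. w)
    g2 = adv_grad(out) * sample          # second gradient (from D(out))
    bias = adv_grad(label) * sample      # second gradient bias (from D(label))
    g3 = g2 - bias                       # third gradient = difference
    target = g1 + g3                     # target gradient (simple sum)
    return w - lr * target               # parameter update
```

When w * sample already equals the label, both g1 and g3 are exactly zero, so the update leaves w unchanged; this is the stopping property the method is after.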
In one possible implementation, calculating the target gradient from the first gradient and the third gradient includes:
adjusting the first gradient and/or the third gradient according to the first p-norm of the first gradient and the third p-norm of the third gradient so that the adjusted first gradient is consistent with the p-norm of the third gradient;
and taking the sum of the first gradient and the third gradient with consistent p-norms as the target gradient.
In one possible implementation, adjusting the first gradient and/or the third gradient according to the first p-norm of the first gradient and the third p-norm of the third gradient so that the adjusted first gradient is consistent with the p-norm of the third gradient includes:
Determining a target p-norm according to the first p-norm and the third p-norm, wherein the target p-norm is larger than or equal to the minimum value of the first p-norm and the third p-norm and smaller than or equal to the maximum value of the first p-norm and the third p-norm;
and adjusting the gradient, of which the p-norms are inconsistent with the target p-norms, in the third gradient and the first gradient so that the p-norms of the adjusted first gradient and the third gradient are consistent.
In one possible implementation, calculating the target gradient from the first gradient and the third gradient includes:
before a target gradient is calculated according to the first gradient and the third gradient, the first gradient and the third gradient are adjusted according to a gradient adjustment model.
In one possible implementation, the method includes:
after updating the parameters of the generated network model according to the target gradient, restoring the sample image based on the updated generated network model to generate a second output image;
calculating a first loss value according to the second output image and the first loss function;
Calculating a second loss value for updating according to a discrimination result of the discrimination network model on the second output image and a second loss function;
calculating a third loss value of the gradient adjustment model according to the first loss value for updating and the second loss value for updating;
and updating the gradient adjustment model according to the third loss value to complete model training of the gradient adjustment model.
In one possible implementation, the first loss function comprises a reconstruction loss function and the second loss function comprises an adversarial loss function.
According to another aspect of the present disclosure, there is provided a computing device, the device comprising:
the first image acquisition module is used for carrying out recovery processing on the sample image based on the generated network model to generate a first output image;
the first gradient acquisition module is used for calculating a first gradient according to the first output image and a first loss function;
the second gradient acquisition module is used for calculating a second gradient according to a first discrimination result of the discrimination network model on the first output image;
the bias acquisition module is used for calculating a second gradient bias according to a second discrimination result of the discrimination network model on the label image corresponding to the sample image and a second loss function;
A third gradient acquisition module, configured to take a difference value between the second gradient and the second gradient bias as a third gradient;
the target gradient acquisition module is used for calculating a target gradient according to the first gradient and the third gradient;
and the first updating module is used for updating the parameters of the generated network model according to the target gradient so as to complete model training of the generated network model.
In one possible implementation manner, the target gradient acquisition module includes:
a first gradient adjustment sub-module, configured to adjust the first gradient and/or the third gradient according to a first p-norm of the first gradient and a third p-norm of the third gradient, so that the adjusted first gradient is consistent with the p-norm of the third gradient;
and the gradient calculation sub-module is used for taking the sum of the first gradient and the third gradient with the consistent p-norm as the target gradient.
In one possible implementation, the first gradient adjustment sub-module includes:
a target-norm determination submodule, configured to determine a target p-norm according to the first p-norm and the third p-norm, where the target p-norm is greater than or equal to a minimum value of the first p-norm and the third p-norm and less than or equal to a maximum value of the first p-norm and the third p-norm;
And the adjustment submodule is used for adjusting the gradients of which the p-norms are inconsistent with the target p-norms in the third gradient and the first gradient so that the p-norms of the adjusted first gradient and the third gradient are consistent.
In one possible implementation manner, the target gradient acquisition module includes:
and the second gradient adjustment submodule is used for adjusting the first gradient and the third gradient according to a gradient adjustment model before calculating the target gradient according to the first gradient and the third gradient.
In one possible implementation, the apparatus further includes:
the second image acquisition module is used for updating the parameters of the generated network model according to the target gradient, and then carrying out recovery processing on the sample image based on the updated generated network model to generate a second output image;
the first calculation module is used for calculating a first loss value according to the second output image and the first loss function;
the second calculation module is used for calculating a second loss value for updating according to a discrimination result of the discrimination network model on the second output image and a second loss function;
a third calculation module, configured to calculate a third loss value of the gradient adjustment model according to the first loss value for updating and the second loss value for updating;
and a second updating module, configured to update the gradient adjustment model according to the third loss value to complete model training of the gradient adjustment model.
In one possible implementation, the first loss function comprises a reconstruction loss function and the second loss function comprises an adversarial loss function.
According to another aspect of the present disclosure, there is provided an artificial intelligence chip comprising the above-described computing device.
According to another aspect of the present disclosure, there is provided an electronic device including the above artificial intelligence chip.
According to another aspect of the present disclosure, there is provided a board including: a memory device, an interface device, and a control device, and an artificial intelligence chip as described above;
wherein the artificial intelligence chip is connected with the storage device, the control device, and the interface device, respectively;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the artificial intelligence chip and an external device;
the control device is used for monitoring the state of the artificial intelligence chip;
wherein the memory device includes: a plurality of groups of storage units, each group of storage units being connected with the artificial intelligence chip through a bus, the storage units being DDR SDRAM;
the chip includes: a DDR controller, configured to control data transmission to and data storage of each storage unit;
the interface device is a standard PCIE interface.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
The calculation method and the calculation device provided by the embodiment of the disclosure can improve the model training speed, balance the conflict of the optimization directions of the two loss functions in the training process, and improve the precision and the accuracy of the model obtained by training, so that the output image can be closer to a real image without edge deformation, error texture or color cast.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of a computing method according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of gradient adjustment by implementation one in an embodiment of the present disclosure.
FIG. 3 illustrates a schematic diagram of a process for adjusting gradients by a gradient adjustment model and updating the gradient adjustment model in an embodiment of the disclosure.
Fig. 4 is a block diagram illustrating a combination processing apparatus 1200 according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram illustrating the structure of a board 1300 according to an embodiment of the disclosure.
Detailed Description
The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are some embodiments of the present disclosure, but not all embodiments. Based on the embodiments in this disclosure, all other embodiments that a person skilled in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, specification, and drawings of this disclosure are used for distinguishing between different objects and not for describing a particular sequential order. The terms "comprises" and "comprising" when used in the specification and claims of this disclosure are taken to specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present disclosure is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in this disclosure and in the claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the present disclosure and claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the claims, the term "if" may be interpreted as "when", "once", "in response to a determination", or "in response to detection", depending on the context. Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting the [described condition or event]", or "in response to detecting the [described condition or event]".
In the related art, during the synchronous optimization training of the generator and the discriminator, updating is performed using a reconstruction loss function and an adversarial loss function. Because the optimization directions of the two loss functions are not completely consistent, and may even conflict strongly, the related art typically sets proportional coefficients for the two loss functions empirically to mitigate the conflict. When the coefficient of the reconstruction loss function is increased, the generator's output image contains more blurring and fewer errors; when the coefficient of the adversarial loss function is increased, the output image contains less blurring and more errors. Since the weight of each loss function is chosen empirically, both aspects cannot be well balanced. Moreover, the adversarial loss comes from a continually optimized discriminator, so its value range itself may change, and a fixed proportional coefficient may not effectively balance the weights of two loss functions whose value ranges change dynamically when determining the optimization direction.
In order to solve the technical problems, the present disclosure provides a computing method and a computing device, which can improve the model training speed, balance the conflict of the optimization directions of two loss functions in the training process, and improve the precision and accuracy of the model obtained by training, so that the output image can be closer to a real image without edge deformation, error texture or color cast. Fig. 1 shows a flowchart of a computing method according to an embodiment of the present disclosure. As shown in fig. 1, the method may be applied to a processor, and the method includes steps S11 to S17.
In step S11, a restoration process is performed on the sample image based on the generated network model, and a first output image is generated.
The generated network model may be a model created for the image reconstruction task, and may be a generator. After an image is input into the generated network model, the model outputs a network output image (the first output image, second output image, and so on described herein). To train a generated network model meeting the requirements, training is completed with the aid of a discrimination network model. The discrimination network model may be a discriminator, configured to perform class-label probability discrimination on the image output by the generated network model; the output discrimination result includes the probabilities (also referred to herein as class probabilities) that the input image belongs to different class labels, where the class labels include: true (True), which may be represented by 1 and indicates that the image input to the discrimination network is a genuinely captured image; and false (False), which may be represented by 0 and indicates that the image input to the discrimination network is not a genuinely captured image. Before model training, a plurality of sample images and a label image (ground truth image) corresponding to each sample image may be acquired. The class label of the label image is "true"; the class label of the first output image should be "false".
In step S12, a first gradient is calculated from the first output image and a first loss function.
In this embodiment, the first loss function may be a reconstruction loss function, which may include a mean square error loss function (MSELoss), a 1-norm loss function (L1 Loss), and so on. After the first output image output by the generated network model is obtained, a first loss value corresponding to the first output image may be calculated according to the first loss function, and a first gradient may then be calculated from the first loss value.
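As a sketch of this step (NumPy, hypothetical image shapes; MSE chosen for illustration), the first loss value and first gradient could be computed as:

```python
import numpy as np

def first_gradient(out, gt):
    """First loss value and its gradient w.r.t. the output image,
    using MSE; an L1 loss would use sign(out - gt) / out.size instead."""
    loss = np.mean((out - gt) ** 2)
    grad = 2.0 * (out - gt) / out.size   # d(loss)/d(out)
    return loss, grad
```

The gradient is zero exactly when the output image equals the label image, a property the later discussion of stopping conditions relies on.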
In step S13, a second gradient is calculated based on the first discrimination result of the discrimination network model for the first output image and the second loss function.
In this embodiment, the second loss function may include an adversarial loss function, which is also denoted as GAN loss. The adversarial loss function may include a binary cross-entropy loss function (binary cross entropy loss, BCELoss), and so on. The first output image may be input to the discrimination network model to obtain a first discrimination result of the discrimination network model for the first output image, and a second loss value corresponding to the first output image is then calculated according to the first discrimination result, the second loss function, and the positive and negative class labels (i.e., the class labels "true" and "false" described above). A second gradient is then calculated from the second loss value.
In step S14, the second gradient bias is calculated from the second discrimination result of the discrimination network model for the label image corresponding to the sample image and the second loss function.
The label image corresponding to the first output image may be input to the discrimination network model to obtain a second discrimination result of the discrimination network model for the label image, and then a third loss value corresponding to the label image is calculated according to the second discrimination result, the second loss function and the positive and negative type labels (i.e. the type labels "true" and "false" described above). And further calculating a second gradient bias based on the third loss value.
In step S15, the difference between the second gradient and the second gradient bias is taken as a third gradient.
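To make steps S13 to S15 concrete, here is a toy NumPy sketch in which the "discriminator" is a hypothetical one-weight sigmoid over the image mean (an assumption for illustration, not the disclosed model):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_grad_wrt_image(img, d_weight=1.5, y=1.0):
    """Gradient of BCE(y, D(img)) w.r.t. the input image, where the toy
    'discriminator' is D(img) = sigmoid(d_weight * mean(img))."""
    p = sigmoid(d_weight * np.mean(img))
    dbce_dp = -(y / p) + (1.0 - y) / (1.0 - p)      # d(BCE)/d(p)
    dp_dimg = p * (1.0 - p) * d_weight / img.size   # chain: sigmoid, mean
    return dbce_dp * dp_dimg * np.ones_like(img)

out = np.array([0.4, 0.6])                       # first output image (toy)
gt = np.array([0.4, 0.6])                        # label image, here equal to out
second_gradient = bce_grad_wrt_image(out)        # S13: generally non-zero
second_bias = bce_grad_wrt_image(gt)             # S14: second gradient bias
third_gradient = second_gradient - second_bias   # S15: zero when out == gt
```

Even with out identical to gt, second_gradient is non-zero, yet the difference third_gradient vanishes, which is precisely why the bias is subtracted.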
In step S16, a target gradient is calculated from the first gradient and the third gradient.
In this embodiment, under the condition that the target gradient is zero, it is determined that the first output image output by the generated network model and the corresponding label image may already be consistent (for example, the similarity reaches a specified degree, or they are completely identical), and the update optimization of the generated network model and the discrimination network model may be stopped to end model training. The third gradient is used as a basis for calculating the target gradient in the subsequent step, which ensures that a zero target gradient is generated when the first output image output by the generated network model is consistent with the corresponding label image, so that model training can be stopped smoothly.
In the related art, update optimization of the generated network model and the discrimination network model is performed using the adversarial loss function and the reconstruction loss function; however, regardless of whether the first output image output by the generated network model is consistent with the corresponding label image, a non-zero gradient is produced, and it cannot be determined when model training may be stopped. The problems in the related art are described below taking the case where the adversarial loss function is a binary cross-entropy loss function (binary cross entropy loss, BCELoss). The details are as follows:
BCELoss is: $\ell(y, \hat{y}) = -[y\log\hat{y} + (1-y)\log(1-\hat{y})]$, where $y$ is the discrimination label (True or False) and $\hat{y}$ is the class probability output by the discrimination network model. In the actual training process, $\hat{y}$ almost never reaches 0 or 1; it is a floating point number in the range (0, 1). At this time, the gradient of BCELoss with respect to $\hat{y}$, $\partial\ell/\partial\hat{y}$, is never 0. Moreover, because $\hat{y}$ is obtained from the first output image out through the discrimination network model, it is easy to find that the second gradient with respect to the first output image, $\partial\ell/\partial out = (\partial\ell/\partial\hat{y})(\partial\hat{y}/\partial out)$, is not 0. This means that even if the first output image generated by the generated network model is identical to the label image gt, the second gradient calculated based on the adversarial loss function is non-zero, so the generated network model cannot stay at the ideal state position where it generates a first output image out consistent with the label image gt.
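The claim that the BCELoss gradient never vanishes on (0, 1) can be checked numerically (a small NumPy sketch):

```python
import numpy as np

def dbce_dp(y, p):
    # Derivative of -[y*log(p) + (1-y)*log(1-p)] with respect to p.
    return -(y / p) + (1.0 - y) / (1.0 - p)

# For y = 1 the derivative is -1/p, which cannot vanish on (0, 1):
print(all(abs(dbce_dp(1.0, p)) > 0 for p in np.linspace(0.01, 0.99, 99)))  # True
```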
The reconstruction loss function is implemented by taking the difference between the output and the label and then computing a norm, so when the first output image out is consistent with the corresponding label image gt, the first gradient obtained by the generated network model is 0, and model training can be stopped when the generated network model is in the ideal state.
In the related art, the generated network model determines a total gradient based on the first gradient and the second gradient, and then performs update optimization of the generated network model based on the total gradient. However, the related art determines the total gradient by weighted summation of the first gradient and the second gradient; because the second gradient is always non-zero, the total gradient is always non-zero, so model training cannot be stopped when the generated network model is in the ideal state.
In the embodiment of the present disclosure, by contrast, in order that the generated network model can stay at its ideal position under the reconstruction loss function $l_{rec}$ and the adversarial loss function $l_{adv}$, i.e., at the parameter combination at which the first output image out is consistent with the corresponding label image gt, a second gradient bias is introduced to force the generated network model to receive no non-zero gradient when the first output image out is consistent with the corresponding label image gt.
Then, in the case where $l_{rec}$ and $l_{adv}$ obtain the target gradient by simple summation (i.e., the first gradient and the third gradient are directly summed as the target gradient), with the second gradient bias taken as the partial derivative of $l_{adv}$ with respect to the label image gt, the target gradient acquired by the parameters $w$ of the generated network model is:

$$\nabla_w = \frac{\partial l_{rec}}{\partial out}\frac{\partial out}{\partial w} + f\!\left(\frac{\partial l_{adv}}{\partial out},\ \frac{\partial l_{adv}}{\partial gt}\right)\frac{\partial out}{\partial w}$$

wherein $\frac{\partial l_{rec}}{\partial out}$ is the gradient of the reconstruction loss function $l_{rec}$ calculated for the first output image out, $\frac{\partial out}{\partial w}$ is the gradient of the first output image out with respect to the parameters $w$, $\frac{\partial l_{rec}}{\partial out}\frac{\partial out}{\partial w}$ is the first gradient, $f\!\left(\frac{\partial l_{adv}}{\partial out},\ \frac{\partial l_{adv}}{\partial gt}\right)\frac{\partial out}{\partial w}$ is the third gradient, and $f$ is the gradient calculation function for calculating the third gradient from the second gradient and the second gradient bias.
In one possible implementation, in the case where the gradient calculation function is $f(a, b) = a - b$, when the first output image out coincides with the label image gt, $\frac{\partial l_{adv}}{\partial out} = \frac{\partial l_{adv}}{\partial gt}$, and then the third gradient $f\!\left(\frac{\partial l_{adv}}{\partial out}, \frac{\partial l_{adv}}{\partial gt}\right) = 0$; since the first gradient also vanishes when out coincides with gt, the target gradient $\nabla_w = 0$.
In one possible implementation, the manner of calculating the target gradient from the first gradient and the third gradient may include a first manner and a second manner:
Mode one: the target gradient is $\nabla_w = g_1 + g_3$, where $g_1$ is the first gradient and $g_3$ is the third gradient.
Mode two:
the first gradient and/or the third gradient may be adapted according to a first p-norm of the first gradient and a third p-norm of the third gradient such that the adapted first gradient coincides with the p-norm of the third gradient. And taking the sum of the first gradient and the third gradient with consistent p-norms as the target gradient.
Wherein adjusting the first gradient and/or the third gradient according to the first p-norm of the first gradient and the third p-norm of the third gradient may include any one of the following implementations:
the implementation mode is as follows: the smaller of the first and third p-norms is determined to be the target p-norm, and then one of the first and third gradients, the p-norms of which are not the target p-norms, is adjusted such that, after adjustment, the p-norms of both the first and third gradients are the target p-norms.
Implementation two: the larger of the first p-norm and the third p-norm is determined to be the target p-norm, and the one of the first gradient and the third gradient whose p-norm is not the target p-norm is then adjusted so that, after adjustment, the p-norms of both gradients equal the target p-norm.
Implementation three: a target p-norm is determined from the first p-norm and the third p-norm in a preset manner, and the first gradient and the third gradient are then adjusted so that after adjustment the p-norms of both are the target p-norm. The preset manner may include: determining the average of the first p-norm and the third p-norm as the target p-norm, or obtaining the target p-norm by a weighted summation of the first p-norm and the third p-norm whose result lies in the value range formed by the two, i.e., the target p-norm is less than or equal to the larger of the first p-norm and the third p-norm and greater than or equal to the smaller of the two.
Adjusting the first gradient and/or the third gradient means multiplying the gradient by a scalar coefficient; after multiplication, the p-norms of the two gradients are consistent and both equal the target p-norm.
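Implementations one to three can be sketched together as follows (NumPy; the default p = 2 and the weighting coefficient alpha are illustrative assumptions):

```python
import numpy as np

def equalize_pnorm(g1, g3, mode="min", p=2, alpha=0.5):
    """Rescale g1 and g3 so both have the same target p-norm.

    mode 'min' is implementation one, 'max' is implementation two, and
    'weighted' is one preset of implementation three with target
    alpha * ||g1||_p + (1 - alpha) * ||g3||_p."""
    n1 = np.linalg.norm(g1, ord=p)
    n3 = np.linalg.norm(g3, ord=p)
    if mode == "min":
        target = min(n1, n3)
    elif mode == "max":
        target = max(n1, n3)
    else:
        target = alpha * n1 + (1.0 - alpha) * n3
    # Multiplying by a scalar coefficient leaves each direction unchanged.
    return g1 * (target / n1), g3 * (target / n3)
```

Note that in every mode the target p-norm lies between the two input norms, matching the constraint stated above.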
In this way, through implementations one, two, and three, the p-norms of the first gradient and the third gradient can be adjusted to the target p-norm, which ensures stable running of the model training process and avoids training collapse. For example, Fig. 2 shows a schematic diagram of gradient adjustment by implementation one in an embodiment of the present disclosure. As shown in the "before adjustment" part of Fig. 2, when there is a large angle between the directions of the first gradient and the third gradient before adjustment, and the large gradient G2 (the one of the first and third gradients with the larger value and larger p-norm) is projected onto the small gradient G1 (the one with the smaller value and smaller p-norm), the length of the projection of G2 may exceed the length of G1 itself; this causes the summed target gradient direction to include a component opposite to G1 and a component perpendicular to G1.
In the gradient descent method, the gradient direction is the direction in which the corresponding loss function value changes fastest, while values change slowly in the normal direction. When the "before adjustment" situation shown in Fig. 2 occurs, the component opposite to the small gradient G1, once applied to update the network parameters, causes the loss function corresponding to G1 to increase, and because the loss function is not sensitive enough in the perpendicular direction, the influence of the opposite component is not compensated. In other words, the effect of the loss function corresponding to the small gradient G1 is drowned out by the loss function corresponding to the large gradient G2, and an increase in a loss function represents a larger error under the corresponding evaluation criterion. Here, the fastest gradient descent direction refers to the parameter adjustment direction in which the value of the loss function falls fastest; in network optimization this is generally the direction opposite to the derivative of the loss function with respect to the parameters of the network model. The derivative of the loss function with respect to the parameters is generally denoted the gradient direction, which itself is the direction of steepest ascent of the loss function; during parameter optimization, a finite step in this direction is subtracted from the original parameters, thereby achieving gradient descent. In the example of Fig. 2, once the projection of one gradient exceeds the other gradient itself (G2's projection exceeds G1), the target gradient contains the steepest descent direction of the loss function corresponding to G2, which introduces a steepest ascent direction component, relative to the loss function corresponding to G1, into the network parameter update when the parameters are optimized using the target gradient.
The large gradient G2 may be adjusted as in implementation two described above to obtain the adjusted large gradient G2' shown in the "after adjustment" part of fig. 2, where the p-norm of the small gradient G1 is consistent with the p-norm of the adjusted large gradient G2'. Since the p-norm of the small gradient G1 is consistent with that of the adjusted large gradient G2', even if the included angle between them is large, when one gradient (G1 or G2') is projected onto the other, the length of the projection does not exceed the length of the gradient projected onto. At this time, neither the target gradient calculated based on G1 and G2', nor the parameter update amount of the generated network model calculated according to that target gradient, will include the fastest ascent direction of either loss function (i.e., the first loss function or the second loss function), thereby alleviating the imbalance between the two loss functions.
According to the chain rule, the first gradient and the third gradient each comprise (1) the partial derivative of the corresponding loss function with respect to the first output image out, and (2) the partial derivative of the first output image out with respect to the network parameter w. In combination with the numerical changes of the first loss function and the second loss function during model training, in one illustrative example of implementation two, only the partial derivative of the loss function with respect to w may be modified. Taking the p-norm as the 1-norm as an example, a simple implementation may be: g3' = (‖g1‖₁ / ‖g3‖₁) · g3, where ‖g1‖₁ / ‖g3‖₁ represents the ratio between the 1-norm of the first gradient and the 1-norm of the third gradient, g1 is the first gradient, and g3 is the third gradient.
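The 1-norm rescaling above can be sketched as follows (an illustrative NumPy sketch; `match_one_norm` is an illustrative name, not from the disclosure):

```python
import numpy as np

def match_one_norm(g_ref, g_adj):
    """Rescale g_adj so that its 1-norm equals the 1-norm of g_ref."""
    ratio = np.abs(g_ref).sum() / np.abs(g_adj).sum()
    return ratio * g_adj

g1 = np.array([1.0, -1.0])       # first gradient, 1-norm = 2
g3 = np.array([-6.0, 2.0])       # third gradient, 1-norm = 8
g3_adj = match_one_norm(g1, g3)  # scaled by 2/8 = 0.25
print(g3_adj)                    # [-1.5  0.5], 1-norm = 2
```

After the rescaling, the two gradients have equal 1-norms, so neither can dominate the summed target gradient.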
Mode three:
before the target gradient is calculated as in implementation two, a first adjustment of the first gradient and/or the third gradient may be performed by a gradient adjustment model (coordinator); then the second adjustment of implementation two, which makes the p-norms consistent, is performed; finally, the target gradient is calculated according to the first gradient and the third gradient obtained after the two adjustments. After the generated network model is updated and optimized according to the target gradient, the gradient adjustment model is updated.
FIG. 3 is a schematic diagram illustrating a process of adjusting the gradients by the gradient adjustment model and updating the gradient adjustment model according to an embodiment of the present disclosure. The first adjustment of the first gradient and/or the third gradient may include: after steps S11 to S15 are executed, inputting the first gradient and the third gradient into the gradient adjustment model to obtain the adjusted first gradient and the adjusted third gradient output by the model. Then, the steps of implementation two are performed to calculate the target gradient. The gradient adjustment model can adjust the first gradient and the third gradient, and can be optimized synchronously with the generated network model throughout the model training process, which ensures the accuracy of the adjustment of the first gradient and the third gradient. Adjusting the first gradient and/or the third gradient through the gradient adjustment model balances the influence of the two gradients on the parameter update, so that the generated network model is gradually updated and optimized.
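A minimal sketch of such a gradient adjustment model is given below. It is hypothetical: the disclosure does not fix the coordinator's architecture, so here it is reduced to one learnable log-scale per gradient, updated by gradient descent on its own loss (computed elsewhere):

```python
import numpy as np

class Coordinator:
    """Toy gradient adjustment model (coordinator): rescales the first and
    third gradients by learnable positive coefficients exp(log_s)."""

    def __init__(self):
        self.log_s = np.zeros(2)  # log-scales for (first, third) gradient

    def adjust(self, g1, g3):
        """First adjustment: return the rescaled first and third gradients."""
        s = np.exp(self.log_s)
        return s[0] * g1, s[1] * g3

    def update(self, grad_wrt_log_s, lr=0.1):
        """Gradient step on the coordinator loss w.r.t. its parameters."""
        self.log_s = self.log_s - lr * np.asarray(grad_wrt_log_s)
```

With `log_s` initialised to zero the coordinator starts as an identity map, and its update rule then nudges the scales whenever its loss indicates that an update of the generated network model made one of the two losses worse.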
In step S17, the parameters of the generated network model are updated according to the target gradient, so as to complete the model training of the generated network model.
After the target gradient is calculated, it may first be determined whether the target gradient is zero. If the target gradient is zero, it may be determined that the generated network model is already able to output an output image consistent with the corresponding label image, and the model training may end. If the target gradient is not zero, it may be determined that the training of the generated network model still needs to continue, and the parameters of the generated network model are updated according to the target gradient.
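The zero-gradient check and parameter update can be sketched as follows (an illustrative sketch with NumPy arrays standing in for parameter tensors; `update_step` and the learning rate `lr` are illustrative names, not from the disclosure):

```python
import numpy as np

def update_step(params, target_grad, lr=1e-2):
    """One update of the generated network model's parameters.

    Returns (new_params, done): if the target gradient is (numerically)
    zero, training can end because the model already reproduces the
    label images; otherwise take one gradient-descent step.
    """
    if np.allclose(target_grad, 0.0):
        return params, True
    return params - lr * target_grad, False
```

For example, a zero target gradient leaves the parameters unchanged and signals completion, while a non-zero one moves the parameters against the gradient direction.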
After the updating of the generated network model is completed, the first output image and the corresponding label image are respectively input into the judging network model, and the parameter updating of the judging network model is carried out according to the output result.
In one possible implementation, after the parameters of the generated network model are updated in step S17, a new generated network model is obtained. Then, as shown in fig. 3, updating the gradient adjustment model includes: inputting the sample image used for this update of the generated network model (i.e., the sample image input to the generated network model in the first line of fig. 3 and the sample image input to the generated network model in the second line of fig. 3 are the same sample image) into the new generated network model to obtain a second output image output by the new generated network model. Then, the first loss value and the second loss value corresponding to the second output image are calculated (the calculation process is similar to the process of calculating the first loss value and the second loss value corresponding to the first output image in step S12 and step S13, and is not repeated here). A third loss value l_coor of the gradient adjustment model is calculated according to the first loss value and the second loss value, the gradient of the gradient adjustment model is then calculated according to the third loss value l_coor, and finally the parameters of the gradient adjustment model are updated and optimized according to that gradient.
In one possible implementation, the third loss value may be calculated from the first loss value and the second loss value of the second output image by the following formula:
l_coor = ReLU(l_rec,new − l_rec,old) + ReLU(l_adv,new − l_adv,old)
wherein l_rec,old represents the first loss value of the first output image corresponding to the same sample image as the second output image, i.e., the first loss value in the first line of fig. 3; l_adv,old represents the second loss value of that same first output image, i.e., the second loss value in the first line of fig. 3; l_rec,new represents the first loss value of the second output image, i.e., the first loss value in the second line of fig. 3; and l_adv,new represents the second loss value of the second output image, i.e., the second loss value in the second line of fig. 3.
When the value of ReLU(l_rec,new − l_rec,old) is greater than zero, it indicates that, for the same sample image, the loss value of the first loss function has increased after the generated network model was updated, so the gradient adjustment model needs to be updated and optimized. Likewise, when the value of ReLU(l_adv,new − l_adv,old) is greater than zero, it indicates that the loss value of the second loss function has increased for the same sample image after the generated network model was updated, and the gradient adjustment model also needs to be updated and optimized. The value of l_coor may therefore be interpreted as follows: if each new loss value is smaller than the corresponding old loss value, l_coor is 0 after the ReLU, indicating that both loss functions decreased after one update and the desired optimization was obtained; in this case, the update of the gradient adjustment model may be skipped. If a new loss value is larger than the old loss value, l_coor is not 0 after the ReLU, and the gradient adjustment model needs to be updated and optimized this time.
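The coordinator loss above can be sketched directly from the formula (an illustrative sketch; `coordinator_loss` is an illustrative name, and integer loss values are used only to keep the example exact):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def coordinator_loss(l_rec_new, l_rec_old, l_adv_new, l_adv_old):
    """l_coor = ReLU(l_rec,new - l_rec,old) + ReLU(l_adv,new - l_adv,old).
    Non-zero only when the last generator update made at least one loss worse."""
    return relu(l_rec_new - l_rec_old) + relu(l_adv_new - l_adv_old)

print(coordinator_loss(4, 5, 2, 3))  # 0.0 -> both losses decreased, skip update
print(coordinator_loss(6, 5, 2, 3))  # 1.0 -> reconstruction loss increased
```

The ReLU clamps each improvement to zero, so only deteriorations contribute to l_coor, matching the "update only when a loss got worse" behaviour described above.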
The present disclosure also provides a computing device for application to a processor, the device comprising:
the first image acquisition module is used for carrying out recovery processing on the sample image based on the generated network model to generate a first output image;
the first gradient acquisition module is used for calculating a first gradient according to the first output image and a first loss function;
the second gradient acquisition module is used for calculating a second gradient according to a first discrimination result of the discrimination network model on the first output image;
the bias acquisition module is used for calculating a second gradient bias according to a second discrimination result of the discrimination network model on the label image corresponding to the sample image and a second loss function;
a third gradient acquisition module, configured to take a difference value between the second gradient and the second gradient bias as a third gradient;
The target gradient acquisition module is used for calculating a target gradient according to the first gradient and the third gradient;
and the first updating module is used for updating the parameters of the generated network model according to the target gradient so as to complete model training of the generated network model.
In one possible implementation manner, the target gradient acquisition module includes:
a first gradient adjustment sub-module, configured to adjust the first gradient and/or the third gradient according to a first p-norm of the first gradient and a third p-norm of the third gradient, so that the adjusted first gradient is consistent with the p-norm of the third gradient;
and the gradient calculation sub-module is used for taking the sum of the first gradient and the third gradient with the consistent p-norm as the target gradient.
In one possible implementation, the first gradient adjustment sub-module includes:
a target-norm determination submodule, configured to determine a target p-norm according to the first p-norm and the third p-norm, where the target p-norm is greater than or equal to a minimum value of the first p-norm and the third p-norm and less than or equal to a maximum value of the first p-norm and the third p-norm;
And the adjustment submodule is used for adjusting the gradients of which the p-norms are inconsistent with the target p-norms in the third gradient and the first gradient so that the p-norms of the adjusted first gradient and the third gradient are consistent.
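The behaviour of the target-norm determination submodule and the adjustment submodule can be sketched together as follows (an illustrative NumPy sketch; choosing the minimum of the two p-norms as the target p-norm is just one valid choice within the claimed range):

```python
import numpy as np

def adjust_to_target_norm(g1, g3, p=2):
    """Pick a target p-norm between the two gradients' p-norms (here the
    minimum, one valid choice under the claim) and rescale whichever
    gradient deviates from it."""
    n1 = np.linalg.norm(g1, ord=p)
    n3 = np.linalg.norm(g3, ord=p)
    target = min(n1, n3)  # any value in [min(n1, n3), max(n1, n3)] is allowed
    adjusted = []
    for g, n in ((g1, n1), (g3, n3)):
        adjusted.append(g if np.isclose(n, target) else g * (target / n))
    return adjusted
```

For example, with a first gradient of 2-norm 5 and a third gradient of 2-norm 10, only the third gradient is rescaled, and both outputs end up with 2-norm 5.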
In one possible implementation manner, the target gradient acquisition module includes:
and the second gradient adjustment submodule is used for adjusting the first gradient and the third gradient according to a gradient adjustment model before calculating the target gradient according to the first gradient and the third gradient.
In one possible implementation, the apparatus further includes:
the second image acquisition module is used for updating the parameters of the generated network model according to the target gradient, and then carrying out recovery processing on the sample image based on the updated generated network model to generate a second output image;
the first calculation module is used for calculating a first loss value according to the second output image and the first loss function;
the second calculation module is used for calculating a second loss value for updating according to a discrimination result of the discrimination network model on the second output image and a second loss function;
a third calculation module for calculating a third loss value of the gradient adjustment model according to the first loss value for updating and the second loss value for updating;
and a second updating module for updating the gradient adjustment model according to the third loss value, so as to complete model training of the gradient adjustment model.
In one possible implementation, the first loss function comprises a reconstruction loss function and the second loss function comprises an anti-loss function.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present disclosure is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present disclosure. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.
It should be further noted that, although the steps in the flowchart of fig. 1 are shown sequentially as indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; the order in which these sub-steps or stages are performed is also not necessarily sequential, and they may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps.
It should be understood that the above-described apparatus embodiments are merely illustrative and that the apparatus of the present disclosure may be implemented in other ways. For example, the division of the units/modules in the above embodiments is merely a logic function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted or not performed.
In addition, each functional unit/module in the embodiments of the present disclosure may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together, unless otherwise specified. The integrated units/modules described above may be implemented either in hardware or in software program modules.
Fig. 4 is a block diagram illustrating a combination processing apparatus 1200 according to an embodiment of the present disclosure. As shown in fig. 4, the combined processing device 1200 includes a computing processing device 1202, an interface device 1204, other processing devices 1206, and a storage device 1208. Depending on the application scenario, one or more computing devices 1210 may be included in the computing processing device and may be configured to perform the operations of the steps of the computing method described herein in connection with fig. 1.
In various embodiments, the computing processing device of the present disclosure may be configured to perform user-specified operations. In an exemplary application, the computing processing device may be implemented as a single-core artificial intelligence processor or as a multi-core artificial intelligence processor. Similarly, one or more computing devices included within a computing processing device may be implemented as an artificial intelligence processor core or as part of a hardware architecture of an artificial intelligence processor core. When multiple computing devices are implemented as artificial intelligence processor cores or portions of hardware structures of artificial intelligence processor cores, the computing processing device of the present disclosure may be considered to have a single-core structure or a homogeneous multi-core structure.
In an exemplary operation, the computing processing device of the present disclosure may interact with other processing devices through an interface device to collectively accomplish user-specified operations. Depending on the implementation, the other processing devices of the present disclosure may include one or more types of general-purpose and/or special-purpose processors, such as central processing units (Central Processing Unit, CPU), graphics processors (Graphics Processing Unit, GPU), artificial intelligence processors, and the like. These processors may include, but are not limited to, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., and their number may be determined according to actual needs. As previously mentioned, the computing processing device of the present disclosure, considered by itself, may be regarded as having a single-core structure or a homogeneous multi-core structure. However, when the computing processing device and the other processing devices are considered together, the two may be regarded as forming a heterogeneous multi-core structure.
In one or more embodiments, the other processing devices may serve as an interface between the computing processing device of the present disclosure (which may be embodied as a computing device related to artificial intelligence, such as neural network operations) and external data and controls, performing basic controls including, but not limited to, data handling, starting and/or stopping the computing device, and the like. In other embodiments, the other processing devices may also cooperate with the computing processing device to jointly accomplish computational tasks.
In one or more embodiments, the interface device may be used to transfer data and control instructions between the computing processing device and other processing devices. For example, the computing device may obtain input data from other processing devices via the interface device, and write the input data to a storage device (or memory) on the computing device. Further, the computing processing device may obtain control instructions from other processing devices via the interface device, and write the control instructions into a control cache on the computing processing device chip. Alternatively or in addition, the interface device may also read data in a memory device of the computing processing device and transmit it to the other processing device.
Additionally or alternatively, the combined processing apparatus of the present disclosure may further comprise a storage device. As shown in the figure, the storage means are connected to the computing processing means and the other processing means, respectively. In one or more embodiments, a storage device may be used to store data for the computing processing device and/or the other processing devices. For example, the data may be data that cannot be stored entirely within an internal or on-chip memory device of a computing processing device or other processing device.
In some embodiments, the present disclosure also discloses a chip (e.g., chip 1302 shown in fig. 5). In one implementation, the Chip is a System on Chip (SoC) and is integrated with one or more combined processing devices as shown in fig. 4. The chip may be connected to other related components by an external interface device (such as external interface device 1306 shown in fig. 5). The relevant component may be, for example, a camera, a display, a mouse, a keyboard, a network card, or a wifi interface. In some application scenarios, other processing units (e.g., video codecs) and/or interface modules (e.g., DRAM interfaces) etc. may be integrated on the chip. In some embodiments, the disclosure further discloses a chip package structure, which includes the chip. In some embodiments, the disclosure further discloses a board card, which includes the chip packaging structure described above. The board will be described in detail with reference to fig. 5.
Fig. 5 is a schematic diagram illustrating the structure of a board 1300 according to an embodiment of the disclosure. As shown in fig. 5, the board includes a memory device 1304 for storing data, which includes one or more memory cells 1310. The memory device may be connected and data transferred to the control device 1308 and the chip 1302 described above by means of, for example, a bus or the like. Further, the board card also includes an external interface device 1306 configured for data relay or transfer functions between the chip (or chips in the chip package structure) and an external device 1312 (e.g., a server or computer, etc.). For example, the data to be processed may be transferred by the external device to the chip through the external interface means. For another example, the calculation result of the chip may be transmitted back to the external device via the external interface device. The external interface device may have different interface forms according to different application scenarios, for example, it may use a standard PCIE interface or the like.
In one or more embodiments, the control device in the board card of the present disclosure may be configured to regulate the state of the chip. For this purpose, in an application scenario, the control device may include a single chip microcomputer (Micro Controller Unit, MCU) for controlling the working state of the chip.
From the above description in connection with fig. 4 and 5, those skilled in the art will appreciate that the present disclosure also discloses an electronic device or apparatus that may include one or more of the above-described boards, one or more of the above-described chips, and/or one or more of the above-described combination processing apparatuses.
According to different application scenarios, the electronic device or apparatus of the present disclosure may include a server, a cloud server, a server cluster, a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, an intelligent terminal, a PC device, an internet of things terminal, a mobile phone, a vehicle recorder, a navigator, a sensor, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a visual terminal, an autopilot terminal, a vehicle, a household appliance, and/or a medical device. The vehicle comprises an aircraft, a ship and/or a vehicle; the household appliances comprise televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas cookers and range hoods; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasonic apparatus, and/or an electrocardiograph apparatus. The electronic device or apparatus of the present disclosure may also be applied to the internet, the internet of things, data centers, energy sources, transportation, public management, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, medical, and the like. Further, the electronic device or apparatus of the present disclosure may also be used in application scenarios related to artificial intelligence, big data, and/or cloud computing, such as cloud, edge, terminal, and the like. In one or more embodiments, a computationally intensive electronic device or apparatus according to the aspects of the present disclosure may be applied to a cloud device (e.g., a cloud server), while a less power consuming electronic device or apparatus may be applied to a terminal device and/or an edge device (e.g., a smart phone or camera). 
In one or more embodiments, the hardware information of the cloud device and the hardware information of the terminal device and/or the edge device are compatible with each other, so that appropriate hardware resources can be matched from the hardware resources of the cloud device according to the hardware information of the terminal device and/or the edge device to simulate the hardware resources of the terminal device and/or the edge device, so as to complete unified management, scheduling and collaborative work of an end cloud entity or an edge cloud entity.
It should be noted that, for the sake of brevity, the present disclosure describes some methods and embodiments thereof as a series of actions and combinations thereof, but those skilled in the art will understand that the scheme of the present disclosure is not limited by the order of the described actions. Thus, one of ordinary skill in the art will appreciate in light of the present disclosure or teachings that certain steps thereof may be performed in other sequences or concurrently. Further, those skilled in the art will appreciate that the embodiments described in this disclosure may be considered alternative embodiments, i.e., wherein the acts or modules involved are not necessarily required for the implementation of some or some aspects of this disclosure. In addition, the description of some embodiments of the present disclosure also has an emphasis on each of them, depending on the solution. In view of this, those skilled in the art will appreciate that portions of one embodiment of the disclosure that are not described in detail may be referred to in connection with other embodiments.
In particular implementations, based on the disclosure and teachings of the present disclosure, one of ordinary skill in the art will appreciate that several embodiments of the disclosure disclosed herein may also be implemented in other ways not disclosed herein. For example, in terms of the foregoing embodiments of the electronic device or apparatus, the units are divided herein by taking into account the logic function, and there may be other manners of dividing the units when actually implemented. For another example, multiple units or components may be combined or integrated into another system, or some features or functions in the units or components may be selectively disabled. In terms of the connection relationship between different units or components, the connections discussed above in connection with the figures may be direct or indirect couplings between the units or components. In some scenarios, the foregoing direct or indirect coupling involves a communication connection utilizing an interface, where the communication interface may support electrical, optical, acoustical, magnetic, or other forms of signal transmission.
In the present disclosure, units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units. The aforementioned components or units may be co-located or distributed across multiple network elements. In addition, according to actual needs, some or all of the units may be selected to achieve the purposes of the solution described in the embodiments of the disclosure. In addition, in some scenarios, multiple units in embodiments of the disclosure may be integrated into one unit or each unit may physically reside separately.
In some implementation scenarios, the above-described integrated units may be implemented in the form of software program modules. The integrated unit may be stored in a computer readable memory if implemented in the form of software program modules and sold or used as a stand alone product. In this regard, when the aspects of the present disclosure are embodied in the form of a software product (e.g., a computer-readable storage medium), the software product may be stored in a memory, which may include instructions for causing a computer device (e.g., a personal computer, a server, or a network device, etc.) to perform some or all of the steps of the methods described by the embodiments of the present disclosure. The aforementioned Memory may include, but is not limited to, a usb disk, a flash disk, a Read Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a removable hard disk, a magnetic disk, or an optical disk, etc. various media capable of storing program codes.
In other implementation scenarios, the integrated units may also be implemented in hardware, i.e. as specific hardware circuits, which may include digital circuits and/or analog circuits, etc. The physical implementation of the hardware structure of the circuit may include, but is not limited to, physical devices, which may include, but are not limited to, devices such as transistors or memristors. In view of this, various types of devices described herein (e.g., computing devices or other processing devices) may be implemented by appropriate hardware processors, such as CPU, GPU, FPGA, DSP and ASICs, etc. Further, the aforementioned storage unit or storage device may be any suitable storage medium (including magnetic storage medium or magneto-optical storage medium, etc.), which may be, for example, variable resistance memory (Resistive Random Access Memory, RRAM), dynamic random access memory (Dynamic Random Access Memory, DRAM), static random access memory (Static Random Access Memory, SRAM), enhanced dynamic random access memory (Enhanced Dynamic Random Access Memory, EDRAM), high bandwidth memory (High Bandwidth Memory, HBM), hybrid memory cube (Hybrid Memory Cube, HMC), ROM, RAM, etc.
While various embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous modifications, changes, and substitutions will occur to those skilled in the art without departing from the spirit and scope of the present disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. The appended claims are intended to define the scope of the disclosure and are therefore to cover all equivalents or alternatives falling within the scope of these claims.
The embodiments of the present disclosure have been described in detail above, and specific examples have been applied herein to illustrate the principles and implementations of the present disclosure; the description of the above embodiments is merely intended to facilitate understanding of the method of the present disclosure and its core ideas. Meanwhile, those skilled in the art may, based on the ideas of the present disclosure, make modifications or variations to the specific implementations and application scope, and such modifications or variations fall within the protection scope of the present disclosure. In view of the foregoing, this description should not be construed as limiting the disclosure.

Claims (16)

1. A method of computing, the method comprising:
Restoring the sample image based on the generated network model to generate a first output image;
calculating a first gradient from the first output image and a first loss function;
calculating a second gradient according to a first discrimination result of the discrimination network model on the first output image;
calculating a second gradient bias according to a second discrimination result of the discrimination network model on the label image corresponding to the sample image and a second loss function;
taking the difference value between the second gradient and the second gradient bias as a third gradient;
calculating a target gradient according to the first gradient and the third gradient;
and updating parameters of the generated network model according to the target gradient to complete model training of the generated network model.
2. The method of claim 1, wherein calculating a target gradient from the first gradient and the third gradient comprises:
adjusting the first gradient and/or the third gradient according to the first p-norm of the first gradient and the third p-norm of the third gradient so that the adjusted first gradient is consistent with the p-norm of the third gradient;
and taking the sum of the first gradient and the third gradient with consistent p-norms as the target gradient.
3. The method of claim 2, wherein adjusting the first gradient and/or the third gradient according to the first p-norm of the first gradient and the third p-norm of the third gradient, so that the p-norms of the adjusted first gradient and third gradient are consistent, comprises:
determining a target p-norm according to the first p-norm and the third p-norm, wherein the target p-norm is larger than or equal to the minimum value of the first p-norm and the third p-norm and smaller than or equal to the maximum value of the first p-norm and the third p-norm;
and adjusting, among the first gradient and the third gradient, any gradient whose p-norm is inconsistent with the target p-norm, so that the p-norms of the adjusted first gradient and third gradient are consistent.
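The p-norm matching in claims 2-3 can be illustrated as follows. The claims only require the target p-norm to lie between the smaller and larger of the two norms; the interpolation weight `alpha` below is a hypothetical choice, as is the use of rescaling to adjust the gradients:

```python
import numpy as np

def match_p_norms(g1, g3, p=2.0, alpha=0.5):
    """Rescale g1 and g3 so their p-norms agree, then sum them (sketch).

    alpha=0 picks the smaller norm as the target, alpha=1 the larger;
    any value in between satisfies the claimed [min, max] constraint.
    Assumes both gradients are nonzero.
    """
    n1 = np.linalg.norm(g1, ord=p)
    n3 = np.linalg.norm(g3, ord=p)
    lo, hi = min(n1, n3), max(n1, n3)
    target = lo + alpha * (hi - lo)   # target p-norm within [min, max]
    # Scale each gradient whose p-norm differs from the target.
    g1_adj = g1 * (target / n1) if n1 != target else g1
    g3_adj = g3 * (target / n3) if n3 != target else g3
    return g1_adj + g3_adj            # target gradient per claim 2

g1 = np.array([3.0, 4.0])         # 2-norm 5
g3 = np.array([0.0, 1.0])         # 2-norm 1
g_target = match_p_norms(g1, g3)  # both rescaled to 2-norm 3, then summed
```

Matching the norms before summing keeps either loss term from dominating the update direction, which is the stated goal of balancing the two loss functions.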
4. A method according to any one of claims 1-3, wherein calculating a target gradient from the first gradient and the third gradient comprises:
adjusting the first gradient and the third gradient according to a gradient adjustment model before calculating the target gradient according to the first gradient and the third gradient.
5. The method according to claim 4, wherein the method further comprises:
after updating the parameters of the generated network model according to the target gradient, restoring the sample image based on the updated generated network model to generate a second output image;
calculating a first loss value for updating according to the second output image and the first loss function;
calculating a second loss value for updating according to a discrimination result of the discrimination network model on the second output image and a second loss function;
calculating a third loss value of the gradient adjustment model according to the first loss value for updating and the second loss value for updating;
and updating the gradient adjustment model according to the third loss value to complete model training of the gradient adjustment model.
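Claim 5's flow for training the gradient adjustment model can be summarized schematically. The claim only states that the third loss value is computed from the two post-update loss values; the weighted sum below is one plausible choice, and the weights `w` are hypothetical:

```python
def third_loss(first_loss_upd, second_loss_upd, w=(1.0, 1.0)):
    """Third loss value for the gradient adjustment model (sketch).

    A weighted sum of the two loss values for updating; the combination
    rule and weights are assumptions, not taken from the disclosure.
    """
    return w[0] * first_loss_upd + w[1] * second_loss_upd

# Schematic flow of claim 5:
# 1. update the generated network model with the target gradient;
# 2. run the updated model on the sample image -> second output image;
# 3. first loss value for updating  = first loss on the second output;
# 4. second loss value for updating = second loss from the discrimination
#    network model's result on the second output;
# 5. the third loss value drives the gradient adjustment model's update.
loss3 = third_loss(0.8, 0.3)
```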
6. The method of any of claims 1-5, wherein the first loss function comprises a reconstruction loss function and the second loss function comprises an adversarial loss function.
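The two loss functions named in claim 6 have common concrete forms, which can be sketched as follows. The L1 reconstruction loss and the non-saturating generator-side adversarial loss shown here are assumed examples; the disclosure does not fix a particular formula:

```python
import numpy as np

def reconstruction_loss(output, label):
    """First loss: mean absolute error between the generator output and
    the label image (L1 is one common reconstruction loss; assumed)."""
    return np.mean(np.abs(output - label))

def adversarial_loss(d_score):
    """Second loss: generator-side adversarial loss, here the standard
    non-saturating form -log D(G(x)) (one common choice; assumed)."""
    return -np.log(d_score)

out = np.array([[0.2, 0.4], [0.6, 0.8]])   # generator output (toy values)
lab = np.array([[0.0, 0.5], [0.5, 1.0]])   # label image (toy values)
l_rec = reconstruction_loss(out, lab)
l_adv = adversarial_loss(0.5)              # discriminator score in (0, 1)
```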
7. A computing device, the device comprising:
the first image acquisition module is used for carrying out recovery processing on the sample image based on the generated network model to generate a first output image;
the first gradient acquisition module is used for calculating a first gradient according to the first output image and a first loss function;
the second gradient acquisition module is used for calculating a second gradient according to a first discrimination result of the discrimination network model on the first output image;
the bias acquisition module is used for calculating a second gradient bias according to a second discrimination result of the discrimination network model on the label image corresponding to the sample image and a second loss function;
a third gradient acquisition module, configured to take a difference value between the second gradient and the second gradient bias as a third gradient;
the target gradient acquisition module is used for calculating a target gradient according to the first gradient and the third gradient;
and the first updating module is used for updating the parameters of the generated network model according to the target gradient so as to complete model training of the generated network model.
8. The apparatus of claim 7, wherein the target gradient acquisition module comprises:
a first gradient adjustment sub-module, configured to adjust the first gradient and/or the third gradient according to a first p-norm of the first gradient and a third p-norm of the third gradient, so that the p-norms of the adjusted first gradient and third gradient are consistent;
and the gradient calculation sub-module is used for taking the sum of the first gradient and the third gradient with consistent p-norms as the target gradient.
9. The apparatus of claim 8, wherein the first gradient adjustment sub-module comprises:
a target p-norm determination submodule, configured to determine a target p-norm according to the first p-norm and the third p-norm, where the target p-norm is greater than or equal to the minimum value of the first p-norm and the third p-norm and less than or equal to the maximum value of the first p-norm and the third p-norm;
and the adjustment submodule is used for adjusting, among the first gradient and the third gradient, any gradient whose p-norm is inconsistent with the target p-norm, so that the p-norms of the adjusted first gradient and third gradient are consistent.
10. The apparatus according to any one of claims 7-9, wherein the target gradient acquisition module comprises:
and the second gradient adjustment submodule is used for adjusting the first gradient and the third gradient according to a gradient adjustment model before calculating the target gradient according to the first gradient and the third gradient.
11. The apparatus of claim 10, wherein the apparatus further comprises:
the second image acquisition module is used for, after the parameters of the generated network model are updated according to the target gradient, restoring the sample image based on the updated generated network model to generate a second output image;
the first calculation module is used for calculating a first loss value for updating according to the second output image and the first loss function;
the second calculation module is used for calculating a second loss value for updating according to a discrimination result of the discrimination network model on the second output image and a second loss function;
a third calculation module for calculating a third loss value of the gradient adjustment model according to the first loss value for updating and the second loss value for updating;
and the second updating module is used for updating the gradient adjustment model according to the third loss value to complete model training of the gradient adjustment model.
12. The apparatus of any of claims 7-11, wherein the first loss function comprises a reconstruction loss function and the second loss function comprises an adversarial loss function.
13. An artificial intelligence chip, characterized in that the chip comprises a computing device according to any of claims 7-12.
14. An electronic device comprising the artificial intelligence chip of claim 13.
15. A board, wherein the board comprises: a storage device, an interface device, a control device, and an artificial intelligence chip according to claim 13;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the artificial intelligence chip and an external device;
the control device is used for monitoring the state of the artificial intelligence chip;
wherein the storage device comprises a plurality of groups of storage units, each group of storage units being connected to the artificial intelligence chip through a bus, and each storage unit being a DDR SDRAM;
the chip comprises a DDR controller, which is used for controlling data transmission and data storage of each storage unit;
and the interface device is a standard PCIE interface.
16. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 6.
CN202210220369.8A 2022-03-08 2022-03-08 Computing method, computing device, computer apparatus, and storage medium Pending CN116797464A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210220369.8A CN116797464A (en) 2022-03-08 2022-03-08 Computing method, computing device, computer apparatus, and storage medium


Publications (1)

Publication Number Publication Date
CN116797464A true CN116797464A (en) 2023-09-22

Family

ID=88044228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210220369.8A Pending CN116797464A (en) 2022-03-08 2022-03-08 Computing method, computing device, computer apparatus, and storage medium

Country Status (1)

Country Link
CN (1) CN116797464A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117788983A (en) * 2024-02-28 2024-03-29 青岛海尔科技有限公司 Image data processing method and device based on large model and storage medium


Similar Documents

Publication Publication Date Title
CN109284823B (en) Arithmetic device and related product
US11513586B2 (en) Control device, method and equipment for processor
EP3660628A1 (en) Dynamic voltage frequency scaling device and method
CN112799599B (en) Data storage method, computing core, chip and electronic equipment
WO2021036362A1 (en) Method and apparatus for processing data, and related product
US10755772B1 (en) Storage device and methods with fault tolerance capability for neural networks
CN116797464A (en) Computing method, computing device, computer apparatus, and storage medium
WO2021082725A1 (en) Winograd convolution operation method and related product
CN112084023A (en) Data parallel processing method, electronic equipment and computer readable storage medium
CN115129460A (en) Method and device for acquiring operator hardware time, computer equipment and storage medium
CN115049730B (en) Component mounting method, component mounting device, electronic apparatus, and storage medium
CN112766475B (en) Processing component and artificial intelligence processor
US11941844B2 (en) Object detection model generation method and electronic device and computer readable storage medium using the same
CN115455798A (en) Device, board card and method for correcting dead pixel and readable storage medium
EP4024287A1 (en) Method and apparatus for processing data, and related product
CN111258537B (en) Method, device and chip for preventing data overflow
CN115454923A (en) Data calculation device, board card, method and storage medium
CN113989121A (en) Normalization processing method and device, electronic equipment and storage medium
CN114692865A (en) Neural network quantitative training method and device and related products
CN115373646A (en) Information expansion method, device and related product
CN112667227A (en) Method for visually designing pipeline and readable storage medium
CN112817898A (en) Data transmission method, processor, chip and electronic equipment
CN114692824A (en) Quantitative training method, device and equipment of neural network model
WO2022135049A1 (en) Method, electronic device, and storage medium for reducing multi-dimensional vector
WO2022001438A1 (en) Computing apparatus, integrated circuit chip, board card, device and computing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination