CN115063673A - Model compression method, image processing method and device and cloud equipment

Publication number
CN115063673A
Authority
CN
China
Prior art keywords
network model
visual network
image
compression
model
Legal status
Granted
Application number
CN202210902200.0A
Other languages
Chinese (zh)
Other versions
CN115063673B (en)
Inventor
汪振宇
罗浩
王帆
李昊
Current Assignee
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd
Priority to CN202210902200.0A
Publication of CN115063673A
Application granted
Publication of CN115063063B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Abstract

The application provides a model compression method, an image processing method and apparatus, and cloud equipment. The model compression method includes: receiving a first image sent by a user terminal, where the first image is an image of a preset application scene; determining a compression mode for the trained first visual network model according to the preset application scene, where the compression mode includes channel compression and/or feature vector compression; determining frequency domain information of the first image; and compressing the first visual network model in the determined compression mode based on the frequency domain information to obtain a second visual network model. The compressed visual network model obtained by the method occupies less memory and has higher computational efficiency.

Description

Model compression method, image processing method and device and cloud equipment
Technical Field
The application relates to the technical field of computers, and in particular to a model compression method, an image processing method and apparatus, and a cloud device.
Background
With the introduction of the Transformer (a deep learning network), visual network models (Vision Transformer, ViT) built on the Transformer surpass the Convolutional Neural Network (CNN) in accuracy on many tasks, such as image classification, object detection, and semantic segmentation, so the dominant position of the currently used CNN in the field of computer vision is being shaken.
However, compared with CNN, ViT occupies more memory and has lower computational efficiency during operation.
Disclosure of Invention
Aspects of the present application provide a model compression method, an image processing method and apparatus, and a cloud device, so as to solve the problems that ViT occupies more memory and has lower computational efficiency during operation.
In a first aspect, an embodiment of the present invention provides a model compression method, which is applied to a server, where the model compression method includes: receiving a first image sent by a user terminal, wherein the first image is an image of a preset application scene; determining a compression mode of the trained first visual network model according to a preset application scene, wherein the compression mode comprises channel compression and/or feature vector compression; determining frequency domain information of a first image; and compressing the first visual network model by adopting a compression mode based on the frequency domain information to obtain a second visual network model.
A second aspect of the embodiments of the present application provides a model compression method, applied to a server, including: receiving a remote sensing image sent by a user terminal; determining low-frequency information in the remote sensing image; and based on the low-frequency information, compressing the trained first remote sensing model by adopting a channel compression mode to obtain a second remote sensing model.
A third aspect of the embodiments of the present application provides an image processing method, which is applied to a terminal, and the image processing method includes: acquiring an image to be processed; sending the image to be processed to a server for the server to identify the image to be processed by adopting a visual network model to obtain a processing result, wherein the visual network model is obtained according to the model compression method of the first aspect or the second aspect; and receiving the processing result sent by the server.
A fourth aspect of the embodiments of the present application provides a model compression apparatus, including:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for receiving a first image sent by a user terminal, and the first image is an image of a preset application scene;
the first determining module is used for determining a compression mode of the trained first visual network model according to a preset application scene, wherein the compression mode comprises channel compression and/or feature vector compression;
a second determining module, configured to determine frequency domain information of the first image;
and the compression module is used for compressing the first visual network model by adopting a compression mode based on the frequency domain information to obtain a second visual network model.
A fifth aspect of the embodiments of the present application provides a model compression apparatus, including:
the receiving module is used for receiving the remote sensing image sent by the user terminal;
the determining module is used for determining low-frequency information in the remote sensing image;
and the compression module is used for compressing the trained first remote sensing model by adopting a channel compression mode based on the low-frequency information to obtain a second remote sensing model.
A sixth aspect of the embodiments of the present application provides an image processing apparatus, comprising:
the acquisition module is used for acquiring an image to be processed;
the sending module is used for sending the image to be processed to the server, so that the server can identify the image to be processed by adopting a visual network model to obtain a processing result, wherein the visual network model is obtained according to the model compression method of the first aspect or the second aspect;
and the receiving module is used for receiving the processing result sent by the server.
A seventh aspect of the present embodiment provides a cloud device, including: a processor, a memory and a computer program stored on the memory and executable on the processor, the processor implementing the model compression method of the first aspect or the second aspect or the image processing method of the third aspect when executing the computer program.
An eighth aspect of embodiments of the present application provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the computer-executable instructions are executed by a processor, the computer-executable instructions are used to implement the model compression method of the first aspect or the second aspect or the image processing method of the third aspect.
A ninth aspect of embodiments of the present application provides a computer program product, including: a computer program, stored in a readable storage medium, from which at least one processor of the electronic device can read the computer program, the execution of which by the at least one processor causes the electronic device to perform the model compression method of the first or second aspect or the image processing method of the third aspect.
The method and apparatus of the present application are applied to image recognition scenes: a first image sent by a user terminal is received, where the first image is an image of a preset application scene; a compression mode for the trained first visual network model is determined according to the preset application scene, where the compression mode includes channel compression and/or feature vector compression; frequency domain information of the first image is determined; and the first visual network model is compressed in the determined compression mode based on the frequency domain information to obtain a second visual network model. According to the embodiments of the present application, a compressed visual network model can be obtained that occupies less memory and has higher computational efficiency without affecting recognition accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a block diagram of a visual network model provided in an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of a model compression method provided in an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of another model compression method provided in an exemplary embodiment of the present application;
fig. 4 is a block diagram of a conversion module according to an exemplary embodiment of the present disclosure;
FIG. 5 is a block diagram of a feed-forward network layer provided in an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of a process for selecting layers according to an exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of yet another model compression method provided by an exemplary embodiment of the present application;
FIG. 8 is a flowchart illustrating steps of an image processing method according to an exemplary embodiment of the present application;
FIG. 9 is a block diagram of a model compression apparatus according to an exemplary embodiment of the present disclosure;
FIG. 10 is a block diagram of another model compression apparatus provided in an exemplary embodiment of the present application;
fig. 11 is a block diagram of an image processing apparatus according to an exemplary embodiment of the present application;
fig. 12 is a schematic structural diagram of a cloud device according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only a few embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the related art, the ViT model is compressed by pruning, but this still follows the experience obtained from CNN compression, for example that weight parameters with larger norm values are relatively important while weight parameters with smaller norm values are deleted, and the characteristics of ViT itself are not considered. One such characteristic is that ViT captures the low-frequency information of an image more effectively than CNN, which means that ViT has a different sensitivity to frequency-domain signals than CNN. Therefore, for the compression of ViT, it is more important to consider the low-frequency information than the high-frequency part.
Based on the foregoing background, the model compression method provided in the embodiments of the present application includes: receiving a first image sent by a user terminal, where the first image is an image of a preset application scene; determining a compression mode for the trained first visual network model according to the preset application scene, where the compression mode includes channel compression and/or feature vector compression; determining frequency domain information of the first image; and compressing the first visual network model in the determined compression mode based on the frequency domain information to obtain a second visual network model. In the embodiments of the present application, the visual network model (ViT) is compressed by taking into account its different sensitivities to different frequency-domain information of the image, for example its higher sensitivity to the low-frequency part, so that the loss of recognition accuracy after compression is reduced, and the compressed ViT model occupies less memory and has higher computational efficiency.
In this embodiment, the overall model compression method may be implemented by means of a cloud computing system. In addition, the server of the model compression method may be a cloud server in order to run various neural network models by virtue of resources on the cloud; as opposed to the cloud, the model compression method may also be applied to a server device such as a conventional server or a server array, and is not limited herein.
In addition, the model compression method provided by the embodiments of the present application applies to various compression scenarios of the visual network model, such as image recognition scenarios, which include image classification, target detection, semantic segmentation, and the like. Referring to fig. 1, for a target task (image classification, target detection, semantic segmentation, or the like), the trained visual network model includes an embedding module (token embedding), where the embedding module is configured to split an input image into a plurality of sub-images and then encode each sub-image to obtain a coding vector (token) of the sub-image. The embedding module outputs the plurality of coding vectors to a conversion module (Transformer) a1, the result output by conversion module a1 is input into conversion module a2, and so on, until the result output by the last conversion module an is the image recognition result of the visual network model. The embodiments of the present application compress the trained visual network model so that the compressed visual network model can still recognize images accurately, while the memory occupied by the visual network model is reduced and the computational efficiency is improved.
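For intuition, the structure described above (embedding module followed by cascaded conversion modules a1 to an) can be sketched as follows. This is a minimal PyTorch sketch under assumed patch size, embedding dimension, depth and head count; the class names, the classification head and the omission of positional embeddings are illustrative assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Embedding module: splits the input image into sub-images and encodes each into a token."""
    def __init__(self, patch_size=16, in_chans=3, dim=384):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                                       # x: (B, 3, H, W)
        return self.proj(x).flatten(2).transpose(1, 2)          # (B, N, dim) coding vectors (tokens)

class VisualNetworkModel(nn.Module):
    """Embedding module followed by cascaded conversion modules a1..an, as in Fig. 1."""
    def __init__(self, depth=12, dim=384, heads=6, num_classes=1000):
        super().__init__()
        self.embed = PatchEmbedding(dim=dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))   # positional embeddings omitted for brevity
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
             for _ in range(depth)]
        )
        self.head = nn.Linear(dim, num_classes)                 # task head, e.g. image classification

    def forward(self, x):
        tokens = self.embed(x)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        tokens = torch.cat([cls, tokens], dim=1)
        for block in self.blocks:                               # output of module a_l is the input of a_{l+1}
            tokens = block(tokens)
        return self.head(tokens[:, 0])                          # recognition result read from the class token
```

For example, VisualNetworkModel()(torch.randn(1, 3, 224, 224)) returns one recognition result per input image.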
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 2 is a flowchart illustrating steps of a model compression method according to an exemplary embodiment of the present disclosure. As shown in fig. 2, the model compression method specifically includes the following steps:
s201, receiving a first image sent by a user terminal.
The first image is an image of a preset application scene. Specifically, the first image is a captured natural image, such as a remote sensing image or a color image captured by a camera. The preset application scene is one of a detection scenario, a segmentation task scenario, a classification scenario, and a retrieval task scenario.
Further, the first image may be sent by the user terminal to the server; when the user terminal sends the first image to the server, it also sends the preset application scene of the first image and instructs the server to compress the first visual network model.
S202, determining a compression mode of the trained first visual network model according to a preset application scene.
The compression mode comprises channel compression and/or feature vector compression.
Specifically, in a detection scenario and a segmentation task scenario, the compression mode includes channel compression; in a classification scenario and a retrieval task scenario, the compression mode includes channel compression and feature vector compression.
In the embodiments of the present application, the detection scenario and the segmentation task scenario require the details in an image, so feature vector compression is not performed for them.
In addition, the structure of the first visual network model is as shown in fig. 1. The first visual network model is trained in advance, and for different preset application scenes the first visual network model performs the corresponding task; for a detection scenario, the first visual network model performs a detection task, for example, for an image input into the first visual network model, the first visual network model detects the target object in the image.
The preset application scenario may also be other application scenarios as needed, and is not limited herein.
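As a small illustration of S202, the mapping from the preset application scene to the compression mode described above can be sketched as follows; the scene names and the string identifiers are assumptions introduced for illustration, and additional scenes would be handled by whatever modes the server supports.

```python
def compression_modes(preset_scene: str):
    """S202: map the preset application scene to the compression mode(s)."""
    if preset_scene in ("detection", "segmentation"):
        return ["channel_compression"]                       # image details are needed, so tokens are kept
    if preset_scene in ("classification", "retrieval"):
        return ["channel_compression", "feature_vector_compression"]
    raise ValueError(f"unsupported preset application scene: {preset_scene}")

# Example: compression_modes("classification") returns both modes.
```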
Further, after determining a compression mode of the trained first visual network model according to a preset application scenario, the method further includes: sending a compression mode to a user terminal; if negative feedback information of the user terminal on the compression modes is received, sending a plurality of compression modes to the user terminal; and receiving at least one compression mode sent by the user terminal, wherein the at least one compression mode is used for compressing the first visual network model.
If the compression mode determined in S202 is channel compression, channel compression is sent to the user terminal. If the user terminal's feedback on channel compression is negative feedback information, it is determined that the user does not agree to use channel compression, and all compression modes, such as channel compression and feature vector compression, are sent to the user terminal for the user to select from; the user may select the combination of channel compression and feature vector compression, or may select feature vector compression only.
In the embodiment of the present application, the compression method of the first visual network model may also include other types of compression besides channel compression and feature vector compression, where a user may select one of the compression methods or a combination of the compression methods according to needs. The subsequent server may perform compression of the first visual network model according to a compression mode selected by the user.
S203, determining frequency domain information of the first image.
In practice, the low frequency information of the first image forms the basic grey scale of the first image. The high frequency information of the first image forms the edges and details of the image.
In the embodiment of the present application, since the first visual network model has a higher sensitivity to low-frequency information in the image, the frequency-domain information includes the low-frequency information of the first image.
In an optional embodiment, if the application scene of the first visual network model requires identifying high-frequency information in the image, the frequency domain information may be the high-frequency information, and the high-frequency information is subsequently used to compress the first visual network model, so that the compressed first visual network model can still accurately identify the high-frequency information in the image.
And S204, compressing the first visual network model by adopting a compression mode based on the frequency domain information to obtain a second visual network model.
In an optional embodiment, based on the low-frequency information, the first visual network model is compressed by adopting a channel compression and/or feature vector compression mode to obtain the second visual network model.
In another optional embodiment, based on the high-frequency information, the first visual network model is compressed in a channel compression and/or feature vector compression mode to obtain the second visual network model.
In the embodiments of the present application, because the first visual network model captures the low-frequency information of the first image more effectively, the present application is described below using the example of compressing the first visual network model based on the low-frequency information to obtain the second visual network model.
In the embodiments of the present application, the visual network model (ViT) is compressed by taking into account its different sensitivities to different frequency-domain information of the image, for example its higher sensitivity to the low-frequency part, so that the loss of recognition accuracy after compression is reduced, and the compressed ViT model occupies less memory and achieves higher computational efficiency.
Fig. 3 is a flowchart illustrating steps of a model compression method according to an exemplary embodiment of the present application. As shown in fig. 3, the model compression method specifically includes the following steps:
s301, receiving a first image sent by a user terminal.
The specific implementation process of this step refers to S201, and is not described herein again.
S302, determining a compression mode of the trained first visual network model according to a preset application scene.
The specific implementation process of this step refers to S202, and is not described herein again.
S303, converting the first image into a frequency domain image.
The first image is converted from a spatial domain image to a frequency domain image using a Fourier transform.

Specifically, the first image is $X_{x,y}$, and the first image $X$ is converted into the frequency domain image as shown in formula (1):

$$\mathcal{F}(X)_{u,v}=\sum_{x=0}^{H-1}\sum_{y=0}^{W-1}X_{x,y}\,e^{-2\pi i\left(\frac{ux}{H}+\frac{vy}{W}\right)} \qquad \text{(1)}$$

In formula (1), $\mathcal{F}(X)_{u,v}$ denotes the Fourier transform applied to the pixel in row $x$ and column $y$ of the first image, which yields the frequency domain image $\mathcal{F}(X)$. Here $H$ denotes the height of the first image, $W$ denotes the width of the first image, and $u/H$ and $v/W$ denote the normalized frequencies along the rows and columns of pixels, respectively.
S304, attenuating the high-frequency information of the frequency domain image to obtain a filtered spectrum image.
Because the first visual network model captures the low-frequency information of the first image more effectively, the high-frequency information in the frequency domain image is attenuated, and the resulting filtered spectrum image better represents, in the frequency domain, the low-frequency information that the visual network model is sensitive to.
Specifically, the attenuation uses a filter function $g_r(u,v)$ with a frequency cut-off ratio $r$ to attenuate the high-frequency information of the frequency domain image, yielding the filtered spectrum image of formula (2):

$$\mathcal{F}^{r}(X)_{u,v}=g_r(u,v)\cdot\mathcal{F}(X)_{u,v} \qquad \text{(2)}$$

In formula (2), $\mathcal{F}^{r}(X)$ is the filtered spectrum image. When the first image is converted into the filtered spectrum image, a binary filter that completely removes some frequency components would cause a ringing effect; to avoid this phenomenon, the filter function $g_r(u,v)$ follows a Gaussian distribution, as in formula (3):

$$g_r(u,v)=\exp\!\left(-\frac{u^{2}+v^{2}}{2\sigma_r^{2}}\right) \qquad \text{(3)}$$

where $\sigma_r$ is determined by the cut-off ratio $r$.
S305, inverse transformation is carried out on the filtered frequency spectrum image to obtain a second image.
The filtered spectrum image $\mathcal{F}^{r}(X)$ is converted back to the spatial domain through the inverse Fourier transform to obtain the second image $\tilde{X}$. The second image contains the low-frequency information; specifically, the second image is an image in which the high-frequency information of the first image is attenuated and the low-frequency information of the first image is relatively enhanced.
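A minimal sketch of S303 to S305 follows: the first image is transformed to the frequency domain, the high-frequency part is attenuated with a Gaussian low-pass filter, and the result is transformed back to obtain the second image. The patent only states that the filter is Gaussian with a frequency cut-off ratio r; the exact mapping from r to the Gaussian width used below is an assumption.

```python
import torch

def low_frequency_image(x: torch.Tensor, r: float = 0.25) -> torch.Tensor:
    """x: (..., H, W) first image; returns the second image containing its low-frequency information."""
    H, W = x.shape[-2:]
    spectrum = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))       # formula (1): to the frequency domain
    u = torch.arange(H, device=x.device, dtype=x.dtype) - H / 2
    v = torch.arange(W, device=x.device, dtype=x.dtype) - W / 2
    sigma = r * min(H, W) / 2.0                                          # assumed mapping of the cut-off ratio r
    g = torch.exp(-(u[:, None] ** 2 + v[None, :] ** 2) / (2 * sigma ** 2))  # formula (3): Gaussian low-pass
    filtered = spectrum * g                                              # formula (2): attenuate high frequencies
    return torch.fft.ifft2(torch.fft.ifftshift(filtered, dim=(-2, -1))).real  # S305: back to the spatial domain
```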
S306, inputting the first image into the first visual network model for recognition processing to obtain a first output result.
The first output result is an output result obtained after the first visual network model identifies the first image.
Specifically, the first visual network model includes a weight set $W$ representing all weight matrices of the visual network model. The set $W$ comprises a plurality of weight matrices $w_k$, each weight matrix $w_k$ comprises a plurality of channels (a channel is one row or one column of weight parameters of $w_k$), and each channel comprises a plurality of weight parameters $w_k^{i,j}$.
In this implementation, the first output result obtained by inputting the first image $X$ into the trained first visual network model is denoted $\mathcal{M}(X, W)$.
S307, for each channel of the first visual network model, deleting that channel to obtain a current visual network model.
All weight matrices are $W=\{w_1, w_2, \ldots, w_n\}$, where $n$ is an integer greater than 1 and $w_k$, with $k$ taking one of 1 to $n$, denotes one of the weight matrices of the visual network model. Each weight matrix satisfies $w_k \in \mathbb{R}^{x \times y}$, where $x$ and $y$ are integers greater than 1; $w_k^{i,:}$ denotes the set of weight parameters in row $i$ of the weight matrix $w_k$, e.g. $w_k^{1,:}=\{w_k^{1,1}, w_k^{1,2}, \ldots, w_k^{1,y}\}$; and $w_k^{i,j}$ denotes the weight parameter in row $i$ and column $j$ of the $k$-th weight matrix $w_k$.
First, the first channel of the weight matrix $w_1$ is deleted to obtain a weight matrix $w_1^{(-1)}$, so that all weight matrices of the corresponding current visual network model are $\{w_1^{(-1)}, w_2, \ldots, w_n\}$. Then the second channel of $w_1$ is deleted to obtain a weight matrix $w_1^{(-2)}$, so that all weight matrices of the corresponding current visual network model are $\{w_1^{(-2)}, w_2, \ldots, w_n\}$. After each channel of the weight matrix $w_1$ has been deleted in turn, m1 current visual network models are obtained. In the same way, each channel of $w_2$ is deleted in turn to obtain m2 current visual network models, and so on until each channel of $w_n$ has been deleted in turn to obtain mn current visual network models. The total number of current visual network models obtained is m = m1 + m2 + … + mn.
And S308, inputting the second image into the current visual network model for recognition processing to obtain a second output result.
And respectively inputting the second images into each current visual network model for recognition processing to obtain corresponding second output results.
Illustratively, m current visual network models are included, and m second output results are obtained correspondingly.
S309, channel compression is carried out on the first visual network model based on the first output result and the second output result, and a second visual network model is obtained.
In an optional embodiment, performing channel compression on the first visual network model based on the first output result and the second output result to obtain a second visual network model includes: determining a first loss value of the second output result relative to the first output result, wherein the magnitude of the first loss value represents the influence degree of the channel on the first visual network model; and deleting at least one channel of the first visual network model according to the first loss value to obtain a second visual network model, wherein the influence degree of the deleted channel on the first visual network model is smaller than the influence degree threshold value.
Specifically, the magnitude of the first loss value represents the influence degree of the channel on the first visual network model. The greater the first loss value, the greater the influence of the corresponding channel on the first visual network model is determined to be.
And the influence degree of the deleted channel on the first visual network model is smaller than the influence degree threshold value. In the embodiment of the application, the deleted channel does not influence the identification precision of the first visual network model.
Illustratively, after the first channel of the weight matrix $w_1$ is deleted, the corresponding first loss value is smaller than the preset loss value threshold; after the second channel of $w_1$ is deleted, the corresponding first loss value is also smaller than the preset loss value threshold; but after the third channel of $w_1$ is deleted, the corresponding first loss value is greater than the preset loss value threshold. In this case, the first and second channels of $w_1$ may be deleted and the third channel retained to obtain the second visual network model.
Optionally, deleting at least one channel of the first visual network model according to the first loss value to obtain the second visual network model includes: deleting at least one channel of the first visual network model in ascending order of the first loss value to obtain an intermediate visual network model; inputting the first image into the intermediate visual network model for recognition processing to obtain a third output result; determining a second loss value of the third output result relative to the first output result; if the second loss value is smaller than the loss value threshold, increasing the number of channels to be deleted and again deleting at least one channel of the first visual network model in ascending order of the first target loss value; and if the second loss value is greater than or equal to the loss value threshold, determining the intermediate visual network model as the second visual network model.
The final second visual network model is thus obtained by deleting channels incrementally. For example, two channels of the first visual network model are first deleted in ascending order of the first loss value to obtain an intermediate visual network model; if the second loss value at this point is smaller than the loss value threshold, four channels are deleted in ascending order of the first loss value; if the second loss value is still smaller than the loss value threshold, six channels are deleted in ascending order of the first loss value; and if the second loss value is then greater than or equal to the loss value threshold, the first visual network model with six channels deleted is taken as the second visual network model.
In the embodiment of the application, the output result obtained by inputting the first image into the first visual network model is the same as the output result obtained by inputting the first image into the second visual network model, so that the first visual network model does not reduce the recognition accuracy after being compressed, the occupation of a memory can be reduced, and the calculation speed is increased.
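The channel scoring and incremental deletion of S306 to S309 can be sketched as follows. The helper that zeroes a channel, the use of a squared-error distance between outputs as the loss, and the step size of the incremental deletion are assumptions for illustration; the patent deletes channels rather than zeroing them, which is equivalent for scoring purposes.

```python
import copy
import torch
import torch.nn.functional as F

def zero_out_channel(model, channel):
    """channel = (param_name, dim, index): zero one row (dim=0) or column (dim=1) of a weight matrix."""
    name, dim, idx = channel
    w = dict(model.named_parameters())[name]
    w.data.index_fill_(dim, torch.tensor([idx]), 0.0)

@torch.no_grad()
def first_loss_values(model, first_image, second_image, channels):
    """S306-S308: score each channel by how much removing it changes the output on the second image."""
    reference = model(first_image)                       # first output result M(X, W)
    scores = []
    for ch in channels:
        pruned = copy.deepcopy(model)
        zero_out_channel(pruned, ch)                     # current visual network model without this channel
        scores.append(F.mse_loss(pruned(second_image), reference).item())
    return scores

@torch.no_grad()
def incremental_prune(model, first_image, channels, scores, loss_threshold, step=2):
    """S309: delete channels in ascending score order while the second loss value stays under the threshold."""
    reference = model(first_image)
    order = sorted(range(len(channels)), key=lambda i: scores[i])
    kept, n = model, step
    while n <= len(order):
        candidate = copy.deepcopy(model)
        for i in order[:n]:
            zero_out_channel(candidate, channels[i])
        if F.mse_loss(candidate(first_image), reference) >= loss_threshold:
            break                                        # accuracy budget exceeded: keep the previous model
        kept, n = candidate, n + step                    # enlarge the set of deleted channels
    return kept
```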
In the channel compression mode, compressing the first visual network model based on the frequency domain information to obtain the second visual network model specifically includes the following steps:
receiving a true value result of a first image sent by a user terminal.
For example, the first visual network model is used to identify an object in the first image, and if the first image includes a target object, the true result is used to indicate the target object.
And secondly, determining a third loss value of the fourth output result relative to the true value result by adopting a second preset loss function according to the weight parameter.
And thirdly, determining a fourth loss value of the first output result relative to the true value result by adopting a second preset loss function.
And fourthly, determining the square of the difference value of the third loss value and the fourth loss value as the first target loss value.
In the embodiments of the present application, the first output result obtained by inputting the first image $X$ into the trained visual network model is $\mathcal{M}(X, W)$. When one of the weight parameters in all weight matrices $W$, say $w_k^{i,j}$, is set to 0, all weight matrices $W$ become $W'$. The second image $\tilde{X}$ is input into the first visual network model with that weight parameter set to 0, and the fourth output result is $\mathcal{M}(\tilde{X}, W')$. The first target loss value is then expressed as formula (4):

$$\mathcal{L}_{k}^{i,j}=\Big(L\big(\mathcal{M}(\tilde{X}, W'),\,Y\big)-L\big(\mathcal{M}(X, W),\,Y\big)\Big)^{2} \qquad \text{(4)}$$

In formula (4), the second image $\tilde{X}$ is obtained by applying the inverse Fourier transform to the filtered spectrum image $\mathcal{F}^{r}(X)$, $Y$ denotes the true value result, $\mathcal{M}(\tilde{X}, W')$ is the fourth output result, $L(\mathcal{M}(\tilde{X}, W'), Y)$ denotes the third loss value of the fourth output result relative to the true value result determined with the second preset loss function, $\mathcal{M}(X, W)$ is the first output result, $L(\mathcal{M}(X, W), Y)$ denotes the fourth loss value of the first output result relative to the true value result determined with the second preset loss function, and $\mathcal{L}_{k}^{i,j}$ denotes the first target loss value.

In an alternative embodiment, the first target loss value may be formulated as

$$\mathcal{L}_{k}^{i,j}=\Big(L\big(\mathcal{M}(X, W'),\,Y\big)-L\big(\mathcal{M}(X, W),\,Y\big)\Big)^{2},$$

where $\mathcal{L}_{k}^{i,j}$ denotes the first target loss value; that is, when the weight parameter $w_k^{i,j}$ of the first visual network model is set to 0 and the first image is input into the first visual network model, the first target loss value is the square of the difference between the loss value of the resulting output $\mathcal{M}(X, W')$ relative to the true value result and the loss value, relative to the true value result, of the output $\mathcal{M}(X, W)$ obtained by inputting the first image into the original first visual network model.
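A short sketch of formula (4) follows: one weight parameter is temporarily set to 0, the model is evaluated on the low-frequency second image, and the squared difference of the two task losses relative to the ground truth is returned. Cross-entropy is assumed here as the second preset loss function.

```python
import torch
import torch.nn.functional as F

def first_target_loss(model, first_image, second_image, target, param_name, i, j):
    """Formula (4) for the weight parameter at row i, column j of the named weight matrix."""
    with torch.no_grad():
        base = F.cross_entropy(model(first_image), target)        # L(M(X, W), Y)
        w = dict(model.named_parameters())[param_name]
        old = w[i, j].item()
        w[i, j] = 0.0                                             # W -> W' with w_k^{i,j} = 0
        perturbed = F.cross_entropy(model(second_image), target)  # L(M(X_tilde, W'), Y)
        w[i, j] = old                                             # restore the original parameter
    return (perturbed - base) ** 2                                # squared difference of the losses
```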
And fifthly, determining the target influence value of the weight parameter on the weight matrix according to the first target loss value.
The first target loss value is positively correlated with the target influence value. The visual network model comprises a plurality of weight matrices, and each weight matrix comprises a plurality of weight parameters.
Further, determining a target influence value of the weight parameter on the associated weight matrix according to the first target loss value includes: inputting the first image into a first visual network model aiming at the weight matrix to obtain a first sub-output result corresponding to the weight matrix; aiming at the weight parameter of the weight matrix, under the condition that the weight parameter is set to be 0, inputting the second image into the first visual network model to obtain a second sub-output result corresponding to the weight matrix; determining a second target loss value of the first sub-output result and the second sub-output result by adopting a first preset loss function; and performing weighted calculation on the first target loss value and the second target loss value by adopting a preset hyper-parameter to obtain a target influence value of the weight parameter.
Referring to fig. 4 and 5, the structure of a feedforward network layer is shown: the feedforward network layer includes a second normalization layer, a weight matrix $w_1^{l}$ and a weight matrix $w_2^{l}$. The second normalization layer applies the LayerNorm normalization technique to the embedded matrix $X_{l,3}$.

In the embodiments of the present application, referring to fig. 1, 4 and 5, the visual network model includes a plurality of conversion modules, and each conversion module includes a plurality of weight matrices, e.g. $w_q^{l,h}$, $w_k^{l,h}$, $w_v^{l,h}$, $w_o^{l}$, $w_1^{l}$ and $w_2^{l}$. Each weight matrix includes a plurality of weight parameters, and each weight matrix has a corresponding sub-output result when the visual network model performs recognition processing on an image; for example, referring to fig. 4, the sub-output result corresponding to the weight matrix $w_q^{l,h}$ is $Q_{l,h}$, and the sub-output result corresponding to the weight matrix $w_k^{l,h}$ is $K_{l,h}$.

The target influence value is calculated according to formula (5):

$$I\big(w_k^{i,j}\big)=KL\big(T,\hat{T}\big)+\alpha\,\mathcal{L}_{k}^{i,j} \qquad \text{(5)}$$

In formula (5), $T$ is the first sub-output result, $\hat{T}$ is the second sub-output result, $KL$ denotes the first preset loss function, and $\alpha$ is the preset hyper-parameter that weights the two loss terms.
Illustratively, for the weight matrix $w_q^{l,h}$, the first image is input into the trained first visual network model (the uncompressed visual network model), and the first sub-output result corresponding to this weight matrix is $T=Q_{l,h}$. For the weight parameter $w_q^{i,j}$ of the weight matrix $w_q^{l,h}$, with $w_q^{i,j}$ set to 0, the second image is input into the visual network model, and the second sub-output result corresponding to the weight matrix $w_q^{l,h}$ is $\hat{T}=\hat{Q}_{l,h}$. Substituting these into formula (5) yields the target influence value $I(w_q^{i,j})$ of the weight parameter $w_q^{i,j}$.
In the embodiments of the present application, considering that the number of weight parameters in the visual network model is large, computing the target influence value of every weight parameter with formula (5) would make the compression of the visual network model inefficient. A first-order Taylor expansion is therefore used to approximate formula (5), giving an approximate value that can represent the target influence value; the specific calculation uses formula (6):

$$I\big(w_k^{i,j}\big)\approx\left|\frac{\partial \mathcal{L}}{\partial w_k^{i,j}}\,w_k^{i,j}\right| \qquad \text{(6)}$$

In formula (6), $I(w_k^{i,j})$ represents the target influence value of the weight parameter $w_k^{i,j}$, $\partial$ denotes the partial derivative, $\mathcal{L}$ is the objective of formula (5), and $w_k^{i,j}$ denotes the weight parameter in row $i$ and column $j$ of the $k$-th weight matrix $w_k$.
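The Taylor-approximated influence of formulas (6) and (7) can be sketched as follows: a single backward pass gives the gradient of the objective, the absolute value of gradient times weight approximates the per-parameter influence, and summing over a row or column gives the per-channel score. The task loss is used here as the objective; in the patent the objective of formula (5) would take its place, and the standard |dL/dw * w| form is an assumption since the original formula image is not recoverable.

```python
import torch
import torch.nn.functional as F

def channel_influence(model, second_image, target, param_name, dim=0):
    """Per-channel score from formulas (6)-(7) for the named 2-D weight matrix.
    dim=0 treats rows as channels, dim=1 treats columns as channels."""
    model.zero_grad()
    loss = F.cross_entropy(model(second_image), target)   # objective whose Taylor expansion is taken
    loss.backward()
    w = dict(model.named_parameters())[param_name]
    per_param = (w.grad * w).abs()                         # formula (6): |dL/dw_k^{i,j} * w_k^{i,j}|
    return per_param.sum(dim=1 - dim)                      # formula (7): sum over the channel's parameters
```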
And sixthly, for each weight matrix, determining the sum of the target influence values of each channel of the weight matrix.
Each channel consists of one row or one column of weight parameters in a weight matrix, and the specific calculation refers to formula (7):

$$I\big(w_k^{i,:}\big)=\sum_{j}I\big(w_k^{i,j}\big) \qquad \text{(7)}$$

In formula (7), $I(w_k^{i,:})$ represents the sum of the target influence values of one channel.
The visual network model includes a plurality of conversion modules connected in cascade, where the output of the current conversion module serves as the input of the next conversion module. Deleting at least one channel in ascending order of the sum of the target influence values to obtain the compressed visual network model includes: for the weight matrix of the current conversion module, deleting at least one channel following the bottom-to-top cascade order of the plurality of conversion modules and the ascending order of the sum of the target influence values.
Further, the conversion module includes: a multi-head attention layer and a feedforward network layer; the multi-head attention layer comprises: a plurality of head processing units, wherein the number of deleted channels is the same for weight matrices of different head processing units.
In the embodiment of the application, in the same multi-head attention layer, different head processing units have the same weight matrix, and the number of deleted channels is the same.
Illustratively, referring to fig. 4, if the conversion module is a first-stage conversion module, the numbers of channels deleted from the weight matrices $w_q^{1,1}, w_q^{1,2}, \ldots, w_q^{1,H}$ are the same, e.g. 1 channel is deleted from each, and the numbers of channels deleted from the weight matrices $w_k^{1,1}, w_k^{1,2}, \ldots, w_k^{1,H}$ are the same, e.g. 2 channels are deleted from each. If the conversion module is a second-stage conversion module, the numbers of channels deleted from the weight matrices $w_q^{2,1}, w_q^{2,2}, \ldots, w_q^{2,H}$ are the same, e.g. 2 channels are deleted from each.
In addition, deleting at least one channel of the first visual network model to obtain the second visual network model includes: for weight matrices of the same type in different head processing units, the same number of channels is deleted, which ensures the parallelism of the different head processing units. Moreover, deleting at least one channel in ascending order of the sum of the target influence values does not affect the recognition accuracy of the visual network model. The sum of the target influence values is positively correlated with the first loss value.
In the embodiments of the present application, the first target loss value corresponding to each weight parameter $w_k^{i,j}$ can be obtained in the above manner; in order to keep the matrix usable, an optional way is to delete a whole channel (one row or one column of weight parameters) of the weight matrix.
For example, suppose each row of a weight matrix $w_k$ is one channel and the first target loss value of each weight parameter has been computed. If the preset loss value threshold is 0.7, the sum of the first target loss values of the first row is calculated as the first loss value corresponding to that channel; if this first loss value is smaller than the preset loss value threshold, the first row is deleted, yielding the compressed weight matrix. Compressing every weight matrix $w_k$ in this way yields the compressed weight matrices and thereby realizes the compression of the first visual network model. The compressed first visual network model has fewer weight parameters, occupies less memory and computes faster, and because the visual network model is compressed based on the low-frequency information, its recognition accuracy is not reduced.
S310 to S313 below describe the compression of feature vectors.
And S310, inputting the second image into the first visual network model for recognition processing to obtain a plurality of intermediate feature vectors corresponding to the second image.
An optional mode is to input the second image containing the low-frequency information into the first visual network model for recognition processing, to obtain a plurality of intermediate feature vectors corresponding to the second image, where the intermediate feature vectors are representative of the low-frequency information in the first image.
Another optional mode is that the first image is input into the first visual network model for recognition processing to obtain a plurality of feature vectors corresponding to the first image, then each feature vector of the first image is subjected to low-frequency information extraction to obtain an intermediate feature vector, and then feature vector compression is performed on the intermediate feature vector.
In an embodiment of the application, the intermediate feature vectors are the output of the multi-head attention layer of the first visual network model.
Specifically, referring to fig. 1, the visual network model includes a plurality of conversion modules, such as conversion module a1 through conversion module an in fig. 1. The conversion modules are connected in cascade, with the output of the current conversion module serving as the input of the next-stage conversion module; e.g. conversion module a1 is the first stage, conversion module a2 is the second stage and follows a1, and conversion module an is the nth stage.
Referring to fig. 4, each conversion module includes a Multi-Head Self-Attention layer (MHSA), a selection layer, and a Feed-Forward Network layer (FFN). The first image or the second image is input into the embedding module to obtain the embedded matrix X_{l,1}. The input of the multi-head attention layer is the embedded matrix X_{l,1}, which consists of a plurality of input feature vectors; the output of the multi-head attention layer is the embedded matrix X_{l,2}, which consists of a plurality of intermediate feature vectors. The input of the selection layer is the embedded matrix X_{l,2} and its output is the embedded matrix X_{l,3}, which consists of a plurality of feature vectors. The input of the feedforward network layer is the embedded matrix X_{l,3} and its output is the embedded matrix X_{l+1,1}, which consists of a plurality of feature vectors, where l denotes the l-th conversion module. The embedded matrix X_{l+1,1}, the target output of the current conversion module, serves as the input of the next conversion module.
Illustratively, the first image X is input into the embedding module, which outputs the embedded matrix X_{1,1} of the first image X. After this embedded matrix is input into conversion module a1, the target output is the embedded matrix X_{2,1}; after the embedded matrix X_{2,1} is input into conversion module a2, the target output is X_{3,1}; and so on, until the target output X_{n+1,1} of the last conversion module an is the output result of the visual network model.
Further, the multi-head attention layer comprises a plurality of head processing units, each head processing unit correspondingly comprises a plurality of weight matrixes, each weight matrix corresponds to one output, and the output corresponding to the weight matrix is the intermediate output corresponding to the multi-head attention layer.
Referring to fig. 4, the multi-head attention layer includes head processing units b1, b2 through bH, where H is an integer greater than 1. Each head processing unit includes the weight matrices $w_q^{l,h}$, $w_k^{l,h}$ and $w_v^{l,h}$, where q, k and v denote the type of the weight matrix and h, an integer from 1 to H, indexes the corresponding head processing unit. Further, the first normalization layer may use the LayerNorm normalization technique.
Illustratively, the embedded matrix X_{l,1} is first passed through the first normalization layer of the multi-head attention layer and then fed into each head processing unit for processing. For each head processing unit, the resulting intermediate outputs include: the matrix Q_{l,h} obtained through the weight matrix $w_q^{l,h}$, the matrix K_{l,h} obtained through the weight matrix $w_k^{l,h}$, and the matrix V_{l,h} obtained through the weight matrix $w_v^{l,h}$.
In the embodiments of the present application, for each head processing unit the product of Q_{l,h} and K_{l,h} is computed and the result is then multiplied with V_{l,h} to obtain the output c_h of that head processing unit. The outputs c1, c2 through cH of the head processing units are fused to obtain the total output C, and the total output C is passed through the weight matrix $w_o^{l}$ to obtain the target output X_{l,2}.
In the embodiments of the present application, the intermediate outputs of the multi-head attention layer are the matrices Q_{l,h}, K_{l,h} and V_{l,h}, and the target output is X_{l,2}.
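A minimal PyTorch sketch of the multi-head attention layer of fig. 4 follows, showing how the normalized embedded matrix X_{l,1} is projected by the q, k and v weight matrices into Q_{l,h}, K_{l,h} and V_{l,h}, how the per-head outputs are fused through the output weight matrix, and how the target output X_{l,2} and the attention maps are returned. The dimensions, the residual connection and the bias-free projections are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiHeadAttentionLayer(nn.Module):
    def __init__(self, dim=384, heads=6):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads
        self.norm = nn.LayerNorm(dim)                   # first normalization layer
        self.w_q = nn.Linear(dim, dim, bias=False)      # weight matrices w_q^{l,h} for all heads, split below
        self.w_k = nn.Linear(dim, dim, bias=False)
        self.w_v = nn.Linear(dim, dim, bias=False)
        self.w_o = nn.Linear(dim, dim, bias=False)      # fusion weight matrix w_o^l

    def forward(self, x):                               # x: (B, N, dim) embedded matrix X_{l,1}
        B, N, _ = x.shape
        h = self.norm(x)
        split = lambda t: t.view(B, N, self.heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.w_q(h)), split(self.w_k(h)), split(self.w_v(h))   # Q_{l,h}, K_{l,h}, V_{l,h}
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)  # formula (8)
        fused = (attn @ v).transpose(1, 2).reshape(B, N, -1)    # head outputs c_1..c_H fused into C
        return x + self.w_o(fused), attn                # target output X_{l,2} (with residual) and A_{l,h}
```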
S311, determining the attention score and the low-frequency correction of the first visual network model according to the intermediate feature vectors.
The attention score of the multi-head attention layer is determined from the intermediate outputs Q_{l,h}, K_{l,h} and V_{l,h}.

Specifically, referring to fig. 4, the attention score of each head processing unit is calculated using formula (8):

$$A_{l,h}=\operatorname{softmax}\!\left(\frac{Q_{l,h}K_{l,h}^{\top}}{\sqrt{d}}\right) \qquad \text{(8)}$$

In formula (8), $A_{l,h}$ denotes the attention score of the $h$-th head processing unit of the $l$-th conversion module, and $d$ is the output dimension of the head processing unit output matrix $c_h$.

In the embodiments of the present application, the attention score of each head processing unit can be used to measure the information content of an intermediate feature vector (token); the attention score determines the degree to which one token influences the other tokens. It can be understood that, when the features of the first image are combined with the self-attention mechanism, a token with a larger attention score provides more information. The embodiments of the present application therefore determine the average attention score over the H head processing units of the multi-head attention layer using formula (9):

$$a^{l}_{j}=\frac{1}{H}\sum_{h=1}^{H}\frac{1}{N_{l}}\sum_{i=1}^{N_{l}}A_{l,h}(i,j) \qquad \text{(9)}$$

The target output X_{l,2} of the multi-head attention layer includes a plurality of intermediate feature vectors (tokens); among them, a classification token represents the information of all the tokens and is therefore more important than the other tokens. $N_l$ denotes the total number of tokens, $j$ takes values from 1 to $N_l$, $H$ denotes the number of head processing units, and $a^{l}_{j}$ denotes the attention score of token $j$; the larger the attention score, the more important the token is to the classification token. On this basis, formula (9) is rewritten as formula (10):

$$a^{l}_{j}=\frac{1}{H}\sum_{h=1}^{H}\Big(A_{l,h}(\mathrm{cls},j)+\frac{1}{N_{l}}\sum_{i\neq \mathrm{cls}}A_{l,h}(i,j)\Big) \qquad \text{(10)}$$

In formula (10), $a^{l}_{j}$ is the attention score of the multi-head attention layer for token $j$, $A_{l,h}(\mathrm{cls},j)$ denotes the attention score of the classification token with respect to token $j$, and $A_{l,h}(i,j)$ with $i\neq\mathrm{cls}$ denotes the attention scores of the other tokens with respect to token $j$.
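A sketch of the attention score of S311 follows; since the exact form of formula (10) is only partially recoverable from the text, the combination of the classification-token row with the attention received from the other tokens shown here is one plausible reading, not the patent's definitive formula.

```python
import torch

def token_attention_scores(attn: torch.Tensor, cls_index: int = 0) -> torch.Tensor:
    """attn: (B, H, N, N) attention maps A_{l,h}; returns one score per token, shape (B, N)."""
    avg = attn.mean(dim=1)              # formula (9): average the attention maps over the H heads
    cls_row = avg[:, cls_index, :]      # attention paid by the classification token to each token j
    received = avg.mean(dim=1)          # average attention received by token j from all tokens
    return cls_row + received           # formula (10): combined attention score a_j^l
```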
In addition, for each intermediate feature vector in the target output, low-frequency filtering is performed on the intermediate feature vector to obtain its low-frequency correction.
Here, the token containing important low-frequency information cannot be clearly identified only by using the above attention score, and therefore, in the embodiment of the present application, such missing information is introduced from the frequency domain. Specifically, since the token is transformed according to the change of the image input to the visual network model, the embodiment of the present application proposes low frequency energy correction (LEC) to delete the token by using the characteristics of the visual network model in the frequency domain.
Here, it is necessary to determine how much low-frequency information each token contains. The specific implementation is to convert the token into frequency domain information using a Fourier transform, determine the total energy of the low-frequency components of the token in the frequency domain information, and process the frequency domain information with a low-pass filter $g_r$ whose cut-off ratio is $r$, so that the low-frequency information in the frequency domain information is highlighted.
Further, if the input to the first visual network model during feature-vector compression is the first image, the resulting low-frequency correction is given by formula (11):

$$E_i = \big\| G_r \odot \mathcal{F}(x_i) \big\|^{2} \qquad \text{formula (11)}$$

Wherein, in formula (11), E_i denotes the low-frequency correction, ||·|| is used to calculate the modulus, x_i represents the i-th token of the target output X l,2, F(x_i) represents the Fourier transform of the token x_i, and G_r is the low-pass filter with cut-off ratio r applied element-wise to the frequency-domain information.

In an alternative embodiment, if the input to the first visual network model during feature-vector compression is the second image, the low-frequency correction E_i is obtained in the same way from the tokens corresponding to the second image.
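A minimal Python sketch of the low-frequency energy correction is given below; the use of the real FFT, the rectangular low-pass mask and the default cut-off ratio are implementation assumptions.

```python
import torch

def low_frequency_correction(tokens: torch.Tensor, cutoff_ratio: float = 0.25) -> torch.Tensor:
    # tokens: (N, C) intermediate feature vectors of the target output X l,2
    spectrum = torch.fft.rfft(tokens, dim=-1)               # Fourier transform of each token
    keep = max(1, int(cutoff_ratio * spectrum.shape[-1]))   # bins kept by the low-pass filter
    low = spectrum[:, :keep]                                 # low-frequency component only
    return low.abs().pow(2).sum(dim=-1)                      # energy (squared modulus), one value per token
```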
And S312, integrating the attention score and the low-frequency correction to obtain the importance score of the intermediate feature vector.
Wherein the importance score is determined using formula (12):

$$\alpha_i = S_i \cdot E_i \qquad \text{formula (12)}$$

Wherein α_i denotes the importance score of the i-th intermediate feature vector, S_i its attention score and E_i its low-frequency correction. In the present embodiment, the larger the importance score α_i, the higher the importance of the corresponding intermediate feature vector and the more information of the first image it contains.
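By way of illustration, the fusion in formula (12) can be written as a single element-wise operation; the product form below is an assumption, since the embodiment only states that the attention score and the low-frequency correction are integrated.

```python
def importance_scores(attn_scores, low_freq_correction):
    # one importance score per intermediate feature vector; the larger the score,
    # the more information of the first image the token carries
    return attn_scores * low_freq_correction
```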
S313, based on the importance scores, a score threshold value is configured in the first visual network model, and a second visual network model is obtained.
Wherein the score threshold is used for instructing the second visual network model to delete the feature vectors with the importance scores lower than the score threshold.
Further, based on the importance score, configuring a score threshold in the first visual network model, and obtaining a second visual network model, including: deleting at least one intermediate feature vector of the first visual network model according to the importance score, wherein the influence degree of the deleted intermediate feature vector on the first visual network model is smaller than an influence degree threshold value; and taking the highest importance score in the deleted intermediate feature vectors as a score threshold, and configuring the score threshold in the first visual network model to obtain a second visual network model.
specifically, deleting at least one intermediate feature vector in the order of the importance scores from small to large; inputting the residual intermediate characteristic vectors into a subsequent structure of the first visual network model for identification processing to obtain a fifth output result output by the visual network model; determining a fifth loss value of the fifth output result relative to the true result; and if the fifth loss value is greater than or equal to the loss value threshold, determining the highest importance score in the deleted intermediate feature vectors as the score threshold of the selection layer. And if the fifth loss value is smaller than the loss value threshold, increasing the number of the deleted intermediate feature vectors, and continuing to delete at least one intermediate feature vector according to the sequence from small to large of the importance scores.
In the embodiment of the present application, the intermediate feature vectors may be deleted in N increasing batches until the loss requirement is reached, where N is an integer greater than 1.
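Under the assumption that the subsequent structure of the model and the loss function are available as callables, the incremental deletion in S313 can be sketched as follows; all names and the fallback behaviour are illustrative.

```python
import torch

def find_score_threshold(scores, tokens, forward_rest, target, loss_fn,
                         loss_threshold, step=1):
    # scores: (N,) importance scores; tokens: (N, C) intermediate feature vectors
    order = torch.argsort(scores)                  # ascending importance
    num_deleted = step
    while num_deleted < len(order):
        deleted = order[:num_deleted]
        kept, _ = torch.sort(order[num_deleted:])  # keep the original token order
        out = forward_rest(tokens[kept])           # fifth output result
        loss = loss_fn(out, target)                # fifth loss value
        if loss >= loss_threshold:
            return scores[deleted].max()           # score threshold of the selection layer
        num_deleted += step                        # delete more tokens and retry
    return scores.max()                            # fallback: loss threshold never reached
```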
S310 to S313 are performed for the selection layer of the conversion module. Referring to fig. 6, the target output X l,2 of the multi-head attention layer includes a plurality of intermediate feature vectors (token1 to token6), where token1 is the classification token and token2 to token6 are the other tokens. The target output X l,2 is input into the selection layer, which calculates the importance score of each token and ranks the tokens by importance score; in fig. 6 the ascending order of importance scores is token2, token4, token3, token6, token5, token1. The selection layer deletes at least one intermediate feature vector in ascending order of importance score; for example, in fig. 6 one token, token2, is deleted, and the target output X l,3 of the selection layer is obtained.
Wherein the subsequent structure comprises the feedforward network layer of the current conversion module and the conversion modules at the upper stages of the current conversion module, and the output X n+1,1 of the last conversion module is the fifth output result of the visual network model. The selection layer is used for determining the importance scores of the vectors in the output result of the multi-head attention layer and deleting the vectors whose importance scores are lower than the score threshold.
Illustratively, if the fifth loss value is smaller than the loss value threshold, it is determined that the currently deleted intermediate feature vectors have little influence on the recognition accuracy of the visual network model, and deletion continues. For example, referring to fig. 6, token4 is deleted on the basis of X l,3, so that the resulting embedded matrix includes token1, token3, token5 and token6; the embedded matrix is input into the subsequent visual network model to obtain the fifth output result, and if the fifth loss value is greater than or equal to the loss value threshold, the importance score corresponding to token4 is used as the score threshold of the current conversion module.
In this embodiment of the application, the score threshold and the subsequent compression step of each conversion module may be determined in order from the lower-level conversion module to the upper-level conversion module: after one conversion module has completed the determination of its score threshold and its compression, the same is done for the next upper-level conversion module, until the compression of the entire visual network model is completed.
Specifically, if the fifth loss value is greater than or equal to the loss value threshold, it may be determined that, of the deleted intermediate feature vectors, the intermediate feature vector with the highest importance score may affect the recognition accuracy of the visual network model after deletion, and therefore, the importance score is used as the score threshold of the current visual network model, and when the visual network model is compressed and used online, the intermediate feature vector with the importance score smaller than the score threshold may be deleted, so that subsequent calculation amount may be reduced, and the recognition efficiency of the visual network model may be accelerated.
In this embodiment of the application, the first visual network model includes a plurality of conversion modules, the plurality of conversion modules are connected in cascade, and in the compression process of the first visual network model, compression is performed on each conversion module in sequence according to the sequence from a lower level to an upper level of the conversion module.
Further, in the process of compressing the first visual network model by using a compression method based on the frequency domain information to obtain the second visual network model, the method further includes: sending the size of the current compressed model of the first visual network model to a user terminal; and if receiving a compression stop instruction sent by the user terminal, determining that the currently compressed first visual network model is the second visual network model.
In the embodiment of the present application, a cascaded compression manner is adopted. Referring to fig. 1, compression of conversion module a2 is performed after compression of conversion module a1 is completed. Therefore, after the compression of each conversion module is completed, the model size of the currently compressed first visual network model is sent to the user terminal, where the model size refers to the memory occupied by the currently compressed first visual network model. If the user determines that the current model size meets the requirement, a compression stop instruction is sent to the server through the user terminal; if the user determines that the current model size does not meet the requirement, a compression continuation instruction may be sent to the server through the user terminal, so that the server continues to compress the next conversion module.
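The cascaded interaction with the user terminal can be summarised by the sketch below; compress_module, report_model_size and wait_for_instruction stand for the functionality described above and are not part of any real API.

```python
def cascaded_compression(model, conversion_modules, compress_module,
                         report_model_size, wait_for_instruction):
    # conversion modules are visited from the lower stage to the upper stage
    for module in conversion_modules:
        compress_module(module)                # channel compression and/or token pruning
        report_model_size(model)               # send current model size to the user terminal
        if wait_for_instruction() == "stop":   # compression stop instruction received
            break                              # the current model is the second visual network model
    return model
```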
In the embodiment of the present application, the conversion modules are compressed from the lower level to the upper level. Within each conversion module, channel compression may be performed first through steps S301 to S309 and the score threshold of the current conversion module determined afterwards through steps S310 to S313, or the score threshold may be determined first and channel compression performed afterwards, until every conversion module has had its score threshold determined and has been compressed, thereby obtaining the compressed visual network model. In addition, because channels and tokens are deleted according to the preference of the visual network model for low-frequency information, the computational efficiency of the compressed visual network model is greatly improved in use and it occupies less memory, while the recognition accuracy of the visual network model is not affected.
Fig. 7 is a flowchart illustrating steps of another model compression method according to an exemplary embodiment of the present application. As shown in fig. 7, the method specifically includes the following steps:
and S701, receiving the remote sensing image sent by the user terminal.
A remote sensing image is a film or photograph that records the electromagnetic waves of various ground objects, and is divided into aerial images and satellite images. Remote sensing images place higher requirements on image detail.
S702, determining low-frequency information in the remote sensing image.
The determination of the low frequency information refers to the above embodiments, and is not described herein again.
And S703, compressing the trained first remote sensing model by adopting a channel compression mode based on the low-frequency information to obtain a second remote sensing model.
In the embodiment of the present application, the first remote sensing model is also a visual network model, and a channel compression manner is adopted for the remote sensing image, and specific compression manners refer to S301 to S309 in the above embodiment, which is not described herein again.
In addition, after the second remote sensing model is obtained, training data consisting of remote sensing images are acquired, and the second remote sensing model is optimised and trained on the training data to obtain the target remote sensing model, which can then identify and process remote sensing images.
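The optimisation training mentioned here amounts to a standard fine-tuning loop; the optimiser, loss function and hyper-parameters in the sketch are assumptions rather than values taken from the embodiment.

```python
import torch

def finetune_remote_sensing_model(model, dataloader, epochs=5, lr=1e-4):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in dataloader:      # remote sensing training data
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
    return model                                # target remote sensing model
```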
In the embodiment of the application, the target remote sensing model can be obtained to accurately identify the remote sensing image, and the compressed target remote sensing model occupies a small memory and has a high calculation speed.
Fig. 8 is a flowchart illustrating steps of an image processing method according to an exemplary embodiment of the present application. As shown in fig. 8, the method specifically includes the following steps:
and S801, acquiring an image to be processed.
The image to be processed is a natural image to be identified by the visual network model; it may be the same kind of image as the first image, or it may be a remote sensing image.
S802, sending the image to be processed to the server, so that the server can identify the image to be processed by adopting the visual network model to obtain a processing result.
The visual network model is obtained according to the model compression method described above and may be the second visual network model or the target remote sensing model. Furthermore, the selection layer of each conversion module of the visual network model calculates the importance score of each token of the input embedded matrix and deletes the tokens whose importance scores are smaller than the corresponding score threshold, so that the calculation amount of the visual network model is reduced.
Illustratively, the visual network model includes: the score threshold of the first conversion module is 0.3, the score threshold of the second conversion module is 0.2, and the score threshold of the third conversion module is 0.1. Deleting the token with the importance score smaller than 0.3 when the first conversion module performs calculation, deleting the token with the importance score smaller than 0.2 when the second conversion module performs calculation, and deleting the token with the importance score smaller than 0.1 when the third conversion module performs calculation.
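At inference time each conversion module simply filters its tokens against its own score threshold; the sketch below reuses the example thresholds 0.3, 0.2 and 0.1 and is otherwise illustrative.

```python
def keep_tokens_per_module(scores_per_module, thresholds=(0.3, 0.2, 0.1)):
    # scores_per_module: for each conversion module, one importance score per token
    kept_indices = []
    for scores, threshold in zip(scores_per_module, thresholds):
        kept_indices.append([i for i, s in enumerate(scores) if s >= threshold])
    return kept_indices
```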
S803, the processing result transmitted by the server is received.
Aiming at the detection scene, the corresponding processing result is a detection result; for a segmentation task scene, the corresponding processing result is a segmentation result; for the classification scene, the corresponding processing result is a classification result; and for the retrieval task scene, the corresponding processing result is a retrieval result.
The visual network model in the embodiment of the application occupies a small memory, and the image to be processed is identified quickly and accurately.
In the embodiment of the present application, referring to fig. 9, in addition to providing a model compression method, there is provided a model compression apparatus 90, the model compression apparatus 90 including:
the acquiring module 91 is configured to receive a first image sent by a user terminal, where the first image is an image of a preset application scene;
a first determining module 92, configured to determine, according to a preset application scenario, a compression mode of the trained first visual network model, where the compression mode includes channel compression and/or compression of a feature vector;
a second determining module 93, configured to determine frequency domain information of the first image.
And a compression module 94, configured to compress the first visual network model in a compression manner based on the frequency domain information to obtain a second visual network model.
In an optional embodiment, the frequency domain information includes low frequency information, and the second determining module 93 is specifically configured to: converting the first image into a frequency domain image; weakening the high-frequency information of the frequency domain image to obtain a filtering frequency spectrum image; and performing inverse transformation on the filtered spectrum image to obtain a second image, wherein the second image comprises low-frequency information.
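The frequency-domain processing performed by the second determining module can be sketched as follows; the rectangular low-frequency window and the cut-off ratio are assumptions, since the embodiment only requires that high-frequency information be weakened before the inverse transform.

```python
import torch

def second_image_from_first(image: torch.Tensor, cutoff_ratio: float = 0.1) -> torch.Tensor:
    # image: (H, W) single-channel first image
    spectrum = torch.fft.fftshift(torch.fft.fft2(image))        # low frequencies moved to the centre
    h, w = image.shape
    cy, cx = h // 2, w // 2
    ry, rx = max(1, int(cutoff_ratio * h)), max(1, int(cutoff_ratio * w))
    mask = torch.zeros_like(image)
    mask[cy - ry:cy + ry, cx - rx:cx + rx] = 1.0                 # keep only a low-frequency window
    filtered = spectrum * mask                                   # high-frequency information weakened
    return torch.fft.ifft2(torch.fft.ifftshift(filtered)).real  # second image (low-frequency information)
```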
In an alternative embodiment, the compression mode is channel compression, and the compression module 94 is specifically configured to: inputting the first image into a first visual network model for identification processing to obtain a first output result; deleting channels aiming at each channel of the first visual network model to obtain a current visual network model; inputting the second image into the current visual network model for recognition processing to obtain a second output result; and performing channel compression on the first visual network model based on the first output result and the second output result to obtain a second visual network model.
In an optional embodiment, when the compression module 94 performs channel compression on the first visual network model based on the first output result and the second output result to obtain the second visual network model, it is specifically configured to: determining a first loss value of the second output result relative to the first output result, wherein the magnitude of the first loss value represents the influence degree of the channel on the first visual network model; and deleting at least one channel of the first visual network model according to the first loss value to obtain a second visual network model, wherein the influence degree of the deleted channel on the first visual network model is smaller than the influence degree threshold value.
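A per-channel influence test consistent with this description can be sketched as follows; using an L2 distance as the first loss value is an assumption.

```python
import torch

def first_loss_value(first_output: torch.Tensor, second_output: torch.Tensor) -> float:
    # magnitude of the change caused by deleting one channel
    return torch.norm(first_output - second_output).item()

def channels_to_delete(outputs_without_channel, first_output, influence_threshold):
    # outputs_without_channel: {channel index: second output result with that channel deleted}
    return [ch for ch, out in outputs_without_channel.items()
            if first_loss_value(first_output, out) < influence_threshold]
```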
In an alternative embodiment, the compression manner is compression of the feature vector, and the compression module 94 is specifically configured to: inputting the second image into the first visual network model for identification processing to obtain a plurality of intermediate characteristic vectors corresponding to the second image; according to the intermediate feature vector, determining the attention score and the low-frequency correction quantity of the first visual network model; integrating the attention score and the low-frequency correction to obtain an importance score of the intermediate feature vector; and configuring a score threshold in the first visual network model based on the importance score to obtain a second visual network model, wherein the score threshold is used for indicating the second visual network model to delete the feature vectors with the importance scores lower than the score threshold.
In an optional embodiment, the compression module 94, when configuring a score threshold in the first visual network model based on the importance score to obtain the second visual network model, is specifically configured to: deleting at least one intermediate feature vector of the first visual network model according to the importance score, wherein the influence degree of the deleted intermediate feature vector on the first visual network model is smaller than an influence degree threshold value; and taking the highest importance score in the deleted intermediate feature vectors as a score threshold, and configuring the score threshold in the first visual network model to obtain a second visual network model.
In an optional embodiment, the first visual network model includes a plurality of conversion modules, the plurality of conversion modules are connected in cascade, and in the compression process of the first visual network model, compression is performed on each conversion module in sequence according to the sequence from a lower stage to an upper stage of the conversion module.
In an optional embodiment, the preset application scenario includes: the method comprises the steps of detecting one of a scene, a task segmentation scene, a classification scene and a task retrieval scene, wherein the compression mode comprises channel compression under the detection scene and the task segmentation scene, and the compression mode comprises channel compression and compression of characteristic vectors under the classification scene and the task retrieval scene.
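The mapping from preset application scene to compression mode can be written down directly; the scene names used as strings below are illustrative.

```python
def compression_modes_for_scene(scene: str):
    if scene in ("detection", "segmentation"):
        return ["channel"]                       # channel compression only
    if scene in ("classification", "retrieval"):
        return ["channel", "feature_vector"]     # channel compression plus feature-vector compression
    raise ValueError(f"unknown preset application scene: {scene}")
```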
In an optional embodiment, after the first determining module 92 determines the compression mode of the trained first visual network model according to a preset application scenario, the first determining module is further configured to: sending a compression mode to a user terminal; if negative feedback information of the user terminal on the compression modes is received, sending a plurality of compression modes to the user terminal; and receiving at least one compression mode sent by the user terminal, wherein the at least one compression mode is used for compressing the first visual network model.
In an optional embodiment, the compressing module 94 is further configured to, in the process of compressing the first visual network model by using a compression method based on the frequency domain information to obtain the second visual network model: sending the size of the current compressed model of the first visual network model to a user terminal; and if receiving a compression stop instruction sent by the user terminal, determining that the currently compressed first visual network model is the second visual network model.
The model compression device provided by the embodiment of the application can achieve the purpose of obtaining the compressed visual network model, and the compressed visual network model occupies a smaller memory and has higher calculation efficiency under the condition of not influencing the identification precision.
In the embodiment of the present application, referring to fig. 10, there is also provided another model compression apparatus 10, the model compression apparatus 10 including:
the receiving module 11 is used for receiving the remote sensing image sent by the user terminal;
the determining module 12 is used for determining low-frequency information in the remote sensing image;
and the compression module 13 is used for compressing the trained first remote sensing model by adopting a channel compression mode based on the low-frequency information to obtain a second remote sensing model.
In an alternative embodiment, the model compressing apparatus 10 further includes a training module (not shown) for acquiring training data, wherein the training data is a remote sensing image; and optimizing and training the second remote sensing model according to the training data to obtain a target remote sensing model.
In the embodiment of the application, the target remote sensing model which can accurately identify the remote sensing image, occupies small memory and has high calculation speed can be obtained.
In the embodiment of the present application, referring to fig. 11, there is also provided an image processing apparatus 110, where the image processing apparatus 110 includes:
an obtaining module 111, configured to obtain an image to be processed;
and a sending module 112, configured to send the image to be processed to the server, so that the server performs identification processing on the image to be processed by using a visual network model, and obtains a processing result, where the visual network model is obtained according to the model compression method.
A receiving module 113, configured to receive a processing result sent by the server.
The image processing device provided by the embodiment of the application can realize rapid and accurate identification of the image to be processed.
In addition, in some of the flows described in the above embodiments and the drawings, a plurality of operations are included in a specific order, but it should be clearly understood that the operations may be executed out of order or in parallel as they appear in the present document, and only for distinguishing between the various operations, and the sequence number itself does not represent any execution order. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
Fig. 12 is a schematic structural diagram of a cloud device 120 according to an exemplary embodiment of the present application. The cloud device 120 is configured to run the above-described model compression method or image processing method. As shown in fig. 12, the cloud device includes: a memory 124 and a processor 125.
A memory 124 for storing computer programs and may be configured to store other various information to support operations on the cloud device. The Storage 124 may be an Object Storage Service (OSS).
The memory 124 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
A processor 125, coupled to the memory 124, for executing the computer program in the memory 124 to: receiving a first image sent by a user terminal, wherein the first image is an image of a preset application scene; determining a compression mode of the trained first visual network model according to a preset application scene, wherein the compression mode comprises channel compression and/or feature vector compression; determining frequency domain information of a first image; and compressing the first visual network model by adopting a compression mode based on the frequency domain information to obtain a second visual network model.
Further optionally, the frequency domain information includes low frequency information, and the processor 125 is specifically configured to, when determining the frequency domain information of the first image: converting the first image into a frequency domain image; weakening the high-frequency information of the frequency domain image to obtain a filtering frequency spectrum image; and performing inverse transformation on the filtered spectrum image to obtain a second image, wherein the second image comprises low-frequency information.
Further optionally, the compression method is channel compression, and the processor 125 is specifically configured to, when compressing the visual network model in a compression method based on the frequency domain information to obtain a compressed visual network model: inputting the first image into a first visual network model for identification processing to obtain a first output result; deleting channels aiming at each channel of the first visual network model to obtain a current visual network model; inputting the second image into the current visual network model for recognition processing to obtain a second output result; and performing channel compression on the first visual network model based on the first output result and the second output result to obtain a second visual network model.
Further optionally, when the processor 125 performs channel compression on the first visual network model based on the first output result and the second output result to obtain the second visual network model, the processor is specifically configured to: determining a first loss value of the second output result relative to the first output result, wherein the magnitude of the first loss value represents the influence degree of the channel on the first visual network model; and deleting at least one channel of the first visual network model according to the first loss value to obtain a second visual network model, wherein the influence degree of the deleted channel on the first visual network model is smaller than the influence degree threshold value.
Further optionally, the compression mode is compression of the feature vector, and the processor 125 is specifically configured to, when compressing the first visual network model in the compression mode based on the frequency domain information to obtain the second visual network model: inputting the second image into the first visual network model for identification processing to obtain a plurality of intermediate characteristic vectors corresponding to the second image; according to the intermediate feature vector, determining the attention score and the low-frequency correction quantity of the first visual network model; integrating the attention score and the low-frequency correction to obtain an importance score of the intermediate feature vector; and configuring a score threshold in the first visual network model based on the importance score to obtain a second visual network model, wherein the score threshold is used for indicating the second visual network model to delete the feature vectors with the importance scores lower than the score threshold.
Further optionally, the processor 125, when configuring a score threshold in the first visual network model based on the importance score to obtain the second visual network model, is specifically configured to: deleting at least one intermediate feature vector of the first visual network model according to the importance score, wherein the influence degree of the deleted intermediate feature vector on the first visual network model is smaller than an influence degree threshold value; and taking the highest importance score in the deleted intermediate feature vectors as a score threshold, and configuring the score threshold in the first visual network model to obtain a second visual network model.
Further optionally, the first visual network model includes a plurality of conversion modules, the plurality of conversion modules are connected in cascade, and in the compression process of the first visual network model, the processor 125 is further configured to sequentially compress for each conversion module according to an order of the conversion module from a lower level to an upper level.
In an optional embodiment, the processor 125, after determining the compression mode of the trained first visual network model according to the preset application scenario, is further configured to: sending a compression mode to a user terminal; if negative feedback information of the user terminal on the compression modes is received, sending a plurality of compression modes to the user terminal; and receiving at least one compression mode sent by the user terminal, wherein the at least one compression mode is used for compressing the first visual network model.
In an optional embodiment, the processor 125, in the process of compressing the first visual network model by using a compression method based on the frequency domain information to obtain the second visual network model, is further configured to: sending the size of the current compressed model of the first visual network model to a user terminal; and if receiving a compression stop instruction sent by the user terminal, determining that the currently compressed first visual network model is the second visual network model.
In an alternative embodiment, the processor 125, coupled to the memory 124, is configured to execute the computer program in the memory 124 to: receiving a remote sensing image sent by a user terminal; determining low-frequency information in the remote sensing image; and based on the low-frequency information, compressing the trained first remote sensing model by adopting a channel compression mode to obtain a second remote sensing model.
In an optional embodiment, the processor 125, after compressing the trained first remote sensing model by using a channel compression method based on the low-frequency information to obtain a second remote sensing model, is further configured to: acquiring training data, wherein the training data are remote sensing images; and optimizing and training the second remote sensing model according to the training data to obtain the target remote sensing model.
In an alternative embodiment, the processor 125, coupled to the memory 124, is configured to execute the computer program in the memory 124 to: acquiring an image to be processed; sending the image to be processed to a server for the server to identify the image to be processed by adopting a visual network model to obtain a processing result, wherein the visual network model is obtained according to the model compression method; and receiving the processing result sent by the server.
Further, as shown in fig. 12, the cloud device further includes: firewall 121, load balancer 122, communications component 126, power component 123, and other components. Only some of the components are schematically shown in fig. 12, and the cloud device is not meant to include only the components shown in fig. 12.
The cloud equipment provided by the embodiment of the application can obtain the compressed visual network model, and the compressed visual network model occupies a smaller memory and has higher calculation efficiency under the condition of not influencing the identification precision.
Accordingly, the present application also provides a computer readable storage medium storing a computer program, which when executed by a processor causes the processor to implement the steps of the above-mentioned method.
Accordingly, embodiments of the present application also provide a computer program product, which includes a computer program/instructions that, when executed by a processor, cause the processor to implement the steps in the method shown above.
The communications component of fig. 12 described above is configured to facilitate communications between the device in which the communications component is located and other devices in a wired or wireless manner. The device where the communication component is located can access a wireless network based on a communication standard, such as WiFi, a mobile communication network such as 2G, 3G, 4G/LTE, 5G, or the like, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power supply component of fig. 12 provides power to the various components of the device in which it is located. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component is located.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, systems or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above functions may be distributed by different functional modules according to needs, that is, the internal structure of the system is divided into different functional modules to complete all or part of the above described functions. For the specific working process of the system described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (14)

1. A model compression method is applied to a server, and comprises the following steps:
receiving a first image sent by a user terminal, wherein the first image is an image of a preset application scene;
determining a compression mode of the trained first visual network model according to the preset application scene, wherein the compression mode comprises channel compression and/or feature vector compression;
determining frequency domain information of the first image;
and compressing the first visual network model by adopting the compression mode based on the frequency domain information to obtain a second visual network model.
2. The model compression method of claim 1, wherein the frequency domain information includes low frequency information, and wherein the determining the frequency domain information for the first image comprises:
converting the first image into a frequency domain image;
weakening the high-frequency information of the frequency domain image to obtain a filtering frequency spectrum image;
and performing inverse transformation on the filtered spectrum image to obtain a second image, wherein the second image comprises the low-frequency information.
3. The model compression method according to claim 2, wherein the compression method is the channel compression, and the compressing the visual network model by using the compression method based on the frequency domain information to obtain a compressed visual network model comprises:
inputting the first image into the first visual network model for identification processing to obtain a first output result;
deleting each channel of the first visual network model to obtain a current visual network model;
inputting the second image into the current visual network model for recognition processing to obtain a second output result;
and performing channel compression on the first visual network model based on the first output result and the second output result to obtain the second visual network model.
4. The model compression method according to claim 3, wherein the channel compressing the first visual network model based on the first output result and the second output result to obtain the second visual network model comprises:
determining a first loss value of the second output result relative to the first output result, the magnitude of the first loss value representing the degree of influence of the channel on the first visual network model;
and deleting at least one channel of the first visual network model according to the first loss value to obtain the second visual network model, wherein the influence degree of the deleted channel on the first visual network model is smaller than an influence degree threshold value.
5. The model compression method according to claim 2, wherein the compression method is compression of the feature vector, and the compressing the first visual network model by using the compression method based on the frequency domain information to obtain a second visual network model comprises:
inputting the second image into the first visual network model for identification processing to obtain a plurality of intermediate feature vectors corresponding to the second image;
according to the intermediate feature vector, determining an attention score and a low-frequency correction quantity of the first visual network model;
fusing the attention score and the low-frequency correction to obtain an importance score of the intermediate feature vector;
and configuring a scoring threshold in the first visual network model based on the importance scores to obtain the second visual network model, wherein the scoring threshold is used for indicating the second visual network model to delete the feature vectors with the importance scores lower than the scoring threshold.
6. The model compression method of claim 5, wherein the configuring a score threshold in the first visual network model based on the importance score to obtain the second visual network model comprises:
deleting at least one intermediate feature vector of the first visual network model according to the importance score, wherein the influence degree of the deleted intermediate feature vector on the first visual network model is smaller than an influence degree threshold value;
and taking the highest importance score in the deleted intermediate feature vectors as a score threshold, and configuring the score threshold in the first visual network model to obtain the second visual network model.
7. The model compression method according to any one of claims 1 to 6, wherein the first visual network model includes a plurality of conversion modules, the plurality of conversion modules are connected in cascade, and compression is performed for each conversion module in order of the conversion module from a lower stage to an upper stage in the compression process of the first visual network model.
8. The model compression method according to any one of claims 1 to 6, wherein the preset application scenario comprises: the method comprises the steps of detecting one of a scene, a task segmentation scene, a classification scene and a task retrieval scene, wherein the compression mode comprises channel compression under the detection scene and the task segmentation scene, and the compression mode comprises channel compression and compression of feature vectors under the classification scene and the task retrieval scene.
9. The model compression method according to any one of claims 1 to 6, wherein after determining the compression mode of the trained first visual network model according to the preset application scenario, the method further comprises:
sending the compression mode to the user terminal;
if negative feedback information of the user terminal to the compression mode is received, sending a plurality of compression modes to the user terminal;
and receiving at least one compression mode sent by the user terminal, wherein the at least one compression mode is used for compressing the first visual network model.
10. The model compression method according to any one of claims 1 to 6, wherein in the process of compressing the first visual network model by the compression method based on the frequency domain information to obtain a second visual network model, the method further comprises:
sending the size of the current compressed model of the first visual network model to the user terminal;
and if receiving a compression stop instruction sent by the user terminal, determining that the currently compressed first visual network model is the second visual network model.
11. A model compression method is applied to a server, and comprises the following steps:
receiving a remote sensing image sent by a user terminal;
determining low-frequency information in the remote sensing image;
and based on the low-frequency information, compressing the trained first remote sensing model by adopting a channel compression mode to obtain a second remote sensing model.
12. The model compression method according to claim 11, wherein, after compressing the trained first remote sensing model by using a channel compression method based on the low-frequency information to obtain a second remote sensing model, the method further comprises:
acquiring training data, wherein the training data are remote sensing images;
and optimally training the second remote sensing model according to the training data to obtain a target remote sensing model.
13. An image processing method applied to a terminal, the image processing method comprising:
acquiring an image to be processed;
sending the image to be processed to a server, so that the server can identify the image to be processed by adopting a visual network model to obtain a processing result, wherein the visual network model is obtained according to the model compression method of any one of claims 1 to 12;
and receiving the processing result sent by the server.
14. A cloud device, comprising: a processor, a memory and a computer program stored on the memory and executable on the processor, the processor implementing the model compression method according to any one of claims 1 to 12 or the image processing method according to claim 13 when executing the computer program.
CN202210902200.0A 2022-07-29 2022-07-29 Model compression method, image processing method and device and cloud equipment Active CN115063673B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210902200.0A CN115063673B (en) 2022-07-29 2022-07-29 Model compression method, image processing method and device and cloud equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210902200.0A CN115063673B (en) 2022-07-29 2022-07-29 Model compression method, image processing method and device and cloud equipment

Publications (2)

Publication Number Publication Date
CN115063673A true CN115063673A (en) 2022-09-16
CN115063673B CN115063673B (en) 2022-11-15

Family

ID=83205306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210902200.0A Active CN115063673B (en) 2022-07-29 2022-07-29 Model compression method, image processing method and device and cloud equipment

Country Status (1)

Country Link
CN (1) CN115063673B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115953239A (en) * 2023-03-15 2023-04-11 无锡锡商银行股份有限公司 Surface examination video scene evaluation method based on multi-frequency flow network model

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160189388A1 (en) * 2014-12-24 2016-06-30 Canon Kabushiki Kaisha Video segmentation method
US20190354811A1 (en) * 2017-12-07 2019-11-21 Shanghai Cambricon Information Technology Co., Ltd Image compression method and related device
CN110782406A (en) * 2019-10-15 2020-02-11 深圳大学 Image denoising method and device based on information distillation network
CN112396179A (en) * 2020-11-20 2021-02-23 浙江工业大学 Flexible deep learning network model compression method based on channel gradient pruning
CN112749802A (en) * 2021-01-25 2021-05-04 深圳力维智联技术有限公司 Neural network model training method and device and computer readable storage medium
CN112906874A (en) * 2021-04-06 2021-06-04 南京大学 Convolutional neural network characteristic graph data compression method and device
CN113255433A (en) * 2021-04-06 2021-08-13 北京迈格威科技有限公司 Model training method, device and computer storage medium
WO2021208151A1 (en) * 2020-04-13 2021-10-21 商汤集团有限公司 Model compression method, image processing method and device
CN113657585A (en) * 2021-09-03 2021-11-16 南方电网电力科技股份有限公司 Pruning method and device for sparse network structure
CN114139705A (en) * 2021-12-03 2022-03-04 杭州电子科技大学 Structured pruning method based on image frequency response
WO2022057776A1 (en) * 2020-09-21 2022-03-24 华为技术有限公司 Model compression method and apparatus
CN114492731A (en) * 2021-12-23 2022-05-13 北京达佳互联信息技术有限公司 Training method and device of image processing model and electronic equipment
CN114548362A (en) * 2021-12-07 2022-05-27 广东机场白云信息科技有限公司 Deep learning knowledge distillation method and system based on frequency domain supervision
CN114757350A (en) * 2022-04-22 2022-07-15 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Convolutional network channel cutting method and system based on reinforcement learning
CN115049054A (en) * 2022-06-12 2022-09-13 中国科学院重庆绿色智能技术研究院 Channel self-adaptive segmented dynamic network pruning method based on characteristic diagram response

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ONAT DALMAZ et al.: "ResViT: Residual vision transformers for multi-modal medical image synthesis", arXiv *
XIAO Guangyi: "SAR image super-resolution reconstruction based on joint discrimination of high- and low-resolution images", China Master's Theses Full-text Database, Information Science and Technology *

Also Published As

Publication number Publication date
CN115063673B (en) 2022-11-15

Similar Documents

Publication Publication Date Title
CN110032926B (en) Video classification method and device based on deep learning
KR102082815B1 (en) Artificial intelligence based resolution improvement system
US8948540B2 (en) Optimized orthonormal system and method for reducing dimensionality of hyperspectral images
US8463025B2 (en) Distributed artificial intelligence services on a cell phone
CN110956202B (en) Image training method, system, medium and intelligent device based on distributed learning
WO2019012363A1 (en) Visual quality preserving quantization parameter prediction with deep neural network
CN107578453A (en) Compressed image processing method, apparatus, electronic equipment and computer-readable medium
CN112639828A (en) Data processing method, method and equipment for training neural network model
CN107529098A (en) Real-time video is made a summary
CN110555527A (en) Method and equipment for generating delayed shooting video
CN109598250B (en) Feature extraction method, device, electronic equipment and computer readable medium
CN111260037B (en) Convolution operation method and device of image data, electronic equipment and storage medium
CN108198130A (en) Image processing method, device, storage medium and electronic equipment
CN108960314B (en) Training method and device based on difficult samples and electronic equipment
KR20200140713A (en) Method and apparatus for training neural network model for enhancing image detail
CN104063686A (en) System and method for performing interactive diagnosis on crop leaf segment disease images
CN112561028A (en) Method for training neural network model, and method and device for data processing
CN114283351A (en) Video scene segmentation method, device, equipment and computer readable storage medium
CN115063673B (en) Model compression method, image processing method and device and cloud equipment
KR102177247B1 (en) Apparatus and method for determining manipulated image
CN111199540A (en) Image quality evaluation method, image quality evaluation device, electronic device, and storage medium
CN111062914B (en) Method, apparatus, electronic device and computer readable medium for acquiring facial image
CN115205613A (en) Image identification method and device, electronic equipment and storage medium
Zhong et al. Prediction system for activity recognition with compressed video
CN111160201A (en) Face image uploading method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant