WO2023050720A1 - Image processing method, image processing apparatus, and model training method - Google Patents

Image processing method, image processing apparatus, and model training method

Info

Publication number
WO2023050720A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
coding unit
information
processed
Prior art date
Application number
PCT/CN2022/078897
Other languages
French (fr)
Chinese (zh)
Inventor
任聪
刘衡祁
徐科
孔德辉
宋剑军
易自尧
杨维
Original Assignee
深圳市中兴微电子技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司
Publication of WO2023050720A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/40Tree coding, e.g. quadtree, octree

Definitions

  • the present application relates to the technical field of image processing, and in particular to an image processing method, an image processing device, and a model training method.
  • in order to improve coding quality, an image usually adopts a quadtree block partition structure based on coding units (Coding Unit, CU): the optimal coding units are divided at minimum rate-distortion cost and each coding unit is encoded separately, i.e., block-based coding is used. As the bit rate decreases, quantization becomes coarse and discontinuities appear at block boundaries, forming obvious defects in the reconstructed image known as blocking artifacts. Because of these visible differences between adjacent blocks, the original video is distorted after encoding and decoding, resulting in a poor user experience.
  • in some cases, image quality enhancement algorithms such as histogram equalization or gamma correction can optimize the codec results, but such algorithms mainly perform image enhancement based on manually summarized experience and characteristics of the human eye, are largely constrained by the image scene, and thus have limited ability to improve image quality.
  • convolutional neural networks in deep learning are also used to enhance image quality, but convolution extracts features through local receptive fields, which to a certain extent ignores the correlation between blocks, so the image quality is still difficult to guarantee.
  • the present application proposes an image processing method, an image processing device, a model training method, a training device, and a computer-readable storage medium.
  • an embodiment of the present application provides an image processing method, the method comprising: acquiring an image to be processed, the image to be processed being obtained by decoding an original image; acquiring coding unit division information of the original image during encoding, the coding unit division information including first position information and first size information of each coding unit; dividing the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units; and establishing, through a self-attention mechanism of a Transformer module, connections between the plurality of feature blocks to obtain a first output image corresponding to the original image.
  • an embodiment of the present application provides an image processing device, including a division module and a Transformer module. The division module is configured to obtain an image to be processed, obtained by decoding an original image, to obtain coding unit division information of the original image during encoding, the coding unit division information including first position information and first size information of each coding unit, and to divide the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units. The Transformer module is configured to establish connections between the plurality of feature blocks through a self-attention mechanism to obtain a first output image corresponding to the original image.
  • an embodiment of the present application provides a model training method, the model including a Transformer module, and the method comprising: acquiring an image to be processed, the image to be processed being a training sample in a constructed training set and obtained by decoding an original image; acquiring coding unit division information of the original image during encoding, the coding unit division information including first position information and first size information of each coding unit; inputting the image to be processed and the coding unit division information into the model, and dividing the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units; establishing, through the self-attention mechanism of the Transformer module, connections between the plurality of feature blocks to obtain a first output image corresponding to the original image; and training the model according to the first output image and an objective function to obtain a trained model.
  • an embodiment of the present application provides an image processing device, including at least one control processor and a memory communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor, and the instructions are executed by the at least one control processor so that the at least one control processor can execute the image processing method described in the first aspect above.
  • an embodiment of the present application provides a training device, including at least one control processor and a memory communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor, and the instructions are executed by the at least one control processor so that the at least one control processor can execute the model training method described in the third aspect above.
  • an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to cause a computer to execute the image processing method described in the first aspect above or the model training method described in the third aspect above.
  • Fig. 1 is a flowchart of the steps of an image processing method provided by an embodiment of the present application;
  • Fig. 2 is a flowchart of the steps of an image processing method provided by another embodiment of the present application;
  • Fig. 3 is a flowchart of the steps of an image processing method provided by another embodiment of the present application;
  • Fig. 4 is a flowchart of the steps of an image processing method provided by another embodiment of the present application;
  • Fig. 5 is a flowchart of the steps of an image processing method provided by another embodiment of the present application;
  • Fig. 6 is a schematic structural diagram of an image processing device provided by another embodiment of the present application;
  • Fig. 7 is a schematic structural diagram of a Transformer module provided by another embodiment of the present application;
  • Fig. 8 is a flowchart of the steps of a model training method provided by another embodiment of the present application;
  • Fig. 9 is a flowchart of the steps of a model training method provided by another embodiment of the present application;
  • Fig. 10 is a flowchart of the steps of a model training method provided by another embodiment of the present application;
  • Fig. 11 is a flowchart of the steps of a model training method provided by another embodiment of the present application;
  • Fig. 12 is a schematic structural diagram of an image processing device provided by another embodiment of the present application;
  • Fig. 13 is a schematic structural diagram of a training device provided by another embodiment of the present application.
  • the embodiment of the first aspect of the present application provides an image processing method, including but not limited to step S110, step S120, step S130 and step S140:
  • Step S110 Acquire the image to be processed, which is obtained by decoding the original image;
  • in order to improve the transmission efficiency and storage reliability of image data, the original image generally needs to be encoded and decoded. Since encoding and decoding cause a certain loss to the original image, the image quality is affected and the user experience suffers, so it is necessary to enhance the picture quality of the image obtained after decoding.
  • the picture quality is related to the degree of loss of the picture: the greater the loss, the lower the picture quality correspondingly. It can be understood that the image to be processed may be a text image or a video image.
  • Step S120 Obtain coding unit division information of the original image during encoding, where the coding unit division information includes first position information and first size information of each coding unit;
  • the quadtree block partition structure based on coding units is adopted to divide the optimal coding units at minimum rate-distortion cost, and each coding unit is encoded separately. This flexibly adapts to the texture features of various images and significantly improves coding efficiency.
  • the divided coding units support different sizes. The advantage of this division is that, on the one hand, a larger coding unit can greatly improve the coding efficiency of a flat area, and on the other hand, a smaller coding unit can handle local image details well, which makes the prediction of complex images more accurate.
  • the first position information and the first size information of each coding unit can be obtained, so that the spatial information of each coding unit can be clearly determined.
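  • As a minimal illustration only (the patent gives no code, and all names here are hypothetical), the coding unit division information could be represented as one record per coding unit carrying its first position information and first size information:

```python
from dataclasses import dataclass

@dataclass
class CUInfo:
    x: int        # first position information: top-left column of the CU
    y: int        # first position information: top-left row of the CU
    width: int    # first size information
    height: int   # first size information

# e.g. a 64x32 region that the encoder split into one 32x32 CU and
# four 16x16 CUs (values are purely illustrative)
cu_list = [CUInfo(0, 0, 32, 32),
           CUInfo(32, 0, 16, 16), CUInfo(48, 0, 16, 16),
           CUInfo(32, 16, 16, 16), CUInfo(48, 16, 16, 16)]
```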
  • Step S130 Divide the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units;
  • in some cases, a convolutional neural network is used to enhance the quality of the image. Convolution extracts features through local receptive fields, a method that ignores the correlation between blocks to a certain extent; moreover, convolutional neural networks generally enhance image quality based on fixed feature blocks, so the final enhanced and optimized image is not as good as expected.
  • in this embodiment of the present application, the local features of the image to be processed are extracted in combination with the coding unit division information, that is, according to the first position information and the first size information, so as to obtain a plurality of feature blocks corresponding to the coding units.
  • the division of the feature blocks is determined according to the coding unit division information, so that each feature block corresponds to a coding unit, and the correlation between feature blocks obtained by combining position and size information is stronger.
  • Step S140 Establish the connections between the multiple feature blocks through the self-attention mechanism of the Transformer module, and obtain a first output image corresponding to the original image.
  • the self-attention mechanism of the Transformer module, used in natural language processing tasks, can effectively overcome the limitations of the convolutional inductive bias and take more of the global information of the input into account. Therefore, in order to learn and infer non-local components, the embodiment of the present application establishes the connections between the multiple feature blocks through the Transformer module. Since the feature blocks are divided according to the coding unit division information, the self-attention mechanism of the Transformer module can capture the long-distance dependencies among the coding units used in encoding and learn the correlation between different feature blocks to establish global information, so that the established global information better conforms to the rules of encoding, thereby greatly reducing the differences between adjacent blocks and making the transitions between blocks smoother.
  • in the embodiment of the present application, the coding unit division information includes the first position information and the first size information of each coding unit, and the image to be processed is divided into multiple feature blocks according to the coding unit division information, making full use of the local coding information so that the divided feature blocks correspond to the coding units. The self-attention mechanism of the Transformer module is then used to establish the connections between the feature blocks, that is, to establish global information. Through the interaction between local information and global information, the differences between adjacent feature blocks can be better removed and the transitions between blocks made smoother, thereby better enhancing the picture quality of the encoded and decoded image.
  • in step S130, dividing the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units includes:
  • dividing the image to be processed into a plurality of feature blocks according to the first position information and the first size information, so that the positions and sizes of the feature blocks are the same as those of the coding units into which the original image was divided during encoding.
  • since the coding unit division information includes the first position information and the first size information of each coding unit, the image to be processed is divided according to the first position information and the first size information, effectively using the local coding information to obtain a plurality of feature blocks corresponding to the coding units into which the original image was divided during encoding. Each feature block corresponds one-to-one in position and size to a coding unit, that is, the divided feature blocks CU_1, CU_2, ..., CU_n are consistent with the encoding, and the correlation between these feature blocks, established through the self-attention mechanism in the Transformer, contains rich global information.
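  • A sketch of step S130 under the hypothetical CUInfo records above (not the patent's implementation): one feature block is cropped per coding unit so positions and sizes match the encoder's division.

```python
import numpy as np

def divide_into_feature_blocks(image: np.ndarray, cu_list) -> list:
    """Crop one feature block CU_i per coding unit; each block's position
    and size match the CU used when the original image was encoded."""
    return [image[cu.y:cu.y + cu.height, cu.x:cu.x + cu.width]
            for cu in cu_list]

# usage sketch: a decoded 32x64 luma frame divided along the CU grid
decoded = np.zeros((32, 64), dtype=np.float32)
blocks = divide_into_feature_blocks(decoded, cu_list)  # CU_1, ..., CU_n
```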
  • before the connections between the multiple feature blocks are established through the self-attention mechanism of the Transformer module in step S140, the method further includes, but is not limited to, step S210 and step S220:
  • Step S210 Flatten multiple feature blocks into multiple first feature data to obtain a first feature sequence, wherein the first feature data is represented by a one-dimensional vector;
  • Step S220 Input the first feature sequence to the Transformer module.
  • the divided feature blocks are two-dimensional vector data, and the multiple feature blocks need to be flattened into first feature data represented by one-dimensional vectors; the first feature sequence is composed of the multiple first feature data. This makes it convenient to input the first feature sequence into the Transformer module for image enhancement processing, where the correlation between the different one-dimensional data is learned through the Transformer's self-attention mechanism, thereby establishing global information.
  • each of CU_1, CU_2, ..., CU_n is flattened into the corresponding CU_f1, CU_f2, ..., CU_fn to obtain a one-dimensional data sequence [CU_f1, CU_f2, ..., CU_fn]; since the sizes of the divided feature blocks may differ, the lengths of the first feature data may also be inconsistent.
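  • A hedged sketch of this flattening (steps S210/S220); because CU sizes differ, the vectors CU_f1, ..., CU_fn have different lengths, so the original shapes are kept for later restoration:

```python
def flatten_blocks(blocks):
    """Flatten each 2-D feature block CU_i into a 1-D vector CU_fi,
    keeping the original shapes so the blocks can be restored later."""
    shapes = [b.shape for b in blocks]
    first_feature_sequence = [b.reshape(-1) for b in blocks]
    return first_feature_sequence, shapes
```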
  • in step S140, establishing the connections between the multiple feature blocks through the self-attention mechanism of the Transformer module includes but is not limited to step S310, step S320, step S330, step S340 and step S350:
  • Step S310 According to the first feature data and the first preset matrix, obtain a second feature sequence composed of a plurality of second feature data with the same length;
  • Step S320 Establish the correlation between the multiple second feature data through the self-attention mechanism of the Transformer module, and obtain a third feature sequence through residual connection and transformation processing, wherein the third feature sequence is composed of multiple third feature data;
  • Step S330 According to the third feature data and the second preset matrix, a fourth feature sequence composed of a plurality of fourth feature data is obtained, wherein the fourth feature data is represented by a one-dimensional vector;
  • Step S340 Restore the fourth feature data into feature blocks represented by two-dimensional vectors;
  • Step S350 Obtain the first output image according to the plurality of feature blocks.
  • when the first feature sequence [CU_f1, CU_f2, ..., CU_fn] is input to the Transformer module, since the lengths of the first feature data may be inconsistent, the multiple first feature data are all converted to the same length. Each first feature data is expressed in the form of a row matrix and multiplied by the corresponding first preset matrix; the number of rows of the first preset matrix equals the number of columns of the first feature data, and the number of columns of the first preset matrix is a preset length, so that a plurality of second feature data of the same length are obtained, forming the second feature sequence.
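  • A hedged sketch of this length-unification step: each first feature vector (a 1 × len_fi row matrix) is multiplied by a learned len_fi × d_model first preset matrix. The common length d_model and the learnable matrices are assumptions for illustration, not values from the patent:

```python
import torch
import torch.nn as nn

class FirstPresetProjection(nn.Module):
    """Maps each variable-length first feature vector (1 x len_fi) to a
    fixed-length second feature vector (1 x d_model): the rows of each
    preset matrix match the columns of the feature data, and its columns
    equal the preset length d_model."""
    def __init__(self, lengths, d_model=256):
        super().__init__()
        self.proj = nn.ParameterList(
            [nn.Parameter(torch.randn(L, d_model) / L ** 0.5) for L in lengths])

    def forward(self, first_feature_sequence):
        rows = [f.unsqueeze(0) @ W
                for f, W in zip(first_feature_sequence, self.proj)]
        return torch.cat(rows, dim=0)  # second feature sequence: n x d_model
```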
  • n is the number of divided feature blocks, which is generated optimally according to different coding objects.
  • the second preset matrices are a series of matrices of sizes d_model × len_f1, d_model × len_f2, ..., d_model × len_fn, so that a plurality of fourth feature data with inconsistent lengths are obtained, restored to the original lengths, and form a one-dimensional data sequence [CU_p1, CU_p2, ..., CU_pn], that is, the fourth feature sequence.
  • the feature blocks represented by two-dimensional vectors are then restored, that is, restored to their original sizes.
  • multiple feature blocks are combined into a complete image, so that the obtained first output image FM_p has the same size as the original image.
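  • The inverse path (steps S330–S350) might be sketched as follows, again under the assumptions above: the fourth feature vectors CU_p1, ..., CU_pn are reshaped back into 2-D blocks and written to their recorded CU positions so that FM_p matches the original image size.

```python
import numpy as np

def restore_and_stitch(fourth_feature_sequence, shapes, cu_list, out_shape):
    """Reshape each 1-D fourth feature vector CU_pi back into its 2-D
    block and place it at its CU position, yielding the first output
    image FM_p with the same size as the original image."""
    out = np.zeros(out_shape, dtype=np.float32)
    for vec, shape, cu in zip(fourth_feature_sequence, shapes, cu_list):
        out[cu.y:cu.y + cu.height,
            cu.x:cu.x + cu.width] = np.asarray(vec).reshape(shape)
    return out
```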
  • in step S320, establishing the correlation between the multiple second feature data through the self-attention mechanism of the Transformer module includes but is not limited to step S410 and step S420:
  • Step S410 Obtain the second position information and second size information of each feature block during division;
  • Step S420 According to the second position information and the second size information, establish the correlation between multiple second feature data through the self-attention mechanism of the Transformer module.
  • the second position information and the second size information correspond to the first position information and the first size information respectively. By obtaining the second position information and the second size information of each feature block, the spatial information of the feature block can be clearly determined; the correlation between the multiple second feature data is established through the self-attention mechanism of the Transformer module, and combining the second position information and the second size information facilitates information interaction between adjacent feature blocks.
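  • The patent does not specify how the second position and size information enters the self-attention computation; one plausible sketch (an assumption, not the patent's method) adds a learned embedding of each block's (x, y, width, height) to its token before attention:

```python
import torch
import torch.nn as nn

class PosSizeEmbedding(nn.Module):
    """Injects each feature block's second position/size information by
    adding a learned embedding of (x, y, width, height) to its token."""
    def __init__(self, d_model=256):
        super().__init__()
        self.fc = nn.Linear(4, d_model)

    def forward(self, tokens, cu_list):
        geo = torch.tensor([[cu.x, cu.y, cu.width, cu.height]
                            for cu in cu_list], dtype=torch.float32)
        return tokens + self.fc(geo)  # tokens: n x d_model
```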
  • in step S350, obtaining the first output image according to the plurality of feature blocks includes:
  • the plurality of feature blocks are stitched into a first output image according to the second position information.
  • in this way, the position representation of the feature blocks in the two-dimensional space can be enhanced, which helps greatly improve the image processing efficiency.
  • in an embodiment, the details of the image are reconstructed through a Resblock convolutional network structure, thereby enhancing the useful information in the image.
  • the ResNet50 structure can be used to enhance the details of the first output image, improving the image quality and the visual effect of the image. It should be noted that other convolution structures may also be used, which are not specifically limited in this embodiment of the present application.
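  • As one hedged sketch of the Resblock structure mentioned here (the channel count and kernel size are illustrative assumptions, not taken from the patent):

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Minimal residual block for the detail-enhancement stage: the skip
    connection lets the network refine details on top of the input."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1))

    def forward(self, x):
        return x + self.body(x)
```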
  • in an embodiment in which the image to be processed is a video image, the image processing method includes but is not limited to the following steps:
  • Step S510 Obtain the image to be processed obtained after decoding the original video image
  • Step S520 Obtain coding unit division information of the original image during encoding, the coding unit division information includes first position information and first size information of each coding unit;
  • Step S530 Divide the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units;
  • Step S540 Flatten multiple feature blocks into corresponding first feature data to obtain a first feature sequence, wherein the first feature data is represented by a one-dimensional vector, and input the first feature sequence to the Transformer module;
  • Step S550 Multiply the first feature sequence by the plurality of corresponding first preset matrices to obtain a second feature sequence composed of a plurality of second feature data of the same length;
  • Step S560 Obtain the second position information and second size information of each feature block during division
  • Step S570 according to the second position information and the second size information, establish the correlation between multiple second feature data through the self-attention mechanism of the Transformer module;
  • Step S580 Obtain a third feature sequence through residual connection and normalization followed by nonlinear transformation processing, wherein the third feature sequence is composed of a plurality of third feature data;
  • Step S590 Multiply the third feature sequence by the plurality of corresponding second preset matrices to obtain a fourth feature sequence composed of a plurality of fourth feature data, wherein the fourth feature data are represented by one-dimensional vectors;
  • Step S5100 Restore the fourth feature data into a feature block represented by a two-dimensional vector
  • Step S5110 splicing a plurality of feature blocks into a first output image according to the second position information
  • Step S5120 Perform detail enhancement processing on the first output image to obtain a second output image.
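  • Tying steps S510–S5120 together, a highly simplified forward pass might be wired as below. Every module name is one of the hypothetical sketches from this section (nothing here is the patent's actual implementation), and the second preset matrices are passed in as `unembed`, one d_model × len_fi matrix per block:

```python
import torch

def enhance_frame(decoded_frame, cu_list, embed, pos_embed, encoder,
                  unembed, reconstruct):
    """decoded_frame: H x W float tensor; returns the second output image."""
    blocks = [decoded_frame[cu.y:cu.y + cu.height, cu.x:cu.x + cu.width]
              for cu in cu_list]                              # S530
    flat = [b.reshape(-1) for b in blocks]                    # S540
    tokens = embed(flat)                                      # S550: n x d_model
    tokens = pos_embed(tokens, cu_list)                       # S560
    tokens = encoder(tokens.unsqueeze(0)).squeeze(0)          # S570/S580
    out = torch.zeros_like(decoded_frame)
    for i, cu in enumerate(cu_list):                          # S590-S5110
        vec = tokens[i] @ unembed[i]   # second preset matrix: d_model x len_fi
        out[cu.y:cu.y + cu.height,
            cu.x:cu.x + cu.width] = vec.reshape(blocks[i].shape)
    return reconstruct(out)                                   # S5120
```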
  • the embodiment of the second aspect of the present application provides an image processing device, including a division module 110 and a Transformer module 130.
  • the division module 110 is configured to obtain the image to be processed, obtained after the original image is decoded, and to obtain the coding unit division information of the original image during encoding, the coding unit division information including the first position information and the first size information of each coding unit, and to divide the image to be processed according to the first position information and the first size information to obtain multiple feature blocks corresponding to the coding units;
  • the Transformer module 130 is configured to establish a connection between multiple feature blocks through a self-attention mechanism to obtain a first output image corresponding to the original image.
  • in the embodiment of the present application, the division module 110 obtains the coding unit division information of the original image during encoding. The coding unit division information includes the first position information and the first size information of each coding unit, and the division module divides the image to be processed into multiple feature blocks according to this information, making full use of the local coding information so that the divided feature blocks correspond to the coding units. The self-attention mechanism of the Transformer module 130 is then used to establish the connections between the feature blocks, that is, to establish global information. Through the interaction of local information and global information, the differences between adjacent feature blocks can be better removed and the transitions between blocks made smoother, so as to better enhance the picture quality of the image processed by encoding and decoding.
  • in an embodiment, dividing the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units includes:
  • dividing the image to be processed into a plurality of feature blocks according to the first position information and the first size information, so that the positions and sizes of the feature blocks are the same as those of the coding units into which the original image was divided during encoding.
  • in an embodiment, a linear mapping module 120 is also included; the linear mapping module 120 is configured to flatten the plurality of feature blocks into a plurality of first feature data to obtain a first feature sequence and input it to the Transformer module 130, wherein the first feature data are represented by one-dimensional vectors.
  • in an embodiment, a reconstruction module 140 is also included; the reconstruction module 140 is configured to perform detail enhancement processing on the first output image to obtain a second output image.
  • the division module 110 divides the input image to be processed according to the coding unit division information to obtain a plurality of feature blocks CU_1, CU_2, ..., CU_n. The linear mapping module 120 then flattens each of CU_1, CU_2, ..., CU_n into the corresponding one-dimensional data sequence [CU_f1, CU_f2, ..., CU_fn], and this one-dimensional data sequence, that is, the first feature sequence, is input into the Transformer module 130. The correlation between the different one-dimensional data is learned through the self-attention mechanism of the Transformer module 130 to obtain the first output image corresponding to the original image, after which the reconstruction module 140 performs detail enhancement processing on the first output image to obtain a second output image.
  • the Transformer module 130 comprises an embedding layer (Embedding) 131, a plurality of encoding blocks (Encoder) 132 and a stitching layer (Jigsaw Puzzle) 133; the encoding blocks 132 are stacked on top of each other, with N representing the number of stacks.
  • the encoding block 132 includes, sequentially adjacent, a self-attention mechanism layer (Self-attention), a summation and normalization layer (Add&Norm), a feed-forward network layer (Feed-forward) and another summation and normalization layer.
  • the embedding layer 131 is used to obtain, according to the first feature data and the first preset matrices, a second feature sequence composed of a plurality of second feature data of the same length. The self-attention mechanism layer is used to establish the correlation among the multiple second feature data; the output data of the self-attention mechanism layer is processed through the summation and normalization layer, and the third feature sequence is obtained through the nonlinear transformation of the feed-forward network layer, wherein the third feature sequence is composed of multiple third feature data; the third feature sequence is then input into the second summation and normalization layer for processing, and finally the output of the encoding block 132 is input to the stitching layer 133. The stitching layer 133 is used to obtain, according to the third feature data and the second preset matrices, a fourth feature sequence composed of a plurality of fourth feature data, wherein the fourth feature data are represented by one-dimensional vectors, to restore the fourth feature data into feature blocks represented by two-dimensional vectors, and to obtain a first output image according to the multiple feature blocks.
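  • The encoding block 132 as described (self-attention, add & norm, feed-forward, add & norm) corresponds to a standard Transformer encoder layer; a sketch follows, with the head count and layer widths as assumptions:

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Self-attention -> add & norm -> feed-forward -> add & norm,
    mirroring the described encoding block 132; N such blocks are
    stacked. Head count and widths are illustrative assumptions."""
    def __init__(self, d_model=256, n_heads=8, d_ff=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                 # x: batch x n x d_model
        a, _ = self.attn(x, x, x)         # correlation between CU tokens
        x = self.norm1(x + a)             # residual connection + normalization
        return self.norm2(x + self.ffn(x))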
  • in an embodiment, establishing the correlation between the multiple second feature data includes: obtaining the second position information and second size information of each feature block during division, and establishing the correlation between the multiple second feature data through the self-attention mechanism of the Transformer module 130 according to the second position information and the second size information.
  • the second position information and the second size information correspond to the first position information and the first size information respectively, and by obtaining the second position information and the second size information of each feature block, the spatial information of the feature block can be clearly determined.
  • obtaining the first output image according to the multiple feature blocks includes: stitching the multiple feature blocks into the first output image according to the second position information.
  • the above image processing device can be deployed in image processing equipment, which may be a mobile terminal such as a smart phone, a tablet computer or a camera, or a device capable of processing image data such as a desktop computer, a robot or a server.
  • the embodiment of the third aspect of the present application provides a model training method, the model including a Transformer module; the model training method includes but is not limited to step S610, step S620, step S630, step S640 and step S650:
  • Step S610 Obtain an image to be processed, which is a training sample in the constructed training set, wherein the image to be processed is obtained by decoding the original image;
  • Step S620 Obtain coding unit division information of the original image during encoding, where the coding unit division information includes first position information and first size information of each coding unit;
  • Step S630 Input the image to be processed and the coding unit division information into the model, divide the image to be processed according to the first position information and the first size information, and obtain a plurality of feature blocks corresponding to the coding units;
  • Step S640 Establish the connections between the multiple feature blocks through the self-attention mechanism of the Transformer module, and obtain the first output image corresponding to the original image;
  • Step S650 Train the model according to the first output image, the original image and the objective function to obtain a trained model.
  • in the embodiment of the present application, the coding unit division information includes the first position information and the first size information of each coding unit, and the image to be processed is divided into multiple feature blocks according to the coding unit division information; that is, training blocks are extracted in combination with the coding unit division information, making full use of the local coding information so that the divided feature blocks correspond to the coding units. The self-attention mechanism of the Transformer module is then used to establish the connections between the feature blocks, that is, to establish global information, so as to obtain the first output image of the training sample, and the model is trained according to the first output image and the objective function to obtain the trained model. Through the interaction of local information and global information, the differences between adjacent feature blocks can be better removed and the transitions between blocks made smoother, so that the trained model can better enhance the picture quality of images.
  • training continues until the objective function converges, so that the first output image output by the model is as close as possible to the target image, continuously improving the model's ability to generate the target image.
  • corresponding training sets and objective functions can be designed to train the model, so as to obtain models suitable for different image enhancement tasks. For example, the model can be trained on a training set composed of low-resolution image samples and corresponding high-resolution image samples to obtain an image enhancement model applicable to super-resolution tasks, or on a training set composed of blurred image samples and corresponding clear image samples to obtain an image enhancement model applicable to deblurring tasks.
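  • A hedged training-loop sketch for this step: the patent only speaks of "an objective function", so the L1 loss and Adam optimizer below are illustrative assumptions, and `loader` is a hypothetical iterator over (decoded image, CU division info, target image) triples:

```python
import torch
import torch.nn as nn

def train_model(model, loader, epochs=100, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()  # objective function: an assumption for illustration
    for _ in range(epochs):
        for degraded, cu_info, target in loader:
            output = model(degraded, cu_info)  # first/second output image
            loss = loss_fn(output, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```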
  • the trained model can be deployed on devices, for example, on mobile terminals such as smartphones, laptops and cameras, or on devices capable of processing image data such as desktop computers, robots and servers.
  • in step S630, dividing the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units includes:
  • dividing the image to be processed into a plurality of feature blocks according to the first position information and the first size information, so that the positions and sizes of the feature blocks are the same as those of the coding units into which the original image was divided during encoding.
  • the image to be processed is divided according to the first position information and the first size information, effectively using the local coding information to obtain a plurality of feature blocks corresponding to the coding units into which the original image was divided during encoding. The positions of the feature blocks correspond one-to-one to the coding units and their sizes are consistent; that is, the divided feature blocks CU_1, CU_2, ..., CU_n are consistent with the encoding, and the correlation between these feature blocks, established through the self-attention mechanism in the Transformer, contains rich global information.
  • in an embodiment, in step S650, training the model according to the first output image and the objective function to obtain the trained model includes: performing detail enhancement processing on the first output image to obtain a second output image, and training the model according to the second output image and the objective function to obtain the trained model.
  • in another embodiment, the model training method includes but is not limited to the following steps:
  • Step S710 Obtain an image to be processed, which is a training sample in the constructed training set, wherein the image to be processed is obtained by decoding the original image;
  • Step S720 Obtain coding unit division information of the original image during encoding, where the coding unit division information includes first position information and first size information of each coding unit;
  • Step S730 Input the image to be processed and the coding unit division information into the model, divide the image to be processed according to the first position information and the first size information, and obtain a plurality of feature blocks corresponding to the coding units;
  • Step S740 Flatten multiple feature blocks into multiple first feature data to obtain a first feature sequence, wherein the first feature data is represented by a one-dimensional vector, and input the first feature sequence to the Transformer module;
  • Step S750 Establish the connections between the multiple feature blocks through the self-attention mechanism of the Transformer module to obtain a first output image corresponding to the original image;
  • Step S760 Perform detail enhancement processing on the first output image to obtain a second output image;
  • Step S770 Train the model according to the second output image and the objective function to obtain a trained model.
  • in step S640, establishing the connections between the multiple feature blocks through the self-attention mechanism of the Transformer module includes the following steps:
  • according to the first feature data and the first preset matrix, a second feature sequence consisting of a plurality of second feature data with the same length is obtained;
  • the correlation between the multiple second feature data is established through the self-attention mechanism of the Transformer module, and a third feature sequence composed of multiple third feature data is obtained through residual connection and transformation processing;
  • according to the third feature data and the second preset matrix, a fourth feature sequence composed of a plurality of fourth feature data is obtained, wherein the fourth feature data are represented by one-dimensional vectors;
  • the fourth feature data are restored into feature blocks represented by two-dimensional vectors, and a first output image is obtained according to the plurality of feature blocks.
  • for the specific implementation of establishing the connections between the multiple feature blocks through the self-attention mechanism of the Transformer module in step S640 and the corresponding technical effects, reference may be made to the implementation corresponding to Fig. 3 in the image processing method described above.
  • in another embodiment, the model training method includes but is not limited to the following steps:
  • Step S810 Obtain an image to be processed, which is a training sample in the constructed training set, wherein the image to be processed is obtained by decoding the original image;
  • Step S820 Obtain coding unit division information of the original image during encoding, where the coding unit division information includes first position information and first size information of each coding unit;
  • Step S830 Input the image to be processed and the coding unit division information into the model, divide the image to be processed according to the first position information and the first size information, and obtain a plurality of feature blocks corresponding to the coding units;
  • Step S840 Flatten multiple feature blocks into multiple first feature data to obtain a first feature sequence, wherein the first feature data is represented by a one-dimensional vector, and input the first feature sequence to the Transformer module;
  • Step S850 According to the first feature data and the first preset matrix, obtain a second feature sequence composed of a plurality of second feature data with the same length;
  • Step S860 Establish the correlation between the multiple second feature data through the self-attention mechanism of the Transformer module, and obtain a third feature sequence through residual connection and transformation processing, wherein the third feature sequence is composed of multiple third feature data;
  • Step S870 According to the third feature data and the second preset matrix, a fourth feature sequence composed of a plurality of fourth feature data is obtained, wherein the fourth feature data is represented by a one-dimensional vector;
  • Step S880 Restore the fourth feature data into feature blocks represented by two-dimensional vectors;
  • Step S890 Obtain the first output image according to the plurality of feature blocks;
  • Step S8100 Perform detail enhancement processing on the first output image to obtain a second output image;
  • Step S8110 Train the model according to the second output image and the objective function to obtain a trained model.
  • in step S860, establishing the correlation between the plurality of second feature data through the self-attention mechanism of the Transformer module includes the following steps:
  • the second position information and the second size information of each feature block during division are obtained, and according to the second position information and the second size information, the correlation between the multiple second feature data is established through the self-attention mechanism of the Transformer module.
  • in step S890, obtaining the first output image according to the plurality of feature blocks includes:
  • the plurality of feature blocks are stitched into a first output image according to the second position information.
  • the second position information and the second size information correspond to the first position information and the first size information respectively, and by obtaining the second position information and the second size information of each feature block, the spatial information of the feature block can be clearly determined.
  • in this way, the position representation of the feature blocks in the two-dimensional space can be enhanced, which helps greatly improve the image processing efficiency.
  • Step S910 Determine, according to the preset standard, whether the trained model meets the standard, and obtain a test result;
  • Step S920 If the test result meets the standard, save the parameters of the model and complete the training;
  • Step S930 If the test result does not meet the standard, continue to train the model.
  • the preset standard can be used to determine whether the trained model is up to standard; the test results provide effective reference data, and whether the model is up to standard can be judged based on network performance.
  • the preset standard can be subjective quality or objective indicators; for example, objective indicators such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) can be used. If the test result does not meet the standard, training continues; if the test result meets the standard, the trained model parameters are saved, and image quality enhancement can then be performed directly with this model.
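  • For the objective indicators named here, PSNR can be computed directly from the mean squared error; the pass threshold below is an illustrative assumption, not a value from the patent:

```python
import numpy as np

def psnr(out: np.ndarray, ref: np.ndarray, peak: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between output and reference."""
    mse = np.mean((out.astype(np.float64) - ref.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# e.g. save the trained parameters only once the test result meets the
# preset standard, such as: psnr(test_output, ground_truth) >= 38.0
```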
  • the embodiment of the fourth aspect of the present application provides an image processing device, which includes: a memory 1210, a control processor 1220, and a computer program stored in the memory 1210 and operable on the control processor 1220 .
  • control processor 1220 and the memory 1210 may be connected through a bus or in other ways.
  • the non-transitory software programs and instructions required to realize the image processing method of the above embodiments are stored in the memory 1210; when executed by the control processor 1220, the image processing method in the above embodiments is performed, for example, method steps S110 to S140 in Fig. 1, method steps S210 and S220 in Fig. 2, method steps S310 to S350 in Fig. 3, method steps S410 and S420 in Fig. 4, and method steps S510 to S5120 in Fig. 5 described above.
  • the embodiment of the fifth aspect of the present application provides a training device, which includes: a memory 1310, a control processor 1320, and a computer program stored on the memory 1310 and operable on the control processor 1320 .
  • control processor 1320 and the memory 1310 may be connected through a bus or in other ways.
  • the non-transitory software programs and instructions required to realize the model training method of the above embodiments are stored in the memory 1310; when executed by the control processor 1320, the model training method in the above embodiments is performed, for example, method steps S610 to S650 in Fig. 8, method steps S710 to S770 in Fig. 9, method steps S810 to S8110 in Fig. 10, and method steps S910 to S930 in Fig. 11 described above.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the embodiment of the sixth aspect of the present application provides a computer-readable storage medium storing computer-executable instructions, and the computer-executable instructions can be used to cause a computer to execute the image processing method of the first aspect above or the model training method of the third aspect above, for example, to execute method steps S110 to S140 in Fig. 1, method steps S210 and S220 in Fig. 2, method steps S310 to S350 in Fig. 3, method steps S410 and S420 in Fig. 4, and method steps S510 to S5120 in Fig. 5 described above, or to execute method steps S610 to S650 in Fig. 8, method steps S710 to S770 in Fig. 9, method steps S810 to S8110 in Fig. 10, and method steps S910 to S930 in Fig. 11.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An image processing method, an image processing apparatus, and a model training method. The image processing method comprises: obtaining an image to be processed, said image being obtained by decoding an original image (S110); obtaining encoding unit division information of the original image during encoding, the encoding unit division information comprising first position information and first size information of each encoding unit (S120); dividing said image according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the encoding unit (S130); and establishing a relationship between the plurality of feature blocks by means of a self-attention mechanism of a Transformer module to obtain a first output image corresponding to the original image (S140).

Description

Image processing method, image processing device, and model training method
Cross-Reference to Related Applications
This application is based on, and claims priority to, the Chinese patent application with application number 202111144470.1 filed on September 28, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the technical field of image processing, and in particular to an image processing method, an image processing device, and a model training method.
Background
With the continuous development of technology, the demand for image quality keeps rising. When the amount of data is too large, factors such as network bandwidth or storage space make transmission or storage difficult; uncompressed digital video, for example, involves enormous amounts of data. The original data therefore needs to be encoded and compressed during transmission or storage to remove spatial and temporal redundancy; the compressed data is transmitted from the encoding end to the decoding end through the transmission system, where decoding restores the original data. To improve coding quality, an image usually adopts a quadtree block partition structure based on coding units (Coding Unit, CU): the optimal coding units are divided at minimum rate-distortion cost and each coding unit is encoded separately, i.e., block-based coding is used. As the bit rate decreases, quantization becomes coarse and discontinuities appear at block boundaries, forming obvious defects in the reconstructed image known as blocking artifacts. Because of these visible differences between adjacent blocks, the original video is distorted after encoding and decoding, resulting in a poor user experience.
In some cases, image quality enhancement algorithms such as histogram equalization or gamma correction can optimize the codec results, but such algorithms mainly rely on manually summarized experience and characteristics of the human eye, are largely constrained by the image scene, and thus have limited ability to improve image quality. Convolutional neural networks in deep learning have also been used for quality enhancement, but convolution extracts features through local receptive fields, which to a certain extent ignores the correlation between blocks, so the image quality is still difficult to guarantee.
Summary of the Invention
In view of this, the present application proposes an image processing method, an image processing device, a model training method, a training device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present application provides an image processing method, the method comprising: acquiring an image to be processed, the image to be processed being obtained by decoding an original image; acquiring coding unit division information of the original image during encoding, the coding unit division information including first position information and first size information of each coding unit; dividing the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units; and establishing, through a self-attention mechanism of a Transformer module, connections between the plurality of feature blocks to obtain a first output image corresponding to the original image.
In a second aspect, an embodiment of the present application provides an image processing device, including a division module and a Transformer module. The division module is configured to obtain an image to be processed, obtained by decoding an original image, to obtain coding unit division information of the original image during encoding, the coding unit division information including first position information and first size information of each coding unit, and to divide the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units. The Transformer module is configured to establish connections between the plurality of feature blocks through a self-attention mechanism to obtain a first output image corresponding to the original image.
In a third aspect, an embodiment of the present application provides a model training method, the model including a Transformer module, and the method comprising: acquiring an image to be processed, the image to be processed being a training sample in a constructed training set and obtained by decoding an original image; acquiring coding unit division information of the original image during encoding, the coding unit division information including first position information and first size information of each coding unit; inputting the image to be processed and the coding unit division information into the model, and dividing the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units; establishing, through the self-attention mechanism of the Transformer module, connections between the plurality of feature blocks to obtain a first output image corresponding to the original image; and training the model according to the first output image and an objective function to obtain a trained model.
In a fourth aspect, an embodiment of the present application provides an image processing device, including at least one control processor and a memory communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor, and the instructions are executed by the at least one control processor so that the at least one control processor can execute the image processing method described in the first aspect above.
In a fifth aspect, an embodiment of the present application provides a training device, including at least one control processor and a memory communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor, and the instructions are executed by the at least one control processor so that the at least one control processor can execute the model training method described in the third aspect above.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to cause a computer to execute the image processing method described in the first aspect above or the model training method described in the third aspect above.
Additional features and advantages of the present application will be set forth in the description that follows and will in part be apparent from the description or learned by practice of the application. The objectives and other advantages of the application are realized and attained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.
Description of the drawings
The accompanying drawings provide a further understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments of the present application, they serve to explain the technical solution and do not limit it.
The present application is further described below with reference to the accompanying drawings and embodiments.
Fig. 1 is a flowchart of the steps of an image processing method provided by an embodiment of the present application;
Fig. 2 is a flowchart of the steps of an image processing method provided by another embodiment of the present application;
Fig. 3 is a flowchart of the steps of an image processing method provided by another embodiment of the present application;
Fig. 4 is a flowchart of the steps of an image processing method provided by another embodiment of the present application;
Fig. 5 is a flowchart of the steps of an image processing method provided by another embodiment of the present application;
Fig. 6 is a schematic structural diagram of an image processing apparatus provided by another embodiment of the present application;
Fig. 7 is a schematic structural diagram of a Transformer module provided by another embodiment of the present application;
Fig. 8 is a flowchart of the steps of a model training method provided by another embodiment of the present application;
Fig. 9 is a flowchart of the steps of a model training method provided by another embodiment of the present application;
Fig. 10 is a flowchart of the steps of a model training method provided by another embodiment of the present application;
Fig. 11 is a flowchart of the steps of a model training method provided by another embodiment of the present application;
Fig. 12 is a schematic structural diagram of an image processing device provided by another embodiment of the present application;
Fig. 13 is a schematic structural diagram of a training device provided by another embodiment of the present application.
Detailed description
This part describes the embodiments of the present application in detail. Several embodiments of the application are shown in the accompanying drawings, whose purpose is to supplement the textual description graphically so that each technical feature and the overall technical solution can be understood intuitively and vividly; they are not to be construed as limiting the protection scope of the present application.
In the description of the present application, terms such as "first" and "second" serve only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or their order. It should be noted that, although functional modules are divided in the device diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that in the device, or in an order different from that in the flowcharts.
In the description of the present application, unless otherwise explicitly defined, words such as "set", "install", and "connect" are to be understood broadly, and those skilled in the art can reasonably determine their specific meanings in the present application in light of the specific content of the technical solution.
The embodiments of the present application are further described below with reference to the accompanying drawings.
Referring to Fig. 1, an embodiment of the first aspect of the present application provides an image processing method including, but not limited to, steps S110 to S140:
Step S110: acquire an image to be processed, the image to be processed being obtained by decoding an original image.
It should be noted that, to improve the transmission efficiency and storage reliability of image data, an original image generally undergoes encoding and decoding. Since encoding and decoding cause a certain loss to the original image, picture quality is affected and the user experience suffers, so the image obtained after decoding needs quality enhancement. Picture quality is related to the degree of loss: if image information is lost during encoding, picture quality drops accordingly. It can be understood that the image to be processed may be a text image or a video image.
Step S120: acquire coding unit division information used when the original image was encoded, the coding unit division information including first position information and first size information of each coding unit.
It should be noted that, to better improve coding quality, a quadtree block partition structure based on coding units is adopted: optimal coding units are divided at minimum rate-distortion cost and each coding unit is encoded separately, which flexibly adapts to the texture features of various images and significantly improves coding efficiency. The divided coding units support different sizes; larger coding units greatly improve the coding efficiency of flat areas, while smaller coding units handle local image detail well, making the prediction of complex images more accurate. By acquiring the coding unit division information used when the original image was encoded, the first position information and first size information of each coding unit are obtained, so the spatial layout of the coding units is known exactly.
Step S130: divide the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units.
Because the original image exhibits blocking artifacts after encoding and decoding, the picture quality of the output image is degraded. In some cases a convolutional neural network is used for quality enhancement, but convolution extracts features through local receptive fields, which to a certain extent ignores the correlation between blocks; such networks generally enhance quality on fixed feature blocks, so the final enhanced image falls short of expectations. In contrast, the embodiments of the present application extract local features of the image to be processed in combination with the coding unit division information, that is, according to the first position information and the first size information, thereby obtaining a plurality of feature blocks corresponding to the coding units. It can be understood that the way the feature blocks are divided is determined by the coding unit division information, so that each feature block corresponds to a coding unit, and feature blocks divided using position and size information are more strongly correlated.
Step S140: establish connections among the plurality of feature blocks through the self-attention mechanism of the Transformer module to obtain a first output image corresponding to the original image.
It should be noted that the self-attention mechanism of the Transformer module used in natural language processing tasks can effectively overcome the limitations of convolutional inductive bias and take global information into account. Therefore, to learn and reason about non-local components, the embodiments of the present application establish connections among the feature blocks through the Transformer module. Since the feature blocks are divided according to the coding unit division information, the self-attention mechanism can capture the long-range dependencies among coding units and learn the correlations between different feature blocks to build global information, so that the resulting global information better conforms to the rules used during encoding, greatly reducing the differences between adjacent blocks and making block-to-block transitions smoother.
According to the solution provided by the embodiments of the present application, the coding unit division information used when the original image was encoded is acquired, including the first position information and first size information of each coding unit; the image to be processed is divided into a plurality of feature blocks according to this information to make full use of the local coding information, with the divided feature blocks corresponding to the coding units; the self-attention mechanism of the Transformer module then establishes connections among the feature blocks, that is, builds global information. Through the interaction of local and global information, the differences between adjacent feature blocks can be better removed and block-to-block transitions become smoother, so the picture quality of the encoded and decoded image is better enhanced.
In the above image processing method, dividing the image to be processed according to the first position information and the first size information in step S130 to obtain a plurality of feature blocks corresponding to the coding units includes:
dividing the image to be processed into a plurality of feature blocks according to the first position information and the first size information, so that the position and size of each feature block are the same as those of a coding unit into which the original image was divided during encoding.
It should be noted that the coding unit division information includes the first position information and first size information of each coding unit. Dividing the image to be processed according to the first position information and the first size information makes effective use of the local coding information and yields a plurality of feature blocks corresponding to the coding units into which the original image was divided during encoding; each feature block corresponds one-to-one in position and matches in size with a coding unit, so the divided feature blocks CU_1, CU_2, ..., CU_n are consistent with those used during encoding. The correlations among these feature blocks are then established through the self-attention mechanism in the Transformer, which therefore contains rich global information.
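As a minimal sketch, the partition step can be written as follows, assuming the coding unit division information is available as a list of (x, y, w, h) tuples read from the bitstream; the function name and the data layout are illustrative assumptions, not part of the original disclosure:

```python
import torch

def partition_by_cu(image, cu_info):
    """Split a decoded image into feature blocks that mirror the encoder's
    CU partition. `image` is an (H, W) luma tensor; `cu_info` is a list of
    (x, y, w, h) tuples taken from the coding unit division information."""
    blocks = []
    for (x, y, w, h) in cu_info:
        # Each block takes exactly the position and size of one coding unit.
        blocks.append(image[y:y + h, x:x + w])
    return blocks
```

For example, a 64×64 area that the quadtree split into one 32×32 leaf and twelve 16×16 leaves yields thirteen blocks whose positions and sizes match the encoder's coding units exactly.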
As shown in Fig. 2, in the above image processing method, before the connections among the plurality of feature blocks are established through the self-attention mechanism of the Transformer module in step S140, the method further includes, but is not limited to, steps S210 and S220:
Step S210: flatten the plurality of feature blocks into a plurality of pieces of first feature data to obtain a first feature sequence, where the first feature data are represented as one-dimensional vectors.
Step S220: input the first feature sequence to the Transformer module.
It should be noted that the divided feature blocks are two-dimensional data; the feature blocks therefore need to be flattened into first feature data represented as one-dimensional vectors, and the pieces of first feature data form the first feature sequence, which is convenient to input to the Transformer module for image enhancement. The self-attention mechanism of the Transformer then learns the correlations between the different one-dimensional data and builds global information. Specifically, each of CU_1, CU_2, ..., CU_n is flattened into a corresponding CU_f1, CU_f2, ..., CU_fn to obtain the one-dimensional data sequence [CU_f1, CU_f2, ..., CU_fn]; since the divided feature blocks may differ in size, the first feature data may also differ in length.
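Continuing the sketch above, the flattening step is a simple reshape of each block tensor:

```python
def flatten_blocks(blocks):
    # Each 2-D block CU_i of size h×w becomes a 1-D vector CU_fi of length
    # h*w; blocks of different CU sizes give vectors of different lengths.
    return [block.reshape(-1) for block in blocks]
```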
As shown in Fig. 3, in the above image processing method, establishing the connections among the plurality of feature blocks through the self-attention mechanism of the Transformer module in step S140 includes, but is not limited to, steps S310 to S350:
Step S310: obtain, according to the first feature data and first preset matrices, a second feature sequence composed of a plurality of pieces of second feature data of the same length.
Step S320: establish correlations among the plurality of pieces of second feature data through the self-attention mechanism of the Transformer module, and obtain a third feature sequence through residual connection and transformation processing, where the third feature sequence is composed of a plurality of pieces of third feature data.
Step S330: obtain, according to the third feature data and second preset matrices, a fourth feature sequence composed of a plurality of pieces of fourth feature data, where the fourth feature data are represented as one-dimensional vectors.
Step S340: restore the fourth feature data into feature blocks represented as two-dimensional data.
Step S350: obtain the first output image from the plurality of feature blocks.
It should be noted that the first feature sequence [CU_f1, CU_f2, ..., CU_fn] is input to the Transformer module. Since the pieces of first feature data may differ in length, they are first converted to a common length: each piece of first feature data, expressed as a row vector, is multiplied by its corresponding first preset matrix, whose number of rows equals the length of that piece of first feature data and whose number of columns is a preset length, so that a plurality of pieces of second feature data of the same length are computed to form the second feature sequence. Specifically, the first preset matrices are a series of matrices of sizes len_f1×d_model, len_f2×d_model, ..., len_fn×d_model, where len_f1, len_f2, ..., len_fn correspond one-to-one to the lengths of CU_f1, CU_f2, ..., CU_fn, and d_model is a preset length that can be set according to actual needs; this embodiment takes d_model = 1024. Multiplying each piece of first feature data in [CU_f1, CU_f2, ..., CU_fn] by its corresponding first preset matrix unifies the lengths of the second feature data in the second feature sequence [CU_em_1, CU_em_2, ..., CU_em_n]. The self-attention mechanism of the Transformer module then performs information interaction among the different pieces of second feature data to obtain global information, and the third feature sequence [CU_en_1, CU_en_2, ..., CU_en_n] is obtained through residual connection and normalization steps followed by a nonlinear transformation. Note that n is the number of divided feature blocks, which is generated optimally for each encoded object.
To restore the third feature sequence to the original size of each feature block, the pieces of third feature data are first converted back to their original lengths by multiplying them with the corresponding second preset matrices, which are a series of matrices of sizes d_model×len_f1, d_model×len_f2, ..., d_model×len_fn. This yields a plurality of pieces of fourth feature data of differing (original) lengths, which form the one-dimensional data sequence [CU_p1, CU_p2, ..., CU_pn], that is, the fourth feature sequence. Then, using the property that a coding unit has size 2n×2n with n = 4, 8, 16, or 32, each piece is restored to a feature block represented as two-dimensional data, that is, to its original size. Finally, the feature blocks are assembled into a complete image, so the resulting first output image FM_p has the same size as the original image.
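Since the CU sizes come from the fixed 2n×2n family named above (flattened lengths 64, 256, 1024, and 4096 for 8×8 through 64×64 blocks), the first and second preset matrices can be held as one learnable pair per distinct length. The following is a hedged sketch of that idea; the class name and the choice of one shared matrix per length (rather than one per block) are assumptions for illustration:

```python
import torch
import torch.nn as nn

D_MODEL = 1024                    # the preset length d_model used in the text
CU_LENS = [64, 256, 1024, 4096]   # flattened lengths of 8x8 ... 64x64 CUs

class CuEmbedding(nn.Module):
    """One learnable first/second preset matrix pair per distinct CU length:
    len_fi x d_model to embed, d_model x len_fi to restore."""
    def __init__(self):
        super().__init__()
        self.embed = nn.ModuleDict(
            {str(l): nn.Linear(l, D_MODEL, bias=False) for l in CU_LENS})
        self.restore = nn.ModuleDict(
            {str(l): nn.Linear(D_MODEL, l, bias=False) for l in CU_LENS})

    def forward(self, flat_blocks):
        # CU_fi (length len_fi) -> CU_em_i (length d_model)
        return [self.embed[str(b.numel())](b) for b in flat_blocks]

    def invert(self, tokens, lengths):
        # CU_en_i (length d_model) -> CU_pi (original length len_fi)
        return [self.restore[str(l)](t) for t, l in zip(tokens, lengths)]
```

Using `bias=False` keeps each mapping a pure matrix multiplication, matching the description of multiplying the feature data by the preset matrices.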
As shown in Fig. 4, in the above image processing method, establishing the correlations among the plurality of pieces of second feature data through the self-attention mechanism of the Transformer module in step S320 includes, but is not limited to, steps S410 and S420:
Step S410: obtain second position information and second size information recorded when each feature block was divided.
Step S420: establish the correlations among the plurality of pieces of second feature data through the self-attention mechanism of the Transformer module according to the second position information and the second size information.
It should be noted that the second position information and second size information correspond to the first position information and first size information, respectively. Obtaining the second position information and second size information of each feature block makes the spatial layout of the feature blocks explicit; establishing the correlations among the pieces of second feature data through the self-attention mechanism of the Transformer module in combination with the second position information and second size information facilitates information interaction between adjacent feature blocks.
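The text does not specify how the position and size information enters the attention computation. One common way, shown here purely as an assumption, is an additive learned encoding of each block's (x, y, w, h) added to its token before self-attention:

```python
import torch
import torch.nn as nn

class CuPositionEncoding(nn.Module):
    """Projects each block's (x, y, w, h) into d_model dimensions and adds
    it to the block's token, so attention can see position and size."""
    def __init__(self, d_model=1024):
        super().__init__()
        self.proj = nn.Linear(4, d_model)

    def forward(self, tokens, cu_info):
        # tokens: (n, d_model); cu_info: list of n (x, y, w, h) tuples
        pos = torch.tensor(cu_info, dtype=tokens.dtype, device=tokens.device)
        return tokens + self.proj(pos)
```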
Obtaining the first output image from the plurality of feature blocks in step S350 includes:
stitching the plurality of feature blocks into the first output image according to the second position information.
Obtaining the second position information and stitching the feature blocks according to it strengthens the positional representation of the feature blocks in two-dimensional space, which helps greatly improve image processing efficiency.
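The stitching step is the inverse of the partition sketch given earlier: each restored two-dimensional block is written back at its recorded CU position.

```python
import torch

def stitch_blocks(blocks, cu_info, height, width):
    """Reassemble the first output image FM_p by writing each restored
    2-D block back at its recorded (x, y) position with size (w, h)."""
    out = torch.zeros(height, width)
    for block, (x, y, w, h) in zip(blocks, cu_info):
        out[y:y + h, x:x + w] = block
    return out
```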
The above image processing method further includes the following step:
performing detail enhancement processing on the first output image to obtain a second output image.
Image details are reconstructed through a Resblock convolutional network structure, enhancing the useful information in the image. Specifically, a ResNet50 structure may be used to perform detail enhancement on the first output image, improving image quality and the visual effect. It should be noted that other convolutional structures may also be used; the embodiments of the present application impose no specific limitation.
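The following is a minimal sketch of the Resblock idea, not the exact ResNet50 configuration (which uses bottleneck blocks); channel count and layer choices are illustrative assumptions:

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """A plain two-convolution residual block; the skip connection lets the
    block learn only the residual detail to add back to the image."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1))

    def forward(self, x):
        return x + self.body(x)
```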
As shown in Fig. 5, the technical solution of the present application is described below with a specific embodiment in which the image to be processed is a video image. The image processing method includes, but is not limited to, the following steps:
Step S510: acquire an image to be processed, obtained by decoding an original video image.
Step S520: acquire coding unit division information used when the original image was encoded, the coding unit division information including first position information and first size information of each coding unit.
Step S530: divide the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units.
Step S540: flatten the plurality of feature blocks into corresponding first feature data to obtain a first feature sequence, where the first feature data are represented as one-dimensional vectors, and input the first feature sequence to the Transformer module.
Step S550: multiply the first feature sequence by the corresponding first preset matrices to obtain a second feature sequence composed of a plurality of pieces of second feature data of the same length.
Step S560: obtain second position information and second size information recorded when each feature block was divided.
Step S570: establish correlations among the plurality of pieces of second feature data through the self-attention mechanism of the Transformer module according to the second position information and the second size information.
Step S580: obtain a third feature sequence through residual connection and normalization steps followed by a nonlinear transformation, where the third feature sequence is composed of a plurality of pieces of third feature data.
Step S590: multiply the third feature sequence by the corresponding second preset matrices to obtain a fourth feature sequence composed of a plurality of pieces of fourth feature data, where the fourth feature data are represented as one-dimensional vectors.
Step S5100: restore the fourth feature data into feature blocks represented as two-dimensional data.
Step S5110: stitch the plurality of feature blocks into a first output image according to the second position information.
Step S5120: perform detail enhancement processing on the first output image to obtain a second output image.
Referring to Fig. 6, an embodiment of the second aspect of the present application provides an image processing apparatus including a division module 110 and a Transformer module 130. The division module 110 is configured to obtain an image to be processed, obtained by decoding an original image, and to obtain coding unit division information used when the original image was encoded, the coding unit division information including first position information and first size information of each coding unit, and to divide the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units. The Transformer module 130 is configured to establish connections among the plurality of feature blocks through a self-attention mechanism to obtain a first output image corresponding to the original image.
According to the solution provided by the embodiments of the present application, the division module 110 obtains the coding unit division information used when the original image was encoded, including the first position information and first size information of each coding unit, and divides the image to be processed into a plurality of feature blocks according to this information to make full use of the local coding information, with the divided feature blocks corresponding to the coding units. The self-attention mechanism of the Transformer module 130 then establishes connections among the feature blocks, that is, builds global information. Through the interaction of local and global information, the differences between adjacent feature blocks can be better removed and block-to-block transitions become smoother, so the picture quality of the encoded and decoded image is better enhanced.
It should be noted that, for the specific implementations and corresponding technical effects of the image processing apparatus of this embodiment, reference may be made to the specific implementations and corresponding technical effects of the image processing method described above.
In the above image processing apparatus, dividing the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units includes:
dividing the image to be processed into a plurality of feature blocks according to the first position information and the first size information, so that the position and size of each feature block are the same as those of a coding unit into which the original image was divided during encoding.
As shown in Figs. 6 and 7, the above image processing apparatus further includes a linear mapping module 120, which is configured to flatten the plurality of feature blocks into a plurality of pieces of first feature data to obtain a first feature sequence and input it to the Transformer module 130, where the first feature data are represented as one-dimensional vectors.
The above image processing apparatus further includes a reconstruction module 140, which is configured to perform detail enhancement processing on the first output image to obtain a second output image.
Exemplarily, the division module 110 divides the input image to be processed according to the coding unit division information to obtain a plurality of feature blocks CU_1, CU_2, ..., CU_n; the linear mapping module 120 then flattens each of CU_1, CU_2, ..., CU_n into the corresponding one-dimensional data sequence [CU_f1, CU_f2, ..., CU_fn], that is, the first feature sequence, which is input to the Transformer module 130. The self-attention mechanism of the Transformer module 130 learns the correlations between the different one-dimensional data to obtain the first output image corresponding to the original image, and the reconstruction module 140 then performs detail enhancement processing on the first output image to obtain the second output image.
As shown in Figs. 6 and 7, in the above image processing apparatus the Transformer module 130 includes an embedding layer (Embedding) 131, a plurality of encoder blocks (Encoder) 132, and a stitching layer (Jigsaw Puzzle) 133. The encoder blocks 132 are stacked on one another, N denoting the number of stacked blocks. Each encoder block 132 includes, in sequence, a self-attention layer (Self-attention), an add-and-normalize layer (Add&Norm), a feed-forward network layer (Feed-forward), and another add-and-normalize layer.
The embedding layer 131 is used to obtain, according to the first feature data and the first preset matrices, a second feature sequence composed of a plurality of pieces of second feature data of the same length. The self-attention layer is used to establish the correlations among the pieces of second feature data; its output is processed by the add-and-normalize layer and then nonlinearly transformed by the feed-forward network layer to obtain a third feature sequence composed of a plurality of pieces of third feature data, which is processed by the second add-and-normalize layer. Finally, the output of the encoder blocks 132 is input to the stitching layer 133, which obtains, according to the third feature data and the second preset matrices, a fourth feature sequence composed of a plurality of pieces of fourth feature data represented as one-dimensional vectors, restores the fourth feature data into feature blocks represented as two-dimensional data, and obtains the first output image FM_p from the plurality of feature blocks.
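A hedged sketch of one encoder block in this layer order follows; the head count and feed-forward width are illustrative assumptions, since only d_model = 1024 is fixed by the text:

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One of the N stacked encoder blocks 132: self-attention, add & norm,
    feed-forward, add & norm, in the order described above."""
    def __init__(self, d_model=1024, n_heads=8, d_ff=4096):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # x: (batch, n_blocks, d_model) sequence of CU tokens
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)        # residual connection + Add&Norm
        return self.norm2(x + self.ff(x))   # Feed-forward + Add&Norm
```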
In the above Transformer module 130, establishing the correlations among the plurality of pieces of second feature data includes: obtaining the second position information and second size information recorded when each feature block was divided, and establishing the correlations among the pieces of second feature data through the self-attention mechanism of the Transformer module 130 according to the second position information and the second size information.
It should be noted that the second position information and second size information correspond to the first position information and first size information, respectively. Obtaining the second position information and second size information of each feature block makes the spatial layout of the feature blocks explicit; establishing the correlations among the pieces of second feature data through the self-attention mechanism of the Transformer module 130 in combination with the second position information and second size information facilitates information interaction between adjacent feature blocks.
In the above Transformer module 130, obtaining the first output image from the plurality of feature blocks includes stitching the plurality of feature blocks into the first output image according to the second position information. Obtaining the second position information and stitching the feature blocks according to it strengthens the positional representation of the feature blocks in two-dimensional space, which helps greatly improve image processing efficiency.
Exemplarily, with N = 8, the first feature sequence [CU_f1, CU_f2, ..., CU_fn] is input to the Transformer module 130. The learnable embedding layer 131 first converts all lengths to d_model, for example d_model = 1024; the embedding layer 131 holds a series of matrices of sizes len_f1×d_model, len_f2×d_model, ..., len_fn×d_model, where len_f1, len_f2, ..., len_fn correspond one-to-one to the lengths of CU_f1, CU_f2, ..., CU_fn, the number of rows of a first preset matrix equaling the length of the corresponding first feature data. Multiplying each of CU_f1, CU_f2, ..., CU_fn by its corresponding first preset matrix unifies the lengths of the second feature data in the second feature sequence [CU_em_1, CU_em_2, ..., CU_em_n]. The second feature sequence, combined with the second position information and second size information, is input to the subsequent encoder blocks 132; the self-attention layer performs information interaction among the different pieces of second feature data, the output is processed by the add-and-normalize layer and nonlinearly transformed by the feed-forward network layer to obtain the third feature sequence [CU_en_1, CU_en_2, ..., CU_en_n], and the data are processed again by the add-and-normalize layer. Finally, the output of the encoder blocks 132 is input to the stitching layer 133, which restores the third feature sequence to the original size of each feature block: it multiplies by the series of second preset matrices of sizes d_model×len_f1, d_model×len_f2, ..., d_model×len_fn to compute a plurality of pieces of fourth feature data of differing lengths, forming the one-dimensional data sequence [CU_p1, CU_p2, ..., CU_pn], that is, the fourth feature sequence; then, using the property that a coding unit has size 2n×2n with n = 4, 8, 16, or 32, each piece is restored to a feature block represented as two-dimensional data; and finally the feature blocks are assembled into a complete image, so the resulting first output image FM_p has the same size as the original image.
It should be noted that the above image processing apparatus may be deployed in an image processing device, which may be a mobile terminal such as a smartphone, tablet computer, or camera, or a device capable of processing image data such as a desktop computer, robot, or server.
Referring to Fig. 8, an embodiment of the third aspect of the present application provides a model training method. The model includes a Transformer module, and the model training method includes, but is not limited to, steps S610 to S650:
Step S610: acquire an image to be processed, the image to be processed being a training sample in a constructed training set and obtained by decoding an original image.
Step S620: acquire coding unit division information used when the original image was encoded, the coding unit division information including first position information and first size information of each coding unit.
Step S630: input the image to be processed and the coding unit division information into the model, and divide the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units.
Step S640: establish connections among the plurality of feature blocks through the self-attention mechanism of the Transformer module to obtain a first output image corresponding to the original image.
Step S650: train the model according to the first output image, the original image, and an objective function to obtain a trained model.
According to the solution provided by the embodiments of the present application, the image to be processed, a training sample in the constructed training set, is acquired, along with the coding unit division information used when the original image was encoded, including the first position information and first size information of each coding unit. The image to be processed is divided into a plurality of feature blocks according to the coding unit division information, that is, the training blocks are extracted in combination with the coding unit division information to make full use of the local coding information, with the divided feature blocks corresponding to the coding units. The self-attention mechanism of the Transformer module then establishes connections among the feature blocks, that is, builds global information, producing the first output image for the training sample, and the model is trained according to the first output image and the objective function to obtain the trained model. Through the interaction of local and global information, the differences between adjacent feature blocks can be better removed, making block-to-block transitions smoother, so the trained model can better enhance picture quality.
It should be noted that the objective function is given by the following formula:
loss = ||I_recon − I_GT||_1, where I_recon is the first output image, I_GT is the ground-truth image, that is, the annotated target image, and || · ||_1 denotes the L1 norm.
During training, the objective function is driven to converge so that the first output image produced by the model is as close as possible to the target image, continually improving the model's ability to generate the target image.
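A direct rendering of this objective follows; the formula leaves the reduction over pixels unspecified, so the mean is used here as an assumption:

```python
import torch

def objective(recon, ground_truth):
    # loss = ||I_recon - I_GT||_1, taken as the mean absolute error
    # over all pixels of the reconstructed and ground-truth images.
    return torch.mean(torch.abs(recon - ground_truth))
```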
It should be noted that, for different kinds of image enhancement tasks, corresponding training sets and objective functions can be designed to train the model, yielding models suited to those tasks. For example, training the model on a set of low-resolution image samples paired with corresponding high-resolution image samples yields an image enhancement model for super-resolution, while training on a set of blurred image samples paired with corresponding sharp image samples yields an image enhancement model for deblurring.
It should be noted that the trained model may be deployed on a training device, for example a mobile terminal such as a smartphone, laptop computer, or camera, or a device capable of processing image data such as a desktop computer, robot, or server.
In the above model training method, dividing the image to be processed according to the first position information and the first size information in step S630 to obtain a plurality of feature blocks corresponding to the coding units includes:
dividing the image to be processed into a plurality of feature blocks according to the first position information and the first size information, so that the position and size of each feature block are the same as those of a coding unit into which the original image was divided during encoding.
It should be noted that dividing the image to be processed according to the first position information and the first size information makes effective use of the local coding information and yields a plurality of feature blocks corresponding to the coding units into which the original image was divided during encoding; each feature block corresponds one-to-one in position and matches in size with a coding unit, so the divided feature blocks CU_1, CU_2, ..., CU_n are consistent with those used during encoding. The correlations among these feature blocks are established through the self-attention mechanism in the Transformer, which therefore contains rich global information.
In the above model training method, before the connections among the plurality of feature blocks are established through the self-attention mechanism of the Transformer module in step S640, the method further includes the following steps:
flattening the plurality of feature blocks into a plurality of pieces of first feature data to obtain a first feature sequence, where the first feature data are represented as one-dimensional vectors;
inputting the first feature sequence to the Transformer module.
The above model training method further includes the following step:
performing detail enhancement processing on the first output image to obtain a second output image.
In the above model training method, training the model according to the first output image and the objective function in step S650 to obtain the trained model includes:
training the model according to the second output image and the objective function to obtain the trained model.
As shown in Fig. 9, the technical solution of the present application is described below with a specific embodiment. The model training method includes, but is not limited to, the following steps:
Step S710: acquire an image to be processed, the image to be processed being a training sample in a constructed training set and obtained by decoding an original image.
Step S720: acquire coding unit division information used when the original image was encoded, the coding unit division information including first position information and first size information of each coding unit.
Step S730: input the image to be processed and the coding unit division information into the model, and divide the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units.
Step S740: flatten the plurality of feature blocks into a plurality of pieces of first feature data to obtain a first feature sequence, where the first feature data are represented as one-dimensional vectors, and input the first feature sequence to the Transformer module.
Step S750: establish connections among the plurality of feature blocks through the self-attention mechanism of the Transformer module to obtain a first output image corresponding to the original image.
Step S760: perform detail enhancement processing on the first output image to obtain a second output image.
Step S770: train the model according to the second output image and the objective function to obtain a trained model.
In the above model training method, establishing the connections among the plurality of feature blocks through the self-attention mechanism of the Transformer module in step S640 includes the following steps:
obtaining, according to the first feature data and first preset matrices, a second feature sequence composed of a plurality of pieces of second feature data of the same length;
establishing correlations among the plurality of pieces of second feature data through the self-attention mechanism of the Transformer module, and obtaining a third feature sequence through residual connection and transformation processing, where the third feature sequence is composed of a plurality of pieces of third feature data;
obtaining, according to the third feature data and second preset matrices, a fourth feature sequence composed of a plurality of pieces of fourth feature data, where the fourth feature data are represented as one-dimensional vectors;
restoring the fourth feature data into feature blocks represented as two-dimensional data;
obtaining the first output image from the plurality of feature blocks.
It should be noted that, for the specific implementation of establishing the connections among the plurality of feature blocks through the self-attention mechanism of the Transformer module in step S640 and the corresponding technical effects, reference may be made to the implementation corresponding to Fig. 3 in the image processing method described above and its technical effects.
As shown in Fig. 10, the technical solution of the present application is described below with a specific embodiment. The model training method includes, but is not limited to, the following steps:
Step S810: acquire an image to be processed, the image to be processed being a training sample in a constructed training set and obtained by decoding an original image.
Step S820: acquire coding unit division information used when the original image was encoded, the coding unit division information including first position information and first size information of each coding unit.
Step S830: input the image to be processed and the coding unit division information into the model, and divide the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units.
Step S840: flatten the plurality of feature blocks into a plurality of pieces of first feature data to obtain a first feature sequence, where the first feature data are represented as one-dimensional vectors, and input the first feature sequence to the Transformer module.
Step S850: obtain, according to the first feature data and first preset matrices, a second feature sequence composed of a plurality of pieces of second feature data of the same length.
Step S860: establish correlations among the plurality of pieces of second feature data through the self-attention mechanism of the Transformer module, and obtain a third feature sequence through residual connection and transformation processing, where the third feature sequence is composed of a plurality of pieces of third feature data.
Step S870: obtain, according to the third feature data and second preset matrices, a fourth feature sequence composed of a plurality of pieces of fourth feature data, where the fourth feature data are represented as one-dimensional vectors.
Step S880: restore the fourth feature data into feature blocks represented as two-dimensional data.
Step S890: obtain a first output image from the plurality of feature blocks.
Step S8100: perform detail enhancement processing on the first output image to obtain a second output image.
Step S8110: train the model according to the second output image and the objective function to obtain a trained model.
In the above model training method, establishing the correlations among the plurality of pieces of second feature data through the self-attention mechanism of the Transformer module in step S860 includes the following steps:
obtaining second position information and second size information recorded when each feature block was divided;
establishing the correlations among the plurality of pieces of second feature data through the self-attention mechanism of the Transformer module according to the second position information and the second size information.
Obtaining the first output image from the plurality of feature blocks in step S890 includes:
stitching the plurality of feature blocks into the first output image according to the second position information.
It should be noted that the second position information and second size information correspond to the first position information and first size information, respectively. Obtaining the second position information and second size information of each feature block makes the spatial layout of the feature blocks explicit; establishing the correlations among the pieces of second feature data through the self-attention mechanism of the Transformer module in combination with the second position information and second size information facilitates information interaction between adjacent feature blocks. Obtaining the second position information and stitching the feature blocks according to it strengthens the positional representation of the feature blocks in two-dimensional space, which helps greatly improve image processing efficiency.
As shown in FIG. 11, the above model training method further includes the following steps:
Step S910: determine, according to a preset standard, whether the trained model meets the standard, and obtain a test result;
Step S920: if the test result meets the standard, save the parameters of the model and complete the training;
Step S930: if the test result does not meet the standard, continue to train the model.
It should be noted that the preset standard is used to determine whether the trained model meets the standard, and the test result provides effective reference data; whether the model is up to standard can be judged from network performance. The preset standard may be subjective quality or an objective metric; for example, objective metrics such as Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM) may be used. If the test result does not meet the standard, training continues; if the test result meets the standard, the trained model parameters are saved, and image quality enhancement can then be performed directly with this model. A sketch of such an objective-metric check follows below.
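As an illustration of such an objective check, the sketch below gates training on an average PSNR score. The threshold value, the function names, and the use of PSNR alone (SSIM is omitted for brevity) are assumptions, not requirements of the specification.

```python
import torch

def psnr(pred, target, max_val=255.0):
    # Peak Signal to Noise Ratio between the enhanced and reference images.
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

def meets_standard(model, test_pairs, psnr_threshold=35.0):
    # Returns True when the mean PSNR over the test set reaches the
    # preset standard; 35 dB is an arbitrary placeholder threshold.
    model.eval()
    with torch.no_grad():
        scores = [psnr(model(x), y) for x, y in test_pairs]
    return (sum(scores) / len(scores)).item() >= psnr_threshold
```

In a training loop, a False result from meets_standard would trigger further training epochs (S930), while a True result would trigger saving the model parameters (S920).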
It should be noted that, for the specific implementations and corresponding technical effects of the model training method of the embodiments of the present application, reference may be made to the specific implementations and corresponding technical effects of the image processing method described above.
As shown in FIG. 12, an embodiment of the fourth aspect of the present application provides an image processing apparatus, which includes: a memory 1210, a control processor 1220, and a computer program stored in the memory 1210 and executable on the control processor 1220.
The control processor 1220 and the memory 1210 may be connected by a bus or in other ways.
The non-transitory software programs and instructions required to implement the image processing method of the above embodiments are stored in the memory 1210; when executed by the control processor 1220, they perform the image processing method of the above embodiments, for example, method steps S110 to S140 in FIG. 1, method steps S210 and S220 in FIG. 2, method steps S310 to S350 in FIG. 3, method steps S410 and S420 in FIG. 4, and method steps S510 to S5120 in FIG. 5 described above.
As shown in FIG. 13, an embodiment of the fifth aspect of the present application provides a training device, which includes: a memory 1310, a control processor 1320, and a computer program stored in the memory 1310 and executable on the control processor 1320.
The control processor 1320 and the memory 1310 may be connected by a bus or in other ways.
The non-transitory software programs and instructions required to implement the model training method of the above embodiments are stored in the memory 1310; when executed by the control processor 1320, they perform the model training method of the above embodiments, for example, method steps S610 to S650 in FIG. 8, method steps S710 to S770 in FIG. 9, method steps S810 to S8110 in FIG. 10, and method steps S910 to S930 in FIG. 11 described above.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, an embodiment of the sixth aspect of the present application provides a computer-readable storage medium storing computer-executable instructions, which may be used to cause a computer to execute the image processing method of the first aspect above or the model training method of the third aspect above, for example, method steps S110 to S140 in FIG. 1, method steps S210 and S220 in FIG. 2, method steps S310 to S350 in FIG. 3, method steps S410 and S420 in FIG. 4, and method steps S510 to S5120 in FIG. 5 described above, or method steps S610 to S650 in FIG. 8, method steps S710 to S770 in FIG. 9, method steps S810 to S8110 in FIG. 10, and method steps S910 to S930 in FIG. 11 described above.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, and appropriate combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The foregoing describes several implementations of the present application in detail, but the present application is not limited to the above embodiments. Those skilled in the art may make various equivalent variations or substitutions without departing from the spirit of the present application, and such equivalent variations or substitutions fall within the scope defined by the claims of the present application.

Claims (18)

  1. An image processing method, the method comprising:
    acquiring an image to be processed, the image to be processed being obtained by decoding an original image;
    acquiring coding unit division information of the original image at encoding time, the coding unit division information comprising first position information and first size information of each coding unit;
    dividing the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units; and
    establishing connections among the plurality of feature blocks through the self-attention mechanism of a Transformer module to obtain a first output image corresponding to the original image.
  2. The image processing method according to claim 1, wherein dividing the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units comprises:
    dividing the image to be processed into a plurality of feature blocks according to the first position information and the first size information, so that each feature block matches, in position and size, the coding unit into which the original image was divided at encoding time.
  3. The image processing method according to claim 1, wherein, before establishing connections among the plurality of feature blocks through the self-attention mechanism of the Transformer module, the method further comprises:
    flattening the plurality of feature blocks into a plurality of first feature data to obtain a first feature sequence, wherein the first feature data are represented as one-dimensional vectors; and
    inputting the first feature sequence to the Transformer module.
  4. The image processing method according to claim 3, wherein establishing connections among the plurality of feature blocks through the self-attention mechanism of the Transformer module comprises:
    obtaining, from the first feature data and a first preset matrix, a second feature sequence composed of a plurality of second feature data of equal length;
    establishing correlations among the plurality of second feature data through the self-attention mechanism of the Transformer module, and obtaining a third feature sequence through residual connection and transformation processing, wherein the third feature sequence is composed of a plurality of third feature data;
    obtaining, from the third feature data and a second preset matrix, a fourth feature sequence composed of a plurality of fourth feature data, wherein the fourth feature data are represented as one-dimensional vectors;
    restoring the fourth feature data into feature blocks represented as two-dimensional vectors; and
    obtaining the first output image from the plurality of feature blocks.
  5. The image processing method according to claim 4, wherein establishing correlations among the plurality of second feature data through the self-attention mechanism of the Transformer module comprises:
    obtaining second position information and second size information of each feature block at division time; and
    establishing correlations among the plurality of second feature data through the self-attention mechanism of the Transformer module according to the second position information and the second size information;
    and wherein obtaining the first output image from the plurality of feature blocks comprises:
    stitching the plurality of feature blocks into the first output image according to the second position information.
  6. The image processing method according to claim 1, further comprising: performing detail enhancement processing on the first output image to obtain a second output image.
  7. An image processing apparatus, comprising:
    a division module, configured to acquire an image to be processed obtained by decoding an original image, to acquire coding unit division information of the original image at encoding time, the coding unit division information comprising first position information and first size information of each coding unit, and to divide the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units; and
    a Transformer module, configured to establish connections among the plurality of feature blocks through a self-attention mechanism to obtain a first output image corresponding to the original image.
  8. The image processing apparatus according to claim 7, wherein dividing the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units comprises:
    dividing the image to be processed into a plurality of feature blocks according to the first position information and the first size information, so that each feature block matches, in position and size, the coding unit into which the original image was divided at encoding time.
  9. The image processing apparatus according to claim 7, further comprising a linear mapping module, configured to flatten the plurality of feature blocks into a plurality of first feature data, obtain a first feature sequence, and input it to the Transformer module, wherein the first feature data are represented as one-dimensional vectors.
  10. The image processing apparatus according to claim 7, further comprising a reconstruction module, configured to perform detail enhancement processing on the first output image to obtain a second output image.
  11. A model training method, wherein the model comprises a Transformer module, the method comprising:
    acquiring an image to be processed, the image to be processed being a training sample in a constructed training set, wherein the image to be processed is obtained by decoding an original image;
    acquiring coding unit division information of the original image at encoding time, the coding unit division information comprising first position information and first size information of each coding unit;
    inputting the image to be processed and the coding unit division information into the model, and dividing the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units;
    establishing connections among the plurality of feature blocks through the self-attention mechanism of the Transformer module to obtain a first output image corresponding to the original image; and
    training the model according to the first output image and an objective function to obtain a trained model.
  12. The model training method according to claim 11, wherein dividing the image to be processed according to the first position information and the first size information to obtain a plurality of feature blocks corresponding to the coding units comprises:
    dividing the image to be processed into a plurality of feature blocks according to the first position information and the first size information, so that each feature block matches, in position and size, the coding unit into which the original image was divided at encoding time.
  13. The model training method according to claim 11, wherein, before establishing connections among the plurality of feature blocks through the self-attention mechanism of the Transformer module, the method further comprises:
    flattening the plurality of feature blocks into a plurality of first feature data to obtain a first feature sequence, wherein the first feature data are represented as one-dimensional vectors; and
    inputting the first feature sequence to the Transformer module.
  14. The model training method according to claim 11, further comprising: performing detail enhancement processing on the first output image to obtain a second output image;
    wherein training the model according to the first output image and the objective function to obtain a trained model comprises:
    training the model according to the second output image and the objective function to obtain the trained model.
  15. The model training method according to claim 11, further comprising:
    determining, according to a preset standard, whether the trained model meets the standard, and obtaining a test result;
    if the test result meets the standard, saving the parameters of the model and completing the training; and
    if the test result does not meet the standard, continuing to train the model.
  16. An image processing apparatus, comprising at least one control processor and a memory communicatively connected to the at least one control processor, wherein the memory stores instructions executable by the at least one control processor, and the instructions, when executed by the at least one control processor, enable the at least one control processor to execute the image processing method according to any one of claims 1 to 6.
  17. A training device, comprising at least one control processor and a memory communicatively connected to the at least one control processor, wherein the memory stores instructions executable by the at least one control processor, and the instructions, when executed by the at least one control processor, enable the at least one control processor to execute the model training method according to any one of claims 11 to 15.
  18. A computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions for causing a computer to execute the image processing method according to any one of claims 1 to 6 or the model training method according to any one of claims 11 to 15.
PCT/CN2022/078897 2021-09-28 2022-03-02 Image processing method, image processing apparatus, and model training method WO2023050720A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111144470.1 2021-09-28
CN202111144470.1A CN115880381A (en) 2021-09-28 2021-09-28 Image processing method, image processing apparatus, and model training method

Publications (1)

Publication Number Publication Date
WO2023050720A1 (en)

Family

ID=85763607

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078897 WO2023050720A1 (en) 2021-09-28 2022-03-02 Image processing method, image processing apparatus, and model training method

Country Status (2)

Country Link
CN (1) CN115880381A (en)
WO (1) WO2023050720A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9740949B1 (en) * 2007-06-14 2017-08-22 Hrl Laboratories, Llc System and method for detection of objects of interest in imagery
CN111680447A (en) * 2020-04-21 2020-09-18 深圳睿心智能医疗科技有限公司 Blood flow characteristic prediction method, blood flow characteristic prediction device, computer equipment and storage medium
CN113191953A (en) * 2021-06-04 2021-07-30 山东财经大学 Transformer-based face image super-resolution method
CN113435210A (en) * 2021-06-30 2021-09-24 平安科技(深圳)有限公司 Social image text recognition method and device, computer equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036832A (en) * 2023-10-09 2023-11-10 之江实验室 Image classification method, device and medium based on random multi-scale blocking
CN117036832B (en) * 2023-10-09 2024-01-05 之江实验室 Image classification method, device and medium based on random multi-scale blocking
CN117788473A (en) * 2024-02-27 2024-03-29 北京大学第一医院(北京大学第一临床医学院) Method, system and equipment for predicting blood pressure based on binocular fusion network
CN117788473B (en) * 2024-02-27 2024-05-14 北京大学第一医院(北京大学第一临床医学院) Method, system and equipment for predicting blood pressure based on binocular fusion network

Also Published As

Publication number Publication date
CN115880381A (en) 2023-03-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22874123

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE