CN114445511A - Image format conversion method and device, equipment, medium and product thereof - Google Patents

Image format conversion method and device, equipment, medium and product thereof

Info

Publication number
CN114445511A
Authority
CN
China
Prior art keywords
image
format conversion
model
image format
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210106711.1A
Other languages
Chinese (zh)
Inventor
冯进亨
戴长军
丘文威
叶艾彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huaduo Network Technology Co Ltd
Original Assignee
Guangzhou Huaduo Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huaduo Network Technology Co Ltd filed Critical Guangzhou Huaduo Network Technology Co Ltd
Priority to CN202210106711.1A
Publication of CN114445511A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 - Image coding
    • G06T9/001 - Model-based coding, e.g. wire frame
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image format conversion method and apparatus, a computer device, and a storage medium. The image format conversion method comprises the following steps: acquiring a first target image in a first data format; inputting the first target image into a preset image format conversion model, wherein the image format conversion model has been constrained to a convergence state in advance through linear loss and nonlinear loss and is used for performing format conversion on images; and reading a second target image output by the image format conversion model, wherein the second target image is in a second data format whose information capacity is greater than that of the first data format. By this method, image quality can be improved rapidly, the limits of the shooting device's hardware performance can be overcome, and user requirements can be met.

Description

Image format conversion method and device, equipment, medium and product thereof
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image format conversion method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development and progress of science and technology, the widespread adoption of smartphones, and the rapid development of the internet, massive amounts of image data are continuously generated and shared. At the same time, people's quality requirements for images keep rising, and more and more users pursue higher definition and richer color display.
The inventors of the present application found in research that, in the prior art, after a shooting device acquires a picture with low information content, no information content can be added to it; that is, a high information content image cannot be generated from a single low information content image.
Disclosure of Invention
Provided are an image format conversion method and apparatus, and a computer-readable storage medium, capable of obtaining a high information content image from a single low information content image.
In order to achieve the above object, the present application provides an image format conversion method, including:
acquiring a first target image in a first data format;
inputting the first target image into a preset image format conversion model, wherein the image format conversion model is constrained to a convergence state in advance through linear loss and nonlinear loss, and is used for carrying out format conversion on the image;
and reading a second target image output by the image format conversion model, wherein the second target image format is a second data format, and the information capacity of the second data format is greater than that of the first data format.
Optionally, the image format conversion model includes: the device comprises a first convolution channel and a second convolution channel, wherein the characteristics output by the first convolution channel are mask characteristic vectors, and the characteristics output by the second convolution channel are convolution characteristic vectors.
Optionally, the first convolution channel comprises: a first convolution layer and a first attention layer, wherein the first attention layer is connected to the output end of the first convolution layer, and the first attention layer comprises a channel attention layer.
Optionally, the second convolution channel includes: the device comprises a plurality of cascaded feature layers, wherein the output end of each feature layer is connected with a second attention layer, and the second attention layers comprise a channel attention layer and a space attention layer.
Optionally, each of the feature layers comprises: a plurality of cascaded second convolution layers, wherein the output end of each second convolution layer is connected with a linear rectification layer, and the output of any linear rectification layer serves as the input of all second convolution layers arranged after that linear rectification layer.
Optionally, the reading the second target image output by the image format conversion model includes:
reading a mask feature vector output by the first convolution channel;
performing dot product operation on the mask feature vector and the array vector matrix of the first target image;
adding the result of the dot product operation and the feature vector output by the second convolution channel;
and mapping the result obtained by the addition operation through a preset hyperbolic tangent function to generate the second target image.
Optionally, the training method of the image format conversion model includes:
reading a training sample to be processed;
performing image enhancement processing on the training sample according to a preset image enhancement strategy;
inputting the training sample after image enhancement processing into a preset non-convergence model, wherein the non-convergence model is an initialization model of the image format conversion model;
reading a stress image output by the non-convergence model, and calculating a loss distance between the stress image and an annotation image in the training sample according to a preset linear loss function and a preset non-linear loss function;
and according to the loss distance and a preset return function, carrying out callback correction on the weight value of the non-convergence model so as to enable the loss distance between the stress image and the labeled image to tend to a preset target threshold value.
To achieve the above object, the present application also provides an image format conversion apparatus, comprising:
the acquisition module is used for acquiring a first target image in a first data format;
the processing module is used for inputting the first target image into a preset image format conversion model, wherein the image format conversion model is constrained to a convergence state in advance through linear loss and nonlinear loss, and is a neural network model for carrying out format conversion on the image;
and the execution module is used for reading a second target image output by the image format conversion model, wherein the second target image format is a second data format, and the information capacity of the second data format is greater than that of the first data format.
Optionally, the image format conversion model includes: the device comprises a first convolution channel and a second convolution channel, wherein the characteristics output by the first convolution channel are mask characteristic vectors, and the characteristics output by the second convolution channel are convolution characteristic vectors.
Optionally, the first convolution channel comprises: a first convolution layer and a first attention layer, wherein the first attention layer is connected to the output end of the first convolution layer, and the first attention layer comprises a channel attention layer.
Optionally, the second convolution channel includes: the device comprises a plurality of cascaded feature layers, wherein the output end of each feature layer is connected with a second attention layer, and the second attention layers comprise a channel attention layer and a space attention layer.
Optionally, each of the feature layers comprises: a plurality of cascaded second convolution layers, wherein the output end of each second convolution layer is connected with a linear rectification layer, and the output of any linear rectification layer serves as the input of all second convolution layers arranged after that linear rectification layer.
Optionally, the image format conversion apparatus further includes:
the first reading submodule is used for reading the mask feature vector output by the first convolution channel;
the first operation submodule is used for performing dot product operation on the mask characteristic vector and the array vector matrix of the first target image;
the second operation submodule is used for performing addition operation on the result of the dot product operation and the feature vector output by the second convolution channel;
and the first generation submodule is used for generating the second target image after mapping the result obtained by the addition operation through a preset hyperbolic tangent function.
Optionally, the image format conversion apparatus further includes:
the second reading submodule is used for reading a training sample to be processed;
the first enhancement submodule is used for carrying out image enhancement processing on the training sample according to a preset image enhancement strategy;
the first processing submodule is used for inputting the training sample subjected to the image enhancement processing into a preset non-convergence model, wherein the non-convergence model is an initialization model of the image format conversion model;
the third calculation submodule is used for reading the stress image output by the non-convergence model and calculating the loss distance between the stress image and the labeled image in the training sample according to a preset linear loss function and a preset non-linear loss function;
and the first execution submodule is used for carrying out callback correction on the weighted value of the non-convergence model according to the loss distance and a preset return function so as to enable the loss distance between the stress image and the labeled image to tend to a preset target threshold value.
In order to solve the above technical problem, an embodiment of the present invention further provides a computer device, including a memory and a processor, where the memory stores computer-readable instructions, and the computer-readable instructions, when executed by the processor, cause the processor to execute the steps of the image format conversion method.
In order to solve the above technical problem, embodiments of the present application further provide a storage medium storing computer-readable instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the image format conversion method.
To fulfil another object of the present application, a computer program product is provided, comprising a computer program/instructions which, when executed by a processor, implement the steps of the image format conversion method described in any one of the embodiments of the present application.
The beneficial effects of the embodiment of the application are that: by training the image format conversion model in advance, the image format conversion model is a neural network model, the format of the input image can be converted, the information carrying capacity of the input image is improved, and the information content in the image is improved. Therefore, when a user shoots a first target image in a first data format by using the shooting device and inputs the first target image into the image format conversion model, the image format conversion model converts the format of the first target image to convert the first target image into a second target image in a second data format, the information capacity of the second data format is greater than that of the first data format, and the second target image after format conversion has higher image quality. By the method, the image quality can be rapidly improved, the limitation of the hardware performance of the shooting device is broken through, and the requirements of users are met.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic diagram of a basic flow chart of an image format conversion method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating the generation of a second target image according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a training process of an image format conversion model according to an embodiment of the present application;
FIG. 4 is a diagram illustrating a basic structure of an image format conversion apparatus according to an embodiment of the present application;
fig. 5 is a block diagram of a basic structure of a computer device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, a "terminal" includes both devices that are wireless signal receivers, devices that have only wireless signal receivers without transmit capability, and devices that have receive and transmit hardware, devices that have receive and transmit hardware capable of performing two-way communication over a two-way communication link, as will be understood by those skilled in the art. Such a device may include: a cellular or other communication device having a single line display or a multi-line display or a cellular or other communication device without a multi-line display; PCS (Personal Communications Service), which may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; a conventional laptop and/or palmtop computer or other appliance having and/or including a radio frequency receiver. As used herein, a "terminal" may be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or situated and/or configured to operate locally and/or in a distributed fashion at any other location(s) on earth and/or in space. The "terminal" used herein may also be a communication terminal, a Internet access terminal, and a music/video playing terminal, and may be, for example, a PDA, an MID (Mobile Internet Device), and/or a Mobile phone with music/video playing function, and may also be a smart television, a set-top box, and other devices.
The hardware referred to by the names "server", "client", "service node", etc. is essentially an electronic device with the performance of a personal computer, and is a hardware device having necessary components disclosed by the von neumann principle such as a central processing unit (including an arithmetic unit and a controller), a memory, an input device, an output device, etc., a computer program is stored in the memory, and the central processing unit calls a program stored in an external memory into the internal memory to run, executes instructions in the program, and interacts with the input and output devices, thereby completing a specific function.
It should be noted that the concept of "server" as referred to in this application can be extended to the case of a server cluster. According to the network deployment principle understood by those skilled in the art, the servers should be logically divided, and in physical space, the servers may be independent from each other but can be called through an interface, or may be integrated into one physical computer or a set of computer clusters. Those skilled in the art should understand this variation and should not be so constrained as to implement the network deployment of the present application.
One or more technical features of the present application, unless expressly specified otherwise, may be deployed to a server for implementation by a client remotely invoking an online service interface provided by a capture server for access, or may be deployed directly and run on the client for access.
Unless specified in clear text, the neural network model referred to or possibly referred to in the application can be deployed in a remote server and used for remote call at a client, and can also be deployed in a client with qualified equipment capability for direct call.
Various data referred to in the present application may be stored in a server remotely or in a local terminal device unless specified in the clear text, as long as the data is suitable for being called by the technical solution of the present application.
Those skilled in the art will appreciate that although the various methods of the present application are described based on the same concept so as to share commonality, they may be performed independently unless otherwise specified. Likewise, each embodiment disclosed in the present application is proposed based on the same inventive concept; therefore, concepts expressed identically, and concepts whose expressions differ but have been changed only for convenience, should be understood equally.
Unless a mutual exclusion between related technical features is expressly stated, embodiments can be flexibly constructed by combining the related technical features of the embodiments disclosed herein, as long as the combination does not depart from the inventive spirit of the present application and meets the needs of the prior art or remedies its deficiencies. Those skilled in the art will appreciate such variations.
Referring to fig. 1, fig. 1 is a basic flow chart illustrating an image format conversion method according to the present embodiment. As shown in fig. 1, an image format conversion method includes:
s1100, collecting a first target image in a first data format;
in the present embodiment, the photographing apparatus for image photographing is a mobile terminal, for example, a mobile phone, a tablet computer, or a DV apparatus.
The mobile terminal needs to shoot and collect images in a specific shooting mode. The corresponding configuration parameters can be stored on the server side: after receiving the device information sent by the mobile terminal, the server identifies the operating system or other device characteristics of the mobile terminal from that information, and then determines the configuration parameters matched to the mobile terminal accordingly.
In some embodiments, the configuration parameters are matched according to the camera interface carried by the mobile terminal. First, the API interface of the camera of the mobile terminal is collected and looked up in a configuration database, and after the parameters corresponding to that API interface are obtained, the configuration parameters are sent to the mobile terminal. When no corresponding configuration parameters can be found through the API, the SDK information of the shooting module is read from the mobile terminal's device information, the configuration database is searched through the SDK information to obtain the configuration parameters corresponding to the SDK, and the corresponding configuration parameters are then sent to the mobile terminal. Further, when the configuration parameters cannot be matched through the SDK information either, they can be matched through the type of the mobile terminal's operating system, as sketched below.
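As a rough illustration of this fallback chain, the following sketch uses hypothetical accessor names (find_by_api, find_by_sdk, find_by_os) standing in for queries against the configuration database; none of these names come from this application.

```python
def match_config_params(device_info: dict, config_db):
    """Resolve shooting parameters: camera API -> SDK info -> OS type."""
    params = config_db.find_by_api(device_info.get("camera_api"))
    if params is None:
        # Fall back to the SDK information of the shooting module.
        params = config_db.find_by_sdk(device_info.get("camera_sdk"))
    if params is None:
        # Last resort: match on the operating system type.
        params = config_db.find_by_os(device_info.get("os_type"))
    return params  # sent back to the mobile terminal when found
```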
After the server receives the device information sent by the mobile terminal, it matches the configuration parameters corresponding to that mobile terminal according to the device information and sends them to the mobile terminal. After the mobile terminal obtains the configuration parameters, it sets its shooting parameters accordingly.
And the mobile terminal set according to the configuration parameters enters a set target shooting mode. It should be noted that the target shooting mode is not a mode necessary for the mobile terminal to capture an image, and in some embodiments, the mobile terminal can capture the first target image in any shooting mode.
The image acquired by the mobile terminal is the first target image. In this embodiment the first data format is the SDR format; it should be noted, however, that the first data format is not limited to SDR and can also be a conventional image format such as JPG or PNG.
S1200, inputting the first target image into a preset image format conversion model, wherein the image format conversion model is constrained to a convergence state in advance through linear loss and nonlinear loss, and is used for carrying out format conversion on the image;
after a first target image is acquired, the first target image is input into an image format conversion model, the image format conversion model is a neural network model which is trained to a convergence state in advance through supervised training, and format conversion can be carried out on the image input into the image format conversion model.
In some embodiments, the image format conversion model is deployed in the mobile terminal. After the mobile terminal collects the first target image, the first target image is input into the locally stored image format conversion model, and the model performs format conversion on the image locally.
In some embodiments, the image format conversion model is deployed in a server, the mobile terminal sends a first target image to the server after acquiring the first target image, the server performs image format conversion on the first target image through the image format conversion model, and after the image format conversion is completed, the server sends the converted image to the mobile terminal.
In some embodiments, after the first target image is acquired, it needs to be screened. The screening method is as follows: screening is carried out through the average brightness of the first target image. Specifically, a standard luminance interval value is set. The standard luminance interval value can be [30, 230], but it is not limited thereto; its critical values can be larger or smaller depending on the specific application scenario.
The average brightness of the first target image is calculated and compared against the standard luminance interval. If the average brightness value falls within the interval, the first target image is input into the image format conversion model; otherwise the first target image is deleted and the image is captured again.
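A minimal sketch of this screening step, assuming an 8-bit RGB input; the BT.601 luma weighting is an assumption, since the text only calls for an average brightness measure.

```python
import numpy as np

def passes_brightness_screen(image_rgb: np.ndarray,
                             interval=(30, 230)) -> bool:
    """True if the image's average brightness falls inside the interval."""
    r, g, b = image_rgb[..., 0], image_rgb[..., 1], image_rgb[..., 2]
    luma = 0.299 * r + 0.587 * g + 0.114 * b   # BT.601 luma (assumed measure)
    return interval[0] <= float(luma.mean()) <= interval[1]
```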
The value range of the standard luminance interval can be adjusted dynamically for different shooting environments. Specifically, the first target image is input into a neural network model trained to a convergence state in advance for identifying the scene of an image, and this model identifies the shooting environment represented by the first target image, for example indoor, outdoor, cloudy, or sunny scenes.
And searching a standard brightness interval value matched with the image scene in a preset scene database according to the image scene. Through scene recognition of the first target image, the adaptability of the standard brightness interval value to the environment can be improved.
In some embodiments, the image format conversion model is deployed in the mobile terminal. Since the performance of the mobile terminal is limited and a large-scale neural network model cannot be run there, the image format conversion model needs to be lightweight. Specifically, the structure of the image format conversion model is as follows:
the image format conversion model includes: the device comprises a first convolution channel and a second convolution channel, wherein the characteristics output by the first convolution channel are mask characteristic vectors, and the characteristics output by the second convolution channel are convolution characteristic vectors.
The first convolution channel includes: a first convolution layer and a first attention layer, wherein the first attention layer is connected to the output end of the first convolution layer, and the first attention layer comprises a channel attention layer.
The second convolution channel includes: a plurality of cascaded feature layers, wherein the output end of each feature layer is connected with a second attention layer, and the second attention layers comprise a channel attention layer and a spatial attention layer. In some embodiments, the second convolution channel comprises 3 sets of cascaded feature layers and 3 sets of second attention layers; however, the number of feature layers and second attention layers included in the second convolution channel is not limited thereto, and in some embodiments can be larger or smaller depending on the specific application scenario. A sketch of the two attention layers follows.
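Purely for illustration, the following PyTorch sketch shows one common way to realize channel attention and spatial attention layers of the kind named above (in the style of CBAM); the reduction ratio and kernel size are assumed values, not taken from this application.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Per-channel gating from pooled descriptors (CBAM-style sketch)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        weights = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * weights                   # reweight each channel

class SpatialAttention(nn.Module):
    """Per-location gating from channel-pooled maps (CBAM-style sketch)."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)    # average over channels
        mx = x.amax(dim=1, keepdim=True)     # max over channels
        weights = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * weights                   # reweight each location
```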
Each of the feature layers includes: a plurality of cascaded second convolution layers, wherein the output end of each second convolution layer is connected with a linear rectification layer, and the output of any linear rectification layer serves as the input of all second convolution layers arranged after it. In some embodiments, each feature layer comprises 5 sets of cascaded second convolution layers and linear rectification layers, with a linear rectification function set in each rectification layer. The output of any linear rectification layer serves as the input of all the second convolution layers arranged after it: for example, the output of the first linear rectification layer serves as the input of the second, third, fourth, and fifth second convolution layers; the output of the second linear rectification layer serves as the input of the third, fourth, and fifth second convolution layers; and so on, until the output of the fourth linear rectification layer is input to the fifth second convolution layer. It should be noted that the number of second convolution layers and linear rectification layers in a feature layer is not limited thereto, and can be more or fewer in some embodiments depending on the application scenario.
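A minimal sketch of one such feature layer, under the assumption of DenseNet-style connectivity in which the layer input is also concatenated forward (the text specifies only that rectified outputs feed later convolutions); the channel count is illustrative only.

```python
import torch
import torch.nn as nn

class DenseFeatureLayer(nn.Module):
    """Five cascaded conv + ReLU pairs; every rectified output feeds all
    later convolutions. Carrying the layer input forward as well is an
    assumption of this sketch."""
    def __init__(self, channels: int = 32, num_convs: int = 5):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels * (i + 1), channels, kernel_size=3, padding=1)
            for i in range(num_convs)
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for conv in self.convs:
            # Each convolution sees the concatenation of everything before it.
            feats.append(self.relu(conv(torch.cat(feats, dim=1))))
        return feats[-1]
```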
In this embodiment, the image format conversion model is trained with a loss function that combines a linear loss function and a nonlinear loss function:

$Loss = Loss_{linear} + Loss_{nonlinear}$

where $Loss$ denotes the total loss, $Loss_{linear}$ the linear loss function, and $Loss_{nonlinear}$ the nonlinear loss function.

The linear and nonlinear loss functions each comprise a gamut loss function $Loss_{color}$ and a perceptual loss function $Loss_{perceptual}$. In the prior art, by contrast, the loss function is typically a single nonlinear or a single linear loss function. However, a single linear loss function cannot make the image format conversion model converge, while a single nonlinear loss function can leave the model in a distorted state. Mixing the two lets the linear loss function compensate for the model distortion caused by the nonlinear loss function, so the image format conversion model is more robust and the converted image more stable.

$Loss_{linear} = Loss_{color} + Loss_{perceptual}$

$Loss_{nonlinear} = Loss_{color} + Loss_{perceptual}$

The perceptual loss is the difference between the outputs of different feature layers of the image format conversion model, computed with the L1 norm, where $y$ is the annotation feature and $y'$ is the output of the image format conversion model:

$Loss_{perceptual} = L_1\left|\,VGG19(y') - VGG19(y)\,\right|$

The gamut loss is computed over three color spaces: RGB, HSV, and LAB. RGB is the three-primary color space, corresponding to red, green, and blue; HSV is the hue (H), saturation (S), and value (V) space, and adjusting color through the H and S channels brings it closer to the color of the annotation image; in LAB, L is lightness and the A and B channels are the green-red and blue-yellow axes, so, as with HSV, computing the difference over the A and B channels improves the color accuracy of the network. The gamut loss also uses the L1 norm:

$Loss_{color} = L_1\left|y'_{rgb} - y_{rgb}\right| + L_1\left|y'_{hsv} - y_{hsv}\right| + L_1\left|y'_{lab} - y_{lab}\right|$
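The combined loss could be sketched as follows; the VGG19 feature extractor comes from torchvision, the color-space conversions are assumed to come from kornia.color, and using a single feature-extraction point is a simplification of the multi-layer comparison described above.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19
from kornia.color import rgb_to_hsv, rgb_to_lab  # assumed conversion helpers

vgg_features = vgg19(pretrained=True).features.eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)  # frozen extractor; gradients still flow to y_hat

def perceptual_loss(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # L1 distance between VGG19 feature maps of output and annotation.
    return F.l1_loss(vgg_features(y_hat), vgg_features(y))

def color_loss(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # L1 distance accumulated over the RGB, HSV, and LAB color spaces.
    return (F.l1_loss(y_hat, y)
            + F.l1_loss(rgb_to_hsv(y_hat), rgb_to_hsv(y))
            + F.l1_loss(rgb_to_lab(y_hat), rgb_to_lab(y)))

def total_loss(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # The linear and nonlinear terms share the same components in the text;
    # how they differ internally is not spelled out, so they coincide here.
    loss_linear = color_loss(y_hat, y) + perceptual_loss(y_hat, y)
    loss_nonlinear = color_loss(y_hat, y) + perceptual_loss(y_hat, y)
    return loss_linear + loss_nonlinear
```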
the image format conversion model format is a lightweight model which can be deployed on a mobile terminal, and a linear and nonlinear loss combination mode is adopted, so that the phenomena of image distortion and over-enhancement are reduced, and the expressive force and stability of an output image of the image format conversion model are improved.
And S1300, reading a second target image output by the image format conversion model, wherein the second target image format is a second data format, and the information capacity of the second data format is greater than that of the first data format.
The first target image is input into the image format conversion model. An input channel of the image format conversion model converts the first target image into an array vector matrix, and the array vector matrix is then input into the first convolution channel and the second convolution channel respectively.
The first convolution layer and the first attention layer in the first convolution channel extract the features of the array vector matrix, and the extracted feature vectors are mask feature vectors.
The feature layers and the second attention layers in the second convolution channel extract features from the array vector matrix level by level; each second convolution layer and linear rectification layer within a feature layer extracts convolution features from the array vector matrix at its own level. The feature vector finally output by the second convolution channel is the convolution feature vector.
At the output ends of the first convolution channel and the second convolution channel, the mask feature vector output by the first convolution channel undergoes a dot product operation with the array vector matrix; that is, the mask feature matrix represented by the mask feature vector is multiplied element-wise with the array vector matrix.
Once the dot product of the mask feature vector output by the first convolution channel and the array vector matrix is obtained, the vector matrix resulting from the dot product is added to the feature vector output by the second convolution channel.
After the result of the addition operation is obtained, it needs to be mapped. The mapping is as follows: the result is mapped through a hyperbolic tangent function. The hyperbolic tangent function (tanh) is the ratio of the hyperbolic sine function (sinh) to the hyperbolic cosine function (cosh), and it maps the addition result into the value range [-1, 1]. Finally, the mapped vector matrix is pixelized to generate the second target image.
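The whole output stage can be summarized in a few lines; this sketch assumes the mask features, input matrix, and convolution features already have matching shapes.

```python
import torch

def fuse_outputs(mask_features: torch.Tensor,
                 input_matrix: torch.Tensor,
                 conv_features: torch.Tensor) -> torch.Tensor:
    gated = mask_features * input_matrix  # dot product with the array matrix
    fused = gated + conv_features         # add the second channel's features
    return torch.tanh(fused)              # map the sum into [-1, 1]
```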
The data format of the second target image is the second data format. In this embodiment, the second data format is the HDR format; it should be noted that the second data format is not limited thereto and, depending on the application scenario, can also be, without limitation, TGA, BMP, or the like.
In the embodiment, the image format conversion model is trained in advance and is a neural network model, so that the format of the input image can be converted, the information carrying capacity of the input image is improved, and the information content in the image is improved. Therefore, when a user shoots a first target image in a first data format by using the shooting device and inputs the first target image into the image format conversion model, the image format conversion model converts the format of the first target image to convert the first target image into a second target image in a second data format, the information capacity of the second data format is greater than that of the first data format, and the second target image after format conversion has higher image quality. By the method, the image quality can be rapidly improved, the limitation of the hardware performance of the shooting device is broken through, and the requirements of users are met.
In some embodiments, after the first target image is input into the image format conversion model, the first target image needs to be processed through the first convolution channel and the second convolution channel. Referring to fig. 2, fig. 2 is a schematic flow chart illustrating the generation of the second target image according to the present embodiment.
As shown in fig. 2, S1300 includes:
S1311, reading the mask feature vector output by the first convolution channel;
The first target image is input into the image format conversion model. An input channel of the image format conversion model converts the first target image into an array vector matrix, and the array vector matrix is then input into the first convolution channel and the second convolution channel respectively.
The first convolution layer and the first attention layer in the first convolution channel extract features from the array vector matrix; the extracted feature vector is the mask feature vector.
S1312, performing a dot product operation on the mask feature vector and the array vector matrix of the first target image;
At the output ends of the first convolution channel and the second convolution channel, the mask feature vector output by the first convolution channel undergoes a dot product operation with the array vector matrix; that is, the mask feature matrix represented by the mask feature vector is multiplied element-wise with the array vector matrix.
S1313, adding the result of the dot product operation to the feature vector output by the second convolution channel;
Meanwhile, the feature layers and the second attention layers in the second convolution channel extract features from the array vector matrix level by level; each second convolution layer and linear rectification layer within a feature layer extracts convolution features at its own level, and the feature vector finally output by the second convolution channel is the convolution feature vector. The vector matrix obtained from the dot product is added to this convolution feature vector.
S1314, mapping the result obtained by the addition operation through a preset hyperbolic tangent function to generate the second target image.
After the result of the addition operation is obtained, it needs to be mapped. The mapping is as follows: the result is mapped through a hyperbolic tangent function, which is the ratio of the hyperbolic sine function (sinh) to the hyperbolic cosine function (cosh) and maps the addition result into the value range [-1, 1]. Finally, the mapped vector matrix is pixelized to generate the second target image.
In some embodiments, the image format conversion model needs to be trained by a supervised training method. The training method is described with reference to fig. 3, which is a schematic diagram of the training process of the image format conversion model in this embodiment.
As shown in fig. 3, the image format conversion model training method is as follows:
s1411, reading a training sample to be processed;
in this embodiment, a training sample is constructed, where the training sample includes a training image and an annotation image, where the training image is an image directly acquired by a mobile terminal, and the annotation image is a high-dynamic image synthesized from multiple sample images. Thus, the standard image has a higher information load than the training image. The data format of the training image is a first data format and the data format of the annotation image is a second data format. A training sample, consisting of a pair of training images and annotation images, enables a supervised training model to have the ability to convert a first data format to a second data format.
In this embodiment, one training sample includes one training image and one annotation image, and the training samples are stored in a training set. The training set comprises a plurality of training samples, and during training one training sample is randomly selected to train the model.
S1412, performing image enhancement processing on the training sample according to a preset image enhancement strategy;
in the present embodiment, in order to make the image format conversion model obtained by training more robust, it is necessary to perform enhancement processing on the training image before inputting the training sample into the model, and the method of enhancement processing includes (without being limited to): and processing the training sample by one or more image processing methods of rotate, flip, crop and brightness adjustment. For example, the training image is subjected to [ -10%, 10% ] brightness adjustment processing.
S1413, inputting the training sample subjected to image enhancement processing into a preset non-convergence model, wherein the non-convergence model is an initialization model of the image format conversion model;
and inputting the training image subjected to image enhancement processing into a preset non-convergence model, wherein the non-convergence model is an initialization model of the image format conversion model. The non-convergence model performs feature extraction and data format conversion on the training image, but the non-convergence model is not trained, so that the output result is stress output, the randomness is high, and the requirement of image format directional conversion cannot be met.
S1414, reading a stress image output by the non-convergence model, and calculating a loss distance between the stress image and a labeled image in the training sample according to a preset linear loss function and a preset non-linear loss function;
after the training sample is input into the non-convergence model, the stress image output by the non-convergence model is read, and then the loss distance between the stress image and the labeled image is calculated through the set linear loss function and the set non-linear loss function. Wherein the linear loss function and the nonlinear loss function are characterized by:
Loss=Losslinear+Lossnonlinear
wherein Loss denotes the Loss function, LosslinearExpressed as a linear Loss function, LossnonlinearExpressed as a nonlinear loss function.
The linear and nonlinear Loss functions each comprise a gamut Loss function LosscolorAnd Loss of perception function Lossperceptual. Compared with the prior art, the loss functions are single nonlinear loss functions or linear loss functions. However, a single linear loss function cannot make the image format conversion model converge, a single nonlinear loss function can make the image format conversion model in a distorted state, and the linear loss function and the nonlinear loss function are mixed, so that the linear loss function can make up for model distortion caused by the nonlinear loss function, the image format conversion model has stronger robustness, and the converted image has better stability.
Losslinear=Losscolor+Lossperceptual
Lossnonlinear=Losscolor+Lossperceptual
The perception loss refers to the difference of results between different feature layers of the image format conversion model, the difference is calculated by using an L1 paradigm, y is an annotation feature, and y' is the output of the image format conversion model.
Lossperceptual=L1|VGG19(y′-y)|
Gamut loss the loss calculation was performed using three gamuts, RGB, HSV, LAB three color spaces. RGB is a three-primary color space and respectively corresponds to three colors of red, green and blue; HSV is a hue (H), saturation (S) and brightness (V) space, and the color is adjusted through an H channel and an S channel to be more approximate to the color of the marked image; in LAB, L is brightness, AB is red-blue tone, and the accuracy of the network color is improved through the difference calculation of A and B channels in the same way as HSV. The color gamut loss is calculated by adopting an L1 paradigm, such as the following formula:
Losscolor=L1|(y′rgb-yrgb)|+L1|(y′hsv-yhsv)|+L1|(y′lab-ylab)|
the linear and nonlinear loss combination mode can reduce the image distortion and the over-enhancement phenomenon, and increase the expressive force and stability of the output image of the image format conversion model.
S1415, according to the loss distance and a preset feedback function, the weighted value of the non-convergence model is adjusted back and corrected, so that the loss distance between the stress image and the annotation image tends to a preset target threshold value.
The loss distance between the stress image and the annotation image is calculated, and a gradient value for the correction of the current training round is obtained by passing the loss distance through the return function (i.e., by back-propagation). The weight values of the non-convergence model are then corrected with the calculated gradient value; this correction adjusts the weights in the model directionally, the purpose being to make the loss distance between the stress image and the annotation image tend toward the preset target threshold. The target threshold can be set according to actual requirements. The supervised training process is a gradient-descent training process, and S1411-S1415 form one complete gradient-descent pass. After multiple rounds of training samples, when the number of training iterations reaches a set value or the accuracy of the output images reaches a set threshold, the non-convergence model has been trained to a convergence state and becomes the image format conversion model.
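Putting S1411-S1415 together, one training pass could look like the following sketch, reusing the augment_pair and total_loss helpers sketched earlier; the Adam optimizer and its hyperparameters are assumptions, as the application does not name an optimizer.

```python
import torch

def train(model, dataset, epochs: int = 100, lr: float = 1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed choice
    for _ in range(epochs):
        for train_img, label_img in dataset:    # one training sample per step
            train_img, label_img = augment_pair(train_img, label_img)
            stress_img = model(train_img.unsqueeze(0))         # stress output
            loss = total_loss(stress_img, label_img.unsqueeze(0))
            optimizer.zero_grad()
            loss.backward()     # back-propagate the loss distance
            optimizer.step()    # callback-correct the weight values
```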
Referring to fig. 4, fig. 4 is a schematic diagram of a basic structure of the image format conversion device according to the present embodiment.
As shown in fig. 4, an image format conversion apparatus includes: an acquisition module 1100, a processing module 1200, and an execution module 1300. The acquisition module 1100 is configured to acquire a first target image in a first data format; the processing module 1200 is configured to input the first target image into a preset image format conversion model, where the image format conversion model is constrained to a convergence state in advance through linear loss and nonlinear loss, and is a neural network model for performing format conversion on an image; the execution module 1300 is configured to read a second target image output by the image format conversion model, where the second target image format is a second data format, and an information capacity of the second data format is greater than an information capacity of the first data format.
The image format conversion device can convert the format of the input image by pre-training the image format conversion model which is a neural network model, thereby improving the information carrying capacity of the input image and the information content in the image. Therefore, when a user shoots a first target image in a first data format by using the shooting device and inputs the first target image into the image format conversion model, the image format conversion model converts the format of the first target image to convert the first target image into a second target image in a second data format, the information capacity of the second data format is greater than that of the first data format, and the second target image after format conversion has higher image quality. By the method, the image quality can be rapidly improved, the limitation of the hardware performance of the shooting device is broken through, and the requirements of users are met.
Optionally, the image format conversion model comprises: the device comprises a first convolution channel and a second convolution channel, wherein the characteristics output by the first convolution channel are mask characteristic vectors, and the characteristics output by the second convolution channel are convolution characteristic vectors.
Optionally, the first convolution channel comprises: a first convolution layer and a first attention layer, wherein the first attention layer is connected to the output end of the first convolution layer, and the first attention layer comprises a channel attention layer.
Optionally, the second convolution channel comprises: the device comprises a plurality of cascaded feature layers, wherein the output end of each feature layer is connected with a second attention layer, and the second attention layers comprise a channel attention layer and a space attention layer.
Optionally, each of the feature layers comprises: a plurality of cascaded second convolution layers, wherein the output end of each second convolution layer is connected with a linear rectification layer, and the output of any linear rectification layer serves as the input of all second convolution layers arranged after that linear rectification layer.
Optionally, the image format conversion apparatus further includes:
the first reading submodule is used for reading the mask feature vector output by the first convolution channel;
the first operation submodule is used for performing dot product operation on the mask characteristic vector and the array vector matrix of the first target image;
the second operation submodule is used for performing addition operation on the result of the dot product operation and the feature vector output by the second convolution channel;
and the first generation submodule is used for generating the second target image after mapping a result obtained by the addition operation through a preset hyperbolic tangent function.
Optionally, the image format conversion apparatus further includes:
the second reading submodule is used for reading a training sample to be processed;
the first enhancement submodule is used for carrying out image enhancement processing on the training sample according to a preset image enhancement strategy;
the first processing submodule is used for inputting the training sample after the image enhancement processing into a preset non-convergence model, wherein the non-convergence model is an initialization model of the image format conversion model;
the third calculation submodule is used for reading the stress image output by the non-convergence model and calculating the loss distance between the stress image and the labeled image in the training sample according to a preset linear loss function and a preset non-linear loss function;
and the first execution submodule is used for carrying out callback correction on the weighted value of the non-convergence model according to the loss distance and a preset return function so as to enable the loss distance between the stress image and the labeled image to tend to a preset target threshold value.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 5, fig. 5 is a block diagram of a basic structure of a computer device according to the embodiment.
As shown in fig. 5, the internal structure of the computer device is schematically illustrated. The computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected by a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer readable instructions; the database can store control information sequences, and the computer readable instructions, when executed by the processor, can cause the processor to implement an image format conversion method. The processor of the computer device provides calculation and control capability and supports the operation of the whole computer device. The memory of the computer device may store computer readable instructions that, when executed by the processor, cause the processor to perform an image format conversion method. The network interface of the computer device is used for connecting and communicating with the terminal. Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In this embodiment, the processor is configured to execute specific functions of the acquisition module 1100, the processing module 1200, and the execution module 1300 in fig. 4, and the memory stores program codes and various data required for executing the modules. The network interface is used for data transmission to and from a user terminal or a server. The memory in this embodiment stores program codes and data necessary for executing all the sub-modules in the image format conversion apparatus, and the server can call the program codes and data of the server to execute the functions of all the sub-modules.
The computer device can convert the format of the input image by pre-training the image format conversion model which is a neural network model, so that the information carrying capacity of the input image is improved, and the information content in the image is improved. Therefore, when a user shoots a first target image in a first data format by using the shooting device and inputs the first target image into the image format conversion model, the image format conversion model converts the format of the first target image to convert the first target image into a second target image in a second data format, the information capacity of the second data format is greater than that of the first data format, and the second target image after format conversion has higher image quality. By the method, the image quality can be rapidly improved, the limitation of the hardware performance of the shooting device is broken through, and the requirements of users are met.
The present application also provides a storage medium storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the steps of the image format conversion method of any of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
Those skilled in the art will appreciate that the various operations, methods, and steps in the processes, acts, or solutions discussed in this application can be interchanged, modified, combined, or deleted. Likewise, other steps, measures, or schemes in the various operations, methods, or flows discussed in this application, including those known in the prior art, may be alternated, altered, rearranged, decomposed, combined, or deleted.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements shall also fall within the protection scope of the present application.

Claims (10)

1. An image format conversion method, comprising:
acquiring a first target image in a first data format;
inputting the first target image into a preset image format conversion model, wherein the image format conversion model is constrained to a convergence state in advance through a linear loss and a nonlinear loss and is used for performing format conversion on images;
and reading a second target image output by the image format conversion model, wherein the format of the second target image is a second data format, and the information capacity of the second data format is greater than that of the first data format.
2. The image format conversion method according to claim 1, wherein the image format conversion model comprises a first convolution channel and a second convolution channel, wherein the output feature of the first convolution channel is a mask feature vector and the output feature of the second convolution channel is a convolution feature vector.
3. The image format conversion method according to claim 2, wherein the first convolution channel comprises a first convolution layer and a first attention layer, the first attention layer being connected to the output end of the first convolution layer and comprising a channel attention layer.
4. The image format conversion method according to claim 2, wherein the second convolution channel comprises a plurality of cascaded feature layers, the output end of each feature layer being connected with a second attention layer, and each second attention layer comprising a channel attention layer and a spatial attention layer.
5. The image format conversion method according to claim 4, wherein each of the feature layers comprises a plurality of second convolution layers, the output end of each second convolution layer being connected with a linear rectification layer, and the output of any linear rectification layer serving as the input of all second convolution layers arranged after that linear rectification layer.
6. The image format conversion method according to claim 2, wherein the reading of the second target image output by the image format conversion model includes:
reading a mask feature vector output by the first convolution channel;
performing a dot product operation between the mask feature vector and the array matrix of the first target image;
adding the result of the dot product operation to the convolution feature vector output by the second convolution channel;
and mapping the result of the addition through a preset hyperbolic tangent function to generate the second target image.
7. The image format conversion method according to any one of claims 1 to 6, wherein the image format conversion model is trained by:
reading a training sample to be processed;
performing image enhancement processing on the training sample according to a preset image enhancement strategy;
inputting the training sample after image enhancement processing into a preset non-converged model, wherein the non-converged model is an initialization model of the image format conversion model;
reading a response image output by the non-converged model, and calculating a loss distance between the response image and an annotated image in the training sample according to a preset linear loss function and a preset nonlinear loss function;
and performing back-propagation correction on the weight values of the non-converged model according to the loss distance and a preset back-propagation function, so that the loss distance between the response image and the annotated image tends toward a preset target threshold value.
8. An image format conversion apparatus, characterized by comprising:
the acquisition module is used for acquiring a first target image in a first data format;
the processing module is used for inputting the first target image into a preset image format conversion model, wherein the image format conversion model is constrained to a convergence state in advance through a linear loss and a nonlinear loss and is a neural network model for performing format conversion on images;
and the execution module is used for reading a second target image output by the image format conversion model, wherein the format of the second target image is a second data format, and the information capacity of the second data format is greater than that of the first data format.
9. A computer storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the image format conversion method of any one of claims 1 to 7.
10. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method as claimed in any one of claims 1 to 7.
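Taken together, claims 2 to 6 describe a two-branch network whose outputs are fused by a dot product with the input array, an addition, and a hyperbolic tangent mapping. The following PyTorch sketch is one possible reading of that structure rather than the patentee's actual network: the layer widths, kernel sizes, and squeeze-and-excitation-style channel attention are assumptions, and the dense connections of claim 5 and the spatial attention of claim 4 are omitted for brevity.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Simplified channel attention (squeeze-and-excitation style)."""
    def __init__(self, ch: int, r: int = 4):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch // r), nn.ReLU(),
                                nn.Linear(ch // r, ch), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))       # global average pool per channel
        return x * w[:, :, None, None]        # reweight channels

class FormatConversionNet(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        # First convolution channel: convolution + channel attention -> mask (claim 3)
        self.mask_branch = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), ChannelAttention(ch),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Sigmoid())
        # Second convolution channel: cascaded feature layers (claims 4-5, simplified)
        self.feat_branch = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, x):
        mask = self.mask_branch(x)            # mask feature vector
        feat = self.feat_branch(x)            # convolution feature vector
        # Claim 6: dot product with the input array, addition, then tanh mapping
        return torch.tanh(mask * x + feat)
```

Under this reading, the sigmoid-bounded mask acts as a per-pixel gate on the input array, the second branch supplies a learned residual, and the hyperbolic tangent maps the fused result into a bounded range for the second target image.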
CN202210106711.1A 2022-01-28 2022-01-28 Image format conversion method and device, equipment, medium and product thereof Pending CN114445511A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210106711.1A CN114445511A (en) 2022-01-28 2022-01-28 Image format conversion method and device, equipment, medium and product thereof

Publications (1)

Publication Number Publication Date
CN114445511A (en) 2022-05-06

Family

ID=81370829

Country Status (1)

Country Link
CN (1) CN114445511A (en)

Similar Documents

Publication Publication Date Title
CN112150399B (en) Image enhancement method based on wide dynamic range and electronic equipment
CN108885782B (en) Image processing method, apparatus and computer-readable storage medium
JP2023504669A (en) Image processing method, smart device and computer program
CN109785252B (en) Night image enhancement method based on multi-scale residual error dense network
CN110463206B (en) Image filtering method, device and computer readable medium
WO2010055399A1 (en) Method and apparatus for representing and identifying feature descriptors utilizing a compressed histogram of gradients
CN114429438A (en) Image enhancement method and device, equipment, medium and product thereof
CN110599554A (en) Method and device for identifying face skin color, storage medium and electronic device
WO2024027287A9 (en) Image processing system and method, and computer-readable medium and electronic device
CN111539353A (en) Image scene recognition method and device, computer equipment and storage medium
CN113962859A (en) Panorama generation method, device, equipment and medium
WO2022105850A1 (en) Light source spectrum acquisition method and device
CN107220934A (en) Image rebuilding method and device
CN116668656A (en) Image processing method and electronic equipment
CN116843566A (en) Tone mapping method, tone mapping device, display device and storage medium
CN114445511A (en) Image format conversion method and device, equipment, medium and product thereof
CN108053452B (en) Digital image color extraction method based on mixed model
CN114881886A (en) Image processing method, image processing device, electronic equipment and storage medium
CN115619666A (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN114222075B (en) Mobile terminal image processing method and device, equipment, medium and product thereof
CN108304805A (en) A kind of big data image recognition processing system
CN111711809B (en) Image processing method and device, electronic device and storage medium
CN112241941A (en) Method, device, equipment and computer readable medium for acquiring image
CN115601242B (en) Lightweight image super-resolution reconstruction method suitable for hardware deployment
Vršnak et al. Illuminant estimation error detection for outdoor scenes using transformers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination