CN114445511A - Image format conversion method and device, equipment, medium and product thereof - Google Patents

Image format conversion method and device, equipment, medium and product thereof

Info

Publication number
CN114445511A
Authority
CN
China
Prior art keywords
image
format conversion
model
image format
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210106711.1A
Other languages
Chinese (zh)
Inventor
冯进亨
戴长军
丘文威
叶艾彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huaduo Network Technology Co Ltd
Original Assignee
Guangzhou Huaduo Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huaduo Network Technology Co Ltd filed Critical Guangzhou Huaduo Network Technology Co Ltd
Priority to CN202210106711.1A
Publication of CN114445511A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 - Image coding
    • G06T9/001 - Model-based coding, e.g. wire frame
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image format conversion method and apparatus, a computer device, and a storage medium. The image format conversion method comprises the following steps: acquiring a first target image in a first data format; inputting the first target image into a preset image format conversion model, wherein the image format conversion model has been constrained to a convergence state in advance through linear loss and nonlinear loss and is used for performing format conversion on images; and reading a second target image output by the image format conversion model, wherein the second target image is in a second data format whose information capacity is greater than that of the first data format. By this method, image quality can be improved rapidly, the limits of the shooting device's hardware performance can be overcome, and user requirements can be met.

Description

Image format conversion method and device, equipment, medium and product thereof
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image format conversion method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development and progress of science and technology, the widespread adoption of smartphones, and the rapid development of the internet, massive amounts of image data are continuously generated and shared. At the same time, people's quality requirements for images keep rising, and more and more users pursue higher definition and richer color display.
The inventors of the present application found in research that, in the prior art, after a shooting device acquires a picture with low information content, no information content can be added to it; that is, a high information content image cannot be generated from a single low information content image.
Disclosure of Invention
Provided are an image format conversion method and apparatus, and a computer-readable storage medium, capable of obtaining a high information content image from a single low information content image.
In order to achieve the above object, the present application provides an image format conversion method, including:
acquiring a first target image in a first data format;
inputting the first target image into a preset image format conversion model, wherein the image format conversion model is constrained to a convergence state in advance through linear loss and nonlinear loss, and is used for carrying out format conversion on the image;
and reading a second target image output by the image format conversion model, wherein the second target image format is a second data format, and the information capacity of the second data format is greater than that of the first data format.
Optionally, the image format conversion model includes: the device comprises a first convolution channel and a second convolution channel, wherein the characteristics output by the first convolution channel are mask characteristic vectors, and the characteristics output by the second convolution channel are convolution characteristic vectors.
Optionally, the first convolution channel comprises: a first convolution layer and a first attention layer, wherein the first attention layer is connected to the output end of the first convolution layer, and the first attention layer comprises a channel attention layer.
Optionally, the second convolution channel includes: the device comprises a plurality of cascaded feature layers, wherein the output end of each feature layer is connected with a second attention layer, and the second attention layers comprise a channel attention layer and a space attention layer.
Optionally, each of the feature layers comprises: a plurality of cascaded second convolution layers, wherein the output end of each second convolution layer is connected with a linear rectification layer, and the output of any linear rectification layer serves as the input of all second convolution layers arranged after that linear rectification layer.
Optionally, the reading the second target image output by the image format conversion model includes:
reading a mask feature vector output by the first convolution channel;
performing dot product operation on the mask feature vector and the array vector matrix of the first target image;
adding the result of the dot product operation and the feature vector output by the second convolution channel;
and mapping the result obtained by the addition operation through a preset hyperbolic tangent function to generate the second target image.
Optionally, the training method of the image format conversion model includes:
reading a training sample to be processed;
performing image enhancement processing on the training sample according to a preset image enhancement strategy;
inputting the training sample after image enhancement processing into a preset non-convergence model, wherein the non-convergence model is an initialization model of the image format conversion model;
reading a stress image output by the non-convergence model, and calculating a loss distance between the stress image and an annotation image in the training sample according to a preset linear loss function and a preset non-linear loss function;
and according to the loss distance and a preset return function, carrying out callback correction on the weight value of the non-convergence model so as to enable the loss distance between the stress image and the labeled image to tend to a preset target threshold value.
To achieve the above object, the present application also provides an image format conversion apparatus, comprising:
the acquisition module is used for acquiring a first target image in a first data format;
the processing module is used for inputting the first target image into a preset image format conversion model, wherein the image format conversion model is constrained to a convergence state in advance through linear loss and nonlinear loss, and is a neural network model for carrying out format conversion on the image;
and the execution module is used for reading a second target image output by the image format conversion model, wherein the second target image format is a second data format, and the information capacity of the second data format is greater than that of the first data format.
Optionally, the image format conversion model includes: the device comprises a first convolution channel and a second convolution channel, wherein the characteristics output by the first convolution channel are mask characteristic vectors, and the characteristics output by the second convolution channel are convolution characteristic vectors.
Optionally, the first convolution channel comprises: a first convolution layer and a first attention layer, wherein the first attention layer is connected to the output end of the first convolution layer, and the first attention layer comprises a channel attention layer.
Optionally, the second convolution channel includes: the device comprises a plurality of cascaded feature layers, wherein the output end of each feature layer is connected with a second attention layer, and the second attention layers comprise a channel attention layer and a space attention layer.
Optionally, each of the feature layers comprises: a plurality of cascaded second convolution layers, wherein the output end of each second convolution layer is connected with a linear rectification layer, and the output of any linear rectification layer serves as the input of all second convolution layers arranged after that linear rectification layer.
Optionally, the image format conversion apparatus further includes:
the first reading submodule is used for reading the mask feature vector output by the first convolution channel;
the first operation submodule is used for performing dot product operation on the mask characteristic vector and the array vector matrix of the first target image;
the second operation submodule is used for performing addition operation on the result of the dot product operation and the feature vector output by the second convolution channel;
and the first generation submodule is used for generating the second target image after mapping the result obtained by the addition operation through a preset hyperbolic tangent function.
Optionally, the image format conversion apparatus further includes:
the second reading submodule is used for reading a training sample to be processed;
the first enhancement submodule is used for carrying out image enhancement processing on the training sample according to a preset image enhancement strategy;
the first processing submodule is used for inputting the training sample subjected to the image enhancement processing into a preset non-convergence model, wherein the non-convergence model is an initialization model of the image format conversion model;
the third calculation submodule is used for reading the stress image output by the non-convergence model and calculating the loss distance between the stress image and the labeled image in the training sample according to a preset linear loss function and a preset non-linear loss function;
and the first execution submodule is used for carrying out callback correction on the weighted value of the non-convergence model according to the loss distance and a preset return function so as to enable the loss distance between the stress image and the labeled image to tend to a preset target threshold value.
In order to solve the above technical problem, an embodiment of the present invention further provides a computer device, including a memory and a processor, where the memory stores computer-readable instructions, and the computer-readable instructions, when executed by the processor, cause the processor to execute the steps of the image format conversion method.
In order to solve the above technical problem, embodiments of the present application further provide a storage medium storing computer-readable instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the image format conversion method.
To fulfil another object of the present application, a computer program product is provided, comprising a computer program/instructions which, when executed by a processor, implement the steps of the image format conversion method described in any one of the embodiments of the present application.
The beneficial effects of the embodiment of the application are that: by training the image format conversion model in advance, the image format conversion model is a neural network model, the format of the input image can be converted, the information carrying capacity of the input image is improved, and the information content in the image is improved. Therefore, when a user shoots a first target image in a first data format by using the shooting device and inputs the first target image into the image format conversion model, the image format conversion model converts the format of the first target image to convert the first target image into a second target image in a second data format, the information capacity of the second data format is greater than that of the first data format, and the second target image after format conversion has higher image quality. By the method, the image quality can be rapidly improved, the limitation of the hardware performance of the shooting device is broken through, and the requirements of users are met.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic diagram of a basic flow chart of an image format conversion method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating the generation of a second target image according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a training process of an image format conversion model according to an embodiment of the present application;
FIG. 4 is a diagram illustrating a basic structure of an image format conversion apparatus according to an embodiment of the present application;
fig. 5 is a block diagram of a basic structure of a computer device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, a "terminal" includes both devices that are wireless signal receivers, devices that have only wireless signal receivers without transmit capability, and devices that have receive and transmit hardware, devices that have receive and transmit hardware capable of performing two-way communication over a two-way communication link, as will be understood by those skilled in the art. Such a device may include: a cellular or other communication device having a single line display or a multi-line display or a cellular or other communication device without a multi-line display; PCS (Personal Communications Service), which may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; a conventional laptop and/or palmtop computer or other appliance having and/or including a radio frequency receiver. As used herein, a "terminal" may be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or situated and/or configured to operate locally and/or in a distributed fashion at any other location(s) on earth and/or in space. The "terminal" used herein may also be a communication terminal, a Internet access terminal, and a music/video playing terminal, and may be, for example, a PDA, an MID (Mobile Internet Device), and/or a Mobile phone with music/video playing function, and may also be a smart television, a set-top box, and other devices.
The hardware referred to by the names "server", "client", "service node", etc. is essentially an electronic device with the performance of a personal computer, and is a hardware device having necessary components disclosed by the von neumann principle such as a central processing unit (including an arithmetic unit and a controller), a memory, an input device, an output device, etc., a computer program is stored in the memory, and the central processing unit calls a program stored in an external memory into the internal memory to run, executes instructions in the program, and interacts with the input and output devices, thereby completing a specific function.
It should be noted that the concept of "server" as referred to in this application can be extended to the case of a server cluster. According to the network deployment principle understood by those skilled in the art, the servers should be logically divided, and in physical space, the servers may be independent from each other but can be called through an interface, or may be integrated into one physical computer or a set of computer clusters. Those skilled in the art should understand this variation and should not be so constrained as to implement the network deployment of the present application.
One or more technical features of the present application, unless expressly specified otherwise, may be deployed to a server for implementation by a client remotely invoking an online service interface provided by a capture server for access, or may be deployed directly and run on the client for access.
Unless specified in clear text, the neural network model referred to or possibly referred to in the application can be deployed in a remote server and used for remote call at a client, and can also be deployed in a client with qualified equipment capability for direct call.
Various data referred to in the present application may be stored in a server remotely or in a local terminal device unless specified in the clear text, as long as the data is suitable for being called by the technical solution of the present application.
Those skilled in the art will appreciate that although the various methods of the present application are described based on the same concept so as to share commonality, they may be performed independently unless otherwise specified. Likewise, each embodiment disclosed in the present application is proposed based on the same inventive concept; therefore, concepts expressed identically, and concepts whose expressions differ but have been changed only for convenience, should be understood equally.
Unless a mutual exclusion between related technical features is expressly stated, embodiments can be flexibly constructed by combining the related technical features of the embodiments disclosed herein, as long as the combination does not depart from the inventive spirit of the present application and meets the needs of the prior art or remedies its deficiencies. Those skilled in the art will appreciate such variations.
Referring to fig. 1, fig. 1 is a basic flow chart illustrating an image format conversion method according to the present embodiment. As shown in fig. 1, an image format conversion method includes:
s1100, collecting a first target image in a first data format;
in the present embodiment, the photographing apparatus for image photographing is a mobile terminal, for example, a mobile phone, a tablet computer, or a DV apparatus.
The mobile terminal needs to shoot and collect images in a specific shooting mode. The corresponding configuration parameters can be stored on the server side: after receiving the device information sent by the mobile terminal, the server identifies the operating system or other device characteristics of the mobile terminal from that information, and then determines the configuration parameters matched to the mobile terminal accordingly.
In some embodiments, the configuration parameters are matched according to the camera interface carried by the mobile terminal. First, the API interface of the camera of the mobile terminal is collected and looked up in a configuration database, and after the parameters corresponding to that API interface are obtained, the configuration parameters are sent to the mobile terminal. When no corresponding configuration parameters can be found through the API, the SDK information of the shooting module is read from the mobile terminal's device information, the configuration database is searched through the SDK information to obtain the configuration parameters corresponding to the SDK, and the corresponding configuration parameters are then sent to the mobile terminal. Further, when the configuration parameters cannot be matched through the SDK information either, they can be matched through the type of the mobile terminal's operating system, as sketched below.
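As a rough illustration of this fallback chain, the following sketch uses hypothetical accessor names (find_by_api, find_by_sdk, find_by_os) standing in for queries against the configuration database; none of these names come from this application.

```python
def match_config_params(device_info: dict, config_db):
    """Resolve shooting parameters: camera API -> SDK info -> OS type."""
    params = config_db.find_by_api(device_info.get("camera_api"))
    if params is None:
        # Fall back to the SDK information of the shooting module.
        params = config_db.find_by_sdk(device_info.get("camera_sdk"))
    if params is None:
        # Last resort: match on the operating system type.
        params = config_db.find_by_os(device_info.get("os_type"))
    return params  # sent back to the mobile terminal when found
```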
After the server receives the device information sent by the mobile terminal, it matches the configuration parameters corresponding to that mobile terminal according to the device information and sends them to the mobile terminal. After the mobile terminal obtains the configuration parameters, it sets its shooting parameters accordingly.
And the mobile terminal set according to the configuration parameters enters a set target shooting mode. It should be noted that the target shooting mode is not a mode necessary for the mobile terminal to capture an image, and in some embodiments, the mobile terminal can capture the first target image in any shooting mode.
The image acquired by the mobile terminal is the first target image. In this embodiment the first data format is the SDR format; it should be noted, however, that the first data format is not limited to SDR and can also be a conventional image format such as JPG or PNG.
S1200, inputting the first target image into a preset image format conversion model, wherein the image format conversion model is constrained to a convergence state in advance through linear loss and nonlinear loss, and is used for carrying out format conversion on the image;
after a first target image is acquired, the first target image is input into an image format conversion model, the image format conversion model is a neural network model which is trained to a convergence state in advance through supervised training, and format conversion can be carried out on the image input into the image format conversion model.
In some embodiments, the image format conversion model is deployed in the mobile terminal. After the mobile terminal collects the first target image, the first target image is input into the locally stored image format conversion model, and the model performs format conversion on the image locally.
In some embodiments, the image format conversion model is deployed in a server, the mobile terminal sends a first target image to the server after acquiring the first target image, the server performs image format conversion on the first target image through the image format conversion model, and after the image format conversion is completed, the server sends the converted image to the mobile terminal.
In some embodiments, after the first target image is acquired, it needs to be screened. The screening method is as follows: screening is carried out through the average brightness of the first target image. Specifically, a standard luminance interval value is set. The standard luminance interval value can be [30, 230], but it is not limited thereto; its critical values can be larger or smaller depending on the specific application scenario.
The average brightness of the first target image is calculated and compared against the standard luminance interval. If the average brightness value falls within the interval, the first target image is input into the image format conversion model; otherwise the first target image is deleted and the image is captured again.
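A minimal sketch of this screening step, assuming an 8-bit RGB input; the BT.601 luma weighting is an assumption, since the text only calls for an average brightness measure.

```python
import numpy as np

def passes_brightness_screen(image_rgb: np.ndarray,
                             interval=(30, 230)) -> bool:
    """True if the image's average brightness falls inside the interval."""
    r, g, b = image_rgb[..., 0], image_rgb[..., 1], image_rgb[..., 2]
    luma = 0.299 * r + 0.587 * g + 0.114 * b   # BT.601 luma (assumed measure)
    return interval[0] <= float(luma.mean()) <= interval[1]
```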
The value range of the standard luminance interval can be adjusted dynamically for different shooting environments. Specifically, the first target image is input into a neural network model trained to a convergence state in advance for identifying the scene of an image, and this model identifies the shooting environment represented by the first target image, for example indoor, outdoor, cloudy, or sunny scenes.
And searching a standard brightness interval value matched with the image scene in a preset scene database according to the image scene. Through scene recognition of the first target image, the adaptability of the standard brightness interval value to the environment can be improved.
In some embodiments, the image format conversion model is deployed in the mobile terminal. Since the performance of the mobile terminal is limited and a large-scale neural network model cannot be run there, the image format conversion model needs to be lightweight. Specifically, the structure of the image format conversion model is as follows:
the image format conversion model includes: the device comprises a first convolution channel and a second convolution channel, wherein the characteristics output by the first convolution channel are mask characteristic vectors, and the characteristics output by the second convolution channel are convolution characteristic vectors.
The first convolution channel includes: a first convolution layer and a first attention layer, wherein the first attention layer is connected to the output end of the first convolution layer, and the first attention layer comprises a channel attention layer.
The second convolution channel includes: a plurality of cascaded feature layers, wherein the output end of each feature layer is connected with a second attention layer, and the second attention layers comprise a channel attention layer and a spatial attention layer. In some embodiments, the second convolution channel comprises 3 sets of cascaded feature layers and 3 sets of second attention layers; however, the number of feature layers and second attention layers included in the second convolution channel is not limited thereto, and in some embodiments can be larger or smaller depending on the specific application scenario. A sketch of the two attention layers follows.
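Purely for illustration, the following PyTorch sketch shows one common way to realize channel attention and spatial attention layers of the kind named above (in the style of CBAM); the reduction ratio and kernel size are assumed values, not taken from this application.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Per-channel gating from pooled descriptors (CBAM-style sketch)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        weights = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * weights                   # reweight each channel

class SpatialAttention(nn.Module):
    """Per-location gating from channel-pooled maps (CBAM-style sketch)."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)    # average over channels
        mx = x.amax(dim=1, keepdim=True)     # max over channels
        weights = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * weights                   # reweight each location
```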
Each of the feature layers includes: a plurality of cascaded second convolution layers, wherein the output end of each second convolution layer is connected with a linear rectification layer, and the output of any linear rectification layer serves as the input of all second convolution layers arranged after it. In some embodiments, each feature layer comprises 5 sets of cascaded second convolution layers and linear rectification layers, with a linear rectification function set in each rectification layer. The output of any linear rectification layer serves as the input of all the second convolution layers arranged after it: for example, the output of the first linear rectification layer serves as the input of the second, third, fourth, and fifth second convolution layers; the output of the second linear rectification layer serves as the input of the third, fourth, and fifth second convolution layers; and so on, until the output of the fourth linear rectification layer is input to the fifth second convolution layer. It should be noted that the number of second convolution layers and linear rectification layers in a feature layer is not limited thereto, and can be more or fewer in some embodiments depending on the application scenario.
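A minimal sketch of one such feature layer, under the assumption of DenseNet-style connectivity in which the layer input is also concatenated forward (the text specifies only that rectified outputs feed later convolutions); the channel count is illustrative only.

```python
import torch
import torch.nn as nn

class DenseFeatureLayer(nn.Module):
    """Five cascaded conv + ReLU pairs; every rectified output feeds all
    later convolutions. Carrying the layer input forward as well is an
    assumption of this sketch."""
    def __init__(self, channels: int = 32, num_convs: int = 5):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels * (i + 1), channels, kernel_size=3, padding=1)
            for i in range(num_convs)
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for conv in self.convs:
            # Each convolution sees the concatenation of everything before it.
            feats.append(self.relu(conv(torch.cat(feats, dim=1))))
        return feats[-1]
```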
In this embodiment, the image format conversion model is trained with a loss function that combines a linear loss function and a nonlinear loss function:

$Loss = Loss_{linear} + Loss_{nonlinear}$

where $Loss$ denotes the total loss, $Loss_{linear}$ the linear loss function, and $Loss_{nonlinear}$ the nonlinear loss function.

The linear and nonlinear loss functions each comprise a gamut loss function $Loss_{color}$ and a perceptual loss function $Loss_{perceptual}$. In the prior art, by contrast, the loss function is typically a single nonlinear or a single linear loss function. However, a single linear loss function cannot make the image format conversion model converge, while a single nonlinear loss function can leave the model in a distorted state. Mixing the two lets the linear loss function compensate for the model distortion caused by the nonlinear loss function, so the image format conversion model is more robust and the converted image more stable.

$Loss_{linear} = Loss_{color} + Loss_{perceptual}$

$Loss_{nonlinear} = Loss_{color} + Loss_{perceptual}$

The perceptual loss is the difference between the outputs of different feature layers of the image format conversion model, computed with the L1 norm, where $y$ is the annotation feature and $y'$ is the output of the image format conversion model:

$Loss_{perceptual} = L_1\left|\,VGG19(y') - VGG19(y)\,\right|$

The gamut loss is computed over three color spaces: RGB, HSV, and LAB. RGB is the three-primary color space, corresponding to red, green, and blue; HSV is the hue (H), saturation (S), and value (V) space, and adjusting color through the H and S channels brings it closer to the color of the annotation image; in LAB, L is lightness and the A and B channels are the green-red and blue-yellow axes, so, as with HSV, computing the difference over the A and B channels improves the color accuracy of the network. The gamut loss also uses the L1 norm:

$Loss_{color} = L_1\left|y'_{rgb} - y_{rgb}\right| + L_1\left|y'_{hsv} - y_{hsv}\right| + L_1\left|y'_{lab} - y_{lab}\right|$
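The combined loss could be sketched as follows; the VGG19 feature extractor comes from torchvision, the color-space conversions are assumed to come from kornia.color, and using a single feature-extraction point is a simplification of the multi-layer comparison described above.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19
from kornia.color import rgb_to_hsv, rgb_to_lab  # assumed conversion helpers

vgg_features = vgg19(pretrained=True).features.eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)  # frozen extractor; gradients still flow to y_hat

def perceptual_loss(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # L1 distance between VGG19 feature maps of output and annotation.
    return F.l1_loss(vgg_features(y_hat), vgg_features(y))

def color_loss(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # L1 distance accumulated over the RGB, HSV, and LAB color spaces.
    return (F.l1_loss(y_hat, y)
            + F.l1_loss(rgb_to_hsv(y_hat), rgb_to_hsv(y))
            + F.l1_loss(rgb_to_lab(y_hat), rgb_to_lab(y)))

def total_loss(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # The linear and nonlinear terms share the same components in the text;
    # how they differ internally is not spelled out, so they coincide here.
    loss_linear = color_loss(y_hat, y) + perceptual_loss(y_hat, y)
    loss_nonlinear = color_loss(y_hat, y) + perceptual_loss(y_hat, y)
    return loss_linear + loss_nonlinear
```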
the image format conversion model format is a lightweight model which can be deployed on a mobile terminal, and a linear and nonlinear loss combination mode is adopted, so that the phenomena of image distortion and over-enhancement are reduced, and the expressive force and stability of an output image of the image format conversion model are improved.
And S1300, reading a second target image output by the image format conversion model, wherein the second target image format is a second data format, and the information capacity of the second data format is greater than that of the first data format.
The first target image is input into the image format conversion model. An input channel of the image format conversion model converts the first target image into an array vector matrix, and the array vector matrix is then input into the first convolution channel and the second convolution channel respectively.
The first convolution layer and the first attention layer in the first convolution channel extract the features of the array vector matrix, and the extracted feature vectors are mask feature vectors.
The feature layers and the second attention layers in the second convolution channel extract features from the array vector matrix level by level; each second convolution layer and linear rectification layer within a feature layer extracts convolution features from the array vector matrix at its own level. The feature vector finally output by the second convolution channel is the convolution feature vector.
At the output ends of the first convolution channel and the second convolution channel, the mask feature vector output by the first convolution channel undergoes a dot product operation with the array vector matrix; that is, the mask feature matrix represented by the mask feature vector is multiplied element-wise with the array vector matrix.
Once the dot product of the mask feature vector output by the first convolution channel and the array vector matrix is obtained, the vector matrix resulting from the dot product is added to the feature vector output by the second convolution channel.
After the result of the addition operation is obtained, it needs to be mapped. The mapping is as follows: the result is mapped through a hyperbolic tangent function. The hyperbolic tangent function (tanh) is the ratio of the hyperbolic sine function (sinh) to the hyperbolic cosine function (cosh), and it maps the addition result into the value range [-1, 1]. Finally, the mapped vector matrix is pixelized to generate the second target image.
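The whole output stage can be summarized in a few lines; this sketch assumes the mask features, input matrix, and convolution features already have matching shapes.

```python
import torch

def fuse_outputs(mask_features: torch.Tensor,
                 input_matrix: torch.Tensor,
                 conv_features: torch.Tensor) -> torch.Tensor:
    gated = mask_features * input_matrix  # dot product with the array matrix
    fused = gated + conv_features         # add the second channel's features
    return torch.tanh(fused)              # map the sum into [-1, 1]
```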
The data format of the second target image is the second data format. In this embodiment, the second data format is the HDR format; it should be noted that the second data format is not limited thereto and, depending on the application scenario, can also be, without limitation, TGA, BMP, or the like.
In the embodiment, the image format conversion model is trained in advance and is a neural network model, so that the format of the input image can be converted, the information carrying capacity of the input image is improved, and the information content in the image is improved. Therefore, when a user shoots a first target image in a first data format by using the shooting device and inputs the first target image into the image format conversion model, the image format conversion model converts the format of the first target image to convert the first target image into a second target image in a second data format, the information capacity of the second data format is greater than that of the first data format, and the second target image after format conversion has higher image quality. By the method, the image quality can be rapidly improved, the limitation of the hardware performance of the shooting device is broken through, and the requirements of users are met.
In some embodiments, after the first target image is input into the image format conversion model, the first target image needs to be processed through the first convolution channel and the second convolution channel. Referring to fig. 2, fig. 2 is a schematic flow chart illustrating the generation of the second target image according to the present embodiment.
As shown in fig. 2, S1300 includes:
S1311, reading the mask feature vector output by the first convolution channel;
The first target image is input into the image format conversion model. An input channel of the image format conversion model converts the first target image into an array vector matrix, and the array vector matrix is then input into the first convolution channel and the second convolution channel respectively.
The first convolution layer and the first attention layer in the first convolution channel extract features from the array vector matrix; the extracted feature vector is the mask feature vector.
S1312, performing a dot product operation on the mask feature vector and the array vector matrix of the first target image;
At the output ends of the first convolution channel and the second convolution channel, the mask feature vector output by the first convolution channel undergoes a dot product operation with the array vector matrix; that is, the mask feature matrix represented by the mask feature vector is multiplied element-wise with the array vector matrix.
S1313, adding the result of the dot product operation to the feature vector output by the second convolution channel;
Meanwhile, the feature layers and the second attention layers in the second convolution channel extract features from the array vector matrix level by level; each second convolution layer and linear rectification layer within a feature layer extracts convolution features at its own level, and the feature vector finally output by the second convolution channel is the convolution feature vector. The vector matrix obtained from the dot product is added to this convolution feature vector.
S1314, mapping the result obtained by the addition operation through a preset hyperbolic tangent function to generate the second target image.
After the result of the addition operation is obtained, it needs to be mapped. The mapping is as follows: the result is mapped through a hyperbolic tangent function, which is the ratio of the hyperbolic sine function (sinh) to the hyperbolic cosine function (cosh) and maps the addition result into the value range [-1, 1]. Finally, the mapped vector matrix is pixelized to generate the second target image.
In some embodiments, the image format conversion model needs to be trained by a supervised training method. The training method is described with reference to fig. 3, which is a schematic diagram of the training process of the image format conversion model in this embodiment.
As shown in fig. 3, the image format conversion model training method is as follows:
s1411, reading a training sample to be processed;
in this embodiment, a training sample is constructed, where the training sample includes a training image and an annotation image, where the training image is an image directly acquired by a mobile terminal, and the annotation image is a high-dynamic image synthesized from multiple sample images. Thus, the standard image has a higher information load than the training image. The data format of the training image is a first data format and the data format of the annotation image is a second data format. A training sample, consisting of a pair of training images and annotation images, enables a supervised training model to have the ability to convert a first data format to a second data format.
In this embodiment, one training sample includes one training image and one annotation image, and the training samples are stored in a training set. The training set comprises a plurality of training samples, and during training one training sample is randomly selected to train the model.
S1412, performing image enhancement processing on the training sample according to a preset image enhancement strategy;
in the present embodiment, in order to make the image format conversion model obtained by training more robust, it is necessary to perform enhancement processing on the training image before inputting the training sample into the model, and the method of enhancement processing includes (without being limited to): and processing the training sample by one or more image processing methods of rotate, flip, crop and brightness adjustment. For example, the training image is subjected to [ -10%, 10% ] brightness adjustment processing.
S1413, inputting the training sample subjected to image enhancement processing into a preset non-convergence model, wherein the non-convergence model is an initialization model of the image format conversion model;
and inputting the training image subjected to image enhancement processing into a preset non-convergence model, wherein the non-convergence model is an initialization model of the image format conversion model. The non-convergence model performs feature extraction and data format conversion on the training image, but the non-convergence model is not trained, so that the output result is stress output, the randomness is high, and the requirement of image format directional conversion cannot be met.
S1414, reading a stress image output by the non-convergence model, and calculating a loss distance between the stress image and a labeled image in the training sample according to a preset linear loss function and a preset non-linear loss function;
after the training sample is input into the non-convergence model, the stress image output by the non-convergence model is read, and then the loss distance between the stress image and the labeled image is calculated through the set linear loss function and the set non-linear loss function. Wherein the linear loss function and the nonlinear loss function are characterized by:
Loss=Losslinear+Lossnonlinear
wherein Loss denotes the Loss function, LosslinearExpressed as a linear Loss function, LossnonlinearExpressed as a nonlinear loss function.
The linear and nonlinear Loss functions each comprise a gamut Loss function LosscolorAnd Loss of perception function Lossperceptual. Compared with the prior art, the loss functions are single nonlinear loss functions or linear loss functions. However, a single linear loss function cannot make the image format conversion model converge, a single nonlinear loss function can make the image format conversion model in a distorted state, and the linear loss function and the nonlinear loss function are mixed, so that the linear loss function can make up for model distortion caused by the nonlinear loss function, the image format conversion model has stronger robustness, and the converted image has better stability.
Losslinear=Losscolor+Lossperceptual
Lossnonlinear=Losscolor+Lossperceptual
The perception loss refers to the difference of results between different feature layers of the image format conversion model, the difference is calculated by using an L1 paradigm, y is an annotation feature, and y' is the output of the image format conversion model.
Lossperceptual=L1|VGG19(y′-y)|
Gamut loss the loss calculation was performed using three gamuts, RGB, HSV, LAB three color spaces. RGB is a three-primary color space and respectively corresponds to three colors of red, green and blue; HSV is a hue (H), saturation (S) and brightness (V) space, and the color is adjusted through an H channel and an S channel to be more approximate to the color of the marked image; in LAB, L is brightness, AB is red-blue tone, and the accuracy of the network color is improved through the difference calculation of A and B channels in the same way as HSV. The color gamut loss is calculated by adopting an L1 paradigm, such as the following formula:
Losscolor=L1|(y′rgb-yrgb)|+L1|(y′hsv-yhsv)|+L1|(y′lab-ylab)|
the linear and nonlinear loss combination mode can reduce the image distortion and the over-enhancement phenomenon, and increase the expressive force and stability of the output image of the image format conversion model.
S1415, according to the loss distance and a preset feedback function, the weighted value of the non-convergence model is adjusted back and corrected, so that the loss distance between the stress image and the annotation image tends to a preset target threshold value.
The loss distance between the stress image and the annotation image is calculated, and a gradient value for the correction of the current training round is obtained by passing the loss distance through the return function (i.e., by back-propagation). The weight values of the non-convergence model are then corrected with the calculated gradient value; this correction adjusts the weights in the model directionally, the purpose being to make the loss distance between the stress image and the annotation image tend toward the preset target threshold. The target threshold can be set according to actual requirements. The supervised training process is a gradient-descent training process, and S1411-S1415 form one complete gradient-descent pass. After multiple rounds of training samples, when the number of training iterations reaches a set value or the accuracy of the output images reaches a set threshold, the non-convergence model has been trained to a convergence state and becomes the image format conversion model.
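Putting S1411-S1415 together, one training pass could look like the following sketch, reusing the augment_pair and total_loss helpers sketched earlier; the Adam optimizer and its hyperparameters are assumptions, as the application does not name an optimizer.

```python
import torch

def train(model, dataset, epochs: int = 100, lr: float = 1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed choice
    for _ in range(epochs):
        for train_img, label_img in dataset:    # one training sample per step
            train_img, label_img = augment_pair(train_img, label_img)
            stress_img = model(train_img.unsqueeze(0))         # stress output
            loss = total_loss(stress_img, label_img.unsqueeze(0))
            optimizer.zero_grad()
            loss.backward()     # back-propagate the loss distance
            optimizer.step()    # callback-correct the weight values
```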
Referring to fig. 4, fig. 4 is a schematic diagram of a basic structure of the image format conversion device according to the present embodiment.
As shown in fig. 4, an image format conversion apparatus includes: an acquisition module 1100, a processing module 1200, and an execution module 1300. The acquisition module 1100 is configured to acquire a first target image in a first data format; the processing module 1200 is configured to input the first target image into a preset image format conversion model, where the image format conversion model is constrained to a convergence state in advance through linear loss and nonlinear loss, and is a neural network model for performing format conversion on an image; the execution module 1300 is configured to read a second target image output by the image format conversion model, where the second target image format is a second data format, and an information capacity of the second data format is greater than an information capacity of the first data format.
The image format conversion device can convert the format of the input image by pre-training the image format conversion model which is a neural network model, thereby improving the information carrying capacity of the input image and the information content in the image. Therefore, when a user shoots a first target image in a first data format by using the shooting device and inputs the first target image into the image format conversion model, the image format conversion model converts the format of the first target image to convert the first target image into a second target image in a second data format, the information capacity of the second data format is greater than that of the first data format, and the second target image after format conversion has higher image quality. By the method, the image quality can be rapidly improved, the limitation of the hardware performance of the shooting device is broken through, and the requirements of users are met.
Optionally, the image format conversion model comprises: the device comprises a first convolution channel and a second convolution channel, wherein the characteristics output by the first convolution channel are mask characteristic vectors, and the characteristics output by the second convolution channel are convolution characteristic vectors.
Optionally, the first convolution channel comprises: a first convolution layer and a first attention layer, wherein the first attention layer is connected to the output end of the first convolution layer, and the first attention layer comprises a channel attention layer.
Optionally, the second convolution channel comprises: the device comprises a plurality of cascaded feature layers, wherein the output end of each feature layer is connected with a second attention layer, and the second attention layers comprise a channel attention layer and a space attention layer.
Optionally, each of the feature layers comprises: a plurality of cascaded second convolution layers, wherein the output end of each second convolution layer is connected with a linear rectification layer, and the output of any linear rectification layer serves as the input of all second convolution layers arranged after that linear rectification layer.
Optionally, the image format conversion apparatus further includes:
the first reading submodule is used for reading the mask feature vector output by the first convolution channel;
the first operation submodule is used for performing dot product operation on the mask characteristic vector and the array vector matrix of the first target image;
the second operation submodule is used for performing addition operation on the result of the dot product operation and the feature vector output by the second convolution channel;
and the first generation submodule is used for generating the second target image after mapping a result obtained by the addition operation through a preset hyperbolic tangent function.
Optionally, the image format conversion apparatus further includes:
the second reading submodule is used for reading a training sample to be processed;
the first enhancement submodule is used for carrying out image enhancement processing on the training sample according to a preset image enhancement strategy;
the first processing submodule is used for inputting the training sample after the image enhancement processing into a preset non-convergence model, wherein the non-convergence model is an initialization model of the image format conversion model;
the third calculation submodule is used for reading the stress image output by the non-convergence model and calculating the loss distance between the stress image and the labeled image in the training sample according to a preset linear loss function and a preset non-linear loss function;
and the first execution submodule is used for carrying out callback correction on the weighted value of the non-convergence model according to the loss distance and a preset return function so as to enable the loss distance between the stress image and the labeled image to tend to a preset target threshold value.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 5, fig. 5 is a block diagram of a basic structure of a computer device according to the embodiment.
As shown in fig. 5, the internal structure of the computer device is schematically illustrated. The computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected by a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer readable instructions; the database can store control information sequences, and the computer readable instructions, when executed by the processor, can cause the processor to implement an image format conversion method. The processor of the computer device provides calculation and control capability and supports the operation of the whole computer device. The memory of the computer device may store computer readable instructions that, when executed by the processor, cause the processor to perform an image format conversion method. The network interface of the computer device is used for connecting and communicating with the terminal. Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In this embodiment, the processor is configured to execute specific functions of the acquisition module 1100, the processing module 1200, and the execution module 1300 in fig. 4, and the memory stores program codes and various data required for executing the modules. The network interface is used for data transmission to and from a user terminal or a server. The memory in this embodiment stores program codes and data necessary for executing all the sub-modules in the image format conversion apparatus, and the server can call the program codes and data of the server to execute the functions of all the sub-modules.
The computer device can convert the format of the input image by pre-training the image format conversion model which is a neural network model, so that the information carrying capacity of the input image is improved, and the information content in the image is improved. Therefore, when a user shoots a first target image in a first data format by using the shooting device and inputs the first target image into the image format conversion model, the image format conversion model converts the format of the first target image to convert the first target image into a second target image in a second data format, the information capacity of the second data format is greater than that of the first data format, and the second target image after format conversion has higher image quality. By the method, the image quality can be rapidly improved, the limitation of the hardware performance of the shooting device is broken through, and the requirements of users are met.
The present application also provides a storage medium storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the steps of the image format conversion method of any of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
Those skilled in the art will appreciate that the various operations, methods, and steps in the processes, acts, or solutions discussed in this application can be interchanged, modified, combined, or deleted. Likewise, other steps, measures, or schemes in the various operations, methods, or flows discussed in this application, including those known in the prior art, may be alternated, altered, rearranged, decomposed, combined, or deleted.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements shall also fall within the protection scope of the present application.

Claims (10)

1. An image format conversion method, comprising:
acquiring a first target image in a first data format;
inputting the first target image into a preset image format conversion model, wherein the image format conversion model is constrained to a convergence state in advance through a linear loss and a nonlinear loss and is used for performing format conversion on images;
and reading a second target image output by the image format conversion model, wherein the format of the second target image is a second data format, and the information capacity of the second data format is greater than that of the first data format.
2. The image format conversion method according to claim 1, wherein the image format conversion model comprises a first convolution channel and a second convolution channel, wherein the output feature of the first convolution channel is a mask feature vector and the output feature of the second convolution channel is a convolution feature vector.
3. The image format conversion method according to claim 2, wherein the first convolution channel comprises a first convolution layer and a first attention layer, the first attention layer being connected to the output end of the first convolution layer and comprising a channel attention layer.
4. The image format conversion method according to claim 2, wherein the second convolution channel comprises a plurality of cascaded feature layers, the output end of each feature layer being connected with a second attention layer, and each second attention layer comprising a channel attention layer and a spatial attention layer.
5. The image format conversion method according to claim 4, wherein each of the feature layers comprises a plurality of second convolution layers, the output end of each second convolution layer being connected with a linear rectification layer, and the output of any linear rectification layer serving as the input of all second convolution layers arranged after that linear rectification layer.
6. The image format conversion method according to claim 2, wherein the reading of the second target image output by the image format conversion model includes:
reading a mask feature vector output by the first convolution channel;
performing a dot product operation between the mask feature vector and the array matrix of the first target image;
adding the result of the dot product operation to the convolution feature vector output by the second convolution channel;
and mapping the result of the addition through a preset hyperbolic tangent function to generate the second target image.
7. The image format conversion method according to any one of claims 1 to 6, wherein the image format conversion model is trained by:
reading a training sample to be processed;
performing image enhancement processing on the training sample according to a preset image enhancement strategy;
inputting the training sample after image enhancement processing into a preset non-converged model, wherein the non-converged model is an initialization model of the image format conversion model;
reading a response image output by the non-converged model, and calculating a loss distance between the response image and an annotated image in the training sample according to a preset linear loss function and a preset nonlinear loss function;
and performing back-propagation correction on the weight values of the non-converged model according to the loss distance and a preset back-propagation function, so that the loss distance between the response image and the annotated image tends toward a preset target threshold value.
8. An image format conversion apparatus, characterized by comprising:
the acquisition module is used for acquiring a first target image in a first data format;
the processing module is used for inputting the first target image into a preset image format conversion model, wherein the image format conversion model is constrained to a convergence state in advance through a linear loss and a nonlinear loss and is a neural network model for performing format conversion on images;
and the execution module is used for reading a second target image output by the image format conversion model, wherein the format of the second target image is a second data format, and the information capacity of the second data format is greater than that of the first data format.
9. A computer storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the image format conversion method of any one of claims 1 to 7.
10. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method as claimed in any one of claims 1 to 7.
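Taken together, claims 2 to 6 describe a two-branch network whose outputs are fused by a dot product with the input array, an addition, and a hyperbolic tangent mapping. The following PyTorch sketch is one possible reading of that structure rather than the patentee's actual network: the layer widths, kernel sizes, and squeeze-and-excitation-style channel attention are assumptions, and the dense connections of claim 5 and the spatial attention of claim 4 are omitted for brevity.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Simplified channel attention (squeeze-and-excitation style)."""
    def __init__(self, ch: int, r: int = 4):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch // r), nn.ReLU(),
                                nn.Linear(ch // r, ch), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))       # global average pool per channel
        return x * w[:, :, None, None]        # reweight channels

class FormatConversionNet(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        # First convolution channel: convolution + channel attention -> mask (claim 3)
        self.mask_branch = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), ChannelAttention(ch),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Sigmoid())
        # Second convolution channel: cascaded feature layers (claims 4-5, simplified)
        self.feat_branch = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, x):
        mask = self.mask_branch(x)            # mask feature vector
        feat = self.feat_branch(x)            # convolution feature vector
        # Claim 6: dot product with the input array, addition, then tanh mapping
        return torch.tanh(mask * x + feat)
```

Under this reading, the sigmoid-bounded mask acts as a per-pixel gate on the input array, the second branch supplies a learned residual, and the hyperbolic tangent maps the fused result into a bounded range for the second target image.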
CN202210106711.1A 2022-01-28 2022-01-28 Image format conversion method and device, equipment, medium and product thereof Pending CN114445511A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210106711.1A CN114445511A (en) 2022-01-28 2022-01-28 Image format conversion method and device, equipment, medium and product thereof

Publications (1)

Publication Number Publication Date
CN114445511A (en) 2022-05-06

Family

ID=81370829

Country Status (1)

Country Link
CN (1) CN114445511A (en)

Similar Documents

Publication Publication Date Title
CN112150399B (en) Image enhancement method based on wide dynamic range and electronic equipment
CN108885782B (en) Image processing method, apparatus and computer-readable storage medium
JP2023504669A (en) Image processing method, smart device and computer program
CN109785252B (en) Night image enhancement method based on multi-scale residual error dense network
CN110463206B (en) Image filtering method, device and computer readable medium
WO2010055399A1 (en) Method and apparatus for representing and identifying feature descriptors utilizing a compressed histogram of gradients
CN114429438A (en) Image enhancement method and device, equipment, medium and product thereof
CN110599554A (en) Method and device for identifying face skin color, storage medium and electronic device
WO2024027287A9 (en) Image processing system and method, and computer-readable medium and electronic device
CN111539353A (en) Image scene recognition method and device, computer equipment and storage medium
CN113962859A (en) Panorama generation method, device, equipment and medium
WO2022105850A1 (en) Light source spectrum acquisition method and device
CN107220934A (en) Image rebuilding method and device
CN116668656A (en) Image processing method and electronic equipment
CN116843566A (en) Tone mapping method, tone mapping device, display device and storage medium
CN114445511A (en) Image format conversion method and device, equipment, medium and product thereof
CN108053452B (en) Digital image color extraction method based on mixed model
CN114881886A (en) Image processing method, image processing device, electronic equipment and storage medium
CN115619666A (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN114222075B (en) Mobile terminal image processing method and device, equipment, medium and product thereof
CN108304805A (en) A kind of big data image recognition processing system
CN111711809B (en) Image processing method and device, electronic device and storage medium
CN112241941A (en) Method, device, equipment and computer readable medium for acquiring image
CN115601242B (en) Lightweight image super-resolution reconstruction method suitable for hardware deployment
Vršnak et al. Illuminant estimation error detection for outdoor scenes using transformers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination